[Paper Translation] DoWhy: An End-to-End Library for Causal Inference


Original paper: https://arxiv.org/abs/2011.04216

Code: https://github.com/microsoft/dowhy

DoWhy: An End-to-End Library for Causal Inference

Amit Sharma, Emre Kıcıman
Microsoft Research


1 INTRODUCTION

Many questions in data science are fundamentally causal: the impact of a marketing campaign or a new product feature, the reasons for customer churn, which drug may work best for which patient, and so on. As the field of data science has grown, many practitioners have come to recognize the value of causal inference for drawing insights from data. However, unlike the streamlined experience that libraries such as TensorFlow ([tensorflow]) and PyTorch ([pytorch]) offer for supervised machine learning, building a causal inference analysis is non-trivial. Software libraries that implement state-of-the-art causal inference methods can accelerate the adoption of causal inference among data analysts in both industry and academia.

However, we find that for data scientists and machine learning engineers who are familiar with non-causal methods but unpracticed in causal ones, one of the biggest challenges is modeling assumptions (i.e., translating domain knowledge into a causal graph) and understanding what these assumptions imply for causal identification and estimation. What is the right model? A second challenge is the shift in how results are verified and tested. Unlike supervised machine learning models, which can be validated on held-out test data, causal tasks often have no ground-truth answer available. Checking core assumptions and applying sensitivity tests is therefore critical to gaining confidence in the results. But how can those assumptions be checked?

We therefore built DoWhy, an end-to-end library for causal analysis that builds on the latest research in modeling assumptions and robustness checks ([athey2017state, kddtutorial]) and provides an easy interface for analysts to follow the best practices of causal inference. Specifically, DoWhy’s API is organized around the four key steps required for any causal analysis: Model, Identify, Estimate, and Refute. Model encodes prior knowledge as a formal causal graph; Identify uses graph-based methods to identify the causal effect; Estimate uses statistical methods to estimate the identified estimand; and Refute tries to refute the obtained estimate by testing its robustness to the initial model’s assumptions.

The focus on all four steps, going from data to the final causal estimate (along with a measure of its robustness), is the key differentiator between DoWhy and many existing causal inference libraries in Python and R, which focus only on estimation (the third step). These libraries expect an analyst to have already figured out how to build a reasonable causal model from data and domain knowledge and to have identified the correct estimand. More critically, they leave sensitivity and robustness checks to the analyst without offering any guidance, which makes it hard to verify results and build robust causal analyses.

Under the hood, DoWhy builds on two of the most powerful frameworks for causal inference: graphical models ([pearl2009causality]) and potential outcomes ([imbens2015causal]). It uses graph-based criteria and do-calculus for modeling assumptions and for identifying a non-parametric causal effect. For estimation, it switches to methods based primarily on potential outcomes. DoWhy is also built to interoperate with other libraries that implement the estimation step; it currently supports calling EconML ([econml]) and CausalML ([causalml]) estimators.

To summarize, DoWhy provides a unified interface for causal inference methods and automatically tests many assumptions, thus making inference accessible to non-experts. DoWhy is available open-source on GitHub at https://github.com/microsoft/dowhy and has a growing community, with over 2300 stars and 31 contributors. Many people have made key contributions that improve the usability and functionality of the library, such as an integrated Pandas interface for DoWhy’s four steps, and we welcome more community contributions. The library makes three key contributions:

  1. Provides a principled way of modeling a given problem as a causal graph so that all assumptions are explicit, and identifying a desired causal effect.
  2. Provides a unified interface for many popular causal inference estimation methods, combining the two major frameworks of graphical models and potential outcomes.
  3. Automatically tests for the validity of causal assumptions if possible and assesses the robustness of the estimate to violations.
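To make the four steps concrete, here is a minimal, standard-library-only sketch of the same Model, Identify, Estimate, Refute sequence on simulated data. It is not DoWhy's implementation: the dict-based graph encoding, the function names (`identify`, `estimate`, `refute_random_common_cause`) and the data-generating process are hypothetical. In DoWhy itself the corresponding entry points are the `CausalModel` class and its `identify_effect`, `estimate_effect` and `refute_estimate` methods.

```python
import random

# Step I (Model): encode prior knowledge as a DAG mapping each node to its parents.
# Hypothetical example: W is a common cause of treatment T and outcome Y.
GRAPH = {"W": [], "T": ["W"], "Y": ["T", "W"]}


def simulate(n, effect=2.0, seed=0):
    """Draw n rows from a process consistent with GRAPH (true effect = `effect`)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        t = int(rng.random() < (0.8 if w else 0.2))
        y = effect * t + 3.0 * w + rng.gauss(0, 1)
        rows.append({"W": w, "T": t, "Y": y})
    return rows


def identify(graph, treatment):
    """Step II (Identify): the treatment's parents form a valid back-door
    adjustment set when all of them are observed."""
    return sorted(graph[treatment])


def estimate(rows, treatment, outcome, adjust):
    """Step III (Estimate): stratify on `adjust`, take the treated-minus-control
    mean difference in each stratum, and average weighted by stratum size."""
    strata = {}
    for r in rows:
        strata.setdefault(tuple(r[v] for v in adjust), []).append(r)
    total = 0.0
    for group in strata.values():
        treated = [r[outcome] for r in group if r[treatment] == 1]
        control = [r[outcome] for r in group if r[treatment] == 0]
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        total += diff * len(group)
    return total / len(rows)


def refute_random_common_cause(rows, treatment, outcome, adjust, seed=1):
    """Step IV (Refute): add an independent random binary column as an extra
    common cause and re-estimate; a sound estimate should barely move."""
    rng = random.Random(seed)
    augmented = [dict(r, R=int(rng.random() < 0.5)) for r in rows]
    return estimate(augmented, treatment, outcome, adjust + ["R"])


rows = simulate(10_000)
adjust = identify(GRAPH, "T")
naive = estimate(rows, "T", "Y", [])  # no adjustment: confounded by W
ate = estimate(rows, "T", "Y", adjust)
refuted = refute_random_common_cause(rows, "T", "Y", adjust)
print(f"adjust={adjust} naive={naive:.2f} ate={ate:.2f} refuted={refuted:.2f}")
```

Without adjustment, the difference in means absorbs part of W's effect on Y (about 3.8 in expectation for this process, rather than the true 2.0); after adjusting for W the estimate should sit near 2.0, and the refuter's re-estimate should stay close to it.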

2 DOWHY AND THE FOUR STEPS OF CAUSAL INFERENCE

DoWhy is based on a simple unifying language for causal inference. Causal inference may seem tricky, but almost all methods follow four key steps. Figure 1 shows a schematic of the DoWhy analysis pipeline.

Figure 1: The four-step analysis pipeline in DoWhy.

I. Model the causal question. DoWhy creates an underlying causal graphical model ([pearl2009causality]) for each problem, which makes each causal assumption explicit. This graph need not be complete: an analyst may provide a partial graph that represents prior knowledge about some of the variables, and DoWhy automatically considers the remaining variables as potential confounders.

II. Identify the causal estimand. Based on the causal graph, DoWhy finds all possible ways of identifying the desired causal effect. It uses graph-based criteria and do-calculus to find expressions that can identify the causal effect. Supported identification criteria are:

  • Back-door criterion
  • Front-door criterion
  • Instrumental Variables
  • Mediation (Direct and indirect effect identification)
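The sketch below illustrates two ideas from steps I and II under simplifying assumptions: completing a partial graph by treating every column the analyst did not mention as a common cause of treatment and outcome, and reading off a back-door adjustment set W, which identifies the effect through the adjustment formula E[Y | do(T = t)] = Σ_w E[Y | T = t, W = w] P(W = w). Using all parents of the treatment is one valid back-door set (when they are all observed); it is not necessarily the set DoWhy returns, and the function names `complete_graph` and `backdoor_set` are hypothetical.

```python
def complete_graph(partial, columns, treatment, outcome):
    """Return a copy of `partial` (node -> list of parents) in which every
    column that the analyst did not mention becomes a common cause of both
    the treatment and the outcome, i.e. a potential confounder."""
    graph = {node: list(parents) for node, parents in partial.items()}
    graph.setdefault(treatment, [])
    graph.setdefault(outcome, [])
    for col in columns:
        if col not in graph:
            graph[col] = []
            graph[treatment].append(col)
            graph[outcome].append(col)
    return graph


def backdoor_set(graph, treatment):
    """Every back-door path starts with an edge into the treatment. Each parent
    on such a path is a non-collider and is never a descendant of the
    treatment, so conditioning on all parents blocks every back-door path,
    provided they are all observed."""
    return sorted(graph[treatment])


# Hypothetical example: the analyst only knows that W confounds T and Y.
partial = {"W": [], "T": ["W"], "Y": ["T", "W"]}
columns = ["W", "T", "Y", "Age", "Income"]
graph = complete_graph(partial, columns, "T", "Y")
adjust = backdoor_set(graph, "T")
print(adjust)  # ['Age', 'Income', 'W']
```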

III. Estimate the causal effect. DoWhy supports methods based on both the back-door criterion and instrumental variables. It also provides non-parametric confidence intervals and a permutation test for testing the statistical significance of the obtained estimate. Supported estimation methods include:

  • Methods based on estimating the treatment assignment: Propensity-based Stratification, Propensity Score Matching, Inverse Propensity Weighting
  • Methods based on estimating the outcome model: Linear Regression, Generalized Linear Models
  • Methods based on the instrumental variables identification: Binary Instrument/Wald Estimator, Two-stage least squares, Regression discontinuity
  • Methods for front-door criterion and general mediation: Two-stage linear regression
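To illustrate two of the ideas above, the sketch below implements inverse propensity weighting for a single binary confounder, plus a permutation test of the resulting estimate, using only the Python standard library. The data-generating process, the function names, and the choice to shuffle the treatment within confounder strata are assumptions of this example, not DoWhy's code.

```python
import random


def simulate(n, effect=2.0, seed=0):
    """Hypothetical data: a binary confounder W raises both the chance of
    treatment T and the outcome Y, so the raw difference in means is biased."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        t = int(rng.random() < (0.8 if w else 0.2))
        y = effect * t + 3.0 * w + rng.gauss(0, 1)
        data.append((w, t, y))
    return data


def naive_diff(data):
    """Unadjusted difference in mean outcome between treated and control."""
    treated = [y for _, t, y in data if t == 1]
    control = [y for _, t, y in data if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_ate(data):
    """Inverse propensity weighting with e(w) = P(T=1 | W=w) estimated as the
    share of treated units in each stratum of W."""
    counts = {}
    for w, t, _ in data:
        n_w, n_t = counts.get(w, (0, 0))
        counts[w] = (n_w + 1, n_t + t)
    e = {w: n_t / n_w for w, (n_w, n_t) in counts.items()}
    n = len(data)
    treated = sum(t * y / e[w] for w, t, y in data) / n
    control = sum((1 - t) * y / (1 - e[w]) for w, t, y in data) / n
    return treated - control


def permutation_pvalue(data, estimator, n_perm=100, seed=1):
    """Shuffle T within each stratum of W (keeping P(T | W) fixed but breaking
    any T -> Y link), recompute the estimate, and report the share of
    permuted estimates at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = estimator(data)
    strata = {}
    for i, (w, _, _) in enumerate(data):
        strata.setdefault(w, []).append(i)
    extreme = 0
    for _ in range(n_perm):
        ts = [t for _, t, _ in data]
        for idx in strata.values():
            shuffled = [ts[i] for i in idx]
            rng.shuffle(shuffled)
            for i, t in zip(idx, shuffled):
                ts[i] = t
        permuted = [(w, t, y) for (w, _, y), t in zip(data, ts)]
        if abs(estimator(permuted)) >= abs(observed):
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


data = simulate(4000)
print(f"naive={naive_diff(data):.2f}  ipw={ipw_ate(data):.2f}  "
      f"p={permutation_pvalue(data, ipw_ate):.3f}")
```

Because the propensity here is estimated from stratum frequencies of a single discrete confounder, this IPW estimate coincides numerically with propensity-based stratification; with continuous covariates one would instead fit a propensity model such as logistic regression.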

In addition, DoWhy supports integration with the EconML and CausalML packages for estimating the conditional average treatment effect (CATE). All estimators from these libraries can be called directly from DoWhy.
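The quantity these integrations target is CATE(x) = E[Y(1) − Y(0) | X = x], the average effect within the subpopulation whose covariates equal x. EconML and CausalML estimate it with flexible models; the standard-library sketch below only illustrates the quantity itself, on a hypothetical randomized dataset with a single binary effect modifier X, using group-wise differences in means. It does not use either library's API, and the names `simulate` and `cate_by_group` are hypothetical.

```python
import random
from statistics import mean


def simulate(n, seed=0):
    """Hypothetical randomized experiment: T is a fair coin flip and the effect
    of T on Y is 1.0 when X = 0 and 3.0 when X = 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = int(rng.random() < 0.5)
        t = int(rng.random() < 0.5)
        y = (1.0 + 2.0 * x) * t + 0.5 * x + rng.gauss(0, 1)
        rows.append((x, t, y))
    return rows


def cate_by_group(rows):
    """CATE(x) = E[Y | T=1, X=x] - E[Y | T=0, X=x], which equals
    E[Y(1) - Y(0) | X=x] here because T is randomized."""
    cate = {}
    for x in sorted({x for x, _, _ in rows}):
        treated = [y for xi, t, y in rows if xi == x and t == 1]
        control = [y for xi, t, y in rows if xi == x and t == 0]
        cate[x] = mean(treated) - mean(control)
    return cate


rows = simulate(8000)
cate = cate_by_group(rows)
share = {x: sum(1 for xi, _, _ in rows if xi == x) / len(rows) for x in cate}
ate = sum(share[x] * cate[x] for x in cate)  # the ATE averages the CATEs over X
print({x: round(v, 2) for x, v in cate.items()}, round(ate, 2))
```

The overall average effect (about 2.0 for this process) hides the fact that the effect is three times larger for X = 1 than for X = 0, which is the kind of heterogeneity CATE estimators are designed to expose.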

IV. Refute the obtained estimate. Having access to multiple refutation methods to validate an effect estimate from a causal estimator is a key benefit of using DoWhy. Supported refutation methods include:

  • Add Random Common Cause: Does the estimation method change its estimate after we add an independent random variable as a common cause to the dataset? (Hint: It should n