DoWhy: An End-to-End Library for Causal Inference

Amit Sharma, Emre Kıcıman
Microsoft Research



Many questions in data science are fundamentally causal questions, such as the impact of a marketing campaign or a new product feature, the reasons for customer churn, which drug may work best for which patient, and so on. As the field of data science has grown, many practitioners are realizing the value of causal inference in providing insights from data. However, unlike the streamlined experience for supervised machine learning with libraries like TensorFlow ([tensorflow]) and PyTorch ([pytorch]), it is non-trivial to build a causal inference analysis. Software libraries that implement state-of-the-art causal inference methods can accelerate the adoption of causal inference among data analysts in both industry and academia.

However, we find that for data scientists and machine learning engineers who are familiar with non-causal methods and unpracticed in causal methods, one of the biggest challenges is the practice of modeling assumptions (i.e., translating domain knowledge into a causal graph) and understanding the implications of these assumptions for causal identification and estimation. What is the right model? Another challenge is the shift in verification and testing practices. Unlike supervised machine learning models, which can be validated using held-out test data, causal tasks often have no ground-truth answer available. Thus, checking core assumptions and applying sensitivity tests is critical to gaining confidence in results. But how should those assumptions be checked?

Therefore, we built DoWhy, an end-to-end library for causal analysis that builds on the latest research in modeling assumptions and robustness checks ([athey2017state, kddtutorial]), and provides an easy interface for analysts to follow the best practices of causal inference. Specifically, DoWhy’s API is organized around the four key steps that are required for any causal analysis: Model, Identify, Estimate, and Refute. Model encodes prior knowledge as a formal causal graph, Identify uses graph-based methods to identify the causal effect, Estimate uses statistical methods to estimate the identified estimand, and finally Refute tries to refute the obtained estimate by testing its robustness to the initial model’s assumptions.

The focus on all four steps, going from data to the final causal estimate (along with a measure of its robustness), is the key differentiator for DoWhy compared to many existing libraries for causal inference in Python and R that focus only on estimation (the third step). These libraries expect an analyst to have already figured out how to build a reasonable causal model from data and domain knowledge, and to have identified the correct estimand. More critically, they also assume that the analyst will perform their own sensitivity and robustness checks but provide no guidance on how to do so, which makes it hard to verify and build robust causal analyses.

Under the hood, DoWhy builds on two of the most powerful frameworks for causal inference: graphical models ([pearl2009causality]) and potential outcomes ([imbens2015causal]). It uses graph-based criteria and do-calculus for modeling assumptions and identifying a non-parametric causal effect. For estimation, it switches to methods based primarily on potential outcomes. DoWhy is also built to be interoperable with other libraries that implement the estimation step. It currently supports calling EconML ([econml]) and CausalML ([causalml]) estimators.

To summarize, DoWhy provides a unified interface for causal inference methods and automatically tests many assumptions, thus making inference accessible to non-experts. DoWhy is available as open source on GitHub and has a growing community, including over 2300 stars and 31 contributors. Many people have made key contributions that are improving the usability and functionality of the library, such as an integrated Pandas interface for DoWhy’s four steps, and we welcome more community contributions. The library makes three key contributions:

  1. Provides a principled way of modeling a given problem as a causal graph so that all assumptions are explicit, and identifying a desired causal effect.
  2. Provides a unified interface for many popular causal inference estimation methods, combining the two major frameworks of graphical models and potential outcomes.
  3. Automatically tests for the validity of causal assumptions if possible and assesses the robustness of the estimate to violations.


DoWhy is based on a simple unifying language for causal inference. Causal inference may seem tricky, but almost all methods follow four key steps. Figure 1 shows a schematic of the DoWhy analysis pipeline.

Figure 1: The four-step analysis pipeline in DoWhy.

I. Model the causal question. DoWhy creates an underlying causal graphical model ([pearl2009causality]) for each problem, which makes each causal assumption explicit. The graph need not be complete: an analyst may provide a partial graph that represents prior knowledge about some of the variables, and DoWhy automatically treats the remaining variables as potential confounders.

II. Identify the causal estimand. Based on the causal graph, DoWhy finds all possible ways of identifying the desired causal effect. It uses graph-based criteria and do-calculus to find expressions that can identify the causal effect. Supported identification criteria are:

  • Back-door criterion
  • Front-door criterion
  • Instrumental Variables
  • Mediation (Direct and indirect effect identification)
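To make the back-door criterion concrete, the following is a minimal from-scratch sketch, not DoWhy's implementation. It assumes a hypothetical data-generating process with a single binary confounder `w`, a binary treatment `t`, and an outcome `y`; all names and coefficients are illustrative. The adjusted estimate averages within-stratum treatment contrasts weighted by P(w), which is the back-door adjustment formula for this simple case.

```python
import random

random.seed(0)

# Hypothetical data-generating process (an assumption for illustration):
# w confounds both the treatment t and the outcome y.
TRUE_EFFECT = 2.0
rows = []
for _ in range(20000):
    w = 1 if random.random() < 0.5 else 0
    t = 1 if random.random() < (0.8 if w else 0.2) else 0
    y = TRUE_EFFECT * t + 3.0 * w + random.gauss(0, 1)
    rows.append((w, t, y))


def mean(values):
    return sum(values) / len(values)


# Naive contrast: ignores w, so it mixes the causal effect with confounding.
naive = (mean([y for _, t, y in rows if t == 1])
         - mean([y for _, t, y in rows if t == 0]))

# Back-door adjustment: sum over w of P(w) * (E[y | t=1, w] - E[y | t=0, w]).
adjusted = 0.0
for level in (0, 1):
    stratum = [(t, y) for w, t, y in rows if w == level]
    p_w = len(stratum) / len(rows)
    contrast = (mean([y for t, y in stratum if t == 1])
                - mean([y for t, y in stratum if t == 0]))
    adjusted += p_w * contrast

print(f"naive={naive:.2f} adjusted={adjusted:.2f} true={TRUE_EFFECT}")
```

In this simulation the naive contrast overstates the effect because treated units are more likely to have `w = 1`, while the adjusted estimate should land close to the assumed true effect.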

III. Estimate the causal effect. DoWhy supports methods based on both the back-door criterion and instrumental variables. It also provides non-parametric confidence intervals and a permutation test for testing the statistical significance of the obtained estimate. Supported estimation methods include:

  • Methods based on estimating the treatment assignment: Propensity-based Stratification, Propensity Score Matching, Inverse Propensity Weighting
  • Methods based on estimating the outcome model: Linear Regression, Generalized Linear Models
  • Methods based on the instrumental variables identification: Binary Instrument/Wald Estimator, Two-stage least squares, Regression discontinuity
  • Methods for front-door criterion and general mediation: Two-stage linear regression
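As one illustration of the instrumental-variable family listed above, here is a small sketch of the Wald estimator for a binary instrument, written from scratch rather than taken from DoWhy. The simulation is an assumption made for this example: `z` is a randomized binary instrument, `u` is an unobserved confounder, and the effect of `t` on `y` is constant. The estimator divides the instrument's effect on the outcome by its effect on the treatment.

```python
import random

random.seed(1)

TRUE_EFFECT = 1.5  # assumed constant effect of t on y in this toy simulation


def mean(values):
    return sum(values) / len(values)


samples = []
for _ in range(50000):
    z = 1 if random.random() < 0.5 else 0   # binary instrument, randomized
    u = random.gauss(0, 1)                  # unobserved confounder
    # Treatment depends on the instrument and on the confounder.
    t = 1 if 0.8 * z + 0.5 * u + random.gauss(0, 1) > 0.4 else 0
    # The instrument affects y only through t.
    y = TRUE_EFFECT * t + 2.0 * u + random.gauss(0, 1)
    samples.append((z, t, y))

# Naive treated-vs-untreated contrast, biased by the unobserved confounder.
naive = (mean([y for _, t, y in samples if t == 1])
         - mean([y for _, t, y in samples if t == 0]))


def wald_estimate(data):
    """Ratio of the instrument's effect on y to its effect on t."""
    y_gap = (mean([y for z, _, y in data if z == 1])
             - mean([y for z, _, y in data if z == 0]))
    t_gap = (mean([t for z, t, _ in data if z == 1])
             - mean([t for z, t, _ in data if z == 0]))
    return y_gap / t_gap


wald = wald_estimate(samples)
print(f"naive={naive:.2f} wald={wald:.2f} true={TRUE_EFFECT}")
```

Because `u` is never observed, back-door adjustment is not available here; the instrument is what makes the effect recoverable in this sketch.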

In addition, DoWhy supports integrations with the EconML and CausalML packages for estimating the conditional average treatment effect (CATE). All estimators from these libraries can be called directly from DoWhy.

IV. Refute the obtained estimate. Having access to multiple refutation methods to validate an effect estimate from a causal estimator is a key benefit of using DoWhy. Supported refutation methods include:

  • Add Random Common Cause: Does the estimation method change its estimate after we add an independent random variable as a common cause to the dataset? (Hint: It should not)
  • Placebo Treatment: What happens to the estimated causal effect when we replace the true treatment variable with an independent random variable? (Hint: the effect should go to zero)
  • Dummy Outcome: What happens to the estimated causal effect when we replace the true outcome variable with an independent random variable? (Hint: The effect should go to zero)
  • Simulated Outcome: What happens to the estimated causal effect when we replace the outcome with a simulated outcome based on a known data-generating process closest to the given dataset? (Hint: It should match the effect parameter from the data-generating process)
  • Add Unobserved Common Causes: How sensitive is the effect estimate when we add an additional common cause (confounder) to the dataset that is correlated with the treatment and the outcome? (Hint: It should not be too sensitive)
  • Data Subsets Validation: Does the estimated effect change significantly when we replace the given dataset with a randomly selected subset? (Hint: It should not)
  • Bootstrap Validation: Does the estimated effect change significantly when we replace the given dataset with bootstrapped samples from the same dataset? (Hint: It should not)

Many of the above methods aim to refute the full causal analysis, including modeling, identification and estimation (as in Placebo Treatment or Dummy Outcome) whereas others refute a specific step (e.g., Data Subsets and Bootstrap Validation that test only the estimation step).
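The idea behind the Placebo Treatment check can be sketched in a few lines of plain Python. This is an illustrative stand-in and not DoWhy's refuter: the simulated data, the helper names `simulate` and `stratified_effect`, and the choice to build the placebo by shuffling the treatment column are all assumptions made for this example.

```python
import random

random.seed(2)


def mean(values):
    return sum(values) / len(values)


def simulate(n=20000, effect=2.0):
    """Hypothetical data: binary confounder w, binary treatment t, outcome y."""
    rows = []
    for _ in range(n):
        w = 1 if random.random() < 0.5 else 0
        t = 1 if random.random() < (0.8 if w else 0.2) else 0
        y = effect * t + 3.0 * w + random.gauss(0, 1)
        rows.append((w, t, y))
    return rows


def stratified_effect(rows):
    """Back-door adjusted effect: average within-stratum contrasts over P(w)."""
    total = 0.0
    for level in (0, 1):
        stratum = [(t, y) for w, t, y in rows if w == level]
        contrast = (mean([y for t, y in stratum if t == 1])
                    - mean([y for t, y in stratum if t == 0]))
        total += len(stratum) / len(rows) * contrast
    return total


rows = simulate()
original = stratified_effect(rows)

# Placebo treatment: shuffle the treatment column so that it is independent
# of both the confounder and the outcome, then re-run the same estimator.
placebo_t = [t for _, t, _ in rows]
random.shuffle(placebo_t)
placebo_rows = [(w, pt, y) for (w, _, y), pt in zip(rows, placebo_t)]
placebo = stratified_effect(placebo_rows)

print(f"original={original:.2f} placebo={placebo:.2f}")
```

If the placebo estimate were far from zero, that would suggest the analysis pipeline is picking up something other than the effect of the treatment.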


In this section, we show how causal inference using DoWhy simplifies to four lines of code, each corresponding to one of the four steps. Each analysis starts with building a causal model. The assumptions can be viewed graphically or in terms of conditional independence statements. Wherever possible, DoWhy can also automatically test the stated assumptions using observed data.

I. Create a causal model from the data and given graph.

model = CausalModel(

Given the model, identification is a causal problem, while estimation is simply a statistical problem. DoWhy respects this boundary and treats them separately. This focuses the causal inference effort on identification and frees estimation to use any available statistical estimator for a target estimand. In addition, multiple estimation methods can be used for a single identified estimand and vice versa.
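The separation between identification and estimation can be illustrated with a small self-contained sketch: once the estimand is fixed (here, back-door adjustment on one binary confounder), two different statistical estimators can target it. The code below is not DoWhy's implementation; the simulated data and the function names `stratification_estimate` and `regression_estimate` are hypothetical, and the regression coefficient is computed by demeaning within each level of `w`, which is equivalent to ordinary least squares of `y` on `t` and `w` when `w` is binary.

```python
import random

random.seed(3)


def mean(values):
    return sum(values) / len(values)


# Hypothetical data with a constant treatment effect of 2.0.
rows = []
for _ in range(20000):
    w = 1 if random.random() < 0.5 else 0
    t = 1 if random.random() < (0.8 if w else 0.2) else 0
    y = 2.0 * t + 3.0 * w + random.gauss(0, 1)
    rows.append((w, t, y))


def stratification_estimate(data):
    """Estimator 1: within-stratum contrasts weighted by P(w)."""
    total = 0.0
    for level in (0, 1):
        stratum = [(t, y) for w, t, y in data if w == level]
        contrast = (mean([y for t, y in stratum if t == 1])
                    - mean([y for t, y in stratum if t == 0]))
        total += len(stratum) / len(data) * contrast
    return total


def regression_estimate(data):
    """Estimator 2: OLS coefficient on t in y ~ t + w, via within-w demeaning."""
    num = 0.0
    den = 0.0
    for level in (0, 1):
        stratum = [(t, y) for w, t, y in data if w == level]
        t_bar = mean([t for t, _ in stratum])
        y_bar = mean([y for _, y in stratum])
        for t, y in stratum:
            num += (t - t_bar) * (y - y_bar)
            den += (t - t_bar) ** 2
    return num / den


strat = stratification_estimate(rows)
reg = regression_estimate(rows)
print(f"stratification={strat:.2f} regression={reg:.2f}")
```

With a constant effect, both estimators should agree closely; with heterogeneous effects they weight strata differently and can diverge, which is one reason to compare several estimators for the same estimand.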

II. Identify the causal effect and return the target estimand.

identified_estimand = model.identify_effect()

III. Estimate the target estimand using a statistical method.

estimate  = model.estimate_effect(identified_estimand,


For data with high-dimensional confounders, machine learning-based estimators may be more effective. Therefore, DoWhy supports calling estimators from other libraries like EconML. Here is an example of using the double machine learning estimator ([chernozhukov2017double]).

dml_estimate = model.estimate_effect(
    identified_estimand, method_name="backdoor.econml.dml.DMLCateEstimator",
    'model_t': GradientBoostingRegressor(),
    'featurizer': PolynomialFeatures(degree=1, include_bias=True)},

The most critical, and often skipped, part of causal analysis is checking the robustness of an estimate to unverified assumptions. DoWhy makes it easy to automatically run sensitivity and robustness checks on the obtained estimate.

IV. Refute the obtained estimate using multiple robustness checks.
refute_results = model.refute_estimate(identified_estimand, estimate,

Finally, DoWhy is easily extensible, allowing other implementations of the four verbs to co-exist. The four verbs are mutually independent, so their implementations can be combined in any way. Example notebooks of using DoWhy for different causal problems are available at


We presented DoWhy, an extensible and end-to-end library for causal inference. Unlike most other libraries, DoWhy focuses on helping an analyst devise the correct causal model and test its assumptions, in addition to estimating the causal effect. We look forward to extending DoWhy with more refutation and robustness analyses, and supporting more estimation methods with its 4-step API.


A big thanks to all the open-source contributors to DoWhy who continue to make important additions to the library’s functionality and usability. The list of contributors is updated at