A Survey on Causal Inference
LIUYI YAO, Alibaba Group
ZHIXUAN CHU and SHENG LI, University of Georgia
YALIANG LI, Alibaba Group
JING GAO, Purdue University
AIDONG ZHANG, University of Virginia
Causal inference has been a critical research topic across many domains, such as statistics, computer science,
education, public policy, and economics, for decades. Nowadays, estimating causal effects from observational
data has become an appealing research direction owing to the large amount of available data and its low
budget requirement compared with randomized controlled trials. Fueled by the rapid development of machine
learning, various causal effect estimation methods for observational data have sprung up. In this survey,
we provide a comprehensive review of causal inference methods under the potential outcome framework,
one of the well-known causal inference frameworks. The methods are divided into two categories depending
on whether or not they require all three assumptions of the potential outcome framework. For each category,
both the traditional statistical methods and the recent machine learning enhanced methods are discussed
and compared. Plausible applications of these methods are also presented, including applications
in advertising, recommendation, medicine, and so on. Moreover, the commonly used benchmark datasets and
open-source codes are summarized to help researchers and practitioners explore,
evaluate, and apply causal inference methods.
CCS Concepts: • Computing methodologies → Causal reasoning and diagnostics; Machine learning;
• Information systems → Data mining;
Additional Key Words and Phrases: Treatment effect estimation; Representation learning
ACM Reference format:
Liuyi Yao, Zhixuan Chu, Sheng Li, Yaliang Li, Jing Gao, and Aidong Zhang. 2021. A Survey on Causal
Inference. ACM Trans. Knowl. Discov. Data 15, 5, Article 74 (May 2021), 46 pages.
https://doi.org/10.1145/3444944
Work done when Liuyi Yao was a Ph.D. student at University at Buffalo.
This work is supported in part by the US National Science Foundation under grants IIS-1747614, IIS-2008208, IIS-1934600,
IIS-1938167, and IIS-1955151.
Authors’ addresses: L. Yao, Alibaba Group, 969 West Wen Yi Road, Yu Hang District, Hangzhou, Zhejiang, 311121, China;
email: [email protected]; Z. Chu, University of Georgia, 415 Boyd Graduate Studies Research Center, Athens,
Georgia, 30602-7404, USA; email: [email protected]; S. Li, University of Georgia, 415 Boyd Graduate Studies Research
Center, Athens, Georgia, 30602-7404, USA; email: [email protected]; Y. Li, Alibaba Group, 500 108th Ave NE, Suite 800,
Bellevue, Washington, 98004, USA; email: [email protected]; J. Gao, Purdue University, 465 Northwestern Ave.,
West Lafayette, Indiana, 47907-2035, USA; email: [email protected]; A. Zhang, University of Virginia, 85 Engineer’s
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
© 2021 Association for Computing Machinery.
1556-4681/2021/05-ART74 $15.00
ACM Transactions on Knowledge Discovery from Data, Vol. 15, No. 5, Article 74. Publication date: May 2021.
1 INTRODUCTION
In everyday language, correlation and causality are commonly used interchangeably, although
they have quite different interpretations. Correlation indicates a general relationship: two variables
are correlated when they display an increasing or decreasing trend [7]. Causality is also referred
to as cause and effect, where the cause is partly responsible for the effect, and the effect is partly
dependent on the cause. Causal inference is the process of drawing a conclusion about a causal
connection based on the conditions of the occurrence of an effect. The main difference between
causal inference and inference of correlation is that the former analyzes the response of the effect
variable when the cause is changed [104, 150].
It is well known that “correlation does not imply causation.” For example, a study showed that
girls who regularly eat breakfast tend to weigh less than girls who do not, and thus concluded that
eating breakfast can help people lose weight. But in fact, these two events may be merely correlated
rather than causally related. Perhaps the girls who eat breakfast every day have a better lifestyle, such
as exercising frequently, sleeping regularly, and eating a healthy diet, which ultimately makes them
weigh less. In this case, having a better lifestyle is the common cause of both eating breakfast
and lower weight, and thus we can also treat it as a confounder of the causal relationship between
eating breakfast and lower weight.
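The confounding pattern in this example can be illustrated with a small simulation. The sketch below is not taken from any study: the probabilities, the weight distributions, and the function names are assumptions chosen only for illustration. Lifestyle drives both breakfast habits and weight, while breakfast itself has no effect on weight by construction.

```python
import random
import statistics

def simulate(n=20000, seed=0):
    """Generate (healthy, breakfast, weight) rows; all numbers are illustrative assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        healthy = rng.random() < 0.5                       # confounder: healthy lifestyle
        # A healthy lifestyle makes eating breakfast more likely.
        breakfast = rng.random() < (0.8 if healthy else 0.3)
        # Weight depends only on lifestyle, NOT on breakfast (true effect is zero).
        weight = rng.gauss(55.0 if healthy else 62.0, 4.0)
        rows.append((healthy, breakfast, weight))
    return rows

def mean_diff(rows):
    """Mean weight of breakfast eaters minus mean weight of breakfast skippers."""
    eat = [w for _, b, w in rows if b]
    skip = [w for _, b, w in rows if not b]
    return statistics.mean(eat) - statistics.mean(skip)

rows = simulate()
naive = mean_diff(rows)  # ignores lifestyle
# Comparing within each lifestyle group holds the confounder fixed.
within = {h: mean_diff([r for r in rows if r[0] == h]) for h in (True, False)}

print(f"naive difference: {naive:.2f}")
for h, d in within.items():
    print(f"difference within healthy={h}: {d:.2f}")
```

Under these assumptions the naive comparison shows breakfast eaters weighing noticeably less, while the difference within each lifestyle group stays close to zero, which is the signature of a confounder rather than a causal effect.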
In many cases, it seems obvious that one action can cause another; however, there are also
many cases in which we cannot easily tease out and ascertain the relationship. Therefore, learning
causality is a dauntingly challenging problem. The most effective way of inferring causality is
to conduct a randomized controlled trial, which randomly assigns participants to a treatment
group or a control group. Because assignment is randomized, the only expected difference
between the control and treatment groups is the outcome variable being studied. However, in
reality, randomized controlled trials are typically time-consuming and expensive, and thus a study
cannot involve many subjects, who may not be representative of the real-world population that a
treatment/intervention would eventually target. Another issue is that randomized controlled
trials focus only on the average over samples; they do not explain the mechanism or pertain to
individual subjects. In addition, ethical issues also need to be considered in most randomized
controlled trials, which largely limits their applications. Therefore, instead of randomized
controlled trials, observational data is a tempting shortcut. Observational data is obtained by
the researcher simply observing the subjects without any interference. That is, the researchers
have no control over treatments and subjects; they just observe the subjects and record data
based on their observations. From the observational data, we can find their actions, outcomes,
and information about what has occurred, but cannot figure out the mechanism by which they took a
specific action. For observational data, the core question is how to obtain the counterfactual
outcome. For example, we want to answer the question “would this patient have had different results if he
had received a different medication?” Answering such counterfactual questions is challenging for
two reasons [135]: the first is that we only observe the factual outcome and never the
counterfactual outcomes that would potentially have happened if the subject had chosen a different treatment
option. The second is that treatments are typically not assigned at random in observational
data, which may cause the treated population to differ significantly from the general population.
To solve these problems in causal inference from observational data, researchers have developed
various frameworks, including the potential outcome framework [127, 149] and the structural
causal model (SCM) [102, 105, 107]. The potential outcome framework is also known as the
Neyman–Rubin Potential Outcomes or the Rubin Causal Model. In the example mentioned
above, a girl would have a particular weight if she had breakfast normally every day, whereas she
would have a different weight if she didn’t have breakfast normally. To measure the causal effect of
having breakfast normally for a girl, we need to compare the outcomes for the same person under
both situations. Obviously, it is impossible to see both potential outcomes at the same time, and one
of the potential outcomes is always missing. The potential outcome framework aims to estimate
such potential outcomes and then calculate the treatment effect. Therefore, treatment effect
estimation is one of the central problems in causal inference under the potential outcome
framework. Another influential framework in causal inference is the SCM, which includes the causal
graph and the structural equations. The SCM describes the causal mechanisms of a system, where
a set of variables and the causal relationships among them are modeled by a set of simultaneous
structural equations. Another line of learning causality is causal structure learning, whose
objective is to reveal the causal relations by generating a causal graph. Representative methods can
be divided into three categories: constraint-based models [147], score-based models [31,
114], and functional causal models [62, 176]. Unlike causal effect estimation, causal
structure learning addresses a different class of problems, which is outside the scope of our survey;
see [148] for more information.
Causal inference has a close relationship with machine learning. In recent years, the
rapid growth of machine learning has advanced the development of causal inference.
Powerful machine learning methods, such as decision trees, ensemble methods, and deep neural
networks, are applied to estimate the potential outcomes more accurately. In addition to
improving the outcome estimation model, machine learning methods also provide a new way to
handle confounders. Benefiting from recent deep representation learning methods, the
confounder variables are adjusted for by learning a balanced representation of all covariates, so
that, conditioning on the learned representation, the treatment assignment is independent of the
confounder variables. In machine learning, the more data the better. However, in causal inference,
more data alone is not enough. Having more data only helps to obtain more precise estimates;
it cannot ensure that these estimates are correct and unbiased. Machine learning methods advance
the development of causal inference; meanwhile, causal inference also helps machine learning
methods. The simple pursuit of predictive accuracy is insufficient for modern machine learning
research, and correctness and interpretability are also targets of machine learning methods.
Causal inference is starting to help improve machine learning in areas such as recommender systems
and reinforcement learning.
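The remark that more data improves precision but not correctness can be checked numerically. The following is a minimal sketch under assumed settings (a single confounder, a true effect of 1.0, and a naive difference-in-means estimator); none of these choices come from the survey itself.

```python
import random
import statistics

TRUE_EFFECT = 1.0  # assumed ground truth, known only because this is a simulation

def naive_estimate(n, rng):
    """Difference in mean outcomes between treated and control units, ignoring the confounder."""
    treated, control = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                        # confounder
        w = rng.random() < (0.8 if x > 0 else 0.2)     # treatment is more likely when x > 0
        y = TRUE_EFFECT * w + 2.0 * x + rng.gauss(0.0, 1.0)
        (treated if w else control).append(y)
    return statistics.mean(treated) - statistics.mean(control)

def summarize(n, reps=20, seed=0):
    """Mean and standard deviation of the naive estimate over repeated samples of size n."""
    rng = random.Random(seed)
    estimates = [naive_estimate(n, rng) for _ in range(reps)]
    return statistics.mean(estimates), statistics.stdev(estimates)

small_mean, small_sd = summarize(500)
large_mean, large_sd = summarize(20000)
print(f"n=500:   mean={small_mean:.2f}, sd={small_sd:.3f}")
print(f"n=20000: mean={large_mean:.2f}, sd={large_sd:.3f}")
```

With the larger sample the spread of the estimates shrinks, yet both sample sizes concentrate around a value far from the assumed true effect, because the bias comes from the assignment mechanism rather than from sampling noise.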
In this article, we provide a comprehensive review of the causal inference methods under the
potential outcome framework. We first introduce the basic concepts of the potential outcome
framework as well as its three critical assumptions for identifying the causal effect. After that, various causal
inference methods that rely on these three assumptions are discussed in detail, including re-weighting
methods, stratification methods, matching-based methods, tree-based methods,
representation-based methods, multi-task learning based methods, and meta-learning methods. Additionally,
causal effect estimation methods that relax the three assumptions are described to fulfill the
needs of different settings. After introducing the various causal effect estimation methods, we discuss
real-world applications that stand to benefit from them, with advertising, recommendation,
medicine, and reinforcement learning as representative examples.
To the best of our knowledge, this is the first article that provides a comprehensive survey of
causal inference methods under the potential outcome framework. There also exist several surveys
that each discuss one category of causal effect estimation methods, such as the survey of
matching-based methods [151], the survey of tree-based and ensemble-based methods [12], and the review of
dynamic treatment regimes [28]. For the SCM, readers may refer to the survey [104] or the
book [103]. There is also a survey about learning causality from observational data [52], whose
content ranges over inferring the causal graph from observational data, the SCM, the potential outcome
framework, and their connection to machine learning. Compared with the surveys mentioned
above, this survey article mainly focuses on the theoretical background of the potential outcome
framework, the representative methods across the statistics and machine learning domains,
and how this framework and machine learning enhance each other.
To summarize, the contributions of this survey are as follows:
—New taxonomy: We separate various causal inference methods into two major categories
based on whether they require the three assumptions of the potential outcome framework.
The category requiring the three assumptions is further divided into seven sub-categories
based on how the confounder variables are handled.
—Comprehensive review: We provide a comprehensive survey of the causal inference methods
under the potential outcome framework. For each category, we provide detailed descriptions of the
representative methods, the connections and comparisons among them,
and a general summary.
—Abundant resources: In this survey, we list the state-of-the-art methods, the benchmark datasets,
open-source codes, and representative applications.
The rest of the article is organized as follows. In Section 2, the background of the potential
outcome framework is introduced, including the basic definitions, the assumptions, and the
fundamental problems with their general solutions. In Section 3, the methods under the three assumptions
are presented. Then, in Section 4, we discuss the problems that arise when some assumptions are not satisfied,
and describe the methods that relax those assumptions. Next, we provide experimental guidelines
in Section 5. Afterward, in Section 6, the typical applications of causal inference are illustrated.
After that, in Section 7, the future directions and open problems are discussed. Finally, Section 8
summarizes the article.
2 BASICS OF CAUSAL INFERENCE
In this section, we present the background knowledge of causal inference, including the task
description, mathematical notation, assumptions, challenges, and general solutions. We also give an
illustrative example that will be used throughout this survey.
Generally speaking, the task of causal inference is to estimate how the outcome would change if another
treatment had been applied. For example, suppose there are two treatments that can be applied to
patients: Medicine A and Medicine B. When Medicine A is applied to the patient cohort of interest,
the recovery rate is 70%, while when Medicine B is applied to the same cohort, the recovery rate is 90%.
The change in recovery rate is the effect that the treatment (i.e., the medicine in this example) exerts on
the recovery rate.
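As a worked instance of this example, the effect can be written as a difference of recovery rates. The direction of the comparison (Medicine B relative to Medicine A) is an assumption made here for illustration; the text itself does not fix a reference treatment.

```latex
\[
\text{effect of B relative to A}
  = \underbrace{0.90}_{\text{recovery rate under B}}
  - \underbrace{0.70}_{\text{recovery rate under A}}
  = 0.20 ,
\]
```

that is, a change of 20 percentage points in the recovery rate of the same cohort.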
The above example describes an ideal situation for measuring the treatment effect: applying different
treatments to the same cohort. In real-world scenarios, this ideal situation can only be
approximated by a randomized experiment, in which the treatment assignment is controlled, such as by
completely random assignment. In this way, the group that receives a specific treatment can be viewed
as an approximation to the cohort we are interested in.
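To see why a randomly assigned group can stand in for the whole cohort, one can compare a covariate of the treated group with that of the cohort under two assignment rules. The sketch below uses a hypothetical cohort with a single covariate (age) and assumed assignment probabilities; it is an illustration only.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical cohort: one covariate (age) per patient.
ages = [rng.gauss(50.0, 10.0) for _ in range(10000)]

# Completely random assignment: a fair coin flip per patient.
randomized_treated = [a for a in ages if rng.random() < 0.5]

# Self-selected assignment (assumed rule): older patients take the treatment more often.
selected_treated = [a for a in ages if rng.random() < (0.8 if a > 50.0 else 0.2)]

cohort_mean = statistics.mean(ages)
gap_randomized = abs(statistics.mean(randomized_treated) - cohort_mean)
gap_selected = abs(statistics.mean(selected_treated) - cohort_mean)

print(f"cohort mean age: {cohort_mean:.1f}")
print(f"gap under random assignment: {gap_randomized:.2f}")
print(f"gap under self-selection:    {gap_selected:.2f}")
```

Under random assignment the treated group's mean age stays close to the cohort mean, whereas under the assumed self-selection rule it drifts by several years, so the treated group no longer resembles the cohort of interest.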
However, performing randomized experiments is expensive, time-consuming, and sometimes
even unethical. Therefore, estimating the treatment effect from observational data has attracted
growing attention due to the wide availability of observational data. Observational data usually
contains a group of individuals who have taken different treatments, their corresponding outcomes, and
possibly more information, but without direct access to the reason/mechanism why they took the
specific treatment. Such observational data enable researchers to investigate the fundamental
problem of learning the causal effect of a certain treatment without performing randomized
experiments. To better introduce various treatment effect estimation methods, the following
section introduces several definitions, including unit, treatment, outcome, treatment effect, and
other information (pre- and post-treatment variables) provided by observational data.
2.1 Definitions
Here we define the notation under the potential outcome framework [127, 149], which is logically
equivalent to another framework, the SCM framework [72]. The foundation of the potential
outcome framework is that causality is tied to a treatment (or action, manipulation, intervention)
applied to a unit [69]. The treatment effect is obtained by comparing units’ potential outcomes under
different treatments. In the following, we first introduce three essential concepts in causal inference: unit,
treatment, and outcome.
Definition 1 (Unit). A unit is the atomic research object in the treatment effect study.
A unit can be a physical object, a firm, a patient, an individual person, or a collection of objects
or persons, such as a classroom or a market, at a particular time point [69]. Under the potential
outcome framework, the atomic research objects at different time points are different units. One
unit in the dataset is a sample of the whole population, so in this survey, the terms “sample” and
“unit” are used interchangeably.
Definition 2 (Treatment). Treatment refers to the action that is applied to a unit (or that the unit is
exposed or subjected to).
Let $W$ ($W \in \{0, 1, 2, \ldots, N_W\}$) denote the treatment, where $N_W + 1$ is the total number of
possible treatments. In the aforementioned medicine example, Medicine A is a treatment. Most of the
literature considers binary treatment, and in this case, the group of units that receive
treatment $W = 1$ is the treated group, and the group of units with $W = 0$ is the control group.
Definition 3 (Potential Outcome). For each unit-treatment pair, the outcome of that treatment
when applied to that unit is the potential outcome [69].
The potential outcome of treatment with value $w$ is denoted as $Y(W = w)$.
Definition 4 (Observed Outcome). The observed outcome is the outcome of the treatment that is
actually applied.
The observed outcome is also called the factual outcome, and we use $Y^F$ to denote it, where $F$
stands for “factual.” The relation between the potential outcome and the observed outcome is
$Y^F = Y(W = w)$, where $w$ is the treatment actually applied.
Definition 5 (Counterfactual Outcome). The counterfactual outcome is the outcome if the unit had
taken another treatment.
The counterfactual outcomes are the potential outcomes of the treatments other than the one
actually taken by the unit. Since a unit can only take one treatment, only one potential outcome can
be observed, and the remaining unobserved potential outcomes are the counterfactual outcomes.
In the multiple treatment case, let $Y^{CF}(W = w')$ denote the counterfactual outcome of treatment
with value $w'$. In the binary treatment case, for notational simplicity, we use $Y^{CF}$ to denote the
counterfactual outcome, and $Y^{CF} = Y(W = 1 - w)$, where $w$ is the treatment actually taken by the
unit.
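For the binary treatment case, the relation among potential, factual, and counterfactual outcomes can be sketched as a small data structure. The class name `Unit`, its fields, and the numeric values are hypothetical and introduced only for illustration; in real observational data only the factual outcome is available, so storing both potential outcomes is possible only in a simulation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    """One unit under binary treatment (illustrative; both outcomes known only in simulation)."""
    potential_outcomes: tuple  # (Y(W=0), Y(W=1))
    w: int                     # treatment actually taken, 0 or 1

    @property
    def factual(self):
        # Y^F = Y(W = w): the potential outcome of the treatment actually taken.
        return self.potential_outcomes[self.w]

    @property
    def counterfactual(self):
        # Y^CF = Y(W = 1 - w): the potential outcome of the treatment not taken.
        return self.potential_outcomes[1 - self.w]

# Hypothetical unit whose outcome would be 60.0 untreated and 57.5 treated, and who was treated.
unit = Unit(potential_outcomes=(60.0, 57.5), w=1)
print(unit.factual, unit.counterfactual)
```

Indexing the pair of potential outcomes by `w` and `1 - w` mirrors the two formulas in the definitions directly, which keeps the notation and the code in one-to-one correspondence.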
In the observational data, besides the chosen treatments and the observed outcomes, the units’
other information is also recorded, and it can be separated into pre-treatment variables and
post-treatment variables.