
Uplift vs. predictive modeling: a theoretical analysis

Théo Verhelst, Robin Petit, Wouter Verbeke, Gianluca Bontempi

Machine-learning Group, Université Libre de Bruxelles, Belgium
Algorithms Research Group, Université Libre de Bruxelles, Belgium
Information Systems Engineering Research Group, KU Leuven, Belgium

Abstract

Despite the growing popularity of machine-learning techniques in decision-making, the added

value of causal-oriented strategies with respect to pure machine-learning approaches has rarely been

quantified in the literature. These strategies are crucial for practitioners in various domains, such

as marketing, telecommunications, health care and finance. This paper presents a comprehensive

treatment of the subject, starting from firm theoretical foundations and highlighting the parameters

that influence the performance of the uplift and predictive approaches. The focus of the paper is on

a binary outcome case and a binary action, and the paper presents a theoretical analysis of uplift

modeling, comparing it with the classical predictive approach. The main research contributions of

the paper include a new formulation of the measure of profit, a formal proof of the convergence of

the uplift curve to the measure of profit, and an illustration, through simulations, of the conditions

under which predictive approaches still outperform uplift modeling. We show that the mutual

information between the features and the outcome plays a significant role, along with the variance

of the estimators, the distribution of the potential outcomes and the underlying costs and benefits

of the treatment and the outcome.

Keywords: Uplift modeling, Profit measure, Causal inference, Decision-making

1 Introduction

With the growing popularity of machine-learning techniques in decision-making, the need for effective

and accurate models has become increasingly important in various domains. Conventional predictive

approaches have been used with success, for example, in churn prediction, where the models are built

to forecast whether a customer is likely to stop using a service based on historical data (Óskarsdóttir
et al. 2018; Zhu, Baesens, and Broucke 2017; Mitrović et al. 2018; Idris and Khan 2014).

However, traditional predictive models often overlook an essential aspect of decision-making:

the causal nature of interventions. Recently, uplift modeling has been established as an important

approach to take this aspect into account for decision-making (Gutierrez and Gérardy 2016; Devriendt,

Berrevoets, and Verbeke 2021). Uplift modeling differs from conventional predictive models by explicitly

considering the causal effect of an intervention on the outcome variable. Rather than estimating the
conditional expectation of the outcome based on input features alone, uplift modeling focuses on
estimating the difference in outcomes under different treatment scenarios.

arXiv:2309.12036v1 [cs.LG] 21 Sep 2023

Consider a marketing campaign for churn prevention, as an example. The goal is to identify

customers who are less likely to churn in response to a promotional offer. Traditional predictive models
predict the likelihood of a customer churning; however, they do not consider the causal effect of the

intervention (sending the offer) on the outcome (customer churn). In this setting, the possible behavior

of a customer can be summarized in terms of counterfactual statements (Devriendt, Berrevoets, and

Verbeke 2021):

• Sure thing: Customer not churning regardless of the action

• Persuadable: Customer churning only if not contacted

• Do-not-disturb: Customer churning only if contacted

• Lost cause: Customer churning regardless of the action
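The four profiles above amount to a lookup on the pair of potential outcomes (churn if not contacted, churn if contacted). The following minimal Python sketch makes this explicit; the names `CustomerType`, `customer_type` and `compatible_types` are illustrative choices and do not come from the paper. It also lists which profiles remain possible once a single (action, outcome) pair has been observed.

```python
from enum import Enum


class CustomerType(Enum):
    SURE_THING = "sure thing"
    PERSUADABLE = "persuadable"
    DO_NOT_DISTURB = "do-not-disturb"
    LOST_CAUSE = "lost cause"


def customer_type(churn_if_not_contacted: int, churn_if_contacted: int) -> CustomerType:
    """Map the two potential outcomes (1 = churn) to a counterfactual profile."""
    table = {
        (0, 0): CustomerType.SURE_THING,      # never churns
        (1, 0): CustomerType.PERSUADABLE,     # churns only if not contacted
        (0, 1): CustomerType.DO_NOT_DISTURB,  # churns only if contacted
        (1, 1): CustomerType.LOST_CAUSE,      # always churns
    }
    return table[(churn_if_not_contacted, churn_if_contacted)]


def compatible_types(contacted: int, churned: int) -> set:
    """Profiles consistent with one observed (action, outcome) pair."""
    result = set()
    for y0 in (0, 1):
        for y1 in (0, 1):
            # Only the potential outcome matching the action taken is observed.
            observed = y1 if contacted else y0
            if observed == churned:
                result.add(customer_type(y0, y1))
    return result
```

For instance, `compatible_types(contacted=0, churned=1)` returns both the persuadable and the lost-cause profiles: a single observation never pins down the profile of an individual.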

Ideally, only persuadable customers should be targeted by marketing actions. However, we observe

only one of the two potential outcomes (this is known as the fundamental problem of causal inference;
Holland 1986), and it is impossible to determine with certainty which customers are persuadable.

Uplift modeling explicitly aims to estimate the difference in the probability of a positive outcome under

the treatment scenario (customer receives the offer) and the no-treatment scenario (customer does

not receive the offer). Individuals maximizing this difference are the most likely to generate a profit

increase when contacted. The term uplift is used mainly in business settings where large amounts of

experimental data are available, while in other fields, the same quantity is called the conditional average

treatment effect (CATE), or heterogeneous treatment effect (Gutierrez and Gérardy 2016), usually assuming
there is only access to observational data. A large number of models based on machine learning have
been developed in recent years to estimate uplift, such as the S-learner, T-learner and X-learner (Zhang,
J. Li, and Liu 2021; Künzel et al. 2019).
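To make the two-model idea behind the T-learner concrete, the sketch below fits one outcome model per treatment arm on randomized data and takes the difference of their predictions. It is only an illustration under stated assumptions: a per-segment frequency table stands in for an arbitrary classifier, the segment names and churn probabilities are hypothetical, and the sign is chosen so that a positive value means the offer reduces churn.

```python
import random
from collections import defaultdict


def fit_rate_table(rows):
    """Fit a lookup-table 'model': mean outcome per feature value."""
    totals = defaultdict(lambda: [0, 0])  # feature value -> [sum of y, count]
    for x, y in rows:
        totals[x][0] += y
        totals[x][1] += 1
    return {x: s / n for x, (s, n) in totals.items()}


def t_learner_uplift(data):
    """T-learner: one outcome model per treatment arm, then their difference.

    data: list of (x, t, y) with binary t and y and a hashable feature x.
    Returns {x: estimated P(y=1 | t=0, x) - P(y=1 | t=1, x)}.
    """
    control = fit_rate_table((x, y) for x, t, y in data if t == 0)
    treated = fit_rate_table((x, y) for x, t, y in data if t == 1)
    return {x: control[x] - treated[x] for x in control if x in treated}


def simulate(n, seed=0):
    """Generate synthetic randomized data (all numbers are hypothetical)."""
    rng = random.Random(seed)
    churn_prob = {  # (segment, treatment) -> churn probability
        ("new", 0): 0.30, ("new", 1): 0.10,      # the offer helps
        ("loyal", 0): 0.05, ("loyal", 1): 0.05,  # the offer has no effect
    }
    data = []
    for _ in range(n):
        x = rng.choice(["new", "loyal"])
        t = rng.randint(0, 1)  # randomized treatment assignment
        y = int(rng.random() < churn_prob[(x, t)])
        data.append((x, t, y))
    return data


uplift = t_learner_uplift(simulate(20000))
```

With this simulated data, the estimate for the "new" segment should land near 0.20 and the one for the "loyal" segment near 0, up to sampling noise. Each arm is estimated separately, so the noise of both models adds up in the difference.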

Despite the intuitive appeal of uplift modeling, the added value of causal-oriented strategies with

respect to pure machine-learning predictive approaches has rarely been quantified in the literature. We

believe that it is important to assess whether the expected benefit of uplift strategies (derived from a bias

reduction in the estimation of causal effect) is still noticeable in settings where the data distribution is

characterized by a large number of dimensions, nonlinearity, class imbalance and low class separability.

The works of Devriendt, Berrevoets, and Verbeke (2021), Fernández-Loria and Provost (2022a),
Fernández-Loria and Provost (2022b), and Ascarza (2018) address this issue. Devriendt, Berrevoets, and
Verbeke (2021) and Ascarza (2018) present the uplift and predictive approaches, provide an empirical
evaluation of both approaches and conclude that uplift models should be preferred over churn models.
Fernández-Loria and Provost (2022a) develop an analytical criterion indicating when an uplift model
leads to a lower causal classification error than a predictive model for a given individual. The same
authors (Fernández-Loria and Provost 2022b) discuss the differences between causal
classification and uplift modeling, and provide some qualitative arguments on when the predictive
approach should be preferred. We extend these papers by comprehensively treating the question,

starting from theoretical foundations and studying the influence of different characteristics of the

setting (distribution of the outcome, variance of the estimators, etc.) on the performance of the uplift

and predictive approaches.

A critical aspect of comparing the two approaches is the necessity for a meaningful and sensible

measure of model performance. In this paper, we extend the work of Verbeke, Olaya, Berrevoets, et al.

(2021) by developing a new formulation of the profit generated by a campaign where individuals targeted

by interventions are selected by a machine-learning model. By incorporating the concept of profit, we

go beyond the traditional evaluation metrics and consider the economic impact of decision-making

strategies. Our measure of profit generalizes that of Verbeke, Olaya, Berrevoets, et al. (2021) by accommodating varying costs and benefits

across individuals. This flexibility is beneficial, for example, in churn prediction, where prioritizing

higher-value customers is crucial. By selecting an appropriate measure, we ensure a fair and accurate

comparison between the uplift and predictive models, enabling decision-makers to make informed

choices based on the true effectiveness and suitability of each approach.

Our paper seeks to establish firm theoretical foundations for uplift modeling and to answer the

question “When does uplift modeling outperform predictive modeling?”. While we focus on a customer

churn prediction example, our findings have broad applicability across domains, including marketing,

telecommunications, health care and finance. Our main conclusions are as follows. The estimator variance plays

a critical role in determining the performance of a model, and in most cases, the predictive approach

outperforms the uplift approach when the variance of the uplift estimator exceeds a certain threshold. We

also show the important impact of three other aspects: cost sensitivity, the mutual information between

the features and the outcome, and the distribution of the potential outcomes. While the importance
of cost sensitivity and of the distribution of potential outcomes has been discussed in the literature
by Verbeke, Olaya, Berrevoets, et al. (2021) and Fernández-Loria and Provost (2022a), respectively, to

the best of our knowledge, the impact of mutual information has not been assessed before. We show

that it has an important impact on performance, independent of the other aspects (estimator variance,

cost sensitivity and distribution of potential outcomes).

Note, however, that we do not address the question of how to adapt uplift modeling to account for

cost sensitivity or the other aspects mentioned above. Our contributions pertain to model evaluation

rather than model optimization. Thus, it is left for future work to assess the effectiveness of cost-sensitive

models in terms of the metrics developed in this paper. On that topic, Gubela and Lessmann (2021)

have proposed a value-driven ranking method for targeted marketing campaigns.

The main research contributions of this paper are as follows:

• A new formulation of the measure of profit, intensifying the focus on individual cost sensitivity and on the stochastic nature of the machine-learning model used to rank individuals (Section 3.2).

• A proof that the uplift curve (an evaluation curve often used in the uplift literature) is an estimator of the measure of profit, highlighting the strict conditions necessary for the validity of the uplift curve (Section 3.4).

• An empirical estimator of the measure of profit, which is a cost-sensitive generalization of the uplift curve (Section 3.5).

2.1 Notation

We use Pearl’s causal framework, which is based on the notion of structural causal models (SCMs). A formal
definition of SCMs is given by Pearl (2009, Def. 7.1.1). Here, $\mathbf{t}$ is a random variable denoting the action,
or treatment, $\mathbf{y}$ is the outcome, and $\mathbf{x}$ is a set of features (or covariates) describing the unit/individual.
We denote the realizations of these variables as $t$, $y$ and $x$, respectively. In this paper, we limit
ourselves to the double binary causal classification case, that is, the setting where $y \in \{0, 1\}$
and $t \in \{0, 1\}$. Importantly, we always assume access to experimental data, in which the
treatment $\mathbf{t}$ is randomized. It is possible to learn the uplift from observational data, for example, with
propensity scores (Künzel et al. 2019; Curth and Schaar 2021) or double machine learning (Jung, Tian,
and Bareinboim 2021); however, this is beyond the scope of this paper. The $do(\mathbf{t} = t)$ operator denotes a
causal intervention in the system. The conditional probability of $\mathbf{y} = y$ given $\mathbf{x} = x$ under intervention
$do(\mathbf{t} = t)$ is written as $P(\mathbf{y} = y \mid do(\mathbf{t} = t), \mathbf{x} = x)$, or $P(\mathbf{y}_t = y \mid \mathbf{x} = x)$. For ease of notation,
we also define

\[
S_0(x) = P(\mathbf{y}_0 = 1 \mid \mathbf{x} = x), \qquad S_0 = P(\mathbf{y}_0 = 1), \tag{1}
\]
\[
S_1(x) = P(\mathbf{y}_1 = 1 \mid \mathbf{x} = x), \qquad S_1 = P(\mathbf{y}_1 = 1), \tag{2}
\]
\[
U(x) = S_0(x) - S_1(x), \qquad U = S_0 - S_1. \tag{3}
\]

In this notation, $U$ is the uplift, or average treatment effect (ATE), and $U(x)$ is the individual uplift, or
conditional average treatment effect (CATE). Note that, for example, in the literature pertaining to retail or
online advertisements, the uplift is defined as $U = S_1 - S_0$, and similarly $U(x) = S_1(x) - S_0(x)$.
This choice depends on whether the probability of the (positive) outcome $\mathbf{y} = 1$ should be minimized
(e.g., in churn prevention) or maximized (e.g., in sales). The uplift is then defined so that a positive uplift
corresponds to a beneficial outcome. Since we apply our results mostly to churn prevention, we use the
convention $U = S_0 - S_1$.
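As a purely illustrative numerical example of this sign convention (the figures are hypothetical and not taken from the paper), suppose that 20% of customers churn when not contacted and 15% churn when contacted. With the churn-prevention convention,

\[
S_0 = 0.20, \qquad S_1 = 0.15, \qquad U = S_0 - S_1 = 0.05,
\]

so the campaign has a positive uplift of five percentage points. Applying the retail convention to the same numbers would give $S_1 - S_0 = -0.05$, which would misleadingly read as a harmful campaign. This is why the convention has to match the direction in which the outcome $\mathbf{y} = 1$ is desirable.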

Let $M$ be a model that is used to rank individuals such that only the individuals with the highest
scores should be targeted by the action. The model $M$ is trained from a data set
$\mathcal{D} = \{(x^{(i)}, y^{(i)}, t^{(i)})\}_{i=1}^{N}$ of $N$ iid realizations of $(\mathbf{x}, \mathbf{y}, \mathbf{t})$.
We assume that $\mathcal{D}$ is the result of a random process, and we denote the corresponding
random variable as $\boldsymbol{\mathcal{D}}$. We consider $M(x, \mathcal{D})$ as a learning algorithm, taking a data set $\mathcal{D}$ and a
set of features $x$ as input and returning a score for $x$, for example, an estimation of $U(x)$.

A threshold $\tau$ is used to determine which individuals should be targeted. The model $M$ prescribes
targeting all individuals with a score $M(x, \mathcal{D}) \geq \tau$ and not targeting the remaining individuals. The
threshold $\tau$ depends upon the model being used, because different models can produce scores in different
ranges and with different distributions. Therefore, to consistently compare the performance of
different models, we let $\rho \in (0, 1)$ be the proportion of individuals who should be targeted, and the
corresponding threshold $\tau$ can be determined as the largest value that satisfies
$\rho = P(M(\mathbf{x}, \mathcal{D}) > \tau)$. Note the random variable $\mathbf{x}$ in this expression. Since $M(x, \mathcal{D})$
is a deterministic function of $x$ (for a given $\mathcal{D}$), $M(\mathbf{x}, \mathcal{D})$ is a random variable for which
we can compute the probability $P(M(\mathbf{x}, \mathcal{D}) > \tau)$.
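One possible empirical counterpart of this construction is sketched below: given the scores of a finite sample, pick a threshold such that a fraction of roughly ρ of the individuals score strictly above it. The function names and the example scores are hypothetical, ties between scores are ignored, and the second score list is just a rescaled copy of the first, standing in for a model whose scores live on a different range.

```python
def threshold_for_proportion(scores, rho):
    """Return tau such that about a fraction rho of the scores exceed tau."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    ranked = sorted(scores, reverse=True)
    k = int(rho * len(ranked))  # number of individuals to target
    # The (k+1)-th largest score: exactly k scores are strictly greater
    # when there are no ties.
    return ranked[k]


def targeted(scores, rho):
    """Indices of the individuals whose score is strictly above the threshold."""
    tau = threshold_for_proportion(scores, rho)
    return [i for i, s in enumerate(scores) if s > tau]


# Hypothetical scores from two models with different score ranges.
scores_a = [0.05, 0.40, 0.10, 0.35, 0.80, 0.20, 0.60, 0.15, 0.90, 0.30]
scores_b = [100 * s - 50 for s in scores_a]  # same ranking, different scale

targets_a = targeted(scores_a, 0.3)
targets_b = targeted(scores_b, 0.3)
```

The two thresholds differ, but both models target the same three individuals. This is the reason for comparing models at a fixed proportion ρ rather than at a fixed threshold.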

The independence assumption might be violated in applications such as churn with, for example, a word-of-mouth effect

generating a second order of treatment.
