A Survey on Concept Drift Adaptation

JOÃO GAMA, University of Porto, Portugal

INDRĖ ŽLIOBAITĖ, Aalto University, Finland

ALBERT BIFET, Yahoo! Research Barcelona, Spain

MYKOLA PECHENIZKIY, Eindhoven University of Technology, the Netherlands

ABDELHAMID BOUCHACHIA, Bournemouth University, UK

Concept drift primarily refers to an online supervised learning scenario when the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning, in this paper we characterize the adaptive learning process, categorize existing strategies for handling concept drift, discuss the most representative, distinct and popular techniques and algorithms, discuss evaluation methodology of adaptive algorithms, and present a set of illustrative applications. This introduction to concept drift adaptation presents the state-of-the-art techniques and a collection of benchmarks for researchers, industry analysts and practitioners. The survey aims at covering the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art.

Categories and Subject Descriptors: I.2.6 [Artiﬁcial Intelligence]: Learning

General Terms: Design, Algorithms, Performance

Additional Key Words and Phrases: concept drift, change detection, adaptive learning

ACM Reference Format:

Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., and Bouchachia, A. 2013. A Survey on Concept Drift Adaptation. ACM Comput. Surv. 1, 1, Article 1 (January 2013), 35 pages.

DOI = 10.1145/0000000.0000000 http://doi.acm.org/10.1145/0000000.0000000

1. INTRODUCTION

Our digital universe is rapidly growing. The volume of data generated in 2012 has been estimated to surpass 2.8 zettabytes (2.8 trillion gigabytes), as reported in the IDC survey [Gantz and Reinsel 2012]. Efficient and effective tools and analysis methods for dealing with the ever-growing amount of data in different applications and fields are needed. Very often data comes in the form of streams, which makes its analysis and processing even more resource demanding.

Traditionally in data mining, data is first collected and then processed in an offline mode. For instance, predictive models are trained using historical data given as a set of pairs (input, output). Models trained in such a way can afterwards be applied for predicting the output for new unseen input data. However, streaming data cannot be processed in the same way, because data arrives continuously over time and the stream is possibly never-ending. Accommodating such data in the machine's main memory is impractical and often infeasible. Hence, only online processing is suitable. In this case, predictive models can be trained either incrementally by continuous update or by retraining using recent batches of data.

In dynamically changing and non-stationary environments, the data distribution

can change over time yielding the phenomenon of concept drift [Schlimmer and

Granger 1986; Widmer and Kubat 1996]. The real concept drift¹ refers to changes in the conditional distribution of the output (i.e., target variable) given the input (input features), while the distribution of the input may stay unchanged. A typical example of the real concept drift is a change in user's interests when following an online news stream. Whilst the distribution of the incoming news documents often remains the same, the conditional distribution of the interesting (and thus not interesting) news documents for that user changes. Adaptive learning refers to updating predictive models online during their operation to react to concept drifts.

¹ The term real refers to one particular type of concept drift. It does not mean that other types of drift are not concept drifts.

ACM Computing Surveys, Vol. 1, No. 1, Article 1, Publication date: January 2013.

Over the last decade, research related to learning with concept drift has been growing steadily and many drift-aware adaptive learning algorithms have been developed. In spite of the popularity of this research topic, no comprehensive survey on concept drift handling techniques is available to the community. One of the reasons is that the problem is wide in scope and spans different research fields. Moreover, terminology is not well established, thus similar adaptive learning strategies have been developed independently under different names in different contexts. Given that research on concept drift is very popular but also scattered among various communities, there is a strong need for a comprehensive summary of the research done so far, to unify the concepts and terminology among researchers and to survey the state-of-the-art methodologies and techniques investigated to date.

Several reviews related to drift-aware learning are available. However, they either do not focus exclusively on concept drift or relate to specific topics of adaptive learning. Thus these reviews are fragmented and/or outdated. Currently the most cited survey on concept drift was published in 2004 [Tsymbal 2004]. Subsequent overviews related to the topic of concept drift focused on ensemble techniques [Kuncheva 2004; 2008], inductive rule learning algorithms [Maloof 2010], or mainly on non-incremental learning techniques [Zliobaite 2009] that can use computational resources unrestrictedly, and thus were limited in scope. Reviews on data streams [Gaber et al. 2005; Gama 2010; Bifet et al. 2011a] only partially deal with data drift. Data streams research covers adaptive learning only to some extent, while the main focus remains on making learning algorithms incremental and optimizing the balance between computational resources and predictive accuracy.

Several reviews are limited to specific application fields. A focused position paper [Grisogono 2006] presents a set of requirements for complex adaptive systems to be used for defence. A recent focused review [Kadlec et al. 2011] surveys adaptation mechanisms that have been used for soft sensors. A recent article [Moreno-Torres et al. 2012] focuses on describing the various ways in which data distribution can change over time and only briefly covers adaptation techniques from the perspective of the dataset shift community, mostly leaving out works on concept drift. Finally, a recent review [Alberg et al. 2012] focuses on decision trees.

The present contribution provides an integrated view on handling concept drift, by

surveying adaptive learning methods, presenting evaluation methodologies and dis-

cussing illustrative applications. It focuses on online supervised learning when the

relation between the input features and the target variable changes over time.

The paper is organized as follows. In Section 2 we introduce the problem of con-

cept drift, characterize adaptive learning algorithms and present motivating applica-

tion examples. Section 3 presents a comprehensive taxonomy of methods for adaptive

learning. Section 4 discusses the experimental settings and evaluation methodologies

of adaptive learning algorithms. Section 5 concludes the survey.

2. ADAPTIVE LEARNING ALGORITHMS

Learning algorithms often need to operate in dynamic environments, which change unexpectedly. One desirable property of these algorithms is their ability to incorporate new data. If the data generating process is not strictly stationary (as applies to

most of the real world applications), the underlying concept, which we are predicting

(for example, interests of a user reading news), may be changing over time. The ability

to adapt to such concept drift can be seen as a natural extension for the incremen-

tal learning systems [Giraud-Carrier 2000] that learn predictive models example by

example. Adaptive learning algorithms can be seen as advanced incremental learning

algorithms that are able to adapt to evolution of the data generating process over time.

This section introduces concept drift and characterizes adaptive learning.

2.1. Setting and deﬁnitions

In machine learning the supervised learning problem is formally deﬁned as follows.

We aim to predict a target variable $y \in \mathbb{R}^1$ in regression tasks (or a categorical $y$ in classification tasks) given a set of input features $X \in \mathbb{R}^p$. An example is one pair $(X, y)$. For instance, $X$ is a set of sensor readings of a chemical process at 2 p.m. on a given day in January and $y$ = "good" is the true quality of the product at that time.

In the training examples, that are used for model building, both X and y are known.

In the new examples, on which the predictive model is applied, X is known, but y is

not known at the time of prediction.

According to the Bayesian Decision Theory [Duda et al. 2001], a classiﬁcation can

be described by the prior probabilities of the classes p(y) and the class conditional

probability density functions p(X|y) for all classes y = 1, . . . , c, where c is the number

of classes. The classiﬁcation decision is made according to the posterior probabilities

of the classes, which for class y can be represented as

$$p(y|X) = \frac{p(y)\,p(X|y)}{p(X)}, \qquad (1)$$

where $p(X) = \sum_{y=1}^{c} p(y)\,p(X|y)$. Here equal costs of misclassification are assumed.
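As an illustration of Equation (1), the following minimal Python sketch computes the posterior probabilities from given priors and class-conditional densities and picks the most probable class. The function names `posterior` and `classify`, the class labels and the numeric values are assumptions made for this example only; they do not come from the survey.

```python
def posterior(priors, likelihoods):
    """Return p(y|X) for every class y, following Equation (1).

    priors[y]      -- prior probability p(y) of class y
    likelihoods[y] -- class-conditional density p(X|y) evaluated at the observed X
    """
    # Evidence p(X) = sum over all classes of p(y) * p(X|y)
    evidence = sum(priors[y] * likelihoods[y] for y in priors)
    if evidence == 0:
        raise ValueError("p(X) is zero: X is impossible under every class")
    return {y: priors[y] * likelihoods[y] / evidence for y in priors}


def classify(priors, likelihoods):
    """Pick the class with the highest posterior (equal misclassification costs)."""
    post = posterior(priors, likelihoods)
    return max(post, key=post.get)


# Hypothetical numbers for a two-class problem (not taken from the survey)
priors = {"relevant": 0.3, "not relevant": 0.7}
likelihoods = {"relevant": 0.8, "not relevant": 0.1}
post = posterior(priors, likelihoods)
label = classify(priors, likelihoods)
```

With these assumed values the less frequent class still wins, because its class-conditional density at the observed X is much larger than that of the other class.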

The type of the target variable space depends on the task. In classiﬁcation the target

variable takes categorical values (class labels), while in regression the target variable

takes continuous values.

We can distinguish two learning modes: ofﬂine learning and online learning. In of-

ﬂine learning the whole training data must be available at the time of model training.

Only when training is completed the model can be used for predicting. In contrast,

online algorithms process data sequentially. They produce a model and put it in oper-

ation without having the complete training data set available at the beginning. The

model is continuously updated during operation as more training data arrives.

Less restrictive than online algorithms are incremental algorithms that process in-

put examples one-by-one (or batch-by-batch) and update the decision model after re-

ceiving each example. Incremental algorithms may have random access to previous ex-

amples or representative/selected examples. In such a case these algorithms are called

incremental algorithms with partial memory [Maloof and Michalski 2004]. Typically,

in incremental algorithms, for any new presentation of data, the update operation of

the model is based on the previous one. Streaming algorithms are online algorithms

for processing high-speed continuous ﬂows of data. In streaming, examples are pro-

cessed sequentially as well and can be examined in only a few passes (typically just

one). These algorithms use limited memory and limited processing time per item.
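The processing constraints described above (one pass, one example at a time, bounded memory) can be sketched with a toy learner. The nearest-class-mean model below is an assumption chosen for brevity, not an algorithm discussed in the survey; the names `IncrementalNearestMean`, `learn_one` and `run_stream` are hypothetical.

```python
class IncrementalNearestMean:
    """Toy incremental classifier: keeps one running mean vector per class."""

    def __init__(self):
        self.counts = {}   # class label -> number of examples seen
        self.means = {}    # class label -> running mean of the feature vectors

    def predict(self, x):
        """Return the class whose mean is closest to x (None before any training)."""
        if not self.means:
            return None

        def sq_dist(label):
            return sum((xi - mi) ** 2 for xi, mi in zip(x, self.means[label]))

        return min(self.means, key=sq_dist)

    def learn_one(self, x, y):
        """Update the model with a single example; the example is then discarded."""
        n = self.counts.get(y, 0) + 1
        mean = self.means.get(y, [0.0] * len(x))
        # Incremental mean update: new_mean = old_mean + (x - old_mean) / n
        self.means[y] = [m + (xi - m) / n for xi, m in zip(x, mean)]
        self.counts[y] = n


def run_stream(model, stream):
    """Process a stream in one pass: predict first, then learn from the true label."""
    errors = 0
    seen = 0
    for x, y in stream:
        prediction = model.predict(x)
        if prediction is not None:
            seen += 1
            errors += prediction != y
        model.learn_one(x, y)
    return errors, seen


# Hypothetical two-class stream with two features per example
stream = [((0.0, 0.1), "a"), ((1.0, 0.9), "b"), ((0.1, 0.0), "a"),
          ((0.9, 1.0), "b"), ((0.2, 0.1), "a"), ((1.1, 1.0), "b")]
model = IncrementalNearestMean()
errors, seen = run_stream(model, stream)
```

The memory used depends only on the number of classes and features, not on the length of the stream, which is the property that makes such an update suitable for potentially infinite data.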

In the setting that we are considering data arrives online, often in real time, forming

a stream which is potentially inﬁnite. The machinery is given input data that has just

arrived to predict its target variable(s). That is, a prediction machinery is deﬁned as

a mapping function between the input (feature) space and its corresponding output

(target) space. For instance, given sensor readings in a chemical production process

the task is to predict the quality of the product (output).

Because data is expected to evolve over time, especially in dynamically changing environments where non-stationarity is typical, its underlying distribution can change dynamically over time. The general assumption in the concept drift setting is that

the change happens unexpectedly and is unpredictable, although in some particular

real-world situations the change can be known ahead of time in correlation with the

occurrence of particular environmental events. But solutions for the general case of

drift entail the solutions for the particular cases. Moreover the change may take dif-

ferent forms, i.e. the input data characteristics or the relation between the input data

and the target variable may change.

Formally, concept drift between time point $t_0$ and time point $t_1$ can be defined as

$$\exists X : p_{t_0}(X, y) \neq p_{t_1}(X, y), \qquad (2)$$

where $p_{t_0}$ denotes the joint distribution at time $t_0$ between the set of input variables

X and the target variable y. Changes in data can be characterized as changes in the

components of this relation [Kelly et al. 1999; Gao et al. 2007]. In other terms,

— the prior probabilities of classes p(y) may change,

— the class conditional probabilities p(X|y) may change, and

— as a result, the posterior probabilities of classes p(y|X) may change affecting the

prediction.

Two implications of these changes are of interest: (i) whether the data distribution p(y|X) changes and affects the predictive decision, and (ii) whether the changes are visible from the data distribution without knowing the true labels, i.e. p(X) changes. From a predictive perspective, only the changes that affect the prediction decision require adaptation.

We can distinguish the following types of drifts:

(1) Real concept drift refers to changes in p(y|X). Such changes can happen either with

or without change in p(X). Real concept drift has been referred to as concept shift

in [Salganicoff 1997] and conditional change in [Gao et al. 2007].

(2) Population drift refers to changes in the population from which future samples will be drawn compared with the population from which the design/training sample was drawn [Kelly et al. 1999].

(3) Virtual drift happens if the distribution of the incoming data changes (i.e., p(X) changes) without affecting p(y|X) [Delany et al. 2005; Tsymbal 2004; Widmer and Kubat 1993]. However, virtual drift has had different interpretations in the literature:

— Originally, virtual drift was defined [Widmer and Kubat 1993] to occur due to incomplete data representation rather than a change in concepts in reality.

— Virtual drift corresponds to a change in data distribution that leads to changes in the decision boundary [Tsymbal 2004].

— Virtual drift is a drift that does not affect the target concept [Delany et al. 2005].

— Virtual drift has also been referred to as temporary drift [Lazarescu et al. 2004], sampling shift [Salganicoff 1997] and feature change [Gao et al. 2007].

In this paper virtual drift refers to change in the data distribution p(X).
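The distinction between real and virtual drift, as the terms are used in this paper, can be made concrete for a small discrete problem in which the joint distributions at the two time points are known. The sketch below is only an illustration under that assumption (in practice the distributions are unknown and must be estimated from data); the function names and the probability tables are hypothetical.

```python
from collections import defaultdict


def marginal_x(joint):
    """p(X), obtained by summing the joint p(X, y) over y."""
    p_x = defaultdict(float)
    for (x, _y), p in joint.items():
        p_x[x] += p
    return dict(p_x)


def conditional_y_given_x(joint):
    """p(y|X) = p(X, y) / p(X) for every X with non-zero probability."""
    p_x = marginal_x(joint)
    return {(x, y): p / p_x[x] for (x, y), p in joint.items() if p_x[x] > 0}


def differs(a, b, tol=1e-9):
    """True if two probability tables differ on any key (missing keys count as 0)."""
    return any(abs(a.get(k, 0.0) - b.get(k, 0.0)) > tol for k in set(a) | set(b))


def describe_drift(joint_t0, joint_t1):
    """Compare two joint distributions.

    Assumes every value of X has non-zero probability at both time points.
    real    -- p(y|X) changed
    virtual -- p(X) changed (the meaning adopted in this paper)
    """
    real = differs(conditional_y_given_x(joint_t0), conditional_y_given_x(joint_t1))
    virtual = differs(marginal_x(joint_t0), marginal_x(joint_t1))
    return {"real": real, "virtual": virtual}


# Hypothetical joint distributions over (topic, relevance label)
t0 = {("house", 1): 0.4, ("house", 0): 0.1, ("holiday", 1): 0.1, ("holiday", 0): 0.4}
# More articles about houses, same user interests: only p(X) changes
virtual_only = {("house", 1): 0.64, ("house", 0): 0.16,
                ("holiday", 1): 0.04, ("holiday", 0): 0.16}
# Same mix of articles, user interests flipped: only p(y|X) changes
real_only = {("house", 1): 0.1, ("house", 0): 0.4, ("holiday", 1): 0.4, ("holiday", 0): 0.1}

result_virtual = describe_drift(t0, virtual_only)
result_real = describe_drift(t0, real_only)
```

Both flags can be true at once, which corresponds to the case where real drift appears in combination with a change in the input distribution.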

Example: Consider an online news stream of articles on real estate. The task for

a given user is to classify the incoming news into relevant and not relevant. Suppose

that the user is searching for a new apartment, then news on dwelling houses are rele-

vant whereas holiday homes are not relevant. If the editor of the news portal changes,

the writing style changes as well, but the dwelling houses remain relevant for the

user. This scenario corresponds to population drift. If due to a crisis more articles on

dwelling houses come out and fewer articles on holiday homes do, but the editor, the writing style, and the interests of the user remain the same, this situation corresponds to

drift in prior probabilities of the classes. If on the other hand the user has bought a

house and starts looking for a holiday destination, dwelling houses become not rele-

vant and holiday homes become relevant. This scenario corresponds to the real concept

drift. In this case the writing style and the priors remain the same. It may happen that all types of drift take place at the same time.

Figure 1 illustrates these concepts. We see that only the real concept drift changes

the decision boundary and the previous decision model becomes obsolete. In reality

the virtual drift, changing priors or novelties may appear in combination with the real drift; in those cases the decision boundary is also affected.

[Figure 1. Three panels: Original data; Real concept drift (p(y|X) changes); Virtual drift (p(X) changes, but not p(y|X)).]

Fig. 1. Types of drifts: circles represent instances, different colors represent different classes.

This survey primarily focuses on handling the real concept drift which is not vis-

ible from the input data distribution. In many cases the techniques that handle the

real concept drift can also handle drifts that manifest in the input data distributions,

but not vice versa. The techniques that handle real concept drift typically rely on the

feedback about the predictive performance. In the present paper, drift that can be detected from the incoming data distribution is not covered. This corresponds to tracking

drifting priors (an interested reader is referred to [Zhang and Zhou 2010]), and novelty

detection (an interested reader is referred to [Markou and Singh 2003; Masud et al.

2011]). Furthermore, semi-supervised drift handling techniques based on clustering

(an interested reader is referred to [Aggarwal 2005; Bouchachia et al. 2010]) are not

discussed in this paper.
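To illustrate what relying on feedback about the predictive performance can look like, the sketch below monitors the error rate over a sliding window of recent predictions and signals a change when it rises well above the lowest rate observed so far. This is a deliberately simple heuristic written for this example; it is not one of the detectors surveyed in this paper, and the class name `ErrorRateMonitor` as well as the `window` and `margin` values are assumptions.

```python
from collections import deque


class ErrorRateMonitor:
    """Illustrative heuristic: flag a change when the recent error rate rises
    well above the lowest windowed error rate seen so far."""

    def __init__(self, window=50, margin=0.2):
        self.window = window          # number of recent predictions kept (assumed value)
        self.margin = margin          # tolerated increase in error rate (assumed value)
        self.recent = deque(maxlen=window)
        self.best_rate = None         # lowest full-window error rate observed

    def update(self, was_error):
        """Record one prediction outcome; return True if a change is signalled."""
        self.recent.append(1 if was_error else 0)
        if len(self.recent) < self.window:
            return False              # not enough feedback yet
        rate = sum(self.recent) / self.window
        if self.best_rate is None or rate < self.best_rate:
            self.best_rate = rate
        return rate - self.best_rate > self.margin

    def reset(self):
        """Forget past feedback, e.g. after the model has been rebuilt."""
        self.recent.clear()
        self.best_rate = None


# Hypothetical feedback: 10% errors for 100 predictions, then 50% errors
outcomes = [i % 10 == 0 for i in range(100)] + [i % 2 == 0 for i in range(100)]
monitor = ErrorRateMonitor(window=50, margin=0.2)
first_alarm = next((i for i, e in enumerate(outcomes) if monitor.update(e)), None)
```

The monitor needs the true labels to know whether each prediction was an error, which is why this kind of feedback can reveal real concept drift even when the input distribution p(X) stays the same.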

2.2. Changes in data over time

Changes in data distribution over time may manifest in different forms, as illustrated in Figure 2 on one-dimensional data. There, changes happen in the data mean.

[Figure 2. Axes: time, data mean. Panels: sudden/abrupt; incremental; gradual; reoccurring concepts; outlier (not concept drift).]

Fig. 2. Patterns of changes over time (outlier is not concept drift).

Drift may happen suddenly/abruptly by switching from one concept to another (e.g. replacing a sensor in a chemical plant with one that has a different calibration), or incrementally, consisting of many intermediate concepts in between (e.g. a sensor slowly wears out and becomes less accurate). Drift may happen suddenly (e.g. the topics of interest that one

is surveying as a credit analyst may suddenly switch from, for instance, meat prices

to public transportation) or gradually (e.g. relevant news topics change from dwelling

to holiday homes, while the user does not switch abruptly, but rather keeps going back

to the previous interest for some time). One of the challenges for concept drift handling algorithms is not to confuse true drift with an outlier or noise, i.e. a once-off random deviation or anomaly (see [Chandola et al. 2009] for outlier detection).

No adaptivity is needed in the latter case. Finally, drifts may introduce new concepts

that were not seen before, or previously seen concepts may reoccur after some time
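The patterns of change described in this subsection can be reproduced with synthetic one-dimensional data whose mean changes over time. The generators below are a sketch for experimentation only; the function names, the linear transition schedules and the default values are assumptions, not definitions from the survey.

```python
import random


def sudden(n, change_at, old=0.0, new=1.0):
    """The mean switches from old to new at a single time point."""
    return [old if t < change_at else new for t in range(n)]


def incremental(n, start, end, old=0.0, new=1.0):
    """The mean passes through many intermediate values between start and end."""
    means = []
    for t in range(n):
        if t < start:
            means.append(old)
        elif t >= end:
            means.append(new)
        else:
            means.append(old + (new - old) * (t - start) / (end - start))
    return means


def gradual(n, start, end, old=0.0, new=1.0, seed=0):
    """Both concepts alternate; the new one is drawn increasingly often."""
    rng = random.Random(seed)
    means = []
    for t in range(n):
        if t < start:
            means.append(old)
        elif t >= end:
            means.append(new)
        else:
            p_new = (t - start) / (end - start)
            means.append(new if rng.random() < p_new else old)
    return means


def reoccurring(n, period, old=0.0, new=1.0):
    """The old concept comes back after every stretch of the new one."""
    return [old if (t // period) % 2 == 0 else new for t in range(n)]


def observe(means, noise=0.1, seed=0):
    """Add Gaussian noise around the drifting mean to obtain observed data."""
    rng = random.Random(seed)
    return [rng.gauss(m, noise) for m in means]


data = observe(gradual(100, 20, 80))
```

In the gradual case no intermediate concept exists: every observation comes from either the old or the new concept, which is what separates it from the incremental case.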
