
arXiv:1911.02685v1 [cs.LG] 7 Nov 2019

A Comprehensive Survey on Transfer Learning

Fuzhen Zhuang, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, Hengshu Zhu, Senior Member, IEEE,

Hui Xiong, Senior Member, IEEE, and Qing He

Abstract: Transfer learning aims at improving the performance of target learners on target domains by transferring the knowledge contained in different but related source domains. In this way, the dependence on a large amount of target-domain data for constructing target learners can be reduced. Due to its wide application prospects, transfer learning has become a popular and promising area in machine learning. Although there are already some valuable and impressive surveys on transfer learning, these surveys introduce approaches in a relatively isolated way and lack the recent advances in transfer learning. With the rapid expansion of the transfer learning area, it is both necessary and challenging to comprehensively review the relevant studies. This survey attempts to connect and systematize the existing transfer learning research, as well as to summarize and interpret the mechanisms and the strategies in a comprehensive way, which may help readers better understand the current research status and ideas. Different from previous surveys, this survey reviews over forty representative transfer learning approaches from the perspectives of data and model. The applications of transfer learning are also briefly introduced. To show the performance of different transfer learning models, twenty representative models are used for experiments. The models are evaluated on three datasets, i.e., Amazon Reviews, Reuters-21578, and Office-31. The experimental results demonstrate the importance of selecting appropriate transfer learning models for different applications in practice.

Index Terms—Transfer learning, machine learning, domain adaptation, interpretation.


1 INTRODUCTION

Although traditional machine learning technology has achieved great success and has been successfully applied in many practical applications, it still has some limitations in certain real-world scenarios. The ideal scenario of machine learning is that there are abundant labeled training instances, which have the same distribution as the test data. However, collecting sufficient training data is often expensive, time-consuming, or even unrealistic in many applications. Semi-supervised learning can partly solve this problem by relaxing the need for large amounts of labeled data. Typically, a semi-supervised approach requires only a limited number of labeled data, and it utilizes a large amount of unlabeled data to improve the learning accuracy. But in many cases, unlabeled instances are also difficult to collect, which usually makes the resultant traditional models unsatisfactory.

Transfer learning, which focuses on transferring knowledge across domains, is a promising machine learning methodology for resolving the above problem. In practice, a person who has learned the piano can learn the violin faster than others. Inspired by human beings' capability to transfer knowledge across domains, transfer learning aims to leverage knowledge from a related domain (called the source domain) to improve the learning performance or minimize the number of labeled examples required in a target domain. It is worth mentioning that the relationship between the source and the target domains affects the performance of transfer learning models. Intuitively, a person who has learned the viola usually learns the violin faster than one who has learned the piano. In contrast, if there is little in common between the domains, the learner is likely to be negatively affected by the transferred knowledge. This phenomenon is termed negative transfer.

• Fuzhen Zhuang, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, and Qing He are with the Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China and the University of Chinese Academy of Sciences, Beijing 100049, China.
• Hengshu Zhu is with Baidu Inc., No. 10 Shangdi 10th Street, Haidian District, Beijing, China.
• Hui Xiong is with Rutgers, the State University of New Jersey, 1 Washington Park, Newark, New Jersey, USA.
• Zhiyuan Qi contributed equally with the first author.

Roughly speaking, according to the discrepancy between domains, transfer learning can be further divided into two categories, i.e., homogeneous and heterogeneous transfer learning [1]. Homogeneous transfer learning approaches are developed for handling the situation in which the domains have the same feature space. Some studies assume that the domains differ only in their marginal distributions. Therefore, they adapt the domains by correcting the sample selection bias [2] or covariate shift [3]. However, this assumption does not hold in many cases. For example, in sentiment classification, a word may have different meaning tendencies in different domains. This phenomenon is also called context feature bias [4]. To solve this problem, some studies further adapt the conditional distributions. Heterogeneous transfer learning refers to the knowledge transfer process in the situation in which the domains have different feature spaces. In addition to distribution adaptation, heterogeneous transfer learning requires feature space adaptation [4], which makes it more complicated than homogeneous transfer learning.

This survey aims to give readers a comprehensive understanding of transfer learning from the perspectives of data and model. The mechanisms and the strategies of the transfer learning approaches are introduced to help readers grasp how the approaches work, and a number of existing transfer learning studies are connected and systematized. Specifically, over forty representative transfer learning approaches are introduced. In addition, we conduct experiments to demonstrate on which datasets a transfer learning model performs well.

In this survey, we focus more on homogeneous transfer learning. Some interesting transfer learning topics are not covered in this survey, such as reinforcement transfer learning [5], lifelong transfer learning [6], and online transfer learning [7]. The rest of this survey is organized into seven sections. Section 2 clarifies the difference between transfer learning and other related machine learning techniques. Section 3 introduces the notations used in this survey and the definitions of transfer learning. Sections 4 and 5 interpret transfer learning approaches from the data and the model perspectives, respectively. Section 6 introduces some applications of transfer learning. Experiments are conducted and the results are provided in Section 7. The last section concludes this survey. The main contributions of this survey are summarized below.

• Over forty representative transfer learning approaches are introduced and summarized, which can give readers a comprehensive overview of transfer learning.
• We conduct experiments to compare different transfer learning approaches. The performance of twenty different approaches is displayed intuitively and then analyzed, which may help readers select the appropriate ones in practice.

2 RELATED WORK

Some areas related to transfer learning are introduced in this section. The connections and the differences between them and transfer learning are clarified.

Semi-Supervised Learning [8]: Semi-supervised learning is a family of machine learning tasks and methods that lies between supervised learning (with completely labeled instances) and unsupervised learning (without any labeled instances). Typically, a semi-supervised task utilizes abundant unlabeled instances combined with a limited number of labeled instances to train a learner. Semi-supervised learning relaxes the dependence on labeled instances, and thus reduces the expensive labeling costs. Note that, in semi-supervised learning, both the labeled and unlabeled instances are drawn from the same distribution. In contrast, in transfer learning, the data distributions of the source and the target domains are different.

In transfer learning, whether label information is available is ambiguous, because both the source and the target domains can be involved. Therefore, semi-supervised transfer learning is a controversial term. It is worth mentioning that many transfer learning approaches borrow techniques from semi-supervised learning. The key assumptions in semi-supervised learning, i.e., the smoothness, cluster, and manifold assumptions, are also used in transfer learning.

Multi-View Learning [9]: Multi-view learning focuses on machine learning problems with multi-view data. A view represents a distinct feature set. An intuitive example of multiple views is that a video object can be described from two different viewpoints, i.e., the image signal and the audio signal. Briefly, multi-view learning describes an object from multiple views, which results in abundant information. By properly considering the information from all views, the learner's performance can be improved. Several strategies are adopted in multi-view learning, such as subspace learning, multi-kernel learning, and co-training [10]. These strategies are also used in some transfer learning approaches.

Multi-Task Learning [11]: The idea of multi-task learning is to jointly learn a group of related tasks. In this way, the generalization of each task is enhanced. The main difference between transfer learning and multi-task learning is that the former transfers the knowledge contained in the related domains, while the latter transfers the knowledge by simultaneously learning some related tasks. In other words, multi-task learning pays equal attention to each task, while transfer learning pays more attention to the target task than to the source tasks.

Multi-task learning reinforces each task by making use of the interconnections between tasks, taking into account both the relevance and the differences between tasks. Transfer learning and multi-task learning have some commonalities and connections. Both aim to improve the performance of learners via knowledge transfer. Besides, they adopt some similar strategies for constructing models, such as feature transformation and parameter sharing. Note that some existing studies utilize both transfer learning and multi-task learning technologies [12].

3 OVERVIEW

In this section, the notations used in this survey are listed for

convenience. Besides, some deﬁnitions and categorizations

about transfer learning are introduced. Some related surveys

are also provided.

3.1 Notation

For convenience, a list of symbols and their definitions is shown in Table 1. Besides, we use $\|\cdot\|$ to represent the norm and the superscript $\mathrm{T}$ to denote the transpose of a vector/matrix.

3.2 Deﬁnition

In this subsection, some definitions about transfer learning are given. Before giving the definition of transfer learning, let us review the definitions of a domain and a task.

Definition 1. (Domain) A domain $\mathcal{D}$ is composed of two parts, i.e., a feature space $\mathcal{X}$ and a marginal distribution $P(X)$. In other words, $\mathcal{D} = \{\mathcal{X}, P(X)\}$. The symbol $X$ denotes an instance set, which is defined as $X = \{x_i \mid x_i \in \mathcal{X}, i = 1, \cdots, n\}$.

Note that the marginal distribution $P(X)$ is generally an invisible component, and it is hard to obtain its explicit formulation.

Definition 2. (Task) A task $\mathcal{T}$ consists of a label space $\mathcal{Y}$ and a decision function $f$, i.e., $\mathcal{T} = \{\mathcal{Y}, f\}$. The decision function $f$ is an implicit one, which is expected to be learned from the sample data.

Some machine learning models actually output the predicted conditional distributions of instances. In this case, $f(x_j) = \{P(y_k \mid x_j) \mid y_k \in \mathcal{Y}, k = 1, \cdots, |\mathcal{Y}|\}$.
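To make the second form of the decision function concrete, the following minimal sketch returns one conditional probability per label in the label space. The softmax-over-linear-scores model, the function name `softmax_decision_function`, and the weight values are illustrative assumptions, not something prescribed by the survey.

```python
import math


def softmax_decision_function(x, weights, label_space):
    """Return {y_k: P(y_k | x)} for every label y_k in the label space.

    Assumption: a linear score per label followed by a softmax.
    """
    scores = [sum(w_d * x_d for w_d, x_d in zip(weights[y], x)) for y in label_space]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {y: e / total for y, e in zip(label_space, exps)}


label_space = ["negative", "positive"]
weights = {"negative": [-1.0, 0.5], "positive": [1.0, -0.5]}  # hypothetical parameters
probs = softmax_decision_function([2.0, 1.0], weights, label_space)
print(probs)
```

The returned dictionary has exactly $|\mathcal{Y}|$ entries that sum to one, which mirrors the set $\{P(y_k \mid x_j)\}$ in the text.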

TABLE 1
Notations.

Symbol           Definition
$n$              Number of instances
$m$              Number of domains
$\mathcal{D}$    Domain
$\mathcal{T}$    Task
$\mathcal{X}$    Feature space
$\mathcal{Y}$    Label space
$x$              Feature vector
$y$              Label
$X$              Instance set
$Y$              Label set corresponding to $X$
$S$              Source domain
$T$              Target domain
$L$              Labeled instances
$U$              Unlabeled instances
$\mathcal{H}$    Reproducing kernel Hilbert space
$\theta$         Mapping/Coefficient vector
$\alpha$         Weighting coefficient
$\beta$          Weighting coefficient
$\lambda$        Tradeoff parameter
$\delta$         Parameter/Error
$b$              Bias
$B$              Boundary parameter
$N$              Iteration/Kernel number
$f$              Decision function
$\mathcal{L}$    Loss function
$\eta$           Scale parameter
$G$              Graph
$\Phi$           Nonlinear mapping
$\sigma$         Monotonically increasing function
$\Omega$         Structural risk
$\kappa$         Kernel function
$K$              Kernel matrix
$H$              Centering matrix
$C$              Covariance matrix
$d$              Document
$w$              Word
$z$              Class variable
$z$              Noise
$D$              Discriminator
$G$              Generator
$S$              Function
$M$              Orthonormal bases
$\Theta$         Model parameters
$P$              Probability
$E$              Expectation
$Q$              Matrix variable
$R$              Matrix variable
$W$              Mapping matrix

In practice, a domain is often observed through a number of instances with or without label information. For example, a source domain $\mathcal{D}_S$ corresponding to a source task $\mathcal{T}_S$ is usually observed via the instance-label pairs, i.e., $D_S = \{(x_i, y_i) \mid x_i \in \mathcal{X}^S, y_i \in \mathcal{Y}^S, i = 1, \cdots, n^S\}$; an observation of the target domain usually consists of a number of unlabeled instances and/or a limited number of labeled instances.

Definition 3. (Transfer Learning) Given some/an observation(s) corresponding to $m^S \in \mathbb{N}^+$ source domain(s) and task(s) (i.e., $\{(\mathcal{D}_{S_i}, \mathcal{T}_{S_i}) \mid i = 1, \cdots, m^S\}$), and some/an observation(s) about $m^T \in \mathbb{N}^+$ target domain(s) and task(s) (i.e., $\{(\mathcal{D}_{T_j}, \mathcal{T}_{T_j}) \mid j = 1, \cdots, m^T\}$), transfer learning utilizes the knowledge implied in the source domain(s) to improve the performance of the learned decision functions $f^{T_j}$ ($j = 1, \cdots, m^T$) on the target domain(s).

The above definition, which covers the situation of multi-source transfer learning, is an extension of the one presented in the survey [13]. If $m^S$ equals 1, the scenario is called single-source transfer learning. Otherwise, it is called multi-source transfer learning. Besides, $m^T$ represents the number of the transfer learning tasks. A few studies focus on the setting where $m^T \geq 2$ [14]. The existing transfer learning studies pay more attention to the scenarios where $m^T = 1$ (especially where $m^S = m^T = 1$). It is worth mentioning that the observation of a domain or a task is a concept in a broad sense. An observation is often instantiated as labeled/unlabeled instances or a pre-learned model. A common scenario is that we have abundant labeled instances or a well-trained model on the source domain, and only limited labeled target-domain instances. In this case, the resources such as the instances and the model are actually the observations, and the goal of transfer learning is to learn a more accurate decision function on the target domain.

3.3 Categorization of Transfer Learning

There are several categorization criteria for transfer learning. For example, transfer learning problems can be divided into three categories, i.e., transductive, inductive, and unsupervised transfer learning [13]. The complete definitions of these three categories are presented in [13]. These three categories can be interpreted from a label-setting aspect. Roughly speaking, transductive transfer learning refers to the situation in which the label information comes only from the source domain. If the label information of the target-domain instances is available, the scenario is categorized as inductive transfer learning. If the label information is unknown for both the source and the target domains, the situation is known as unsupervised transfer learning. Another categorization is based on the consistency between the source and the target feature spaces and label spaces. If $\mathcal{X}^S = \mathcal{X}^T$ and $\mathcal{Y}^S = \mathcal{Y}^T$, the scenario is termed homogeneous transfer learning. Otherwise, if $\mathcal{X}^S \neq \mathcal{X}^T$ and/or $\mathcal{Y}^S \neq \mathcal{Y}^T$, the scenario is referred to as heterogeneous transfer learning.
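The two problem categorizations above amount to simple decision rules, which the following sketch spells out. It follows only the rough description given here (the complete definitions are in [13]); the function names and the representation of a feature or label space as a collection of names are assumptions made for illustration.

```python
def label_setting(source_has_labels: bool, target_has_labels: bool) -> str:
    """Label-setting-based category, following the rough description in the text."""
    if target_has_labels:
        return "inductive"      # labeled target-domain instances are available
    if source_has_labels:
        return "transductive"   # labels come only from the source domain
    return "unsupervised"       # no labels in either domain


def space_setting(source_features, target_features, source_labels, target_labels) -> str:
    """Space-setting-based category: compare feature spaces and label spaces."""
    same_feature_space = set(source_features) == set(target_features)
    same_label_space = set(source_labels) == set(target_labels)
    return "homogeneous" if same_feature_space and same_label_space else "heterogeneous"


# Hypothetical scenario: labeled source reviews, unlabeled target reviews,
# both described by the same vocabulary and the same sentiment labels.
print(label_setting(source_has_labels=True, target_has_labels=False))
print(space_setting(["w1", "w2"], ["w1", "w2"], ["pos", "neg"], ["pos", "neg"]))
```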

According to the survey [13], transfer learning approaches can be categorized into four groups: instance-based, feature-based, parameter-based, and relational-based approaches. Instance-based transfer learning approaches are mainly based on the instance weighting strategy. Feature-based approaches transform the original features to create a new feature representation; they can be further divided into two subcategories, i.e., asymmetric and symmetric feature-based transfer learning. Asymmetric approaches transform the source features to match the target ones. In contrast, symmetric approaches attempt to find a common latent feature space and then transform both the source and the target features into a new feature representation. Parameter-based transfer learning approaches transfer the knowledge via the models or the parameters. Relational-based transfer learning mainly focuses on problems in relational domains. This type of approach transfers the logical relationships or rules learned in the source domain to the target domain. For better understanding, Fig. 1 shows the above-mentioned categorizations of transfer learning.

[Figure: tree diagram. Problem categorization: label-setting-based (inductive, transductive, and unsupervised transfer learning) and space-setting-based (homogeneous and heterogeneous transfer learning). Solution categorization: instance-based, feature-based (symmetric and asymmetric transformation), parameter-based, and relational-based approaches.]

Fig. 1. Categorizations of transfer learning.

Some surveys are listed here for readers who want a more complete understanding of this field. The survey by Pan and Yang [13], which is a pioneering work, categorizes transfer learning and reviews the research progress before 2010. The survey by Weiss et al. introduces and summarizes a number of homogeneous and heterogeneous transfer learning approaches [1]. Heterogeneous transfer learning is specifically reviewed in the survey by Day and Khoshgoftaar [4]. Some surveys review the literature related to specific themes such as reinforcement learning [5], computational intelligence [15], and deep learning [16], [17]. Besides, a number of surveys focus on specific application scenarios including activity recognition [18], visual categorization [19], collaborative recommendation [20], computer vision [17], and sentiment analysis [21].

In the next two sections, transfer learning approaches are

interpreted from the data and the model perspectives.

4 DATA-BASED INTERPRETATION

Many transfer learning approaches, especially the data-based approaches, focus on transferring the knowledge via the adjustment and the transformation of the data. Fig. 2 shows the strategies and the objectives of the approaches from the data perspective. As shown in Fig. 2, space adaptation is one of the objectives. This objective needs to be satisfied mostly in heterogeneous transfer learning scenarios. In this survey, we focus more on homogeneous transfer learning, and the main objective in this scenario is to reduce the distribution difference between the source-domain and the target-domain instances. Besides, some advanced approaches may attempt to preserve the data properties in the domain adaptation process. There are generally two strategies to realize the objectives from the data perspective, i.e., instance weighting and feature transformation. In this section, some related transfer learning approaches are introduced in order according to the strategies shown in Fig. 2.

4.1 Instance Weighting Strategy

Let us first consider a simple scenario in which a large number of labeled source-domain instances and a limited number of target-domain instances are available; the domains differ only in marginal distributions, i.e., $P^S(X) \neq P^T(X)$ and $P^S(Y|X) = P^T(Y|X)$. In this scenario, it is natural to consider adapting the marginal distributions. A simple idea is to assign weights to the source-domain instances in the loss function. The weighting strategy is based on the following equation [2]:

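$$\mathbb{E}_{(x,y)\sim P^T}\left[\mathcal{L}(x, y; f)\right] = \mathbb{E}_{(x,y)\sim P^S}\left[\frac{P^T(x, y)}{P^S(x, y)}\,\mathcal{L}(x, y; f)\right] = \mathbb{E}_{(x,y)\sim P^S}\left[\frac{P^T(x)}{P^S(x)}\,\mathcal{L}(x, y; f)\right].$$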

Therefore, the general objective function of a learning task can be written as [2]:

$$\min_{f} \frac{1}{n^S}\sum_{i=1}^{n^S} \beta_i\,\mathcal{L}\left(f(x_i^S), y_i^S\right) + \Omega(f),$$

where $\beta_i$ ($i = 1, 2, \cdots, n^S$) is the weighting parameter. The theoretical value of $\beta_i$ is equal to $P^T(x_i^S)/P^S(x_i^S)$. However, this ratio is generally unknown and is difficult to obtain with traditional methods.
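The weighting identity can be checked numerically. Its last step holds because the joint density factorizes as $P(x, y) = P(x)P(y|x)$ and the conditional distributions are assumed equal, so the ratio of joint densities reduces to $P^T(x)/P^S(x)$. The sketch below is a toy check under stated assumptions: both marginals are one-dimensional Gaussians chosen by us (so the ratio is known exactly, which is not the case in practice), the labeling rule `label` is shared by both domains, and `f` is a fixed hypothetical decision function.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def label(x):
    # Shared labeling rule, so P^S(Y|X) = P^T(Y|X) holds by construction.
    return x * x


def f(x):
    # A fixed, hypothetical decision function whose risk we want to estimate.
    return 2.0 * x - 1.0


def loss(x):
    # Squared loss L(x, y; f) with y = label(x).
    return (f(x) - label(x)) ** 2


def estimate_risks(n=20000, seed=0):
    rng = random.Random(seed)
    src_mean, src_std = 0.0, 1.0   # assumed P^S(X)
    tgt_mean, tgt_std = 1.0, 0.5   # assumed P^T(X)
    source = [rng.gauss(src_mean, src_std) for _ in range(n)]
    target = [rng.gauss(tgt_mean, tgt_std) for _ in range(n)]
    # beta_i = P^T(x_i) / P^S(x_i); known here only because we chose both densities.
    beta = [
        gaussian_pdf(x, tgt_mean, tgt_std) / gaussian_pdf(x, src_mean, src_std)
        for x in source
    ]
    unweighted = sum(loss(x) for x in source) / n
    weighted = sum(b * loss(x) for b, x in zip(beta, source)) / n
    target_risk = sum(loss(x) for x in target) / n
    return unweighted, weighted, target_risk


unweighted, weighted, target_risk = estimate_risks()
print(f"source risk (unweighted): {unweighted:.3f}")
print(f"source risk (weighted):   {weighted:.3f}")
print(f"target risk (reference):  {target_risk:.3f}")
```

In this toy setting the weighted source-domain average should land close to the risk measured directly on target samples, whereas the unweighted average is far off.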

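Kernel Mean Matching (KMM) [2], which is proposed by Huang et al., resolves the estimation problem of the above unknown ratios by matching the means between the source-domain and the target-domain instances in a Reproducing Kernel Hilbert Space (RKHS), i.e.,

$$\arg\min_{\beta_i \in [0, B]} \left\| \frac{1}{n^S}\sum_{i=1}^{n^S} \beta_i \Phi(x_i^S) - \frac{1}{n^T}\sum_{j=1}^{n^T} \Phi(x_j^T) \right\|_{\mathcal{H}}^{2} \quad \text{s.t.} \quad \left| \frac{1}{n^S}\sum_{i=1}^{n^S} \beta_i - 1 \right| \leq \delta,$$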

where $\delta$ is a small parameter, and $B$ is a parameter for the constraint. The above optimization problem can be converted into a quadratic programming problem by expanding and using the kernel trick. This approach to estimating the ratios of distributions can be easily incorporated into many existing algorithms for classification or regression. Once the weights $\beta_i$ are obtained, a learner can be trained on the weighted source-domain instances.
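To illustrate how the kernel trick turns the mean-matching objective into a quadratic function of the weights, here is a simplified sketch. It is not the solver used by Huang et al.: it replaces the quadratic programming solver with projected gradient descent, keeps only the box constraint $\beta_i \in [0, B]$, and omits the constraint on the mean of the weights. The Gaussian kernel, the one-dimensional data, and all function names are assumptions.

```python
import math
import random


def rbf(a, b, gamma=1.0):
    """Gaussian kernel kappa(a, b) = <Phi(a), Phi(b)> for scalar inputs."""
    return math.exp(-gamma * (a - b) ** 2)


def kmm_objective(beta, K, kappa, n_t):
    """Squared RKHS distance between the weighted source mean and the target
    mean, up to the target-target term, which does not depend on beta."""
    n_s = len(beta)
    quad = sum(beta[i] * K[i][j] * beta[j] for i in range(n_s) for j in range(n_s))
    lin = sum(b * k for b, k in zip(beta, kappa))
    return quad / n_s ** 2 - 2.0 * lin / (n_s * n_t)


def kmm_weights(source, target, B=10.0, gamma=1.0, iters=200):
    n_s, n_t = len(source), len(target)
    K = [[rbf(a, b, gamma) for b in source] for a in source]
    kappa = [sum(rbf(a, t, gamma) for t in target) for a in source]
    beta = [1.0] * n_s
    # K has unit diagonal, so its largest eigenvalue is at most n_s and the
    # gradient is (2 / n_s)-Lipschitz; this step size is therefore safe.
    step = n_s / 2.0
    for _ in range(iters):
        grad = [
            2.0 * sum(K[i][j] * beta[j] for j in range(n_s)) / n_s ** 2
            - 2.0 * kappa[i] / (n_s * n_t)
            for i in range(n_s)
        ]
        # Projection onto the box [0, B] stands in for a full QP solver.
        beta = [min(B, max(0.0, b - step * g)) for b, g in zip(beta, grad)]
    return beta, K, kappa


rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(40)]
target = [rng.gauss(1.0, 0.5) for _ in range(30)]
beta, K, kappa = kmm_weights(source, target)
print("objective, uniform weights:", kmm_objective([1.0] * len(source), K, kappa, len(target)))
print("objective, learned weights:", kmm_objective(beta, K, kappa, len(target)))
```

A production implementation would pass the same kernel matrix and linear term to a quadratic programming solver so that the constraint on the mean of the weights is enforced as well.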

[Figure: diagram relating objectives (space adaptation, distribution adaptation, and data property preservation/adjustment) and distribution measurements (e.g., Maximum Mean Discrepancy, Kullback-Leibler divergence, Jensen-Shannon divergence, Bregman divergence) to strategies (instance weighting and feature transformation).]

Fig. 2. Strategies and the objectives of the transfer learning approaches from the data perspective.

There are some other studies attempting to estimate the weights. For example, Sugiyama et al. proposed an approach termed Kullback-Leibler Importance Estimation Procedure (KLIEP) [3]. KLIEP depends on the minimization of the Kullback-Leibler (KL) divergence; it also incorporates a built-in model selection procedure, which makes this approach more useful and reliable. Based on the studies of weight estimation, some instance-based transfer learning frameworks or algorithms have been proposed. For example, Sun et al. proposed a multi-source transfer learning framework termed 2-Stage Weighting Framework for Multi-Source Domain Adaptation (2SW-MDA) [22], which consists of the following two stages.

1. Instance Weighting: In the first stage, the source-domain instances are assigned weights to reduce the marginal distribution difference, which is similar to KMM.
2. Domain Weighting: In the second stage, weights are assigned to each source domain to reduce the conditional distribution difference based on the smoothness assumption [23].

The source-domain instances are reweighted based on the instance weights and the domain weights. These reweighted instances and the labeled target-domain instances are used to train the target classifier. By adopting the 2-stage weighting operations, 2SW-MDA can reduce both the marginal and the conditional distribution differences.
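The reweighting step can be pictured with a small sketch. The text does not state how the two kinds of weights are combined, so the multiplicative rule below is an assumption made for illustration, as are the function name `combine_weights` and the example values.

```python
def combine_weights(instance_weights, domain_weights):
    """Scale each source domain's instance weights by that domain's weight.

    instance_weights: one list of per-instance weights for each source domain
    domain_weights:   one weight per source domain
    Assumption: the two stages are combined by multiplication.
    """
    if len(instance_weights) != len(domain_weights):
        raise ValueError("need exactly one domain weight per source domain")
    return [
        [domain_w * w for w in weights]
        for weights, domain_w in zip(instance_weights, domain_weights)
    ]


# Two hypothetical source domains; the second is judged less relevant to the target.
stage_one = [[1.2, 0.8], [0.5, 1.5, 1.0]]   # instance weights from stage 1
stage_two = [0.75, 0.25]                     # domain weights from stage 2
print(combine_weights(stage_one, stage_two))
```

The resulting per-instance weights would then enter the weighted training loss alongside the labeled target-domain instances.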

In addition to directly estimating the weighting parameters, adjusting the weights iteratively is also effective. The key is to design a mechanism that decreases the weights of the instances that have negative effects on the target learner. A representative work is TrAdaBoost [24], a framework proposed by Dai et al. This framework is an extension of AdaBoost [25]. AdaBoost is an effective boosting algorithm designed for traditional machine learning tasks. In each iteration of AdaBoost, a learner is trained on the instances with updated weights, which results in a weak classifier. The weighting mechanism ensures that the incorrectly classified instances are given more attention. Finally, the resultant weak classifiers are combined to form a strong classifier. TrAdaBoost extends AdaBoost to the transfer learning scenario; a new weighting mechanism is designed to reduce the impact of the distribution difference. Specifically, in TrAdaBoost, the labeled source-domain and labeled target-domain instances are combined as a whole, i.e., a training set, to train the weak classifier. The weighting mechanism is different for the source-domain and the target-domain instances. In each iteration, a temporary variable $\bar{\delta}$, which measures the classification error rate on the labeled target-domain instances, is calculated. Then, the weights of the target-domain instances are updated based on $\bar{\delta}$ and the individual classification results, while the weights of the source-domain instances are updated based on a designed constant and the individual classification results. For better understanding, the formulas used in the $k$-th iteration ($k = 1, \cdots, N$) for updating the weights are restated as follows [24]:

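$$\beta_{k,i}^{S} = \beta_{k-1,i}^{S}\left(1 + \sqrt{2\ln n^S / N}\right)^{-\left|f_k(x_i^S) - y_i^S\right|} \quad (i = 1, \cdots, n^S),$$

$$\beta_{k,j}^{T} = \beta_{k-1,j}^{T}\left(\bar{\delta}_k / (1 - \bar{\delta}_k)\right)^{-\left|f_k(x_j^T) - y_j^T\right|} \quad (j = 1, \cdots, n^T).$$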

Note that each iteration forms a new weak classifier. The final classifier is constructed by combining, through a voting scheme, the weak classifiers from the later half of the iterations.
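The weight update of a single iteration can be sketched as follows. This is a minimal illustration under stated assumptions: labels and predictions are binary values in {0, 1}, the target error is computed as a weighted error rate, and the error is clamped below 1/2 so that the target factor stays well defined. The function name and argument layout are hypothetical.

```python
import math


def tradaboost_update(src_w, src_pred, src_y, tgt_w, tgt_pred, tgt_y, n_iters):
    """One TrAdaBoost-style weight update for binary labels in {0, 1}.

    Returns the new source weights, the new target weights, and the target error.
    """
    n_s = len(src_w)
    # Weighted error rate on the labeled target-domain instances.
    delta = sum(w * abs(p - y) for w, p, y in zip(tgt_w, tgt_pred, tgt_y)) / sum(tgt_w)
    # Assumption: keep the error strictly inside (0, 1/2).
    delta = min(max(delta, 1e-10), 0.5 - 1e-10)

    src_factor = 1.0 + math.sqrt(2.0 * math.log(n_s) / n_iters)  # designed constant
    tgt_factor = delta / (1.0 - delta)

    # A misclassified source instance is down-weighted (src_factor > 1), while
    # a misclassified target instance is up-weighted (tgt_factor < 1).
    new_src = [w * src_factor ** (-abs(p - y)) for w, p, y in zip(src_w, src_pred, src_y)]
    new_tgt = [w * tgt_factor ** (-abs(p - y)) for w, p, y in zip(tgt_w, tgt_pred, tgt_y)]
    return new_src, new_tgt, delta


new_src, new_tgt, delta = tradaboost_update(
    src_w=[1.0, 1.0], src_pred=[1, 0], src_y=[1, 1],
    tgt_w=[1.0, 1.0, 1.0, 1.0], tgt_pred=[1, 0, 0, 1], tgt_y=[1, 0, 1, 1],
    n_iters=10,
)
print(delta, new_src, new_tgt)
```

Correctly classified instances keep their weights, so over the iterations the source instances that disagree with the target data gradually lose influence.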

Some studies further extend TrAdaBoost. The work by Yao and Doretto [26] proposes a Multi-Source TrAdaBoost (MsTrAdaBoost) algorithm. This approach is designed for
