迁移学习入门级综述文章：ASurveyonTransferLearning

5星 · 超过95%的资源需积分: 15 21 浏览量 2018-01-18 11:49:55 上传评论 2 收藏 2.41MB PDF 举报

迁移学习是一种机器学习方法，其核心思想是将一个领域（源域）学到的知识应用到另一个领域（目标任务域）的学习中去，即使这两个领域的数据特征空间或分布存在差异。这一点在传统的机器学习算法中是一个重要的假设，即训练数据和未来数据必须处于相同的特征空间并具有相同的分布。然而，在许多现实世界的应用中，这个假设并不成立。例如，有时我们在感兴趣的某一领域有一个分类任务，但只有在另一个有足够训练数据的不同领域，这些数据可能具有不同的特征空间或不同的数据分布。在这种情况下，如果能够成功地进行知识转移，将大大提升学习性能，避免大量的昂贵的数据标注工作。迁移学习在过去几年中作为解决这个问题的新学习框架出现。本文综述的主要焦点是分类、回归和聚类问题的迁移学习的当前进展进行分类和评述。文章讨论了迁移学习与领域自适应、多任务学习、样本选择偏差以及协变量偏移等其他相关机器学习技术之间的关系。同时，文章也探讨了迁移学习研究中的一些潜在未来问题。数据挖掘和机器学习技术在很多知识工程领域已经取得了显著的成功，包括分类、回归和聚类问题。但许多机器学习方法只有在训练数据和测试数据来自同一个特征空间和分布的条件下才工作得很好。当数据分布发生变化时，大多数统计模型需要使用新收集的训练数据从头开始重建。在许多实际应用中，重新收集所需训练数据和重建模型成本高昂或者不可能实现。在这种情况下，知识转移或迁移学习将是非常理想的。知识工程中有很多例子表明迁移学习是可行的。例如，假设我们有一个来自特定领域的分类任务，但是这个领域的训练数据非常有限。如果另一个领域（可能是完全不同的领域）拥有大量的训练数据，而这两个领域的数据存在分布差异，那么我们可以将那个领域的知识转移到目标任务领域，从而避免了从头开始标注大量数据的高昂成本。这种情况下，迁移学习可以有效减少收集训练数据的需要和努力。迁移学习的常见应用场景包括但不限于：自然语言处理、计算机视觉、推荐系统和生物信息学等领域。在自然语言处理中，迁移学习被用来在多种语言之间传递知识；在计算机视觉中，迁移学习帮助模型在有限标记数据条件下通过使用在大数据集上训练好的模型进行图像识别；在推荐系统中，迁移学习用于改善跨域推荐的准确性；在生物信息学中，迁移学习能够帮助分析不同种类的生物数据，发现生物标记物，或者预测药物作用。在迁移学习的研究中，领域自适应、多任务学习和样本选择偏差处理等是重要的主题。领域自适应解决的是源领域和目标任务领域数据分布不一致的问题，而多任务学习研究如何同时学习多个相关任务以提升各任务的性能。样本选择偏差关注的是训练数据与真实数据分布不匹配的问题。协变量偏移则是指在源领域和目标任务领域中，输入变量和输出变量之间关系发生改变的情况。迁移学习的技术和研究还在不断发展中，它在解决实际问题中具有巨大的潜力，尤其是在数据稀缺或者需要跨领域解决问题的场景中。研究者们还在探索更高效的知识传递机制、如何更好地整合多源异构数据、解决领域适应问题的算法优化，以及迁移学习与其他机器学习方法的融合等问题。随着深度学习的出现和大数据时代的到来，迁移学习在将来还会有更多的应用空间和研究方向。

资源推荐

资源详情

资源评论

A Survey on Transfer Learning

Sinno Jialin Pan and Qiang Yang, Fellow, IEEE

Abstract—A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the

same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For

example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain

of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge

transfer, if done successfully, would greatly improve the performance of learning by avoiding much expensive data-labeling efforts. In

recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing

and reviewing the current progress on transfer learning for classification, regression, and clustering problems. In this survey, we

discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask

learning and sample selection bias, as well as covariate shift. We also explore some potential future issues in transfer learning

research.

Index Terms—Transfer learning, survey, machine learning, data mining.

1INTRODUCTION

ATA mining and machine learning technologies have

already achieved significant success in many knowl-

edge engineering areas including classification, regression,

and clustering (e.g., [1], [2]). However, many machine

learning methods work well only under a common assump-

tion: the training and test data are drawn from the same

feature space and the same distribution. When the distribu-

tion changes, most statistical models need to be rebuilt from

scratch using newly collected training data. In many real-

world applications, it is expensive or impossible to recollect

the needed training data and rebuild the models. It would be

nice to reduce the need and effort to recollect the training

data. In such cases, knowledge transfer or transfer learning

between task domains would be desirable.

Many examples in knowledge engineering can be found

where transfer learning can truly be beneficial. One

example is Web-document classification [3], [4], [5], where

our goal is to classify a given Web document into several

predefined categories. As an example, in the area of Web-

document classification (see, e.g., [6]), the labeled examples

may be the university webpages that are associated with

category information obtained through previous manual-

labeling efforts. For a classification task on a newly created

website where the data features or data distributions may

be different, there may be a lack of labeled training data. As

a result, we may not be able to directly apply the webpage

classifiers learned on the university website to the new

website. In such cases, it would be helpful if we could

transfer the classification knowledge into the new domain.

The need for transfer learning may arise when the data

can be easily outdated. In this case, the labeled data

obtained in one time period may not follow the same

distribution in a later time period. For example, in indoor

WiFi localization problems, which aims to detect a user’s

current location based on previously collected WiFi data, it

is very expensive to calibrate WiFi data for building

localization models in a large-scale environment, because

a user needs to label a large collection of WiFi signal data at

each location. However, the WiFi signal-strength values

may be a function of time, device, or other dynamic factors.

A model trained in one time period or on one device may

cause the performance for location estimation in another

time period or on another device to be reduced. To reduce

the recalibration effort, we might wish to adapt the

localization model trained in one time period (the source

domain) for a new time period (the target domain), or to

adapt the localization model trained on a mobile device (the

source domain) for a new mobile device (the target

domain), as done in [7].

As a third example, consider the problem of sentiment

classification, where our task is to automatically classify the

reviews on a product, such as a brand of camera, into

positive and negative views. For this classification task, we

need to first collect many reviews of the product and

annotate them. We would then train a classifier on the

reviews with their corresponding labels. Since the distribu-

tion of review data among different types of products can be

very different, to maintain good classification performance,

we need to collect a large amount of labeled data in order to

train the review-classification models for each product.

However, this data-labeling process can be very expensive to

do. To reduce the effort for annotating reviews for various

products, we may want to adapt a classification model that is

trained on some products to help learn classification models

for some other products. In such cases, transfer learning can

save a significant amount of labeling effort [8].

In this survey paper, we give a comprehensive overview

of transfer learning for classification, regression, and cluster-

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 22, NO. 10, OCTOBER 2010 1345

. The authors are with the Department of Computer Science and

Engineering, Hong Kong University of Science and Technology,

Clearwater Bay, Kowloon, Hong Kong.

E-mail: {sinnopan, qyang}@cse.ust.hk.

Manuscript received 13 Nov. 2008; revised 29 May 2009; accepted 13 July

2009; published online 12 Oct. 2009.

Recommended for acceptance by C. Clifton.

For information on obtaining reprints of this article, please send e-mail to:

tkde@computer.org, and reference IEEECS Log Number TKDE-2008-11-0600.

Digital Object Identifier no. 10.1109/TKDE.2009.191.

1041-4347/10/$26.00 ß 2010 IEEE Published by the IEEE Computer Society

ing developed in machine learning and data mining areas.

There has been a large amount of work on transfer learning

for reinforcement learning in the machine learning literature

(e.g., [9], [10]). However, in this paper, we only focus on

transfer learning for classification, regression, and clustering

problems that are related more closely to data mining tasks.

By doing the survey, we hope to provide a useful resource for

the data mining and machine learning community.

The rest of the survey is organized as follows: In the next

four sections, we first give a general overview and define

some notations we will use later. We, then, briefly survey the

history of transfer learning, give a unified definition of

transfer learning and categorize transfer learning into three

different settings (given in Table 2 and Fig. 2). For each

setting, we review different approaches, given in Table 3 in

detail. After that, in Section 6, we review some current

research on the topic of “negative transfer,” which happens

when knowledge transfer has a negative impact on target

learning. In Section 7, we introduce some successful

applications of transfer learning and list some published

data sets and software toolkits for transfer learning research.

Finally, we conclude the paper with a discussion of future

works in Section 8.

2OVERVIEW

2.1 A Brief History of Transfer Learning

Traditional data mining and machine learning algorithms

make predictions on the future data using statistical models

that are trained on previously collected labeled or unlabeled

training data [11], [12], [13]. Semisupervised classification

[14], [15], [16], [17] addresses the problem that the labeled

data may be too few to build a good classifier, by making use

of a large amount of unlabeled data and a small amount of

labeled data. Variations of supervised and semisupervised

learning for imperfect data sets have been studied; for

example, Zhu and Wu [18] have studied how to deal with the

noisy class-label problems. Yang et al. considered cost-

sensitive learning [19] when additional tests can be made to

future samples. Nevertheless, most of them assume that the

distributions of the labeled and unlabeled data are the same.

Transfer learning, in contrast, allows the domains, tasks, and

distributions used in training and testing to be different. In

the real world, we observe many examples of transfer

learning. For example, we may find that learning to

recognize apples might help to recognize pears. Similarly,

learning to play the electronic organ may help facilitate

learning the piano. The study of Transfer learning is motivated

by the fact that people can intelligently apply knowledge

learned previously to solve new problems faster or with

better solutions. The fundamental motivation for Transfer

learning in the field of machine learning was discussed in a

NIPS-95 workshop on “Learning to Learn,”

which focused

on the need for lifelong machine learning methods that retain

and reuse previously learned knowledge.

Research on transfer learning has attracted more and

more attention since 1995 in different names: learning to

learn, life-long learning, knowledge transfer, inductive

transfer, multitask learning, knowledge consolidation,

context-sensitive learning, knowledge-based inductive bias,

metalearning, and incremental/cumulative learning [20].

Among these, a closely related learning technique to

transfer learning is the multitask learning framework [21],

which tries to learn multiple tasks simultaneously even

when they are different. A typical approach for multitask

learning is to uncover the common (latent) features that can

benefit each individual task.

In 2005, the Broad Agency Announcement (BAA) 05-29

of Defense Advanced Research Projects Agency (DARPA)’s

Information Processing Technology Office (IPTO)

gave a

new mission of transfer learning: the ability of a system to

recognize and apply knowledge and skills learne d in

previous tasks to novel tasks. In this definition, transfer

learning aims to extract the knowledge from one or more

source tasks and applies the knowledge to a target task.In

contrast to multitask learning, rather than learning all of the

source and target tasks simultaneously, transfer learning

cares most about the target task. The roles of the source and

target tasks are no longer symmetric in transfer learning.

Fig. 1 shows the difference between the learning processes

of traditional and transfer learning techniques. As we can

see, traditional machine learning techniques try to learn each

task from scratch, while transfer learning techniques try to

transfer the knowledge from some previous tasks to a target

task when the latter has fewer high-quality training data.

Today, transfer learning methods appear in several top

venues, most notably in data mining (ACM KDD, IEEE

ICDM, and PKDD, for example), machine learning (ICML,

NIPS, ECML, AAAI, and IJCAI, for example) and applica-

tions of machine learning and data mining (ACM SIGIR,

WWW, and ACL, for example).

Before we give different

categorizations of transfer learning, we first describe the

notations used in this paper.

2.2 Notations and Definitions

In this section, we introduce some notations and definitions

that are used in this survey. First of all, we give the

definitions of a “domain” and a “task,” respectively.

In this survey, a domain D consists of two components: a

feature space X and a marginal probability distribution P ðXÞ,

where X ¼fx

; ...;x

g2X. For example, if our learning task

1346 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 22, NO. 10, OCTOBER 2010

Fig. 1. Different learning processes between (a) traditional machine

learning and (b) transfer learning.

1. http://socrates.acadiau.ca/courses/comp/dsilver/NIPS9 5_LTL/

transfer.workshop.1995.html.

2. http://www.darpa.mil/ipto/programs/tl/tl.asp.

3. We summarize a list of conferences and workshops where transfer

learning papers appear in these few years in the following webpage for

reference, http://www.cse.ust.hk/~sinnopan/conferenceTL.htm.

is document classification, and each term is taken as a binary

feature, then X is the space of all term vectors, x

is the ith term

vector corresponding to some documents, and X is a

particular learning sample. In general, if two domains are

different, then they may have different feature spaces or

different marginal probability distributions.

Given a specific domain, D¼fX;PðXÞg,atask consists

of two components: a label space Y and an objective

predictive function fðÞ (denoted by T¼fY;fðÞg), which is

not observed but can be learned from the training data,

which consist of pairs fx

g, where x

2 X and y

2Y. The

function fðÞ can be used to predict the corresponding label,

fðxÞ, of a new instance x. From a probabilistic viewpoint,

fðxÞ can be written as PðyjxÞ. In our document classification

example, Y is the set of all labels, which is True, False for a

binary classification task, and y

is “True” or “False.”

For simplicity, in this survey, we only consider the case

where there is one source domain D

, and one target domain,

, as this is by far the most popular of the research works in

the literature. More specifically, we denote the source domain

data as D

¼fðx

Þ; ...; ðx

Þg, where x

the data instance and y

is the corresponding class

label. In our document classification example, D

can be a set

of term vectors together with their associated true or false

class labels. Similarly, we denote the target-domain data as

¼fðx

Þ; ...; ðx

Þg, where the input x

is in

and y

is the corresponding output. In most cases,

0  n

 n

We now give a unified definition of transfer learning.

Definition 1 (Transfer Learning). Given a source domain D

and learning task T

, a target domain D

and learning task

, transfer learning aims to help improve the learning of the

target predictive function f

ðÞ in D

using the knowledge in

and T

, where D

6¼D

,orT

6¼T

In the above definition, a domain is a pair D¼fX;PðXÞg.

Thus, the condition D

6¼D

implies that either X

6¼X

ðXÞ 6¼ P

ðXÞ. For example, in our document classification

example, this means that between a source document set and

a target document set, either the term features are different

between the two sets (e.g., they use different languages), or

their marginal distributions are different.

Similarly, a task is defined as a pair T¼fY;PðY jXÞg.

Thus, the condition T

6¼T

implies that either Y

6¼Y

P ðY

Þ 6¼ P ðY

Þ. When the target and source domains

are the same, i.e., D

¼D

, and their learning tasks are the

same, i.e., T

¼T

, the learning problem becomes a

traditional machine learning problem. When the domains

are different, then either 1) the feature spaces between the

domains are different, i.e., X

6¼X

, or 2) the feature spaces

between the domains are the same but the marginal

probability distributions between domain data are different;

i.e., P ðX

Þ 6¼ P ðX

Þ, where X

and X

.Asan

example, in our document classification example, case 1

corresponds to when the two sets of documents are

described in different languages, and case 2 may correspond

to when the source domain documents and the target-

domain documents focus on different topics.

Given specific domains D

and D

, when the learning

tasks T

and T

are different, then either 1) the label

spaces between the domains are different, i.e., Y

6¼Y

,or

2) the conditional probability distributions between the

domains are different; i.e., P ðY

Þ 6¼ P ðY

Þ, where

and Y

. In our document classification

example, case 1 corresponds to the situation where source

domain has binary document classes, whereas the target

domain has 10 classes to classify the documents to. Case 2

corresponds to the situation where the source and target

documents are very unbalanced in terms of the user-

defined classes.

In addition, when there exists some relationship, explicit

or implicit, between the feature spaces of the two domains,

we say that the source and target domains are related.

2.3 A Categorization of

Transfer Learning Techniques

In transfer learning, we have the following three main

research issues: 1) what to transfer, 2) how to transfer, and

3) when to transfer.

“What to transfer” asks which part of knowledge can be

transferred across domains or tasks. Some knowledge is

specific for individual domains or tasks, and some knowl-

edge may be common between different domains such that

they may help improve performance for the target domain or

task. After discovering which knowledge can be transferred,

learning algorithms need to be developed to transfer the

knowledge, which corresponds to the “how to transfer” issue.

“When to transfer” asks in which situations, transferring

skills should be done. Likewise, we are interested in

knowing in which situations, knowledge should not be

transferred. In some situations, when the source domain

and target domain are not related to each other, brute-force

transfer may be unsuccessful. In the worst case, it may

even hurt the performance of learning in the target

domain, a situation which is often referred to as negative

transfer. Most current work on transfer learning focuses on

“What to transfer” and “How to transfer,” by implicitly

assuming that the source and target domains be related to

each other. However, how to avoid negative transfer is an

important open issue that is attracting more and more

attention in the future.

Based on the definition of transfer learning, we summarize

the relationship between traditional machine learning and

various transfer learning settings in Table 1, where we

PAN AND YANG: A SURVEY ON TRANSFER LEARNING 1347

TABLE 1

Relationship between Traditional Machine Learning and Various Transfer Learning Settings

剩余14页未读，继续阅读

评论收藏

内容反馈

qq_26493017

2019-04-21

原版论文,入坑迁移学习.

链上物联网

粉丝: 106
资源: 6

迁移学习入门级综述文章：A Survey on Transfer Learning

最新资源

迁移学习入门级综述文章：A Survey on Transfer Learning

迁移学习综述

综述笔记—A Survey on Transfer Learning

A survey on transfer learniing.pdf

A survey of transfer learning.pdf

（原文+译文）A Survey on Transfer Learning_Pan and Yang_2010.pdf

A Comprehensive Survey on Transfer Learning.pdf

迁移学习研究综述.pdf

最新迁移学习综述论文（A Comprehensive Survey on Transfer Learning）- 中科院.zip

迁移学习综述a survey on transfer learning的整理下载

A Survey on Transfer Learning知识迁移的总结.zip

A review on transfer learning approaches in brain–computer interface.pdf

迁移学习Python实战 Hands on transfer learning with Python

迁移学习教程，Transfer learning介绍，TL调查

tkde_transfer_learning迁移学习综述

迁移学习入门手册transfer_learning_tutorial_wjd.pdf

(2021PIEEE) TransferLearning_关于迁移学习的最新和最权威综述_

Metric Learning A Survey

A Survey on Transfer Learnin Sinno Jialin Pan and Qiang Yang pdf

迁移学习相关论文

A Survey of Zero-Shot Learning(零样本学习综述)

元学习综述，meta learning，A Survey of Zero-Shot Learning + poster

A SURVEY ON DEEP LEARNING-BASED ARCHITECTUR.pdf

Hands On Transfer Learning with Py Implement Advanced DL and NN Models Using T,K

transferlearning, Everything about Transfer Learning and Domain Adaptation--迁移学习.zip

Hands On Transfer Learning with Python

Everything about Transfer Learning and Domain Adaptation--迁移学习

迁移学习入门

站在巨人的肩膀上, 迁移学习 Transfer Learning

最新资源