A beginner-friendly survey article on transfer learning: A Survey on Transfer Learning

A beginner-friendly survey article on transfer learning: Pan and Yang's A Survey on Transfer Learning (IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, October 2010). Sharing it with everyone~ The excerpt below covers Sections 2 through 7 of the paper.
2.2 Notations and Definitions

In this survey, a domain D consists of two components: a feature space X and a marginal probability distribution P(X), where X = {x_1, ..., x_n} ∈ X. For example, if our learning task is document classification, and each term is taken as a binary feature, then X is the space of all term vectors, x_i is the ith term vector corresponding to some documents, and X is a particular learning sample. In general, if two domains are different, then they may have either different feature spaces or different marginal probability distributions.

Given a specific domain, D = {X, P(X)}, a task consists of two components: a label space Y and an objective predictive function f(·) (denoted by T = {Y, f(·)}), which is not observed but can be learned from the training data, which consist of pairs {x_i, y_i}, where x_i ∈ X and y_i ∈ Y. The function f(·) can be used to predict the corresponding label, f(x), of a new instance x. From a probabilistic viewpoint, f(x) can be written as P(y|x). In our document classification example, Y is the set of all labels, which is {True, False} for a binary classification task, and y_i is "True" or "False."

For simplicity, in this survey, we only consider the case where there is one source domain D_S and one target domain D_T, as this is by far the most popular setting in the research works in the literature. More specifically, we denote the source domain data as D_S = {(x_{S_1}, y_{S_1}), ..., (x_{S_{n_S}}, y_{S_{n_S}})}, where x_{S_i} ∈ X_S is the data instance and y_{S_i} ∈ Y_S is the corresponding class label. In our document classification example, D_S can be a set of term vectors together with their associated true or false class labels. Similarly, we denote the target-domain data as D_T = {(x_{T_1}, y_{T_1}), ..., (x_{T_{n_T}}, y_{T_{n_T}})}, where the input x_{T_i} is in X_T and y_{T_i} ∈ Y_T is the corresponding output. In most cases, 0 ≤ n_T ≪ n_S.

We now give a unified definition of transfer learning.

Definition 1 (Transfer Learning). Given a source domain D_S and learning task T_S, a target domain D_T and learning task T_T, transfer learning aims to help improve the learning of the target predictive function f_T(·) in D_T using the knowledge in D_S and T_S, where D_S ≠ D_T, or T_S ≠ T_T.

In the above definition, a domain is a pair D = {X, P(X)}. Thus, the condition D_S ≠ D_T implies that either X_S ≠ X_T or P_S(X) ≠ P_T(X). For example, in our document classification example, this means that between a source document set and a target document set, either the term features are different between the two sets (e.g., they use different languages), or their marginal distributions are different.

Similarly, a task is defined as a pair T = {Y, P(Y|X)}. Thus, the condition T_S ≠ T_T implies that either Y_S ≠ Y_T or P(Y_S|X_S) ≠ P(Y_T|X_T). When the target and source domains are the same, i.e., D_S = D_T, and their learning tasks are the same, i.e., T_S = T_T, the learning problem becomes a traditional machine learning problem. When the domains are different, then either 1) the feature spaces between the domains are different, i.e., X_S ≠ X_T, or 2) the feature spaces between the domains are the same but the marginal probability distributions between domain data are different, i.e., P(X_S) ≠ P(X_T), where X_{S_i} ∈ X_S and X_{T_i} ∈ X_T. As an example, in our document classification example, case 1 corresponds to when the two sets of documents are described in different languages, and case 2 may correspond to when the source domain documents and the target domain documents focus on different topics.

Given specific domains D_S and D_T, when the learning tasks T_S and T_T are different, then either 1) the label spaces between the domains are different, i.e., Y_S ≠ Y_T, or 2) the conditional probability distributions between the domains are different, i.e., P(Y_S|X_S) ≠ P(Y_T|X_T), where Y_{S_i} ∈ Y_S and Y_{T_i} ∈ Y_T. In our document classification example, case 1 corresponds to the situation where the source domain has binary document classes, whereas the target domain has 10 classes to classify the documents into. Case 2 corresponds to the situation where the source and target documents are very unbalanced in terms of the user-defined classes.

In addition, when there exists some relationship, explicit or implicit, between the feature spaces of the two domains, we say that the source and target domains are related.
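To make the formalism concrete, here is a toy formalization of Definition 1 on the document-classification example. This is my own illustrative sketch, not code from the survey; the string-based feature-space comparison is a simplification, since differing P(X) or P(Y|X) alone also makes the domains or tasks differ.

```python
# A toy formalization of Definition 1 (illustration only, not from the paper).
# A domain is a feature space plus a marginal distribution P(X); a task is a
# label space plus a predictive function f(x), approximating P(y | x).
from dataclasses import dataclass
from typing import Any, Callable, Sequence

@dataclass
class Domain:
    feature_space: str             # e.g., "bag-of-words over an English vocabulary"
    samples: Sequence[Any]         # x_1, ..., x_n drawn from P(X)

@dataclass
class Task:
    label_space: Sequence[Any]     # e.g., (True, False) for binary document classes
    predict: Callable[[Any], Any]  # f(x), learned from pairs {(x_i, y_i)}

def is_transfer_setting(d_s: Domain, t_s: Task, d_t: Domain, t_t: Task) -> bool:
    """Definition 1 applies when D_S != D_T or T_S != T_T. Comparing the
    feature-space descriptions and label spaces covers the X_S != X_T and
    Y_S != Y_T cases; differing P(X) or P(Y|X) would also qualify."""
    return (d_s.feature_space != d_t.feature_space
            or tuple(t_s.label_space) != tuple(t_t.label_space))
```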
2.3 A Categorization of Transfer Learning Techniques

In transfer learning, there are three main research issues: 1) what to transfer, 2) how to transfer, and 3) when to transfer.

"What to transfer" asks which part of knowledge can be transferred across domains or tasks. Some knowledge is specific for individual domains or tasks, and some knowledge may be common between different domains such that it may help improve performance for the target domain or task. After discovering which knowledge can be transferred, learning algorithms need to be developed to transfer the knowledge, which corresponds to the "how to transfer" issue.

"When to transfer" asks in which situations transferring skills should be done. Likewise, we are interested in knowing in which situations knowledge should not be transferred. In some situations, when the source domain and target domain are not related to each other, brute-force transfer may be unsuccessful. In the worst case, it may even hurt the performance of learning in the target domain, a situation which is often referred to as negative transfer. Most current work on transfer learning focuses on "what to transfer" and "how to transfer," by implicitly assuming that the source and target domains are related to each other. However, how to avoid negative transfer is an important open issue that is attracting more and more attention.

Based on the definition of transfer learning, we summarize the relationship between traditional machine learning and various transfer learning settings in Table 1, where we categorize transfer learning under three subsettings, inductive transfer learning, transductive transfer learning, and unsupervised transfer learning, based on different situations between the source and target domains and tasks.

TABLE 1: Relationship between traditional machine learning and various transfer learning settings

| Learning Settings | Source and Target Domains | Source and Target Tasks |
|---|---|---|
| Traditional machine learning | the same | the same |
| Inductive transfer learning | the same | different but related |
| Unsupervised transfer learning | different but related | different but related |
| Transductive transfer learning | different but related | the same |
1. In the inductive transfer learning setting, the target task is different from the source task, no matter whether the source and target domains are the same or not. In this case, some labeled data in the target domain are required to induce an objective predictive model f_T(·) for use in the target domain. In addition, according to different situations of labeled and unlabeled data in the source domain, we can further categorize the inductive transfer learning setting into two cases:

a. A lot of labeled data in the source domain are available. In this case, the inductive transfer learning setting is similar to the multitask learning setting. However, the inductive transfer learning setting only aims at achieving high performance in the target task by transferring knowledge from the source task, while multitask learning tries to learn the target and source tasks simultaneously.

b. No labeled data in the source domain are available. In this case, the inductive transfer learning setting is similar to the self-taught learning setting, which was first proposed by Raina et al. [22]. In the self-taught learning setting, the label spaces between the source and target domains may be different, which implies the side information of the source domain cannot be used directly. Thus, it is similar to the inductive transfer learning setting where the labeled data in the source domain are unavailable.

2. In the transductive transfer learning setting, the source and target tasks are the same, while the source and target domains are different. In this situation, no labeled data in the target domain are available, while a lot of labeled data in the source domain are available. In addition, according to different situations between the source and target domains, we can further categorize the transductive transfer learning setting into two cases:

a. The feature spaces between the source and target domains are different, X_S ≠ X_T.

b. The feature spaces between domains are the same, X_S = X_T, but the marginal probability distributions of the input data are different, P(X_S) ≠ P(X_T).

The latter case of the transductive transfer learning setting is related to domain adaptation for knowledge transfer in text classification [23] and sample selection bias [24] or covariate shift [25], whose assumptions are similar.

3. Finally, in the unsupervised transfer learning setting, similar to the inductive transfer learning setting, the target task is different from but related to the source task. However, unsupervised transfer learning focuses on solving unsupervised learning tasks in the target domain, such as clustering, dimensionality reduction, and density estimation [26], [27]. In this case, there are no labeled data available in either the source or target domains in training.
The relationship between the different settings of transfer learning and the related areas is summarized in Table 2 and Fig. 2.

TABLE 2: Different settings of transfer learning

| Transfer Learning Settings | Related Areas | Source Domain Labels | Target Domain Labels | Tasks |
|---|---|---|---|---|
| Inductive transfer learning | Multitask learning | Available | Available | Regression, classification |
| Inductive transfer learning | Self-taught learning | Unavailable | Available | Regression, classification |
| Transductive transfer learning | Domain adaptation, sample selection bias, covariate shift | Available | Unavailable | Regression, classification |
| Unsupervised transfer learning | | Unavailable | Unavailable | Clustering, dimensionality reduction |

Fig. 2. An overview of different settings of transfer learning. (The original figure is a decision diagram: labeled data available only in the source domain leads to transductive transfer learning, related to domain adaptation and sample selection bias/covariate shift; labeled data available in the target domain leads to inductive transfer learning, related to self-taught learning when source labels are absent and to multitask learning when they are present; no labeled data in either domain leads to unsupervised transfer learning.)

Approaches to transfer learning in the above three different settings can be summarized into four cases based on "what to transfer." Table 3 shows these four cases with brief descriptions.

The first case can be referred to as the instance-based transfer learning (or instance-transfer) approach [6], [28], [29], [30], [31], [24], [32], [33], [34], [35], which assumes that certain parts of the data in the source domain can be reused for learning in the target domain by reweighting. Instance reweighting and importance sampling are two major techniques in this context.

A second case can be referred to as the feature-representation-transfer approach [22], [36], [37], [38], [39], [8], [40], [41], [42], [43], [44]. The intuitive idea behind this case is to learn a "good" feature representation for the target domain. In this case, the knowledge used to transfer across domains is encoded into the learned feature representation. With the new feature representation, the performance of the target task is expected to improve significantly.

A third case can be referred to as the parameter-transfer approach [45], [46], [47], [48], [49], which assumes that the source tasks and the target tasks share some parameters or prior distributions of the hyperparameters of the models. The transferred knowledge is encoded into the shared parameters or priors. Thus, by discovering the shared parameters or priors, knowledge can be transferred across tasks.

Finally, the last case can be referred to as the relational-knowledge-transfer problem [50], which deals with transfer learning for relational domains. The basic assumption behind this context is that some relationship among the data in the source and target domains is similar. Thus, the knowledge to be transferred is the relationship among the data. Recently, statistical relational learning techniques dominate this context [51], [52].

TABLE 3: Different approaches to transfer learning

| Transfer Learning Approaches | Brief Description |
|---|---|
| Instance-transfer | To reweight some labeled data in the source domain for use in the target domain [6], [28], [29], [30], [31], [24], [32], [33], [34], [35] |
| Feature-representation-transfer | Find a "good" feature representation that reduces the difference between the source and the target domains and the error of classification and regression models [22], [36], [37], [38], [39], [8], [40], [41], [42], [43], [44] |
| Parameter-transfer | Discover shared parameters or priors between the source domain and target domain models, which can benefit transfer learning [45], [46], [47], [48], [49] |
| Relational-knowledge-transfer | Build a mapping of relational knowledge between the source domain and the target domains. Both domains are relational domains and the i.i.d. assumption is relaxed in each domain [50], [51], [52] |

Table 4 shows the cases where the different approaches are used for each transfer learning setting. We can see that the inductive transfer learning setting has been studied in many research works, while the unsupervised transfer learning setting is a relatively new research topic and only studied in the context of the feature-representation-transfer case. In addition, the feature-representation-transfer problem has been proposed for all three settings of transfer learning. However, the parameter-transfer and the relational-knowledge-transfer approaches are only studied in the inductive transfer learning setting, which we discuss in detail below.

TABLE 4: Different approaches used in different settings

| | Inductive Transfer Learning | Transductive Transfer Learning | Unsupervised Transfer Learning |
|---|---|---|---|
| Instance-transfer | √ | √ | |
| Feature-representation-transfer | √ | √ | √ |
| Parameter-transfer | √ | | |
| Relational-knowledge-transfer | √ | | |

3 INDUCTIVE TRANSFER LEARNING

Definition 2 (Inductive Transfer Learning). Given a source domain D_S and a learning task T_S, a target domain D_T and a learning task T_T, inductive transfer learning aims to help improve the learning of the target predictive function f_T(·) in D_T using the knowledge in D_S and T_S, where T_S ≠ T_T.

Based on the above definition of the inductive transfer learning setting, a few labeled data in the target domain are required as the training data to induce the target predictive function. As mentioned in Section 2.3, this setting has two cases: 1) labeled data in the source domain are available and 2) labeled data in the source domain are unavailable while unlabeled data in the source domain are available. Most transfer learning approaches in this setting focus on the former case.

3.1 Transferring Knowledge of Instances

The instance-transfer approach to the inductive transfer learning setting is intuitively appealing: although the source domain data cannot be reused directly, there are certain parts of the data that can still be reused together with a few labeled data in the target domain.
Dai et al. [6] proposed a boosting algorithm, TrAdaBoost, which is an extension of the AdaBoost algorithm, to address the inductive transfer learning problems. TrAdaBoost assumes that the source and target-domain data use exactly the same set of features and labels, but the distributions of the data in the two domains are different. In addition, TrAdaBoost assumes that, due to the difference in distributions between the source and the target domains, some of the source domain data may be useful in learning for the target domain, but some of them may not and could even be harmful. It attempts to iteratively reweight the source domain data to reduce the effect of the "bad" source data while encouraging the "good" source data to contribute more to the target domain. In each round of iteration, TrAdaBoost trains the base classifier on the weighted source and target data. The error is only calculated on the target data. Furthermore, TrAdaBoost uses the same strategy as AdaBoost to update the incorrectly classified examples in the target domain, while using a different strategy from AdaBoost to update the incorrectly classified examples in the source domain. Theoretical analysis of TrAdaBoost is also given in [6].

Jiang and Zhai [30] proposed a heuristic method to remove "misleading" training examples from the source domain based on the difference between the conditional probabilities P(y_T|x_T) and P(y_S|x_S). Liao et al. [31] proposed a new active learning method to select the unlabeled data in a target domain to be labeled with the help of the source domain data. Wu and Dietterich [53] integrated the source domain (auxiliary) data in a Support Vector Machine (SVM) framework for improving the classification performance.
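The reweighting loop just described is compact enough to sketch. The following is my own minimal paraphrase of TrAdaBoost under simplifying assumptions (binary 0/1 labels, a depth-1 decision tree as the base learner, an illustrative `n_rounds`); it follows the update rules described above but is not the authors' code.

```python
# A minimal TrAdaBoost-style sketch (a paraphrase of Dai et al. [6], not the
# authors' code). Labels are assumed to be in {0, 1}.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tradaboost(Xs, ys, Xt, yt, n_rounds=20):
    n, m = len(Xs), len(Xt)
    X = np.vstack([Xs, Xt])
    y = np.concatenate([ys, yt])
    w = np.ones(n + m)
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / n_rounds))
    hyps, betas = [], []
    for _ in range(n_rounds):
        p = w / w.sum()
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=p)
        miss = np.abs(h.predict(X) - y)            # 1 on a mistake, 0 otherwise
        # The training error is measured on the *target* portion only.
        eps = np.clip((p[n:] * miss[n:]).sum() / p[n:].sum(), 1e-10, 0.499)
        beta_t = eps / (1.0 - eps)
        w[:n] *= beta_src ** miss[:n]              # down-weight "bad" source points
        w[n:] *= beta_t ** (-miss[n:])             # up-weight hard target points
        hyps.append(h)
        betas.append(beta_t)
    def predict(Xnew):
        # Weighted vote over the second half of the rounds, as in [6].
        half = n_rounds // 2
        score = sum(-np.log(b) * h.predict(Xnew)
                    for h, b in zip(hyps[half:], betas[half:]))
        return (score >= 0.5 * sum(-np.log(b) for b in betas[half:])).astype(int)
    return predict
```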
3.2 Transferring Knowledge of Feature Representations

The feature-representation-transfer approach to the inductive transfer learning problem aims at finding "good" feature representations to minimize domain divergence and classification or regression model error. Strategies to find good feature representations differ for different types of source domain data. If a lot of labeled data in the source domain are available, supervised learning methods can be used to construct a feature representation. This is similar to common feature learning in the field of multitask learning [40]. If no labeled data in the source domain are available, unsupervised learning methods are proposed to construct the feature representation.

3.2.1 Supervised Feature Construction

Supervised feature construction methods for the inductive transfer learning setting are similar to those used in multitask learning. The basic idea is to learn a low-dimensional representation that is shared across related tasks. In addition, the learned new representation can reduce the classification or regression model error of each task as well.

Argyriou et al. [40] proposed a sparse feature learning method for multitask learning. In the inductive transfer learning setting, the common features can be learned by solving an optimization problem, given as follows:

$$
\arg\min_{A,U} \; \sum_{t \in \{S,T\}} \sum_{i=1}^{n_t} L\big(y_{t_i}, \langle a_t, U^{\top} x_{t_i} \rangle\big) + \gamma \, \|A\|_{2,1}^{2}
\quad \text{s.t.} \quad U \in \mathbf{O}^{d}. \tag{1}
$$

In this equation, S and T denote the tasks in the source domain and target domain, respectively. A = [a_S, a_T] is a matrix of parameters, and U is a d × d orthogonal matrix (mapping function) for mapping the original high-dimensional data to low-dimensional representations. The (r, p)-norm of A is defined as $\|A\|_{r,p} := \big(\sum_{i=1}^{d} \|a^{i}\|_{r}^{p}\big)^{1/p}$. The optimization problem (1) estimates the low-dimensional representations $U^{\top}X_T$, $U^{\top}X_S$ and the parameters A of the model at the same time. The optimization problem (1) can be further transformed into an equivalent convex optimization formulation and be solved efficiently. In a follow-up work, Argyriou et al. [41] proposed a spectral regularization framework on matrices for multitask structure learning.

Lee et al. [42] proposed a convex optimization algorithm for simultaneously learning metapriors and feature weights from an ensemble of related prediction tasks. The metapriors can be transferred among different tasks. Jebara [43] proposed to select features for multitask learning with SVMs. Ruckert and Kramer [54] designed a kernel-based approach to inductive transfer, which aims at finding a suitable kernel for the target data.

3.2.2 Unsupervised Feature Construction

In [22], Raina et al. proposed to apply sparse coding [55], which is an unsupervised feature construction method, for learning higher level features for transfer learning. The basic idea of this approach consists of two steps. In the first step, higher level basis vectors b = {b_1, b_2, ..., b_s} are learned on the source domain data by solving the optimization problem (2):

$$
\min_{a, b} \; \sum_{i} \Big\| x_{S_i} - \sum_{j} a_{S_i}^{j} b_j \Big\|_{2}^{2} + \beta \, \|a_{S_i}\|_{1}
\quad \text{s.t.} \quad \|b_j\|_{2} \le 1, \;\; \forall j \in 1, \dots, s. \tag{2}
$$

In this equation, $a_{S_i}^{j}$ is a new representation of basis $b_j$ for input $x_{S_i}$, and β is a coefficient to balance the feature construction term and the regularization term. After learning the basis vectors b, in the second step, an optimization problem (3) is solved on the target-domain data to learn higher level features based on the basis vectors b:

$$
a_{T_i}^{*} = \arg\min_{a_{T_i}} \; \Big\| x_{T_i} - \sum_{j} a_{T_i}^{j} b_j \Big\|_{2}^{2} + \beta \, \|a_{T_i}\|_{1}. \tag{3}
$$

Finally, discriminative algorithms can be applied to $\{a_{T_i}^{*}\}$ with corresponding labels to train classification or regression models for use in the target domain. One drawback of this method is that the so-called higher level basis vectors learned on the source domain in the optimization problem (2) may not be suitable for use in the target domain.
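Problems (2) and (3) map directly onto off-the-shelf dictionary-learning tools. Below is a hedged sketch using scikit-learn; `n_basis` and `beta` are illustrative settings, and scikit-learn's objective matches (2)-(3) only up to constant factors.

```python
# A sketch of Raina et al.'s two-step recipe [22] using scikit-learn's
# dictionary-learning tools (an approximation of problems (2) and (3)).
from sklearn.decomposition import DictionaryLearning, SparseCoder

def self_taught_features(X_source, X_target, n_basis=64, beta=1.0):
    # Step 1, problem (2): learn basis vectors b on unlabeled source data.
    dico = DictionaryLearning(n_components=n_basis, alpha=beta,
                              transform_algorithm="lasso_lars")
    dico.fit(X_source)
    # Step 2, problem (3): sparse-code the target data over the fixed basis.
    coder = SparseCoder(dictionary=dico.components_,
                        transform_algorithm="lasso_lars", transform_alpha=beta)
    return coder.transform(X_target)   # higher level features a*_{T_i}
```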
Recently, manifold learning methods have been adapted for transfer learning. In [44], Wang and Mahadevan proposed a Procrustes analysis-based approach to manifold alignment without correspondences, which can be used to transfer knowledge across domains via the aligned manifolds.

3.3 Transferring Knowledge of Parameters

Most parameter-transfer approaches to the inductive transfer learning setting assume that individual models for related tasks should share some parameters or prior distributions of hyperparameters. Most approaches described in this section, including a regularization framework and a hierarchical Bayesian framework, are designed to work under multitask learning. However, they can be easily modified for transfer learning. As mentioned above, multitask learning tries to learn both the source and target tasks simultaneously and perfectly, while transfer learning only aims at boosting the performance of the target domain by utilizing the source domain data. Thus, in multitask learning, the weights of the loss functions for the source and target data are the same. In contrast, in transfer learning, the weights in the loss functions for different domains can be different. Intuitively, we may assign a larger weight to the loss function of the target domain to make sure that we can achieve better performance in the target domain.

Lawrence and Platt [45] proposed an efficient algorithm known as MT-IVM, which is based on Gaussian Processes (GP), to handle the multitask learning case. MT-IVM tries to learn parameters of a Gaussian Process over multiple tasks by sharing the same GP prior. Bonilla et al. [46] also investigated multitask learning in the context of GP. The authors proposed to use a free-form covariance matrix over tasks to model intertask dependencies, where a GP prior is used to induce correlations between tasks.
Schwaighofer industrial management domain. In addition the relation et al. 147 proposed to use a hierarchical Bayesian frame ship between a professor and his or her students is similar work(HB)together with GP for multitask learning to the relationship between a manager and his or her Besides transferring the priors of the GP models, some workers Thus, there may exist a mapping from professor to researchers also proposed to transfer parameters of SVMs manager and a mapping from the Professorstudent under a regularization framework. Evgeniou and Pontil (481 relationship to the managerworker relationship. In this borrowed of hb to svms for multitask learning vein tamar tries to use an mln le earned for a source The proposed method assumed that the parameter, u, in domain to aid in the learning of an mln for a target SVMs for each task can be separated into two terms. One is domain. Basically, TAMAR is a twostage algorithm. In the first step, a mapping is constructed from a source min to a common term over tasks and the other is a taskspecinle the target domain based on weighted pseudo loglikelihood term In inductive transfer learning, measure(WPLL). In the second step, a revision is done for ws=wo+us and wr =wo+uT, the mapped structure in the target domain throu FORTE algorithm [57], which is where ws and ar are parameters of the sVMs for the source g programming(ILP)algorithm for revising firstorder task and the target learning task, respectively. wo is a theories. The revised mln can be used as a relational common parameter while us and ur are specific parameters model for inference or reasoning in the target domain for the source task and the target task, respectively. By In the AAAI2008 workshop on transfer learning fo assuming a hyperplane for task t, an complex tasks, Mihalkova and Mooney 151] extended extension of svms to multitask learning case can be written as the following ttp://www.cs.ulexas.cdu/mtaylor/aaaiostl/ 1352 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING. VOL 22. NO 10. OCTOBER 2010 TAMAR to the singleentitycentered setting of transfer Most approaches described in the following sections are learning where only one entity in a target domain is related to case 2 above. available. Davis and Domingos [52] proposed an approach to transferring relational knowledge based on a form o 4.1 Transferring the Knowledge of Instances secondorder markov logic. The basic idea of the algorithm Most instancetransfer approaches to the transductive is to discover structural regularities in the source domain in transfer learning setting are motivated by importance the form of markov logic formulas with predicate variables, sampling. To see how importancesamplingbased methods by instantiating these formulas with predicates from the may help in this setting, we first review the problem of target domain empirical risk minimization(ERM)[60]. In general, we might want to learn the optimal parameters o ne mo by minimizing the expected risk, 4 TRANSDUCTIVE TRANSFER LEARNING The term transductive transfer learning was first proposed by A'=arg min H(jiep[l(r, 1, 0)1 6∈G Arnold et al. [58 where they required that the source and target tasks be the same although the domains may be where l(a, 3, A) is a loss function that depends on the different.On top of these conditions, they further required parameter 0. 
3.4 Transferring Relational Knowledge

Different from the other three contexts, the relational-knowledge-transfer approach deals with transfer learning problems in relational domains, where the data are non-i.i.d. and can be represented by multiple relations, such as networked data and social network data. This approach does not assume that the data drawn from each domain be independent and identically distributed (i.i.d.) as traditionally assumed. It tries to transfer the relationship among data from a source domain to a target domain. In this context, statistical relational learning techniques are proposed to solve these problems.

Mihalkova et al. [50] proposed an algorithm, TAMAR, that transfers relational knowledge with Markov Logic Networks (MLNs) across relational domains. MLNs [56] are a powerful formalism, which combines the compact expressiveness of first-order logic with the flexibility of probability, for statistical relational learning. In MLNs, entities in a relational domain are represented by predicates and their relationships are represented in first-order logic. TAMAR is motivated by the fact that if two domains are related to each other, there may exist mappings to connect entities and their relationships from a source domain to a target domain. For example, a professor can be considered as playing a similar role in an academic domain as a manager in an industrial management domain. In addition, the relationship between a professor and his or her students is similar to the relationship between a manager and his or her workers. Thus, there may exist a mapping from professor to manager and a mapping from the professor-student relationship to the manager-worker relationship. In this vein, TAMAR tries to use an MLN learned for a source domain to aid in the learning of an MLN for a target domain. Basically, TAMAR is a two-stage algorithm. In the first step, a mapping is constructed from a source MLN to the target domain based on the weighted pseudo-log-likelihood measure (WPLL). In the second step, a revision is done for the mapped structure in the target domain through the FORTE algorithm [57], which is an inductive logic programming (ILP) algorithm for revising first-order theories. The revised MLN can be used as a relational model for inference or reasoning in the target domain.

In the AAAI-2008 workshop on transfer learning for complex tasks, Mihalkova and Mooney [51] extended TAMAR to the single-entity-centered setting of transfer learning, where only one entity in a target domain is available. Davis and Domingos [52] proposed an approach to transferring relational knowledge based on a form of second-order Markov logic. The basic idea of the algorithm is to discover structural regularities in the source domain in the form of Markov logic formulas with predicate variables, by instantiating these formulas with predicates from the target domain.

4 TRANSDUCTIVE TRANSFER LEARNING

The term transductive transfer learning was first proposed by Arnold et al. [58], where they required that the source and target tasks be the same, although the domains may be different. On top of these conditions, they further required that all unlabeled data in the target domain be available at training time, but we believe that this condition can be relaxed; instead, in our definition of the transductive transfer learning setting, we only require that part of the unlabeled target data be seen at training time in order to obtain the marginal probability for the target data.

Note that the word "transductive" is used with several meanings. In the traditional machine learning setting, transductive learning refers to the situation where all test data are required to be seen at training time and the learned model cannot be reused for future data. Thus, when some new test data arrive, they must be classified together with all existing data. In our categorization of transfer learning, in contrast, we use the term transductive to emphasize the concept that in this type of transfer learning, the tasks must be the same and there must be some unlabeled data available in the target domain.

Definition 3 (Transductive Transfer Learning). Given a source domain D_S and a corresponding learning task T_S, a target domain D_T and a corresponding learning task T_T, transductive transfer learning aims to improve the learning of the target predictive function f_T(·) in D_T using the knowledge in D_S and T_S, where D_S ≠ D_T and T_S = T_T. In addition, some unlabeled target-domain data must be available at training time.

This definition covers the work of Arnold et al. [58], since the latter considered domain adaptation, where the difference lies between the marginal probability distributions of source and target data; i.e., the tasks are the same but the domains are different.
P(r:) two cases: 1)The feature spaces between the source and If we can estimate /(zs: target domains are different, xs+IT, and 2)the feature transductive transfe x t For each instance, we can solve the spaces between domains are the same,x s=tT, but the learning problems marginal probability distributions of the input data are There exist various ways to estimate d Zat grozny different, P(Xs)+P(Xr). This is similar to the require proposed to estimate the terms P('s, )and P(r, )indepen ments in domain adaptation and sample selection bias. dently by constructing simple classification problems PAN AND YANG: A SURVEY ON TRANSFER LEARNING Fan et al. [35] further analyzed the problems by using domains. Then, SCL removes these pivot features from the various classifiers to estimate the probability ratio. Huang data and treats each pivot feature as a new label vector. The t al. [32] Proposed a kernelmean matching (KM) m classification problems can be constructed By assuming each problem can be solved by linear classifier, which is algorithm to learn p(as) directly by matching the means shown as follows between the source domain data and the target domain data in a reproducingkernel Hilbert space(RKHS). KMM can be f(x)=sgm(n7·x), rewritten as the following quadratic programming (QP) SCL can learn a matrix W=w1w2.Wm of parameters. In optimization problem the third step, singular value decomposition(SVD)is applied to matrix W=a1u'2.Lm. Let w=UnV, then B=Ul 11 7K8k23 (h is the number of the shared features) is the matrix (linear (6) mapping) whose rows are the top left singular vectors of W st.∈Band∑Bm≤n Finally, standard discriminative algorithms can be applied to the augmented feature vector to build models. The augmen ted feature vector contains all the inal featu where appended with the new shared features 8: ;. As mentioned Ks. s Ks.T in [38, if the pivot features are well designed, then the learned mapping 0 encodes the correspondence between the KTS KT features from the different domains. Although BenDavic and K =ki(ai, i). Ks,s and Krr are kernel matrices for et al. [61] showed experimentally that SCL can reduce the the source domain data and the target domain data, difference between domains how to select the pivot features respectively. Fi:=HI Ei k(ai, Tr,), where x; E XsUXT, is difficult and domain dependent. In[38], Blitzer al. used a while T∈X heuristic method to select pivot features for natural language It can be proved that B 32). An advantage of using p processing(NLP)problems, such as tagging of sentences. In KMM is that it can avoid performing density estimation of their followup work, the researchers proposed to use Mutual Information(Mi)to choose the pivot features instead either P(cs,)or P(ar), which is difficult when the size of the of using more heuristic criteria [ 8]. MISCL tries to find some data set is small. Sugiyama et al. [34 proposed an algorithm pivot features that have high dependence on the labels in the known as KullbackLeibler Importance Estimation Proce source domain dure(Klien) to estimate pits directly, based on the Transfer learning in the nlp domain is sometimes minimization of the KulbackLeibler divergence. can be referred to as domain adaptation. 
Huang et al. [32] proposed a kernel-mean matching (KMM) algorithm to learn P(x_{T_i}) / P(x_{S_i}) directly by matching the means between the source domain data and the target domain data in a reproducing-kernel Hilbert space (RKHS). KMM can be rewritten as the following quadratic programming (QP) optimization problem:

$$
\min_{\beta} \; \frac{1}{2} \beta^{\top} K \beta - \kappa^{\top} \beta
\quad \text{s.t.} \quad \beta_i \in [0, B] \;\; \text{and} \;\; \Big| \sum_{i=1}^{n_S} \beta_i - n_S \Big| \le n_S \epsilon, \tag{6}
$$

where

$$
K = \begin{bmatrix} K_{S,S} & K_{S,T} \\ K_{T,S} & K_{T,T} \end{bmatrix}
$$

and K_{ij} = k(x_i, x_j). K_{S,S} and K_{T,T} are kernel matrices for the source domain data and the target domain data, respectively, and $\kappa_i = \frac{n_S}{n_T} \sum_{j=1}^{n_T} k(x_i, x_{T_j})$, where x_i ∈ X_S ∪ X_T, while x_{T_j} ∈ X_T.

It can be proved that β_i = P(x_{T_i}) / P(x_{S_i}) [32]. An advantage of using KMM is that it can avoid performing density estimation of either P(x_{S_i}) or P(x_{T_i}), which is difficult when the size of the data set is small. Sugiyama et al. [34] proposed an algorithm known as the Kullback-Leibler Importance Estimation Procedure (KLIEP) to estimate P(x_{T_i}) / P(x_{S_i}) directly, based on the minimization of the Kullback-Leibler divergence. KLIEP can be integrated with cross-validation to perform model selection automatically in two steps: 1) estimating the weights of the source domain data and 2) training models on the reweighted data. Bickel et al. [33] combined the two steps in a unified framework by deriving a kernel-logistic regression classifier.

Besides sample reweighting techniques, Dai et al. [28] extended a traditional Naive Bayesian classifier for the transductive transfer learning problems. For more information on importance sampling and reweighting methods for covariate shift or sample selection bias, readers can refer to a recently published book [29] by Quiñonero-Candela et al. One can also consult a tutorial on Sample Selection Bias by Fan and Sugiyama at ICDM-08.
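For reference, here is a sketch of the QP (6) in the standard KMM form, with the kernel restricted to the source block and κ computed against the target data; the RBF bandwidth, B, and ε are illustrative choices, and cvxpy is used purely for convenience.

```python
# A sketch of the KMM program (6) in its standard formulation (source-source
# kernel K, cross-kernel vector kappa). gamma, B, and eps are illustrative.
import numpy as np
import cvxpy as cp
from sklearn.metrics.pairwise import rbf_kernel

def kmm_weights(Xs, Xt, gamma=1.0, B=1000.0, eps=None):
    ns, nt = len(Xs), len(Xt)
    if eps is None:
        eps = B / np.sqrt(ns)                      # a common default choice
    K = rbf_kernel(Xs, Xs, gamma=gamma) + 1e-8 * np.eye(ns)  # jitter keeps K PSD
    kappa = (ns / nt) * rbf_kernel(Xs, Xt, gamma=gamma).sum(axis=1)
    L = np.linalg.cholesky(K)                      # beta' K beta = ||L' beta||^2
    beta = cp.Variable(ns)
    objective = cp.Minimize(0.5 * cp.sum_squares(L.T @ beta) - kappa @ beta)
    constraints = [beta >= 0, beta <= B,
                   cp.abs(cp.sum(beta) - ns) <= ns * eps]
    cp.Problem(objective, constraints).solve()
    return beta.value   # beta_i approximates P(x_{S_i} | target) / P(x_{S_i} | source)
```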
4.2 Transferring Knowledge of Feature Representations

Most feature-representation-transfer approaches to the transductive transfer learning setting are under unsupervised learning frameworks. Blitzer et al. [38] proposed a structural correspondence learning (SCL) algorithm, which extends [37], to make use of the unlabeled data from the target domain to extract some relevant features that may reduce the difference between the domains. The first step of SCL is to define a set of m pivot features (the pivot features are domain specific and depend on prior knowledge) on the unlabeled data from both domains. Then, SCL removes these pivot features from the data and treats each pivot feature as a new label vector, so that m classification problems can be constructed. By assuming each problem can be solved by a linear classifier,

$$
f_l(x) = \operatorname{sgn}(w_l^{\top} \cdot x), \qquad l = 1, \dots, m,
$$

SCL can learn a matrix W = [w_1 w_2 ... w_m] of parameters. In the third step, singular value decomposition (SVD) is applied to the matrix W. Let $W = U D V^{\top}$; then $\theta = U_{[1:h,:]}^{\top}$ (h is the number of the shared features) is the matrix (linear mapping) whose rows are the top left singular vectors of W. Finally, standard discriminative algorithms can be applied to the augmented feature vector to build models. The augmented feature vector contains all the original features x_i appended with the new shared features θx_i. As mentioned in [38], if the pivot features are well designed, then the learned mapping θ encodes the correspondence between the features from the different domains. Although Ben-David et al. [61] showed experimentally that SCL can reduce the difference between domains, how to select the pivot features is difficult and domain dependent. In [38], Blitzer et al. used a heuristic method to select pivot features for natural language processing (NLP) problems, such as tagging of sentences. In their follow-up work, the researchers proposed to use Mutual Information (MI) to choose the pivot features instead of using more heuristic criteria [8]. MI-SCL tries to find some pivot features that have high dependence on the labels in the source domain.
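The three SCL steps condense into a short procedure. The sketch below is my own condensed version: pivot selection uses a crude frequency heuristic rather than the paper's NLP-specific or MI-based criteria, and the pivot predictors are trained with a generic linear classifier.

```python
# A condensed SCL sketch (my own; pivot selection and the pivot classifiers
# are generic stand-ins for the choices in [38] and [8]).
import numpy as np
from sklearn.linear_model import SGDClassifier

def scl_projection(X_union, n_pivots=100, h=25):
    # Step 1: choose pivot features on pooled unlabeled source + target data
    # (here simply the most frequent features, a crude heuristic).
    pivots = np.argsort(-(X_union > 0).sum(axis=0))[:n_pivots]
    rest = np.setdiff1d(np.arange(X_union.shape[1]), pivots)
    W = []
    for l in pivots:
        present = (X_union[:, l] > 0).astype(int)
        if present.min() == present.max():
            continue                    # pivot never/always fires; skip it
        # Step 2: a linear classifier f_l(x) = sgn(w_l . x) predicting the
        # pivot's presence from the non-pivot features.
        clf = SGDClassifier(loss="modified_huber").fit(X_union[:, rest], present)
        W.append(clf.coef_.ravel())
    # Step 3: SVD of W; the top-h left singular vectors form the mapping theta.
    U, _, _ = np.linalg.svd(np.array(W).T, full_matrices=False)
    theta = U[:, :h].T
    def augment(X):
        # Original features appended with the shared features theta . x.
        return np.hstack([X, X[:, rest] @ theta.T])
    return augment
```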
Transfer learning in the NLP domain is sometimes referred to as domain adaptation. In this area, Daumé [39] proposed a kernel-mapping function for NLP problems, which maps the data from both source and target domains to a high-dimensional feature space, where standard discriminative learning methods are used to train the classifiers. However, the constructed kernel-mapping function is domain knowledge driven, and it is not easy to generalize the kernel mapping to other areas or applications. Blitzer et al. [62] analyzed the uniform convergence bounds for algorithms that minimize a convex combination of source and target empirical risks.

In [36], Dai et al. proposed a co-clustering-based algorithm to propagate the label information across different domains. In [63], Xing et al. proposed a novel algorithm known as bridged refinement to correct the labels predicted by a shift-unaware classifier toward a target distribution, taking the mixture distribution of the training and test data as a bridge to better transfer from the training data to the test data. In [64], Ling et al. proposed a spectral classification framework for cross-domain transfer learning problems, where the objective function is introduced to seek consistency between the in-domain supervision and the out-of-domain intrinsic structure. In [65], Xue et al. proposed a cross-domain text classification algorithm that extended the traditional probabilistic latent semantic analysis (PLSA) algorithm to integrate labeled and unlabeled data from different but related domains into a unified probabilistic model. The new model is called Topic-bridged PLSA, or TPLSA.

Transfer learning via dimensionality reduction was recently proposed by Pan et al. [66]. In this work, Pan et al. exploited the Maximum Mean Discrepancy Embedding (MMDE) method, originally designed for dimensionality reduction, to learn a low-dimensional space in which to reduce the difference of distributions between different domains for transductive transfer learning. However, MMDE may suffer from its computational burden. Thus, in [67], Pan et al. further proposed an efficient feature extraction algorithm, known as Transfer Component Analysis (TCA), to overcome the drawback of MMDE.

5 UNSUPERVISED TRANSFER LEARNING

Definition 4 (Unsupervised Transfer Learning). Given a source domain D_S with a learning task T_S, a target domain D_T and a corresponding learning task T_T, unsupervised transfer learning aims to help improve the learning of the target predictive function f_T(·) in D_T using the knowledge in D_S and T_S, where T_S ≠ T_T and Y_S and Y_T are not observable.

Based on the definition of the unsupervised transfer learning setting, no labeled data are observed in the source and target domains in training. So far, there is little research work on this setting. Recently, the Self-Taught Clustering (STC) [26] and Transferred Discriminative Analysis (TDA) [27] algorithms were proposed to address transfer clustering and transfer dimensionality reduction problems, respectively.

5.1 Transferring Knowledge of Feature Representations

Dai et al. [26] studied a new case of clustering problems, known as self-taught clustering. Self-taught clustering is an instance of unsupervised transfer learning, which aims at clustering a small collection of unlabeled data in the target domain with the help of a large amount of unlabeled data in the source domain. STC tries to learn a common feature space across domains, which helps in clustering in the target domain. The objective function of STC is shown as follows:

$$
J(\tilde{X}_T, \tilde{X}_S, \tilde{Z}) = I(X_T, Z) - I(\tilde{X}_T, \tilde{Z}) + \lambda \big[ I(X_S, Z) - I(\tilde{X}_S, \tilde{Z}) \big], \tag{7}
$$

where X_S and X_T are the source and target domain data, respectively. Z is a feature space shared by X_S and X_T, and I(·, ·) is the mutual information between two random variables. Suppose that there exist three clustering functions $C_{X_T}: X_T \to \tilde{X}_T$, $C_{X_S}: X_S \to \tilde{X}_S$, and $C_Z: Z \to \tilde{Z}$, where $\tilde{X}_T$, $\tilde{X}_S$, and $\tilde{Z}$ are the corresponding clusters of X_T, X_S, and Z, respectively. The goal of STC is to learn $\tilde{X}_T$ by solving the optimization problem

$$
\arg\min_{\tilde{X}_T, \tilde{X}_S, \tilde{Z}} \; J(\tilde{X}_T, \tilde{X}_S, \tilde{Z}). \tag{8}
$$

An iterative algorithm for solving the optimization function (8) was given in [26].

Similarly, Wang et al. [27] proposed a TDA algorithm to solve the transfer dimensionality reduction problem. TDA first applies clustering methods to generate pseudoclass labels for the target unlabeled data. It then applies dimensionality reduction methods to the target data and labeled source data to reduce the dimensions. These two steps run iteratively to find the best subspace for the target data.

6 TRANSFER BOUNDS AND NEGATIVE TRANSFER

An important issue is to recognize the limit of the power of transfer learning. In [68], Mahmud and Ray analyzed the case of transfer learning using Kolmogorov complexity, where some theoretical bounds were proved. In particular, the authors used conditional Kolmogorov complexity to measure relatedness between tasks and transfer the "right" amount of information in a sequential transfer learning task under a Bayesian framework.

Recently, Eaton et al. [69] proposed a novel graph-based method for knowledge transfer, where the relationships between source tasks are modeled by embedding the set of learned source models in a graph using transferability as the metric. Transferring to a new task proceeds by mapping the problem into the graph and then learning a function on this graph that automatically determines the parameters to transfer to the new learning task.

Negative transfer happens when the source domain data and task contribute to reduced performance of learning in the target domain. Despite the fact that how to avoid negative transfer is a very important issue, little research work has been published on this topic. Rosenstein et al. [70] empirically showed that if two tasks are too dissimilar, then brute-force transfer may hurt the performance of the target task. Some works have analyzed relatedness among tasks and task clustering techniques, such as [71], [72], which may help provide guidance on how to avoid negative transfer automatically. Bakker and Heskes [72] adopted a Bayesian approach in which some of the model parameters are shared for all tasks and others are more loosely connected through a joint prior distribution that can be learned from the data. Thus, the data are clustered based on the task parameters, where tasks in the same cluster are supposed to be related to each other. Argyriou et al. [73] considered situations in which the learning tasks can be divided into groups. Tasks within each group are related by sharing a low-dimensional representation, which differs among different groups. As a result, tasks within a group can find it easier to transfer useful knowledge.

7 APPLICATIONS OF TRANSFER LEARNING

Recently, transfer learning techniques have been applied successfully in many real-world applications. Raina et al. [74] and Dai et al. [36], [28] proposed to use transfer learning techniques to learn text data across domains. Blitzer et al. [38] proposed to use SCL for solving NLP problems, and an extension of SCL was proposed in [8] for solving sentiment classification problems. Wu and Dietterich [53] proposed to use both inadequate target domain data and plenty of low-quality source domain data for image classification problems. Arnold et al. [58] proposed to use transductive transfer learning methods to solve name-entity recognition problems. In [75], [76], [78], [79], transfer learning techniques are proposed to extract knowledge from WiFi localization models across time periods, space, and mobile devices, to benefit WiFi localization.
