Unsupervised Domain Adaptation by Backpropagation
Yaroslav Ganin GANIN@SKOLTECH.RU
Victor Lempitsky LEMPITSKY@SKOLTECH.RU
Skolkovo Institute of Science and Technology (Skoltech)
Abstract
Top-performing deep architectures are trained on massive amounts of labeled data. In the absence of labeled data for a certain task, domain adaptation often provides an attractive option given that labeled data of similar nature but from a different domain (e.g. synthetic images) are available. Here, we propose a new approach to domain adaptation in deep architectures that can be trained on large amounts of labeled data from the source domain and large amounts of unlabeled data from the target domain (no labeled target-domain data is necessary).

As the training progresses, the approach promotes the emergence of "deep" features that are (i) discriminative for the main learning task on the source domain and (ii) invariant with respect to the shift between the domains. We show that this adaptation behaviour can be achieved in almost any feed-forward model by augmenting it with a few standard layers and a simple new gradient reversal layer. The resulting augmented architecture can be trained using standard backpropagation.

Overall, the approach can be implemented with little effort using any of the deep-learning packages. The method performs very well in a series of image classification experiments, achieving an adaptation effect in the presence of large domain shifts and outperforming the previous state of the art on the OFFICE datasets.
1. Introduction
Deep feed-forward architectures have brought impressive advances to the state of the art across a wide variety of machine-learning tasks and applications. At the moment, however, these leaps in performance come only when a large amount of labeled training data is available. At the same time, for problems lacking labeled data, it may still be possible to obtain training sets that are big enough for training large-scale deep models, but that suffer from a shift in data distribution relative to the actual data encountered at "test time". One particularly important example is synthetic or semi-synthetic training data, which may come in abundance and be fully labeled, but which inevitably has a distribution different from that of real data (Liebelt & Schmid, 2010; Stark et al., 2010; Vázquez et al., 2014; Sun & Saenko, 2014).
Learning a discriminative classifier or other predictor in the presence of a shift between the training and test distributions is known as domain adaptation (DA). A number of approaches to domain adaptation have been suggested in the context of shallow learning, i.e. in the situation when the data representation/features are given and fixed. These approaches build mappings between the source (training-time) and the target (test-time) domains, so that the classifier learned for the source domain can also be applied to the target domain when composed with the learned mapping between domains. The appeal of domain adaptation approaches is the ability to learn such a mapping when the target-domain data are either fully unlabeled (unsupervised domain adaptation) or have few labeled samples (semi-supervised domain adaptation). Below, we focus on the harder unsupervised case, although the proposed approach can be generalized to the semi-supervised case rather straightforwardly.
Unlike most previous papers on domain adaptation, which worked with fixed feature representations, we focus on combining domain adaptation and deep feature learning within one training process (deep domain adaptation). Our goal is to embed domain adaptation into the process of learning the representation, so that the final classification decisions are made based on features that are both discriminative and invariant to the change of domains, i.e. that have the same or very similar distributions in the source and target domains. In this way, the obtained feed-forward network becomes applicable to the target domain without being hindered by the shift between the two domains.
We thus focus on learning features that combine (i) discriminativeness and (ii) domain-invariance. This is achieved by jointly optimizing the underlying features as well as two discriminative classifiers operating on these features: (i) the label predictor, which predicts class labels and is used both during training and at test time, and (ii) the
domain classifier, which discriminates between the source and the target domains during training. While the parameters of the classifiers are optimized in order to minimize their error on the training set, the parameters of the underlying deep feature mapping are optimized in order to minimize the loss of the label classifier and to maximize the loss of the domain classifier. The latter encourages domain-invariant features to emerge in the course of the optimization.
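Concretely, this minimax training can be summarized by a single objective. The following is a sketch, with notation introduced here for illustration rather than quoted from the excerpt: $G_f$, $G_y$, $G_d$ denote the feature extractor, label predictor and domain classifier with parameters $\theta_f$, $\theta_y$, $\theta_d$; $L_y$, $L_d$ are their losses; $d_i \in \{0, 1\}$ is the domain label of example $x_i$ (0 for source); and $\lambda$ weighs the domain term:

$$E(\theta_f, \theta_y, \theta_d) = \sum_{i\,:\,d_i = 0} L_y\big(G_y(G_f(x_i; \theta_f); \theta_y),\, y_i\big) \;-\; \lambda \sum_{i} L_d\big(G_d(G_f(x_i; \theta_f); \theta_d),\, d_i\big),$$

with training seeking a saddle point

$$(\hat{\theta}_f, \hat{\theta}_y) = \arg\min_{\theta_f, \theta_y} E(\theta_f, \theta_y, \hat{\theta}_d), \qquad \hat{\theta}_d = \arg\max_{\theta_d} E(\hat{\theta}_f, \hat{\theta}_y, \theta_d).$$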
Crucially, we show that all three training processes can be embedded into an appropriately composed deep feed-forward network (Figure 1) that uses standard layers and loss functions, and can be trained using standard backpropagation algorithms based on stochastic gradient descent or its modifications (e.g. SGD with momentum). Our approach is generic, as it can be used to add domain adaptation to any existing feed-forward architecture that is trainable by backpropagation. In practice, the only non-standard component of the proposed architecture is a rather trivial gradient reversal layer that leaves the input unchanged during forward propagation and reverses the gradient by multiplying it by a negative scalar during backpropagation.
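To make this concrete, here is a minimal sketch of such a layer in PyTorch. This is our illustration, not code from the paper (which does not prescribe a framework); the names GradReverse and grad_reverse are hypothetical.

import torch

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; multiplies the incoming gradient
    # by -lambd in the backward pass.

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd    # stash the scalar for the backward pass
        return x.view_as(x)  # identity mapping

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse and scale the gradient; lambd is a hyper-parameter,
        # so it receives no gradient (None).
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

During the forward pass the layer is a no-op, so the label predictor sees unmodified features; during the backward pass the sign flip makes the feature extractor ascend the domain classifier's loss, which is exactly the adversarial behaviour described above.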
Below, we detail the proposed approach to domain adaptation in deep architectures, and present results on traditional deep learning image datasets (such as MNIST (LeCun et al., 1998) and SVHN (Netzer et al., 2011)) as well as on the OFFICE benchmarks (Saenko et al., 2010), where the proposed method considerably improves over the previous state-of-the-art accuracy.
2. Related work
A large number of domain adaptation methods have been proposed over recent years, and here we focus on the most related ones. Multiple methods perform unsupervised domain adaptation by matching the feature distributions in the source and the target domains. Some approaches do this by reweighting or selecting samples from the source domain (Borgwardt et al., 2006; Huang et al., 2006; Gong et al., 2013), while others seek an explicit feature-space transformation that would map the source distribution onto the target one (Pan et al., 2011; Gopalan et al., 2011; Baktashmotlagh et al., 2013). An important aspect of the distribution-matching approach is the way the (dis)similarity between distributions is measured. Here, one popular choice is matching the distribution means in a reproducing kernel Hilbert space (Borgwardt et al., 2006; Huang et al., 2006), whereas (Gong et al., 2012; Fernando et al., 2013) align the principal axes associated with each of the distributions. Our approach also attempts to match feature-space distributions; however, this is accomplished by modifying the feature representation itself rather than by reweighting or a geometric transformation. Also, our method (implicitly) uses a rather different measure of disparity between distributions, based on their separability by a deep, discriminatively trained classifier.
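For intuition, with a linear kernel the mean-matching criterion above reduces to the distance between empirical feature means. A minimal illustrative sketch follows (our own, not code from any of the cited methods; mean_discrepancy is a hypothetical helper):

import torch

def mean_discrepancy(source_features: torch.Tensor, target_features: torch.Tensor) -> torch.Tensor:
    # Squared Euclidean distance between the empirical means of two
    # feature batches: the linear-kernel special case of MMD.
    return (source_features.mean(dim=0) - target_features.mean(dim=0)).pow(2).sum()

Our method replaces such fixed statistics with a learned discriminator: two distributions count as close when no trained classifier can tell them apart.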
Several approaches perform a gradual transition from the source to the target domain (Gopalan et al., 2011; Gong et al., 2012) via a gradual change of the training distribution. Among these methods, (S. Chopra & Gopalan, 2013) does this in a "deep" way by the layerwise training of a sequence of deep autoencoders, while gradually replacing source-domain samples with target-domain samples. This improves over a similar approach of (Glorot et al., 2011) that simply trains a single deep autoencoder for both domains. In both approaches, the actual classifier/predictor is learned in a separate step using the feature representation learned by the autoencoder(s). In contrast to (Glorot et al., 2011; S. Chopra & Gopalan, 2013), our approach performs feature learning, domain adaptation and classifier learning jointly, in a unified architecture, using a single learning algorithm (backpropagation). We therefore argue that our approach is simpler (both conceptually and in terms of its implementation). Our method also achieves considerably better results on the popular OFFICE benchmark.
While the above approaches perform unsupervised domain adaptation, there are approaches that perform supervised domain adaptation by exploiting labeled data from the target domain. In the context of deep feed-forward architectures, such data can be used to "fine-tune" the network trained on the source domain (Zeiler & Fergus, 2013; Oquab et al., 2014; Babenko et al., 2014). Our approach does not require labeled target-domain data. At the same time, it can easily incorporate such data when it is available.
An idea related to ours is described in (Goodfellow et al., 2014). While their goal is quite different (building generative deep networks that can synthesize samples), the way they measure and minimize the discrepancy between the distribution of the training data and the distribution of the synthesized data is very similar to the way our architecture measures and minimizes the discrepancy between the feature distributions of the two domains.
Finally, a recent and concurrent report by (Tzeng et al., 2014) also focuses on domain adaptation in feed-forward networks. Their set of techniques measures and minimizes the distance between the data means across domains. This approach may be regarded as a "first-order" approximation to our approach, which seeks a tighter alignment between distributions.
3. Deep Domain Adaptation
3.1. The model
We now detail the proposed model for domain adaptation. We assume that the model works with input samples x ∈ X, where X is some input space, and certain labels (outputs) y from the label space Y. Below, we assume classification problems where Y is a finite set (Y = {1, 2, . . . , L}); however, our approach is generic and can handle any output label space that other deep feed-forward models can handle.
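To preview how these pieces fit together, here is a schematic sketch (ours, with hypothetical layer sizes and names; the paper's actual architectures differ per dataset) of a feed-forward classifier augmented with a domain classifier placed behind the gradient reversal layer. It assumes the grad_reverse helper from the earlier sketch is in scope.

import torch.nn as nn

class DomainAdversarialNet(nn.Module):
    def __init__(self, in_dim=784, feat_dim=128, n_classes=10):
        super().__init__()
        # Shared feature extractor G_f (hypothetical sizes).
        self.features = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        # Label predictor G_y, trained on labeled source samples only.
        self.label_predictor = nn.Linear(feat_dim, n_classes)
        # Domain classifier G_d (source vs. target), fed through gradient reversal.
        self.domain_classifier = nn.Linear(feat_dim, 2)

    def forward(self, x, lambd=1.0):
        f = self.features(x)
        class_logits = self.label_predictor(f)
        domain_logits = self.domain_classifier(grad_reverse(f, lambd))
        return class_logits, domain_logits

Both heads are trained with ordinary cross-entropy losses; the reversal layer alone is what turns the shared features adversarial with respect to the domain classifier.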