Massive Exploration of Neural Machine Translation Architectures

Denny Britz∗†, Anna Goldie∗, Minh-Thang Luong, Quoc Le
{dennybritz,agoldie,thangluong,qvl}@google.com
Google Brain
Abstract

Neural Machine Translation (NMT) has shown remarkable progress over the past few years, with production systems now being deployed to end-users. One major drawback of current architectures is that they are expensive to train, typically requiring days to weeks of GPU time to converge. This makes exhaustive hyperparameter search, as is commonly done with other neural network architectures, prohibitively expensive. In this work, we present the first large-scale analysis of NMT architecture hyperparameters. We report empirical results and variance numbers for several hundred experimental runs, corresponding to over 250,000 GPU hours on the standard WMT English to German translation task. Our experiments lead to novel insights and practical advice for building and extending NMT architectures. As part of this contribution, we release an open-source NMT framework¹ that enables researchers to easily experiment with novel techniques and reproduce state-of-the-art results.
∗ Both authors contributed equally to this work.
† Work done as a member of the Google Brain Residency program (g.co/brainresidency).
¹ https://github.com/google/seq2seq/

1 Introduction

Neural Machine Translation (NMT) (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Cho et al., 2014) is an end-to-end approach to automated translation. NMT has shown impressive results (Jean et al., 2015; Luong et al., 2015b; Sennrich et al., 2016a; Wu et al., 2016), surpassing those of phrase-based systems while addressing shortcomings such as the need for hand-engineered features. The most popular approaches to NMT are based on an encoder-decoder architecture consisting of two recurrent neural networks (RNNs) and an attention mechanism that aligns target with source tokens (Bahdanau et al., 2015; Luong et al., 2015a).
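To make the attention step concrete, the following is a minimal NumPy sketch of multiplicative (dot-product) attention in the style of Luong et al. (2015a). The function name and toy dimensions are illustrative assumptions, not code from the paper or its released framework.

import numpy as np

def dot_attention(decoder_state, encoder_states):
    # Score every source position against the current decoder state.
    scores = encoder_states @ decoder_state    # shape: (src_len,)
    # Softmax turns scores into an alignment distribution over source tokens.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # The context vector is the attention-weighted sum of encoder states;
    # the decoder consumes it when predicting the next target token.
    context = weights @ encoder_states         # shape: (hidden,)
    return context, weights

# Toy example: 5 source tokens, hidden size 4 (dimensions are arbitrary).
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 4))
decoder_state = rng.normal(size=(4,))
context, weights = dot_attention(decoder_state, encoder_states)
print(weights.round(3))  # alignment of this decoder step to each source token

The alignment weights are what "aligns target with source tokens" refers to: each decoder step produces its own distribution over source positions.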
One shortcoming of current NMT architectures is the amount of compute required to train them. Training on real-world datasets of several million examples typically requires dozens of GPUs, and convergence time is on the order of days to weeks (Wu et al., 2016). While sweeping across large hyperparameter spaces is common in Computer Vision (Huang et al., 2016b), such exploration would be prohibitively expensive for NMT models, limiting researchers to well-established architectures and hyperparameter choices. Furthermore, there have been no large-scale studies of how architectural hyperparameters affect the performance of NMT systems. As a result, it remains unclear why these models perform as well as they do, as well as how we might improve them.
In this work, we present the first comprehensive analysis of architectural hyperparameters for Neural Machine Translation systems. Using a total of more than 250,000 GPU hours, we explore common variations of NMT architectures and provide insight into which architectural choices matter most. We report BLEU scores, perplexities, model sizes, and convergence time for all experiments, including variance numbers calculated across several runs of each experiment. In addition, we release to the public a new software framework that was used to run the experiments.
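As a reminder of what such per-experiment variance numbers amount to, here is a minimal sketch that aggregates BLEU scores from repeated runs of one configuration; the values are hypothetical and not results from the paper.

import statistics

# Hypothetical BLEU scores from four repeated runs of one configuration
# (illustrative values only, not results from the paper).
bleu_runs = [21.8, 22.1, 21.5, 22.0]

mean_bleu = statistics.mean(bleu_runs)
std_bleu = statistics.stdev(bleu_runs)  # sample standard deviation across runs
print(f"BLEU: {mean_bleu:.2f} +/- {std_bleu:.2f} over {len(bleu_runs)} runs")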
In summary, the main contributions of this work are as follows:

• We provide immediately applicable insights into the optimization of Neural Machine Translation models, as well as promising directions for future research. For example, we