Massive Exploration of Neural Machine Translation Architectures

Denny Britz∗†, Anna Goldie∗, Minh-Thang Luong, Quoc Le
{dennybritz,agoldie,thangluong,qvl}@google.com
Google Brain
Abstract

Neural Machine Translation (NMT) has shown remarkable progress over the past few years, with production systems now being deployed to end-users. One major drawback of current architectures is that they are expensive to train, typically requiring days to weeks of GPU time to converge. This makes exhaustive hyperparameter search, as is commonly done with other neural network architectures, prohibitively expensive. In this work, we present the first large-scale analysis of NMT architecture hyperparameters. We report empirical results and variance numbers for several hundred experimental runs, corresponding to over 250,000 GPU hours on the standard WMT English to German translation task. Our experiments lead to novel insights and practical advice for building and extending NMT architectures. As part of this contribution, we release an open-source NMT framework¹ that enables researchers to easily experiment with novel techniques and reproduce state-of-the-art results.
∗ Both authors contributed equally to this work.
† Work done as a member of the Google Brain Residency program (g.co/brainresidency).
¹ https://github.com/google/seq2seq/

1 Introduction

Neural Machine Translation (NMT) (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Cho et al., 2014) is an end-to-end approach to automated translation. NMT has shown impressive results (Jean et al., 2015; Luong et al., 2015b; Sennrich et al., 2016a; Wu et al., 2016), surpassing those of phrase-based systems while addressing shortcomings such as the need for hand-engineered features. The most popular approaches to NMT are based on an encoder-decoder architecture consisting of two recurrent neural networks (RNNs) and an attention mechanism that aligns target with source tokens (Bahdanau et al., 2015; Luong et al., 2015a).
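To make the attention step concrete, the following is a minimal NumPy sketch of multiplicative (dot-product) attention in the style of Luong et al. (2015a). The function name and toy dimensions are illustrative assumptions, not code from the paper or its released framework.

import numpy as np

def dot_attention(decoder_state, encoder_states):
    # Score every source position against the current decoder state.
    scores = encoder_states @ decoder_state    # shape: (src_len,)
    # Softmax turns scores into an alignment distribution over source tokens.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # The context vector is the attention-weighted sum of encoder states;
    # the decoder consumes it when predicting the next target token.
    context = weights @ encoder_states         # shape: (hidden,)
    return context, weights

# Toy example: 5 source tokens, hidden size 4 (dimensions are arbitrary).
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 4))
decoder_state = rng.normal(size=(4,))
context, weights = dot_attention(decoder_state, encoder_states)
print(weights.round(3))  # alignment of this decoder step to each source token

The alignment weights are what "aligns target with source tokens" refers to: each decoder step produces its own distribution over source positions.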
One shortcoming of current NMT architectures is the amount of compute required to train them. Training on real-world datasets of several million examples typically requires dozens of GPUs, and convergence time is on the order of days to weeks (Wu et al., 2016). While sweeping across large hyperparameter spaces is common in Computer Vision (Huang et al., 2016b), such exploration would be prohibitively expensive for NMT models, limiting researchers to well-established architectures and hyperparameter choices. Furthermore, there have been no large-scale studies of how architectural hyperparameters affect the performance of NMT systems. As a result, it remains unclear why these models perform as well as they do, as well as how we might improve them.
In this work, we present the first comprehensive analysis of architectural hyperparameters for Neural Machine Translation systems. Using a total of more than 250,000 GPU hours, we explore common variations of NMT architectures and provide insight into which architectural choices matter most. We report BLEU scores, perplexities, model sizes, and convergence time for all experiments, including variance numbers calculated across several runs of each experiment. In addition, we release to the public a new software framework that was used to run the experiments.
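As a reminder of what such per-experiment variance numbers amount to, here is a minimal sketch that aggregates BLEU scores from repeated runs of one configuration; the values are hypothetical and not results from the paper.

import statistics

# Hypothetical BLEU scores from four repeated runs of one configuration
# (illustrative values only, not results from the paper).
bleu_runs = [21.8, 22.1, 21.5, 22.0]

mean_bleu = statistics.mean(bleu_runs)
std_bleu = statistics.stdev(bleu_runs)  # sample standard deviation across runs
print(f"BLEU: {mean_bleu:.2f} +/- {std_bleu:.2f} over {len(bleu_runs)} runs")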
In summary, the main contributions of this work are as follows:

• We provide immediately applicable insights into the optimization of Neural Machine Translation models, as well as promising directions for future research. For example, we