Bridging the Gap between Training and Inference
for Neural Machine Translation
Wen Zhang 1,2   Yang Feng 1,2∗   Fandong Meng 3   Di You 4   Qun Liu 5
1 Key Laboratory of Intelligent Information Processing,
  Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS)
2 University of Chinese Academy of Sciences, Beijing, China
  {zhangwen,fengyang}@ict.ac.cn
3 Pattern Recognition Center, WeChat AI, Tencent Inc, China
  fandongmeng@tencent.com
4 Worcester Polytechnic Institute, Worcester, MA, USA
  dyou@wpi.edu
5 Huawei Noah’s Ark Lab, Hong Kong, China
  qun.liu@huawei.com
Abstract
Neural Machine Translation (NMT) generates target words sequentially by predicting the next word conditioned on the context words. At training time, it predicts with the ground truth words as context, while at inference it has to generate the entire sequence from scratch. This discrepancy in the fed context leads to error accumulation along the way. Furthermore, word-level training requires strict matching between the generated sequence and the ground truth sequence, which leads to over-correction of different but reasonable translations. In this paper, we address these issues by sampling context words not only from the ground truth sequence but also from the sequence predicted by the model during training, where the predicted sequence is selected with a sentence-level optimum. Experimental results on Chinese→English and WMT’14 English→German translation tasks demonstrate that our approach achieves significant improvements on multiple datasets.
1 Introduction
Neural Machine Translation has shown promising results and drawn increasing attention recently. Most NMT models fit in the encoder-decoder framework, including the RNN-based (Sutskever et al., 2014; Bahdanau et al., 2015; Meng and Zhang, 2019), the CNN-based (Gehring et al., 2017) and the attention-based (Vaswani et al., 2017) models, which predict the next word conditioned on the previous context words, deriving a language model over target words. The scenario is that at training time the ground truth words are used as context
∗ Corresponding author.
while at inference the entire sequence is generated by the resulting model on its own, and hence the previous words generated by the model are fed as context. As a result, the predicted words at training and inference are drawn from different distributions, namely, from the data distribution as opposed to the model distribution. This discrepancy, called exposure bias (Ranzato et al., 2015), leads to a gap between training and inference. As the target sequence grows, the errors accumulate along the sequence and the model has to predict under conditions it has never met at training time.
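To make the discrepancy concrete, the sketch below contrasts the context fed to the decoder under teacher-forced training with the context used at inference. It is a toy illustration under our own assumptions, not the paper's code; toy_next_word is a hypothetical stand-in for a decoder step.

```python
# Minimal, self-contained sketch (not from the paper) contrasting training-time
# and inference-time contexts. toy_next_word is a hypothetical decoder step.

def toy_next_word(context):
    # Hypothetical decoder: picks the next word from a fixed toy vocabulary
    # based only on the length of the context it is given.
    vocab = ["we", "should", "abide", "by", "the", "law", "</s>"]
    return vocab[min(len(context), len(vocab) - 1)]

ground_truth = ["we", "should", "comply", "with", "the", "rule", "</s>"]

# Training (teacher forcing): the context is always a ground-truth prefix, so
# every prediction is made from a gold context regardless of earlier mistakes.
for t in range(1, len(ground_truth)):
    gold_context = ground_truth[:t]
    prediction = toy_next_word(gold_context)   # compared against ground_truth[t]

# Inference: the context is the model's own previous output, so an early error
# ("abide" instead of "comply") changes every later context the model sees.
generated = [ground_truth[0]]
while generated[-1] != "</s>" and len(generated) < 10:
    generated.append(toy_next_word(generated))
print(generated)
```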
Intuitively, to address this problem, the model should be trained to predict under the same conditions it will face at inference. Inspired by DATA
AS DEMONSTRATOR (DAD) (Venkatraman et al.,
2015), feeding as context both ground truth words
and the predicted words during training can be
a solution. NMT models usually optimize the
cross-entropy loss which requires a strict pairwise
matching at the word level between the predicted
sequence and the ground truth sequence. Once
the model generates a word deviating from the
ground truth sequence, the cross-entropy loss will
correct the error immediately and draw the re-
maining generation back to the ground truth se-
quence. However, this causes a new problem. A
sentence usually has multiple reasonable transla-
tions and it cannot be said that the model makes a
mistake even if it generates a word different from
the ground truth word. For example,
reference: We should comply with the rule.
cand1: We should abide with the rule.
cand2: We should abide by the law.
cand3: We should abide by the rule.
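The remedy outlined in the abstract is to let the training-time context mix ground-truth words with the model's own predictions. The snippet below is a minimal sketch of that sampling idea under our own assumptions: the decay schedule and the word-level mixing shown here are illustrative, not the authors' exact formulation (which selects the predicted sequence with a sentence-level optimum).

```python
import math
import random

# Minimal sketch (illustrative assumptions, not the authors' implementation):
# each context word fed to the decoder during training is drawn either from the
# ground-truth sequence or from the model's predicted sequence, with the chance
# of using the ground truth decaying as training proceeds.

def gold_prob(epoch, mu=12.0):
    # Illustrative decay: starts near 1 and gradually exposes the model to its
    # own predictions in later epochs.
    return mu / (mu + math.exp(epoch / mu))

def sample_context_word(gold_word, predicted_word, epoch):
    return gold_word if random.random() < gold_prob(epoch) else predicted_word

# Usage: build a mixed training context from aligned gold and predicted words.
gold = ["we", "should", "comply", "with", "the", "rule"]
predicted = ["we", "should", "abide", "by", "the", "rule"]
mixed_context = [sample_context_word(g, p, epoch=5) for g, p in zip(gold, predicted)]
```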