A Hybrid Approach to Skeleton-based Translation (CSDN Wenku resource)

82 views, uploaded 2021-03-19 22:57:14, PDF, 186 KB

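The hybrid approach to skeleton-based translation is a machine translation method that takes sentence skeleton information into account. Machine translation (MT) is the automatic translation of text or speech by computer. In the paper, the "skeleton" of a sentence means its key elements or structure, which are regarded as the foundation of the sentence's overall meaning. The hybrid method uses a skeleton translation model to translate these key elements of the input sentence and then uses a full translation model to cover the remaining parts.

Current statistical machine translation (SMT) approaches model translation as the generation of a derivation of atomic translation units, usually assuming that every unit comes from the same model. The simplest of these is the phrase-based approach, which uses a global model to process any substring of the input sentence: the system keeps translating further sequences of source words until the whole sentence is covered. This ignores the role of each source word and differs from the strategy human translators typically use, which is to translate the key elements or structure of a sentence (the skeleton) first and the remaining parts afterwards. The difference matters especially for languages such as Chinese, whose sentences often involve complex structures.

Source-language structural information has already been studied in depth in syntax-based translation models, including models built on complete syntactic trees with Treebank annotations and methods that use source-language syntax as soft constraints. Building on this earlier work, the paper incorporates sentence skeleton information explicitly, with a focus on capturing the key structure of sentences in structurally complex languages such as Chinese.

Applied to a state-of-the-art phrase-based system, the method yields very promising BLEU improvements and TER reductions on the NIST Chinese-English MT evaluation data. BLEU measures how closely machine output matches one or more high-quality human reference translations by combining modified n-gram precisions with a brevity penalty; a higher BLEU score indicates better translation quality. TER (Translation Edit Rate) counts the edits (insertions, deletions, substitutions, and shifts of word sequences) needed to turn the system output into a reference translation, divided by the reference length; a lower TER means the output is closer to the human translation.

In short, skeleton-based hybrid translation exploits the key structure of a sentence: it translates the skeleton first and then covers the rest with a full translation model, imitating a common human translation strategy. It appears especially well suited to languages with complex syntactic structures such as Chinese, and combining strong translation models with source-language structure information may help advance machine translation further.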
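To make the two metric descriptions concrete, here is a minimal Python sketch; the function names are hypothetical. `sentence_bleu` is an unsmoothed, single-reference, sentence-level BLEU, whereas the official NIST scorer computes BLEU over a whole test corpus and can use several references. `simplified_ter` counts only insertions, deletions, and substitutions, so it omits TER's shift operation and is essentially word error rate normalized by reference length; treat it as an approximation, not as the TER tool.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple) of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp, ref, max_n=4):
    """Unsmoothed BLEU of one tokenized hypothesis against one reference."""
    log_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngram_counts(hyp, n), ngram_counts(ref, n)
        # Modified precision: clip each n-gram count by its reference count.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0:
            return 0.0  # one zero precision makes the geometric mean zero
        log_sum += math.log(matches / total)
    # Brevity penalty: hypotheses shorter than the reference are penalized.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_sum / max_n)


def word_edit_distance(hyp, ref):
    """Levenshtein distance over words: insertions, deletions, substitutions."""
    prev = list(range(len(ref) + 1))  # distances from the empty hypothesis
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(prev[j] + 1,             # delete h
                          curr[j - 1] + 1,         # insert r
                          prev[j - 1] + (h != r))  # substitute (free on match)
        prev = curr
    return prev[-1]


def simplified_ter(hyp, ref):
    """Edits per reference word; omits TER's block-shift operation."""
    return word_edit_distance(hyp, ref) / len(ref)


if __name__ == "__main__":
    ref = "the cat is on the mat".split()
    # All n-gram precisions are 1, so only the brevity penalty applies.
    print(sentence_bleu("the cat is on the".split(), ref))  # ≈ 0.819
    # One missing word: 1 insertion / 6 reference words.
    print(simplified_ter("the cat is on mat".split(), ref))  # ≈ 0.167
```

Because a misplaced phrase costs several edits in this sketch but can be fixed by a single shift in real TER, the simplified score can be higher than true TER for outputs with word-order errors.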


Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Short Papers), pages 563–568,

Baltimore, Maryland, USA, June 23-25 2014.

© 2014 Association for Computational Linguistics

A Hybrid Approach to Skeleton-based Translation

Tong Xiao†‡, Jingbo Zhu†‡, Chunliang Zhang†‡

† Northeastern University, Shenyang 110819, China

‡ Hangzhou YaTuo Company, 358 Wener Rd., Hangzhou 310012, China

{xiaotong,zhujingbo,zhangcl}@mail.neu.edu.cn

Abstract

In this paper we explicitly consider sentence skeleton information for Machine Translation (MT). The basic idea is that we translate the key elements of the input sentence using a skeleton translation model, and then cover the remaining segments using a full translation model. We apply our approach to a state-of-the-art phrase-based system and demonstrate very promising BLEU improvements and TER reductions on the NIST Chinese-English MT evaluation data.

1 Introduction

Current Statistical Machine Translation (SMT) approaches model the translation problem as a process of generating a derivation of atomic translation units, assuming that every unit is drawn from the same model. The simplest of these is the phrase-based approach (Och et al., 1999; Koehn et al., 2003), which employs a global model to process any sub-string of the input sentence. In this way, all we need to do is translate one sequence of source words at a time until the entire sentence is covered. Despite good results in many tasks, such a method ignores the role of each source word and differs somewhat from the way human translators work. For example, an important-first strategy is generally adopted in human translation: we translate the key elements/structures (or skeleton) of the sentence first, and then translate the remaining parts. This makes particular sense for languages such as Chinese, where complex structures are usually involved.
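The incremental covering process described above can be illustrated with a toy Python sketch. It is a greedy, monotone (no reordering), unscored simplification built on a hypothetical phrase table over pinyin tokens for 我 喜欢 机器 翻译; real phrase-based decoders search over many segmentations and reorderings and score them with translation and language models, typically using beam search.

```python
# Hypothetical toy phrase table over pinyin tokens (我 喜欢 机器 翻译).
PHRASE_TABLE = {
    ("wo",): "I",
    ("xihuan",): "like",
    ("jiqi", "fanyi"): "machine translation",
    ("jiqi",): "machine",
    ("fanyi",): "translation",
}


def greedy_monotone_decode(source, table, max_len=3):
    """Cover the source left to right, one phrase (translation unit) per step."""
    derivation = []  # sequence of (source phrase, target phrase) units
    i = 0  # every source word before position i is already covered
    while i < len(source):
        # Prefer the longest phrase in the table starting at position i.
        for length in range(min(max_len, len(source) - i), 0, -1):
            phrase = tuple(source[i:i + length])
            if phrase in table:
                derivation.append((phrase, table[phrase]))
                i += length
                break
        else:  # no phrase of any length matched at position i
            raise ValueError(f"no phrase covers source word {source[i]!r}")
    return derivation


if __name__ == "__main__":
    units = greedy_monotone_decode("wo xihuan jiqi fanyi".split(), PHRASE_TABLE)
    print(" ".join(target for _, target in units))  # I like machine translation
```

Every unit here is drawn from the single `PHRASE_TABLE`, whatever role the source word plays in the sentence; that is the "same global model" assumption the paragraph questions.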

Note that source-language structural information has been intensively investigated in recent studies of syntactic translation models. Some of them developed syntax-based models on complete syntactic trees with Treebank annotations (Liu et al., 2006; Huang et al., 2006; Zhang et al., 2008), and others used source-language syntax as soft constraints (Marton and Resnik, 2008; Chiang, 2010). However, these approaches suffer from the same problem as their phrase-based counterpart: they use a single global model to handle all translation units, regardless of whether the units come from the skeleton of the input tree/sentence or from other, less important sub-structures.

In this paper we instead explicitly model the translation problem with sentence skeleton information. In particular,

• We develop a skeleton-based model which divides translation into two sub-models: a skeleton translation model (i.e., translating the key elements) and a full translation model (i.e., translating the remaining source words and generating the complete translation).

• We develop a skeletal language model to estimate the likelihood of a translation skeleton and to handle some long-distance word dependencies.

• We apply the proposed model to Chinese-English phrase-based MT and demonstrate promising BLEU improvements and TER reductions on the NIST evaluation data.
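The split into a skeleton translation model and a full translation model can be illustrated with a toy two-stage sketch in Python. Everything here is an assumption for illustration rather than the authors' implementation: the dictionaries stand in for trained models, there is no scoring or reordering, the skeleton mask is given by hand, and non-skeleton segments are shown as the placeholder X in the skeleton translation.

```python
# Toy two-stage translation in the spirit of the skeleton-based model.
# All tables and the "X" placeholder convention are hypothetical.
SKELETON_TABLE = {"wo": "I", "xihuan": "like", "fanyi": "translation"}
FULL_TABLE = {("jiqi",): "machine"}
SLOT = None  # marks a non-skeleton segment awaiting the full model


def skeleton_based_translate(source, is_skeleton, skeleton_table, full_table):
    """Return (skeleton translation, complete translation), monotone order."""
    parts, gaps = [], []
    i = 0
    while i < len(source):
        if is_skeleton[i]:
            # Stage 1: the skeleton model translates a key source word.
            parts.append(skeleton_table[source[i]])
            i += 1
        else:
            # Group a maximal run of non-skeleton words into one slot.
            j = i
            while j < len(source) and not is_skeleton[j]:
                j += 1
            parts.append(SLOT)
            gaps.append(tuple(source[i:j]))
            i = j
    skeleton = " ".join("X" if p is SLOT else p for p in parts)
    # Stage 2: the full model translates each remaining segment in order.
    fills = (full_table[g] for g in gaps)
    complete = " ".join(next(fills) if p is SLOT else p for p in parts)
    return skeleton, complete


if __name__ == "__main__":
    src = "wo xihuan jiqi fanyi".split()  # pinyin for 我 喜欢 机器 翻译
    mask = [True, True, False, True]     # hypothetical skeleton: drop "jiqi"
    print(skeleton_based_translate(src, mask, SKELETON_TABLE, FULL_TABLE))
    # ('I like X translation', 'I like machine translation')
```

In the paper's terms, a skeletal language model would judge how plausible the intermediate skeleton translation is; the X placeholder used here is only an assumed way of representing the gaps.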

2 A Skeleton-based Approach to MT

2.1 Skeleton Identification

The first issue that arises is how to identify the skeleton for a given source sentence. Many ways are available. E.g., we can start with a full syntactic tree and transform it into a simpler form (e.g., removing a sub-tree). Here we choose a simple and straightforward method: a skeleton is obtained by dropping all unimportant words in the original sentence, while preserving the grammaticality. See the following for an example skeleton of a Chinese sentence.
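The tree-based option mentioned above (simplifying a full parse by removing sub-trees) can be sketched in Python as an illustration; this is an English toy, not the Chinese example the paper refers to, and not the paper's chosen method. The nested-tuple tree format, the Penn Treebank-style labels, and the `DROPPABLE` set of "unimportant" labels are all hypothetical; the paper's criterion for unimportant words is not specified in this section.

```python
# Toy parse trees as nested tuples: (label, child, ...) for phrases and
# (POS tag, word) for leaves. The tree and DROPPABLE set are hypothetical.
DROPPABLE = {"ADVP", "JJ"}  # assumed "unimportant" modifiers

TREE = ("S",
        ("NP", ("PRP", "I")),
        ("VP",
         ("ADVP", ("RB", "really")),
         ("VBP", "like"),
         ("NP", ("JJ", "statistical"), ("NN", "translation"))))


def is_leaf(tree):
    """A leaf is a (POS tag, word) pair."""
    return len(tree) == 2 and isinstance(tree[1], str)


def prune(tree):
    """Drop every sub-tree whose label is in DROPPABLE; None if all dropped."""
    if tree[0] in DROPPABLE:
        return None
    if is_leaf(tree):
        return tree
    kept = [p for p in (prune(child) for child in tree[1:]) if p is not None]
    return (tree[0], *kept) if kept else None


def leaves(tree):
    """Read the words of a tree from left to right."""
    if is_leaf(tree):
        return [tree[1]]
    return [word for child in tree[1:] for word in leaves(child)]


if __name__ == "__main__":
    print(" ".join(leaves(TREE)))         # I really like statistical translation
    print(" ".join(leaves(prune(TREE))))  # I like translation
```

Removing whole modifier sub-trees, rather than arbitrary words, is one simple way to keep the remaining skeleton grammatical, which is the constraint the paragraph places on word dropping.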




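Related resources

A Hybrid Approach to Skeleton-based Translation

Gradation Study of ATB-25 Mixtures Based on Primary and Secondary Skeletons (2011)

Translation Based on Syntactic Skeletons

An Urban Road Classification Method Based on Road Skeleton Characteristics (2011)

Electronics - A Method for Fabricating Perovskite Solar Cells Based on Organic or Organic-Inorganic Hybrid Frameworks

A Variant Design Method for Product Families Based on Skeleton Models

Industry Classification - Equipment - A Construction Method for Binding Secondary-Lining Rebar in the Arch of Long-Span Tunnels Using Skeleton Beams.zip

A Top-Down Design Method Based on a Dual-Skeleton Model (2011)

Traffic Sign Feature Extraction Based on an Improved Hilditch Skeleton (2011)

Human Joint Localization Based on Improved Skeleton Spur Pruning

A Segmentation Algorithm Based on Point-Cloud Internal Skeletons

A Multi-Scale Connected Skeleton Algorithm Based on the Distance Transform

Road Line Detection Based on Skeleton Morphological Processing and the Hough Transform: MATLAB Simulation (with program demo video)

OpenCV-based Skeleton Extraction Code

Student Classroom Behavior Recognition Based on Human Skeletons and Deep Learning.pdf

An Offline Handwritten Digit Recognition Method Based on Sequential Encoding of Skeleton Features

Digital Image Processing: Three Methods for Image Skeleton Generation and Extraction (MATLAB)

Gait: Gait Representation Learning with a Broad-Spectrum Multi-Axis Mixer and Human Skeletons (with project source code and step-by-step tutorial).zip

Skeleton Extraction Based on Distance-Transform Thinning

An Aggregate Gradation Design Method for Skeleton-Dense Asphalt Mixtures

TS-TCN: A Skeleton-based Human Action Recognition Algorithm

Telecom Equipment - A Time- and View-Invariant Human Behavior Recognition Method Based on Skeleton Information.zip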
