
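The paper "Enhancing Shift-Reduce Constituent Parsing with Action N-Gram Model" explores a way to improve the accuracy of shift-reduce constituent parsing. Constituent parsing is a core technique in natural language processing for recovering the syntactic structure of a sentence. Transition-based parsers analyze a sentence by performing a sequence of shift-reduce actions. They run in linear time, which makes them faster than traditional chart-based parsers and well suited to data-driven methods.

Existing transition-based parsers are limited in how they capture the context of parser actions: they "understand" it through a discriminative model with a large number of binary indicator features. This works in some cases but is often not precise enough when resolving ambiguity. The paper's main contribution is the action n-gram model, which uses the action sequence for parsing disambiguation. The model is trained with the n-gram estimation method, which gives a smoothed maximum likelihood estimate of the action probability for a specific action history. Because an n-gram model captures dependencies within the action sequence, it can help the parser predict the next action more accurately.

The authors show that integrating action n-gram models into a state-of-the-art parsing framework improves parsing accuracy on three data sets across two languages, Chinese and English. The work was supported by the National Natural Science Foundation of China and the Natural Science Foundation of Jiangsu Province. The authors are from Nanjing University, Nanjing Normal University, and Singapore University of Technology and Design. The paper records its citation in ACM Reference Format.

Overall, the paper contributes a new technique for constituent parsing in natural language processing. The action n-gram model may also offer ideas for other tasks that require predicting action sequences, and it can help a parser handle complex sentence structures more accurately.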


Enhancing Shift-Reduce Constituent Parsing with Action N-Gram Model

HAO ZHOU, Nanjing University

SHUJIAN HUANG, Nanjing University

JUNSHENG ZHOU, Nanjing Normal University

YUE ZHANG, Singapore University of Technology and Design

HUADONG CHEN, Nanjing University

XINYU DAI, Nanjing University

CHUAN CHENG, Nanjing University

JIAJUN CHEN, Nanjing University

Current data-driven shift-reduce parsers 'understand' the context of parser actions by embodying large numbers of binary indicator features with a discriminative model. In this paper, we propose the action n-gram model, which utilizes the action sequence for parsing disambiguation. The action n-gram model is trained on action sequences with the n-gram estimation method, which gives a smoothed maximum likelihood estimation of the action probability for a specific action history. We show that incorporating action n-gram models into a state-of-the-art parsing framework could achieve parsing accuracy improvements on three data sets across two languages.

Categories and Subject Descriptors: I.2.7 [Artificial Intelligence]: Natural Language Processing - Syntax Parsing

General Terms: Languages, Experiments

Additional Key Words and Phrases: Shift-Reduce Constituent Parsing, Action History, Action N-gram Model

ACM Reference Format:

Hao Zhou, Shujian Huang, Junsheng Zhou, Yue Zhang, Huadong Chen, Xinyu Dai, Chuan Cheng and Jiajun Chen, 2014. Enhancing Shift-Reduce Constituent Parsing with Action N-Gram Model. ACM Trans. Asian Lang. Inform. Process. 9, 4, Article 1 (September YYYY), 17 pages.

DOI:http://dx.doi.org/10.1145/0000000.0000000

1. INTRODUCTION

Modern data-driven transition-based parsers parse a sentence by performing a sequence of shift-reduce actions. Most of these transition-based parsers run in linear time, which is faster than traditional chart-based parsers [Eisner 1996; Collins 1997; Charniak 2000; McDonald et al. 2005]. The linear parsers achieve state-of-the-art parsing performance and are widely used in both dependency [Yamada and Matsumoto 2003; Nivre et al. 2007; Zhang and Nivre 2011; Goldberg and Nivre 2013] and constituent [Sagae and Lavie 2006; Wang et al. 2006; Zhang and Clark 2009] parsing tasks.

This work was supported by National Natural Science Foundation of China (61300158, 61170181, 61472191) and Natural Science Foundation of Jiangsu Province, China (BK20130580).

Author's addresses: Hao Zhou, Shujian Huang (corresponding author), Huadong Chen, Xinyu Dai, Chuan Cheng and Jiajun Chen, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; Junsheng Zhou, Department of Computer Science and Technology, Nanjing Normal University, Nanjing, China; Yue Zhang, Singapore University of Technology and Design, Singapore; Email: zhouh@nlp.nju.edu.cn, huangsj@nlp.nju.edu.cn, zhoujs@njnu.edu.cn, yue_zhang@sutd.edu.sg, chenhd@nlp.nju.edu.cn, daixy@nlp.nju.edu.cn, chengc@nlp.nju.edu.cn, chenjj@nlp.nju.edu.cn.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or permissions@acm.org.

© YYYY ACM 1530-0226/YYYY/09-ART1 $15.00

ACM Transactions on Asian Language Information Processing, Vol. 9, No. 4, Article 1, Publication date: September YYYY.

Traditional features used by shift-reduce parsers can be divided into three main categories: lexical features, POS features, and constituent label or dependency label features. More complex features can be obtained by combining these atomic features. Much work has been proposed to exploit the feature context. For example, high-order dependency arcs and arc labels have been used for dependency parsing [Zhang and Nivre 2011]. Joint segmentation, part-of-speech (POS) tagging and parsing [Hatori et al. 2012; Zhang et al. 2013] exploits larger context by making use of features from the lexical analysis tasks to achieve better performance. Huang et al. [2009] propose to utilize bilingual word alignment and ordering information to disambiguate shift-reduce dependency parsing. Chen et al. [2012] take advantage of the dependency language model [Shen et al. 2008] to enhance dependency parsing and achieve significant accuracy improvements. These methods focus on either extracting high-order and long-distance patterns from local parts of syntax trees, or calculating lexical statistics from a large, usually unannotated, corpus. Previous work does not pay enough attention to the structural information of the syntax tree.

In shift-reduce constituent parsing, parser actions can be a useful representation of syntax tree structures, which has not been fully exploited by previous work. During the shift-reduce constituent parsing process, the parser selects a SHIFT or REDUCE action at each step. The actions used while parsing a sentence compose an action sequence. Typically, for constituent parsing, each action sequence corresponds to a unique binary-headed syntax tree (a binary tree in which the head child of each node is marked), and each binary-headed syntax tree also corresponds to a unique action sequence. As a result, action sequences can be regarded as linearized versions of tree structures and may be helpful for parsing disambiguation.
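The one-to-one correspondence described above can be illustrated with a minimal sketch. The action names used here (`SHIFT`, `REDUCE-L-X`, `REDUCE-R-X`, where `L`/`R` marks the head child and `X` is the constituent label) and the `Node` class are assumptions made for illustration; the paper's actual action set is not given in this excerpt and may include further actions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """Binary-headed tree node (hypothetical representation)."""
    label: str                      # word for leaves, constituent label otherwise
    head: str = ""                  # "L" or "R" for internal nodes, "" for leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def tree_to_actions(node: Node) -> List[str]:
    """Linearize a tree into its action sequence (post-order traversal)."""
    if node.left is None or node.right is None:
        return ["SHIFT"]
    return (tree_to_actions(node.left)
            + tree_to_actions(node.right)
            + [f"REDUCE-{node.head}-{node.label}"])


def actions_to_tree(actions: List[str], words: List[str]) -> Node:
    """Replay an action sequence over the input words to rebuild the tree."""
    stack: List[Node] = []
    next_word = 0
    for action in actions:
        if action == "SHIFT":
            stack.append(Node(label=words[next_word]))
            next_word += 1
        else:
            # "REDUCE-<head>-<label>": combine the top two stack items.
            _, head, label = action.split("-", 2)
            right = stack.pop()
            left = stack.pop()
            stack.append(Node(label, head, left, right))
    if len(stack) != 1 or next_word != len(words):
        raise ValueError("action sequence does not yield a single complete tree")
    return stack[0]
```

Under these assumptions, converting a tree to actions and replaying them over the same words returns an equal tree, which is the sense in which the action sequence is a linearized form of the tree.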

To exploit the distribution of action sequences, we propose the action n-gram model (ANM). The action n-gram model is trained on action sequences with the statistical n-gram estimation method [Chen and Goodman 1996]. To make use of larger context, we incorporate POS tags and lexical information to enhance the basic action sequence. We construct a log-linear interpolated model over the action sequence to represent various kinds of parsing context information. With the action n-gram model, we can obtain the probability of each action given the preceding parsing action sequence. Instead of extracting binary indicator features from the shift-reduce parsing context, the action n-gram model uses these action sequences directly for runtime parse disambiguation, which is a different way to exploit the context of shift-reduce parsing.
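As a rough illustration of estimating the probability of an action given its history, the sketch below trains an n-gram model over action sequences. The smoothing used here (linear interpolation of maximum likelihood estimates from shorter to longer histories, with a uniform base distribution) is an assumption chosen for brevity; the excerpt only states that a smoothed n-gram estimation method is used. The class name `ActionNGram` and the parameter `lam` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


class ActionNGram:
    """Interpolated n-gram model over parser actions (illustrative only)."""

    def __init__(self, n: int = 3, lam: float = 0.7) -> None:
        self.n = n
        self.lam = lam                 # weight given to the longer history
        self.ngram: Counter = Counter()  # (history, action) -> count
        self.hist: Counter = Counter()   # history -> count
        self.vocab: set = set()

    def _context(self, history: Sequence[str]) -> Tuple[str, ...]:
        """Pad with start symbols and keep the last n-1 actions."""
        padded = ["<s>"] * (self.n - 1) + list(history)
        return tuple(padded[len(padded) - (self.n - 1):])

    def train(self, sequences: Iterable[Sequence[str]]) -> None:
        for seq in sequences:
            self.vocab.update(seq)
            for i, action in enumerate(seq):
                full = self._context(seq[:i])
                # Count the action under every history length 0..n-1.
                for k in range(len(full) + 1):
                    ctx = full[len(full) - k:]
                    self.ngram[(ctx, action)] += 1
                    self.hist[ctx] += 1

    def prob(self, action: str, history: Sequence[str]) -> float:
        full = self._context(history)
        p = 1.0 / len(self.vocab)      # uniform base distribution
        for k in range(len(full) + 1):
            ctx = full[len(full) - k:]
            if self.hist[ctx] > 0:
                mle = self.ngram[(ctx, action)] / self.hist[ctx]
                p = self.lam * mle + (1.0 - self.lam) * p
        return p
```

Because every level mixes two proper distributions, the resulting probabilities over the action vocabulary sum to one for any history, including histories never seen in training.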

Additionally, we enhance shift-reduce parsing performance with action n-gram models. Adding dense features such as action n-gram models into the parsing framework is not a trivial task. The structured perceptron [Collins and Roark 2004] works poorly when the model contains millions of binary features and only a few dense features. We propose three different methods to incorporate action n-gram models into a state-of-the-art parsing framework. The first directly adds the scores of action n-gram models as real-valued features into decoding; the second adds the action n-grams into parsing as binary indicator features; the last adopts a cascaded model [Jiang et al. 2008] to incorporate action n-gram models into parsing. In experiments, we find that action n-gram models with the cascaded model outperform the other methods in both efficiency and accuracy.

Footnote 1 (on the state-of-the-art performance of linear-time parsers): Higher performance could be obtained by employing more complex decoding methods [Goldberg and Elhadad 2010; Huang and Sagae 2010], which do not run in linear complexity and are not in the scope of this paper.

Footnote 2 (on the correspondence between action sequences and syntax trees): In shift-reduce dependency parsing or CCG parsing, each generated dependency tree does not correspond to a unique action sequence due to spurious ambiguity.
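The first strategy, adding the action n-gram score as a real-valued feature, might look roughly like the sketch below. Everything here is hypothetical: the function names, the use of the log-probability as the dense feature, the single weight `anm_weight`, and the greedy selection (used only to keep the example short) are assumptions, not details taken from the paper.

```python
import math
from typing import Callable, Dict, Iterable, List, Sequence


def combined_score(
    sparse_weights: Dict[str, float],
    features: Iterable[str],
    anm_prob: float,
    anm_weight: float,
) -> float:
    """Sparse indicator-feature score plus one dense n-gram feature."""
    sparse = sum(sparse_weights.get(f, 0.0) for f in features)
    return sparse + anm_weight * math.log(anm_prob)


def best_action(
    candidates: Sequence[str],
    sparse_weights: Dict[str, float],
    extract_features: Callable[[str], List[str]],
    anm: Callable[[str, Sequence[str]], float],
    history: Sequence[str],
    anm_weight: float = 1.0,
) -> str:
    """Pick the highest-scoring candidate action for the current step."""
    return max(
        candidates,
        key=lambda a: combined_score(
            sparse_weights, extract_features(a), anm(a, history), anm_weight
        ),
    )
```

The sketch also hints at why mixing the two kinds of features is delicate: the dense score lives on a different scale from the sum of sparse weights, so the value of `anm_weight` strongly affects which action wins.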

We conduct different experiments to validate the effectiveness of action n-gram models. First, we show that the action n-gram model scores have a Pearson's correlation coefficient similar to that of the baseline discriminative model scores. This confirms that action n-gram models are able to distinguish good and bad predicted parse trees. We also calculate the perplexities of action n-gram models on various data sets and conclude that action n-gram models can be trained well on limited training data. Finally, we compare the performance of the resulting parser with much previous work. The resulting parser achieves accuracy improvements over a state-of-the-art baseline parser on three treebanks: absolute improvements of 0.7% on CTB2.0, 0.5% on CTB5.1, and 0.5% on WSJ.
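The two diagnostics mentioned above have standard definitions, sketched here in plain Python. What the model scores are correlated against is not stated in this excerpt, so the second argument of `pearson` is simply some per-tree quality measure (an assumption); `perplexity` takes the probabilities a model assigned to each gold action in a held-out sequence.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def perplexity(action_probs: Sequence[float]) -> float:
    """exp of the average negative log-probability per action."""
    avg_neg_log = -sum(math.log(p) for p in action_probs) / len(action_probs)
    return math.exp(avg_neg_log)
```

A lower perplexity means the model finds the held-out action sequences less surprising; a model that always assigned probability 0.5 to the correct action would have perplexity 2.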

The rest of this paper is organized as follows. In Section 2, we review the related work. Section 3 introduces the framework of the shift-reduce constituent parser. Section 4 presents the main idea of the action n-gram model. Section 5 introduces the three strategies we adopt to incorporate action n-gram models into parsing. Section 6 shows the advantages of action n-gram models through experiments. Section 6.4 discusses the usage of the action n-gram model in more detail. Finally, we conclude the paper and briefly outline future work in Section 7.

2. RELATED WORK

2.1. Exploiting the Context of Shift-Reduce Parsing

Much work has been proposed to exploit the parsing context more deeply, achieving accuracy improvements in both constituent parsing and dependency parsing. Shen et al. [2008] proposed a dependency language model to exploit long-distance word relations in statistical machine translation. The n-gram dependency language model predicts the next child of a head based on the n-1 immediately preceding children. The dependency language model was also used by Chen et al. [2012] to enhance the accuracy of dependency parsing. Zhu et al. [2013] used the dependency language model to capture structural relations in a shift-reduce constituent parser. Besides the dependency language model, they also employed word clustering and lexical dependencies in their system. Zhang and Nivre [2011] attempted to enhance dependency parsing by employing many high-order dependency arc and arc label features.
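To make the idea of predicting the next child from the n-1 preceding children concrete, here is a minimal, unsmoothed sketch. It is not the formulation of Shen et al.: conditioning on the head word itself, the start symbol `<c>`, and the class name `ChildNGram` are all assumptions for illustration, and a practical model would add smoothing.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


class ChildNGram:
    """Unsmoothed n-gram model over the children of a head word."""

    def __init__(self, n: int = 2) -> None:
        self.n = n
        self.counts: Counter = Counter()    # (context, child) -> count
        self.contexts: Counter = Counter()  # context -> count

    def _context(self, head: str, previous: Sequence[str]) -> Tuple:
        """Head word plus the last n-1 children generated so far."""
        padded = ["<c>"] * (self.n - 1) + list(previous)
        return (head, tuple(padded[len(padded) - (self.n - 1):]))

    def train(self, data: Iterable[Tuple[str, List[str]]]) -> None:
        """data: (head, children in generation order) pairs."""
        for head, children in data:
            for i, child in enumerate(children):
                ctx = self._context(head, children[:i])
                self.counts[(ctx, child)] += 1
                self.contexts[ctx] += 1

    def prob(self, child: str, head: str, previous: Sequence[str]) -> float:
        ctx = self._context(head, previous)
        if self.contexts[ctx] == 0:
            return 0.0                      # unseen context, no smoothing
        return self.counts[(ctx, child)] / self.contexts[ctx]
```

The action n-gram model discussed in this paper follows the same counting idea but conditions on preceding parser actions rather than on sibling words.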

Features from different natural language processing tasks and multi-lingual data are also used in parsing. For example, joint segmentation, part-of-speech (POS) tagging and parsing models [Hatori et al. 2012; Zhang et al. 2013] exploit larger context by making use of features from the lexical analysis tasks to achieve better performance. Huang et al. [2009] proposed to use bilingual word alignment and ordering information as constraint features to guide shift-reduce dependency parsing.

Reranking methods [Charniak and Johnson 2005; McClosky et al. 2006; Huang 2008] built on generative chart parsers offer a different way to explore the effects of non-local features in syntactic parsing. The state-of-the-art reranking parsers achieve much higher accuracy than other parsers, which indicates that non-local features are very helpful for syntactic parsing.

2.2. Using Action for Parsing Disambiguation

Briscoe and Carroll [1993] described work towards the construction of a probabilistic parsing system for natural language, based on the LR parsing technique. They proposed to associate probabilities with transitions in a generalized LR parsing framework [Tomita 1987]. They combined parsing action with current parsing state, looka-
