from supervised learning in which a classifier is trained for each distinct
word on a corpus of manually sense-annotated examples, to completely
unsupervised methods that cluster occurrences of words, thereby inducing senses. Recent advances in WSD have benefited greatly from the
availability of corpora annotated with word senses. Most accurate WSD
systems to date exploit supervised methods which automatically learn
cues useful for disambiguation from manually sense-annotated data.
For machine learning-based WSD systems, commonly used algo-
rithms include the Naïve Bayesian model, decision trees, maximum entropy, the support vector machine (SVM), and so on. Among these, kernel
methods (Hofmann et al., 2008; Shawe-Taylor and Cristianini, 2004),
such as SVM, regularized least-squares classification (RLSC) and kernel
principal component analysis (KPCA), have demonstrated excellent
performance in terms of accuracy and robustness (Giuliano et al., 2009;
Gliozzo et al., 2005; Jin et al., 2008; Joshi et al., 2006; Lee and Ng, 2002;
Lee et al., 2004; Pahikkala et al., 2009; Popescu, 2004; Wang et al.,
2014, 2013a, 2015; Wu et al., 2004). Recently, Li et al. (2016) presented an extensive survey of the state of the art in the field of kernel methods
for WSD, concentrating on issues such as data representation, kernel
selection and learning algorithms. Basically, kernel methods work by
mapping the data from the input space into a high-dimensional (possibly
infinite) feature space, which is usually chosen to be a reproducing
kernel Hilbert space (RKHS), and then building linear algorithms in the
feature space to implement nonlinear counterparts in the input space.
The mapping, rather than being given in an explicit form, is determined
implicitly by specifying a kernel function, which computes the inner
product between each pair of data points in the feature space. There are
several reasons that make kernel methods applicable to WSD and other
NLP problems (Li et al., 2016; Wang et al., 2015). Firstly, instead of man-
ual construction of feature space for the learning task, kernel functions
provide an alternative way to design useful features in the feature space
automatically, thereby ensuring the necessary representational power.
Secondly, kernel methods offer a flexible and efficient way to define
application-specific kernels for introducing background knowledge and
explicitly modeling linguistic insights. This property makes it possible to notably improve the performance of general learning methods and to adapt them easily to a specific application. Finally, kernel methods
can be naturally applied to non-vectorial types of data, thus taking into account the structure of the data and greatly reducing the need for careful feature engineering on such structures.
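As an aside on the implicit mapping mentioned above, the short sketch below (hypothetical Python/NumPy code, not taken from the paper) verifies numerically that a degree-2 polynomial kernel computes exactly the inner product of an explicitly constructed feature map, so the feature space never needs to be built explicitly.

```python
import numpy as np

def poly2_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel k(x, z) = (x . z) ** 2."""
    return np.dot(x, z) ** 2

def poly2_features(x):
    """Explicit feature map for the same kernel: all monomials x_i * x_j."""
    return np.outer(x, x).ravel()

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])

# The kernel value and the inner product in the explicit feature space agree,
# which is exactly what the kernel trick exploits.
print(poly2_kernel(x, z))                            # 20.25
print(np.dot(poly2_features(x), poly2_features(z)))  # 20.25
```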
From the point of view of modularization, kernel methods consist of
two main components, namely the kernel and the actual learning algorithm.
The kernel can be considered as an interface between the input data and
the learning algorithm, and is the key component to ensure the good
performance of kernel methods (Shawe-Taylor and Cristianini, 2004;
Wang et al., 2009). In fact, for real applications, the kernel is the only task-specific component of kernel methods. In the domain of WSD, the
widely used kernel is the ‘‘Bag of Words’’ (BOW) kernel (Shawe-Taylor
and Cristianini, 2004), which is based on the BOW representation of the
context in which an ambiguous word occurs. In this representation, each
word or term constitutes a dimension in a vector space, independent
of other terms in the same context. Despite its ease of use, this kernel
suffers from well-known limitations, mostly due to its inability to
exploit semantic similarity between terms: contexts containing terms that are different but semantically related will be considered unrelated.
To address this problem, a number of attempts have been made to
incorporate semantic knowledge into the BOW kernel, resulting in the
so-called semantic kernels (Shawe-Taylor and Cristianini, 2004). For
example, semantic kernels that use external semantic knowledge provided by word thesauri or ontologies were proposed to improve kernel-based WSD systems (Jin et al., 2008; Joshi et al., 2006). In the
absence of external semantic knowledge, Latent Semantic Indexing (LSI)
technology was applied to capture the semantic relations between terms
(Giuliano et al., 2009; Gliozzo et al., 2005). More information about the
semantic kernel can be found in the text categorization literature (Altınel et al., 2015; Cristianini et al., 2002), a more general application domain than WSD.
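To make the contrast concrete, the sketch below (hypothetical Python/NumPy code; the toy vocabulary, contexts and term-similarity matrix are illustrative assumptions, not data from the paper) compares the plain BOW kernel with a simple semantic kernel of the form k(d1, d2) = d1^T S d2, in which a term-similarity (semantic smoothing) matrix S lets two contexts that share no terms but contain related words obtain a nonzero similarity.

```python
import numpy as np

# Toy vocabulary and two contexts of an ambiguous word that share no terms
# but contain semantically related ones.
vocab = ["loan", "money", "interest", "river", "water"]
d1 = np.array([1, 0, 0, 0, 0], dtype=float)   # context mentions "loan"
d2 = np.array([0, 1, 1, 0, 0], dtype=float)   # context mentions "money", "interest"

def bow_kernel(a, b):
    """Plain Bag-of-Words kernel: inner product of the term-count vectors."""
    return float(a @ b)

# Illustrative term-similarity (semantic smoothing) matrix S; in practice it
# would come from a thesaurus, an ontology, LSI, or co-occurrence statistics.
# S should be symmetric and positive semi-definite for k to be a valid kernel.
S = np.eye(len(vocab))
S[0, 1] = S[1, 0] = 0.8   # "loan" ~ "money"
S[0, 2] = S[2, 0] = 0.6   # "loan" ~ "interest"

def semantic_kernel(a, b, S):
    """A simple semantic-kernel form: k(a, b) = a^T S b."""
    return float(a @ S @ b)

print(bow_kernel(d1, d2))           # 0.0 -> unrelated under plain BOW
print(semantic_kernel(d1, d2, S))   # 1.4 -> related once term similarity is used
```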
In our previous studies (Wang et al., 2014, 2013a), we proposed to
apply the semantic diffusion kernel (Kandola et al., 2003) to improve
the WSD system. The semantic diffusion kernel is obtained through a matrix exponentiation transformation of the given kernel matrix, and implicitly exploits higher-order co-occurrences to infer semantic
similarity between terms. Geometrically, this kernel models semantic
similarities by means of a diffusion process on a graph defined by
lexicon and co-occurrence information. However, the diffusion is an
unsupervised process, which fails to exploit the class information in a
classification scenario and may not be optimal for the supervised WSD
system. Chakraborti et al. (2006, 2007) introduced a simple approach
called ‘‘sprinkling’’ to incorporate class labels of documents into LSI. In
sprinkling, a set of class-specific artificial terms are appended to the rep-
resentations of documents of the corresponding class. LSI is then applied
to the sprinkled term-document matrix, resulting in a concept space that
better reflects the underlying class distribution of documents. Recently,
this approach was also applied to sprinkle Latent Dirichlet Allocation
(LDA) topics for weakly supervised text classification (Hingmire and
Chakraborti, 2014). The underlying rationale of this approach is that the sprinkled terms help exploit the class information of text documents during classification. Motivated by these works,
in this paper we present a sprinkled semantic diffusion kernel with
application to WSD. The basic idea is to construct an augmented term-
document matrix by encoding class information as additional terms and
appending them to training documents. Diffusion is then performed
on the augmented term-document matrix to learn the semantic matrix,
which is the key component of semantic kernels. In this way, the words
belonging to the same class are indirectly drawn closer to each other,
hence the class-specific word correlations are strengthened. Although
the idea behind the sprinkled semantic diffusion kernel is very similar to
that of sprinkled LSI, to the best of our knowledge, our work is the first to simultaneously exploit higher-order co-occurrences and class information to construct a semantic smoothing kernel for the supervised WSD task.
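A minimal sketch of this construction is given below (hypothetical Python/NumPy/SciPy code; the matrix sizes, the sprinkling weight, and the particular diffusion form S = expm(lambda * G), with G the co-occurrence matrix of the augmented terms, are assumptions made for illustration and may differ in detail from the paper's formulation).

```python
import numpy as np
from scipy.linalg import expm

# Term-document matrix D: rows = terms, columns = training contexts (documents).
# Values are term counts; the sizes here are purely illustrative.
D = np.array([[1, 0, 2, 0],
              [0, 1, 0, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=float)
labels = np.array([0, 1, 0, 1])     # sense label of each training context
n_classes = 2
alpha = 1.0                         # sprinkling weight (assumed)

# Sprinkling: append one artificial "class term" per sense and set it to alpha
# in the documents of that class, yielding an augmented term-document matrix.
sprinkle = alpha * np.eye(n_classes)[:, labels]   # shape (n_classes, n_docs)
D_aug = np.vstack([D, sprinkle])

# Diffusion on the augmented matrix: G is the (augmented) term co-occurrence
# matrix and S = expm(lam * G) is the learned semantic matrix.
lam = 0.1
G = D_aug @ D_aug.T
S = expm(lam * G)

# Sprinkled semantic diffusion kernel between training contexts:
# K[i, j] = d_i^T S d_j, computed on the augmented document vectors.
K = D_aug.T @ S @ D_aug
print(K.shape)                      # (4, 4) Gram matrix over the training contexts
```

At prediction time a test context carries no sense label, so under this construction its sprinkled entries would simply be zero.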
The remainder of this article is organized as follows. Section 2
briefly introduces the kernel methods in general and SVM in particular.
Section 3 then details the proposed sprinkled semantic diffusion kernel
with application to WSD. The proposed kernel is demonstrated with
several Senseval/Semeval benchmark examples in Section 4, followed
by conclusions and some potential directions for future work.
2. Kernel methods and SVM
Kernel methods have been highly successful in solving various
problems in the machine learning and data mining communities (Hofmann et
al., 2008; Shawe-Taylor and Cristianini, 2004). These methods map data
points from the input space to some feature space where even relatively
simple algorithms such as linear methods can deliver very impressive
performance. The most attractive feature of kernel methods is that they
can be applied in high-dimensional feature spaces without suffering from
the high cost of explicitly computing the feature map. This is possible
with the kernel trick, i.e., using a valid kernel function $k$ on any set $\mathcal{X}$ (input space). A function $k(\mathbf{x}, \mathbf{x}')$ is a valid kernel if and only if for any finite set it produces symmetric and positive semi-definite Gram matrices. For such $k \colon \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ ($\mathbb{R}$ denotes the set of real numbers), it is known that there exists a mapping $\boldsymbol{\phi} \colon \mathcal{X} \to \mathcal{F}$ ($\mathcal{F}$ denotes the feature space induced by a kernel function) into a reproducing kernel Hilbert space (RKHS), such that $k(\mathbf{x}, \mathbf{x}') = \langle \boldsymbol{\phi}(\mathbf{x}), \boldsymbol{\phi}(\mathbf{x}') \rangle$ for any $\mathbf{x}, \mathbf{x}' \in \mathcal{X}$.¹ Popular kernel functions include the linear kernel, the polynomial kernel and the Gaussian kernel.
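This validity condition is easy to check numerically; the sketch below (hypothetical Python/NumPy code on randomly generated inputs) builds the Gram matrices of the linear, polynomial and Gaussian kernels on a small sample and verifies that each is symmetric and positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))        # 10 sample points in a 3-d input space

def gram(kernel, X):
    """Gram (kernel) matrix K with K[i, j] = k(x_i, x_j)."""
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

kernels = {
    "linear":     lambda x, z: x @ z,
    "polynomial": lambda x, z: (x @ z + 1.0) ** 3,
    "gaussian":   lambda x, z: np.exp(-np.sum((x - z) ** 2) / 2.0),
}

for name, k in kernels.items():
    K = gram(k, X)
    symmetric = np.allclose(K, K.T)
    psd = np.all(np.linalg.eigvalsh(K) >= -1e-10)   # tolerance for round-off
    print(f"{name}: symmetric={symmetric}, PSD={psd}")
```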
We here consider the SVM, the most well-known kernel method
in practice. In a binary classification problem, we are given $l$ pairs of training samples $\{(\mathbf{x}_i, y_i)\}_{i=1}^{l}$, where $\mathbf{x}_i \in \mathcal{X}$ and $y_i \in \{+1, -1\}$. The
¹ With a kernel $k(\mathbf{x}_i, \mathbf{x}_j)$ for any $\mathbf{x}_i, \mathbf{x}_j \in \mathcal{X}$, the Gram matrix or kernel matrix is given by $\mathbf{K}_{i,j} = k(\mathbf{x}_i, \mathbf{x}_j)$. Since the Gram matrix and kernel function are essentially equivalent, we can refer to one or the other as ‘‘kernel’’ without risk of confusion.