bilstm_crf论文1
The title "bilstm_crf论文1" concerns named entity recognition (NER), a natural language processing (NLP) task that identifies entities with specific meaning in text, such as person names, place names, and organization names. The paper proposes two new neural network architectures that aim to remove the dependence of traditional NER systems on hand-crafted features and domain expertise, and to learn effectively from small supervised training sets.

1. Bidirectional LSTM-CRF architecture:
This architecture combines a bidirectional long short-term memory network (BiLSTM) with a conditional random field (CRF). The BiLSTM is a recurrent neural network (RNN) variant that considers both the left and right context of each word, which makes it effective for sequence labeling. The CRF is a statistical model that predicts a label for every element of a sequence while capturing dependencies between labels, avoiding tag sequences that are locally plausible but globally inconsistent. In NER, the BiLSTM captures the contextual representation of each word, and the CRF makes the final, jointly scored labeling decisions, improving overall prediction accuracy.

2. Transition-based Stack-LSTM architecture:
Inspired by shift-reduce parsers, this method constructs and labels multi-token segments using a transition-based algorithm. The Stack-LSTM maintains a stack of words and a history of actions, mimicking the shift-reduce process. This lets the model build and update entity boundaries incrementally and adapt to entities of different lengths and types.

3. Character-level and word-level representation learning:
The models represent words using two sources of information: character-level representations learned from the supervised data, which capture morphological and orthographic cues, and unsupervised word representations (word embeddings) learned from large unannotated corpora, which provide broader distributional semantics. Combining the two improves generalization from small annotated data sets.

4. Language independence and resource efficiency:
Without using any language-specific knowledge or resources (such as gazetteers), the proposed models reach state-of-the-art NER performance in four languages. This makes them easier to adapt to new languages and domains and lowers the barrier to building NER systems.

5. Learning from limited supervision and unsupervised pretraining:
The paper stresses the importance of unsupervised learning from unannotated data for improving generalization when supervised data is scarce. This is especially valuable for low-resource languages and domains, since the model can learn general-purpose representations and patterns from large unlabeled corpora.

In short, "bilstm_crf论文1" contributes two neural models, a BiLSTM-CRF tagger and a Stack-LSTM-based NER system, that combine deep learning with structured statistical modeling, reduce the reliance on hand-crafted features and domain-specific knowledge, achieve strong NER performance across several languages, and do so in a language-independent, resource-efficient way.
Neural Architectures for Named Entity Recognition
Guillaume Lample♠, Miguel Ballesteros♣♠, Sandeep Subramanian♠, Kazuya Kawakami♠, Chris Dyer♠
♠Carnegie Mellon University   ♣NLP Group, Pompeu Fabra University
{glample,sandeeps,kkawakam,cdyer}@cs.cmu.edu, miguel.ballesteros@upf.edu
arXiv:1603.01360v3 [cs.CL] 7 Apr 2016
Abstract

State-of-the-art named entity recognition systems rely heavily on hand-crafted features and domain-specific knowledge in order to learn effectively from the small, supervised training corpora that are available. In this paper, we introduce two new neural architectures—one based on bidirectional LSTMs and conditional random fields, and the other that constructs and labels segments using a transition-based approach inspired by shift-reduce parsers. Our models rely on two sources of information about words: character-based word representations learned from the supervised corpus and unsupervised word representations learned from unannotated corpora. Our models obtain state-of-the-art performance in NER in four languages without resorting to any language-specific knowledge or resources such as gazetteers.¹

¹ The code of the LSTM-CRF and Stack-LSTM NER systems is available at https://github.com/glample/tagger and https://github.com/clab/stack-lstm-ner
1 Introduction

Named entity recognition (NER) is a challenging learning problem. On the one hand, in most languages and domains, there is only a very small amount of supervised training data available. On the other, there are few constraints on the kinds of words that can be names, so generalizing from this small sample of data is difficult. As a result, carefully constructed orthographic features and language-specific knowledge resources, such as gazetteers, are widely used for solving this task. Unfortunately, language-specific resources and features are costly to develop in new languages and new domains, making NER a challenge to adapt. Unsupervised learning from unannotated corpora offers an alternative strategy for obtaining better generalization from small amounts of supervision. However, even systems that have relied extensively on unsupervised features (Collobert et al., 2011; Turian et al., 2010; Lin and Wu, 2009; Ando and Zhang, 2005b, inter alia) have used these to augment, rather than replace, hand-engineered features (e.g., knowledge about capitalization patterns and character classes in a particular language) and specialized knowledge resources (e.g., gazetteers).
In this paper, we present neural architectures for NER that use no language-specific resources or features beyond a small amount of supervised training data and unlabeled corpora. Our models are designed to capture two intuitions. First, since names often consist of multiple tokens, reasoning jointly over tagging decisions for each token is important. We compare two models here, (i) a bidirectional LSTM with a sequential conditional random layer above it (LSTM-CRF; §2), and (ii) a new model that constructs and labels chunks of input sentences using an algorithm inspired by transition-based parsing with states represented by stack LSTMs (S-LSTM; §3). Second, token-level evidence for “being a name” includes both orthographic evidence (what does the word being tagged as a name look like?) and distributional evidence (where does the word being tagged tend to occur in a corpus?). To capture orthographic sensitivity, we use a character-based word representation model (Ling et al., 2015b); to capture distributional sensitivity, we combine these representations with distributional representations (Mikolov et al., 2013b). Our word representations combine both of these, and dropout training is used to encourage the model to learn to trust both sources of evidence (§4).
Experiments in English, Dutch, German, and Spanish show that we are able to obtain state-of-the-art NER performance with the LSTM-CRF model in Dutch, German, and Spanish, and very near the state-of-the-art in English without any hand-engineered features or gazetteers (§5). The transition-based algorithm likewise surpasses the best previously published results in several languages, although it performs less well than the LSTM-CRF model.
2 LSTM-CRF Model

We provide a brief description of LSTMs and CRFs, and present a hybrid tagging architecture. This architecture is similar to the ones presented by Collobert et al. (2011) and Huang et al. (2015).
2.1 LSTM

Recurrent neural networks (RNNs) are a family of neural networks that operate on sequential data. They take as input a sequence of vectors (x_1, x_2, ..., x_n) and return another sequence (h_1, h_2, ..., h_n) that represents some information about the sequence at every step in the input. Although RNNs can, in theory, learn long dependencies, in practice they fail to do so and tend to be biased towards their most recent inputs in the sequence (Bengio et al., 1994). Long Short-term Memory Networks (LSTMs) have been designed to combat this issue by incorporating a memory cell and have been shown to capture long-range dependencies. They do so using several gates that control the proportion of the input to give to the memory cell, and the proportion from the previous state to forget (Hochreiter and Schmidhuber, 1997). We use the following implementation:
\[
\begin{aligned}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i) \\
c_t &= (1 - i_t) \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o) \\
h_t &= o_t \odot \tanh(c_t),
\end{aligned}
\]
where σ is the element-wise sigmoid function, and ⊙ is the element-wise product.
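As a sanity check on these equations, the following NumPy sketch implements exactly this variant, with the input and forget gates coupled through 1 − i_t and peephole terms W_ci and W_co reading the memory cell; the dimensions, initialization, and parameter names are illustrative assumptions, not taken from the released code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One step of the LSTM variant above: coupled input/forget gates and
    peephole connections (W_ci, W_co) from the memory cell."""
    i_t = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["W_ci"] @ c_prev + p["b_i"])
    c_t = (1 - i_t) * c_prev + i_t * np.tanh(p["W_xc"] @ x_t + p["W_hc"] @ h_prev + p["b_c"])
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["W_co"] @ c_t + p["b_o"])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Illustrative dimensions: d-dimensional inputs, m-dimensional hidden state.
d, m = 4, 3
rng = np.random.default_rng(0)
p = {name: rng.normal(scale=0.1, size=shape) for name, shape in {
    "W_xi": (m, d), "W_hi": (m, m), "W_ci": (m, m), "b_i": (m,),
    "W_xc": (m, d), "W_hc": (m, m), "b_c": (m,),
    "W_xo": (m, d), "W_ho": (m, m), "W_co": (m, m), "b_o": (m,)}.items()}

h, c = np.zeros(m), np.zeros(m)
for x in rng.normal(size=(5, d)):   # a sequence (x_1, ..., x_n) of n = 5 vectors
    h, c = lstm_step(x, h, c, p)    # h is the left-context representation at each step
```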
For a given sentence (x_1, x_2, ..., x_n) containing n words, each represented as a d-dimensional vector, an LSTM computes a representation →h_t of the left context of the sentence at every word t. Naturally, generating a representation of the right context ←h_t as well should add useful information. This can be achieved using a second LSTM that reads the same sequence in reverse. We will refer to the former as the forward LSTM and the latter as the backward LSTM. These are two distinct networks with different parameters. This forward and backward LSTM pair is referred to as a bidirectional LSTM (Graves and Schmidhuber, 2005).
The representation of a word using this model is obtained by concatenating its left and right context representations, h_t = [→h_t; ←h_t]. These representations effectively include a representation of a word in context, which is useful for numerous tagging applications.
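A compact way to obtain h_t = [→h_t; ←h_t] in practice is a bidirectional LSTM layer. The sketch below uses PyTorch's stock LSTM, which implements a slightly different cell than the equations above (a separate forget gate and no peephole connections), so it should be read as an illustration of the forward/backward concatenation rather than of the paper's exact implementation.

```python
import torch
import torch.nn as nn

d, hidden, n = 100, 50, 7           # illustrative dimensions: d-dim inputs, n words
x = torch.randn(n, 1, d)            # one sentence as a (seq_len, batch=1, d) tensor

# Two distinct LSTMs with separate parameters, one reading left-to-right and one
# right-to-left; bidirectional=True wires this up and concatenates them per step.
bilstm = nn.LSTM(input_size=d, hidden_size=hidden, bidirectional=True)
h, _ = bilstm(x)                    # h: (n, 1, 2 * hidden)

# h[t, 0, :hidden] is the forward (left-context) state at word t and
# h[t, 0, hidden:] is the backward (right-context) state: h_t = [->h_t ; <-h_t].
print(h.shape)                      # torch.Size([7, 1, 100])
```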
2.2 CRF Tagging Models

A very simple—but surprisingly effective—tagging model is to use the h_t’s as features to make independent tagging decisions for each output y_t (Ling et al., 2015b). Despite this model’s success in simple problems like POS tagging, its independent classification decisions are limiting when there are strong dependencies across output labels. NER is one such task, since the “grammar” that characterizes interpretable sequences of tags imposes several hard constraints (e.g., I-PER cannot follow B-LOC; see §2.4 for details) that would be impossible to model with independence assumptions.
Therefore, instead of modeling tagging decisions independently, we model them jointly using a conditional random field (Lafferty et al., 2001). For an input sentence

\[ X = (x_1, x_2, \ldots, x_n), \]

we consider P to be the matrix of scores output by the bidirectional LSTM network. P is of size n × k, where k is the number of distinct tags, and P_{i,j} corresponds to the score of the j-th tag of the i-th word in a sentence. For a sequence of predictions

\[ y = (y_1, y_2, \ldots, y_n), \]
we define its score to be
\[ s(X, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i} \]
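The score above can be computed directly from the emission matrix P and a transition matrix A. The excerpt is truncated at this point, so the handling of the boundary terms in the sketch below (an added start tag y_0 and end tag y_{n+1}, which make A a (k+2) × (k+2) matrix) is an assumed convention rather than a quote from this page.

```python
import numpy as np

def path_score(P, A, y, start, stop):
    """s(X, y) = sum_{i=0..n} A[y_i, y_{i+1}] + sum_{i=1..n} P[i, y_i],
    with y_0 = start and y_{n+1} = stop (assumed boundary tags).
    P: (n, k) BiLSTM emission scores; A: (k+2, k+2) transition scores.
    Note: the paper's indices are 1-based; this code is 0-based."""
    n = P.shape[0]
    tags = [start] + list(y) + [stop]
    transition = sum(A[tags[i], tags[i + 1]] for i in range(n + 1))
    emission = sum(P[i, y[i]] for i in range(n))
    return transition + emission

# Tiny illustrative example: n = 3 words, k = 2 tags, start/stop tag indices 2 and 3.
rng = np.random.default_rng(0)
P = rng.normal(size=(3, 2))
A = rng.normal(size=(4, 4))
print(path_score(P, A, y=[0, 1, 1], start=2, stop=3))
```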