Neural Architectures for Named Entity Recognition
Guillaume Lample♠  Miguel Ballesteros♣♠  Sandeep Subramanian♠  Kazuya Kawakami♠  Chris Dyer♠
♠Carnegie Mellon University
♣NLP Group, Pompeu Fabra University
{glample,sandeeps,kkawakam,cdyer}@cs.cmu.edu,
miguel.ballesteros@upf.edu
Abstract
State-of-the-art named entity recognition sys-
tems rely heavily on hand-crafted features and
domain-specific knowledge in order to learn
effectively from the small, supervised training
corpora that are available. In this paper, we
introduce two new neural architectures—one
based on bidirectional LSTMs and conditional
random fields, and the other that constructs
and labels segments using a transition-based
approach inspired by shift-reduce parsers.
Our models rely on two sources of infor-
mation about words: character-based word
representations learned from the supervised
corpus and unsupervised word representa-
tions learned from unannotated corpora. Our
models obtain state-of-the-art performance in
NER in four languages without resorting to
any language-specific knowledge or resources
such as gazetteers.¹
1 Introduction
Named entity recognition (NER) is a challenging
learning problem. On the one hand, in most lan-
guages and domains, there is only a very small
amount of supervised training data available. On the
other, there are few constraints on the kinds of words
that can be names, so generalizing from this small
sample of data is difficult. As a result, carefully con-
structed orthographic features and language-specific
knowledge resources, such as gazetteers, are widely
used for solving this task. Unfortunately, language-
specific resources and features are costly to de-
velop in new languages and new domains, making
NER a challenge to adapt. Unsupervised learning
¹The code of the LSTM-CRF and Stack-LSTM NER
systems is available at https://github.com/
glample/tagger and https://github.com/clab/
stack-lstm-ner
from unannotated corpora offers an alternative strat-
egy for obtaining better generalization from small
amounts of supervision. However, even systems
that have relied extensively on unsupervised fea-
tures (Collobert et al., 2011; Turian et al., 2010;
Lin and Wu, 2009; Ando and Zhang, 2005b, in-
ter alia) have used these to augment, rather than
replace, hand-engineered features (e.g., knowledge
about capitalization patterns and character classes in
a particular language) and specialized knowledge re-
sources (e.g., gazetteers).
In this paper, we present neural architectures
for NER that use no language-specific resources
or features beyond a small amount of supervised
training data and unlabeled corpora. Our mod-
els are designed to capture two intuitions. First,
since names often consist of multiple tokens, rea-
soning jointly over tagging decisions for each to-
ken is important. We compare two models here,
(i) a bidirectional LSTM with a sequential condi-
tional random field layer above it (LSTM-CRF; §2), and
(ii) a new model that constructs and labels chunks
of input sentences using an algorithm inspired by
transition-based parsing with states represented by
stack LSTMs (S-LSTM; §3). Second, token-level
evidence for “being a name” includes both ortho-
graphic evidence (what does the word being tagged
as a name look like?) and distributional evidence
(where does the word being tagged tend to oc-
cur in a corpus?). To capture orthographic sen-
sitivity, we use a character-based word representa-
tion model (Ling et al., 2015b); to capture distribu-
tional sensitivity, we combine these representations
with distributional representations (Mikolov et al.,
2013b). Our word representations combine both of
these, and dropout training is used to encourage the
model to learn to trust both sources of evidence (§4).
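The sketch below illustrates this word-representation layer as a minimal PyTorch-style module; it is not the authors' released code (their implementations use Theano and C++), and the class name, layer dimensions, and padding convention are illustrative assumptions. A character-level bidirectional LSTM supplies orthographic evidence, a lookup table initialized from pretrained embeddings supplies distributional evidence, and dropout is applied to their concatenation so the model learns to rely on both sources.

import torch
import torch.nn as nn


class WordRepresentation(nn.Module):
    """Illustrative sketch: character-based + pretrained word embeddings with dropout."""

    def __init__(self, n_chars, n_words, char_dim=25, char_lstm_dim=25,
                 word_dim=100, dropout=0.5, pretrained=None):
        super().__init__()
        self.char_embed = nn.Embedding(n_chars, char_dim, padding_idx=0)
        # Bidirectional character LSTM: the final forward and backward states
        # form the character-based (orthographic) word representation.
        self.char_lstm = nn.LSTM(char_dim, char_lstm_dim,
                                 bidirectional=True, batch_first=True)
        # Word lookup table, optionally initialized from pretrained
        # (distributional) embeddings such as those of Mikolov et al. (2013b).
        self.word_embed = nn.Embedding(n_words, word_dim)
        if pretrained is not None:  # pretrained: tensor of shape (n_words, word_dim)
            self.word_embed.weight.data.copy_(pretrained)
        self.dropout = nn.Dropout(dropout)

    def forward(self, char_ids, word_ids):
        # char_ids: (n_tokens, max_word_len), word_ids: (n_tokens,)
        char_vecs = self.char_embed(char_ids)
        _, (h_n, _) = self.char_lstm(char_vecs)          # h_n: (2, n_tokens, char_lstm_dim)
        char_repr = torch.cat([h_n[0], h_n[1]], dim=-1)  # (n_tokens, 2 * char_lstm_dim)
        word_repr = self.word_embed(word_ids)            # (n_tokens, word_dim)
        # Concatenate orthographic and distributional evidence, then apply
        # dropout so neither source can be relied on exclusively.
        return self.dropout(torch.cat([char_repr, word_repr], dim=-1))

The resulting per-token vectors would then be fed to the sentence-level BiLSTM-CRF or stack-LSTM model described in the following sections.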
Experiments in English, Dutch, German, and
Spanish show that we are able to obtain state-