Joint Entity and Event Extraction with Generative Adversarial Imitation Learning

Tongtao Zhang†, Heng Ji† and Avirup Sil∗

†Computer Science Department, Rensselaer Polytechnic Institute
∗IBM Research AI
Abstract
We propose a new framework for entity and event extraction based on generative adversarial imitation learning, an inverse reinforcement learning method using a generative adversarial network (GAN). We assume that instances and labels vary in difficulty, and that the gains and penalties (rewards) should be correspondingly diverse. We utilize discriminators to estimate proper rewards according to the difference between the labels committed by the ground truth (expert) and the extractor (agent). Our experiments demonstrate that the proposed framework outperforms state-of-the-art methods.
1 Introduction
Event extraction (EE) is a crucial information extraction (IE) task that focuses on extracting structured information (i.e., a structure of event trigger and arguments, “what is happening”, and “who or what is involved”) from unstructured texts. For example, in the sentence “Masih’s alleged comments of blasphemy are punishable by death under Pakistan Penal Code” shown in Figure 1, there is a Sentence event (“punishable”) and an Execute event (“death”) involving the person entity “Masih”. Most event extraction research has been in the context of the 2005 NIST Automatic Content Extraction (ACE) sentence-level event mention task (Walker et al., 2006), which also provides the standard corpus. The annotation guideline of the ACE program defines an event as a specific occurrence of something that happens involving participants, often described as a change of state. More recently, the TAC KBP community has introduced document-level event argument extraction shared tasks for 2014 and 2015 (KBP EA).

In the last five years, many event extraction approaches have brought forth encouraging results by retrieving additional related text documents (Song et al., 2015), introducing rich features of multiple categories (Li et al., 2013; Zhang et al., 2017b), incorporating relevant information within or beyond context (Yang and Mitchell, 2016; Judea and Strube, 2016; Yang and Mitchell, 2017; Duan et al., 2017) and adopting neural network frameworks (Chen et al., 2015; Nguyen and Grishman, 2015; Feng et al., 2016; Nguyen et al., 2016; Huang et al., 2016; Nguyen and Grishman, 2018; Sha et al., 2018; Huang et al., 2018; Hong et al., 2018; Zhao et al., 2018; Nguyen and Nguyen, 2018).
However, there are still challenging cases. For example, in the sentences “Masih’s alleged comments of blasphemy are punishable by death under Pakistan Penal Code” and “Scott is charged with first-degree homicide for the death of an infant.”, the word death triggers an Execute event in the former sentence and a Die event in the latter one. With similar local information (word embeddings) or contextual features (both sentences involve legal events), supervised models pursue a probability distribution which resembles that of the training set (e.g., there are overwhelmingly more Die annotations on death than Execute), and will label both as a Die event, causing an error in the former instance.
Such mistakes are due to the lack of a mechanism that explicitly deals with wrong and confusing labels. Many multi-class classification approaches utilize a cross-entropy loss, which aims at boosting the probability of the correct labels but usually treats wrong labels equally and merely inhibits them indirectly. Models are trained to capture features and weights that pursue correct labels, but become vulnerable and unable to avoid mistakes when facing ambiguous instances, where the probabilities of the confusing and wrong labels are not sufficiently “suppressed”. Therefore, exploiting information from wrong labels is key to making the models robust.
Figure 1: Our framework includes a reward estimator based on a GAN that issues dynamic rewards with regard to the labels (actions) committed by the event extractor (agent). The reward estimator is trained upon the difference between the labels from the ground truth (expert) and the extractor (agent). If the extractor repeatedly misses the Execute label for “death”, the penalty (negative reward value) is strengthened; if the extractor makes surprising mistakes, such as labeling “death” as Person or labeling the Person “Masih” with a Place role in a Sentence event, the penalty is also strong. For cases where the extractor is correct, simpler cases such as Sentence on “punishable” take a smaller gain, while difficult cases such as Execute on “death” are awarded larger reward values.
In this paper, to combat the problems of previous approaches to this task, we propose a dynamic mechanism, inverse reinforcement learning, to directly assess correct and wrong labels on instances in entity and event extraction. We assign explicit scores on cases, or rewards in terms of Reinforcement Learning (RL). We adopt discriminators from generative adversarial networks (GANs) to estimate the reward values. The discriminator ensures the highest reward for the ground truth (expert), and the extractor attempts to imitate the expert by pursuing the highest rewards. For challenging cases, if the extractor continues selecting wrong labels, the GAN keeps expanding the margin between the rewards for ground-truth labels and (wrong) extractor labels, eventually steering the extractor away from the wrong labels.
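To make this margin-expanding mechanism concrete, the following is a minimal GAIL-style sketch. The class and function names, the hidden size, and the log-based reward shaping are illustrative assumptions for exposition rather than the paper's reported design; Section 4 presents the actual reward estimation.

```python
import torch
import torch.nn as nn

class RewardDiscriminator(nn.Module):
    """Scores a (state, label) pair: trained to output a high probability
    for expert (ground-truth) labels and a low one for the extractor's
    labels, so its output can serve as a dynamic reward."""
    def __init__(self, state_dim: int, num_labels: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + num_labels, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state: torch.Tensor, label_onehot: torch.Tensor) -> torch.Tensor:
        logit = self.net(torch.cat([state, label_onehot], dim=-1))
        return torch.sigmoid(logit).squeeze(-1)  # P(pair comes from the expert)

def discriminator_loss(disc, state, expert_onehot, agent_onehot):
    # Push expert pairs toward 1 and agent pairs toward 0. While the agent
    # keeps repeating a mistake, this objective keeps widening the margin
    # between the expert's reward and the agent's reward on that case.
    bce = nn.BCELoss()
    d_expert = disc(state, expert_onehot)
    d_agent = disc(state, agent_onehot)
    return (bce(d_expert, torch.ones_like(d_expert))
            + bce(d_agent, torch.zeros_like(d_agent)))

def reward(disc, state, agent_onehot):
    # A common GAIL-style shaping (an assumption here): log D(s, a) is near
    # zero when the discriminator believes the label and strongly negative
    # otherwise.
    return torch.log(disc(state, agent_onehot) + 1e-8).detach()
```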
The main contributions of this paper can be summarized as follows:
• We apply a reinforcement learning framework to event extraction tasks; the proposed framework is an end-to-end, pipelined approach that extracts entities and event triggers and determines the argument roles for the detected entities.
• With inverse reinforcement learning propelled by a GAN, we demonstrate that a dynamic reward function yields better performance on a complicated RL task.
2 Task and Term Preliminaries
In this paper we follow the schema of Automatic Content Extraction (ACE) (Walker et al., 2006) to detect the following elements from unstructured natural language data:
• Entity: a word or phrase that describes a real-world object such as a person (“Masih” as PER in Figure 1). The ACE schema defines 7 types of entities.
• Event trigger: the word that most clearly expresses an event (an interaction or change of status). The ACE schema defines 33 types of events, such as Sentence (“punishable” in Figure 1) and Execute (“death”).
• Event argument: an entity that serves as a participant or attribute with a specific role in an event mention; e.g., in Figure 1, the PER “Masih” serves as a Defendant in a Sentence event triggered by “punishable”.
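These three element types can be pictured as a small data model. The dataclasses below are purely illustrative of the schema's structure (they are not part of any ACE tooling), with the Figure 1 example as an instance.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Entity:
    text: str    # e.g., "Masih"
    etype: str   # one of the 7 ACE entity types, e.g., "PER"

@dataclass
class Argument:
    entity: Entity
    role: str    # e.g., "Defendant"

@dataclass
class Event:
    trigger: str                # e.g., "punishable"
    etype: str                  # one of the 33 ACE event types, e.g., "Sentence"
    arguments: List[Argument] = field(default_factory=list)

# The Figure 1 example as an instance:
masih = Entity("Masih", "PER")
sentence_event = Event("punishable", "Sentence", [Argument(masih, "Defendant")])
```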
The ACE schema also comes with a data set, ACE2005¹, which has been used as a benchmark for information extraction frameworks; we will introduce this data set in Section 6.

¹ https://catalog.ldc.upenn.edu/LDC2006T06
For broader readership who might not be familiar with reinforcement learning, we briefly introduce our setting via its counterparts or equivalent concepts in supervised models, with the RL terms in parentheses: our goal is to train an extractor (agent A) to label entities, event triggers and argument roles (actions a) in text (environment e); to commit correct labels, the extractor consumes features (state s) and follows the ground truth (expert E); a reward R is issued to the extractor according to whether its output differs from the ground truth and how serious the difference is (as shown in Figure 1, a repeated mistake is more serious), and the extractor improves the extraction model (policy π) by pursuing maximized rewards.
Our framework can be briefly described as follows: given a sentence, our extractor scans the sentence and determines the boundaries and types of entities and event triggers using Q-learning (Section 3.1); meanwhile, the extractor determines the relations between triggers and entities, i.e., argument roles, with policy gradient (Section 3.2). During the training epochs, GANs estimate rewards which stimulate the extractor to pursue the optimal joint model (Section 4).
3 Framework and Approach
3.1 Q-Learning for Entities and Triggers
Entity and trigger detection is often modeled as a sequence labeling problem in which long-term dependencies are a core characteristic, and reinforcement learning is a well-suited method (Maes et al., 2007).
From the RL perspective, our extractor (agent A) explores the environment, i.e., unstructured natural language sentences, as it goes through the sequences and commits labels (actions a) for the tokens. When the extractor arrives at the t-th token in the sentence, it observes information from the environment and its previous action a_{t-1} as its current state s_t; after the extractor commits a current action a_t and moves to the next token, it has a new state s_{t+1}. The information from the environment is the token's context embedding v_t, which is usually acquired from Bi-LSTM (Hochreiter and Schmidhuber, 1997) outputs; the previous action a_{t-1} may impose some constraint on the current action a_t, e.g., I-ORG does not follow B-PER.² With the aforementioned notations, we have

    s_t = <v_t, a_{t-1}>.    (1)

² In this work, we use BIO tags, e.g., “B-Meet” indicates that the token is the beginning of a Meet trigger, “I-ORG” means that the token is inside an organization phrase, and “O” denotes null.
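As a concrete reading of Equation 1, the sketch below assembles the per-token state from the Bi-LSTM context embedding v_t and a learned embedding of the previous action a_{t-1}. The class name and dimension choices are our assumptions for illustration.

```python
import torch
import torch.nn as nn

class StateBuilder(nn.Module):
    """Builds s_t = <v_t, a_{t-1}>: the token's Bi-LSTM context embedding
    v_t concatenated with an embedding of the previous action a_{t-1}."""
    def __init__(self, vocab_size: int, emb_dim: int, hidden_dim: int, num_actions: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim,
                              bidirectional=True, batch_first=True)
        self.action_embed = nn.Embedding(num_actions, hidden_dim)

    def forward(self, token_ids: torch.Tensor, prev_actions: torch.Tensor) -> torch.Tensor:
        v, _ = self.bilstm(self.embed(token_ids))   # v_t per token:    (B, T, 2*hidden)
        a_prev = self.action_embed(prev_actions)    # embedded a_{t-1}: (B, T, hidden)
        return torch.cat([v, a_prev], dim=-1)       # s_t per token:    (B, T, 3*hidden)
```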
To determine the current action a_t, we generate a series of Q-tables with

    Q_sl(s_t, a_t) = f_sl(s_t | s_{t-1}, s_{t-2}, ..., a_{t-1}, a_{t-2}, ...),    (2)

where f_sl(·) denotes a function that determines the Q-values using the current state as well as previous states and actions. Then we achieve

    â_t = argmax_{a_t} Q_sl(s_t, a_t).    (3)
Equations 2 and 3 suggest that an RNN-based framework which consumes the current input along with previous inputs and outputs can be adopted; we use a unidirectional LSTM as in (Bakker, 2002). The full pipeline is illustrated in Figure 2.
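Reusing StateBuilder from the sketch above, a minimal Q-network for Equations 2 and 3 might look as follows: the unidirectional LSTM plays the role of f_sl(·) by summarizing the current and previous states, and a greedy argmax implements Equation 3. Layer sizes and names remain illustrative assumptions.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Eq. (2)-(3): a unidirectional LSTM consumes the state sequence, so its
    hidden state summarizes s_t together with previous states and actions,
    and a linear head emits Q_sl(s_t, a) for every candidate BIO label a."""
    def __init__(self, state_dim: int, hidden_dim: int, num_actions: int):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        self.q_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        h, _ = self.lstm(states)   # (B, T, hidden)
        return self.q_head(h)      # (B, T, num_actions)

# Greedy action selection (Eq. 3):
#   q = q_net(states)             # Q-values for every token and label
#   actions = q.argmax(dim=-1)    # â_t = argmax_a Q_sl(s_t, a)
```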
For each label (action a_t) with regard to s_t, a reward r_t = r(s_t, a_t) is assigned to the extractor (agent). We use Q-learning to pursue the optimal sequence labeling model (policy π) by maximizing the expected value of the sum of future rewards E(R_t), where R_t represents the sum of discounted future rewards r_t + γ r_{t+1} + γ² r_{t+2} + ... with a discount factor γ, which determines the influence between current and next states.
We utilize the Bellman equation to update the Q-value with regard to the currently assigned label so as to approximate an optimal model (policy π*):

    Q^{π*}_sl(s_t, a_t) = r_t + γ max_{a_{t+1}} Q_sl(s_{t+1}, a_{t+1}).    (4)
As illustrated in Figure 3, when the extractor assigns a wrong label to the “death” token because the Q-value of Die ranks first, Equation 4 penalizes the Q-value with regard to the wrong label; in later epochs, if the extractor commits the correct label Execute, the Q-value is boosted and the decision is reinforced.
We minimize the loss in terms of the mean squared error between the original Q-values and the updated Q-values, notated as Q'_sl(s_t, a_t):

    L_sl = (1/n) Σ_t^n Σ_a (Q'_sl(s_t, a_t) − Q_sl(s_t, a_t))²    (5)

and apply back-propagation to optimize the parameters in the neural network.
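Putting Equations 4 and 5 together, one training step for the sequence labeler could be sketched as below, reusing QNetwork and bellman_targets from the earlier sketches. Since the updated Q-table differs from the original only at the taken actions, the double sum in Equation 5 reduces to a mean squared error over those entries; the optimizer and batching choices are assumptions.

```python
import torch

def q_learning_step(q_net, optimizer, states, actions, rewards, gamma=0.9):
    """One update: compute Q-values, build Bellman targets (Eq. 4), and
    minimize the squared error between them (Eq. 5) for the taken actions.
    states: (T, state_dim); actions: (T,) long; rewards: (T,)."""
    q = q_net(states.unsqueeze(0)).squeeze(0)                  # (T, num_actions)
    q_taken = q.gather(-1, actions.unsqueeze(-1)).squeeze(-1)  # Q_sl(s_t, a_t)
    with torch.no_grad():
        targets = bellman_targets(q, rewards, gamma)           # Q'_sl(s_t, a_t)
    loss = ((targets - q_taken) ** 2).mean()                   # Eq. (5)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```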