Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4699–4709, Florence, Italy, July 28 – August 2, 2019. © 2019 Association for Computational Linguistics
Simple and Effective Text Matching with Richer Alignment Features
Runqi Yang¹, Jianhai Zhang², Xing Gao², Feng Ji², Haiqing Chen²
¹Department of Computer Science and Technology, Nanjing University, China
²Alibaba Group, Hangzhou, China
{tanfan.zjh, gaoxing.gx, zhongxiu.jf, haiqing.chenhq}@alibaba-inc.com
Abstract
In this paper, we present a fast and strong neural approach for general-purpose text matching applications. We explore what is sufficient to build a fast and well-performing text matching model, and propose to keep three key features available for inter-sequence alignment: original point-wise features, previous aligned features, and contextual features, while simplifying all the remaining components. We conduct experiments on four well-studied benchmark datasets across the tasks of natural language inference, paraphrase identification, and answer selection. The performance of our model is on par with the state of the art on all datasets, with far fewer parameters, and its inference speed is at least six times faster than similarly performing models.
1 Introduction
Text matching is a core research area in natural language processing with a long history. In text matching tasks, a model takes two text sequences as input and predicts a category or a scalar value indicating their relationship. A wide range of tasks, including natural language inference (also known as recognizing textual entailment) (Bowman et al., 2015; Khot et al., 2018), paraphrase identification (Wang et al., 2017), answer selection (Yang et al., 2015), and so on, can be seen as specific forms of text matching problems. Research on general-purpose text matching algorithms therefore benefits a large number of related applications.
Deep neural networks are the most popular choices for text matching nowadays. Semantic alignment and comparison of two text sequences are the keys in neural text matching. Many previous deep neural networks contain a single inter-sequence alignment layer. To make full use of this only alignment process, the model has to take rich external syntactic features or hand-designed alignment features as additional inputs of the alignment layer (Chen et al., 2017; Gong et al., 2018), adopt a complicated alignment mechanism (Wang et al., 2017; Tan et al., 2018), or build a vast amount of post-processing layers to analyze the alignment result (Tay et al., 2018b; Gong et al., 2018).
More powerful models can be built with multiple inter-sequence alignment layers. Instead of making a prediction based on the comparison result of a single alignment process, a stacked model with multiple alignment layers maintains its intermediate states and gradually refines its predictions. However, suffering from inefficient propagation of lower-level features and vanishing gradients, these deeper architectures are harder to train. Recent works have come up with ways of connecting stacked building blocks, including dense connections (Tay et al., 2018a; Kim et al., 2018) and recurrent neural networks (Liu et al., 2018), which strengthen the propagation of lower-level features and yield better results than those with a single alignment process.
This paper presents RE2, a fast and strong neural architecture with multiple alignment processes for general-purpose text matching. We question the necessity of many slow components in text matching approaches presented in previous literature, including complicated multi-way alignment mechanisms, heavy distillations of alignment results, external syntactic features, and dense connections between stacked blocks when the model goes deep. These design choices slow down the model considerably and can be replaced by much more lightweight and equally effective ones. Meanwhile, we highlight three key components for an efficient text matching model. These components, which the name RE2 stands for, are previous aligned features (Residual vectors), original point-wise features (Embedding vectors), and contextual features (Encoded vectors). The remaining components can be kept as simple as possible to keep the model fast while still yielding strong performance.
The general architecture of RE2 is illustrated in Figure 1. An embedding layer first embeds discrete tokens. Several same-structured blocks consisting of encoding, alignment, and fusion layers then process the sequences consecutively. These blocks are connected by an augmented version of residual connections (see Section 2.1). A pooling layer aggregates sequential representations into vectors, which are finally processed by a prediction layer to give the final prediction. The implementation of each layer is kept as simple as possible, and the whole model, as a well-organized combination, is powerful and lightweight at the same time.
Our proposed method achieves performance on par with the state of the art on four benchmark datasets across three different tasks, namely SNLI and SciTail for natural language inference, Quora Question Pairs for paraphrase identification, and WikiQA for answer selection. Furthermore, our model has the fewest parameters and the fastest inference speed among all similarly performing models. We also conduct an ablation study to compare with alternative implementations of most components, perform robustness checks to see whether the model is robust to changes of structural hyperparameters, explore what roles the three key features in RE2 play by comparing their occlusion sensitivity, and show the evolution of alignment results in a case study. We release the source code¹ of our experiments for reproducibility and hope to facilitate future research.
2 Our Approach
In this section, we introduce our proposed approach RE2 for text matching. Figure 1 gives an illustration of the overall architecture. Two text sequences are processed symmetrically before the prediction layer, and all parameters except those in the prediction layer are shared between the two sequences. For conciseness, we omit the part for the other sequence in the figure.
In RE2, tokens in each sequence are first embedded by the embedding layer and then processed consecutively by N same-structured blocks with independent parameters (dashed boxes in Figure 1) connected by augmented residual connections.

¹ https://github.com/hitvoice/RE2, under the Apache License 2.0.

Figure 1: An overview of RE2. There are three parts in the input of the alignment and fusion layers: original point-wise features (Embedding vectors, denoted by blank rectangles), previous aligned features (Residual vectors, denoted by rectangles with diagonal stripes), and contextual features (Encoded vectors, denoted by solid rectangles). The architecture on the right is the same as the one on the left, so it is omitted for conciseness.
Inside each block, a sequence encoder first computes contextual features of the sequence (solid rectangles in Figure 1). The input and output of the encoder are concatenated and then fed into an alignment layer to model the alignment and interaction between the two sequences. A fusion layer fuses the input and output of the alignment layer. The output of the fusion layer is considered the output of this block. The output of the last block is sent to the pooling layer and transformed into a fixed-length vector. The prediction layer takes the two vectors as input and predicts the final target. The cross-entropy loss is optimized to train the model in classification tasks.

The implementation of each layer is kept as simple as possible. We use only word embeddings in the embedding layer, without character embeddings or syntactic features. Vanilla multi-layer convolutional networks with same padding (Collobert et al., 2011) are adopted as the encoder. Recurrent networks are slower and do not lead to further improvements, so they are not adopted here. A max-over-time pooling operation (Collobert et al., 2011) is used in the pooling layer.
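Max-over-time pooling simply keeps, for each feature dimension, the maximum value across all sequence positions. The following is a minimal numpy sketch of this operation; the function name and shapes are our own illustration, not taken from the released code:

```python
import numpy as np

def max_over_time_pooling(x):
    """Collapse a sequence of feature vectors into one fixed-length vector
    by taking the maximum of each feature dimension over all positions.

    x: array of shape (seq_len, hidden). Returns shape (hidden,).
    """
    return x.max(axis=0)

# Toy sequence: 3 positions, 4 features each.
seq = np.array([[0.1, 0.9, 0.3, 0.0],
                [0.5, 0.2, 0.8, 0.1],
                [0.4, 0.7, 0.2, 0.6]])
pooled = max_over_time_pooling(seq)
```

The result is independent of sequence length, which is what allows the prediction layer to operate on fixed-length vectors.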
The details of augmented residual connections and
other layers are introduced as follows.
2.1 Augmented Residual Connections
To provide richer features for alignment processes, RE2 adopts an augmented version of residual connections to connect consecutive blocks. For a sequence of length l, we denote the input and output of the n-th block as x^{(n)} = (x_1^{(n)}, x_2^{(n)}, ..., x_l^{(n)}) and o^{(n)} = (o_1^{(n)}, o_2^{(n)}, ..., o_l^{(n)}), respectively. Let o^{(0)} be a sequence of zero vectors. The input of the first block, x^{(1)}, as mentioned before, is the output of the embedding layer (denoted by blank rectangles in Figure 1). The input of the n-th block x^{(n)} (n ≥ 2) is the concatenation of the input of the first block x^{(1)} and the summation of the outputs of the previous two blocks (denoted by rectangles with diagonal stripes in Figure 1):

$$x_i^{(n)} = \left[ x_i^{(1)};\; o_i^{(n-1)} + o_i^{(n-2)} \right], \qquad (1)$$
where [ · ; · ] denotes the concatenation operation. With augmented residual connections, there are three parts in the input of the alignment and fusion layers, namely original point-wise features kept untouched along the way (Embedding vectors), previous aligned features processed and refined by previous blocks (Residual vectors), and contextual features from the encoder layer (Encoded vectors). Each of these three parts plays a complementary role in the text matching process.
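Equation (1) can be sketched in a few lines of numpy. This is a hedged illustration with our own function name and toy shapes; the actual model operates on trained, batched tensors:

```python
import numpy as np

def augmented_residual_input(x1, outputs):
    """Build the input of block n per Eq. (1).

    x1:      (seq_len, emb), the embedding-layer output x^(1).
    outputs: previous block outputs [o^(1), ..., o^(n-1)], each (seq_len, hidden);
             o^(0) is treated as a sequence of zero vectors.
    Returns x^(n) = [x^(1); o^(n-1) + o^(n-2)] per position for n >= 2,
    or x^(1) itself for the first block.
    """
    if not outputs:                       # first block: no residual part yet
        return x1
    o_prev = outputs[-1]
    o_prev2 = outputs[-2] if len(outputs) >= 2 else np.zeros_like(o_prev)
    return np.concatenate([x1, o_prev + o_prev2], axis=-1)

x1 = np.ones((3, 4))                      # toy embeddings: seq_len=3, emb=4
o1 = np.full((3, 5), 2.0)                 # toy output of block 1, hidden=5
o2 = np.full((3, 5), 3.0)                 # toy output of block 2
x3 = augmented_residual_input(x1, [o1, o2])   # input of block 3
```

Note how the embedding part of the input stays untouched across blocks, while the residual part sums the two most recent block outputs.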
2.2 Alignment Layer
A simple form of alignment based on the attention mechanism is used, following Parikh et al. (2016) with minor modifications. The alignment layer, as shown in Figure 1, takes features from the two sequences as input and computes the aligned representations as output. Input from the first sequence of length l_a is denoted as a = (a_1, a_2, ..., a_{l_a}) and input from the second sequence of length l_b is denoted as b = (b_1, b_2, ..., b_{l_b}). The similarity score e_ij between a_i and b_j is computed as the dot product of the projected vectors:

$$e_{ij} = F(a_i)^\top F(b_j). \qquad (2)$$

F is an identity function or a single-layer feed-forward network. The choice is treated as a hyperparameter.
The output vectors a′ and b′ are computed by a weighted summation of the representations of the other sequence, where the weights are the normalized similarity scores between the current position and the corresponding positions in the other sequence:

$$a'_i = \sum_{j=1}^{l_b} \frac{\exp(e_{ij})}{\sum_{k=1}^{l_b} \exp(e_{ik})}\, b_j, \qquad b'_j = \sum_{i=1}^{l_a} \frac{\exp(e_{ij})}{\sum_{k=1}^{l_a} \exp(e_{kj})}\, a_i. \qquad (3)$$
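Equations (2) and (3) amount to a softmax-normalized co-attention in both directions. A minimal numpy sketch, taking F to be the identity function (one of the two choices above); names and toy values are our own:

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)       # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def align(a, b):
    """Eqs. (2)-(3) with F taken as the identity function.

    a: (l_a, d), b: (l_b, d). Each position of one sequence attends over
    the other sequence using softmax-normalized dot-product scores.
    """
    e = a @ b.T                                   # e_ij = a_i . b_j, shape (l_a, l_b)
    a_aligned = softmax(e, axis=1) @ b            # a'_i: weighted sum of the b_j
    b_aligned = softmax(e, axis=0).T @ a          # b'_j: weighted sum of the a_i
    return a_aligned, b_aligned

# With all-zero queries every score is equal, so a' is the mean of b.
a = np.zeros((2, 3))
b = np.array([[1.0, 2.0, 3.0],
              [3.0, 4.0, 5.0]])
a_aligned, b_aligned = align(a, b)
```

Note that one score matrix e serves both directions: rows are normalized to align a against b, and columns to align b against a.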
2.3 Fusion Layer
The fusion layer compares local and aligned representations from three perspectives and then fuses them together. The output of the fusion layer for the first sequence, ā, is computed by

$$\bar{a}^1_i = G_1([a_i; a'_i]), \quad \bar{a}^2_i = G_2([a_i; a_i - a'_i]), \quad \bar{a}^3_i = G_3([a_i; a_i \circ a'_i]), \quad \bar{a}_i = G([\bar{a}^1_i; \bar{a}^2_i; \bar{a}^3_i]), \qquad (4)$$

where G_1, G_2, G_3, and G are single-layer feed-forward networks with independent parameters, and ∘ denotes element-wise multiplication. The subtraction operator highlights the difference between the two vectors, while the multiplication highlights similarity. The formulations for b̄ are similar and omitted here.
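The three-perspective comparison of Eq. (4) can be sketched as follows. This is an illustration only: the feed-forward networks here are randomly initialized stand-ins for the trained G_1, G_2, G_3, and G, and the shapes are our own choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def ff(d_in, d_out):
    """A single-layer feed-forward net, ReLU(x W + b), randomly initialized
    here purely for illustration (in the model these are trained)."""
    W = rng.standard_normal((d_in, d_out)) * 0.1
    b = np.zeros(d_out)
    return lambda x: np.maximum(x @ W + b, 0.0)

def fusion(a, a_aligned, d_hidden=8):
    """Eq. (4): compare local and aligned features from three perspectives
    (concatenation, difference, element-wise product), then fuse them.

    a, a_aligned: (seq_len, d). Returns (seq_len, d_hidden).
    """
    d = a.shape[-1]
    G1, G2, G3 = ff(2 * d, d_hidden), ff(2 * d, d_hidden), ff(2 * d, d_hidden)
    G = ff(3 * d_hidden, d_hidden)
    a1 = G1(np.concatenate([a, a_aligned], axis=-1))          # raw comparison
    a2 = G2(np.concatenate([a, a - a_aligned], axis=-1))      # difference view
    a3 = G3(np.concatenate([a, a * a_aligned], axis=-1))      # similarity view
    return G(np.concatenate([a1, a2, a3], axis=-1))

a = rng.standard_normal((5, 4))
fused = fusion(a, a)        # comparing a toy sequence with itself
```

Each perspective pairs the local vector a_i with a different function of its aligned counterpart, so the fused output carries both agreement and disagreement signals.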
2.4 Prediction Layer
The prediction layer takes the vector representations of the two sequences, v_1 and v_2, from the pooling layers as input and predicts the final target following Mou et al. (2016):

$$\hat{y} = H([v_1; v_2; v_1 - v_2; v_1 \circ v_2]). \qquad (5)$$
H is a multi-layer feed-forward neural network. In a classification task, ŷ ∈ R^C represents the unnormalized predicted scores for all classes, where C is the number of classes. The predicted class is ŷ = argmax_i ŷ_i. In a regression task, ŷ is the predicted scalar value.
In symmetric tasks like paraphrase identification, a symmetric version of the prediction layer is used for better generalization:

$$\hat{y} = H([v_1; v_2; |v_1 - v_2|; v_1 \circ v_2]). \qquad (6)$$
We also provide a simplified version of the prediction layer; which version to use is treated as a hyperparameter. The simplified prediction layer can be expressed as:

$$\hat{y} = H([v_1; v_2]). \qquad (7)$$