[Author's code] Adversarial Multi-task Learning for Text Classification (ACL 2017)

54 files in total

unlabel: 16

train: 16

test: 16

Deep learning

Natural language processing

Multi-task learning

Sentiment classification

Uploaded 2018-06-12 21:34:58, 15.51MB ZIP


论文代码及数据集.zip (paper code and datasets, 54 files)

论文代码及数据集

.DS_Store 8KB

mtl-dataset

kitchen_housewares.task.train 745KB

baby.task.test 215KB

books.task.test 344KB

imdb.task.train 2.15MB

apparel.task.unlabel 559KB

imdb.task.test 475KB

magazines.task.test 257KB

toys_games.task.test 190KB

dvd.task.test 393KB

MR.task.unlabel 225KB

camera_photo.task.unlabel 1.41MB

sports_outdoors.task.test 212KB

kitchen_housewares.task.unlabel 939KB

kitchen_housewares.task.test 189KB

books.task.unlabel 1.73MB

MR.task.test 46KB

health_personal_care.task.unlabel 815KB

music.task.unlabel 1.53MB

toys_games.task.train 767KB

software.task.train 1.08MB

health_personal_care.task.test 169KB

dvd.task.train 1.51MB

imdb.task.unlabel 2.73MB

apparel.task.train 507KB

sports_outdoors.task.unlabel 959KB

toys_games.task.unlabel 901KB

electronics.task.train 897KB

sports_outdoors.task.train 808KB

MR.task.train 187KB

baby.task.unlabel 1.01MB

video.task.unlabel 1.83MB

baby.task.train 830KB

magazines.task.unlabel 1.33MB

electronics.task.unlabel 1.01MB

camera_photo.task.train 1.03MB

software.task.test 264KB

video.task.train 1.24MB

music.task.test 269KB

music.task.train 1.12MB

software.task.unlabel 323KB

health_personal_care.task.train 719KB

books.task.train 1.45MB

magazines.task.train 1003KB

apparel.task.test 124KB

camera_photo.task.test 253KB

video.task.test 333KB

dvd.task.unlabel 1.87MB

electronics.task.test 218KB

adv-mtl(代码）

data.py 9KB

utils.py 6KB

run.sh 82B

mtl-lstm.py 53KB

Adversarial Multi-task Learning for Text Classification.pdf 944KB
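The data files in the listing above follow a `<task>.task.<split>` naming pattern. As a minimal sketch, assuming only that pattern (the helper name `group_by_split` is hypothetical and is not part of the authors' code), the file names can be grouped by split to check that each split covers the same tasks:

```python
from collections import defaultdict

def group_by_split(filenames):
    """Group '<task>.task.<split>' file names into {split: sorted task names}."""
    groups = defaultdict(list)
    for name in filenames:
        task, sep, split = name.rpartition(".task.")
        if not sep:
            continue  # skip files that do not follow the naming pattern
        groups[split].append(task)
    return {split: sorted(tasks) for split, tasks in groups.items()}

# A handful of names taken from the listing, plus one non-data file.
names = ["baby.task.train", "baby.task.test", "baby.task.unlabel",
         "MR.task.train", "imdb.task.test", "run.sh"]
splits = group_by_split(names)
print(splits["train"])
```

Run over the full `mtl-dataset` directory, each of the three splits would be expected to list 16 tasks, matching the counts given at the top of the page.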

Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1–10

Vancouver, Canada, July 30 - August 4, 2017.

2017 Association for Computational Linguistics

https://doi.org/10.18653/v1/P17-1001


Adversarial Multi-task Learning for Text Classification

Pengfei Liu Xipeng Qiu Xuanjing Huang

Shanghai Key Laboratory of Intelligent Information Processing, Fudan University

School of Computer Science, Fudan University

825 Zhangheng Road, Shanghai, China

{pfliu14,xpqiu,xjhuang}@fudan.edu.cn

Abstract

Neural network models have shown promise for multi-task learning, which focuses on learning shared layers to extract common, task-invariant features. However, in most existing approaches, the extracted shared features are prone to contamination by task-specific features or by noise from other tasks. In this paper, we propose an adversarial multi-task learning framework that keeps the shared and private latent feature spaces from interfering with each other. We conduct extensive experiments on 16 different text classification tasks, which demonstrate the benefits of our approach. In addition, we show that the shared knowledge learned by our proposed model can be regarded as off-the-shelf knowledge and easily transferred to new tasks. The datasets of all 16 tasks are publicly available at http://nlp.fudan.edu.cn/data/

1 Introduction

Multi-task learning is an effective approach to improve the performance of a single task with the help of other related tasks. Recently, neural-based models for multi-task learning have become very popular, ranging from computer vision (Misra et al., 2016; Zhang et al., 2014) to natural language processing (Collobert and Weston, 2008; Luong et al., 2015), since they provide a convenient way of combining information from multiple tasks.

However, most existing work on multi-task learning (Liu et al., 2016c,b) attempts to divide the features of different tasks into private and shared spaces, merely based on whether parameters of some components should be shared. As shown in Figure 1(a), the general shared-private model introduces two feature spaces for any task: one is used to store task-dependent features, the other is used to capture shared features. The major limitation of this framework is that the shared feature space could contain some unnecessary task-specific features, while some sharable features could also be mixed into the private space, so the model suffers from feature redundancy.

Figure 1: Two sharing schemes for task A and task B: (a) Shared-Private Model; (b) Adversarial Shared-Private Model. The overlap between two black circles denotes shared space. The blue triangles and boxes represent the task-specific features while the red circles denote the features which can be shared.

Take the following two sentences as examples, extracted from two different sentiment classification tasks: movie reviews and baby product reviews.

The infantile cart is simple and easy to use.

This kind of humour is infantile and boring.

The word “infantile” indicates negative sentiment in the Movie task while it is neutral in the Baby task. However, the general shared-private model could place the task-specific word “infantile” in the shared space, leaving potential hazards for other tasks. Additionally, the capacity of the shared space could be wasted on unnecessary features.

To address this problem, in this paper we propose an adversarial multi-task framework, in which the shared and private feature spaces are inherently disjoint by introducing orthogonality constraints. Specifically, we design a generic shared-private learning framework to model the text sequence. To prevent the shared and private latent feature spaces from interfering with each other, we introduce two strategies: adversarial training and orthogonality constraints. The adversarial training is used to ensure that the shared feature space simply contains common and task-invariant information, while the orthogonality constraint is used to eliminate redundant features from the private and shared spaces.

The contributions of this paper can be summarized as follows.

1. The proposed model divides the task-specific and shared spaces in a more precise way, rather than roughly sharing parameters.

2. We extend the original binary adversarial training to multi-class, which not only enables multiple tasks to be jointly trained, but also allows us to utilize unlabeled data.

3. We can condense the shared knowledge among multiple tasks into an off-the-shelf neural layer, which can be easily transferred to new tasks.

2 Recurrent Models for Text Classification

There are many neural sentence models that can be used for text modelling, including recurrent neural networks (Sutskever et al., 2014; Chung et al., 2014; Liu et al., 2015a), convolutional neural networks (Collobert et al., 2011; Kalchbrenner et al., 2014), and recursive neural networks (Socher et al., 2013). Here we adopt a recurrent neural network with long short-term memory (LSTM) due to its superior performance in various NLP tasks (Liu et al., 2016a; Lin et al., 2017).

Long Short-term Memory The long short-term memory network (LSTM) (Hochreiter and Schmidhuber, 1997) is a type of recurrent neural network (RNN) (Elman, 1990) that specifically addresses the issue of learning long-term dependencies. While there are numerous LSTM variants, here we use the LSTM architecture used by Jozefowicz et al. (2015), which is similar to the architecture of Graves (2013) but without peephole connections.

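We define the LSTM units at each time step $t$ to be a collection of vectors in $\mathbb{R}^d$: an input gate $\mathbf{i}_t$, a forget gate $\mathbf{f}_t$, an output gate $\mathbf{o}_t$, a memory cell $\mathbf{c}_t$ and a hidden state $\mathbf{h}_t$. $d$ is the number of the LSTM units. The elements of the gating vectors $\mathbf{i}_t$, $\mathbf{f}_t$ and $\mathbf{o}_t$ are in $[0, 1]$.

The LSTM is precisely specified as follows.

$$\begin{bmatrix} \tilde{\mathbf{c}}_t \\ \mathbf{o}_t \\ \mathbf{i}_t \\ \mathbf{f}_t \end{bmatrix} = \begin{bmatrix} \tanh \\ \sigma \\ \sigma \\ \sigma \end{bmatrix} \left( \mathbf{W}_p \begin{bmatrix} \mathbf{x}_t \\ \mathbf{h}_{t-1} \end{bmatrix} + \mathbf{b}_p \right), \tag{1}$$

$$\mathbf{c}_t = \tilde{\mathbf{c}}_t \odot \mathbf{i}_t + \mathbf{c}_{t-1} \odot \mathbf{f}_t, \tag{2}$$

$$\mathbf{h}_t = \mathbf{o}_t \odot \tanh(\mathbf{c}_t), \tag{3}$$

where $\mathbf{x}_t \in \mathbb{R}^e$ is the input at the current time step; $\mathbf{W}_p \in \mathbb{R}^{4d \times (d+e)}$ and $\mathbf{b}_p \in \mathbb{R}^{4d}$ are parameters of the affine transformation; $\sigma$ denotes the logistic sigmoid function and $\odot$ denotes elementwise multiplication.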

The update of each LSTM unit can be written precisely as follows:

$$\mathbf{h}_t = \mathrm{LSTM}(\mathbf{h}_{t-1}, \mathbf{x}_t, \theta_p). \tag{4}$$

Here, the function $\mathrm{LSTM}(\cdot, \cdot, \cdot)$ is a shorthand for Eq. (1-3), and $\theta_p$ represents all the parameters of the LSTM.
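The update in Eq. (1)-(3) can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' `mtl-lstm.py`: the function name `lstm_step`, the use of nested lists, and the row ordering of the weight matrix (candidate cell, output gate, input gate, forget gate) are assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM update in the spirit of Eq. (1)-(3).

    W has 4*d rows and (e + d) columns; b has 4*d entries.
    Assumed row order: candidate cell, output gate, input gate, forget gate.
    """
    d = len(h_prev)
    z_in = list(x_t) + list(h_prev)               # concatenated [x_t; h_{t-1}]
    pre = [sum(w * v for w, v in zip(row, z_in)) + b_k
           for row, b_k in zip(W, b)]             # affine transformation
    c_tilde = [math.tanh(p) for p in pre[0:d]]    # candidate cell values
    o = [sigmoid(p) for p in pre[d:2 * d]]        # output gate
    i = [sigmoid(p) for p in pre[2 * d:3 * d]]    # input gate
    f = [sigmoid(p) for p in pre[3 * d:4 * d]]    # forget gate
    # Eq. (2): mix the candidate with the previous cell state.
    c_t = [ct * it + cp * ft for ct, it, cp, ft in zip(c_tilde, i, c_prev, f)]
    # Eq. (3): gate the squashed cell state to get the hidden state.
    h_t = [ot * math.tanh(c) for ot, c in zip(o, c_t)]
    return h_t, c_t

# Tiny example with e = 2 input dimensions and d = 1 LSTM unit.
W = [[0.0, 0.0, 0.0] for _ in range(4)]
b = [0.0, 0.0, 0.0, 0.0]
h_t, c_t = lstm_step([1.0, -1.0], [0.0], [2.0], W, b)
```

With all-zero parameters every gate equals 0.5 and the candidate is 0, so the new cell state is simply half of the previous one; this makes the example easy to check by hand.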

Text Classification with LSTM Given a text sequence $x = \{x_1, \cdots, x_T\}$, we first use a lookup layer to get the vector representation (embedding) $\mathbf{x}_i$ of each word $x_i$. The output at the last moment $\mathbf{h}_T$ can be regarded as the representation of the whole sequence, which is passed to a fully connected layer followed by a softmax non-linear layer that predicts the probability distribution over classes.

$$\hat{\mathbf{y}} = \mathrm{softmax}(\mathbf{W} \mathbf{h}_T + \mathbf{b}) \tag{5}$$

where $\hat{\mathbf{y}}$ is the vector of prediction probabilities, $\mathbf{W}$ is the weight which needs to be learned, and $\mathbf{b}$ is a bias term.

Given a corpus with $N$ training samples $(x_i, y_i)$, the parameters of the network are trained to minimise the cross-entropy of the predicted and true distributions.

$$L(\hat{y}, y) = -\sum_{i=1}^{N} \sum_{j=1}^{C} y_i^j \log(\hat{y}_i^j), \tag{6}$$

where $y_i^j$ is the ground-truth label; $\hat{y}_i^j$ is the prediction probability, and $C$ is the class number.
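The output layer of Eq. (5) and the loss of Eq. (6) can be sketched as follows. This is a minimal sketch that assumes one-hot ground-truth labels and plain Python lists; the names `predict` and `cross_entropy` are hypothetical and chosen for the example.

```python
import math

def softmax(scores):
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(h_T, W, b):
    """Eq. (5): class probabilities from the final hidden state h_T."""
    logits = [sum(w * h for w, h in zip(row, h_T)) + b_j
              for row, b_j in zip(W, b)]
    return softmax(logits)

def cross_entropy(y_hat_batch, y_batch):
    """Eq. (6): cross-entropy summed over samples i and classes j."""
    return -sum(y * math.log(p)
                for y_hat, y_true in zip(y_hat_batch, y_batch)
                for p, y in zip(y_hat, y_true)
                if y > 0)                  # zero-label terms contribute nothing

# Two classes, a 2-dimensional hidden state, and all-zero parameters.
probs = predict([0.3, -0.2], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
loss = cross_entropy([probs], [[1, 0]])
```

With zero weights the prediction is uniform over the two classes, so the loss for a single sample equals log 2.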

Figure 2: Two architectures for learning multiple tasks: (a) Fully Shared Model (FS-MTL); (b) Shared-Private Model (SP-MTL). Yellow and gray boxes represent shared and private LSTM layers respectively.

3 Multi-task Learning for Text Classification

The goal of multi-task learning is to utilize the correlation among related tasks to improve classification by learning the tasks in parallel. To facilitate this, we first explain the notation used in this paper. Formally, we refer to $D_k$ as a dataset with $N_k$ samples for task $k$. Specifically,

$$D_k = \{(x_i^k, y_i^k)\}_{i=1}^{N_k} \tag{7}$$

where $x_i^k$ and $y_i^k$ denote a sentence and the corresponding label for task $k$.

3.1 Two Sharing Schemes for Sentence Modeling

The key factor of multi-task learning is the sharing scheme in the latent feature space. In a neural network based model, the latent features can be regarded as the states of hidden neurons. For text classification specifically, the latent features are the hidden states of the LSTM at the end of a sentence. The sharing schemes therefore differ in how they group the shared features. Here, we first introduce two sharing schemes for multi-task learning: the fully-shared scheme and the shared-private scheme.

Fully-Shared Model (FS-MTL) In the fully-shared model, we use a single shared LSTM layer to extract features for all the tasks. For example, given two tasks m and n, it takes the view that the features of task m can be totally shared by task n and vice versa. This model ignores the fact that some features are task-dependent. Figure 2a illustrates the fully-shared model.

Shared-Private Model (SP-MTL) As shown in Figure 2b, the shared-private model introduces two feature spaces for each task: one is used to store task-dependent features, the other is used to capture task-invariant features. Accordingly, each task is assigned a private LSTM layer and a shared LSTM layer. Formally, for any sentence in task $k$, we can compute its shared representation $\mathbf{s}_t^k$ and task-specific representation $\mathbf{h}_t^k$ as follows:

$$\mathbf{s}_t^k = \mathrm{LSTM}(x_t, \mathbf{s}_{t-1}^k, \theta_s), \tag{8}$$

$$\mathbf{h}_t^k = \mathrm{LSTM}(x_t, \mathbf{h}_{t-1}^k, \theta_k) \tag{9}$$

where $\mathrm{LSTM}(\cdot, \theta)$ is defined as Eq. (4).

The final features are the concatenation of the features from the private space and the shared space.
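The shared-private scheme can be illustrated with a small sketch in which one encoder object is reused by every task while each task also owns a private encoder. To keep the example short, `ToyEncoder` is a hypothetical one-parameter tanh recurrence standing in for the LSTM layers of Eq. (8)-(9); it is not an LSTM, and the task names and weights are made up for illustration.

```python
import math

class ToyEncoder:
    """Stand-in for an LSTM layer: a one-parameter tanh recurrence (NOT an LSTM)."""

    def __init__(self, weight, dim):
        self.weight = weight
        self.dim = dim

    def __call__(self, inputs):
        state = [0.0] * self.dim
        for x_t in inputs:                        # inputs: one float per time step
            state = [math.tanh(self.weight * (x_t + s)) for s in state]
        return state                              # final state summarizes the sequence

shared = ToyEncoder(weight=0.5, dim=2)            # one encoder reused by every task
private = {"books": ToyEncoder(0.9, 3),           # one private encoder per task
           "dvd": ToyEncoder(-0.3, 3)}

def features(inputs, task):
    s = shared(inputs)                            # shared representation, cf. Eq. (8)
    h = private[task](inputs)                     # task-specific representation, cf. Eq. (9)
    return h + s                                  # concatenate private and shared features

feat_books = features([0.1, -0.4, 0.7], "books")
feat_dvd = features([0.1, -0.4, 0.7], "dvd")
```

For the same input, the last two entries (the shared part) are identical across tasks, while the first three (the private part) differ. Nothing in this structure alone stops task-specific information from leaking into the shared encoder.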

3.2 Task-Specific Output Layer

For a sentence in task $k$, its feature $\mathbf{h}^{(k)}$, emitted by the deep multi-task architectures, is ultimately fed into the corresponding task-specific softmax layer for classification or other tasks.

The parameters of the network are trained to minimise the cross-entropy of the predicted and true distributions on all the tasks. The loss $L_{Task}$ can be computed as:

$$L_{Task} = \sum_{k=1}^{K} \alpha_k L(\hat{y}^{(k)}, y^{(k)}) \tag{10}$$

where $\alpha_k$ is the weight for task $k$, and $L(\hat{y}, y)$ is defined as in Eq. (6).
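The weighted combination in Eq. (10) amounts to a weighted sum of per-task losses. In the sketch below, the helper name `multitask_loss`, the loss values, and the choice of equal weights are all assumptions made for illustration; they are not results or settings from the paper.

```python
def multitask_loss(task_losses, task_weights):
    """Eq. (10): weighted sum of per-task cross-entropy losses."""
    missing = set(task_losses) - set(task_weights)
    if missing:
        raise KeyError(f"no weight for tasks: {sorted(missing)}")
    return sum(task_weights[k] * loss for k, loss in task_losses.items())

losses = {"books": 0.62, "dvd": 0.48, "MR": 0.71}   # made-up illustrative values
weights = {task: 1.0 for task in losses}            # assumption: equal task weights
total = multitask_loss(losses, weights)
halved = multitask_loss(losses, {task: 0.5 for task in losses})
```

Raising an error for a task without a weight is a design choice of this sketch: silently treating a missing weight as zero would drop that task from training without warning.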

4 Incorporating Adversarial Training

Although the shared-private model separates the feature space into shared and private spaces, there is no guarantee that sharable features cannot exist in the private feature space, or vice versa. Thus, some useful sharable features could be ignored in the shared-private model, and the shared feature space is also vulnerable to contamination by task-specific information.

Therefore, a simple principle can be applied to multi-task learning: a good shared feature space should contain more common information and no task-specific information. To address this problem, we introduce adversarial training into the multi-task framework as shown in Figure 3 (ASP-MTL).
