transferred sentence can be aligned with those in
the source sentence. As shown in Fig. 1(b), we can
align “Not” with “Not” and “terrible” with “perfect”,
leaving only a few words unaligned. This shows
that humans regard word alignments as a key aspect
of content preservation, yet such alignments are not
explicitly modeled by existing cycle-loss-based models.
Second, existing models use the cycle loss to
align sentences in two stylistic text spaces, which
lacks control at the word level. For example, in
sentiment transfer, “tasty” should be mapped to
“awful” (since both describe food taste) rather than
to “expensive”. We utilize a non-autoregressive
generator to model the word-level transfer, where
the transferred words are predicted based on con-
textual representations of the aligned source words.
In this paper, we propose a Non-Autoregressive
generator for unsupervised text Style Transfer (NAST),
which explicitly models word alignment for better
content preservation. Specifically, our generation
process is decomposed into two steps: first pre-
dicting word alignments conditioned on the source
sentence, and then generating the transferred sen-
tence with a non-autoregressive (NAR) decoder.
Modeling word alignments directly suppresses the
generation of irrelevant words, and the NAR decoder
enables word-level transfer (a minimal sketch of this
two-step process is given at the end of this section). NAST can
be used to replace the autoregressive generators
of existing cycle-loss-based models. In the exper-
iments, we integrate NAST into two base mod-
els: StyTrans (Dai et al., 2019) and LatentSeq (He
et al., 2020). Results on two benchmark datasets
show that NAST consistently improves the overall
performance. Compared with autoregressive models,
NAST greatly accelerates training and inference
and provides better optimization of the cycle loss.
Moreover, we observe that NAST learns explain-
able word alignments. Our contributions are:
• We propose NAST, a Non-Autoregressive generator for unsupervised text Style Transfer. By explicitly modeling word alignments, NAST suppresses irrelevant words and improves content preservation for the cycle-loss-based models. To the best of our knowledge, we are the first to introduce a non-autoregressive generator to an unsupervised generation task.
• Experiments show that incorporating NAST in cycle-loss-based models significantly improves the overall performance and the speed of training and inference. In further analysis, we find that NAST provides better optimization of the cycle loss and learns explainable word alignments.
2 Related Work
Unsupervised Text Style Transfer
We categorize style transfer models into three
types. The first type (Shen et al., 2017; Zhao et al.,
2018; Yang et al., 2018; John et al., 2019) disen-
tangles the style and content representations, and
then combines the content representations with the
target style to generate the transferred sentence.
However, the disentangled representations are limited
in capacity and thus hardly scale to long
sentences (Dai et al., 2019). The second type is
the editing-based method (Li et al., 2018; Wu et al.,
2019a,b), which edits the source sentence with sev-
eral discrete operations. The operations are usually
trained separately and then combined into a pipeline.
These methods are highly explainable, but they
usually need to locate and replace stylistic words,
which hardly applies to complex tasks that require
changes in sentence structure. Although our two-
step generation seems similar to a pipeline, NAST
is trained in an end-to-end fashion with the cycle
loss. All transferred words in NAST are gener-
ated, not copied, which is essentially different from
these methods. The third type is based on the cy-
cle loss. Zhang et al. (2018) and Lample et al. (2019)
introduce back-translation into style
transfer, where the model is directly trained with
the cycle loss after a proper initialization. Subsequent
works (Dai et al., 2019; Luo et al., 2019;
He et al., 2020; Yi et al., 2020) further adopt a style
loss to improve style control.
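For concreteness, this shared training scheme can be roughly formalized as follows (the notation is ours and may differ in detail from the cited works). Let $G_{s \to t}$ denote the generator transferring a sentence from style $s$ to style $t$, and let $D_t$ be a style classifier (or discriminator) for style $t$. Then
\[
\mathcal{L}_{\mathrm{cyc}} = \mathbb{E}_{x \sim \mathcal{D}_s}\!\left[-\log p_{G_{t \to s}}\big(x \mid G_{s \to t}(x)\big)\right],
\qquad
\mathcal{L}_{\mathrm{style}} = \mathbb{E}_{x \sim \mathcal{D}_s}\!\left[-\log D_t\big(G_{s \to t}(x)\big)\right],
\]
and the generators are trained on a weighted sum of the two losses (plus the symmetric terms for the $t \to s$ direction).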
A recent study (Zhou et al., 2020) explores
word-level information for style transfer, which is
related to our motivation. However, they exploit
word-level style relevance to design novel training objectives,
while we focus on modeling word alignments
and the non-autoregressive architecture.
Non-Autoregressive Generation
Non-AutoRegressive (NAR) generation was first
introduced in machine translation for parallel decoding
with low latency (Gu et al., 2018). An
NAR generator assumes that target tokens are generated
independently of each other conditioned on
the input sentence, which sacrifices generation
quality in exchange for inference speed.
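Concretely, in the standard formulation, an autoregressive decoder factorizes the target distribution as
\[
p(y \mid x) = \prod_{t=1}^{T} p(y_t \mid y_{<t}, x),
\]
whereas an NAR decoder assumes conditional independence among target tokens,
\[
p(y \mid x) \approx \prod_{t=1}^{T} p(y_t \mid x, T),
\]
so that all tokens can be predicted in a single parallel pass (with the target length $T$ typically predicted separately).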
Most works on NAR generation focus on improving
the generation quality while preserving the
speedup in machine translation. Gu et al.
(2018) find that the decoder input is critical to the gener-