NAST: A Non-Autoregressive Generator with Word Alignment
for Unsupervised Text Style Transfer
Fei Huang, Zikai Chen, Chen Henry Wu, Qihan Guo, Xiaoyan Zhu, Minlie Huang*
The CoAI group, DCST; Institute for Artificial Intelligence;
State Key Lab of Intelligent Technology and Systems;
Beijing National Research Center for Information Science and Technology;
Tsinghua University, Beijing 100084, China.
* Corresponding author: Minlie Huang.
Abstract

Autoregressive models have been widely used in unsupervised text style transfer. Despite their success, these models still suffer from the content preservation problem: they usually ignore part of the source sentence and generate some irrelevant words with strong styles. In this paper, we propose a Non-Autoregressive generator for unsupervised text Style Transfer (NAST), which alleviates the problem from two aspects. First, we observe that most words in the transferred sentence can be aligned with related words in the source sentence, so we explicitly model word alignments to suppress irrelevant words. Second, existing models trained with the cycle loss align sentences in two stylistic text spaces, which lacks fine-grained control at the word level. The proposed non-autoregressive generator focuses on the connections between aligned words, which learns the word-level transfer between styles. For experiments, we integrate the proposed generator into two base models and evaluate them on two style transfer tasks. The results show that NAST can significantly improve the overall performance and provide explainable word alignments. Moreover, the non-autoregressive generator achieves over 10x speedups at inference. Our codes are available at https://github.com/thu-coai/NAST.
1 Introduction

Text style transfer aims at changing the text style while preserving the style-irrelevant contents, which has a wide range of applications, e.g., sentiment transfer (Shen et al., 2017), text formalization (Rao and Tetreault, 2018), and author imitation (Jhamtani et al., 2017). Due to the lack of parallel training data, most works focus on unsupervised text style transfer using non-parallel stylistic data.
[Figure 1: Sentiment transfer examples (negative to positive). (a) Existing models without word alignments may generate words irrelevant to the source sentence. (b) An example of word alignments between the source and target sentences. Arrows connect aligned words (identical or relevant), and blue words are not aligned. (c) NAST's generation process. Step 1: generate the index of aligned words; [Mask] is a placeholder for unaligned words. Step 2: generate the transferred sentence non-autoregressively.]

The cycle consistency loss (Zhu et al., 2017), a.k.a. the back-translation loss (Lample et al., 2018, 2019), has been widely adopted by unsupervised text style transfer models (Dai et al., 2019; He et al., 2020; Yi et al., 2020). Specifically, the cycle loss minimizes the reconstruction error for the sentence transferred from style $X$ to style $Y$ and then back to $X$, which aligns the sentences in two stylistic text spaces to achieve the transfer and preserve style-irrelevant contents. The cycle-loss-based models are trained in an end-to-end fashion, and thus can be easily applied to different datasets.
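To make this objective concrete, here is a minimal sketch of the cycle reconstruction loss for one direction. It is our own simplification, not the implementation of any cited model: `generator_xy` and `generator_yx` are hypothetical stand-ins for the two transfer directions, returning generated token ids and vocabulary logits, respectively.

```python
import torch
import torch.nn.functional as F

def cycle_loss_x(generator_xy, generator_yx, x_tokens, pad_id=0):
    """Reconstruction error for the cycle X -> Y -> X.

    x_tokens: [B, N] long tensor of source token ids (style X).
    """
    # Transfer X -> Y. The discrete intermediate sentence blocks
    # gradients in this simple variant; real systems train through it
    # with approximations such as Gumbel-softmax or REINFORCE.
    with torch.no_grad():
        y_hat = generator_xy(x_tokens)            # [B, M] token ids
    # Transfer back Y -> X and score the reconstruction of x_tokens.
    logits = generator_yx(y_hat)                  # [B, N, vocab]
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),      # [B*N, vocab]
        x_tokens.reshape(-1),                     # [B*N]
        ignore_index=pad_id,                      # skip padding positions
    )
```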
Although cycle-loss-based models yield promising results, one of their major failure cases is to replace some part of the source sentence with irrelevant words that have strong styles, as shown in Fig 1(a). This problem degrades content preservation and can be alleviated from two perspectives. First, we observe that most words in the human-written transferred sentence can be aligned with those in the source sentence. As shown in Fig 1(b), we can align "Not" with "Not", "terrible" with "perfect", and leave only a few words unaligned. It shows that humans regard the alignments between words as a key aspect of content preservation, but they are not explicitly modeled by cycle-loss-based models yet. Second, existing models use the cycle loss to align sentences in two stylistic text spaces, which lacks control at the word level. For example, in sentiment transfer, "tasty" should be mapped to "awful" (because they both depict food tastes) but not "expensive". We utilize a non-autoregressive generator to model the word-level transfer, where the transferred words are predicted based on contextual representations of the aligned source words.
In this paper, we propose a Non-Autoregressive generator for unsupervised Style Transfer (NAST), which explicitly models word alignment for better content preservation. Specifically, our generation process is decomposed into two steps: first predicting word alignments conditioned on the source sentence, and then generating the transferred sentence with a non-autoregressive (NAR) decoder. Modeling word alignments directly suppresses the generation of irrelevant words, and the NAR decoder exploits the word-level transfer. NAST can be used to replace the autoregressive generators of existing cycle-loss-based models. In the experiments, we integrate NAST into two base models: StyTrans (Dai et al., 2019) and LatentSeq (He et al., 2020). Results on two benchmark datasets show that NAST steadily improves the overall performance. Compared with autoregressive models, NAST greatly accelerates training and inference and provides better optimization of the cycle loss. Moreover, we observe that NAST learns explainable word alignments. Our contributions are:
• We propose NAST, a Non-Autoregressive generator for unsupervised text Style Transfer. By explicitly modeling word alignments, NAST suppresses irrelevant words and improves content preservation for the cycle-loss-based models. To the best of our knowledge, we are the first to introduce a non-autoregressive generator to an unsupervised generation task.

• Experiments show that incorporating NAST in cycle-loss-based models significantly improves the overall performance and the speed of training and inference. In further analysis, we find that NAST provides better optimization of the cycle loss and learns explainable word alignments.
2 Related Work

Unsupervised Text Style Transfer

We categorize style transfer models into three types. The first type (Shen et al., 2017; Zhao et al., 2018; Yang et al., 2018; John et al., 2019) disentangles the style and content representations, and then combines the content representations with the target style to generate the transferred sentence. However, the disentangled representations are limited in capacity and thus hardly scalable for long sentences (Dai et al., 2019). The second type is the editing-based method (Li et al., 2018; Wu et al., 2019a,b), which edits the source sentence with several discrete operations. The operations are usually trained separately and then constitute a pipeline. These methods are highly explainable, but they usually need to locate and replace the stylistic words, which hardly applies to complex tasks that require changes in sentence structures. Although our two-step generation seems similar to a pipeline, NAST is trained in an end-to-end fashion with the cycle loss. All transferred words in NAST are generated, not copied, which is essentially different from these methods. The third type is based on the cycle loss. Zhang et al. (2018); Lample et al. (2019) introduce the back translation method into style transfer, where the model is directly trained with the cycle loss after a proper initialization. The following works (Dai et al., 2019; Luo et al., 2019; He et al., 2020; Yi et al., 2020) further adopt a style loss to improve the style control.

A recent study (Zhou et al., 2020) explores word-level information for style transfer, which is related to our motivation. However, they focus on word-level style relevance in designing novel objectives, while we focus on modeling word alignments and the non-autoregressive architecture.
Non-Autoregressive Generation

Non-AutoRegressive (NAR) generation was first introduced in machine translation for parallel decoding with low latency (Gu et al., 2018). The NAR generator assumes that each token is generated independently of the others conditioned on the input sentence, which sacrifices generation quality in exchange for inference speed.
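For reference, the two factorizations differ only in what each token is conditioned on; this is the standard contrast in the NAR literature rather than a formula taken from this paper:

$$P_{\mathrm{AR}}(Y|X) = \prod_{i=1}^{M} P(y_i \mid y_{<i}, X), \qquad P_{\mathrm{NAR}}(Y|X) = \prod_{i=1}^{M} P(y_i \mid X),$$

where the NAR factorization allows all $M$ positions to be decoded in parallel.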
Most works on NAR generation focus on improving the generation quality while preserving the speed acceleration in machine translation. Gu et al. (2018) find that the decoder input is critical to the generation quality. Several works (Bao et al., 2019; Ran et al., 2019) improve the decoder input by aligning source words with target words, which utilize a two-step generation process and inspire the design of NAST. To our knowledge, only a few works on NAR generation explore applications other than machine translation (Han et al., 2020; Peng et al., 2020). We are the first to apply NAR generators to an unsupervised text generation task, where they surprisingly outperform autoregressive models in transfer quality besides the acceleration.

[Figure 2: (a) Architecture of NAST transferring X to Y. $S_Y$ is the target style. The Transformer encoder reads the source sentence; the alignment prediction step is non-differentiable (trained with approximated gradients), and the Transformer decoder generates each word $y_i$ of the transferred sentence independently from the aligned sentence. (b) Examples of two alignment prediction strategies. Simple Alignment: each $y_i$ is aligned with $x_i$, e.g., "so far I am not impressed" is transferred to "so far I am very impressed". Learnable Alignment: a network predicts the alignment, where $t_k = 0$ indicates a [Mask] placeholder, e.g., "worst food and will never come back" is aligned as "worst [Mask] food and will come back" and transferred to "most delicious food and will come back".]
3 Methods

In this paper, we formulate the unsupervised text style transfer as follows: for two non-parallel corpora with styles $X$ and $Y$ respectively, the task aims at training a style transfer model $G$. The model learns the transfer of two directions, $X \to Y$ and $Y \to X$, which can be denoted as $P_{G_Y}(Y|X)$ and $P_{G_X}(X|Y)$, respectively.
3.1 NAST

NAST is a non-autoregressive generator based on the observation of the word alignment: in style transfer tasks, most generated words can be aligned with the source words, where each pair of the aligned words is either identical or highly relevant. For simplicity, we only describe $G_Y$, where $G_X$ shares the architecture and parameters except style embeddings. Given the source sentence $X = [x_1, x_2, \cdots, x_N]$, the generation process of NAST is decomposed into two steps: predicting the alignment $T = [t_1, t_2, \cdots, t_M]$, and then generating the transferred sentence $Y = [y_1, y_2, \cdots, y_M]$. When $1 \le t_i \le N$, the generated word $y_i$ is aligned with the source word $x_{t_i}$. Otherwise, $y_i$ is not aligned with any source word, where we set $t_i$ to $0$ and fill $x_{t_i}$ with a [Mask] placeholder. Formally, we regard $T$ as a latent variable, and the generation probability is formulated as

$$P_{G_Y}(Y|X) = \sum_{T} P_{G_Y}(Y|X, T)\, P_{G_Y}(T|X), \quad (1)$$

where $P_{G_Y}(T|X)$ and $P_{G_Y}(Y|X, T)$ are modeled by an alignment predictor and a non-autoregressive decoder, respectively, as shown in Fig 2.
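To make this two-step factorization concrete, the following sketch is our own simplification of the generation procedure; `encoder`, `align_predictor`, `nar_decoder`, and `MASK_ID` are hypothetical stand-ins, not names from the released code:

```python
import torch

MASK_ID = 4  # hypothetical vocabulary id of the [Mask] placeholder

def nast_generate(encoder, align_predictor, nar_decoder,
                  src_tokens, style_y):
    """Two-step NAST generation for one direction (X -> Y).

    src_tokens: [N] long tensor of source token ids.
    """
    # Shared encoding of the source sentence.
    enc = encoder(src_tokens, style_y)             # [N, hidden]

    # Step 1: predict T = [t_1, ..., t_M]; t_i in {0, 1, ..., N},
    # where t_i = 0 means position i is unaligned ([Mask]).
    alignment = align_predictor(enc)               # [M] long tensor

    # Build the aligned sentence: x_{t_i} for aligned positions,
    # [Mask] where t_i = 0 (alignment indices are 1-based).
    aligned = torch.where(
        alignment > 0,
        src_tokens[(alignment - 1).clamp(min=0)],
        torch.full_like(alignment, MASK_ID),
    )

    # Step 2: generate every y_i independently (one parallel pass),
    # conditioned on the aligned sentence and the encoder states.
    logits = nar_decoder(aligned, enc, style_y)    # [M, vocab]
    return logits.argmax(dim=-1)                   # transferred sentence
```

Since Step 2 is a single parallel forward pass rather than a token-by-token loop, this structure is where the over-10x inference speedup reported in the abstract comes from.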
3.1.1 Alignment Predictor

The alignment predictor predicts the target length $M$ and the alignment $T$ conditioned on the source sentence $X$. We utilize a Transformer (Vaswani et al., 2017) to encode the source sentence and then explore two alternative strategies to predict $T$.

Simple Alignment. Simple Alignment assumes that the source and target sentences have the same length, and each generated word $y_i$ is exactly aligned with the source word $x_i$. Formally,

$$P_{G_Y}(T|X) = I[M = N] \prod_{i=1}^{M} I[t_i = i],$$

where $I[\cdot]$ is the indicator function. A similar strategy has been adopted by editing-based methods (Wu et al., 2019b; Helbig et al., 2020), where they simply replace several words in the source sentence. Although this strategy cannot alter the sentence length, it empirically works well on simple tasks, such as sentiment transfer.
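Under this strategy there is nothing to learn: the alignment is the identity, as in this small sketch (the function name is ours, and the example sentence is taken from Fig 2):

```python
def simple_alignment(src_tokens):
    # t_i = i for every position, so M = N and the decoder
    # input is just the source sentence itself.
    return list(range(1, len(src_tokens) + 1)), list(src_tokens)

# simple_alignment(["so", "far", "I", "am", "not", "impressed"])
# -> ([1, 2, 3, 4, 5, 6], ["so", "far", "I", "am", "not", "impressed"])
```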
Learnable Alignment. Inspired by Ran et al. (2019); Bao et al. (2019), we utilize a pointer network (Vinyals et al., 2015) on top of the encoder, which predicts the alignment $T$:

$$P_{G_Y}(T|X) = \prod_{i=1}^{M} P_{G_Y}(t_i \mid X, t_{<i}).$$

The pointer network is essentially an autoregressive generator, but it only generates the alignment $t_i$ pointing to a source word.
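A pointer network scores each source position (plus a slot for [Mask]) with the current decoder state. The sketch below is our approximation of such a predictor, not the paper's exact implementation: it uses a GRU cell for brevity where the paper builds on the Transformer encoder, decodes greedily, and omits length/termination handling.

```python
import torch
import torch.nn as nn

class AlignmentPointer(nn.Module):
    """Autoregressive pointer over source positions (0 = [Mask]).

    A simplified stand-in for the Learnable Alignment predictor: at
    each step it scores the encoder states plus a learned [Mask]
    slot, and the argmax index is taken as t_i.
    """

    def __init__(self, hidden):
        super().__init__()
        self.rnn = nn.GRUCell(hidden, hidden)
        self.mask_slot = nn.Parameter(torch.zeros(1, hidden))

    def forward(self, enc, max_len):
        # enc: [N, hidden] encoder states of the source sentence.
        # Candidate slots: index 0 is [Mask], indices 1..N point to
        # source words, matching t_i in {0, 1, ..., N}.
        slots = torch.cat([self.mask_slot, enc], dim=0)  # [N+1, hidden]
        state = enc.mean(dim=0)          # initial decoder state
        alignment = []
        for _ in range(max_len):
            scores = slots @ state       # [N+1] pointer logits
            t_i = int(scores.argmax())   # greedy choice of t_i
            alignment.append(t_i)
            # Feed the chosen slot back in (the autoregressive step).
            state = self.rnn(slots[t_i].unsqueeze(0),
                             state.unsqueeze(0)).squeeze(0)
        return alignment
```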