Analogies Explained: Towards Understanding Word Embeddings
Carl Allen¹   Timothy Hospedales¹
Abstract
Word embeddings generated by neural network methods such as word2vec (W2V) are well known to exhibit seemingly linear behaviour, e.g. the embeddings of analogy "woman is to queen as man is to king" approximately describe a parallelogram. This property is particularly intriguing since the embeddings are not trained to achieve it. Several explanations have been proposed, but each introduces assumptions that do not hold in practice. We derive a probabilistically grounded definition of paraphrasing that we re-interpret as word transformation, a mathematical description of "w_x is to w_y". From these concepts we prove existence of linear relationships between W2V-type embeddings that underlie the analogical phenomenon, identifying explicit error terms.
1. Introduction
The vector representation, or embedding, of words underpins much of modern machine learning for natural language processing (e.g. Turney & Pantel (2010)). Where, previously, embeddings were generated explicitly from word statistics, neural network methods are now commonly used to generate neural embeddings that are of low dimension relative to the number of words represented, yet achieve impressive performance on downstream tasks (e.g. Turian et al. (2010); Socher et al. (2013)). Of these, word2vec² (W2V) (Mikolov et al., 2013a) and Glove (Pennington et al., 2014) are amongst the best known and on which we focus.
Interestingly, such embeddings exhibit seemingly linear behaviour (Mikolov et al., 2013b; Levy & Goldberg, 2014a), e.g. the respective embeddings of analogies, or word relationships of the form "w_a is to w_a* as w_b is to w_b*", often satisfy w_a* − w_a + w_b ≈ w_b*, where w_i is the embedding of word w_i. This enables analogical questions such as "man is to king as woman is to ..?" to be solved by vector addition and subtraction. Such high order structure is surprising since word embeddings are trained using only pairwise word co-occurrence data extracted from a text corpus.

¹School of Informatics, University of Edinburgh. Correspondence

²Throughout, we refer to the more commonly used Skipgram implementation of W2V with negative sampling (SGNS).

Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).
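The vector-offset procedure described above can be sketched with toy embeddings. This is a minimal illustration, not the paper's method: the 2-d vectors and the `solve_analogy` helper are hand-crafted assumptions chosen so that the parallelogram relation holds exactly, whereas learned embeddings satisfy it only approximately.

```python
import numpy as np

# Hand-crafted 2-d embeddings (dimensions loosely: "royalty", "gender").
# These are illustrative, not learned vectors.
emb = {
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "apple": np.array([-1.0, 0.2]),
}

def solve_analogy(w_a, w_a_star, w_b, embeddings):
    """Return the word w_b* whose embedding has highest cosine similarity
    to w_a* - w_a + w_b, excluding the three query words themselves."""
    target = embeddings[w_a_star] - embeddings[w_a] + embeddings[w_b]
    best, best_sim = None, -np.inf
    for word, vec in embeddings.items():
        if word in (w_a, w_a_star, w_b):
            continue
        sim = vec @ target / (np.linalg.norm(vec) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

print(solve_analogy("man", "king", "woman", emb))  # → queen
```

Excluding the query words from the candidate set mirrors standard practice in analogy evaluation, since the nearest vector to w_a* − w_a + w_b is frequently w_a* itself.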
We first show that where embeddings factorise pointwise mutual information (PMI), it is paraphrasing that determines when a linear combination of embeddings equates to that of another word. We say king paraphrases man and royal, for example, if there is a semantic equivalence between king and {man, royal} combined. We can measure such equivalence with respect to probability distributions over nearby words, in line with Firth's maxim "You shall know a word by the company it keeps" (Firth, 1957). We then show that paraphrasing can be reinterpreted as word transformation with additive parameters (e.g. from man to king by adding royal) and generalise to also allow subtraction. Finally, we prove that by interpreting an analogy "w_a is to w_a* as w_b is to w_b*" as word transformations w_a to w_a* and w_b to w_b* sharing the same parameters, the linear relationship observed between word embeddings of analogies follows (see overview in Fig 4). Our key contributions are:
• to derive a probabilistic definition of paraphrasing and show that it governs the relationship between one (PMI-derived) word embedding and any sum of others;

• to show how paraphrasing can be generalised and interpreted as the transformation from one word to another, giving a mathematical formulation for "w_x is to w_x*";

• to provide the first rigorous proof of the linear relationship between word embeddings of analogies, including explicit, interpretable error terms; and

• to show how these relationships materialise between vectors of PMI values, and so too in word embeddings that factorise the PMI matrix, or approximate such a factorisation e.g. W2V and Glove.
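The PMI matrix referred to in the last contribution can be built directly from co-occurrence counts, and low-dimensional embeddings obtained by factorising it (the connection between SGNS and PMI factorisation is due to Levy & Goldberg, 2014). The sketch below uses a toy corpus, window size, and dimension chosen purely for illustration; it is not the construction used in the paper's proofs.

```python
import numpy as np

# Toy corpus; in practice co-occurrence statistics come from a large corpus.
corpus = "the king rules the land the queen rules the land".split()
window = 2

vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric-window co-occurrence counts C[w, c].
C = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            C[idx[w], idx[corpus[j]]] += 1

total = C.sum()
p_wc = C / total
p_w = C.sum(axis=1) / total  # co-occurrence marginals
# PMI(w, c) = log [ p(w, c) / (p(w) p(c)) ]; unobserved cells left at 0 here.
with np.errstate(divide="ignore"):
    PMI = np.where(C > 0, np.log(p_wc / np.outer(p_w, p_w)), 0.0)

# A rank-d factorisation of the PMI matrix yields word and context
# embeddings whose dot products approximate PMI values.
d = 3
U, S, Vt = np.linalg.svd(PMI)
W = U[:, :d] * np.sqrt(S[:d])    # word embeddings, one row per word
Ctx = Vt[:d].T * np.sqrt(S[:d])  # context embeddings
```

Splitting the singular values evenly between the two factors (`sqrt(S)` on each side) is one common convention; the linear relationships discussed in the paper hold between rows of the PMI matrix itself, and so carry over to any embeddings that (approximately) factorise it.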
2. Previous Work
Intuition for the presence of linear analogical relationships, or linguistic regularity, amongst word embeddings was first suggested by Mikolov et al. (2013a;b) and Pennington et al. (2014), and has been widely discussed since (e.g. Levy & Goldberg (2014a); Linzen (2016)). More recently, several theoretical explanations have been proposed: