word vectors trained from a large corpus of 5.8 million re-
views for similarity comparison. The word vectors are re-
garded as a form of prior knowledge learned from the past
data. For example, “photo” is a synonym of “picture.” Us-
ing the DP method, “picture” is extracted as an aspect from
the sentence “The picture is blurry,” but “photo” is not ex-
tracted from the sentence “The phone is good, but not its
photos.” One reason “photo” is not extracted is that, to ensure good extraction precision and recall, many useful but low-precision rules are not used. The proposed
semantic similarity-based recommendation makes use of the
extracted aspect “picture” to recommend “photo” based on
the semantic similarity of the two words.
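To make this concrete, below is a minimal Python sketch of the similarity-based recommendation step, assuming word vectors are available as a word-to-embedding dictionary and using a cosine-similarity cutoff; the function name sim_recommend and the threshold value are illustrative assumptions, not the paper's actual implementation.

    import numpy as np

    def cosine(u, v):
        # Cosine similarity between two word vectors.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def sim_recommend(base_aspects, candidates, vectors, threshold=0.6):
        # Recommend a candidate term as an aspect if it is semantically
        # close to some already-extracted (high-precision) aspect.
        # vectors: dict mapping a word to its embedding (e.g., word2vec
        # trained on a large review corpus); threshold is a hypothetical
        # cutoff, not the paper's actual value.
        recommended = set()
        for cand in candidates:
            if cand not in vectors:
                continue
            for asp in base_aspects:
                if asp in vectors and cosine(vectors[cand], vectors[asp]) >= threshold:
                    recommended.add(cand)
                    break
        return recommended

    # Toy usage: "photo" is close to the extracted aspect "picture",
    # while "battery" is not.
    vecs = {"picture": np.array([0.9, 0.1, 0.2]),
            "photo":   np.array([0.85, 0.15, 0.25]),
            "battery": np.array([0.1, 0.9, 0.3])}
    print(sim_recommend({"picture"}, {"photo", "battery"}, vecs))  # {'photo'}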
However, “picture” cannot be used to recommend “bat-
tery” as an aspect because their semantic similarity value is
very small. To recommend “battery” (if it is not extracted),
we use the second form of recommendation, i.e., aspect as-
sociations or correlations. The idea is that many aspects are
correlated or co-occur across domains. For example, products with the aspect “picture” are also very likely to have the aspect “battery,” as pictures are usually taken by digital devices that need batteries. If such associations can be discovered, they can be used to recommend additional aspects. For this purpose, we employ association rules from
data mining (Agrawal and Srikant 1994), which fit our needs well. To mine these associations, we use the extraction results from reviews of many other products or domains in a lifelong learning fashion (Chen and Liu 2014).
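The sketch below illustrates this idea with a lightweight, single-antecedent rule miner over the aspect sets extracted from past domains; it is a stand-in for full Apriori-style mining (Agrawal and Srikant 1994), and the support and confidence thresholds are illustrative assumptions.

    from collections import Counter
    from itertools import permutations

    def mine_aspect_rules(domain_aspect_sets, min_support=3, min_conf=0.7):
        # Mine simple one-to-one rules "a => b" from the aspects extracted
        # in many past domains; thresholds here are illustrative only.
        item_count, pair_count = Counter(), Counter()
        for aspects in domain_aspect_sets:
            item_count.update(aspects)
            pair_count.update(permutations(aspects, 2))  # ordered pairs (a, b)
        rules = {}
        for (a, b), n in pair_count.items():
            if n >= min_support and n / item_count[a] >= min_conf:
                rules.setdefault(a, set()).add(b)
        return rules

    def ar_recommend(base_aspects, candidates, rules):
        # Recommend a candidate if some already-extracted aspect implies it.
        return {c for a in base_aspects for c in rules.get(a, ()) if c in candidates}

    # Toy usage: past camera/phone domains share "picture" and "battery".
    past = [{"picture", "battery", "screen"},
            {"picture", "battery", "zoom"},
            {"picture", "battery", "lens"}]
    rules = mine_aspect_rules(past)
    print(ar_recommend({"picture"}, {"battery", "price"}, rules))  # {'battery'}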
In our experiments, we use a popular aspect extraction
evaluation corpus from (Hu and Liu 2004) and a new corpus
from (Liu et al. 2015). To learn word vectors and aspect as-
sociations, we use two large collections of product reviews.
Experimental results show that the two forms of recommendation produce highly reliable aspects, and the approach that employs both outperforms state-of-the-art dependency rule-based methods markedly.
Related Work
There are two main approaches to aspect extraction: su-
pervised and unsupervised. The former is mainly based on
CRF (Jakob and Gurevych 2010; Choi and Cardie 2010;
Mitchell et al. 2013), while the latter is mainly based on
topic modeling (Mei et al. 2007; Titov and McDonald
2008; Li, Huang, and Zhu 2010; Brody and Elhadad 2010;
Wang, Lu, and Zhai 2010; Moghaddam and Ester 2011;
Mukherjee and Liu 2012), and syntactic rules designed us-
ing dependency relations (Zhuang, Jing, and Zhu 2006;
Wang and Wang 2008; Wu et al. 2009; Zhang et al. 2010;
Qiu et al. 2011).
Regarding the supervised approach, CRF-based methods need manually labeled training data, whereas our method is unsupervised. Regarding the unsupervised approach, topic modeling often only gives rough topics rather than precise aspects, as a topical term is not necessarily an aspect. For example, in a battery topic, a topic model may find topical terms such as “battery,” “life,” and “time,” which are related to battery life (Lin and He 2009; Zhao et al. 2010; Jo and Oh 2011; Fang and Huang 2012), but each individual word is not an aspect.
There are also frequency-based methods (Hu and Liu
2004; Popescu and Etzioni 2005; Zhu et al. 2009), word
alignment methods (Liu et al. 2013), label propagation
methods (Zhou, Wan, and Xiao 2013), and other methods.
This paper is most related to the DP method (Qiu et al.
2011), and aims to improve it. Since our method employs
word vectors learned from a large collection of product re-
views, it is also related to (Xu, Liu, and Zhao 2014), which
proposed a joint opinion relation detection method OCDNN.
Although they also used word vectors to represent words in neural network training, they used them as a feature representation for classification. The work (Pavlopoulos and Androutsopoulos 2014) explored word vectors trained on English Wikipedia to compute word similarities for use in a clustering algorithm. However, our work is quite different: we train word vectors using a large review corpus and use them to recommend aspects.
Our work is also related to topic modeling-based methods
in (Chen and Liu 2014; Chen, Mukherjee, and Liu 2014) as
they also used multiple past domains to help aspect extrac-
tion in a lifelong learning fashion. However, like other topic models, they can only find rough topics, whereas we can find more precise aspects with the help of multiple past domains.
In (Liu et al. 2015), a rule selection method is proposed to
improve DP, but it is a supervised method.
Overall Algorithm
This section introduces algorithm AER (Aspect Extraction
based on Recommendation), Algorithm 1, which consists of
two main steps: base extraction and recommendation.
Algorithm 1 AER(D^t, R^-, R^+, O)
Input: Target dataset D^t, high precision aspect extraction rules R^-, high recall aspect extraction rules R^+, seed opinion words O
Output: Extracted aspect set A
1: T^- ← DPextract(D^t, R^-, O);
2: T^+ ← DPextract(D^t, R^+, O);
3: T ← T^+ − T^-;
4: T^s ← Sim-recom(T^-, T);
5: T^a ← AR-recom(T^-, T);
6: A ← T^- ∪ T^s ∪ T^a.
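The following Python sketch mirrors only the data flow of Algorithm 1; the components DPextract, Sim-recom, and AR-recom are passed in as callables, since their internals are described separately, and all sets are assumed to be Python sets.

    def aer(target_docs, rules_hp, rules_hr, seed_opinions,
            dp_extract, sim_recom, ar_recom):
        # Top-level control flow of Algorithm 1 (AER).
        t_minus = dp_extract(target_docs, rules_hp, seed_opinions)  # line 1: high precision
        t_plus = dp_extract(target_docs, rules_hr, seed_opinions)   # line 2: high recall
        t = t_plus - t_minus                                        # line 3: candidate aspects
        t_s = sim_recom(t_minus, t)                                 # line 4: similarity-based
        t_a = ar_recom(t_minus, t)                                  # line 5: association-based
        return t_minus | t_s | t_a                                  # line 6: final aspect set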
Step 1 (base extraction, lines 1-2): Given the target document collection D^t for extraction and a set O of seed opinion words, this step first uses the DP method (DPextract) to extract an initial (or base) set T^- of aspects employing a set R^- of high precision rules (line 1). The high precision rules are selected from the full set of DP rules by evaluating their precisions individually on a development set. The set T^- of extracted aspects thus has very high precision but not high recall. Then, a set T^+ of aspects is extracted using a larger set R^+ of high recall rules (R^- ⊆ R^+), also with DPextract (line 2). The set T^+ of extracted aspects thus has very high recall but not high precision.
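The text above specifies only that R^- is chosen by per-rule precision on a development set; one plausible reading of that procedure is sketched below, where dp_extract_one (applying a single rule in isolation) and the min_precision cutoff are assumptions for illustration, not values from the paper.

    def select_high_precision_rules(rules, dev_docs, gold_aspects,
                                    dp_extract_one, min_precision=0.8):
        # Keep each DP rule whose individual extraction precision on the
        # development set clears the cutoff.
        selected = []
        for rule in rules:
            extracted = dp_extract_one(dev_docs, rule)  # aspects this rule alone extracts
            if not extracted:
                continue
            precision = len(extracted & gold_aspects) / len(extracted)
            if precision >= min_precision:
                selected.append(rule)
        return selected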
Step 2 (recommendation, lines 3-6): This step recommends more aspects using T^- as the base to improve the recall. To ensure recommendation quality, we require that the