Quad-tuple PLSA: Incorporating Entity and Its Rating in Aspect Identification
Wenjuan Luo¹,², Fuzhen Zhuang¹, Qing He¹, and Zhongzhi Shi¹
¹ The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
² Graduate University of Chinese Academy of Sciences, Beijing 100039, China
{luowj,zhuangfz,heq,shizz}@ics.ict.ac.cn
Abstract. With the explosion of opinions on the Web, there is growing research interest in opinion mining. In this study we focus on an important problem in opinion mining, Aspect Identification (AI), which aims to extract aspect terms from entity reviews. Previous PLSA-based AI methods exploit 2-tuples (e.g. the co-occurrence of head and modifier), where each latent topic corresponds to an aspect. Here, we notice that each review is also accompanied by an entity and its overall rating, which yields quad-tuples when joined with the previously mentioned 2-tuples. Believing that the quad-tuples contain more co-occurrence information and thus provide more power to differentiate topics, we propose the Quad-tuple PLSA model, which incorporates two more items, the entity and its rating, into topic modeling for more accurate aspect identification. Experiments on different numbers of hotel and restaurant reviews show consistent and significant improvements of the proposed model over the 2-tuple PLSA based methods.
Keywords: Quad-tuple PLSA, Aspect Identification, Opinion Mining.
1 Introduction
With Web 2.0 technology encouraging more and more people to comment online, recent years have witnessed an explosion of opinions on the Web. As user comments accumulate on a large scale, it becomes challenging for both merchants and customers to analyze the opinions and make further decisions. As a result, opinion mining, which aims at determining the sentiments of opinions, has become a hot research topic.
Additionally, beyond a simple overall evaluation and summary, both customers and merchants are becoming increasingly concerned with certain aspects of the entities. Take a set of restaurant reviews as an example. Common restaurant aspects include “food”, “service”, “value” and so on. Some guests may be interested in the “food” aspect, while others may care more about the “value” or “service” aspect. To meet these personalized demands, we need to decompose the opinions into different aspects for better understanding or comparison.
On the other hand, merchants find it difficult to digest all the customer reviews when they want to know in which aspects they lag behind their competitors. As pointed out in [12], the task of aspect-based summarization consists of two subtasks: the first is Aspect Identification (AI), and the second is sentiment classification and summarization. The study in this paper mainly focuses on the first task, which aims to accurately identify the aspect terms in the reviews for a certain type of entity.
P.-N. Tan et al. (Eds.): PAKDD 2012, Part I, LNAI 7301, pp. 392–404, 2012. © Springer-Verlag Berlin Heidelberg 2012
Hotel: Quality Inn & Suites Downtown    Rating: ★★★★★
Review1: If you are looking for the most elegant hotel, this is not it. If you are looking for the cheapest, this is not it. If you are looking for the best combination of price, location, rooms, and staff, the Quality Inn and Suites is a no brainer. A few blocks from the French Quarter, clean nice rooms, great price, and the staff was awesome. ----by Sabanized

Hotel: L.A. Motel    Rating: ★★★★
Review2: Good motel location and good quality! The front desk was helpful, by the way, the beds could be larger. ----by Jim Porter

Hotel: Hotel Elysee    Rating: ★
Review3: The manager was impatient. Beds were small and dirty. Hot water was not running and the room was smelly. Anyway, it was cheap. ----by Kate Jeniffer

Fig. 1. Sample Reviews
As shown in Figure 1, there are 3 reviews on different hotels, where descriptions of the same aspect are marked in the same color. One recent work in this area argues that it is more sensible to extract aspects at the phrase level rather than the sentence level, since a single sentence may cover different aspects of an entity (as shown in Figure 1, a sentence may contain terms of different colors) [5]. Thus, Lu et al. decompose reviews into phrases in the form of (head, modifier) pairs. A head term usually indicates the aspect, while a modifier term reflects the sentiment towards the aspect. Take the phrase “excellent staff” for example. The head “staff” belongs to the “staff/front desk” aspect, while the modifier “excellent” shows a positive attitude towards it. Utilizing the (head, modifier) pairs, they explore the latent topics embedded in them with aspect priors. In other words, they take these 2-tuples as input and output the latent topics as the identified aspects.
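To make the 2-tuple setting concrete, the following is a minimal sketch of generic symmetric PLSA fitted by EM on (head, modifier) co-occurrence counts, i.e. p(h, m) = Σ_z p(z) p(h|z) p(m|z). It is not the exact model of [5]: the aspect priors mentioned above are omitted, and the pairs, the function name `plsa_2tuple`, and all parameter values are assumptions chosen for illustration.

```python
import random
from collections import Counter

# Hypothetical (head, modifier) pairs, hand-picked for illustration only.
pairs = [
    ("staff", "excellent"), ("staff", "awesome"), ("staff", "helpful"),
    ("room", "clean"), ("room", "nice"), ("room", "smelly"),
    ("bed", "small"), ("bed", "dirty"), ("price", "great"),
]

def plsa_2tuple(pairs, n_topics, n_iter=50, seed=0):
    """Fit p(h, m) = sum_z p(z) p(h|z) p(m|z) by EM on pair counts."""
    rng = random.Random(seed)
    counts = Counter(pairs)
    heads = sorted({h for h, _ in counts})
    mods = sorted({m for _, m in counts})

    def random_dist(keys):
        # Strictly positive random initialisation, normalised to sum to 1.
        weights = [rng.random() + 0.1 for _ in keys]
        total = sum(weights)
        return {k: w / total for k, w in zip(keys, weights)}

    p_z = [1.0 / n_topics] * n_topics
    p_h_z = [random_dist(heads) for _ in range(n_topics)]
    p_m_z = [random_dist(mods) for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z = [0.0] * n_topics
        new_h = [dict.fromkeys(heads, 0.0) for _ in range(n_topics)]
        new_m = [dict.fromkeys(mods, 0.0) for _ in range(n_topics)]
        for (h, m), n in counts.items():
            # E-step: posterior p(z | h, m), weighted by the pair count n.
            joint = [p_z[z] * p_h_z[z][h] * p_m_z[z][m] for z in range(n_topics)]
            norm = sum(joint)
            for z in range(n_topics):
                r = n * joint[z] / norm
                new_z[z] += r
                new_h[z][h] += r
                new_m[z][m] += r
        # M-step: renormalise the expected counts.
        total = sum(new_z)
        for z in range(n_topics):
            p_z[z] = new_z[z] / total
            if new_z[z] > 0:  # leave a dead topic's distributions untouched
                p_h_z[z] = {h: v / new_z[z] for h, v in new_h[z].items()}
                p_m_z[z] = {m: v / new_z[z] for m, v in new_m[z].items()}
    return p_z, p_h_z, p_m_z

if __name__ == "__main__":
    p_z, p_h_z, p_m_z = plsa_2tuple(pairs, n_topics=2)
    for z, dist in enumerate(p_h_z):
        top = sorted(dist, key=dist.get, reverse=True)[:3]
        print(f"topic {z}: p(z)={p_z[z]:.2f}, top heads={top}")
```

Each latent topic z plays the role of an aspect, and the heads with the highest p(h|z) are its candidate aspect terms. On a toy corpus this small the topics depend on the random initialisation, so the printed grouping should not be read as a result.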
In this study, we observe that besides the (head, modifier) pairs, each review is often tied to an entity and its overall rating. As shown in Figure 1, a hotel name and an overall rating are given for each review. Thus, we can construct quad-tuples of
(head, modifier, rating, entity),
which indicates that a phrase consisting of the head and modifier appears in a review of this entity with this rating. For example, the reviews in Figure 1 include the following quad-tuples,
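The construction described above can be sketched as follows. This is only an illustration under stated assumptions: the (head, modifier) pairs are hand-picked from the Figure 1 text rather than produced by the authors' extraction step, the integer ratings are read off the number of rating symbols in Figure 1, and the names `QuadTuple` and `build_quad_tuples` are hypothetical.

```python
from typing import NamedTuple

class QuadTuple(NamedTuple):
    head: str
    modifier: str
    rating: int
    entity: str

# Hypothetical input: one record per review, with pairs assumed to be
# already extracted. The pairs and ratings below are illustrative only.
reviews = [
    {"entity": "Quality Inn & Suites Downtown", "rating": 5,
     "pairs": [("rooms", "clean"), ("price", "great"), ("staff", "awesome")]},
    {"entity": "L.A. Motel", "rating": 4,
     "pairs": [("location", "good"), ("front desk", "helpful")]},
    {"entity": "Hotel Elysee", "rating": 1,
     "pairs": [("manager", "impatient"), ("beds", "small"), ("room", "smelly")]},
]

def build_quad_tuples(reviews):
    """Attach each review's rating and entity to every pair it contains."""
    return [
        QuadTuple(head, modifier, review["rating"], review["entity"])
        for review in reviews
        for head, modifier in review["pairs"]
    ]

if __name__ == "__main__":
    for quad in build_quad_tuples(reviews):
        print(tuple(quad))
```

Every pair inherits the rating and entity of the review it came from, so the same head can appear with different ratings and entities across reviews; this is the extra co-occurrence information the quad-tuples carry.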