zq论文51
Uploaded 2022-08-08 19:34:48
Summary
With the spread of Internet technology, online shopping has gradually become an important
channel for consumption. In Amazon's online marketplace, customers can rate and review products
after purchase, and these ratings and reviews help other customers make purchasing decisions.
First, we screened the indicators in the three provided data sets and retained seven important
indicators related to product review analysis. For the review text, we used a naive Bayes
classifier to perform sentiment analysis. To quantify the reviews and reflect their importance,
we extracted further information from them: from the review text and its relationship with
time, we derived the review length, review density, and total review count indicators.
Second, we use the analytic hierarchy process to establish an SRR evaluation model and compute
a comprehensive score for a single product from 9 related indicators such as star ratings and
reviews. In the SRR evaluation model, these basic indicators are combined into three effective
indicators that reflect product quality: review enthusiasm, review credibility, and product
popularity. Combined with time, they yield a comprehensive evaluation level for the product, so
the SRR model can be used to score a product.
Third, to test whether star ratings lead to more reviews, we ran an analysis of variance on the
star rating indicator and the review density indicator and found a significant relationship
between them: different star ratings affect the number of reviews. Because triggering further
reviews takes time, we also analyzed the correlation between the number of reviews at each star
level and the total number of reviews in the following month. It turns out that 1-star reviews
can trigger more reviews, while 5-star reviews cannot.
Next, we extracted high-frequency words as features and calculated the TF-IDF of 12 words, of
which 8 describe quality and 4 are unrelated words. We then ran a correlation analysis between
their TF-IDF values and star ratings; the results indicate that only the specific quality
descriptors are correlated with star ratings.
Finally, we analyze the sensitivity of the model and discuss its strengths and weaknesses.
Keywords: sentiment analysis, analytic hierarchy process, analysis of variance, TF-IDF
1. Introduction
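1.1 Problem Background
Sunshine Company plans to launch and sell three products online: a microwave oven, a baby
pacifier, and a hair dryer. The online marketplace is highly competitive and full of noisy
information, so extracting useful information from it and building an effective way to process
that information is central to how the company improves its products and plans its development.
Amazon is a well-known online marketplace. Customers can give a product a star rating on its
sales page, from a minimum of 1 to a maximum of 5, and can also express their opinion of the
product in a review.
Reviews store information as text. How to make a computer extract information from that text
and quantify it into data that can be processed is the problem this paper addresses. Beyond
that, we extract indicators from the provided data and build models to help Sunshine Company
gain a favorable position in the competition.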
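1.2 Literature Review
1.3 Terms and Definitions
1.4 Our Work
To determine product quality from customer feedback, we build a model named SRR that analyzes
customer feedback directly or indirectly and derives product quality from it. In the model we
quantify three key indicators: review credibility, review enthusiasm, and product popularity.
We use these three indicators to determine the customer feedback in a single review, then
combine them with time, assign weights, and obtain the feedback across all reviews, from which
we derive product quality. Section 2 states the basic assumptions of the SRR model. Section 4
gives a detailed explanation and calculation of each indicator used in the model. Section 6
provides a comprehensive analysis of the SRR model.
To determine whether a customer's star rating can trigger reviews from other customers, we
additionally compute the monthly review total for each product and run a single-factor
correlation analysis. To determine whether certain words are associated with particular star
ratings, we extract 12 high-frequency words and analyze their correlation with star ratings.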
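The lagged comparison described above (reviews in one month against the total in the following month) can be sketched as follows. This is a minimal illustration that assumes a Pearson correlation coefficient, which the text does not specify; the function names and the monthly numbers are hypothetical and are not taken from the data set.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def lagged_pairs(star_counts, totals):
    """Pair month t's star-level count with month t+1's review total."""
    return star_counts[:-1], totals[1:]

# Made-up monthly numbers for one product, for illustration only.
star_counts = [2, 5, 1, 7, 3, 6]       # reviews at one star level per month
totals = [20, 24, 35, 22, 41, 30]      # all reviews per month

x, y = lagged_pairs(star_counts, totals)
r = pearson(x, y)
print(f"lag-1 correlation: {r:.3f}")
```

Running the same computation once per star level would allow the coefficients for different star levels to be compared.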
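2. Assumptions
1. The more reviews a product has, the more attention it receives.
A product's review count is usually positively correlated with its sales: more reviews mean that
more people have paid attention to it. In an online marketplace, best-selling products also
tend to be placed near the top of product listings, where they attract attention more easily.
2. The older a review, the lower its value.
Sellers keep updating their sales and production strategies, so the older a review is, the more
its description diverges from the current product and the less useful it is as a reference for
customers.
3. Reviews from customers who neither purchased nor tried the product are less credible.
Amazon also lets users who have not purchased a product review it. Customers who neither
purchased nor tried the product are mostly paid fake reviewers hired by sellers or malicious
users from competitors, so their reviews have no real value.
4. The most valuable word is one that appears most frequently in a specific document and least
frequently across all documents.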
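Assumption 4 is the intuition behind TF-IDF. A minimal sketch, assuming the common definition tf = (count in document) / (document length) and idf = log(N / document frequency); the exact variant used in the paper is not stated, and the toy reviews below are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: tf-idf score} dict per document.

    tf  = occurrences of the word in the document / words in the document
    idf = log(number of documents / number of documents containing the word)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores

# Hypothetical toy reviews, not taken from the data set.
reviews = [
    "great dryer great price",
    "dryer broke after a week",
    "the dryer works",
]
scores = tf_idf(reviews)
print(scores[0])
```

A word such as "dryer" that occurs in every document gets an idf of log(1) = 0 and therefore a score of 0, while a word concentrated in one document scores higher.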
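3. Preliminary Data Analysis
3.1 Data Preprocessing
This paper uses multivariate statistical methods. Because the data come from real life, they
contain some error, and the indicators are measured on different scales, so the data must be
preprocessed.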
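(1) Removing irrelevant variables
The tables provide 15 indicators in total. Inspecting the data shows the following: all data
come from the United States; each product_id corresponds uniquely to the product under review,
and different product_id values may belong to the same product_parent, so product_parent alone
is enough to classify products; product_title is the product description; and product_category
is the consumer category. These indicators are of little use for an analysis based on star
ratings and reviews, so in practice we drop the marketplace, customer_id, review_id, product_id,
and product_title indicators.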
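(2) Scale handling
Some indicators in this paper vary over a very wide range. Processing them directly is
computationally expensive and makes it hard to analyze the correlation between indicators and
factors. For convenience, we convert all data to relative values, that is,
$$x_{ij}^{*} = \frac{x_{ij}}{x_{j\max}}$$
where $i$ is the sample index, $j$ is the indicator index, and $x_{j\max}$ is the maximum value
of indicator $j$. For convenience, unless otherwise stated we omit the '*' from here on and
write $x_{ij}$ for the relative data. Relative scaling does not affect the correlations between
indicators, but it makes the data more convenient and intuitive to handle.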
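The relative scaling can be sketched in a few lines. This is an illustration only: the function name and the sample values are hypothetical, and the guard for an all-zero column is an added assumption that the text does not discuss.

```python
def relative_scale(rows):
    """Divide every entry by the maximum of its column (indicator).

    rows[i][j] is the value of indicator j for sample i.
    """
    n_cols = len(rows[0])
    col_max = [max(row[j] for row in rows) for j in range(n_cols)]
    return [
        # Assumption: a column whose maximum is 0 is mapped to 0.0.
        [row[j] / col_max[j] if col_max[j] != 0 else 0.0 for j in range(n_cols)]
        for row in rows
    ]

# Hypothetical sample: columns are (star rating, helpful votes, review length).
data = [
    [5, 10, 40],
    [1, 0, 200],
    [4, 2, 100],
]
scaled = relative_scale(data)
print(scaled)
```

Because each column is divided by a positive constant, correlation coefficients between columns are unchanged, which matches the remark that relative scaling does not affect the relationships between indicators.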
3.2 Quantitative processing of comments
3.2.1. Establishment of comment text indicators
When a customer is deciding on a purchase, reviews from other customers often provide effective
guidance. Because reviews are unstructured data, they are difficult to apply directly in our
model. To extract the information a review contains, we split its text into two aspects: length
and emotion. Length reflects the reliability of a review: in general, the longer the review,
the more authentic the customer's emotional expression. Emotion is the probability that the
review is positive, and represents how much the customer likes the product.
The length is obtained by counting the number of words. For emotion, we use sentiment analysis
to turn the text into structured data. First, we preprocess the text with steps such as word
segmentation and lemmatization, and then select the features that are most relevant to the
sentiment of the review. TextBlob helped us carry out this step automatically. Next, we need a
good classification algorithm. After a series of tests and comparisons, we chose the naive Bayes
classifier, which performs very well when the features are independent.
The basic principle is as follows.
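In outline, a naive Bayes classifier scores each class $c$ for a review with words $w_1, \dots, w_n$ as $P(c)\prod_i P(w_i \mid c)$ and normalizes the scores into probabilities. The following is a minimal pure-Python sketch of that idea, written under our own assumptions rather than reproducing the TextBlob pipeline mentioned above: the class name, the whitespace tokenizer, the Laplace smoothing, the labels "pos" and "neg", and the toy training reviews are all hypothetical.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesSentiment:
    """Multinomial naive Bayes over bag-of-words features (illustrative)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(count / n_docs)      # log prior
            for token in tokens:
                # Laplace (+1) smoothing avoids zero probabilities.
                likelihood = (self.word_counts[label][token] + 1) / (
                    total_words + len(self.vocab)
                )
                score += math.log(likelihood)
            scores[label] = score
        return scores

    def prob_positive(self, text):
        """Posterior probability of the 'pos' class for the given text."""
        scores = self.log_scores(text)
        top = max(scores.values())
        weights = {label: math.exp(s - top) for label, s in scores.items()}
        return weights["pos"] / sum(weights.values())

# Hypothetical toy training set, not taken from the data set.
train_texts = [
    "love it works great",
    "great value love it",
    "broke quickly terrible",
    "terrible waste of money",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
p_good = model.prob_positive("great product love it")
p_bad = model.prob_positive("terrible it broke")
print(p_good, p_bad)
```

The value returned by `prob_positive` plays the role of the emotion indicator described above: the probability that a review is positive.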