2020 MCM/ICM Outstanding Papers Collection, C-2002116.pdf (CSDN Library)


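### Key point 1: The Mathematical Contest in Modeling (MCM/ICM)

- **Background**: The MCM/ICM is an international competition for undergraduates that aims to develop mathematical modeling, teamwork, and innovative thinking through solving practical problems. It attracts students from universities around the world.
- **Contest content**: Each team must complete a research report or solution to a given problem within a fixed time. The problems are usually drawn from the real world and span economics, society, the environment, and other fields.
- **Awards**: Awards at several levels are given according to the quality and originality of the solution.

### Key point 2: Text sentiment analysis

- **Definition**: Sentiment analysis is a natural language processing technique that identifies and extracts subjective information from text and determines the author's sentiment (positive, negative, or neutral).
- **Applications**: It is widely used in social media monitoring, product review analysis, and market trend forecasting.
- **Methods**: Established approaches include lexicon-based methods, machine learning methods, and deep learning methods. In recent years, hybrid models that combine several approaches have been favored for their accuracy.
- **CE-VADER model**: CE-VADER is a hybrid model for sentiment analysis of e-commerce reviews. It combines a Contextual Entropy (CE) block with the lexicon-based VADER model and classifies reviews into five groups by sentiment strength.

### Key point 3: Rating system analysis

- **Five-star rating system**: A common way to evaluate products or services, in which users give 1 to 5 stars according to their satisfaction.
- **Combined analysis of ratings and reviews**: Beyond a simple average rating, the text of user reviews can be analyzed together with the ratings to give a fuller picture of a product's strengths and weaknesses.
- **Informative evaluation model**: The authors propose an informative evaluation model that combines the information in ratings and reviews and picks out the most informative ones as a reference for product improvement.

### Key point 4: Product reputation evaluation and prediction

- **Reputation evaluation**: A mathematical model is built to evaluate the reputation of a product or service, for example a "reputation" rate computed from a difference equation model.
- **Time series forecasting**: A time series method (an autoregressive model) is used to forecast the future trend of the reputation, helping decision makers understand a product's long-term prospects.
- **Case study**: The authors analyze three products: pacifiers, microwaves, and hair dryers. Pacifiers have a good reputation and are predicted to succeed, while microwaves and hair dryers have poor reputations and are predicted to fail.

### Conclusion

As the abstract of this award-winning paper shows, the team applied mathematical models and related techniques to analyze product review data from the Amazon platform in depth. They developed a new text sentiment analysis model, CE-VADER, and proposed an informative evaluation model that combines ratings and reviews, together with a method for evaluating and predicting product reputation. These results are a useful reference for understanding consumer behavior, improving product quality, and shaping marketing strategy.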


2020 MCM/ICM Summary Sheet

Problem Chosen: C

Team Control Number: 2002116

Riddle of Sphinx: Cracking the Secret of Amazon’s Ratings and Reviews

Summary

We have witnessed the rise of mass online marketplaces. For example, Amazon, one of the biggest online platforms, is worth around $915 billion. Guided by its customer obsession principle, it lets customers rate products from 1 to 5 stars. Moreover, buyers can submit a text-based message, namely a review, to express their feelings about the products. The massive data in these ratings and reviews offers a wealth of information that remains to be mined. Analysis of text-based messages or of rating-based values has received wide attention, yet no method serves as a combination of both, especially for the case of an online marketplace.

To address this challenge, we propose a novel CE-VADER hybrid model for sentiment analysis of reviews, classifying messages into five groups: strong positive, weak positive, moderate, weak negative, and strong negative. Empirical results indicate that the proposed five-group classification model correlates well with the five-star rating system. We then propose a state-of-the-art informative evaluation model that combines the text-based and rating-based measures. We pick out the 1% most informative reviews and ratings of each product to evaluate its properties and propose sales strategies.
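The five-group classification and the 1% selection described above can be sketched as follows. This is a minimal illustration, not the paper's CE-VADER implementation: it assumes a sentiment score already normalized to [-1, 1], and the thresholds `weak` and `strong`, the function names, and the choice to round the 1% cut-off up to at least one item are all assumptions made here.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def five_class(score: float, weak: float = 0.2, strong: float = 0.6) -> str:
    """Map a sentiment score in [-1, 1] to one of five groups.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    if score >= strong:
        return "strong positive"
    if score >= weak:
        return "weak positive"
    if score > -weak:
        return "moderate"
    if score > -strong:
        return "weak negative"
    return "strong negative"


def top_percent(items: Sequence[T], key: Callable[[T], float],
                percent: int = 1) -> List[T]:
    """Return the top `percent` percent of items ranked by `key`.

    The count is rounded up so that a non-empty input always yields
    at least one item (an assumption of this sketch).
    """
    if not items:
        return []
    k = max(1, (len(items) * percent + 99) // 100)
    return sorted(items, key=key, reverse=True)[:k]
```

With any per-review score in hand, `top_percent(reviews, key=score_fn)` would return the reviews to inspect first; integer arithmetic is used for the cut-off to avoid floating-point rounding surprises.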

We propose the “reputation” rate, based on a difference equation model from the literature, to evaluate the reputation of a product. We then employ an autoregressive (AR) model as the time series forecasting method to predict the future “reputation” rate and the potential success or failure of each product. The AR model shows high accuracy on the validation set, with a maximum Root Mean Square Error (RMSE) of 0.131. Pacifiers have a good reputation and are predicted to be successful, while microwaves and hair dryers have bad reputations and are predicted to fail. The results are related to the proportions of consecutive five-star or one-star rating sequences. Lastly, we analyze specific words and descriptors to find their correlation with the ratings.
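To make the forecasting step concrete, here is a minimal sketch of fitting an autoregressive model and scoring it with RMSE on held-out points. The summary does not state the order of the AR model, so an AR(1) model fitted by ordinary least squares is assumed here purely for illustration; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_ar1(series: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((a - mean_prev) ** 2 for a in prev)
    if sxx == 0:
        raise ValueError("series is constant; phi is not identifiable")
    sxy = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast(last: float, c: float, phi: float, steps: int) -> List[float]:
    """Iterate the fitted recursion `steps` times starting from `last`."""
    out = []
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )
```

A typical use would fit on the early part of a reputation series, call `forecast` from the last training value, and compare the result with the held-out tail using `rmse`.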

Based on our empirical results, we confidently propose several sales strategies and recommendations for the online marketplace, e.g., the timing of introducing products into the market and targeted adjustments according to star ratings. We write a letter to the marketing director of Sunshine Company summarizing our analysis and results, together with our recommendations.

Our framework shows strong accuracy and robustness. It can easily be applied to other data using our source code.

Keywords: Text-Based Measure, Informative Text Selection, Reputation Quantification, Sales Strategy Formation.


Team # 2002116 Page 1 of 38


March 9, 2020

Contents

1 Introduction

2 Assumptions and Notations

2.1 Assumptions

2.2 Notations

3 Informative Evaluation Model

3.1 Vector Encoding Forms of Star Ratings

3.2 Contextual Entropy VADER Hybrid Model for Text-Based Measures

3.2.1 Manually Annotating the Seed Word

3.2.2 Contextual Entropy Block (CE)

3.2.3 VADER Block

3.2.4 Proposed CE-VADER for Sentiment Analysis

3.3 Combination of Text-Based and Rating-Based Measures

3.4 Model Implementation, Sensitivity Analysis and Results

4 Difference Equation to Measure Time-Based Pattern

4.1 Difference Equation Based Model

4.2 Model Implementation, Sensitivity Analysis and Results

5 Predict Potential Success or Failure

5.1 Time Series Forecasting for Predicting Future Reputation

5.2 Evaluating the Success or Failure Potential

5.3 Model Implementation and Results

6 Specific Ratings and Descriptors Analysis

6.1 Specific Star Ratings Relevance to Rating Frequency

6.2 Specific Quality Descriptors’ Relevance to Rating Levels

6.2.1 Naive Bayesian Model for Evaluation


1 Introduction

Our society has witnessed the rise of many online marketplaces, with a total worldwide market value of 4.3 trillion dollars [1]. One salient feature of online marketplaces compared with traditional platforms is their massive volume of review texts and ratings. Among them, Amazon has received the most attention because of its great success [1]. Amazon also lets customers freely express their feelings about, and rate, the products they have purchased.

Previous work [2] indicates that customers rely heavily on reviews and ratings before buying a product on these platforms. Platforms can adjust their sales strategy by checking these comments. Hence, ratings and reviews provide both a reference for other potential buyers and massive data for analyzing customer demand, which can help in developing adaptive strategies. By making full use of these data, we can achieve a win-win situation for both the buyers and the platform.

One of the biggest challenges is the complexity and diversity of review texts [3, 4]. In this paper, we propose a novel sentiment analysis model as the text-based measure to address this issue. We develop a series of models that combine text-based, rating-based, and time-based measures to pick out the most informative ratings and reviews to track. We also construct a novel evaluation framework to quantify the reputation of each product and predict its potential success or failure. Then, we analyze the correlation between runs of consecutive identical star ratings, word descriptors, and the reputation of the products. We apply our model to a real data set covering three types of products: pacifiers, microwaves, and hair dryers.
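One way to quantify runs of consecutive identical star ratings is the share of all ratings that fall inside such runs. The sketch below is an assumption about how that proportion could be computed, not the paper's definition; the function name and the `min_run` threshold are hypothetical.

```python
from itertools import groupby
from typing import Sequence


def run_proportion(stars: Sequence[int], star: int, min_run: int = 2) -> float:
    """Fraction of ratings lying in runs of at least `min_run`
    consecutive ratings equal to `star`.

    `stars` is assumed to be ordered by review date.
    """
    if not stars:
        return 0.0
    in_runs = 0
    for value, group in groupby(stars):
        length = sum(1 for _ in group)
        if value == star and length >= min_run:
            in_runs += length
    return in_runs / len(stars)
```

For example, in the sequence 5, 5, 5, 1, 5, 1, 1, 4 only the first three five-star ratings form a run of length two or more, so the five-star proportion is 3/8.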

Researchers have pointed out the need to study when and how online platforms should adjust their marketing communication strategy in response to consumer reviews or ratings [5]. Based on our analysis and results, we propose several sales strategies and recommendations in this paper.

The rest of the paper is organized as follows. In Section 2, we list the main assumptions used in model construction and introduce the notation used frequently in this paper. In Section 3, a novel Informative Evaluation Model is proposed. It is built on a hybrid of the state-of-the-art CE [6] and VADER [7] models for sentiment analysis of review text. We then propose the "importance" rate as a combination of a text-based measure (i.e., our proposed CE-VADER model) and rating-based measures (i.e., the star rating and the helpful votes) to indicate how informative a review and its rating are. To the best of our knowledge, we are the first to propose a review-text-based sentiment analysis model. In Section 4, we employ a difference equation model as the backbone to measure the time pattern of each product. Moreover, the "reputation" rate is proposed in that section to measure the growth or decline of reputation. In Section 5, we employ an autoregressive (AR) model to predict future changes in reputation and propose a fuzzy system to predict the potential success or failure of each product. More details about the results of our model on the given data can be found in Sections 6, 7, and 8. The strengths and weaknesses of the proposed model and framework are discussed in Section 9. We conclude in Section 10. All source code is attached in Appendices D-I and can easily be applied to other data sets.


2 Assumptions and Notations

2.1 Assumptions

To simplify our model, we make the following main assumptions in this paper. Each assumption will be restated when it is used in the construction of our model.

Assumption 1. The online marketplace operates stably, and there are no situations, such as the outbreak of an epidemic, that would seriously affect the production chain of online shopping.

Assumption 2. The ratings and reviews depict customers’ real experience of and feelings about their purchased products. The sentiment in the review text reflects one’s feelings about the products.

Assumption 3. The vast majority of individual differences among customers, e.g., economic status and educational level, are ignored.

Assumption 4. Shipping the product takes some time. Some customers prefer to write reviews some time after receiving the purchased products.

Assumption 5. Consumers pay more attention to negative comments, e.g., low star ratings or negative reviews, when purchasing products.

2.2 Notations

In this work, we use the nomenclature in Table 1 in the model construction. Other infrequently used symbols will be introduced when they are used.

Table 1: Notations used in this paper

| Symbol | Definition | Type |
|---|---|---|
| id | Review id | String |
| | Star rate; subscript is its associated review id | Scalar |
| | Helpful votes; subscript is its associated review id | Scalar |
| | Review text; subscript is its associated review id | String |
| | Review date; subscript is its associated review id | Date |
| VEC | Vector encoding of the star rating | Mapping |
| INT | Vector encoding of intensity relevant to 5-class seed words | Mapping |
| IMP | Importance rate of review and associated rating | Mapping |
| REP | Reputation rate of product at some time | Mapping |
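The per-review quantities in Table 1 map naturally onto a small record type. The following is a sketch only: the class and field names are hypothetical, the types follow the "Type" column of the table, and the 1 to 5 bound on the star rating follows the five-star scheme described in the paper.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Review:
    """One review record; field names are illustrative, not from the paper."""
    review_id: str      # "id" in Table 1
    star: int           # star rate, 1 to 5
    helpful_votes: int  # number of helpful votes
    text: str           # review text
    review_date: date   # review date

    def __post_init__(self) -> None:
        # Reject values outside the five-star scale or negative vote counts.
        if not 1 <= self.star <= 5:
            raise ValueError("star must be between 1 and 5")
        if self.helpful_votes < 0:
            raise ValueError("helpful_votes must be non-negative")
```

The mappings VEC, INT, IMP, and REP would then be functions over such records; their exact forms are defined in later sections of the paper and are not reproduced here.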

3 Informative Evaluation Model

In this section, we propose the "importance" rate to evaluate how informative the review text and star ratings are. The most informative factor we take into account is the sentiment of the review text. In this paper, we propose a CE-VADER model to address sentiment analysis of the review text. Our model classifies the text into five groups (strong positive, weak positive, moderate, weak negative, and strong negative), consistent with the five-star rating scheme. Our proposed "importance" then incorporates the text-based measure, star rating
