2023 MCM/ICM award-winning Problem C paper (2023年美赛获奖C类论文_2318982.pdf), CSDN Library resource


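### Analysis of a 2023 MCM/ICM Award-Winning Problem C Paper

#### Overview

This article analyzes an award-winning Problem C paper from the 2023 Mathematical Contest in Modeling (MCM/ICM). The paper studies the popular game Wordle and mines the Wordle dataset to reveal the information hidden behind the reported results. It examines how participation changes over time, explores the factors that influence game difficulty, and builds predictive models for the future distribution of results.

#### Key Points

### 1. ARIMA Model and Wordle Participation Forecast

**Key points:**

- The **ARIMA model** (autoregressive integrated moving average) is a method for time series analysis and forecasting.
- **Participation forecasting** means predicting how the number of Wordle players changes over time.

**Details:**

- **Building the ARIMA model**: To forecast Wordle participation, the authors use the number of reported results over a period of time to fit an ARIMA model. ARIMA captures trend (through differencing) and autocorrelated random variation in a time series; seasonal patterns require a seasonal extension of the model.
- **Prediction interval**: The ARIMA model yields a prediction interval for March 1, 2023, which indicates that Wordle still maintains a high level of participation one year after its release.

### 2. Factors Influencing Wordle Difficulty

**Key points:**

- **Hard Mode** is a Wordle mode that makes the game more challenging.
- **Multiple linear regression** is a statistical method for analyzing the relationship between several independent variables and one dependent variable.

**Details:**

- **Factor analysis**: The paper fits a multiple linear regression model to analyze the factors that affect the percentage of Hard Mode scores. It finds that the number of repeated letters and the frequency of the word are correlated with game difficulty.
- **Community influence**: The paper also notes that difficulty information players obtain from the community in advance may influence which game mode they choose.

### 3. Markov Chain and Prediction of the Result Distribution

**Key points:**

- A **Markov chain** is a mathematical model that describes how the state of a system changes over time.
- The **first-arrival distribution** is the probability distribution of the time at which the system first reaches a given state.

**Details:**

- **State simplification**: To simplify the problem, the authors reduce a player's game state to the number of squares of each color the player knows.
- **Model construction**: Wordle is modeled as a Markov chain, and the problem becomes solving for the first-arrival distribution of this chain. This requires the initial distribution and the transition probabilities, the latter depending on the strategy chosen by the player.
- **Measuring information**: The paper proposes a way to measure the amount of information in a state, builds the whole Markov chain on this basis, and solves for the first-arrival-time distribution under different strategies.
- **Changing strategies**: Because player strategies may change over time, the paper further assumes that the proportion of players choosing each strategy varies, and adjusts the model accordingly.

### 4. Conclusion and Outlook

By combining ARIMA, multiple linear regression and Markov chains, the paper gives an in-depth study of Wordle. Its results help explain the characteristics and trends of the game and offer a reference for the design of similar online games. The paper, "Wordle: One Letter Makes a Difference", also illustrates how data analysis and mathematical modeling can be applied to a real-world problem.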


2023 MCM/ICM Summary Sheet

Problem Chosen: C

Team Control Number: 2318982

Wordle: One Letter Makes a Difference

Summary

Since its launch in early 2022, Wordle has sparked a wave of sharing yellow, green and grey squares on social media. Wordle has simple but challenging rules that require only a short attention span. Based on the Wordle dataset, we dig into the information hidden behind the number and the percentage of reported results.

First, we focus on how the number of reported results varies over time. We build an ARIMA model that provides a prediction interval for the number of reported results on March 1, 2023. The interval indicates that Wordle still maintains a high level of enthusiasm one year after its release. Then, we explore the factors influencing the percentage of Hard Mode scores. Fitting a multiple linear regression model shows that the number of repeated letters and the frequency of words are correlated with the difficulty of the game. The difficulty information that players obtain from the community in advance may influence their choice of game mode.
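As an illustration of how an ARIMA-type model yields a prediction interval, the sketch below uses the simplest member of the family, ARIMA(0,1,0) with drift (a random walk with drift). This is an assumption made for brevity, not the order selected in the paper; the daily counts are hypothetical, and the interval ignores the estimation uncertainty of the drift itself.

```python
import math
from statistics import NormalDist, mean, stdev


def random_walk_drift_interval(series, steps_ahead, level=0.95):
    """Forecast with ARIMA(0,1,0) plus drift; return (low, point, high)."""
    # First differences: the "I" (integration) part of ARIMA with d = 1.
    diffs = [b - a for a, b in zip(series, series[1:])]
    drift = mean(diffs)      # average daily change
    sigma = stdev(diffs)     # standard deviation of the daily shocks
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided normal quantile
    point = series[-1] + steps_ahead * drift
    # Shocks accumulate, so the forecast standard error grows like sqrt(h).
    half_width = z * sigma * math.sqrt(steps_ahead)
    return point - half_width, point, point + half_width


# Hypothetical daily counts of reported results, NOT the contest dataset.
counts = [31000, 30500, 30200, 29800, 29900, 29300, 29000, 28800]
low, point, high = random_walk_drift_interval(counts, steps_ahead=5)
print(f"{low:.0f} <= {point:.0f} <= {high:.0f}")
```

The interval widens with the forecast horizon, which is why a prediction made many days ahead is less precise than a next-day forecast.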

Next, we ask how the distribution of the reported results will change in the future. To simplify the model, we reduce a player's game state to the number of squares of each color known so far. Wordle can then be modeled as a Markov chain, and the problem becomes solving for its first-arrival-time distribution. This requires the initial distribution and the transition (transfer) probabilities, which depend on the strategies chosen by players. In addition, the transition probability is assumed to depend on the difference in the amount of information between states, so we propose a method to measure the amount of information in a state. Based on this, we model the entire Markov chain and solve for the first-arrival-time distribution under different strategies.
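The first-arrival-time computation can be sketched for a generic absorbing Markov chain as follows. The three-state chain, its transition matrix and the function name are hypothetical stand-ins; the paper's state space (counts of squares of each color) and its information-based transition probabilities are not reproduced here.

```python
def first_arrival_distribution(initial, transition, target, max_steps):
    """Return [P(first hit at step 1), ..., P(first hit at step max_steps),
    P(not hit within max_steps)] for the state index `target`.

    Assumes `initial` puts no probability on `target`.
    """
    n = len(initial)
    # Probability mass on each state along paths that have not hit target yet.
    alive = list(initial)
    result = []
    for _ in range(max_steps):
        nxt = [0.0] * n
        for i in range(n):
            for j in range(n):
                nxt[j] += alive[i] * transition[i][j]
        result.append(nxt[target])  # mass arriving at target for the first time
        nxt[target] = 0.0           # remove it so it is not counted again
        alive = nxt
    result.append(sum(alive))       # never reached: the "X" outcome
    return result


# Hypothetical states: 0 = little information, 1 = much information, 2 = solved.
transition = [
    [0.5, 0.4, 0.1],
    [0.0, 0.6, 0.4],
    [0.0, 0.0, 1.0],
]
initial = [1.0, 0.0, 0.0]
dist = first_arrival_distribution(initial, transition, target=2, max_steps=6)
print([round(p, 4) for p in dist])
```

With six allowed steps, the output has seven entries, matching the (1, 2, 3, 4, 5, 6, X) layout of the reported results.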

To make the model more realistic, we assume that the proportion of players choosing each of the two strategies considered varies with time, and we propose a method based on historical data to estimate this proportion. We then combine the estimated proportion with a Gaussian process regression model to predict the future proportion of player strategy choices, and combine that prediction with the Markov chain model to predict the distribution of future reported results. For EERIE, the predicted distribution over (1, 2, 3, 4, 5, 6, X) tries is (0.00, 0.15, 11.05, 28.44, 35.46, 21.16, 3.76) percent.
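One plausible way to combine two per-strategy distributions with a strategy proportion is a weighted mixture, sketched below. The mixture form, the two input distributions and the share value are assumptions for illustration; only the EERIE tuple is taken from the summary, and it is used solely to check that the reported percentages add up to about 100.

```python
def mix_distributions(dist_a, dist_b, share_a):
    """Weighted mixture of two result distributions; share_a is the
    fraction of players assumed to use strategy A."""
    if not 0.0 <= share_a <= 1.0:
        raise ValueError("share_a must lie in [0, 1]")
    if len(dist_a) != len(dist_b):
        raise ValueError("distributions must have the same length")
    return [share_a * a + (1.0 - share_a) * b for a, b in zip(dist_a, dist_b)]


# Hypothetical percentages over (1, 2, 3, 4, 5, 6, X) for two strategies.
strategy_a = [0.0, 5.0, 25.0, 35.0, 22.0, 10.0, 3.0]
strategy_b = [0.0, 1.0, 10.0, 28.0, 34.0, 21.0, 6.0]
mixed = mix_distributions(strategy_a, strategy_b, share_a=0.4)
print([round(x, 2) for x in mixed])

# Distribution reported for EERIE in the summary; the entries should sum
# to roughly 100 percent up to rounding.
eerie = (0.00, 0.15, 11.05, 28.44, 35.46, 21.16, 3.76)
print(round(sum(eerie), 2))
```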

Finally, we classify words according to their difficulty. Since word difficulty is related only to the word itself, we expect that clustering by word attributes reflects the difficulty level of words. We therefore perform K-Prototypes clustering and define a word difficulty index. We then extract the difficulty information of each category, plot its density function, and calculate the Kullback-Leibler divergence. Both results show that words with different attributes have different difficulty levels, which supports our idea and the accuracy of the classification model. Further, we classify EERIE into the “hard” class by its attributes, which is consistent with the percentage distribution obtained above. In addition, we discuss other information in the dataset, such as the difficult words, the easy words and the unexpected words. Lastly, the sensitivity analysis shows that our model is robust.
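The Kullback-Leibler divergence mentioned above compares two distributions; a value near zero means they are nearly identical. The sketch below is the discrete form applied to two hypothetical difficulty histograms. The paper works with density functions, so using binned histograms here is an assumption.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) in nats for two discrete distributions on the same bins."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue         # a zero-probability bin of p contributes nothing
        if qi == 0.0:
            return math.inf  # p has mass where q has none
        total += pi * math.log(pi / qi)
    return total


# Hypothetical difficulty histograms for two word clusters.
cluster_easy = [0.1, 0.2, 0.4, 0.3]
cluster_hard = [0.3, 0.4, 0.2, 0.1]
print(kl_divergence(cluster_easy, cluster_hard))
print(kl_divergence(cluster_hard, cluster_easy))
```

The divergence is not symmetric in general, so both directions are usually reported or a symmetrized variant is used.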

Keywords: ARIMA; multiple linear regression; Markov chains; K-Prototypes clustering


Contents

1 Introduction
  1.1 Background
  1.2 Restatement of The Problem
  1.3 Our Work
2 Model Assumptions and Notations
  2.1 Assumptions and Justification
  2.2 Notations
3 Data Preprocessing
4 Task 1: Number Prediction and Word Attributes
  4.1 Number Prediction Based on ARIMA Model
  4.2 Effect of Word Attributes
    4.2.1 Attributes of The Word
    4.2.2 Multiple Linear Regression
5 Task 2: Distribution based on Markov Chain Model
  5.1 State Space
  5.2 Initial Distribution
  5.3 Transfer Probability
  5.4 Distribution of Reported Results
  5.5 Proportion of Two Strategies Used
  5.6 Predicting The Distribution of Future Reporting Results
6 Task 3: Classification of Solution Words
  6.1 Difficulty Score
  6.2 K-Prototypes Clustering
    6.2.1 Solving Steps
    6.2.2 Results
  6.3 Difficulty Classification of Solution Words
  6.4 Difficulty of The Word EERIE
7 Task 4: Other Interesting Features
8 Sensitivity Analysis
9 Model Evaluation and Further Discussion
  9.1 Strengths
  9.2 Weaknesses
10 A Letter to The Puzzle Editor
References

Team # 2318982 Page 3 of 25

1 Introduction

1.1 Background

Wordle is a popular five-letter puzzle game offered daily by the New York Times, in which players try to guess the word in six tries or fewer, receiving feedback after each guess. It is available in over 60 languages and has two modes: regular and Hard Mode. In Hard Mode, letters that were correctly guessed must be used in subsequent guesses. After a guess, the tiles change color: yellow means the letter is in the word but in the wrong place, green means the letter is in the right place, and gray means the letter is not in the word.
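The coloring rule can be sketched as a small function. The handling of repeated letters (a guessed letter turns yellow only while unmatched copies of it remain in the answer) follows the commonly described behavior of the game and is an assumption here, since the text above does not specify it; the function name is hypothetical.

```python
from collections import Counter


def wordle_feedback(guess, answer):
    """Return a list of 'green', 'yellow' or 'gray', one entry per letter."""
    guess, answer = guess.lower(), answer.lower()
    if len(guess) != len(answer):
        raise ValueError("guess and answer must have the same length")
    colors = ["gray"] * len(guess)
    # Answer letters that were not matched exactly are available for yellows.
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            colors[i] = "green"
    for i, g in enumerate(guess):
        if colors[i] != "green" and remaining[g] > 0:
            colors[i] = "yellow"
            remaining[g] -= 1  # each answer letter can justify one yellow only
    return colors


print(wordle_feedback("crane", "eerie"))
print(wordle_feedback("eerie", "enter"))
```

In the second call the guess contains three copies of E while the answer has two, so one of the guessed E tiles stays gray.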

1.2 Restatement of The Problem

Considering the background information and conditions given in the problem statement, we need to solve the following problems:

• Develop a model to explain the daily variation in the number of reported results, and use it to create a prediction interval for the number of results on March 1, 2023. Is the percentage of Hard Mode scores affected by the word properties? If yes, how? If no, why not?

• Develop a model to predict the (1, 2, 3, 4, 5, 6, X) distribution of results for a specific future solution word. Discuss the uncertainties associated with the prediction. Provide an example of the prediction for EERIE on March 1, 2023, and state the confidence in the model.

• Create a classification model to classify the words based on their difficulty, and describe the particular attributes of each class. How difficult is the word EERIE according to the model? Evaluate the model’s accuracy.

• Lastly, describe other interesting features in the dataset.

1.3 Our Work

Considering the background and the problems, our work mainly includes the following:

• We hypothesized that the number of reported results on March 1, 2023 could be predicted by building an ARIMA model with optimal parameters. To gain further insight into the word attributes, we ran a multiple linear regression to examine the effect of word attributes on the percentage of scores reported in Hard Mode.

• We modeled the process of playing Wordle as a discrete-state Markov chain and derived two game strategies based on the information obtained. We then estimated the distribution of reported outcomes for the two strategies, using theoretical tools such as information entropy and Markov chain properties. The resulting distributions were then combined to predict the distribution of reported outcomes at a future date.

• We assumed that the difficulty of any given word is determined by its attributes, so clustering words by their attributes provides insight into the difficulty of each category.

• Finally, after a close analysis of the dataset, we observed several noteworthy characteristics.

To avoid a complicated description and to present our work process intuitively, a flow chart is shown in Figure 1.
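The word attributes that feed the regression and the clustering can be computed directly from the spelling. The sketch below is one possible feature set: the exact attribute definitions used in the paper are not given in this section, so the names and the vowel set are assumptions, and word frequency is omitted because it needs an external corpus.

```python
from collections import Counter

VOWELS = set("aeiou")  # assumed vowel set; 'y' is not counted


def word_attributes(word):
    """Return simple spelling-based attributes of a word."""
    letters = word.lower()
    counts = Counter(letters)
    return {
        "distinct_letters": len(counts),
        # Number of different letters that occur more than once.
        "repeated_letters": sum(1 for c in counts.values() if c > 1),
        # Largest number of occurrences of any single letter.
        "max_repeat": max(counts.values()),
        "vowels": sum(1 for ch in letters if ch in VOWELS),
    }


print(word_attributes("EERIE"))
```

For EERIE, a single letter accounts for three of the five positions, which is the kind of attribute the summary associates with higher difficulty.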
