2023 MCM Problem C Finalist (F Award) Paper


1.76 MB PDF, uploaded 2024-01-24 15:44:29


Problem Chosen: C

2023 MCM/ICM Summary Sheet

Team Control Number: 2313336

Exploring the mysterious distribution out of the five-letter Wordle game

Summary

In the past year, the five-letter word puzzle Wordle has rapidly grown from a popular American pastime into a global craze. Solving a Wordle puzzle requires not only a rich vocabulary but also a sound strategy. In this paper, we establish a series of models to predict the number of reported Wordle results and their distribution.

For Problem 1, to obtain a prediction interval for the number of reported results, we analyze how the total number of players in the given data changes over time and use the time series model ARIMA(1,1,0) to describe the trend. After checking that the fitted curve is reasonable, we predict that the number of reported results on March 1, 2023 will be about 19114 to 19118. To determine how word attributes affect the proportion of players choosing Hard Mode, we select five important factors related to word difficulty and test each of them at a significance level of 0.05. We find that none of these factors has a significant influence on the proportion of Hard Mode selections.
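The ARIMA(1,1,0) step can be illustrated with a minimal pure-Python sketch. It assumes a model with a constant term on the first differences, d_t = c + phi * d_(t-1) + e_t with d_t = y_t - y_(t-1), estimated by conditional least squares, and approximate 95% intervals built from the model's psi-weights. The summary does not say which estimator or software the team used, so the function names and these choices are hypothetical; in practice `statsmodels.tsa.arima.model.ARIMA` with `order=(1, 1, 0)` is the usual tool.

```python
import math


def fit_arima_110(y):
    """Fit ARIMA(1,1,0) with a constant by conditional least squares (assumed form).

    Model on first differences: d_t = c + phi * d_{t-1} + e_t, d_t = y_t - y_{t-1}.
    Returns (c, phi, sigma), where sigma is the residual standard deviation.
    """
    d = [b - a for a, b in zip(y, y[1:])]      # first differences
    x, z = d[:-1], d[1:]                        # regress d_t on d_{t-1}
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    phi = sxz / sxx
    c = mz - phi * mx
    resid = [zi - c - phi * xi for xi, zi in zip(x, z)]
    sigma = math.sqrt(sum(r * r for r in resid) / max(n - 2, 1))
    return c, phi, sigma


def forecast_arima_110(y, c, phi, sigma, h, z=1.96):
    """Return h (point, lower, upper) forecasts with approximate 95% intervals."""
    level, last_d = y[-1], y[-1] - y[-2]
    out, var_sum = [], 0.0
    for j in range(h):
        last_d = c + phi * last_d                  # forecast the next difference
        level += last_d                            # undo the differencing
        psi = sum(phi ** i for i in range(j + 1))  # psi_j of the ARIMA(1,1,0) MA form
        var_sum += psi ** 2
        half = z * sigma * math.sqrt(var_sum)
        out.append((level, level - half, level + half))
    return out
```

Each returned tuple covers one day beyond the last observation, so calling it on the daily report counts with the right horizon gives a point forecast and interval for a target date such as March 1, 2023.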

For Problem 2, we first establish a Multiple Linear Regression model based on the five factors selected in Problem 1. On this basis, we establish a Gaussian-Bayesian model in which four essential factors appear: the first three are obtained by analyzing the existing data, while the fourth is treated as a parameter to be trained. Through our algorithm, we find a suitable value for this parameter and obtain a preliminary fitted model. For a more accurate prediction, we introduce Reinforcement Learning, whose advantage is that it simulates players adjusting their strategies according to feedback. We define the elements of the reinforcement learning problem and use a neural network to simulate the players' strategies. Finally, we combine the two models to predict the distribution of results for the word 'eerie', which is [0.0018, 0.0386, 0.2498, 0.3369, 0.2316, 0.1128, 0.0275] for 1, 2, 3, 4, 5, 6 and 7 or more tries (X), with a mean absolute error of no more than 0.012.
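The seven predicted values for 'eerie' can be sanity-checked with a few lines of Python. They sum to 0.999 (rounding in the reported figures). If a failure (X) is counted as 7 tries, which is an assumption and not stated in the summary, the implied mean is about 4.20 tries. The `mean_absolute_error` helper shows one plausible reading of the "average absolute error" metric, averaged over the seven bins; the paper's exact definition is not given here, and the names are hypothetical.

```python
# Predicted shares for 'eerie' from the summary: 1..6 tries, then X (7 or more tries).
EERIE_PRED = [0.0018, 0.0386, 0.2498, 0.3369, 0.2316, 0.1128, 0.0275]


def expected_tries(dist, fail_value=7):
    """Mean number of tries, counting a failure (X) as fail_value tries (assumption)."""
    return sum(p * k for k, p in enumerate(dist[:6], start=1)) + dist[6] * fail_value


def mean_absolute_error(pred, actual):
    """Average of |pred_i - actual_i| over the bins (assumed metric definition)."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)


print(f"{sum(EERIE_PRED):.4f}")              # 0.9990
print(f"{expected_tries(EERIE_PRED):.4f}")   # 4.2033
```

With an observed distribution for the same word, `mean_absolute_error(EERIE_PRED, observed)` would give a figure comparable to the 0.012 bound quoted above, provided the paper uses the same per-bin average.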

For Problem 3, we introduce the K-Means clustering algorithm to grade word difficulty. For each word, the x coordinate is the Multiple Linear Regression prediction and the y coordinate is the average number of steps needed to reach the goal in Reinforcement Learning. We set the number of categories to 5; the larger the category number, the harder the word is to guess. We then run the clustering algorithm and obtain the words in each category. Applying the model to the word 'eerie', we find that it belongs to the 5th category, and the classification accuracy is 92.6%.
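The clustering step can be sketched with plain Lloyd's K-Means on two-dimensional points (x = the Multiple Linear Regression prediction, y = the average number of Reinforcement Learning steps). Several details here are assumptions, not taken from the paper: initial centroids come from deterministic farthest-point seeding (k-means++ is the more common randomized choice), and clusters are renumbered 1 to 5 by increasing centroid y, treating the average step count as the difficulty axis. All function names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def init_farthest(points, k):
    """Start from the point with the smallest x + y, then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [min(points, key=lambda p: p[0] + p[1])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(far)
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = init_farthest(points, k)
    labels = None
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, labels


def difficulty_levels(points, k=5):
    """Cluster, then number clusters 1..k by increasing centroid y (assumed difficulty axis)."""
    centroids, labels = kmeans(points, k)
    order = sorted(range(k), key=lambda j: centroids[j][1])  # fewest steps first
    level_of = {j: rank + 1 for rank, j in enumerate(order)}
    return [level_of[lab] for lab in labels], centroids
```

To grade a new word such as 'eerie', one would compute its (x, y) pair from the two models and give it the level of the nearest centroid.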

Finally, we identify some interesting features of the dataset and carry out a sensitivity analysis of our model. When the initial values are varied over a wide range, from -20% to +40%, the model remains accurate, with a mean absolute error of 4.4%, which shows that it is both accurate and tolerant of errors.
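A sensitivity analysis of this kind can be illustrated as a one-at-a-time sweep: scale one input by each relative change from -20% to +40% and record the mean absolute change in the predicted seven-bin distribution. The model below is a hypothetical stand-in (a normal curve with mean `mu` binned into 1 to 6 tries and X), not the team's Gaussian-Bayesian model. It shows only the mechanics of the sweep, not the paper's numbers.

```python
import math


def binned_normal(mu, sigma=1.1):
    """HYPOTHETICAL stand-in model: shares for 1..6 tries and X from a normal curve."""
    edges = [-math.inf, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, math.inf]
    cdf = [0.5 * (1.0 + math.erf((e - mu) / (sigma * math.sqrt(2.0)))) for e in edges]
    return [b - a for a, b in zip(cdf, cdf[1:])]


def sweep(model, base_input, changes):
    """Mean absolute change of the model output for each relative input change."""
    base_out = model(base_input)
    result = {}
    for ch in changes:
        out = model(base_input * (1.0 + ch))
        result[ch] = sum(abs(o - b) for o, b in zip(out, base_out)) / len(base_out)
    return result


changes = [-0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4]
for ch, mae in sweep(binned_normal, 4.0, changes).items():
    print(f"{ch:+.0%}: {mae:.4f}")
```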

Keywords: ARIMA, Multiple Linear Regression model, Gaussian Bayesian, Reinforcement Learning, K-Means clustering algorithm

Team # 2313336    Page  2  of  25 
 
Contents
Introduction
1 Problem Background
2 Restatement of the Problem
3 Our work
Assumptions and Justifications
Notations
Data Cleaning and Preprocessing
Problem 1: ARIMA Model and selection of important factors
1 Overview of using ARIMA model
1.1 Time series model
1.2 The creation and analysis of time series
2 The influence factors of word attributes
Model II: Gaussian-Bayesian Model And Reinforcement Learning
1 Gaussian-Bayesian Model
2 Gaussian-Bayesian Model
3 Reinforcement Learning
Model III: K-Means Model
Other interesting features
Sensitivity Analysis
Conclusion
1 Strengths and Weaknesses
2 Future improvement
A letter to the Puzzle Editor
References
Appendix


⚫ For a given future solution word on a future date, develop a model to predict the distribution of the reported results, and evaluate the model. Show the results of the prediction using the specific example of the word EERIE on March 1, 2023.

⚫ Develop and summarize a model to classify solution words by difficulty. Identify the attributes of a given word that are associated with each classification. Using this model, discuss how difficult the word EERIE is, and discuss the accuracy of the classification model.

⚫ List and describe some other interesting features of this data set.

Finally, we summarize our results in a one- to two-page letter to the Puzzle Editor of the New York Times.

1.3 Our work

This paper proposes models to predict the number and distribution of reported results on a given future date. Before modeling, we clean and preprocess the data. The modeling itself is divided into three parts, followed by a sensitivity analysis.

⚫ Firstly, we establish an ARIMA model to predict the number of reported results, select five important factors that affect the difficulty of guessing a word, and analyze their correlation with the proportion of players choosing Hard Mode.

⚫ Secondly, we establish a Multiple Linear Regression model to fit the relationship between these factors and word complexity. On this basis, we establish a Gaussian-Bayesian model to predict the distribution of reported results and add Reinforcement Learning to make the predictions more accurate.

⚫ Thirdly, we apply K-Means clustering to the prediction results of the first two problems, classify the words by difficulty, and classify the word 'eerie'.

⚫ Finally, we perform a sensitivity analysis and list some other interesting features.

Figure 2 gives an overview of our work.
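The Multiple Linear Regression step relates the five word factors to a difficulty target. Below is a minimal sketch, assuming ordinary least squares with an intercept solved through the normal equations. The summary does not name the factors or the fitting routine, so `X`, `y` and the function names are hypothetical. For real data, `numpy.linalg.lstsq` is numerically safer than forming the normal equations.

```python
def fit_ols(X, y):
    """Ordinary least squares with an intercept, via the normal equations.

    X: list of rows (one per word), each a list of factor values; y: list of targets.
    Returns coefficients [b0, b1, ..., bp] with b0 the intercept.
    """
    A = [[1.0] + list(row) for row in X]            # prepend the intercept column
    p = len(A[0])
    # normal equations: (A^T A) b = A^T y
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    v = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    # forward elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p):
                M[r][c] -= f * M[col][c]
            v[r] -= f * v[col]
    # back substitution
    b = [0.0] * p
    for r in range(p - 1, -1, -1):
        b[r] = (v[r] - sum(M[r][c] * b[c] for c in range(r + 1, p))) / M[r][r]
    return b


def predict(b, row):
    """Apply fitted coefficients to one row of factor values."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], row))
```

The fitted prediction for each word is what the summary uses as the x coordinate in the later K-Means step.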
