Problem Chosen
C
2023
MCM/ICM
Summary Sheet
Team Control Number
2307946
Words Behind Wordle: Puzzle Game Analysis
Using Machine Learning and Time Series Theory
Summary
Wordle is a popular puzzle currently offered daily by the New York Times. Players try
to solve the puzzle by guessing a five-letter word in six tries or fewer, receiving feedback
with every guess. Making full use of the relevant information can effectively help editors
improve operational performance.
Firstly, to explain the variation in the number of reported results and to predict its future
value, a time series model is introduced. After determining the optimal group of orders,
an ARIMA(0,1,1) model is used to forecast the number of reported results on March 1,
2023; the 80% prediction interval is [10139.23, 30808.07]. To find out whether any
attributes of the word affect the Hard Mode percentage, a word-attribute system and a
LightGBM model are introduced. The results show that some lagged attributes have an
effect, though a smaller one than that of the lagged Hard Mode percentage itself.
Secondly, to predict the associated percentages of (1, 2, 3, 4, 5, 6, X), two models are
established based on GBDT and MMoE. The results show that the MMoE model signif-
icantly outperforms the GBDT model, achieving an MSE of 145. We then attempted to
improve the model with data augmentation and feature engineering. The former intro-
duces a large amount of noise and fails to achieve the expected effect, while the latter
slightly improves model performance. The final model's prediction for the word
EERIE is (0.649, 7.579, 26.298, 32.614, 20.930, 9.63, 2.298).
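The MMoE network itself is beyond a short sketch, but the multi-output setup shared by both models can be illustrated with a GBDT baseline in scikit-learn; the features and targets below are synthetic stand-ins for the paper's word attributes and reported distributions.

```python
# Sketch of a GBDT baseline predicting the seven percentages
# (1, 2, 3, 4, 5, 6, X) jointly, on synthetic data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))                   # stand-in word-attribute features
W = rng.normal(size=(8, 7))
raw = np.exp(X @ W + rng.normal(0, 0.1, size=(300, 7)))
y = 100 * raw / raw.sum(axis=1, keepdims=True)  # rows of percentages summing to 100

# GBDT is single-output, so wrap one regressor per target.
model = MultiOutputRegressor(GradientBoostingRegressor(random_state=0))
model.fit(X[:250], y[:250])
pred = model.predict(X[250:])
mse = mean_squared_error(y[250:], pred)
```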
Thirdly, a K-means model is introduced to cluster the samples into 4 groups by difficulty,
using the distribution of attempt counts as the features. To determine which attributes
of the words are associated with the classifications, we used the cluster label as the
output and all word attributes as the input features to train a LightGBM model. The
accuracy on the test set reaches 70%, and the importances of the input features are
ranked. Finally, the model is used to predict the category of the word EERIE; the
predicted result is Group 2.
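The clustering step can be sketched as follows; the attempt-count distributions here are randomly generated stand-ins for the real data, so the resulting groups are illustrative only.

```python
# Sketch of the clustering step: group puzzles into 4 difficulty clusters
# using the distribution of attempt counts (1..6 tries and X) as features.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Each row: percentages for (1, 2, 3, 4, 5, 6, X), summing to 100.
raw = rng.dirichlet(alpha=np.ones(7), size=200) * 100

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(raw)

labels = km.labels_            # difficulty-cluster index per puzzle
centers = km.cluster_centers_  # mean attempt distribution per cluster
```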
Finally, some interesting features of the dataset are identified: the characteristics of
high-frequency words, the shape of the distribution of attempt numbers, and the corre-
lations among the word features are discussed.
In addition, we evaluated the advantages and disadvantages of the model and pro-
posed some suggestions, and we carried out a sensitivity analysis of the model with
respect to the commission rate, thereby demonstrating its reliability and stability.
Keywords: Wordle; ARIMA; LightGBM; MMoE; data augmentation; feature engineer-
ing; K-means; sensitivity analysis
Team # 2307946 Page 1 of 25
Contents
1 Introduction 3
1.1 Problem Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Restatement of the Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 General Assumptions and Model Overview 4
2.1 Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Model Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3 Model Preparation 5
3.1 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.2 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
4 Model I: Time-Series Forecasting Model 7
4.1 The concept of Time Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.2 Stationarity of time series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.3 Model Building . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.4 Forecasting Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
5 Extraction and Analysis of Word Attributes 9
5.1 Extraction of Word Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
5.2 Word Attributes Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
6 Model II: Explaining Hard Mode Percentage Using LightGBM 11
6.1 Introduction of LightGBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
6.2 Data Description and Preprocessing . . . . . . . . . . . . . . . . . . . . . . . 12
6.3 Model Results and Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
7 Model III: Multiple Input - Multiple Output Regression Model 15
7.1 White Noise Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
7.2 Model Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
7.3 Model Refinement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
7.3.1 Data Augmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
7.4 Feature Engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
7.5 Model Results and Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
8 Model IV: LightGBM Classifier based on K-means Clustering Model 18
8.1 Concept of K-means Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . 18
8.2 Clustering Model Building . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
8.3 Evaluation of Clustering Result . . . . . . . . . . . . . . . . . . . . . . . . . . 20
8.4 Identification of Important Attributes . . . . . . . . . . . . . . . . . . . . . . 20
8.5 Classification Result and Evaluation . . . . . . . . . . . . . . . . . . . . . . 20
9 Other Interesting Features of the Data Set 21
10 Sensitivity Analysis 22
10.1 Sensitivity Analysis for Question 1 . . . . . . . . . . . . . . . . . . . . . . . . 22
10.2 Sensitivity Analysis for Question 3 . . . . . . . . . . . . . . . . . . . . . . . . 23
11 Strengths and Weaknesses 23
11.1 Strengths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
11.2 Weaknesses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
12 A Memorandum to the New York Times Puzzle Editor 24
1 Introduction
1.1 Problem Background
At the beginning of 2022, a simple but novel game gained great popularity on Twitter:
the web word game Wordle, written by Josh Wardle and published by the New York
Times Company.
The game was fairly unknown at the very beginning, but after Wardle creatively added
a function that allows players to copy their results as a grid of colored square emojis
and share them, it immediately attracted public attention. As of mid-January 2022,
more than 2 million people had played, and more than 1.2 million Wordle results had
been posted on Twitter.
In Wordle, players have to guess a five-letter English word within six chances in one
day. After each attempt, the player receives one of three types of feedback for each letter:
green if the letter is in the correct position; yellow if the answer contains the letter but
in a different place; and gray if the answer does not contain the letter at all. The game-
play is similar to games like Mastermind, but Wordle clearly indicates which letters
were guessed correctly. [1]
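The feedback rule just described can be written down as a short function; this is a plausible reconstruction of Wordle's coloring logic (including its handling of repeated letters), not code from the paper.

```python
# Sketch of the per-letter feedback rule: green in place, yellow if the
# answer contains the letter elsewhere, gray otherwise. Each answer
# letter can justify at most one non-gray tile.
from collections import Counter

def feedback(guess: str, answer: str) -> str:
    result = ["gray"] * 5
    remaining = Counter()
    # First pass: mark greens and count the unmatched answer letters.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "green"
        else:
            remaining[a] += 1
    # Second pass: turn tiles yellow while unmatched copies remain.
    for i, g in enumerate(guess):
        if result[i] != "green" and remaining[g] > 0:
            result[i] = "yellow"
            remaining[g] -= 1
    return " ".join(result)

print(feedback("eerie", "there"))  # → "yellow gray yellow gray green"
```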
Apart from that, Wordle has another game mode. On top of the regular rules above,
"Hard Mode" requires that once a player has found correct letters in a word, those
letters must be used in subsequent guesses.
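The Hard Mode constraint can likewise be expressed as a small check; the function below is an illustrative reading of the rule (greens fixed in place, yellows reused somewhere), not the paper's implementation.

```python
# Sketch of the Hard Mode rule: revealed hints must be reused in the
# next guess — green letters in the same position, yellow letters
# anywhere among the remaining letters.
def respects_hard_mode(prev_guess, prev_colors, new_guess):
    pool = list(new_guess)
    # Green letters must stay in place (and are consumed from the pool).
    for i, color in enumerate(prev_colors):
        if color == "green":
            if new_guess[i] != prev_guess[i]:
                return False
            pool.remove(prev_guess[i])
    # Yellow letters must appear somewhere among the remaining letters.
    for i, color in enumerate(prev_colors):
        if color == "yellow":
            if prev_guess[i] not in pool:
                return False
            pool.remove(prev_guess[i])
    return True
```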
In fact, a profound mathematical mechanism lies behind the seemingly simple game.
We cannot help wondering what mechanism affects how efficiently players make correct
guesses, what laws lie behind the constantly changing number of reported results on
Twitter, and on what basis players choose Hard Mode.
We expect to solve the above problems through mathematical modeling, so as to effec-
tively predict the future operation of the game and provide the Puzzle Editor of the
New York Times with business suggestions.
Figure 1: NY Times Wordle
Figure 2: Example of solution
1.2 Restatement of the Problem
As we have a data set containing the date, contest number, word of the day, the num-
ber of people reporting scores that day, the number of players on Hard Mode, and the
distribution of the reported results. We need to build mathematical models to solve the
following problems for New York Times Company:
Question 1:
1. Develop a model that explains the variation in the number of reported results, then
use the developed model to predict this number for March 1, 2023.
2. Find out the possible attributes of the given word that may influence the per-
centage of reported scores played in Hard Mode, and explain the underlying
mechanism of the influence.
Question 2: Develop a model that forecasts the distribution of the reported results of
a given word on a future day. Then discuss the uncertainties and the accuracy of the
prediction model.
Question 3: Adopt a mathematical model to classify solution words by difficulty, iden-
tify the attributes of a given word associated with each classification, and evaluate the
accuracy of the classification.
Question 4: Discuss and find other features within the data set.
2 General Assumptions and Model Overview
2.1 Assumptions
To simplify the problem, we make the following basic assumptions, each of which is
properly justified.
1. The number of reported results on Twitter can effectively represent the total number
of players on the day, and the percentage of scores reported that were played in
Hard Mode is the same as that of all players.
2. The distribution of the reported results recorded in the dataset is completely accu-
rate.
3. There are correlations and differences between the associated percentages of 1 try, 2
tries,· · · , X.
4. The word difficulty is proportional to the average number of tries to guess the result.
2.2 Model Overview
In summary, the whole modeling process can be shown as follows: