2023 MCM/ICM Outstanding Winner Paper, Problem C, Team 2314151


2023 MCM/ICM Summary Sheet

Problem Chosen: C

Team Control Number: 2314151

Breaking the Wordle

Summary

As Wordle has become popular on social media, more and more users have played the word-guessing game. How do time and word attributes affect the number of reported results, the distribution of attempts, and other report-related information? To answer this question, we conducted a modeling analysis using the game data from 2022.

Before building the model, we cleaned and normalized the given data and identified word attributes such as the number of repeated letters, number of vowel letters, number of consonant letters, commonness, and frequency. These steps prepared the data for model building and solving.
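The letter-based attributes listed above can be illustrated with a minimal sketch. The function name `word_attributes`, the treatment of Y as a consonant, and the reading of "repeated letters" as "distinct letters that occur more than once" are assumptions made for this example and are not taken from the paper. Commonness and frequency are left out because they require an external word corpus.

```python
from collections import Counter

VOWELS = set("AEIOU")  # assumption: Y is counted as a consonant


def word_attributes(word):
    """Return simple letter-based attributes of a word (hypothetical helper)."""
    letters = word.upper()
    counts = Counter(letters)
    n_vowels = sum(1 for ch in letters if ch in VOWELS)
    return {
        "vowels": n_vowels,
        "consonants": len(letters) - n_vowels,
        # Assumed definition: distinct letters that occur more than once.
        "repeated_letters": sum(1 for c in counts.values() if c > 1),
        # Largest number of times any single letter occurs.
        "max_repeats": max(counts.values()),
    }


print(word_attributes("EERIE"))
# {'vowels': 4, 'consonants': 1, 'repeated_letters': 1, 'max_repeats': 3}
```

For 'EERIE', the only repeated letter is E, which occurs three times, so the word has four vowels and one consonant under these definitions.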

First, to predict the number of future reports, a Prophet-based time-series prediction model was built, considering the effects of trends, seasonality, and holidays. The model yielded a prediction interval for the number of reports on March 1, 2023: [10355, 18742]. Within a week, the number of reports tends to be highest on Wednesdays and lowest on weekends. To explore the effect of word attributes on the proportion of hard-mode reports, we calculated higher-order partial correlation coefficients between each attribute and that proportion, controlling for the interaction between word attributes, and found that the number of vowel letters, the number of non-repeats, and word commonness were negatively correlated. The number of consonant letters and the number of non-repeats were positively correlated.
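To make the idea of a higher-order partial correlation concrete, the sketch below applies the standard recursive formula, which removes one control variable at a time. The function names and the toy data are illustrative assumptions; the paper's own implementation is not shown in this excerpt.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, controls):
    """Correlation of x and y after removing the linear effect of `controls`.

    Uses the recursion
    r_xy.Z = (r_xy.Z' - r_xz.Z' * r_yz.Z') / sqrt((1 - r_xz.Z'^2)(1 - r_yz.Z'^2)),
    where Z = Z' plus one extra control variable z.
    """
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data (not from the paper): x and y are related only through z.
z = [1, 1, -1, -1]
x = [2, 0, 0, -2]
y = [2, 0, -2, 0]
print(pearson(x, y))            # raw correlation, driven by the shared z
print(partial_corr(x, y, [z]))  # close to zero once z is controlled for
```

In this toy case the raw correlation is 0.5, while the partial correlation given z is zero up to rounding, which is the kind of confounding that controlling for the other word attributes is meant to remove.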

Secondly, an optimized multi-objective regression prediction framework was developed to explore the effects of word attributes on the distribution of reported outcomes. The framework selected lasso regression as the best-performing model, with an RMSE of 0.80 on the test set. The predicted distribution of the number of attempts for 'EERIE' was (0, 4, 17, 34, 30, 13, 2). The importance of each attribute was ranked, and the number of consonant letters, number of vowel letters, and frequency were found to have the greatest influence on the distribution of reported results, with influence factors of 4.226, 3.993, and 1.253, respectively.
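As a rough illustration of lasso regression with several targets, the sketch below fits one lasso model per output column using coordinate descent and soft-thresholding. It is a simplified stand-in under stated assumptions, not the paper's framework: the objective scaling (1/2n), the absence of an intercept (features and targets assumed centered), the one-model-per-target strategy, and all function names and toy data are chosen here for the example.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(X, y, alpha, n_sweeps=100):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1, no intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0   # feature j against the residual that ignores feature j
            norm = 0.0  # squared length of feature column j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (norm / n) if norm else 0.0
    return w


def fit_multi_output(X, Y, alpha):
    """Fit one lasso model per target column (e.g. one per attempt bucket)."""
    return [lasso_fit(X, [row[t] for row in Y], alpha) for t in range(len(Y[0]))]


def rmse(actual, predicted):
    """Root mean squared error over two flat sequences."""
    pairs = list(zip(actual, predicted))
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


# Toy data (not from the paper): two orthogonal features, two targets.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
Y = [[2, 0.25], [-2, 0.25], [2, -0.25], [-2, -0.25]]
print(fit_multi_output(X, Y, alpha=0.5))  # [[1.5, 0.0], [0.0, 0.0]]
```

The first target depends on the first feature with a true weight of 2, which the penalty shrinks to 1.5. The second target depends only weakly on the second feature, so the penalty sets both of its weights to zero, which is the sparsity that makes lasso useful for ranking attribute influence.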

Next, the above model was used to predict the distribution of reported outcomes for each word in the 5-letter word set. Then, K-means was used to classify the words into high (≥4.37), medium (4.13-4.37), and low (<4.13) difficulty categories based on the average number of attempts, and it was found that the number of duplicates, maximum of repeats, prevalence, and frequency differed significantly across categories. Moreover, an interval of each attribute was determined for each category. According to the established model, 'EERIE' is difficult. Matching the attribute intervals against words of different difficulty gives an accuracy of 91.36%, from which it can be inferred that the established model and the attribute intervals are reasonable.
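The link between a predicted distribution and a difficulty category can be sketched as follows. The thresholds are the ones quoted above; counting a failed game as 7 tries is an assumption made for this example, and the function names are hypothetical.

```python
def mean_attempts(distribution):
    """Weighted mean of tries for shares over (1, 2, 3, 4, 5, 6, failed)."""
    # Assumption: a failed game (the last bucket) is counted as 7 tries.
    weighted = sum(tries * share for tries, share in enumerate(distribution, start=1))
    return weighted / sum(distribution)


def difficulty(mean_tries):
    """Apply the category thresholds quoted in the summary."""
    if mean_tries >= 4.37:
        return "high"
    if mean_tries >= 4.13:
        return "medium"
    return "low"


eerie = (0, 4, 17, 34, 30, 13, 2)  # predicted distribution for 'EERIE'
print(mean_attempts(eerie), difficulty(mean_attempts(eerie)))
# 4.37 high
```

Under this assumption the predicted distribution for 'EERIE' averages 4.37 attempts, which falls in the high-difficulty category and agrees with the conclusion stated in the summary.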

Finally, the sensitivity analysis results demonstrate that our model is robust and reliable. In addition, the study of the data set revealed the declining popularity of Wordle and the increasing percentage of hard-mode challenges, and we provided the New York Times with suggestions for restoring the game's popularity.

Keywords: Wordle analysis, Prophet, High-order partial correlation, Multi-objective regression forecasting, K-means

Team # 2314151 Page 1 of 24

Contents

1 Introduction
1.1 Problem Background
1.2 Restatement of the Problem
1.3 Our Works

2 Preparation of the Models
2.1 Assumptions
2.2 Notations

3 Data Processing
3.1 Data Cleaning
3.2 Outlier rejection and standardization
3.3 Word attribute determination

4 Task 1
4.1 Prophet algorithm
4.2 Higher-order partial correlation analysis model

5 Task 2
5.1 Multi-objective regression prediction framework
5.2 Establishment of prediction model
5.3 Word prediction - EERIE
5.4 Feature influence degree analysis
5.5 Model reliability analysis

6 Task 3
6.1 K-means clustering algorithm
6.2 Selection of parameters
6.3 Clustering results
6.4 Word interval identification - EERIE
6.5 Model reliability analysis

7 Interesting aspects of the data

8 Sensitivity Analysis

9 Strengths and Weaknesses
9.1 Strength
9.2 Weakness

10 Letter

1 Introduction

1.1 Problem Background

Crossword puzzles have always seemed inseparably linked to the media. Since January 2022, Wordle, the New York Times' daily word puzzle, has become more and more popular in many countries [1]. How do players play Wordle? They guess a hidden five-letter word, built from the 26 letters of the alphabet, and must find it in no more than six attempts to complete the puzzle successfully. After the player submits a word, the tiles change color: green marks a correct letter in the correct place, and yellow marks a letter that is in the word but in the wrong place. There are two modes of play: normal mode and hard mode. In hard mode, any correct letter (green or yellow) found in a previous attempt must be used in subsequent attempts.
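The coloring rule described above can be sketched in a few lines. The function name `score_guess`, the one-character markers, and the handling of repeated letters (exact matches are resolved first, and each remaining answer letter can produce at most one yellow) are assumptions for this illustration; the text itself only describes the green and yellow cases.

```python
from collections import Counter


def score_guess(guess, answer):
    """Return feedback per letter: 'G' green, 'Y' yellow, '-' neither."""
    guess, answer = guess.upper(), answer.upper()
    feedback = ["-"] * len(guess)
    # Answer letters that were not matched exactly; these can still turn yellow.
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)
    # First pass: mark letters in the correct place.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            feedback[i] = "G"
    # Second pass: mark misplaced letters, using each answer letter only once.
    for i, g in enumerate(guess):
        if feedback[i] == "-" and remaining[g] > 0:
            feedback[i] = "Y"
            remaining[g] -= 1
    return "".join(feedback)


print(score_guess("CRANE", "EERIE"))  # -Y--G
```

For the guess CRANE against the answer EERIE, the final E is in the right place (green), R is in the word but misplaced (yellow), and the other letters get no color.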

Wordle updates the puzzle once a day, and many players report their scores on social media. As a result, data such as the number of people reporting their scores that day, the number of players participating in hard mode, and the percentage of players completing the puzzle on different attempts are all collected and counted. By using the available data wisely, we can solve some interesting problems.

1.2 Restatement of the Problem

Considering the background information, the constraints outlined in the problem statement, and additional guidance, we need to solve the following problems:

• Task 1: Establish a model that can explain and predict changes in the number of reported results and provide a prediction interval for the number of reported results on March 1, 2023. In addition, examine the impact of word attributes on the proportion of reports filed by players in hard mode, and give a rationale for the findings.

• Task 2: Develop a model that predicts the distribution of reported outcomes and explore the uncertainties associated with the model and its predictions.

• Task 3: Build a model for classifying words according to difficulty and determine the factors associated with each class. Use this model to determine the difficulty of EERIE and discuss the accuracy of the classification model.

• Task 4: List and explain other noteworthy characteristics of this dataset.

• Task 5: Present a concise summary of the study findings in a letter addressed to the Puzzle Editor of the New York Times.

1.3 Our Works

Based on the analysis of the problem, we propose the model framework shown in Figure 1, which is mainly composed of the following parts:

Data analysis: processes the reported data and identifies the characteristics of the words.
