2023 MCM/ICM Outstanding Winner Paper, Problem C, Team 2318036 (decrypted PDF)

2023 MCM/ICM Summary Sheet

Problem Chosen: C

Team Control Number: 2318036

Exploring Wordle: Insights into Puzzle Solving and Tweet Shares Pattern

Summary

Wordle, a word puzzle that has attracted millions of people, is now owned by The New York Times. For the company's game editor, how the game is solved and shared on social media is critical information, as it can be used to guide future puzzle design and ultimately maximise the total number of players. This paper aims to build a quantitative model, based on word attributes and result reports on Twitter, to predict future player behaviour.

After examining and cleaning the raw data, we first define 12 attribute indicators that measure each word's familiarity (how often it is used), degree of association, degree of confusion and word composition features. They are computed in advance because the following models use these indicators frequently.

For Problem 1, we build a dynamic system called Target-two-Players-Lost (T2PL), based on the SIR Model, to explain the daily fluctuation of Wordle reports. Players are further divided into two categories, general players and loyal players, each with a different attrition rate. This lets the model better reproduce the unequal decline rates observed over different time periods. The relationship between word attributes and the number of Hard Mode players is also explored, and we find that certain attributes affect the percentage of Hard Mode reports.
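The T2PL equations are not shown in this summary. As a minimal illustration only, and not the paper's model, the sketch below assumes that each group simply decays exponentially at its own rate. All parameter values are hypothetical. It shows why two attrition rates yield a total that declines quickly at first and slowly later.

```python
import math

def active_players(day, gen0, loy0, gen_rate, loy_rate):
    """Players remaining after `day` days when two groups leave at different rates."""
    general = gen0 * math.exp(-gen_rate * day)   # general players leave quickly
    loyal = loy0 * math.exp(-loy_rate * day)     # loyal players leave slowly
    return general + loyal

def daily_decline(day, **params):
    """Relative one-day decline of the total population starting at `day`."""
    today = active_players(day, **params)
    tomorrow = active_players(day + 1, **params)
    return (today - tomorrow) / today

# Hypothetical parameters, chosen only for illustration.
params = dict(gen0=300_000, loy0=20_000, gen_rate=0.02, loy_rate=0.001)
early = daily_decline(0, **params)    # dominated by general players
late = daily_decline(300, **params)   # dominated by loyal players
print(f"early decline {early:.4f}, late decline {late:.4f}")
```

A single exponential would give the same relative decline on every day, so it could not reproduce a curve whose decline rate changes over time.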

For Problem 2, we develop the P&S Model, which uses simulation algorithms and gradient descent to mimic how players guess words and share their game results. The simulator eliminates all words that are inconsistent with the observed feedback, then randomly samples a word from the remaining list using word frequency as the weight. However, we found that the simulation result could not perfectly match the true distribution. Therefore, we rescaled the distribution with 7 variables representing how likely players are to share their result for each possible score. These variables are optimised by gradient descent, which yields better distribution predictions. Using the P&S Model, we predict that the distribution of reported results for the word EERIE on March 1, 2023 is (0, 0, 9%, 29%, 45%, 14%, 3%), for 1 to 6 tries and unsolved puzzles respectively.
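The simulator described above can be sketched as follows. This is a hedged reading of the summary, not the paper's code. The function names, the toy word list and its frequencies are all hypothetical, and `rescale` shows only one plausible form of the 7-variable rescaling (multiply each outcome by a sharing weight, then renormalise). The gradient-descent fit of those weights is omitted.

```python
import random
from collections import Counter

def feedback(guess, answer):
    """Wordle colours per letter: 'g' green, 'y' yellow, '-' grey."""
    result = ["-"] * len(guess)
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "g"
        else:
            unmatched[a] += 1            # answer letters still available for yellows
    for i, g in enumerate(guess):
        if result[i] == "-" and unmatched[g] > 0:
            result[i] = "y"
            unmatched[g] -= 1
    return "".join(result)

def simulate_game(answer, word_freq, rng, max_tries=6):
    """Play one game; return tries used, or max_tries + 1 if unsolved."""
    candidates = dict(word_freq)
    for attempt in range(1, max_tries + 1):
        words = list(candidates)
        # Sample the next guess with word frequency as the weight.
        guess = rng.choices(words, weights=[candidates[w] for w in words])[0]
        if guess == answer:
            return attempt
        pattern = feedback(guess, answer)
        # Keep only words that would have produced the same feedback.
        candidates = {w: f for w, f in candidates.items()
                      if feedback(guess, w) == pattern}
    return max_tries + 1

def rescale(dist, share_weights):
    """Reweight a distribution by per-outcome sharing weights and renormalise."""
    weighted = [p * w for p, w in zip(dist, share_weights)]
    total = sum(weighted)
    return [x / total for x in weighted]

# Hypothetical word list with made-up frequency weights.
word_freq = {"eerie": 2.0, "eagle": 5.0, "early": 9.0,
             "earth": 8.0, "elite": 4.0, "erase": 3.0}
rng = random.Random(0)
tries = [simulate_game("eerie", word_freq, rng) for _ in range(1000)]
counts = Counter(tries)
simulated = [counts[k] / len(tries) for k in range(1, 8)]
```

Because the answer is always consistent with its own feedback, it is never filtered out, and every wrong guess removes at least itself from the candidate list.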

For Problem 3, we are required to classify puzzles by difficulty. We perform a cluster analysis on all reported trial distributions using K-means with three clusters, which we label easy, medium and hard. We then fit a Random Forest Model that assigns words to these three categories using the attribute indicators defined at the beginning. The correlation coefficient between each indicator and the difficulty is calculated, showing the direction in which each indicator affects puzzle difficulty. The sensitivity of the clustering is discussed as well. Based on our model, EERIE is classified as hard.
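The clustering step could be sketched as below. This is a plain Lloyd-style K-means written from scratch with a deterministic initialisation, not the paper's implementation. The six toy distributions are made up, and naming the clusters by their centroid's mean number of tries is an assumption of this sketch. The Random Forest step is not shown.

```python
def mean_tries(dist):
    """Expected number of tries; the last entry (unsolved) counts as 7 tries."""
    return sum((k + 1) * p for k, p in enumerate(dist)) / sum(dist)

def sq_dist(a, b):
    """Squared Euclidean distance between two distributions."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centroids, iterations=50):
    """Lloyd iteration from the given starting centroids."""
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])   # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

def label_difficulty(points):
    """Cluster into 3 groups and name them by the centroid's mean tries."""
    ordered = sorted(points, key=mean_tries)
    start = [list(ordered[0]), list(ordered[len(ordered) // 2]), list(ordered[-1])]
    labels, centroids = kmeans([list(p) for p in points], start)
    rank = sorted(range(3), key=lambda c: mean_tries(centroids[c]))
    names = dict(zip(rank, ["easy", "medium", "hard"]))
    return [names[lab] for lab in labels]

# Made-up percentage distributions over (1, 2, 3, 4, 5, 6 tries, unsolved).
toy = [
    (1, 10, 30, 35, 17, 6, 1), (1, 9, 28, 34, 19, 8, 1),
    (0, 4, 20, 35, 26, 12, 3), (0, 5, 21, 34, 25, 12, 3),
    (0, 1, 8, 25, 35, 24, 7), (0, 1, 9, 26, 34, 23, 7),
]
print(label_difficulty(toy))
```

As a worked example of the summary statistic used here, `mean_tries((0, 0, 9, 29, 45, 14, 3))` evaluates to 4.73 for the EERIE prediction quoted in the summary.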

For Problem 4, we further explore the effects of word difficulty. Using Linear Regression, we find that word difficulty has a clear effect on the number of results reported: harder puzzles lead to fewer reports. Difficulty also correlates with the percentage of people choosing Hard Mode, as mentioned earlier. This part of the study shows that the correlation arises because word difficulty affects the number of Normal Mode players.

With all the uncovered interactions between word attributes, puzzle difficulty, and game report patterns, Wordle operators could gain a deeper understanding of their players. Several sensible suggestions could also be made based on this discovery.

Keywords: Wordle; Dynamic system; Simulation; K-means; Random Forest

Team 2318036

Contents

1 Introduction
    1.1 Background and Literature Review
    1.2 Restatement of the Problem
    1.3 Our Work

2 Assumptions and Notations
    2.1 Model Assumptions
    2.2 Notations

3 Data Preprocessing

4 Task 1: Word Attribute Indicators

5 Task 2: Predicting Daily Reports & Hard Mode Percentage
    5.1 Problem Analysis
    5.2 Establishment of the Model
    5.3 Solving the Model
    5.4 Solution and Result
    5.5 Hard Mode Percentage Estimation

6 Task 3: Predicting Report Distribution
    6.1 Problem Analysis
    6.2 Establishment of the Model
    6.3 Predict Confidence and Uncertainties

7 Task 4: Word Difficulty Classification
    7.1 Cluster Analysis
    7.2 Difficulty Classification
    7.3 Sensitivity Analysis

8 Task 5: Other Features
    8.1 Fluctuations in the Number of Reported Results
    8.2 Effect of Word Difficulty on Hard Mode Reports Percentage

9 Strengths and Weaknesses
    9.1 Strengths
    9.2 Weaknesses
    9.3 Further Discussion
        9.3.1 Model Improvement
        9.3.2 Model Extension

1 Introduction

1.1 Background and Literature Review

The word puzzle Wordle, invented by Josh Wardle, has attracted millions of people due to its simplicity and myriad variations. Beyond the game itself, perhaps the major factor that made Wordle go viral is its integrated sharing format of emoji squares, which spread widely through Twitter. In January 2022, Wordle was purchased by The New York Times Company, which has operated it ever since. Only one puzzle is released each day on the game's official website, and this scarcity is also believed to contribute to Wordle's success.

In Wordle, players aim to crack a five-letter word within six guesses. Feedback is given after each guess is submitted: a letter highlighted in green appears in the answer at the same position; yellow indicates that the letter appears in the answer, but at another position; grey indicates that the letter is absent from the answer. An average player generally needs three to five tries, but this can vary significantly among words. In addition to the normal version, there is also a Hard Mode, which stipulates that each correct letter discovered (in yellow or green) must be used in the following guesses[11].
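The feedback rules and the Hard Mode constraint can be made concrete with a short sketch. The function names are hypothetical. The sketch assumes the usual handling of repeated letters (a letter is coloured at most as many times as it occurs in the answer) and that, in Hard Mode, green letters must stay in place while yellow letters only need to reappear somewhere.

```python
from collections import Counter

def score(guess, answer):
    """Colour each letter of `guess`: 'g' green, 'y' yellow, '-' grey."""
    colours = ["-"] * len(guess)
    spare = Counter()                      # answer letters not matched in place
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            colours[i] = "g"
        else:
            spare[a] += 1
    for i, g in enumerate(guess):
        if colours[i] == "-" and spare[g] > 0:
            colours[i] = "y"               # right letter, wrong position
            spare[g] -= 1
    return "".join(colours)

def hard_mode_ok(prev_guess, prev_colours, next_guess):
    """Check that `next_guess` reuses every hint revealed by the previous guess."""
    required = Counter()
    for i, (letter, colour) in enumerate(zip(prev_guess, prev_colours)):
        if colour == "g":
            if next_guess[i] != letter:    # green letters must stay in place
                return False
            required[letter] += 1
        elif colour == "y":
            required[letter] += 1          # yellow letters must appear somewhere
    available = Counter(next_guess)
    return all(available[letter] >= n for letter, n in required.items())

hints = score("crane", "eerie")
print(hints, hard_mode_ok("crane", hints, "eerie"), hard_mode_ok("crane", hints, "eagle"))
```

In this example the guess "crane" against the answer "eerie" reveals a yellow R and a green final E, so "eagle" is rejected in Hard Mode because it drops the R.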

Much research has focused on finding optimal strategies for solving the puzzle[1][4]. However, players' behaviour patterns are worth exploring as well. As Wordle is a major product of The New York Times Games, its operator would like to track and predict the number of games shared on Twitter. Besides, the released word should be chosen carefully, since easy puzzles cannot challenge experienced players, while rare words like "rebus" or "tapir" frustrate most fans[10]. Therefore, a quantitative model that predicts the distribution of attempts for a given word is also expected.

1.2 Restatement of the Problem

Considering the background, in this paper we are required to solve the following problems:

• Task 1: Combine the game mechanics of Wordle to build a set of indicators that reflect the attributes of words, and apply them to the subsequent model.

• Task 2: Develop a model that explains the trends in the number of reported results and the percentage of scores reported that were played in Hard Mode, and use it to predict the number of reported results on March 1, 2023. Further, analyze the effect of word attributes on the percentage of scores reported that were played in Hard Mode.

• Task 3: Develop a model that can predict the distribution of the reported results based on words and use it to predict the distribution for the word EERIE on March 1, 2023. In addition, illustrate the uncertainty and accuracy of the model.

• Task 4: Classify words according to their difficulty and explain the relationship between the attributes of the words and the difficulty of the words.

• Task 5: Perform a comprehensive analysis of the dataset, and give some interesting conclusions.

2 Assumptions and Notations

2.1 Model Assumptions

• Assumption 1: The general trend of Wordle's daily user numbers will not shift.
Justification: This is required in order to predict the future trend from observed daily usage.

• Assumption 2: Most players use rational strategies.
Justification: To establish a mathematical model of a potential player, it is necessary to assume that players actually follow a strategy and do not make unnecessary moves. Otherwise, simulating results from potential strategies would be meaningless.

• Assumption 3: Players' skill does not change significantly over time.
Justification: As Wordle is played over a period, players are expected to improve their strategies, which might affect the distribution of attempts on different dates. However, experienced players are giving up Wordle while rookies are joining at the same time, producing the opposite effect. It would be too complicated to model both possibilities.

• Assumption 4: In Task 2 (T2PL Model), players and those who share their results are not distinguished.
Justification: For convenience, players are modelled in Task 2, although Twitter report numbers are actually used. This is because there is not enough information to distinguish between the two groups at this step, and it makes sense to switch from modelling players to modelling players who share their results.

2.2 Notations

Symbol      Definition
$N$         Number of reported results
$H$         Number of reports in Hard Mode
$P_H$       Percentage of scores reported that were played in Hard Mode, $P_H = H/N \times 100\%$
$D$         All words of length 5 in dictionary data
$T$         Number of Targeted Population
$P$         Number of all players
$P_{loy}$   Number of loyal players
$P_{gen}$   Number of general players
$W_{sj}$    Probability of sharing when finishing with $j$ tries
$p_{ij}$    Probability of solving Wordle #$i$ with $j$ tries

Table 1: Symbol table.

Word attribute indicators are not included, because they are explained in detail below.

3 Data Preprocessing

The dataset Problem_C_Data_Wordle.xlsx contains 359 days of Wordle report information. Each row consists of the date, the word of the day, the number of reported results, the number of Hard Mode results, and the distribution over the number of attempts. There are no missing values in the table, but on closer inspection several words are misspelled. We manually correct each of these by looking up the correct Wordle answer for that day using its puzzle number.
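Checks of this kind can be automated before any manual correction. The sketch below is an assumption about how such a screen might look, not the procedure used in the paper. The field names (`word`, `reports`, `hard`, `tries_pct`), the tolerance and the two example rows are all hypothetical.

```python
def check_row(row, tolerance=3.0):
    """Return a list of problems found in one daily record."""
    problems = []
    word = row["word"].strip().lower()
    if len(word) != 5 or not word.isalpha():
        problems.append(f"word {row['word']!r} is not five letters")
    if row["hard"] > row["reports"]:
        problems.append("more Hard Mode reports than total reports")
    total = sum(row["tries_pct"])
    if abs(total - 100) > tolerance:       # rounded percentages rarely sum to exactly 100
        problems.append(f"tries percentages sum to {total}")
    return problems

def hard_mode_pct(row):
    """P_H = H / N * 100%, following the notation table."""
    return 100 * row["hard"] / row["reports"]

# Made-up rows for illustration; they are not taken from the dataset.
good = {"word": "eerie", "reports": 20000, "hard": 2000,
        "tries_pct": [0, 2, 12, 30, 33, 18, 5]}
bad = {"word": "wordl", "reports": 20000, "hard": 2000,
       "tries_pct": [0, 2, 12, 30, 33, 18, 25]}
bad["word"] = "wrdl"                       # simulate a misspelled four-letter entry

for row in (good, bad):
    print(row["word"], check_row(row), hard_mode_pct(row))
```

A screen like this only flags words of the wrong length or shape. A misspelling that is still a valid five-letter string would need to be checked against the official answer list.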
