2018 MCM/ICM Outstanding Winner Paper, Problem B, Team 74316


Team Control Number: 74316

Problem Chosen: B

2018 MCM/ICM Summary Sheet

This paper analyzes the temporal and geographical distributions of various languages and forecasts their development and distribution over the next 50 years, providing a theoretical basis and advice for the company's office-location decision.

First, we analyze the temporal distribution of the number of language users. The factors that affect the number of users of a language are summarized as ten indicators, such as GDP per capita and average years of schooling. Principal Component Analysis combines them into four principal components: level of economic development, level of social equality, level of national welfare, and cultural exchange. On this basis, short-term difference models are established separately for native and non-native speakers. A first-order autoregressive model, AR(1), is fitted to the time distribution of native speakers to capture its autocorrelation. For most languages, the native-speaker series is consistent with a non-stationary unit-root process. We then construct the co-integration relationship between the principal components and the number of second-language (L2) users. An error correction model is established, and both the random error and the error correction term are found to be stationary. In the co-integration space, the influence of the principal components on L2 is first-difference stationary.

Based on the short-term model, a long-term differential model is then established. The change in native speakers is modeled as a logistic process similar to natural population growth. The system is stable for coefficients in the normal range, and its stable equilibrium is the maximum capacity under current conditions. The time distribution largely follows the natural change of the population. In addition, because the linear combinations of the various factors are difference-stationary, their influence on second-language users is treated as constant in the long run. The model of L2 is therefore a constant-coefficient differential equation whose time path is determined by the strength of the language's influence. We sum L1 and L2 to obtain the total number of speakers of a language, which is a non-stationary dynamic system. This means that the driving forces of a particular language are the endogenous growth of its native speakers and its external influence as a second language. The total number of speakers trends upward over time.

Second, we use the long-term model to predict the situation of each country over the next 50 years. The number of speakers of influential languages increases significantly, with growth driven by second-language transmission. The number of speakers of less influential languages grows more slowly or even decreases, and their growth is endogenous to native speakers. Within the Top 10, the number of native speakers of some languages may drop significantly, or the number of non-native speakers may not grow enough, so the future rankings may change. Sensitivity analysis and Monte Carlo robustness simulations show that our model is robust and has predictive power.

Third, we build a Markov model to analyze the geographical distribution of languages and its changes. We construct a transition matrix of migration. Based on population growth, natural growth rates and language distributions, the total number of speakers of each language in each country is inferred. We then visualize the results centered on national capitals. The prediction shows that the geographical distributions of languages tend to intertwine, and that languages will spread as second languages in the future.

Next, we locate the new offices by Cluster Analysis. We assume that the ability to speak English and Chinese is required and that economic development should also be considered. We therefore construct 4 indicators: the ratio of English speakers, the ratio of Chinese speakers, GDP per capita and net immigrants. Their values are calculated from the forecasts of the short-term and long-term models and the Markov model. We cluster the 224 countries on each of the 4 indicators separately. Using the resulting quantified grades of each country, we apply Multiple Objective Decision Making (MODM) with Fuzzy Evaluation to calculate an overall grade and choose the 6 national capitals with the highest grades as the locations of new offices.

Finally, using the MINE model, we mesh the latitude and longitude coordinates into grids and calculate a grid density index to identify areas where offices are distributed too densely. Eliminating an appropriate number of offices reduces costs by serving the largest global area with the fewest office locations. The result shows that 4 new offices are more suitable. Thus, for the short term we recommend building 4 new offices in London, Singapore, Ottawa and Canberra, and for the long run we recommend Singapore, London, Canberra and Paris.

Key Words: Autoregressive Model, Co-integration, Differential System, Markov Chain, Cluster Analysis, MINE Model

Team#74316

Page 1 of 20

Contents

Introduction
1.1 Background
1.2 General Assumptions
Model 1 – Language Speakers’ Quantitative Distribution
2.1 Introduction and Assumptions
2.1.1 Introduction
2.1.2 Assumptions
2.2 Variables and Parameters
2.2.1 Notations
2.2.2 Dimension Reduction - PCA
2.3 Short-term Models
2.3.1 Native Speaker Model - Autoregressive model
2.3.2 Non-native Speaker Model – Error Correction Model
2.4 Long-term Models – Logistic Model
2.4.1 Assumptions
2.4.2 Model Design
2.5 Predictions
Validation of Model 1
3.1 Sensitivity Analysis
3.1.1 Fixed growth rate δ
3.1.2 External Promoting Coefficient
3.1.3 Initial Value
3.2 Robust Analysis
3.2.1 Monte-Carlo Simulation
Model 2 - Geographic distribution
4.1 Migration Patterns – Markov Process
4.1.1 Assumptions and Variables
4.1.2 Model Settings
4.1.3 Solution and Visualization
4.2 Language Distributions
4.2.1 Prediction
4.2.2 Conclusion
4.3 Comparison of Model 1&2
Model 3 – Offices Location Decisions
5.1 Model in Short-term and Long-term - Cluster Analysis & MODM
5.1.1 Assumption
5.1.2 The Settings and Solutions of Model
5.2 Comparison
5.3 Resource-saving Suggestions – MINE Model
Strengths and Weaknesses
6.1 Strengths
6.2 Weaknesses and Improvements
Memo
Work cited
Appendix

Notes: Due to space limitations, the appendix does not show all the data.


Introduction

1.1 Background

Nearly 7,000 languages are spoken around the world, and they form a communication network across hundreds of countries and regions. Languages are essential for conducting foreign trade, developing tourism and promoting scientific and technological progress, which makes language an indicator of, and an effective tool for measuring, a country’s comprehensive power. One measure of the utility of a particular language is the number of speakers who use it as a native, second or third language. It is therefore worth noting that the number of speakers of a language changes over time as the language rises and falls, often in step with the economic and political development of its main country.

At present, ten languages are claimed to be used by half of the world’s population: Mandarin (incl. Standard Chinese), Spanish, English, Hindi, Arabic, Bengali, Portuguese, Russian, Punjabi, and Japanese. The number of speakers of a language is influenced by migration, social pressures, business relations, social media and so on. We need to identify these variations and future trends in order to anticipate the languages’ rankings and make better use of them.

1.2 General Assumptions

(1) The following models consider only the top 26 languages in the world, as their speakers together amount to nearly 98% of the world’s population.

(2) No collision with another planet or other disaster disrupts people’s normal life.

(3) Apart from differences in detail, the numbers of speakers of all languages follow the same model settings.

(4) All stochastic error terms can be expressed as white noise or a time-weighted form of it.

(5) All individuals are homogeneous. Their choices of second or third languages are completely determined by national, social and cultural factors; the contribution of personal interests and life plans to individual language use is negligible.

(6) Macro-level factors such as culture and migration have the same impact on the total number of speakers of a particular language in the long run.

Model 1 – Language Speakers’ Quantitative Distribution

2.1 Introduction and Assumptions

2.1.1 Introduction

In Part 1, we build a model to analyze the time distribution of the users of different languages, including native and non-native speakers, and to predict the situation over the next 50 years. We discuss native and non-native speakers separately. The use of a native language is mainly determined by the environment and is hardly related to factors such as society, media, technology and tourism. Its use over time is highly likely to be positively related to the changing size of the native population, which in turn depends mostly on its own past values (autocorrelation). In contrast, factors such as migration and culture shock primarily influence the use of a non-native language. We therefore take those factors into account in the model for non-native speakers.

2.1.2 Assumptions

(1) The time distribution of native speakers is determined by the pattern of the population distribution. Factors other than population change are neglected.

(2) The time distribution of non-native speakers is determined by serial correlation, population migration, and factors including international business relations, increased global tourism, the use of electronic communication and social media, and the use of technology to assist in quick and easy language translation.

(3) Some factors have a first-order co-integration relationship with the number of non-native speakers.

2.2 Variables and Parameters

2.2.1 Notations

VARIABLES: DEFINITION

L1: the number of native speakers
L2: the number of non-native speakers
The number of total speakers
$S_t - S_{t-1}$: first difference of $S_t$
$X_1, \dots, X_{10}$: the variables listed in Section 2.2.2
$Y_1, \dots, Y_4$: principal components
$Y_t - Y_{t-1}$: first difference of $Y_t$
Stochastic error term
Error correction model (ECM) term, which indicates the extent to which the explained variables deviated from the long-term equilibrium in the previous period
Maximum population
One-step transition matrix
Probability of moving from one state to another
Steady-state probabilities

PARAMETERS: DEFINITION

Parameters for AR(p)
Parameter for the long-term model
Parameter of the variable ECM
Natural growth rate of the number of native speakers

Table 1: Variables and Parameters for Model 1

2.2.2 Dimension Reduction - PCA

We use PCA to derive metrics for measuring language evolution and to reduce the dimension of the variables by creating new variables that are linear combinations of the original ones. The new linear combinations are uncorrelated, and only a few of them, called principal components, contain most of the original information.

The following variables are taken from the countries where people use the particular language. The final value of each observed variable is weighted by the number of speakers in each country.

a. GDP (per capita)/$

GDP is a monetary measure of the market value of all final goods and services produced in a period.

b. Crop yield/$

Crop yield is the measure of the yield of a crop per unit area of land under cultivation.

c. Average years of schooling/Years

Average years of schooling reflects educational attainment across age groups and genders.

d. Gini coefficient

The Gini coefficient measures the inequality among the values of a frequency distribution.

e. Gross National Happiness Index

GNH is a measure of the collective happiness of a nation, which can reflect social pressure. It can be formulated as follows:

$$\mathrm{GNH} = \frac{\Delta\,\text{Income}}{\text{Gini coefficient} \times \text{unemployment rate} \times \text{inflation rate}}$$

f. Number of Migrants/10000 people

The number of migrants affects the use of a language.

g. Labor Productivity/$10000

Labor productivity indicates how many products have been produced.

h. Consumer Price Index/%

The Consumer Price Index indicates the price level of goods, which can reveal the relationship between demand and supply.

i. Income of Tourism/$

As tourism prospers in a country, its language is used more frequently and widely, because foreign tourists flood in and visit its attractions.

j. Number of translation or dictionary software products that include the language/unit

The more software products include a particular language, the more popular it is around the world.
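The two computations described above, the GNH ratio and the speaker-weighted value of an indicator, can be sketched as follows. This is only an illustration: the function names `gnh_index` and `speaker_weighted` and the sample numbers are hypothetical, and a plain weighted mean is assumed because the paper does not spell out its exact weighting scheme.

```python
def gnh_index(delta_income, gini, unemployment_rate, inflation_rate):
    """GNH as defined in item e: income change over the product of
    Gini coefficient, unemployment rate and inflation rate."""
    return delta_income / (gini * unemployment_rate * inflation_rate)


def speaker_weighted(values, speakers):
    """Weighted mean of a per-country indicator, using the number of
    speakers in each country as weights (assumed weighting scheme)."""
    total = sum(speakers)
    return sum(v * s for v, s in zip(values, speakers)) / total


# Hypothetical example: one indicator observed in two countries where
# the language is spoken, with 1 and 3 (million) speakers respectively.
weighted_value = speaker_weighted([10.0, 20.0], [1.0, 3.0])
```

Countries with more speakers of the language pull the aggregated indicator toward their own value, which is the effect the weighting is meant to capture.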


Metrics for measuring language use

For non-native speakers, we take serial correlation and other factors into consideration. From a macro view, factors including international business relations, global tourism, social media and language-translation technology can affect the use of a non-native language. We collect 10 micro-factors that might affect the number of non-native speakers, forming a time series from 2008 to 2016. Due to limited data, we collect only the 6 official languages of the UN (Mandarin Chinese, English, Spanish, Arabic, Russian, and French) as representative territorial data, forming a 6×9 panel (detailed in Appendix 2).

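Principal component analysis (PCA) is a statistical procedure that applies an orthogonal linear transformation to the data to set up a new evaluation system. The first principal component has the largest possible variance, and the resulting vectors ($Y_1, \dots, Y_{10}$) form an uncorrelated orthogonal basis set. We thus obtain the principal components below:

$$
\begin{aligned}
Y_1 &= \eta_{1,1} X_1 + \eta_{1,2} X_2 + \eta_{1,3} X_3 + \cdots + \eta_{1,10} X_{10} \\
Y_2 &= \eta_{2,1} X_1 + \eta_{2,2} X_2 + \eta_{2,3} X_3 + \cdots + \eta_{2,10} X_{10} \\
Y_3 &= \eta_{3,1} X_1 + \eta_{3,2} X_2 + \eta_{3,3} X_3 + \cdots + \eta_{3,10} X_{10} \\
&\;\;\vdots \\
Y_{10} &= \eta_{10,1} X_1 + \eta_{10,2} X_2 + \eta_{10,3} X_3 + \cdots + \eta_{10,10} X_{10}
\end{aligned}
$$

where $\eta_{i,j}$ is the $j$-th element of the vector $\eta_i$, and

$$\operatorname{var}(Y_i) = \operatorname{var}(\eta_i' X) = \eta_i' \Sigma \eta_i, \qquad \eta_i' \eta_i = 1.$$

$Y_i$ and $Y_j$ should be uncorrelated ($i \neq j$), and $Y_1$ has the largest possible variance.

In this case, we use the 10 columns of the data matrix $X = (X_1, \dots, X_{10})$ to represent the ten factors: GDP (per capita), crop yield, average years of schooling, GNH, Gini coefficient, number of migrants, labor productivity, CPI, income of tourism and the number of software products.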

After factor rotation, the feature vectors of the 4 main components are as in equation (1). Namely, $Y_1 = \eta_1' X$, $Y_2 = \eta_2' X$, $Y_3 = \eta_3' X$, $Y_4 = \eta_4' X$.

$Y_1$ is the economic factor, including GDP per capita, crop yield and tourism income. $Y_2$ is the social equality factor. $Y_3$ is the national welfare factor. $Y_4$ is the culture-shock factor.

PCA thus gives a new metrics system: $(Y_1, \dots, Y_4)$ inherits 87.6% of the variance of $X$, which is more than 85%, so it effectively and properly reflects the original factors. $(Y_1, \dots, Y_4)$ is defined as the metrics system for the language-use measurement model.
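The selection rule described above (keep the leading components until they carry at least 85% of the variance) can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name `principal_components` is hypothetical, the data are random placeholders shaped like the 6×9 panel (54 observations of 10 factors), the factors are standardized first, and the factor-rotation step mentioned in the text is omitted.

```python
import numpy as np


def principal_components(X, threshold=0.85):
    """Return (eta, explained, scores) for the leading components whose
    cumulative share of variance first reaches `threshold`."""
    X = np.asarray(X, dtype=float)
    # Standardize each factor so that differing units do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    sigma = np.cov(Z, rowvar=False)            # covariance matrix Sigma
    eigvals, eigvecs = np.linalg.eigh(sigma)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    # Smallest k whose cumulative explained variance is >= threshold.
    k = int(np.searchsorted(np.cumsum(explained), threshold)) + 1
    k = min(k, len(eigvals))
    eta = eigvecs[:, :k]                       # columns satisfy eta_i' eta_i = 1
    scores = Z @ eta                           # Y_i = eta_i' X per observation
    return eta, explained[:k], scores


# Placeholder data only: 54 observations (6 languages x 9 years), 10 factors.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(54, 10))
eta, explained, Y = principal_components(X_demo)
```

With real, correlated indicators a few components usually suffice; with the uncorrelated placeholder data used here, most of the ten components are needed to reach the threshold.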

2.3 Short-term Models

2.3.1 Native Speaker Model - Autoregressive model

Following the short-term model below, we solve the native-speaker and non-native-speaker models separately and then sum them to obtain the total number of speakers of the target language.

For the number of native speakers $L1_t$, by assumption the series is driven by autocorrelation in time. We therefore fit a $p$-th order autoregressive model, AR($p$):

$$L1_t = \varphi_1 L1_{t-1} + \varphi_2 L1_{t-2} + \cdots + \varphi_p L1_{t-p} + \varepsilon_t \tag{2}$$

Here $L1_t$ is the number of native speakers in year $t$; $\varphi_1, \dots, \varphi_p$ are the coefficients of the different lag orders; and $\varepsilon_t$ is the error term, with mean 0 and variance $\sigma^2$, which follows a white noise process $\mathrm{WN}(0, \sigma^2)$.
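One way the AR(p) model in equation (2) could be estimated is ordinary least squares on lagged values, sketched below. The paper does not state its estimation method, so this is an assumption; the function names `fit_ar` and `forecast_ar` and the sample series are hypothetical. No intercept is included, matching the form of equation (2).

```python
import numpy as np


def fit_ar(series, p=1):
    """Estimate AR(p) coefficients (no intercept) by least squares.
    Returns (phi, residuals), where phi[i] multiplies the lag i+1 value."""
    y = np.asarray(series, dtype=float)
    n = len(y)
    # Row for time t holds y[t-1], y[t-2], ..., y[t-p].
    lags = np.column_stack([y[p - i - 1 : n - i - 1] for i in range(p)])
    target = y[p:]
    phi, *_ = np.linalg.lstsq(lags, target, rcond=None)
    residuals = target - lags @ phi
    return phi, residuals


def forecast_ar(series, phi, steps):
    """Iterate the fitted recursion forward `steps` periods."""
    history = [float(v) for v in series]
    start = len(history)
    p = len(phi)
    for _ in range(steps):
        recent = history[-1 : -p - 1 : -1]  # y[t-1], ..., y[t-p]
        history.append(float(np.dot(phi, recent)))
    return history[start:]


# Hypothetical native-speaker series (millions) over 15 years, growing 2% a year.
demo_series = [100.0 * 1.02 ** t for t in range(15)]
phi_demo, resid_demo = fit_ar(demo_series, p=1)
next_two_years = forecast_ar(demo_series, phi_demo, steps=2)
```

A fitted coefficient at or above 1 in the AR(1) case is the kind of result that points to the non-stationary, unit-root behaviour mentioned in the summary; a formal unit-root test would be needed to confirm it.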

Firstly, we collect the time-series data of those language speakers. As for the top 10 languages, we use the statistics of native speakers, non-native speakers and total speakers from 2003 to 2017

Feature vectors for equation (1):

$$
\begin{aligned}
\eta_1 &= (3.16,\ 2.95,\ 0.63,\ 0.87,\ 0.42,\ 0.75,\ 2.76,\ 0.44,\ 2.58,\ 0.37) \\
\eta_2 &= (0.92,\ 0.77,\ 0.63,\ 3.78,\ 0.40,\ 0.53,\ 0.46,\ 0.29,\ 0.66,\ 0.29) \\
\eta_3 &= (0.82,\ 0.41,\ 1.89,\ 2.33,\ 2.26,\ 0.38,\ 0.07,\ 0.34,\ 0.65,\ 1.42) \\
\eta_4 &= (0.67,\ 0.95,\ 0.58,\ 0.33,\ 0.26,\ 3.89,\ 1.01,\ 0.88,\ 0.61,\ 3.37)
\end{aligned}
$$
