Team Control Number: 74316
Problem Chosen: B
2018 MCM/ICM Summary Sheet
The purpose of this paper is to analyze the temporal and geographical distributions of various languages and to forecast their development
and distribution over the next 50 years, providing a theoretical basis and advice for the company's office-location decisions.
First, we analyze the temporal distribution of the number of language users. The influences on the number of language users are
summarized as ten indicators, such as GDP per capita and average years of schooling. By Principal Component Analysis, they are
combined into four principal components: level of economic development, level of social equality, level of national welfare, and
cultural exchange. On this basis, short-term difference models are established for native speakers and non-native speakers. A
first-order autoregressive model (AR(1)) is fitted to the time distribution of native speakers to capture its autocorrelation. Most
native-speaker series are consistent with a non-stationary unit-root process. Then, we construct the co-integration relationship
between the principal components and the second-language (L2) users. An error correction model is established, and both the random
error and the error correction term are found to be stationary. In the co-integration space, the influence of the principal
components on L2 is first-difference stationary.
Based on the short-term model, a long-term differential model is then established. The change in native speakers (L1) is treated as
a logistic process similar to natural population growth; the system is stable for coefficients in the normal range, and the stable
equilibrium is the maximum capacity under current conditions. The time distribution largely follows the natural change of the
population. Moreover, because the linear combinations of the various factors are difference-stationary, their influence on
second-language users is treated as constant in the long run. The L2 model is therefore a constant-coefficient differential equation
whose time path is determined by the strength of the language's influence. We then sum L1 and L2 to obtain the total number of
speakers of a language, which is a non-stationary dynamic system. That is, the driving forces of a particular language are the
endogenous growth of its native speakers and its external influence as a second language. The total number of speakers of a
language rises over time.
Secondly, we use the long-term model to predict the situation of each country in the next 50 years. The number of speakers of
influential languages increases significantly, with growth driven by second-language transmission. The number of speakers of less
influential languages grows less noticeably or even decreases; their growth is endogenous to the native speakers. Within the top 10,
the number of native speakers of some languages may drop significantly, or the number of non-native speakers may not grow enough,
so those languages may be superseded in future rankings. Sensitivity analysis and Monte Carlo robustness simulations show that our
model is robust and predictable.
Thirdly, we build a Markov model to analyze the geographical distribution of languages and its changes. We construct a transition
matrix of immigrants. Based on population growth, natural growth rate and language distribution, we infer the total number of
speakers of each language in each country. We then center the results on the national capitals and visualize them. The prediction
shows that the geographical distributions of languages tend to intertwine and to spread as second languages in the future.
Next, we locate the new offices by cluster analysis. We assume that the ability to speak English and Chinese is needed and that
economic development should be considered as well. We therefore construct 4 indicators: the ratio of English speakers, the ratio of
Chinese speakers, GDP per capita and net immigrants. Their values are calculated from the forecasts of the short-term and long-term
models and the Markov model. We analyze 224 countries by cluster analysis on each of the 4 indicators separately. Using the
resulting quantified grade of each country, we apply fuzzy-evaluation Multiple Objective Decision Making (MODM) to calculate an
overall grade and choose the 6 national capitals with the highest grades as the locations of new offices.
Finally, using the MINE model, we mesh the latitude and longitude coordinates into a grid and calculate a grid density index to
identify areas where offices are too densely distributed. Eliminating an appropriate number of offices reduces costs by serving the
largest global area with the fewest office locations. The result shows that setting up 4 new offices is more suitable. Thus, for the
short term, we recommend building 4 new offices in London, Singapore, Ottawa and Canberra; in the long run, Singapore, London,
Canberra and Paris are recommended.
Key Words: Autoregressive Model, Co-integration, Differential System, Markov Chain, Cluster Analysis, MINE Model
Team#74316
Page 1 of 20
Content
1 Introduction
  1.1 Background
  1.2 General Assumptions
2 Model 1 – Language Speakers’ Quantitative Distribution
  2.1 Introduction and Assumptions
    2.1.1 Introduction
    2.1.2 Assumptions
  2.2 Variables and Parameters
    2.2.1 Notations
    2.2.2 Dimension Reduction - PCA
  2.3 Short-term Models
    2.3.1 Native Speaker Model - Autoregressive model
    2.3.2 Non-native Speaker Model – Error Correction Model
  2.4 Long-term Models – Logistic Model
    2.4.1 Assumptions
    2.4.2 Model Design
  2.5 Predictions
3 Validation of Model 1
  3.1 Sensitivity Analysis
    3.1.1 Fixed Growth Rate δ
    3.1.2 External Promoting Coefficient μ
    3.1.3 Initial Value
  3.2 Robust Analysis
    3.2.1 Monte-Carlo Simulation
4 Model 2 - Geographic distribution
  4.1 Migration Patterns – Markov Process
    4.1.1 Assumptions and Variables
    4.1.2 Model Settings
    4.1.3 Solution and Visualization
  4.2 Language Distributions
    4.2.1 Prediction
    4.2.2 Conclusion
  4.3 Comparison of Model 1&2
5 Model 3 – Offices Location Decisions
  5.1 Model in Short-term and Long-term - Cluster Analysis & MODM
    5.1.1 Assumption
    5.1.2 The Settings and Solutions of Model
  5.2 Comparison
  5.3 Resource-saving Suggestions – MINE Model
6 Strengths and Weaknesses
  6.1 Strengths
  6.2 Weaknesses and Improvements
Memo
Work cited
Appendix
Notes: Due to space limitations, the appendix does not show all the data.
1 Introduction
1.1 Background
Nearly 7000 languages are spoken around the world, and they form the communication network across hundreds of
countries and regions. Languages are essential for conducting foreign trade, developing tourism and promoting
scientific and technological progress, which makes language both an indicator of and an effective tool for
measuring a country’s comprehensive power. One measure of the utility of a particular language is the number
of speakers who use it as a native, second or third language. It should therefore be noted that the number of
speakers of a particular language changes over time with the language’s rise and fall, which may coincide with
the economic and political development of its main country.
At present, ten languages are claimed to be used by half the world’s population: Mandarin (incl. Standard
Chinese), Spanish, English, Hindi, Arabic, Bengali, Portuguese, Russian, Punjabi, and Japanese. The number of
speakers of a language is influenced by migration, social pressures, business relations, social media and so
on. We need to identify how these numbers vary and trend in the future in order to anticipate the languages’
rankings and make better use of them.
1.2 General Assumptions
(1) The following models consider only the top 26 languages used in the world, as their speakers amount to
nearly 98% of the world’s population.
(2) No unexpected collision with another planet and no other disaster disrupts people’s normal life.
(3) Apart from differences in detail, the numbers of speakers of all languages follow the same model
settings.
(4) All stochastic error terms can be expressed as white noise or a time-weighted form of it.
(5) All individuals are homogeneous. Their choices of second or third languages are determined entirely by
national, social and cultural factors, and the contribution of personal interests and life plans to individual
language use is negligible.
(6) Macro-level factors such as culture and migration have the same impact on the total number of speakers
of a particular language in the long run.
2 Model 1 – Language Speakers’ Quantitative Distribution
2.1 Introduction and Assumptions
2.1.1 Introduction
In Part 1, a model is required to analyze the time distribution of the users of different languages,
including native speakers and non-native speakers, and to predict the situation in the next 50 years. We
discuss native speakers and non-native speakers separately. The use of a native language is mainly
determined by the environment and is hardly related to factors such as society, media, technology and
tourism. The use of the language over time is highly likely to be positively related to the changing size of
the native population, which in turn depends mostly on its own past values (autocorrelation). In contrast,
factors like migration and culture shock primarily influence the use of a non-native language. Consequently,
we take those factors into account when building the non-native speaker model.
2.1.2 Assumptions
(1) The time distribution of the native speakers of a language is determined by the pattern of the population
distribution. Factors other than population change are neglected.
(2) The time distribution of non-native speakers is determined by serial correlation, population migration
and further factors, including international business relations, increased global tourism, the use of
electronic communication and social media, and the use of technology to assist in quick and easy language
translation.
(3) Some factors have a first-order co-integration relationship with the number of non-native speakers.
2.2 Variables and Parameters
2.2.1 Notations

| VARIABLES | DEFINITION |
|---|---|
| (not legible) | The number of native speakers |
| (not legible) | The number of non-native speakers |
| N_T | The number of total speakers |
| ΔS_t | S_t − S_{t−1} |
| X_1, …, X_10 | The variables listed below |
| Y_1, …, Y_4 | Principal components |
| ΔY_t | Y_t − Y_{t−1} |
| (not legible) | Stochastic error term |
| ECM_{t−1} | Error correction term, which indicates the extent to which the explained variables deviated from the long-term equilibrium in the previous period |
| N_M | Maximum population |
| (not legible) | One-step transition matrix |
| P | Probability of moving from one state to another |
| (not legible) | Steady-state probabilities |

| PARAMETERS | DEFINITION |
|---|---|
| (not legible) | Parameters for AR(p) |
| (not legible) | Parameter for the long-term model |
| (not legible) | Parameter of the variable ECM |
| (not legible) | Natural growth rate of the number of native speakers |

Table 1: Variables and Parameters for Model 1
2.2.2 Dimension Reduction - PCA
PCA is used to derive metrics for measuring language evolution and to reduce the dimension of the variables
by creating new variables that are linear combinations of the original variables. The new linear
combinations are uncorrelated, and only a few of them, called principal components, contain most of the
original information.
The following variables are taken from the countries where people use the particular language. The final
value of each observed variable is weighted by the number of speakers in the country.
a. GDP (per capita)/$
GDP is a monetary measure of the market value of all final goods and services produced in a period.
b. Crop yield/$
Crop yield is the yield of a crop per unit area of cultivated land.
c. Average years of schooling/Years
Average years of schooling reflects educational attainment across age groups and genders.
d. Gini coefficient
The Gini coefficient measures the inequality among values of a frequency distribution.
e. Gross National Happiness Index
GNH is a measure of the collective happiness of a nation, which can reflect social pressure.
It is formulated as follows:
$$\mathrm{GNH} = \frac{\Delta\,\text{Income}}{\text{Gini coefficient} \times \text{unemployment rate} \times \text{inflation rate}}$$
f. Number of Migrants/10000 people
The number of migrants affects the use of a language.
g. Labor Productivity/$10000
Labor productivity indicates how much output has been produced.
h. Consumer Price Index/%
The Consumer Price Index indicates the price level of goods, which can reveal the relationship between
demand and supply.
i. Income of Tourism/$
As tourism in a country prospers, its language is used more frequently and broadly, because foreign
tourists flood in and visit its attractions.
j. The number of translation or dictionary software products that record the language/unit
The more software products record a particular language, the more popular it is in the world.
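The GNH formula in item e and the speaker weighting described before the list can be sketched in a few lines. This is only an illustration: the function names and all numbers are hypothetical, and normalizing the weights into a weighted average is an assumption, since the paper says only that values are "weighted by the number of speakers in the country".

```python
import numpy as np

def gnh(delta_income, gini, unemployment_rate, inflation_rate):
    """GNH as written in item e: income change over the product of the three rates."""
    return delta_income / (gini * unemployment_rate * inflation_rate)

def speaker_weighted(values, speakers):
    """Average a per-country indicator, weighting each country by its speaker count.

    Assumption: the weights are normalized so the result is a weighted mean.
    """
    values = np.asarray(values, dtype=float)
    speakers = np.asarray(speakers, dtype=float)
    return float(np.sum(values * speakers) / np.sum(speakers))

# Hypothetical figures for one language spoken in three countries.
gdp_per_capita = [40000.0, 10000.0, 5000.0]  # $ per person, per country
speakers = [50.0, 30.0, 20.0]                # millions of speakers, per country
weighted_gdp = speaker_weighted(gdp_per_capita, speakers)
```

With these made-up inputs, the country with the most speakers dominates the language-level value, which is the intended effect of the weighting.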
Metrics for measuring language use
For non-native speakers, we take serial correlation and other factors into consideration. From the macro
view, factors including international business relations, global tourism, social media and
language-translation technology can affect the use of a non-native language. We collect 10 micro-factors
that might affect the number of non-native speakers, forming a time series from 2008 to 2016. Because data
are limited, we collect only the 6 official languages of the UN (Mandarin Chinese, English, Spanish, Arabic,
Russian, and French) as representative data, forming a 6×9 panel (detailed in Appendix 2).
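Principal component analysis (PCA) is a statistical procedure that applies an orthogonal linear
transformation to the data to set up a new evaluation system. The first principal component has the largest
possible variance, and the resulting vectors $(\eta_1, \eta_2, \eta_3, \dots, \eta_{10})$ form an
uncorrelated orthogonal basis set. The principal components are therefore, writing $a_{ij}$ for the
coefficients:
$$
\begin{aligned}
\eta_1 &= a_{1,1} X_1 + a_{1,2} X_2 + a_{1,3} X_3 + \cdots + a_{1,10} X_{10} \\
\eta_2 &= a_{2,1} X_1 + a_{2,2} X_2 + a_{2,3} X_3 + \cdots + a_{2,10} X_{10} \\
\eta_3 &= a_{3,1} X_1 + a_{3,2} X_2 + a_{3,3} X_3 + \cdots + a_{3,10} X_{10} \\
&\;\;\vdots \\
\eta_{10} &= a_{10,1} X_1 + a_{10,2} X_2 + a_{10,3} X_3 + \cdots + a_{10,10} X_{10}
\end{aligned}
$$
$$
\operatorname{var}(\eta_i) = \operatorname{var}(a_i' X) = a_i' \Sigma a_i, \qquad a_i' a_i = 1
$$
$\eta_i$ and $\eta_j$ should be uncorrelated ($i \neq j$).
$\eta_1$ has the largest possible variance.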
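A short derivation shows why these conditions determine the first component. It uses the standard PCA argument rather than anything specific to this paper; $a_1$ for the coefficient vector, $\Sigma$ for the covariance matrix of $X$ and $\lambda$ for the Lagrange multiplier are notational assumptions.

$$
\begin{aligned}
&\max_{a_1}\; a_1' \Sigma a_1 \quad \text{subject to} \quad a_1' a_1 = 1, \\
&\mathcal{L}(a_1, \lambda) = a_1' \Sigma a_1 - \lambda\,(a_1' a_1 - 1), \\
&\frac{\partial \mathcal{L}}{\partial a_1} = 2\Sigma a_1 - 2\lambda a_1 = 0
\;\Longrightarrow\; \Sigma a_1 = \lambda a_1, \\
&\operatorname{var}(\eta_1) = a_1' \Sigma a_1 = \lambda\, a_1' a_1 = \lambda .
\end{aligned}
$$

So $a_1$ is an eigenvector of $\Sigma$, and the variance is largest when $\lambda$ is the largest eigenvalue. For eigenvectors $a_i$, $a_j$ belonging to distinct eigenvalues, $\operatorname{cov}(\eta_i, \eta_j) = a_i' \Sigma a_j = \lambda_j\, a_i' a_j = 0$, which gives the uncorrelatedness condition.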
In this case, we use the 10 columns of the data matrix $X = (X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8, X_9, X_{10})$
to represent the ten factors: GDP (per capita), crop yield, average years of schooling, GNH, Gini coefficient,
number of migrants, labor productivity, CPI, income of tourism and the number of software products.
After factor rotation, the feature vectors of the 4 main components are $\eta_1$, $\eta_2$, $\eta_3$ and
$\eta_4$, whose values are listed in equation (1).
Namely, $Y_1 = \eta_1$, $Y_2 = \eta_2$, $Y_3 = \eta_3$, $Y_4 = \eta_4$.
$Y_1$ is the economic factor, including GDP per capita, crop yield and tourism income. $Y_2$ is the social
equality factor. $Y_3$ is the national welfare factor. $Y_4$ is the culture-shock factor.
PCA thus gives a new metrics system: $(Y_1, Y_2, Y_3, Y_4)$ retains 87.6% of the variance of $X$, which is
more than 85%. It is therefore effective and properly reflects the original factors.
$(Y_1, Y_2, Y_3, Y_4)$ is defined as the metrics system for the language-use measurement model.
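The component-selection step can be sketched with NumPy as follows. This is a minimal illustration on synthetic data, not the paper's computation: the data are randomly generated stand-ins for the 6×9 panel of 10 indicators, standardizing the indicators is an assumption, and the factor rotation mentioned above is omitted. The 85% threshold is the one quoted in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 54 observations (6 languages x 9 years) of 10
# indicators, built from 3 latent factors plus noise so a few components dominate.
latent = rng.normal(size=(54, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.3 * rng.normal(size=(54, 10))

# Standardize each indicator (assumption: units differ, so scales are equalized).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigen-decomposition of the covariance matrix; eigh returns ascending eigenvalues.
cov = np.cov(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Share of variance per component, and the smallest k reaching the 85% threshold.
explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)
k = int(np.searchsorted(cumulative, 0.85) + 1)

# Component scores: the retained principal components for each observation.
Y = Z @ eigvecs[:, :k]
```

On real data, `cumulative[k - 1]` would play the role of the 87.6% figure reported for four components; the value obtained here depends entirely on the synthetic inputs.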
2.3 Short-term Models
2.3.1 Native Speaker Model - Autoregressive model
In the short-term model, we solve the native-speaker and non-native-speaker models separately and then sum
them to obtain the total number of speakers of the target language.
For native speakers $N_t$, by assumption, the number of native speakers of a language is driven by
autocorrelation in its time series. Consequently, we fit a $p$th-order autoregressive model, AR($p$):
$$
N_t = \varphi_1 N_{t-1} + \varphi_2 N_{t-2} + \cdots + \varphi_p N_{t-p} + \varepsilon_t \tag{2}
$$
$N_t$ represents the number of native speakers in year $t$.
$\varphi_1, \dots, \varphi_p$ represent the coefficients of the different lag orders.
$\varepsilon_t$ represents the error term, whose mean is 0 and variance is $\sigma^2$. It follows a white-noise
process $WN(0, \sigma^2)$.
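As an illustration of how a model of the form (2) can be fitted, here is a minimal least-squares sketch. It is not the estimation procedure used in the paper, which this section does not specify; the function names `fit_ar` and `forecast_ar` are hypothetical, the coefficients are written `phi`, and the series is simulated rather than real speaker data.

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares estimate of phi_1..phi_p in N_t = sum_i phi_i * N_{t-i} + eps_t."""
    y = np.asarray(series, dtype=float)
    # Column i holds the lag-i values aligned with the targets N_p, ..., N_{n-1}.
    lags = np.column_stack([y[p - i : len(y) - i] for i in range(1, p + 1)])
    target = y[p:]
    phi, *_ = np.linalg.lstsq(lags, target, rcond=None)
    resid = target - lags @ phi
    return phi, resid

def forecast_ar(series, phi, steps):
    """Iterate the fitted recursion forward with the error term set to zero."""
    history = [float(v) for v in series]
    p = len(phi)
    for _ in range(steps):
        recent = history[-1 : -p - 1 : -1]  # N_{t-1}, ..., N_{t-p}
        history.append(float(np.dot(phi, recent)))
    return history[len(series):]

# Simulated AR(1) series with a known coefficient, used only to check the fit.
rng = np.random.default_rng(1)
true_phi = 0.8
y = np.zeros(2000)
for t in range(1, len(y)):
    y[t] = true_phi * y[t - 1] + rng.normal()

phi_hat, resid = fit_ar(y, p=1)
```

For an AR(1) fit, an estimate of the coefficient close to 1 would correspond to the unit-root behaviour mentioned in the summary; the simulated series here is deliberately stationary so that the estimate can be compared with the known value 0.8.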
Firstly, we collect the time-series data of those language speakers. As for the top 10 languages, we use the
statistics of native speakers, non-native speakers and total speakers from 2003 to 2017
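Feature vectors of the 4 main components (equation (1) of Section 2.2.2):
$$
\begin{aligned}
\eta_1 &= (3.16,\ 2.95,\ 0.63,\ 0.87,\ 0.42,\ 0.75,\ 2.76,\ 0.44,\ 2.58,\ 0.37)^T \\
\eta_2 &= (0.92,\ 0.77,\ 0.63,\ 3.78,\ 0.40,\ 0.53,\ 0.46,\ 0.29,\ 0.66,\ 0.29)^T \\
\eta_3 &= (0.82,\ 0.41,\ 1.89,\ 2.33,\ 2.26,\ 0.38,\ 0.07,\ 0.34,\ 0.65,\ 1.42)^T \\
\eta_4 &= (0.67,\ 0.95,\ 0.58,\ 0.33,\ 0.26,\ 3.89,\ 1.01,\ 0.88,\ 0.61,\ 3.37)^T
\end{aligned} \tag{1}
$$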