1
History and Theoretical Basics
of Hidden Markov Models
Guy Leonard Kouemou
EADS Deutschland GmbH,
Germany
1. Introduction
The following chapter can be understood as a brief introduction to the history and
basics of Hidden Markov Models.
Hidden Markov Models (HMMs) are learnable finite stochastic automata. Nowadays, they
are considered a specific form of dynamic Bayesian networks. Dynamic Bayesian
networks are based on the theory of Bayes (Bayes & Price, 1763).
A Hidden Markov Model consists of two stochastic processes. The first stochastic process is
a Markov chain that is characterized by states and transition probabilities. The states of the
chain are not externally visible, and are therefore "hidden". The second stochastic process produces
emissions observable at each moment, depending on a state-dependent probability
distribution. It is important to note that the designation "hidden" in the definition of a
Hidden Markov Model refers to the states of the Markov chain, not to the parameters of
the model.
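As an illustration of these two coupled processes, the following sketch samples from a toy HMM; the weather states, activity symbols and all probabilities are invented for the example and do not come from the chapter:

```python
import random

# Toy HMM: the hidden Markov chain walks over weather states, while only
# state-dependent emissions (a person's activity) are observed.
states = ["Rainy", "Sunny"]
transition = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
              "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emission = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
            "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

def sample(length, start="Rainy"):
    """Draw a hidden state path and its visible emission sequence."""
    state, hidden, observed = start, [], []
    for _ in range(length):
        hidden.append(state)
        # second stochastic process: emit a symbol from the current state
        symbols, probs = zip(*emission[state].items())
        observed.append(random.choices(symbols, probs)[0])
        # first stochastic process: hop to the next state of the Markov chain
        nxt, probs = zip(*transition[state].items())
        state = random.choices(nxt, probs)[0]
    return hidden, observed

hidden, observed = sample(5)
print(observed)  # only this sequence is visible; `hidden` stays "hidden"
```

An observer sees only the emitted symbols; recovering the hidden path from them is exactly the problem the algorithms discussed below address.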
The history of HMMs consists of two parts. On the one hand there is the history of
Markov processes and Markov chains, and on the other hand the history of the algorithms
needed to develop Hidden Markov Models in order to solve problems in the modern
applied sciences, for example using a computer or similar electronic devices.
1.1 Brief history of Markov processes and Markov chains
Andrey Andreyevich Markov (June 14, 1856 – July 20, 1922) was a Russian mathematician.
He is best known for his work on the theory of stochastic processes; his research
area later became known as the theory of Markov processes and Markov chains.
Markov introduced the Markov chains in 1906, when he produced the
first theoretical results for such stochastic processes and used the term "chain" for the first time. In
1913 he analysed letter sequences of the Russian language.
A generalization to countably infinite state spaces was given by Kolmogorov (1931). Markov
chains are related to Brownian motion and the ergodic hypothesis, two topics in physics
which were important in the early years of the twentieth century. But Markov appears to
have pursued this out of a mathematical motivation, namely the extension of the law of
large numbers to dependent events.
Out of this approach grew a general statistical instrument, the so-called stochastic Markov
process.
In mathematics in general, and in probability theory and statistics in particular, a Markov process
can be considered a time-varying random phenomenon for which the Markov property holds.
In a common description, a stochastic process with the Markov property, or
memorylessness, is one for which, conditional on the present state of the system, its future
and past are independent (Markov 1908), (Wikipedia 1, 2, 3).
www.intechopen.com
Hidden Markov Models, Theory and Applications
Markov processes arise in probability and statistics in one of two ways. A stochastic process,
defined via a separate argument, may be shown (mathematically) to have the Markov
property and as a consequence to have the properties that can be deduced from this for all
Markov processes. Of more practical importance is the use of the assumption that the
Markov property holds for a certain random process in order to construct a stochastic model
for that process. In modelling terms, assuming that the Markov property holds is one of a
limited number of simple ways of introducing statistical dependence into a model for a
stochastic process in such a way that allows the strength of dependence at different lags to
decline as the lag increases.
Often, the term Markov chain is used to mean a Markov process which has a discrete (finite
or countable) state-space. Usually a Markov chain would be defined for a discrete set of
times (i.e. a discrete-time Markov Chain) although some authors use the same terminology
where "time" can take continuous values.
1.2 Brief history of the algorithms needed to develop Hidden Markov Models
With the strong development of computer science in the 1940s, following the research results of
scientists like John von Neumann, Alan Turing and Konrad Zuse, scientists all over the world tried to
find algorithmic solutions to many real-life problems using deterministic
automata as well as stochastic automata. Alongside the classical filter theory, dominated by
linear filter theory, the non-linear and stochastic filter theory became more and more
important. At the end of the 1950s and in the 1960s we can notice in this category the
dominance of the "Luenberger observer", the "Wiener filter", the "Kalman filter" and the
"extended Kalman filter" as well as its derivatives (Foellinger 1992), (Kalman 1960).
In the same period, in the middle of the 20th century, Claude Shannon (1916 – 2001), an
American mathematician and electronic engineer, published his paper "A mathematical
theory of communication", first printed in two parts in the July and October 1948 editions
of the Bell System Technical Journal. It was a very important historical step that boosted the need
for the implementation and integration of deterministic as well as stochastic automata in
computers and electrical devices.
Further important elements in the history of algorithm development are also needed in
order to create, apply or understand Hidden Markov Models:
The expectation-maximization (EM) algorithm: The recent history of the expectation-
maximization algorithm is related to the history of maximum likelihood at the beginning
of the 20th century (Kouemou 2010, Wikipedia). R. A. Fisher advocated, analyzed and
popularized maximum likelihood between 1912 and 1922, although it had
been used earlier by Gauss, Laplace, Thiele, and F. Y. Edgeworth. Several decades later the EM
algorithm was explained and given its name in a 1977 paper by Arthur Dempster, Nan
Laird, and Donald Rubin in the Journal of the Royal Statistical Society. They pointed out
that the method had been "proposed many times in special circumstances" by other authors,
but the 1977 paper generalized the method and developed the theory behind it. An
expectation-maximization (EM) algorithm is used in statistics for finding maximum
likelihood estimates of parameters in probabilistic models, where the model depends on
unobserved latent variables. EM alternates between performing an expectation (E) step,
which computes an expectation of the likelihood by including the latent variables as if they
were observed, and a maximization (M) step, which computes the maximum likelihood
estimates of the parameters by maximizing the expected likelihood found in the E step. The
parameters found in the M step are then used to begin another E step, and the process is
repeated. EM is frequently used for data clustering in machine learning and computer
vision. In natural language processing, two prominent instances of the algorithm are the
Baum-Welch algorithm (also known as "forward-backward") and the inside-outside
algorithm for unsupervised induction of probabilistic context-free grammars. Mathematical
and algorithmic basics of the expectation-maximization algorithm, specifically for HMM
applications, will be introduced in the following parts of this chapter.
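The E/M alternation described above can be sketched on a simple case, a two-component one-dimensional Gaussian mixture with fixed unit variances; the data, initialization scheme and iteration count below are illustrative assumptions, not part of the chapter:

```python
import math
import random

def em_gmm_1d(data, iters=50):
    """EM for a two-component 1-D Gaussian mixture (unit variances assumed).

    E step: compute the responsibility of each component for each point,
    treating the latent component labels as if they were observed.
    M step: re-estimate the means and mixing weights by maximizing the
    expected likelihood found in the E step.
    """
    mu = [min(data), max(data)]  # crude but workable initialization
    w = [0.5, 0.5]
    for _ in range(iters):
        # E step: responsibilities r[k] proportional to w_k * N(x; mu_k, 1)
        resp = []
        for x in data:
            p = [w[k] * math.exp(-0.5 * (x - mu[k]) ** 2) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M step: weighted means and component weights
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            w[k] = nk / len(data)
    return mu, w

random.seed(0)
data = ([random.gauss(0, 1) for _ in range(200)]
        + [random.gauss(5, 1) for _ in range(200)])
mu, w = em_gmm_1d(data)  # means converge near the true centres 0 and 5
```

The same alternation, applied to the latent state sequence of an HMM instead of mixture labels, is what the Baum-Welch algorithm below performs.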
The Baum-Welch algorithm: The Baum–Welch algorithm is a particular case of a
generalized expectation-maximization (GEM) algorithm (Kouemou 2010, Wikipedia). The
Baum–Welch algorithm is used to find the unknown parameters of a hidden Markov model
(HMM). It makes use of the forward-backward algorithm and is named after Leonard E.
Baum and Lloyd R. Welch. One of the introductory papers for the Baum-Welch algorithm,
"A maximization technique occurring in the statistical analysis of
probabilistic functions of Markov chains", was presented in 1970 (Baum 1970). Mathematical and algorithmic basics
of the Baum-Welch algorithm, specifically for HMM applications, will be introduced in the
following parts of this chapter.
The Viterbi algorithm: The Viterbi algorithm was conceived by Andrew Viterbi in 1967 as
a decoding algorithm for convolutional codes over noisy digital communication links. It is a
dynamic programming algorithm (Kouemou 2010, Wikipedia) for finding the most likely
sequence of hidden states, called the Viterbi path, that results in a sequence of observed
events. Over the years, this algorithm has found universal application in decoding
convolutional codes, used for example in CDMA and GSM digital cellular networks, dial-up modems,
satellite and deep-space communications, and 802.11 wireless LANs. It is now also commonly
used in speech recognition applications, keyword spotting, computational linguistics, and
bioinformatics. For example, in certain speech-to-text recognition devices, the acoustic signal
is treated as the observed sequence of events, and a string of text is considered to be the
"hidden cause" of the acoustic signal. The Viterbi algorithm finds the most likely string of
text given the acoustic signal (Wikipedia, David Forney). Mathematical and algorithmic
basics of the Viterbi algorithm for HMM applications will be introduced in the following
parts of this chapter.
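As a sketch of the dynamic programming idea, the following implements Viterbi decoding for a discrete HMM; the weather/activity model used to exercise it is a standard textbook toy example, not taken from this chapter:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path (Viterbi path) for an observation sequence.

    Dynamic programming: V[t][s] holds the probability of the best path
    ending in state s after the first t+1 observations; back[t][s] remembers
    the predecessor state on that best path.
    """
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max((V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                             for p in states)
            V[t][s], back[t][s] = prob, prev
    # backtrack from the best final state
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
print(viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p))
# -> ['Sunny', 'Rainy', 'Rainy']
```

In the speech-to-text analogy, `obs` plays the role of the acoustic observations and the returned path the role of the hidden text.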
The chapter consists of the following parts:
• Part 2: Mathematical basics of Hidden Markov Models
• Part 3: Basics of HMM in stochastic modelling
• Part 4: Types of Hidden Markov Models
• Part 5: Basics of HMM in signal processing applications
• Part 6: Conclusion and references
2. Mathematical basics of Hidden Markov Models
Definition of Hidden Markov Models
A Hidden Markov Model (cf. Figure 1) is a finite learnable stochastic automaton.
It can be summarized as a kind of double stochastic process with the two following aspects:
• The first stochastic process is a finite set of states, where each state is generally
associated with a multidimensional probability distribution. The transitions between
the different states are statistically organized by a set of probabilities called transition
probabilities.
• In the second stochastic process, an event can be observed in any state. Since we only
analyze what we observe, without seeing in which state it occurred, the states are
"hidden" to the observer; hence the name "Hidden Markov Model".
Each Hidden Markov Model is defined by states, state probabilities, transition probabilities,
emission probabilities and initial probabilities.
In order to define an HMM completely, the following five elements have to be defined:
1. The N states of the model, defined by

   S = \{S_1, \ldots, S_N\}                                                  (1)

2. The M observation symbols per state, V = \{v_1, \ldots, v_M\}. If the observations are
   continuous, then M is infinite.
3. The state transition probability distribution A = \{a_{ij}\}, where a_{ij} is the
   probability that the state at time t+1 is S_j, given that the state at time t is S_i.
   The structure of this stochastic matrix defines the connection structure of the model.
   If a coefficient a_{ij} is zero, it will remain zero even through the training process,
   so there will never be a transition from state S_i to S_j.

   a_{ij} = p\{q_{t+1} = S_j \mid q_t = S_i\}, \quad 1 \le i, j \le N        (2)

   where q_t denotes the current state. The transition probabilities should satisfy the
   normal stochastic constraints, a_{ij} \ge 0 for 1 \le i, j \le N, and
   \sum_{j=1}^{N} a_{ij} = 1 for 1 \le i \le N.
4. The observation symbol probability distribution in each state, B = \{b_j(k)\}, where
   b_j(k) is the probability that symbol v_k is emitted in state S_j.

   b_j(k) = p\{o_t = v_k \mid q_t = S_j\}, \quad 1 \le j \le N, \ 1 \le k \le M    (3)

   where v_k denotes the k-th observation symbol in the alphabet, and o_t the current
   parameter vector.
   The following stochastic constraints must be satisfied:
   b_j(k) \ge 0 for 1 \le j \le N, 1 \le k \le M, and \sum_{k=1}^{M} b_j(k) = 1 for
   1 \le j \le N.
   If the observations are continuous, then we will have to use a continuous probability
   density function instead of a set of discrete probabilities. In this case we specify
   the parameters of the probability density function. Usually the probability density is
   approximated by a weighted sum of M Gaussian distributions N,

   b_j(o_t) = \sum_{m=1}^{M} c_{jm} \, N(o_t, \mu_{jm}, \Sigma_{jm})        (4)

   where c_{jm} are the weighting coefficients, \mu_{jm} the mean vectors, and
   \Sigma_{jm} the covariance matrices. The c_{jm} should also satisfy the stochastic
   constraints c_{jm} \ge 0 for 1 \le j \le N, 1 \le m \le M, and
   \sum_{m=1}^{M} c_{jm} = 1 for 1 \le j \le N.
5. The initial state distribution \pi = \{\pi_i\}, where \pi_i is the probability that
   the model is in state S_i at the initial time, with

   \pi_i = p\{q_1 = S_i\}, \quad 1 \le i \le N                               (5)
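The stochastic constraints attached to A, B and π above can be verified mechanically; a minimal sketch with a 2-state, 3-symbol model whose numbers are purely illustrative:

```python
def is_stochastic(matrix, tol=1e-9):
    """Check that every row is a valid probability distribution:
    non-negative entries that sum to one."""
    return all(all(p >= 0 for p in row) and abs(sum(row) - 1.0) < tol
               for row in matrix)

# A discrete HMM lambda = (A, B, pi) with N = 2 states and M = 3 symbols.
A = [[0.7, 0.3],        # a_ij = p(q_{t+1} = S_j | q_t = S_i), rows sum to 1
     [0.4, 0.6]]
B = [[0.1, 0.4, 0.5],   # b_j(k) = p(o_t = v_k | q_t = S_j), rows sum to 1
     [0.6, 0.3, 0.1]]
pi = [0.6, 0.4]         # pi_i = p(q_1 = S_i)

assert is_stochastic(A) and is_stochastic(B) and is_stochastic([pi])
```

A training procedure such as Baum-Welch preserves these constraints at every iteration, which is why a zero a_ij stays zero throughout training.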
Fig. 1. Example of an HMM
When defining the HMM it is also very important to clarify whether the model will be discrete,
continuous, or a mixed form (Kouemou 2007).
The following notation is often used in the literature by several authors (Wikipedia):

\lambda = (A, B, \pi)                                                        (6)

to denote a discrete HMM, that is, one with discrete probability distributions, while

\lambda = (A, c_{jm}, \mu_{jm}, \Sigma_{jm}, \pi)                            (7)

is often used to denote a continuous HMM, that is, one whose emission statistics are based
on continuous density functions or distributions.
Application details to these different forms of HMM will be illustrated in the following parts
of this chapter.
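For the continuous case, the emission density of equation (4) can be sketched for a scalar observation, where each covariance matrix Σ_jm reduces to a variance; the component parameters below are illustrative assumptions:

```python
import math

def gauss(x, mu, var):
    """1-D Gaussian density N(x; mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def b_j(o_t, c, mu, var):
    """Emission density b_j(o_t) = sum_m c_jm N(o_t, mu_jm, Sigma_jm),
    equation (4), for a scalar observation o_t in one state j."""
    return sum(c_m * gauss(o_t, mu_m, v_m)
               for c_m, mu_m, v_m in zip(c, mu, var))

# illustrative state j with M = 2 mixture components
c, mu, var = [0.5, 0.5], [0.0, 3.0], [1.0, 1.0]
density = b_j(1.5, c, mu, var)  # density of observing o_t = 1.5 in state j
```

The weights c must themselves sum to one, so that b_j integrates to one and remains a valid probability density.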
3. Basics of HMM in stochastic modelling
This part of the chapter is a sort of compendium from well known literature (Baum1970),
(Huang1989), (Huang1990), (Kouemou2010), (Rabiner1986), (Rabiner1989), (Viterbi1967),
(Warakagoda2010), (Wikipedia2010) in order to introduce the problematic of stochastic
modelling using Hidden Markov Models.
In this part some important aspects of modelling Hidden Markov Models in order to solve
real problems, for example using clearly defined statistical rules, will be presented. The
stochastic modelling of an HMM automate consist of two steps: