1
History and Theoretical Basics
of Hidden Markov Models
Guy Leonard Kouemou
EADS Deutschland GmbH,
Germany
1. Introduction
The following chapter can be understood as a brief introduction to the history and
basics of Hidden Markov Models.
Hidden Markov Models (HMMs) are learnable finite stochastic automata. Nowadays, they
are considered a specific form of dynamic Bayesian networks. Dynamic Bayesian
networks are based on the theory of Bayes (Bayes & Price, 1763).
A Hidden Markov Model consists of two stochastic processes. The first stochastic process is
a Markov chain that is characterized by states and transition probabilities. The states of the
chain are not externally visible, and are therefore "hidden". The second stochastic process produces
emissions observable at each moment, depending on a state-dependent probability
distribution. It is important to note that the designation "hidden" in the definition of a
Hidden Markov Model refers to the states of the Markov chain, not to the parameters of
the model.
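As an illustration of these two coupled processes, the following sketch samples from a toy HMM; the weather states, activity symbols and all probabilities are invented for the example and do not come from the chapter:

```python
import random

# Toy HMM: the hidden Markov chain walks over weather states, while only
# state-dependent emissions (a person's activity) are observed.
states = ["Rainy", "Sunny"]
transition = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
              "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emission = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
            "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

def sample(length, start="Rainy"):
    """Draw a hidden state path and its visible emission sequence."""
    state, hidden, observed = start, [], []
    for _ in range(length):
        hidden.append(state)
        # second stochastic process: emit a symbol from the current state
        symbols, probs = zip(*emission[state].items())
        observed.append(random.choices(symbols, probs)[0])
        # first stochastic process: hop to the next state of the Markov chain
        nxt, probs = zip(*transition[state].items())
        state = random.choices(nxt, probs)[0]
    return hidden, observed

hidden, observed = sample(5)
print(observed)  # only this sequence is visible; `hidden` stays "hidden"
```

An observer sees only the emitted symbols; recovering the hidden path from them is exactly the problem the algorithms discussed below address.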
The history of HMMs consists of two parts. On the one hand there is the history of
Markov processes and Markov chains, and on the other hand the history of the algorithms
needed to develop Hidden Markov Models in order to solve problems in the modern
applied sciences, for example using a computer or similar electronic devices.
1.1 Brief history of Markov processes and Markov chains
Andrey Andreyevich Markov (June 14, 1856 – July 20, 1922) was a Russian mathematician.
He is best known for his work on the theory of stochastic processes; his research
area later became known as the theory of Markov processes and Markov chains.
Markov introduced the Markov chains in 1906, when he produced the
first theoretical results for such stochastic processes and used the term "chain" for the first time. In
1913 he analysed letter sequences of the Russian language.
A generalization to countably infinite state spaces was given by Kolmogorov (1931). Markov
chains are related to Brownian motion and the ergodic hypothesis, two topics in physics
which were important in the early years of the twentieth century. But Markov appears to
have pursued this out of a mathematical motivation, namely the extension of the law of
large numbers to dependent events.
Out of this approach grew a general statistical instrument, the so-called stochastic Markov
process.
In mathematics in general, and in probability theory and statistics in particular, a Markov process
can be considered a time-varying random phenomenon for which the Markov property holds.
In a common description, a stochastic process with the Markov property, or
memorylessness, is one for which, conditional on the present state of the system, its future
and past are independent (Markov 1908), (Wikipedia 1, 2, 3).
www.intechopen.com
Hidden Markov Models, Theory and Applications
Markov processes arise in probability and statistics in one of two ways. A stochastic process,
defined via a separate argument, may be shown (mathematically) to have the Markov
property and as a consequence to have the properties that can be deduced from this for all
Markov processes. Of more practical importance is the use of the assumption that the
Markov property holds for a certain random process in order to construct a stochastic model
for that process. In modelling terms, assuming that the Markov property holds is one of a
limited number of simple ways of introducing statistical dependence into a model for a
stochastic process in such a way that allows the strength of dependence at different lags to
decline as the lag increases.
Often, the term Markov chain is used to mean a Markov process which has a discrete (finite
or countable) state-space. Usually a Markov chain would be defined for a discrete set of
times (i.e. a discrete-time Markov Chain) although some authors use the same terminology
where "time" can take continuous values.
1.2 Brief history of the algorithms needed to develop Hidden Markov Models
With the strong development of computer science in the 1940s, following the research results of
scientists like John von Neumann, Alan Turing and Konrad Zuse, scientists all over the world tried to
find algorithmic solutions to many real-life problems using deterministic
automata as well as stochastic automata. Alongside the classical filter theory, dominated by
linear filter theory, the non-linear and stochastic filter theory became more and more
important. At the end of the 1950s and in the 1960s we can notice in this category the
dominance of the "Luenberger observer", the "Wiener filter", the "Kalman filter" and the
"extended Kalman filter" as well as its derivatives (Foellinger 1992), (Kalman 1960).
In the same period, in the middle of the 20th century, Claude Shannon (1916 – 2001), an
American mathematician and electronic engineer, published his paper "A mathematical
theory of communication", first printed in two parts in the July and October 1948 editions
of the Bell System Technical Journal. It was a very important historical step that boosted the need
for the implementation and integration of deterministic as well as stochastic automata in
computers and electrical devices.
Further important elements in the history of algorithm development are also needed in
order to create, apply or understand Hidden Markov Models:
The expectation-maximization (EM) algorithm: The recent history of the expectation-
maximization algorithm is related to the history of maximum likelihood at the beginning
of the 20th century (Kouemou 2010, Wikipedia). R. A. Fisher advocated, analyzed and
popularized maximum likelihood between 1912 and 1922, although it had
been used earlier by Gauss, Laplace, Thiele, and F. Y. Edgeworth. Several decades later the EM
algorithm was explained and given its name in a 1977 paper by Arthur Dempster, Nan
Laird, and Donald Rubin in the Journal of the Royal Statistical Society. They pointed out
that the method had been "proposed many times in special circumstances" by other authors,
but the 1977 paper generalized the method and developed the theory behind it. An
expectation-maximization (EM) algorithm is used in statistics for finding maximum
likelihood estimates of parameters in probabilistic models, where the model depends on
unobserved latent variables. EM alternates between performing an expectation (E) step,
which computes an expectation of the likelihood by including the latent variables as if they
were observed, and a maximization (M) step, which computes the maximum likelihood
estimates of the parameters by maximizing the expected likelihood found in the E step. The
parameters found in the M step are then used to begin another E step, and the process is
repeated. EM is frequently used for data clustering in machine learning and computer
vision. In natural language processing, two prominent instances of the algorithm are the
Baum-Welch algorithm (also known as "forward-backward") and the inside-outside
algorithm for unsupervised induction of probabilistic context-free grammars. Mathematical
and algorithmic basics of the expectation-maximization algorithm, specifically for HMM
applications, will be introduced in the following parts of this chapter.
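The E/M alternation described above can be sketched on a simple case, a two-component one-dimensional Gaussian mixture with fixed unit variances; the data, initialization scheme and iteration count below are illustrative assumptions, not part of the chapter:

```python
import math
import random

def em_gmm_1d(data, iters=50):
    """EM for a two-component 1-D Gaussian mixture (unit variances assumed).

    E step: compute the responsibility of each component for each point,
    treating the latent component labels as if they were observed.
    M step: re-estimate the means and mixing weights by maximizing the
    expected likelihood found in the E step.
    """
    mu = [min(data), max(data)]  # crude but workable initialization
    w = [0.5, 0.5]
    for _ in range(iters):
        # E step: responsibilities r[k] proportional to w_k * N(x; mu_k, 1)
        resp = []
        for x in data:
            p = [w[k] * math.exp(-0.5 * (x - mu[k]) ** 2) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M step: weighted means and component weights
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            w[k] = nk / len(data)
    return mu, w

random.seed(0)
data = ([random.gauss(0, 1) for _ in range(200)]
        + [random.gauss(5, 1) for _ in range(200)])
mu, w = em_gmm_1d(data)  # means converge near the true centres 0 and 5
```

The same alternation, applied to the latent state sequence of an HMM instead of mixture labels, is what the Baum-Welch algorithm below performs.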
The Baum-Welch algorithm: The Baum–Welch algorithm is a particular case of a
generalized expectation-maximization (GEM) algorithm (Kouemou 2010, Wikipedia). The
Baum–Welch algorithm is used to find the unknown parameters of a hidden Markov model
(HMM). It makes use of the forward-backward algorithm and is named after Leonard E.
Baum and Lloyd R. Welch. One of the introductory papers for the Baum-Welch algorithm,
"A maximization technique occurring in the statistical analysis of
probabilistic functions of Markov chains", was presented in 1970 (Baum 1970). Mathematical and algorithmic basics
of the Baum-Welch algorithm, specifically for HMM applications, will be introduced in the
following parts of this chapter.
The Viterbi algorithm: The Viterbi algorithm was conceived by Andrew Viterbi in 1967 as
a decoding algorithm for convolutional codes over noisy digital communication links. It is a
dynamic programming algorithm (Kouemou 2010, Wikipedia) for finding the most likely
sequence of hidden states, called the Viterbi path, that results in a sequence of observed
events. Over the years, this algorithm has found universal application in decoding
convolutional codes, used for example in CDMA and GSM digital cellular networks, dial-up modems,
satellite and deep-space communications, and 802.11 wireless LANs. It is now also commonly
used in speech recognition applications, keyword spotting, computational linguistics, and
bioinformatics. For example, in certain speech-to-text recognition devices, the acoustic signal
is treated as the observed sequence of events, and a string of text is considered to be the
"hidden cause" of the acoustic signal. The Viterbi algorithm finds the most likely string of
text given the acoustic signal (Wikipedia, David Forney). Mathematical and algorithmic
basics of the Viterbi algorithm for HMM applications will be introduced in the following
parts of this chapter.
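As a sketch of the dynamic programming idea, the following implements Viterbi decoding for a discrete HMM; the weather/activity model used to exercise it is a standard textbook toy example, not taken from this chapter:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path (Viterbi path) for an observation sequence.

    Dynamic programming: V[t][s] holds the probability of the best path
    ending in state s after the first t+1 observations; back[t][s] remembers
    the predecessor state on that best path.
    """
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max((V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                             for p in states)
            V[t][s], back[t][s] = prob, prev
    # backtrack from the best final state
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
print(viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p))
# -> ['Sunny', 'Rainy', 'Rainy']
```

In the speech-to-text analogy, `obs` plays the role of the acoustic observations and the returned path the role of the hidden text.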
The chapter consists of the following parts:
• Part 2: Mathematical basics of Hidden Markov Models
• Part 3: Basics of HMM in stochastic modelling
• Part 4: Types of Hidden Markov Models
• Part 5: Basics of HMM in signal processing applications
• Part 6: Conclusion and references
2. Mathematical basics of Hidden Markov Models
Definition of Hidden Markov Models
A Hidden Markov Model (cf. Figure 1) is a finite learnable stochastic automaton.
It can be summarized as a kind of double stochastic process with the two following aspects:
• The first stochastic process is a finite set of states, where each state is generally
associated with a multidimensional probability distribution. The transitions between
the different states are statistically organized by a set of probabilities called transition
probabilities.
• In the second stochastic process, an event can be observed in any state. Since we only
analyze what we observe, without seeing in which state it occurred, the states are
"hidden" to the observer; hence the name "Hidden Markov Model".
Each Hidden Markov Model is defined by states, state probabilities, transition probabilities,
emission probabilities and initial probabilities.
In order to define an HMM completely, the following five elements have to be defined:
1. The N states of the model, defined by

   S = \{S_1, \ldots, S_N\}                                                  (1)

2. The M observation symbols per state, V = \{v_1, \ldots, v_M\}. If the observations are
   continuous, then M is infinite.
3. The state transition probability distribution A = \{a_{ij}\}, where a_{ij} is the
   probability that the state at time t+1 is S_j, given that the state at time t is S_i.
   The structure of this stochastic matrix defines the connection structure of the model.
   If a coefficient a_{ij} is zero, it will remain zero even through the training process,
   so there will never be a transition from state S_i to S_j.

   a_{ij} = p\{q_{t+1} = S_j \mid q_t = S_i\}, \quad 1 \le i, j \le N        (2)

   where q_t denotes the current state. The transition probabilities should satisfy the
   normal stochastic constraints, a_{ij} \ge 0 for 1 \le i, j \le N, and
   \sum_{j=1}^{N} a_{ij} = 1 for 1 \le i \le N.
4. The observation symbol probability distribution in each state, B = \{b_j(k)\}, where
   b_j(k) is the probability that symbol v_k is emitted in state S_j.

   b_j(k) = p\{o_t = v_k \mid q_t = S_j\}, \quad 1 \le j \le N, \ 1 \le k \le M    (3)

   where v_k denotes the k-th observation symbol in the alphabet, and o_t the current
   parameter vector.
   The following stochastic constraints must be satisfied:
   b_j(k) \ge 0 for 1 \le j \le N, 1 \le k \le M, and \sum_{k=1}^{M} b_j(k) = 1 for
   1 \le j \le N.
   If the observations are continuous, then we will have to use a continuous probability
   density function instead of a set of discrete probabilities. In this case we specify
   the parameters of the probability density function. Usually the probability density is
   approximated by a weighted sum of M Gaussian distributions N,

   b_j(o_t) = \sum_{m=1}^{M} c_{jm} \, N(o_t, \mu_{jm}, \Sigma_{jm})        (4)

   where c_{jm} are the weighting coefficients, \mu_{jm} the mean vectors, and
   \Sigma_{jm} the covariance matrices. The c_{jm} should also satisfy the stochastic
   constraints c_{jm} \ge 0 for 1 \le j \le N, 1 \le m \le M, and
   \sum_{m=1}^{M} c_{jm} = 1 for 1 \le j \le N.
5. The initial state distribution \pi = \{\pi_i\}, where \pi_i is the probability that
   the model is in state S_i at the initial time, with

   \pi_i = p\{q_1 = S_i\}, \quad 1 \le i \le N                               (5)
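The stochastic constraints attached to A, B and π above can be verified mechanically; a minimal sketch with a 2-state, 3-symbol model whose numbers are purely illustrative:

```python
def is_stochastic(matrix, tol=1e-9):
    """Check that every row is a valid probability distribution:
    non-negative entries that sum to one."""
    return all(all(p >= 0 for p in row) and abs(sum(row) - 1.0) < tol
               for row in matrix)

# A discrete HMM lambda = (A, B, pi) with N = 2 states and M = 3 symbols.
A = [[0.7, 0.3],        # a_ij = p(q_{t+1} = S_j | q_t = S_i), rows sum to 1
     [0.4, 0.6]]
B = [[0.1, 0.4, 0.5],   # b_j(k) = p(o_t = v_k | q_t = S_j), rows sum to 1
     [0.6, 0.3, 0.1]]
pi = [0.6, 0.4]         # pi_i = p(q_1 = S_i)

assert is_stochastic(A) and is_stochastic(B) and is_stochastic([pi])
```

A training procedure such as Baum-Welch preserves these constraints at every iteration, which is why a zero a_ij stays zero throughout training.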
Fig. 1. Example of an HMM
When defining the HMM it is also very important to clarify whether the model will be discrete,
continuous, or a mixed form (Kouemou 2007).
The following notation is often used in the literature by several authors (Wikipedia):

\lambda = (A, B, \pi)                                                        (6)

to denote a discrete HMM, that is, one with discrete probability distributions, while

\lambda = (A, c_{jm}, \mu_{jm}, \Sigma_{jm}, \pi)                            (7)

is often used to denote a continuous HMM, that is, one whose emission statistics are based
on continuous density functions or distributions.
Application details to these different forms of HMM will be illustrated in the following parts
of this chapter.
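For the continuous case, the emission density of equation (4) can be sketched for a scalar observation, where each covariance matrix Σ_jm reduces to a variance; the component parameters below are illustrative assumptions:

```python
import math

def gauss(x, mu, var):
    """1-D Gaussian density N(x; mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def b_j(o_t, c, mu, var):
    """Emission density b_j(o_t) = sum_m c_jm N(o_t, mu_jm, Sigma_jm),
    equation (4), for a scalar observation o_t in one state j."""
    return sum(c_m * gauss(o_t, mu_m, v_m)
               for c_m, mu_m, v_m in zip(c, mu, var))

# illustrative state j with M = 2 mixture components
c, mu, var = [0.5, 0.5], [0.0, 3.0], [1.0, 1.0]
density = b_j(1.5, c, mu, var)  # density of observing o_t = 1.5 in state j
```

The weights c must themselves sum to one, so that b_j integrates to one and remains a valid probability density.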
3. Basics of HMM in stochastic modelling
This part of the chapter is a sort of compendium from well known literature (Baum1970),
(Huang1989), (Huang1990), (Kouemou2010), (Rabiner1986), (Rabiner1989), (Viterbi1967),
(Warakagoda2010), (Wikipedia2010) in order to introduce the problematic of stochastic
modelling using Hidden Markov Models.
In this part some important aspects of modelling Hidden Markov Models in order to solve
real problems, for example using clearly defined statistical rules, will be presented. The
stochastic modelling of an HMM automate consist of two steps: