2 APPENDIX A • HIDDEN MARKOV MODELS
state must sum to 1. Figure A.1b shows a Markov chain for assigning a probability
to a sequence of words w_1 ... w_n. This Markov chain should be familiar; in fact,
it represents a bigram language model, with each edge expressing the probability
p(w_i | w_j)! Given the two models in Fig. A.1, we can assign a probability to any
sequence from our vocabulary.
Formally, a Markov chain is specified by the following components:
Q = q_1 q_2 ... q_N              a set of N states
A = a_11 a_12 ... a_N1 ... a_NN  a transition probability matrix A, each a_ij representing
                                 the probability of moving from state i to state j, s.t.
                                 ∑_{j=1}^{N} a_ij = 1  ∀i
π = π_1, π_2, ..., π_N           an initial probability distribution over states. π_i is the
                                 probability that the Markov chain will start in state i.
                                 Some states j may have π_j = 0, meaning that they cannot
                                 be initial states. Also, ∑_{i=1}^{N} π_i = 1
Before you go on, use the sample probabilities in Fig. A.1a (with π = [.1, .7, .2])
to compute the probability of each of the following sequences:
(A.2) hot hot hot hot
(A.3) cold hot cold hot
What does the difference in these probabilities tell you about a real-world weather
fact encoded in Fig. A.1a?
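The exercise above can be sketched in code. A sequence's probability under a first-order Markov chain is the initial probability of its first state times the product of the transition probabilities along the sequence. Only π = [.1, .7, .2] comes from the text; Fig. A.1a itself is not reproduced here, so the transition matrix below uses placeholder values (and an assumed cold/warm/hot state ordering) that you should replace with the figure's actual numbers.

```python
# P(q_1 ... q_T) = pi[q_1] * prod_{t=2..T} A[q_{t-1}][q_t]

pi = {"cold": 0.1, "warm": 0.7, "hot": 0.2}  # from the text: [.1, .7, .2]

# Placeholder transition matrix A (each row sums to 1);
# substitute the real probabilities from Fig. A.1a.
A = {
    "cold": {"cold": 0.5, "warm": 0.4, "hot": 0.1},
    "warm": {"cold": 0.3, "warm": 0.4, "hot": 0.3},
    "hot":  {"cold": 0.1, "warm": 0.3, "hot": 0.6},
}

def sequence_probability(seq):
    """Probability of a state sequence under the chain (pi, A)."""
    p = pi[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= A[prev][cur]
    return p

print(sequence_probability(["hot", "hot", "hot", "hot"]))    # sequence (A.2)
print(sequence_probability(["cold", "hot", "cold", "hot"]))  # sequence (A.3)
```

With the real values from the figure, comparing the two results shows which weather pattern the chain considers more likely.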
A.2 The Hidden Markov Model
A Markov chain is useful when we need to compute a probability for a sequence
of observable events. In many cases, however, the events we are interested in are
hidden: we don't observe them directly. For example, we don't normally observe
part-of-speech tags in a text. Rather, we see words, and must infer the tags from the
word sequence. We call the tags hidden because they are not observed.
A hidden Markov model (HMM) allows us to talk about both observed events
(like words that we see in the input) and hidden events (like part-of-speech tags) that
we think of as causal factors in our probabilistic model. An HMM is specified by
the following components:
Q = q_1 q_2 ... q_N          a set of N states
A = a_11 ... a_ij ... a_NN   a transition probability matrix A, each a_ij representing
                             the probability of moving from state i to state j, s.t.
                             ∑_{j=1}^{N} a_ij = 1  ∀i
O = o_1 o_2 ... o_T          a sequence of T observations, each one drawn from a
                             vocabulary V = v_1, v_2, ..., v_V
B = b_i(o_t)                 a sequence of observation likelihoods, also called emission
                             probabilities, each expressing the probability of an
                             observation o_t being generated from a state i
π = π_1, π_2, ..., π_N       an initial probability distribution over states. π_i is the
                             probability that the Markov chain will start in state i.
                             Some states j may have π_j = 0, meaning that they cannot
                             be initial states. Also, ∑_{i=1}^{N} π_i = 1
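The five components above can be tied together in a short sketch. The joint probability of a hidden state sequence and an observation sequence is π_{q_1} · b_{q_1}(o_1) times a product of transition and emission terms. All numeric values below are invented for illustration (a two-state weather chain emitting counts from a three-item vocabulary); they are not taken from the text.

```python
# An HMM as the five components (Q, A, O's vocabulary V, B, pi).
# All probabilities here are illustrative placeholders.
Q = ["hot", "cold"]                      # set of states
V = [1, 2, 3]                            # observation vocabulary
pi = {"hot": 0.8, "cold": 0.2}           # initial distribution over states
A = {"hot":  {"hot": 0.6, "cold": 0.4},  # transition matrix; rows sum to 1
     "cold": {"hot": 0.5, "cold": 0.5}}
B = {"hot":  {1: 0.2, 2: 0.4, 3: 0.4},   # emission probabilities b_i(o_t)
     "cold": {1: 0.5, 2: 0.4, 3: 0.1}}

def joint_probability(states, observations):
    """P(Q, O) = pi[q_1] * b_{q_1}(o_1) * prod_t a_{q_{t-1} q_t} * b_{q_t}(o_t)."""
    p = pi[states[0]] * B[states[0]][observations[0]]
    for t in range(1, len(states)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][observations[t]]
    return p

print(joint_probability(["hot", "hot", "cold"], [3, 1, 1]))
```

Note that this computes the probability of one particular hidden path; summing over all possible paths (the likelihood of the observations alone) is what the forward algorithm does efficiently.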