Making use of partial knowledge about hidden states in HMMs: an approach based on belief functions
Emmanuel Ramasso, Thierry Denoeux
Emmanuel Ramasso is with the FEMTO-ST Institute, UMR CNRS 6174 - UFC / ENSMM / UTBM, Automatic Control and Micro-Mechatronic Systems Department, 24 rue Alain Savary, F-25000 Besançon, France.
Thierry Denoeux is with the Université de Technologie de Compiègne, Heudiasyc, UMR CNRS 7253, Centre de Recherches de Royallieu, BP 20529, F-60205 Compiègne Cedex, France.
Abstract—This paper addresses the problem of parameter
estimation and state prediction in Hidden Markov Models
(HMMs) based on observed outputs and partial knowledge of
hidden states expressed in the belief function framework. The
usual HMM model is recovered when the belief functions are
vacuous. Parameters are learnt using the Evidential Expectation-
Maximization algorithm, a recently introduced variant of the
Expectation-Maximization algorithm for maximum likelihood
estimation based on uncertain data. The inference problem, i.e.,
finding the most probable sequence of states based on observed
outputs and partial knowledge of states, is also addressed.
Experimental results demonstrate that partial information about
hidden states, when available, may substantially improve
estimation and prediction performance.
Index Terms—Hidden Markov Models, Dempster-Shafer Theory, Evidence Theory, Evidential Expectation-Maximisation (E$^2$M) algorithm, Uncertain data, Soft labels, Partially supervised learning.
I. INTRODUCTION
Hidden Markov Models (HMMs) are powerful tools for
sequential data modeling and analysis. For several decades,
many complex applications have been successfully addressed
using HMMs, such as word sequence discovery in speech
recordings [20], motion sequence recognition in videos [30],
gene finding in DNA sequences [16], prognosis of ball bearing
degradation [11], [21], or financial time series forecasting [5].
An HMM is a simple dynamic Bayesian network composed of observed random variables (outputs) $X_t$ and latent discrete random variables (hidden states) $Y_t$, where $t$ is a discrete time index [20] (Figure 1). The sequence of states $Y_1, Y_2, \ldots$ is a Markov chain, and the distribution of the output $X_t$ at time $t$, as well as the distribution of $X_t$ conditional on all $X_u$, $u \neq t$, depends only on $Y_t$. We note that this simple model has recently been extended to “pairwise” [18] and “triplet” Markov chains [19]. However, only the basic HMM will be considered in this paper.

Fig. 1. Graphical representation of a Hidden Markov Model.
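For concreteness, the following minimal sketch samples from such a model; the parameter values and the Gaussian emission model are ours, chosen purely for illustration, and are not part of the original model specification.

```python
# A minimal sketch of the generative process of Figure 1 (parameter values and
# the Gaussian emission model are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(0)

pi = np.array([0.6, 0.4])            # distribution of Y_1
A = np.array([[0.9, 0.1],            # transition probabilities P(Y_{t+1} | Y_t)
              [0.2, 0.8]])
means = np.array([0.0, 3.0])         # emission parameters, one per state
stds = np.array([1.0, 1.0])

def sample_hmm(T):
    """Sample (Y_1..Y_T, X_1..X_T): Y is a Markov chain; X_t depends only on Y_t."""
    y = np.empty(T, dtype=int)
    y[0] = rng.choice(2, p=pi)
    for t in range(1, T):
        y[t] = rng.choice(2, p=A[y[t - 1]])  # Markov property of the state chain
    x = rng.normal(means[y], stds[y])        # emission: X_t drawn given Y_t alone
    return y, x

states, outputs = sample_hmm(10)
```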
In the standard setting, the outputs are observed until some time $T$ while the states remain hidden. The model parameters (i.e., the probability distribution of $Y_1$, the state transition probabilities, and the parameters of the conditional probability distributions of $X_t$ given $Y_t$, referred to as emission probabilities) can then be estimated using an iterative procedure called
the Baum-Welch algorithm [1], [20], which is a particular
instance of the Expectation-Maximization (EM) algorithm.
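As an illustration, the sketch below implements one Baum-Welch (EM) iteration, assuming discrete emission distributions; the scaled forward-backward recursions compute the expected state and transition counts (E-step), from which the parameters are re-estimated (M-step). Variable names and the discrete-emission choice are ours, not taken from the paper.

```python
# One Baum-Welch (EM) iteration for an HMM with discrete emissions,
# using scaling for numerical stability (illustrative sketch).
import numpy as np

def baum_welch_step(x, pi, A, B):
    """One EM update. x: observation indices (T,); pi: (K,); A: (K,K); B: (K,M)."""
    x = np.asarray(x)
    T, K = len(x), len(pi)
    alpha = np.empty((T, K)); beta = np.empty((T, K)); c = np.empty(T)
    # E-step, forward pass: alpha[t] = P(Y_t | x_1..x_t), c[t] = scaling factor
    alpha[0] = pi * B[:, x[0]]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, x[t]]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    # E-step, backward pass with the same scaling factors
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, x[t + 1]] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta                      # P(Y_t = k | x_1..x_T)
    xi = (alpha[:-1, :, None] * A[None] *     # P(Y_t = j, Y_{t+1} = k | x_1..x_T)
          (B[:, x[1:]].T * beta[1:])[:, None, :] / c[1:, None, None])
    # M-step: re-estimate the parameters from expected counts
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.stack([gamma[x == m].sum(axis=0) for m in range(B.shape[1])], axis=1)
    B_new /= gamma.sum(axis=0)[:, None]
    return pi_new, A_new, B_new
```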
In this paper, we consider a different situation in which the
states are not completely hidden but are partially observed.
Partial observations of hidden states may be available in a
wide range of applications. For instance, in speech recognition,
partial information on words or phonemes may be available
from the analysis of lip motion. In behavior analysis, video se-
quences may be labeled with some imprecision or uncertainty.
In machine diagnosis and prognosis applications, experts may
express probability judgements on the machine condition at
different time steps, etc.
Here, partial knowledge about hidden states will be assumed
to be described using the Dempster-Shafer theory of
belief functions [26], a formal framework for representing
and reasoning with uncertain information. This theory com-
bines logical and probabilistic approaches to uncertainty and
includes the set-membership and probabilistic frameworks as
special cases. In particular, it allows the representation of weak
knowledge up to complete ignorance: the usual HMM model
will thus be recovered as a special case.
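As a simple illustration of how such knowledge can be encoded, the sketch below represents a mass function as a mapping from subsets of the state set (focal sets) to non-negative masses summing to one; the vacuous, fully certain, probabilistic, and set-valued special cases mentioned above appear explicitly. The encoding and names are ours, not notation from the paper.

```python
# Illustrative encoding of partial state knowledge as a Dempster-Shafer
# mass function over the set of possible states.
states = frozenset({"s1", "s2", "s3"})

vacuous = {states: 1.0}                      # total ignorance: usual HMM recovered
certain = {frozenset({"s1"}): 1.0}           # state known exactly (supervised case)
bayesian = {frozenset({"s1"}): 0.7,          # probabilistic expert judgement
            frozenset({"s2"}): 0.3}
imprecise = {frozenset({"s1", "s2"}): 0.8,   # "s1 or s2", with residual doubt
             states: 0.2}

def plausibility(m, event):
    """Pl(event): total mass of the focal sets compatible with the event."""
    return sum(mass for focal, mass in m.items() if focal & event)

print(plausibility(imprecise, frozenset({"s2"})))  # 1.0: s2 is not ruled out
```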
In this context, we will solve the two classical problems
related to HMMs, i.e.,
1) Estimating the model parameters based on observations
of outputs and partial information on states (learning)
and
2) Finding the most likely sequence of states, given the observed outputs and partial information on states (inference); a sketch of the classical recursion used for this task is given after this list.
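To fix ideas, the following is a minimal sketch (ours) of the classical Viterbi recursion in the log domain for a standard HMM; the variant developed in this paper modifies this kind of recursion so that partial knowledge of the states is also taken into account.

```python
# Classical Viterbi decoding in the log domain (illustrative sketch).
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """log_pi: (K,); log_A: (K,K); log_B: (T,K), log_B[t,k] = log p(x_t | Y_t=k)."""
    T, K = log_B.shape
    delta = log_pi + log_B[0]                  # best log-score of paths ending in k
    psi = np.zeros((T, K), dtype=int)          # back-pointers to best predecessors
    for t in range(1, T):
        scores = delta[:, None] + log_A        # scores[j, k]: best path via j -> k
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):             # backtrack the optimal state sequence
        path[t] = psi[t + 1, path[t + 1]]
    return path
```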
The latter problem will be solved by a variant of the Viterbi
algorithm, while the former will be addressed using a method-
ology for statistical inference based on uncertain observations