Maximum Likelihood from Incomplete Data via the EM Algorithm

By A. P. DEMPSTER, N. M. LAIRD and D. B. RUBIN

Harvard University and Educational Testing Service

[Read before the ROYAL STATISTICAL SOCIETY at a meeting organized by the RESEARCH SECTION on Wednesday, December 8th, 1976, Professor S. D. SILVEY in the Chair]
SUMMARY
A broadly applicable algorithm for computing maximum likelihood estimates from incomplete data is presented at various levels of generality. Theory showing the monotone behaviour of the likelihood and convergence of the algorithm is derived. Many examples are sketched, including missing value situations, applications to grouped, censored or truncated data, finite mixture models, variance component estimation, hyperparameter estimation, iteratively reweighted least squares and factor analysis.
Keywords: MAXIMUM LIKELIHOOD; INCOMPLETE DATA; EM ALGORITHM; POSTERIOR MODE
1. INTRODUCTION
THIS paper presents a general approach to iterative computation of maximum-likelihood estimates when the observations can be viewed as incomplete data. Since each iteration of the algorithm consists of an expectation step followed by a maximization step we call it the EM algorithm. The EM process is remarkable in part because of the simplicity and generality of the associated theory, and in part because of the wide range of examples which fall under its umbrella. When the underlying complete data come from an exponential family whose maximum-likelihood estimates are easily computed, then each maximization step of an EM algorithm is likewise easily computed.
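To make the last point concrete, here is a minimal sketch, not taken from the paper: the complete data are assumed to be exponential lifetimes with mean phi (an exponential family whose maximum-likelihood estimate is simply the sample mean), but some lifetimes are observed only to exceed a censoring point. The expectation step replaces each censored lifetime by its conditional expectation under the current value of phi, and the maximization step applies the closed-form complete-data estimate to the filled-in sufficient statistic.

    # Illustrative sketch (not from the paper): one EM update for exponential
    # lifetimes with mean phi, some of which are right-censored.
    def em_update_exponential(observed, censoring_points, phi):
        # E-step: under the memoryless exponential model, a lifetime known
        # only to exceed c has conditional expectation c + phi.
        expected_total = sum(observed) + sum(c + phi for c in censoring_points)
        n = len(observed) + len(censoring_points)
        # M-step: the complete-data MLE of the mean is the average lifetime,
        # so the update is available in closed form.
        return expected_total / n

    phi = 1.0
    for _ in range(50):
        phi = em_update_exponential([0.8, 2.3, 1.1, 0.4], [3.0, 3.0], phi)

Grouped, censored and truncated data of this kind are among the example classes listed in the Summary.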
The term "incomplete data" in its general form implies the existence of two sample spaces
%Y
and X and a many-one mapping from3 to
Y.
The observed data y are a realization from
CY.
The corresponding x in X is not observed directly, but only indirectly through y. More
specifically, we assume there is a mapping x+ y(x) from X to
Y,
and that x is known only to
lie in X(y), the subset of
X
determined by the equation y
=
y(x), where y is the observed data.
We refer to x as the
complete data
even though in certain examples x includes what are
traditionally called parameters.
We postulate a family of sampling densities f(x | φ) depending on parameters φ and derive its corresponding family of sampling densities g(y | φ). The complete-data specification f(... | ...) is related to the incomplete-data specification g(... | ...) by

g(y | φ) = ∫_{X(y)} f(x | φ) dx.        (1.1)
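As a concrete reading of (1.1), consider the following hedged sketch; the Poisson splitting used here is an illustrative assumption, not an example from the paper. The complete data x = (x1, x2) are independent Poisson counts with means φ and 2φ, while only the total y = x1 + x2 is observed, so that X(y) = {(x1, x2): x1 + x2 = y}; summing f(x | φ) over this set recovers the Poisson density of y with mean 3φ.

    # Illustrative sketch of (1.1): the incomplete-data density g is obtained
    # by summing the complete-data density f over X(y) = {x : y(x) = y}.
    from math import exp, factorial

    def f(x1, x2, phi):
        # complete-data density: independent Poisson counts, means phi, 2*phi
        return (exp(-phi) * phi ** x1 / factorial(x1)
                * exp(-2 * phi) * (2 * phi) ** x2 / factorial(x2))

    def g(y, phi):
        # incomplete-data density via (1.1), summing over the set X(y)
        return sum(f(x1, y - x1, phi) for x1 in range(y + 1))

    y, phi = 7, 1.3
    direct = exp(-3 * phi) * (3 * phi) ** y / factorial(y)  # Poisson(3*phi)
    assert abs(g(y, phi) - direct) < 1e-12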
The EM algorithm is directed at finding a value of φ which maximizes g(y | φ) given an observed y, but it does so by making essential use of the associated family f(x | φ). Notice that given the incomplete-data specification g(y | φ), there are many possible complete-data specifications f(x | φ) that will generate g(y | φ). Sometimes a natural choice will be obvious, at other times there may be several different ways of defining the associated f(x | φ).
Each iteration of the EM algorithm involves two steps which we call the expectation step (E-step) and the maximization step (M-step). The precise definitions of these steps, and their associated heuristic interpretations, are given in Section 2 for successively more general types of models. Here we shall present only a simple numerical example to give the flavour of the method.
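As a flavour of what such an iteration looks like in practice, here is a minimal sketch under assumed conditions; the two-component mixture model and the data below are illustrative and are not the numerical example the authors present. The observed values are drawn from a mixture p·f1 + (1 − p)·f2 of two known normal densities with unknown mixing proportion p, and the complete data append to each observation the unobserved label of its component. The E-step computes, under the current p, the conditional probability that each observation arose from the first component, and the M-step re-estimates p as the average of these probabilities.

    # Illustrative sketch (not the paper's example): EM for the mixing
    # proportion p of a two-component normal mixture with known components.
    from math import exp, pi, sqrt

    def normal_pdf(y, mean, sd=1.0):
        return exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * sqrt(2.0 * pi))

    def em_mixing_proportion(data, p=0.5, iterations=100):
        for _ in range(iterations):
            # E-step: posterior probability that each observation belongs to
            # the first component, given the current value of p.
            resp = [p * normal_pdf(v, 0.0)
                    / (p * normal_pdf(v, 0.0) + (1.0 - p) * normal_pdf(v, 4.0))
                    for v in data]
            # M-step: the complete-data MLE of p is the fraction of labels in
            # the first component; replace the unknown labels by expectations.
            p = sum(resp) / len(resp)
        return p

    sample = [-0.3, 0.8, 0.1, 4.2, 3.7, 5.1, 0.4, 4.4]
    print(em_mixing_proportion(sample))

Each such pass cannot decrease the observed-data likelihood; this monotone behaviour is part of the general theory referred to in the Summary.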