Maximum Likelihood from Incomplete Data via the EM Algorithm

By A. P. DEMPSTER, N. M. LAIRD and D. B. RUBIN

Harvard University and Educational Testing Service

[Read before the ROYAL STATISTICAL SOCIETY at a meeting organized by the RESEARCH SECTION on Wednesday, December 8th, 1976, Professor S. D. SILVEY in the Chair]
SUMMARY
A broadly applicable algorithm for computing maximum likelihood estimates from incomplete data is presented at various levels of generality. Theory showing the monotone behaviour of the likelihood and convergence of the algorithm is derived. Many examples are sketched, including missing value situations, applications to grouped, censored or truncated data, finite mixture models, variance component estimation, hyperparameter estimation, iteratively reweighted least squares and factor analysis.
Keywords: MAXIMUM LIKELIHOOD; INCOMPLETE DATA; EM ALGORITHM; POSTERIOR MODE
1. INTRODUCTION
THIS paper presents a general approach to iterative computation of maximum-likelihood estimates when the observations can be viewed as incomplete data. Since each iteration of the algorithm consists of an expectation step followed by a maximization step we call it the EM algorithm. The EM process is remarkable in part because of the simplicity and generality of the associated theory, and in part because of the wide range of examples which fall under its umbrella. When the underlying complete data come from an exponential family whose maximum-likelihood estimates are easily computed, then each maximization step of an EM algorithm is likewise easily computed.
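To make the last point concrete, here is a minimal sketch, not taken from the paper: the complete data are assumed to be exponential lifetimes with mean phi (an exponential family whose maximum-likelihood estimate is simply the sample mean), but some lifetimes are observed only to exceed a censoring point. The expectation step replaces each censored lifetime by its conditional expectation under the current value of phi, and the maximization step applies the closed-form complete-data estimate to the filled-in sufficient statistic.

    # Illustrative sketch (not from the paper): one EM update for exponential
    # lifetimes with mean phi, some of which are right-censored.
    def em_update_exponential(observed, censoring_points, phi):
        # E-step: under the memoryless exponential model, a lifetime known
        # only to exceed c has conditional expectation c + phi.
        expected_total = sum(observed) + sum(c + phi for c in censoring_points)
        n = len(observed) + len(censoring_points)
        # M-step: the complete-data MLE of the mean is the average lifetime,
        # so the update is available in closed form.
        return expected_total / n

    phi = 1.0
    for _ in range(50):
        phi = em_update_exponential([0.8, 2.3, 1.1, 0.4], [3.0, 3.0], phi)

Grouped, censored and truncated data of this kind are among the example classes listed in the Summary.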
The term "incomplete data" in its general form implies the existence of two sample spaces
%Y
and X and a many-one mapping from3 to
Y.
The observed data y are a realization from
CY.
The corresponding x in X is not observed directly, but only indirectly through y. More
specifically, we assume there is a mapping x+ y(x) from X to
Y,
and that x is known only to
lie in X(y), the subset of
X
determined by the equation y
=
y(x), where y is the observed data.
We refer to x as the
complete data
even though in certain examples x includes what are
traditionally called parameters.
We postulate a family of sampling densities f(x | φ) depending on parameters φ and derive its corresponding family of sampling densities g(y | φ). The complete-data specification f(... | ...) is related to the incomplete-data specification g(... | ...) by

g(y | φ) = ∫_{X(y)} f(x | φ) dx.        (1.1)
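As a concrete reading of (1.1), consider the following hedged sketch; the Poisson splitting used here is an illustrative assumption, not an example from the paper. The complete data x = (x1, x2) are independent Poisson counts with means φ and 2φ, while only the total y = x1 + x2 is observed, so that X(y) = {(x1, x2): x1 + x2 = y}; summing f(x | φ) over this set recovers the Poisson density of y with mean 3φ.

    # Illustrative sketch of (1.1): the incomplete-data density g is obtained
    # by summing the complete-data density f over X(y) = {x : y(x) = y}.
    from math import exp, factorial

    def f(x1, x2, phi):
        # complete-data density: independent Poisson counts, means phi, 2*phi
        return (exp(-phi) * phi ** x1 / factorial(x1)
                * exp(-2 * phi) * (2 * phi) ** x2 / factorial(x2))

    def g(y, phi):
        # incomplete-data density via (1.1), summing over the set X(y)
        return sum(f(x1, y - x1, phi) for x1 in range(y + 1))

    y, phi = 7, 1.3
    direct = exp(-3 * phi) * (3 * phi) ** y / factorial(y)  # Poisson(3*phi)
    assert abs(g(y, phi) - direct) < 1e-12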
The EM algorithm is directed at finding a value of φ which maximizes g(y | φ) given an observed y, but it does so by making essential use of the associated family f(x | φ). Notice that given the incomplete-data specification g(y | φ), there are many possible complete-data specifications f(x | φ) that will generate g(y | φ). Sometimes a natural choice will be obvious, at other times there may be several different ways of defining the associated f(x | φ).
Each iteration of the EM algorithm involves two steps which we call the expectation step (E-step) and the maximization step (M-step). The precise definitions of these steps, and their associated heuristic interpretations, are given in Section 2 for successively more general types of models. Here we shall present only a simple numerical example to give the flavour of the method.
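As a flavour of what such an iteration looks like in practice, here is a minimal sketch under assumed conditions; the two-component mixture model and the data below are illustrative and are not the numerical example the authors present. The observed values are drawn from a mixture p·f1 + (1 − p)·f2 of two known normal densities with unknown mixing proportion p, and the complete data append to each observation the unobserved label of its component. The E-step computes, under the current p, the conditional probability that each observation arose from the first component, and the M-step re-estimates p as the average of these probabilities.

    # Illustrative sketch (not the paper's example): EM for the mixing
    # proportion p of a two-component normal mixture with known components.
    from math import exp, pi, sqrt

    def normal_pdf(y, mean, sd=1.0):
        return exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * sqrt(2.0 * pi))

    def em_mixing_proportion(data, p=0.5, iterations=100):
        for _ in range(iterations):
            # E-step: posterior probability that each observation belongs to
            # the first component, given the current value of p.
            resp = [p * normal_pdf(v, 0.0)
                    / (p * normal_pdf(v, 0.0) + (1.0 - p) * normal_pdf(v, 4.0))
                    for v in data]
            # M-step: the complete-data MLE of p is the fraction of labels in
            # the first component; replace the unknown labels by expectations.
            p = sum(resp) / len(resp)
        return p

    sample = [-0.3, 0.8, 0.1, 4.2, 3.7, 5.1, 0.4, 4.4]
    print(em_mixing_proportion(sample))

Each such pass cannot decrease the observed-data likelihood; this monotone behaviour is part of the general theory referred to in the Summary.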