EM Algorithms for PCA and SPCA
Sam Roweis
[email protected]; Computation & Neural Systems, California Institute of Technology
Abstract
I present an expectation-maximization (EM) algorithm for principal
component analysis (PCA). The algorithm allows a few eigenvectors and
eigenvalues to be extracted from large collections of high dimensional
data. It is computationally very efficient in space and time. It also natu-
rally accommodates missing information. I also introduce a new variant
of PCA called sensible principal component analysis (SPCA) which de-
fines a proper density model in the data space. Learningfor SPCA is also
done with an EM algorithm. I report results on synthetic and real data
showing that these EM algorithms correctly and efficiently find the lead-
ing eigenvectors of the covariance of datasets in a few iterations using up
to hundreds of thousands of datapoints in thousands of dimensions.
1 Why EM for PCA?
Principal component analysis (PCA) is a widely used dimensionality reduction technique in
data analysis. Its popularity comes from three important properties. First, it is the optimal
(in terms of mean squared error) linear scheme for compressing a set of high dimensional
vectors into a set of lower dimensional vectors and then reconstructing. Second, the model
parameters can be computed directly from the data – for example by diagonalizing the
sample covariance. Third, compression and decompression are easy operations to perform
given the model parameters – they require only matrix multiplications.
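To make these three properties concrete, here is a minimal NumPy sketch (my own illustration, not from the paper; all names and toy sizes are assumptions): diagonalize the sample covariance to obtain the parameters, then compress and decompress with matrix multiplications.

    # A minimal sketch of conventional PCA, assuming toy Gaussian data.
    import numpy as np

    rng = np.random.default_rng(0)
    p, n, k = 10, 500, 3                   # dimensions, datapoints, components
    Y = rng.standard_normal((p, n))        # stand-in for real data

    Ym = Y - Y.mean(axis=1, keepdims=True) # center the data
    S = Ym @ Ym.T / n                      # p x p sample covariance

    # Diagonalize the sample covariance; eigh sorts eigenvalues ascending.
    evals, evecs = np.linalg.eigh(S)
    C = evecs[:, ::-1][:, :k]              # leading k eigenvectors as columns

    # Compression and decompression are just matrix multiplications.
    X = C.T @ Ym                           # k x n low dimensional codes
    Y_hat = C @ X                          # p x n reconstruction (mean removed)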
Despite these attractive features, however, PCA models have several shortcomings. One is
that naive methods for finding the principal component directions have trouble with high
dimensional data or large numbers of datapoints. Consider attempting to diagonalize the
sample covariance matrix of $n$ vectors in a space of $p$ dimensions when $n$ and $p$
are several hundred or several thousand. Difficulties can arise both in the form of
computational complexity and also data scarcity.¹
Even computing the sample covariance itself is very costly,
requiring $O(np^2)$ operations. In general it is best to avoid altogether computing the sample
covariance explicitly.
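As a preview of how this paper's approach sidesteps that cost, here is a minimal sketch (my own code; names, toy sizes, and iteration count are assumptions) of an EM-style fixed-point iteration for the leading principal subspace. It touches the data only through products with the $n$ data columns and never forms the $p \times p$ covariance, so each iteration costs only $O(knp)$ for $k$ components.

    # A minimal sketch, assuming toy data: an EM-style iteration for the
    # leading-k principal subspace that never forms the p x p covariance.
    import numpy as np

    rng = np.random.default_rng(0)
    p, n, k = 1000, 200, 3                 # dimensions, datapoints, components
    Y = rng.standard_normal((p, n))        # stand-in for real data
    Y = Y - Y.mean(axis=1, keepdims=True)  # center the data

    C = rng.standard_normal((p, k))        # random initial loading matrix
    for _ in range(100):
        # E-step: project each datapoint onto the current subspace.
        X = np.linalg.solve(C.T @ C, C.T @ Y)          # k x n states
        # M-step: re-estimate the loadings from data and inferred states.
        C = Y @ X.T @ np.linalg.inv(X @ X.T)           # p x k loadings
    # Columns of C now span the leading principal subspace (not orthonormal).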
¹On the data scarcity front, we often do not have enough data in high dimensions for the sample
covariance to be of full rank and so we must be careful to employ techniques which do not require full
rank matrices. On the complexity front, direct diagonalization of a symmetric matrix thousands of
rows in size can be extremely costly since this operation is $O(p^3)$ for $p \times p$ inputs. Fortunately, several
techniques exist for efficient matrix diagonalization when only the first few leading eigenvectors and
eigenvalues are required (for example the power method [10], which is only $O(p^2)$ per iteration).
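To illustrate the power method mentioned in the footnote, here is a minimal sketch (my own construction, on an assumed symmetric test matrix): repeatedly multiplying a vector by the matrix and renormalizing converges to the leading eigenvector, at $O(p^2)$ cost per multiplication.

    # A minimal sketch of the power method on an assumed symmetric matrix.
    import numpy as np

    rng = np.random.default_rng(0)
    p = 100
    A = rng.standard_normal((p, p))
    S = A @ A.T / p                        # symmetric test matrix

    v = rng.standard_normal(p)
    for _ in range(200):
        v = S @ v                          # one O(p^2) multiplication
        v /= np.linalg.norm(v)             # renormalize to avoid overflow
    lam = v @ S @ v                        # Rayleigh quotient: leading eigenvalue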