Principal Component Analysis (PCA)
Mr. Abhilash Singh & Mr. Zafar Beg, Teaching Assistants
Department of Earth and Environmental Sciences
Indian Institute of Science Education and Research, Bhopal
Remote Sensing & GIS Laboratory, EES 336
What is PCA?
PCA is a mathematical procedure that transforms a number of possibly
correlated variables into a smaller number of uncorrelated variables called
principal components (PCs).
The first PC accounts for the highest variability (i.e., variance) in the
data, and each succeeding component accounts for no more variability than
the preceding one.
It is a dimensionality reduction or data compression method (without
losing much information).
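The two properties above (uncorrelated components, ordered by variance) can be written compactly. The notation here is introduced only for illustration and is not from the slides: x is a vector of the p variables, \mu its mean, \Sigma its covariance matrix, a_j the j-th unit-length eigenvector of \Sigma, \lambda_j the matching eigenvalue, and z_j the j-th principal component.

```latex
\[
z_j = a_j^{\top}(x - \mu), \qquad
\Sigma\, a_j = \lambda_j a_j, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0,
\]
\[
\operatorname{Var}(z_j) = \lambda_j, \qquad
\operatorname{Cov}(z_i, z_j) = 0 \quad (i \ne j).
\]
```

Keeping only the first k < p components is what gives the dimensionality reduction.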
Steps for PCA
Start with the data for n observations on p variables. (How to arrange
your data?)
Form an n × p matrix of deviations from the mean of each
variable. (Why do we do this?)
Calculate the covariance matrix (p × p). (Why covariance matrix?)
Calculate the eigenvalues and eigenvectors of the covariance matrix.
(What are eigenvalues and eigenvectors?)
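Choose the principal components to keep and form a feature vector (a
matrix whose columns are the chosen eigenvectors).
Derive the new data set by projecting the mean-centered data onto the
feature vector.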
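The steps above can be sketched by hand in MATLAB using only base functions. This is a minimal illustration, not the course's reference solution: the data matrix X, the choice k = 2, and all variable names are hypothetical.

```matlab
% Step 1: hypothetical data, n = 5 observations (rows) on p = 3 variables (columns)
X = [2.5 2.4 1.2;
     0.5 0.7 0.3;
     2.2 2.9 1.1;
     1.9 2.2 0.9;
     3.1 3.0 1.6];

% Step 2: deviations from the mean of each variable (column)
mu = mean(X, 1);          % 1 x p row vector of column means
Xc = X - mu;              % implicit expansion (MATLAB R2016b or later)

% Step 3: p x p covariance matrix, equal to (Xc' * Xc) / (n - 1)
C = cov(Xc);

% Step 4: eigenvalues and eigenvectors, sorted by decreasing eigenvalue
[V, D] = eig(C);
[lambda, idx] = sort(diag(D), 'descend');
V = V(:, idx);            % columns are the principal directions

% Step 5: keep the first k components (k = 2 is an arbitrary choice here)
k = 2;
W = V(:, 1:k);            % p x k feature vector

% Step 6: new data set, the centered data projected onto the kept components
Z = Xc * W;               % n x k
```

The columns of Z are uncorrelated and their variances equal the first k eigenvalues, which can be checked with cov(Z).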
PCA in MATLAB
[coeff, score, latent, ~, explained] = pca(X)
where,
X: Input data of size n x p. The rows (n) are the
observations and the columns (p) are the variables. pca centers X by
default, i.e. it works with X - mean(X), so no manual centering is needed.
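coeff: A p x p matrix in which each column holds the coefficients of one
principal component. The columns are ordered by decreasing variance, so
the first column explains the most variance. (The columns of coeff are the
eigenvectors of the covariance matrix of X.)
score: The data X transformed into PC space,
i.e. score = (X - mean(X)) * coeff. The centered data are reconstructed as
score * coeff', and the original data as score * coeff' + mean(X).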
latent: Variance explained by each PC (the eigenvalues of the covariance
matrix, in decreasing order).
explained: Percentage of the total variance explained by each PC:
explained = latent / sum(latent) * 100
It is used to decide how many PCs to keep.
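A short usage sketch of the call above, assuming the Statistics and Machine Learning Toolbox is installed. The random data, the 95% threshold, and the variable names other than pca's outputs are hypothetical. The sixth output, mu, is the vector of column means that pca subtracts before the analysis; reconstruction uses the transpose of coeff and adds mu back.

```matlab
% Hypothetical data: 100 observations on 4 correlated variables
rng(0);                                   % fixed seed for repeatability
A = [2 0 0 0; 1 1 0 0; 0 0.5 0.3 0; 0 0 0 0.1];
X = randn(100, 4) * A;

% pca centers X internally; mu holds the column means it removed
[coeff, score, latent, ~, explained, mu] = pca(X);

% Reconstruct the original data from all components
Xrec = score * coeff' + mu;               % implicit expansion (R2016b or later)
reconErr = max(abs(X(:) - Xrec(:)));      % should be near machine precision

% explained is latent rescaled to percentages
explainedCheck = latent / sum(latent) * 100;

% Keep the smallest number of PCs whose cumulative share reaches the threshold
threshold = 95;                           % assumed cut-off, in percent
k = find(cumsum(explained) >= threshold, 1);
Xreduced = score(:, 1:k);                 % n x k reduced data set
```

The threshold is a judgment call; a scree plot of explained against component number is another common way to pick k.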