Biostatistics Homework 7
1. "Stocks.txt" contains stock data: the stock symbols are in column 1, and the variables describing each symbol are in the remaining columns.
Questions:
1) Apply PCA to this data and report how much of the variability is explained by the first two principal components.
2) How many components should be kept to explain more than 90% of the variance?
3) Use biplot() to visualize the PCA result, and interpret how many variables make up principal component 1.
Answer:
1)
> data <- read.table("Stocks.txt", row.names = 1, header = TRUE)
> pca <- princomp(data, cor = TRUE)
> summary(pca)
Importance of components:
                         Comp.1    Comp.2    Comp.3     Comp.4     Comp.5     Comp.6      Comp.7       Comp.8
Standard deviation     2.070766 1.2971806 1.0228343 0.70727217 0.62193416 0.24914879 0.179082732 0.0433452635
Proportion of Variance 0.536009 0.2103347 0.1307738 0.06252924 0.04835026 0.00775939 0.004008828 0.0002348515
Cumulative Proportion  0.536009 0.7463437 0.8771174 0.93964667 0.98799693 0.99575632 0.999765149 1.0000000000
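PC1 explains 53.6% of the variability and PC2 explains 21.0%. Together, the first two principal components explain 74.6% of the total variability (the Cumulative Proportion for Comp.2 is 0.7463).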
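As a check on how the summary table is built, here is a minimal sketch that recomputes the proportions from the printed standard deviations alone. It assumes only the numbers shown above. With cor = TRUE, each component's variance is its squared standard deviation. These variances sum to the number of variables (8 here), and each proportion is one variance divided by that total.

```r
# Standard deviations of the 8 components, copied from the summary(pca) output above
sdev <- c(2.070766, 1.2971806, 1.0228343, 0.70727217,
          0.62193416, 0.24914879, 0.179082732, 0.0433452635)

# Component variances (eigenvalues of the correlation matrix);
# with cor = TRUE they sum to the number of variables, up to print rounding
eigenvalues <- sdev^2
prop <- eigenvalues / sum(eigenvalues)

# Share of the total variance captured by PC1 and PC2 together
first_two <- sum(prop[1:2])
round(100 * first_two, 1)  # 74.6
```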
2)
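From the Cumulative Proportion row, the first three components explain only 87.7% of the variance, while the first four explain 94.0%. Therefore 4 components (PC1 through PC4) must be kept to exceed 90%.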
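The same rule can be applied mechanically by taking the first position where the cumulative proportion crosses the threshold. This sketch uses the Cumulative Proportion row copied from the output above. The 0.90 cutoff is the one stated in the question.

```r
# Cumulative proportions as printed by summary(pca) above
cum_prop <- c(0.536009, 0.7463437, 0.8771174, 0.93964667,
              0.98799693, 0.99575632, 0.999765149, 1.0)

# Smallest number of components whose cumulative share exceeds 90%
k <- which(cum_prop > 0.90)[1]
k  # 4
```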
3)
> biplot(pca)
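Because the biplot itself is a figure, one way to back up the interpretation of PC1 is to look at the PC1 loadings directly. Under biplot's default scale = 1, the horizontal extent of each variable's arrow is proportional to that variable's PC1 loading. The sketch below is illustrative only and rests on two stated assumptions. Stocks.txt is not available here, so it substitutes R's built-in USArrests data as a stand-in. It also treats |loading| > 0.3 as a "major" contribution, which is a rule of thumb rather than a fixed standard. With the real data, you would apply the same two lines to pca$loadings[, 1].

```r
# Stand-in example: Stocks.txt is not available here, so this uses R's
# built-in USArrests data purely to illustrate reading PC1 loadings
pca_demo <- princomp(USArrests, cor = TRUE)

# PC1 loadings; the sign of a component is arbitrary, so compare magnitudes
pc1 <- pca_demo$loadings[, 1]
print(round(pc1, 3))

# Assumed rule of thumb: treat |loading| > 0.3 as a major contribution to PC1
major <- names(pc1)[abs(pc1) > 0.3]
print(major)
print(length(major))
```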