Variance and KMeans
ML 2022: Machine Learning
https://people.sc.fsu.edu/~jburkardt/classes/ml_2022/cluster_lab/cluster_lab.pdf
Using KMeans, we can discover how data has formed cluster patterns.
Variance reduction!
We are interested in how much our data clusters around one or more centers.
• Variance measures the tightness of the clustering;
• If we suspect a single cluster, we may want to standardize the data first;
• For multidimensional data, the covariance matrix reports variance, independence, and correlation
of the data;
• If there are multiple centers, a better model is available through kmeans;
• To use kmeans, we need to choose the number of clusters;
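• The behavior of the inertia (the sum of squared distances from each point to its assigned
center) as the number of clusters increases suggests a suitable number of clusters.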
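The ideas in this list can be tried on synthetic data before turning to the datafiles. The following is a minimal sketch, not the lab's own code. It assumes NumPy is available, uses made-up data (three Gaussian blobs), and the helper kmeans_inertia is a hypothetical name for a plain Lloyd iteration rather than any library's KMeans routine.

```python
import numpy as np

def kmeans_inertia(X, k, n_iter=50, seed=0):
    """Plain Lloyd iteration (illustrative helper, not a library routine).

    Returns (centers, labels, inertia), where inertia is the sum of squared
    distances from each point to its assigned center.
    """
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its points; keep it if its cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment and inertia for the centers we ended with.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return centers, labels, inertia

# Made-up data: three blobs of 50 points each (an assumption for illustration).
rng = np.random.default_rng(1)
true_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=1.0, size=(50, 2)) for c in true_centers])

# Single-center view: variance per coordinate, standardized data, covariance matrix.
variance = X.var(axis=0)
Z = (X - X.mean(axis=0)) / X.std(axis=0)
C = np.cov(X, rowvar=False)

# Multi-center view: inertia for several choices of the number of clusters.
inertias = {k: kmeans_inertia(X, k)[2] for k in range(1, 6)}
for k, value in inertias.items():
    print(k, value)
```

With one cluster, the inertia is just the number of points times the total variance, which ties the two views together. For data like this, the inertia should drop sharply until k reaches the true number of blobs and more slowly afterward, though a random start can settle in a poorer local minimum, so it is worth repeating the run with several seeds.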
1 Copying the data
Each of the exercises will be carried out on a particular datafile. These datafiles are available on the datasets
page at the class website:
https://people.sc.fsu.edu/~jburkardt/classes/ml_2022/datasets/datasets.html
You might go ahead now and download them all: