Fuzzy Clustering and Data
Analysis Toolbox
For Use with Matlab
Balazs Balasko, Janos Abonyi and Balazs Feil
Preface
About the Toolbox
The Fuzzy Clustering and Data Analysis Toolbox is a collection of Matlab
functions. Its purpose is to divide a given data set into subsets (called
clusters). In hard partitioning the transitions between the subsets are
crisp; in fuzzy partitioning they are gradual.
The toolbox provides four categories of functions and an example:
• Clustering algorithms. These functions group the given data set
into clusters by different approaches: Kmeans and Kmedoid are hard
partitioning methods, while FCMclust, GKclust and GGclust are fuzzy
partitioning methods with different distance norms, which are defined
in Section 1.2.
• Evaluation with cluster prototypes. Based on the clustering
results of a data set, this set of functions can calculate membership
values for "unseen" data sets. In the 2-dimensional case the functions
draw a contour map in the data space to visualize the results.
• Validation. The validity function provides cluster validity measures
for each partition. It is useful when the number of clusters is not
known a priori. The optimal partition can be determined from the
extrema of the validation indexes as a function of the number of
clusters. The indexes calculated are: Partition Coefficient (PC),
Classification Entropy (CE), Partition Index (SC), Separation Index (S),
Xie and Beni's Index (XB), Dunn's Index (DI) and Alternative Dunn
Index (DII).
• Visualization. The visualization part of this toolbox provides the
modified Sammon mapping of the data. Sammon mapping is a
multidimensional scaling method described by Sammon. The original
method is computationally expensive when a new data point has to
be mapped, so a modified method described by Abonyi is included in
this toolbox.
• Example. An example based on an industrial data set demonstrates
the usefulness of these algorithms.
Installation
The installation is straightforward and does not require any changes to
your system settings. To use these functions, copy the directory
"FUZZCLUST" together with its files to the location of the directory
"toolbox" (...\MATLAB\TOOLBOX\...).
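If the toolbox functions are not found after copying, the directory may also need to be on the Matlab search path. The lines below are a minimal sketch of one way to do this with the standard path functions. The location built with matlabroot is an assumption for illustration only and should be replaced by the directory where FUZZCLUST was actually copied.

```matlab
% Assumed location (hypothetical): adjust to where FUZZCLUST was copied.
fuzzclustDir = fullfile(matlabroot, 'toolbox', 'FUZZCLUST');

if exist(fuzzclustDir, 'dir')
    addpath(fuzzclustDir);   % adds the directory for the current session
    % savepath;              % uncomment to keep the entry in later sessions
else
    warning('FUZZCLUST directory not found: %s', fuzzclustDir);
end

% Check that one of the toolbox functions is now visible on the path.
which FCMclust
```

If which reports the location of FCMclust, the other functions in the same directory should be reachable as well.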
Contact
Janos Abonyi or Balazs Feil:
Department of Process Engineering, University of Veszprem
P.O.Box 158 H-8200, Veszprem, Hungary
Phone: +36-88-422-022/4209 Fax: +36-88-421-709
E-mail: abonyij@fmt.vein.hu, feilb@fmt.vein.hu
Web: www.fmt.vein.hu/softcomp
Contents
1 Theoretical introduction
  1.1 Cluster analysis
    1.1.1 The data
    1.1.2 The clusters
    1.1.3 Cluster partition
  1.2 Clustering algorithms
    1.2.1 K-means and K-medoid algorithms
    1.2.2 Fuzzy C-means algorithm
    1.2.3 The Gustafson–Kessel algorithm
    1.2.4 The Gath–Geva algorithm
  1.3 Validation
  1.4 Visualization
    1.4.1 Principal Component Analysis (PCA)
    1.4.2 Sammon mapping
    1.4.3 Fuzzy Sammon mapping
2 Reference
  Function Arguments
  Kmeans
  Kmedoid
  FCMclust
  GKclust
  GGclust
  clusteval
  validity
  clustnormalize and clustdenormalize
  PCA
  Sammon
  FuzSam
  projeval
  samstr
3 Case Studies
  3.1 Comparing the clustering methods
  3.2 Optimal number of clusters
  3.3 Multidimensional data sets
    3.3.1 Iris data set
    3.3.2 Wine data set
    3.3.3 Breast Cancer data set
    3.3.4 Conclusions