SPECTRAL CLUSTERING WITH MEAN SHIFT PREPROCESSING
Umut Ozertem, Deniz Erdogmus
CSEE Department, OGI, Oregon Health & Science University, Portland, Oregon, USA
Abstract. Clustering is a fundamental problem in
machine learning with numerous important applications
in statistical signal processing, pattern recognition, and
computer vision, where unsupervised analysis of the
classification structure of data is required. The current
state-of-the-art in clustering is widely accepted to be
so-called spectral clustering. Spectral clustering, being
based on pairwise affinities of samples, imposes very
large computational requirements. In this paper, we
propose a vector quantization preprocessing stage for
spectral clustering, similar to the classical mean-shift
principle for clustering. This preprocessing reduces the
dimensionality of the matrix to which spectral
techniques are applied, resulting in significant
computational savings.
1. INTRODUCTION
Data clustering is an important fundamental problem
with a wide range of applications in different areas of
unsupervised learning: image segmentation, data
mining, speech recognition, and data compression, to
name just a few. In recent years there has been growing
interest in spectral clustering, and it is recognized as an
important tool for clustering problems. In spectral
clustering, data segmentation is obtained using the
eigendecomposition of an affinity matrix that defines the
similarities in the data. In the definition of the affinity
matrix, different similarity measures can be utilized to
characterize the affinities. The affinities do not even have
to obey the metric axioms, except for the symmetry
property. Spectral clustering dates back to Fiedler's
discovery that the second eigenvector of the Laplacian
matrix can be used to bi-partition the data [1]. Recently, a
number of related clustering methods have been suggested
that utilize the eigenvectors or generalized eigenvectors of
the affinity matrix [2-14]. Such methods are known as
spectral clustering and are considered to be the state-of-the-art
clustering methods in the literature.
The majority of spectral clustering algorithms
are variants of graph-cut and multiway-cut
methods, each using a different affinity matrix and
utilizing the resulting eigendecomposition in a different
manner. After the eigendecomposition is obtained, the
clustering is determined by thresholding the values of a
suitably selected eigenvector. One should also note that
these methods are sensitive to the definition of affinity
between data pairs, and since no theoretical criterion
for choosing the function that assigns the affinities is
known, these algorithms require the assumption that a
suitable affinity definition exists.
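The graph-cut view described above can be illustrated with a small sketch: a Gaussian affinity matrix is built, and the data are bi-partitioned by thresholding the second-smallest eigenvector (the Fiedler vector) of the graph Laplacian. The kernel size sigma and the zero threshold are illustrative assumptions, not choices prescribed by this paper.

```python
import numpy as np

def fiedler_bipartition(X, sigma=1.0):
    """Bi-partition samples by thresholding the Fiedler vector of
    the unnormalized graph Laplacian (illustrative sketch)."""
    # Pairwise Gaussian affinities K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))
    D = np.diag(K.sum(axis=1))
    L = D - K                        # graph Laplacian
    vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    fiedler = vecs[:, 1]             # second-smallest eigenvector
    return fiedler >= 0.0            # threshold at zero

# Two well-separated 2-D blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (10, 2)),
               rng.normal(5.0, 0.1, (10, 2))])
labels = fiedler_bipartition(X)
```

For such well-separated groups, the Fiedler vector is approximately piecewise constant with opposite signs on the two groups, so the zero threshold recovers the partition.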
A different track in spectral clustering was
initiated by Scott and Longuet-Higgins [12], who
propose a mapping that uses the eigenvectors of the
affinity matrix to transform the data from the original data
space to the kernel-induced feature space, and perform the
actual clustering on the image of the data in that space.
Normalization of the transformed data is an important
step in this approach; provided it is carried out, clustering
the image of the data in the kernel-induced feature space
has been shown to generate very successful results for a
variety of data sets. Spectral clustering can be
understood as measuring sample similarities by an inner
product in the kernel-induced feature space. For
Mercer kernels, the kernel trick provides a technique to
compute inner products in the potentially infinite-dimensional
kernel-induced feature space. Kernel-based
methods rely on the assumption that clustering in the
kernel-induced feature space is easier than the
original clustering problem. In practice, one cannot prove
that this assumption holds for all Mercer kernels; on the
other hand, one could search for a kernel that makes this
desired property true. Kernel optimization, in general, is a
daunting task and no practical solutions exist yet. We
will exploit the connection between kernel methods and kernel
density estimation in order to utilize well-known results
from that literature to select an appropriate kernel [15].
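One well-known result from the kernel density estimation literature of the kind alluded to above is Silverman's rule of thumb for choosing a Gaussian kernel size from the data. The sketch below shows that rule for one-dimensional data as an example of data-driven kernel selection; it is not necessarily the specific criterion of [15].

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule-of-thumb kernel size for a 1-D Gaussian KDE:
    sigma = 1.06 * std(x) * n^(-1/5)."""
    n = len(x)
    return 1.06 * np.std(x, ddof=1) * n ** (-1.0 / 5.0)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 1000)   # unit-variance Gaussian sample
h = silverman_bandwidth(x)       # kernel size shrinks as n grows
```

The n^(-1/5) factor balances the bias and variance of the density estimate, so the kernel size shrinks slowly as more samples become available.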
The main shortcoming of spectral clustering
algorithms is their computational complexity, since these
algorithms require the computation of the eigenvectors of
the N×N affinity matrix, where N is the sample size.
The computational complexity of the full eigenvector
calculation is O(N^3), which makes spectral clustering
methods impractical for large data sets.
In this paper, we propose a spectral clustering
algorithm that uses fixed-size kernel density estimation with
a mean-shift algorithm to represent the data by a much
smaller, quantized affinity matrix. This reduces the
computational complexity to an order determined by the
number of modes present in the data probability density
function.
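The preprocessing idea can be illustrated with a minimal fixed-bandwidth mean-shift iteration: each sample is moved toward a mode of the Gaussian kernel density estimate, and samples converging to the same mode are represented by a single quantized point. This is a generic mean-shift sketch under assumed parameters (bandwidth, merge tolerance), not the exact procedure of Section 2.

```python
import numpy as np

def mean_shift_modes(X, sigma=0.5, iters=50, tol=1e-3):
    """Fixed-bandwidth Gaussian mean shift: move every point toward
    the nearest KDE mode, then merge points that converged together."""
    Y = X.copy()
    for _ in range(iters):
        # Gaussian weights of every current point w.r.t. the original data
        sq = np.sum((Y[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        W = np.exp(-sq / (2.0 * sigma ** 2))
        Y_new = (W @ X) / W.sum(axis=1, keepdims=True)  # mean-shift update
        converged = np.max(np.abs(Y_new - Y)) < tol
        Y = Y_new
        if converged:
            break
    # Merge converged points that landed on the same mode
    modes = []
    for y in Y:
        if not any(np.linalg.norm(y - m) < sigma / 2 for m in modes):
            modes.append(y)
    return np.array(modes)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (15, 2)),
               rng.normal(5.0, 0.1, (15, 2))])
modes = mean_shift_modes(X)   # 30 samples collapse to 2 modes
```

The 30 samples are replaced by as many representatives as there are density modes, which is the source of the computational savings claimed above: spectral techniques then operate on a modes-by-modes matrix rather than an N×N one.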
2. THE PROPOSED METHOD
We discuss the details of the proposed method in this
section after a brief overview of spectral clustering.
Given a set of vectors {x_1, …, x_N} and a suitable kernel
function K_σ(x_i, x_j) that measures pairwise affinity or
closeness, where σ denotes the kernel size (e.g., the
standard deviation in the case of a Gaussian kernel), the
affinity matrix K or the normalized graph Laplacian
matrix L is constructed as shown in (1) [5,7,8].
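Since equation (1) itself is not reproduced in this excerpt, the sketch below uses one common convention from the cited literature: a Gaussian affinity matrix K and the symmetrically normalized matrix L = D^(-1/2) K D^(-1/2), where D is the diagonal degree matrix. The paper's own equation (1) may define L differently.

```python
import numpy as np

def affinity_and_laplacian(X, sigma=1.0):
    """Gaussian affinity matrix K and a symmetrically normalized
    graph Laplacian-type matrix L = D^{-1/2} K D^{-1/2}
    (one common convention; the paper's equation (1) may differ)."""
    # K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))
    d = K.sum(axis=1)                     # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = D_inv_sqrt @ K @ D_inv_sqrt
    return K, L

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 2))
K, L = affinity_and_laplacian(X)
```

With this normalization, L is symmetric and its largest eigenvalue equals 1, which is what makes thresholding its leading eigenvectors well behaved.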
730-7803-9518-2/05/$20.00 ©2005 IEEE