Our work in this paper returns image segmentation to the paradigm of pure clustering-then-labeling. Specifically, instead of processing at the pixel level, we perform feature extraction and clustering at the superpixel (Ren and Malik, 2003; Mori, 2005) level. The image is first over-segmented into many small homogeneous regions, known as superpixels. For the feature of each superpixel, we adopt a simple 3-D color vector, computed by averaging the L*a*b* color values over all of the pixels in that superpixel. To cluster all superpixels in this feature space, we propose a novel unsupervised clustering algorithm that combines the mean-shift (Comaniciu and Meer, 2002) clustering algorithm with Semi-supervised Discriminant Analysis (SDA) (Cai et al., 2007) in an incremental learning scheme. Mean-shift generates the cluster labels, and SDA performs subspace selection; the two steps are performed alternately. Experimental results show that the clustering results obtained by our algorithm are more coherent than those obtained by several other popular clustering algorithms. Owing to the robustness of our clustering algorithm, the segmentation of an image can be obtained by directly labeling the superpixels in the image with their corresponding cluster labels. Compared with several previous methods, our segmentation method is not only simple and straightforward, but also achieves competitive performance in quantitative evaluation on the Berkeley segmentation database (Martin et al., 2001).
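As a rough illustration of this alternating scheme, the following sketch (in Python) clusters a feature matrix by repeatedly re-estimating mean-shift labels in a discriminative subspace. It is only a minimal sketch: plain LDA from scikit-learn stands in for SDA, the iteration count and stopping rule are arbitrary choices of ours, and nothing here reproduces the authors' actual implementation.

```python
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def alternating_cluster(features, bandwidth=None, n_iter=5):
    """Alternate mean-shift labeling with a discriminant projection.

    Plain LDA stands in for SDA here; 'bandwidth' and 'n_iter' are
    illustrative choices, not values from the paper.
    """
    X = np.asarray(features, dtype=float)
    labels = MeanShift(bandwidth=bandwidth).fit_predict(X)      # initial cluster labels
    for _ in range(n_iter):
        n_classes = len(np.unique(labels))
        if n_classes < 2:
            break                                               # nothing to discriminate
        n_comp = min(n_classes - 1, X.shape[1])
        proj = LinearDiscriminantAnalysis(n_components=n_comp)
        Z = proj.fit_transform(X, labels)                       # subspace selection step
        labels = MeanShift(bandwidth=bandwidth).fit_predict(Z)  # re-cluster in the subspace
    return labels
```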
Another important contribution of this paper is our proposed approach to automatic parameter selection for our segmentation method. Given a segmentation algorithm, it is usually not hard to select a proper parameter by hand-tuning to obtain a good segmentation for a single image. But if we want to segment an entire database of images well, say the Berkeley segmentation database, and make a benchmark comparison with several segmentation methods, we must build a methodological approach to parameter selection. Under the same conditions, the aim is that the approach automatically selects a proper parameter of the segmentation method for every image in the database so as to obtain a good segmentation. Experiments on the 300 images in the Berkeley segmentation database show that our parameter selection method achieves this aim well.
2. Color feature of superpixel
Conventionally, image segmentation is viewed as a low-level vision task in which a feature is extracted for each pixel. For an image of typical size, hundreds of thousands of data points would have to be processed if we clustered directly in this feature space. In this paper, to reduce the computational cost of clustering, we extract features at the superpixel level instead of the pixel level.
A superpixel (Ren and Malik, 2003) is a small atomic region obtained by over-segmenting an image based on local cues such as color (or texture) and edges. Compared with the large number of pixels in an image, the over-segmented result usually contains only a few hundred superpixels. Furthermore, two other desirable properties of superpixels motivate us to use them for image segmentation. One is that each superpixel is a perceptually consistent unit, i.e. all pixels in a superpixel are almost uniform in color or texture. The other is that the superpixel map obtained by over-segmentation preserves most of the structures (i.e. edges or contours) necessary for segmentation at the scale of interest. In (Ren and Malik, 2003), the authors validated this property by comparing the segmentation reconstructed from the superpixel map with the ground truth using a contour-based measure. These two properties lead us to believe that if we can obtain a good clustering result in the feature space of superpixels, and then label each superpixel with its corresponding cluster, we can obtain a satisfying image segmentation result.
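For readers who want to reproduce this kind of over-segmentation without the original code, the sketch below uses SLIC from scikit-image as a readily available stand-in for the normalized-cut superpixel code of Mori (2005); the file name and segment count are illustrative assumptions, not values from the paper.

```python
from skimage import io, segmentation

# SLIC is used here only as a stand-in for the normalized-cut superpixel
# code of Mori (2005) used in the paper.
image = io.imread("example.jpg")   # hypothetical input image
superpixels = segmentation.slic(image, n_segments=300, compactness=10)
# 'superpixels' is an H x W integer map; label k marks the pixels of superpixel k.
```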
The L*a*b* color metric was specially designed to approximate human vision, so that a simple Euclidean distance in this space differentiates colors perceptually. In this paper, for the feature extraction of each superpixel, we first convert the original RGB colors of the image into the L*a*b* color space, and then compute the average of the L*a*b* values over the pixels in each superpixel. The resulting average is a 3-D vector in L*a*b* color space and serves as the feature used to cluster the superpixels. In the remainder of this paper, we call this 3-D L*a*b* color feature the superpixel color for convenience.
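A minimal sketch of this feature computation, assuming an RGB input image and an integer superpixel label map (the function name is ours, not from the paper):

```python
import numpy as np
from skimage import color

def superpixel_colors(rgb_image, superpixels):
    """Mean L*a*b* value per superpixel: one 3-D feature vector each."""
    lab = color.rgb2lab(rgb_image).reshape(-1, 3)   # per-pixel L*a*b* values
    sp = superpixels.reshape(-1)
    n_sp = sp.max() + 1
    sums = np.zeros((n_sp, 3))
    np.add.at(sums, sp, lab)                        # accumulate per superpixel
    counts = np.bincount(sp, minlength=n_sp)[:, None]
    return sums / counts                            # (n_sp, 3) feature matrix
```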
Fig. 1(a) is a natural image of size 180 × 120 from the Corel image set. Fig. 1(b) is the superpixel map of this image, generated using the publicly available superpixel code (Mori, 2005). This superpixel map contains 1071 superpixels in all, and each superpixel is an almost perceptually homogeneous region. Fig. 1(c) is the human-labeled segmentation for Fig. 1(a). This ground truth¹ contains three classes of labels for the pixels in Fig. 1(a). Similarly to the computation of our superpixel color, we can approximately obtain the label of each superpixel in Fig. 1(b) by computing the mode of the labels in Fig. 1(c) over the pixels in each superpixel. When each superpixel is equipped with a ground truth label, we can not only clearly observe the distribution of all superpixels in our superpixel color feature space, but also visually evaluate the clustering results obtained by other clustering algorithms in this superpixel color space.
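A minimal sketch of this mode computation, assuming the ground truth is stored as non-negative integer labels and the superpixel map as integer region indices (again, the function name is ours):

```python
import numpy as np

def superpixel_gt_labels(gt, superpixels):
    """Most frequent (mode) ground-truth label covered by each superpixel."""
    gt = gt.reshape(-1)                 # assumes non-negative integer labels
    sp = superpixels.reshape(-1)
    n_sp = sp.max() + 1
    modes = np.empty(n_sp, dtype=gt.dtype)
    for k in range(n_sp):
        modes[k] = np.bincount(gt[sp == k]).argmax()   # mode of the pixel labels
    return modes
```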
Fig. 1(d) illustrates the distribution of the 1071 superpixels of Fig. 1(b) in our superpixel color feature space. Each data point is marked according to the superpixel label computed from the ground truth in Fig. 1(c). Fig. 1(d) can thus be regarded as a ground truth for evaluating a clustering result on these 1071 superpixels. If we want to obtain a good final segmentation result for Fig. 1(a) simply by clustering-then-labeling, the clustering result in our superpixel color feature space should closely approximate Fig. 1(d).
Fig. 1(e)–(g) illustrate, respectively, the clustering results on these 1071 superpixels obtained by three popular clustering algorithms. The clustering result shown in Fig. 1(e) is generated by the K-means clustering algorithm; to compare with the ground truth in Fig. 1(d), we set K = 3. The clustering result shown in Fig. 1(f) is generated by the normalized-cut clustering algorithm.² This algorithm has only one tuning parameter, the number of clusters to be generated, which we also set to 3. The clustering result shown in Fig. 1(g) is generated by the mean-shift (Comaniciu and Meer, 2002) clustering algorithm. The tuning parameter of this algorithm is called the feature bandwidth and is usually denoted by h_r. We set h_r = 17 so that the clustering result has three clusters.
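As a hedged illustration of how such baselines can be run on the superpixel color features, the snippet below uses scikit-learn's KMeans and MeanShift; the normalized-cut code referenced in footnote 2 is not reproduced, and the feature matrix is filled with random values only so that the snippet is self-contained. The values K = 3 and h_r = 17 mirror those quoted above.

```python
import numpy as np
from sklearn.cluster import KMeans, MeanShift

# 'features' stands for the (1071, 3) superpixel color matrix described above;
# random values are used here only so the snippet runs on its own.
features = np.random.rand(1071, 3) * 100.0

kmeans_labels = KMeans(n_clusters=3, n_init=10).fit_predict(features)   # K = 3
meanshift_labels = MeanShift(bandwidth=17.0).fit_predict(features)      # h_r = 17
```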
From the clustering results in Fig. 1(e)–(g), we can see that none of them approximates the ground truth in Fig. 1(d) well. If we directly label the superpixels in the image with these clustering results, we will not obtain a good segmentation result compared with the ground truth in Fig. 1(c). This partially explains why the image segmentation methods in (Comaniciu and Meer, 2002; Shi and Malik, 2000) incorporate the positional information of pixels into their clustering process. In the following sections, we will show our clustering results on these 1071 superpixels for comparison.
¹ Both Fig. 1(a) and (c) can be downloaded from: www.cs.toronto.edu/~hexm/data/corel_subset.mat.
² Code can be downloaded from www.cis.upenn.edu/~jshi/software/files/NcutClustering_7.zip.