the data is obtained. The advantages of these methods
with respect to most state-of-the-art algorithms are that
they can handle noise and outliers in data, and that they
do not need to know the dimensions and, in principle, the
number of subspaces a priori.
1.2 Paper Contributions
In this paper, we propose and study an algorithm based on
sparse representation techniques, called SSC, to cluster a
collection of data points lying in a union of low-dimensional
subspaces. The underlying idea behind the algorithm is
what we call the self-expressiveness property of the data,
which states that each data point in a union of subspaces
can be efficiently represented as a linear or affine combination of other points. Such a representation is not unique in
general because there are infinitely many ways in which a
data point can be expressed as a combination of other
points. The key observation is that a sparse representation of a
data point ideally corresponds to a combination of a few
points from its own subspace. This motivates solving a
global sparse optimization program whose solution is used
in a spectral clustering framework to infer the clustering of
data. As a result, we can overcome the problems of local
spectral clustering-based algorithms, such as choosing the
right neighborhood size and dealing with points near
the intersection of subspaces, since, for a given data point,
the sparse optimization program automatically picks a few
other points that are not necessarily close to it but that
belong to the same subspace.
Since solving the sparse optimization program is in
general NP-hard, we consider its $\ell_1$ relaxation. We show
that, under mild conditions on the arrangement of subspaces
and data distribution, the proposed $\ell_1$-minimization
program recovers the desired solution, guaranteeing the
success of the algorithm. Our theoretical analysis extends
the sparse representation theory to the multisubspace
setting where the number of points in a subspace is
arbitrary, possibly much larger than its dimension. Unlike
block-sparse recovery problems [42], [43], [44], [45], [46],
[47] where the bases for the subspaces are known and
given, we do not have the bases for subspaces nor do we
know which data points belong to which subspace, making
our case more challenging. We only have the sparsifying
dictionary for the union of subspaces given by the matrix of
data points.
The proposed $\ell_1$-minimization program can be solved
efficiently using convex programming tools [48], [49], [50]
and does not require initialization. Our algorithm can
directly deal with noise, sparse outlying entries, and
missing entries in the data as well as the more general class
of affine subspaces by incorporating the data corruption or
subspace model into the sparse optimization program.
Finally, through experimental results, we show that our
algorithm outperforms state-of-the-art subspace clustering
methods on the two real-world problems of motion
segmentation (see Fig. 1) and face clustering (see Fig. 2).
Paper organization. In Section 2, we motivate and
introduce the SSC algorithm for clustering data points in a
union of linear subspaces. In Section 3, we generalize the
algorithm to deal with noise, sparse outlying entries, and
missing entries in the data as well as the more general class
of affine subspaces. In Section 4, we investigate theoretical
conditions under which the $\ell_1$-minimization program recovers
the desired sparse representations of data points. In
Section 5, we discuss the connectivity of the similarity
graph and propose a regularization term to increase the
connectivity of points in each subspace. In Section 6, we
verify our theoretical analysis through experiments on
synthetic data. In Section 7, we compare the performance
of SSC with the state of the art on the two real-world
problems of motion segmentation and face clustering.
Finally, Section 8 concludes the paper.
2 SPARSE SUBSPACE CLUSTERING
In this section, we introduce the SSC algorithm for
clustering a collection of multisubspace data using sparse
representation techniques. We motivate and formulate the
algorithm for data points that perfectly lie in a union of
linear subspaces. In the next section, we will generalize the
algorithm to deal with data nuisances such as noise, sparse
outlying entries, and missing entries, as well as the more
general class of affine subspaces.
Let $\{S_\ell\}_{\ell=1}^{n}$ be an arrangement of $n$ linear subspaces
of $\mathbb{R}^{D}$ of dimensions $\{d_\ell\}_{\ell=1}^{n}$. Consider a given collection
of $N$ noise-free data points $\{\bm{y}_i\}_{i=1}^{N}$ that lie in the union of
the $n$ subspaces. Denote the matrix containing all the data
points as
$$\bm{Y} \triangleq [\,\bm{y}_1 \;\; \dots \;\; \bm{y}_N\,] = [\,\bm{Y}_1 \;\; \dots \;\; \bm{Y}_n\,]\,\bm{\Gamma}, \qquad (1)$$
where $\bm{Y}_\ell \in \mathbb{R}^{D \times N_\ell}$ is a rank-$d_\ell$ matrix of the $N_\ell > d_\ell$ points
that lie in $S_\ell$ and $\bm{\Gamma} \in \mathbb{R}^{N \times N}$ is an unknown permutation
matrix. We assume that we do not know a priori the bases
of the subspaces nor do we know which data points belong
to which subspace. The subspace clustering problem refers to
the problem of finding the number of subspaces, their
dimensions, a basis for each subspace, and the segmentation
of the data from $\bm{Y}$.
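As a concrete illustration, the data model in (1) can be simulated as follows. This is a minimal sketch: the ambient dimension, subspace dimensions, and point counts are arbitrary choices for the example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 30                      # ambient dimension
dims = [3, 3, 5]            # subspace dimensions d_l
counts = [50, 50, 80]       # number of points N_l > d_l in each subspace

# Sample N_l points from each d_l-dimensional linear subspace of R^D.
blocks = []
for d, N in zip(dims, counts):
    basis, _ = np.linalg.qr(rng.standard_normal((D, d)))  # random orthonormal basis
    blocks.append(basis @ rng.standard_normal((d, N)))    # points = basis @ coefficients

# Y = [Y_1 ... Y_n] Gamma: an unknown permutation hides the segmentation.
Y_ordered = np.hstack(blocks)
Gamma = rng.permutation(np.eye(Y_ordered.shape[1]))       # random permutation matrix
Y = Y_ordered @ Gamma

print(Y.shape)                            # (30, 180)
print(np.linalg.matrix_rank(blocks[0]))   # 3, i.e., rank d_1
```

The clustering algorithm sees only the permuted matrix `Y`; the per-subspace blocks and the permutation are unknown to it.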
To address the subspace clustering problem, we propose
an algorithm that consists of two steps. In the first step, for
each data point, we find a few other points that belong to
the same subspace. To do so, we propose a global sparse
optimization program whose solution encodes information
about the memberships of data points to the underlying
subspace of each point. In the second step, we use this
information in a spectral clustering framework to infer the
clustering of the data.
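The second step can be sketched as follows, assuming the coefficient matrix from the first step is already available. The helper name `spectral_segmentation` is hypothetical, the toy matrix `C` stands in for the output of the sparse optimization program, and the unnormalized Laplacian is one common choice among several spectral clustering variants.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_segmentation(C, n_clusters, seed=0):
    """Step 2: infer the clustering from a sparse coefficient matrix C."""
    W = np.abs(C) + np.abs(C).T          # symmetric affinity graph
    L = np.diag(W.sum(axis=1)) - W       # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    U = vecs[:, :n_clusters]             # eigenvectors of the smallest eigenvalues
    _, labels = kmeans2(U, n_clusters, minit='++', seed=seed)
    return labels

# Toy coefficient matrix: points 0-2 express one another, as do points 3-5.
C = np.zeros((6, 6))
C[0, 1] = C[1, 2] = C[2, 0] = 1.0
C[3, 4] = C[4, 5] = C[5, 3] = 1.0
labels = spectral_segmentation(C, 2)
print(labels)   # points {0,1,2} share one label, {3,4,5} the other
```

Because the toy affinity graph has two connected components, the two smallest-eigenvalue eigenvectors of the Laplacian separate the groups exactly.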
2.1 Sparse Optimization Program
Our proposed algorithm takes advantage of what we refer
to as the self-expressiveness property of the data, i.e.,
each data point in a union of subspaces can be efficiently
reconstructed by a combination of other points in the dataset.
More precisely, each data point $\bm{y}_i \in \bigcup_{\ell=1}^{n} S_\ell$ can be written as
$$\bm{y}_i = \bm{Y}\bm{c}_i, \qquad c_{ii} = 0, \qquad (2)$$
where $\bm{c}_i \triangleq [\,c_{i1} \;\; c_{i2} \;\; \dots \;\; c_{iN}\,]^{\top}$ and the constraint $c_{ii} = 0$
eliminates the trivial solution of writing a point as a linear
combination of itself. In other words, the matrix of data
points YYYY is a self-expressive dictionary in which each point
can be written as a linear combination of other points.
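A minimal sketch of this idea, using the $\ell_1$ relaxation discussed in Section 1.2: minimizing $\|\bm{c}_i\|_1$ subject to $\bm{y}_i = \bm{Y}\bm{c}_i$, $c_{ii} = 0$ can be cast as a linear program via the standard split $\bm{c} = \bm{u} - \bm{v}$ with $\bm{u}, \bm{v} \ge 0$. The function name and the synthetic two-subspace data below are illustrative choices, not from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def sparse_self_expression(Y, i):
    """Solve min ||c||_1  s.t.  y_i = Y c,  c_i = 0, as a linear program."""
    D, N = Y.shape
    A_eq = np.hstack([Y, -Y])            # Y u - Y v = y_i
    b_eq = Y[:, i]
    cost = np.ones(2 * N)                # minimize sum(u) + sum(v) = ||c||_1
    bounds = [(0, None)] * (2 * N)
    bounds[i] = bounds[N + i] = (0, 0)   # enforce the constraint c_i = 0
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method='highs')
    return res.x[:N] - res.x[N:]

rng = np.random.default_rng(1)
# Two random 2-dimensional subspaces of R^5, ten points each; y_0 lies in the first.
B1, _ = np.linalg.qr(rng.standard_normal((5, 2)))
B2, _ = np.linalg.qr(rng.standard_normal((5, 2)))
Y = np.hstack([B1 @ rng.standard_normal((2, 10)),
               B2 @ rng.standard_normal((2, 10))])

c = sparse_self_expression(Y, 0)
support = np.nonzero(np.abs(c) > 1e-6)[0]
print(support)   # nonzero coefficients only among columns 1..9 (same subspace)
```

Here the two random subspaces are independent (their dimensions sum to less than the ambient dimension), so the $\ell_1$ solution selects only points from the subspace containing $\bm{y}_0$, which is exactly the behavior the algorithm exploits.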
ELHAMIFAR AND VIDAL: SPARSE SUBSPACE CLUSTERING: ALGORITHM, THEORY, AND APPLICATIONS 2767