Characterizing Humans
on Riemannian Manifolds
Diego Tosato, Mauro Spera, Marco Cristani, Member, IEEE, and
Vittorio Murino, Senior Member, IEEE
Abstract—In surveillance applications, head and body orientation of people is of primary importance for assessing many behavioral
traits. Unfortunately, in this context people are often encoded by a few, noisy pixels so that their characterization is difficult. We face
this issue, proposing a computational framework which is based on an expressive descriptor, the covariance of features. Covariances
have been employed for pedestrian detection purposes, actually a binary classification problem on Riemannian manifolds. In this
paper, we show how to extend to the multiclassification case, presenting a novel descriptor, named weighted array of covariances,
especially suited for dealing with tiny image representations. The extension requires a novel differential geometry approach in which
covariances are projected on a unique tangent space where standard machine learning techniques can be applied. In particular, we
adopt the Campbell-Baker-Hausdorff expansion as a means to approximate on the tangent space the genuine (geodesic) distances on
the manifold in a very efficient way. We test our methodology on multiple benchmark datasets, and also propose new testing sets,
getting convincing results in all the cases.
Index Terms—Pedestrian characterization, covariance descriptors, Riemannian manifolds
1 INTRODUCTION
In computer vision, and especially in video surveillance,
the capability of characterizing humans is of
primary importance. In this regard, social signal processing
studies [1] support the hypothesis that body appearance
is critical for inferring many behavioral traits, leading to
fine-grained activity profiles. For example, head direction is
fundamental for discovering the focus of attention of
individuals [2], [3] and for detecting interacting people [4],
while body posture and gestures during an interaction are
typically indicators of speaking activity [5].
Characterizing humans becomes particularly troublesome
whenever we handle small, noisy images. In such
cases, tasks such as body or head orientation estimation (see
Fig. 1a) turn out to be serious challenges. This fact has led
researchers to design novel features as well as robust
classifiers or regressors to best exploit the few
available pixels.
Recently, the use of covariance descriptors as composite
features emerged as a powerful means for pedestrian
detection [6]. In general, covariances have proved to be naturally
suited for encoding classes of objects with high intraclass
variation, exploiting that variation to systematically encode
mutual relations among basic cues (such as gradient, pixel
intensity, etc.) [7], [8], [9], [10]. For the pedestrian case,
Tuzel et al. [6] employed a boosting framework on $\mathrm{Sym}^+_d$,
namely, the set of $d \times d$ symmetric positive definite matrices
(covariance matrices). The idea was to build weak learners
by regression over the mappings of the training points on a
suitable tangent space. This tangent space was defined over
the weighted Karcher mean [11] of the positive training data
points so as to preserve their local layout on $\mathrm{Sym}^+_d$. The
negative points (i.e., everything but pedestrians) were instead
assumed to be spread over the manifold and were not included
in the estimation of the mean.
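As a rough illustration of the covariance descriptor discussed above, the following Python sketch computes the covariance of a few per-pixel cues over a grayscale patch. The feature set (pixel coordinates, intensity, and absolute first derivatives), the function name covariance_descriptor, and the small ridge term eps are assumptions made for illustration; they are not taken from [6] or from this paper.

```python
import numpy as np

def covariance_descriptor(patch, eps=1e-6):
    """Covariance of per-pixel features over a grayscale patch.

    Illustrative feature vector per pixel: (x, y, I, |dI/dx|, |dI/dy|).
    """
    patch = np.asarray(patch, dtype=float)
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]          # pixel coordinates
    gy, gx = np.gradient(patch)          # derivatives along rows (y) and columns (x)
    feats = np.stack([xs, ys, patch, np.abs(gx), np.abs(gy)], axis=-1)
    feats = feats.reshape(-1, 5)         # one row per pixel, one column per cue
    cov = np.cov(feats, rowvar=False)    # 5 x 5 sample covariance
    # Hypothetical safeguard: a small ridge keeps the matrix strictly positive definite.
    return cov + eps * np.eye(cov.shape[0])

rng = np.random.default_rng(0)
C = covariance_descriptor(rng.random((16, 16)))
print(C.shape)
```

The descriptor size depends only on the number of cues, not on the patch size, which is one reason it remains usable on very small image regions.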
In this paper, our aim is to move to a multiclass
classification scenario, considering head and body orienta-
tions as object classes. In such a scenario, the above
considerations do not hold any more because we have
many “positive” classes, each of them localized in a different
part of the manifold. As a consequence: 1) Choosing the
Karcher mean of one class would privilege that class with
respect to the others, and 2) the Karcher mean of all classes
is inadequate. Therefore, our first contribution consists of a
theoretical analysis of this space so as to derive a common,
suitable projection point that does not penalize any class.
Such a point is chosen by analyzing the local geometry of the
manifold around the considered samples, observing that whenever
the (sectional) curvature of the manifold is, in general, weak,
a good candidate is the identity. This allows us to consider
covariance matrices as vectors in a Euclidean space where
state-of-the-art classifiers can be utilized.
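To make the projection at the identity concrete: assuming the affine-invariant metric commonly used on this manifold, the logarithmic map at the identity reduces to the ordinary matrix logarithm, so each covariance maps to a symmetric matrix that can be flattened into a vector. A minimal NumPy sketch follows; the function names and the sqrt(2) weighting of off-diagonal entries (a common convention that makes the Euclidean norm of the vector equal the Frobenius norm of the matrix) are assumptions for illustration, not the paper's stated implementation.

```python
import numpy as np

def log_map_at_identity(C):
    """Matrix logarithm of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(C)             # C = V diag(w) V^T with w > 0
    return (V * np.log(w)) @ V.T         # V diag(log w) V^T

def vectorize_symmetric(S):
    """Flatten the upper triangle of a symmetric matrix into a vector.

    Off-diagonal entries are scaled by sqrt(2) (assumed convention) so that
    the Euclidean norm of the result equals the Frobenius norm of S.
    """
    rows, cols = np.triu_indices(S.shape[0])
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return S[rows, cols] * scale

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
C = B @ B.T + 5.0 * np.eye(5)            # a synthetic SPD matrix
v = vectorize_symmetric(log_map_at_identity(C))
print(v.shape)                           # d(d+1)/2 = 15 entries for d = 5
```

Vectors produced this way can be passed directly to a standard Euclidean classifier, which is the practical benefit of a single shared tangent space.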
The second contribution consists of providing a novel
measure for calculating distances between the projected
points in such a way that the original geodesic distance is
preserved robustly and more accurately than when adopting
the Euclidean distance. This comes by considering the
1972 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 35, NO. 8, AUGUST 2013
D. Tosato and M. Spera are with the Dipartimento di Informatica,
University of Verona, Strada le Grazie 15, 37134 Verona, Italy.
E-mail: {diego.tosato, mauro.spera}@univr.it.
M. Cristani and V. Murino are with the Pattern Analysis and Computer
Vision (PAVIS) Department, Istituto Italiano di Tecnologia, via Morego
30, 16163 Genova, Italy, and the Dipartimento di Informatica, University
of Verona, Strada le Grazie 15, 37134 Verona, Italy.
E-mail: {marco.cristani, vittorio.murino}@univr.it,
{marco.cristani,vittorio.murino}@iit.it.
Manuscript received 23 Dec. 2011; revised 7 Aug. 2012; accepted 30 Nov.
2012; published online 12 Dec. 2012.
Recommended for acceptance by B. Schiele.
For information on obtaining reprints of this article, please send e-mail to:
tpami@computer.org, and reference IEEECS Log Number
TPAMI-2011-12-0924.
Digital Object Identifier no. 10.1109/TPAMI.2012.263.
0162-8828/13/$31.00 © 2013 IEEE Published by the IEEE Computer Society