LSVM-MDPM Release 4 Notes
Ross Girshick
University of Chicago
rbg@cs.uchicago.edu
Pedro Felzenszwalb
University of Chicago
pff@cs.uchicago.edu
David McAllester
TTI at Chicago
mcallester@ttic.edu
April 21, 2010
1 Introduction
This note describes some recent advances that improve the performance of the object detection
system described in [2]. Some of the improvements included here comprised the UoC-TTI LSVM-
MDPM entry in the PASCAL VOC 2009 comp3 challenge [1] and others were developed subse-
quently. Complete source code for the latest version of the object detection system can be found
at http://people.cs.uchicago.edu/
~
pff/latent/.
2 Models
Figure 1 shows some of the models trained by the current version of the system.
In [2] each object class was represented by a two component mixture of deformable part mod-
els, where each component is bilaterally symmetric. We now use a richer class of models, where
each object class is represented by a three component mixture of asymmetric models. Bilateral
asymmetry allows each component to specialize at the task of detecting left or right object poses.
During detection each component is matched to the image in both left and right orientations. This
means that in effect we have a mixture model with six components, with the extra constraint that
components are grouped into left-right symmetric pairs. The left-right pose distinction is automat-
ically learned by our system in an unsupervised fashion without the use of additional pose labels
(we ignore the incomplete pose labels given by the PASCAL annotations).
2.1 Left-Right Pose Clustering
The input data is a set of images containing instances of an object class. The location and extent of
each instance is specified by a bounding box. As in [2] we use the aspect ratio of the bounding boxes
to separate the instances into different clusters. But now we further break down these clusters to
separate left and right facing examples.
We crop the image region under each bounding box and resize it to a fixed width and height.
Each cropped and resized region, along with its vertically flipped counterpart, is mapped into a
feature space (we use a variant of HOG features described in [2]) where clustering takes place.
The clustering algorithm is a variant of online k-means with the following constraint: no example
and its flipped counterpart may be placed into the same cluster. The algorithm begins by selecting
(uniformly at random) an example and its flipped counterpart. These feature vectors seed the two
1