Shape Context and Chamfer Matching in Cluttered Scenes
A. Thayananthan
B. Stenger
P. H. S. Torr
R. Cipolla
University of Cambridge
Microsoft Research Ltd.
Department of Engineering 7 JJ Thompson Avenue
Cambridge, CB2 1PZ, UK Cambridge, CB3 OFB, UK
at315|bdrs2|cipolla
@eng.cam.ac.uk philtorr@microsoft.com
Abstract
This paper compares two methods for object localization
from contours: shape context and chamfer matching of tem-
plates. In the light of our experiments, we suggest improve-
ments to the shape context: Shape contexts are used to find
corresponding features between model and image. In real
images it is shown that the shape context is highly influ-
enced by clutter, furthermore even when the object is cor-
rectly localized, the feature correspondence may be poor.
We show that the robustness of shape matching can be in-
creased by including a figural continuity constraint. The
combined shape and continuity cost is minimized using
the Viterbi algorithm on features sequentially around the
contour, resulting in improved localization and correspon-
dence. Our algorithm can be generally applied to any fea-
ture based shape matching method.
Chamfer matching correlates model templates with the
distance transform of the edge image. This can be done
efficiently using a coarse-to-fine search over the transfor-
mation parameters. The method is robust in clutter, how-
ever multiple templates are needed to handle scale, rotation
and shape variation. We compare both methods for locat-
ing hand shapes in cluttered images, and applied to word
recognition in EZ-Gimpy images.
1. Introduction
People use multiple visual cues to recognize objects, such
as object color, texture and shape. In the absence of color
and texture information, we can mostly still recognize ob-
jects by their geometry alone, for example in line drawings.
Groupinglowlevel features to segment the object is by itself
a hard problem. A common approach is, therefore, to use a
prototype shape, and search for it in the image. This leads
to the task of shape matching, which has numerous appli-
cations, such as object localization, image retrieval, model
registration, and tracking. One way to represent a shape
is by a set number of feature points, for example Canny
edges. In order to match two shapes, point correspondences
on the two shapes have to be established. Subsequently a
transformation which aligns the two shapes can be found.
The type of transformation depends on the particular set-
ting. Two examples are 2D affine transforms, and non-
rigid thin-plate spline transformations. The two problems
of finding correspondences and estimating the transforma-
tion are tightly coupled: The better the correspondences are
known, the better the transformation can be estimated, and
vice versa. Therefore, many methods are based on an it-
erated two-step algorithm, alternating estimation of corre-
spondence and transformation.
In the next section, we review existing work on shape
based and chamfer matching. The two methods are ex-
plained briefly in section 2, and we outline some of the
problems that arise when applied to scenes with cluttered
background in section 3. In section 4 we show how shape
context matching can be significantly improved by using
a continuity constraint. The dynamic programming algo-
rithm used for optimization readily generalizes to any other
type of feature. Section 5 shows experimental results on
two types of data, images of hands, and words on textured
background.
1.1. Previous Work
Belongie et al. [3] have introduced the shape context de-
scriptor, which characterizes a particular point location on
the shape. This descriptor is the histogram of the relative
polar coordinates of all other points. Corresponding points
on two different shapes have a similar relative position in
each shape, and will ideally have a similar shape context.
Shape context matching has been applied to a variety of ob-
ject recognition problems [3, 13]. The background clutter
in these applications was usually limited.
Sullivan and Carlsson [17] use a topology-based shape
descriptor to find correspondences. The topological type
of all combinations of four points is recorded in a voting
matrix, and one-to-one correspondences are found using a
greedy algorithm. The examples shown did not contain
significant clutter. While their topological descriptor has
1