An Analysis of the SURF Method
1 Introduction
1.1 Context, Motivation and Prev io us Work
Over the last decade, the most successful algorithms to address variou s computer vision problems have
been based on local, affine- i nvariant descriptions of images. The targeted applicati on s encompass,
but are not limited to, i m a ge stitching and registration, image matching and comparison, indexation
and cla ssi fi ca t i on , depth estimation and 3-D reconstruct i o n . Like many i m a g e processing approach es,
a popular and efficient methodology is to extract and compa r e local patches from di ff er ent images.
However, in order to design fast algorithms an d obtain compact and loca l l y invariant representations,
some selection criteria and normalization procedures are required. A sparse representation of the
image is also necessary to avoid extensive patch-wise comparisons that would be computatio n a l l y
expensive. The main challenges are thus to keep most salient features from images (such as corners,
blobs or edges) and then to build a local descripti o n of these features which is invariant to various
perturbations, such as noi sy measurements, photometric changes, or geometric transform a t i on .
Such problems h ave been addressed since the early years of computer vision, resulting in a very
prolific literature. Without being ex h au s ti ve, one may first mention the famous Stephen-Harr i s
corner detector [
9], and the seminal work of Lindeberg on multi-scale feature detection (see e.g. [ 1 2 ] ).
Secondly, invariant local image description from multi-scale analysis is a more recent topic: SIFT
descriptors [
15] –from which SURF [2] is largely inspired– are similarity invariant descriptors of an
image that are also robust to no i se and photometric change. Some algorithm s extend this framework
to fu l l y affine transformation invariance [18, 28], and dense representation [26].
The main interest of the SURF approach [
2] studied in t h i s paper is its fast appr oximation of the
SIFT method. It has been shown to share the same robustness and invariance while being faster to
compute.
1.2 Outline and Algorithm Overview
The SURF algorithm is in itself based on two con secu t i ve steps (feature detection and description)
that are descri bed in Sections
4 and 5. The last step is speci fi c to the application targeted. In this
paper, we chose image matching as an illustrat i on (Sections
5.4 a n d 6).
Multi-scale analysis Similarly to many other approaches, such as the SIFT method [
15], the
detection of features in SURF relies on a scale-space representation, combined wi t h first and second
order differential operators. The original i ty of the SURF algorithm (Speeded Up Robust Features) is
that these operations are speeded up by the use of box filters techniques (s ee e.g. [
25], [27]) tha t are
described in Section
2. For this reason, we wil l use the term box-space to distinguish it from the usual
Gaussian scale-space. While the Gaussian scale space is obta i n ed by convolution of the ini ti a l images
with Gaussian kernels, the discrete box-space is a l so obtained by convolving the origi n a l image with
box filters at various scales. A comp a ri s on between these two scale-spaces is proposed in Section
3.
Feature detection During the detectio n step, the local maxima i n the box-space of the “determi-
nant of Hessian” operator are used to select interest point candidates (Section
4). These candidates
are then valid a t ed if the response is above a gi ven threshold. Both the scale and loca ti on of these
candidates are then refined using quadratic fitting. Typically, a few hundred interest points are
detected in a megapixel image.
Feature description The purpose of the next step descr i bed in Section
5 is to build a descriptor
of the neighborhood of each point of interest that is invar i ant to view-point changes. Thanks to
177