/** @file sift.c
** @brief SIFT - Definition
** @author Andrea Vedaldi
**/
/*
Copyright (C) 2007-12 Andrea Vedaldi and Brian Fulkerson.
All rights reserved.
This file is part of the VLFeat library and is made available under
the terms of the BSD license (see the COPYING file).
*/
/**
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
@page sift Scale Invariant Feature Transform (SIFT)
@author Andrea Vedaldi
@par "Credits:" Many people have contributed with suggestions and bug
reports. Although the following list is certainly incomplete, we would
like to thank: Wei Dong, Loic, Giuseppe, Liu, Erwin, P. Ivanov, and
Q. S. Luo.
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
@ref sift.h implements a @ref sift-usage "SIFT filter object", a
reusable object to extract SIFT features @cite{lowe99object} from one
or multiple images.
- @ref sift-intro
  - @ref sift-intro-detector
  - @ref sift-intro-descriptor
  - @ref sift-intro-extensions
- @ref sift-usage
- @ref sift-tech
  - @ref sift-tech-ss
  - @ref sift-tech-detector
    - @ref sift-tech-detector-peak
    - @ref sift-tech-detector-edge
    - @ref sift-tech-detector-orientation
  - @ref sift-tech-descriptor
    - @ref sift-tech-descriptor-can
    - @ref sift-tech-descriptor-image
    - @ref sift-tech-descriptor-std
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
@section sift-intro Overview
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
A SIFT feature is a selected image region (also called keypoint) with
an associated descriptor. Keypoints are extracted by the <b>@ref
sift-intro-detector "SIFT detector"</b> and their descriptors are
computed by the <b>@ref sift-intro-descriptor "SIFT descriptor"</b>. It is
also common to use the SIFT detector on its own (i.e. computing the
keypoints without descriptors) or the SIFT descriptor on its own
(i.e. computing descriptors of custom keypoints).
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
@subsection sift-intro-detector SIFT detector
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
@sa
@ref sift-tech-ss "Scale space technical details",
@ref sift-tech-detector "Detector technical details"
A SIFT <em>keypoint</em> is a circular image region with an
orientation. It is described by a geometric <em>frame</em> of four
parameters: the keypoint center coordinates @e x and @e y, its @e
scale (the radius of the region), and its @e orientation (an angle
expressed in radians). The SIFT detector selects as keypoints image
structures that resemble &ldquo;blobs&rdquo;. By searching for blobs
at multiple scales and positions, the SIFT detector is invariant (or,
more accurately, covariant) to translations, rotations, and rescalings
of the image.
The keypoint orientation is also determined from the local image
appearance and is covariant to image rotations. Depending on the
symmetry of the keypoint appearance, determining the orientation can
be ambiguous. In this case, the SIFT detector returns a list of up to
four possible orientations, constructing up to four frames (differing
only by their orientation) for each detected image blob.
@image html sift-frame.png "SIFT keypoints are circular image regions with an orientation."
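The four-parameter frame and the one-blob-to-several-frames expansion described above can be sketched as follows. This is only an illustration: the <code>Frame</code> type, the <code>make_frames</code> helper, and the <code>MAX_ORIENTATIONS</code> constant are hypothetical names chosen for this example and are not part of the library interface.

```c
#include <stddef.h>

// Hypothetical frame type: center, scale (region radius), and
// orientation (angle in radians).
typedef struct {
  double x, y, scale, orientation;
} Frame;

// The detector returns at most four orientations per blob.
enum { MAX_ORIENTATIONS = 4 };

// Expand one detected blob into one frame per candidate orientation.
// All frames share the same center and scale and differ only by angle.
// Returns the number of frames written (at most MAX_ORIENTATIONS).
static size_t make_frames(double x, double y, double scale,
                          const double *angles, size_t nangles,
                          Frame out[MAX_ORIENTATIONS])
{
  size_t n = nangles < MAX_ORIENTATIONS ? nangles : MAX_ORIENTATIONS;
  for (size_t k = 0; k < n; ++k) {
    out[k].x = x;
    out[k].y = y;
    out[k].scale = scale;
    out[k].orientation = angles[k];
  }
  return n;
}
```

A blob with an unambiguous appearance yields a single frame, while a symmetric one may yield several frames that share the same center and scale.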
Several parameters influence the detection of SIFT keypoints. First,
searching for keypoints at multiple scales is achieved by constructing
a so-called &ldquo;Gaussian scale space&rdquo;. The
scale space is just a collection of images obtained by progressively
smoothing the input image, which is analogous to gradually reducing
the image resolution. Conventionally, the smoothing level is called
the <em>scale</em> of the image. The construction of the scale space is
influenced by the following parameters, set when creating the SIFT
filter object with ::vl_sift_new():
- <b>Number of octaves</b>. Increasing the scale by an octave means
doubling the size of the smoothing kernel, whose effect is roughly
equivalent to halving the image resolution. By default, the scale
space spans as many octaves as possible (i.e. roughly <code>
log2(min(width,height))</code>), which has the effect of searching
for keypoints of all possible sizes.
- <b>First octave index</b>. By convention, the octave of index 0
starts with the image full resolution. Specifying an index greater
than 0 starts the scale space at a lower resolution (e.g. 1 halves
the resolution). Similarly, specifying a negative index starts the
scale space at a higher resolution image, and can be useful to
extract very small features (since this is obtained by interpolating
the input image, it does not make much sense to go past -1).
- <b>Number of levels per octave</b>. Each octave is sampled at this
given number of intermediate scales (by default 3). Increasing this
number might in principle return more refined keypoints, but in
practice can make their selection unstable due to noise (see [1]).
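As a rough illustration of the first two parameters, the sketch below computes the approximate octave count <code>floor(log2(min(width,height)))</code> and the image side length at the start of a given octave. Both helper names are hypothetical, and the default actually used by the library may differ from this rough bound (for instance by keeping a margin so that the coarsest octave is not degenerate).

```c
// Rough octave count: floor(log2(min(width, height))), computed by
// repeated integer halving so that no math library is needed.
static int approx_num_octaves(int width, int height)
{
  int side = width < height ? width : height;
  int n = 0;
  while (side > 1) {
    side /= 2;
    ++n;
  }
  return n;
}

// Side length of the image at the start of a given octave, relative to
// the full-resolution side length. Octave 0 is full resolution, octave 1
// halves it, and octave -1 doubles it (interpolated input).
static double octave_side(int full_side, int octave)
{
  double side = (double)full_side;
  for (int o = 0; o < octave; ++o) side /= 2.0;
  for (int o = 0; o > octave; --o) side *= 2.0;
  return side;
}
```

For a 640 by 480 image the first helper gives 8, and starting at octave -1 means working on an image upsampled to twice the input resolution.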
Keypoints are further refined by eliminating those that are likely to
be unstable, either because they are selected near an image edge,
rather than an image blob, or are found on image structures with low
contrast. Filtering is controlled by the following parameters:
- <b>Peak threshold.</b> This is the minimum amount of contrast to
accept a keypoint. It is set on the SIFT filter object with
::vl_sift_set_peak_thresh().
- <b>Edge threshold.</b> This is the edge rejection threshold. It is
set on the SIFT filter object with
::vl_sift_set_edge_thresh().
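The effect of the two thresholds can be sketched as a simple filter over candidate keypoints. The sketch assumes that each candidate already carries a contrast value and an edge score that grows as the local structure becomes more edge-like. The <code>Candidate</code> type and both functions are hypothetical, and how the library computes these two scores is not shown here.

```c
#include <stddef.h>

// Hypothetical candidate keypoint with two precomputed scores.
typedef struct {
  double contrast;    // magnitude of the detector response
  double edge_score;  // larger means more edge-like (assumed convention)
} Candidate;

// A candidate survives if it has enough contrast and is not too edge-like.
static int keep_candidate(Candidate c, double peak_thresh, double edge_thresh)
{
  return c.contrast >= peak_thresh && c.edge_score < edge_thresh;
}

// Compact the array in place, keeping only the surviving candidates.
// Returns the number of survivors.
static size_t filter_candidates(Candidate *c, size_t n,
                                double peak_thresh, double edge_thresh)
{
  size_t kept = 0;
  for (size_t i = 0; i < n; ++i) {
    if (keep_candidate(c[i], peak_thresh, edge_thresh)) {
      c[kept++] = c[i];
    }
  }
  return kept;
}
```

Under this convention, increasing the peak threshold or decreasing the edge threshold removes more candidates.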
<table>
<caption>Summary of the parameters influencing the SIFT detector.</caption>
<tr style="font-weight:bold;">
<td>Parameter</td>
<td>See also</td>
<td>Controlled by</td>
<td>Comment</td>
</tr>
<tr>
<td>number of octaves</td>
<td> @ref sift-intro-detector </td>
<td>::vl_sift_new</td>
<td></td>
</tr>
<tr>
<td>first octave index</td>
<td> @ref sift-intro-detector </td>
<td>::vl_sift_new</td>
<td>set to -1 to extract very small features</td>
</tr>
<tr>
<td>number of scale levels per octave</td>
<td> @ref sift-intro-detector </td>
<td>::vl_sift_new</td>
<td>can affect the number of extracted keypoints</td>
</tr>
<tr>
<td>edge threshold</td>
<td> @ref sift-intro-detector </td>
<td>::vl_sift_set_edge_thresh</td>
<td>decrease to eliminate more keypoints</td>
</tr>
<tr>
<td>peak threshold</td>
<td> @ref sift-intro-detector </td>
<td>::vl_sift_set_peak_thresh</td>
<td>increase to eliminate more keypoints</td>
</tr>
</table>
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
@subsection sift-intro-descriptor SIFT Descriptor
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
@sa @ref sift-tech-descriptor "Descriptor technical details"
A SIFT descriptor is a 3-D spatial histogram of the image gradients
characterizing the appearance of a keypoint. The gradient at each
pixel is regarded as a sample of a three-dimensional elementary
feature vector, formed by the pixel location and the gradient
orientation. Samples are weighted by the gradient norm and accumulated
in a 3-D histogram @em h, which (up to normalization and clamping)
forms the SIFT descriptor of the region. An additional Gaussian
weighting function is applied to give less importance to gradients
farther away from the keypoint center. Orientations are quantized into
eight bins and the spatial coordinates into four each, as follows:
@image html sift-descr-easy.png "The SIFT descriptor is a spatial histogram of the image gradient."
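With eight orientation bins and four bins along each spatial axis, the histogram has 4 x 4 x 8 = 128 entries. The sketch below shows one way to address and fill such a histogram. The memory layout (orientation varying fastest) and all names are assumptions made for this example. It assigns each sample to a single bin, whereas a practical implementation would typically spread each sample over neighbouring bins, and it omits the final normalization and clamping.

```c
// Bin counts taken from the text: 8 orientation bins, 4 bins per spatial axis.
enum {
  NUM_ORIENT_BINS = 8,
  NUM_SPATIAL_BINS = 4,
  DESCR_SIZE = NUM_SPATIAL_BINS * NUM_SPATIAL_BINS * NUM_ORIENT_BINS
};

// Flat index of the bin with spatial indices (i, j) and orientation index t.
// Assumed layout: orientation varies fastest, then x, then y.
static int bin_index(int i, int j, int t)
{
  return t + NUM_ORIENT_BINS * (i + NUM_SPATIAL_BINS * j);
}

// Accumulate one gradient sample into its bin. The sample is weighted by
// the gradient norm and by the value of the Gaussian window at its position.
static void add_sample(float *h, int i, int j, int t,
                       float grad_norm, float window)
{
  h[bin_index(i, j, t)] += grad_norm * window;
}
```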
SIFT descriptors are computed by either calling
::vl_sift_calc_keypoint_descriptor or
::vl_sift_calc_raw_descriptor. They accept as input a keypoint
frame, which specifies the descriptor center, its size, and its
orientation on the image plane. The following parameters influence the
descriptor calculation:
- <b>magnification factor</b>. The descriptor size is determined by
multiplying the keypoint scale by this factor. It is set