406 H. Bay, T. Tuytelaars, and L. Van Gool
coined Harris-Laplace and Hessian-Laplace [11]. They used a (scale-adapted)
Harris measure or the determinant of the Hessian matrix to select the location,
and the Laplacian to select the scale. Focusing on speed, Lowe [12] approximated
the Laplacian of Gaussian (LoG) by a Difference of Gaussians (DoG)
filter.
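The quality of the DoG approximation can be checked numerically. The sketch below is an illustration only, not code from [12]: it works in 1D, uses the standard relation G(x, kσ) − G(x, σ) ≈ (k − 1) σ² ∂²G/∂x² that follows from the heat equation, and the values of sigma and k as well as all function names are assumptions chosen for the example.

```python
import math

def gaussian(x, sigma):
    """Normalised 1D Gaussian with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def dog(x, sigma, k):
    """Difference of Gaussians at scales k*sigma and sigma."""
    return gaussian(x, k * sigma) - gaussian(x, sigma)

def scaled_log(x, sigma, k):
    """(k - 1) * sigma^2 times the second derivative of the Gaussian (1D LoG)."""
    second_derivative = gaussian(x, sigma) * (x * x / sigma ** 4 - 1.0 / sigma ** 2)
    return (k - 1.0) * sigma * sigma * second_derivative

def max_relative_error(sigma=2.0, k=1.1, half_width=10.0, steps=401):
    """Largest |DoG - scaled LoG| on a grid, relative to the peak of the scaled LoG."""
    xs = [-half_width + 2.0 * half_width * i / (steps - 1) for i in range(steps)]
    peak = max(abs(scaled_log(x, sigma, k)) for x in xs)
    worst = max(abs(dog(x, sigma, k) - scaled_log(x, sigma, k)) for x in xs)
    return worst / peak

if __name__ == "__main__":
    for k in (1.01, 1.1, 1.5):
        print(k, max_relative_error(k=k))
```

The error shrinks as k approaches 1, which is consistent with the DoG being a first-order approximation of the scale-normalised LoG.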
Several other scale-invariant interest point detectors have been proposed.
Examples are the salient region detector proposed by Kadir and Brady [13], which
maximises the entropy within the region, and the edge-based region detector
proposed by Jurie et al. [14]. These seem less amenable to acceleration, though. Also,
several affine-invariant feature detectors have been proposed that can cope with
larger viewpoint changes. However, these fall outside the scope of this paper.
By studying the existing detectors and from published comparisons [15, 8],
we can conclude that (1) Hessian-based detectors are more stable and repeatable
than their Harris-based counterparts. Using the determinant of the Hessian
matrix rather than its trace (the Laplacian) seems advantageous, as it fires less
on elongated, ill-localised structures. Also, (2) approximations like the DoG can
bring speed at a low cost in terms of lost accuracy.
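The difference between the determinant and the trace of the Hessian on elongated structures can be illustrated with two synthetic intensity profiles. This is a minimal sketch under stated assumptions: the isotropic "blob" and the elongated "ridge" are hypothetical test functions, the Hessian is estimated by central finite differences at the origin, and no scale normalisation is applied.

```python
import math

def hessian_at_origin(f, h=1e-3):
    """Central finite-difference estimate of (fxx, fyy, fxy) at (0, 0)."""
    fxx = (f(h, 0.0) - 2.0 * f(0.0, 0.0) + f(-h, 0.0)) / (h * h)
    fyy = (f(0.0, h) - 2.0 * f(0.0, 0.0) + f(0.0, -h)) / (h * h)
    fxy = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4.0 * h * h)
    return fxx, fyy, fxy

def responses(f):
    """Return (determinant, trace) of the Hessian of f at the origin."""
    fxx, fyy, fxy = hessian_at_origin(f)
    return fxx * fyy - fxy * fxy, fxx + fyy

def blob(x, y):
    """Isotropic Gaussian blob (hypothetical test structure)."""
    return math.exp(-(x * x + y * y) / 2.0)

def ridge(x, y):
    """Gaussian stretched strongly along y: an elongated, ill-localised structure."""
    return math.exp(-(x * x / 2.0 + y * y / 200.0))

if __name__ == "__main__":
    det_blob, trace_blob = responses(blob)
    det_ridge, trace_ridge = responses(ridge)
    print("determinant ratio ridge/blob:", det_ridge / det_blob)
    print("trace ratio ridge/blob:", trace_ridge / trace_blob)
```

On the ridge the trace keeps roughly half of its blob response, because one large second derivative is enough to drive it, whereas the determinant needs curvature in both directions and drops by about two orders of magnitude.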
Feature Descriptors. An even larger variety of feature descriptors has been
proposed, like Gaussian derivatives [16], moment invariants [17], complex features
[18, 19], steerable filters [20], phase-based local features [21], and descriptors
representing the distribution of smaller-scale features within the interest
point neighbourhood. The latter, introduced by Lowe [2], have been shown to
outperform the others [7]. This can be explained by the fact that they capture
a substantial amount of information about the spatial intensity patterns, while
at the same time being robust to small deformations or localisation errors. The
descriptor in [2], called SIFT for short, computes a histogram of local oriented
gradients around the interest point and stores the bins in a 128-dimensional
vector (8 orientation bins for each of the 4 × 4 location bins).
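The 4 × 4 × 8 = 128 layout can be made concrete with a strongly simplified sketch. It is not Lowe's implementation: the 16 × 16 patch size, the central-difference gradients, and the function name are assumptions, and Gaussian weighting, interpolation between bins, and rotation normalisation are all left out.

```python
import math

def sift_like_descriptor(patch):
    """Build a 128-d gradient-orientation histogram from a 16x16 patch.

    patch is a list of 16 rows, each a list of 16 intensity values.
    """
    size, cells, bins = 16, 4, 8
    cell = size // cells                      # each location bin covers 4x4 pixels
    hist = [0.0] * (cells * cells * bins)     # 4 * 4 * 8 = 128 entries
    for y in range(1, size - 1):              # skip the border: no central difference there
        for x in range(1, size - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2.0 * math.pi)
            orientation_bin = int(angle / (2.0 * math.pi) * bins) % bins
            index = ((y // cell) * cells + (x // cell)) * bins + orientation_bin
            hist[index] += magnitude          # vote weighted by gradient magnitude
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]           # unit length for some contrast invariance

if __name__ == "__main__":
    ramp = [[float(x) for x in range(16)] for _ in range(16)]
    descriptor = sift_like_descriptor(ramp)
    print(len(descriptor))
```

For a horizontal intensity ramp all gradients point in the same direction, so only one orientation bin per location bin receives votes.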
Various refinements on this basic scheme have been proposed. Ke and Sukthankar
[4] applied PCA on the gradient image. This PCA-SIFT yields a 36-dimensional
descriptor which is fast for matching, but proved to be less distinctive
than SIFT in a second comparative study by Mikolajczyk et al. [8]; moreover, its slower
feature computation reduces the benefit of fast matching. In the same paper [8],
the authors have proposed a variant of SIFT, called GLOH, which proved to be
even more distinctive with the same number of dimensions. However, GLOH is
computationally more expensive.
The SIFT descriptor still seems to be the most appealing descriptor for practical
uses, and hence also the most widely used nowadays. It is distinctive and
relatively fast, which is crucial for on-line applications. Recently, Se et al. [22]
implemented SIFT on a Field Programmable Gate Array (FPGA) and improved
its speed by an order of magnitude. However, the high dimensionality of the descriptor
is a drawback of SIFT at the matching step. For on-line applications
on a regular PC, each one of the three steps (detection, description, matching)
should be faster still. Lowe proposed a best-bin-first alternative [2] in order to
speed up the matching step, but this results in lower accuracy.
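To see why descriptor dimensionality matters at the matching step, consider the exhaustive nearest-neighbour baseline sketched below. This is not the best-bin-first method of [2], which searches approximately; it is the exact search that such methods try to speed up, and the function names are assumptions. Its cost grows with the number of query descriptors, the number of database descriptors, and the descriptor dimension.

```python
def squared_distance(a, b):
    """Squared Euclidean distance; one multiply-add per descriptor dimension."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def match_exhaustive(queries, database):
    """For each query descriptor, return the index of its nearest database descriptor."""
    matches = []
    for query in queries:
        best = min(range(len(database)),
                   key=lambda i: squared_distance(query, database[i]))
        matches.append(best)
    return matches

if __name__ == "__main__":
    database = [[0.0] * 128, [1.0] * 128]
    queries = [[0.9] * 128, [0.1] * 128]
    print(match_exhaustive(queries, database))
```

Halving the descriptor length halves the work in squared_distance, which is the motivation for lower-dimensional descriptors.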