SURF Speeded Up Robust Features

所需积分/C币:36 2015-10-28 18:06:23 1.05MB PDF
收藏 收藏
举报

SURF算法的经典原文,SURF (Speeded Up Robust Features)也是一种类似于SIFT的兴趣点检测及描述子算法。其通过Hessian矩阵的行列式来确定兴趣点位置,再根据兴趣点邻域点的Haar小波响应来确定描述子,其描述子大小只有64维(也可以扩展到128维,效果更好),是一种非常优秀的兴趣点检测算法。我的博客里面有SURF的算法详解,欢迎相互交流>_<
406 H. Bay, T. Tuytelaars, and L. Van gool coined Harris-Laplace and Hessian-Laplace Il. They used a(scale-adapted) Harris measure or the determinant of the hessian matrix to select the location and the Laplacian to select the scale. Focusing on speed, Lowe12 approxi mated the Laplacian of Gaussian (Log) by a Difference of Gaussians(DoG filter Several other scale-invariant interest point detectors have been proposed. Ex amples are the salient region detector proposed by Kadir and Brady [13, which maximises the entropy within the region, and the edge-based region detector pro- posed by Jurie et al. 14. They seem less amenable to acceleration thougH several affine-invariant feature detectors have been proposed that can cope with longer viewpoint changes. However, these fall outside the scope of this paper By studying the existing detectors and from published comparisons 15 18 we can conclude that(1)Hessian-based detectors are more stable and repeat able than their Harris-based counterparts. Using the determinant of the He essman matrix rather than its trace(the Laplacian) seems advantageous, as it fires less on elongated, ill-localised structures. Also,(2)approximations like the Dog can bring speed at a low cost in terms of lost accuracy Feature Descriptors. An even larger variety of feature descriptors has been proposed, like Gaussian derivatives 16, moment invariants [17, comple tures1819, steerable filters 20, phase-based local features 21, and descrip- tors representing the distribution of smaller-scale features within the interest point neighbourhood. The latter, introduced by Lowe 2, have beer outperform the others 7. This can be explained by the fact that they capture a substantial amount of information about the spatial intensity patterns, while t the same time being robust to small deformations or localisation errors. The descriptor in 2, called SIFT for short, computes a histogram of local oriented gradients around the interest point and stores the bins in a 128-dimensional vector(8 orientation bins for each of the 4 x 4 location bins) Various refinements on this basic scheme have been proposed Ke and suk- thankar 4 applied PCA on the gradient image This PCA-SIFT yields a 36 dimensional descriptor which is fast for matching, but proved to be less distinc tive than SIFT in a second comparative study by Mikolajczyk et al. 8 and slower feature computation reduces the effect of fast matching. In the same paper 8 the authors have proposed a variant of sifT, called gloh, which proved to be even more distinctive with the same number of dimensions. However GLOh is computationally more expensive The sift descriptor still seems to be the most appealing descriptor for prac- tical uses, and hence also the most widely used nowadays. It is distinctive and relatively fast, which is crucial for on-line applications. Recently, Se et al.22 implemented SIFT on a Field Programmable Gate Array(FPga) and improved its speed by an order of magnitude. However, the high dimensionality of the de scriptor is a draw back of sift at the matching step. For on-line applications on a regular PC, each one of the three steps(detection, description, matching should be faster still. Lowe proposed a best-bin-first alternative 2 in order to speed up the matching step, but this results in lower accuracy SURF: Speeded Up Robust Features 407 Our approach. In this paper, we propose a novel detector-descriptor scheme coined SURF(Speeded-Up Robust Features ). The detector is based on the Hes sian matrix 111, but uses a very basic approximation, just as Dog 2 is a very basic Laplacian-based detector. It relies on integral images to reduce the computation time and we therefore call it the 'Fast-Hessian'detector. The de scriptor. on the other hand. describes a distribution of Haar-wavelet responses within the interest point neighbourhood. Again, we exploit integral images for speed. Moreover, only 64 dimensions are used, reducing the time for feature com putation and matching, and increasing simultaneously the robustness. We also present a new indexing step based on the sign of the Laplacian, which increases not only the matching speed, but also the robustness of the descriptor In order to make the paper more self-contained, we succinctly discuss the con- cept of integral images, as defined by 23. They allow for the fast implementation of box type convolution filters. The entry of an integral image I>(x) at a location x=(, y) represents the sum of all pixels in the input image I of a rectangular region formed by the point x and the origin,lx(x)=∑≤0∑01(,j).wth I> calculated, it only takes four additions to calculate the sum of the intensities over any upright, rectangular area, independent of its size 3 Fast-Hessian Detector We base our detector on the Hessian matrix because of its good performance in computation time and accuracy. However, rather than using a different measure for selecting the location and the scale (as was done in the Hessian-Laplace detector 11), we rely on the determinant of the Hessian for both. Given a point , y) in an image I, the Hessian matrix H(x, o)in x at scale o is defined as follows X, o)LayX,o Lay(x, o)Lyy(x,0 where Laa (, o) is the convolution of the Gaussian second order derivative aarzg(o)with the image I in point x, and similarly for Lay(x, o) and Lyy(x, 0) Gaussians are optimal for scale-space analysis, as shown in 24. In practice however, the Gaussian needs to be discretised and cropped(Fig. Ileft half), and even with gaussian filters aliasing still occurs as soon as the resulting images are sub-sampled. Also, the property that no new structures can appear while going to lower resolutions may have been proven in the 1D case, but is known to not apply in the relevant 2D case 25. Hence, the importance of the Gaussian seems to have been somewhat overrated in this regard, and here we test a simpler alternative As gaussian filters are non-ideal in any case, and given Lowe's success with LoG approximations, we push the approximation even further with box filters(Fig. I right half). These approximate second order Gaussian derivatives, and can be evaluated very fast using integral images, independently of size. As shown in the results section, the performance is comparable to the one using the discretised and cropped gaussians 408 H. Bay, T. Tuytelaars, and L. Van gool Fig 1. Left to right: The(discretised and cropped)Gaussian second order partial derivatives in y-direction and y-direction, and our approximations thereof using bo filters. The grey regions are equal to zero The 9 x9 box filters in Fig. Iare approximations for Gaussian second order derivatives with o= 1.2 and represent our lowest scale (i.e. highest spatial resolution. We denote our approximations by Drr, D. and Dra. The weights applied to the rectangular regions are kept simple for computational efficiency, but we need to further balance the relative weights in the expression for the Hessian's determinant with av(1)P Da(9)F=0.912.0.9, where aFis the Frobenius norm. This yields det(h approx Dra Dyy -(0.9Dxy Furthermore, the filter responses are normalised with respect to the mask size This guarantees a constant Frobenius norm for any filter size Scale spaces are usually implemented as image pyramids. The images are repeatedly smoothed with a gaussian and subsequently sub-sampled in order to achieve a higher level of the pyramid. Due to the use of box filters and integral images, we do not have to iteratively apply the same filter to the output of a previously filtered layer, but instead can apply such filters of any size at exactly the same speed directly on the original image, and even in parallel(although the latter is not exploited here). Therefore, the scale space is analysed by up-scaling the filter size rather than iteratively reducing the image size. The output of the above 9 x9 filter is considered as the initial scale layer, to which we will refer as scale s= 1.2(corresponding to Gaussian derivatives with o= 1.2). The following layers are obtained by filtering the image with gradually bigger masks, taking into account the discrete nature of integral images and the specific structure of our filters. Specifically, this results in filters of size9×9,15×15,21×21,27×27, etc. At larger scales, the step between consecutive filter sizes should also scale accordingly. Hence, for each new octave, the filter size increase is doubled(going from 6 to 12 to 24 ). Simultaneously, the sampling intervals for the extraction of the interest points can be doubled as well As the ratios of our filter layout remain constant after scaling, the approx imated Gaussian derivatives scale accordingly. Thus, for example, our 27 X 27 filter corresponds to o =3 X 1.2=3.6=s. Furthermore, as the Frobenius norm remains constant for our filters, they are already scale normalised 26 In order to localise interest points in the image and over scales, a non maximum suppression in a 3 x 3 x 3 neighbourhood is applied. The maxima of the determinant of the Hessian matrix are then interpolated in scale and SURF: Speeded Up robust Features 409 Fig. 2. Left: Detected interest points for a Sunflower field. This kind of scenes shows clearly the nature of the features from Hessian-based detectors. Middle: Haar wavelet ypes used for SURF. Right: Detail of the Graffiti scene showing the size of the de scriptor window at different scales image space with the method proposed by Brown et al. 27. Scale space inter- polation is especially important in our case, as the difference in scale between the first layers of every octave is relatively large. Fig. 2(left )shows an example of the detected interest points using our Fast-Hessian'detector 4 SURF Descriptor he good performance of SiFT compared to other descriptors 8 is remarkable Its mixing of crudely localised information and the distribution of gradient re- lated features seems to yield good distinctive power while fending off the effects of localisation errors in terms of scale or space. Using relative strengths and orientations of gradients reduces the effect of photometric changes The proposed surF descriptor is based on similar properties, with a complex ity stripped down even further. The first step consists of fixing a reproducible orientation based on information from a circular region around the interest point. Then, we construct a square region aligned to the selected orientation and extract the surf descriptor from it. These two steps are now explained in turn. Furthermore, we also propose an upright version of our descriptor(U SURF that is not invariant to image rotation and therefore faster to com pute and better suited for applications where the camera remains more or less horizontal 4.1 Orientation Assignment In order to be invariant to rotation, we identify a reproducible orientation for the interest points. For that purpose, we first calculate the Haar-wavelet responses in and y direction, shown in Fig. 2 and this in a circular neighbourhood of radius 6s around the interest point, with s the scale at which the interest point was detected. Also the sampling step is scale dependent and chosen to be s. In keeping with the rest, also the wavelet responses are computed at that current 410 H. Bay,T. Tuytelaars, and L. Van Gool scale s. Accordingly, at high scales the size of the wavelets is big Therefore, we use again integral images for fast filtering. Only six operations are needed to compute the response in a or y direction at any scale. The side length of the wavelets is 4s Once the wavelet responses are calculated and weighted with a gaussian(o 2.5s) centered at the interest point, the responses are represented as vectors in a space with the horizontal response strength along the abscissa and the vertical response strength along the ordinate. The dominant orientation is estimated by calculating the sum of all responses within a sliding orientation window covering an angle of 3. The horizontal and vertical responses within the window are summed. The two summed responses then yield a new vector. The longest such vector lends its orientation to the interest point. The size of the sliding window is a parameter, which has been chosen experimentally. Small sizes fire on single dominating wavelet responses, large sizes yield maxima in vector length that are not outspoken. Both result in an unstable orientation of the interest region. Note the U-SurF skips this step 4.2 Descriptor Components For the extraction of the descriptor, the first step consists of constructing a square region centered around the interest point, and oriented along the orienta tion selected in the previous section For the upright version. this transformation is not necessary. The size of this window is 20s. Examples of such square regions are illustrated in Fig. 2 The region is split up regularly into smaller 4x 4 square sub-regions. This keeps important spatial information in. For each sub-region, we compute a few simple features at 5 X5 regularly spaced sample points. For reasons of simplicity, we call dr the Haar wavelet response in horizontal direction and dy the Haar wavelet response in vertical direction(filter size 2s ). "Horizontal"and"vertical"here is defined in relation to the selected interest point orientation. To increase the robustness towards geometric deformations and localisation errors, the responses dr and dy are first weighted with a Gaussian(o=3. 3s) centered at the interest point Then, the wavelet responses d and dy are summed up over each subregion and form a first set of entries to the feature vector. In order to bring in in- formation about the polarity of the intensity changes, we also extract the sum of the absolute values of the responses, dal and d,. Hence, each sub-region has a four-dimensional descriptor vector v for its underlying intensity structure C∑dx,∑dy,∑ld,∑dy|). This results in a descriptor vector for all4×4 sub-regions of length 64. The wavelet responses are invariant to a bias in illumi nation(offset). Invariance to contrast(a scale factor) is achieved by turning the descriptor into a unit vector Fig. B] shows the properties of the descriptor for three distinctively different mage intensity patterns within a subregion. One can imagine combinations of such local intensity patterns, resulting in a distinctive descriptor SURF: Speeded Up robust Features 411 dx ∑d Fig 3. The descriptor entries of a sub-region represent the nature of the underlying intensity pattern. Left: In case of a homogeneous region, all values are relatively low Middle: In presence of frequencies in direction, the value of 2ldxI is high, but all others remain low. If the intensity is gradually increasing in direction, both values ∑dand∑| dxl are high 令SURF SURF DSURF-36 D.SURF-36 808 仝SURF-128 8 0. 8A-SURF-128 米SIFT X GLOH X GLOH 0.6 O PCA-SIFT 20.6+.o PCA-SIFT 302 ::4,岭的04A 0 o…o 0.6 0.6 precisIo 1-precision Fig. 4. The recall vs. (1-precision) graph for different binning methods and two different matching strategies tested on the 'Graffiti sequence(image 1 and 3) with a view change of 30 degrees, compared to the current descriptors. The interest points are computed with our 'Fast Hessian'detector. Note that the interest points are not affine invariant The results are therefore not comparable to the ones in SURF-128 correspond to the extended descriptor. Left: Similarity-threshold-based matching strategy. Right Nearest-neighbour-ratio matching strategy(See section 5 In order to arrive at these surf descriptors, we experimented with fewer and more wavelet features, using d< and da, higher-order wavelets, PCA, median values, average values, etc. From a thorough evaluation, the proposed sets turned out to perform best. We then varied the number of sample points and sub-regions The 4 x 4 sub-region division solution provided the best results Considering finer subdivisions appeared to be less robust and would increase matching times too much. On the other hand, the short descriptor with 3 X 3 subregions(SURF-36 performs worse, but allows for very fast matching and is still quite acceptable in comparison to other descriptors in the literature. Fig. 4 shows only a few of these comparison results(SURF-128 will be explained shortly 412 H. Bay,T. Tuytelaars, and L. Van Gool We also tested an alternative version of the surf descriptor that adds a couple of similar features (SURF-128 ). It again uses the same sums as before but now splits these values up further. The sums of dx and da are computed separately for dy<0 and dy 20. Similarly, the sums of y and y are split up according to the sign of d r, thereby doubling the number of features. The descriptor is more distinctive and not much slower to compute. but slower to match due to its higher dimensionalit In Figure 4 the parameter choices are compared for the standard'Graffiti scene, which is the most challenging of all the scenes in the evaluation set of Mikolajczy k8, as it contains out-of-plane rotation, in-plane rotation as well as brightness changes. The extended descriptor for 4 x 4 subregions (SURF-128) comes out to perform best. Also, SURF performs well and is faster to handle Both outperform the existing state-of-the-art For fast indexing during the matching stage, the sign of the Laplacian (i.e the trace of the Hessian matrix) for the underlying interest point is included Typically, the interest points are found at blob-type structures. The sign of the Laplacian distinguishes bright blobs on dark backgrounds from the reverse situation. This feature is available at no extra computational cost, as it was already computed during the detection phase. In the matching stage, we only compare features if they have the same type of contrast. Hence, this minimal information allows for faster matching and gives a slight increase in performance 5 Experimental Results First, we present results on a standard evaluation set, fot both the detector and the descriptor. Next, we discuss results obtained in a real-life object recognition application. All detectors and descriptors in the comparison are based on the original implementations of authors Standard Evaluation. We tested our detector and descriptor using the image sequences and testing software provided by Mikolajczyk 1. These are images of real textured and structured scenes. Due to space limitations, we cannot show the results on all sequences. For the detector comparison, we selected the two viewpoint changes(Graffiti and Wall), one zoom and rotation(Boat)and lighting changes(Leuven)(see Fig. 6 discussed below). The descriptor evaluations are shown for all sequences except the Bark sequence(see Fig 4 and For the detectors, we use the repeatability score, as described in 9.This indicates how many of the detected interest points are found in both images relative to the lowest total number of interest points found (where only the part of the image that is visible in both images is taken into account) The detector is compared to the difference of Gaussian(Dog) detector by e2, and the Harris- and Hessian-Laplace detectors proposed by Mikola- jczyk15. The number of interest points found is on average very similar for all detectors. This holds for all images, including those from the database used in http://www.robots.ox.ac.uk/-vgg/research/affinE/ SURF: Speeded Up robust Features 413 Table 1. Thresholds, number of detected points and calculation time for the detectors in our comparison.(First image of Graffiti scene, 800 X 640) detecto threshold nb of points comp time(msec) Fast-Hessian 1418 120 Hessian-Laplace 1000 1979 650 Harris-Laplace 2500 1664 1800 DOG default 1520 400 the object recognition experiment, see Table l for an example. As can be seen our Fast-Hessian' detector is more than 3 times faster that DoG and 5 times faster than Hessian-Laplace. At the same time, the repeatability for our detector is comparable(Graffiti, Leuven, Boats) or even better(Wall) than for the com- petitors. Note that the sequences Graffiti and Wall contain out-of-plane rotation resulting in affine deformations, while the detectors in the comparison are only rotation- and scale invariant. Hence, these deformations have to be tackled by the overall robustness of the features The descriptors are evaluated using recall-(1-precision) graphs, as in 4 and 8. For each evaluation, we used the first and the fourth image of the sequence, except for the Graffiti (image 1 and 3) and the Wall scene (image 1 and 5), corresponding to a viewpoint change of 30 and 50 degrees, respectively In figures 4and 7 we compared our SURF descriptor to GLOH, SIFT and PCA SIFT, based on interest points detected with our Fast-Hessian detector. SURF outperformed the other descriptors for almost all the comparisons. In Fig 4] we compared the results using two different matching techniques, one based on the similarity threshold and one based on the nearest neighbour ratio(see 8 for a discussion on these techniques ). This has an effect on the ranking of the descriptors, yet SURF performed best in both cases. Due to space limitations, only results on similarity threshold based matching are shown in Fig. 7 as this technique is better suited to represent the distribution of the descriptor in its feature space 8 and it is in more general use The SurF descriptor outperforms the other descriptors in a systematic and significant way, with sometimes more than 10% improvement in recall for the same level of precision. At the same time, it is fast to compute(see Table 2) The accurate version(SURF-128), presented in section 4 showed slightly ter results than the regular SURF, but is slower to match and therefore less interesting for speed-dependent applications Table 2. Computation times for the joint detector -descriptor implementations, tested on the first image of the graffiti sequence. The thresholds are adapted in order to detect the same number of interest points for all methods. These relative speeds are also representative for other images U-SURFISURF SURF-128SIFT time(ms):2553543911036

...展开详情
试读 14P SURF Speeded Up Robust Features
立即下载 低至0.43元/次 身份认证VIP会员低至7折
    一个资源只可评论一次,评论内容不能少于5个字
    kerrity 求了很久的资源,感谢楼主
    2016-10-09
    回复
    img
    tostq

    关注 私信 TA的资源

    上传资源赚积分,得勋章
    最新推荐
    SURF Speeded Up Robust Features 36积分/C币 立即下载
    1/14
    SURF Speeded Up Robust Features第1页
    SURF Speeded Up Robust Features第2页
    SURF Speeded Up Robust Features第3页
    SURF Speeded Up Robust Features第4页
    SURF Speeded Up Robust Features第5页

    试读已结束,剩余9页未读...

    36积分/C币 立即下载 >