Multi-Scale Categorical Object Recognition Using Contour Fragments

IEEE TRANSACTIONS OF PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Most similar to our work is that of Opelt et al. [42], [43]. Their 'boundary fragment model' (BFM) shares much with our earlier work [46]: it uses many fragments of contour arranged in a star constellation, learned by boosting and matched with a chamfer distance. Our new work incorporates its advantages of scale invariance, robust detection using mean shift, and reduced supervision (bounding boxes rather than segmentations), but there are important differences. We employ a new chamfer distance that treats orientation in a continuous manner, and show in Section VI-C.1 how this leads to improved recognition accuracy. Contour fragments are matched in local windows relative to the object centroid, rather than across the whole image. The BFM combines several fragments in each weak learner, while our fragments proved sufficiently discriminative individually, reducing training expense. Training from a sparse set of image locations (Figure 7) results in further efficiency. We model scale as an extra dimension in the mean shift mode detection rather than combining object detections from individual scales post-hoc. Subsequent work [43] showed how to share contour fragments between classes, similar to [50]. We compare against these techniques in Section VI-C.9.

II. OBJECT MODEL

As motivated in the introduction, we use a parts-based object model, shown in Figure 2. We employ a star constellation in which the parts are arranged spatially about a single fiducial point, the object centroid. Each training image contains a number of objects, each of which is labeled with a bounding box $b = (\mathbf{b}_{tl}, \mathbf{b}_{br})$ that implicitly defines this centroid $\mathbf{x} = \frac{1}{2}(\mathbf{b}_{tl} + \mathbf{b}_{br})$ and also the object scale $s = \sqrt{\mathrm{area}(b)}$. The object model is defined at scale $s = 1$, and parts derived from objects in images are scale-normalized to this canonical scale. Each scale-normalized part $F = (T, \bar{\mathbf{x}}_f, \sigma)$ is a contour fragment $T$ with expected offset $\bar{\mathbf{x}}_f$ from the centroid, and spatial uncertainty $\sigma$.

Fig. 2. Object model. Contour fragments $T$ (black outlines) are arranged about the object centroid (green cross) within the bounding box $b$ (green). Blue arrows show the expected offsets $\bar{\mathbf{x}}_f$ from the centroid, and red circles the spatial uncertainty $\sigma$. For clarity, only four parts are drawn; in practice, about 100 parts are used.

III. CONTOUR FRAGMENTS

This section defines our novel formulation of chamfer matching, before showing how a class-specific codebook of contour fragments is learned.

A. Chamfer Matching

The chamfer distance function, originally proposed in [6], measures the similarity of two contours. It is a smooth measure with considerable tolerance to noise and misalignment in position, scale and rotation, and hence very suitable for matching our locally rigid contour fragments to noisy edge maps. It has already proven capable of and efficient at recognizing whole object outlines (e.g. [28], [35], [49]), and here we extend it for use in a multi-scale parts-based categorical recognition model.

In its most basic form, chamfer distance takes two sets of edgels (edge points), a template $T$ and an edge map $E$, and evaluates the asymmetric distance for 2D relative translation $\mathbf{x}$ as

$$d^{(T,E)}_{\mathrm{cham}}(\mathbf{x}) = \frac{1}{|T|} \sum_{\mathbf{x}_t \in T} \min_{\mathbf{x}_e \in E} \|(\mathbf{x}_t + \mathbf{x}) - \mathbf{x}_e\|_2 , \qquad (1)$$

where $|T|$ denotes the number of edgels in template $T$, and $\|\cdot\|_2$ the $l_2$ norm. The chamfer distance thus gives the mean distance of edgels in $T$ to their closest edgels in $E$. For clarity, we will omit the superscript $(T, E)$ below where possible.

The distance is efficiently computed via the distance transform (DT), which gives the distances of the closest points in $E$:

$$\mathrm{DT}_E(\mathbf{x}) = \min_{\mathbf{x}_e \in E} \|\mathbf{x} - \mathbf{x}_e\|_2 , \qquad (2)$$

and hence the min operation in (1) becomes a simple look-up:

$$d^{(T,E)}_{\mathrm{cham}}(\mathbf{x}) = \frac{1}{|T|} \sum_{\mathbf{x}_t \in T} \mathrm{DT}_E(\mathbf{x}_t + \mathbf{x}) . \qquad (3)$$

We also compute the argument distance transform (ADT), which gives the locations of the closest points in $E$.
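The look-up formulation of Eqs. (1)-(3) can be illustrated with a small sketch. It is a minimal illustration rather than the authors' implementation: the function names `distance_transform` and `chamfer_distance` are hypothetical, the distance transform is computed by brute force instead of with a linear-time algorithm, and the template offset is assumed to keep every translated edgel inside the image.

```python
import math

def distance_transform(edge_points, width, height):
    """DT_E(x) of Eq. (2) for every integer pixel: distance to the closest edgel.
    Brute force, O(width * height * |E|); chosen here for clarity, not speed."""
    return [[min(math.hypot(x - ex, y - ey) for ex, ey in edge_points)
             for x in range(width)]
            for y in range(height)]

def chamfer_distance(template, dt, offset):
    """Eq. (3): mean DT value looked up at the translated template edgels.
    Assumes every translated edgel falls inside the DT array."""
    ox, oy = offset
    total = 0.0
    for tx, ty in template:
        total += dt[ty + oy][tx + ox]
    return total / len(template)

# Toy data: the edge map is a vertical line at x = 5,
# the template is a three-edgel vertical segment.
edge_map = [(5, y) for y in range(8)]
template = [(0, 0), (0, 1), (0, 2)]
dt = distance_transform(edge_map, width=10, height=8)

aligned = chamfer_distance(template, dt, offset=(5, 2))  # template lies on the line
shifted = chamfer_distance(template, dt, offset=(3, 2))  # two pixels to the left
print(aligned, shifted)  # prints: 0.0 2.0
```

Once the distance transform has been computed for an edge map, evaluating the distance for any template and translation costs one array look-up per template edgel, which is what makes the formulation attractive for matching many fragments at many positions.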
$$\mathrm{ADT}_E(\mathbf{x}) = \arg\min_{\mathbf{x}_e \in E} \|\mathbf{x} - \mathbf{x}_e\|_2 . \qquad (4)$$

The exact Euclidean DT and ADT can be computed simultaneously in linear time [21].

It is standard practice to truncate the distance transform to a value $\tau$:

$$\mathrm{DT}^{\tau}_E(\mathbf{x}) = \min(\mathrm{DT}_E(\mathbf{x}), \tau) , \qquad (5)$$

so that missing edgels due to noisy edge detection do not have too severe an effect. Additionally it allows normalization to a standard range $[0, 1]$:

$$d^{(T,E)}_{\mathrm{cham},\tau}(\mathbf{x}) = \frac{1}{\tau |T|} \sum_{\mathbf{x}_t \in T} \mathrm{DT}^{\tau}_E(\mathbf{x}_t + \mathbf{x}) . \qquad (6)$$

1) Edge orientation: Additional robustness is obtained by exploiting edge orientation information. This cue alleviates problems caused by clutter edgels, which are unlikely to align in both orientation and position. One popular extension to basic chamfer matching is to divide the edge map and template into discrete orientation channels and sum the individual chamfer scores [49]. However, it is not clear how many channels to use, nor how to avoid artifacts at the channel boundaries.

Building on [41], we instead augment the robust chamfer distance (6) with a continuous and explicit cost for orientation mismatch, given by the mean difference in orientation between edgels in template $T$ and the nearest edgels in edge map $E$:

$$d^{(T,E)}_{\mathrm{orient}}(\mathbf{x}) = \frac{2}{\pi |T|} \sum_{\mathbf{x}_t \in T} \left| \phi(\mathbf{x}_t) - \phi(\mathrm{ADT}_E(\mathbf{x}_t + \mathbf{x})) \right| . \qquad (7)$$

The function $\phi(\mathbf{x})$ gives the orientation of edgel $\mathbf{x}$ modulo $\pi$, and $|\phi(\mathbf{x}_1) - \phi(\mathbf{x}_2)|$ gives the smallest circular difference between $\phi(\mathbf{x}_1)$ and $\phi(\mathbf{x}_2)$. Edgels are taken modulo $\pi$ because, for edgels on the outline of an object, the sign of the edgel gradient is not a reliable signal as it depends on the intensity of the background. The normalization by $\pi/2$ ensures that $d_{\mathrm{orient}}(\mathbf{x}) \in [0, 1]$.

Our improved matching scheme, called oriented chamfer matching (OCM), uses a simple linear interpolation between the distance and orientation terms:

$$d^{(T,E)}_{\lambda}(\mathbf{x}) = (1 - \lambda) \cdot d^{(T,E)}_{\mathrm{cham},\tau}(\mathbf{x}) + \lambda \cdot d^{(T,E)}_{\mathrm{orient}}(\mathbf{x}) , \qquad (8)$$

with orientation specificity parameter $\lambda$. As we shall see below, $\lambda$ is learned for each contour fragment separately, giving improved discrimination power compared with a shared, constant $\lambda$. The terms in (8) are illustrated in Figure 3. Note that OCM is considerably more storage efficient than using discrete orientation channels. In Section VI-C.1, we show that the continuous use of orientation information in OCM gives considerably improved performance compared with 8-channel chamfer matching and Hausdorff matching [30] (essentially (1) with the summation replaced by a maximization).

Fig. 3. Oriented chamfer matching. For edgel $\mathbf{x}_1$ in template $T$, [...] from $\mathbf{x}_1$ to the nearest edgel $\mathbf{x}_2$ in edge map $E$, and the difference between the edgel gradients at these points, $|\phi(\mathbf{x}_1) - \phi(\mathbf{x}_2)|$.

2) Matching at multiple scales: We extend OCM to multiple scales by simply rescaling the templates $T$. Treating $T$ as now a set of scale-normalized edgels, to perform OCM at scale $s$ between $T$ and the original unscaled edge map $E$, we use the scaled edgel set $sT = \{ s\mathbf{x}_t \ \text{s.t.}\ \mathbf{x}_t \in T \}$ and calculate

$$d^{(T,E)}_{\lambda}(\mathbf{x}, s) = d^{(sT,E)}_{\lambda}(\mathbf{x}) , \qquad (9)$$

rounding scaled edgel positions to the nearest integer.

3) Approximate chamfer matching: For efficiency, one does not need to perform the complete sums over template edgels in (6) and (7). Each sum represents an empirical average, and so one can sum over only a fraction of the edgels, adjusting the normalization accordingly. This provides a good approximation to the true chamfer distance function in considerably reduced time. In practice, even matching only 20% of edgels gave no decrease in detection performance, as demonstrated in Section VI-C.

B. Building a Codebook of Contour Fragments

We need now a 'codebook', a set of representative contour fragments, and there is a choice in their class-specificity. One could use completely generic fragments such as lines, corners, and T-junctions and hope that in combination they can be made discriminative [25].
Instead, we create a class-specific codebook so that, for instance, the class horse results in, among others, 'head', 'back', and 'forelegs' fragments, as illustrated in Figure 6. Even individually, these fragments can be indicative of object presence in an image, and in combination will prove very powerful for object detection.

The outline of our codebook learning algorithm is as follows. We start with a large initial set of fragments, randomly chosen from edge maps. These are then clustered based on appearance. Finally, each cluster is subdivided to find fragments that agree in centroid position. The resulting sub-clusters form the codebook.

The initial set of fragments is generated thus. A rectangle $r = (\mathbf{r}_{tl}, \mathbf{r}_{br})$ enclosed within bounding box $b$ of a random object is chosen, uniformly at random. We define vector $\mathbf{x}_f = \frac{1}{s}(\mathbf{r}_{cen} - \mathbf{x})$ as the scale-normalized vector from the object centroid $\mathbf{x}$ to the rectangle center $\mathbf{r}_{cen} = \frac{1}{2}(\mathbf{r}_{tl} + \mathbf{r}_{br})$. Let $E_r = \{\mathbf{x}_r\}$ denote the set of absolute image positions of edgels within rectangle $r$. The template $T$ used in OCM is then

$$T = \left\{ \frac{\mathbf{x}_r - \mathbf{r}_{cen}}{s} \ \text{s.t.}\ \mathbf{x}_r \in E_r \right\} . \qquad (10)$$

To remove overly generic fragments such as small straight lines, fragments with edgel density $|E_r| / \mathrm{area}(r)$ below a threshold $\eta_1$ are immediately discarded. Fragments with edgel density above a threshold $\eta_2$ are also discarded, since these are likely to contain many background clutter edgels and, even if not, will be expensive to match. Edgel sets $E_r$ are computed as $E_r = \{ \mathbf{x} \in C \ \text{s.t.}\ \mathbf{x} \in r \ \text{and}\ \|\nabla I_{\mathbf{x}}\| > t \}$. This equation uses the image gradient $\nabla I$ at the set of edge points $C$, given by the Canny non-maximal suppression algorithm. Rather than fix an arbitrary threshold $t$, we choose a random $t$ for each fragment (uniformly, within the central 50% of the range $[\min_{\mathbf{x}} \|\nabla I_{\mathbf{x}}\|, \max_{\mathbf{x}} \|\nabla I_{\mathbf{x}}\|]$), so that at least some initial fragments are relatively clutter-free. As we shall see shortly, the clustering step then picks out these cleaner fragments to use as exemplars.

Finally, to ensure the initial set of contour fragments covers the possible appearances of an object, a small random transformation (footnote 1) is applied to each fragment. Several differently perturbed but otherwise similar fragments are likely to result, given the large number of fragments extracted.

Footnote 1: The following transformations are chosen uniformly at random: a scaling $\log s \in [-\log s_{rnd}, \log s_{rnd}]$ and rotation $\theta \in [-\theta_{rnd}, \theta_{rnd}]$ about the fragment center is applied to the edgels, and the vector $\mathbf{x}_f$ is translated (by $x \in [-x_{rnd}, x_{rnd}]$ and $y \in [-y_{rnd}, y_{rnd}]$) and rotated (by $\phi \in [-\phi_{rnd}, \phi_{rnd}]$) about the object centroid. As we showed in [46], these transformations are crucial to ensure good performance, due to the limited training data and the use of rigid templates.

Fig. 4. Initial set of contour fragments. Examples of contour fragments extracted at random from the edge maps of horse images. The +s represent the fragment origins, i.e. vectors (0,0) in (10). Many fragments are noisy, and so we apply a clustering step to find the cleaner fragments.

1) Fragment clustering: Figure 4 shows example fragments extracted at random. While many fragments are quite noisy, some fragments are uncluttered, due to particular clean training images and the use of random edge thresholds. A clustering step is therefore employed with the intuition that these uncluttered fragments should lie at the cluster centers.

To this end, all pairs $T_i$ and $T_j$ of fragments in the initial set are compared in a symmetric fashion as follows:

$$d^{i,j} = d^{(s_j T_i,\, s_j T_j)}_{\lambda}(\mathbf{0}) + d^{(s_i T_j,\, s_i T_i)}_{\lambda}(\mathbf{0}) , \qquad (11)$$

scaling the fragments (first both to $s_j$, then both to $s_i$) and comparing at zero relative offset. Clustering is performed on distances $d^{i,j}$ using the k-medoids algorithm, the analogue of k-means for non-metric spaces. For the experiments in this paper, a constant $\lambda = 0.4$ was used for clustering, chosen to maximize the difference between histograms of distances $d^{i,j}$ for within-cluster and between-cluster fragment pairs.

Example fragment clusters are shown in Figure 5. Clusters contain relatively uncluttered contour fragments of similar appearance. However, this purely appearance-based clustering does not take the vectors $\mathbf{x}_f$ from the object centroid into account. We desire each fragment to give a unique estimate of the object centroid, and so split each cluster into sub-clusters which agree on $\mathbf{x}_f$. Each fragment casts a vote for the object centroid, and modes in the voting space are found using mean shift mode estimation [14]. Each mode defines a sub-cluster, containing all fragments within a certain radius. To ensure high-quality sub-clusters, only those with a sufficient number of fragments are kept (for our experiments, five fragments were required). Mode detection is iterated for unassigned fragments until no new sub-clusters are generated.

Fig. 5. Clustering on appearance only. Four example clusters that have low mutual chamfer distances (11), with (left) the cluster exemplar and (right) the votes (small ×s) of all members for vector $\mathbf{x}_f$ from the object centroid (+). Observe (top left) a 'legs' cluster has resulted in two modes (front and hind) in the voting space. On the bottom row, we see that (left) a very class-specific 'head' cluster has highly consistent votes, whereas (right) a background cluster has uniformly scattered votes. To produce a unique centroid vote and remove background fragments, a sub-clustering step is performed.

Contour fragments within each sub-cluster now agree both in appearance (11) and location $\mathbf{x}_f$ relative to the object centroid, shown in Figure 6. From noisy edge maps, our algorithm has selected uncluttered and class-specific fragments, since random background fragments are highly unlikely to agree in position as well as appearance. Within each sub-cluster, the central fragment $T$ with lowest average distance to the other fragments is used as an exemplar, together with the mean $\bar{\mathbf{x}}_f$ and radial variance $\sigma$ of the centroid votes $\mathbf{x}_f$ (cf. Figure 2). We show below how boosting selects particular sub-clusters to use as the parts $F = (T, \bar{\mathbf{x}}_f, \sigma)$ in the model.

Fig. 6. Clustering on appearance and centroid location. Example sub-clusters that have low mutual chamfer distances (11) and agree on centroid location. From top to bottom: 'front legs', 'back', 'neck', and 'head'. (a) Example members of the sub-cluster. (b) Exemplar contour fragments (centers of the sub-clusters). (c) Votes (×s) from the centroid (+) with their mean $\bar{\mathbf{x}}_f$ (+) and radial uncertainty $\sigma$ (red circle). Note that we obtain uncluttered, class-specific exemplars, with an accurate estimate of location and uncertainty relative to the object centroid.

The clustering step is somewhat similar to that used in [33], except that we cluster contour fragments rather than image patches, and each resulting sub-cluster has only one particular location relative to the centroid. Also observe that we have taken a rather unconstrained approach to choosing contour fragments. Research from psychology [17] analyzed theories of how to split outline contours into fragments for optimal recognition by humans, for example at points of extremal curvature. It would be interesting future work to investigate such ideas applied in a computer-based system.

IV. OBJECT DETECTION

In this section, we describe how contour exemplars are combined in a boosted sliding window classifier. Parts are matched to an edge map using OCM with priors on their spatial layout. The classifier is evaluated across the scale-space of the image, and mean shift produces a final set of confidence-valued object detections. The only image information used by the detector is the edge map $E$, computed using the Canny edge detector [12].

For an object centroid hypothesis with location $\mathbf{x}$ and scale $s$, part $F$ is expected to match the edge map $E$ at position $\hat{\mathbf{x}} = \mathbf{x} + s\bar{\mathbf{x}}_f$, with spatial uncertainty $s\sigma$. The chamfer distance is weighted with a cost that increases away from the expected position. This gives a degree of spatial flexibility, allowing parts to snap into place. The location of the minimum is given by

$$\mathbf{x}^{\star} = \arg\min_{\mathbf{x}'} \left( d^{(T,E)}_{\lambda}(\mathbf{x}', s) + w_{s\sigma}(\|\mathbf{x}' - \hat{\mathbf{x}}\|_2) \right) , \qquad (12)$$

where $w_{\sigma}(x)$ is the radially symmetric spatial weighting function, for which we use the quadratic (footnote 2)

$$w_{\sigma}(x) = \begin{cases} x^2 / \sigma^2 & \text{if } |x| \le \sigma \\ \infty & \text{otherwise.} \end{cases} \qquad (13)$$

Footnote 2: The hard cut-off at $\sigma$ limits the search range and thus improves efficiency. In practice, increasing the cut-off radius did not appear to improve performance.

The part response $v$ at centroid hypothesis $(\mathbf{x}, s)$ is defined as the chamfer distance at the best match,

$$v_{[F,\lambda]}(\mathbf{x}, s) = d^{(T,E)}_{\lambda}(\mathbf{x}^{\star}, s) , \qquad (14)$$

and this is used in the classifier, described next.

A. Detecting Objects

Sliding window classification [4], [25], [52] is a simple, effective technique for object detection. A probability $P(\mathrm{obj}_{(\mathbf{x},s)})$ of object presence at location $(\mathbf{x}, s)$ is calculated across scale-space using a boosted classifier which combines multiple part responses $v$ (14). These probabilities are far from independent: for example, the presence of two distinct neighboring detections is highly unlikely. Hence a mode detection step selects local maxima as the final set of detections.

One must choose a set $\mathcal{X}$ of centroid scale-space location hypotheses, sampled frequently enough to allow detection of all objects present. We use a fixed number of test scales, equally spaced logarithmically to cover the range of scales in the training data. Space is sampled over a regular grid with spacing $s\Delta_{grid}$ for constant $\Delta_{grid}$ (optimized by holdout validation). Increasing the spacing with scale is possible since the search window in (12) is proportionally enlarged.

1) Classifier: We employ a boosted classifier to compute probabilities $P(\mathrm{obj}_{(\mathbf{x},s)})$. This combines part responses $v$ (14) for parts $F_1, \ldots, F_M$:

$$H(\mathbf{x}, s) = \sum_{m=1}^{M} a_m \left[ v_{[F_m,\lambda_m]}(\mathbf{x}, s) > \theta_m \right] + b_m , \qquad (15)$$

where $[\cdot]$ is the zero-one indicator, and $(\lambda, a, b, \theta)$ are learned parameters (see ahead to Section V). Each term in the sum corresponds to a part in the model, and is a decision stump which assigns a weak confidence value according to the comparison of part response $v_{[F_m,\lambda_m]}$ to threshold $\theta_m$. The weak decision stump confidences are summed to produce a strong confidence $H$, which is then interpreted as a probability using the logistic transformation [27]:

$$P(\mathrm{obj}_{(\mathbf{x},s)}) = \left[ 1 + \exp(-H(\mathbf{x}, s)) \right]^{-1} . \qquad (16)$$

2) Mode detection: We employ the powerful technique of mean shift mode estimation [14] on the hypothesized locations $(\mathbf{x}, s) \in \mathcal{X}$, weighted by their scaled posterior probabilities $s^2 P(\mathrm{obj}_{(\mathbf{x},s)})$, similarly to [34]. Multiplying by $s^2$ compensates for the proportionally less dense hypotheses at larger scales. The algorithm models the non-parametric distribution over the hypothesis space with the kernel density estimator

$$P(\mathbf{x}, s) \propto \sum_{(\mathbf{x}_i, s_i) \in \mathcal{X}} s_i^2\, P(\mathrm{obj}_{(\mathbf{x}_i,s_i)})\, K\!\left( \frac{x - x_i}{h_x}, \frac{y - y_i}{h_y}, \frac{\log s - \log s_i}{h_s} \right) , \qquad (17)$$

where Gaussian kernel $K$ uses bandwidths $h_x$, $h_y$ and $h_s$ for the x, y, and scale dimensions respectively (the scale dimension is linearized by taking logarithms). Mean shift efficiently locates modes (local maxima) of the distribution, which are used as the final set of detections. The density estimate at each mode is used as a confidence value for the detection.

V. LEARNING

We describe in this section how the classifier $H$ (15) is learned using the Gentle AdaBoost algorithm [27]. This takes as input a set of training examples $i$, each consisting of feature vector $\mathbf{f}_i$ paired with target value $z_i = \pm 1$, and iteratively builds the classifier. For our purposes, training example $i$ represents location $(\mathbf{x}_i, s_i)$ in one of the training images. The target value $z_i$ specifies the presence ($z_i = +1$) or absence ($z_i = -1$) of the object class. The feature vector $\mathbf{f}_i$ contains the responses $v_{[F,\lambda]}(\mathbf{x}_i, s_i)$ (14) for all codebook entries $F$, and all OCM orientation specificities $\lambda$ from a fixed set $\Lambda$. A given dimension $d$ in the feature vector therefore encodes a pair $(F, \lambda)$. The decision stump parameters $a$, $b$, and $\theta$ are learned as described in [50].

We are free to choose the number, locations, and target values of the training examples. One could densely sample each training image, computing feature vectors for examples at every point on a grid in scale-space. This is however unnecessarily inefficient because the minimization in (12) means that neighboring locations often have near identical feature vectors.

Instead, we use the sparse pattern of examples shown in Figure 7. For a training object at location $(\mathbf{x}, s)$, positive examples are taken at the 3×3×3 scaled grid locations $\mathbf{x}' = \mathbf{x} + (z_x s' \delta_1, z_y s' \delta_1)^T$ for scales $s' = s\gamma_1^{z_s}$, where $(z_x, z_y, z_s) \in \{-1, 0, +1\}^3$. The grid is spaced by $\delta_1$ (scale-normalized) and scaled by $\gamma_1$. The positive examples ensure a strong classification response near the true centroid, wide enough that the sliding window classifier need not be evaluated at every pixel. To ensure the response is localized, negative examples are taken at positions $\mathbf{x}' = \mathbf{x} + (z_x s' \delta_2, z_y s' \delta_2)^T$ for scales $s' = s\gamma_2^{z_s}$, with a larger spacing $\delta_2 > \delta_1$ and scaling $\gamma_2 > \gamma_1$, and the same $(z_x, z_y, z_s)$ but now excluding $(0, 0, 0)$. This particular pattern results in a total of 53 examples for each object (27 positive and 26 negative), which is vastly less than the total number of scale-space locations in the image. For training images not containing an object, we create all negative examples in the same pattern, at a number of random scale-space locations.

Fig. 7. Training examples. (a) A pattern of positive (⊕) and negative (⊖) examples are arranged about the true object centroid (the central, larger ⊕). The positive and negative examples are spaced on grids of size $\delta_1$ and $\delta_2$ respectively, scaled by the ground-truth object scale $s$. The boosting algorithm trains from feature vectors of part responses (14) computed at these examples. (b) For images with no objects present, all-negative copies of the same pattern are placed at a number of random scale-space locations. For clarity, only one scale is shown (see text).

Feature vectors are pre-computed for all examples, usually taking less than an hour on a modern machine. Boosting itself is then very quick, taking typically less than a minute to converge, since the weak learners are individually quite powerful. A cascade [52] is also learned, which resulted in a five-fold reduction in the average number of response calculations at test time.

A. Retraining on Training Data

It is unclear how to place the sparse negative training examples optimally throughout the training images, and hence they are initially placed at random. However, once a detector is learned from these examples, a retraining step is used to boot-strap the set of training examples [54 4]. We evaluate the detector on the training images, and record any false positives or negatives (see ahead to Section VI-A). The classifier is then retrained on the original example set, augmented with new negative examples at the locations of false positives [16], and duplicate positive examples to correct the false negatives. We demonstrate in Section VI-C.2 that this procedure allows more parts to be learned without over-fitting.

B. Retraining on Test Data

The same idea can be put to work on the test data, if one assigns a degree of trust to the output of the classifier. One can take a fixed proportion $\xi$ (e.g. $\xi = 10\%$) of detections with strongest confidence and assume these are correct, positive detections, and the same proportion of detections with weakest confidence and assume there are no objects present at those locations. The boosted classifier is retrained with the new positive and negative training examples further augmenting the training set.

VI. EVALUATION

We present a thorough evaluation of the classification and detection performance of our technique on several challenging datasets, investigating different aspects of our system individually, and comparing against other state-of-the-art methods. The standard experimental procedure is detailed in Section VI-A, the description of the datasets in Section VI-B, and the results in Section VI-C.

A. Procedure

The image datasets are split into training and test sets. Each model is learned from the training set with only ground-truth bounding boxes provided. At test time, the bounding boxes are used only for evaluating accuracy.

Mode detection results in a set of centroid hypotheses and confidences of object presence at these points. We assign a scaled bounding box centered on each detection, with aspect ratio proportional to that of the average training bounding box. For a detection to be marked as correct, its inferred bounding box $b_{inf}$ must agree with the ground truth bounding box $b_{gt}$ based on an overlap criterion as

$$\frac{\mathrm{area}(b_{inf} \cap b_{gt})}{\mathrm{area}(b_{inf} \cup b_{gt})} > 0.5$$

(from [2]). Each $b_{gt}$ can match to only one $b_{inf}$, and so spurious detections of the same object count as false positives. For image classification, we use the confidence of the single most confident detection within each image.

The receiver operating characteristic (ROC) curve is used to measure classification accuracy. This plots the trade-off between false positives and false negatives as a global confidence threshold is varied. The equal error rate (EER) gives an easily interpretable accuracy measure, while the area under the curve (AUC) takes the whole curve into account and so gives a better measure for comparison purposes.

For detection we use two closely related measures. The first, the recall-precision (RP) curve, plots the trade-off between recall and precision as one varies the global threshold. For comparison with previous work we quote the EER measure on the RP curve, but for new results we report the more representative AUC measure. The second measure plots recall against the average number of false positives per image (RFPPI) as the detection threshold is varied [25]. The RFPPI curve seems more natural than RP for human interpretation since it is monotonic and stabilizes as more negative images are tested (the RP curve can only deteriorate). However, it gives no overall quantitative score, and so the legends in Figures 8 and 11 contain RP AUC figures even though the graphs show RFPPI.

B. Datasets

1) Weizmann Horses [10]: This is a challenging set of side-on horse images, containing different breeds, colors, and textures, with varied articulations, lighting conditions, and scales. While nominally viewed side-on, considerable out-of-plane rotation is evident. We paired this with the difficult Caltech 101 background set [3], [19]. While these images have different textural characteristics, they contain many clutter edges that pose a hard challenge to our contour-only detector. Images were down-sampled to a maximum dimension of 320 pixels where necessary. The resulting objects have a scale range of roughly 2.5× from smallest to largest. The first 50 horse and background images were used for training, the next 50 for holdout validation, and the final 228 as the test set. We also compare against our earlier work [46] using a single-scale horse database. The datasets are available at [1].

2) Graz 17: We compare against [43] on their 17 class database (listed in Table II). As closely as possible, we use the same training and test sets. Images are down-sampled to a maximum dimension of 320 pixels. For some classes, the resulting scale range is more than 5×. We test each class individually, paired with an equal number of background test images.

C. Results

1) Matching measures: First, we compare the performance of the object detector using several different matching measures: our proposed OCM with learned $\lambda$ and with constant $\lambda \in \{0, 0.5, 1\}$, standard 8-channel chamfer matching, and Hausdorff matching. The experiment was performed against 100 images in the Weizmann test set using 100 parts without retraining (other parameter settings are specified below).

Figure 8 superimposes the RFPPI curves for each matching measure, and the legend reports the corresponding RP AUC statistics. Observe that with no orientation information ($\lambda = 0$, identical to 1-channel, non-oriented chamfer matching), performance is very poor. The Hausdorff distance also fails to work well, since it too does not use orientation information. The 8-channel chamfer matching performs fairly well, but by modeling orientation continuously, our OCM (for $\lambda > 0$) performs as well or better, even if $\lambda$ is kept constant. The RFPPI curve for $\lambda = 1$ appears almost as good as the learned $\lambda$ curve, although the AUC numbers confirm that learning $\lambda$ per part is noticeably better. However, the extra expense of learning per-part $\lambda$ values may mitigate its quantitative advantages in some applications.

[Figure 8: recall plotted against false positives per image. Legend, RP AUC: Hausdorff 0.6519; 8-channel chamfer 0.7254; OCM ($\lambda = 0.0$) 0.5653; OCM ($\lambda = 0.5$) 0.7233; OCM ($\lambda = 1.0$) 0.7658; OCM ($\lambda$ learned) 0.8086.]

Fig. 8. Detection performance of different contour matching measures. Recall is plotted as a function of the number of false positives per image averaged over the Weizmann test subset. The best performance is obtained by our OCM technique with learned $\lambda$ parameter, although fixed $\lambda = 1$ also performs well.

2) Retraining: As described in Sections V-A and V-B, one can boot-strap the detector by retraining. For this experiment on the Weizmann validation set, we recorded the RP AUC against the number of parts: (i) without retraining, (ii) retraining only on the training data ('retrained training' in Figures 9 and 11), and (iii) retraining both on the training and test data ('retrained test'). The confidence parameter was set to $\xi = 10\%$.

[Figure 9: RP AUC plotted against number of parts (40 to 190). Curves: initial detector, retrained training, retrained test.]

Fig. 9. Effect of retraining. Detection performance is graphed as a function of number of parts (rounds of boosting). The initial detector starts to over-fit as the number of parts is increased above 100. Retraining prevents this over-fitting, allowing an overall performance improvement at the expense of more parts.

We can draw several conclusions from the results in Figure 9. Adding more parts helps performance on the test data up to a point, but eventually the detector starts to over-fit to the training data and generalization decreases. By providing more training examples by retraining on the training data, we can use more parts without over-fitting. Retraining on the test data maintains the additional accuracy, and gives a further improvement on the full test set, as described below. With only 40 parts, retraining on the test data decreases performance, since the strongest and weakest detections are not sufficiently reliable. Note that retraining does entail significant extra effort for a relatively modest performance gain.

3) Approximate chamfer matching: All results in our evaluation make use of the approximation of Section III-A.3, whereby only a subset of fragment edgels are used for chamfer matching. We used only every fifth edgel (scan-line order) in each fragment, which gave a commensurate speed improvement. We compared detection performance with and without the approximation on the Weizmann validation set, using 100 features. With the approximation, 0.[...] RP AUC was achieved, whereas without the approximation (matching every edgel) only 0.9417 was obtained. We conclude that the approximation can improve speed without degrading detection performance. The slight improvement in performance may even be significant, since the variance of the training part responses is increased slightly, which may prevent over-fitting.

4) Multi-scale Weizmann horses: We now evaluate on the full Weizmann dataset, showing example detections in Figure 10 and quantitative results in Figure 11.

We draw several conclusions. Firstly, we have shown that retraining on both the training and test sets not only helps generalization, but actually considerably improves performance. Turning to Figure 10, we observe that the detector works very well on the challenging horse images, despite wide within-class variation, considerable background clutter and even silhouetting. Missed detections (false negatives) tend to occur when there is significant pose change or out-of-plane rotation beyond the range for which we would expect our side-on detector to work. Training explicitly for these poses or rotations, perhaps sharing features between views [50], would allow detection of these objects. False positives occur when the pattern of clutter edgels is sufficiently similar to our model, as for [...]
