Acquisition of Localization Conﬁdence for Accurate Object Detection

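Modern CNN-based object detectors rely on bounding box regression and non-maximum suppression (NMS) to localize objects. The probabilities for class labels naturally reflect classification confidence, but there is no corresponding localization confidence. As a result, properly localized bounding boxes can degenerate during iterative regression or even be suppressed during NMS. In this paper, we propose IoU-Net, which learns to predict the IoU between each detected bounding box and the matched ground-truth. The network acquires this localization confidence, which improves the NMS procedure by preserving accurately localized bounding boxes. Furthermore, we propose an optimization-based bounding box refinement method in which the predicted IoU is formulated as the objective. Experiments on the MS-COCO dataset demonstrate the effectiveness of IoU-Net, as well as its compatibility with and adaptability to several state-of-the-art object detectors.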
… procedure. (2) Second, the absence of localization confidence makes the widely adopted bounding box regression less interpretable. For example, previous work [3] reports that iterative bounding box regression is non-monotonic: bounding box regression may degrade the localization of the input bounding boxes if applied multiple times (shown in Figure 1(b)).

In this paper, we introduce IoU-Net, which predicts the IoU between detected bounding boxes and their corresponding ground-truth boxes. This makes the network aware of the localization criterion, analogous to the classification module. This simple coefficient provides new solutions to the problems above:

1. IoU is a natural criterion for localization accuracy. We can replace classification confidence with the predicted IoU as the ranking keyword in NMS. This technique, namely IoU-guided NMS, helps to eliminate suppression failures caused by misleading classification confidences.
2. We present an optimization-based bounding box refinement procedure on par with traditional regression-based methods. During inference, the predicted IoU is used as the optimization objective, as well as an interpretable indicator of the localization confidence. The proposed Precise RoI Pooling layer enables us to solve the IoU optimization by gradient ascent. We show that, compared with the regression-based method, the optimization-based bounding box refinement empirically provides a monotonic improvement in localization accuracy. The method is fully compatible with, and can be integrated into, various CNN-based detectors [16, 3, 10].

2 Delving into object localization

First, we explore two drawbacks in object localization: the misalignment between classification confidence and localization accuracy, and non-monotonic bounding box regression.
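The quantity IoU-Net learns to predict is the plain Intersection-over-Union between a detected box and its matched ground-truth. The following minimal sketch assumes boxes are given as continuous (x0, y0, x1, y1) corners, which is the form Algorithm 2 uses, and applies no "+1" pixel convention. The function name `iou` is illustrative and is not taken from the paper's code.

```python
from typing import Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection-over-Union of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    # Overlap extent along each axis (zero if the boxes do not overlap on that axis).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7.
print(round(iou((0, 0, 2, 2), (1, 1, 3, 3)), 4))  # 0.1429
```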
For this study, a standard FPN [16] detector is trained on MS-COCO trainval35k as the baseline and tested on minival.

2.1 Misaligned classification and localization accuracy

NMS removes duplicated bounding boxes and has been an indispensable component of most object detectors since [4]. NMS works iteratively. At each iteration, the bounding box with the maximum classification confidence is selected, and its neighboring boxes are eliminated using a predefined overlap threshold. The Soft-NMS [2] algorithm replaces box elimination with a decrement of confidence, which yields a higher recall. Recently, several learning-based algorithms have been proposed as alternatives to the parameter-free NMS and Soft-NMS. [24] calculates an overlap matrix of all bounding boxes and performs affinity propagation clustering, selecting the cluster exemplars as the final detection results. [11] proposes GossipNet, a post-processing network trained for NMS based on the bounding boxes and their classification confidence. [12] proposes an end-to-end network that learns the relation between detected bounding boxes.

B. Jiang, R. Luo, J. Mao, T. Xiao, and Y. Jiang

[Figure: panels (a) IoU vs. Classification Confidence and (b) IoU vs. Localization Confidence; x-axis: IoU with ground-truth box]
Fig. 2: The correlation between the IoU of bounding boxes with the matched ground-truth and the classification/localization confidence. For detected bounding boxes with an IoU > 0.5 with the corresponding ground-truth, the Pearson correlation coefficients are (a) 0.217 and (b) 0.617. (a) The classification confidence indicates the category of a bounding box, but cannot be interpreted as the localization accuracy. (b) To resolve this issue, we propose IoU-Net to predict the localization confidence of each detected bounding box, i.e., its IoU with the corresponding ground-truth.
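The sketch below shows the traditional NMS loop described above, which ranks boxes by classification confidence. It also shows the Soft-NMS idea of decaying scores instead of removing boxes. This is an illustration, not the code of [2] or [4]. The linear decay `s *= 1 - IoU` is only one of the Soft-NMS variants, and the `score_floor` pruning value is a hypothetical choice.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float], thresh: float) -> List[int]:
    """Traditional NMS: rank by classification confidence, suppress overlapping neighbors."""
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while remaining:
        m = remaining.pop(0)  # box with the maximum classification confidence
        keep.append(m)
        # Eliminate neighbors whose overlap with the selected box exceeds the threshold.
        remaining = [i for i in remaining if iou(boxes[m], boxes[i]) <= thresh]
    return keep


def soft_nms_linear(boxes: Sequence[Box], scores: Sequence[float], thresh: float,
                    score_floor: float = 1e-3) -> Tuple[List[int], List[float]]:
    """Soft-NMS (linear variant): decay overlapping scores instead of deleting boxes."""
    scores = list(scores)
    remaining = list(range(len(boxes)))
    keep: List[int] = []
    while remaining:
        m = max(remaining, key=lambda i: scores[i])
        remaining.remove(m)
        keep.append(m)
        for i in remaining:
            o = iou(boxes[m], boxes[i])
            if o > thresh:
                scores[i] *= 1.0 - o  # confidence decrement instead of elimination
        # Hypothetical pruning of boxes whose score has decayed to almost nothing.
        remaining = [i for i in remaining if scores[i] > score_floor]
    return keep, scores


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(greedy_nms(boxes, [0.9, 0.8, 0.7], 0.5))  # [0, 2]
```

Note that the selection step in both functions looks only at `scores`. Section 2.1 argues that this dependence on classification confidence is exactly the weak point.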
However, these parameter-based methods require more computational resources, which limits their real-world application.

In the widely adopted NMS approach, the classification confidence is used to rank bounding boxes, which can be problematic. Figure 2(a) shows the distribution of classification confidences of all detected bounding boxes before NMS. The x-axis is the IoU between the detected box and its matched ground-truth, and the y-axis is its classification confidence. The Pearson correlation coefficient indicates that localization accuracy is not well correlated with classification confidence.

We attribute this to the objective that most CNN-based object detectors use to distinguish foreground (positive) samples from background (negative) samples. During training, a detected bounding box box_det is considered positive if its IoU with one of the ground-truth bounding boxes is greater than a threshold Ω_train. This objective can be misaligned with localization accuracy. Figure 1(a) shows cases where bounding boxes with higher classification confidence have poorer localization.

Recall that in traditional NMS, when there are duplicated detections of a single object, the bounding box with the maximum classification confidence is preserved. Because of the misalignment, however, the bounding box with better localization will probably be suppressed during NMS, leading to poor localization of objects. Figure 3 quantitatively shows the number of positive bounding boxes after NMS, grouped by their IoU with the matched ground-truth. When multiple detections match the same ground-truth, only the one with the highest IoU is considered positive. No-NMS can therefore be considered the upper bound for the number of positive bounding boxes. The figure shows that, without localization confidence, more than half of the detected bounding boxes with IoU > 0.9 are suppressed in the traditional NMS procedure, which degrades the localization quality of the detection results.

[Figure: number of positive bounding boxes per IoU bin for No-NMS, traditional NMS and IoU-guided NMS; x-axis: IoU with ground-truth box]
Fig. 3: The number of positive bounding boxes after NMS, grouped by their IoU with the matched ground-truth. In traditional NMS (blue bar), a significant portion of accurately localized bounding boxes is mistakenly suppressed because classification confidence and localization accuracy are misaligned. IoU-guided NMS (yellow bar) preserves more accurately localized bounding boxes.

2.2 Non-monotonic bounding box regression

In general, single object localization methods fall into two categories: bounding box-based methods and segment-based methods. The segment-based methods [9, 20, 13, 10] aim to generate a pixel-level segment for each instance, but they inevitably require additional segmentation annotation. This work focuses on bounding box-based methods.

Single object localization is usually formulated as a bounding box regression task. The core idea is that a network directly learns to transform (i.e., scale or shift) a bounding box to its designated target. In [9, 8], linear regression or a fully-connected layer is applied to refine the localization of object proposals generated by external pre-processing modules (e.g., Selective Search [28] or EdgeBoxes [33]). Faster R-CNN [23] proposes the region proposal network (RPN), in which only predefined anchors are used to train an end-to-end object detector. [14, 32] use anchor-free, fully-convolutional networks to handle variation in object scale. Meanwhile, [29] proposes Repulsion Loss to robustly detect objects under crowd occlusion.
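The "learn to scale or shift a box" formulation is usually implemented by regressing normalized offsets. The minimal sketch below assumes the common R-CNN-style parameterization: center offsets divided by width/height, plus log-ratios of the sizes. The names `encode` and `decode` are illustrative. Applying `decode` repeatedly to its own output with fresh predictions gives the iterative bounding box regression discussed in the following paragraphs.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]     # (x0, y0, x1, y1)
Deltas = Tuple[float, float, float, float]  # (dx, dy, dw, dh)


def _center_size(b: Box) -> Tuple[float, float, float, float]:
    w, h = b[2] - b[0], b[3] - b[1]
    return b[0] + 0.5 * w, b[1] + 0.5 * h, w, h


def encode(src: Box, tgt: Box) -> Deltas:
    """Regression target that shifts and scales `src` onto `tgt`."""
    sx, sy, sw, sh = _center_size(src)
    tx, ty, tw, th = _center_size(tgt)
    return ((tx - sx) / sw, (ty - sy) / sh, math.log(tw / sw), math.log(th / sh))


def decode(src: Box, d: Deltas) -> Box:
    """Apply (predicted) deltas to `src`: one regression step."""
    sx, sy, sw, sh = _center_size(src)
    cx, cy = sx + d[0] * sw, sy + d[1] * sh
    w, h = sw * math.exp(d[2]), sh * math.exp(d[3])
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


src, tgt = (0.0, 0.0, 10.0, 10.0), (2.0, 4.0, 12.0, 24.0)
d = encode(src, tgt)  # shift the center by (0.2 w, 0.9 h), double the height
print([round(v, 3) for v in decode(src, d)])
```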
Because it is effective and simple, bounding box regression has become an essential component of most CNN-based detectors. Many downstream applications, such as tracking and recognition, benefit from accurately localized bounding boxes, which raises the demand for better localization accuracy. In a series of object detectors [3, 17, 6, 21], refined boxes are fed to the bounding box regressor again and refined once more. Repeating this several times is called iterative bounding box regression. Faster R-CNN [23] performs bounding box regression twice to transform predefined anchors into the final detected bounding boxes. [15] proposes a group recursive learning approach that iteratively refines detection results and minimizes the offsets between object proposals and the ground-truth, taking into account the global dependency among multiple proposals. G-CNN [18] starts with a multi-scale regular grid over the image and iteratively pushes the boxes in the grid towards the ground-truth. However, as reported in [3], applying bounding box regression more than twice brings no further improvement.

[Figure: AP vs. iteration times for regression-based and optimization-based refinement; panels (a) FPN and (b) Cascade R-CNN]
Fig. 4: Optimization-based vs. regression-based BBox refinement. (a) Comparison in FPN. When regression is applied iteratively, the AP of the detection results first improves but then drops quickly in later iterations. (b) Comparison in Cascade R-CNN. Iterations 0, 1 and 2 represent the 1st, 2nd and 3rd regression stages of Cascade R-CNN. For iterations i ≥ 3, we refine the bounding boxes with the regressor of the third stage. After multiple iterations, AP drops slightly, while the optimization-based method further improves AP by 0.8%.
[3] attributes this to the distribution mismatch in multi-step bounding box regression and addresses it with a resampling strategy in multi-stage bounding box regression. We experimentally evaluate iterative bounding box regression in the FPN and Cascade R-CNN frameworks. The blue curves in Figure 4(a) and Figure 4(b) show the Average Precision (AP) of the results after each iteration. These curves show that, as the number of iterations increases, the improvement in localization accuracy from iterative bounding box regression is non-monotonic. This non-monotonicity, together with the lack of interpretability, causes difficulties in applications. Moreover, without localization confidence for detected bounding boxes, we cannot control the refinement at a fine-grained level, for example by using an adaptive number of iterations for different bounding boxes.

3 IoU-Net

To quantitatively analyze the effectiveness of IoU prediction, we first present the methodology for training an IoU predictor in Section 3.1. Sections 3.2 and 3.3 show how to use the IoU predictor for NMS and for bounding box refinement, respectively. Finally, in Section 3.4 we integrate the IoU predictor into existing object detectors such as FPN [16].

[Figure: architecture diagram; labels: FPN, RPN, Jittered RoIs, FC 1024, IoU, Classification, BBReg, Standalone IoU-Net]
Fig. 5: Full architecture of the proposed IoU-Net described in Section 3.4. Input images are first fed into an FPN backbone. The IoU predictor takes the output features from the FPN backbone. We replace the RoI Pooling layer with the PrRoI Pooling layer described in Section 3.3. The IoU predictor has a structure similar to the R-CNN branch. The modules marked within the dashed box form a standalone IoU-Net.

3.1 Learning to predict IoU

As shown in Figure 5, the IoU predictor takes visual features from the FPN and estimates the localization accuracy (IoU) of each bounding box.
We generate the bounding boxes and labels for training IoU-Net by augmenting the ground-truth, instead of taking proposals from RPNs. Specifically, we transform every ground-truth bounding box in the training set with a set of randomized parameters, which produces a candidate bounding box set. We then remove from this set every bounding box whose IoU with the matched ground-truth is less than Ω_train = 0.5. Training data are sampled from the remaining candidates uniformly w.r.t. the IoU. Empirically, this data generation process gives IoU-Net better performance and robustness. For each bounding box, features are extracted from the output of the FPN with the proposed Precise RoI Pooling layer (see Section 3.3). The features are then fed into a two-layer feed-forward network for IoU prediction. For better performance, we use class-aware IoU predictors.

The IoU predictor is compatible with most existing RoI-based detectors. The accuracy of a standalone IoU predictor can be found in Figure [?]. Because the training procedure is independent of any specific detector, it is robust to changes in the input distribution (e.g., when the predictor works with different detectors). In later sections, we further demonstrate how this module can be jointly optimized in a full detection pipeline (i.e., jointly with RPNs and R-CNN).

Algorithm 1 IoU-guided NMS. Classification confidence and localization confidence are disentangled in the algorithm. We use the localization confidence (the predicted IoU) to rank all detected bounding boxes, and update the classification confidence based on a clustering-like rule.

Input: B = {b_1, ..., b_n}, S, I, Ω_nms
    B is a set of detected bounding boxes.
    S and I are functions (neural networks) mapping bounding boxes to their classification confidence and IoU estimation (localization confidence), respectively.
    Ω_nms is the NMS threshold.
Output: D, the set of detected bounding boxes with classification scores.

D ← ∅
while B ≠ ∅ do
    b_m ← argmax_{b_j ∈ B} I(b_j)
    B ← B \ {b_m}
    s ← S(b_m)
    for b_j ∈ B do
        if IoU(b_m, b_j) > Ω_nms then
            s ← max(s, S(b_j))
            B ← B \ {b_j}
        end if
    end for
    D ← D ∪ {⟨b_m, s⟩}
end while
return D

3.2 IoU-guided NMS

We resolve the misalignment between classification confidence and localization accuracy with a novel IoU-guided NMS procedure, in which the classification confidence and the localization confidence (an estimate of the IoU) are disentangled. In short, we use the predicted IoU instead of the classification confidence as the ranking keyword for bounding boxes. As in traditional NMS, the box with the highest predicted IoU with a ground-truth is selected and eliminates all other boxes whose overlap with it is greater than a given threshold Ω_nms. To determine the classification scores, when box i eliminates box j, we update the classification confidence s_i of box i as s_i = max(s_i, s_j). This procedure can also be interpreted as confidence clustering: for a group of bounding boxes matching the same ground-truth, we take the most confident prediction for the class label. Pseudocode for this procedure is given in Algorithm 1.

IoU-guided NMS resolves the misalignment between classification confidence and localization accuracy. Quantitative results show that our method outperforms traditional NMS and its variants such as Soft-NMS [2]. Using IoU-guided NMS as the post-processor further improves the performance of several state-of-the-art object detectors.

Algorithm 2 Optimization-based bounding box refinement.
Input:
    B is a set of detected bounding boxes, in the form of (x0, y0, x1, y1).
    F is the feature map of the input image.
    T is the number of steps. λ is the step size, Ω_1 is an early-stop threshold and Ω_2 < 0 is a localization degeneration tolerance.
    Function PrPool extracts the feature representation for a given bounding box, and function IoU denotes the IoU estimate produced by IoU-Net.
Output: The set of final detection bounding boxes.

1: A ← ∅
2: for i = 1 to T do
3:     for b_j ∈ B and b_j ∉ A do
4:         grad ← ∇_{b_j} IoU(PrPool(F, b_j))
5:         PrevScore ← IoU(PrPool(F, b_j))
6:         b_j ← b_j + λ * scale(grad, b_j)
7:         NewScore ← IoU(PrPool(F, b_j))
8:         if |PrevScore − NewScore| < Ω_1 or NewScore − PrevScore < Ω_2 then
9:             A ← A ∪ {b_j}
10:        end if
11:    end for
12: end for
13: return B

3.3 Bounding box refinement as an optimization procedure

Bounding box refinement can be formulated mathematically as finding the optimal c* such that

$$c^{*} = \arg\min_{c} \; crit\big(transform(box_{det}, c),\; box_{gt}\big), \qquad (1)$$

where box_det is the detected bounding box, box_gt is a (target) ground-truth bounding box, and transform is a bounding box transformation function that takes c as its parameter and transforms the given bounding box. crit is a criterion that measures the distance between two bounding boxes. In the original Fast R-CNN [5] framework, crit is a smooth-L1 distance between coordinates in log-scale, while in [32], crit is −ln(IoU) between the two bounding boxes.

Regression-based algorithms directly estimate the optimal solution c* with a feed-forward neural network. However, iterative bounding box regression methods are vulnerable to changes in the input distribution [3] and may give non-monotonic localization improvement, as shown in Figure 4. To tackle these issues, we propose an optimization-based bounding box refinement method that uses IoU-Net as a robust localization accuracy (IoU) estimator. The IoU estimator can also serve as an early-stop condition, which allows iterative refinement with adaptive steps.

IoU-Net directly estimates IoU(box_det, box_gt). Because the proposed Precise RoI Pooling layer makes it possible to compute the gradient of the IoU w.r.t. the bounding box coordinates, we can directly use gradient ascent to find the optimal solution to Equation 1. As shown in Algorithm 2, we treat the IoU estimate as an optimization objective and iteratively refine the bounding box coordinates with the computed gradient. This maximizes the IoU between the detected bounding box and its matched ground-truth. In addition, the predicted IoU is an interpretable indicator of the localization confidence of each bounding box and helps explain the transformation that was performed.

In the implementation (Algorithm 2, Line 6), we manually scale up the gradient w.r.t. each coordinate by the size of the bounding box along that axis (e.g., we scale up ∇x1 by width(b_j)). This is equivalent to performing the optimization in log-scaled coordinates (x/w, y/h, log w, log h), as in [5]. We also apply a one-step bounding box regression to initialize the coordinates.

[Figure: three panels, 1. RoI Pooling, 2. RoI Align, 3. PrRoI Pooling, comparing how each computes a bin's output from the feature map]
Fig. 6: Illustration of RoI Pooling, RoI Align and PrRoI Pooling.

Precise RoI Pooling. We introduce Precise RoI Pooling (PrRoI Pooling for short), which powers our bounding box refinement.⁶ It avoids any quantization of coordinates and has a continuous gradient w.r.t. the bounding box coordinates. Given the feature map F before RoI/PrRoI Pooling (e.g., from Conv4 in ResNet-50), let w_{i,j} be the feature at a discrete location (i, j) on the feature map. With bilinear interpolation, the discrete feature map can be treated as continuous at any continuous coordinates (x, y):

$$f(x, y) = \sum_{i,j} IC(x, y, i, j) \times w_{i,j}, \qquad (2)$$

where $IC(x, y, i, j) = \max(0, 1 - |x - i|) \times \max(0, 1 - |y - j|)$ is the interpolation coefficient. We then denote a bin of a RoI as bin = {(x1, y1), (x2, y2)}, where (x1, y1) and (x2, y2) are the continuous coordinates of the top-left and bottom-right points of the bin.

⁶ We prefer the Precise RoI Pooling layer to the RoIAlign layer [10] because Precise RoI Pooling is continuously differentiable w.r.t. the coordinates, while RoIAlign is not. The code is released at: https://github.com/vacancy/preciseroipooling
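The training-data generation of Section 3.1 has three steps: jitter the ground-truth boxes, drop candidates below Ω_train, then sample uniformly w.r.t. IoU. The paper does not specify the randomized transform. In the sketch below, the center-shift and scale ranges (`max_shift`, `max_scale`) and the IoU binning used to approximate uniform sampling (`n_bins`) are therefore hypothetical choices.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def jitter(gt: Box, rng: random.Random, max_shift: float = 0.3, max_scale: float = 0.3) -> Box:
    """Hypothetical randomized transform: shift the center, rescale width and height."""
    w, h = gt[2] - gt[0], gt[3] - gt[1]
    cx = 0.5 * (gt[0] + gt[2]) + rng.uniform(-max_shift, max_shift) * w
    cy = 0.5 * (gt[1] + gt[3]) + rng.uniform(-max_shift, max_shift) * h
    nw = w * (1.0 + rng.uniform(-max_scale, max_scale))
    nh = h * (1.0 + rng.uniform(-max_scale, max_scale))
    return (cx - 0.5 * nw, cy - 0.5 * nh, cx + 0.5 * nw, cy + 0.5 * nh)


def sample_training_boxes(gt: Box, n_candidates: int, n_samples: int,
                          omega_train: float = 0.5, n_bins: int = 5,
                          seed: int = 0) -> List[Tuple[Box, float]]:
    """Return (box, IoU label) pairs spread roughly uniformly over IoU in [omega_train, 1]."""
    rng = random.Random(seed)
    bins: List[List[Tuple[Box, float]]] = [[] for _ in range(n_bins)]
    for _ in range(n_candidates):
        box = jitter(gt, rng)
        t = iou(box, gt)
        if t < omega_train:  # drop candidates whose IoU is below Omega_train
            continue
        k = min(int((t - omega_train) / (1.0 - omega_train) * n_bins), n_bins - 1)
        bins[k].append((box, t))
    non_empty = [b for b in bins if b]
    if not non_empty:
        raise ValueError("no candidate reached omega_train")
    # Pick an IoU bin uniformly, then a candidate inside that bin.
    return [rng.choice(rng.choice(non_empty)) for _ in range(n_samples)]


samples = sample_training_boxes((10.0, 10.0, 50.0, 30.0), n_candidates=2000, n_samples=8)
print([round(t, 2) for _, t in samples])
```

Sampling by bin first prevents the near-perfect boxes, which jittering produces in abundance, from dominating the labels. This is one simple way to realize "uniform w.r.t. the IoU".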
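Algorithm 1 translates almost line for line into Python. So that the example runs standalone, this sketch replaces the two networks S and I with precomputed per-box lists `cls_scores` and `loc_scores`. Detections are returned as (box index, updated classification score) pairs.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def iou_guided_nms(boxes: Sequence[Box], cls_scores: Sequence[float],
                   loc_scores: Sequence[float], omega_nms: float) -> List[Tuple[int, float]]:
    remaining = list(range(len(boxes)))       # B
    detections: List[Tuple[int, float]] = []  # D
    while remaining:
        # Rank by localization confidence (predicted IoU), not by class score.
        m = max(remaining, key=lambda j: loc_scores[j])
        remaining.remove(m)
        s = cls_scores[m]
        survivors = []
        for j in remaining:
            if iou(boxes[m], boxes[j]) > omega_nms:
                s = max(s, cls_scores[j])  # clustering-like rule: keep the most confident class score
            else:
                survivors.append(j)
        remaining = survivors
        detections.append((m, s))
    return detections


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(iou_guided_nms(boxes, [0.9, 0.6, 0.7], [0.6, 0.95, 0.8], 0.5))
# [(1, 0.9), (2, 0.7)]
```

In this toy input, box 1 has the higher predicted IoU, so it survives. It also inherits the higher classification score 0.9 from box 0, which it suppressed. Traditional NMS would have kept box 0 instead.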
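Algorithm 2 can likewise be sketched in Python, but a trained IoU-Net and PrRoI Pooling cannot fit in a standalone snippet. The sketch therefore makes two assumptions:

- The estimator `score_fn(box)` is any callable that returns an IoU estimate. In the usage below, the exact IoU with a known target stands in as a hypothetical oracle.
- The gradient is approximated with central finite differences instead of being backpropagated through PrRoI Pooling.

The gradient scaling follows Line 6 as described in the text: x-gradients are multiplied by the box width and y-gradients by the box height. The default values of T, λ, Ω_1 and Ω_2 are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def numeric_grad(fn: Callable[[List[float]], float], box: List[float],
                 eps: float = 1e-4) -> List[float]:
    """Central finite differences (stand-in for backprop through PrRoI Pooling)."""
    grad = []
    for k in range(4):
        hi, lo = list(box), list(box)
        hi[k] += eps
        lo[k] -= eps
        grad.append((fn(hi) - fn(lo)) / (2.0 * eps))
    return grad


def refine_boxes(boxes: Sequence[Box], score_fn: Callable[[List[float]], float],
                 T: int = 10, lam: float = 0.5,
                 omega1: float = 1e-3, omega2: float = -0.01) -> List[Box]:
    """Gradient-ascent refinement following the structure of Algorithm 2."""
    current = [list(b) for b in boxes]
    stopped = set()  # A: indices of boxes that hit the early-stop or degeneration test
    for _ in range(T):
        for j in range(len(current)):
            if j in stopped:
                continue
            b = current[j]
            grad = numeric_grad(score_fn, b)
            prev_score = score_fn(b)
            w, h = b[2] - b[0], b[3] - b[1]
            # Line 6: step along the gradient, each component scaled by the box size on its axis.
            b = [c + lam * g * s for c, g, s in zip(b, grad, (w, h, w, h))]
            current[j] = b
            new_score = score_fn(b)
            if abs(prev_score - new_score) < omega1 or new_score - prev_score < omega2:
                stopped.add(j)
    return [tuple(b) for b in current]


gt = (10.0, 10.0, 50.0, 50.0)  # hypothetical target, visible only to the oracle score_fn
start = (12.0, 8.0, 55.0, 47.0)
refined = refine_boxes([start], lambda b: iou(b, gt))[0]
print(round(iou(start, gt), 3), round(iou(refined, gt), 3))
```

As in Algorithm 2, a box is frozen once its score stops changing (Ω_1) or drops by more than the tolerance (Ω_2). This per-box stopping rule is what gives refinement an adaptive number of steps.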
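Equation 2 makes f piecewise bilinear. Fig. 6 depicts PrRoI Pooling as averaging f over a bin: the integral of f over the bin divided by the bin area (x2 − x1)(y2 − y1). Because IC factorizes into two 1-D "hat" functions, that integral has a closed form, which the NumPy sketch below uses. The sketch assumes a single-channel map stored as `feat[j, i]` = w_{i,j} (row index = y) and treats locations outside the map as contributing zero. The function names are illustrative and are not taken from the released code.

```python
import numpy as np


def _hat_integral(t: np.ndarray) -> np.ndarray:
    """Antiderivative of max(0, 1 - |t|): 0 for t <= -1, 1 for t >= 1."""
    t = np.clip(t, -1.0, 1.0)
    return np.where(t < 0.0, 0.5 * (t + 1.0) ** 2, 1.0 - 0.5 * (1.0 - t) ** 2)


def prroi_pool_bin(feat: np.ndarray, x1: float, y1: float, x2: float, y2: float) -> float:
    """Average of the bilinearly interpolated f(x, y) over the bin [x1, x2] x [y1, y2]."""
    h, w = feat.shape
    i = np.arange(w, dtype=float)  # discrete x locations
    j = np.arange(h, dtype=float)  # discrete y locations
    # Exact 1-D integrals of each interpolation coefficient over the bin.
    ax = _hat_integral(x2 - i) - _hat_integral(x1 - i)
    ay = _hat_integral(y2 - j) - _hat_integral(y1 - j)
    total = ay @ feat @ ax  # sum over (i, j) of w_{i,j} times the integral of IC(x, y, i, j)
    return float(total) / ((x2 - x1) * (y2 - y1))


feat = np.tile(np.arange(5, dtype=float), (4, 1))  # f(x, y) = x on a 4x5 map
print(prroi_pool_bin(feat, 0.5, 1.0, 2.5, 3.0))  # 1.5, the mean of x over [0.5, 2.5]
```

No sampling points are involved. The bin corners enter only through the piecewise-quadratic `_hat_integral`, so the output varies continuously with x1, y1, x2, y2.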