CVPR 2021 Object Detection Paper List (70 papers in total)
3D Object Detection
1. 3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D
Object Detection
Abstract:3D object detection is an important yet demanding task that heavily
relies on difficult-to-obtain 3D annotations. To reduce the required amount of
supervision, we propose 3DIoUMatch, a novel semi-supervised method for 3D
object detection applicable to both indoor and outdoor scenes. We leverage a
teacher-student mutual learning framework to propagate information from the
labeled to the unlabeled train set in the form of pseudo-labels. However, due to
the high task complexity, we observe that the pseudo-labels suffer from
significant noise and are thus not directly usable. To that end, we introduce a
confidence-based filtering mechanism, inspired by FixMatch. We set confidence
thresholds based upon the predicted objectness and class probability to filter low-
quality pseudo-labels. While effective, we observe that these two measures do
not sufficiently capture localization quality. We therefore propose to use the
estimated 3D IoU as a localization metric and set category-aware self-adjusted
thresholds to filter poorly localized proposals. We adopt VoteNet as our backbone
detector on indoor datasets while we use PV-RCNN on the autonomous driving
dataset, KITTI. Our method consistently improves state-of-the-art methods on
both ScanNet and SUN-RGBD benchmarks by significant margins under all label
ratios (including the fully labeled setting). For example, when training using only 10%
labeled data on ScanNet, 3DIoUMatch achieves a 7.7 absolute improvement on
mAP@0.25 and an 8.5 absolute improvement on mAP@0.5 over the prior art. On
KITTI, we are the first to demonstrate semi-supervised 3D object detection, and
our method surpasses a fully supervised baseline by 1.8% to 7.6% under
different label ratios and categories.
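
The filtering step described above reduces to a few comparisons per proposal. Below is a minimal Python sketch of confidence- and IoU-based pseudo-label filtering, not the authors' implementation: the proposal fields, default threshold values, and the filter_pseudo_labels helper are illustrative assumptions.

```python
def filter_pseudo_labels(proposals, obj_thresh=0.9, cls_thresh=0.9,
                         iou_thresh_per_class=None):
    """Gate pseudo-labels on confidence and estimated 3D IoU.

    proposals: iterable of dicts with illustrative keys 'objectness',
    'cls_prob', 'pred_iou' (the estimated 3D IoU), and 'category'.
    iou_thresh_per_class: category -> IoU threshold, standing in for the
    paper's category-aware self-adjusted thresholds.
    """
    iou_thresh_per_class = iou_thresh_per_class or {}
    kept = []
    for p in proposals:
        # FixMatch-inspired confidence gating on objectness and class score.
        if p['objectness'] < obj_thresh or p['cls_prob'] < cls_thresh:
            continue
        # Localization gating: the predicted 3D IoU must clear a
        # per-category threshold (0.25 is an assumed default).
        if p['pred_iou'] < iou_thresh_per_class.get(p['category'], 0.25):
            continue
        kept.append(p)
    return kept
```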
2. Categorical Depth Distribution Network for Monocular 3D Object
Detection
Abstract:Monocular 3D object detection is a key problem for autonomous
vehicles, as it provides a solution with simple configuration compared to typical
multi-sensor systems. The main challenge in monocular 3D detection lies in
accurately predicting object depth, which must be inferred from object and scene
cues due to the lack of direct range measurement. Many methods attempt to
directly estimate depth to assist in 3D detection, but show limited performance as
a result of depth inaccuracy. Our proposed solution, Categorical Depth
Distribution Network (CaDDN), uses a predicted categorical depth distribution for
each pixel to project rich contextual feature information to the appropriate depth
interval in 3D space. We then use the computationally efficient bird’s-eye-view
projection and single-stage detector to produce the final output detections. We
design CaDDN as a fully differentiable end-to-end approach for joint depth
estimation and object detection. We validate our approach on the KITTI 3D object
detection benchmark, where we rank 1st among published monocular methods.
We also provide the first monocular 3D detection results on the newly released
Waymo Open Dataset. A code release for CaDDN is publicly available.
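
The central operation, spreading each pixel's features over discrete depth bins according to its predicted categorical distribution, can be sketched in a few lines. This is a simplified illustration, not the released CaDDN code; the function name and array layout are assumptions, and the subsequent frustum-to-voxel warp and bird's-eye-view collapse are omitted.

```python
import numpy as np

def lift_features_to_depth_bins(features, depth_logits):
    """features: (H, W, C) image features; depth_logits: (H, W, K) scores
    over K discrete depth bins. Returns an (H, W, K, C) frustum volume in
    which each pixel's feature is weighted by its depth distribution."""
    # Softmax over the bin axis yields the categorical depth distribution.
    exp = np.exp(depth_logits - depth_logits.max(axis=-1, keepdims=True))
    depth_probs = exp / exp.sum(axis=-1, keepdims=True)          # (H, W, K)
    # Per-pixel outer product: (H, W, K, 1) * (H, W, 1, C).
    return depth_probs[..., :, None] * features[..., None, :]
```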
3. ST3D: Self-training for Unsupervised Domain Adaptation on 3D
Object Detection
Abstract:We present a new domain adaptive self-training pipeline, named ST3D,
for unsupervised domain adaptation on 3D object detection from point clouds.
First, we pre-train the 3D detector on the source domain with our proposed
random object scaling strategy for mitigating the negative effects of source
domain bias. Then, the detector is iteratively improved on the target domain by
alternately conducting two steps: pseudo-label updating with the developed
quality-aware triplet memory bank, and model training with
curriculum data augmentation. These specific designs for 3D object detection
enable the detector to be trained with consistent and high-quality pseudo labels
and to avoid overfitting to the large number of easy examples in pseudo labeled
data. Our ST3D achieves state-of-the-art performance on all evaluated datasets
and even surpasses fully supervised results on KITTI 3D object detection
benchmark. Code will be available at https://github.com/CVMI-Lab/ST3D.
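
Of the pieces above, random object scaling is the simplest to illustrate. The sketch below assumes axis-aligned boxes for brevity (the actual augmentation also handles box heading); the function name and scale range are illustrative.

```python
import numpy as np

def random_object_scaling(points, box, scale_range=(0.9, 1.1)):
    """Rescale one ground-truth object and its interior points by a shared
    random factor, diluting the source domain's object-size bias.

    points: (N, 3) point cloud; box: (cx, cy, cz, l, w, h), axis-aligned.
    """
    cx, cy, cz, l, w, h = box
    center = np.array([cx, cy, cz])
    half = np.array([l, w, h]) / 2.0
    inside = np.all(np.abs(points - center) <= half, axis=1)
    s = np.random.uniform(*scale_range)
    out = points.copy()
    # Scale interior points about the box center; scale the box extents.
    out[inside] = center + (out[inside] - center) * s
    return out, (cx, cy, cz, l * s, w * s, h * s)
```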
4. Center-based 3D Object Detection and Tracking
Abstract:Three-dimensional objects are commonly represented as 3D boxes in a
point-cloud. This representation mimics the well-studied image-based 2D
bounding-box detection but comes with additional challenges. Objects in a 3D
world do not follow any particular orientation, and box-based detectors have
difficulties enumerating all orientations or fitting an axis-aligned bounding box to
rotated objects. In this paper, we instead propose to represent, detect, and track
3D objects as points. Our framework, CenterPoint, first detects centers of objects
using a keypoint detector and regresses to other attributes, including 3D size, 3D
orientation, and velocity. In a second stage, it refines these estimates using
additional point features on the object. In CenterPoint, 3D object tracking
simplifies to greedy closest-point matching. The resulting detection and tracking
algorithm is simple, efficient, and effective. CenterPoint achieved state-of-the-art
performance on the nuScenes benchmark for both 3D detection and tracking,
with 65.5 NDS and 63.8 AMOTA for a single model. On the Waymo Open
Dataset, CenterPoint outperforms all previous single-model methods by a large
margin and ranks first among all Lidar-only submissions. The code and pretrained
models are available at https://github.com/tianweiy/CenterPoint.
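
The "greedy closest-point matching" the tracker reduces to is short enough to write out. The following is an illustrative sketch rather than the released tracker: the 2.0 gating distance and function names are assumptions, and in practice previous centers are first velocity-compensated using the predicted velocities.

```python
import numpy as np

def greedy_center_matching(prev_centers, prev_ids, curr_centers, max_dist=2.0):
    """Greedy closest-center association between consecutive frames.

    prev_centers: (M, 2) previous-frame BEV centers; prev_ids: M track ids;
    curr_centers: (N, 2) current centers. Returns one id per detection.
    """
    ids = [-1] * len(curr_centers)
    if len(prev_centers) and len(curr_centers):
        dist = np.linalg.norm(curr_centers[:, None] - prev_centers[None], axis=-1)
        used_prev, used_curr = set(), set()
        # Visit candidate (current, previous) pairs from closest to farthest.
        for idx in np.argsort(dist, axis=None):
            i, j = np.unravel_index(idx, dist.shape)
            if dist[i, j] > max_dist:
                break                       # every remaining pair is farther
            if i in used_curr or j in used_prev:
                continue
            ids[i] = prev_ids[j]
            used_curr.add(i)
            used_prev.add(j)
    next_id = max(prev_ids, default=-1) + 1
    for i, tid in enumerate(ids):           # unmatched detections start tracks
        if tid == -1:
            ids[i] = next_id
            next_id += 1
    return ids
```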
2D Object Detection
5. Scaled-YOLOv4: Scaling Cross Stage Partial Network
Abstract:We show that the YOLOv4 object detection neural network, based on
the CSP approach, scales both up and down and is applicable to small and large
networks while maintaining optimal speed and accuracy. We propose a network
scaling approach that modifies not only the depth, width, and resolution, but also
the structure of the network. The YOLOv4-large model achieves state-of-the-art
results: 55.5% AP (73.4% AP50) on the MS COCO dataset at a speed of ~16 FPS
on a Tesla V100, while with test-time augmentation YOLOv4-large achieves
56.0% AP (73.3% AP50). To the best of our knowledge, this is currently the
highest accuracy on the COCO dataset among any published work. The
YOLOv4-tiny model achieves 22.0% AP (42.0% AP50) at a speed of ~443 FPS
on an RTX 2080Ti, and with TensorRT, batch size 4, and FP16 precision,
YOLOv4-tiny achieves 1774 FPS.
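
As a rough illustration of what scaling a stage up or down means in practice, here is a hedged sketch of width/depth scaling with hardware-friendly channel rounding; the multipliers, rounding rule, and helper name are assumptions, and the paper's method additionally adapts input resolution and the network structure itself.

```python
import math

def scale_csp_stage(base_channels, base_blocks, width_mult, depth_mult):
    """Scale one stage's channel width and block depth."""
    # Round channels up to a multiple of 8 so layers stay hardware friendly.
    channels = max(8, int(math.ceil(base_channels * width_mult / 8)) * 8)
    blocks = max(1, round(base_blocks * depth_mult))
    return channels, blocks

# Example: growing a 128-channel, 3-block stage for a larger variant.
print(scale_csp_stage(128, 3, width_mult=1.25, depth_mult=1.33))  # (160, 4)
```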
6. You Only Look One-level Feature
Abstract:This paper revisits feature pyramid networks (FPN) for one-stage
detectors and points out that the success of FPN is due to its divide-and-conquer
solution to the optimization problem in object detection rather than multi-scale
feature fusion. From the perspective of optimization, we introduce an alternative
way to address the problem instead of adopting complex feature pyramids:
utilizing only a single-level feature for detection. Based on this simple and efficient
solution, we present You Only Look One-level Feature (YOLOF). In our method,
two key components, Dilated Encoder and Uniform Matching, are proposed and
bring considerable improvements. Extensive experiments on the COCO
benchmark prove the effectiveness of the proposed model. Our YOLOF achieves
comparable results with its feature pyramids counterpart RetinaNet while being
2.5× faster. Without transformer layers, YOLOF can match the performance of
DETR in a single-level feature manner with 7× fewer training epochs. Code is
available at https://github.com/megvii-model/YOLOF.
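
Of the two components, Uniform Matching is easy to convey in code: each ground-truth box takes its k nearest anchors as positives, so objects of every size receive the same number of positive samples on the single feature level. The sketch below is illustrative (YOLOF additionally filters matches by IoU, which is omitted here):

```python
import numpy as np

def uniform_matching(anchor_centers, gt_centers, k=4):
    """anchor_centers: (A, 2); gt_centers: (G, 2).
    Returns (G, k): indices of each ground truth's k positive anchors,
    chosen purely by center distance on the single feature level."""
    dist = np.linalg.norm(gt_centers[:, None] - anchor_centers[None], axis=-1)
    return np.argsort(dist, axis=1)[:, :k]
```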
7. Sparse R-CNN: End-to-End Object Detection with Learnable
Proposals
8. End-to-End Object Detection with Fully Convolutional Network
9. Dynamic Head: Unifying Object Detection Heads with Attentions
Abstract:The complex nature of combining localization and classification in
object detection has resulted in a flourishing development of methods. Previous
works have tried to improve performance with various object detection heads but
failed to present a unified view. In this paper, we present a novel dynamic head
framework to unify object detection heads with attentions. By coherently
combining multiple self-attention mechanisms between feature levels for scale-
awareness, among spatial locations for spatial-awareness, and within output
channels for task-awareness, the proposed approach significantly improves the
representation ability of object detection heads without any computational
overhead. Further experiments demonstrate the effectiveness and efficiency of
the proposed dynamic head on the COCO benchmark. With a standard
ResNeXt-101-DCN backbone, we largely improve the performance over popular
object detectors and achieve a new state-of-the-art at 54.0 AP. The code will be
released at https://github.com/microsoft/DynamicHead.
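
To make the unified view concrete, the sketch below treats the pyramid features as a (levels, spatial positions, channels) tensor and chains three attentions over its three axes. It is a deliberately simplified stand-in: the paper realizes the spatial step with deformable convolution and the task step with a dynamic-ReLU-style function, for which plain sigmoid gates are substituted here.

```python
import torch
import torch.nn as nn

class SimplifiedDynamicHead(nn.Module):
    """Toy dynamic-head sketch: sequential scale-, spatial-, and
    task-aware gating over a (L levels, S positions, C channels) tensor."""
    def __init__(self, channels):
        super().__init__()
        self.level_fc = nn.Linear(channels, 1)        # scale-aware weight
        self.spatial_fc = nn.Linear(channels, 1)      # spatial-aware weight
        self.task_fc = nn.Linear(channels, channels)  # task-aware gate

    def forward(self, feats):                         # feats: (L, S, C)
        # Scale-aware: one sigmoid weight per level from its mean feature.
        w_level = torch.sigmoid(self.level_fc(feats.mean(dim=1)))   # (L, 1)
        feats = feats * w_level.unsqueeze(1)
        # Spatial-aware: one sigmoid weight per (level, location).
        feats = feats * torch.sigmoid(self.spatial_fc(feats))       # (L, S, 1)
        # Task-aware: channel-wise gating shared over all locations.
        return feats * torch.sigmoid(self.task_fc(feats.mean(dim=(0, 1))))
```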
10. Generalized Focal Loss V2: Learning Reliable Localization Quality
Estimation for Dense Object Detection
Abstract:Localization Quality Estimation (LQE) is crucial and popular in the
recent advancement of dense object detectors since it can provide accurate
ranking scores that benefit the Non-Maximum Suppression processing and
improve detection performance. As a common practice, most existing methods
predict LQE scores through vanilla convolutional features shared with object
classification or bounding box regression. In this paper, we explore a completely
novel and different perspective to perform LQE: based on the learned
distributions of the four parameters of the bounding box. The bounding box
distributions, introduced as the "General Distribution" in GFLV1, describe
the uncertainty of the predicted bounding boxes well. Such a
property makes the distribution statistics of a bounding box highly correlated to its
real localization quality. Specifically, a bounding box distribution with a sharp peak
usually corresponds to high localization quality, and vice versa. By leveraging the
close correlation between distribution statistics and the real localization quality,
we develop a considerably lightweight Distribution-Guided Quality Predictor
(DGQP) for reliable LQE based on GFLV1, thus producing GFLV2. To the best of
our knowledge, it is the first attempt in object detection to use a highly relevant,
statistical representation to facilitate LQE. Extensive experiments demonstrate
the effectiveness of our method. Notably, GFLV2 (ResNet101) achieves 46.2 AP
at 14.6 FPS, surpassing the previous state-of-the-art ATSS baseline (43.6 AP at
14.6 FPS) by an absolute 2.6 AP on COCO test-dev, without sacrificing
efficiency in either training or inference.
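
A compact PyTorch sketch of a DGQP-style head follows; layer sizes and names are assumptions based on the description above (Top-k statistics of the four box-side distributions feeding a tiny subnetwork), not the released code.

```python
import torch
import torch.nn as nn

class DGQP(nn.Module):
    """Map the shape of the learned box-side distributions to a scalar
    localization-quality score: sharp, peaked distributions score high."""
    def __init__(self, topk=4, hidden=64):
        super().__init__()
        self.topk = topk
        self.fc = nn.Sequential(
            nn.Linear(4 * (topk + 1), hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, box_logits):           # (N, 4, bins): 4 box sides
        probs = box_logits.softmax(dim=-1)
        top = probs.topk(self.topk, dim=-1).values          # (N, 4, topk)
        # Top-k values plus their mean summarize each side's distribution.
        stats = torch.cat([top, top.mean(dim=-1, keepdim=True)], dim=-1)
        return self.fc(stats.flatten(1))                    # (N, 1) quality
```

The predicted quality score would then be combined with the classification score to produce the ranking score that benefits NMS, as the abstract describes.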
11. UP-DETR: Unsupervised Pre-training for Object Detection with
Transformers
Abstract:Object detection with transformers (DETR) reaches competitive
performance with Faster R-CNN via a transformer encoder-decoder architecture.
Inspired by the great success of pre-training transformers in natural language
processing, we propose a pretext task named random query patch detection to
Unsupervisedly Pre-train DETR (UP-DETR) for object detection. Specifically, we
randomly crop patches from the given image and then feed them as queries to
the decoder. The model is pre-trained to detect these query patches from the
original image. During the pre-training, we address two critical issues: multi-task
learning and multi-query localization. (1) To trade off classification and localization
preferences in the pretext task, we freeze the CNN backbone and propose a
patch feature reconstruction branch which is jointly optimized with patch
detection. (2) To perform multi-query localization, we introduce UP-DETR from
single-query patch and extend it to multi-query patches with object query shuffle
and attention mask. In our experiments, UP-DETR significantly boosts the
performance of DETR with faster convergence and higher average precision on
object detection, one-shot detection and panoptic segmentation. Code and
pre-trained models: https://github.com/dddzg/up-detr.
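
The pretext-task data preparation is straightforward to sketch: crop random patches and record their normalized boxes as targets. The snippet below is illustrative only; the crop-size ranges and function name are assumptions, and the real pipeline further encodes each patch with the frozen CNN backbone and applies query shuffle and attention masks for the multi-query case.

```python
import numpy as np

def sample_query_patches(image, num_patches=10, rng=None):
    """Build a random-query-patch-detection pretext sample.

    image: (H, W, 3) array. Random crops become decoder queries; their
    normalized boxes become regression targets, so the model learns to
    localize "this patch" in "this image" without any labels.
    """
    rng = rng or np.random.default_rng()
    H, W = image.shape[:2]
    patches, boxes = [], []
    for _ in range(num_patches):
        h = int(rng.integers(H // 8, H // 2))
        w = int(rng.integers(W // 8, W // 2))
        y = int(rng.integers(0, H - h))
        x = int(rng.integers(0, W - w))
        patches.append(image[y:y + h, x:x + w])
        # (cx, cy, w, h), normalized: DETR's box parameterization.
        boxes.append(((x + w / 2) / W, (y + h / 2) / H, w / W, h / H))
    return patches, np.array(boxes)
```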
12. MobileDets: Searching for Object Detection Architectures for
Mobile Accelerators
Abstract:Inverted bottleneck layers, which are built upon depth-wise
convolutions, have been the predominant building blocks in state-of-the-art object
detection models on mobile devices. In this work, we investigate the optimality of
this design pattern over a broad range of mobile accelerators by revisiting the
usefulness of regular convolutions. We discover that regular convolutions are a
potent component to boost the latency-accuracy trade-off for object detection on
accelerators, provided that they are placed strategically in the network via neural
architecture search. By incorporating regular convolutions in the search space
and directly optimizing the network architectures for object detection, we obtain a
family of object detection models, MobileDets, that achieve state-of-the-art results
across mobile accelerators. On the COCO object detection task, MobileDets
outperform MobileNetV3+SSDLite by 1.7 mAP at comparable mobile CPU
inference latencies. MobileDets also outperform MobileNetV2+SSDLite by 1.9
mAP on mobile CPUs, 3.7 mAP on Google EdgeTPU, 3.4 mAP on Qualcomm
Hexagon DSP and 2.7 mAP on Nvidia Jetson GPU without increasing latency.
Moreover, MobileDets are comparable with the state-of-the-art MnasFPN on
mobile CPUs even without using the feature pyramid, and achieve better mAP
scores on both EdgeTPUs and DSPs with up to 2× speedup. Code and models
are available in the TensorFlow Object Detection API [16]:
https://github.com/tensorflow/models/tree/master/research/object_detection.
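
One concrete example of "regular convolutions placed strategically" is a fused variant of the inverted bottleneck, which replaces the 1×1 expansion plus depthwise convolution with a single regular 3×3 convolution. The PyTorch block below is a hedged sketch of that kind of search-space candidate, not an architecture actually found by the search.

```python
import torch.nn as nn

def fused_inverted_bottleneck(cin, cout, expansion=4, stride=1):
    """A regular-convolution block candidate: expand with a full 3x3 conv
    (instead of 1x1 expand + 3x3 depthwise), then project with a 1x1 conv.
    Costs more FLOPs but maps well onto EdgeTPU/DSP-class accelerators."""
    mid = cin * expansion
    return nn.Sequential(
        nn.Conv2d(cin, mid, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(mid),
        nn.ReLU6(inplace=True),
        nn.Conv2d(mid, cout, 1, bias=False),
        nn.BatchNorm2d(cout),
    )
```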
13. Tracking Pedestrian Heads in Dense Crowd
Abstract:Tracking humans in crowded video sequences is an important
constituent of visual scene understanding. Increasing crowd density challenges
visibility of humans, limiting the scalability of existing pedestrian trackers to higher
crowd densities. For that reason, we propose to revitalize head tracking with
Crowd of Heads Dataset (CroHD), consisting of 9 sequences of 11,463 frames
with over 2,276,838 heads and 5,230 tracks annotated in diverse scenes. For
evaluation, we propose a new metric, IDEucl, to measure an algorithm's efficacy
in preserving a unique identity for the longest stretch in image coordinate space,
thus building a correspondence between pedestrian crowd motion and the
performance of a tracking algorithm. Moreover, we also propose a new head
detector, HeadHunter, which is designed for small head detection in crowded
scenes. We extend HeadHunter with a Particle Filter and a color-histogram-based
re-identification module for head tracking. To establish this as a strong baseline,
we compare our tracker with existing state-of-the-art pedestrian trackers on CroHD.
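
Based only on the description above, one possible formalization of IDEucl is the fraction of a pedestrian's traveled image-space distance that is covered under its most persistent predicted identity; the paper's exact definition may differ. The sketch below encodes that reading, with hypothetical input conventions.

```python
import numpy as np

def ideucl_per_track(gt_positions, pred_ids):
    """gt_positions: (T, 2) head positions of one ground-truth track;
    pred_ids: length-T predicted identity per frame (-1 = missed).
    Returns the fraction of traveled Euclidean distance covered by the
    single best-matching predicted identity."""
    steps = np.linalg.norm(np.diff(gt_positions, axis=0), axis=1)  # (T-1,)
    total = steps.sum()
    if total == 0:
        return 1.0
    best = 0.0
    for tid in set(pred_ids) - {-1}:
        # Distance traveled while this identity was continuously assigned.
        mask = np.array([a == b == tid for a, b in zip(pred_ids, pred_ids[1:])])
        best = max(best, steps[mask].sum())
    return best / total
```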