objectdetectionsurveydeeplearningpart.pdf (resource page, CSDN Library)


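Object detection is a fundamental and challenging problem in computer vision: the goal is to locate instances of objects from predefined categories in natural images. It is a core step in many image processing tasks and plays a vital role in areas such as intelligent surveillance, autonomous driving, and robot vision. In recent years, the rapid development of deep learning has brought breakthrough progress to the field. Convolutional neural networks (CNNs) have become a powerful tool for learning feature representations, so features learned directly from large-scale image datasets have become increasingly important for object detection. The impact of deep learning is not limited to individual algorithms; it extends to detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics. The survey covers more than 300 research contributions spanning many aspects of generic object detection, including but not limited to the following:

1. Detection frameworks: the framework is the skeleton of a detection algorithm and determines how the individual components are organized and combined to perform detection. Common deep learning detection frameworks include the R-CNN family (including Faster R-CNN), the YOLO family, and SSD.
2. Object feature representation: feature representation is a key step in object detection. Deep networks extract the salient information in an image and form effective feature descriptors. This covers how convolutional neural networks capture spatial features and how features from different layers are fused to improve detection accuracy.
3. Object proposal generation: region proposal methods select the regions of an image that are likely to contain objects, so that the detector runs on a small set of candidate regions and the overall computation is reduced. Representative proposal algorithms include Selective Search and Edge Boxes.
4. Context modeling: context plays an important role in object detection, helping an algorithm understand the overall structure of a scene and the relationships among objects. Deep learning can model scene context through techniques such as multi-scale feature fusion and attention mechanisms.
5. Training strategies: training a deep model involves selecting positive and negative samples, choosing a loss function, and implementing the optimization algorithm. Multi-task learning, online hard example mining, and data augmentation are also important for improving generalization.
6. Evaluation metrics: detection performance is measured with precision, recall, and mean average precision (mAP). Taken together, these metrics indicate an algorithm's accuracy, robustness, and generalization ability.

The authors conclude by identifying promising directions for future research, as guidance for subsequent work. These directions may include further improving detection speed and accuracy, reducing the dependence on large amounts of annotated data, and designing detectors that are more robust in complex environments. Deep learning has achieved remarkable results in object detection, but open challenges remain, and detection methods are expected to keep becoming faster, more accurate, and more intelligent.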
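The evaluation metrics listed in item 6 can be made concrete with a minimal sketch. It assumes each detection has already been marked as a true or false positive (the matching rule against ground truth is not described here and is left out), and it uses one common way of computing average precision, the area under an interpolated precision-recall curve. The function names `precision_recall`, `average_precision` and `mean_average_precision` are illustrative, not taken from the survey.

```python
from typing import List, Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


def average_precision(scored_hits: List[Tuple[float, bool]], num_gt: int) -> float:
    """Area under the interpolated precision-recall curve for one class.

    scored_hits: (confidence, is_true_positive) for every detection.
                 The true/false-positive flags are assumed to be given.
    num_gt:      number of ground-truth objects of this class.
    """
    if num_gt == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(scored_hits, key=lambda item: item[0], reverse=True)
    tp = fp = 0
    points = []  # (recall, precision) after each ranked detection
    for _, is_hit in ranked:
        if is_hit:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_gt, tp / (tp + fp)))
    # Interpolate: precision at a recall level is the best precision
    # achieved at that recall or any higher recall.
    best = 0.0
    interpolated = []
    for recall, precision in reversed(points):
        best = max(best, precision)
        interpolated.append((recall, best))
    interpolated.reverse()
    # Sum rectangle areas between consecutive recall values.
    ap = 0.0
    prev_recall = 0.0
    for recall, precision in interpolated:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(per_class_ap: List[float]) -> float:
    """mAP is the mean of the per-class average precision values."""
    return sum(per_class_ap) / len(per_class_ap) if per_class_ap else 0.0
```

For example, three detections ranked hit, miss, hit against two ground-truth objects give an average precision of 5/6 under this interpolation.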


International Journal of Computer Vision (2020) 128:261–318

https://doi.org/10.1007/s11263-019-01247-4

Deep Learning for Generic Object Detection: A Survey

Li Liu (1,2) · Wanli Ouyang · Xiaogang Wang · Paul Fieguth · Jie Chen · Xinwang Liu · Matti Pietikäinen

Received: 6 September 2018 / Accepted: 26 September 2019 / Published online: 31 October 2019

Abstract

Object detection, one of the most fundamental and challenging problems in computer vision, seeks to locate object instances from a large number of predefined categories in natural images. Deep learning techniques have emerged as a powerful strategy for learning feature representations directly from data and have led to remarkable breakthroughs in the field of generic object detection. Given this period of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought about by deep learning techniques. More than 300 research contributions are included in this survey, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics. We finish the survey by identifying promising directions for future research.

Keywords Object detection · Deep learning · Convolutional neural networks · Object recognition

Communicated by Bernt Schiele.

Li Liu: li.liu@oulu.fi
Wanli Ouyang: wanli.ouyang@sydney.edu.au
Xiaogang Wang: xgwang@ee.cuhk.edu.hk
Paul Fieguth: pfieguth@uwaterloo.ca
Jie Chen: jie.chen@oulu.fi
Xinwang Liu: xinwangliu@nudt.edu.cn
Matti Pietikäinen: matti.pietikainen@oulu.fi

National University of Defense Technology, Changsha, China
University of Oulu, Oulu, Finland
University of Sydney, Camperdown, Australia
Chinese University of Hong Kong, Sha Tin, China
University of Waterloo, Waterloo, Canada

1 Introduction

As a longstanding, fundamental and challenging problem in computer vision, object detection (illustrated in Fig. 1) has been an active area of research for several decades (Fischler and Elschlager 1973). The goal of object detection is to determine whether there are any instances of objects from given categories (such as humans, cars, bicycles, dogs or cats) in an image and, if present, to return the spatial location and extent of each object instance (e.g., via a bounding box; Everingham et al. 2010; Russakovsky et al. 2015). As the cornerstone of image understanding and computer vision, object detection forms the basis for solving complex or high-level vision tasks such as segmentation, scene understanding, object tracking, image captioning, event detection, and activity recognition. Object detection supports a wide range of applications, including robot vision, consumer electronics, security, autonomous driving, human-computer interaction, content-based image retrieval, intelligent video surveillance, and augmented reality.

Recently, deep learning techniques (Hinton and Salakhutdinov 2006; LeCun et al. 2015) have emerged as powerful methods for learning feature representations automatically from data. In particular, these techniques have provided major improvements in object detection, as illustrated in Fig. 3.

As illustrated in Fig. 2, object detection can be grouped into one of two types (Grauman and Leibe 2011; Zhang et al. 2013): detection of specific instances versus the detection of broad categories. The first type aims to detect instances of a particular object (such as Donald Trump’s face, the Eiffel Tower, or a neighbor’s dog), essentially a matching problem. The goal of the second type is to detect (usually previously unseen) instances of some predefined object categories (for example humans, cars, bicycles, and dogs). Historically, much of the effort in the field of object detection has focused on the detection of a single category (typically faces and pedestrians) or a few specific categories. In contrast, over the past several years, the research community has started moving towards the more challenging goal of building general-purpose object detection systems where the breadth of object detection ability rivals that of humans.

Fig. 1 Most frequent keywords in ICCV and CVPR conference papers from 2016 to 2018. The size of each word is proportional to the frequency of that keyword. We can see that object detection has received significant attention in recent years

Fig. 2 Object detection includes localizing instances of a particular object (top), as well as generalizing to detecting object categories in general (bottom). This survey focuses on recent advances for the latter problem of generic object detection

Krizhevsky et al. (2012a) proposed a Deep Convolutional Neural Network (DCNN) called AlexNet which achieved record-breaking image classification accuracy in the Large Scale Visual Recognition Challenge (ILSVRC) (Russakovsky et al. 2015). Since that time, the research focus in most aspects of computer vision has been specifically on deep learning methods, indeed including the domain of generic object detection (Girshick et al. 2014; He et al. 2014; Girshick 2015; Sermanet et al. 2014; Ren et al. 2017). Although tremendous progress has been achieved, illustrated in Fig. 3, we are unaware of comprehensive surveys of this subject over the past 5 years. Given the exceptionally rapid rate of progress, this article attempts to track recent advances and summarize their achievements in order to gain a clearer picture of the current panorama in generic object detection.

Fig. 3 An overview of recent object detection performance: we can observe a significant improvement in performance (measured as mean average precision) since the arrival of deep learning in 2012. a Detection results of winning entries in the VOC2007-2012 competitions, and b top object detection competition results in ILSVRC2013-2017 (results in both panels use only the provided training data)

1.1 Comparison with Previous Reviews

Many notable object detection surveys have been published, as summarized in Table 1. These include many excellent surveys on the problem of specific object detection, such as pedestrian detection (Enzweiler and Gavrila 2009; Geronimo et al. 2010; Dollar et al. 2012), face detection (Yang et al. 2002; Zafeiriou et al. 2015), vehicle detection (Sun et al. 2006) and text detection (Ye and Doermann 2015). There are comparatively few recent surveys focusing directly on the problem of generic object detection, except for the work by Zhang et al. (2013) who conducted a survey on the topic of object class detection. However, the research reviewed in Grauman and Leibe (2011), Andreopoulos and Tsotsos (2013) and Zhang et al. (2013) is mostly pre-2012, and therefore prior to the recent striking success and dominance of deep learning and related methods.

Deep learning allows computational models to learn fantastically complex, subtle, and abstract representations, driving significant progress in a broad range of problems such as visual recognition, object detection, speech recognition, natural language processing, medical image analysis, drug discovery and genomics. Among different types of deep neural networks, DCNNs (LeCun et al. 1998, 2015; Krizhevsky et al. 2012a) have brought about breakthroughs in processing images, video, speech and audio. To be sure, there have been many published surveys on deep learning, including those of Bengio et al. (2013), LeCun et al. (2015), Litjens et al. (2017) and Gu et al. (2018), and more recently tutorials at ICCV and CVPR.

In contrast, although many deep learning based methods have been proposed for object detection, we are unaware of any comprehensive recent survey. A thorough review and summary of existing work is essential for further progress in object detection, particularly for researchers wishing to enter the field. Since our focus is on generic object detection, the extensive work on DCNNs for specific object detection, such as face detection (Li et al. 2015a; Zhang et al. 2016a; Hu et al. 2017), pedestrian detection (Zhang et al. 2016b; Hosang et al. 2015), vehicle detection (Zhou et al. 2016b) and traffic sign detection (Zhu et al. 2016b) will not be considered.

Table 1 Summary of related object detection surveys since 2000

| No. | Survey title | References | Year | Venue | Content |
|-----|--------------|------------|------|-------|---------|
| 1 | Monocular pedestrian detection: survey and experiments | Enzweiler and Gavrila (2009) | 2009 | PAMI | An evaluation of three pedestrian detectors |
| 2 | Survey of pedestrian detection for advanced driver assistance systems | Geronimo et al. (2010) | 2010 | PAMI | A survey of pedestrian detection for advanced driver assistance systems |
| 3 | Pedestrian detection: an evaluation of the state of the art | Dollar et al. (2012) | 2012 | PAMI | A thorough and detailed evaluation of detectors in monocular images |
| 4 | Detecting faces in images: a survey | Yang et al. (2002) | 2002 | PAMI | First survey of face detection from a single image |
| 5 | A survey on face detection in the wild: past, present and future | Zafeiriou et al. (2015) | 2015 | CVIU | A survey of face detection in the wild since 2000 |
| 6 | On road vehicle detection: a review | Sun et al. (2006) | 2006 | PAMI | A review of vision based on-road vehicle detection systems |
| 7 | Text detection and recognition in imagery: a survey | Ye and Doermann (2015) | 2015 | PAMI | A survey of text detection and recognition in color imagery |
| 8 | Toward category level object recognition | Ponce et al. (2007) | 2007 | Book | Representative papers on object categorization, detection, and segmentation |
| 9 | The evolution of object categorization and the challenge of image abstraction | Dickinson et al. (2009) | 2009 | Book | A trace of the evolution of object categorization over 4 decades |
| 10 | Context based object categorization: a critical survey | Galleguillos and Belongie (2010) | 2010 | CVIU | A review of contextual information for object categorization |
| 11 | 50 years of object recognition: directions forward | Andreopoulos and Tsotsos (2013) | 2013 | CVIU | A review of the evolution of object recognition systems over 5 decades |
| 12 | Visual object recognition | Grauman and Leibe (2011) | 2011 | Tutorial | Instance and category object recognition techniques |
| 13 | Object class detection: a survey | Zhang et al. (2013) | 2013 | ACM CS | Survey of generic object detection methods before 2011 |
| 14 | Feature representation for statistical learning based object detection: a review | Li et al. (2015b) | 2015 | PR | Feature representation methods in statistical learning based object detection, including handcrafted and deep learning based features |
| 15 | Salient object detection: a survey | Borji et al. (2014) | 2014 | arXiv | A survey for salient object detection |
| 16 | Representation learning: a review and new perspectives | Bengio et al. (2013) | 2013 | PAMI | Unsupervised feature learning and deep learning, probabilistic models, autoencoders, manifold learning, and deep networks |
| 17 | Deep learning | LeCun et al. (2015) | 2015 | Nature | An introduction to deep learning and applications |
| 18 | A survey on deep learning in medical image analysis | Litjens et al. (2017) | 2017 | MIA | A survey of deep learning for image classification, object detection, segmentation and registration in medical image analysis |
| 19 | Recent advances in convolutional neural networks | Gu et al. (2018) | 2017 | PR | A broad survey of the recent advances in CNN and its applications in computer vision, speech and natural language processing |
| 20 | Tutorial: tools for efficient object detection | - | 2015 | ICCV15 | A short course for object detection only covering recent milestones |
| 21 | Tutorial: deep learning for objects and scenes | - | 2017 | CVPR17 | A high level summary of recent work on deep learning for visual recognition of objects and scenes |
| 22 | Tutorial: instance level recognition | - | 2017 | ICCV17 | A short course of recent advances on instance level recognition, including object detection, instance segmentation and human pose prediction |
| 23 | Tutorial: visual recognition and beyond | - | 2018 | CVPR18 | A tutorial on methods and principles behind image classification, object detection, instance segmentation, and semantic segmentation |
| 24 | Deep learning for generic object detection | Ours | 2019 | VISI | A comprehensive survey of deep learning for generic object detection |

1.2 Scope

The number of papers on generic object detection based on deep learning is breathtaking. There are so many, in fact, that compiling any comprehensive review of the state of the art is beyond the scope of any reasonable length paper. As a result, it is necessary to establish selection criteria, and we have limited our focus to top journal and conference papers. Due to these limitations, we sincerely apologize to those authors whose works are not included in this paper. For surveys of work on related topics, readers are referred to the articles in Table 1. This survey focuses on major progress of the last 5 years, and we restrict our attention to still pictures, leaving the important subject of video object detection as a topic for separate consideration in the future.

The main goal of this paper is to offer a comprehensive survey of deep learning based generic object detection techniques, and to present some degree of taxonomy, a high level perspective and organization, primarily on the basis of popular datasets, evaluation metrics, context modeling, and detection proposal methods. The intention is that our categorization be helpful for readers to have an accessible understanding of similarities and differences between a wide variety of strategies. The proposed taxonomy gives researchers a framework to understand current research and to identify open challenges for future research.

The remainder of this paper is organized as follows. Related background and the progress made during the last 2 decades are summarized in Sect. 2. A brief introduction to deep learning is given in Sect. 3. Popular datasets and evaluation criteria are summarized in Sect. 4. We describe the milestone object detection frameworks in Sect. 5. From Sects. 6 to 9, fundamental sub-problems and the relevant issues involved in designing object detectors are discussed. Finally, in Sect. 10, we conclude the paper with an overall discussion of object detection, state-of-the-art performance, and future research directions.

2 Generic Object Detection

2.1 The Problem

Generic object detection, also called generic object category detection, object class detection, or object category detection (Zhang et al. 2013), is defined as follows. Given an image, determine whether or not there are instances of objects from predefined categories (usually many categories, e.g., 200 categories in the ILSVRC object detection challenge) and, if present, return the spatial location and extent of each instance. A greater emphasis is placed on detecting a broad range of natural categories, as opposed to specific object category detection where only a narrower predefined category of interest (e.g., faces, pedestrians, or cars) may be present. Although thousands of objects occupy the visual world in which we live, currently the research community is primarily interested in the localization of highly structured objects (e.g., cars, faces, bicycles and airplanes) and articulated objects (e.g., humans, cows and horses) rather than unstructured scenes (such as sky, grass and cloud).

Fig. 4 Recognition problems related to generic object detection: a image level object classification, b bounding box level generic object detection, c pixel-wise semantic segmentation, d instance level semantic segmentation
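The problem statement above can be read as an output contract: for one image, a detector returns zero or more instances, each carrying a category from the predefined set together with a location and extent. The sketch below is one hypothetical way to represent that contract. The `Detection` class, the `keep_valid` helper and the confidence `score` field are illustrative assumptions and do not come from the survey.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Detection:
    category: str                           # one of the predefined categories
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
    score: float                            # assumed confidence in [0, 1]


def keep_valid(candidates: Iterable[Detection],
               categories: Set[str],
               min_score: float = 0.5) -> List[Detection]:
    """Keep candidates from a predefined category that are confident enough.

    An empty result means that no instance of any category of interest
    was found in the image.
    """
    return [d for d in candidates
            if d.category in categories and d.score >= min_score]


candidates = [
    Detection("dog", (10.0, 20.0, 110.0, 220.0), 0.9),
    Detection("unicorn", (0.0, 0.0, 50.0, 50.0), 0.95),  # not a predefined category
    Detection("car", (30.0, 40.0, 90.0, 80.0), 0.2),     # below the threshold
]
kept = keep_valid(candidates, {"dog", "car"})
```

The threshold of 0.5 is arbitrary and only serves the example.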

The spatial location and extent of an object can be defined coarsely using a bounding box (an axis-aligned rectangle tightly bounding the object) (Everingham et al. 2010; Russakovsky et al. 2015), a precise pixelwise segmentation mask (Zhang et al. 2013), or a closed boundary (Lin et al. 2014; Russell et al. 2008), as illustrated in Fig. 4. To the best of our knowledge, for the evaluation of generic object detection algorithms, it is bounding boxes which are most widely used in the current literature (Everingham et al. 2010; Russakovsky et al. 2015), and therefore this is also the approach we adopt in this survey. However, as the research community moves towards deeper scene understanding (from image level object classification to single object localization, to generic object detection, and to pixelwise object segmentation), it is anticipated that future challenges will be at the pixel level (Lin et al. 2014).
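To illustrate why a bounding box is the coarser of the two descriptions, the following sketch derives the tight axis-aligned box from a binary pixel mask and compares the two areas. It assumes the mask is a list of rows of 0/1 values and that box coordinates are inclusive pixel indices. The names `mask_to_box` and `box_area` are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def mask_to_box(mask: List[List[int]]) -> Optional[Box]:
    """Return the tight axis-aligned box around all nonzero pixels.

    Returns None when the mask contains no object pixels.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return (min(cols), rows[0], max(cols), rows[-1])


def box_area(box: Box) -> int:
    """Number of pixels covered by an inclusive-coordinate box."""
    x_min, y_min, x_max, y_max = box
    return (x_max - x_min + 1) * (y_max - y_min + 1)


# An L-shaped object: the mask covers 3 pixels, its bounding box covers 4.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
box = mask_to_box(mask)
mask_pixels = sum(sum(row) for row in mask)
```

The gap between `box_area(box)` and `mask_pixels` is the background that a box includes and a mask excludes, and it grows for thin or irregular objects.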

There are many problems closely related to that of generic object detection. The goal of object classification or object categorization (Fig. 4a) is to assess the presence of objects from a given set of object classes in an image; i.e., assigning one or more object class labels to a given image, determining the presence without the need of location. The additional requirement to locate the instances in an image makes detection a more challenging task than classification. The object recognition problem denotes the more general problem of identifying/localizing all the objects present in an image, subsuming the problems of object detection and classification (Everingham et al. 2010; Russakovsky et al. 2015; Opelt et al. 2006; Andreopoulos and Tsotsos 2013). Generic object detection is closely related to semantic image segmentation (Fig. 4c), which aims to assign each pixel in an image to a semantic class label. Object instance segmentation (Fig. 4d) aims to distinguish different instances of the same object class, as opposed to semantic segmentation which does not.

Footnote: To the best of our knowledge, there is no universal agreement in the literature on the definitions of various vision subtasks. Terms such as detection, localization, recognition, classification, categorization, verification, identification, annotation, labeling, and understanding are often differently defined (Andreopoulos and Tsotsos 2013).

Fig. 5 Taxonomy of challenges in generic object detection
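The relationship among these tasks can be sketched by starting from the richest output, instance segmentation, and discarding information step by step. In the sketch below, merging instance masks per class yields a semantic label map, and dropping location altogether yields image-level labels. The types and names (`Instance`, `to_semantic_map`, `to_image_labels`, `BACKGROUND`) are hypothetical, and letting later instances overwrite earlier ones on overlapping pixels is an arbitrary assumption.

```python
from typing import List, Set, Tuple

Mask = List[List[int]]       # rows of 0/1 values, same size as the image
Instance = Tuple[str, Mask]  # (class label, mask of one object instance)

BACKGROUND = "background"


def to_semantic_map(instances: List[Instance],
                    height: int, width: int) -> List[List[str]]:
    """Merge instance masks into one class label per pixel.

    Instance identity is lost: two dogs become the single label "dog".
    """
    label_map = [[BACKGROUND] * width for _ in range(height)]
    for label, mask in instances:
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    label_map[y][x] = label  # later instances overwrite earlier ones
    return label_map


def to_image_labels(instances: List[Instance]) -> Set[str]:
    """Image-level classification keeps only which classes are present."""
    return {label for label, _ in instances}


# Two separate dogs in a 1 x 4 image.
instances = [
    ("dog", [[1, 0, 0, 0]]),
    ("dog", [[0, 0, 1, 0]]),
]
semantic = to_semantic_map(instances, height=1, width=4)
labels = to_image_labels(instances)
```

Here the instance-level output distinguishes two dogs, the semantic map marks two "dog" pixels without separating them, and the image-level output records only that the class "dog" is present.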

2.2 Main Challenges

The ideal of generic object detection is to develop a general-purpose algorithm that achieves two competing goals of high quality/accuracy and high efficiency (Fig. 5). As illustrated in Fig. 6, high quality detection must accurately localize and recognize objects in images or video frames, such that the large variety of object categories in the real world can be distinguished (i.e., high distinctiveness), and that object instances from the same category, subject to intra-class appearance variations, can be localized and recognized (i.e., high robustness). High efficiency requires that the entire detection task runs in real time with acceptable memory and storage demands.

2.2.1 Accuracy Related Challenges

Challenges in detection accuracy stem from (1) the vast range of intra-class variations and (2) the huge number of object categories.

Intra-class variations can be divided into two types: intrinsic factors and imaging conditions. In terms of intrinsic factors, each object category can have many different object instances, possibly varying in one or more of color, texture, material, shape, and size, such as the “chair” category shown in Fig. 6i. Even in a more narrowly defined class, such as human or horse, object instances can appear in different poses, subject to nonrigid deformations or with the addition of clothing.
