A Survey of Deep Learning-based Object Detection
Licheng Jiao, Fellow, IEEE, Fan Zhang, Fang Liu, Senior Member, IEEE, Shuyuan Yang, Senior Member, IEEE,
Lingling Li, Member, IEEE, Zhixi Feng, Member, IEEE, and Rong Qu, Senior Member, IEEE
Abstract—Object detection is one of the most important and
challenging branches of computer vision. It aims to locate instances
of semantic objects of a certain class and has been widely applied in
people's lives, for example in security monitoring and autonomous
driving. With the rapid development
of deep learning networks for detection tasks, the performance
of object detectors has improved greatly. To give a thorough
understanding of the current state of the object detection
pipeline, in this survey we first analyze
the methods of existing typical detection models and describe
the benchmark datasets. We then provide, as the main part of the survey, a
comprehensive overview of a variety of object detection methods
in a systematic manner, covering the one-stage and two-stage
detectors. Moreover, we list traditional and new applications.
Some representative branches of object detection are analyzed
as well. Finally, we discuss architectures that exploit these
object detection methods to build an effective and efficient system,
and point out a set of development trends to help readers follow
state-of-the-art algorithms and further research.
Index Terms—Object detection, deep learning, typical
pipelines, classification, localization.
I. INTRODUCTION
OBJECT detection has been attracting increasing amounts
of attention in recent years due to its wide range of
applications and recent technological breakthroughs. This task
is under extensive investigation in both academia and
real-world applications, such as security monitoring, autonomous
driving, transportation surveillance, drone scene analysis, and
robotic vision. Among the many factors and efforts that have led to
the fast evolution of image object detection techniques, a
notable contribution should be attributed to the development
of deep convolutional neural networks and GPU computing
power. At present, deep learning models are widely used
across the whole field of computer vision, including general image
object detection and domain-specific object detection. Almost all
state-of-the-art object detectors use deep learning networks
as both their backbone and their detection network, for extracting
features from the input images and for classification and localization,
respectively. Object detection is a computer technology related
to computer vision and image processing that deals with
detecting instances of semantic objects of a certain class
(such as humans, buildings, or cars) in digital images and
videos. Well-researched domains of image object detection
include multi-category detection, edge detection, salient object
detection, pose detection, face detection and pedestrian
detection. Because a rising number of applications need scene
Key Laboratory of Intelligent Perception and Image Understanding of Ministry
of Education, International Research Center for Intelligent Perception and
Computation, Joint International Research Laboratory of Intelligent Perception
and Computation, School of Artificial Intelligence, Xidian University, Xi'an,
Shaanxi Province 710071, China (e-mail: lchjiao@mail.xidian.edu.cn).
understanding, image object detection, as an important part of it, has
been widely used in many areas of modern life. So far, many
benchmarks have played an important role in the object detection field,
such as Caltech [1], KITTI [2], ImageNet [3], PASCAL VOC
[4], and MS COCO [5]. In the ECCV VisDrone 2018 contest, the
organizers released a novel benchmark dataset that contains a large
number of images and videos captured from drone platforms.
Pre-existing domain-specific image object detectors can usually
be divided into two categories. The first is the two-stage
detector, the most representative of which is Faster R-CNN [6]. The
other is the one-stage detector, such as YOLO [7] and SSD [8].
Two-stage detectors have high localization and object recognition
accuracy, whereas one-stage detectors achieve high inference
speed. The two stages of a two-stage detector are divided by the RoI
(Region of Interest) pooling layer. For instance, in Faster R-CNN,
the first stage, called the Region Proposal Network (RPN),
proposes candidate object bounding boxes. In the second stage,
features are extracted by the RoIPool operation from each
candidate box for the subsequent classification and bounding-box
regression tasks [9]. Fig. 1(a) shows the basic architecture
of two-stage detectors. One-stage detectors predict
boxes from input images directly, without a region
proposal step; they are therefore time efficient and can be used
on real-time devices. Fig. 1(b) shows the basic architecture
of one-stage detectors.
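The difference in control flow between the two families can be
sketched in a few lines of Python. This is only a minimal illustration
under strong assumptions, not the implementation of Faster R-CNN, YOLO,
or SSD: the names two_stage_detect, one_stage_detect, Detection, and all
toy_* helpers are hypothetical, the "image" is a small list of pixel
rows, and mean intensity stands in for learned features.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[float]]   # toy grayscale image: rows of pixels
Box = Tuple[int, int, int, int]     # (x1, y1, x2, y2), x2 and y2 exclusive


@dataclass
class Detection:
    box: Box
    label: str
    score: float


def two_stage_detect(
    image: Image,
    propose_regions: Callable[[Image], List[Box]],
    pool_roi: Callable[[Image, Box], float],
    classify_and_regress: Callable[[float, Box], Detection],
) -> List[Detection]:
    """Stage 1 proposes candidate boxes; stage 2 scores each pooled RoI."""
    detections = []
    for box in propose_regions(image):      # stage 1: region proposals
        feature = pool_roi(image, box)      # per-box feature extraction
        detections.append(classify_and_regress(feature, box))  # stage 2
    return detections


def one_stage_detect(
    image: Image,
    predict_boxes: Callable[[Image], List[Detection]],
) -> List[Detection]:
    """Predicts boxes directly from the image, with no proposal step."""
    return predict_boxes(image)


# --- Hypothetical toy components, for illustration only ---

def toy_proposals(image: Image) -> List[Box]:
    # Stand-in for a proposal network: suggests the left and right halves.
    h, w = len(image), len(image[0])
    return [(0, 0, w // 2, h), (w // 2, 0, w, h)]


def toy_roi_pool(image: Image, box: Box) -> float:
    # Mean intensity inside the box stands in for RoI-pooled features.
    x1, y1, x2, y2 = box
    pixels = [image[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    return sum(pixels) / len(pixels)


def toy_head(feature: float, box: Box) -> Detection:
    # Stand-in classifier: bright regions are "object"; box is unchanged.
    label = "object" if feature > 0.5 else "background"
    return Detection(box=box, label=label, score=feature)


def toy_dense_predictor(image: Image) -> List[Detection]:
    # Stand-in one-stage head: one prediction per bright pixel cell.
    return [
        Detection(box=(x, y, x + 1, y + 1), label="object", score=value)
        for y, row in enumerate(image)
        for x, value in enumerate(row)
        if value > 0.5
    ]


if __name__ == "__main__":
    image = [[0.0, 0.0, 1.0, 1.0],
             [0.0, 0.0, 1.0, 1.0]]
    print(two_stage_detect(image, toy_proposals, toy_roi_pool, toy_head))
    print(one_stage_detect(image, toy_dense_predictor))
```

In the two-stage function the per-proposal loop is explicit, which is
where the extra cost of that design comes from in this sketch; the
one-stage function makes a single call on the whole image.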
Our survey focuses on describing and analyzing deep
learning based image object detection. Existing surveys
typically cover a range of domains of general object detection and
may not include the state-of-the-art methods that provide
novel solutions and new directions for these tasks,
because the field develops rapidly. We list very novel solutions
proposed recently but do not discuss the basics, so that
readers can see the cutting edge of the field more easily.
Different from previous object detection surveys, in this paper
we systematically and comprehensively review deep learning
based object detection methods and, most importantly, the
up-to-date detection solutions and research trends. Our survey
features in-depth analysis and discussion of various aspects,
many of which, to the best of our knowledge, are presented for the
first time in this field. Our intention is to provide an overview of how
different deep learning methods are being used rather than a
full summary of all related papers. To get into the field, we
recommend that readers refer to [10], [11], [12] for more details of
early methods.
The rest of the paper is organized as follows. Image
object detectors need a powerful backbone network for rich
feature extraction. We discuss backbone networks in Section 2
below. The typical pipelines of domain-specific image detectors
serve as the basis and milestones of the task. In Section 3, we
elaborate on the most representative and pioneering deep
arXiv:1907.09408v1 [cs.CV] 11 Jul 2019