A Survey of Deep Learning-Based Object Detection.pdf (object detection survey resource, CSDN Library)


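Object detection is one of the most important and challenging branches of computer vision. Its task is to locate and recognize instances of specific semantic categories in an image. With the rapid development of deep learning, particularly for image object detection, detector performance has improved greatly. Researchers have therefore comprehensively analyzed and summarized the development status, methods, system architectures, and application areas of object detection, with the aim of building efficient and effective detection systems. The development of deep learning, especially deep convolutional neural networks (CNNs), together with the growth of GPU computing power, has strongly driven the progress of image object detection. Object detection is widely studied in academia and in real-world applications, including security monitoring, autonomous driving, traffic surveillance, drone scene analysis, and robotic vision. For example, security monitoring requires real-time analysis of video streams to detect abnormal behavior; self-driving cars must accurately recognize roads, pedestrians, and traffic signs to drive safely; drones and robots rely on object detection to perceive and understand their surroundings.

Traditional methods are generally based on image processing techniques such as sliding windows, feature extraction, and template matching. Deep learning methods instead use neural networks to learn and extract image features automatically and then perform classification and localization. Deep learning-based detectors fall into two categories: one-stage detectors (such as YOLO and SSD) and two-stage detectors (such as R-CNN and Faster R-CNN). One-stage detectors are fast and suit real-time systems, but their accuracy is slightly lower than that of two-stage detectors; two-stage detectors are more accurate but computationally more expensive.

Deep learning detection models are usually trained and evaluated on standard benchmark datasets, including but not limited to Pascal VOC, COCO, and ImageNet. These datasets contain large numbers of manually annotated images covering many object categories and scenes. By training on them, a model learns to recognize and localize objects in new images.

Beyond the traditional application areas, new application scenarios keep emerging. In medical image analysis, object detection can help doctors identify lesions in CT, MRI, and other medical images; in smart retail, it can be used for automatic checkout to improve the shopping experience; in industrial visual inspection, it can identify product quality defects and thereby improve production efficiency.

To build an effective and efficient system, the architecture of the detection method is particularly important. It covers data preprocessing, model selection, training strategy, and inference optimization. Deep learning models usually need large amounts of training data to generalize well. Techniques such as data augmentation and transfer learning can improve model performance when data is scarce. Hardware accelerators such as GPUs and TPUs can greatly speed up training and inference.

Future trends in object detection include, but are not limited to: further improving detection accuracy and speed so that detectors can be applied in more scenarios; designing lightweight network structures for edge computing and mobile devices; fusing multimodal data to better understand and detect objects in complex scenes; and improving model robustness and security against malicious attacks. Researchers are also exploring unsupervised and semi-supervised learning to reduce the dependence on large amounts of labeled data. Deep learning-based object detection has become an important driver of computer vision and shows great application potential in many fields. In-depth analysis and understanding of existing techniques and methods is important for guiding future research and development. As deep learning and related hardware continue to advance, object detection will play an increasingly important role across fields.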


A Survey of Deep Learning-based Object Detection

Licheng Jiao, Fellow, IEEE, Fan Zhang, Fang Liu, Senior Member, IEEE, Shuyuan Yang, Senior Member, IEEE,

Lingling Li, Member, IEEE, Zhixi Feng, Member, IEEE, and Rong Qu, Senior Member, IEEE

Abstract—Object detection is one of the most important and

challenging branches of computer vision. It has been widely applied in people's lives, for example in security monitoring and autonomous driving, with the purpose of locating instances of semantic objects of a certain class. With the rapid development of deep learning networks for detection tasks, the performance of object detectors has been greatly improved. To understand the main development status of the object detection pipeline thoroughly and deeply, in this survey we first analyze the methods of existing typical detection models and describe the benchmark datasets. Afterwards and primarily, we provide a comprehensive and systematic overview of a variety of object detection methods, covering one-stage and two-stage detectors. Moreover, we list traditional and new applications. Some representative branches of object detection are analyzed as well. Finally, we discuss the architecture of exploiting these object detection methods to build an effective and efficient system, and point out a set of development trends to better follow the state-of-the-art algorithms and further research.

Index Terms—Object detection, deep learning, typical

pipelines, classification, localization.

I. INTRODUCTION

OBJECT detection has been attracting increasing attention in recent years due to its wide range of applications and recent technological breakthroughs. The task is under extensive investigation in both academia and real-world applications such as security monitoring, autonomous driving, transportation surveillance, drone scene analysis, and robotic vision. Among the many factors and efforts that have led to the fast evolution of image object detection techniques, a notable contribution should be attributed to the development of deep convolutional neural networks and GPU computing power. At present, deep learning models are widely used across the whole field of computer vision, including general image object detection and domain-specific object detection. Almost all state-of-the-art object detectors use deep learning networks as both their backbone and their detection network, for extracting features from the input images and for classification and localization, respectively. Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos. Well-researched domains of image object detection include multi-category detection, edge detection, salient object detection, pose detection, face detection, and pedestrian detection. Because a rising number of applications need scene understanding, image object detection, as an important part of it, has been widely used in many areas of modern life. So far, many benchmarks have played an important role in the object detection field, such as Caltech [1], KITTI [2], ImageNet [3], PASCAL VOC [4], and MS COCO [5]. For the ECCV VisDrone 2018 contest, the organizers released a novel benchmark dataset containing a large number of images and videos captured from drone platforms.

Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, Joint International Research Laboratory of Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi'an, Shaanxi Province 710071, China (e-mail: lchjiao@mail.xidian.edu.cn).

Existing domain-specific image object detectors can usually be divided into two categories. One is the two-stage detector, the most representative of which is Faster R-CNN [6]. The other is the one-stage detector, such as YOLO [7] and SSD [8]. Two-stage detectors have high localization and object recognition accuracy, whereas one-stage detectors achieve high inference speed. The two stages of a two-stage detector are divided by the RoI (Region of Interest) pooling layer. For instance, in Faster R-CNN the first stage, called the RPN (Region Proposal Network), proposes candidate object bounding boxes. In the second stage, features are extracted by the RoIPool operation from each candidate box for the subsequent classification and bounding-box regression tasks [9]. Fig. 1 (a) shows the basic architecture of two-stage detectors. One-stage detectors predict boxes from input images directly, without a region proposal step; they are therefore time efficient and can be used on real-time devices. Fig. 1 (b) shows the basic architecture of one-stage detectors.

Our survey focuses on describing and analyzing deep learning-based image object detection. Existing surveys typically cover a series of domains of general object detection and, because of the rapid development of the field, may not include the state-of-the-art methods that provide novel solutions and new directions for these tasks. We list very novel solutions proposed recently but leave out the basics, so that readers can see the cutting edge of the field more easily. Different from previous object detection surveys, in this paper we systematically and comprehensively review deep learning-based object detection methods and, most importantly, the up-to-date detection solutions and research trends. Our survey features in-depth analysis and discussion of various aspects, many of which, to the best of our knowledge, are presented for the first time in this field. Our intention is to provide an overview of how different deep learning methods are being used rather than a full summary of all related papers. To get into the field, we recommend that readers refer to [10] [11] [12] for more details of early methods.

The rest of the paper is organized as follows. Image object detectors need a powerful backbone network for rich feature extraction; we discuss backbone networks in Section 2. The typical pipelines of domain-specific image detectors act as the basics and milestones of the task. In Section 3 we elaborate on the most representative and pioneering deep learning-based approaches proposed before June 2019. The commonly used datasets and metrics are described in Section 4. The analyses of general image object detection methods are systematically explained in Section 5. In Section 6 we describe five typical fields for object detection and several popular branches of object detection. The development trends are summarized in Section 7.

arXiv:1907.09408v1 [cs.CV] 11 Jul 2019

II. BACKBONE NETWORKS

The backbone network acts as the basic feature extractor for the object detection task: it takes images as input and outputs feature maps of the corresponding input image. Most backbones are classification networks with the last fully connected layers removed. Improved versions of basic classification networks are also used. For instance, Lin et al. [13] add or subtract layers or replace some layers with specially designed layers. To better meet specific requirements, some works [7] [14] use newly designed backbones for feature extraction.

Depending on the requirements for accuracy versus efficiency, one can choose deeper and densely connected backbones such as ResNet [9], ResNeXt [15], and AmoebaNet [16], or lightweight backbones such as MobileNet [17], ShuffleNet [18], SqueezeNet [19], Xception [20], and MobileNetV2 [21]. When applied to mobile devices, lightweight backbones can meet the requirements. Wang et al. [22] propose a novel real-time object detection system by combining PeleeNet with SSD [8] and optimizing the architecture for fast processing speed. Applications that demand high precision, however, need high accuracy and thus more complicated backbones. On the other hand, real-time requirements such as video or webcam processing need not only high processing speed but also high accuracy [7], which calls for a finely designed backbone that adapts to the detection architecture and makes a trade-off between speed and accuracy.

To achieve more competitive detection accuracy, deeper and densely connected backbones are adopted to replace their shallower and sparsely connected counterparts. He et al. [9] use ResNet [23] rather than VGG [24], which is adopted in Faster R-CNN [6], for a further accuracy gain because of its high capacity to capture rich features.

Newly proposed high-performance classification networks can improve the precision and reduce the complexity of the object detection task. This is an effective way to further improve network performance because the backbone network acts as the feature extractor. It is well known that the quality of the features determines the upper bound of network performance, so this is an important step that needs further exploration. Please refer to [25] for more details.

III. TYPICAL BASELINES

With the advent of deep learning and increasing computing power, great progress has been made in the general object detection domain. Since the first CNN-based object detector, R-CNN, was proposed, a series of significant contributions have been made that promote the development of general object detection. We introduce some representative object detection architectures to help beginners get started in this domain.

A. R-CNN

R-CNN is a region-based CNN detector. Ross Girshick et al. [26] proposed R-CNN for object detection tasks, and their work was the first to show that a CNN could lead to dramatically higher object detection performance on the PASCAL VOC datasets [4] than systems based on simpler HOG-like features. Deep learning was thereby shown to be effective and efficient in the field of object detection.

The R-CNN detector consists of four modules. The first module generates category-independent region proposals. The second module extracts a fixed-length feature vector from each region proposal. The third module is a set of class-specific linear SVMs that classify the objects in one image. The last module is a bounding-box regressor for precise bounding-box prediction. In detail, the authors first adopt the selective search method to generate region proposals. Then a CNN is used to extract a 4096-dimensional feature vector from each region proposal. Because the fully connected layers need input vectors of fixed length, the region proposal features should have the same size. The authors adopt a fixed 227 × 227 pixel input size for the CNN. Objects in different images have different sizes and aspect ratios, which makes the region proposals extracted by the first module differ in size. Regardless of the size or aspect ratio of the candidate region, the authors warp all pixels in a tight bounding box around it to the required size of 227 × 227. The feature extraction network consists of five convolutional layers and two fully connected layers, and all CNN parameters are shared across all categories. Each category trains its own SVM, and the SVMs do not share parameters with one another.

Pre-training on a larger dataset followed by fine-tuning on the specified dataset is a good training method for deep convolutional neural networks to achieve fast convergence. First, Ross Girshick et al. [26] pre-train the CNN on a large-scale dataset (the ImageNet classification dataset [3]). The next step is fine-tuning the CNN parameters on the warped proposal windows using SGD (stochastic gradient descent). For fine-tuning, the CNN's ImageNet-specific 1000-way classification layer is replaced by a randomly initialized (N+1)-way classification layer (N object classes plus 1 for background).

When defining positive and negative examples, the authors distinguish two cases. The first is fine-tuning, where the IoU (intersection over union) overlap threshold is 0.5: region proposals below the threshold are defined as negatives and those at or above it as positives. Each positive proposal is assigned to the ground-truth box with which it has maximum IoU overlap. The second is training the SVMs. Here, in contrast, only the ground-truth boxes are taken as positive examples for their respective classes, and proposals with less than 0.3 IoU overlap with all ground-truth instances of a class are negatives for that class. Including the proposals whose overlap is between 0.5 and 1 but which are not ground truth expands the number of positive examples by approximately 30×, and this larger set helps avoid overfitting during fine-tuning.
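The IoU-based labelling rule used for fine-tuning can be illustrated with a minimal sketch. The box format `(x1, y1, x2, y2)` and the function names `iou` and `label_for_finetuning` are assumptions made for this example; they are not taken from the R-CNN code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_for_finetuning(proposal, gt_boxes, pos_thresh=0.5):
    """Return the index of the matched ground-truth box, or -1 for background.

    A proposal is positive when its best IoU with any ground-truth box
    reaches pos_thresh (0.5 in the text); it is assigned to that box.
    """
    best_idx, best_iou = -1, 0.0
    for i, gt in enumerate(gt_boxes):
        overlap = iou(proposal, gt)
        if overlap > best_iou:
            best_idx, best_iou = i, overlap
    return best_idx if best_iou >= pos_thresh else -1
```

For SVM training the same `iou` helper would be used with the stricter rule described above (ground-truth boxes as positives, proposals below 0.3 IoU as negatives).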

B. Fast R-CNN

Another improvement is that Fast R-CNN uses an RoI pooling layer to extract a fixed-size feature map from region proposals of different sizes. This operation requires no warping of regions and preserves the spatial information of the region proposal features. For fast detection, Ross Girshick uses truncated SVD, which accelerates the forward pass through the fully connected layers.
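To make the idea of a fixed-size output concrete, the following is a minimal sketch of RoI max pooling on a single-channel feature map stored as a nested list. The function name `roi_max_pool`, the integer RoI format `(r0, c0, r1, c1)` with exclusive end coordinates, and the floor/ceil bin boundaries are assumptions chosen for illustration, not the reference implementation.

```python
def roi_max_pool(feature, roi, out_h=2, out_w=2):
    """Max-pool the region roi of a 2D feature map into out_h x out_w bins.

    feature: 2D list of numbers.
    roi: (r0, c0, r1, c1), integer feature-map coordinates, end exclusive.
    Regions of any size produce an output of the same fixed size.
    """
    r0, c0, r1, c1 = roi
    h, w = r1 - r0, c1 - c0
    pooled = []
    for i in range(out_h):
        # Bin start uses floor, bin end uses ceil, so no bin is empty.
        rs = r0 + (i * h) // out_h
        re = r0 + -(-((i + 1) * h) // out_h)
        row = []
        for j in range(out_w):
            cs = c0 + (j * w) // out_w
            ce = c0 + -(-((j + 1) * w) // out_w)
            row.append(max(feature[r][c]
                           for r in range(rs, re)
                           for c in range(cs, ce)))
        pooled.append(row)
    return pooled
```

Calling the function with a 4 × 6 region and with a 3 × 2 region both yields a 2 × 2 result, which is what lets the following fully connected layers accept proposals of any size.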

Experimental results show that Fast R-CNN reaches 66.9% mAP compared with 66.0% for R-CNN on the PASCAL VOC 2007 dataset [4]. Training time drops to 9.5 hours from 84 hours for R-CNN, roughly 9 times faster. In test rate (s/image), Fast R-CNN with truncated SVD (0.22 s) is 213× faster than R-CNN (47 s); without truncated SVD it takes 0.32 s. These experiments were run on an Nvidia K40 GPU, and all of them demonstrate that Fast R-CNN does accelerate object detection.

C. Faster R-CNN

Faster R-CNN [6] improves on the region-based CNN baseline and was proposed 3 months after Fast R-CNN. Fast R-CNN uses selective search to propose RoIs, which is slow and needs about the same running time as the detection network. Faster R-CNN replaces it with a novel RPN (region proposal network), a fully convolutional network that efficiently predicts region proposals with a wide range of scales and aspect ratios. The RPN accelerates region proposal generation because it shares full-image convolutional features and a common set of convolutional layers with the detection network. The procedure is simplified in Fig. 3 (b). Another novelty, aimed at detecting objects of different sizes, is the use of multi-scale anchors as references. Anchors greatly simplify the generation of region proposals of various sizes, with no need for multiple scales of input images or features. A fixed-size window (3 × 3) slides over the output feature maps of the last shared convolutional layer; the center point of each window position corresponds to a point in the original input image, which is the center point of k = 3 × 3 = 9 anchor boxes. The authors define anchor boxes with 3 different scales and 3 aspect ratios. Each region proposal is parameterized relative to a reference anchor box. The distance between the predicted box and the corresponding ground-truth box is then measured to optimize the location of the predicted boxes.
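The anchor mechanism can be sketched as follows. The scale values (128, 256, 512) and aspect ratios (0.5, 1, 2) are the defaults reported in the Faster R-CNN paper and are not stated in the text above; the function names and the `(cx, cy, w, h)` box format are assumptions for this example. The `encode` function uses the usual parameterization, offsets normalized by the anchor size and log-space width and height.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) anchors (cx, cy, w, h) at one center."""
    anchors = []
    for s in scales:
        for r in ratios:  # r is the height / width ratio
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)  # w * h == s * s for every ratio
            anchors.append((cx, cy, w, h))
    return anchors


def encode(box, anchor):
    """Parameterize a box relative to its reference anchor."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))


def decode(t, anchor):
    """Invert encode(): recover (cx, cy, w, h) from regression targets."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (tx * wa + xa, ty * ha + ya, wa * math.exp(tw), ha * math.exp(th))
```

Because every anchor at a location shares the same center, the network only has to regress small offsets from the best-matching anchor instead of predicting absolute coordinates.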

Experiments indicate that Faster R-CNN greatly improves both precision and detection efficiency. On the PASCAL VOC 2007 test set, Faster R-CNN achieves an mAP of 69.9% with shared convolutional computation, compared with 66.9% for Fast R-CNN. In addition, the total running time of Faster R-CNN (198 ms) is nearly 10 times lower than that of Fast R-CNN (1830 ms) with the same VGG [24] backbone, and the processing rate is 5 fps vs. 0.5 fps.

D. Mask R-CNN

Mask R-CNN [9] extends Faster R-CNN, mainly for the instance segmentation task. Leaving aside the added parallel mask branch, Mask R-CNN can be seen as a more accurate object detector. Kaiming He et al. use Faster R-CNN with a ResNet [23]-FPN [13] backbone (feature pyramid network, a backbone that extracts RoI features from different levels of the feature pyramid according to their scale) for feature extraction, which achieves excellent precision and processing speed. FPN contains a bottom-up pathway and a top-down pathway with lateral connections. The bottom-up pathway is a backbone ConvNet that computes a feature hierarchy consisting of feature maps at several scales with a scaling step of 2. The top-down pathway produces higher-resolution features by upsampling spatially coarser, but semantically stronger, feature maps from higher pyramid levels. Initially, the top pyramid feature map is taken from the output of the last convolutional layer of the bottom-up pathway. Each lateral connection merges feature maps of the same spatial size from the bottom-up pathway and the top-down pathway. Where the channel dimensions of the feature maps differ, a 1 × 1 convolutional layer changes the dimension. Each lateral connection operation forms a new pyramid level, and predictions are made independently on each level. Because higher-resolution feature maps are important for detecting small objects while lower-resolution feature maps are rich in semantic information, the feature pyramid network extracts significant features.
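The top-down pathway with lateral connections can be illustrated with a simplified sketch on single-channel feature maps stored as nested lists. It assumes nearest-neighbor 2× upsampling and element-wise addition for the merge, and it omits the 1 × 1 convolution because every map here already has one channel; the function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbor upsampling of a 2D list by a factor of 2."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def merge(lateral, top_down):
    """Lateral connection: add the upsampled coarser map to the finer map."""
    up = upsample2x(top_down)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(lateral, up)]


def build_top_down(bottom_up):
    """Build pyramid levels from bottom-up maps ordered fine to coarse.

    Each map is assumed to be half the size of the previous one
    (the scaling step of 2 described in the text).
    """
    pyramid = [bottom_up[-1]]  # the top level starts from the coarsest map
    for lateral in reversed(bottom_up[:-1]):
        pyramid.insert(0, merge(lateral, pyramid[0]))
    return pyramid
```

Each returned level has the resolution of the corresponding bottom-up map but also carries information propagated down from the coarser, semantically stronger levels.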

Another way to improve accuracy is to replace RoI pooling with RoIAlign for extracting a small feature map from each RoI, as shown in Fig. 2. Traditional RoI pooling quantizes floating-point numbers in two steps to get approximate feature values in each bin. First, quantization is applied when calculating the coordinates of each RoI on the feature map, given the coordinates of the RoI in the input image and the downsampling stride. Then the RoI feature map is divided into bins to generate feature maps of the same size, and the bin boundaries are also quantized. These two quantization operations cause misalignments between the RoI and the extracted features. To address this, RoIAlign avoids any quantization of the RoI boundaries or bins at both steps. It first computes the floating-point coordinates of each RoI on the feature map and then uses bilinear interpolation to compute the exact values of the features at four regularly sampled locations in each RoI bin. It then aggregates the results using max or average pooling to get the value of each bin. Fig. 2 is an example of the RoIAlign operation.
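A minimal sketch of RoIAlign on a single-channel feature map follows. It assumes that grid point (r, c) holds `feature[r][c]`, that the four sample points sit at the quarter positions of each bin, and that the samples are averaged; the function names and the `(y0, x0, y1, x1)` RoI format are choices made for this example.

```python
import math


def bilinear(feature, y, x):
    """Bilinearly interpolate a 2D list at a continuous position (y, x)."""
    h, w = len(feature), len(feature[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, out_h=2, out_w=2):
    """Average four bilinear samples per bin; roi coordinates are never rounded."""
    y0, x0, y1, x1 = roi
    bin_h = (y1 - y0) / out_h
    bin_w = (x1 - x0) / out_w
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            samples = [bilinear(feature,
                                y0 + (i + sy) * bin_h,
                                x0 + (j + sx) * bin_w)
                       for sy in (0.25, 0.75) for sx in (0.25, 0.75)]
            row.append(sum(samples) / 4.0)
        out.append(row)
    return out
```

In contrast to RoI pooling, the RoI `(0.5, 0.5, 2.5, 3.5)` is used exactly as given, so sub-pixel shifts of the box change the output smoothly instead of being lost to rounding.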

Experiments show that these two improvements raise precision. On the MS COCO detection dataset, using the ResNet-FPN backbone improves box AP by 1.7 points and the RoIAlign operation improves box AP by 1.1 points.

E. YOLO

YOLO [7] (You Only Look Once) is a one-stage object detector proposed by Joseph Redmon et al. after Faster R-CNN [6]. Its main contribution is real-time detection on full images and webcam streams. First, the pipeline predicts fewer than 100 bounding boxes per image, whereas Fast R-CNN with selective search predicts 2000 region proposals per image. Second, YOLO frames detection as a regression problem, so a unified architecture can extract features directly from input images to predict bounding boxes and class probabilities. The YOLO base network runs at 45 frames per second with no batch processing on a Titan X GPU, compared with 0.5 fps for Fast R-CNN and 7 fps for Faster R-CNN.

Fig. 2. RoIAlign operation. The first step calculates the floating-point coordinates of an object in the feature map. The next step uses bilinear interpolation to compute the exact values of the features at four regularly sampled locations in each bin.

The YOLO pipeline first divides the input image into an S × S grid, where each grid cell is responsible for detecting the objects whose centers fall into it. Each grid cell predicts B bounding boxes (x, y, w, h) with confidence scores, and C-dimensional conditional class probabilities for the C categories. The confidence score is the product of two parts: P(object), the probability that the box contains an object, and the IoU (intersection over union), which shows how accurately the box covers that object. The feature extraction network contains 24 convolutional layers followed by 2 fully connected layers. When pre-training on the ImageNet dataset, the authors use the first 20 convolutional layers and an average pooling layer followed by a fully connected layer. For detection, the whole network is used for better performance. To obtain fine-grained visual information that improves detection precision, the detection stage doubles the 224 × 224 input resolution of the pre-training stage.
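The grid formulation can be made concrete with a small sketch. The default values S = 7, B = 2, and C = 20 are the settings the YOLO paper uses for PASCAL VOC and are not given in the text above; the function names are hypothetical.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the grid cell that contains the object center."""
    col = min(int(cx / img_w * S), S - 1)  # clamp centers on the right edge
    row = min(int(cy / img_h * S), S - 1)  # clamp centers on the bottom edge
    return row, col


def output_size(S=7, B=2, C=20):
    """Number of predicted values: each cell has B boxes (x, y, w, h, confidence) and C class scores."""
    return S * S * (B * 5 + C)


def confidence(p_object, iou):
    """Box confidence as the product P(object) * IoU."""
    return p_object * iou
```

With these defaults the network output has 7 × 7 × 30 values, and an object centered in a 448 × 448 image is handled by the cell in row 3, column 3.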

The experiments show that YOLO is not good at accurate localization and that localization error is the main component of its prediction error. Fast R-CNN makes many background false-positive mistakes, whereas YOLO makes about 3 times fewer. Trained and tested on the PASCAL VOC dataset, YOLO achieves 63.4% mAP at 45 fps, compared with Fast R-CNN (70.0% mAP, 0.5 fps) and Faster R-CNN (73.2% mAP, 7 fps).

F. YOLOv2

YOLOv2 [28] is the second version of YOLO [7]. It adopts many design decisions from past works, together with novel concepts, to improve YOLO's speed and precision.

Batch Normalization. A fixed distribution of inputs to a ConvNet layer has positive consequences for the layer. Normalizing over the entire training set is impractical because the optimization step uses stochastic gradient descent. Since SGD uses mini-batches during training, each mini-batch instead produces estimates of the mean and variance of each activation. The mean and variance are computed over a mini-batch of size m, and the m activations are then normalized to have zero mean and unit variance. As a result, the normalized activations in every mini-batch follow the same distribution. A BN layer [29] can thus be seen as outputting activations with the same distribution. YOLOv2 adds batch normalization to every convolutional layer, which accelerates convergence and helps regularize the model. Batch normalization yields more than a 2% improvement in mAP.
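The normalization step for one activation over a mini-batch can be sketched as follows. The learned scale `gamma`, shift `beta`, and the small constant `eps` come from the batch normalization paper [29] rather than from the text above, and the function name is an assumption.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize the m values of one activation across a mini-batch.

    Uses the mini-batch mean and (biased) variance, then applies the
    scale gamma and shift beta. eps avoids division by zero.
    """
    m = len(values)
    mean = sum(values) / m
    var = sum((v - mean) ** 2 for v in values) / m
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]
```

With the default `gamma` and `beta`, the output of `batch_norm([1.0, 2.0, 3.0, 4.0])` has zero mean and a variance very close to 1, whatever the scale of the inputs.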

High Resolution Classifier. In the YOLO backbone, the classifier is trained at an input resolution of 224 × 224, and the resolution is then increased to 448 for detection. The network therefore has to adjust to the new input resolution at the same time as it switches to the object detection task. To address this, YOLOv2 adds a fine-tuning stage in which the classification network is trained at 448 × 448 for 10 epochs on the ImageNet dataset, which increases mAP by about 4%.

Convolutional With Anchor Boxes. In the original YOLO network, the coordinates of predicted boxes are generated directly by fully connected layers. Faster R-CNN instead uses anchor boxes as references and predicts offsets for the boxes. YOLOv2 adopts this prediction mechanism and first removes the fully connected layers. It then predicts class and objectness for every anchor box. This change increases recall by 7% while mAP decreases by 0.3%.

Predicting the size and aspect ratio of anchor boxes using dimension clusters. In Faster R-CNN, the sizes and aspect ratios of anchor boxes are chosen empirically. To make it easier to learn to predict good detections, YOLOv2 runs K-means clustering on the training-set bounding boxes to obtain good priors automatically. Using dimension clusters together with directly predicting the bounding-box center location improves YOLO by almost 5% over the above version with anchor boxes.
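Dimension clustering can be sketched as K-means over box widths and heights with the distance 1 − IoU, which is the distance the YOLOv2 paper describes. The deterministic initialization from the first k boxes, the mean-based centroid update, and the function names are simplifications assumed for this example.

```python
def wh_iou(a, b):
    """IoU of two boxes (w, h) that share the same center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_priors(boxes, k, iters=20):
    """Cluster (w, h) pairs; nearest centroid = highest IoU (lowest 1 - IoU)."""
    centroids = list(boxes[:k])  # deterministic init, for illustration only
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: wh_iou(b, centroids[i]))
            clusters[best].append(b)
        new_centroids = []
        for i in range(k):
            members = clusters[i]
            if members:
                new_centroids.append((sum(w for w, _ in members) / len(members),
                                      sum(h for _, h in members) / len(members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, since the error is measured relative to box size.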

Fine-Grained Features. For localizing smaller objects, high-resolution feature maps can provide useful information. Similar to the identity mappings in ResNet, YOLOv2 concatenates the higher-resolution features with the low-resolution features by stacking adjacent features into different channels, which gives a modest 1% performance increase.
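Stacking adjacent features into channels is a space-to-depth rearrangement. The sketch below shows it on an H × W × C nested list with a 2 × 2 block; the function name and the channel ordering within each block are assumptions for illustration and may differ from the ordering used in the actual YOLOv2 implementation.

```python
def space_to_depth(fmap, block=2):
    """Rearrange an H x W x C nested list into (H/block) x (W/block) x (C*block*block).

    Each block x block neighborhood of spatial positions is stacked
    along the channel axis, so no feature values are discarded.
    """
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(0, h, block):
        row = []
        for j in range(0, w, block):
            stacked = []
            for di in range(block):
                for dj in range(block):
                    stacked.extend(fmap[i + di][j + dj])
            row.append(stacked)
        out.append(row)
    return out
```

After this rearrangement the high-resolution map has the same spatial size as the low-resolution map, so the two can be concatenated along the channel axis.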

Multi-Scale Training. To make the network robust to images of different sizes, every 10 batches the network randomly chooses a new image dimension from {320, 352, ..., 608}. This means the same network can predict detections at different resolutions. In high-resolution detection, YOLOv2 achieves 78.6% mAP at 40 fps, compared with 63.4% mAP at 45 fps for YOLO, on VOC 2007.
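The resolution schedule can be sketched as below. The step of 32 between sizes follows from the set {320, 352, ..., 608}; the function name, the `seed` parameter, and the use of Python's `random` module are assumptions for this example.

```python
import random

# Candidate input sizes: 320, 352, ..., 608 (step of 32).
SIZES = list(range(320, 608 + 1, 32))


def size_schedule(num_batches, period=10, seed=0):
    """Return the input size used for each batch, re-drawn every `period` batches."""
    rng = random.Random(seed)
    size = rng.choice(SIZES)
    schedule = []
    for b in range(num_batches):
        if b > 0 and b % period == 0:
            size = rng.choice(SIZES)  # pick a new resolution for the next period
        schedule.append(size)
    return schedule
```

A training loop would resize each batch to the scheduled size before the forward pass, which is possible because the network no longer contains fully connected layers tied to one input resolution.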

In addition, YOLOv2 proposes a new classification backbone, Darknet-19, with 19 convolutional layers and 5 max-pooling layers, which requires fewer operations to process an image yet achieves high accuracy. The more competitive YOLOv2 version reaches 78.6% mAP at 40 fps, compared with Faster R-CNN with a ResNet backbone at 76.4% mAP and 5 fps, and SSD500 at 76.8% mAP and 19 fps. As described above, YOLOv2 achieves high detection precision together with a high processing rate, benefiting from 7 main improvements and a new backbone.


