【免费】Fast-RCNN-孙超1资源-CSDN文库

需积分: 0 179 浏览量 2022-08-03 14:25:13 上传评论收藏 10.42MB PDF 举报

《Fast R-CNN - 孙超1》 Fast R-CNN是Ross Girshick提出的一种高效区域卷积网络（Region-based Convolutional Network）方法，专门用于目标检测。相较于以往的工作，Fast R-CNN在提高训练和测试速度的同时，也提升了检测精度。此方法能够快速有效地利用深度卷积网络对对象提议进行分类。 Fast R-CNN的主要创新点包括： 1. **联合学习分类与定位**：Fast R-CNN采用了一种单阶段的训练算法，可以同时学习对对象提议的分类和其空间位置的精炼。这解决了以往多阶段训练过程中时间和效率的问题。 2. **提高训练速度**：与R-CNN相比，Fast R-CNN的训练速度提高了9倍，使用非常深的VGG16网络进行训练时更为显著。 3. **提升测试速度**：在测试阶段，Fast R-CNN的速度比R-CNN快了213倍，比SPPnet快了10倍，这极大地优化了实时应用的性能。 4. **精度提升**：Fast R-CNN在PASCAL VOC 2012数据集上的平均精度（mAP）表现更优，体现了其在准确率方面的优势。 5. **代码开源**：Fast R-CNN的实现基于Python和C++（使用Caffe库），并在MIT开源许可证下提供，方便研究者和开发者使用。 **深度卷积网络在目标检测中的作用**：深度卷积网络（ConvNets）近年来在图像分类和目标检测任务上取得了显著的进步。相比于图像分类，目标检测需要处理更多的复杂性，例如精确地定位物体。因此，现有的方法通常采用多阶段的训练管道，这既慢又不够优雅。 **解决挑战**：目标检测的两个主要挑战是处理大量的候选物体位置（通常称为“提议”）和将这些粗略的位置精细化。Fast R-CNN通过改进的训练策略，能够在保持速度的同时，平衡这两个挑战，实现更精确的定位和更高的分类准确性。 **Fast R-CNN与R-CNN和SPPnet的比较**： - 相较于R-CNN，Fast R-CNN在训练速度上有显著提升，且在测试阶段的运行速度更快，表明其在处理速度上的优势。 - 与SPPnet相比，Fast R-CNN在训练VGG16网络时速度提升了3倍，测试速度提升了10倍，并且在精度上更胜一筹。 Fast R-CNN通过集成和优化的训练算法，简化了深度学习目标检测的流程，显著提升了训练和测试效率，成为当时最先进的技术之一，为后续的YOLO、Faster R-CNN等方法的发展奠定了基础。

资源详情

资源评论

资源推荐

Fast R-CNN

Ross Girshick

Microsoft Research

rbg@microsoft.com

Abstract

This paper proposes a Fast Region-based Convolutional

Network method (Fast R-CNN) for object detection. Fast

R-CNN builds on previous work to efﬁciently classify ob-

ject proposals using deep convolutional networks. Com-

pared to previous work, Fast R-CNN employs several in-

novations to improve training and testing speed while also

increasing detection accuracy. Fast R-CNN trains the very

deep VGG16 network 9⇥ faster than R-CNN, is 213⇥ faster

at test-time, and achieves a higher mAP on PASCAL VOC

2012. Compared to SPPnet, Fast R-CNN trains VGG16 3⇥

faster, tests 10⇥ faster, and is more accurate. Fast R-CNN

is implemented in Python and C++ (using Caffe) and is

available under the open-source MIT License at https:

//github.com/rbgirshick/fast-rcnn.

1. Introduction

Recently, deep ConvNets [14, 16] have signiﬁcantly im-

proved image classiﬁcation [14] and object detection [9, 19]

accuracy. Compared to image classiﬁcation, object detec-

tion is a more challenging task that requires more com-

plex methods to solve. Due to this complexity, current ap-

proaches ( e.g., [9, 11, 19, 25]) train models in multi-stage

pipelines that are slow and inelegant.

Complexity arises because detection requires the ac-

curate localization of objects, creating two primary chal-

lenges. First, numerous candidate object locations (often

called “proposals”) must be processed. Second, these can-

didates provide only rough localization that must be reﬁned

to achieve precise localization. Solutions to these problems

often compromise speed, accuracy, or simplicity.

In this paper, we streamline the training process for state-

of-the-art ConvNet-based object detectors [9, 11]. We pro-

pose a single-stage training algorithm that jointly learns to

classify object proposals and reﬁne their spatial locations.

The resulting method can train a very deep detection

network (VGG16 [20]) 9⇥ faster than R-CNN [9] and 3⇥

faster than SPPnet [11]. At runtime, the detection network

processes images in 0.3s (excluding object proposal time)

while achieving top accuracy on PASCAL VOC 2012 [7]

with a mAP of 66% (vs. 62% for R-CNN).

1.1. R-CNN and SPPnet

The Region-based Convolutional Network method (R-

CNN) [9] achieves excellent object detection accuracy by

using a deep ConvNet to classify object proposals. R-CNN,

however, has notable drawbacks:

1. Training is a multi-stage pipeline. R-CNN ﬁrst ﬁne-

tunes a ConvNet on object proposals using log loss.

Then, it ﬁts SVMs to ConvNet features. These SVMs

act as object detectors, replacing the softmax classi-

ﬁer learnt by ﬁne-tuning. In the third training stage,

bounding-box regressors are learned.

2. Training is expensive in space and time. For SVM

and bounding-box regressor training, features are ex-

tracted from each object proposal in each image and

written to disk. With very deep networks, such as

VGG16, this process takes 2.5 GPU-days for the 5k

images of the VOC07 trainval set. These features re-

quire hundreds of gigabytes of storage.

3. Object detection is slow. At test-time, features are

extracted from each object proposal in each test image.

Detection with VGG16 takes 47s / image (on a GPU).

R-CNN is slow because it performs a ConvNet forward

pass for each object proposal, without sharing computation.

Spatial pyramid pooling networks (SPPnets) [11] were pro-

posed to speed up R-CNN by sharing computation. The

SPPnet method computes a convolutional feature map for

the entire input image and then classiﬁes each object pro-

posal using a feature vector extracted from the shared fea-

ture map. Features are extracted for a proposal by max-

pooling the portion of the feature map inside the proposal

into a ﬁxed-size output (e.g., 6 ⇥ 6). Multiple output sizes

are pooled and then concatenated as in spatial pyramid pool-

ing [15]. SPPnet accelerates R-CNN by 10 to 100⇥ at test

time. Training time is also reduced by 3⇥ due to faster pro-

posal feature extraction.

All timings use one Nvidia K40 GPU overclocked to 875 MHz.

arXiv:1504.08083v2 [cs.CV] 27 Sep 2015

SPPnet also has notable drawbacks. Like R-CNN, train-

ing is a multi-stage pipeline that involves extracting fea-

tures, ﬁne-tuning a network with log loss, training SVMs,

and ﬁnally ﬁtting bounding-box regressors. Features are

also written to disk. But unlike R-CNN, the ﬁne-tuning al-

gorithm proposed in [11] cannot update the convolutional

layers that precede the spatial pyramid pooling. Unsurpris-

ingly, this limitation (ﬁxed convolutional layers) limits the

accuracy of very deep networks.

1.2. Contributions

We propose a new training algorithm that ﬁxes the disad-

vantages of R-CNN and SPPnet, while improving on their

speed and accuracy. We call this method Fast R-CNN be-

cause it’s comparatively fast to train and test. The Fast R-

CNN method has several advantages:

1. Higher detection quality (mAP) than R-CNN, SPPnet

2. Training is single-stage, using a multi-task loss

3. Training can update all network layers

4. No disk storage is required for feature caching

Fast R-CNN is written in Python and C++ (Caffe

[13]) and is available under the open-source MIT Li-

cense at https://github.com/rbgirshick/

fast-rcnn.

2. Fast R-CNN architecture and training

Fig. 1 illustrates the Fast R-CNN architecture. A Fast

R-CNN network takes as input an entire image and a set

of object proposals. The network ﬁrst processes the whole

image with several convolutional (conv) and max pooling

layers to produce a conv feature map. Then, for each ob-

ject proposal a region of interest (RoI) pooling layer ex-

tracts a ﬁxed-length feature vector from the feature map.

Each feature vector is fed into a sequence of fully connected

(fc) layers that ﬁnally branch into two sibling output lay-

ers: one that produces softmax probability estimates over

K object classes plus a catch-all “background” class and

another layer that outputs four real-valued numbers for each

of the K object classes. Each set of 4 values encodes reﬁned

bounding-box positions for one of the K classes.

2.1. The RoI pooling layer

The RoI pooling layer uses max pooling to convert the

features inside any valid region of interest into a small fea-

ture map with a ﬁxed spatial extent of H ⇥ W (e.g., 7 ⇥ 7),

where H and W are layer hyper-parameters that are inde-

pendent of any particular RoI. In this paper, an RoI is a

rectangular window into a conv feature map. Each RoI is

deﬁned by a four-tuple ( r, c, h, w) that speciﬁes its top-left

corner (r, c) and its height and width (h, w).

Deep

ConvNet

Conv

feature map

RoI

projection

RoI

pooling

layer

FCs

RoI feature

vector

softmax

bbox

regressor

Outputs:

For each RoI

Figure 1. Fast R-CNN architecture. An input image and multi-

ple regions of interest (RoIs) are input into a fully convolutional

network. Each RoI is pooled into a ﬁxed-size feature map and

then mapped to a feature vector by fully connected layers (FCs).

The network has two output vectors per RoI: softmax probabilities

and per-class bounding-box regression offsets. The architecture is

trained end-to-end with a multi-task loss.

RoI max pooling works by dividing the h ⇥ w RoI win-

dow into an H ⇥ W grid of sub-windows of approximate

size h/H ⇥ w/W and then max-pooling the values in each

sub-window into the corresponding output grid cell. Pool-

ing is applied independently to each feature map channel,

as in standard max pooling. The RoI layer is simply the

special-case of the spatial pyramid pooling layer used in

SPPnets [11] in which there is only one pyramid level. We

use the pooling sub-window calculation given in [11].

2.2. Initializing from pre-trained networks

We experiment with three pre-trained ImageNet [4] net-

works, each with ﬁve max pooling layers and between ﬁve

and thirteen conv layers (see Section 4.1 for network de-

tails). When a pre-trained network initializes a Fast R-CNN

network, it undergoes three transformations.

First, the last max pooling layer is replaced by a RoI

pooling layer that is conﬁgured by setting H and W to be

compatible with the net’s ﬁrst fully connected layer (e.g.,

H = W =7for VGG16).

Second, the network’s last fully connected layer and soft-

max (which were trained for 1000-way ImageNet classiﬁ-

cation) are replaced with the two sibling layers described

earlier (a fully connected layer and softmax over K +1cat-

egories and category-speciﬁc bounding-box regressors).

Third, the network is modiﬁed to take two data inputs: a

list of images and a list of RoIs in those images.

2.3. Fine-tuning for detection

Training all network weights with back-propagation is an

important capability of Fast R-CNN. First, let’s elucidate

why SPPnet is unable to update weights below the spatial

pyramid pooling layer.

The root cause is that back-propagation through the SPP

layer is highly inefﬁcient when each training sample (i.e.

RoI) comes from a different image, which is exactly how

R-CNN and SPPnet networks are trained. The inefﬁciency

剩余8页未读，继续阅读

评论收藏

内容反馈

正版胡一星

粉丝: 26
资源: 304

Fast-RCNN-孙超1

评论0

最新资源

Fast-RCNN-孙超1

评论0

R-CNN-孙超1

You Only Look Once- Unified, Real-Time Object Detection-孙超1

OHEM-孙超1

参考资料-大功率LLC谐振变换器中谐振电感的优化研究-孙超.zip

基于深度学习的软件实体识别方法_孙超.caj

物资租赁系统+孙超+丁亮+方文博+王晨旭.zip

Improvement of the coercivity and corrosion resistance of the Nd-Fe-B sintered magnets by Al nano-particles doped

【景观大数据】各式商业街模型 55个 SU模型库SU模型库.rar

DDEA-DEV:DDEA开发仓库

电源DCDC开关电源设计LLC谐振变换器技术资料论文资料50个合集.zip

论文研究-养鸡场单片机灯光控制电路设计 .pdf

辽中凹陷北段断裂差异演化特征

用户均衡与系统最优原则下交通分配模型的建立与分析

模糊控制算法在干道信号协调控制中的应用研究

BurpLoaderKeygen.jar.zip

最新版ISO/IEC 27001:2022、ISO 27002:2022中英文合集

Chrome Header Editor 插件

Goby红队版-win-x64-2.4.7版本

软件工程导论(第六版)课后习题答案1

OpenVAS GVM 中文翻译补丁

第四届网鼎杯赛前训练(20241019)

安全认证cisp教材全套

最新资源