
Fully Convolutional Networks for Semantic Segmentation

Jonathan Long∗    Evan Shelhamer∗    Trevor Darrell

UC Berkeley

{jonlong,shelhamer,trevor}@cs.berkeley.edu

Abstract

Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet [22], the VGG net [34], and GoogLeNet [35]) into fully convolutional networks and transfer their learned representations by fine-tuning [5] to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes less than one fifth of a second for a typical image.

1. Introduction

Convolutional networks are driving advances in recognition. Convnets are not only improving for whole-image classification [22, 34, 35], but also making progress on local tasks with structured output. These include advances in bounding box object detection [32, 12, 19], part and keypoint prediction [42, 26], and local correspondence [26, 10].

The natural next step in the progression from coarse to fine inference is to make a prediction at every pixel. Prior approaches have used convnets for semantic segmentation [30, 3, 9, 31, 17, 15, 11], in which each pixel is labeled with the class of its enclosing object or region, but with shortcomings that this work addresses.

∗Authors contributed equally

Figure 1. Fully convolutional networks can efficiently learn to make dense predictions for per-pixel tasks like semantic segmentation. (Diagram labels: forward/inference, backward/learning, pixelwise prediction, segmentation g.t.)

We show that a fully convolutional network (FCN) trained end-to-end, pixels-to-pixels on semantic segmentation exceeds the state-of-the-art without further machinery. To our knowledge, this is the first work to train FCNs end-to-end (1) for pixelwise prediction and (2) from supervised pre-training. Fully convolutional versions of existing networks predict dense outputs from arbitrary-sized inputs. Both learning and inference are performed whole-image-at-a-time by dense feedforward computation and backpropagation. In-network upsampling layers enable pixelwise prediction and learning in nets with subsampled pooling.
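As an illustration of the arbitrary-size property, here is a minimal sketch in plain Python. It is not the authors' implementation: the helper names `conv2d` and `shape`, the 3x3 filter, and the two input sizes are assumptions chosen for this example. It applies one strided filter to two inputs of different sizes and shows that the output grid scales with the input, with no fixed-size layer in the way.

```python
def conv2d(image, kernel, stride=1):
    """Valid 2-D cross-correlation of a list-of-lists image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(0, h - kh + 1, stride):
        row = []
        for j in range(0, w - kw + 1, stride):
            # Weighted sum over the local window anchored at (i, j).
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out


def shape(grid):
    """Return (rows, columns) of a rectangular list-of-lists."""
    return (len(grid), len(grid[0]))


# Hypothetical 3x3 horizontal-difference filter (an assumption, not from the paper).
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

# Two toy inputs of different sizes, each with value row + column.
small = [[float(r + c) for c in range(8)] for r in range(8)]
large = [[float(r + c) for c in range(20)] for r in range(12)]

small_out = conv2d(small, kernel, stride=2)
large_out = conv2d(large, kernel, stride=2)

print(shape(small_out))  # (3, 3)
print(shape(large_out))  # (5, 9)
```

The same filter weights serve both inputs. Only the number of window positions changes, which is what lets a network built from such layers accept whole images of any size.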

This method is efficient, both asymptotically and absolutely, and precludes the need for the complications in other works. Patchwise training is common [30, 3, 9, 31, 11], but lacks the efficiency of fully convolutional training. Our approach does not make use of pre- and post-processing complications, including superpixels [9, 17], proposals [17, 15], or post-hoc refinement by random fields or local classifiers [9, 17]. Our model transfers recent success in classification [22, 34, 35] to dense prediction by reinterpreting classification nets as fully convolutional and fine-tuning from their learned representations. In contrast, previous works have applied small convnets without supervised pre-training [9, 31, 30].

Semantic segmentation faces an inherent tension between semantics and location: global information resolves what while local information resolves where. Deep feature hierarchies encode location and semantics in a nonlinear local-to-global pyramid. In Section 4.2 we define a skip architecture that takes advantage of this feature spectrum by combining deep, coarse, semantic information with shallow, fine, appearance information (see Figure 3).
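The idea of combining a coarse layer with a fine one can be sketched as follows. This is a simplified illustration under stated assumptions, not the architecture of Section 4.2: nearest-neighbor upsampling stands in for whatever in-network upsampling the model uses, fusion is assumed to be an elementwise sum, and the function names and toy score maps are hypothetical.

```python
def upsample_nearest(grid, factor):
    """Repeat every value `factor` times along both axes."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def fuse(coarse, fine, factor):
    """Upsample the coarse score map and add it to the fine one elementwise."""
    up = upsample_nearest(coarse, factor)
    if len(up) != len(fine) or len(up[0]) != len(fine[0]):
        raise ValueError("upsampled coarse map must match the fine map size")
    return [[u + f for u, f in zip(up_row, fine_row)]
            for up_row, fine_row in zip(up, fine)]


# Toy single-class score maps (assumed values for illustration only).
coarse = [[1.0, 2.0],
          [3.0, 4.0]]                                      # deep layer, 2x2
fine = [[0.25 * c for c in range(4)] for _ in range(4)]    # shallow layer, 4x4

fused = fuse(coarse, fine, factor=2)
print(fused[0])  # [1.0, 1.25, 2.5, 2.75]
print(fused[3])  # [3.0, 3.25, 4.5, 4.75]
```

Each fused value keeps the coarse map's region-level score while the fine map varies it within the region, which is the "what" plus "where" combination the paragraph describes.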

In the next section, we review related work on deep classification nets, FCNs, and recent approaches to semantic segmentation using convnets. The following sections explain FCN design and dense prediction tradeoffs, introduce our architecture with in-network upsampling and multi-layer combinations, and describe our experimental framework. Finally, we demonstrate state-of-the-art results on PASCAL VOC 2011-2, NYUDv2, and SIFT Flow.

2. Related work

Our approach draws on recent successes of deep nets for image classification [22, 34, 35] and transfer learning [5, 41]. Transfer was first demonstrated on various visual recognition tasks [5, 41], then on detection, and on both instance and semantic segmentation in hybrid proposal-classifier models [12, 17, 15]. We now re-architect and fine-tune classification nets to direct, dense prediction of semantic segmentation. We chart the space of FCNs and situate prior models, both historical and recent, in this framework.

Fully convolutional networks. To our knowledge, the idea of extending a convnet to arbitrary-sized inputs first appeared in Matan et al. [28], which extended the classic LeNet [23] to recognize strings of digits. Because their net was limited to one-dimensional input strings, Matan et al. used Viterbi decoding to obtain their outputs. Wolf and Platt [40] expand convnet outputs to 2-dimensional maps of detection scores for the four corners of postal address blocks. Both of these historical works do inference and learning fully convolutionally for detection. Ning et al. [30] define a convnet for coarse multiclass segmentation of C. elegans tissues with fully convolutional inference.

Fully convolutional computation has also been exploited in the present era of many-layered nets. Sliding window detection by Sermanet et al. [32], semantic segmentation by Pinheiro and Collobert [31], and image restoration by Eigen et al. [6] do fully convolutional inference. Fully convolutional training is rare, but used effectively by Tompson et al. [38] to learn an end-to-end part detector and spatial model for pose estimation, although they do not exposit on or analyze this method.

Alternatively, He et al. [19] discard the non-convolutional portion of classification nets to make a feature extractor. They combine proposals and spatial pyramid pooling to yield a localized, fixed-length feature for classification. While fast and effective, this hybrid model cannot be learned end-to-end.

Dense prediction with convnets. Several recent works have applied convnets to dense prediction problems, including semantic segmentation by Ning et al. [30], Farabet et al. [9], and Pinheiro and Collobert [31]; boundary prediction for electron microscopy by Ciresan et al. [3] and for natural images by a hybrid convnet/nearest neighbor model by Ganin and Lempitsky [11]; and image restoration and depth estimation by Eigen et al. [6, 7]. Common elements of these approaches include

• small models restricting capacity and receptive fields;
• patchwise training [30, 3, 9, 31, 11];
• post-processing by superpixel projection, random field regularization, filtering, or local classification [9, 3, 11];
• input shifting and output interlacing for dense output [32, 31, 11];
• multi-scale pyramid processing [9, 31, 11];
• saturating tanh nonlinearities [9, 6, 31]; and
• ensembles [3, 11],

whereas our method does without this machinery. However, we do study patchwise training (Section 3.4) and “shift-and-stitch” dense output (Section 3.2) from the perspective of FCNs. We also discuss in-network upsampling (Section 3.3), of which the fully connected prediction by Eigen et al. [7] is a special case.

Unlike these existing methods, we adapt and extend deep classification architectures, using image classification as supervised pre-training, and fine-tune fully convolutionally to learn simply and efficiently from whole image inputs and whole image ground truths.

Hariharan et al. [17] and Gupta et al. [15] likewise adapt deep classification nets to semantic segmentation, but do so in hybrid proposal-classifier models. These approaches fine-tune an R-CNN system [12] by sampling bounding boxes and/or region proposals for detection, semantic segmentation, and instance segmentation. Neither method is learned end-to-end. They achieve state-of-the-art segmentation results on PASCAL VOC and NYUDv2 respectively, so we directly compare our standalone, end-to-end FCN to their semantic segmentation results in Section 5.

We fuse features across layers to define a nonlinear local-to-global representation that we tune end-to-end. In contemporary work, Hariharan et al. [18] also use multiple layers in their hybrid model for semantic segmentation.

3. Fully convolutional networks

Each layer of data in a convnet is a three-dimensional array of size h × w × d, where h and w are spatial dimensions, and d is the feature or channel dimension. The first layer is the image, with pixel size h × w, and d color channels. Locations in higher layers correspond to the locations in the image they are path-connected to, which are called their receptive fields.
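To make the notion of a receptive field concrete, the sketch below computes its extent along one spatial axis for a stack of layers. It uses the standard recurrence for undilated convolution and pooling layers. The function name `receptive_field` and the four-layer toy stack are assumptions for illustration and do not describe any network from the paper.

```python
def receptive_field(layers):
    """Return (size, stride) of the receptive field after a stack of layers.

    Each layer is a (kernel_size, stride) pair along one spatial axis.
    Assumes no dilation; padding shifts the field but does not change its size.
    """
    size, jump = 1, 1  # a single input pixel sees itself; unit spacing
    for kernel, stride in layers:
        size += (kernel - 1) * jump  # each extra tap reaches `jump` pixels further
        jump *= stride               # spacing between neighbouring outputs, in pixels
    return size, jump


# Hypothetical toy stack: 3x3 conv, 2x2 pool (stride 2), 3x3 conv, 2x2 pool (stride 2).
toy_stack = [(3, 1), (2, 2), (3, 1), (2, 2)]
rf_size, rf_stride = receptive_field(toy_stack)
print(rf_size, rf_stride)  # 10 4
```

In this toy stack, each location in the top layer is path-connected to a 10 × 10 window of image pixels, and neighbouring top-layer locations correspond to windows 4 pixels apart.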

Convnets are built on translation invariance. Their basic components (convolution, pooling, and activation functions) operate on local input regions, and depend only on relative spatial coordinates. Writing x_ij for the data vector at location (i, j) in a particular layer, and y_ij for the follow-
