Perceptual Losses for Real-Time Style Transfer
and Super-Resolution
Justin Johnson, Alexandre Alahi, Li Fei-Fei
{jcjohns, alahi, feifeili}@cs.stanford.edu
Department of Computer Science, Stanford University
Abstract. We consider image transformation problems, where an input image is transformed into an output image. Recent methods for such problems typically train feed-forward convolutional neural networks using a per-pixel loss between the output and ground-truth images. Parallel work has shown that high-quality images can be generated by defining and optimizing perceptual loss functions based on high-level features extracted from pretrained networks. We combine the benefits of both approaches and propose the use of perceptual loss functions for training feed-forward networks for image transformation tasks. We show results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real time. Compared to the optimization-based method, our network gives similar qualitative results but is three orders of magnitude faster. We also experiment with single-image super-resolution, where replacing a per-pixel loss with a perceptual loss gives visually pleasing results.
Keywords: Style transfer, super-resolution, deep learning
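To make the notion of a perceptual loss concrete, here is a rough sketch of a feature reconstruction loss computed from a pretrained network. PyTorch, torchvision's VGG-16, and the particular layer cut are illustrative assumptions, not the paper's prescribed setup.

```python
# Hedged sketch of a perceptual (feature reconstruction) loss: compare images
# in the feature space of a fixed pretrained network instead of pixel space.
# The framework, VGG-16, and the relu3_3 cut point are illustrative choices.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

loss_net = vgg16(pretrained=True).features[:16].eval()  # layers up to relu3_3
for p in loss_net.parameters():
    p.requires_grad = False  # the loss network is fixed, never trained

def perceptual_loss(y_hat, y):
    # Squared distance between high-level feature maps of the two images.
    # (ImageNet input normalization is omitted here for brevity.)
    return F.mse_loss(loss_net(y_hat), loss_net(y))

y_hat = torch.rand(1, 3, 256, 256)  # placeholder network output
y = torch.rand(1, 3, 256, 256)      # placeholder target image
print(perceptual_loss(y_hat, y).item())
```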
1 Introduction
Many classic problems can be framed as image transformation tasks, where a system receives some input image and transforms it into an output image. Examples from image processing include denoising, super-resolution, and colorization, where the input is a degraded image (noisy, low-resolution, or grayscale) and the output is a high-quality color image. Examples from computer vision include semantic segmentation and depth estimation, where the input is a color image and the output image encodes semantic or geometric information about the scene.
One approach for solving image transformation tasks is to train a feed-forward convolutional neural network in a supervised manner, using a per-pixel loss function to measure the difference between output and ground-truth images. This approach has been used, for example, by Dong et al. for super-resolution [1], by Cheng et al. for colorization [2], by Long et al. for segmentation [3], and by Eigen et al. for depth and surface normal prediction [4,5]. Such approaches are efficient at test time, requiring only a forward pass through the trained network; a minimal sketch of this training setup follows.
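For concreteness, the following is a minimal sketch of such per-pixel supervised training, assuming PyTorch (a framework choice not made by the paper); the tiny network and random tensors are illustrative placeholders, not the models or data used by any of the cited methods.

```python
# Minimal sketch: train a toy feed-forward network with a per-pixel (MSE) loss.
# PyTorch, the network, and the random data are assumptions for illustration.
import torch
import torch.nn as nn

# Toy image transformation network: 3-channel image in, 3-channel image out.
net = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 3, kernel_size=3, padding=1),
)

per_pixel_loss = nn.MSELoss()  # mean squared error over all pixels
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

x = torch.rand(4, 3, 64, 64)  # stand-in batch of input images
y = torch.rand(4, 3, 64, 64)  # stand-in ground-truth images

for step in range(100):
    y_hat = net(x)                   # test-time cost is just this forward pass
    loss = per_pixel_loss(y_hat, y)  # compares images pixel by pixel
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```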
However, the per-pixel losses used by these methods do not capture perceptual
differences between output and ground-truth images. For example, consider two