Under review as a conference paper at ICLR 2019

POOLING IS NEITHER NECESSARY NOR SUFFICIENT FOR APPROPRIATE DEFORMATION STABILITY IN CNNS

Anonymous authors

Paper under double-blind review

ABSTRACT

Many of our core assumptions about how neural networks operate remain empirically untested. One common assumption is that convolutional neural networks need to be stable to small translations and deformations to solve image recognition tasks. For many years, this stability was baked into CNN architectures by incorporating interleaved pooling layers. Recently, however, interleaved pooling has largely been abandoned. This raises a number of questions: Are our intuitions about deformation stability right at all? Is it important? Is pooling necessary for deformation invariance? If not, how is deformation invariance achieved in its absence? In this work, we rigorously test these questions, and find that deformation stability in convolutional networks is more nuanced than it first appears: (1) Deformation invariance is not a binary property; rather, different tasks require different degrees of deformation stability at different layers. (2) Deformation stability is not a fixed property of a network and is heavily adjusted over the course of training, largely through the smoothness of the convolutional filters. (3) Interleaved pooling layers are neither necessary nor sufficient for achieving the optimal form of deformation stability for natural image classification. (4) Pooling confers too much deformation stability for image classification at initialization, and during training, networks have to learn to counteract this inductive bias. Together, these findings provide new insights into the role of interleaved pooling and deformation invariance in CNNs, and demonstrate the importance of rigorous empirical testing of even our most basic assumptions about the working of neural networks.

1 INTRODUCTION

Within deep learning, a variety of intuitions have been assumed to be common knowledge without empirical verification, leading to recent active debate (Rahimi & Recht, 2017; LeCun, 2017; Sculley et al., 2015; 2018). Nevertheless, many of these core ideas have informed the structure of broad classes of models, with little attempt to rigorously test these assumptions.

In this paper, we seek to address this issue by undertaking a careful, empirical study of one of the foundational intuitions informing convolutional neural networks (CNNs) for visual object recognition: the need to make these models stable to small translations and deformations in the input images. This intuition runs as follows: much of the variability in the visual domain comes from slight changes in view, object position, rotation, size, and non-rigid deformations of (e.g.) organic objects; representations which are invariant to such transformations would (presumably) lead to better performance. This idea is arguably one of the core principles initially responsible for the architectural choices of convolutional filters and interleaved pooling (LeCun et al., 1998; 2015), as well as the deployment of parametric data augmentation strategies during training (Simard et al., 2003). Yet, despite the widespread impact of this idea, the relationship between visual object recognition and deformation stability has not been thoroughly tested, and we do not actually know how modern CNNs realize deformation stability, if they even do at all.

Moreover, for many years, the very success of CNNs on visual object recognition tasks was thought to depend on the interleaved pooling layers that purportedly rendered these models insensitive to small translations and deformations. However, despite this reasoning, recent models have largely abandoned interleaved pooling layers, achieving similar or greater success without them (Springenberg et al., 2014; He et al., 2016).

These observations raise several critical questions. Is deformation stability necessary for visual object recognition? If so, how is it achieved in the absence of pooling layers? What role does interleaved pooling play when it is present?

Here, we seek to answer these questions by building a broad class of image deformations, and comparing CNNs' responses to original and deformed images. While this class of deformations is an artificial one, it is rich and parametrically controllable, it includes many commonly used image transformations (affine transforms such as translations, shears, and rotations, as well as thin-plate spline transforms, among others), and it provides a useful model for probing how CNNs might respond to natural image deformations. We use these deformations to study CNNs with and without pooling layers, and how their representations change with depth and over the course of training. Our contributions are as follows:

• Networks without pooling are sensitive to deformation at initialization, but ultimately learn representations that are stable to deformation.

• The inductive bias provided by pooling is too strong at initialization, and deformation stability in these networks decreases over the course of training.

• The pattern of deformation stability across layers for trained networks with and without pooling converges to a similar structure.

• Networks both with and without pooling implement and modulate deformation stability largely through the smoothness of learned filters.

More broadly, this work demonstrates that our intuitions as to why neural networks work can often be inaccurate, no matter how reasonable they may seem, and require thorough empirical and theoretical validation.

1.1 PRIOR WORK

Invariances in non-neural models. There is a long history of non-neural computer vision models architecting invariance to deformation. For example, SIFT features are local feature descriptors constructed such that they are invariant to translation, scaling and rotation (Lowe, 1999). In addition, by using blurring, SIFT features become somewhat robust to deformations. Another example is the deformable parts model, which contains a single-stage spring-like model of connections between pairs of object parts, giving robustness to translation at a particular scale (Felzenszwalb et al., 2008).

Deformation invariance and pooling. Important early work in neuroscience found that in the visual cortex of cats, there exist special complex cells which are somewhat insensitive to the precise location of edges (Hubel & Wiesel, 1968). These findings inspired work on the neocognitron, which cascaded locally-deformation-invariant modules into a hierarchy (Fukushima & Miyake, 1982). This, in turn, inspired the use of pooling layers in CNNs (LeCun et al., 1990). Here, pooling was directly motivated as conferring invariance to translations and deformations. For example, LeCun et al. (1990) expressed this as follows: "Each feature extraction in our network is followed by an additional layer which performs a local averaging and a sub-sampling, reducing the resolution of the feature map. This layer introduces a certain level of invariance to distortions and translations." In fact, until recently, pooling was still seen as an essential ingredient in CNNs, allowing for invariance to small shifts and distortions (Simonyan & Zisserman, 2014; He et al., 2016; Krizhevsky et al., 2012; LeCun et al., 2015; Giusti et al., 2013).
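The local averaging and sub-sampling described in the quotation above can be illustrated with a small one-dimensional sketch. This is not taken from the paper: the random filters, the ReLU non-linearity, the pooling width of 4, the one-sample shift, and the use of cosine distance as a sensitivity measure are all assumptions chosen for illustration, and the helper names (`avg_pool`, `shift_sensitivity`) are hypothetical.

```python
import numpy as np

def avg_pool(x, size):
    """Local averaging over non-overlapping windows, then sub-sampling."""
    n = (len(x) // size) * size          # drop any ragged tail
    return x[:n].reshape(-1, size).mean(axis=1)

def cosine_distance(a, b):
    """1 - cosine similarity between two feature vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def shift_sensitivity(signal, filt, pool_size=None, shift=1):
    """Cosine distance between the features of a signal and of its shifted copy."""
    def features(x):
        # Random-filter "convolution" followed by a ReLU (assumed layer type).
        f = np.maximum(np.correlate(x, filt, mode="valid"), 0.0)
        return f if pool_size is None else avg_pool(f, pool_size)
    return cosine_distance(features(signal), features(np.roll(signal, shift)))

rng = np.random.default_rng(0)
signal = rng.standard_normal(1024)          # stand-in for one image row
filters = rng.standard_normal((16, 5))      # untrained (random) filters

# Average sensitivity to a one-sample shift, without and with pooling.
d_no_pool = np.mean([shift_sensitivity(signal, w) for w in filters])
d_pool = np.mean([shift_sensitivity(signal, w, pool_size=4) for w in filters])
print(f"no pooling: {d_no_pool:.3f}, 4-wide average pooling: {d_pool:.3f}")
```

With random filters, the pooled features should move less under the shift than the unpooled ones, which is the kind of built-in insensitivity that pooling was historically motivated by. The sketch says nothing about trained networks.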

Previous theoretical analyses of invariances in CNNs. A significant body of theoretical work shows formally that scattering networks, which share some architectural components with CNNs, are stable to deformations (Mallat, 2012; Sifre & Mallat, 2013; Bruna & Mallat, 2013; Mallat, 2016). However, this work does not apply to widely used CNN architectures for two reasons. First, there are significant architectural differences, including in connectivity, pooling, and non-linearities. Second, and perhaps more importantly, this line of work assumes that the filters are fixed wavelets that do not change during training.

The more recent theoretical study of Bietti & Mairal (2017) uses reproducing kernel Hilbert spaces to study the inductive biases (including deformation stability) of architectures more similar to the CNNs used in practice. However, this work assumes the use of interleaved pooling layers between the convolutional layers, and cannot explain the success of more recent architectures which lack them.

Figure 1: (a) Generating deformed images: To randomly deform an image we: (i) start with a fixed evenly spaced grid of control points (here 4x4 control points) and then choose a random source for each control point within a neighborhood of the point; (ii) smooth the resulting vector field using thin plate interpolation; (iii) overlay the vector field on the original image: the value in the final result at the tip of an arrow is computed using bilinear interpolation of values in a neighborhood around the tail of the arrow in the original image; (iv) obtain the final result. (b) Examples of deformed ImageNet images. Left: original images; right: deformed images. While the images have changed significantly, for example under the $L^2$ metric, they would likely be given the same label by a human.
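Steps (i) to (iv) of the caption can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it handles a single-channel image only, draws control-point offsets uniformly from a square neighborhood of half-width `max_shift` (the caption does not specify the distribution), uses the standard thin-plate spline kernel $r^2 \log r$, and clamps sampling coordinates at the image border. All function and parameter names are hypothetical.

```python
import numpy as np

def tps_kernel(r2):
    """Thin-plate spline basis r^2 * log(r), written in terms of r^2."""
    out = np.zeros_like(r2)
    mask = r2 > 0
    out[mask] = 0.5 * r2[mask] * np.log(r2[mask])
    return out

def fit_tps(ctrl, values):
    """Solve for spline parameters that interpolate values (n, 2) at ctrl (n, 2)."""
    n = len(ctrl)
    d2 = ((ctrl[:, None, :] - ctrl[None, :, :]) ** 2).sum(-1)
    P = np.hstack([np.ones((n, 1)), ctrl])            # affine part
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = tps_kernel(d2), P, P.T
    b = np.zeros((n + 3, 2))
    b[:n] = values
    return np.linalg.solve(A, b)

def eval_tps(params, ctrl, pts):
    """Evaluate the fitted spline at pts (m, 2); returns (m, 2)."""
    n = len(ctrl)
    d2 = ((pts[:, None, :] - ctrl[None, :, :]) ** 2).sum(-1)
    P = np.hstack([np.ones((len(pts), 1)), pts])
    return tps_kernel(d2) @ params[:n] + P @ params[n:]

def bilinear_sample(img, ys, xs):
    """Sample a 2-D image at real-valued (ys, xs), clamping at the border."""
    h, w = img.shape
    ys, xs = np.clip(ys, 0, h - 1), np.clip(xs, 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = ys - y0, xs - x0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bot = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def random_deform(img, grid=4, max_shift=2.0, rng=None):
    """Return (deformed image, dense offset field of shape (H, W, 2))."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape
    gy, gx = np.meshgrid(np.linspace(0, h - 1, grid),
                         np.linspace(0, w - 1, grid), indexing="ij")
    ctrl = np.stack([gy.ravel(), gx.ravel()], axis=1)
    # (i) random source offset for each control point
    offsets = rng.uniform(-max_shift, max_shift, size=ctrl.shape)
    # (ii) smooth the sparse offsets into a dense field
    params = fit_tps(ctrl, offsets)
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pts = np.stack([yy.ravel(), xx.ravel()], axis=1).astype(float)
    field = eval_tps(params, ctrl, pts).reshape(h, w, 2)
    # (iii)-(iv) each output pixel reads from its source location
    return bilinear_sample(img, yy + field[..., 0], xx + field[..., 1]), field

rng = np.random.default_rng(0)
image = rng.random((32, 32))                 # stand-in for a grayscale image
deformed, field = random_deform(image, grid=4, max_shift=2.0, rng=rng)
```

Increasing `max_shift` gives stronger deformations, and increasing `grid` gives finer-grained ones. Both are free parameters of this sketch rather than values reported in the text.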

Empirical investigations. Previous empirical investigations of these phenomena in CNNs include the work of Lenc & Vedaldi (2015), which focused on a more limited set of invariances such as global affine transformations. More recently, there has been interest in the robustness of networks to adversarial geometric transformations in the work of Fawzi & Frossard (2015) and Kanbak et al. (2017). In particular, these studies looked at worst-case sensitivity of the output to such transformations, and found that CNNs can indeed be quite sensitive to particular geometric transformations (a phenomenon that can be mitigated by augmenting the training sets). However, this line of work does not address how deformation sensitivity is generally achieved in the first place, and how it changes over the course of training. In addition, these investigations have been restricted to a limited class of deformations, which we seek to remedy here.

2 METHODS

2.1 DEFORMATION SENSITIVITY

In order to study how CNN representations are affected by image deformations we first need a controllable source of deformation. Here, we choose a flexible class of local deformations of image coordinates, i.e., maps $\tau : \mathbb{R}^2 \to \mathbb{R}^2$ such that $\|\nabla \tau\|_\infty < C$ for some $C$, similar to Mallat (2012).
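To make the gradient bound concrete, the sketch below estimates $\|\nabla \tau\|_\infty$ for a few simple deformations sampled on a pixel grid. Several choices here are assumptions rather than statements from the text: $\tau$ is treated as a displacement field (so the identity map has $\tau = 0$), the norm is taken as the largest absolute partial derivative over all pixels, and derivatives are approximated with finite differences. The function name `max_gradient_norm` is hypothetical.

```python
import numpy as np

def max_gradient_norm(field):
    """Largest absolute partial derivative of a displacement field (H, W, 2),
    estimated with finite differences along both image axes."""
    d_dy = np.gradient(field, axis=0)
    d_dx = np.gradient(field, axis=1)
    return float(max(np.abs(d_dy).max(), np.abs(d_dx).max()))

h, w = 32, 32
yy, xx = np.meshgrid(np.arange(h, dtype=float),
                     np.arange(w, dtype=float), indexing="ij")

# Translation: every pixel moves by the same amount, so the gradient vanishes.
translation = np.stack([np.full_like(yy, 2.0), np.full_like(xx, -1.0)], axis=-1)

# Shear: horizontal displacement grows linearly with the row index.
shear = np.stack([np.zeros_like(yy), 0.1 * yy], axis=-1)

# Small rotation about the image centre: displacement is (R - I)(p - c).
theta = 0.05
cy, cx = (h - 1) / 2, (w - 1) / 2
rotation = np.stack([
    (np.cos(theta) - 1) * (yy - cy) - np.sin(theta) * (xx - cx),
    np.sin(theta) * (yy - cy) + (np.cos(theta) - 1) * (xx - cx),
], axis=-1)

g_translation = max_gradient_norm(translation)
g_shear = max_gradient_norm(shear)
g_rotation = max_gradient_norm(rotation)
```

Under these assumptions a pure translation has zero gradient however large the shift, while the shear and the rotation have gradients set by the shear factor and by $\sin\theta$ respectively. In this reading, the constant $C$ limits how quickly the displacement may vary across the image rather than how far pixels move.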

We choose this class for several reasons. First, it subsumes or approximates many of the canonical forms of image deformation we would want to be robust to, including:

• Pose: Small shifts in pose or location of subparts

• Affine transformations: translation, scaling, rotation or shear

• Thin-plate spline transforms

• Optical flow (Roth & Black, 2007; Rosenbaum et al., 2013)

We show examples of several of these in Section 2 of the supplementary material.

This class also allows us to modulate the strength of image deformations, which we deploy to investigate how task demands are met by CNNs. Furthermore, this class of deformations approximates most of the commonly used methods of data augmentation for object recognition (Simard et al., 2003; Wong et al., 2016; Cireşan et al., 2010).

While it is in principle possible to explore finer-grained distributions of deformation (e.g., choosing adversarial deformations to maximally shift the representation), we think our approach offers good coverage over the space, and a reasonable first-order approximation to the class of natural deformations. We leave the study of richer transformations, such as those requiring a renderer to produce or those chosen adversarially (Fawzi & Frossard, 2015; Kanbak et al., 2017), as future work.
