Convolutional neural networks
Jianxin Wu
LAMDA Group
National Key Lab for Novel Software Technology
Nanjing University, China
wujx2001@gmail.com
February 11, 2020
Contents

1 Preliminaries
  1.1 Tensor and vectorization
  1.2 Vector calculus and the chain rule
2 CNN overview
  2.1 The architecture
  2.2 The forward run
  2.3 Stochastic gradient descent (SGD)
  2.4 Error back propagation
3 Layer input, output, and notations
4 The ReLU layer
5 The convolution layer
  5.1 What is a convolution?
  5.2 Why to convolve?
  5.3 Convolution as matrix product
  5.4 The Kronecker product
  5.5 Backward propagation: updating the parameters
  5.6 Even higher dimensional indicator matrices
  5.7 Backward propagation: preparing the supervision signal for the previous layer
  5.8 Fully connected layer as a convolution layer
6 The pooling layer
7 A case study: the VGG-16 net
  7.1 VGG-Verydeep-16
  7.2 Receptive field
8 Hands-on CNN experiences
Exercises
This chapter describes how a Convolutional Neural Network (CNN) operates from a mathematical perspective. It is self-contained, and the focus is to make it comprehensible for beginners to the CNN field.

The convolutional neural network (CNN) has shown excellent performance in many computer vision, machine learning, and pattern recognition problems. Many solid papers have been published on this topic, and quite a number of high quality open source CNN software packages have been made available. There are also well-written CNN tutorials and CNN software manuals. However, we believe that introductory CNN material specifically prepared for beginners is still needed. Research papers are usually very terse and lack details, so it might be difficult for beginners to read them. A tutorial targeting experienced researchers may not cover all the necessary details to understand how a CNN runs.
This chapter tries to present a document that

• is self-contained. It is expected that all required mathematical background knowledge is introduced in this chapter itself (or in other chapters in this book);

• has details for all the derivations. This chapter aims to explain all the necessary math in detail. We try not to ignore any important step in a derivation. Thus, it should be possible for a beginner to follow (although an expert may find this chapter a bit tautological);

• ignores implementation details. The purpose is for a reader to understand how a CNN runs at the mathematical level. In CNNs, making correct choices for various implementation details is one of the keys to high accuracy (that is, "the devil is in the details"). However, we intentionally leave this part out so that the reader can focus on the mathematics. After understanding the mathematical principles and details, it is more effective to learn these implementation and design details through hands-on experience with CNN programming. The exercise problems in this chapter provide opportunities for such hands-on experience.
CNNs are useful in many applications, especially in image related tasks. Applications of CNNs include image classification, image semantic segmentation, object detection in images, etc. We will focus on image classification (or categorization) in this chapter. In image categorization, every image has a major object that occupies a large portion of the image. An image is classified into one of the classes based on the identity of its main object—e.g., dog, airplane, bird, etc.
1 Preliminaries
We start with a discussion of some background knowledge that is necessary in
order to understand how a CNN runs. The reader can ignore this section if
he/she is familiar with these basics.
1.1 Tensor and vectorization

Everybody is familiar with vectors and matrices. We use a symbol shown in boldface to represent a vector—e.g., $\mathbf{x} \in \mathbb{R}^D$ is a column vector with $D$ elements. We use a capital letter to denote a matrix—e.g., $X \in \mathbb{R}^{H \times W}$ is a matrix with $H$ rows and $W$ columns. The vector $\mathbf{x}$ can also be viewed as a matrix with 1 column and $D$ rows.
These concepts can be generalized to higher-order matrices—i.e., tensors. For example, $\mathbf{x} \in \mathbb{R}^{H \times W \times D}$ is an order 3 (or third order) tensor. It contains $HWD$ elements, and each of them can be indexed by an index triplet $(i, j, d)$, with $0 \le i < H$, $0 \le j < W$, and $0 \le d < D$. Another way to view an order 3 tensor is to treat it as containing $D$ channels of matrices. Every channel is a matrix with size $H \times W$. The first channel contains all the numbers in the tensor that are indexed by $(i, j, 0)$. Note that in this chapter we assume the index starts from 0 rather than 1. When $D = 1$, an order 3 tensor reduces to a matrix.
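As a small illustration of this indexing scheme, here is a minimal Python/NumPy sketch (an assumption on our part; the tensor T and all names below are purely illustrative, not something this chapter prescribes):

import numpy as np

H, W, D = 4, 5, 3
T = np.zeros((H, W, D))

# The element indexed by the triplet (i, j, d), with indices starting at 0.
i, j, d = 1, 2, 0
T[i, j, d] = 7.0

# The first channel: all numbers indexed by (i, j, 0), an H x W matrix.
first_channel = T[:, :, 0]
print(first_channel.shape)   # (4, 5)
print(first_channel[i, j])   # 7.0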
We have interacted with tensors day-to-day. A scalar value is a zeroth-order (order 0) tensor; a vector is an order 1 tensor; and a matrix is a second order tensor. A color image is in fact an order 3 tensor. An image with $H$ rows and $W$ columns is a tensor with size $H \times W \times 3$: if a color image is stored in the RGB format, it has 3 channels (for R, G and B, respectively), and each channel is an $H \times W$ matrix (second order tensor) that contains the R (or G, or B) values of all pixels.
It is beneficial to represent images (or other types of raw data) as a tensor.
In early computer vision and pattern recognition, a color image (which is an
order 3 tensor) was often converted to the grayscale version (which is a matrix)
because we know how to handle matrices much better than tensors. The color
information is lost during this conversion. But color is very important in various
image (or video) based learning and recognition problems, and we do want to
process color information in a principled way—e.g., using a CNN.
Tensors are essential in CNNs. The input, intermediate representations, and parameters in a CNN are all tensors. Tensors with order higher than 3 are also widely used in CNNs. For example, we will soon see that the convolution kernels in a convolution layer of a CNN form an order 4 tensor.
Given a tensor, we can arrange all the numbers inside it into a long vector, following a pre-specified order. For example, in Matlab/Octave, the (:) operator converts a matrix into a column vector in the column-first order. An example is:

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad A(:) = (1, 3, 2, 4)^T = \begin{bmatrix} 1 \\ 3 \\ 2 \\ 4 \end{bmatrix}. \tag{1}$$
In mathematics, we use the notation "vec" to represent this vectorization operator. That is, $\mathrm{vec}(A) = (1, 3, 2, 4)^T$ in the example in Equation 1. In order to vectorize an order 3 tensor, we could vectorize its first channel (which is a matrix, and we already know how to vectorize it), then the second channel, . . . , till all channels are vectorized. The vectorization of the order 3 tensor is then the concatenation of the vectorizations of all the channels in this order.
The vectorization of an order 3 tensor is a recursive process, which utilizes
the vectorization of order 2 tensors. This recursive process can be applied to
vectorize an order 4 (or even higher order) tensor in the same manner.
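To make the column-first convention concrete, here is a minimal sketch in Python/NumPy (NumPy is our assumption; the chapter's own example uses Matlab/Octave's (:) operator, which NumPy's order='F' mimics):

import numpy as np

# The matrix A from Equation 1.
A = np.array([[1, 2],
              [3, 4]])

# Column-first (Fortran-order) vectorization, equivalent to Matlab's A(:).
print(A.reshape(-1, order='F'))  # [1 3 2 4]

# vec of an order 3 tensor: vectorize each H x W channel in turn,
# then concatenate the channel vectors in order.
H, W, D = 2, 2, 3
T = np.arange(H * W * D).reshape(H, W, D)
vec_T = np.concatenate([T[:, :, d].reshape(-1, order='F') for d in range(D)])
print(vec_T.shape)  # (12,) -- all HWD elements in one long vector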
1.2 Vector calculus and the chain rule

The CNN learning process depends on vector calculus and the chain rule. Suppose $z$ is a scalar (i.e., $z \in \mathbb{R}$) and $\mathbf{y} \in \mathbb{R}^H$ is a vector. If $z$ is a function of $\mathbf{y}$, then the partial derivative of $z$ with respect to $\mathbf{y}$ is a vector, defined as

$$\left( \frac{\partial z}{\partial \mathbf{y}} \right)_i = \frac{\partial z}{\partial y_i}. \tag{2}$$
In other words, $\frac{\partial z}{\partial \mathbf{y}}$ is a vector having the same size as $\mathbf{y}$, and its $i$-th element is $\frac{\partial z}{\partial y_i}$. Also note that $\frac{\partial z}{\partial \mathbf{y}^T} = \left( \frac{\partial z}{\partial \mathbf{y}} \right)^T$.
Furthermore, suppose $\mathbf{x} \in \mathbb{R}^W$ is another vector, and $\mathbf{y}$ is a function of $\mathbf{x}$. Then, the partial derivative of $\mathbf{y}$ with respect to $\mathbf{x}$ is defined as

$$\left( \frac{\partial \mathbf{y}}{\partial \mathbf{x}^T} \right)_{ij} = \frac{\partial y_i}{\partial x_j}. \tag{3}$$

This partial derivative is an $H \times W$ matrix, whose entry at the intersection of the $i$-th row and $j$-th column is $\frac{\partial y_i}{\partial x_j}$.
It is easy to see that $z$ is a function of $\mathbf{x}$ in a chain-like argument: a function maps $\mathbf{x}$ to $\mathbf{y}$, and another function maps $\mathbf{y}$ to $z$. The chain rule can be used to compute $\frac{\partial z}{\partial \mathbf{x}^T}$, as

$$\frac{\partial z}{\partial \mathbf{x}^T} = \frac{\partial z}{\partial \mathbf{y}^T} \frac{\partial \mathbf{y}}{\partial \mathbf{x}^T}. \tag{4}$$
A sanity check for Equation 4 is to check the matrix/vector dimensions. Note that $\frac{\partial z}{\partial \mathbf{y}^T}$ is a row vector with $H$ elements, or a $1 \times H$ matrix. (Be reminded that $\frac{\partial z}{\partial \mathbf{y}}$ is a column vector.) Since $\frac{\partial \mathbf{y}}{\partial \mathbf{x}^T}$ is an $H \times W$ matrix, the vector/matrix multiplication between them is valid, and the result should be a row vector with $W$ elements, which matches the dimensionality of $\frac{\partial z}{\partial \mathbf{x}^T}$.
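The dimension bookkeeping in Equation 4 can also be verified numerically. Below is a minimal NumPy sketch, with a hypothetical linear map chosen by us so the Jacobians are known in closed form:

import numpy as np

H, W = 3, 4
rng = np.random.default_rng(0)

# Choose y = Mx and z = a'y, so dy/dx' = M (H x W) and dz/dy = a (length H).
M = rng.standard_normal((H, W))
a = rng.standard_normal(H)

dz_dyT = a.reshape(1, H)   # row vector, 1 x H
dz_dxT = dz_dyT @ M        # Equation 4: (1 x H)(H x W) = 1 x W

# Sanity check against a finite-difference approximation.
x = rng.standard_normal(W)
z = lambda v: a @ (M @ v)
eps = 1e-6
fd = np.array([(z(x + eps * np.eye(W)[j]) - z(x)) / eps for j in range(W)])
print(np.allclose(dz_dxT.ravel(), fd, atol=1e-4))  # True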
For specific rules to calculate partial derivatives of vectors and matrices, please refer to Chapter 2 and the Matrix Cookbook.
2 CNN overview
In this section, we will see how a CNN trains and predicts at the abstract level,
with the details left for later sections.
2.1 The architecture
A CNN usually takes an order 3 tensor as its input—e.g., an image with $H$ rows, $W$ columns, and 3 channels (R, G, B color channels). Higher order tensor inputs, however, can be handled by CNNs in a similar fashion. The input then sequentially goes through a number of processes. One processing step is usually called a layer, which could be a convolution layer, a pooling layer, a normalization layer, a fully connected layer, a loss layer, etc.
We will introduce the details of these layers later in this chapter, with detailed introductions to three types of layers: convolution, pooling, and ReLU, which are the key parts of almost all CNN models. Proper normalization—e.g., batch normalization—is important in the optimization process for learning good parameters in a CNN. Although it is not introduced in this chapter, we will present some related resources in the exercise problems.
For now, let us give an abstract description of the CNN structure first.
$$\mathbf{x}^1 \longrightarrow \boxed{\mathbf{w}^1} \longrightarrow \mathbf{x}^2 \longrightarrow \cdots \longrightarrow \mathbf{x}^{L-1} \longrightarrow \boxed{\mathbf{w}^{L-1}} \longrightarrow \mathbf{x}^L \longrightarrow \boxed{\mathbf{w}^L} \longrightarrow z \tag{5}$$
The above Equation 5 illustrates how a CNN runs layer by layer in a forward pass. The input is $\mathbf{x}^1$, usually an image (order 3 tensor). It goes through the processing in the first layer, which is the first box. We denote the parameters involved in the first layer's processing collectively as a tensor $\mathbf{w}^1$. The output of the first layer is $\mathbf{x}^2$, which also acts as the input to the second layer's processing. This processing proceeds till all layers in the CNN have been finished, which outputs $\mathbf{x}^L$.
One additional layer, however, is added for backward error propagation, a method that learns good parameter values in the CNN. Let's suppose the problem at hand is an image classification problem with $C$ classes. A commonly used strategy is to output $\mathbf{x}^L$ as a $C$ dimensional vector, whose $i$-th entry encodes the prediction (the posterior probability of $\mathbf{x}^1$ coming from the $i$-th class). To make $\mathbf{x}^L$ a probability mass function, we can set the processing in the $(L-1)$-th layer as a softmax transformation of $\mathbf{x}^{L-1}$ (cf. Chapter 9). In other applications, the output $\mathbf{x}^L$ may have other forms and interpretations.