3 Convolutional Neural Networks
Typically, convolutional layers are interspersed with sub-sampling layers to reduce computation time
and to gradually build up further spatial and configural invariance. A small sub-sampling factor is
desirable, however, in order to maintain specificity at the same time. Of course, this idea is not new,
but the concept is both simple and powerful. The mammalian visual cortex and models thereof [12,
8, 7] draw heavily on these themes, and auditory neuroscience has revealed in the past ten years
or so that these same design paradigms can be found in the primary and belt auditory areas of the
cortex in a number of different animals [6, 11, 9]. Hierarchical analysis and learning architectures
may yet be the key to success in the auditory domain.
3.1 Convolution Layers
Let’s move forward with deriving the backpropagation updates for convolutional layers in a network.
At a convolution layer, the previous layer’s feature maps are convolved with learnable kernels and
put through the activation function to form the output feature map. Each output map may combine
convolutions with multiple input maps. In general, we have that
\[
x^{\ell}_j = f\!\left( \sum_{i \in M_j} x^{\ell-1}_i * k^{\ell}_{ij} + b^{\ell}_j \right),
\]
where $M_j$ represents a selection of input maps, and the convolution is of the “valid” border handling
type when implemented in MATLAB. Some common choices of input maps include all-pairs or all-
triplets, but we will discuss how one might learn combinations below. Each output map is given an
additive bias $b$; however, for a particular output map, the input maps will be convolved with distinct
kernels. That is to say, if output map $j$ and map $k$ both sum over input map $i$, then the kernels
applied to map $i$ are different for output maps $j$ and $k$.
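To make the forward pass concrete, here is a minimal NumPy/SciPy sketch of the equation above. The function and argument names (conv_layer_forward, prev_maps, input_sets, and so on) are hypothetical, and tanh merely stands in for whatever activation $f$ the network actually uses.

```python
import numpy as np
from scipy.signal import convolve2d

def conv_layer_forward(prev_maps, kernels, biases, input_sets, f=np.tanh):
    """Forward pass for one convolutional layer (a sketch).

    prev_maps  : list of 2-D arrays, the feature maps x_i^(l-1) of the previous layer
    kernels    : dict mapping (i, j) -> 2-D kernel k_ij^l (a distinct kernel per pair)
    biases     : list of scalars b_j^l, one per output map
    input_sets : list of index sets M_j, one per output map j
    f          : activation function (tanh used here as a placeholder)
    """
    out_maps = []
    for j, (M_j, b_j) in enumerate(zip(input_sets, biases)):
        acc = 0.0
        for i in M_j:
            # "valid" convolution, matching MATLAB's conv2(..., 'valid')
            acc = acc + convolve2d(prev_maps[i], kernels[(i, j)], mode='valid')
        out_maps.append(f(acc + b_j))
    return out_maps
```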
3.1.1 Computing the Gradients
We assume that each convolution layer $\ell$ is followed by a downsampling layer $\ell+1$. The backpropagation
algorithm says that in order to compute the sensitivity for a unit at layer $\ell$, we should first sum
over the next layer's sensitivities corresponding to units that are connected to the node of interest in
the current layer $\ell$, and multiply each of those connections by the associated weights defined at layer
$\ell+1$. We then multiply this quantity by the derivative of the activation function evaluated at the
current layer's pre-activation inputs, $u^{\ell}$. In the case of a convolutional layer followed by a downsampling
layer, one pixel in the next layer's associated sensitivity map $\delta$ corresponds to a block of pixels
in the convolutional layer's output map. Thus each unit in a map at layer $\ell$ connects to only one unit
in the corresponding map at layer $\ell+1$. To compute the sensitivities at layer $\ell$ efficiently, we can
upsample the downsampling layer's sensitivity map to make it the same size as the convolutional
layer's map and then multiply the upsampled sensitivity map from layer $\ell+1$ with the activation
derivative map at layer $\ell$ element-wise. The "weights" defined at a downsampling layer map are all
equal to $\beta$ (a constant, see Section 3.2), so we just scale the previous step's result by $\beta$ to finish the
computation of $\delta^{\ell}$. We can repeat the same computation for each map $j$ in the convolutional layer,
pairing it with the corresponding map in the subsampling layer:
\[
\delta^{\ell}_j = \beta^{\ell+1}_j \, f'(u^{\ell}_j) \circ \mathrm{up}(\delta^{\ell+1}_j)
\]
where $\mathrm{up}(\cdot)$ denotes an upsampling operation that simply tiles each pixel in the input horizontally
and vertically $n$ times in the output if the subsampling layer subsamples by a factor of $n$. As we
will discuss below, one possible way to implement this function efficiently is to use the Kronecker
product:
\[
\mathrm{up}(x) \equiv x \otimes 1_{n \times n}.
\]
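A small sketch of how this might look in NumPy: upsample implements $\mathrm{up}(\cdot)$ with np.kron exactly as in the identity above, and conv_layer_deltas applies the sensitivity formula map by map. The names and the tanh derivative are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def upsample(x, n):
    # Tile each pixel of x into an n-by-n block: up(x) = x (Kronecker product) ones(n, n)
    return np.kron(x, np.ones((n, n)))

def conv_layer_deltas(u_maps, next_deltas, betas, n,
                      f_prime=lambda u: 1.0 - np.tanh(u) ** 2):
    """Sensitivity maps delta_j^l for a convolution layer followed by subsampling (a sketch).

    u_maps      : pre-activation inputs u_j^l, one 2-D array per map j
    next_deltas : sensitivity maps delta_j^(l+1) from the subsampling layer
    betas       : the subsampling layer's multiplicative weights beta_j^(l+1)
    n           : subsampling factor
    f_prime     : derivative of the activation (tanh assumed here)
    """
    return [beta * f_prime(u) * upsample(d, n)   # element-wise product with the upsampled map
            for u, d, beta in zip(u_maps, next_deltas, betas)]
```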
Now that we have the sensitivities for a given map, we can immediately compute the bias gradient
by simply summing over all the entries in $\delta^{\ell}_j$:
\[
\frac{\partial E}{\partial b_j} = \sum_{u,v} \left( \delta^{\ell}_j \right)_{uv}.
\]
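Continuing the sketch above, the bias gradients then reduce to a per-map sum over the sensitivity entries:

```python
# Bias gradient for each output map j: sum all entries of its sensitivity map.
# 'deltas' is the list returned by the hypothetical conv_layer_deltas sketch above.
bias_grads = [delta_j.sum() for delta_j in deltas]
```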