Machine Learning Series (1)
Deep Feedforward Neural Networks: Principles, Formula Derivations and a Python Implementation
Principles of deep neural networks:
Definition
Notation conventions
Forward propagation
Backward propagation
Dataset
Formula derivations:
Forward pass
The BP algorithm
Gradient descent
Python implementation:
See the article body
Declaration
The principle explanations and formula derivations in this article were written by LSayhi; they are provided for learning and reference and may be shared. The framework of the code implementation was provided by Coursera and completed by LSayhi. The full data and code are available on
GitHub: https://github.com/LSayhi/DeepLearning/tree/master/Coursera-deeplearning%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0
WeChat official account: AI有点可ai
I. Principles
1. Definition: A feedforward neural network, also called a multilayer perceptron, aims to approximate a target mapping f, written y = f(x; θ). For a prediction network, the parameter values θ are learned so that the function f fits the mapping between the independent and dependent variables; for a classification network, θ is learned so that the mapping f fits the boundaries between the classes.
A neural network model consists of an input layer, an output layer, hidden layers, and the weights (parameters) connecting the layers.
"Deep" refers to the number of layers in the model excluding the input layer;
"feedforward" means the network has no feedback connections;
"network" means the model is a composition of different functions g.
2. Notation conventions: Take as an example a feedforward network with n_x input features (n_x = 12888, defined as the number of units in the input layer) and L layers, where L is defined as the number of layers excluding the input layer, i.e. L − 1 hidden layers plus the output layer:
z^[l]_i denotes the i-th linear unit of layer l, and z^[l] denotes all linear units of layer l;
a^[l]_i denotes the i-th activation unit of layer l, and a^[l] denotes all activation units of layer l;
n^[l] denotes the number of units in layer l;
in z^[l](m) and a^[l](m), l denotes the layer and (m) the sample index, with m = 1, 2, ..., M; x^[0](m) denotes the m-th sample (the [0] is usually omitted), and its subscript indexes the features.
3. Forward propagation: Forward propagation is the computation that carries the input data (set) from the input layer to the output layer. Before the data are fed in, the parameters are initialized by randomly generating the W and b matrices. Each layer's units then compute a linear output from W and the previous layer's data, apply an activation function to make it nonlinear, and repeat this layer by layer until the output layer produces its output; this process is called forward propagation.
4. Backward propagation: After the output layer has produced its output, it is compared with the corresponding labels in the dataset to compute the loss (for a single sample) and the cost (the sum of the losses over the whole sample set). The partial derivatives of the cost with respect to the parameters W and b are then computed, and W and b are updated by gradient descent or similar methods, stopping when the loss reaches 0 or a target value. Because computing these partial derivatives is a backward recursion, the process is called backward propagation.
5. Dataset: The dataset is the collection of data used to train and test the network, consisting of a training set and a test set. In general, the training set is used to train the network to its best, and the test set is used to measure the network's generalization performance.
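The pipeline just described (initialize W and b, propagate forward, compare the output with the labels) can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a two-layer network with a ReLU hidden layer and a sigmoid output; the names and layer sizes are made up for the example, and the cross-entropy cost anticipates the formula derived in Part II:

```python
import numpy as np

rng = np.random.default_rng(1)

# initialization: small random W matrices, zero b vectors
# (2 input features, 3 hidden units, 1 output unit)
params = {"W1": rng.standard_normal((3, 2)) * 0.01, "b1": np.zeros((3, 1)),
          "W2": rng.standard_normal((1, 3)) * 0.01, "b2": np.zeros((1, 1))}

def forward(X, params):
    """Linear step then nonlinear activation, layer by layer."""
    A1 = np.maximum(0, params["W1"] @ X + params["b1"])             # ReLU hidden layer
    A2 = 1.0 / (1.0 + np.exp(-(params["W2"] @ A1 + params["b2"])))  # sigmoid output
    return A2

def cost(AL, Y):
    """Cross-entropy cost over the M samples (derived in Part II)."""
    M = Y.shape[1]
    return float(-np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / M)

X = rng.standard_normal((2, 4))       # M = 4 samples, columns are samples
Y = np.array([[0.0, 1.0, 0.0, 1.0]])  # labels
print(forward(X, params).shape)       # (1, 4)
```

From here, training consists of repeatedly computing the gradients of this cost and updating params, as Part II derives.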
II. Formula Derivation
Forward propagation:
For an L-layer feedforward network, the linear step of layer l computes

z^[l] = W^[l] a^[l−1] + b^[l]

Vectorized over the M samples, this becomes

Z^[l] = W^[l] A^[l−1] + b^[l]

Then

A^[l] = g(Z^[l]) = g(W^[l] A^[l−1] + b^[l]), l = 1, 2, ..., L

where g is the activation function; common activation functions include sigmoid, tanh and ReLU.
As special cases, the input layer a^[0] is also written x and the output layer is a^[L]; vectorized over the M samples, these are X and A^[L]. Therefore, once X is given, the formulas

A^[l] = g(Z^[l]) = g(W^[l] A^[l−1] + b^[l]) and A^[0] = X

allow A^[L] to be computed step by step from front to back; this is the forward-propagation process.
Loss and cost:
Loss function: defined for a single sample.
The loss function measures the difference (i.e. the loss) between the data produced by the output layer and the label:

L(ŷ, y) = −( y^(m) log a^[L](m) + (1 − y^(m)) log(1 − a^[L](m)) )

Its role is similar to that of a mean-squared-error function such as MSE = sqrt( (ŷ^(m) − y^(m))^2 ). In fact, L(ŷ, y) is the cross entropy, and minimizing the cross entropy is equivalent to the maximum-likelihood criterion (MLE). Intuitively, minimizing the cross entropy can be viewed as minimizing an error, so it also measures how different the compared objects are. Because of the shape of the mean-squared-error function, however, using it in logistic regression tends to make convergence slow, and the resulting cost function is not necessarily convex, so it may converge to some non-global local minimum instead of the minimum; we therefore usually use the cross entropy in place of the mean squared error. See my note "Mathematics in Machine Learning (1): Cross Entropy" for details.
Cost function: defined for the set of M samples.
The cost function is concerned with the loss over the whole sample set:

J(X, Y; W, b) = −(1/M) Σ_{m=1}^{M} ( y^(m) log a^[L](m) + (1 − y^(m)) log(1 − a^[L](m)) )

Backpropagation:
The backpropagation process can be represented by the figure below:
Figure: Forward and Backward propagation for LINEAR->RELU->LINEAR->SIGMOID.
The purple blocks represent the forward propagation, and the red blocks represent the backward propagation.
The goal of the network is to minimize J(X, Y; W, b), which mathematically is a multivariate minimization problem. A first thought might be to take the partial derivatives, set them to zero, find the minimum point, and read off the corresponding parameters W and b; this is the normal-equation method. But since the matrix X^T X is not necessarily invertible, we use gradient descent instead to search for the optimal parameters W and b. Gradient descent is expressed as:

W := W − α ∂J/∂W
b := b − α ∂J/∂b

Gradient descent also requires the partial derivatives of the cost function with respect to W and b. By the chain rule,

∂J/∂W^[l]_ij = (∂J/∂a^[l]_i)(∂a^[l]_i/∂z^[l]_i)(∂z^[l]_i/∂W^[l]_ij) = (∂J/∂a^[l]_i) g′(z^[l]_i) a^[l−1]_j   (1)

∂J/∂b^[l]_i = (∂J/∂a^[l]_i)(∂a^[l]_i/∂z^[l]_i)(∂z^[l]_i/∂b^[l]_i) = (∂J/∂a^[l]_i) g′(z^[l]_i)   (2)

Since

∂J/∂z^[l]_i = (∂J/∂a^[l]_i)(∂a^[l]_i/∂z^[l]_i) = (∂J/∂a^[l]_i) g′(z^[l]_i)   (3)

substituting (3) into (1) and (2) gives:

∂J/∂W^[l]_ij = (∂J/∂z^[l]_i) a^[l−1]_j   (4)

∂J/∂b^[l]_i = ∂J/∂z^[l]_i   (5)

Finally, equations (3), (4) and (5) are vectorized and written as follows.
dz^[l] = da^[l] ∗ g′(z^[l])   (6)

dW^[l] = dz^[l] (a^[l−1])^T   (7)

db^[l] = dz^[l]   (8)

Here "∗" denotes elementwise multiplication of matrices, and d· abbreviates ∂J/∂·. The term da^[l] in equation (6) is derived as follows: for layer l, applying the chain rule,

da^[l]_i = ∂J/∂a^[l]_i = Σ_{j=1}^{n^[l+1]} (∂J/∂a^[l+1]_j)(∂a^[l+1]_j/∂z^[l+1]_j)(∂z^[l+1]_j/∂a^[l]_i)

that is,

da^[l]_i = Σ_{j=1}^{n^[l+1]} (∂J/∂a^[l+1]_j) g′(z^[l+1]_j) W^[l+1]_ji = Σ_{j=1}^{n^[l+1]} (∂J/∂z^[l+1]_j) W^[l+1]_ji = (W^[l+1]_(:,i))^T dz^[l+1]

where W^[l+1]_(:,i) is the i-th column of W^[l+1]. Vectorized, this is

da^[l] = (W^[l+1])^T dz^[l+1]

Substituting this into (6), equation (6) can be rewritten as

dz^[l] = (W^[l+1])^T dz^[l+1] ∗ g′(z^[l])   (9)

Vectorizing again over the M samples:

dZ^[l] = dA^[l] ∗ g′(Z^[l]) = (W^[l+1])^T dZ^[l+1] ∗ g′(Z^[l])   (10)

dW^[l] = ∂J/∂W^[l] = (1/M) dZ^[l] (A^[l−1])^T   (11)

db^[l] = ∂J/∂b^[l] = (1/M) Σ_{i=1}^{M} dZ^[l](i)   (12)

dA^[l−1] = ∂J/∂A^[l−1] = (W^[l])^T dZ^[l]   (13)

Gradient descent: With (10), (11), (12) and (13), we can recurse backward from the output layer to layer 1 and obtain all the partial derivatives with respect to W and b, then apply gradient descent and keep updating the parameters:

W := W − α ∂J/∂W
b := b − α ∂J/∂b

This yields the parameters W and b that minimize the cost function on the training set. Note: the gradient points in the direction in which the function value rises fastest, so the negative gradient is the direction in which it falls fastest; applying gradient descent moves the function at the point (W, b) in the direction of fastest decrease.
Network optimization:
At this point the training of the network is complete, and its generalization can be tested on the test set. If the generalization accuracy is not high, analyze the cause:
If the network overfits, consider adding more data, regularization, dropout, or appropriately reducing the number of layers L.
If the network underfits, consider adding more features, more layers, or more hidden-layer units.
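Equations (10)-(13) and the update rule translate almost line for line into NumPy. The sketch below assumes a ReLU-hidden / sigmoid-output network, caches of (A_prev, Z) pairs saved during the forward pass, and the standard fact that dZ^[L] = A^[L] − Y for a sigmoid output with the cross-entropy cost (it follows from equation (6) with g = sigmoid); all function and variable names are illustrative:

```python
import numpy as np

def backprop_update(AL, Y, caches, params, L, alpha=0.1):
    """One gradient-descent step using equations (10)-(13).
    caches[l-1] = (A_prev, Z) stored for layer l during the forward pass."""
    M = Y.shape[1]
    dZ = AL - Y  # dZ^[L] for a sigmoid output with the cross-entropy cost
    for l in range(L, 0, -1):
        A_prev, Z = caches[l - 1]
        dW = dZ @ A_prev.T / M                          # equation (11)
        db = np.sum(dZ, axis=1, keepdims=True) / M      # equation (12)
        if l > 1:
            dA_prev = params["W" + str(l)].T @ dZ       # equation (13)
            Z_prev = caches[l - 2][1]
            dZ = dA_prev * (Z_prev > 0)                 # equation (10), g' of ReLU
        params["W" + str(l)] -= alpha * dW              # W := W - alpha dJ/dW
        params["b" + str(l)] -= alpha * db              # b := b - alpha dJ/db
    return params
```

Iterating forward pass, cost, and this update until the cost stops decreasing is exactly the training loop described above.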
III. Python Implementation
The code framework in this article was given by Andrew Ng in the Coursera deeplearning.ai assignments and completed by LSayhi (https://github.com/LSayhi).
It is for learning and reference only; do not use it to farm Coursera scores.
Building your Deep Neural Network: Step by Step
Welcome to your week 4 assignment (part 1 of 2)! You have previously trained a 2-layer Neural Network (with a single hidden layer). This week, you
will build a deep neural network, with as many layers as you want!
In this notebook, you will implement all the functions required to build a deep neural network.
In the next assignment, you will use these functions to build a deep neural network for image classification.
After this assignment you will be able to:
Use non-linear units like ReLU to improve your model
Build a deeper neural network (with more than 1 hidden layer)
Implement an easy-to-use neural network class
Notation:
Superscript [l] denotes a quantity associated with the l-th layer.
Example: a^[L] is the L-th layer activation. W^[L] and b^[L] are the L-th layer parameters.
Superscript (i) denotes a quantity associated with the i-th example.
Example: x^(i) is the i-th training example.
Lowerscript i denotes the i-th entry of a vector.
Example: a^[l]_i denotes the i-th entry of the l-th layer's activations.
Let's get started!
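In code, this notation maps directly onto NumPy array shapes; a small illustrative check (the layer sizes here are made up):

```python
import numpy as np

n_x, n_1, M = 4, 3, 5            # input features, layer-1 units, examples
X = np.zeros((n_x, M))           # column i is example x^(i)
W1 = np.zeros((n_1, n_x))        # W^[1] has shape (n^[1], n^[0])
b1 = np.zeros((n_1, 1))          # b^[1] broadcasts across the M examples
A1 = np.maximum(0, W1 @ X + b1)  # a^[1] stacked for all examples
print(A1.shape)                  # (3, 5)
```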
1 - Packages
Let's first import all the packages that you will need during this assignment.
numpy (www.numpy.org) is the main package for scientific computing with Python.
matplotlib (http://matplotlib.org) is a library to plot graphs in Python.
dnn_utils provides some necessary functions for this notebook.
testCases provides some test cases to assess the correctness of your functions.
np.random.seed(1) is used to keep all the random function calls consistent. It will help us grade your work. Please don't change the seed.
2 - Outline of the Assignment
To build your neural network, you will be implementing several "helper functions". These helper functions will be used in the next assignment to build a
two-layer neural network and an L-layer neural network. Each small helper function you will implement will have detailed instructions that will walk you
through the necessary steps. Here is an outline of this assignment, you will:
Initialize the parameters for a two-layer network and for an L-layer neural network.
Implement the forward propagation module (shown in purple in the figure below).
Complete the LINEAR part of a layer's forward propagation step (resulting in Z^[l]).
We give you the ACTIVATION function (relu/sigmoid).
Combine the previous two steps into a new [LINEAR->ACTIVATION] forward function.
Stack the [LINEAR->RELU] forward function L-1 times (for layers 1 through L-1) and add a [LINEAR->SIGMOID] at the end (for the final layer
L). This gives you a new L_model_forward function.
Compute the loss.
Implement the backward propagation module (denoted in red in the figure below).
Complete the LINEAR part of a layer's backward propagation step.
We give you the gradient of the ACTIVATE function (relu_backward/sigmoid_backward)
Combine the previous two steps into a new [LINEAR->ACTIVATION] backward function.
Stack [LINEAR->RELU] backward L-1 times and add [LINEAR->SIGMOID] backward in a new L_model_backward function
Finally update the parameters.
Figure 1
Note that for every forward function, there is a corresponding backward function. That is why at every step of your forward module you will be storing
some values in a cache. The cached values are useful for computing gradients. In the backpropagation module you will then use the cache to
calculate the gradients. This assignment will show you exactly how to carry out each of these steps.
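The cache mechanism described above can be illustrated with the first helper in the outline, the LINEAR forward step. This sketch mirrors the structure the assignment asks for, but treat it as illustrative rather than the official solution:

```python
import numpy as np

def linear_forward(A_prev, W, b):
    """LINEAR part of a layer's forward step: Z^[l] = W^[l] A^[l-1] + b^[l].
    The cache keeps (A_prev, W, b) so the backward pass can reuse them."""
    Z = W @ A_prev + b
    cache = (A_prev, W, b)
    return Z, cache

A_prev = np.array([[1.0, 2.0], [3.0, 4.0]])  # 2 units, 2 examples
W = np.array([[0.5, -0.5]])                  # 1 unit in the current layer
b = np.array([[1.0]])
Z, cache = linear_forward(A_prev, W, b)
print(Z)  # [[0. 0.]]
```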
3 - Initialization
```python
import numpy as np
import h5py
import matplotlib.pyplot as plt
from testCases_v2 import *
from dnn_utils_v2 import sigmoid, sigmoid_backward, relu, relu_backward

%matplotlib inline
plt.rcParams['figure.figsize'] = (5.0, 4.0)  # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

%load_ext autoreload
%autoreload 2

np.random.seed(1)
```
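testCases_v2 and dnn_utils_v2 are files provided with the course. For readers running the notebook outside Coursera, a minimal stand-in for the four imported activation helpers might look like the following (returning (A, Z) from the forward helpers, with Z as the cache, is an assumption based on how they are used later in the assignment):

```python
import numpy as np

def sigmoid(Z):
    """Forward sigmoid; returns the activation and Z as the cache."""
    A = 1.0 / (1.0 + np.exp(-Z))
    return A, Z

def relu(Z):
    """Forward ReLU; returns the activation and Z as the cache."""
    A = np.maximum(0, Z)
    return A, Z

def sigmoid_backward(dA, cache):
    """dZ = dA * g'(Z), with g'(Z) = s(1 - s) for the sigmoid -- equation (6)."""
    s = 1.0 / (1.0 + np.exp(-cache))
    return dA * s * (1 - s)

def relu_backward(dA, cache):
    """dZ = dA * g'(Z); g'(Z) is 1 where Z > 0 and 0 elsewhere."""
    dZ = np.array(dA, copy=True)
    dZ[cache <= 0] = 0
    return dZ
```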