An overview of gradient descent optimization algorithms

19 Jan 2016
Table of contents:

Gradient descent variants
  Batch gradient descent
  Stochastic gradient descent
  Mini-batch gradient descent
Challenges
Gradient descent optimization algorithms
  Momentum
  Nesterov accelerated gradient
  Adagrad
  Adadelta
  RMSprop
  Adam
  Visualization of algorithms
  Which optimizer to choose?
Parallelizing and distributing SGD
  Hogwild!
  Downpour SGD
  Delay-tolerant Algorithms for SGD
  TensorFlow
  Elastic Averaging SGD
Additional strategies for optimizing SGD
  Shuffling and Curriculum Learning
  Batch normalization
  Early Stopping
  Gradient noise
Conclusion
References

Gradient descent is one of the most popular algorithms to perform optimization and by far
the most common way to optimize neural networks. At the same time, every state-of-the-art
Deep Learning library contains implementations of various algorithms to optimize
gradient descent (e.g. lasagne's, caffe's, and keras' documentation). These algorithms,
however, are often used as black-box optimizers, as practical explanations of their
strengths and weaknesses are hard to come by.

This blog post aims at providing you with intuitions towards the behaviour of different
algorithms for optimizing gradient descent that will help you put them to use. We are first
going to look at the different variants of gradient descent. We will then briefly summarize
challenges during training. Subsequently, we will introduce the most common optimization
algorithms by showing their motivation to resolve these challenges and how this leads to
the derivation of their update rules. We will also take a short look at algorithms and
architectures to optimize gradient descent in a parallel and distributed setting. Finally, we
will consider additional strategies that are helpful for optimizing gradient descent.

Gradient descent is a way to minimize an objective function $J(\theta)$ parameterized by a
model's parameters $\theta \in \mathbb{R}^d$ by updating the parameters in the opposite
direction of the gradient of the objective function $\nabla_\theta J(\theta)$ w.r.t. the
parameters. The learning rate $\eta$ determines the size of the steps we take to reach a
(local) minimum. In other words, we follow the direction of the slope of the surface created
by the objective function downhill until we reach a valley. If you are unfamiliar with
gradient descent, you can find a good introduction on optimizing neural networks here.
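
To make the update rule concrete, here is a minimal sketch in plain Python. The toy objective, the function names, and the values of `learning_rate` and `nb_steps` are assumptions chosen for illustration and do not come from the post.

```python
def objective(theta):
    """Toy objective J(theta) = (theta - 3)^2, with its minimum at theta = 3."""
    return (theta - 3.0) ** 2


def gradient(theta):
    """Analytic derivative dJ/dtheta = 2 * (theta - 3)."""
    return 2.0 * (theta - 3.0)


def gradient_descent(theta, learning_rate=0.1, nb_steps=100):
    """Repeatedly step against the gradient, scaled by the learning rate."""
    for _ in range(nb_steps):
        theta = theta - learning_rate * gradient(theta)
    return theta


# Start away from the minimum and walk downhill.
theta_min = gradient_descent(theta=0.0)
print(theta_min, objective(theta_min))
```

On this convex toy problem the iterate moves towards the single valley at 3. A much larger learning rate would overshoot, which is one way to see why the step size matters.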

Gradient descent variants

There are three variants of gradient descent, which differ in how much data we use to
compute the gradient of the objective function. Depending on the amount of data, we
make a trade-off between the accuracy of the parameter update and the time it takes to
perform an update.
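
The trade-off can be illustrated with a small experiment, sketched here under stated assumptions: the 1-D dataset, the squared-error loss, and the helper `mse_gradient` are all hypothetical. The sketch compares the gradient computed on a random subset with the gradient over the full dataset.

```python
import random

random.seed(0)
# Hypothetical 1-D dataset: y = 2x plus a little Gaussian noise.
xs = [random.uniform(-1.0, 1.0) for _ in range(1000)]
data = [(x, 2.0 * x + random.gauss(0.0, 0.1)) for x in xs]


def mse_gradient(w, examples):
    """Gradient of 0.5 * (w*x - y)^2 w.r.t. w, averaged over the examples."""
    return sum((w * x - y) * x for x, y in examples) / len(examples)


w = 0.0
full_gradient = mse_gradient(w, data)
for sample_size in (1, 10, 100, 1000):
    subset = random.sample(data, sample_size)
    estimate = mse_gradient(w, subset)
    # Fewer examples are cheaper to process but give a noisier estimate.
    print(sample_size, abs(estimate - full_gradient))
```

The deviation from the full gradient typically shrinks as the subset grows, while the cost of each gradient evaluation grows with it.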

Batch gradient descent

Vanilla gradient descent, aka batch gradient descent, computes the gradient of the cost
function $J(\theta)$ w.r.t. the parameters $\theta$ for the entire training dataset and
scales the step by the learning rate $\eta$:

$$\theta = \theta - \eta \cdot \nabla_\theta J(\theta)$$

As we need to calculate the gradients for the whole dataset to perform just one update,
batch gradient descent can be very slow and is intractable for datasets that don't fit in
memory. Batch gradient descent also doesn't allow us to update our model online, i.e. with
new examples on-the-fly.

In code, batch gradient descent looks something like this:

    for i in range(nb_epochs):
        # gradient of the loss over the whole dataset w.r.t. params
        params_grad = evaluate_gradient(loss_function, data, params)
        # step against the gradient, scaled by the learning rate
        params = params - learning_rate * params_grad
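
The loop above leaves `evaluate_gradient`, `loss_function`, and `data` undefined. The following is one possible way to fill them in so that the loop runs. It assumes a 1-D linear model with squared-error loss and a hand-derived gradient. The dataset and hyperparameter values are hypothetical.

```python
def loss_function(params, data):
    """Half the mean squared error of the line y = w*x + b over the dataset."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / (2 * len(data))


def evaluate_gradient(loss_function, data, params):
    """Analytic gradient of the loss above over the WHOLE dataset.

    loss_function is accepted only to mirror the call used in the post;
    this sketch hard-codes the gradient of the squared-error loss.
    """
    w, b = params
    n = len(data)
    grad_w = sum((w * x + b - y) * x for x, y in data) / n
    grad_b = sum((w * x + b - y) for x, y in data) / n
    return [grad_w, grad_b]


# Hypothetical noise-free data generated from y = 2x + 1.
data = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(-10, 11)]
params = [0.0, 0.0]
learning_rate = 0.5
nb_epochs = 200

for i in range(nb_epochs):
    # One parameter update per full pass over the data.
    params_grad = evaluate_gradient(loss_function, data, params)
    params = [p - learning_rate * g for p, g in zip(params, params_grad)]

print(params, loss_function(params, data))
```

Each epoch touches every example exactly once and produces a single update, which is what makes the method slow on large datasets.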

For a pre-defined number of epochs, we first compute the gradient vector params_grad
of the loss function for the whole dataset w.r.t. our parameter vector params. Note that
state-of-the-art deep learning libraries provide automatic differentiation that efficiently
computes the gradient w.r.t. some parameters. If you derive the gradients yourself, then
gradient checking is a good idea. (See here for some great tips on how to check gradients
properly.)

We then update our parameters in the opposite direction of the gradients, with the learning
rate determining how big of an update we perform. Given a suitably small learning rate,
batch gradient descent is guaranteed to converge to the global minimum for convex error
surfaces and to a local minimum for non-convex surfaces.
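
Gradient checking can be sketched as a comparison between a hand-derived gradient and a central-difference estimate. The toy function, the helper names, the step `eps`, and the relative-error measure below are assumptions for illustration, not the procedure the post links to.

```python
def numerical_gradient(f, params, eps=1e-5):
    """Central-difference estimate of the gradient of f at params."""
    grad = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2.0 * eps))
    return grad


def relative_error(a, b):
    """Largest element-wise relative difference between two gradients."""
    return max(abs(x - y) / max(abs(x) + abs(y), 1e-12) for x, y in zip(a, b))


def f(params):
    """Toy objective: f(u, v) = u^2 * v + 3 * v."""
    u, v = params
    return u ** 2 * v + 3.0 * v


def analytic_gradient(params):
    """Hand-derived gradient of f: (2*u*v, u^2 + 3)."""
    u, v = params
    return [2.0 * u * v, u ** 2 + 3.0]


point = [1.5, -0.5]
error = relative_error(analytic_gradient(point), numerical_gradient(f, point))
print(error)
```

A very small relative error suggests the derivation is right. A large one usually points to a mistake in the hand-derived gradient.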

Stochastic gradient descent

Stochastic gradient descent (SGD) in contrast performs a parameter update for each
training example $x^{(i)}$ and label $y^{(i)}$:

$$\theta = \theta - \eta \cdot \nabla_\theta J(\theta; x^{(i)}; y^{(i)})$$

Batch gradient descent performs redundant computations for large datasets, as it
recomputes gradients for similar examples before each parameter update. SGD does away
with this redundancy by performing one update at a time. It is therefore usually much
faster and can also be used to learn online.

SGD performs frequent updates with a high variance that cause the objective function to
fluctuate heavily as in Image 1.
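
The per-example update can be sketched in plain Python as follows. This assumes a 1-D linear model with squared-error loss. The dataset, the helper `example_gradient`, the hyperparameter values, and the choice to reshuffle the data every epoch are all assumptions made for this illustration.

```python
import random


def example_gradient(params, example):
    """Gradient of 0.5 * (w*x + b - y)^2 for a single (x, y) pair."""
    w, b = params
    x, y = example
    residual = w * x + b - y
    return [residual * x, residual]


random.seed(0)
# Hypothetical noise-free data generated from y = 2x + 1.
data = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(-10, 11)]
params = [0.0, 0.0]
learning_rate = 0.1
nb_epochs = 200

for epoch in range(nb_epochs):
    random.shuffle(data)  # visit the examples in a new order every epoch
    for example in data:
        # One parameter update per training example.
        params_grad = example_gradient(params, example)
        params = [p - learning_rate * g for p, g in zip(params, params_grad)]

print(params)
```

With 21 examples, each epoch performs 21 updates instead of one. On noisy data the individual steps would pull in different directions, which is the source of the fluctuation described above.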
