Deep learning via Hessian-free optimization
James Martens JMARTENS@CS.TORONTO.EDU
University of Toronto, Ontario, M5S 1A1, Canada
Abstract
We develop a 2nd-order optimization method
based on the “Hessian-free” approach, and apply
it to training deep auto-encoders. Without using
pre-training, we obtain results superior to those
reported by Hinton & Salakhutdinov (2006) on
the same tasks they considered. Our method is
practical, easy to use, scales nicely to very large
datasets, and isn’t limited in applicability to auto-
encoders, or any specific model class. We also
discuss the issue of “pathological curvature” as
a possible explanation for the difficulty of deep-
learning and how 2nd-order optimization, and our
method in particular, effectively deals with it.
1. Introduction
Learning the parameters of neural networks is perhaps one
of the most well studied problems within the field of ma-
chine learning. Early work on backpropagation algorithms
showed that the gradient of the neural net learning objective
could be computed efficiently and used within a gradient-
descent scheme to learn the weights of a network with mul-
tiple layers of non-linear hidden units. Unfortunately, this
technique doesn’t seem to generalize well to networks that
have very many hidden layers (i.e. deep networks). The
common experience is that gradient-descent progresses ex-
tremely slowly on deep nets, seeming to halt altogether be-
fore making significant progress, resulting in poor perfor-
mance on the training set (under-fitting).
It is well known within the optimization community that
gradient descent is unsuitable for optimizing objectives
that exhibit pathological curvature. 2nd-order optimization
methods, which model the local curvature and correct for
it, have been demonstrated to be quite successful on such
objectives. There are even simple 2D examples such as the
Rosenbrock function where these methods can demonstrate
considerable advantages over gradient descent. Thus it is
reasonable to suspect that the deep learning problem could
be resolved by the application of such techniques. Unfortu-
nately, there has yet to be a demonstration that any of these
methods are effective on deep learning problems that are
known to be difficult for gradient descent.
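To make the curvature argument concrete: a 2nd-order method replaces the raw gradient step with the minimizer of the local quadratic model f(θ + d) ≈ f(θ) + ∇f(θ)ᵀd + ½dᵀHd, namely the Newton step d = −H⁻¹∇f(θ). The sketch below (ours, not from the paper) contrasts plain gradient descent with undamped Newton steps on the Rosenbrock function just mentioned; the starting point, learning rate, and iteration count are illustrative choices.

```python
import numpy as np

def rosenbrock(p):
    x, y = p
    return (1.0 - x) ** 2 + 100.0 * (y - x ** 2) ** 2

def grad(p):
    x, y = p
    return np.array([-2.0 * (1.0 - x) - 400.0 * x * (y - x ** 2),
                     200.0 * (y - x ** 2)])

def hess(p):
    x, y = p
    return np.array([[2.0 - 400.0 * (y - x ** 2) + 800.0 * x ** 2, -400.0 * x],
                     [-400.0 * x, 200.0]])

p_gd = np.array([-1.0, 1.0])   # start inside the curved valley
p_nt = np.array([-1.0, 1.0])
for _ in range(50):
    p_gd = p_gd - 1e-3 * grad(p_gd)                         # larger rates diverge here
    p_nt = p_nt - np.linalg.solve(hess(p_nt), grad(p_nt))   # undamped Newton step

print("gradient descent:", rosenbrock(p_gd))   # still far from the optimum
print("Newton:          ", rosenbrock(p_nt))   # essentially 0 after a few steps
```

From this starting point the Newton iterates land on the minimum at (1, 1) within a couple of steps, while gradient descent creeps along the narrow curved valley; this is exactly the pathological-curvature behavior described above.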
Much of the recent work on applying 2nd-order methods
to learning has focused on making them practical for large
datasets. This is usually attemptedby adoptingan “on-line”
approachakin to the one used in stochastic gradient descent
(SGD). The only demonstrated advantages of these meth-
ods over SGD is that they can sometimes converge in fewer
training epochs and that they require less tweaking of meta-
parameters, such as learning rate schedules.
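For concreteness, the computational trick behind the “Hessian-free” name is that a Newton-like step never requires forming or inverting H: linear conjugate gradient (CG) can solve Hd = −g using only Hessian-vector products, each obtainable for roughly the cost of one extra gradient evaluation (exactly, via Pearlmutter’s R-operator, or approximately by finite differences as below). The following is a minimal illustrative sketch, not the paper’s implementation: grad_f, the tolerance, and the iteration cap are assumptions, and the paper’s use of a damped Gauss-Newton curvature matrix is omitted.

```python
import numpy as np

def hess_vec(grad_f, theta, v, eps=1e-6):
    # Finite-difference Hessian-vector product:
    #   H v  ~=  (grad(theta + eps * v) - grad(theta)) / eps
    # costs one extra gradient evaluation per product.
    return (grad_f(theta + eps * v) - grad_f(theta)) / eps

def hessian_free_step(grad_f, theta, cg_iters=50, tol=1e-10):
    # Approximately solve  H d = -g  by linear conjugate gradient,
    # touching H only through Hessian-vector products.
    # (Plain CG assumes positive-definite curvature; the paper instead
    # uses a damped Gauss-Newton matrix to guarantee this.)
    g = grad_f(theta)
    d = np.zeros_like(theta)
    r = -g                      # residual of H d = -g at d = 0
    p = r.copy()
    rr = r @ r
    for _ in range(cg_iters):
        Hp = hess_vec(grad_f, theta, p)
        alpha = rr / (p @ Hp)
        d = d + alpha * p
        r = r - alpha * Hp
        rr_new = r @ r
        if rr_new < tol:
            break
        p = r + (rr_new / rr) * p
        rr = rr_new
    return theta + d
```

Each CG iteration costs about one extra gradient evaluation, so the per-update work scales with the number of CG steps rather than with any explicit manipulation of the full Hessian.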
The most important recent advance in learning for deep
networks has been the development of layer-wise unsu-
pervised pre-training methods (Hinton & Salakhutdinov,
2006; Bengio et al., 2007). Applying these methods before
running SGD seems to overcome the difficulties associated
with deep learning. Indeed, there have been many suc-
cessful applications of these methods to hard deep learn-
ing problems, such as auto-encoders and classification nets.
But the question remains: why does pre-training work and
why is it necessary? Some researchers (e.g. Erhan et al.,
2010) have investigated this question and proposed various
explanations such as a higher prevalence of bad local op-
tima in the learning objectives of deep models.
Another explanation is that these objectives exhibit patho-
logical curvature making them nearly impossible for
curvature-blind methods like gradient-descent to success-
fully navigate. In this paper we will argue in favor of this
explanation and provide a solution in the form of a pow-
erful semi-online 2nd-order optimization algorithm which
is practical for very large models and datasets. Using
this technique, we are able to overcome the under-fitting
problem encountered when training deep auto-encoder
neural nets far more effectively than the pre-training +
fine-tuning approach proposed by Hinton & Salakhutdinov
(2006). Being an optimization algorithm, our approach
doesn’t deal specifically with the problem of over-fitting;
however, we show that this is only a serious issue for one of
the three deep auto-encoder problems considered by Hin-
ton & Salakhutdinov, and can be handled by the usual
methods of regularization.
These results also help us address the question of why
deep-learning is hard and why pre-training sometimes helps.