Understanding the difficulty of training deep feedforward neural networks
Xavier Glorot Yoshua Bengio
DIRO, Université de Montréal, Montréal, Québec, Canada

Appearing in Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS) 2010, Chia Laguna Resort, Sardinia, Italy. Volume 9 of JMLR: W&CP 9. Copyright 2010 by the authors.
Abstract
Whereas before 2006 it appears that deep multi-layer neural networks were not successfully trained, since then several algorithms have been shown to successfully train them, with experimental results showing the superiority of deeper versus less deep architectures. All these experimental results were obtained with new initialization or training mechanisms. Our objective here is to better understand why standard gradient descent from random initialization does so poorly with deep neural networks, to better understand these recent relative successes, and to help design better algorithms in the future. We first observe the influence of the non-linear activation functions. We find that the logistic sigmoid activation is unsuited for deep networks with random initialization because of its mean value, which can drive especially the top hidden layer into saturation. Surprisingly, we find that saturated units can move out of saturation by themselves, albeit slowly, which explains the plateaus sometimes seen when training neural networks. We find that a new non-linearity that saturates less can often be beneficial. Finally, we study how activations and gradients vary across layers and during training, with the idea that training may be more difficult when the singular values of the Jacobian associated with each layer are far from 1. Based on these considerations, we propose a new initialization scheme that brings substantially faster convergence.
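As an illustrative aside (not part of the original abstract): the initialization scheme announced here scales a uniform weight distribution by each layer's fan-in and fan-out. A minimal NumPy sketch under that reading, with layer sizes chosen purely for illustration:

```python
import numpy as np

def normalized_init(fan_in, fan_out, rng=None):
    """Draw W uniformly in [-sqrt(6/(fan_in+fan_out)), +sqrt(6/(fan_in+fan_out))],
    which keeps activation and back-propagated gradient variances roughly
    constant from layer to layer."""
    rng = rng or np.random.default_rng(0)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Illustrative layer sizes (an assumption, not the paper's experimental setup).
W1 = normalized_init(784, 1000)
W2 = normalized_init(1000, 10)
```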
1 Deep Neural Networks
Deep learning methods aim at learning feature hierarchies, with features from higher levels of the hierarchy formed by the composition of lower-level features. They include learning methods for a wide array of deep architectures, including neural networks with many hidden layers (Vincent et al., 2008) and graphical models with many levels of hidden variables (Hinton et al., 2006), among others (Zhu et al., 2009; Weston et al., 2008). Much attention has recently been devoted to them (see Bengio (2009) for a review) because of their theoretical appeal, their inspiration from biology and human cognition, and their empirical success in vision (Ranzato et al., 2007; Larochelle et al., 2007; Vincent et al., 2008) and natural language processing (NLP) (Collobert & Weston, 2008; Mnih & Hinton, 2009). Theoretical results reviewed and discussed by Bengio (2009) suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one may need deep architectures.
Most of the recent experimental results with deep architectures are obtained with models that can be turned into deep supervised neural networks, but with initialization or training schemes different from the classical feedforward neural networks (Rumelhart et al., 1986). Why are these new algorithms working so much better than the standard random initialization and gradient-based optimization of a supervised training criterion? Part of the answer may be found in recent analyses of the effect of unsupervised pre-training (Erhan et al., 2009), showing that it acts as a regularizer that initializes the parameters in a "better" basin of attraction of the optimization procedure, corresponding to an apparent local minimum associated with better generalization. But earlier work (Bengio et al., 2007) had shown that even a purely supervised but greedy layer-wise procedure would give better results. So here, instead of focusing on what unsupervised pre-training or semi-supervised criteria bring to deep architectures, we focus on analyzing what may be going wrong with good old (but deep) multi-layer neural networks.
Our analysis is driven by investigative experiments to monitor activations (watching for saturation of hidden units) and gradients, across layers and across training iterations. We also evaluate how these are affected by the choice of activation function (with the idea that it might affect saturation) and by the initialization procedure (since unsupervised pre-training is a particular form of initialization and it has a drastic impact).
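The kind of monitoring described above can be made concrete with a small sketch: run a forward pass through sigmoid layers and record each layer's activation mean and standard deviation, flagging saturation when activations pile up near 0 or 1. The network shape, sigmoid choice, and helper names here are illustrative assumptions, not the authors' exact experimental code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward_with_stats(x, weights, biases):
    """Forward pass through sigmoid layers, recording per-layer activation
    statistics so saturation (activations stuck near 0 or 1) is visible."""
    stats, h = [], x
    for W, b in zip(weights, biases):
        h = sigmoid(h @ W + b)
        stats.append({"mean": float(h.mean()),
                      "std": float(h.std()),
                      "frac_saturated": float(np.mean((h < 0.05) | (h > 0.95)))})
    return h, stats

# Illustrative 4-hidden-layer network using the fan-in/fan-out-scaled
# uniform initialization sketched earlier.
rng = np.random.default_rng(0)
sizes = [784, 1000, 1000, 1000, 10]
weights = [rng.uniform(-np.sqrt(6 / (m + n)), np.sqrt(6 / (m + n)), (m, n))
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
_, stats = forward_with_stats(rng.standard_normal((64, 784)), weights, biases)
```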