The wake-sleep algorithm for unsupervised
neural networks
Geoffrey E Hinton
Peter Dayan Brendan J Frey Radford M Neal
Department of Computer Science
University of Toronto
6 King’s College Road
Toronto M5S 1A4, Canada
3rd April 1995
Abstract
An unsupervised learning algorithm for a multilayer network of stochastic
neurons is described. Bottom-up “recognition” connections convert the input
into representations in successive hidden layers and top-down “generative”
connections reconstruct the representation in one layer from the representation
in the layer above. In the “wake” phase, neurons are driven by recognition
connections, and generative connections are adapted to increase the probability
that they would reconstruct the correct activity vector in the layer below. In the
“sleep” phase, neurons are driven by generative connections and recognition
connections are adapted to increase the probability that they would produce
the correct activity vector in the layer above.
Supervised learning algorithms for multilayer neural networks face two problems:
They require a teacher to specify the desired output of the network and they require
some method of communicating error information to all of the connections. The
wake-sleep algorithm avoids both these problems. When there is no external
teaching signal to be matched, some other goal is required to force the hidden
units to extract underlying structure. In the wake-sleep algorithm the goal is to
learn representations that are economical to describe but allow the input to be
reconstructed accurately. We can quantify this goal by imagining a communication
game in which each vector of raw sensory inputs is communicated to a receiver by
first sending its hidden representation and then sending the difference between the
input vector and its top-down reconstruction from the hidden representation. The
aim of learning is to minimize the “description length” which is the total number
of bits that would be required to communicate the input vectors in this way [1].
No communication actually takes place, but minimizing the description length
that would be required forces the network to learn economical representations that
capture the underlying regularities in the data [2].
The neural network has two quite different sets of connections. The bottom-up
“recognition”connections are used to convertthe input vector into a representation
in one or more layers of hidden units. The top-down “generative” connections are
then used to reconstruct an approximation to the input vector from its underlying
representation. The training algorithm for these two sets of connections can be
used with many different types of stochastic neuron, but for simplicity we use only
stochastic binary units that have states of $1$ or $0$. The state of unit $v$ is $s_v$ and the probability that it is on is:

$$\mathrm{Prob}(s_v = 1) = \frac{1}{1 + \exp\!\left(-b_v - \sum_u s_u w_{uv}\right)} \qquad (1)$$

where $b_v$ is the bias of the unit and $w_{uv}$ is the weight on a connection from unit $u$. Sometimes the units are driven by the generative weights and other times by the recognition weights, but the same equation is used in both cases (figure 1).
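As a concrete illustration (ours, not the paper's), the sketch below samples one layer of stochastic binary units according to Eq. 1; NumPy and the function and variable names are our assumptions:

```python
import numpy as np

def sample_layer(s_driving, W, b, rng):
    """Sample binary states for one layer of stochastic units via Eq. 1.

    s_driving: binary states of the layer supplying input (the same rule
               applies whether recognition or generative weights drive it).
    W:         weight matrix with W[u, v] = w_uv.
    b:         bias vector b_v for the units being sampled.
    Returns the sampled binary states and the probabilities Prob(s_v = 1).
    """
    p_on = 1.0 / (1.0 + np.exp(-(b + s_driving @ W)))
    s = (rng.random(p_on.shape) < p_on).astype(float)
    return s, p_on

# Example: drive 4 hidden units bottom-up from a 6-unit input vector.
rng = np.random.default_rng(0)
W_rec = rng.normal(scale=0.1, size=(6, 4))
d = rng.integers(0, 2, size=6).astype(float)
s_hidden, p_hidden = sample_layer(d, W_rec, np.zeros(4), rng)
```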
In the “wake” phase the units are driven bottom-up using the recognition weights, producing a representation of the input vector in the first hidden layer, a representation of this representation in the second hidden layer and so on. All of these layers of representation combined are called the “total representation” of the input, and the binary state of each hidden unit, $j$, in total representation $\alpha$ is $s_j^\alpha$. This total representation could be used to communicate the input vector, $d$, to a receiver. According to Shannon’s coding theorem, it requires $-\log r$ bits to communicate an event that has probability $r$ under a distribution agreed by the sender and receiver.
We assume that the receiver knows the top-down generative weights [3] so these can be used to create the agreed probability distributions required for communication. First, the activity of each unit, $k$, in the top hidden layer is communicated using the distribution $(p_k^\alpha,\; 1 - p_k^\alpha)$ which is obtained by applying Eq. 1 to the single generative bias weight of unit $k$. Then the activities of the units in each lower layer are communicated using the distribution $(p_j^\alpha,\; 1 - p_j^\alpha)$ obtained by applying Eq. 1 to the already communicated activities in the layer above, $s_k^\alpha$, and the generative weights, $w_{kj}$. The description length of the binary state of unit $j$ is:

$$C(s_j^\alpha) = -\, s_j^\alpha \log p_j^\alpha - (1 - s_j^\alpha) \log(1 - p_j^\alpha) \qquad (2)$$
The description length for input vector $d$ using the total representation $\alpha$ is simply the cost of describing all the hidden states in all the hidden layers plus the cost of describing the input vector given the hidden states:

$$C(\alpha, d) = C(\alpha) + C(d \mid \alpha) = \sum_{\ell \in L} \sum_{j \in \ell} C(s_j^\alpha) + \sum_i C(s_i^d \mid \alpha) \qquad (3)$$

where $\ell$ is an index over the $L$ layers of hidden units and $i$ is an index over the input units, which have states $s_i^d$.
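As an illustration (not code from the paper), the following sketch computes the description lengths of Eqs. 2 and 3 for one sampled total representation; the function names are ours, and we assume base-2 logarithms since costs are measured in bits:

```python
import numpy as np

def bit_cost(s, p, eps=1e-12):
    """Description length, in bits, of binary states s under Bernoulli
    probabilities p (Eq. 2); eps guards log(0), an implementation detail."""
    return -(s * np.log2(p + eps) + (1.0 - s) * np.log2(1.0 - p + eps))

def total_cost(hidden_states, hidden_probs, input_states, input_probs):
    """C(alpha, d) = C(alpha) + C(d | alpha)  (Eq. 3).

    hidden_states, hidden_probs: one (s, p) pair per hidden layer, where p
    is computed top-down from the generative weights via Eq. 1.
    input_states, input_probs:   the input vector d and its top-down
    reconstruction probabilities given the first hidden layer.
    """
    c_alpha = sum(bit_cost(s, p).sum()
                  for s, p in zip(hidden_states, hidden_probs))
    c_d_given_alpha = bit_cost(input_states, input_probs).sum()
    return c_alpha + c_d_given_alpha
```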
Because the hidden units are stochastic, an input vector will not always be represented in the same way. In the wake phase, the recognition weights determine a conditional probability distribution, $Q(\alpha \mid d)$, over total representations. Nevertheless, if the recognition weights are fixed, there is a very simple, on-line method of modifying the generative weights to minimize the expected cost $\sum_\alpha Q(\alpha \mid d)\, C(\alpha, d)$ of describing the input vector using a stochastically chosen total representation. After using the recognition weights to choose a total representation, each generative weight is adjusted in proportion to the derivative of equation 3 by using the purely local delta rule:
$$\Delta w_{kj} = \epsilon\, s_k^\alpha \left( s_j^\alpha - p_j^\alpha \right) \qquad (4)$$

where $\epsilon$ is a learning rate. Although the units are driven by the recognition weights, it is only the generative weights that learn in the wake phase. The learning makes each layer of the total representation better at reconstructing the activities in the layer below.
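A minimal sketch of the wake-phase update of Eq. 4 (our naming, not the paper's); `lr` plays the role of the learning rate $\epsilon$, and applying the same rule to the generative biases is our assumption, consistent with treating a bias as a weight from a unit that is always on:

```python
import numpy as np

def wake_update(s_above, s_below, p_below, W_gen, b_gen, lr=0.05):
    """Adjust generative weights by the purely local delta rule (Eq. 4).

    s_above: recognition-driven binary states s_k in the layer above.
    s_below: recognition-driven binary states s_j in the layer below.
    p_below: top-down probabilities p_j produced from s_above by Eq. 1
             using the current generative weights.
    """
    W_gen += lr * np.outer(s_above, s_below - p_below)  # delta w_kj = eps * s_k * (s_j - p_j)
    b_gen += lr * (s_below - p_below)  # assumed: biases learn like weights from an always-on unit
    return W_gen, b_gen
```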
It seems obvious that the recognition weights should be adjusted to maximize the probability of picking the $\alpha$ that minimizes $C(\alpha, d)$. But this is incorrect. When there are many alternative ways of describing an input vector it is possible to design a stochastic coding scheme that takes advantage of the entropy across alternative descriptions [1]. The cost is then:
$$C(d) = \sum_\alpha Q(\alpha \mid d)\, C(\alpha, d) \;-\; \left( -\sum_\alpha Q(\alpha \mid d) \log Q(\alpha \mid d) \right) \qquad (5)$$
The second term is the entropy of the distribution that the recognition weights assign to the various alternative representations. If, for example, there are two alternative representations each of which costs 4 bits, the combined cost is only 3 bits provided we use the two alternatives with equal probability [4]. It is precisely analogous to the way in which the energies of the alternative states of a physical system are combined to yield the Helmholtz free energy of the system. As in physics, $C(d)$ is minimized when the probabilities of the alternatives are exponentially related to their costs by the Boltzmann distribution (at a temperature of 1):
$$P(\alpha \mid d) = \frac{\exp\left(-C(\alpha, d)\right)}{\sum_\beta \exp\left(-C(\beta, d)\right)} \qquad (6)$$
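To check the two-representation example above against Eq. 5 (our arithmetic, with base-2 logarithms): with $Q(\alpha_1 \mid d) = Q(\alpha_2 \mid d) = \tfrac{1}{2}$ and $C(\alpha_1, d) = C(\alpha_2, d) = 4$ bits,

$$C(d) = \tfrac{1}{2}\cdot 4 + \tfrac{1}{2}\cdot 4 \;-\; \left( -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} \right) = 4 - 1 = 3 \ \text{bits},$$

as claimed: the one bit of entropy across the two equally probable descriptions is recovered as a saving.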