Contents
Acknowledgments
Notation
1 Introduction
1.1 Who Should Read This Book?
1.2 Historical Trends in Deep Learning
I Applied Math and Machine Learning Basics
2 Linear Algebra
2.1 Scalars, Vectors, Matrices and Tensors
2.2 Multiplying Matrices and Vectors
2.3 Identity and Inverse Matrices
2.4 Linear Dependence and Span
2.5 Norms
2.6 Special Kinds of Matrices and Vectors
2.7 Eigendecomposition
2.8 Singular Value Decomposition
2.9 The Moore-Penrose Pseudoinverse
2.10 The Trace Operator
2.11 Determinant
2.12 Example: Principal Components Analysis
3 Probability and Information Theory
3.1 Why Probability?
3.2 Random Variables
3.3 Probability Distributions
3.4 Marginal Probability
3.5 Conditional Probability
3.6 The Chain Rule of Conditional Probabilities
3.7 Independence and Conditional Independence
3.8 Expectation, Variance and Covariance
3.9 Information Theory
3.10 Common Probability Distributions
3.11 Useful Properties of Common Functions
3.12 Bayes' Rule
3.13 Technical Details of Continuous Variables
3.14 Structured Probabilistic Models
3.15 Example: Naive Bayes
4 Numerical Computation
4.1 Overflow and Underflow
4.2 Poor Conditioning
4.3 Gradient-Based Optimization
4.4 Constrained Optimization
4.5 Example: Linear Least Squares
5 Machine Learning Basics
5.1 Learning Algorithms
5.2 Example: Linear Regression
5.3 Generalization, Capacity, Overfitting and Underfitting
5.4 Hyperparameters and Validation Sets
5.5 Estimators, Bias and Variance
5.6 Maximum Likelihood Estimation
5.7 Bayesian Statistics
5.8 Supervised Learning Algorithms
5.9 Unsupervised Learning Algorithms
5.10 Weakly Supervised Learning
5.11 Building a Machine Learning Algorithm
5.12 The Curse of Dimensionality and Statistical Limitations of Local Generalization
II Deep Networks: Modern Practices
6 Feedforward Deep Networks
6.1 MLPs from the 1980's
6.2 Estimating Conditional Statistics
6.3 Parametrizing a Learned Predictor
6.4 Flow Graphs and Back-Propagation
6.5 Back-Propagation through Random Operations and Graphical Models
6.6 Universal Approximation Properties and Depth
6.7 Feature / Representation Learning
6.8 Piecewise Linear Hidden Units
6.9 Historical Notes
7 Regularization of Deep or Distributed Models
7.1 Regularization from a Bayesian Perspective
7.2 Classical Regularization: Parameter Norm Penalty
7.3 Classical Regularization as Constrained Optimization
7.4 Regularization and Under-Constrained Problems
7.5 Dataset Augmentation
7.6 Classical Regularization as Noise Robustness
7.7 Early Stopping as a Form of Regularization
7.8 Parameter Tying and Parameter Sharing
7.9 Sparse Representations
7.10 Bagging and Other Ensemble Methods
7.11 Dropout
7.12 Multi-Task Learning
7.13 Adversarial Training
8 Optimization for Training Deep Models
8.1 Optimization for Model Training
8.2 Challenges in Neural Network Optimization
8.3 Optimization Algorithms I: Basic Algorithms
8.4 Optimization Algorithms II: Adaptive Learning Rates
8.5 Optimization Algorithms III: Approximate Second-Order Methods
8.6 Optimization Algorithms IV: Natural Gradient Methods
8.7 Optimization Strategies and Meta-Algorithms
9 Convolutional Networks
9.1 The Convolution Operation
9.2 Motivation
9.3 Pooling
9.4 Convolution and Pooling as an Infinitely Strong Prior
9.5 Variants of the Basic Convolution Function
9.6 Structured Outputs
9.7 Data Types
9.8 Efficient Convolution Algorithms
9.9 Random or Unsupervised Features
9.10 The Neuroscientific Basis for Convolutional Networks
9.11 Convolutional Networks and the History of Deep Learning
10 Sequence Modeling: Recurrent and Recursive Nets
10.1 Unfolding Flow Graphs and Sharing Parameters
10.2 Recurrent Neural Networks
10.3 Bidirectional RNNs
10.4 Encoder-Decoder Sequence-to-Sequence Architectures
10.5 Deep Recurrent Networks
10.6 Recursive Neural Networks
10.7 The Challenge of Long-Term Dependencies
11 Practical Methodology
11.1 Default Baseline Models
11.2 Selecting Hyperparameters
11.3 Debugging Strategies
12 Applications
12.1 Large Scale Deep Learning
12.2 Computer Vision
12.3 Speech Recognition
12.4 Natural Language Processing and Neural Language Models
12.5 Structured Outputs
12.6 Other Applications
III Deep Learning Research
13 Structured Probabilistic Models for Deep Learning
13.1 The Challenge of Unstructured Modeling
13.2 Using Graphs to Describe Model Structure
13.3 Advantages of Structured Modeling
13.4 Learning about Dependencies
13.5 Inference and Approximate Inference over Latent Variables
13.6 The Deep Learning Approach to Structured Probabilistic Models
14 Monte Carlo Methods
14.1 Markov Chain Monte Carlo Methods
14.2 The Difficulty of Mixing between Well-Separated Modes