Deep Learning
Yoshua Bengio
Ian Goodfellow
Aaron Courville
Contents
Acknowledgments vii
Notation ix
1 Introduction 1
1.1 Who Should Read This Book? . . . . . . . . . . . . . . . . . . . . 8
1.2 Historical Trends in Deep Learning . . . . . . . . . . . . . . . . . 11
I Applied Math and Machine Learning Basics 25
2 Linear Algebra 27
2.1 Scalars, Vectors, Matrices and Tensors . . . . . . . . . . . . . . . 27
2.2 Multiplying Matrices and Vectors . . . . . . . . . . . . . . . . . . 30
2.3 Identity and Inverse Matrices . . . . . . . . . . . . . . . . . . . . 31
2.4 Linear Dependence, Span, and Rank . . . . . . . . . . . . . . . . 32
2.5 Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.6 Special Kinds of Matrices and Vectors . . . . . . . . . . . . . . . 35
2.7 Eigendecomposition . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.8 Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . 39
2.9 The Moore-Penrose Pseudoinverse . . . . . . . . . . . . . . . . . 40
2.10 The Trace Operator . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.11 Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.12 Example: Principal Components Analysis . . . . . . . . . . . . . 42
3 Probability and Information Theory 46
3.1 Why Probability? . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.2 Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.3 Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . 49
3.4 Marginal Probability . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.5 Conditional Probability . . . . . . . . . . . . . . . . . . . . . . . 51
3.6 The Chain Rule of Conditional Probabilities . . . . . . . . . . . . 52
3.7 Independence and Conditional Independence . . . . . . . . . . . 52
3.8 Expectation, Variance, and Covariance . . . . . . . . . . . . . . . 53
3.9 Information Theory . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.10 Common Probability Distributions . . . . . . . . . . . . . . . . . 57
3.11 Useful Properties of Common Functions . . . . . . . . . . . . . . 62
3.12 Bayes’ Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.13 Technical Details of Continuous Variables . . . . . . . . . . . . . 64
3.14 Structured Probabilistic Models . . . . . . . . . . . . . . . . . . . 65
3.15 Example: Naive Bayes . . . . . . . . . . . . . . . . . . . . . . . . 68
4 Numerical Computation 74
4.1 Overflow and Underflow . . . . . . . . . . . . . . . . . . . . . . . 74
4.2 Poor Conditioning . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.3 Gradient-Based Optimization . . . . . . . . . . . . . . . . . . . . 76
4.4 Constrained Optimization . . . . . . . . . . . . . . . . . . . . . . 85
4.5 Example: Linear Least Squares . . . . . . . . . . . . . . . . . . . 87
5 Machine Learning Basics 89
5.1 Learning Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.2 Example: Linear Regression . . . . . . . . . . . . . . . . . . . . . 97
5.3 Generalization, Capacity, Overfitting and Underfitting . . . . . . 99
5.4 The No Free Lunch Theorem . . . . . . . . . . . . . . . . . . . . 104
5.5 Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.6 Hyperparameters, Validation Sets and Cross-Validation . . . . . 108
5.7 Estimators, Bias, and Variance . . . . . . . . . . . . . . . . . . . 110
5.8 Maximum Likelihood Estimation . . . . . . . . . . . . . . . . . . 118
5.9 Bayesian Statistics and Prior Probability Distributions . . . . . . 121
5.10 Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.11 Unsupervised Learning . . . . . . . . . . . . . . . . . . . . . . . . 131
5.12 Weakly Supervised Learning . . . . . . . . . . . . . . . . . . . . . 134
5.13 The Curse of Dimensionality and Statistical Limitations of Local Generalization . . . 135
II Modern Practical Deep Networks 147
6 Feedforward Deep Networks 149
6.1 From Fixed Features to Learned Features . . . . . . . . . . . . . 149
6.2 Formalizing and Generalizing Neural Networks . . . . . . . . . . 152
6.3 Parametrizing a Learned Predictor . . . . . . . . . . . . . . . . . 154
6.4 Flow Graphs and Back-Propagation . . . . . . . . . . . . . . . . 167
6.5 Universal Approximation Properties and Depth . . . . . . . . . . 180
6.6 Feature / Representation Learning . . . . . . . . . . . . . . . . . 184
6.7 Piecewise Linear Hidden Units . . . . . . . . . . . . . . . . . . . 186
6.8 Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
7 Regularization 190
7.1 Regularization from a Bayesian Perspective . . . . . . . . . . . . 191
7.2 Classical Regularization: Parameter Norm Penalty . . . . . . . . 193
7.3 Classical Regularization as Constrained Optimization . . . . . . . 200
7.4 Regularization and Under-Constrained Problems . . . . . . . . . 201
7.5 Dataset Augmentation . . . . . . . . . . . . . . . . . . . . . . . . 203
7.6 Classical Regularization as Noise Robustness . . . . . . . . . . . 204
7.7 Early Stopping as a Form of Regularization . . . . . . . . . . . . 208
7.8 Parameter Tying and Parameter Sharing . . . . . . . . . . . . . . 215
7.9 Sparse Representations . . . . . . . . . . . . . . . . . . . . . . . . 215
7.10 Bagging and Other Ensemble Methods . . . . . . . . . . . . . . . 215
7.11 Dropout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
7.12 Multi-Task Learning . . . . . . . . . . . . . . . . . . . . . . . . . 222
7.13 Adversarial Training . . . . . . . . . . . . . . . . . . . . . . . . . 223
8 Optimization for Training Deep Models 226
8.1 Optimization for Model Training . . . . . . . . . . . . . . . . . . 226
8.2 Challenges in Optimization . . . . . . . . . . . . . . . . . . . . . 229
8.3 Optimization Algorithms . . . . . . . . . . . . . . . . . . . . . . . 236
8.4 Approximate Natural Gradient and Second-Order Methods . . . 241
8.5 Conjugate Gradients . . . . . . . . . . . . . . . . . . . . . . . . . 241
8.6 BFGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
8.7 Hints, Global Optimization and Curriculum Learning . . . . . . . 243
9 Convolutional Networks 248
9.1 The Convolution Operation . . . . . . . . . . . . . . . . . . . . . 248
9.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
9.3 Pooling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
9.4 Convolution and Pooling as an Infinitely Strong Prior . . . . . . 261
9.5 Variants of the Basic Convolution Function . . . . . . . . . . . . 262
9.6 Structured Outputs . . . . . . . . . . . . . . . . . . . . . . . . . . 269
9.7 Convolutional Modules . . . . . . . . . . . . . . . . . . . . . . . . 269
9.8 Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
9.9 Efficient Convolution Algorithms . . . . . . . . . . . . . . . . . . 271
9.10 Random or Unsupervised Features . . . . . . . . . . . . . . . . . 271
9.11 The Neuroscientific Basis for Convolutional Networks . . . . . . . 273
9.12 Convolutional Networks and the History of Deep Learning . . . . 280
10 Sequence Modeling: Recurrent and Recursive Nets 281
10.1 Unfolding Flow Graphs and Sharing Parameters . . . . . . . . . . 282
10.2 Recurrent Neural Networks . . . . . . . . . . . . . . . . . . . . . 284
10.3 Bidirectional RNNs . . . . . . . . . . . . . . . . . . . . . . . . . . 295
10.4 Deep Recurrent Networks . . . . . . . . . . . . . . . . . . . . . . 296
10.5 Recursive Neural Networks . . . . . . . . . . . . . . . . . . . . . 299
10.6 Auto-Regressive Networks . . . . . . . . . . . . . . . . . . . . . . 299
10.7 Facing the Challenge of Long-Term Dependencies . . . . . . . . . 305
10.8 Handling Temporal Dependencies with N-Grams, HMMs, CRFs and Other Graphical Models . . . 317
10.9 Combining Neural Networks and Search . . . . . . . . . . . . . . 328
11 Practical Methodology 333
11.1 Basic Machine Learning Methodology . . . . . . . . . . . . . . . 333
11.2 Manual Hyperparameter Tuning . . . . . . . . . . . . . . . . . . 334
11.3 Hyperparameter Optimization Algorithms . . . . . . . . . . . . . 334
11.4 Debugging Strategies . . . . . . . . . . . . . . . . . . . . . . . . . 336
12 Applications 339
12.1 Large Scale Deep Learning . . . . . . . . . . . . . . . . . . . . . . 339
12.2 Computer Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
12.3 Speech Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . 352
12.4 Natural Language Processing and Neural Language Models . . . 353
12.5 Structured Outputs . . . . . . . . . . . . . . . . . . . . . . . . . . 369
12.6 Other Applications . . . . . . . . . . . . . . . . . . . . . . . . . . 369
III Deep Learning Research 370
13 Structured Probabilistic Models for Deep Learning 372
13.1 The Challenge of Unstructured Modeling . . . . . . . . . . . . . . 373
13.2 Using Graphs to Describe Model Structure . . . . . . . . . . . . . 377
13.3 Advantages of Structured Modeling . . . . . . . . . . . . . . . . . 391
13.4 Learning About Dependencies . . . . . . . . . . . . . . . . . . . . 392
13.5 Inference and Approximate Inference Over Latent Variables . . . 394
13.6 The Deep Learning Approach to Structured Probabilistic Models 395
14 Monte Carlo Methods 400
14.1 Markov Chain Monte Carlo Methods . . . . . . . . . . . . . . . . 400