CONTENTS
6.5 Back-propagation through Random Operations and Graphical Models . . . . 188
6.6 Universal Approximation Properties and Depth . . . . . . . . . . 192
6.7 Feature / Representation Learning . . . . . . . . . . . . . . . . . 195
6.8 Piecewise Linear Hidden Units . . . . . . . . . . . . . . . . . . . 197
6.9 Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
7 Regularization of Deep or Distributed Models 201
7.1 Regularization from a Bayesian Perspective . . . . . . . . . . . . 203
7.2 Classical Regularization: Parameter Norm Penalty . . . . . . . . 204
7.3 Classical Regularization as Constrained Optimization . . . . . . . 212
7.4 Regularization and Under-Constrained Problems . . . . . . . . . 213
7.5 Dataset Augmentation . . . . . . . . . . . . . . . . . . . . . . . . 214
7.6 Classical Regularization as Noise Robustness . . . . . . . . . . . 216
7.7 Early Stopping as a Form of Regularization . . . . . . . . . . . . 220
7.8 Parameter Tying and Parameter Sharing . . . . . . . . . . . . . . 227
7.9 Sparse Representations . . . . . . . . . . . . . . . . . . . . . . . . 228
7.10 Bagging and Other Ensemble Methods . . . . . . . . . . . . . . . 230
7.11 Dropout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
7.12 Multi-Task Learning . . . . . . . . . . . . . . . . . . . . . . . . . 235
7.13 Adversarial Training . . . . . . . . . . . . . . . . . . . . . . . . . 236
8 Optimization for Training Deep Models 240
8.1 Optimization for Model Training . . . . . . . . . . . . . . . . . . 241
8.2 Challenges in Neural Network Optimization . . . . . . . . . . . . 246
8.3 Optimization Algorithms I: Basic Algorithms . . . . . . . . . . . 259
8.4 Optimization Algorithms II: Adaptive Learning Rates . . . . . . 265
8.5 Optimization Algorithms III: Approximate Second-Order Methods . . . . . 270
8.6 Optimization Algorithms IV: Natural Gradient Methods . . . . . . . . . . 280
8.7 Optimization Strategies and Meta-Algorithms . . . . . . . . . . . 282
9 Convolutional Networks 296
9.1 The Convolution Operation . . . . . . . . . . . . . . . . . . . . . 297
9.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
9.3 Pooling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
9.4 Convolution and Pooling as an Infinitely Strong Prior . . . . . . 309
9.5 Variants of the Basic Convolution Function . . . . . . . . . . . . 310
9.6 Structured Outputs . . . . . . . . . . . . . . . . . . . . . . . . . . 316
9.7 Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
9.8 Efficient Convolution Algorithms . . . . . . . . . . . . . . . . . . 319