Neural Networks: Tricks of the Trade (2nd Edition)

Volume Editors

Grégoire Montavon
Technische Universität Berlin, Department of Computer Science
Franklinstr. 28/29, 10587 Berlin, Germany
Email: gregoire.montavon@tu-berlin.de

Geneviève B. Orr
Willamette University, Department of Computer Science
900 State Street, Salem, OR 97301, USA
Email: gorr@willamette.edu

Klaus-Robert Müller
Technische Universität Berlin, Department of Computer Science
Franklinstr. 28/29, 10587 Berlin, Germany
and Korea University, Department of Brain and Cognitive Engineering
Anam-dong, Seongbuk-gu, Seoul 136-713, Korea
Email: klaus-robert.mueller@tu-berlin.de

ISSN 0302-9743; e-ISSN 1611-3349
ISBN 978-3-642-35288-1; e-ISBN 978-3-642-35289-8
DOI 10.1007/978-3-642-35289-8
Springer Heidelberg Dordrecht London New York
Library of Congress Control Number: 2012952591
CR Subject Classification (1998): F.1, I.2.6, I.5.1, C.1.3, F.2, J.3
LNCS Sublibrary: SL 1, Theoretical Computer Science and General Issues

© Springer-Verlag Berlin Heidelberg 1998, 2012

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks.
Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

Preface to the Second Edition

There have been substantial changes in the field of neural networks since the first edition of this book in 1998. Some of them have been driven by external factors such as the increase of available data and computing power. The Internet made public massive amounts of labeled and unlabeled data. The ever-increasing raw mass of user-generated and sensed data is made easily accessible by databases and Web crawlers. Nowadays, anyone having an Internet connection can parse the 4,000,000+ articles available on Wikipedia and construct a dataset out of them. Anyone can capture a Web TV stream and obtain days of video content to test their learning algorithm.

Another development is the amount of available computing power, which has continued to rise at a steady rate owing to progress in hardware design and engineering. While the number of cycles per second of processors has plateaued due to physical limitations, the slowdown has been offset by the emergence of processing parallelism, best exemplified by the massively parallel graphics processing units (GPUs).
Nowadays, everybody can buy a GPU board (usually already available in consumer-grade laptops), install free GPU software, and run computation-intensive simulations at low cost.

These developments have raised the following question: Can we make use of this large computing power to make sense of these increasingly complex datasets? Neural networks are a promising approach, as they have the intrinsic modeling capacity and flexibility to represent the solution. Their intrinsically distributed nature allows one to leverage massively parallel computing resources.

During the last two decades, the focus of neural network research and the practice of training neural networks underwent important changes. Learning in deep architectures (or "deep learning") has to a certain degree displaced the once more prevalent regularization issues, or, more precisely, changed the practice of regularizing neural networks. Use of unlabeled data via unsupervised layer-wise pretraining or deep unsupervised embeddings is now often preferred over traditional regularization schemes such as weight decay or restricted connectivity. This new paradigm has started to spread over a large number of applications such as image recognition, speech recognition, natural language processing, complex systems, neuroscience, and computational physics.

The second edition of the book reloads the first edition with more tricks. These tricks arose from 14 years of theory and experimentation (from 1998 to 2012) by some of the world's most prominent neural networks researchers. These tricks can make a substantial difference (in terms of speed, ease of implementation, and accuracy) when it comes to putting algorithms to work on real problems. Tricks may not necessarily have solid theoretical foundations or formal validation. As Yoshua Bengio states in Chap. 19, the wisdom distilled here should be taken as a guideline, to be tried and challenged, not as a practice set in stone.

The second part of the new edition starts with tricks to optimize neural networks faster and make more efficient use of the potentially infinite stream of data presented to them. Chapter 18 [2] shows that simple stochastic gradient descent (learning one example at a time) is suited for training most neural networks. Chapter 19 [1] introduces a large number of tricks and recommendations for training feedforward neural networks and choosing the multiple hyperparameters. When the representation built by the neural network is highly sensitive to small parameter changes, for example, in recurrent neural networks, second-order methods based on minibatches such as those presented in Chap. 20 [9] can be a better choice. The seemingly simple optimization procedures presented in these chapters require their fair share of tricks in order to work optimally. The software Torch7 presented in Chap. 21 [5] provides a fast and modular implementation of these neural networks.

The novel second part of this volume continues with tricks to incorporate invariance into the model. In the context of image recognition, Chap. 22 [4] shows that translation invariance can be achieved by learning a k-means representation of image patches and spatially pooling the k-means activations. Chapter 23 [3] shows that invariance can be injected directly in the input space in the form of elastic distortions. Unlabeled data are ubiquitous, and using them to capture regularities in data is an important component of many learning algorithms. For example, we can learn an unsupervised model of data as a first step, as discussed in Chaps. 24 [7] and 25 [10], and feed the unsupervised representation to a supervised classifier.
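The "one example at a time" idea behind stochastic gradient descent, as opposed to summing gradients over the whole dataset, can be sketched in a few lines. The following is a minimal illustration of ours on a toy least-squares problem, not code from the book; all names and hyperparameter values are illustrative.

```python
# Plain stochastic gradient descent: the weights are updated after every
# single (x, y) example rather than after a full pass over the data.
import random

def sgd(examples, lr=0.1, epochs=50, seed=0):
    """Fit y ~ w*x + b, processing one example at a time."""
    examples = list(examples)          # avoid mutating the caller's list
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(examples)          # present examples in random order
        for x, y in examples:
            err = (w * x + b) - y      # prediction error on this example
            w -= lr * err * x          # gradient of 0.5*err**2 w.r.t. w
            b -= lr * err              # gradient of 0.5*err**2 w.r.t. b
    return w, b

# Noise-free data generated by y = 2x + 1; SGD recovers the parameters.
data = [(x, 2.0 * x + 1.0) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]
w, b = sgd(data)
```

The per-example update makes each step cheap, which is what lets the method scale to the "potentially infinite stream of data" mentioned above.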
Chapter 26 [12] shows that similar improvements can be obtained by learning an unsupervised embedding in the deep layers of a neural network, with added flexibility.

The book concludes with the application of neural networks to modeling time series and optimal control systems. Modeling time series can be done using a very simple technique discussed in Chap. 27 [8] that consists of fitting a linear model on top of a "reservoir" that implements a rich set of time series primitives. Chapter 28 [13] offers an alternative to the previous method by directly identifying the underlying dynamical system that generates the time series data. Chapter 29 [6] presents how these system identification techniques can be used to identify a Markov decision process from the observation of a control system (a sequence of states and actions in the reinforcement learning terminology). Chapter 30 [11] concludes by showing how the control system can be dynamically improved by fitting a neural network as the control system explores the space of states and actions.

The book intends to provide a timely snapshot of tricks, theory, and algorithms that are of use. Our hope is that some of the chapters of the new second edition will become our companions when doing experimental work, eventually becoming classics, as some of the papers of the first edition have become. Eventually, in some years, there may be an urge to reload again.

September 2012
Grégoire Montavon
Klaus-Robert Müller

Acknowledgments. This work was supported by the World Class University Program through the National Research Foundation of Korea funded by the Ministry of Education, Science, and Technology, under grant R31-10008. The editors also acknowledge partial support by DFG (MU 987/17-1).

References

[1] Bengio, Y.: Practical Recommendations for Gradient-Based Training of Deep Architectures. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 437–478.
Springer, Heidelberg (2012)
[2] Bottou, L.: Stochastic Gradient Descent Tricks. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 421–436. Springer, Heidelberg (2012)
[3] Ciresan, D.C., Meier, U., Gambardella, L.M., Schmidhuber, J.: Deep Big Multilayer Perceptrons for Digit Recognition. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 581–598. Springer, Heidelberg (2012)
[4] Coates, A., Ng, A.Y.: Learning Feature Representations with K-Means. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 561–580. Springer, Heidelberg (2012)
[5] Collobert, R., Kavukcuoglu, K., Farabet, C.: Implementing Neural Networks Efficiently. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 537–557. Springer, Heidelberg (2012)
[6] Duell, S., Udluft, S., Sterzing, V.: Solving Partially Observable Reinforcement Learning Problems with Recurrent Neural Networks. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 709–733. Springer, Heidelberg (2012)
[7] Hinton, G.E.: A Practical Guide to Training Restricted Boltzmann Machines. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 599–619. Springer, Heidelberg (2012)
[8] Lukoševičius, M.: A Practical Guide to Applying Echo State Networks. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 659–686. Springer, Heidelberg (2012)
[9] Martens, J., Sutskever, I.: Training Deep and Recurrent Networks with Hessian-Free Optimization. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 479–535. Springer, Heidelberg (2012)
[10] Montavon, G., Müller, K.-R.: Deep Boltzmann Machines and the Centering Trick. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 621–637.
Springer, Heidelberg (2012)
[11] Riedmiller, M.: 10 Steps and Some Tricks to Set Up Neural Reinforcement Controllers. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 735–757. Springer, Heidelberg (2012)
[12] Weston, J., Ratle, F., Collobert, R.: Deep Learning via Semi-supervised Embedding. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 639–655. Springer, Heidelberg (2012)
[13] Zimmermann, H.-G., Tietz, C., Grothmann, R.: Forecasting with Recurrent Neural Networks: 12 Tricks. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 687–707. Springer, Heidelberg (2012)

Table of Contents

Introduction

Speeding Learning
Preface
1. Efficient BackProp
   Yann LeCun, Léon Bottou, Geneviève B. Orr, and Klaus-Robert Müller

Regularization Techniques to Improve Generalization
Preface
2. Early Stopping – But When?
   Lutz Prechelt
3. A Simple Trick for Estimating the Weight Decay Parameter
   Thorsteinn S. Rögnvaldsson
4. Controlling the Hyperparameter Search in MacKay's Bayesian Neural Network Framework
   Tony Plate
5. Adaptive Regularization in Neural Network Modeling
   Jan Larsen, Claus Svarer, Lars Nonboe Andersen, and Lars Kai Hansen
6. Large Ensemble Averaging
   David Horn, Ury Naftaly, and Nathan Intrator

Improving Network Models and Algorithmic Tricks
Preface
7. Square Unit Augmented, Radially Extended, Multilayer Perceptrons
   Gary William Flake
8. A Dozen Tricks with Multitask Learning
   Rich Caruana
9. Solving the Ill-Conditioning in Neural Network Learning
   Patrick van der Smagt and Gerd Hirzinger
10. Centering Neural Network Gradient Factors
   Nicol N. Schraudolph
11. Avoiding Roundoff Error in Backpropagating Derivatives
   Tony Plate

Representing and Incorporating Prior Knowledge in Neural Network Training
Preface
12. Transformation Invariance in Pattern Recognition: Tangent Distance and Tangent Propagation
   Patrice Y. Simard, Yann A. LeCun, John S. Denker, and Bernard Victorri
13.
Combining Neural Networks and Context-Driven Search for Online, Printed Handwriting Recognition in the Newton
   Larry S. Yaeger, Brandyn Webb, and Richard F. Lyon
14. Neural Network Classification and Prior Class Probabilities
   Steve Lawrence, Ian Burns, Andrew Back, Ah Chung Tsoi, and C. Lee Giles
15. Applying Divide and Conquer to Large Scale Pattern Recognition
   Jürgen Fritsch and Michael Finke

Tricks for Time Series
Preface
16. Forecasting the Economy with Neural Nets: A Survey of Challenges and Solutions
   John Moody
17. How to Train Neural Networks
   Ralph Neuneier and Hans-Georg Zimmermann

Big Learning in Deep Neural Networks
Preface
18. Stochastic Gradient Descent Tricks
   Léon Bottou
19. Practical Recommendations for Gradient-Based Training of Deep Architectures
   Yoshua Bengio
20. Training Deep and Recurrent Networks with Hessian-Free Optimization
   James Martens and Ilya Sutskever
21. Implementing Neural Networks Efficiently
   Ronan Collobert, Koray Kavukcuoglu, and Clément Farabet

Better Representations: Invariant, Disentangled and Reusable
Preface
22. Learning Feature Representations with K-Means
   Adam Coates and Andrew Y. Ng
23. Deep Big Multilayer Perceptrons for Digit Recognition
   Dan Claudiu Ciresan, Ueli Meier, Luca Maria Gambardella, and Jürgen Schmidhuber
24. A Practical Guide to Training Restricted Boltzmann Machines
   Geoffrey E. Hinton
25. Deep Boltzmann Machines and the Centering Trick
   Grégoire Montavon and Klaus-Robert Müller
26. Deep Learning via Semi-supervised Embedding
   Jason Weston, Frédéric Ratle, and Ronan Collobert

Identifying Dynamical Systems for Forecasting and Control
Preface
27. A Practical Guide to Applying Echo State Networks
   Mantas Lukoševičius
28. Forecasting with Recurrent Neural Networks: 12 Tricks
   Hans-Georg Zimmermann, Christoph Tietz, and Ralph Grothmann
29.
Solving Partially Observable Reinforcement Learning Problems with Recurrent Neural Networks
   Siegmund Duell, Steffen Udluft, and Volkmar Sterzing
30. 10 Steps and Some Tricks to Set Up Neural Reinforcement Controllers
   Martin Riedmiller

Author Index
Subject Index

Introduction

It is our belief that researchers and practitioners acquire, through experience and word-of-mouth, techniques and heuristics that help them successfully apply neural networks to difficult real-world problems. Often these "tricks" are theoretically well motivated. Sometimes they are the result of trial and error. However, their most common link is that they are usually hidden in people's heads or in the back pages of space-constrained conference papers. As a result, newcomers to the field waste much time wondering why their networks train so slowly and perform so poorly.

This book is an outgrowth of a 1996 NIPS workshop called Tricks of the Trade whose goal was to begin the process of gathering and documenting these tricks. The interest that the workshop generated motivated us to expand our collection and compile it into this book. Although we have no doubt that there are many tricks we have missed, we hope that what we have included will prove to be useful, particularly to those who are relatively new to the field. Each chapter contains one or more tricks presented by a given author (or authors). We have attempted to group related chapters into sections, though we recognize that the different sections are far from disjoint. Some of the chapters (e.g., 1, 13, 17) contain entire systems of tricks that are far more general than the category they have been placed in.

Before each section we provide the reader with a summary of the tricks contained within, to serve as a quick overview and reference. However, we do not recommend applying tricks before having read the accompanying chapter. Each trick may only work in a particular context that is not fully explained in the summary.
This is particularly true for the chapters that present systems, where combinations of tricks must be applied together for them to be effective. Below we give a coarse roadmap of the contents of the individual chapters.

Speeding Learning

The book opens with a chapter based on Léon Bottou and Yann LeCun's popular workshop on efficient backpropagation, where they present a system of tricks for speeding the minimization process. Included are tricks that are very simple to implement as well as more complex ones, e.g., based on second-order methods. Though many readers may recognize some of these tricks, we believe that this chapter provides both a thorough explanation of their theoretical basis and an understanding of the subtle interactions among them.

This chapter provides an ideal introduction for the reader. It starts with a discussion of fundamental tricks addressing input representation, initialization, and targets.

Previously published in: Orr, G.B. and Müller, K.-R. (Eds.): LNCS 1524, ISBN 978-3-540-65311-0 (1998)
G. Montavon et al. (Eds.): NN: Tricks of the Trade, 2nd edn., LNCS 7700, pp. 1–5, 2012.
© Springer-Verlag Berlin Heidelberg 2012
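To make the input-representation point concrete: a classic trick of this kind is to standardize each input variable to zero mean and unit variance before training, so that no feature dominates the gradient merely because of its scale. The sketch below is our own minimal illustration of that idea, not code from the book; the function name and data are ours.

```python
# Standardize each input feature (column) to mean 0 and variance 1.
def standardize(rows):
    """Return rows with every column shifted to mean 0, scaled to variance 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = []
    for c, m in zip(cols, means):
        var = sum((v - m) ** 2 for v in c) / len(c)
        stds.append(var ** 0.5 or 1.0)   # fall back to 1.0 for constant columns
    return [
        [(v - m) / s for v, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Two features on wildly different scales end up comparable after scaling.
data = [[10.0, 0.1], [20.0, 0.2], [30.0, 0.3]]
scaled = standardize(data)
```

After this preprocessing, both columns span the same range, which typically improves the conditioning of gradient-based training.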