Neural Networks: Tricks of the Trade (2nd Edition)


Volume Editors

Grégoire Montavon
Technische Universität Berlin, Department of Computer Science
Franklinstr. 28/29, 10587 Berlin, Germany
E-mail: gregoire.montavon@tu-berlin.de

Geneviève B. Orr
Willamette University, Department of Computer Science
900 State Street, Salem, OR 97301, USA
E-mail: gorr@willamette.edu

Klaus-Robert Müller
Technische Universität Berlin, Department of Computer Science
Franklinstr. 28/29, 10587 Berlin, Germany
and
Korea University, Department of Brain and Cognitive Engineering
Anam-dong, Seongbuk-gu, Seoul 136-713, Korea
E-mail: klaus-robert.mueller@tu-berlin.de

ISSN 0302-9743            e-ISSN 1611-3349
ISBN 978-3-642-35288-1    e-ISBN 978-3-642-35289-8
DOI 10.1007/978-3-642-35289-8
Springer Heidelberg Dordrecht London New York

Library of Congress Control Number: 2012952591
CR Subject Classification (1998): F.1, I.2.6, I.5.1, C.1.3, F.2, J.3
LNCS Sublibrary: SL 1 - Theoretical Computer Science and General Issues

© Springer-Verlag Berlin Heidelberg 1998, 2012

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks.
Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India

Printed on acid-free paper. Springer is part of Springer Science+Business Media.

Preface to the Second Edition

There have been substantial changes in the field of neural networks since the first edition of this book in 1998. Some of them have been driven by external factors such as the increase of available data and computing power. The Internet has made massive amounts of labeled and unlabeled data publicly available, and the ever-increasing raw mass of user-generated and sensed data is made easily accessible by databases and Web crawlers. Nowadays, anyone with an Internet connection can parse the 4,000,000+ articles available on Wikipedia and construct a dataset out of them, or capture a Web TV stream and obtain days of video content on which to test a learning algorithm.

Another development is the amount of available computing power, which has continued to rise at a steady rate owing to progress in hardware design and engineering. While the number of cycles per second of processors has plateaued due to physical limitations, the slow-down has been offset by the emergence of processing parallelism, best exemplified by massively parallel graphics processing units (GPUs).
Nowadays, everybody can buy a GPU board (usually already available in consumer-grade laptops), install free GPU software, and run computation-intensive simulations at low cost.

These developments have raised the following question: Can we make use of this large computing power to make sense of these increasingly complex datasets? Neural networks are a promising approach, as they have the intrinsic modeling capacity and flexibility to represent the solution, and their intrinsically distributed nature allows one to leverage massively parallel computing resources.

During the last two decades, the focus of neural network research and the practice of training neural networks underwent important changes. Deep learning has to a certain degree displaced the once more prevalent regularization issues, or more precisely, changed the practice of regularizing neural networks. Use of unlabeled data via unsupervised layer-wise pretraining or deep unsupervised embeddings is now often preferred over traditional regularization schemes such as weight decay or restricted connectivity. This new paradigm has started to spread over a large number of applications such as image recognition, speech recognition, natural language processing, complex systems, neuroscience, and computational physics.

The second edition of the book reloads the first edition with more tricks. These tricks arose from 14 years of theory and experimentation (from 1998 to 2012) by some of the world's most prominent neural networks researchers. They can make a substantial difference (in terms of speed, ease of implementation, and accuracy) when it comes to putting algorithms to work on real problems. Tricks may not necessarily have solid theoretical foundations or formal validation. As Yoshua Bengio states in Chap. 19, the wisdom distilled here should be taken as a guideline, to be tried and challenged, not as a practice set in stone.

The second part of the new edition starts with tricks to optimize neural networks faster and make more efficient use of the potentially infinite stream of data presented to them. Chapter 18 [2] shows that simple stochastic gradient descent (learning one example at a time) is suited for training most neural networks. Chapter 19 [1] introduces a large number of tricks and recommendations for training feed-forward neural networks and choosing the multiple hyperparameters. When the representation built by the neural network is highly sensitive to small parameter changes, for example in recurrent neural networks, second-order methods based on mini-batches such as those presented in Chap. 20 [9] can be a better choice. The seemingly simple optimization procedures presented in these chapters require their fair share of tricks in order to work optimally. The software Torch7 presented in Chap. 21 [5] provides a fast and modular implementation of these neural networks.

The novel second part of this volume continues with tricks to incorporate invariance into the model. In the context of image recognition, Chap. 22 [4] shows that translation invariance can be achieved by learning a k-means representation of image patches and spatially pooling the k-means activations. Chapter 23 [3] shows that invariance can be injected directly in the input space in the form of elastic distortions. Unlabeled data are ubiquitous, and using them to capture regularities in data is an important component of many learning algorithms. For example, we can learn an unsupervised model of data as a first step, as discussed in Chaps. 24 [7] and 25 [10], and feed the unsupervised representation to a supervised classifier.
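As a minimal illustration of the per-example updates of plain stochastic gradient descent described above, the sketch below fits a one-dimensional linear model by visiting one example at a time. The toy data, the squared-error loss, and the constant learning rate are illustrative assumptions for this sketch, not prescriptions from any particular chapter.

```python
import random

def sgd_linear(data, lr=0.05, epochs=50, seed=0):
    """Plain stochastic gradient descent on a 1-D linear model y = w*x + b.

    One (x, y) example is visited at a time, and the parameters are nudged
    along the negative gradient of the squared error on that single example.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)          # fresh example order each epoch
        for x, y in data:
            err = (w * x + b) - y  # residual on this one example
            w -= lr * err * x      # d/dw of 0.5 * err**2
            b -= lr * err          # d/db of 0.5 * err**2
    return w, b

# Toy data generated from y = 2x + 1 (noise-free), so SGD should
# recover w close to 2 and b close to 1.
data = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(-10, 11)]
w, b = sgd_linear(list(data))
```

Visiting the examples in a freshly shuffled order each epoch, rather than a fixed order, is itself one of the classic tricks for stochastic methods.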
Chapter 26 [12] shows that similar improvements can be obtained by learning an unsupervised embedding in the deep layers of a neural network, with added flexibility.

The book concludes with the application of neural networks to modeling time series and optimal control systems. Modeling time series can be done using a very simple technique discussed in Chap. 27 [8] that consists of fitting a linear model on top of a "reservoir" that implements a rich set of time series primitives. Chapter 28 [13] offers an alternative to the previous method by directly identifying the underlying dynamical system that generates the time series data. Chapter 29 [6] presents how these system identification techniques can be used to identify a Markov decision process from the observation of a control system (a sequence of states and actions in the reinforcement learning terminology). Chapter 30 [11] concludes by showing how the control system can be dynamically improved by fitting a neural network as the control system explores the space of states and actions.

The book intends to provide a timely snapshot of tricks, theory, and algorithms that are of use. Our hope is that some of the chapters of the new second edition will become our companions when doing experimental work, eventually becoming classics, as some of the papers of the first edition have become. Eventually, in some years, there may be an urge to reload again.

September 2012
Grégoire Montavon
Klaus-Robert Müller

Acknowledgments. This work was supported by the World Class University Program through the National Research Foundation of Korea funded by the Ministry of Education, Science, and Technology, under grant R31-10008. The editors also acknowledge partial support by DFG (MU 987/17-1).

References

[1] Bengio, Y.: Practical Recommendations for Gradient-Based Training of Deep Architectures. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 437-478. Springer, Heidelberg (2012)
[2] Bottou, L.: Stochastic Gradient Descent Tricks. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 421-436. Springer, Heidelberg (2012)
[3] Ciresan, D.C., Meier, U., Gambardella, L.M., Schmidhuber, J.: Deep Big Multilayer Perceptrons for Digit Recognition. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 581-598. Springer, Heidelberg (2012)
[4] Coates, A., Ng, A.Y.: Learning Feature Representations with K-Means. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 561-580. Springer, Heidelberg (2012)
[5] Collobert, R., Kavukcuoglu, K., Farabet, C.: Implementing Neural Networks Efficiently. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 537-557. Springer, Heidelberg (2012)
[6] Duell, S., Udluft, S., Sterzing, V.: Solving Partially Observable Reinforcement Learning Problems with Recurrent Neural Networks. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 709-733. Springer, Heidelberg (2012)
[7] Hinton, G.E.: A Practical Guide to Training Restricted Boltzmann Machines. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 599-619. Springer, Heidelberg (2012)
[8] Lukoševičius, M.: A Practical Guide to Applying Echo State Networks. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 659-686. Springer, Heidelberg (2012)
[9] Martens, J., Sutskever, I.: Training Deep and Recurrent Networks with Hessian-Free Optimization. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 479-535. Springer, Heidelberg (2012)
[10] Montavon, G., Müller, K.-R.: Deep Boltzmann Machines and the Centering Trick. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 621-637. Springer, Heidelberg (2012)
[11] Riedmiller, M.: 10 Steps and Some Tricks to Set Up Neural Reinforcement Controllers. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 735-757. Springer, Heidelberg (2012)
[12] Weston, J., Ratle, F., Collobert, R.: Deep Learning via Semi-supervised Embedding. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 639-655. Springer, Heidelberg (2012)
[13] Zimmermann, H.-G., Tietz, C., Grothmann, R.: Forecasting with Recurrent Neural Networks: 12 Tricks. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 687-707. Springer, Heidelberg (2012)

Table of Contents

Introduction

Speeding Learning
Preface
1. Efficient BackProp
   Yann LeCun, Léon Bottou, Geneviève B. Orr, and Klaus-Robert Müller

Regularization Techniques to Improve Generalization
Preface
2. Early Stopping - But When?
   Lutz Prechelt
3. A Simple Trick for Estimating the Weight Decay Parameter
   Thorsteinn S. Rögnvaldsson
4. Controlling the Hyperparameter Search in MacKay's Bayesian Neural Network Framework
   Tony Plate
5. Adaptive Regularization in Neural Network Modeling
   Jan Larsen, Claus Svarer, Lars Nonboe Andersen, and Lars Kai Hansen
6. Large Ensemble Averaging
   David Horn, Ury Naftaly, and Nathan Intrator

Improving Network Models and Algorithmic Tricks
Preface
7. Square Unit Augmented, Radially Extended, Multilayer Perceptrons
   Gary William Flake
8. A Dozen Tricks with Multitask Learning
   Rich Caruana
9. Solving the Ill-Conditioning in Neural Network Learning
   Patrick van der Smagt and Gerd Hirzinger
10. Centering Neural Network Gradient Factors
   Nicol N. Schraudolph
11. Avoiding Roundoff Error in Backpropagating Derivatives
   Tony Plate

Representing and Incorporating Prior Knowledge in Neural Network Training
Preface
12. Transformation Invariance in Pattern Recognition: Tangent Distance and Tangent Propagation
   Patrice Y. Simard, Yann A. LeCun, John S. Denker,
   and Bernard Victorri
13. Combining Neural Networks and Context-Driven Search for On-line, Printed Handwriting Recognition in the Newton
   Larry S. Yaeger, Brandyn Webb, and Richard F. Lyon
14. Neural Network Classification and Prior Class Probabilities
   Steve Lawrence, Ian Burns, Andrew Back, Ah Chung Tsoi, and C. Lee Giles
15. Applying Divide and Conquer to Large Scale Pattern Recognition
   Jürgen Fritsch and Michael Finke

Tricks for Time Series
Preface
16. Forecasting the Economy with Neural Nets: A Survey of Challenges and Solutions
   John Moody
17. How to Train Neural Networks
   Ralph Neuneier and Hans Georg Zimmermann

Big Learning in Deep Neural Networks
Preface
18. Stochastic Gradient Descent Tricks
   Léon Bottou
19. Practical Recommendations for Gradient-Based Training of Deep Architectures
   Yoshua Bengio
20. Training Deep and Recurrent Networks with Hessian-Free Optimization
   James Martens and Ilya Sutskever
21. Implementing Neural Networks Efficiently
   Ronan Collobert, Koray Kavukcuoglu, and Clément Farabet

Better Representations: Invariant, Disentangled and Reusable
Preface
22. Learning Feature Representations with K-Means
   Adam Coates and Andrew Y. Ng
23. Deep Big Multilayer Perceptrons for Digit Recognition
   Dan Claudiu Ciresan, Ueli Meier, Luca Maria Gambardella, and Jürgen Schmidhuber
24. A Practical Guide to Training Restricted Boltzmann Machines
   Geoffrey E. Hinton
25. Deep Boltzmann Machines and the Centering Trick
   Grégoire Montavon and Klaus-Robert Müller
26. Deep Learning via Semi-supervised Embedding
   Jason Weston, Frédéric Ratle, and Ronan Collobert

Identifying Dynamical Systems for Forecasting and Control
Preface
27. A Practical Guide to Applying Echo State Networks
   Mantas Lukoševičius
28. Forecasting with Recurrent Neural Networks: 12 Tricks
   Hans-Georg Zimmermann, Christoph Tietz, and Ralph Grothmann
29. Solving Partially Observable Reinforcement Learning Problems with Recurrent Neural Networks
   Siegmund Duell, Steffen Udluft, and Volkmar Sterzing
30. 10 Steps and Some Tricks to Set Up Neural Reinforcement Controllers
   Martin Riedmiller

Author Index
Subject Index

Introduction

It is our belief that researchers and practitioners acquire, through experience and word-of-mouth, techniques and heuristics that help them successfully apply neural networks to difficult real-world problems. Often these "tricks" are theoretically well motivated. Sometimes they are the result of trial and error. However, their most common link is that they are usually hidden in people's heads or in the back pages of space-constrained conference papers. As a result, newcomers to the field waste much time wondering why their networks train so slowly and perform so poorly.

This book is an outgrowth of a 1996 NIPS workshop called Tricks of the Trade whose goal was to begin the process of gathering and documenting these tricks. The interest that the workshop generated motivated us to expand our collection and compile it into this book. Although we have no doubt that there are many tricks we have missed, we hope that what we have included will prove to be useful, particularly to those who are relatively new to the field. Each chapter contains one or more tricks presented by a given author (or authors). We have attempted to group related chapters into sections, though we recognize that the different sections are far from disjoint. Some of the chapters (e.g., 1, 13, 17) contain entire systems of tricks that are far more general than the category they have been placed in.

Before each section we provide the reader with a summary of the tricks contained within, to serve as a quick overview and reference. However, we do not recommend applying tricks before having read the accompanying chapter. Each trick may only work in a particular context that is not fully explained in the summary.
This is particularly true for the chapters that present systems, where combinations of tricks must be applied together for them to be effective. Below we give a coarse roadmap of the contents of the individual chapters.

Speeding Learning

The book opens with a chapter based on Léon Bottou and Yann LeCun's popular workshop on efficient backpropagation, where they present a system of tricks for speeding the minimization process. Included are tricks that are very simple to implement as well as more complex ones, e.g., based on second-order methods. Though many readers may recognize some of these tricks, we believe that this chapter provides both a thorough explanation of their theoretical basis and an understanding of the subtle interactions among them.

This chapter provides an ideal introduction for the reader. It starts by discussing fundamental tricks addressing input representation, initialization, target

Previously published in: Orr, G.B., Müller, K.-R. (Eds.): LNCS 1524, ISBN 978-3-540-65311-0 (1998)
G. Montavon et al. (Eds.): NN: Tricks of the Trade, 2nd edn., LNCS 7700, pp. 1-5, 2012.
© Springer-Verlag Berlin Heidelberg 2012
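The input-representation and initialization tricks just mentioned can be illustrated with a short sketch. Standardizing each input feature to zero mean and unit variance, and drawing initial weights at a scale of 1/sqrt(fan-in), are common rules of thumb in this spirit; the exact recipes in the chapter differ in detail, so treat the constants below as illustrative assumptions.

```python
import math
import random

def standardize(columns):
    """Shift and scale each input feature to zero mean and unit variance."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        std = math.sqrt(var) or 1.0  # guard against constant features
        out.append([(v - mean) / std for v in col])
    return out

def init_weights(fan_in, fan_out, seed=0):
    """Initial weights drawn uniformly at a scale of about 1/sqrt(fan_in),
    so that unit activations start in the sensitive range of the nonlinearity."""
    rng = random.Random(seed)
    limit = 1.0 / math.sqrt(fan_in)
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]
```

For example, `standardize([[1.0, 2.0, 3.0, 4.0]])` returns a single column with mean 0 and variance 1, and `init_weights(100, 5)` yields a 5x100 weight matrix whose entries all lie within ±0.1.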
