Deep Learning: Technical Introduction


A comprehensive overview of DNNs, covering feedforward, convolutional and recurrent neural network techniques in detail. The paper gives a technical introduction to the three most common neural networks: feedforward, convolutional and recurrent neural networks. For each network it details the fundamental building blocks, including the basic architecture, the propagation scheme, the connectivity, the activation functions, the application of backpropagation, and the principles of various optimization algorithms.
Contents

1 Preface
2 Acknowledgements
3 Introduction
4 Feedforward Neural Networks
    4.1 Introduction
    4.2 FNN architecture
    4.3 Some notations
    4.4 Weight averaging
    4.5 Activation function
    4.6 FNN layers
    4.7 Loss function
    4.8 Regularization techniques
    4.9 Backpropagation
    4.10 Which data sample to use for gradient descent?
    4.11 Gradient optimization techniques
    4.12 Weight initialization
    Appendices
    4.A Backprop through the output layer
    4.B Backprop through hidden layers
    4.C Backprop through BatchNorm
    4.D FNN ResNet (non-standard presentation)
    4.E FNN ResNet (more standard presentation)
    4.F Matrix formulation
5 Convolutional Neural Networks
    5.1 Introduction
    5.2 CNN architecture
    5.3 CNN specificities
    5.4 Modification to batch normalization
    5.5 Network architectures
    5.6 Backpropagation
    Appendices
    5.A Backprop through BatchNorm
    5.B Error rate updates: details
    5.C Weight update: details
    5.D Coefficient update: details
    5.E Practical simplification
    5.F Backpropagation through a ResNet module
    5.G Convolution as a matrix multiplication
    5.H Pooling as a row matrix maximum
6 Recurrent Neural Networks
    6.1 Introduction
    6.2 RNN-LSTM architecture
    6.3 Extreme layers and loss function
    6.4 RNN specificities
    6.5 LSTM specificities
    Appendices
    6.A Backpropagation through Batch Normalization
    6.B RNN backpropagation
    6.C LSTM backpropagation
    6.D Peephole connections
7 Conclusion

Chapter 1
Preface

I started learning about deep learning fundamentals in February 2017. At this time, I knew nothing about backpropagation, and was completely ignorant about the differences between a feedforward, a convolutional and a recurrent neural network.

As I navigated through the humongous amount of data available on deep learning online, I found myself quite frustrated when it came to really understanding what deep learning is, and not just applying it with some available library. In particular, the backpropagation update rules are seldom derived, and never in index form. Unfortunately for me, I have an "index" mind: seeing a 4-dimensional convolution formula in matrix form does not do it for me. Since I am also stupid enough to like recoding the wheel in low-level programming languages, the matrix form cannot be directly converted into working code either.

I therefore started some notes for my personal use, where I tried to rederive everything from scratch in index form. I did so for the vanilla feedforward network, then learned about L1 and L2 regularization, dropout [1], batch normalization [2], and several gradient descent optimization techniques. I then turned to convolutional networks, from conventional conv-pool architectures with a single-digit number of layers [3] to recent VGG [4] and ResNet [5] ones, and from local contrast normalization and rectification to batchnorm. Finally, I studied recurrent neural network structures [6], from the standard formulation to the most recent LSTM one [7].

As my work progressed, my notes got bigger and bigger, until a point when I realized I might have enough material to help others starting their own deep learning journey.

This work is bottom-up at its core. If you are searching for a working neural network in 10 lines of code and 5 minutes of your time, you have come to the wrong place.
If you can mentally multiply and convolve 4D tensors, then I have nothing to convey to you either. If, on the other hand, you like(d) to rederive every tiny calculation of every theorem of every class that you stepped into, then you might be interested by what follows.

Chapter 2
Acknowledgements

This work has no benefit nor added value to the deep learning topic on its own. It is just the reformulation of ideas of brighter researchers to fit a peculiar mindset: the one of preferring formulas with ten indices, but where one knows precisely what one is manipulating, rather than (in my opinion sometimes opaque) matrix formulations where the dimensions of the objects are rarely if ever specified.

Among the brighter people from whom I learned online is Andrew Ng. His Coursera class (here) was the first contact I got with neural networks, and this pedagogical introduction allowed me to build on solid ground.

I also wish to particularly thank Hugo Larochelle, who not only built a wonderful deep learning class (here), but was also kind enough to answer emails from a complete beginner and stranger.

The Stanford class on convolutional networks (here) proved extremely valuable to me, and so did the one on natural language processing (here).

I also benefited greatly from Sebastian Ruder's blog (here), both from the blog pages on gradient descent optimization techniques and from the author himself.

I learned more about LSTM on colah's blog (here), and some of my drawings are inspired from there.

I also thank Jonathan Del Hoyo for the great articles that he regularly shares on LinkedIn.

Many thanks go to my collaborators at Mediamobile, who let me dig as deep as I wanted on neural networks. I am especially indebted to Clément, Nicolas, Jessica, Christine and Céline.

Thanks to Jean-Michel Loubes and Fabrice Gamboa, from whom I learned a great deal on probability theory and statistics.

I end this list with my employer, Mediamobile, which has been kind enough to let me work on this topic with complete freedom. A special thanks to Philippe, who supervised me with the perfect balance of feedback and freedom.

Chapter 3
Introduction

This note aims at presenting the three most common forms of neural network architectures. It does so in a technical though hopefully pedagogical way, building up in complexity as one progresses through the chapters.

Chapter 4 starts with the first type of network introduced historically: a regular feedforward neural network, itself an evolution of the original perceptron [8] algorithm. One should see the latter as a non-linear regression, and feedforward networks schematically stack perceptron layers on top of one another.

We will thus introduce in chapter 4 the fundamental building blocks of the simplest neural network layers: weight averaging and activation functions. We will also introduce gradient descent, joined with the backpropagation algorithm, as a way to train the network by minimizing a loss function adapted to the task at hand (classification or regression). The more technical details of the backpropagation algorithm are found in the appendix of this chapter, alongside an introduction to the state-of-the-art feedforward neural network, the ResNet. One can finally find a short matrix description of the feedforward network.
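To make that vocabulary concrete before diving into chapter 4, here is a minimal NumPy sketch, not taken from the notes themselves and with purely illustrative names and sizes, of what a feedforward layer amounts to: a weight-averaging step followed by an activation function, trained by backpropagation and a plain gradient descent update on a quadratic loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: 32 samples, 4 input features, scalar target.
X = rng.normal(size=(32, 4))
y = rng.normal(size=(32, 1))

# One hidden layer: weight averaging (X @ W + b) then a non-linearity.
W1, b1 = rng.normal(size=(4, 8)) * 0.1, np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)) * 0.1, np.zeros(1)

def relu(z):
    return np.maximum(z, 0.0)

lr = 0.1
for step in range(200):
    # Forward pass.
    a1 = relu(X @ W1 + b1)       # hidden activations
    y_hat = a1 @ W2 + b2         # linear output layer (regression)
    loss = np.mean((y_hat - y) ** 2)

    # Backpropagation: the chain rule applied layer by layer.
    g_out = 2.0 * (y_hat - y) / len(X)   # dLoss/dy_hat
    g_W2 = a1.T @ g_out
    g_b2 = g_out.sum(axis=0)
    g_a1 = g_out @ W2.T
    g_z1 = g_a1 * (a1 > 0)               # ReLU derivative
    g_W1 = X.T @ g_z1
    g_b1 = g_z1.sum(axis=0)

    # Gradient descent update.
    W1 -= lr * g_W1; b1 -= lr * g_b1
    W2 -= lr * g_W2; b2 -= lr * g_b2
```

The index-form derivations in the appendices of chapter 4 spell out, entry by entry, exactly what each of these array expressions computes.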
In chapter 5, we present the second type of neural network studied: convolutional networks, particularly suited to treat images and label them. This implies presenting the mathematical tools related to this network: convolution, pooling, stride... as well as seeing the modifications of the building blocks introduced in chapter 4. Several convolutional architectures are then presented, and the appendices once again detail the difficult steps of the main text.

Chapter 6 finally presents the network architecture suited for data with a temporal structure, such as time series: the recurrent neural network. There again, the novelties and the modifications of the material introduced in the two previous chapters are detailed in the main text, while the appendices give all one needs to understand the most cumbersome formulas of this kind of network architecture.
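As a first taste of chapter 5's tools, the sketch below (again illustrative, not an excerpt from the notes) implements two of the operations named above, a single-channel 2D convolution with a stride and a non-overlapping max-pooling step, as plain loops over indices.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid 2D convolution (really cross-correlation, as in most deep
    learning libraries) of a single-channel image with one kernel."""
    H, W = image.shape
    kH, kW = kernel.shape
    out_h = (H - kH) // stride + 1
    out_w = (W - kW) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kH,
                          j * stride:j * stride + kW]
            out[i, j] = np.sum(patch * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value of each
    size x size window."""
    H, W = feature_map.shape
    out = np.zeros((H // size, W // size))
    for i in range(H // size):
        for j in range(W // size):
            out[i, j] = feature_map[i * size:(i + 1) * size,
                                    j * size:(j + 1) * size].max()
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
edge_kernel = np.array([[1.0, -1.0], [1.0, -1.0]])  # crude vertical-edge filter
fmap = conv2d(image, edge_kernel, stride=1)  # shape (5, 5)
pooled = max_pool(fmap, size=2)              # shape (2, 2)
```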

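And as a preview of chapter 6: a vanilla recurrent layer boils down to one update applied at every time step, h_t = tanh(W_x x_t + W_h h_{t-1} + b). The toy loop below (standard textbook formulation with illustrative names, not anything specific to these notes) shows how the hidden state carries information along a sequence.

```python
import numpy as np

rng = np.random.default_rng(1)

n_in, n_hidden, T = 3, 5, 10          # input size, state size, sequence length
Wx = rng.normal(size=(n_hidden, n_in)) * 0.1
Wh = rng.normal(size=(n_hidden, n_hidden)) * 0.1
b = np.zeros(n_hidden)

xs = rng.normal(size=(T, n_in))       # one input vector per time step
h = np.zeros(n_hidden)                # initial hidden state

states = []
for x_t in xs:
    # The same weights are reused at every step; only the state changes.
    h = np.tanh(Wx @ x_t + Wh @ h + b)
    states.append(h)
```

An LSTM replaces this single update with several gated ones, which is where the index-form derivations of chapter 6 earn their keep.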