Learning Deep Architectures for AI


[Foundations and Trends in Machine Learning] Yoshua Bengio - Learning Deep Architectures for AI (2009, Now Publishers Inc)
Foundations and Trends in Machine Learning, Vol. 2, No. 1 (2009) 1-127
© 2009 Y. Bengio. DOI: 10.1561/2200000006

Learning Deep Architectures for AI

Yoshua Bengio
Dept. IRO, Université de Montréal, C.P. 6128, Montréal, QC, H3C 3J7, Canada, yoshua.bengio@umontreal.ca

Abstract

Theoretical results suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g., in vision, language, and other AI-level tasks), one may need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers or in complicated propositional formulae re-using many sub-formulae. Searching the parameter space of deep architectures is a difficult task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the state-of-the-art in certain areas. This monograph discusses the motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.

1 Introduction

Allowing computers to model our world well enough to exhibit what we call intelligence has been the focus of more than half a century of research. To achieve this, it is clear that a large quantity of information about our world should somehow be stored, explicitly or implicitly, in the computer.
Because it seems daunting to formalize manually all that information in a form that computers can use to answer questions and generalize to new contexts, many researchers have turned to learning algorithms to capture a large fraction of that information. Much progress has been made to understand and improve learning algorithms, but the challenge of artificial intelligence (AI) remains. Do we have algorithms that can understand scenes and describe them in natural language? Not really, except in very limited settings. Do we have algorithms that can infer enough semantic concepts to be able to interact with most humans using these concepts? No. If we consider image understanding, one of the best specified of the AI tasks, we realize that we do not yet have learning algorithms that can discover the many visual and semantic concepts that would seem to be necessary to interpret most images on the web. The situation is similar for other AI tasks.

Fig. 1.1 We would like the raw input image to be transformed into gradually higher levels of representation (from a raw input vector representation, through slightly higher-level representations, up to a very high-level representation such as MAN, SITTING, etc.), representing more and more abstract functions of the raw input, e.g., edges, local shapes, object parts, etc. In practice, we do not know in advance what the "right" representation should be for all these levels of abstractions, although linguistic concepts might help guessing what the higher levels should implicitly represent.

Consider for example the task of interpreting an input image such as the one in Figure 1.1. When humans try to solve a particular AI task (such as machine vision or natural language processing), they often exploit their intuition about how to decompose the problem into sub-problems and multiple levels of representation, e.g., in object parts and constellation models [138, 179, 197], where models for parts can be re-used in different object instances.
For example, the current state-of-the-art in machine vision involves a sequence of modules starting from pixels and ending in a linear or kernel classifier [134, 145], with intermediate modules mixing engineered transformations and learning, e.g., first extracting low-level features that are invariant to small geometric variations (such as edge detectors from Gabor filters), transforming them gradually (e.g., to make them invariant to contrast changes and contrast inversion, sometimes by pooling and sub-sampling), and then detecting the most frequent patterns. A plausible and common way to extract useful information from a natural image involves transforming the raw pixel representation into gradually more abstract representations, e.g., starting from the presence of edges, the detection of more complex but local shapes, up to the identification of abstract categories associated with sub-objects and objects which are parts of the image, and putting all these together to capture enough understanding of the scene to answer questions about it.

Here, we assume that the computational machinery necessary to express complex behaviors (which one might label "intelligent") requires highly varying mathematical functions, i.e., mathematical functions that are highly non-linear in terms of raw sensory inputs, and display a very large number of variations (ups and downs) across the domain of interest. We view the raw input to the learning system as a high-dimensional entity, made of many observed variables, which are related by unknown intricate statistical relationships. For example, using knowledge of the 3D geometry of solid objects and lighting, we can relate small variations in underlying physical and geometric factors (such as position, orientation, lighting of an object) with changes in pixel intensities for all the pixels in an image.
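The kind of engineered pipeline described above can be sketched in a few lines. The filter, non-linearity, and pooling choices below are illustrative stand-ins (a crude gradient filter in place of Gabor edge detectors), not any particular published system:

```python
# Illustrative sketch: a raw image passes through a hand-engineered pipeline
# (edge filtering, rectification, then pooling/sub-sampling), yielding a
# coarser representation that is more invariant to small shifts.
import numpy as np

def edge_filter(img):
    """Horizontal-gradient magnitude: a crude stand-in for a Gabor edge detector."""
    return np.abs(img[:, 1:] - img[:, :-1])

def pool2x2(feat):
    """Max-pooling over 2x2 blocks (odd borders dropped): sub-samples the
    feature map while keeping it invariant to one-pixel translations."""
    h, w = feat.shape[0] // 2 * 2, feat.shape[1] // 2 * 2
    f = feat[:h, :w]
    return f.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.zeros((8, 9))
img[:, 4:] = 1.0                # a vertical contrast edge
edges = edge_filter(img)        # responds only at the edge location
pooled = pool2x2(edges)         # 4x4 map; the edge response survives pooling
print(pooled.shape)             # (4, 4)
print(pooled.max())             # 1.0
```

Shifting the edge by one pixel changes `edges` but leaves `pooled` almost unchanged, which is the invariance the text refers to.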
We call these factors of variation because they are different aspects of the data that can vary separately and often independently. In this case, explicit knowledge of the physical factors involved allows one to get a picture of the mathematical form of these dependencies, and of the shape of the set of images (as points in a high-dimensional space of pixel intensities) associated with the same 3D object. If a machine captured the factors that explain the statistical variations in the data, and how they interact to generate the kind of data we observe, we would be able to say that the machine understands those aspects of the world covered by these factors of variation. Unfortunately, in general and for most factors of variation underlying natural images, we do not have an analytical understanding of these factors of variation. We do not have enough formalized prior knowledge about the world to explain the observed variety of images, even for such an apparently simple abstraction as MAN, illustrated in Figure 1.1. A high-level abstraction such as MAN has the property that it corresponds to a very large set of possible images, which might be very different from each other from the point of view of simple Euclidean distance in the space of pixel intensities. The set of images for which that label could be appropriate forms a highly convoluted region in pixel space that is not even necessarily a connected region. The MAN category can be seen as a high-level abstraction with respect to the space of images. What we call abstraction here can be a category (such as the MAN category) or a feature, a function of sensory data, which can be discrete (e.g., the input sentence is at the past tense) or continuous (e.g., the input video shows an object moving at 2 meters/second). Many lower-level and intermediate-level concepts (which we also call abstractions here) would be useful to construct a MAN-detector.
Lower-level abstractions are more directly tied to particular percepts, whereas higher-level ones are what we call "more abstract" because their connection to actual percepts is more remote, and through other, intermediate-level abstractions.

In addition to the difficulty of coming up with the appropriate intermediate abstractions, the number of visual and semantic categories (such as MAN) that we would like an "intelligent" machine to capture is rather large. The focus of deep architecture learning is to automatically discover such abstractions, from the lowest-level features to the highest-level concepts. Ideally, we would like learning algorithms that enable this discovery with as little human effort as possible, i.e., without having to manually define all necessary abstractions or having to provide a huge set of relevant hand-labeled examples. If these algorithms could tap into the huge resource of text and images on the web, it would certainly help to transfer much of human knowledge into machine-interpretable form.

1.1 How do We Train Deep Architectures?

Deep learning methods aim at learning feature hierarchies with features from higher levels of the hierarchy formed by the composition of lower-level features. Automatically learning features at multiple levels of abstraction allows a system to learn complex functions mapping the input to the output directly from data, without depending completely on human-crafted features. This is especially important for higher-level abstractions, which humans often do not know how to specify explicitly in terms of raw sensory input. The ability to automatically learn powerful features will become increasingly important as the amount of data and range of applications of machine learning methods continues to grow.

Depth of architecture refers to the number of levels of composition of non-linear operations in the function learned.
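As a toy illustration of this definition (the notation and layer sizes are mine, not the monograph's), the function computed by a multi-layer network is a composition of non-linear stages, and its depth is simply the number of composed stages:

```python
# Minimal sketch: depth as the number of composed non-linear operations.
# Each stage is an affine map followed by a tanh non-linearity.
import numpy as np

def layer(W, b):
    """One non-linear stage of the composition."""
    return lambda x: np.tanh(W @ x + b)

rng = np.random.default_rng(0)
sizes = [4, 3, 3, 2]            # input dim, two hidden layers, output dim
stages = [layer(rng.standard_normal((m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

def compose(fs):
    """Return f = fs[-1] o ... o fs[0]."""
    def f(x):
        for g in fs:
            x = g(x)
        return x
    return f

net = compose(stages)
depth = len(stages)             # 3 levels of composed non-linear operations
y = net(np.ones(4))
print(depth, y.shape)           # 3 (2,)
```

In this vocabulary, a linear or kernel classifier has depth 1 or 2, while each added hidden layer increases the depth by one.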
Whereas most current learning algorithms correspond to shallow architectures (1, 2 or 3 levels), the mammal brain is organized in a deep architecture [173], with a given input percept represented at multiple levels of abstraction, each level corresponding to a different area of cortex. Humans often describe such concepts in hierarchical ways, with multiple levels of abstraction. The brain also appears to process information through multiple stages of transformation and representation. This is particularly clear in the primate visual system [173], with its sequence of processing stages: detection of edges, primitive shapes, and moving up to gradually more complex visual shapes.

Inspired by the architectural depth of the brain, neural network researchers had wanted for decades to train deep multi-layer neural networks [19, 191], but no successful attempts were reported before 2006:¹ researchers reported positive experimental results with typically two or three levels (i.e., one or two hidden layers), but training deeper networks consistently yielded poorer results. Something that can be considered a breakthrough happened in 2006: Hinton et al. at University of Toronto introduced Deep Belief Networks (DBNs) [73], with a learning algorithm that greedily trains one layer at a time, exploiting an unsupervised learning algorithm for each layer, a Restricted Boltzmann Machine (RBM) [51]. Shortly after, related algorithms based on auto-encoders were proposed [17, 153], apparently exploiting the same principle: guiding the training of intermediate levels of representation using unsupervised learning, which can be performed locally at each level.

¹ Except for neural networks with a special structure called convolutional networks, discussed in Section 4.5.
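The greedy layer-wise idea can be sketched as follows. This is a simplified one-step contrastive-divergence (CD-1) training loop with binary units and illustrative hyperparameters, a hedged sketch rather than the exact algorithm of [73]:

```python
# Sketch of greedy layer-wise training: fit an RBM on the data with CD-1,
# then feed its hidden-unit activations to the next RBM as "data".
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, lr=0.1, epochs=20):
    """Train a binary RBM with one step of contrastive divergence."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b, c = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        v0 = data
        h0 = sigmoid(v0 @ W + c)                       # positive phase
        h_sample = (rng.random(h0.shape) < h0) * 1.0   # sample hidden units
        v1 = sigmoid(h_sample @ W.T + b)               # reconstruction
        h1 = sigmoid(v1 @ W + c)                       # negative phase
        W += lr * (v0.T @ h0 - v1.T @ h1) / len(data)  # CD-1 gradient estimate
        b += lr * (v0 - v1).mean(axis=0)
        c += lr * (h0 - h1).mean(axis=0)
    return W, b, c

# Greedy stacking: each level is trained on the previous level's features.
data = (rng.random((100, 6)) < 0.5) * 1.0
reps, layers = data, []
for n_hid in (4, 3):
    W, b, c = train_rbm(reps, n_hid)
    layers.append((W, c))
    reps = sigmoid(reps @ W + c)   # deterministic activations for next level
print(reps.shape)                  # (100, 3): top-level representation
```

Each call to `train_rbm` uses only unlabeled inputs and only local information at its own level, which is what makes the layer-by-layer scheme tractable.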
Other algorithms for deep architectures were proposed more recently that exploit neither RBMs nor auto-encoders and that exploit the same principle [131, 202] (see Section 4).

Since 2006, deep networks have been applied with success not only in classification tasks [17, 99, 111, 150, 153, 195], but also in regression [160], dimensionality reduction [74, 158], modeling textures [141], modeling motion [182, 183], object segmentation [114], information retrieval [154, 159, 190], robotics [60], natural language processing [37, 130, 202], and collaborative filtering [162]. Although auto-encoders, RBMs and DBNs can be trained with unlabeled data, in many of the above applications they have been successfully used to initialize deep supervised feedforward neural networks applied to a specific task.

1.2 Intermediate Representations: Sharing Features and Abstractions Across Tasks

Since a deep architecture can be seen as the composition of a series of processing stages, the immediate question that deep architectures raise is: what kind of representation of the data should be found as the output of each stage (i.e., the input of another)? What kind of interface should there be between these stages? A hallmark of recent research on deep architectures is the focus on these intermediate representations: the success of deep architectures belongs to the representations learned in an unsupervised way by RBMs [73], ordinary auto-encoders [17], sparse auto-encoders [150, 153], or denoising auto-encoders [195]. These algorithms (described in more detail in Section 7.2) can be seen as learning to transform one representation (the output of the previous stage) into another, which at each step may be disentangling better the factors of variation underlying the data. As we discuss at length in Section 4, it has been observed again and again that once a good representation has been found at each level, it can be used to initialize and successfully train a deep neural network by supervised gradient-based optimization.
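To illustrate the last point, here is a toy sketch in which the unsupervised stage is stood in for by a fixed random feature map (purely for illustration): given such a "learned" representation, a supervised output layer is trained on top by gradient-based optimization. Full fine-tuning would additionally propagate gradients into the lower layers:

```python
# Sketch: initialize from a fixed representation, then train the supervised
# output layer by gradient descent on the cross-entropy loss.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = (X[:, 0] + X[:, 1] > 0) * 1.0            # toy binary labels

W_pre = rng.standard_normal((5, 8))          # stand-in for pretrained weights
H = np.tanh(X @ W_pre)                       # the fixed learned representation

w, b = np.zeros(8), 0.0                      # supervised output layer
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
for _ in range(300):                         # logistic-regression training
    p = sigmoid(H @ w + b)
    grad = p - y                             # d(cross-entropy)/d(logit)
    w -= 0.1 * H.T @ grad / len(y)
    b -= 0.1 * grad.mean()

acc = ((sigmoid(H @ w + b) > 0.5) == y).mean()
print(acc)   # well above chance on this separable toy problem
```

The point of unsupervised pretraining, in this picture, is that the representation feeding the supervised layer is already useful before any labels are seen.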
