Deep Convolutional Neural Networks for Image Classification

… learning for visual understanding (Guo et al., 2016), reviews covering recent advances in CNNs (Gu et al., 2015), and a taxonomy of DCNNs for computer vision tasks (Srinivas et al., 2016) have been published. However, given the surge in the popularity of DCNNs for image classification tasks and the subsequent plethora of related papers, we feel the time is right to review them for this specific and momentous problem. With this in mind, this review is intended for those who want to understand the development of CNN technology and architecture, specifically for image classification, from their predecessors up to modern state-of-the-art deep learning systems. It also offers brief insights into their future and provides several interesting imminent directions, making it suitable for researchers in the field.

The remainder of this review is organized as follows. Section 2 briefly introduces CNNs and acquaints readers with the key building blocks of their architecture. Section 3 covers the early development of CNNs; among other highlights, it briefly touches on the first applications of backpropagation and max pooling, as well as the introduction of the famous MNIST data set (LeCun et al., 1998). In section 4, we deal with the role of DCNNs in the deep learning renaissance, and this is followed by discussions on selected representative works that have contributed to their popularity for image classification tasks. Section 5 deals with several DCNN improvement attempts in various aspects, including network architecture, nonlinear activation functions, supervision components, regularization mechanisms, optimization techniques, and computational cost developments. Section 6 concludes the review by introducing some of the remaining challenges and current trends.

2 Overview of CNN Architecture

CNNs are feedforward networks in that information flow takes place in one direction only, from their inputs to their outputs. Just as artificial neural networks (ANNs) are biologically inspired, so are CNNs. The visual cortex in the brain, which consists of alternating layers of simple and complex cells (Hubel & Wiesel, 1959, 1962), motivates their architecture. CNN architectures come in several variations; however, in general, they consist of convolutional and pooling (or subsampling) layers, which are grouped into modules. Either one or more fully connected layers, as in a standard feedforward network, follow these modules. Modules are often stacked on top of each other to form a deep model.

Figure 1 illustrates a typical CNN architecture for a toy image classification task. An image is input directly to the network, and this is followed by several stages of convolution and pooling. Thereafter, representations from these operations feed one or more fully connected layers. Finally, the last fully connected layer outputs the class label. Despite this being the most popular base architecture found in the literature, several architecture changes have been proposed in recent years with the objective of improving image classification accuracy or reducing computation costs.

Figure 1: CNN image classification pipeline (convolutional and pooling layers, followed by fully connected layers and the output class scores, e.g., car, train, plane).
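To make the Figure 1 pipeline concrete, the following is a minimal sketch of such a toy classifier in PyTorch. It is illustrative only and is not taken from the review: the layer widths, the 32 x 32 three-channel input, and the five output classes are assumptions chosen so that the tensor shapes line up.

```python
import torch
import torch.nn as nn

# Toy CNN mirroring the Figure 1 pipeline: convolution/pooling stages,
# then fully connected layers, with one output score per class.
toy_cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5),   # convolutional layer (feature extraction)
    nn.ReLU(),                         # nonlinear activation
    nn.MaxPool2d(2),                   # pooling (subsampling)
    nn.Conv2d(16, 32, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),                      # feature maps -> feature vector
    nn.Linear(32 * 5 * 5, 64),         # fully connected layer
    nn.ReLU(),
    nn.Linear(64, 5),                  # one score per class (e.g., car, train, plane, ...)
)

x = torch.randn(1, 3, 32, 32)          # one 3-channel 32 x 32 toy image
logits = toy_cnn(x)                    # class scores; a softmax turns these into probabilities
print(logits.shape)                    # torch.Size([1, 5])
```

In practice, the number of convolution and pooling modules and the width of each layer are design choices; section 5 surveys how such choices have been varied to improve accuracy or reduce computation.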
Although for the remainder of this section we only fleetingly introduce the standard CNN architecture, in section 5 we deal with several architectural design changes that have facilitated enhanced image classification performance.

2.1 Convolutional Layers. The convolutional layers serve as feature extractors, and thus they learn the feature representations of their input images. The neurons in the convolutional layers are arranged into feature maps. Each neuron in a feature map has a receptive field, which is connected to a neighborhood of neurons in the previous layer via a set of trainable weights, sometimes referred to as a filter bank (LeCun et al., 2015). Inputs are convolved with the learned weights in order to compute a new feature map, and the convolved results are sent through a nonlinear activation function. All neurons within a feature map have weights that are constrained to be equal; however, different feature maps within the same convolutional layer have different weights so that several features can be extracted at each location (LeCun et al., 1998; LeCun et al., 2015). More formally, the kth output feature map $Y_k$ can be computed as

$$ Y_k = f(W_k \ast x), \qquad (2.1) $$

where the input image is denoted by $x$; the convolutional filter related to the kth feature map is denoted by $W_k$; the multiplication sign in this context refers to the 2D convolutional operator, which is used to calculate the inner product of the filter at each location of the input image; and $f(\cdot)$ represents the nonlinear activation function (Yu, Wang, Chen, & Wei, 2014).

Nonlinear activation functions allow for the extraction of nonlinear features. Traditionally, the sigmoid and hyperbolic tangent functions were used; recently, rectified linear units (ReLUs; Nair & Hinton, 2010) have become popular (LeCun et al., 2015). Their popularity and success have opened up an area of research that focuses on the development and application of novel DCNN activation functions to improve several characteristics of DCNN performance. Thus, in section 5.2, we formally introduce the ReLU and discuss the motivations that led to its development, before elaborating on the performance of several rectification-based and alternative activation functions.

2.2 Pooling Layers. The purpose of the pooling layers is to reduce the spatial resolution of the feature maps and thus achieve spatial invariance to input distortions and translations (LeCun et al., 1989a, 1989b; LeCun et al., 1998, 2015; Ranzato et al., 2007). Initially, it was common practice to use average pooling aggregation layers to propagate the average of all the input values of a small neighborhood of an image to the next layer (LeCun et al., 1989a, 1989b; LeCun et al., 1998). However, in more recent models (Ciresan et al., 2011; Krizhevsky et al., 2012; Simonyan & Zisserman, 2014; Zeiler & Fergus, 2014; Szegedy, Liu et al., 2014; Xu et al., 2015), max pooling aggregation layers propagate the maximum value within a receptive field to the next layer (Ranzato et al., 2007). Formally, max pooling selects the largest element within each receptive field such that

$$ Y_{kij} = \max_{(p,q) \in \mathcal{R}_{ij}} x_{kpq}, \qquad (2.2) $$

where the output of the pooling operation associated with the kth feature map is denoted by $Y_{kij}$, and $x_{kpq}$ denotes the element at location $(p, q)$ contained by the pooling region $\mathcal{R}_{ij}$, which embodies a receptive field around the position $(i, j)$ (Yu et al., 2014).
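As a concrete illustration of equations 2.1 and 2.2 (a sketch under assumed sizes, not code from the review), the following NumPy snippet computes one output feature map by sliding a learned filter W_k over a toy single-channel input, applies a ReLU nonlinearity as f, and then max-pools the result. The 3 x 3 filter, the 2 x 2 pooling regions, and the stride of two are illustrative choices.

```python
import numpy as np

def conv2d_valid(x, w):
    """Equation 2.1 without the nonlinearity: slide the filter w over the input x
    and take the inner product at each location ('valid' region, stride 1).
    As in most CNN implementations, this is strictly a cross-correlation."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def max_pool(y, size=2, stride=2):
    """Equation 2.2: propagate the maximum value within each (size x size)
    receptive field R_ij of the feature map y."""
    H, W = y.shape
    out = np.empty((H // stride, W // stride))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            region = y[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = region.max()
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))               # toy single-channel input image
W_k = rng.standard_normal((3, 3))             # learned filter for the k-th feature map
Y_k = np.maximum(conv2d_valid(x, W_k), 0.0)   # f(W_k * x) with f = ReLU
P_k = max_pool(Y_k, size=2, stride=2)         # pooled (subsampled) feature map
```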
Figure 2 illustrates the difference between max pooling and average pooling. Given an input image of size 4 x 4, if a 2 x 2 filter and a stride of two are applied, max pooling outputs the maximum value of each 2 x 2 region, while average pooling outputs the rounded integer average of each subsampled region. While the motivations behind the migration toward max pooling are addressed in section 4.2.3, there are also several concerns with max pooling, which have led to the development of other pooling schemes; these are introduced in section 5.1.2.

Figure 2: Average versus max pooling (a 4 x 4 input subsampled with a 2 x 2 filter and a stride of two).

2.3 Fully Connected Layers. Several convolutional and pooling layers are usually stacked on top of each other to extract more abstract feature representations in moving through the network. The fully connected layers that follow these layers interpret these feature representations and perform the function of high-level reasoning (Hinton et al., 2012; Simonyan & Zisserman, 2014; Zeiler & Fergus, 2014). For classification problems, it is standard to use the softmax operator (see sections 5.3.1 and 5.3.5) on top of a DCNN (Krizhevsky et al., 2012; Lin et al., 2013; Simonyan & Zisserman, 2014; Zeiler & Fergus, 2014; Szegedy, Liu et al., 2014; Xu et al., 2015). While early success was enjoyed by using radial basis functions (RBFs) as the classifier on top of the convolutional towers (LeCun et al., 1998), Tang (2013) found that replacing the softmax operator with a support vector machine (SVM) leads to improved classification accuracy (see section 5.3.4 for further details). Moreover, given that computation in the fully connected layers is often challenged by their compute-to-data ratio, a global average-pooling layer (see section 5.1.1.1 for further details), which feeds into a simple linear classifier, can be used as an alternative (Lin et al., 2013). Notwithstanding these attempts, comparing the performance of different classifiers on top of DCNNs still requires further investigation and thus makes for an interesting research direction (see section 6 for other intrinsic DCNN trends).

2.4 Training. CNNs, and ANNs in general, use learning algorithms to adjust their free parameters (i.e., the biases and weights) in order to attain the desired network output. The most common algorithm used for this purpose is backpropagation (LeCun, 1989; LeCun et al., 1998; Bengio, 2009; Deng & Yu, 2014; Deng, 2014; Srinivas et al., 2016). Backpropagation computes the gradient of an objective (also referred to as a cost, loss, or performance) function to determine how to adjust a network's parameters in order to minimize errors that affect performance. A commonly experienced problem with training CNNs, and in particular DCNNs, is overfitting, which is poor performance on a held-out test set after the network is trained on a small or even large training set. This affects the model's ability to generalize on unseen data and is a major challenge for DCNNs that can be assuaged by regularization, which is surveyed in section 5.4.
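To make the softmax classifier of section 2.3 and the gradient-based training of section 2.4 concrete, the sketch below (illustrative only, not the review's code) performs one gradient-descent update of a softmax output layer under the cross-entropy loss. The feature dimension, class count, learning rate, and the random vector standing in for the convolutional feature representation are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
num_features, num_classes, lr = 64, 5, 0.1

# Feature vector produced by the convolutional/pooling stages (random here),
# plus the parameters of a fully connected softmax output layer.
h = rng.standard_normal(num_features)
W = 0.01 * rng.standard_normal((num_classes, num_features))
b = np.zeros(num_classes)
target = 2                                   # true class label

def softmax(z):
    z = z - z.max()                          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Forward pass: class scores -> probabilities -> cross-entropy loss.
probs = softmax(W @ h + b)
loss = -np.log(probs[target])

# Backward pass: for softmax + cross-entropy, dL/dz = probs - one_hot(target).
dz = probs.copy()
dz[target] -= 1.0
dW, db = np.outer(dz, h), dz

# One gradient-descent update of the free parameters (weights and biases).
W -= lr * dW
b -= lr * db
```

A full training loop would repeat this update over many labeled examples and would also propagate the error signal dz back through the fully connected, pooling, and convolutional layers, which is exactly what backpropagation automates.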
2.5 Discussion. This section briefly highlighted some of the fundamental aspects related to the basic building blocks of CNNs. Further detailed explanations of the convolution function and its variants, and of the convolutional and pooling layers, can be found in Goodfellow, Bengio, and Courville (in press). Furthermore, for convolutional and pooling arithmetic, readers are referred to Dumoulin and Visin (2016). Detailed explanations of the backpropagation algorithm and general training protocols for deep neural networks (DNNs) are available in LeCun et al. (1998) and Goodfellow et al. (2016), while LeCun et al. (2015) provides a concise summary of the algorithm and of supervised learning (one of the major machine learning paradigms, together with unsupervised and reinforcement learning) in general. A brief history of the development of this popular algorithm, specifically for CNNs, is provided in section 3.2. Finally, some of the DCNN theoretical considerations, many of which are concisely summarized by Koushik (2016), are introduced in section 6.1.

3 Early CNN Development

In this section, we cover the early developments and significant advancements of CNNs, from their predecessors up to successful applications prior to the deep learning renaissance (Hinton et al., 2006; Hinton & Salakhutdinov, 2006; Bengio, Lamblin, Popovici, & Larochelle, 2007).

3.1 CNN Predecessors Inspired by Neuroscience. Biology has inspired several artificial intelligence techniques, such as ANNs, evolutionary algorithms, and cellular automata (Floreano & Mattiussi, 2008). However, perhaps the greatest success story among them is CNNs (Goodfellow, Bengio, & Courville, in press). Their history began with the neurobiological experiments conducted by Hubel and Wiesel (1959, 1962) from as early as 1959. The main contribution of their work was the discovery that neurons in different stages of the visual system responded strongly to specific stimulus patterns while ignoring others. More specifically, they found that neurons in the early stages of the primary visual cortex responded strongly to precisely oriented patterns of light, such as bars, but ignored more complex patterns of the input stimulus that resulted in strong responses from neurons in later stages. They also found that the visual cortex consisted of simple cells, which had local receptive fields, and complex cells, which were invariant to shifted or distorted inputs, arranged in a hierarchical fashion. These works provided the early inspiration to model our automated vision systems on characteristics of the central nervous system.

In 1979, a novel multilayered neural network model, nicknamed the neocognitron, was proposed (Fukushima, 1979). Modeled on the findings of Hubel and Wiesel (1959, 1962), it also consisted of simple and complex cells, cascaded together in a hierarchical manner. With this architecture, the network proved successful at recognizing simple input patterns irrespective of a shift in position or considerable distortion in the shape of the input pattern (Fukushima, 1980; Fukushima & Miyake, 1982). Significantly, the neocognitron laid the groundwork for the development of CNNs. In fact, CNNs were derived from the neocognitron, and hence they have a similar architecture (LeCun et al., 2015).

3.2 Brief History of Backpropagation and the First Application to CNNs. Backpropagation was derived in the 1960s. In particular, S. E. Dreyfus (1962) derived a simplified version of the algorithm that used the chain rule alone.
Nevertheless, the early versions of backpropagation were inefficient, since they backpropagated derivative information from one layer to the preceding layer without openly addressing direct links across layers. Furthermore, they did not consider potential efficiency gains due to network sparseness (Schmidhuber, 2015). The modern, efficient form of the algorithm that addressed these issues was derived in 1970 (Linnainmaa, 1970); however, there was no mention of its use for ANNs. Preliminary discussions of its use for ANNs date back to 1974 (Werbos, 1974); however, the first known application of efficient backpropagation specifically for ANNs was described in 1981 (Werbos, 1982), although this remained relatively unknown. Nevertheless, it was "significantly popularized" (Schmidhuber, 2015) by a seminal 1986 paper by D. E. Rumelhart et al. (1986), which demonstrated that by using the backpropagation learning algorithm, the internal hidden neurons of an ANN could be trained to represent important features of the task domain.

In 1989, LeCun et al. (1989a, 1989b) proposed the first multilayered CNNs and successfully applied these large-scale networks to real (handwritten digit and zip code) image classification problems. These initial CNNs were reminiscent of the neocognitron (Fukushima, 1979, 1980; Fukushima & Miyake, 1982). However, the key difference was that they were trained in a fully supervised fashion using backpropagation, which was in contrast to the unsupervised reinforcement scheme used by their predecessor. This allowed them to rely more profoundly on automatic learning rather than hand-designed preprocessing for feature extraction (LeCun et al., 1989a, 1989b; LeCun, 1989), which previously proved to be extremely challenging; hence, they form an essential component of many recent competition-winning DCNNs (Krizhevsky et al., 2012; Simonyan & Zisserman, 2014; Zeiler & Fergus, 2014; Szegedy, Liu et al., 2014).

3.3 Introduction of the MNIST Data Set. In 1998, the CNNs described earlier (LeCun et al., 1989a, 1989b) were improved on and used for the task of individual character classification in a document recognition application. This work was published in a detailed seminal paper (LeCun et al., 1998) that highlighted the main advantages of CNNs when compared to traditional ANNs: they require fewer free parameters (because of weight sharing), and they consider the spatial topology of the input data, thereby allowing them to deal with the variability of 2D shapes. In addition to the proposed CNNs, LeCun et al. (1998) introduced the popular Modified National Institute of Standards and Technology (MNIST) data set of 70,000 handwritten digits, which has since been used extensively for several computer vision tasks and, in particular, for image classification and recognition problems. Figure 3 illustrates the architecture of the CNN, called LeNet-5, proposed by LeCun et al. (1998). The diagram clearly illustrates the design of LeNet-5, which consists of alternating convolutional and subsampling layers, followed by a single fully connected layer.

Figure 3: Architecture of LeNet-5 (LeCun et al., 1998), showing the alternating convolution (C1, C3, C5) and subsampling (S2, S4) stages, followed by the fully connected layer F6 and the Gaussian-connection output layer.
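For readers who want to relate Figure 3 to a concrete definition, the following is a rough PyTorch re-implementation of the LeNet-5 topology, offered as a sketch rather than a faithful reproduction: the original network used trainable subsampling layers, scaled hyperbolic tangent activations, sparse connectivity between S2 and C3, and an RBF (Gaussian-connection) output layer, all of which are simplified here.

```python
import torch
import torch.nn as nn

# Rough sketch of the LeNet-5 layer structure (LeCun et al., 1998):
# alternating convolution and subsampling layers, then fully connected layers.
lenet5 = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),     # C1: 6 feature maps, 28x28
    nn.Tanh(),
    nn.AvgPool2d(2),                    # S2: subsampling to 14x14
    nn.Conv2d(6, 16, kernel_size=5),    # C3: 16 feature maps, 10x10
    nn.Tanh(),
    nn.AvgPool2d(2),                    # S4: subsampling to 5x5
    nn.Conv2d(16, 120, kernel_size=5),  # C5: 120 feature maps, 1x1
    nn.Tanh(),
    nn.Flatten(),
    nn.Linear(120, 84),                 # F6
    nn.Tanh(),
    nn.Linear(84, 10),                  # output: 10 digit classes
)

digits = torch.randn(1, 1, 32, 32)      # MNIST digits padded to 32x32, as in the paper
scores = lenet5(digits)                 # shape (1, 10)
```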
3.4 Early CNN Successes Despite Perceived Issues with Gradient Descent. In the late 1990s and early 2000s, neural network research had diminished (Simard et al., 2003; LeCun et al., 2015). Neural networks were little used for machine learning tasks, and computer vision and speech recognition tasks overlooked them. It was widely believed that learning useful multistage feature extractors, with little prior knowledge, was infeasible due to issues with the popular optimization algorithm, gradient descent. Specifically, it was thought that basic gradient descent would not recover from poor weight configurations that inhibited the reduction of the average backpropagated error, a phenomenon known as poor local minima (LeCun et al., 2015). In contrast, other statistical methods, and in particular SVMs, became popular due to their successes (Decoste & Scholkopf, 2002). Contrary to this trend, a CNN was proposed for the application of visual document analysis in 2003 (Simard et al., 2003).

At a time when CNNs were not popular in the engineering community, Simard et al. (2003) were able to achieve the best-known classification result on the MNIST data set (LeCun et al., 1998), improving on the previous best results obtained by the SVMs of Decoste and Scholkopf (2002). Citing the advantages mentioned by LeCun et al. (1998) for utilizing CNNs on visual tasks, they expanded the size and quality of the MNIST data set and proposed the use of simple software loops for the convolutional operation. These loops exploited the property of backpropagation that allows an ANN to be expressed in a modular fashion, and this allowed for modular software debugging. Although LeCun et al. (1998) had already hypothesized and shown that increasing the size of the data set with artificially generated affine transformations improves the network's performance, Simard et al. (2003) improved the quality of the added portion of the data set to further improve performance. This was accomplished by using elastic image deformations (a sketch of this form of augmentation follows at the end of this section).

This work formed part of a series of several optical character recognition applications that used CNNs. In particular, Microsoft used them for English handwritten digit recognition (Simard et al., 2003; Chellapilla, Shilman, & Simard, 2006), Arabic handwriting recognition (Abdulkader, 2006), and East Asian handwritten character recognition (Chellapilla & Simard, 2006). Thus, these applications, together with the work described by LeCun et al. (1989a, 1989b, 1998), represent some of the early image classification successes enjoyed by CNNs. The background to the next section highlights several other successes.
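The elastic deformations used by Simard et al. (2003) displace each pixel by a random field that has been smoothed with a Gaussian filter and scaled. The snippet below is one common way to implement such an augmentation with NumPy and SciPy; it is an illustrative sketch, not the authors' code, and the smoothing width sigma, the scaling factor alpha, and the toy 28 x 28 input are assumed values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(image, alpha=34.0, sigma=4.0, seed=None):
    """Elastic deformation in the spirit of Simard et al. (2003): a random
    displacement field is smoothed with a Gaussian filter of width sigma,
    scaled by alpha, and used to resample the image."""
    rng = np.random.default_rng(seed)
    h, w = image.shape

    # Random displacement fields in [-1, 1], smoothed and scaled.
    dx = gaussian_filter(rng.uniform(-1, 1, size=(h, w)), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, size=(h, w)), sigma) * alpha

    # Displaced sampling coordinates for every pixel.
    y, x = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([y + dy, x + dx])           # shape (2, h, w)

    # Bilinear resampling at the displaced coordinates.
    return map_coordinates(image, coords, order=1, mode="reflect")

# Example: deform a toy 28 x 28 "digit" (a bright vertical bar).
img = np.zeros((28, 28))
img[8:20, 12:16] = 1.0
warped = elastic_deform(img, seed=0)
```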
4 The Deep Learning Renaissance and the Rise of DCNNs

This section briefly introduces the deep learning renaissance and focuses on the significant contributions of DCNNs to the current surge in deep learning research. It also covers a seminal paper and several representative works that have led to their recent ascendancy over other image classification techniques.

4.1 Background to the Deep Learning Renaissance. The first feedforward multilayered neural networks were trained in 1965 (Ivakhnenko & Lapa, 1966), and although they did not use backpropagation, they were perhaps the first deep learning systems (Schmidhuber, 2015). Although deep learning-like algorithms have a long history, the term deep learning became a catchphrase around 2006, when deep belief networks (DBNs) and autoencoders trained in an unsupervised fashion were used to initialize DNNs trained using backpropagation (Hinton et al., 2006; Hinton & Salakhutdinov, 2006; Bengio et al., 2007). Prior to this, it was thought that deep multilayered networks (including DCNNs) were too hard to train due to issues with gradient descent and thus were not popular (Bengio et al., 2007; Bengio, 2009; Deng & Yu, 2014; Schmidhuber, 2015; Goodfellow et al., in press). Conversely, CNNs were a notable exception and proved easier to train when compared to fully connected networks (Simard et al., 2003; Bengio, 2009; LeCun et al., 2015; Goodfellow et al., in press). In addition to the successes discussed in section 3.3, some of the other successful applications that incorporated CNNs for their image classification component prior to the resurgence of neural networks in 2006 include medical image segmentation (Ning et al., 2005); facial recognition, detection, and verification (Lawrence, Giles, Tsoi, & Back, 1997; Garcia & Delakis, 2002; Chopra, Hadsell, & LeCun, 2005); and off-road obstacle avoidance (Muller, Ben, Cosatto ...
