Deep Convolutional Neural Networks


Surveys on deep learning for visual understanding (Guo et al., 2016), reviews covering recent advances in CNNs (Gu et al., 2015), and a taxonomy of DCNNs for computer vision tasks (Srinivas et al., 2016) have been published. However, given the surge in the popularity of DCNNs for image classification tasks and the subsequent plethora of related papers, we feel the time is right to review them for this specific and momentous problem. With this in mind, this review is intended for those who want to understand the development of CNN technology and architecture, specifically for image classification, from their predecessors up to modern state-of-the-art deep learning systems. It also offers brief insights into their future and points out several interesting imminent research directions, making it suitable for researchers in the field.

The remainder of this review is organized as follows: Section 2 briefly introduces CNNs and acquaints readers with the key building blocks of their architecture. Section 3 covers the early development of CNNs; among other highlights, it briefly touches on the first applications of backpropagation and max pooling, as well as the introduction of the famous MNIST data set (LeCun et al., 1998). In section 4, we deal with the role of DCNNs in the deep learning renaissance, and this is followed by discussions on selected representative works that have contributed to their popularity for image classification tasks. Section 5 deals with several DCNN improvement attempts in various aspects, including network architecture, nonlinear activation functions, supervision components, regularization mechanisms, optimization techniques, and computational cost developments. Section 6 concludes the review by introducing some of the remaining challenges and current trends.

2 Overview of CNN Architecture

CNNs are feedforward networks in that information flow takes place in one direction only, from their inputs to their outputs. Just as artificial neural networks (ANNs) are biologically inspired, so are CNNs. The visual cortex in the brain, which consists of alternating layers of simple and complex cells (Hubel & Wiesel, 1959, 1962), motivates their architecture. CNN architectures come in several variations; however, in general, they consist of convolutional and pooling (or subsampling) layers, which are grouped into modules. Either one or more fully connected layers, as in a standard feedforward neural network, follow these modules. Modules are often stacked on top of each other to form a deep model. Figure 1 illustrates a typical CNN architecture for a toy image classification task. An image is input directly to the network, and this is followed by several stages of convolution and pooling. Thereafter, representations from these operations feed one or more fully connected layers. Finally, the last fully connected layer outputs the class label. Despite this being the most popular base architecture found in the literature, several architecture changes have been proposed in recent years with the objective of improving image classification accuracy or reducing computation costs.

Figure 1: CNN image classification pipeline.
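To ground this description, the following PyTorch sketch of ours mirrors the Figure 1 pipeline with two convolution-pooling modules followed by fully connected layers; the channel counts, the 64 x 64 input size, and the four-way output are illustrative assumptions, not details taken from the review.

```python
import torch
import torch.nn as nn

# Toy pipeline mirroring Figure 1: convolution/pooling modules followed by fully connected layers.
pipeline = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),   # first convolutional stage
    nn.MaxPool2d(2),                                          # first pooling (subsampling) stage
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),  # second convolutional stage
    nn.MaxPool2d(2),                                          # second pooling stage
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 64), nn.ReLU(),                  # fully connected layer for high-level reasoning
    nn.Linear(64, 4),                                         # class scores for an assumed four-category toy task
)

image = torch.randn(1, 3, 64, 64)        # a single 64x64 RGB input image (assumed size)
scores = pipeline(image)                 # forward pass: the last layer outputs one score per class
print(scores.shape)                      # torch.Size([1, 4])
```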
Although for the remainder of this section we merely fleetingly introduce the standard CNN architecture, in section 5 we deal with several architectural design changes that have facilitated enhanced image classification performance.

2.1 Convolutional Layers. The convolutional layers serve as feature extractors, and thus they learn the feature representations of their input images. The neurons in the convolutional layers are arranged into feature maps. Each neuron in a feature map has a receptive field, which is connected to a neighborhood of neurons in the previous layer via a set of trainable weights, sometimes referred to as a filter bank (LeCun et al., 2015). Inputs are convolved with the learned weights in order to compute a new feature map, and the convolved results are sent through a nonlinear activation function. All neurons within a feature map have weights that are constrained to be equal; however, different feature maps within the same convolutional layer have different weights so that several features can be extracted at each location (LeCun et al., 1998; LeCun et al., 2015). More formally, the kth output feature map Y_k can be computed as

Y_k = f(W_k * x),   (2.1)

where the input image is denoted by x; the convolutional filter related to the kth feature map is denoted by W_k; the multiplication sign in this context refers to the 2D convolutional operator, which is used to calculate the inner product of the filter model at each location of the input image; and f(.) represents the nonlinear activation function (Yu, Wang, Chen, & Wei, 2014). Nonlinear activation functions allow for the extraction of nonlinear features. Traditionally, the sigmoid and hyperbolic tangent functions were used; recently, rectified linear units (ReLUs; Nair & Hinton, 2010) have become popular (LeCun et al., 2015). Their popularity and success have opened up an area of research that focuses on the development and application of novel DCNN activation functions to improve several characteristics of DCNN performance. Thus, in section 5.2, we formally introduce the ReLU and discuss the motivations that led to its development, before elaborating on the performance of several rectification-based and alternative activation functions.

2.2 Pooling Layers. The purpose of the pooling layers is to reduce the spatial resolution of the feature maps and thus achieve spatial invariance to input distortions and translations (LeCun et al., 1989a, 1989b; LeCun et al., 1998, 2015; Ranzato et al., 2007). Initially, it was common practice to use average pooling aggregation layers to propagate the average of all the input values of a small neighborhood of an image to the next layer (LeCun et al., 1989a, 1989b; LeCun et al., 1998). However, in more recent models (Ciresan et al., 2011; Krizhevsky et al., 2012; Simonyan & Zisserman, 2014; Zeiler & Fergus, 2014; Szegedy, Liu et al., 2014; Xu et al., 2015), max pooling aggregation layers propagate the maximum value within a receptive field to the next layer (Ranzato et al., 2007). Formally, max pooling selects the largest element within each receptive field such that

Y_{kij} = \max_{(p,q) \in \mathcal{R}_{ij}} x_{kpq},   (2.2)

where the output of the pooling operation associated with the kth feature map is denoted by Y_{kij}, and x_{kpq} denotes the element at location (p, q) contained by the pooling region \mathcal{R}_{ij}, which embodies a receptive field around the position (i, j) (Yu et al., 2014).
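To make equations 2.1 and 2.2 concrete, here is a minimal NumPy sketch of our own (not code from the review) that computes one output feature map with a valid 2D filter sweep followed by a ReLU, and then applies non-overlapping 2 x 2 max pooling. The image and filter sizes are arbitrary, and, as in most deep learning software, the "convolution" is implemented as a cross-correlation (the filter is not flipped), which makes no practical difference once the filter weights are learned.

```python
import numpy as np

def conv2d_valid(x, w):
    """Valid 2D filter sweep over image x with filter w (equation 2.1, before the nonlinearity)."""
    H, W = x.shape
    kH, kW = w.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # inner product of the filter with the receptive field at position (i, j)
            out[i, j] = np.sum(x[i:i + kH, j:j + kW] * w)
    return out

def relu(z):
    return np.maximum(z, 0.0)               # f(.) in equation 2.1

def max_pool(y, size=2, stride=2):
    """Non-overlapping max pooling (equation 2.2)."""
    H, W = y.shape
    out = np.zeros((H // stride, W // stride))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            region = y[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = region.max()         # largest element in the pooling region R_ij
    return out

x = np.random.randn(8, 8)                    # toy single-channel input image (assumed size)
w_k = np.random.randn(3, 3)                  # trainable filter W_k for the kth feature map
y_k = relu(conv2d_valid(x, w_k))             # feature map Y_k = f(W_k * x)
pooled = max_pool(y_k)                       # spatially subsampled feature map
print(y_k.shape, pooled.shape)               # (6, 6) (3, 3)
```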
Figure 2 illustrates the difference between max pooling and average pooling. Given an input image of size 4 x 4, if a 2 x 2 filter and a stride of two are applied, max pooling outputs the maximum value of each 2 x 2 region, while average pooling outputs the rounded average value of each subsampled region. While the motivations behind the migration toward max pooling are addressed in section 4.2.3, there are also several concerns with max pooling that have led to the development of other pooling schemes. These are introduced in section 5.1.2.

Figure 2: Average versus max pooling.

2.3 Fully Connected Layers. Several convolutional and pooling layers are usually stacked on top of each other to extract more abstract feature representations in moving through the network. The fully connected layers that follow these layers interpret these feature representations and perform the function of high-level reasoning (Hinton et al., 2012; Simonyan & Zisserman, 2014; Zeiler & Fergus, 2014). For classification problems, it is standard to use the softmax operator (see sections 5.3.1 and 5.3.5) on top of a DCNN (Krizhevsky et al., 2012; Lin et al., 2013; Simonyan & Zisserman, 2014; Zeiler & Fergus, 2014; Szegedy, Liu et al., 2014; Xu et al., 2015). While early success was enjoyed by using radial basis functions (RBFs) as the classifier on top of the convolutional towers (LeCun et al., 1998), Tang (2013) found that replacing the softmax operator with a support vector machine (SVM) leads to improved classification accuracy (see section 5.3.4 for further details). Moreover, given that computation in the fully connected layers is often challenged by their compute-to-data ratio, a global average-pooling layer (see section 5.1.1.1 for further details), which feeds into a simple linear classifier, can be used as an alternative (Lin et al., 2013). Notwithstanding these attempts, comparing the performance of different classifiers on top of DCNNs still requires further investigation and thus makes for an interesting research direction (see section 6 for other intrinsic DCNN trends).

2.4 Training. CNNs, and ANNs in general, use learning algorithms to adjust their free parameters (i.e., the biases and weights) in order to attain the desired network output. The most common algorithm used for this purpose is backpropagation (LeCun, 1989; LeCun et al., 1998; Bengio, 2009; Deng & Yu, 2014; Deng, 2014; Srinivas et al., 2016). Backpropagation computes the gradient of an objective (also referred to as a cost, loss, or performance) function to determine how to adjust a network's parameters in order to minimize errors that affect performance. A commonly experienced problem when training CNNs, and in particular DCNNs, is overfitting, which is poor performance on a held-out test set after the network is trained on a small or even large training set. This affects the model's ability to generalize on unseen data and is a major challenge for DCNNs that can be assuaged by regularization, which is surveyed in section 5.4.
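The classifier-plus-backpropagation recipe of sections 2.3 and 2.4 can be summarized in a few lines of PyTorch. The sketch below is our own hedged illustration (random data, arbitrary layer sizes and learning rate, plain SGD), not code from the review: a softmax cross-entropy objective sits on top of a small convolutional network, and each backpropagation step adjusts the free parameters to reduce the loss.

```python
import torch
import torch.nn as nn

# Toy data: a batch of 16 RGB images (32x32) with labels from 10 classes; all sizes are assumptions.
images = torch.randn(16, 3, 32, 32)
labels = torch.randint(0, 10, (16,))

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),  # convolutional feature extractor
    nn.MaxPool2d(2),                                        # pooling layer
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, 10),                             # fully connected layer giving class scores
)

criterion = nn.CrossEntropyLoss()        # softmax classifier with a negative log-likelihood objective
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(5):                    # a few gradient-descent updates on the toy batch
    optimizer.zero_grad()
    loss = criterion(model(images), labels)   # objective (loss) function on the current predictions
    loss.backward()                      # backpropagation: gradients of the loss w.r.t. all free parameters
    optimizer.step()                     # adjust the weights and biases to reduce the error
    print(step, loss.item())
```

In practice, the same loop would also track accuracy on a held-out set, since a falling training loss alone says nothing about the overfitting problem discussed above.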
2.5 Discussion. This section briefly highlighted some of the fundamental aspects related to the basic building blocks of CNNs. Further detailed explanations of the convolution function and its variants, and of the convolutional and pooling layers, can be found in Goodfellow, Bengio, and Courville (in press). Furthermore, for convolutional and pooling arithmetic, readers are referred to Dumoulin and Visin (2016). Detailed explanations of the backpropagation algorithm and general training protocols for deep neural networks (DNNs) are available in LeCun et al. (1998) and Goodfellow et al. (2016), while LeCun et al. (2015) provide a concise summary of the algorithm and of supervised learning (one of the major machine learning paradigms, together with unsupervised and reinforcement learning) in general. A brief history of the development of this popular algorithm, specifically for CNNs, is provided in section 3.2. Finally, some of the DCNN theoretical considerations, many of which are concisely summarized by Koushik (2016), are introduced in section 6.1.

3 Early CNN Development

In this section we cover the early developments and significant advancements of CNNs, from their predecessors up to successful applications prior to the deep learning renaissance (Hinton et al., 2006; Hinton & Salakhutdinov, 2006; Bengio, Lamblin, Popovici, & Larochelle, 2007).

3.1 CNN Predecessors Inspired by Neuroscience. Biology has inspired several artificial intelligence techniques, such as ANNs, evolutionary algorithms, and cellular automata (Floreano & Mattiussi, 2008). However, perhaps the greatest success story among them are CNNs (Goodfellow, Bengio, & Courville, in press). Their history began with the neurobiological experiments conducted by Hubel and Wiesel (1959, 1962) from as early as 1959. The main contribution of their work was the discovery that neurons in different stages of the visual system responded strongly to specific stimulus patterns while ignoring others. More specifically, they found that neurons in the early stages of the primary visual cortex responded strongly to precisely oriented patterns of light, such as bars, but ignored more complex patterns of the input stimulus that resulted in strong responses from neurons in later stages. They also found that the visual cortex consisted of simple cells, which had local receptive fields, and complex cells, which were invariant to shifted or distorted inputs, arranged in a hierarchical fashion. These works provided the early inspiration to model our automated vision systems on characteristics of the central nervous system.

In 1979, a novel multilayered neural network model, nicknamed the neocognitron, was proposed (Fukushima, 1979). Modeled on the findings of Hubel and Wiesel (1959, 1962), it also consisted of simple and complex cells, cascaded together in a hierarchical manner. With this architecture, the network proved successful at recognizing simple input patterns irrespective of a shift in the position or considerable distortion in the shape of the input pattern (Fukushima, 1980; Fukushima & Miyake, 1982). Significantly, the neocognitron laid the groundwork for the development of CNNs. In fact, CNNs were derived from the neocognitron, and hence they have a similar architecture (LeCun et al., 2015).

3.2 Brief History of Backpropagation and the First Application to CNNs. Backpropagation was derived in the 1960s. In particular, S. E. Dreyfus (1962) derived a simplified version of the algorithm that used the chain rule alone.
Nevertheless, the early versions of backpropagation were inefficient, since they backpropagated derivative information from one layer to the preceding layer without openly addressing direct links across layers. Furthermore, they did not consider potential efficiency gains due to network sparseness (Schmidhuber, 2015). The modern efficient form of the algorithm that addressed these issues was derived in 1970 (Linnainmaa, 1970); however, there was no mention of its use for ANNs. Preliminary discussions of its use for ANNs date back to 1974 (Werbos, 1974); however, the first known application of efficient backpropagation specifically for ANNs was described in 1981 (Werbos, 1982), but this remained relatively unknown. Nevertheless, it was "significantly popularized" (Schmidhuber, 2015) due to a seminal paper in 1986 by D. E. Rumelhart et al. (1986), which demonstrated that by using the backpropagation learning algorithm, the internal hidden neurons of an ANN could be trained to represent important features of the task domain.

In 1989, LeCun et al. (1989a, 1989b) proposed the first multilayered CNNs and successfully applied these large-scale networks to real (handwritten digit and zip code) image classification problems. These initial CNNs were reminiscent of the neocognitron (Fukushima, 1979, 1980; Fukushima & Miyake, 1982). However, the key difference was that they were trained in a fully supervised fashion using backpropagation, which was in contrast to the unsupervised reinforcement scheme used by their predecessor. This allowed them to rely more profoundly on automatic learning rather than hand-designed preprocessing for feature extraction (LeCun et al., 1989a, 1989b; LeCun, 1989), which previously proved to be extremely challenging; hence, they form an essential component of many recent competition-winning DCNNs (Krizhevsky et al., 2012; Simonyan & Zisserman, 2014; Zeiler & Fergus, 2014; Szegedy, Liu et al., 2014).

3.3 Introduction of the MNIST Data Set. In 1998, the CNNs described earlier (LeCun et al., 1989a, 1989b) were improved on and used for the task of individual character classification in a document recognition application. This work was published in a detailed seminal paper (LeCun et al., 1998) that highlighted the main advantages of CNNs when compared to traditional ANNs: they require fewer free parameters (because of weight sharing), and they consider the spatial topology of the input data, thereby allowing them to deal with the variability of 2D shapes. In addition to the proposed CNNs, LeCun et al. (1998) introduced the popular Modified National Institute of Standards and Technology (MNIST) data set of 70,000 handwritten digits, which has since been used extensively for several computer vision tasks and, in particular, for image classification and recognition problems. Figure 3 illustrates the architecture of the CNN, called LeNet-5, proposed by LeCun et al. (1998). The diagram clearly illustrates the design of LeNet-5, which consists of alternating convolutional and subsampling layers, followed by a single fully connected layer.

Figure 3: Architecture of LeNet-5 (LeCun et al., 1998).
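To make the Figure 3 description more tangible, here is a minimal PyTorch sketch of a LeNet-5-style network. It follows the layer sizes commonly attributed to LeCun et al. (1998) (C1: 6 feature maps, C3: 16, C5: 120, F6: 84, 10 outputs), but it substitutes modern conveniences, so it should be read as an approximation rather than a faithful reproduction of the original design.

```python
import torch
import torch.nn as nn

class LeNet5Like(nn.Module):
    """Approximate LeNet-5 layout: alternating convolution and subsampling, then fully connected layers."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),    # C1: 6 feature maps of 28x28 from a 32x32 input
            nn.ReLU(),
            nn.MaxPool2d(2),                   # S2: subsample to 14x14
            nn.Conv2d(6, 16, kernel_size=5),   # C3: 16 feature maps of 10x10
            nn.ReLU(),
            nn.MaxPool2d(2),                   # S4: subsample to 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),        # C5: 120 units
            nn.ReLU(),
            nn.Linear(120, 84),                # F6: 84 units
            nn.ReLU(),
            nn.Linear(84, num_classes),        # output layer (the original used Gaussian/RBF connections)
        )

    def forward(self, x):
        return self.classifier(self.features(x))

digits = torch.randn(4, 1, 32, 32)             # MNIST digits are 28x28; LeNet-5 pads them to 32x32
print(LeNet5Like()(digits).shape)              # torch.Size([4, 10])
```

Note that the substitutions (max pooling for the original average-style subsampling, ReLU for the squashing nonlinearities, and a plain linear output layer for the Gaussian connections) reflect modern practice rather than the 1998 design.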
3.4 Early CNN Successes Despite Perceived Issues with Gradient Descent. In the late 1990s and early 2000s, neural network research had diminished (Simard et al., 2003; LeCun et al., 2015). Neural networks were little used for machine learning tasks, and computer vision and speech recognition tasks overlooked them. It was widely believed that learning useful multistage feature extractors with little prior knowledge was infeasible due to issues with the popular optimization algorithm, gradient descent. Specifically, it was thought that basic gradient descent would not recover from poor weight configurations that inhibited the reduction of the average backpropagated error, a phenomenon known as poor local minima (LeCun et al., 2015). In contrast, other statistical methods and, in particular, SVMs became popular due to their successes (Decoste & Scholkopf, 2002). Contrary to this trend, a CNN was proposed for the application of visual document analysis in 2003 (Simard et al., 2003).

At a time when CNNs were not popular in the engineering community, Simard et al. (2003) were able to achieve the best-known classification result on the MNIST data set (LeCun et al., 1998), improving on the previous best results obtained by the SVMs of Decoste and Scholkopf (2002). Citing the advantages of utilizing CNNs for visual tasks that were mentioned by LeCun et al. (1998), they expanded the size and quality of the MNIST data set and proposed the use of simple software loops for the convolutional operation. These loops exploited the property of backpropagation that allows an ANN to be expressed in a modular fashion, and this allowed for modular software debugging. Although LeCun et al. (1998) had already hypothesized and proved that increasing the size of the data set, using artificially generated affine transformations, improves the network's performance, Simard et al. (2003) improved the quality of the increased portion of the data set to further improve performance. This was accomplished by using elastic image deformations. This work formed part of a series of several optical character recognition applications that used CNNs. In particular, Microsoft used them for English handwritten digits (Simard et al., 2003; Chellapilla, Shilman, & Simard, 2006), Arabic handwriting recognition (Abdulkader, 2006), and East Asian handwritten character recognition (Chellapilla & Simard, 2006). Thus, these applications, together with the work described by LeCun et al. (1989a, 1989b, 1998), represent some of the early image classification successes enjoyed by CNNs. The background to the next section highlights several other successes.

4 The Deep Learning Renaissance and the Rise of DCNNs

This section briefly introduces the deep learning renaissance and focuses on the significant contributions of DCNNs to the current surge in deep learning research. It also covers a seminal paper and several representative works that have led to their recent ascendancy over other image classification techniques.

4.1 Background to the Deep Learning Renaissance. The first feedforward multilayered neural networks were trained in 1965 (Ivakhnenko & Lapa, 1966), and although they did not use backpropagation, they were perhaps the first deep learning systems (Schmidhuber, 2015).
Although deep learning-like algorithms have a long history, the term deep learning became a catchphrase around 2006, when deep belief networks (DBNs) and autoencoders trained in an unsupervised fashion were used to initialize DNNs trained using backpropagation (Hinton et al., 2006; Hinton & Salakhutdinov, 2006; Bengio et al., 2007). Prior to this, it was thought that deep multilayered networks (including DCNNs) were too hard to train due to issues with gradient descent and thus were not popular (Bengio et al., 2007; Bengio, 2009; Deng & Yu, 2014; Schmidhuber, 2015; Goodfellow et al., in press). Conversely, CNNs were a notable exception and proved easier to train when compared to fully connected networks (Simard et al., 2003; Bengio, 2009; LeCun et al., 2015; Goodfellow et al., in press). In addition to the successes discussed in section 3.3, some of the other successful applications that incorporated CNNs for their image classification component prior to the resurgence of neural networks in 2006 include medical image segmentation (Ning et al., 2005); facial recognition, detection, and verification (Lawrence, Giles, Tsoi, & Back, 1997; Garcia & Delakis, 2002; Chopra, Hadsell, & LeCun, 2005); off-road obstacle avoidance (Muller, Ben, Cosatto,
