Multi-column deep neural network for traffic sign classification


We describe the approach that won the final phase of the German traffic sign recognition benchmark. Our method is the only one that achieved a better-than-human recognition rate of 99.46%. We use a fast, fully parameterizable GPU implementation of a Deep Neural Network (DNN) that does not require careful design of pre-wired feature extractors, which are instead learned in a supervised way. Combining various DNNs trained on differently preprocessed data into a Multi-Column DNN (MCDNN) further boosts recognition performance and also makes the system insensitive to variations in contrast and illumination.
D. Ciresan et al. / Neural Networks 32 (2012) 333–338

[Fig. 2 panels: pixel-intensity histograms (intensity axis 0 to 1) for test image 11,917 under Original, Imadjust, Histeq, Adapthisteq and Contrast normalization, plus five further example signs shown original and after each normalization.]
Fig. 2. Histogram of pixel intensities for image 11,917 from the test set of the preliminary phase of the competition, before and after normalization, as well as an additional selection of 5 traffic signs before and after normalization.

[Fig. 3 panels, one per layer: input, 3 maps of 48 × 48 neurons; convolutional, 100 maps of 42 × 42 neurons, 300 filters of 7 × 7 weights; L2 max pooling, 100 maps of 21 × 21 neurons; L3 convolutional, 150 maps of 18 × 18 neurons, 15,000 filters of 4 × 4 weights; L4 max pooling, 150 maps of 9 × 9 neurons; L5 convolutional, 250 maps of 6 × 6 neurons, 37,500 filters of 4 × 4 weights; L6 max pooling, 250 maps of 3 × 3 neurons; L7 fully connected, 300 neurons; L8 output, class "max 30 km/h".]
Fig. 3. DNN architecture from Table 1 together with all the activations and the learned filters. Only a subset of all the maps and filters is shown, the output layer is not drawn to scale and weights of fully connected layers are not displayed. For better contrast, the filters are individually normalized.

… during training, using random but bounded values for translation, rotation and scaling. These values are drawn from a uniform distribution in a specified range, i.e. ±10% of the image size for translation, 0.9–1.1 for scaling and ±5° for rotation. The final, fixed-size image is obtained using bilinear interpolation of the distorted input image. These distortions allow us to train DNN with many free parameters without overfitting and greatly improve generalization performance (i.e. the error rate on the first phase of GTSRB decreases from 2.83% to 1.66%; Ciresan et al., 2011b). All DNN are trained using on-line gradient descent with an annealed learning rate.

2.1.5. Forming the MCDNN
Finally, we form an MCDNN by averaging the output activations of several DNN columns (Fig. 1c). For a given input pattern, the predictions of all columns are averaged. Before training, the weights of all columns are randomly initialized. Various columns can be trained on the same inputs, or on inputs preprocessed in different ways. If the errors of p different models have zero mean and are uncorrelated, the average error might be reduced by a factor of p simply by averaging the p models (Bishop, 2006). In practice, errors of models trained on similar data tend to be highly correlated. To avoid this problem, our MCDNN combines various DNN trained on differently normalized data. A key question is whether to optimize the combination of outputs of various models or not (Duin, 2002). Optimizing the combination raises two common problems: (a) additional training data is required, and (b) there is no guarantee that the trained MCDNN generalizes well to unseen test data. For handwritten digits it was shown (Meier et al., 2011) that simply averaging the outputs of many DNN generalizes better on the test set than a linear combination of all the DNN with weights optimized over a validation set (Hashem & Schmeiser, 1995; Ueda, 2000). We therefore form the MCDNN by democratically averaging the outputs of each DNN.

3. Experiments
We use a system with a Core i7-950 (3.33 GHz), 24 GB DDR3, and four graphics cards of type GTX 580. Images from the training set may be translated, scaled and rotated, whereas only the undeformed, original or preprocessed images are used for validation. Training ends once the validation error is zero (usually after 15–30 epochs). Initial weights are drawn from a uniform random distribution in the range [-0.05, 0.05]. Each neuron's activation function is a scaled hyperbolic tangent (e.g. LeCun et al., 1998).

3.1. Data preprocessing
The original color images contain one traffic sign each, with a border of 10% around the sign. They vary in size from 15 × 15 to 250 × 250 pixels and are not necessarily square. The actual traffic sign is not always centered within the image; its bounding box is part of the annotations. The training set consists of 39,209 images; the test set of 12,630 images. We crop all images and process only the image within the bounding box. Our MCDNN implementation requires all training images to be of equal size. After visual inspection of the training image size distribution, we resize all images to 48 × 48 pixels. As a consequence, the scaling factors along both axes are different for traffic signs with rectangular bounding boxes. Resizing forces them to have square bounding boxes.

High contrast variation among the images calls for contrast normalization. We use the following standard normalizations:

Image adjustment (Imadjust) increases image contrast by mapping pixel intensities to new values such that 1% of the data is saturated at low and high intensities (MATLAB, 2010).

Histogram equalization (Histeq) enhances contrast by transforming pixel intensities such that the output image histogram is roughly uniform (MATLAB, 2010).

Adaptive histogram equalization (Adapthisteq) operates, unlike Histeq, on tiles rather than on the entire image: the image is tiled into nonoverlapping regions of 6 × 6 pixels each (8 per side for a 48 × 48 image). Every tile's contrast is enhanced such that its histogram becomes roughly uniform (MATLAB, 2010).

Contrast normalization (Conorm) enhances edges by filtering the input image with a difference of Gaussians. We use a filter size of 5 × 5 pixels (Sermanet & LeCun, 2011).
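The random distortion step described in the training paragraph above can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not the authors' GPU implementation:

- Images are assumed to be lists of rows of grayscale floats.
- Source pixels that fall outside the image are assumed to be zero.
- The names `random_distort`, `affine_warp` and `bilinear_sample` are hypothetical.

Each output pixel is mapped back to its source location through the inverse of the sampled translation/scaling/rotation and read out with bilinear interpolation. This keeps the output at the same fixed size as the input.

```python
import math
import random

def bilinear_sample(img, x, y):
    """Bilinearly interpolate img (list of rows) at real-valued column x, row y."""
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    dx, dy = x - x0, y - y0

    def px(r, c):
        # Assumption: pixels outside the source image are treated as 0 (black).
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    return ((1 - dx) * (1 - dy) * px(y0, x0) + dx * (1 - dy) * px(y0, x0 + 1)
            + (1 - dx) * dy * px(y0 + 1, x0) + dx * dy * px(y0 + 1, x0 + 1))

def affine_warp(img, tx, ty, s, theta):
    """Translate by (tx, ty) pixels, scale by s and rotate by theta (radians)
    about the image centre; output has the same size as the input."""
    h, w = len(img), len(img[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Inverse mapping: undo translation, then rotation and scaling.
            u, v = c - cx - tx, r - cy - ty
            src_x = (cos_t * u + sin_t * v) / s + cx
            src_y = (-sin_t * u + cos_t * v) / s + cy
            out[r][c] = bilinear_sample(img, src_x, src_y)
    return out

def random_distort(img, rng, max_shift=0.10, scale_range=(0.9, 1.1), max_rot_deg=5.0):
    """Sample bounded uniform distortions (ranges as quoted in the text) and apply them."""
    h, w = len(img), len(img[0])
    tx = rng.uniform(-max_shift, max_shift) * w
    ty = rng.uniform(-max_shift, max_shift) * h
    s = rng.uniform(*scale_range)
    theta = math.radians(rng.uniform(-max_rot_deg, max_rot_deg))
    return affine_warp(img, tx, ty, s, theta)
```

Following the setup described above, a call such as `random_distort(img, random.Random(0))` would be applied only to training images. Validation uses the undeformed images.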
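The initialization and activation settings from the Experiments paragraph above can be written down directly. This is an illustrative sketch with two labelled assumptions:

- The scaled tanh constants A = 1.7159 and S = 2/3 are the choice commonly attributed to LeCun et al. (1998). The text only says "a scaled hyperbolic tangent".
- The annealing schedule in the hypothetical `annealed_lr` (starting rate, per-epoch decay factor and floor) is made up for illustration, because the actual schedule is not given here.

```python
import math
import random

# Assumed constants (LeCun et al., 1998 style); with them f(1) is close to 1.
A = 1.7159
S = 2.0 / 3.0

def scaled_tanh(a: float) -> float:
    """Activation f(a) = A * tanh(S * a)."""
    return A * math.tanh(S * a)

def scaled_tanh_grad(a: float) -> float:
    """Derivative f'(a) = A * S * (1 - tanh(S * a)**2), needed for backpropagation."""
    t = math.tanh(S * a)
    return A * S * (1.0 - t * t)

def init_weights(n: int, rng: random.Random, bound: float = 0.05) -> list:
    """Draw n initial weights uniformly from [-bound, bound]."""
    return [rng.uniform(-bound, bound) for _ in range(n)]

def annealed_lr(epoch: int, lr0: float = 1e-3, decay: float = 0.95,
                lr_min: float = 3e-5) -> float:
    """Hypothetical annealing: exponential decay per epoch, clipped at lr_min."""
    return max(lr_min, lr0 * decay ** epoch)
```

In on-line gradient descent, each weight would be updated after every single training pattern, with the step size taken from `annealed_lr` for the current epoch.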
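The combination rule of Section 2.1.5 is democratic averaging of the column outputs, with no learned combination weights. It can be illustrated as below. This is a sketch under stated assumptions:

- Each column is modelled as a hypothetical `(preprocess, dnn)` pair, so different columns can see differently normalized versions of the same image.
- Each `dnn` returns one score per class.
- The predicted class is assumed to be the arg max of the averaged scores.

```python
from typing import Callable, List, Sequence, Tuple

# A column is a (preprocess, dnn) pair; dnn maps a preprocessed image to class scores.
Column = Tuple[Callable, Callable[..., Sequence[float]]]

def mcdnn_predict(image, columns: Sequence[Column]) -> Tuple[int, List[float]]:
    """Average the class scores of all columns and return (best class, averages)."""
    outputs = [dnn(preprocess(image)) for preprocess, dnn in columns]
    n_classes = len(outputs[0])
    # Plain (democratic) average: every column gets the same weight 1/len(outputs).
    averaged = [sum(out[c] for out in outputs) / len(outputs) for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: averaged[c])
    return best, averaged
```

Consider three toy columns scoring three classes as `[0.7, 0.2, 0.1]`, `[0.4, 0.5, 0.1]` and `[0.6, 0.3, 0.1]`. The average picks class 0, although the second column on its own would have picked class 1.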
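The factor-of-p statement in Section 2.1.5 follows from a short calculation in the style of Bishop (2006). This assumes that "error" means expected squared error. Write the output of model k as the target plus an error term, y_k(x) = h(x) + ε_k(x), and let the MCDNN output be the plain average of the p models:

```latex
\begin{aligned}
y_{\mathrm{COM}}(x) &= \frac{1}{p}\sum_{k=1}^{p} y_k(x) = h(x) + \frac{1}{p}\sum_{k=1}^{p}\epsilon_k(x),\\
E_{\mathrm{COM}} &= \mathbb{E}\Big[\Big(\frac{1}{p}\sum_{k=1}^{p}\epsilon_k\Big)^{2}\Big]
  = \frac{1}{p^{2}}\sum_{k=1}^{p}\sum_{l=1}^{p}\mathbb{E}[\epsilon_k\epsilon_l],\\
\mathbb{E}[\epsilon_k]=0,\ \mathbb{E}[\epsilon_k\epsilon_l]=0\ (k\neq l)
  \;&\Rightarrow\; E_{\mathrm{COM}} = \frac{1}{p^{2}}\sum_{k=1}^{p}\mathbb{E}[\epsilon_k^{2}]
  = \frac{1}{p}\,E_{\mathrm{AV}},
  \qquad E_{\mathrm{AV}} = \frac{1}{p}\sum_{k=1}^{p}\mathbb{E}[\epsilon_k^{2}].
\end{aligned}
```

When the errors are correlated, the cross terms E[ε_k ε_l] do not vanish and the reduction is smaller. This is the motivation given above for training the columns on differently normalized data.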
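Imadjust and Histeq, as described in Section 3.1, can be approximated in plain Python on a flat list of intensities in [0, 1]. One example input would be the L channel of the Lab image rescaled to that range. These are simplified stand-ins, not MATLAB's `imadjust` or `histeq`:

- The percentile estimate in `imadjust_like` is an assumption.
- The 256-bin histogram in `histeq_like` is an assumption.
- Both function names are hypothetical.
- Adapthisteq and Conorm are not shown.

```python
from typing import List, Sequence

def imadjust_like(values: Sequence[float], tol: float = 0.01) -> List[float]:
    """Linearly stretch intensities so that about `tol` of the data saturates at
    each end (0 and 1). Percentiles are estimated crudely from the sorted values."""
    s = sorted(values)
    n = len(s)
    lo = s[int(tol * (n - 1))]          # approx. tol-quantile
    hi = s[int((1.0 - tol) * (n - 1))]  # approx. (1 - tol)-quantile
    if hi <= lo:                        # flat image: nothing to stretch
        return list(values)
    return [min(1.0, max(0.0, (v - lo) / (hi - lo))) for v in values]

def histeq_like(values: Sequence[float], bins: int = 256) -> List[float]:
    """Global histogram equalization: map each intensity through the empirical
    cumulative distribution, so the output histogram is roughly uniform."""
    idx = [min(bins - 1, max(0, int(v * bins))) for v in values]
    hist = [0] * bins
    for i in idx:
        hist[i] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running / len(values))
    return [cdf[i] for i in idx]
```

Applying `histeq_like` separately to each 6 × 6 tile would give only a rough approximation of Adapthisteq.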
Note that the above normalizations, except Conorm, are not performed in RGB color space but rather in a color space that has image intensity as one of its components. For this purpose we transform the image from RGB to Lab space, perform the normalization and then transform the normalized image back to RGB space. The effect of the four different normalizations is illustrated in Fig. 2, where histograms of pixel intensities together with original and normalized images are shown.

Fig. 4. The learned filters of the first convolutional layer of a DNN. The layer has 100 maps, each connected to the three color channels of the input image, for a total of 3 × 100 filters of size 15 × 15. Every displayed filter is the superposition of the 3 filters that are connected to the red, green and blue channel of the input image, respectively. For better contrast, the filters are individually normalized. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

3.2. Results
Initial experiments with varying network depths showed that deep nets work better than shallow ones, consistent with our previous work on image classification (Ciresan, Meier, Gambardella, & Schmidhuber, 2010; Ciresan et al., 2011a). We report results for a single DNN with 9 layers (Table 1). The same architecture is shown in Fig. 3, where the activations of all layers together with the filters of a trained DNN are illustrated. Filters of the first layer are shown in color but consist in principle of three independent filters, each connected to the red, green and blue channel of the input image, respectively. The input layer has three maps of 48 × 48 pixels, one for each color channel; the output layer consists of 43 neurons, one per class. The architecture has approximately 1.5 million free parameters, about half of which belong to the last two fully connected layers. It takes 37 h to train the MCDNN with 25 columns on four GPUs. After training, 87 images per second can be processed on a single GPU.

We also train a DNN with bigger filters, 15 × 15 instead of 7 × 7, in the first convolutional layer and plot them in Fig. 4. They are randomly initialized and learn to respond to blobs, edges and other shapes in the input images. This illustrates that even the first layer of a very deep (9-layer) DNN can be successfully trained by simple gradient descent, although it is usually the most problematic one (Hochreiter, Bengio, Frasconi, & Schmidhuber, 2001).

Table 1
9-layer DNN architecture.
Layer  Type             Maps and neurons              Kernel
0      Input            3 maps of 48 × 48 neurons
1      Convolutional    100 maps of 42 × 42 neurons   7 × 7
2      Max pooling      100 maps of 21 × 21 neurons   2 × 2
3      Convolutional    150 maps of 18 × 18 neurons   4 × 4
4      Max pooling      150 maps of 9 × 9 neurons     2 × 2
5      Convolutional    250 maps of 6 × 6 neurons     4 × 4
6      Max pooling      250 maps of 3 × 3 neurons     2 × 2
7      Fully connected  300 neurons                   1 × 1
8      Fully connected  43 neurons                    1 × 1

In total we trained 25 nets, 5 randomly initialized nets for each of the five datasets (i.e. original plus 4 different normalizations). The results are summarized in Table 2. Each column shows the recognition rates of 5 randomly initialized DNN. Means and standard deviations are listed for each of the five distinct datasets as well as for all 25 DNN. The result of the MCDNN, obtained by averaging the outputs of all 25 DNN, is shown as well. All individual DNN are better than any other method that entered the competition. Moreover, the resulting MCDNN with 25 DNN columns achieves a recognition rate of 99.46%, a drastic improvement with respect to any of the individual DNN.

Fig. 5 depicts all errors, plus ground truth and first and second predictions. Over 80% of the 68 errors are associated with correct second predictions. Erroneously predicted class probabilities tend to be low; in these cases the MCDNN is quite unsure about its classifications. In general, however, it is very confident: most of its predicted class probabilities are close to one or zero. Rejecting only 1% of all images (confidence below 0.51) results in an even lower error rate of 0.24%. To reach an error rate of 0.01% (a single misclassification), only 6.67% of the images have to be rejected (confidence below 0.94).

Fig. 5. The 68 errors of the MCDNN, with correct label (left) and first (middle) and second best (right) predictions. Best seen in color. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

4. Conclusion
Our MCDNN won the German traffic sign recognition benchmark with a recognition rate of 99.46%. This is better than that of humans on this task (98.84%), with three times fewer mistakes than the second best competing algorithm (98.31%). Forming an MCDNN from 25 nets, 5 per preprocessing method, increases the recognition rate from an average of 98.52% to 99.46%. None of the preprocessing methods is superior in terms of single-DNN recognition rates, but combining them into an MCDNN increases robustness to various types of noise and leads to more recognized traffic signs.

We plan to embed our method in a more general system that first localizes traffic signs in realistic scenes and then classifies them.
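The figure of roughly 1.5 million free parameters quoted in Section 3.2 can be checked against Table 1. The sketch below makes three assumptions:

- Every convolutional map is connected to all maps of the previous layer. This is consistent with the filter counts listed for Fig. 3 (300 = 3 × 100, 15,000 = 100 × 150, 37,500 = 150 × 250).
- There is one bias per map or neuron.
- Pooling layers have no trainable weights.

The layer list mirrors Table 1; the function name is hypothetical.

```python
# Layers of Table 1 as (type, number of maps or neurons, kernel size).
LAYERS = [
    ("conv", 100, 7),
    ("pool", 100, 2),
    ("conv", 150, 4),
    ("pool", 150, 2),
    ("conv", 250, 4),
    ("pool", 250, 2),
    ("fc", 300, 1),
    ("fc", 43, 1),
]

def layer_summary(in_maps: int = 3, in_size: int = 48):
    """Return (type, maps, map size, trainable parameters) for every layer."""
    maps, size = in_maps, in_size
    summary = []
    for kind, n_out, k in LAYERS:
        if kind == "conv":
            # Each output map sees every input map through a k x k kernel, plus a bias.
            params = n_out * (maps * k * k + 1)
            size = size - k + 1          # "valid" convolution: 48 -> 42, 21 -> 18, 9 -> 6
        elif kind == "pool":
            params = 0                    # max pooling has no weights
            size = size // k              # non-overlapping k x k windows
        else:
            # Fully connected: every neuron sees every unit of the previous layer.
            params = n_out * (maps * size * size + 1)
            size = 1
        maps = n_out
        summary.append((kind, maps, size, params))
    return summary

if __name__ == "__main__":
    rows = layer_summary()
    total = sum(p for *_, p in rows)
    fc = sum(p for kind, *_, p in rows if kind == "fc")
    for row in rows:
        print(row)
    print(f"total parameters: {total:,}; share of last two FC layers: {fc / total:.1%}")
```

Under these assumptions the map sizes reproduce Table 1 and the total is 1,543,443 parameters. About 45% of them sit in the two fully connected layers, in line with "about half" in the text.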
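The rejection analysis at the end of Section 3.2 can be reproduced on any set of predictions. This sketch makes three assumptions:

- The confidence of a prediction is the largest averaged class probability of the MCDNN.
- An image is rejected when that confidence is below the threshold.
- The error rate is measured on the accepted images only.

The function names and the toy data in the test are hypothetical.

```python
from typing import Optional, Sequence, Tuple

def reject_below(confidences: Sequence[float], correct: Sequence[bool],
                 threshold: float) -> Tuple[float, float]:
    """Reject predictions whose confidence is below `threshold`.

    Returns (fraction of images rejected, error rate on the accepted images)."""
    accepted = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    rejected_fraction = 1.0 - len(accepted) / len(confidences)
    errors = sum(1 for ok in accepted if not ok)
    error_rate = errors / len(accepted) if accepted else 0.0
    return rejected_fraction, error_rate

def smallest_threshold_for(confidences: Sequence[float], correct: Sequence[bool],
                           target_error: float) -> Optional[float]:
    """Scan the observed confidences from lowest to highest and return the first
    threshold whose accepted-set error rate is at most target_error (O(n^2))."""
    for t in sorted(set(confidences)):
        if reject_below(confidences, correct, t)[1] <= target_error:
            return t
    return None
```

Scanning thresholds from low to high returns the threshold that reaches the target error while rejecting as few images as possible. This matches the way the text reports "reject only x% to reach error y".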
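The error counts quoted in the Results and Conclusion follow from the recognition rates and the test-set size of 12,630 images given in Section 3.1. The last line assumes that the 0.01% error rate is measured on the images that remain after rejecting 6.67%:

```latex
\begin{aligned}
\text{MCDNN errors} &\approx (1 - 0.9946)\times 12{,}630 = 0.0054 \times 12{,}630 \approx 68.2 \;\Rightarrow\; 68,\\
\frac{\text{error of second best}}{\text{error of MCDNN}} &= \frac{100 - 98.31}{100 - 99.46} = \frac{1.69}{0.54} \approx 3.1,\\
\frac{\text{human error}}{\text{error of MCDNN}} &= \frac{100 - 98.84}{100 - 99.46} = \frac{1.16}{0.54} \approx 2.1,\\
\frac{1}{(1 - 0.0667)\times 12{,}630} &\approx \frac{1}{11{,}788} \approx 0.0085\% \approx 0.01\%.
\end{aligned}
```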
Table 2
Recognition rates (%) of the MCDNN and its 25 DNN.
Trial   Original       Imadjust       Histeq         Adapthisteq    Conorm
Avg.    98.47 ± 0.18   98.62 ± 0.15   98.48 ± 0.22   98.50 ± 0.04   98.54 ± 0.14
Average DNN recognition rate: 98.52 ± 0.15
MCDNN: 99.46

Acknowledgment
This work was partially supported by a FP7-ICT-2009-6 EU Grant under project Code 270247: A Neuro-dynamic framework for Cognitive Robotics: Scene representations, Behavioral sequences, and learning.

References
Behnke, S. (2003). Lecture notes in computer science: Vol. 2766. Hierarchical neural networks for image interpretation. Springer.
Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
Ciresan, D. C., Meier, U., Gambardella, L. M., & Schmidhuber, J. (2010). Deep, big, simple neural nets for handwritten digit recognition. Neural Computation, 22, 3207–3220.
Ciresan, D. C., Meier, U., Gambardella, L. M., & Schmidhuber, J. (2011). Convolutional neural network committees for handwritten character classification. In International conference on document analysis and recognition (pp. 1250–1254). IEEE.
Ciresan, D. C., Meier, U., Masci, J., Gambardella, L. M., & Schmidhuber, J. (2011a). Flexible, high performance convolutional neural networks for image classification. In International joint conference on artificial intelligence (pp. 1237–1242). AAAI Press.
Ciresan, D. C., Meier, U., Masci, J., & Schmidhuber, J. (2011b). A committee of neural networks for traffic sign classification. In International joint conference on neural networks.
Duin, R. P. W. (2002). The combining classifier: to train or not to train? In International conference on pattern recognition (pp. 765–770). IEEE.
Fukushima, K. (1980). Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36, 193–202.
Hashem, S., & Schmeiser, B. (1995). Improving model accuracy using optimal linear combinations of trained neural networks. IEEE Transactions on Neural Networks, 792–794.
Hochreiter, S., Bengio, Y., Frasconi, P., & Schmidhuber, J. (2001). Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In S. C. Kremer, & J. F. Kolen (Eds.), A field guide to dynamical recurrent neural networks. IEEE Press.
Hoyer, P. O., & Hyvärinen, A. (2000). Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems, 11, 191–210.
Hubel, D. H., & Wiesel, T. (1962). Receptive fields, binocular interaction, and functional architecture in the cat's visual cortex. Journal of Physiology (London), 160, 106–154.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86, 2278–2324.
Liu, C.-L., Yin, F., Wang, Q.-F., & Wang, D.-H. (2011). ICDAR 2011 Chinese handwriting recognition competition. In International conference on document analysis and recognition (pp. 1464–1469). IEEE.
MATLAB (2010). Version 7.10.0 (R2010a). Natick, Massachusetts: The MathWorks.
Meier, U., Ciresan, D. C., Gambardella, L. M., & Schmidhuber, J. (2011). Better digit recognition with a committee of simple neural nets. In International conference on document analysis and recognition (pp. 1135–1139). IEEE.
Mutch, J., & Lowe, D. G. (2008). Object class recognition and localization using sparse features with limited receptive fields. International Journal of Computer Vision, 56, 503–511.
Olshausen, B. A., & Field, D. J. (1997). Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Research, 37, 3311–3325.
Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2, 1019–1025.
Scherer, D., Müller, A., & Behnke, S. (2010). Evaluation of pooling operations in convolutional architectures for object recognition. In International conference on artificial neural networks (pp. 82–91). Springer.
Schmidhuber, J., Eldracher, M., & Foltin, B. (1996). Semilinear predictability minimization produces well-known feature detectors. Neural Computation, 8.
Sermanet, P., & LeCun, Y. (2011). Traffic sign recognition with multi-scale convolutional networks. In International joint conference on neural networks (pp. 2809–2813). IEEE.
Serre, T., Wolf, L., & Poggio, T. (2005). Object recognition with features inspired by visual cortex. In Computer vision and pattern recognition conference (pp. 994–1000). IEEE.
Simard, P. Y., Steinkraus, D., & Platt, J. C. (2003). Best practices for convolutional neural networks applied to visual document analysis. In International conference on document analysis and recognition (pp. 958–963). IEEE.
Stallkamp, J., Schlipsing, M., Salmen, J., & Igel, C. (2011). The German traffic sign recognition benchmark: a multi-class classification competition. In International joint conference on neural networks (pp. 1453–1460). IEEE.
Ueda, N. (2000). Optimal linear combination of neural networks for improving classification performance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 207–215.
Wiesel, T. N., & Hubel, D. H. (1959). Receptive fields of single neurones in the cat's striate cortex. Journal of Physiology, 148, 574–591.
