ISCAS2010-Convolutional networks and applications in vision

Points/C-coins required: 15 · 2015-05-21 10:12:44 · 304 KB · PDF

Intelligent tasks, such as visual perception, auditory perception, and language understanding, require the construction of good internal representations of the world (or "features"), which must be invariant to irrelevant variations of the input while preserving relevant information. A major question for Machine Learning is how to learn such good features automatically. Convolutional Networks (ConvNets) are a biologically-inspired trainable architecture that can learn invariant features. Each stage in a ConvNet is composed of a filter bank, some non-linearities, and feature pooling layers. With multiple stages, a ConvNet can learn multi-level hierarchies of features. While ConvNets have been successfully deployed in many commercial applications from OCR to video surveillance, they require large amounts of labeled training samples. We describe new unsupervised learning algorithms, and new non-linear stages that allow ConvNets to be trained with very few labeled samples. Applications to visual object recognition and vision-based navigation for off-road mobile robots are described.
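The stage structure named in the abstract (filter bank, non-linearity, feature pooling) can be illustrated with a minimal pure-Python sketch. The function names, the choice of absolute-value rectification, and non-overlapping average pooling are assumptions made for illustration, not the authors' implementation; the filter bank is applied as a sliding dot product (correlation) without kernel flipping.

```python
def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists), keeping only full overlaps."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def rectify(fmap):
    """Pointwise absolute value (an assumed choice of non-linearity)."""
    return [[abs(x) for x in row] for row in fmap]


def average_pool(fmap, size):
    """Non-overlapping size x size average pooling; ragged borders are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [[sum(fmap[i * size + u][j * size + v]
                 for u in range(size) for v in range(size)) / (size * size)
             for j in range(out_w)]
            for i in range(out_h)]


def convnet_stage(image, filter_bank, pool_size=2):
    """One stage: filter bank -> rectification -> pooling, one map per filter."""
    return [average_pool(rectify(correlate2d_valid(image, k)), pool_size)
            for k in filter_bank]


# Hypothetical toy input: a constant 5x5 image and two 2x2 filters.
demo_image = [[1] * 5 for _ in range(5)]
demo_filters = [
    [[1, -1], [1, -1]],  # horizontal-difference filter: no response on a flat image
    [[1, 1], [1, 1]],    # summing filter: responds with 4 everywhere
]
demo_maps = convnet_stage(demo_image, demo_filters)
```

Stacking several such stages, each fed with the feature maps of the previous one, gives the multi-level feature hierarchy the abstract refers to.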
… adjusted by stochastic gradient descent to lower E. Once training is complete, the feature vector for a given input is simply obtained with Z* = C(X, K), hence the process is extremely fast (feed-forward).

Results on Object Recognition

In this section, various architectures and training procedures are compared to determine which non-linearities are preferable, and which training protocol makes a difference.

Generic Object Recognition using Caltech 101 Dataset: We use a two-stage system where the first stage is composed of an F layer with 64 filters of size 9 x 9, followed by different combinations of non-linearities and pooling. The second-stage feature extractor is fed with the output of the first stage and extracts 256 output feature maps, each of which combines a random subset of 16 feature maps from the previous stage using 9 x 9 kernels. Hence the total number of convolution kernels is 256 x 16 = 4096.

Table I summarizes the results for the experiments, where U and R denote unsupervised pre-training and random initialization respectively, and + denotes supervised fine-tuning of the whole system.

TABLE I
AVERAGE RECOGNITION RATES ON CALTECH-101

        R_abs-N-P_A    R_abs-P_A    N-P_M     P_A
U+      65.5%          60.5%        …         …2.0%
R+      …              …            60.0%     29.7%
U       …              46.7%        56.0%     9.1%
R       62.9%          33.7%        37.6%     …

(… marks values or digits missing from the source text.)

1. Excellent accuracy of 65.5% is obtained using unsupervised pre-training and supervised refinement with abs and normalization non-linearities. The result is on par with the popular model based on SIFT and pyramid match kernel SVM [39]. It is clear that abs and normalization are crucial for achieving good performance. This is an extremely important fact for users of convolutional networks, which traditionally only use tanh.
2. Astonishingly, random filters without any filter learning whatsoever achieve decent performance (62.9% for R), as long as abs and normalization are present (R_abs-N-P_A). A more detailed study on this particular case can be found in [33].
3. Comparing experiments from rows R vs R+ and U vs U+, we see that supervised fine-tuning consistently improves the performance, particularly with weak non-linearities.
4. It seems that unsupervised pre-training (U, U+) is crucial when newly proposed non-linearities are not in place.

Handwritten Digit Classification using MNIST Dataset: Using the evidence gathered in previous experiments, we used a two-stage system with a two-layer fully-connected classifier. The two convolutional stages were pre-trained unsupervised, and refined supervised. An error rate of 0.53% was achieved on the test set. To our knowledge, this is the lowest error rate ever reported on the original MNIST dataset, without distortions or preprocessing. The best previously reported error rate was 0.60% [32].

Connection with Other Approaches in Object Recognition

Many recent successful object recognition systems can also be seen as single or multi-layer feature extraction systems followed by a classifier. Most common feature extraction systems like SIFT [40] and HoG [41] are composed of filterbanks (oriented edge detectors at multiple scales) followed by non-linearities (winner take all) and pooling (histogramming). A Pyramid Match Kernel (PMK) SVM [39] classifier can also be seen as another layer of feature extraction, since it performs a K-means based feature extraction followed by local histogramming.

III. HARDWARE AND SOFTWARE IMPLEMENTATIONS

Implementing ConvNets in software is best achieved using the modular, object-oriented approach suggested in [2]. Each basic module (convolution, pooling, etc.) is implemented as a class with three member functions: module.fprop(input, output), which computes the output from the input; module.bprop(input, output), which back-propagates gradients from the outputs back to the inputs and the internal trainable parameters; and optionally module.bbprop(input, output), which may back-propagate second diagonal derivatives for the implementation of second-order optimization algorithms [8].

Several software implementations of ConvNets are built around this concept, and have four basic capabilities: 1. a flexible multi-dimensional array library with basic operations such as dot products and convolutions, 2. a class hierarchy of basic learning machine building blocks (e.g. multiple convolutions, non-linear transforms, cost functions, ...), 3. a set of classes for energy-based inference [42], gradient-based optimization, and performance measurement.

Three available ConvNet implementations use this concept. The first one is part of the Lush system, a Lisp dialect with an interpreter and compiler with an easy interface to C [43]. The second one is EBLearn, a C++ machine learning library with a class hierarchy similar to the Lush implementation [44]. The third is Torch-5 [45], a C library with an interpreter front end based on Lua. All three systems come with facilities to manipulate large datasets, images, and videos.

The first hardware implementations of ConvNets date back to the early 90s with Bell Labs' ANNA chip, a mixed analog-digital processor that could compute 64 simultaneous 8 x 8 convolutions at a peak rate of 4·10^(…) multiply-accumulate operations per second [46], [47], with 4-bit resolution on the states and 6 bits on the weights. More recently, a group from the Canon corporation developed a prototype ConvNet chip for low-power intelligent cameras [48]. Some current approaches rely on Addressed-Event Representation (AER) convolvers, which present the advantage of not requiring multipliers to compute the convolutions. CAVIAR is the leading such project, with a performance of 12G connections/sec [49].

FPGA implementations of ConvNets appeared in the mid-90s with [50], which used low-accuracy arithmetic to avoid implementing full-fledged multipliers. Fortunately, recent DSP-oriented FPGAs include large numbers of hard-wired MAC units, which allow extremely fast and low-power implementations of ConvNets. The CNP developed in our group [51] achieves 10 GOPS for 7x7 kernels, with an architecture that implements entire ConvNets, including pre/post-processing, and is entirely programmable. An actual face detection application was demonstrated on this system, achieving 10 fps on VGA images [52].

IV. CONCLUSION

The Convolutional Network architecture is a remarkably versatile, yet conceptually simple paradigm that can be applied to a wide spectrum of perceptual tasks. While traditional ConvNets trained with supervised learning are very effective, training them requires a large number of labeled training samples. We have shown that using simple architectural tricks such as rectification and contrast normalization, and using unsupervised pre-training of each filter bank, the need for labeled samples is considerably reduced. Because of their applicability to a wide range of tasks, and because of their relatively uniform architecture, ConvNets are perfect candidates for hardware implementations and embedded applications, as demonstrated by the increasing amount of work in this area. We expect to see many new embedded vision systems based on ConvNets in the next few years.

Despite the recent progress in deep learning, one of the major challenges of computer vision, machine learning, and AI in general in the next decade will be to devise methods that can automatically learn good feature hierarchies from unlabeled and labeled data in an integrated fashion. Current and future research will focus on performing unsupervised learning on multiple stages simultaneously, on the integration of unsupervised and supervised learning, and on using the feed-back path implemented by the decoders to perform visual inference, such as pattern completion and disambiguation.

REFERENCES

[1] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, … "… back-propagation network," in NIPS'89.
[2] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, 1998.
[3] S. Lyu and E. P. Simoncelli, "Nonlinear image representation using divisive normalization," in CVPR, 2008.
[4] N. Pinto, D. D. Cox, and J. J. DiCarlo, "Why is real-world visual object recognition hard?" PLoS Comput Biol, vol. 4, no. 1, p. e27, 01 2008.
[5] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Computation, 1989.
[6] P. Y. Simard, D. Steinkraus, and J. C. Platt, "Best practices for convolutional neural networks applied to visual document analysis," in ICDAR'03.
[7] K. Kavukcuoglu, M. Ranzato, R. Fergus, and Y. LeCun, "Learning invariant features through topographic filter maps," in CVPR'09.
[8] Y. LeCun, L. Bottou, G. Orr, and K. Muller, "Efficient backprop," in Neural Networks: Tricks of the Trade, 1998.
[9] K. Fukushima and S. Miyake, "Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position," Pattern Recognition, …
[10] K. Chellapilla, M. Shilman, and P. Simard, "Optimally combining a cascade of classifiers," in Proc. of Document Recognition and Retrieval 13, Electronic Imaging, 6067, 2006.
[11] K. Chellapilla, S. Puri, and P. Simard, "High performance convolutional neural networks for document processing," in IWFHR'06.
[12] A. Abdulkader, "A two-tier approach for Arabic offline handwriting recognition," in IWFHR'06.
[13] K. Chellapilla and P. Simard, "A new radical based approach to offline handwritten East-Asian character recognition," in IWFHR'06.
[14] R. Vaillant, C. Monrocq, and Y. LeCun, "Original approach for the localisation of objects in images," IEE Proc on Vision, Image, and Signal Processing, vol. 141, no. 4, pp. 245-250, August 1994.
[15] C. Garcia and M. Delakis, "Convolutional face finder: A neural architecture for fast and robust face detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004.
[16] M. Osadchy, Y. LeCun, and M. Miller, "Synergistic face detection and pose estimation with energy-based models," Journal of Machine Learning Research, vol. 8, pp. 1197-1215, May 200…
[17] F. Nasse, C. Thurau, and G. A. Fink, "Face detection using GPU-based convolutional neural networks," pp. 83-90, 200…
[18] A. Frome, G. Cheung, A. Abdulkader, M. Zennaro, B. Wu, A. Bissacco, H. Adam, H. Neven, and L. Vincent, "Large-scale privacy protection in street-level imagery," in ICCV'09.
[19] S. Nowlan and J. Platt, "A convolutional neural network hand tracker." San Mateo, CA: Morgan Kaufmann, 1995, pp. 901-908.
[20] … "… networks," in International Conference on Computer Vision Theory and Applications (VISAPP 2008), 2008.
[21] Y. LeCun, U. Muller, J. Ben, E. Cosatto, and B. Flepp, "Off-road obstacle avoidance through end-to-end learning," in Advances in Neural Information Processing Systems (NIPS 2005). MIT Press, 2005.
[22] R. Hadsell, P. Sermanet, M. Scoffier, A. Erkan, K. Kavukcuoglu, … Muller, and Y. LeCun, "Learning long-range … for autonomous off-road driving," Journal of Field Robotics, vol. 26, no. 2, pp. 120-144, February 2009.
[23] M. Happold and M. Ollis, "Using learned features from 3d data for robot navigation," 2007.
[24] V. Jain and H. S. Seung, "Natural image denoising with convolutional networks," in Advances in Neural Information Processing Systems 21 (NIPS 2008). MIT Press, 2008.
[25] F. Ning, D. Delhomme, Y. LeCun, F. Piano, L. Bottou, and P. Barbano, "Toward automatic phenotyping of developing embryos from videos," IEEE Transactions on Image Processing, special issue on Molecular and Cellular Bioimaging.
[26] V. Jain, J. F. Murray, F. Roth, S. Turaga, V. Zhigulin, K. Briggman, M. Helmstaedter, W. Denk, and H. S. Seung, "Supervised learning of image restoration with convolutional networks," in ICCV'07.
[27] M. Mozer, The Perception of Multiple Objects, A Connectionist Approach. …
[28] T. Serre, L. Wolf, and T. Poggio, "Object recognition with features …"
[29] Mutch and D. G. Lowe, "Multiclass object … localized features," in CVPR, 2006.
[30] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, 2006.
[31] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layer …"
[32] M. Ranzato, Y. Boureau, and Y. LeCun, "Sparse feature learning for deep belief networks," in N…
[33] K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun, "What is the best …" … Conference on Computer Vision (ICCV'09). IEEE, 2009.
[34] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, "Convolutional deep belief networks for scalable unsupervised learning of hierarchical …"
[35] A. Ahmed, K. Yu, W. Xu, Y. Gong, and E. Xing, "Training hierarchical feed-forward visual recognition models using transfer learning from pseudo-tasks," in ECCV. Springer-Verlag, 2008.
[36] J. Weston, F. Ratle, and R. Collobert, "Deep learning via semi-supervised embedding," in ICML, 2008.
[37] K. Kavukcuoglu, M. Ranzato, and Y. LeCun, "… inference in sparse coding algorithms with applications to object recognition," Tech. Rep., 2008, tech report CBLL-TR-2008-12-01.
[38] B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: a strategy employed by …" Vision Research, 1997.
[39] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," in CVPR, …
[40] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 2004.
[41] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in CVPR, 2005.
[42] Y. LeCun, S. Chopra, R. Hadsell, M. Ranzato, and F. Huang, "A tutorial on energy-based learning," in Predicting Structured Data, G. Bakir, T. Hofman, B. Scholkopf, A. Smola, and B. Taskar, Eds. MIT Press, 2006.
[43] Y. LeCun and L. Bottou, "Lush reference manual," Tech. Rep., 2002, code available at … [Online]. Available: http://ush.sourcefore…
[44] P. Sermanet, K. Kavukcuoglu, and Y. LeCun, "EBLearn: Open-source energy-based learning in C++," in Proc. International Conference on Tools with Artificial Intelligence (ICTAI'09). IEEE, 2009.
[45] R. Collobert, "Torch," presented at the Workshop on Machine Learning Open Source Software, NIPS, 200…
[46] B. Boser, E. Sackinger, J. Bromley, Y. LeCun, and L. Jackel, "An analog neural network processor with programmable topology," IEEE Journal of Solid-State Circuits, vol. 26, no. 12, pp. 2017-2025, December 1991.
[47] … B. Boser, J. Bromley, Y. LeCun, and L. D. Jackel, "Application of the ANNA neural network chip to high-speed character recognition," IEEE Transaction on Neural Networks, 1992.
[48] O. Nomura and T. Morie, "Projection-field-type VLSI convolutional neural …"
[49] R. Serrano-Gotarredona, M. Oster, P. Lichtsteiner, A. Linares-Barranco, R. Paz-Vicente, F. Gomez-Rodriguez, L. Camunas-Mesa, R. Berner, M. Rivas-Perez, T. Delbruck, S.-C. Liu, R. Douglas, P. Hafliger, G. Jimenez-Moreno, C. Ballcels, T. Serrano-Gotarredona, A. J. Acosta-Jimenez, and B. Linares-Barranco, "CAVIAR: A 45k neuron, 5M synapse, 12G connects/s AER hardware sensory-processing-learning-actuating system for high-speed visual object recognition and tracking," Trans. Neur. Netw., vol. 20, no. 9, pp. 1417-1438, 2009.
[50] J. Cloutier, E. Cosatto, S. Pigeon, F. Boyer, and P. Y. Simard, "VIP: An FPGA-based processor for image processing and neural networks," in MicroNeuro, 1996.
[51] C. Farabet, C. Poulet, J. Y. Han, and Y. LeCun, "CNP: An FPGA-based processor for convolutional networks," in International Conference on Field Programmable Logic and Applications, 2009.
[52] C. Farabet, C. Poulet, and Y. LeCun, "An FPGA-based stream processor for embedded real-time vision with convolutional networks," in Fifth IEEE Workshop on Embedded Computer Vision (ECV'09). IEEE, 2009.
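As a supplementary illustration of the modular design described in Section III, where every module exposes fprop and bprop, here is a minimal Python sketch. It is a simplification under stated assumptions: the class names AbsModule, ScaleModule, and Sequential are hypothetical, the methods return values instead of writing into an output argument as in the fprop(input, output) signature quoted in the text, and the optional bbprop is omitted.

```python
class AbsModule:
    """Pointwise absolute-value rectification (no trainable parameters)."""

    def fprop(self, x):
        self.x = list(x)  # remember the input for the backward pass
        return [abs(v) for v in x]

    def bprop(self, grad_out):
        # d|x|/dx is sign(x); this sketch uses 0 at x == 0 by convention.
        return [g * ((v > 0) - (v < 0)) for g, v in zip(grad_out, self.x)]


class ScaleModule:
    """y = w * x with a single trainable scalar w (hypothetical toy module)."""

    def __init__(self, w):
        self.w = w
        self.grad_w = 0.0

    def fprop(self, x):
        self.x = list(x)
        return [self.w * v for v in x]

    def bprop(self, grad_out):
        # Gradient with respect to the internal parameter ...
        self.grad_w = sum(g * v for g, v in zip(grad_out, self.x))
        # ... and with respect to the input, passed on to the previous module.
        return [self.w * g for g in grad_out]


class Sequential:
    """Chains modules: fprop runs front to back, bprop back to front."""

    def __init__(self, *modules):
        self.modules = list(modules)

    def fprop(self, x):
        for m in self.modules:
            x = m.fprop(x)
        return x

    def bprop(self, grad_out):
        for m in reversed(self.modules):
            grad_out = m.bprop(grad_out)
        return grad_out


# Toy usage: y = |2 * x|, with an all-ones gradient arriving from a loss above.
net = Sequential(ScaleModule(2.0), AbsModule())
demo_out = net.fprop([1.0, -3.0])
demo_grad_in = net.bprop([1.0, 1.0])
demo_grad_w = net.modules[0].grad_w
```

Because each module only needs to know its own forward and backward rule, new building blocks can be added without touching the training loop, which is the point of the class-hierarchy design the section describes.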
