### Using Ranking-CNN for Age Estimation

A paper from CVPR 2017; its use of convolutional neural networks is quite original.
#### 3. Ranking-CNN for Age Estimation

The training of ranking-CNN consists of two stages: pre-training with facial images and fine-tuning with age-labeled faces. First, a base network is pre-trained with unconstrained facial images [9] to learn a nonlinear transformation of the input samples that captures their main variation. From the base network we then train a series of basic binary CNNs with ordinal age labels. Specifically, we categorize samples into two groups, with ordinal labels either higher or lower than a certain age, and then use them to train a corresponding binary CNN. The fully connected layers in the binary CNN first flatten the features obtained in the previous layers and then relate them to a binary prediction. The weights are updated through stochastic gradient descent by comparing the prediction with the given label. Finally, all the binary outputs are aggregated to make the final age prediction. In the following, we present our system in detail.

##### 3.1. Basic Binary CNNs

As shown in Fig. 2, a basic CNN has three convolutional and sub-sampling layers, and three fully connected layers. Specifically, C1 is the first convolutional layer with feature maps connected to a 5×5 neighboring area in the input. There are 96 filters applied to each of the 3 channels (RGB) of the input, followed by a Rectified Linear Unit (ReLU) [28]. S2 is a sub-sampling layer with feature maps connected to corresponding feature maps in C1. In our case, we use max pooling on 3×3 regions with a stride of 2 to emphasize the most responsive points in the feature maps. S2 is followed by local response normalization (LRN), which can aid generalization [19]. C3 works in a similar way to C1, with 256 filters in 96 channels and a 5×5 filter size, followed by ReLU. Layer S4 functions similarly to S2 and is followed by LRN. Then, C5 is the third convolutional layer, with 384 filters in 256 channels and a smaller filter size of 3×3, followed by the third max pooling layer S6.

Figure 2. Architecture of a basic binary CNN.

F7 is the first fully connected layer, in which the feature maps are flattened into a feature vector. There are 512 neurons in F7, followed by ReLU and a dropout layer. F8 is the second fully connected layer, with 512 neurons, which receives the output from F7 and is followed by ReLU and another dropout layer. F9 is the third fully connected layer and computes the probability that an input $x$ (i.e., the output after F8) belongs to class $i$ using the logistic function. The optimal model parameters of a network are typically learned by minimizing a loss function. We use the negative log-likelihood as the loss function and minimize it using stochastic gradient descent.

##### 3.2. Ranking-CNN

Assume that $x_i$ is the feature vector representing the $i$-th sample and $y_i \in \{1, \ldots, K\}$ is the corresponding ordinal label. To train the $k$-th binary CNN, the entire dataset $D$ is split into two subsets, with age values higher than $k$, or lower than or equal to $k$:

$$D_k^+ = \{(x_i, +1) \mid y_i > k\}, \qquad D_k^- = \{(x_i, -1) \mid y_i \le k\}. \tag{1}$$

Based on different splittings of $D$, $K-1$ basic networks can be trained from the base one. Note that in our model each network is trained using the entire dataset, typically resulting in better ranking performance and also preventing overfitting. Given an unknown input $x_i$, we first use the basic networks to make a set of binary decisions and then aggregate them to make the final age prediction $r(x_i)$:

$$r(x_i) = 1 + \sum_{k=1}^{K-1} [f_k(x_i) > 0], \tag{2}$$

where $f_k(x_i)$ is the output of the $k$-th basic network and $[\cdot]$ denotes the truth-test operator, which is 1 if the inner condition is true and 0 otherwise. It can be shown that the final ranking error is bounded by the maximum of the binary ranking errors. That is, the ranking-CNN results can be improved by optimizing the basic networks. We prove this mathematically in Section 3.2.1, followed by the theoretical comparison between ranking and softmax-based multi-class classification in Section 3.2.3.

###### 3.2.1 Error bound

In ranking-CNN, we divide an age ranking estimation problem, ranging over $1, \ldots, K$, into a series of binary classification sub-problems ($K-1$ classifiers). By aggregating the results of each sub-problem, we then obtain an estimated age $r(x)$. To assure a better overall performance of the model, a key issue is whether the ranking error can be reduced if we improve the accuracy of the binary classifiers. We rigorously address this issue with a formal mathematical proof in this section.

Here, we provide a much tighter error bound for age ranking than that introduced in [2], which claims that the final ranking error is bounded by the sum of errors generated by all the classifiers. We adopt the idea in [2] that divides the errors of sub-problems into two groups: overestimated and underestimated errors. However, instead of simply aggregating errors, we rearrange them in increasing order and analyze the underlying differences between any adjacent sub-classifier errors inside each group. By accumulating those differences, we theoretically obtain an approximation for the final ranking error, which is controlled by the maximum error produced among the sub-problems.

We denote $E^+ = \sum_{k=1}^{K-1} \gamma_k^+$ as the number of misclassifications $f_k(x) > 0$ when the actual value $y \le k$, $k = 1, \ldots, K-1$. Similarly, we denote $E^- = \sum_{k=1}^{K-1} \gamma_k^-$ as the opposite case, where $\gamma_k^+ = [f_k(x) > 0][y \le k]$ and $\gamma_k^- = [f_k(x) \le 0][y > k]$, and $[\cdot]$ is an indicator function taking the value 1 when the condition in $[\cdot]$ holds and 0 otherwise.

For any observation $(x, y)$, we define the cost function (error) for each classifier as

$$e_k^+ = (k - y + 1)\,\gamma_k^+, \quad y \le k; \qquad e_k^- = (y - k)\,\gamma_k^-, \quad y > k. \tag{3}$$

Thus, we have a theorem for the error bound of the final ranking error.

**Theorem 1.** For any observation $(x, y)$, in which $y > 0$ is the actual label (integer), the following inequality holds:

$$|r(x) - y| \le \max_k e_k(x), \tag{4}$$

where $r(x)$ is the estimated rank of age and $k = 1, \ldots, K-1$. That is, we can diminish the final ranking error by minimizing the greatest binary error.

**Proof.** Denote $e_k(x)$ in (3) as $e_k$ for simplicity. We split the proof into two parts. Firstly, we show $|r(x) - y| = |E^+ - E^-|$. Secondly, we demonstrate $\max_k e_k \ge \max\{E^+, E^-\}$. Since $|E^+ - E^-| \le \max\{E^+, E^-\}$ for $E^+$ and $E^-$ nonnegative, the inequality (4) follows.

Firstly, we begin by definition:

$$r(x) = 1 + \sum_{k=1}^{K-1} [f_k(x) > 0] = 1 + \sum_{k=1}^{K-1} \big([f_k(x) > 0][y \le k] + [f_k(x) > 0][y > k]\big) = 1 + E^+ + \sum_{k=1}^{K-1} [f_k(x) > 0][y > k]. \tag{5}$$

Subtracting $(E^+ - E^-)$ on both sides, we get

$$r(x) - (E^+ - E^-) = 1 + \sum_{k=1}^{K-1} [f_k(x) > 0][y > k] + \sum_{k=1}^{K-1} [f_k(x) \le 0][y > k] = 1 + \sum_{k=1}^{K-1} \big([f_k(x) > 0] + [f_k(x) \le 0]\big)[y > k] = 1 + \sum_{k=1}^{K-1} [y > k] = y. \tag{6}$$

Thus $r(x) - y = E^+ - E^-$.

Secondly, we extract all $e_k^+ > 0$ and rearrange them in increasing order, denoted as the set $\{e^+_{(1)}, e^+_{(2)}, \ldots, e^+_{(E^+)}\}$. Similarly, we do the same operation on $e_k^-$ and have the set $\{e^-_{(1)}, e^-_{(2)}, \ldots, e^-_{(E^-)}\}$, where for any random variable $\xi$, $\xi_{(j)}$ denotes the order statistics. Since $y$ is an integer, by (3), $e^+_{(1)} \ge 1$ and $e^+_{(j)} - e^+_{(j-1)} \ge 1$ for any $j \in \{2, \ldots, E^+\}$. Hence

$$e^+_{(E^+)} = e^+_{(1)} + \big(e^+_{(2)} - e^+_{(1)}\big) + \cdots + \big(e^+_{(E^+)} - e^+_{(E^+ - 1)}\big) \ge 1 + 1 + \cdots + 1 = E^+. \tag{7}$$

It follows that $e^+_{(E^+)} \ge E^+$. Similarly, we can show $e^-_{(E^-)} \ge E^-$. Then $\max_k e_k = \max\{e^+_{(E^+)}, e^-_{(E^-)}\} \ge \max\{E^+, E^-\}$, which completes the proof.

###### 3.2.2 Technical contribution of the new error bound

Ranking-CNN can be seen as an ensemble of CNNs, fused with aggregation. By showing that the final ranking error is bounded by the maximum error of the binary rankers, we make a significant technical contribution in the following aspects:

1. Theoretically, it was mentioned in prior work, including [29], that the inconsistency issue of the binary outputs could not be resolved because that would make the training process significantly more complicated. The aggregation was just carried out without explicit understanding of the inconsistency. With the tightened error bound, we can confidently demonstrate that the inconsistency does not actually matter, because as long as the maximum binary error is decreased, the error produced by inconsistent labels can be ignored. It would neither influence the final estimation error nor complicate the training procedure.
2. Methodologically, the tightened bound provides extremely helpful guidance for the training of ranking-CNN. The training of an ensemble of deep learning models is typically very time consuming, especially when the number of sub-models is large. Based on our results, it is technically sound to focus on the sub-models with the largest errors. This training strategy will lead to more efficient training to achieve the desired performance gain. The training strategy can also be extended to ensemble learning with other decision fusion methods.
3. Mathematically, based on the new error bound, we can rigorously derive the expectation of the prediction error of ranking-CNN and prove that ranking-CNN outperforms other softmax-based deep learning models. The detailed proof is given in the next section.

###### 3.2.3 Ranking vs. Softmax

In this section, we focus on demonstrating that ranking-CNN outperforms the softmax method because it is more likely to get a smaller prediction error $|r(x) - y|$. The reason is that softmax fails to take the ordinal relation between ages into consideration. Thus, instead of a softmax classifier, the ranking method is preferred for age estimation.

We observe that a basic CNN in ranking-CNN differs from the softmax multi-class classification approach in the output layer. Suppose that after the fully connected layer we get $z_1, z_2, \ldots, z_K$ from $K$ networks. Denote $\hat{y}$ as the estimated age label and $a_i = e^{z_i}$, where $e^{(\cdot)}$ is the natural exponential function. For softmax, the posterior probability of each class is given by

$$P(\hat{y} = i \mid x) = \frac{a_i}{\sum_{j=1}^{K} a_j}, \tag{8}$$

for $i = 1, \ldots, K$. Then, the expected error given the label of the observation $(x, y)$ is

$$E(|r(x) - y| \mid y) = \sum_{i=1}^{K} |i - y|\, P(\hat{y} = i \mid x). \tag{9}$$

For ranking-CNN, we use $K-1$ classifiers to determine the ordinal relation between adjacent ages. The posterior probability for a prediction of age greater than a specific age $i$ is given by

$$P(f_i(x) > 0 \mid x) = \frac{e^{z_{i+1}}}{e^{z_i} + e^{z_{i+1}}} = \frac{a_{i+1}}{a_i + a_{i+1}}. \tag{10}$$

Then, the expected error for a given sample is

$$E(|r(x) - y| \mid y) = \sum_{i=1}^{K} |i - y|\, P(\hat{y} = i \mid x). \tag{11}$$

We present a theorem for a three ordinal class problem. In the theorem, we use $a, b, c$ to represent $a_1, a_2, a_3$ respectively for better clarity.

**Theorem 2.** Suppose we have classes 1, 2 and 3 with $a, b, c > 0$ respectively. There exists an ordinal relation $1 < 2 < 3$. Denote the rank obtained by ranking-CNN as $r_1(x)$ and the result by softmax as $r_2(x)$. Then

$$E(|r_1(x) - y|) < E(|r_2(x) - y|). \tag{12}$$

**Proof.** Given a sample with label 1, the expected errors for ranking-CNN and softmax are

$$E(|r_1(x) - y| \mid y = 1) = 2P(f_1(x) > 0, f_2(x) > 0 \mid x) + P(f_1(x) > 0, f_2(x) \le 0 \mid x) + P(f_1(x) \le 0, f_2(x) > 0 \mid x) = 2\,\frac{b}{a+b}\,\frac{c}{b+c} + \frac{b}{a+b}\,\frac{b}{b+c} + \frac{a}{a+b}\,\frac{c}{b+c} = \frac{2bc + b^2 + ac}{(a+b)(b+c)}, \tag{13}$$

$$E(|r_2(x) - y| \mid y = 1) = P(r_2(x) = 2 \mid x) + 2P(r_2(x) = 3 \mid x) = \frac{2c + b}{a+b+c}, \tag{14}$$

respectively. Similarly, given $y = 2$,

$$E(|r_1(x) - y| \mid y = 2) = P(f_1(x) > 0, f_2(x) > 0 \mid x) + P(f_1(x) \le 0, f_2(x) \le 0 \mid x) = \frac{ab + bc}{(a+b)(b+c)}, \tag{15}$$

$$E(|r_2(x) - y| \mid y = 2) = P(r_2(x) = 1 \mid x) + P(r_2(x) = 3 \mid x) = \frac{a + c}{a+b+c}. \tag{16}$$

Given $y = 3$,

$$E(|r_1(x) - y| \mid y = 3) = 2P(f_1(x) \le 0, f_2(x) \le 0 \mid x) + P(f_1(x) > 0, f_2(x) \le 0 \mid x) + P(f_1(x) \le 0, f_2(x) > 0 \mid x) = \frac{2ab + b^2 + ac}{(a+b)(b+c)}, \tag{17}$$

$$E(|r_2(x) - y| \mid y = 3) = 2P(r_2(x) = 1 \mid x) + P(r_2(x) = 2 \mid x) = \frac{2a + b}{a+b+c}. \tag{18}$$

Thus, for ranking-CNN, it follows that

$$E(|r_1(x) - y|) = \sum_{i=1}^{3} E(|r_1(x) - i| \mid y = i) = 2 + \frac{ab + bc}{(a+b)(b+c)}. \tag{19}$$

Similarly, for softmax,

$$E(|r_2(x) - y|) = \sum_{i=1}^{3} E(|r_2(x) - i| \mid y = i) = 2 + \frac{a + c}{a+b+c}. \tag{20}$$

Since

$$\frac{a + c}{a+b+c} - \frac{ab + bc}{(a+b)(b+c)} = \frac{a^2 c + c^2 a}{(a+b)(b+c)(a+b+c)} > 0, \tag{21}$$

we conclude

$$E(|r_1(x) - y|) < E(|r_2(x) - y|). \tag{22}$$

Furthermore, the cases for $K = 4, 5, \ldots$ could be shown in a similar way by induction. However, when the number of classes $K$ increases, the analytic expression of the distribution for each class $i = 1, \ldots, K$ becomes

$$P(\hat{y} = i \mid y) = \sum_{A \in F_i} \prod_{j \in A} p_j \prod_{j \in A^c} (1 - p_j), \tag{23}$$

satisfying a Poisson-Binomial distribution, where $p_j$ is the probability that the $j$-th binary classifier gives a positive output, $F_i$ is the collection of subsets of $i$ integers that can be selected from $\{1, 2, \ldots, K\}$, and $A^c$ is the complement of $A$. Notice that $F_i$ represents $C_K^i$ possible cases. Computing the expected value then becomes dreadful, since listing all the probabilities as we did in Theorem 2 seems impractical. Though Le Cam et al. gave an approximation of the Poisson-Binomial by a Poisson distribution, the computation for

$$E(|r_1(x) - y|) = \sum_{y=1}^{K} \sum_{r=1}^{K} |r - y|\, P(\hat{y} = r \mid y) \tag{24}$$

is still unrealistic. So, we generalize with the help of learning theory.

**Theorem 3.** Suppose the VC dimension of each basic CNN classifier's hypothesis space $H$ is $d$ and the sample size for training is $m$. Then for any $\delta \in [0, 1]$, with probability at least $1 - \delta$, the expected error of the ranking-CNN is upper bounded as follows:

$$E_D |r(x) - y| \le \max_k \hat{e}_k(x) + 2\sqrt{\frac{d \log(2m) + \log(1/\delta)}{m}}, \tag{25}$$

where $\hat{e}_k(x)$ denotes the empirical value for $E_D\, e_k(x)$.

**Proof.** Taking the expectation on both sides of Eq. (4), we have

$$E_D |r(x) - y| \le E_D \max_k e_k(x). \tag{26}$$

Using Vapnik-Chervonenkis theory, the desired result follows.

**Remark 4.** Notice that the expected error for ranking-CNN is bounded by the maximum training error produced by its basic CNNs with binary output, plus a term associated with the VC dimension. Since the VC dimension $d$ of a softmax output CNN is greater than that of a basic CNN presented in Fig. 2 [32] if the weights of previous layers are fixed, it results in a greater second term on the right-hand side of Eq. (25) for a CNN with a softmax output layer. It follows that given the same training samples, ranking-CNN is more likely to attain a smaller error by minimizing the training errors (the first term in Eq. (25)) than the one with a softmax output. The error bound in Eq. (25) provides solid support for our framework. We will further verify this conclusion in the sense of statistical significance by t-test later in the experiment section.

##### 3.3. Age Estimation

When humans predict a person's age, it is generally easier to determine whether a person is older than a specific age than to directly give an exact age. Ranking-CNN provides a framework for simultaneous feature learning and age ranking based on facial images. The rationale for using ranking-CNN for age estimation is that the age labels are naturally ordinal, and ranking-CNN can keep the relative ordinal relationship among different age groups.

First, we pre-train a base network with 26,580 image samples from the unfiltered faces dataset [9]. The age group labels for these images are used in training as surrogate labels [8]. Then, we fine-tune our ranking-CNN model on the most commonly used age estimation benchmark dataset: MORPH Album 2. MORPH contains 55,134 facial images with ages ranging from 16 to 77. Following the settings used in some recent work on age estimation [29, 37, 4, 3], we randomly select 54,362 samples in the age range between 16 and 66 from the MORPH dataset. The age and gender information of the selected samples is shown in Table 1. Note that these images are not used in the pre-training stage. All the selected samples are then divided into two sets: 80% of the samples are used for training the basic networks and the remaining 20% for testing. There is no overlap between the training and testing sets, and we use 5-fold cross-validation to evaluate the performance during experiments.

Table 1. The age and gender information of the 54,362 samples randomly selected from MORPH Album 2.

| | <20 | 20-29 | 30-39 | 40-49 | >50 | Total |
|---|---|---|---|---|---|---|
| Male | 6543 | 13849 | 12322 | 9905 | 3321 | 45940 |
| Female | 829 | 2291 | 2886 | 1975 | 441 | 8422 |
| Total | 7372 | 16140 | 15208 | 11880 | 3762 | 54362 |

We adopt a general pre-processing procedure for face detection and alignment before feeding the raw data to the networks. Specifically, given an input color image, we first perform face detection using Haar-based cascade classifiers. Then, face alignment is conducted based on the locations of the eyes. Finally, the image is resized to a standard size of 256×256×3 for network training and age estimation.

#### 4. Experiments

In this section, we demonstrate the performance of ranking-CNN through extensive experiments. We implemented the architecture for ranking-CNN in GPU mode with Caffe [17]. The 3+3 architecture of a basic CNN shown in Fig. 2 is derived from a simplified version of the ImageNet CNN [19] with fewer layers for higher efficiency. The network is initialized with random weights following a Gaussian distribution with mean 0 and standard deviation 0.01.

For our hardware settings, we use a single GTX 980 graphics card (including 2,048 CUDA cores), an i7-4790K CPU, 32 GB RAM, and a 2 TB hard disk drive. The training time for the base CNN with the selected 3+3 architecture is around 6 hours. Fine-tuning takes about 20 to 30 minutes for each basic CNN. In total, it takes about 30 hours to pre-train the base CNN and fine-tune 50 basic CNNs.

##### 4.1. Evaluation metrics

For multiple age estimation, we compared the features learned by ranking-CNN with the ones obtained through BIF+OLPP, ST, and multi-class CNN. BIF features are implemented with Gabor filters in 8 orientations and 8 scales, followed by max-pooling. In addition, OLPP is employed to learn the age manifold based on BIF features, in which the top 1,000 eigenvectors are used. In ST, the Gabor coefficients are scattered into 417 routes in two convolutional layers and pooled with Gaussian smoothing. Multi-class CNN is commonly used for age estimation [25, 39], but it completely ignores the ordinal information in age labels. Its structure is similar to a basic CNN (three convolutional and pooling layers and three fully connected layers), with the exception that the last fully connected layer contains multiple outputs corresponding to the number of ages to be classified instead of the binary ones. As for the age estimators, SVM is selected for comparison due to its proven performance [15]. In the ranking-based approach (Ranking-SVM), following [2], SVM is used as the binary classifier for each age label and the results are aggregated to give the final output.

The comparison and evaluation of different methods in our experiments are reported in terms of the accuracy of each binary ranker as well as two widely adopted performance measures [29, 2]: Mean Absolute Error (MAE) and Cumulative Score (CS). MAE computes the absolute costs between the exact and the predicted ages (the lower the better): $\mathrm{MAE} = \sum_{i=1}^{M} e_i / M$, where $e_i = |\hat{l}_i - l_i|$ is the absolute cost of misclassifying true label $l_i$ to $\hat{l}_i$, and $M$ is the total number of testing samples. CS indicates the percentage of data correctly classified in the range $[l_i - L, l_i + L]$, a neighbor range of the exact age $l_i$ (the larger the better): $\mathrm{CS}(L) = \sum_{i=1}^{M} [e_i \le L] / M$, where $[\cdot]$ is the truth-test operator and $L$ is the parameter representing the tolerance range.

Also, we used a paired t-test to demonstrate the statistical significance of our empirical comparison. We employ the paired t-test to determine whether ranking-CNN significantly outperforms other methods. A two-sample t-statistic with unknown but equal variance is computed.

##### 4.2. Age Estimation Results

In this section, we consider the age estimation problem in the range between 16 and 66 years old and compare ranking-CNN with other state-of-the-art feature extractors and age estimators. As there are 51 age groups in this age range, 50 binary rankers are needed for the ranking approaches (i.e., ranking-CNN and ranking-SVM). In our experiments, 43,490 samples (80% of all the randomly selected samples) with binary labels are selected to train each basic network or SVM in ranking-CNN and ranking-SVM, respectively. Exactly the same set of samples with multi-class labels is used to train multi-class CNN and SVM, respectively. The remaining 10,872 samples were used for testing. All experiments are carried out with 5-fold cross-validation.

Basically, we have three sets of features: engineered features (i.e., BIF+OLPP and ST), learned classification features (multi-class CNN) and learned ranking features (ranking-CNN), and two sets of age estimators: classification methods (i.e., SVM and multi-class CNN) and ranking methods (ranking-CNN and ranking-SVM). We report the MAE of all possible combinations of feature extractors and age estimators (eight in total) in Table 2. A dash in the table means that the selected feature set is not applicable to the selected estimator.

Table 2. Comparison of MAE among different combinations of features and estimators. The lowest MAE is highlighted in bold. A dash in the table means that the selected feature is not applicable to the selected estimator.

| Estimator | BIF+OLPP (engineered) | ST (engineered) | CNN feature (learned) | Ranking-CNN feature (learned) |
|---|---|---|---|---|
| SVM (classification model) | 4.99 | 5.15 | | - |
| Multi-class CNN (classification model) | - | - | 3.65 | - |
| Ranking-SVM (ranking model) | 5.03 | 4.88 | - | 3.63 |
| Ranking-CNN (ranking model) | - | - | - | **2.96** |

As shown in Table 2, ranking-CNN with its features achieves the lowest MAE of 2.96 among all the combinations. Ranking-CNN features with Ranking-SVM achieve the second best MAE result, and this validates the effectiveness and generality of ranking-CNN features. In comparison, the lowest MAE achieved by the learned classification features is 3.65. Note that the multi-class CNN represents the commonly used CNN-based age estimation methods [25, 39]. Our experimental results strongly support the theoretical results (ranking vs. softmax) we presented in Section 3.2.3. Another fact we can see is that the performance of CNN-based features gets weakened when combined with SVM-based estimators. The lowest MAE achieved by engineered features is 4.88, by ST+ranking-SVM. Notice that ST works better with ranking-SVM, and BIF+OLPP works better with SVM. This could be caused by the fact that in the literature specific features were manually selected for certain estimators to achieve the best performance.

Table 3. Comparison with MR-CNN, OR-CNN and DEX on the MORPH dataset. The lowest MAE is highlighted in bold.

| | Ranking-CNN | MR-CNN | OR-CNN | DEX |
|---|---|---|---|---|
| MAE | **2.96** | 3.27 | 3.34 | 3.25 |

In Table 3, we compare ranking-CNN with the most recent age estimation models, i.e., Ordinal Regression with CNN (OR-CNN), Metric Regression with CNN (MR-CNN) and Deep EXpectation (DEX). Since the experiments are all carried out on the MORPH dataset with the same settings for data partition, we can directly compare the MAE of ranking-CNN with the ones obtained by MR-CNN, OR-CNN and DEX. Clearly, ranking-CNN outperforms all of MR-CNN, OR-CNN and DEX, and significantly improves the performance of age estimation.

The comparison in terms of CS of the eight combinations of features and estimators is given in Fig. 3. Clearly, ranking-CNN outperforms all others across the entire range of $L$ (age error tolerance range) from 0 to 10. Specifically, ranking-CNN can reach an accuracy of 89.90% for $L = 6$ and 92.93% for $L = 7$. The other fact we notice is that the four CNN-based methods reach a higher accuracy for $L = 10$ than the others.

Figure 3. Comparison on Cumulative Score with $L$ in $[0, 10]$ (horizontal axis: age error tolerance).

In Fig. 4, we further compare the four ranking-based methods and report their performance on each binary ranker. Again, ranking-CNN demonstrates consistently outstanding performance throughout all binary problems. Note that when the data for the binary rankers are not balanced (and thus the baseline accuracy is higher, e.g., age < 20 and age > 48), all rankers seem to perform quite well. However, when it comes to the age range with more balanced data (and thus lower baseline accuracy, age 20-48), the superior performance of ranking-CNN is shown, and this would lead to better overall performance of age estimation. Again, our results clearly illustrate the remarkable improvement of using ranking-CNN for age estimation.

Figure 4. Accuracy of each binary ranker in ranking models.

Last, to demonstrate that the experimental results we obtained do not happen simply by chance, we report in Table 4 the p-values from the paired t-test at significance level 1%. In Table 4, if $p < 1\%$, we reject the null hypothesis. Otherwise, we do not. For example, when comparing "ranking-CNN" with "ranking-CNN feature+ranking-SVM", the p-value 6.36e-148 is much less than 0.01, which means that we reject the null hypothesis that "the performance of ranking-CNN is not significantly improved". The "NaN" in the table means we could not compare a method with itself. As we can see, statistically, ranking-CNN significantly outperforms all other methods, which implies that if we repeat the experiments numerous times, then in 99% of those experiments ranking-CNN would significantly outperform. From the table, Ranking-CNN Feature+Ranking-SVM and the Multi-Class CNN tie for second place, followed by CNN Feature+SVM. ST+Ranking-SVM stands out among the engineered feature-based methods. Lastly, BIF+OLPP+Ranking-SVM ties with BIF+OLPP+SVM, and ST+SVM has no significant improvement over any other method.

Table 4. T-test outcomes of all eight combinations of features and estimators. Numbers #1 to #8 correspond to the eight compared models in the sequence: Ranking-CNN, Ranking-CNN feature+Ranking-SVM, ST+Ranking-SVM, BIF+OLPP+Ranking-SVM, Multi-class CNN, CNN feature+SVM, ST+SVM and BIF+OLPP+SVM. [8×8 matrix of p-values; entries other than the 6.36e-148 for #2 against #1 are not legible in this copy.]

#### 5. Conclusion

In this paper, we proposed ranking-CNN, a novel deep ranking framework for age estimation. We established a much tighter error bound for ranking-based age estimation and showed rigorously that ranking-CNN, by taking the ordinal relation between ages into consideration, is more likely to get smaller estimation errors when compared with multi-class classification approaches. Through extensive experiments, we show that statistically, ranking-CNN significantly outperforms other state-of-the-art age estimation methods on benchmark datasets.

**Acknowledgment.** This work was partially supported by the US National Science Foundation (NSF) under grant CNS 1637312, and by the Ford Motor Company University Research Program under grant 2015-9186R.

#### References

[1] T. Ahonen, A. Hadid, and M. Pietikainen. Face description with local binary patterns: Application to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(12):2037-2041, 2006.

[2] K.-Y. Chang and C.-S. Chen. A learning framework for age rank estimation based on face images with scattering transform. IEEE Transactions on Image Processing, 24(3):785-798, 2015.

[3] K.-Y. Chang, C.-S. Chen, and Y.-P. Hung. Ordinal hyperplanes ranker with cost sensitivities for age estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 585-592. IEEE, 2011.

[4] K. Chen, S. Gong, T. Xiang, and C. Change Loy. Cumulative attribute space for age and crowd density estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2467-2474, 2013.

[5] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493-2537, Nov. 2011.

[6] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):681-685, 2001.

[7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248-255. IEEE, 2009.

[8] A. Dosovitskiy, J. T. Springenberg, M. Riedmiller, and T. Brox. Discriminative unsupervised feature learning with convolutional neural networks. In Advances in Neural Information Processing Systems, pages 766-774, 2014.

[9] E. Eidinger, R. Enbar, and T. Hassner. Age and gender estimation of unfiltered faces. IEEE Transactions on Information Forensics and Security, 9(12):2170-2179, 2014.

[10] X. Geng, Z.-H. Zhou, and K. Smith-Miles. Automatic age estimation based on facial aging patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(12):2234-2240, 2007.

[11] X. Geng, Z.-H. Zhou, Y. Zhang, G. Li, and H. Dai. Learning from facial aging patterns for automatic age estimation. In Proceedings of the 14th Annual ACM International Conference on Multimedia, pages 307-316. ACM, 2006.

[12] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 580-587, 2014.

[13] A. Gunay and V. V. Nabiyev. Automatic age classification with LBP. In Computer and Information Sciences, ISCIS'08, 23rd International Symposium on, pages 1-4. IEEE, 2008.

[14] G. Guo, Y. Fu, C. R. Dyer, and T. S. Huang. Image-based human age estimation by manifold learning and locally adjusted robust regression. IEEE Transactions on Image Processing, 17(7):1178-1188, 2008.

[15] G. Guo, G. Mu, Y. Fu, and T. S. Huang. Human age estimation using bio-inspired features. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 112-119, 2009.

[16] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.

[17] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.

[18] N. Kalchbrenner, E. Grefenstette, and P. Blunsom. A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188, 2014.

[19] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097-1105, 2012.

[20] Y. H. Kwon and N. D. V. Lobo. Age classification from facial images. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pages 762-767, 1994.

[21] A. Lanitis, C. Draganova, and C. Christodoulou. Comparing different classifiers for automatic age estimation. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 34(1):621-628, 2004.

[22] A. Lanitis, C. J. Taylor, and T. F. Cootes. Toward automatic simulation of aging effects on face images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(4):442-455, 2002.

[23] L. Le Cam et al. An approximation theorem for the Poisson binomial distribution. Pacific J. Math, 10(4):1181-1197, 1960.

[24] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.

[25] G. Levi and T. Hassner. Age and gender classification using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 34-42, 2015.

[26] C. Li, Q. Liu, J. Liu, and H. Lu. Learning ordinal discriminative features for age estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2570-2577. IEEE, 2012.

[27] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3431-3440, 2015.

[28] V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning, pages 807-814, 2010.

[29] Z. Niu, M. Zhou, L. Wang, X. Gao, and G. Hua. Ordinal regression with multiple output CNN for age estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.

[30] K. Ricanek Jr and T. Tesafaye. Morph: A longitudinal image database of normal adult age-progression. In 7th International Conference on Automatic Face and Gesture Recognition, pages 341-345, 2006.

[31] R. Rothe, R. Timofte, and L. Van Gool. Deep expectation of real and apparent age from a single image without facial landmarks. International Journal of Computer Vision, pages 1-14, 2016.

[32] E. D. Sontag. VC dimension of neural networks. NATO ASI Series F Computer and Systems Sciences, 168:69-96, 1998.

[33] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929-1958, 2014.

[34] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. Deepface: Closing the gap to human-level performance in face verification. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1701-1708, 2014.

[35] V. N. Vapnik and V. Vapnik. Statistical learning theory, volume 1. Wiley New York, 1998.

[36] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, pages I-511. IEEE, 2001.

[37] X. Wang, R. Guo, and C. Kambhamettu. Deeply-learned feature for age estimation. In Applications of Computer Vision, 2015 IEEE Winter Conference on, pages 534-541. IEEE, 2015.

[38] Z. Yang and H. Ai. Demographic classification with local binary patterns. In International Conference on Biometrics, pages 464-473. Springer, 2007.

[39] D. Yi, Z. Lei, and S. Z. Li. Age estimation by multi-scale convolutional network. In Asian Conference on Computer Vision, pages 144-158. Springer, 2015.
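The label split of Eq. (1), the aggregation rule of Eq. (2), and the bound of Theorem 1 can be checked numerically with a small sketch. This is not the authors' Caffe implementation: the function names (`binary_labels`, `aggregate_rank`, `binary_costs`, `bound_holds`) are hypothetical, and the binary network outputs are replaced by plain signed numbers. The loop enumerates every possible pattern of binary decisions for a small `K`, including inconsistent ones, and confirms that the final ranking error never exceeds the largest binary cost.

```python
from itertools import product


def binary_labels(ages, k):
    """Eq. (1): label +1 if age > k, else -1, for the k-th binary sub-problem."""
    return [1 if y > k else -1 for y in ages]


def aggregate_rank(outputs):
    """Eq. (2): r(x) = 1 + number of binary outputs f_k(x) that are > 0."""
    return 1 + sum(1 for f in outputs if f > 0)


def binary_costs(outputs, y):
    """Eq. (3): cost e_k of each binary classifier for the true label y."""
    costs = []
    for k, f in enumerate(outputs, start=1):
        if y <= k:
            # Classifier k wrongly says "older than k".
            costs.append(k - y + 1 if f > 0 else 0)
        else:
            # Classifier k wrongly says "not older than k".
            costs.append(y - k if f <= 0 else 0)
    return costs


def bound_holds(outputs, y):
    """Theorem 1: |r(x) - y| <= max_k e_k(x)."""
    return abs(aggregate_rank(outputs) - y) <= max(binary_costs(outputs, y))


# Exhaustive check for a small number of ordinal classes (assumed K = 6).
K = 6
all_ok = all(
    bound_holds(outputs, y)
    for y in range(1, K + 1)
    for outputs in product([-1.0, 1.0], repeat=K - 1)
)

# One inconsistent decision pattern: classifier 4 fires although classifier 3 does not.
example = [1.0, 1.0, -1.0, 1.0, -1.0]
example_rank = aggregate_rank(example)
example_costs = binary_costs(example, 3)
```

With true label 3, the example pattern is aggregated to rank 4, an error of 1, while the only non-zero binary cost is 2 (from classifier 4), so the bound holds even though the binary decisions are inconsistent.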

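The three-class comparison in Theorem 2 can be reproduced with exact rational arithmetic. The sketch below is illustrative only: the function names are hypothetical, positive integers stand in for $a, b, c$ so that `fractions.Fraction` stays exact, and the two binary rankers are treated as independent, which is the assumption implied by the products in the proof. It sums the expected absolute error over the three labels for both models and compares the gap against the closed form on the right-hand side of Eq. (21).

```python
from fractions import Fraction
from itertools import product


def ranking_expected_error(a, b, c):
    """Sum over y in {1, 2, 3} of E|r1(x) - y| for two independent binary rankers."""
    # Eq. (10): P(f_1 > 0) = b / (a + b), P(f_2 > 0) = c / (b + c).
    p = [Fraction(b, a + b), Fraction(c, b + c)]
    total = Fraction(0)
    for y in (1, 2, 3):
        for signs in product((0, 1), repeat=2):
            prob = Fraction(1)
            for p_k, s in zip(p, signs):
                prob *= p_k if s else 1 - p_k
            rank = 1 + sum(signs)  # Eq. (2)
            total += abs(rank - y) * prob
    return total


def softmax_expected_error(a, b, c):
    """Sum over y in {1, 2, 3} of E|r2(x) - y| under the softmax posterior, Eq. (8)."""
    probs = [Fraction(v, a + b + c) for v in (a, b, c)]
    return sum(
        abs(i - y) * p
        for y in (1, 2, 3)
        for i, p in enumerate(probs, start=1)
    )


def closed_form_gap(a, b, c):
    """Right-hand side of Eq. (21)."""
    return Fraction(a * a * c + c * c * a, (a + b) * (b + c) * (a + b + c))


ranking = ranking_expected_error(1, 2, 3)
softmax = softmax_expected_error(1, 2, 3)
```

For $a = 1$, $b = 2$, $c = 3$ the ranking value is $38/15$ and the softmax value is $8/3$, matching $2 + \frac{ab+bc}{(a+b)(b+c)}$ and $2 + \frac{a+c}{a+b+c}$ from Eqs. (19) and (20).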
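The two evaluation measures defined in Section 4.1, MAE and CS(L), are simple enough to state as code. The following is a minimal sketch with hypothetical function names and made-up ages, not data from the paper; it only mirrors the formulas $\mathrm{MAE} = \sum_i e_i / M$ and $\mathrm{CS}(L) = \sum_i [e_i \le L] / M$.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """MAE: average of e_i = |predicted_i - true_i| over the M test samples."""
    errors = [abs(p - t) for t, p in zip(true_ages, predicted_ages)]
    return sum(errors) / len(errors)


def cumulative_score(true_ages, predicted_ages, tolerance):
    """CS(L): fraction of samples whose absolute error is at most L."""
    errors = [abs(p - t) for t, p in zip(true_ages, predicted_ages)]
    return sum(1 for e in errors if e <= tolerance) / len(errors)


# Illustrative values only (absolute errors are 2, 0, 7 and 6).
true_ages = [16, 25, 40, 66]
predicted_ages = [18, 25, 33, 60]

mae = mean_absolute_error(true_ages, predicted_ages)
cs_curve = {L: cumulative_score(true_ages, predicted_ages, L) for L in (0, 6, 10)}
```

Evaluating `cumulative_score` for each `L` from 0 to 10 gives the kind of curve plotted in Fig. 3, where a higher curve is better, while a lower `mae` is better.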