Using RankingCNN for Age Estimation

A paper from CVPR 2017 whose use of convolutional neural networks is notably original.
3. Ranking-CNN for Age Estimation

The training of Ranking-CNN consists of two stages: pre-training with facial images and fine-tuning with age-labeled faces. First, a base network is pre-trained with unconstrained facial images [9] to learn a nonlinear transformation of the input samples that captures their main variation. From the base network, we then train a series of basic binary CNNs with ordinal age labels. Specifically, we categorize samples into two groups, with ordinal labels either higher or lower than a certain age, and use them to train a corresponding binary CNN. The fully connected layers in the binary CNN first flatten the features obtained in the previous layers and then relate them to a binary prediction. The weights are updated through stochastic gradient descent by comparing the prediction with the given label. Finally, all the binary outputs are aggregated to make the final age prediction. In the following, we present our system in detail.

3.1. Basic Binary CNNs

As shown in Fig. 2, a basic CNN has three convolutional and subsampling layers and three fully connected layers. Specifically, C1 is the first convolutional layer, with feature maps connected to a 5x5 neighboring area in the input. There are 96 filters applied to each of the 3 channels (RGB) of the input, followed by a Rectified Linear Unit (ReLU) [28]. S2 is a subsampling layer with feature maps connected to the corresponding feature maps in C1. In our case, we use max pooling on 3x3 regions with a stride of 2 to emphasize the most responsive points in the feature maps. S2 is followed by local response normalization (LRN), which can aid generalization [19]. C3 works in a similar way as C1, with 256 filters in 96 channels and a 5x5 filter size, followed by ReLU. Layer S4 functions similarly to S2 and is followed by LRN. Then, C5 is the third convolutional layer, with 384 filters in 256 channels and a smaller 3x3 filter size, followed by the third max pooling layer S6. F7 is the first fully connected layer, in which the feature maps are flattened into a feature vector. There are 512 neurons in F7, followed by ReLU and a dropout layer [33]. F8 is the second fully connected layer, with 512 neurons that receive the output from F7, followed by ReLU and another dropout layer. F9 is the third fully connected layer; it computes the probability that an input x (i.e., the output of F8) belongs to class i using the logistic function. The optimal model parameters of a network are typically learned by minimizing a loss function. We use the negative log-likelihood as the loss function and minimize it using stochastic gradient descent.

Figure 2. Architecture of a basic binary CNN.

3.2. Ranking-CNN

Assume that $x_i$ is the feature vector representing the $i$-th sample and $y_i \in \{1, \dots, K\}$ is the corresponding ordinal label. To train the $k$-th binary CNN, the entire dataset $D$ is split into two subsets, with age values higher or lower than (or equal to) $k$:

$$D_k^+ = \{(x_i, +1) \mid y_i > k\}, \quad D_k^- = \{(x_i, -1) \mid y_i \le k\}. \tag{1}$$

Based on different splittings of $D$, $K-1$ basic networks can be trained from the base one. Note that in our model each network is trained using the entire dataset, which typically results in better ranking performance and also prevents overfitting. Given an unknown input $x_i$, we first use the basic networks to make a set of binary decisions and then aggregate them to make the final age prediction $r(x_i)$:

$$r(x_i) = 1 + \sum_{k=1}^{K-1} [f_k(x_i) > 0], \tag{2}$$

where $f_k(x_i)$ is the output of the $k$-th basic network and $[\cdot]$ denotes the truth-test operator, which is 1 if the inner condition is true and 0 otherwise. It can be shown that the final ranking error is bounded by the maximum of the binary ranking errors; that is, the Ranking-CNN results can be improved by optimizing the basic networks. We prove this mathematically in Section 3.2.1, followed by a theoretical comparison between ranking and softmax-based multi-class classification in Section 3.2.3.

3.2.1 Error bound

In Ranking-CNN, we divide an age ranking estimation problem over $1, \dots, K$ into a series of binary classification subproblems ($K-1$ classifiers). By aggregating the results of the subproblems, we obtain an estimated age $r(x)$. To assure a better overall performance of the model, a key issue is whether the ranking error can be reduced by improving the accuracy of the binary classifiers. We rigorously address this issue with a formal mathematical proof in this section.

Here, we provide a much tighter error bound for age ranking than that introduced in [2], which claims that the final ranking error is bounded by the sum of the errors generated by all the classifiers. We adopt the idea in [2] that divides the errors of the subproblems into two groups: overestimated and underestimated errors. However, instead of simply aggregating the errors, we rearrange them in increasing order and analyze the underlying differences between adjacent subclassifier errors inside each group. By accumulating those differences, we theoretically obtain an approximation of the final ranking error, which is controlled by the maximum error produced among the subproblems.

We denote by $E^+ = \sum_k \varepsilon_k^+$ the number of misclassifications $f_k(x) > 0$ when the actual value $y \le k$, $k = 1, \dots, K-1$, where $\varepsilon_k^+ = [f_k(x) > 0][y \le k]$. Similarly, we denote by $E^- = \sum_k \varepsilon_k^-$ the opposite case, where $\varepsilon_k^- = [f_k(x) \le 0][y > k]$, and $[\cdot]$ is an indicator function taking the value 1 when the condition inside holds and 0 otherwise.
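The aggregation in Eq. (2) is simple to state in code. The sketch below is illustrative (not the authors' implementation): it counts how many binary rankers vote "older than k" and adds one.

```python
def aggregate_rank(binary_outputs):
    """Eq. (2): r(x) = 1 + sum_k [f_k(x) > 0], where binary_outputs
    holds the raw outputs f_1(x), ..., f_{K-1}(x)."""
    return 1 + sum(1 for f in binary_outputs if f > 0)

# Four rankers (K = 5): the first two vote "older", the rest do not.
print(aggregate_rank([2.1, 0.7, -0.3, -1.2]))  # -> 3
```

Note that even inconsistent outputs (e.g., a negative vote followed by a positive one) still yield a valid rank, which is exactly the point made later about the aggregation tolerating inconsistency.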
For any observation $(x, y)$, we define the cost function (error) of each classifier as

$$e_k(x) = (k - y + 1)\,[f_k(x) > 0][y \le k] + (y - k)\,[f_k(x) \le 0][y > k]. \tag{3}$$

Thus, we have a theorem for the bound on the final ranking error.

Theorem 1. For any observation $(x, y)$, in which $y > 0$ is the actual (integer) label, the following inequality holds:

$$|r(x) - y| \le \max_k e_k(x), \tag{4}$$

where $r(x)$ is the estimated rank of age and $k = 1, \dots, K-1$. That is, we can diminish the final ranking error by minimizing the greatest binary error.

Proof. Denote $e_k(x)$ in (3) by $e_k$ for simplicity. We split the proof into two parts. First, we show $|r(x) - y| = |E^+ - E^-|$. Second, we demonstrate $\max_k e_k \ge \max\{E^+, E^-\}$. Since $|E^+ - E^-| \le \max\{E^+, E^-\}$ for non-negative $E^+$ and $E^-$, inequality (4) follows.

First, by definition,

$$r(x) = 1 + \sum_{k=1}^{K-1} [f_k(x) > 0] = 1 + \sum_{k=1}^{K-1} \big([f_k(x) > 0][y \le k] + [f_k(x) > 0][y > k]\big) = 1 + E^+ + \sum_{k=1}^{K-1} [f_k(x) > 0][y > k]. \tag{5}$$

Subtracting $(E^+ - E^-)$ on both sides, we get

$$r(x) - (E^+ - E^-) = 1 + \sum_{k=1}^{K-1} [f_k(x) > 0][y > k] + \sum_{k=1}^{K-1} [f_k(x) \le 0][y > k] = 1 + \sum_{k=1}^{K-1} [y > k] = y. \tag{6}$$

Thus, $r(x) - y = E^+ - E^-$.

Second, we extract all the positive costs of the overestimated type ($f_k(x) > 0$, $y \le k$) and rearrange them in increasing order, denoted $\{e^+_{(1)}, e^+_{(2)}, \dots, e^+_{(E^+)}\}$. Similarly, for the underestimated type we obtain the set $\{e^-_{(1)}, e^-_{(2)}, \dots, e^-_{(E^-)}\}$, where for any random variable $\xi$, $\xi_{(j)}$ denotes the order statistics. Since $y$ is an integer, by (3), $e^+_{(1)} \ge 1$ and $e^+_{(j+1)} - e^+_{(j)} \ge 1$ for any $j \in \{1, 2, \dots, E^+ - 1\}$. We observe that

$$e^+_{(E^+)} = e^+_{(1)} + \big(e^+_{(2)} - e^+_{(1)}\big) + \dots + \big(e^+_{(E^+)} - e^+_{(E^+ - 1)}\big) \ge E^+. \tag{7}$$

It follows that $e^+_{(E^+)} \ge E^+$; similarly, $e^-_{(E^-)} \ge E^-$. Then $\max_k e_k \ge \max\{e^+_{(E^+)}, e^-_{(E^-)}\} \ge \max\{E^+, E^-\}$, which completes the proof.

3.2.2 Technical contributions of the new error bound

Ranking-CNN can be seen as an ensemble of CNNs fused with aggregation. By showing that the final ranking error is bounded by the maximum error of the binary rankers, we make significant technical contributions in the following aspects:

1. Theoretically, it was mentioned in both [2] and [29] that the inconsistency issue of the binary outputs could not be resolved because doing so would make the training process significantly more complicated. The aggregation was simply carried out without an explicit understanding of the inconsistency. With the tightened error bound, we can confidently demonstrate that the inconsistency does not actually matter: as long as the maximum binary error is decreased, the error produced by inconsistent labels can be ignored. It neither influences the final estimation error nor complicates the training procedure.

2. Methodologically, the tightened bound provides extremely helpful guidance for the training of Ranking-CNN. The training of an ensemble of deep learning models is typically very time-consuming, especially when the number of sub-models is large. Based on our results, it is technically sound to focus on the sub-models with the largest errors. This training strategy leads to more efficient training for the desired performance gain, and it can also be extended to ensemble learning with other decision fusion methods.

3. Mathematically, based on the new error bound, we can rigorously derive the expectation of the prediction error of Ranking-CNN and prove that Ranking-CNN outperforms other softmax-based deep learning models. The detailed proof is given in the next section.

3.2.3 Ranking vs. Softmax

In this section, we focus on demonstrating that Ranking-CNN outperforms the softmax method because it is more likely to obtain a smaller prediction error $|r(x) - y|$. The reason is that softmax fails to take the ordinal relation between ages into consideration. Thus, instead of a softmax classifier, a ranking method is preferred for age estimation.

A basic CNN in Ranking-CNN differs from the softmax multi-class classification approach in the output layer. Suppose that after the fully connected layer we get $z_1, z_2, \dots, z_K$ from the $K$ networks. Denote the estimated age label by $\hat y$ and let $a_i = e^{z_i}$, where $e^{z_i}$ is the natural exponential function. For softmax, the posterior probability of each class is given by

$$P(\hat y = i \mid x) = \frac{a_i}{\sum_{j=1}^{K} a_j}, \tag{8}$$

for $i = 1, \dots, K$.
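Theorem 1 can be sanity-checked numerically. The sketch below (an illustration, not part of the paper) draws random labels and random, possibly inconsistent, binary decisions, computes the costs of Eq. (3) for the classifiers that err, and verifies $|r(x) - y| \le \max_k e_k(x)$:

```python
import random

def check_theorem1(K=10, trials=2000, seed=0):
    """Empirically verify |r(x) - y| <= max_k e_k(x) from Theorem 1."""
    rng = random.Random(seed)
    for _ in range(trials):
        y = rng.randint(1, K)
        decisions = [rng.random() < 0.5 for _ in range(K - 1)]  # [f_k(x) > 0]
        r = 1 + sum(decisions)
        costs = []
        for k in range(1, K):
            if decisions[k - 1] and y <= k:        # overestimating ranker
                costs.append(k - y + 1)
            elif not decisions[k - 1] and y > k:   # underestimating ranker
                costs.append(y - k)
        if abs(r - y) > max(costs, default=0):
            return False
    return True

print(check_theorem1())  # -> True
```

When no classifier errs, the cost list is empty and $r(x) = y$, so the bound holds with both sides equal to zero.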
Then, the expected error given the label of the observation $(x, y)$ is

$$E(|r(x) - y| \mid y) = \sum_{i=1}^{K} |i - y| \, P(\hat y = i \mid x). \tag{9}$$

For Ranking-CNN, we use $K-1$ classifiers to determine the ordinal relation between adjacent ages. The posterior probability of a prediction of age greater than a specific age $i$ is given by

$$P(f_i(x) > 0 \mid x) = \frac{a_{i+1}}{a_i + a_{i+1}}. \tag{10}$$

Then, the expected error for a given sample is

$$E(|r(x) - y| \mid y) = \sum_{i=1}^{K} |i - y| \, P(r(x) = i \mid x). \tag{11}$$

We present a theorem for a three-ordinal-class problem. In the theorem, we use $a, b, c$ to represent $a_1, a_2, a_3$, respectively, for clarity.

Theorem 2. Suppose we have classes 1, 2, and 3 with $a, b, c > 0$, respectively, and the ordinal relation $1 < 2 < 3$. Denote the rank obtained by Ranking-CNN as $r_1(x)$ and the result of softmax as $r_2(x)$. Then

$$E(|r_1(x) - y|) < E(|r_2(x) - y|). \tag{12}$$

Proof. Given a sample with label 1, the expected errors for Ranking-CNN and softmax are

$$E(|r_1(x) - y| \mid y = 1) = 2P(f_1 > 0, f_2 > 0 \mid x) + P(f_1 > 0, f_2 \le 0 \mid x) + P(f_1 \le 0, f_2 > 0 \mid x) = \frac{2bc + b^2 + ac}{(a+b)(b+c)}, \tag{13}$$

$$E(|r_2(x) - y| \mid y = 1) = 2P(r_2(x) = 3 \mid x) + P(r_2(x) = 2 \mid x) = \frac{2c + b}{a + b + c}. \tag{14}$$

Similarly, given $y = 2$,

$$E(|r_1(x) - y| \mid y = 2) = P(f_1 > 0, f_2 > 0 \mid x) + P(f_1 \le 0, f_2 \le 0 \mid x) = \frac{ab + bc}{(a+b)(b+c)}, \tag{15}$$

$$E(|r_2(x) - y| \mid y = 2) = P(r_2(x) = 1 \mid x) + P(r_2(x) = 3 \mid x) = \frac{a + c}{a + b + c}. \tag{16}$$

Given $y = 3$,

$$E(|r_1(x) - y| \mid y = 3) = 2P(f_1 \le 0, f_2 \le 0 \mid x) + P(f_1 > 0, f_2 \le 0 \mid x) + P(f_1 \le 0, f_2 > 0 \mid x) = \frac{2ab + b^2 + ac}{(a+b)(b+c)}, \tag{17}$$

$$E(|r_2(x) - y| \mid y = 3) = 2P(r_2(x) = 1 \mid x) + P(r_2(x) = 2 \mid x) = \frac{2a + b}{a + b + c}. \tag{18}$$

Thus, for Ranking-CNN it follows that

$$E(|r_1(x) - y|) = \sum_{i=1}^{3} E(|r_1(x) - i| \mid y = i) = 2 + \frac{ab + bc}{(a+b)(b+c)}. \tag{19}$$

Similarly, for softmax,

$$E(|r_2(x) - y|) = \sum_{i=1}^{3} E(|r_2(x) - i| \mid y = i) = 2 + \frac{a + c}{a + b + c}. \tag{20}$$

Since

$$\frac{a + c}{a + b + c} - \frac{ab + bc}{(a+b)(b+c)} = \frac{ac(a + c)}{(a+b)(b+c)(a+b+c)} > 0, \tag{21}$$

we conclude that

$$E(|r_1(x) - y|) < E(|r_2(x) - y|). \tag{22}$$

Furthermore, the cases for $K = 4, 5, \dots$ can be shown in a similar way by induction. However, when the number of classes $K$ increases, the analytic expression of the distribution for each class $i = 1, \dots, K$ becomes

$$P(r(x) = i) = \sum_{A \in F_i} \prod_{j \in A} p_j \prod_{j \in A^c} (1 - p_j), \tag{23}$$

which follows a Poisson-Binomial distribution, where $p_j = P(f_j(x) > 0 \mid x)$, $A$ is a subset of the classifier indices $\{1, 2, \dots, K-1\}$ on which the decisions are positive, $A^c$ is its complement, and $F_i$ collects the subsets with $|A| = i - 1$; notice that $F_i$ represents a combinatorial number of possible cases. Then computing the expected value becomes dreadful, since listing all the probabilities as we did in Theorem 2 seems impractical. Though Le Cam et al. [23] gave an approximation of the Poisson-Binomial by a Poisson distribution, the computation of

$$E(|r(x) - y|) = \sum_{y=1}^{K} \sum_{r=1}^{K} |y - r| \, P(r(x) = r, y) \tag{24}$$

is still unrealistic. So, we generalize with the help of learning theory.

Theorem 3. Suppose the VC dimension of each basic CNN classifier's hypothesis space $H$ is $d$, and the sample size for training is $m$. Then for any $\delta \in (0, 1]$, with probability at least $1 - \delta$, the expected error of Ranking-CNN is upper bounded as follows:

$$E_D[|r(x) - y|] \le \max_k \hat e_k(x) + 2\sqrt{\frac{d \log(2m) + \log(1/\delta)}{m}}, \tag{25}$$

where $\hat e_k(x)$ denotes the empirical value of $E_D[e_k(x)]$.

Proof. Taking expectations on both sides of Eq. (4), we get

$$E_D[|r(x) - y|] \le E_D\big[\max_k e_k(x)\big]. \tag{26}$$

Using Vapnik-Chervonenkis theory [35], the desired result follows.

Remark 4. Notice that the expected error of Ranking-CNN is bounded by the maximum training error produced by its basic CNNs with binary output, plus a term associated with the VC dimension. Since the VC dimension $d$ of a softmax-output CNN is greater than that of a basic CNN presented in Fig. 2 [32] if the weights of the previous layers are fixed, a CNN with a softmax output layer has a greater second term on the right-hand side of Eq. (25). It follows that, given the same training samples, Ranking-CNN is more likely to attain a smaller error by minimizing the training errors (the first term in Eq. (25)) than the one with a softmax output. The error bound in Eq. (25) provides solid support for our framework. We will further verify this conclusion in the sense of statistical significance by t-test in the experiment section.

3.3. Age Estimation

When humans predict a person's age, it is generally easier to determine whether a person is older than a specific age than to directly give an exact age. Ranking-CNN provides a framework for simultaneous feature learning and age ranking based on facial images. The rationale for using Ranking-CNN for age estimation is that age labels are naturally ordinal, and Ranking-CNN can keep the relative ordinal relationship among different age groups.

The age and gender information of the selected samples is shown in Table 1. Note that these images are not used in the pre-training stage. All the selected samples are then divided into two sets: 80% of the samples are used for basic network training and the remaining 20% for testing. There is no overlap between the training and testing sets, and we use 5-fold cross-validation to evaluate the performance during the experiments.

Table 1. The age and gender information of the 54,362 samples randomly selected from MORPH Album 2.

            <20     20-29   30-39   40-49   >50     Total
  Male      6543    13849   12322   9905    3321    45940
  Female    829     2291    2886    1975    441     8422
  Total     7372    16140   15208   11880   3762    54362

We adopt a general preprocessing procedure for face detection and alignment before feeding the raw data to the networks. Specifically, given an input color image, we first perform face detection using Haar-based cascade classifiers [36]. Then, face alignment is conducted based on the locations of the eyes. Finally, the image is resized to a standard size of 256x256x3 for network training and age estimation.

4. Experiments

In this section, we demonstrate the performance of Ranking-CNN through extensive experiments. We implemented the Ranking-CNN architecture in GPU mode with Caffe [17]. The 3+3 architecture of a basic CNN shown in Fig. 2 is derived from a simplified version of the ImageNet CNN [19], with fewer layers for higher efficiency [25]. The network is initialized with random weights following a Gaussian distribution with mean 0 and standard deviation 0.01.

For our hardware settings, we use a single GTX 980 graphics card (with 2,048 CUDA cores), an i7-4790K CPU, 32 GB of RAM, and a 2 TB hard disk drive. The training time for the base CNN with the selected 3+3 architecture is around 6 hours. Fine-tuning takes about 20 to 30 minutes for each basic CNN. In total, it takes about 30 hours to pre-train the base CNN and fine-tune the 50 basic CNNs.
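The closed-form expectations in Theorem 2 can be checked by direct enumeration. The sketch below is illustrative (not the authors' code); the adjacent-pair probabilities follow Eq. (10). It computes both expected errors for arbitrary a, b, c > 0 and confirms that the ranking error is smaller:

```python
def expected_errors(a, b, c):
    """Sum over y = 1..3 of E(|r(x) - y| | y) for the ranking rule and
    for softmax, by enumerating all outcomes as in the proof of Theorem 2."""
    p1, p2 = b / (a + b), c / (b + c)  # adjacent-pair probabilities, Eq. (10)
    s = a + b + c
    rank_err = soft_err = 0.0
    for y in (1, 2, 3):
        for d1 in (0, 1):              # d_k = [f_k(x) > 0]
            for d2 in (0, 1):
                pr = (p1 if d1 else 1 - p1) * (p2 if d2 else 1 - p2)
                rank_err += pr * abs(1 + d1 + d2 - y)  # r_1(x) = 1 + d1 + d2
        for i, ai in enumerate((a, b, c), start=1):    # softmax posterior
            soft_err += (ai / s) * abs(i - y)
    return rank_err, soft_err

r_err, s_err = expected_errors(1.0, 2.0, 3.0)
print(r_err < s_err)  # -> True, as Eq. (22) predicts
```

For a = 1, b = 2, c = 3 the enumeration reproduces the closed forms of Eqs. (19) and (20): 2 + 8/15 for ranking versus 2 + 4/6 for softmax.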
First, we pre-train a base network with 26,580 image samples from the unfiltered faces dataset [9]; the age-group labels of these images are used in training as surrogate labels [18]. Then, we fine-tune our Ranking-CNN model on the most commonly used age estimation benchmark dataset, MORPH Album 2 [30]. MORPH contains 55,134 facial images with ages ranging from 16 to 77. Following the settings used in some recent work on age estimation [29, 37-43], we randomly select 54,362 samples in the age range between 16 and 66 from the MORPH dataset.

4.1. Evaluation metrics

For multiple age estimation, we compared the features learned by Ranking-CNN with those obtained through BIF+OLPP [15], ST [2], and a multi-class CNN. BIF features are implemented with Gabor filters in 8 orientations and 8 scales, followed by max-pooling. In addition, OLPP is employed to learn the age manifold based on the BIF features, in which the top 1,000 eigenvectors are used. In ST, the Gabor coefficients are scattered into 417 routes in two convolutional layers and pooled with Gaussian smoothing. The multi-class CNN is commonly used for age estimation [25, 39], but it completely ignores the ordinal information in age labels. Its structure is similar to a basic CNN (three convolutional and pooling layers and three fully connected layers), with the exception that the last fully connected layer contains multiple outputs corresponding to the number of ages to be classified instead of binary ones. As for the age estimators, SVM is selected for comparison due to its proven performance [15]. In the ranking-based approach (Ranking-SVM), following [2], an SVM is used as the binary classifier for each age label, and the results are aggregated to give the final output.

The comparison and evaluation of the different methods in our experiments are reported in terms of the accuracy of each binary ranker as well as two widely adopted performance measures [29, 2]: Mean Absolute Error (MAE) and Cumulative Score (CS). MAE computes the absolute cost between the exact and the predicted ages (the lower the better): $\mathrm{MAE} = \sum_i e_i / M$, where $e_i = |\hat l_i - l_i|$ is the absolute cost of misclassifying the true label $l_i$ as $\hat l_i$, and $M$ is the total number of testing samples.

Table 2. Comparison of MAE among different combinations of features and estimators. The lowest MAE is highlighted in bold. A dash in the table means that the selected feature is not applicable to the selected estimator.

                                   Engineered features      Learned features
                                   BIF+OLPP   ST            CNN feature   Ranking-CNN feature
  Classification  SVM              4.00       5.15                        -
  models          Multi-class CNN  -          -             3.65          -
  Ranking         Ranking-SVM      5.03       4.88          -             3.63
  models          Ranking-CNN      -          -             -             2.96
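Both measures are only a few lines of code. The sketch below (with hypothetical helper names, not tied to the paper's implementation) mirrors the MAE and CS definitions above:

```python
def mae(pred, true):
    """Mean Absolute Error: sum_i |pred_i - true_i| / M (lower is better)."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def cs(pred, true, L):
    """Cumulative Score: fraction of samples with |pred_i - true_i| <= L
    (higher is better)."""
    return sum(1 for p, t in zip(pred, true) if abs(p - t) <= L) / len(true)

pred, true = [25, 30, 41, 18], [23, 30, 45, 20]
print(mae(pred, true))    # (2 + 0 + 4 + 2) / 4 = 2.0
print(cs(pred, true, 2))  # 3 of 4 errors are <= 2 -> 0.75
```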
CS indicates the percentage of data correctly classified within the range $[l_i - L, l_i + L]$, a neighborhood of the exact age $l_i$ (the larger the better): $\mathrm{CS}(L) = \sum_i [e_i \le L] / M$, where $[\cdot]$ is the truth-test operator and $L$ is a parameter representing the tolerance range.

Also, we use a paired t-test to demonstrate the statistical significance of our empirical comparisons, i.e., to determine whether Ranking-CNN significantly outperforms the other methods. A two-sample t-statistic with unknown but equal variances is computed.

4.2. Age Estimation Results

In this section, we consider the age estimation problem in the range between 16 and 66 years old and compare Ranking-CNN with other state-of-the-art feature extractors and age estimators. As there are 51 age groups in this range, 50 binary rankers are needed for the ranking approaches (i.e., Ranking-CNN and Ranking-SVM). In our experiments, 43,490 samples (80% of all the randomly selected samples) with binary labels are used to train each basic network or SVM in Ranking-CNN and Ranking-SVM, respectively. Exactly the same set of samples, with multi-class labels, is used to train the multi-class CNN and the SVM. The remaining 10,872 samples are used for testing. All experiments are carried out with 5-fold cross-validation.

Basically, we have three sets of features: engineered features (i.e., BIF+OLPP and ST), learned classification features (multi-class CNN), and learned ranking features (Ranking-CNN); and two sets of age estimators: classification methods (i.e., SVM and multi-class CNN) and ranking methods (Ranking-CNN and Ranking-SVM). We report the MAE of all possible combinations of feature extractors and age estimators (eight in total) in Table 2; a dash in the table means that the selected feature set is not applicable to the selected estimator.

As shown in Table 2, Ranking-CNN with its own features achieves the lowest MAE, 2.96, among all the combinations. Ranking-CNN features with Ranking-SVM achieve the second-best MAE, which validates the effectiveness and generality of Ranking-CNN features. In comparison, the lowest MAE achieved by the learned classification features is 3.65. Note that the multi-class CNN represents the commonly used CNN-based age estimation methods [25, 39]. Our experimental results strongly support the theoretical results (ranking vs. softmax) presented in Section 3.2.3. Another observation is that the performance of CNN-based features is weakened when combined with SVM-based estimators. The lowest MAE achieved by engineered features is 4.88, by ST+Ranking-SVM. Notice that ST works better with Ranking-SVM, and BIF+OLPP works better with SVM. This could be caused by the fact that, in the literature, specific features were manually selected for certain estimators to achieve the best performance.

In Table 3, we compare Ranking-CNN with the most recent age estimation models, i.e., Ordinal Regression with CNN (OR-CNN) and Metric Regression with CNN (MR-CNN) [29], and Deep EXpectation (DEX) [31]. Since the experiments are all carried out on the MORPH dataset and we followed the settings in [29] for the data partition, we can directly compare the MAE of Ranking-CNN with the ones obtained by MR-CNN, OR-CNN, and DEX. Clearly, Ranking-CNN outperforms MR-CNN, OR-CNN, and DEX, and significantly improves the performance of age estimation.

Table 3. Comparison with MR-CNN, OR-CNN and DEX on the MORPH dataset. The lowest MAE is highlighted in bold.

         Ranking-CNN   MR-CNN   OR-CNN   DEX
  MAE    2.96          3.27     3.34     3.25

The comparison in terms of CS of the eight combinations of features and estimators is given in Fig. 3. Clearly, Ranking-CNN outperforms all the others across the entire range of L (the age error tolerance) from 0 to 10.
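The two-sample statistic with unknown but equal variances mentioned above has a standard closed form; a minimal sketch (illustrative, not the authors' evaluation code) is:

```python
import math

def pooled_t_statistic(x, y):
    """Two-sample t statistic assuming equal variances:
    t = (mean(x) - mean(y)) / (s_p * sqrt(1/n + 1/m)),
    where s_p^2 is the pooled sample variance."""
    n, m = len(x), len(y)
    mx, my = sum(x) / n, sum(y) / m
    vx = sum((v - mx) ** 2 for v in x) / (n - 1)
    vy = sum((v - my) ** 2 for v in y) / (m - 1)
    sp = math.sqrt(((n - 1) * vx + (m - 1) * vy) / (n + m - 2))
    return (mx - my) / (sp * math.sqrt(1 / n + 1 / m))

# Hypothetical per-fold MAEs for two methods: a clearly lower first sample
# gives a negative t, supporting rejection of the null hypothesis.
print(pooled_t_statistic([2.9, 3.0, 3.1], [3.5, 3.6, 3.7]) < 0)  # -> True
```

The p-value is then obtained from the Student-t distribution with n + m - 2 degrees of freedom.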
Figure 3. Comparison on Cumulative Score with L in [0, 10].

Specifically, Ranking-CNN reaches an accuracy of 89.90% for L = 6 and 92.93% for L = 7. Another observation is that the four CNN-based methods reach a higher accuracy than the others throughout the tolerance range.

Last, to demonstrate that the experimental results we obtained do not happen simply by chance, we report in Table 4 the p-values from the paired t-test at the 1% significance level. In Table 4, if p < 1%, we reject the null hypothesis; otherwise, we do not. For example, when comparing "Ranking-CNN" with "Ranking-CNN feature + Ranking-SVM", the p-value 6.36e-148 is much less than 0.01, which means that we reject the null hypothesis that "the performance of Ranking-CNN is not significantly improved". A "NaN" in the table means a method cannot be compared with itself. As we can see, statistically, Ranking-CNN significantly outperforms all the other methods, which implies that if we repeated the experiments numerous times, then in 99% of those experiments Ranking-CNN would significantly outperform. From the table, Ranking-CNN Feature+Ranking-SVM and the multi-class CNN tied for second place, followed by CNN Feature+SVM. ST+Ranking-SVM stands out among the engineered-feature-based methods. Lastly, BIF+OLPP+Ranking-SVM ties with BIF+OLPP+SVM, and ST+SVM shows no significant improvement over any other method.

Table 4. t-test outcomes of all eight combinations of features and estimators. Models #1 to #8 are, in order: Ranking-CNN, Ranking-CNN feature+Ranking-SVM, ST+Ranking-SVM, BIF+OLPP+Ranking-SVM, multi-class CNN, CNN feature+SVM, ST+SVM, and BIF+OLPP+SVM. (Matrix of pairwise p-values.)

In Fig. 4, we further compare the four ranking-based methods and report their performance on each binary ranker. Again, Ranking-CNN demonstrates consistently outstanding performance throughout all the binary problems. Note that when the data for a binary ranker are unbalanced (and the baseline accuracy is therefore higher, e.g., age < 20 and age > 48), all rankers seem to perform quite well. However, in the age range with more balanced data (and thus lower baseline accuracy, ages 20-48), the superior performance of Ranking-CNN shows, and this leads to better overall age estimation performance.

Figure 4. Accuracy of each binary ranker in the ranking models.

5. Conclusion

In this paper, we proposed Ranking-CNN, a novel deep ranking framework for age estimation. We established a much tighter error bound for ranking-based age estimation and showed rigorously that Ranking-CNN, by taking the ordinal relation between ages into consideration, is more likely to get smaller estimation errors than multi-class classification approaches. Through extensive experiments, we showed that, statistically, Ranking-CNN significantly outperforms other state-of-the-art age estimation methods on benchmark datasets.

Acknowledgment. This work was partially supported by the US National Science Foundation (NSF) under grant CNS-1637312, and by the Ford Motor Company University Research Program under grant 2015-9186R.

References

[1] T. Ahonen, A. Hadid, and M. Pietikainen. Face description with local binary patterns: Application to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(12):2037-2041, 2006.
[2] K.-Y. Chang and C.-S. Chen. A learning framework for age rank estimation based on face images with scattering transform. IEEE Transactions on Image Processing, 24(3):785-798, 2015.
[3] K.-Y. Chang, C.-S. Chen, and Y.-P. Hung. Ordinal hyperplanes ranker with cost sensitivities for age estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 585-592. IEEE, 2011.
[4] K. Chen, S. Gong, T. Xiang, and C. Change Loy. Cumulative attribute space for age and crowd density estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2467-2474, 2013.
[5] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493-2537, Nov. 2011.
[6] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):681-685, 2001.
[7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248-255. IEEE, 2009.
[8] A. Dosovitskiy, J. T. Springenberg, M. Riedmiller, and T. Brox. Discriminative unsupervised feature learning with convolutional neural networks. In Advances in Neural Information Processing Systems, pages 766-774, 2014.
[9] E. Eidinger, R. Enbar, and T. Hassner. Age and gender estimation of unfiltered faces. IEEE Transactions on Information Forensics and Security, 9(12):2170-2179, 2014.
[10] X. Geng, Z.-H. Zhou, and K. Smith-Miles. Automatic age estimation based on facial aging patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(12):2234-2240, 2007.
[11] X. Geng, Z.-H. Zhou, Y. Zhang, G. Li, and H. Dai. Learning from facial aging patterns for automatic age estimation. In Proceedings of the 14th Annual ACM International Conference on Multimedia, pages 307-316. ACM, 2006.
[12] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 580-587, 2014.
[13] A. Gunay and V. V. Nabiyev. Automatic age classification with LBP. In Computer and Information Sciences, ISCIS '08, 23rd International Symposium on, pages 1-4. IEEE, 2008.
[14] G. Guo, Y. Fu, C. R. Dyer, and T. S. Huang. Image-based human age estimation by manifold learning and locally adjusted robust regression. IEEE Transactions on Image Processing, 17(7):1178-1188, 2008.
[15] G. Guo, G. Mu, Y. Fu, and T. S. Huang. Human age estimation using bio-inspired features. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 112-119, 2009.
[16] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
[17] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
[18] N. Kalchbrenner, E. Grefenstette, and P. Blunsom. A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188, 2014.
[19] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097-1105, 2012.
[20] Y. H. Kwon and N. D. V. Lobo. Age classification from facial images. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pages 762-767, 1994.
[21] A. Lanitis, C. Draganova, and C. Christodoulou. Comparing different classifiers for automatic age estimation. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 34(1):621-628, 2004.
[22] A. Lanitis, C. J. Taylor, and T. F. Cootes. Toward automatic simulation of aging effects on face images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(4):442-455, 2002.
[23] L. Le Cam et al. An approximation theorem for the Poisson binomial distribution. Pacific J. Math, 10(4):1181-1197, 1960.
[24] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
[25] G. Levi and T. Hassner. Age and gender classification using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 34-42, 2015.
[26] C. Li, Q. Liu, J. Liu, and H. Lu. Learning ordinal discriminative features for age estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2570-2577. IEEE, 2012.
[27] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3431-3440, 2015.
[28] V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning, pages 807-814, 2010.
[29] Z. Niu, M. Zhou, L. Wang, X. Gao, and G. Hua. Ordinal regression with multiple output CNN for age estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[30] K. Ricanek Jr. and T. Tesafaye. MORPH: A longitudinal image database of normal adult age-progression. In 7th International Conference on Automatic Face and Gesture Recognition, pages 341-345, 2006.
[31] R. Rothe, R. Timofte, and L. Van Gool. Deep expectation of real and apparent age from a single image without facial landmarks. International Journal of Computer Vision, pages 1-14, 2016.
[32] E. D. Sontag. VC dimension of neural networks. NATO ASI Series F: Computer and Systems Sciences, 168:69-96, 1998.
[33] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929-1958, 2014.
[34] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1701-1708, 2014.
[35] V. N. Vapnik and V. Vapnik. Statistical Learning Theory, volume 1. Wiley, New York, 1998.
[36] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, pages I-511. IEEE, 2001.
[37] X. Wang, R. Guo, and C. Kambhamettu. Deeply-learned feature for age estimation. In Applications of Computer Vision (WACV), 2015 IEEE Winter Conference on, pages 534-541. IEEE, 2015.
[38] Z. Yang and H. Ai. Demographic classification with local binary patterns. In International Conference on Biometrics, pages 464-473. Springer, 2007.
[39] D. Yi, Z. Lei, and S. Z. Li. Age estimation by multi-scale convolutional network. In Asian Conference on Computer Vision, pages 144-158. Springer, 2015.
