Deep Learning for Image Memorability Prediction: the Emotional Bias

Table 2: Influence of the object and scene semantics composed of over 100 pictures on the prediction performance of MemoNet 30k. The size and the mean of the memorability scores (GT) for each category are also indicated.

Rank  Category      Size  GT     Performance
1     -              165  0.753  0.655
2     -              554  0.725  0.628
3     -              108  0.667  0.613
4     mountain       272  0.593  0.606
5     painting       101  0.696  0.593
6     -              989  0.718  0.581
7     window         589  0.662  0.580
8     table          212  0.703  0.580
9     sign           147  0.664  0.572
10    -              361  0.668  0.570
11    chair          268  0.712  0.564
12    -              181  0.656  0.554
13    -             1080  0.628  0.550
14    -              814  0.630  0.548
15    plant          417  0.640  0.543
16    floor          766  0.727  0.537
17    -              269  0.637  0.521
18    -              571  0.713  0.514
19    -              151  0.631  0.511
20    -              297  0.647  0.497
21    sidewalk       163  0.643  0.493
22    -              699  0.630  0.492
23    ceiling lamp   289  0.713  0.491
24    bed            121  0.719  0.489
25    -                -      -  0.477
26    -                -  0.630  0.464
27    -              192  0.649  0.464

4.1 Memorability for IAPS Images

A subset of 150 images randomly selected from the International Affective Picture System (IAPS) dataset [19] is used in this work to analyze the performance of MemoNet for emotional images. Affective scores are available for these images in terms of valence, arousal, and dominance [14, 19]. Valence ranges from negative (e.g., sad, disappointed) to positive (e.g., joyous, elated), whereas arousal can range from inactive (e.g., tired, pensive) to active (e.g., alarmed, angry), and dominance ranges from dominated (e.g., bored, sad) to in control (e.g., excited, delighted).

The experimental protocol presented by Isola et al. [13] is reproduced in a laboratory environment to collect memorability scores for each of the 150 selected images. Participants had to perform a memory task and an emotional rating task.

4.1.1 Memory task

The memory task consists of a memory encoding phase interlaced with a recognition memory test. The task instruction is to press the space bar whenever an image reappears in a sequence of images. A black frame is displayed between each image for 1 second. During the task, 50 targets (i.e., the images repeated once) selected from the subset of 150 images and 200 fillers are displayed, each of them being displayed for 2 seconds. The fillers, composed of other images randomly selected from IAPS, provide spacing between the first display of a target image and its repetition. Images are displayed pseudo-randomly: a target image and its repetition have to be separated by at least 70 images (i.e., 3.30 min) in order to measure memorability corresponding to a long-term memory performance. Whenever the space bar is pushed, the image is framed by a green rectangle to show the participants that their answer is taken into account. Responses that may occur during the 1-second-long inter-stimuli black frame following the target image are also considered. The task starts with a training phase to familiarize the participants with the task.

4.1.2 Emotional rating task

An emotional rating task was set up to collect arousal and valence scores for 100 images displayed for 6 seconds. Before the display of each image, the participant is invited to get prepared for the rating process of the next image. The ratings are collected on the 9-point Self-Assessment Manikin (SAM) scales for arousal and valence [6], which is a powerful and easy-to-use pictorial system regardless of age, educational or cultural background due to its non-verbal design. The images are randomly displayed, for a total task duration of 60 minutes. Similarly to the memorability task, the emotional rating task starts with instructions, followed by a training phase composed of training images spanning the entire arousal-valence emotional space to familiarize the participants with the task but also with the rating scales.

4.1.3 Procedure

The images were displayed on a 40-inch monitor (TV LOGIC LVM401) with a display resolution of 1,920 x 1,080. The participants were seated at a distance of 150 centimeters from the screen (three times the screen height). The 1,024 x 768 images were centered on a black background; at a viewing distance of 150 cm, the stimuli subtended 18.85 degrees of vertical visual angle. Fifty participants (18-41 years of age; mean = 22.54; SD = 5.01; 60% of them female), compensated for their participation, were recruited in Nantes, France. All participants have either normal or corrected-to-normal visual acuity. Correct visual acuity was assured prior to this experiment through near and far vision tests using Parinaud and Monoyer charts respectively. The first experimental phase was then launched, corresponding to the memory task. The next day, participants performed the emotional rating task. For each task, displayed images were selected to ensure that each memorability score is generated from at least 16 annotations and that each affective score is generated from at least 32 annotations.

4.2 Results

From Section 4.1, a set of 150 images selected from IAPS with memorability and emotional scores is created. The 150 images are pre-processed in order to be given as input to MemoNet 30k. Indeed, MemoNet 30k is fed with the resized 224 x 224 center crop of each image. Black bands added to several pictures by Lang et al. [19] to obtain the same ratio for each image are removed before cropping the image. As expected, the global performance of MemoNet 30k for this new dataset is lower than the performance of the model measured on the dataset created by Isola et al. [13] (ρ = 0.251; MSE = 0.033).

A local performance of MemoNet 30k is defined as the difference between the ground truth memorability score and the mean of the predictions from the 25 trained models for an image. Table 3 shows the rank correlation between the local performances of MemoNet 30k and the affective scores collected either in the experiment detailed in Section 4.1, or in previous work [14, 19]. These correlations measure the relation between the considered emotional dimension and the fact that the model predicts a memorability score higher or lower than the ground truth. Please note that the local performances using Ito et al.'s data are computed on the subset of images from IAPS selected both in this work and in Ito et al.'s work, corresponding to 104 images.

Table 3: Spearman's rank correlation coefficient (ρ) between the local performances of MemoNet 30k for the subset of IAPS images and their affective labels (*p < .05; **p < .01).

Dimension   Data origin        ρ
Valence     Lang et al. [19]   -0.285
            Ito et al. [14]    -0.248**
            Ours               -0.284
Arousal     Lang et al. [19]    0.096
            Ito et al. [14]     0.222*
            Ours                0.198*
Dominance   Lang et al. [19]   -0.221

The local performances of MemoNet 30k and the emotional scores collected in this work are shown in Figure 1. The rank correlations exhibit a moderate, but coherent, relationship between the performance of MemoNet 30k and the valence, arousal and dominance scores for the emotional ratings collected in both previous work [14, 19] and our work. Valence is negatively correlated with the local performances of MemoNet 30k, arousal is positively correlated, and dominance exhibits a negative correlation with the local performances of MemoNet 30k. Valence and arousal account for most of the independent variance [11, 18]. Consequently, dominance is not taken into account in the following analysis.

[Figure 1: Local performance of MemoNet 30k with (a) valence and (b) arousal scores collected in our experiment for each of the 150 images selected from IAPS, as well as (c) the result of the k-means clustering (k = 3).]

To provide a deeper analysis of the relationship between the accuracy of the predicted memorability scores and the affective properties of the pictures collected in this work, the k-means clustering algorithm is used to separate the 150 images into three clusters in the valence-arousal space (see Figure 1(c)). The three clusters separate the images inducing arousing and negative emotions (cluster 1) from the images inducing neutral emotions (cluster 2) and from those inducing moderately arousing and positive emotions (cluster 3). It is important to note that the arousal and valence scores of the 150 images are significantly correlated (r = -0.522, t(148) = 7.445, p < .0001). A one-way ANOVA revealed that the local performances of MemoNet 30k are significantly different for the three clusters (F(2, 147) = 5.82; p < .005). Tukey's multiple comparison test confirmed that the local performances for the pictures in the first cluster are on average closer to zero, i.e., the optimal performance (mean = -0.0298), than for images in the second (mean = -0.0536) or third (mean = -0.0847) clusters. In other words, this result shows that MemoNet 30k has the highest predictive performance for arousing negative pictures. For the other groups (i.e., neutral and positive), the model is less reliable to predict the memorability.

The results suggest that affect should be taken into account in datasets of images labeled with memorability scores to ensure they induce a large variety of emotions. Indeed, emotion and memorability being related, the performance of a memorability model depends on the emotions induced by the images used to train the model. The results also suggest that emotional information could be a valuable feature to increase the performance of the model for neutral and positive pictures, especially as it is possible to computationally infer emotional information from pictures [21, 25].

5. CONCLUSIONS

The study reported in this paper focuses on image memorability prediction using deep learning. The proposed model significantly outperforms previous work and obtains a 32.78% relative increase in performance compared to the best-performing model from the state of the art. An experimental protocol has also been set up to collect memorability and emotional scores in a laboratory environment. However, the generalization of the performance of the deep learning model to this new dataset is a mitigated success. In particular, an emotional bias appears to influence the performance of the proposed model: the deep learning framework obtains a higher predictive performance for arousing negative pictures than for neutral or positive ones. This underlines the importance for an image dataset used for memorability prediction to consist in images appropriately distributed within the emotional space.

Because memorability is also subjective, the memorability prediction is doomed to inaccuracy if one is only interested in the intrinsic information of the images. Our current work focuses on the integration of context-dependent and observer-dependent information for the purpose of personalizing the memorability prediction.

6. REFERENCES

[1] W. C. Abraham and A. Robins. Memory retention - the synaptic stability versus plasticity dilemma. Trends in Neurosciences, 28(2):73-78, 2005.
[2] J. Abrisqueta-Gomez, O. F. A. Bueno, M. G. M. Oliveira, and P. H. F. Bertolucci. Recognition memory for emotional pictures in Alzheimer's patients. Acta Neurologica Scandinavica, 105(1):51-54, 2002.
[3] B. Ans and S. Rousset. Neural networks with a self-refreshing memory: knowledge transfer in sequential learning tasks without catastrophic forgetting. Connection Science, 12(1):1-19, 2000.
[4] W. A. Bainbridge, P. Isola, and A. Oliva. The intrinsic memorability of face photographs. Journal of Experimental Psychology: General, 142(4):1323-1334, 2013.
[5] M. M. Bradley, M. K. Greenwald, M. C. Petry, and P. J. Lang. Remembering pictures: pleasure and arousal in memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18(2):379-390, 1992.
[6] M. M. Bradley and P. J. Lang. Measuring emotion: the self-assessment manikin and the semantic differential. Journal of Behavior Therapy and Experimental Psychiatry, 25(1):49-59, 1994.
[7] L. Cahill and J. L. McGaugh. A novel demonstration of enhanced memory associated with emotional arousal. Consciousness and Cognition, 4(4):410-421, 1995.
[8] M. J. Choi, J. J. Lim, A. Torralba, and A. S. Willsky. Exploiting hierarchical context on a large database of object categories. In 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 129-136, June 2010.
[9] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 580-587, June 2014.
[10] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In International Conference on Artificial Intelligence and Statistics, pages 249-256, 2010.
[11] M. K. Greenwald, E. W. Cook, and P. J. Lang. Affective judgment and psychophysiological response: dimensional covariation in the evaluation of pictorial stimuli. Journal of Psychophysiology, 3(1):51-64, 1989.
[12] P. Isola, J. Xiao, D. Parikh, A. Torralba, and A. Oliva. What makes a photograph memorable? IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(7):1469-1482, 2014.
[13] P. Isola, J. Xiao, A. Torralba, and A. Oliva. What makes an image memorable? In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
[14] T. A. Ito, J. T. Cacioppo, and P. J. Lang. Eliciting affect using the International Affective Picture System: Trajectories through evaluative space. Personality and Social Psychology Bulletin, 24(8):855-879, 1998.
[15] E. A. Kensinger, B. Brierley, N. Medford, J. H. Growdon, and S. Corkin. Effects of normal aging and Alzheimer's disease on emotional memory. Emotion, 2(2):118-134, 2002.
[16] T. Konkle, T. F. Brady, G. A. Alvarez, and A. Oliva. Scene memory is more detailed than you think: the role of categories in visual long-term memory. Psychological Science, 21(11):1551-1556, 2010.
[17] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097-1105. Curran Associates, Inc., 2012.
[18] P. J. Lang, M. M. Bradley, and B. N. Cuthbert. International Affective Picture System (IAPS): Technical manual and affective ratings. NIMH Center for the Study of Emotion and Attention, pages 39-58, 1997.
[19] P. J. Lang, M. M. Bradley, and B. N. Cuthbert. International Affective Picture System (IAPS): Affective ratings of pictures and instruction manual. Technical Report A-8, 2008.
[20] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541-551, 1989.
[21] N. Liu, E. Dellandrea, B. Tellez, and L. Chen. Associating textual features with visual ones to improve affective image classification. In International Conference on Affective Computing and Intelligent Interaction, pages 195-204, Oct 2011.
[22] M. Mancas and O. Le Meur. Memorability of natural scenes: the role of attention. In 2013 20th IEEE International Conference on Image Processing (ICIP), pages 196-200, Sept 2013.
[23] J. L. McClelland, B. L. McNaughton, and R. C. O'Reilly. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3):419-457, 1995.
[24] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1-9, June 2015.
[25] W. Wang and Q. He. A survey on emotional semantic image retrieval. In 2008 15th IEEE International Conference on Image Processing, pages 117-120, Oct 2008.
[26] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3485-3492, June 2010.
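The significance test reported in Section 4.2 for the valence-arousal correlation (r = -0.522, t(148) = 7.445) is consistent with the standard conversion of a Pearson r into a t statistic with n - 2 = 148 degrees of freedom:

```latex
t = \frac{|r|\sqrt{n-2}}{\sqrt{1-r^2}}
  = \frac{0.522\,\sqrt{148}}{\sqrt{1-0.522^2}}
  \approx \frac{6.351}{0.853}
  \approx 7.44
```

which matches the reported value up to rounding.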
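Section 4.2 states that MemoNet 30k is fed the resized 224 x 224 center crop of each image. A minimal sketch of such a pre-processing step is given below; the helper name and the shorter-side-first resize policy are assumptions for illustration, not the authors' exact pipeline.

```python
from PIL import Image


def center_crop_224(path):
    """Resize so the shorter side is 224 px, then take the 224 x 224 center crop.

    The shorter-side-first resize is an assumption; the paper only states
    that the resized 224 x 224 center crop of each image is used.
    """
    img = Image.open(path).convert("RGB")
    w, h = img.size
    scale = 224 / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)), Image.BILINEAR)
    w, h = img.size
    left, top = (w - 224) // 2, (h - 224) // 2
    return img.crop((left, top, left + 224, top + 224))
```

In the paper's setup, the black bands added by Lang et al. [19] would be removed before this crop is taken, so the 224 x 224 window covers actual picture content.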
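The local-performance measure of Section 4.2 (ground truth minus the mean of the 25 models' predictions) and the rank correlations of Table 3 can be sketched as follows; the arrays here are synthetic stand-ins, not the actual MemoNet 30k predictions or the collected affective ratings.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Synthetic stand-ins: ground-truth memorability scores for 150 images
# and predictions from 25 trained models (one row per model).
ground_truth = rng.uniform(0.4, 0.9, size=150)
predictions = ground_truth + rng.normal(0.0, 0.05, size=(25, 150))

# Local performance: ground truth minus the mean prediction per image.
local_perf = ground_truth - predictions.mean(axis=0)

# Spearman rank correlation between local performances and an affective
# dimension (a synthetic 9-point valence score), as in Table 3.
valence = rng.uniform(1, 9, size=150)
rho, p_value = spearmanr(local_perf, valence)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")
```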
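The cluster analysis of Section 4.2 (k-means with k = 3 in the valence-arousal space, a one-way ANOVA on the local performances across clusters, and Tukey's post-hoc comparison) can be sketched along these lines; all ratings and scores below are synthetic placeholders rather than the experimental data.

```python
import numpy as np
from scipy.stats import f_oneway, tukey_hsd
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Synthetic valence/arousal ratings on 9-point SAM scales for 150 images,
# plus a synthetic local-performance value per image.
valence = rng.uniform(1, 9, size=150)
arousal = rng.uniform(1, 9, size=150)
local_perf = rng.normal(-0.05, 0.03, size=150)

# Partition the images into three clusters in the valence-arousal space.
features = np.column_stack([valence, arousal])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)

# One-way ANOVA: do local performances differ across the three clusters?
groups = [local_perf[labels == k] for k in range(3)]
f_stat, p_value = f_oneway(*groups)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.3f}")

# Pairwise post-hoc comparison (Tukey's HSD, SciPy >= 1.8).
print(tukey_hsd(*groups))
```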
