A survey: facial micro-expression recognition


This survey reviews the definition and current research status of facial micro-expression recognition, summarizes key techniques in micro-expression recognition, and discusses open problems and possible research directions.
Multimed Tools Appl (2018) 77:19301–19325

...clue for lie detection. For example, when suspects are being questioned, a micro-expression fleeting across the face can tell the police that a criminal is pretending to be innocent. It can also help border security officers identify suspicious behavior of individuals during routine interviews when checking for potential dangers. In psychotherapy, micro-expressions have proved very helpful for understanding the genuine emotions of patients. Micro-expression recognition systems are sometimes also used as an additional module for user authentication [44]. In other fields, such as marketing and distance learning, micro-expressions can be used to reflect human reactions and feedback to advertisements, products, services and learning materials.

This paper was compiled with the intention of providing a comprehensive survey of the existing micro-expression recognition methods along with their outcomes, to offer researchers a convenient introduction to recent developments in this domain. The remainder of the paper is organized into five sections. Section 2 highlights the factors that affect the recognition accuracy of micro-expressions. Section 3 discusses existing methods of micro-expression recognition. Section 4 discusses the specifications and properties of the current micro-expression databases. Section 5 compares the existing studies conducted on different databases. Finally, Section 6 points out the challenges, open issues and future directions in micro-expression recognition.

2 Factors influencing recognition of micro-expressions

A micro-expression is contained in the flow of expressions when individuals are trying to repress their emotions.
According to studies in [45, 66], certain factors affect the recognition of micro-expressions.

2.1 Emotional context

Existing studies have employed neutral expressions before and after the emotion. The research indicates that micro-expressions may be embedded not only in neutral expressions but also in other facial expressions such as sadness and happiness. According to emotional regulation theory [66], in a priming task, primes presented for a longer duration may lead to a greater priming effect. Moreover, it is observed that emotional information influences attention [66]. The aims of this research are: 1) to investigate the effect of emotional context on micro-expressions; 2) to explore whether the effect of the context is limited to particular material; and 3) to investigate the reason for the effect. The findings lead researchers to predict that the emotional context would indeed influence micro-expression recognition.

2.2 Duration of expression

The most significant difference between a micro- and a macro-expression is the length for which the expression lasts. There have been many different estimates of the duration of a micro-expression; thus, there is still a lack of consensus about the time range of a micro-expression. Although the difference in duration might not be significantly noticeable for micro-expressions, it needs to be taken into account. To verify the effect of duration on micro-expression recognition, the researchers conducted two experiments [45], asking the participants to recognize the micro-expressions in the images shown to them. In Experiment 1, expression images were shown to participants for 40, 120, 200 or 300 ms.
The researchers employed the Brief Affect Recognition Test (BART) for Experiment 1. In Experiment 2, the participants were given micro-expression recognition training using the Micro Expression Training Tool (METT) paradigm, which played a significant role in the recognition of the micro-expressions. The outcome of the experiments indicated that the participants could recognize the micro-expressions in the images at 200 ms without training and at 160 ms after training. The results suggest that the critical time point that differentiates micro-expressions is about 200 ms or less. Thus, in conclusion, the accuracy of micro-expression recognition is a function of the duration of the expressions.

3 Existing methods of micro-expression recognition

Micro-expression recognition systems are developed by considering many factors and parameters. Many studies have been undertaken, and are still ongoing, to deliver better recognition accuracy. In this paper, we break down micro-expression recognition systems into their fundamental components, as shown in Fig. 1: face detection, pre-processing, facial feature extraction, classification and databases. We discuss the role of each element in detail below.

3.1 Face detection

Face detection is the primary stage of the recognition process: human face(s) are located in digital images or image sequences. This step is useful for selecting the region(s) of interest (ROI) in the images; in the case of image sequences, the ROI is selected in the first frame and the face is tracked in the remaining frames. Several face detection methods have been applied to date [26, 43, 48-50]. Some of these techniques are summarized here. Yeasin et al. [65] used an automated face detection method, based on the work of Rowley et al. [43], to segment the face region. Matsugu et al. [38] adapted a convolutional neural network for detecting the face, with a rule-based algorithm used for classification. Viola et al.
[49] introduced, in 2001, the first framework to provide competitive detection rates in real time. This framework is capable of processing images rapidly while achieving high detection rates. The algorithm has four stages: 1) Haar feature selection; 2) creating an integral image; 3) AdaBoost training; and 4) cascade classifiers. Fig. 2 illustrates the four distinct types of features used in the framework. The value of a given feature is the sum of the pixels within the white rectangles subtracted from the sum of the pixels within the grey rectangles.

Fig. 1 A framework for micro-expression recognition analysis (image/video acquisition, face detection, pre-processing, facial feature extraction, and classification against databases, yielding the micro-expression class)

Fig. 2 Feature types used by Viola-Jones [49]

The studies [8, 22] show the use of the Viola-Jones method for pre-processing. The raw image is pre-processed and cropped using Haar features. They implemented face detection using Haar features for facial features, which are then classified using a Classification And Regression Tree (CART). The CascadeObjectDetector system defined in the MATLAB Computer Vision Toolbox is used; this object detector has several built-in detectors (eye, nose and mouth detectors). The function draws a bounding box around the face in the given image. The image is cropped using this bounding box and then resized to allow faster computation of the feature descriptors. The Viola-Jones algorithm is also implemented in OpenCV as cvHaarDetectObjects (a cascade of classifiers), which is used by the researchers in [46].

3.2 Pre-processing

Pre-processing is the common name for operations performed on images at the lowest level. The aim is to improve the image data by suppressing unwanted distortions or enhancing features needed for further processing.
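The integral-image step of the Viola-Jones pipeline (stage 2 above) is what makes Haar feature evaluation fast: once the integral image is built, the sum over any rectangle costs four array lookups regardless of its size. A minimal NumPy sketch (function names are my own, not OpenCV's API; sign conventions for the feature value vary between implementations):

```python
import numpy as np

def integral_image(img):
    """Cumulative sum over rows and columns; ii[y, x] holds the sum of
    all pixels above and to the left of (y, x), inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, y, x, h, w):
    """Sum of pixels in the h-by-w rectangle with top-left corner (y, x),
    computed from the integral image in at most four lookups."""
    total = ii[y + h - 1, x + w - 1]
    if y > 0:
        total -= ii[y - 1, x + w - 1]
    if x > 0:
        total -= ii[y + h - 1, x - 1]
    if y > 0 and x > 0:
        total += ii[y - 1, x - 1]
    return total

def two_rect_haar(ii, y, x, h, w):
    """Value of a two-rectangle (left/right) Haar-like feature:
    sum of the left half minus sum of the right half."""
    half = w // 2
    return rect_sum(ii, y, x, h, half) - rect_sum(ii, y, x + half, h, half)
```

A two-rectangle feature responds strongly where a bright region sits beside a dark one, such as the eye/cheek boundary; the full detector combines thousands of such features, weighted by AdaBoost, into a cascade.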
The sequences for micro-expressions are of very short duration, and the intensity of the facial movements is low. Several methods have been implemented to normalize the input data so that sufficient details about the micro-expression are extracted for further processing. Some of the notable pre-processing methods are discussed below.

1) Temporal interpolation model (TIM)

The Temporal Interpolation Model (TIM) is used to increase short video lengths [23, 31, 40, 41, 54]. TIM uses graph embedding to interpolate images at arbitrary positions within micro-expressions. This interpolation allows a sufficient number of frames to be input to the feature descriptor. TIM is a manifold-based interpolation method that fits a curve in a low-dimensional space after embedding the image sequence. In Fig. 3, a micro-expression video is represented as a set of images sampled along this curve, creating a low-dimensional manifold by delineating the micro-expression video as a path of a graph with vertices. The interpolated frames are mapped back to the high-dimensional space to form the temporally normalized image sequence [34].

Fig. 3 (a) An example of a micro-expression being interpolated through graph embedding; (b) the temporal interpolation method. The video is represented on a curve along which a new video is sampled [34]

2) Integral projection

Huang et al. [23] proposed a new framework that obtains horizontal and vertical projections using an integral projection method based on image differences, which helps preserve the shape attributes of facial images. An integral projection generates a one-dimensional pattern by summing a given set of pixels along a given direction. Integral projections can extract a common structure for the same person.
In a micro-expression video clip, supposing that one frame is neutral, the difference between the neutral face image and each expression image derives new images. These derived facial images help reduce the influence of face identity on recognition methods. The integral projection itself does not describe the appearance and motion of facial images; it is therefore combined with a feature extraction method, e.g. LBP-TOP (as discussed in Section 3.3), to obtain appearance and motion features. To preserve sufficient information in the process of projection, a new spatiotemporal method based on integral projection is introduced in [23]; hence, the method is called Spatiotemporal Local Binary Pattern with Integral Projection (STLBP-IP). Fig. 4 shows the procedure for encoding the integral projection using LBP. STLBP-IP achieves state-of-the-art performance compared to TIM.

Fig. 4 The procedure of encoding difference-image based integral projection on the spatial domain [23]

3) Colour space model

Colour is a fundamental aspect of human perception, and its effects on cognition and behaviour have attracted the interest of many generations of researchers. Recent research has revealed that colour may supply useful information for face recognition. Wang et al. [51] demonstrated a Tensor Discriminant Color Space (TDCS) model that uses a 3rd-order tensor to represent a color facial image. To make the model robust to noise, they [52] also used an elastic net to propose a Sparse Tensor Discriminant Color Space (STDCS).
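The common thread of these tensor color-space models is a learned linear transform of the correlated R, G, B channels. As a rough, illustrative stand-in (not the actual TDCS or TICS derivation, which optimizes discriminability rather than plain decorrelation), the sketch below rotates the channels with an eigen-decomposition of the channel covariance so that the resulting components are uncorrelated:

```python
import numpy as np

def decorrelate_channels(video):
    """Rotate correlated R, G, B channels into uncorrelated components
    via an eigen-decomposition of the 3x3 channel covariance matrix.
    video: array of shape (frames, height, width, 3). Illustrative only;
    the goal shared with TICS is channel components that are as
    independent as possible."""
    pixels = video.reshape(-1, 3).astype(float)
    pixels -= pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False)   # 3x3 channel covariance
    _, eigvecs = np.linalg.eigh(cov)
    transformed = pixels @ eigvecs       # uncorrelated components
    return transformed.reshape(video.shape)
```

After the transform, the channel covariance is diagonal, i.e. the new components are uncorrelated; dynamic texture features can then be extracted per component.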
Lajevardi and Wu [28] also addressed a color facial expression image as a 3rd-order tensor and showed that the perceptual color spaces (CIELab and CIELuv) are better overall than other color spaces for facial expression recognition. A newer color space model, the Tensor Independent Color Space (TICS) model [55, 57], treats a micro-expression color video sequence as a fourth-order tensor, i.e., a four-dimensional array. The first two dimensions carry the spatial information, the third the temporal information, and the fourth the color information. Wang et al. [57] transformed the fourth dimension from RGB into TICS, in which the color components are as independent as possible. In a color micro-expression video clip, the correlated R, G and B components of RGB space are transformed into a series of uncorrelated components T1, T2 and T3, and dynamic texture features are extracted from each uncorrelated component to obtain better results.

4) Eulerian video magnification (EVM)

EVM [8, 29, 31, 59] magnifies small and subtle motion that is impossible to identify with the naked eye. The EVM technique not only magnifies motion but also amplifies color. In EVM, certain spatial locations are selected to expand the variations in the temporal domain. Thus, EVM amplifies the non-periodic movements with smaller magnitudes that are exhibited by the face. The extraction of features becomes easier with the magnified motion and colour videos.

Apart from the above-mentioned pre-processing methods, traditional methods such as the Gaussian filter [32] and the Gaussian pyramid [56] are also widely used.

3.3 Facial feature extraction

For micro-expression recognition, feature extraction is a critical issue.
Recent studies show that spontaneous facial micro-expression analysis has been receiving attention from numerous researchers [30, 41], since involuntary micro-expressions can reveal genuine emotions which people try to conceal.

Face representations can be categorized as spatial and spatio-temporal. Spatial information encodes image sequences frame by frame, whereas spatio-temporal information considers a sequence of frames within a temporal window as a single entity and enables modeling of temporal variation to represent subtle expressions more efficiently. Another classification is based on the type of information encoded in space: shape and appearance. Geometry-based and appearance-based features have been commonly used for facial expression recognition.

Facial feature extraction is a two-step process: 1) feature detection and 2) feature extraction.

3.3.1 Feature detection

A feature is defined as an "interesting" component of an image. Feature detection is a low-level image processing operation; it is usually performed as the first operation on an image and examines every pixel to see whether a feature is present at that pixel. Some of the feature detection methods are discussed as follows.

1) Facial Action Coding System (FACS)

The Facial Action Coding System (FACS) is an anatomically based system for comprehensively describing all facial movements. FACS, devised by Ekman and Friesen [17], provides an objective means of measuring the facial muscle contractions involved in a facial expression. FACS was developed to allow researchers to measure the activity of facial muscles from video images of faces. Each noticeable component of facial movement is called an Action Unit (AU).
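Downstream systems typically map detected AUs to emotion labels. The sketch below uses a few well-known prototype combinations (e.g. AU6, cheek raiser, plus AU12, lip corner puller, for happiness); the table is a small illustrative example, not the complete FACS/EMFACS rule set:

```python
# Illustrative AU-combination prototypes: AU1 inner brow raiser,
# AU4 brow lowerer, AU5 upper lid raiser, AU6 cheek raiser,
# AU7 lid tightener, AU12 lip corner puller, AU15 lip corner depressor,
# AU23 lip tightener. The mapping is an example, not the full rule set.
PROTOTYPES = {
    frozenset({6, 12}): "happiness",
    frozenset({1, 4, 15}): "sadness",
    frozenset({4, 5, 7, 23}): "anger",
}

def classify_aus(active_aus):
    """Return the prototypical emotion whose AU set is contained in the
    detected AUs, or None if no prototype matches."""
    detected = set(active_aus)
    for aus, emotion in PROTOTYPES.items():
        if aus <= detected:
            return emotion
    return None
```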
Ekman and Friesen [17] defined 46 distinct action units, each of which corresponds to displacement in a specific muscle or muscle group and produces facial feature deformations which can be identified in the images.

2) Active Appearance Models (AAM)

The Active Appearance Model (AAM) is a statistically based template matching method, in which a representative training set captures the variability of shape and texture. A group of images with landmark coordinates that appear in all of the images is given to the training supervisor. Edwards, Cootes, and Taylor [16] were the first to introduce the model in the context of face analysis. The method is widely used for matching and tracking faces and for medical image interpretation [10, 11]. The algorithm uses the difference between the current estimate of appearance and the target image to drive an optimization process. To match an image, the current residuals are measured, and the framework predicts changes to the current parameters, leading to a better match. A good overall match is obtained in a few iterations, even from poor starting estimates. AAMs learn the valid shapes and intensity variations from their training set.

3) Active Shape Models (ASM)

The Active Shape Model (ASM) algorithm is a fast and robust method of matching a set of points controlled by a shape model to a new image. Cootes et al. [9] proposed the active shape model, in which shape variability is learned through observation. ASM is again a statistical model of the shape of objects, which iteratively deforms to fit an example of the object in a new image. The technique relies on each object or image structure being represented by a set of points. The points can represent a boundary, internal features, or even external ones, such as the center of a concave section of the border. Points are placed in the same way on each example in a training set of the object.
The sets of points are aligned automatically to minimize the variance in the distance between similar points. By analyzing the statistics of the positions of the labeled points, a "Point Distribution Model (PDM)" is derived. The model gives the average positions of the points and has parameters which control the main modes of variation found in the training set [9].

4) Discriminative Response Map Fitting (DRMF)

Registering and tracking a non-rigid object with significant variations in shape and appearance is challenging. DRMF is a holistic texture-based method which relies on shape initialization. Moreover, as a discriminative regression-based approach, DRMF performs impressively well in the generic face fitting scenario [3]. DRMF is used to detect a set of facial feature points in the facial region of the first frame in each micro-expression video clip. DRMF locates 68 feature points in a facial region. With the help of the Facial Action Coding System (FACS), 36 regions of interest (ROIs) are marked, and the face region is partitioned [33, 39], as shown in Fig. 5.

5) Optical flow vectors

Optical flow infers the motion of objects by detecting the changing intensity of pixels between two image frames over time. The Lucas-Kanade method [35] assumes the displacement of the pixels between two nearby frames is small and nearly constant. Horn-Schunck introduces a global constraint of smoothness to solve the aperture problem; the method assumes smoothness in the flow over the whole image, trying to minimize distortions in the flow [21]. Optical flow methods can be used for face alignment and provide better results than image-domain-based methods [33]. Usually, optical flow is extracted and analysed for cropped and pre-processed images to identify pose and face variations [46]. The research in [39] uses raw images as the input and Total Variation (TV-L1) optical flow estimation to analyse the optical flow and discard the head movements.
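Under the Lucas-Kanade assumption above, the flow (u, v) of a patch solves a least-squares system built from the spatial gradients Ix, Iy and the temporal difference It. A single-window sketch (no Gaussian weighting, no coarse-to-fine pyramids; function name is my own):

```python
import numpy as np

def lucas_kanade(frame1, frame2):
    """Estimate one (u, v) displacement for a small patch by solving the
    Lucas-Kanade least-squares system; assumes the motion is small and
    constant across the window."""
    f1 = frame1.astype(float)
    f2 = frame2.astype(float)
    # Spatial gradients of the first frame and the temporal derivative.
    iy, ix = np.gradient(f1)
    it = f2 - f1
    a = np.stack([ix.ravel(), iy.ravel()], axis=1)  # N x 2 system matrix
    b = -it.ravel()
    # Least-squares solution of A [u, v]^T = b.
    (u, v), *_ = np.linalg.lstsq(a, b, rcond=None)
    return u, v
```

Real pipelines add per-pixel windows and image pyramids to handle larger motions; TV-L1 instead minimizes a total-variation-regularized energy, which is more robust at motion boundaries.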
The integral of the L1 norm of the gradient is referred to as Total Variation (TV); hence the name TV-L1.

3.3.2 Feature extraction

Feature extraction involves scaling down the amount of data required to represent a large set of data. Extraction of facial features is the most important step in the recognition of micro-expressions. Many researchers have introduced different feasible features to represent facial characteristics. These features are classified as geometric-based features and appearance-based features.

Geometric-based features

Geometric-based features represent face geometry, such as the shapes and locations of facial landmarks. These representations ignore skin texture. The facial elements or facial feature points are extracted to form a feature vector that represents the face geometry. Some of the recently introduced geometry-based approaches are explained below.

Fig. 5 (a) 66 feature points using DRMF; (b) 36 regions of interest (ROIs) [3]

1) Delaunay-based temporal coding model (DTCM)

Lu et al. [34] propose the Delaunay-based temporal coding model (DTCM), in which the image sequences containing micro-expressions are normalized temporally as well as spatially based on Delaunay triangulation, to remove the influence of personal appearance on micro-expression recognition. They applied Delaunay triangulation and standard deviation analysis to locate facial sub-regions related to micro-expressions. The variations of textures are encoded instead of the feature point movements, as is the case in most of the other methods.

2) Main Directional Mean Optical Flow (MDMO)

MDMO is an ROI-based, normalized statistical feature that considers both local motion information and spatial location [33]. MDMO also reduces the feature dimension. In each ROI, MDMO selects the strongest component, i.e. the main direction of the optical flow.
The face region is divided into 36 regions, and the spatial coordinates are converted to polar coordinates (2 components). Therefore, the dimension is 36 × 2 = 72, which is far less than HOOF (36 × 8 × 2 = 576); the HOOF dimensions are obtained by trivially applying the HOOF feature in each ROI (36 ROIs, 8 bins, and 2 components). The Histogram of Oriented Optical Flow (HOOF) [7] is used to model the activity profile in each video frame: it captures the optical flow orientation of the features and provides a histogram. An advantage of MDMO is that it does not depend on the number of frames in the image sequence. In comparison to LBP-TOP, MDMO showed a better result in micro-expression recognition.

3) Facial Dynamics Map (FDM)

Another novel feature extraction method is the Facial Dynamics Map [62]. FDM is performed after the pre-processing and feature detection steps. Optical flow estimation finely aligns the cropped face image. To detect micro-expressions, a more compact representation of dynamics is required. Two assumptions, i.e. pixel-level description and a very high frame rate, lead to the introduction of FDM. As shown in Fig. 6, there are three major steps: first, facial landmark points are located and used for face cropping and alignment; second, an optical flow map is extracted for finer alignment; finally, Facial Dynamics Maps are calculated for each clip for classification.

Fig. 6 Facial Dynamics Map: adapted from [62]

Appearance-based features

1) Local Binary Pattern - Three Orthogonal Planes (LBP-TOP)

The basic idea behind the Local Binary Pattern (LBP) is to compare the center pixel value with the neighborhood pixel values. A binary code is generated by assigning the value one to each neighbor pixel with a greater value and zero to the rest. The obtained binary code is converted to decimal to get the local binary pattern value of the center pixel. The calculation process of a basic LBP operator can be understood from Fig. 7.
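The basic operator just described can be written in a few lines. A sketch (neighbour ordering and tie-handling conventions vary between papers; here ties count as 1 and the bits are read clockwise from the top-left):

```python
import numpy as np

def lbp_code(patch):
    """Basic LBP code of the centre pixel of a 3x3 patch: each of the 8
    neighbours contributes a 1 if it is >= the centre value, and the
    bits are read clockwise from the top-left as a binary number."""
    center = patch[1, 1]
    # Clockwise neighbour order starting at the top-left corner.
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    code = 0
    for n in neighbours:
        code = (code << 1) | (1 if n >= center else 0)
    return code
```

LBP-TOP applies this same operator on three orthogonal planes of the video volume (XY, XT and YT) and concatenates the three resulting histograms, which is how the spatial texture descriptor gains its temporal component.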
