A comparative study on movement feature in different directions for micro-expression recognition
Jinsheng Wei, Guanming Lu∗, Jingjie Yan
Nanjing University of Posts and Telecommunications, Nanjing (210003), China
Abstract
Micro-expressions can reflect people's real emotions. Recognizing micro-expressions is difficult because they are small motions and have a short duration. As research into micro-expression recognition deepens, many effective features and methods have been proposed. To determine which direction of movement feature is easier for distinguishing micro-expressions, this paper selects 18 directions (including three types: horizontal, vertical and oblique movements) and proposes a new low-dimensional feature called the Histogram of Single Direction Gradient (HSDG) to study this topic. In this paper, HSDG in every direction is concatenated with LBP-TOP to obtain LBP with Single Direction Gradient (LBP-SDG), and we analyze which direction of movement feature is more discriminative for micro-expression recognition. As in some existing work, Euler Video Magnification (EVM) is employed as a preprocessing step. The experiments on the CASME II and SMIC-HS databases summarize the effective and optimal directions and demonstrate that HSDG in an optimal direction is discriminative, and that the corresponding LBP-SDG achieves state-of-the-art performance when using EVM.
Keywords:
Micro-Expression Recognition, Movement Feature, Histogram of Single Direction Gradient, Effective Direction
1. Introduction
Facial expressions can be divided into macro-expressions and micro-expressions. A macro-expression is an expression that can be observed on the face directly, but when people want to hide their real emotions, their real emotions cannot be inferred from macro-expressions. When people try to hide their real emotions, micro-expressions can still reveal them. Although people's real emotions can be inferred from micro-expressions, micro-expressions cannot be recognized as easily as macro-expressions because they are momentary and subtle.

Micro-expressions have great research significance and have been studied in the field of psychology for many years [1, 2, 3, 4, 5]. Micro-expression has been proven to be effective in reflecting people's real emotions, and recognizing micro-expressions is valuable for many applications, including lie detection [6, 7], medical diagnosis [5], public safety [8] and so on [9, 10]. Ekman et al. developed the Micro-Expression Training Tool (METT) [11] to train ordinary people to recognize micro-expressions in seven categories. As micro-expressions have minor movements and a momentary duration, recognizing them manually is difficult. In [12], it was shown that undergraduate students, even with the help of METT, achieve only about 40% accuracy in detecting micro-expressions. Fortunately, the development of computer vision and pattern recognition has promoted research into automatic micro-expression recognition, and recent research has shown that automatic micro-expression recognition can achieve an excellent recognition rate (from here on, "micro-expression recognition" means "automatic micro-expression recognition").

To study micro-expression recognition more conveniently, many teams have collected micro-expression video databases that are used to verify the effectiveness of their proposed methods.

∗ Corresponding author
Email addresses: (Jinsheng Wei), [email protected] (Guanming Lu), [email protected] (Jingjie Yan)
Preprint submitted to Journal of LaTeX Templates, February 17, 2021

Micro-expression databases mainly include spontaneous and non-spontaneous types, and spontaneous databases are more difficult to collect and more realistic than non-spontaneous databases. At present, commonly used spontaneous databases include CASME II [13], SMIC [14], SAMM [15] and so on. The recognition rate on these databases has been improved continuously, but micro-expression recognition still faces many challenges, and more effective and robust methods are still needed to promote micro-expression recognition. In this paper, an effective feature is proposed to study movement features in different directions and improve the recognition rate, and the proposed feature obtains state-of-the-art performance after being concatenated with LBP-TOP.
2. Related Works
This section introduces the related works that have been explored in the field of micro-expression recognition. Micro-expression recognition mainly includes three parts: preprocessing, feature extraction and classification. Since the focus of this paper is feature extraction, the following sections introduce preprocessing and classification briefly and present feature extraction in more detail.

In the early stage, some simple explorations and attempts were made regarding preprocessing and classifiers, but these two parts are now relatively fixed. Many databases provide video clips in which faces have already been detected, cropped and aligned. Eulerian Video Magnification (EVM) [16] and the Time Interpolation Model (TIM) [17] are also effective preprocessing methods, which have been used widely. The effectiveness of EVM and TIM has been proven in a great deal of work, but they are usually treated separately. Recently, Peng et al. [18] combined TIM and EVM to eliminate the side effects caused by the intermediate process and obtained a state-of-the-art recognition rate. For classifiers, to date, AdaBoost [19], Softmax [20], KNN [21] and so on [22, 23] have been used, but the most common and effective classifier is still the Support Vector Machine (SVM).

Feature extraction is a key step in micro-expression recognition and has always been a research focus. A large number of features have been proposed, such as LBP-TOP, LBP-IP and HIGO. These features can be roughly divided into LBP-based, optical flow-based, gradient-based and deep learning-based methods.
LBP-based features
The LBP descriptor was proposed by Ojala [24] to extract texture features from two-dimensional images. Zhao et al. [25] extended LBP from two-dimensional images to three-dimensional videos and obtained LBP-TOP to describe the dynamic texture of facial expression videos. LBP-TOP is very effective for video analysis, and LBP-TOP-based features have been a research hotspot in micro-expression recognition, with a large number of teams contributing innovative work. Inspired by the concept of LBP-TOP, Wang et al. proposed the more compact LBP-SIP [26] and LBP-MOP [27]. LBP-SIP removes redundant points and only calculates the LBP values of six points, and the LBP-MOP descriptor does not extract features from all frames but from the average plane. Huang et al. [28] proposed SpatioTemporal Completed Local Quantization Patterns (STCLQP) and used effective vector quantization and codebook selection to process the extracted sign, magnitude and orientation components. After that, they put forward Spatiotemporal Local Binary Pattern with Integral Projection (STLBP-IP) [29] based on difference images; they calculated the difference images and then extracted LBP features from the integral projection maps of the difference images. Huang [30] and Zong [22] proposed Discriminative and Hierarchical STLBP-IP, respectively, to enhance the STLBP-IP feature. Wang et al. [31] introduced EVM into micro-expression recognition as a preprocessing step and then extracted LBP-TOP to achieve a satisfactory recognition rate. However, their work does not report the recognition rate under LOSO cross-validation and only tested the method on the CASME II database.

Optical flow-based features
Optical Flow (OF) technology was first proposed by Horn et al. [32] and has been proven effective for micro-expression recognition by several studies. Early on, Liong et al. [33] employed the Optical Strain (OS) feature to recognize micro-expressions, and in [34], LBP-TOP was weighted by the temporal mean-pooled OS map in every region; after that, they proposed Bi-Weighted Oriented Optical Flow (BI-WOOF) [35], which locally and globally weights HOOF features. To eliminate the influence of noise and illumination changes, Xu et al. [36] proposed the Facial Dynamics Map (FDM) feature to select the principal direction from the optical flow map. Using the ROI-based OF feature, Liu et al. [37] presented Main Directional Mean Optical flow (MDMO) to choose the main direction in every ROI. Recently, they [38] employed sparse representation technology to enhance the feature representation of MDMO and achieved a promising recognition rate. Furthermore, some works combine optical flow and histograms. Zhang et al. [23] aggregated the Histogram of Oriented Optical Flow (HOOF) with LBP-TOP features, and inspired by the idea of the fuzzy color histogram, Happy et al. [39] proposed the Fuzzy Histogram of Optical Flow Orientation (FHOFO), which is robust to the variation of expression intensities.

Gradient-based features
The gradient descriptor can extract the movement feature. The Histogram of Oriented Gradients (HOG) is an effective gradient feature, and HOG-TOP, obtained by extending HOG to three orthogonal planes, can be effectively applied to micro-expression recognition. HOG-TOP was first employed by Polikovsky et al. [40] for micro-expression recognition. In their work, two gradient operators were used to calculate the vertical and horizontal gradients of each pixel in every region of interest (ROI) on three orthogonal planes; then, the gradient direction and gradient magnitude were calculated according to the horizontal and vertical gradients, and the gradient direction, weighted by the gradient magnitude, was quantized; finally, a histogram operation was employed to process these quantized directions. Recently, the Histogram of Image Gradient Orientation (HIGO) proposed by Li et al. [41] does not use the gradient magnitude to weight the gradient direction; their work employed EVM as a preprocessing method and then achieved an excellent recognition rate.

Deep learning-based features
Recently, deep learning methods have been applied in many fields, and the Convolutional Neural Network (CNN) is an effective method in the field of image understanding. Because of the limited sample size, it is difficult to train a CNN model for micro-expression recognition. To solve this problem, a pre-trained VGG model and data augmentation were employed in [42], and the model was then fine-tuned to recognize micro-expressions using the micro-expression apex frame. Kim et al. [20] adopted a CNN and Long Short-Term Memory (LSTM) recurrent neural networks to extract the spatial features of five states and the temporal features, respectively. The work [43] directly uses a CNN to extract the spatial features of each frame and inputs these features into an LSTM. The 3D Convolutional Neural Network is a deep learning method for video processing. Li et al. [44] tried this method in micro-expression recognition; in their work, the optical flow maps (horizontal and vertical) and gray-scale frames are gathered and then input into a designed 3DCNN model. Similarly, the optical flow maps (horizontal and vertical) and the optical strain image are stacked together and input into a CNN in [45]. The above works promoted the application of deep learning in micro-expression recognition, but their recognition rates are not outstanding compared with traditional methods. By incorporating Accretion Layers (AL) in the network, Verma et al. [46] proposed a Lateral Accretive Hybrid Network (LEARNet) that refines the salient expression features in an accretive manner. Their method achieves an excellent recognition rate, but their experiments do not adopt the mainstream cross-validation method. Furthermore, Khor et al. [47] proposed a lightweight dual-stream shallow network in the form of a pair of truncated CNNs with heterogeneous input features, and this method obtains state-of-the-art performance on the CASME II database.
Recently, using the apex frame in micro-expression videos, Song et al. [48] designed dynamic-temporal stream, static-spatial stream, and local-spatial stream modules for the TSCNN, which respectively attempt to learn and integrate temporal, entire-facial-region, and facial-local-region cues to recognize micro-expressions, and TSCNN achieves promising recognition performance. Also, the traditional methods separate feature extraction and classification, while the deep learning methods merge the two steps and obtain one model that both extracts features and classifies micro-expressions.

The optical flow or gradient-based descriptors can extract movement features, but which direction of movement is most conducive to distinguishing micro-expressions is still unclear. Extracting the movement feature can be done in two manners: passive and active. Here, 'passive' means that the extracted movement feature depends on the current video data (for example, in a video clip, if the direction of muscle movement is mainly direction A, the extracted movement feature describes the movement in direction A), while 'active' means extracting the movement feature in a certain direction for all video clips (for example, in all video clips, the movement feature is extracted in the specified direction B). To study the movement feature in different directions, the 'active' manner is more suitable. In existing works, both optical flow and gradient-based descriptors adopt the 'passive' manner to extract movement features, so these works cannot be used directly. Besides, according to the definition of optical flow, optical flow-based descriptors cannot be employed to extract the movement feature in a specific direction. Thus, we adapted the gradient-based HOG descriptor to meet this demand.

For the studied aim, this paper proposes a new low-dimensional feature called the Histogram of Single Direction Gradient (HSDG) that actively extracts the movement feature in a certain direction.
The innovation of HSDG is that it simplifies HOG and extracts the movement feature actively. Specifically, on the one hand, HSDG only extracts the gradient value in a single direction to ensure that the movement feature in a certain direction is extracted; on the other hand, it does not calculate the gradient direction based on these gradient values but directly quantizes them to reduce the feature dimension. Furthermore, considering that LBP-TOP provides effective appearance texture information while HSDG alone cannot provide comprehensive feature information, LBP-TOP, as a basic feature, is concatenated with HSDG to obtain LBP with Single Direction Gradient (LBP-SDG). The contributions of this paper can be summarized as follows:

1) We study movement features in different directions and summarize the directions in which the movement feature is more useful for distinguishing micro-expressions. The experimental results show that the effective directions tend to be the same under different magnification coefficients of EVM.

2) For the studied aim, a new feature, HSDG, is proposed. The HSDG descriptor can actively extract the movement feature in a specific direction. LBP-SDG concatenates HSDG with LBP-TOP and achieves state-of-the-art performance after using EVM.

3) HSDG is a low-dimensional feature, and comparative experiments demonstrate that HSDG in the optimal direction is a discriminative feature and can provide effective information. Furthermore, HSDG can be taken as a supplementary feature to improve the performance of basic features.
3. The Proposed Method
This section introduces our proposed method for the studied aim. The proposed HSDG feature provides effective feature information and achieves a competitive recognition rate after being concatenated with LBP-TOP. The proposed method mainly involves preprocessing (EVM and TIM) and feature extraction (LBP-TOP and HSDG). These are described below.
In our work, two effective preprocessing methods (EVM and TIM) were employed, and we introduce the two technologies briefly below.

The difficulty of recognizing micro-expressions lies in the fact that micro-expressions are tiny facial movements with a short occurrence time. Fortunately, EVM can enlarge the muscle movement in a video and has been used widely in micro-expression recognition, so our work also employed this technology. In EVM, the magnification factor Alpha is a key parameter, and its value can be neither arbitrarily selected nor infinitely large. The optimal Alpha value differs between databases, so it was fine-tuned in the experiments. For details about EVM, refer to [16].

Collecting spontaneous micro-expression databases is difficult, and the number of collected video frames is uneven. TIM, as a normalization method, can unify the number of frames to solve this problem. TIM was originally proposed for a lipreading system and achieved a very good recognition rate. For details about TIM, refer to [17].

In our work, the cropped databases, on which preprocessing such as size normalization, face cropping and face alignment has already been performed, were used directly; apart from EVM and TIM, no other preprocessing methods were employed.
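For intuition, frame-number normalization in the spirit of TIM can be sketched with simple linear interpolation along the time axis. Note that this is only a simplified stand-in: TIM itself interpolates on a low-dimensional curve learned by graph embedding, and the frame count and sizes below are illustrative.

```python
import numpy as np

def normalize_frame_count(video, target_frames=10):
    """Resample a video clip (T, H, W) to a fixed number of frames by
    linear interpolation along the time axis. A simplified stand-in for
    TIM, which uses a graph-embedding interpolation model."""
    t, _, _ = video.shape
    # Positions of the target frames on the original time axis.
    src_pos = np.linspace(0, t - 1, target_frames)
    lo = np.floor(src_pos).astype(int)
    hi = np.minimum(lo + 1, t - 1)
    frac = (src_pos - lo)[:, None, None]
    return (1 - frac) * video[lo] + frac * video[hi]

clip = np.random.rand(23, 168, 136)     # uneven frame count
norm = normalize_frame_count(clip, 10)  # normalized to 10 frames
```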
Feature extraction is the key step of micro-expression recognition and is also the focus of this paper. HSDG is proposed for the studied aim, but the feature information provided by HSDG alone is not comprehensive. Therefore, HSDG is concatenated with LBP-TOP to obtain LBP-SDG. LBP-TOP and HSDG are introduced in this section.
A large amount of previous work in different fields shows that LBP-TOP is very effective in video analysis. LBP-TOP can extract texture features from three planes: XY, XT, and YT. The appearance texture features are extracted from the XY plane, while the dynamic texture features are extracted from the other two planes. As LBP is the basis of LBP-TOP, this section first introduces LBP for ease of understanding and then presents LBP-TOP.
LBP
LBP extracts texture features from two-dimensional images. Suppose I is a two-dimensional image, and a point P_I = (X_I, Y_I) in I is taken as the center point; also, R_XI and R_YI are the radii in the horizontal and vertical directions, respectively, and N_I points around P_I are selected to calculate the LBP value. First, the coordinates of the N_I points are confirmed, and the coordinates of the i-th point D_i are:

D_i = D(X_I, Y_I, R_XI, R_YI, N_I, i)
    = (X_I + R_XI · sin(2πi/N_I), Y_I + R_YI · cos(2πi/N_I)),
      i = 0, ..., N_I − 1    (1)

The N_I points D_i form an ordered point set D_I. Based on D_I, the LBP value of the point P_I is calculated:

LBP(P_I, D_I) = Σ_{i=0}^{N_I − 1} ε(g(D_i) − g(P_I)) · 2^i    (2)

where g(x) represents the pixel value at point x, i represents the index of points in D_I, and the ε(·) function is defined as follows:

ε(x) = { 0, x < 0;  1, x ≥ 0 }    (3)

Finally, the LBP values of all points in I are calculated, and the histogram of these LBP values is the LBP feature.

LBP-TOP
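The per-point LBP computation of Equations 1–3, which LBP-TOP reuses on each of the three planes, can be sketched as follows. This is a minimal illustration: neighbor coordinates are rounded to whole pixels rather than bilinearly interpolated, and the uniform-pattern refinement actually used in the experiments is omitted.

```python
import numpy as np

def lbp_value(img, x, y, rx=1, ry=1, n=8):
    """LBP value of the center point (x, y) following Equations 1-3:
    sample n neighbors on an ellipse with radii (rx, ry), threshold
    each against the center, and pack the bits into one integer."""
    center = img[y, x]
    value = 0
    for i in range(n):
        # Equation 1: coordinates of the i-th neighbor, rounded to pixels.
        xi = int(round(x + rx * np.sin(2 * np.pi * i / n)))
        yi = int(round(y + ry * np.cos(2 * np.pi * i / n)))
        # Equations 2-3: epsilon(g(D_i) - g(P_I)) * 2^i
        value += (1 if img[yi, xi] >= center else 0) * (1 << i)
    return value

def lbp_histogram(img, rx=1, ry=1, n=8):
    """Histogram of the LBP values of all valid points (the LBP feature)."""
    h, w = img.shape
    codes = [lbp_value(img, x, y, rx, ry, n)
             for y in range(ry, h - ry) for x in range(rx, w - rx)]
    return np.bincount(codes, minlength=2 ** n)
```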
After presenting the concept of LBP, LBP-TOP is easy to explain. LBP-TOP concatenates the LBP features from the three planes. A three-dimensional video has three coordinate axes (horizontal X, vertical Y, time T), and these three axes form three planes (XY, XT, YT). Suppose V is a three-dimensional video, and a point P_V = (X_V, Y_V, T_V) in V is taken as the center point; also, R_XV, R_YV, and R_TV are the radii in the horizontal, vertical and time directions, respectively, and N_XYV, N_XTV and N_YTV points around P_V are selected to calculate the LBP values on the XY, XT, and YT planes, respectively. First, the coordinates of the i-th point (D_XYi, D_XTi, and D_YTi) on the three planes are as follows:

D_XYi = D(X_V, Y_V, +R_XV, −R_YV, N_XYV, i)    (4a)
D_XTi = D(X_V, T_V, +R_XV, +R_TV, N_XTV, i)    (4b)
D_YTi = D(Y_V, T_V, −R_YV, +R_TV, N_YTV, i)    (4c)

where the function D(·) is the one defined in Equation 1; for XY, i = 0, 1, ..., N_XYV − 1; for XT, i = 0, 1, ..., N_XTV − 1; and for YT, i = 0, 1, ..., N_YTV − 1. The points D_XYi, D_XTi, and D_YTi over all i form three ordered point sets, D_XY, D_XT, and D_YT, respectively. Next, the LBP values (LBP_XY, LBP_XT, and LBP_YT) of point P_V on the three planes are determined by the following equations:

LBP_XY(P_V) = LBP(P_V, D_XY)    (5a)
LBP_XT(P_V) = LBP(P_V, D_XT)    (5b)
LBP_YT(P_V) = LBP(P_V, D_YT)    (5c)

where the function LBP(·, ·) is the one defined in Equation 2. Finally, the LBP features from the three planes are calculated and concatenated to obtain LBP-TOP.

The studied aim in this paper is to determine which directions of movement feature are most effective, so we need to actively extract movement features in a single direction. Considering that the gradient feature can extract the movement feature, as described in [49], the proposed HSDG feature employs the gradient method and extracts gradient features in a single direction. Moreover, the proposed feature is inspired by the HOG feature: it calculates gradient values in a single direction and then computes the histogram of these gradient values. Furthermore, before calculating the histogram, these gradient values are quantized into several values to reduce the feature dimension.

Figure 1: The framework of extracting LBP-SDG. LBP-TOP and HSDG features are extracted from the micro-expression video clips before these two features are concatenated to obtain LBP-SDG.

The differences between HOG/HIGO and HSDG are as follows: first, HOG/HIGO calculates the gradient direction according to the calculated horizontal and vertical gradient values, while HSDG calculates the gradient value in only a certain direction. Second, HOG/HIGO quantizes the gradient direction into several directions, while HSDG quantizes the calculated gradient value into several values. Third, overall, HOG/HIGO calculates the gradient direction passively, while HSDG calculates the gradient values in a single direction actively. The specific details of HOG/HIGO can be found in [50] and [41], and the method of calculating HSDG is as follows.
Directions
Before introducing HSDG, 18 directions were selected for testing. In V, we select P_V = (X_V, Y_V, T_V) as the center point, and R_X, R_Y and R_T are the radii in the horizontal, vertical and time directions, respectively. A point i is selected in V, with i from 1 to 18. As the extracted feature is used to describe movement information, R_T must not be equal to 0; that is, the points should be selected from the previous or subsequent frames. As shown in Figure 2, six points, four points, and eight points are selected on the horizontal (XT), vertical (YT), and oblique planes, respectively; for example, point … belongs to the horizontal plane.

Figure 2: The selected 18 directions. P_V represents the center point; six of the directions lie in the horizontal plane (XT), four lie in the vertical plane (YT), and the other eight lie in the oblique planes.

Then, based on these points, the vector from P_V to the i-th point is taken as the i-th tested direction. Since the directions are numbered in the same order as the points, i is used directly to represent the direction in the subsequent text. The selected 18 points form a point set D:

D = { (X_V + 0,   Y_V + 0,   T_V − R_T),
      (X_V − R_X, Y_V + 0,   T_V − R_T),
      (X_V − R_X, Y_V + R_Y, T_V − R_T),
      (X_V + 0,   Y_V + R_Y, T_V − R_T),
      (X_V + R_X, Y_V + R_Y, T_V − R_T),
      (X_V + R_X, Y_V + 0,   T_V − R_T),
      (X_V + R_X, Y_V − R_Y, T_V − R_T),
      (X_V + 0,   Y_V − R_Y, T_V − R_T),
      (X_V − R_X, Y_V − R_Y, T_V − R_T),
      (X_V + 0,   Y_V + 0,   T_V + R_T),
      (X_V − R_X, Y_V + 0,   T_V + R_T),
      (X_V − R_X, Y_V + R_Y, T_V + R_T),
      (X_V + 0,   Y_V + R_Y, T_V + R_T),
      (X_V + R_X, Y_V + R_Y, T_V + R_T),
      (X_V + R_X, Y_V + 0,   T_V + R_T),
      (X_V + R_X, Y_V − R_Y, T_V + R_T),
      (X_V + 0,   Y_V − R_Y, T_V + R_T),
      (X_V − R_X, Y_V − R_Y, T_V + R_T) }    (6)

Theoretically, the movement features in each direction toward the previous frame (T_V − R_T) and the corresponding opposite direction toward the next frame (T_V + R_T) should be consistent.
For example, the movement features in a direction and its opposite direction should be similar, since the two directions lie on the same line. However, since HSDG and LBP-TOP adopt the same block-division method for the consistency and efficiency of the extracted features, the points in the first and last R_TV frames of a video cannot be taken as center points. Thus, when extracting HSDG in the directions toward the next frame, the points in the last R_TV frames cannot be used, while when extracting HSDG in the directions toward the previous frame, the points in the first R_TV frames cannot be used. Therefore, the movement features in the two groups of directions are different and are tested separately.

Extraction Detail
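For illustration, the 18 candidate offsets of Equation 6 can be generated programmatically. This sketch assumes the listing order of Equation 6: first the nine offsets toward the previous frame, then the nine toward the next frame.

```python
def direction_offsets(rx=1, ry=1, rt=1):
    """The 18 offsets (dx, dy, dt) of Equation 6: the center and its
    8-neighborhood in the previous frame (t - rt), then the same nine
    positions in the next frame (t + rt), each defining one tested
    direction relative to the center point P_V."""
    xy = [(0, 0), (-rx, 0), (-rx, ry), (0, ry), (rx, ry),
          (rx, 0), (rx, -ry), (0, -ry), (-rx, -ry)]
    return [(dx, dy, -rt) for dx, dy in xy] + \
           [(dx, dy, +rt) for dx, dy in xy]

D = direction_offsets()  # 18 offsets, indexed in the order of Equation 6
```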
We pick a direction, and the HSDG feature in that direction can then be extracted. First, we select a point P from D; then, the gradient along the vector from P_V to P is:

G = g(P) − g(P_V)    (7)

As the pixel value ranges from 0 to 255, the gradient value ranges from −255 to 255. Considering that a histogram operation over all possible gradient values (511 values) would generate a huge feature dimension, all gradient values are quantized to N values by the following equation:

q(x) =
  0,      if −255 ≤ x < f(−255 + 511/N)
  1,      if f(−255 + 511/N) ≤ x < f(−255 + 2 · 511/N)
  ...
  N − 1,  if f(−255 + (N − 1) · 511/N) ≤ x ≤ 255    (8)

Finally, the histogram of the quantized gradient values of all valid points is the HSDG feature in the selected direction. The two directions along the time axis are special: the HSDG features in these two directions express the variations of the pixel value at the same pixel position at different times.

The HSDG descriptor can meet the studied aim in this paper, but the movement feature it extracts is only in a single direction and provides limited feature information. The LBP-TOP descriptor can extract appearance texture and dynamic texture features and provides relatively sufficient texture feature information. Thus, HSDG is concatenated with LBP-TOP to obtain LBP-SDG for studying the movement features in different directions. Taking the recognition rate of LBP-TOP as a reference, when the HSDG feature in a certain direction is added, if the recognition rate of LBP-TOP is improved, the movement feature in this direction can provide effective information. The entire feature extraction process is shown in Figure 1.
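Putting Equations 6–8 together, HSDG extraction for a single direction can be sketched as follows. This is a simplified illustration: it ignores the block-division step, and uniform integer bin boundaries stand in for the f(·) of Equation 8, whose exact rounding is an assumption here.

```python
import numpy as np

def hsdg(video, offset, n_bins=2):
    """Sketch of HSDG for one direction: the gradient (Equation 7)
    between every valid center pixel and its neighbor at the given
    (dx, dy, dt) offset, quantized into n_bins values (Equation 8)
    and pooled into a histogram."""
    dx, dy, dt = offset
    t, h, w = video.shape
    v = video.astype(np.int16)
    # Valid centers are those whose offset neighbor stays inside the clip.
    cur = v[max(0, -dt):t - max(0, dt),
            max(0, -dy):h - max(0, dy),
            max(0, -dx):w - max(0, dx)]
    nb = v[max(0, dt):t - max(0, -dt),
           max(0, dy):h - max(0, -dy),
           max(0, dx):w - max(0, -dx)]
    grad = nb - cur  # Equation 7, values in [-255, 255]
    # Equation 8: uniform quantization of [-255, 255] into n_bins values.
    q = np.clip((grad.astype(np.int32) + 255) * n_bins // 511, 0, n_bins - 1)
    return np.bincount(q.ravel(), minlength=n_bins)
```

With n_bins = 2 the quantization reduces to the sign of the gradient, matching the binarization case used in the experiments.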
4. Experiments
A large number of experimental results and their analysis are presented in this section. Our experiments tested LBP-SDG in 18 directions to summarize the directions in which the movement feature is effective for recognizing micro-expressions. If the recognition rate of LBP-SDG in a certain direction is higher than that of LBP-TOP, the movement feature provided by the corresponding HSDG is effective; that is, the movement feature in this direction is effective, and vice versa. LBP-SDG in an optimal direction was compared with other features to prove that LBP-SDG is effective and that HSDG is a discriminative feature for recognizing micro-expressions. Furthermore, compared with other methods, EVM+LBP-SDG has state-of-the-art performance.

The relative recognition rate is used in this paper for convenience of description, where the relative recognition rate is the recognition rate of LBP-SDG minus that of LBP-TOP. Thus, the effective directions are the directions in which the relative recognition rate is greater than 0. Note that the radius parameters (R_XV, R_YV, R_TV, R_X, R_Y, and R_T) determine the time step and gradient value, so the study is based on these parameters.

All experiments report the recognition rate (RR) under leave-one-subject-out (LOSO) cross-validation. The experiments include the following three parts: 1) LBP-SDG in the 18 directions was tested for the studied aim; 2) LBP-SDG was compared with other features under the same conditions; 3) LBP-SDG in the optimal direction was compared with other methods (including the state-of-the-art methods) by reporting their recognition rates directly. All experiments were performed on the SMIC and CASME II databases.

The SMIC database is the first spontaneous micro-expression database, published by the University of Oulu, and the participants come from multiple countries. The SMIC database includes three subsets collected by three types of cameras: high-speed (HS), normal visual (VIS), and near-infrared (NIR).
The SMIC-HS database is the subset of SMIC with the largest sample number and frame rate, and it was adopted in our experiments. The SMIC-HS database contains 164 samples and is divided into three categories: "Positive", "Negative" and "Surprise". A Support Vector Machine with the linear kernel was employed as the classifier. The block strategy was adopted, and each video was divided into 8 * 8 * 2 blocks [51]. The number of frames was unified to 10 using TIM, and each frame was resized to 168 * 136 (grayscale); thus, each video was normalized to 168 * 136 * 10. The quantization number N was set to 2, which is equivalent to binarization according to Equation 8. Also, R_V = R_XV = R_YV = R_TV = 1, …, R = R_X = R_Y = R_T = 1, …, and Alpha = 8, …, as well as R_XV = R_YV = 1 with R_V = R_TV = 2, …, R_X = R_Y = 1 with R = R_T = 2, …, and Alpha = 17, …, 29. Also, the uniform LBP [52] was employed to calculate LBP-TOP.

As for the determination of the Alpha values: first, we tested Alpha values from 1 to 30 for LBP-TOP and found the optimal Alpha value for LBP-TOP; second, we selected representative Alpha values around the optimal value (including the optimal value itself), under which the recognition rate of LBP-TOP first increases and then decreases; finally, the different features were evaluated under these selected Alpha values.
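The LOSO protocol behind all reported recognition rates can be sketched as follows. The linear-kernel SVM matches the classifier described above; the feature matrix, labels, and per-sample subject ids are placeholders to be filled with the extracted LBP-SDG features.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import LinearSVC

def loso_recognition_rate(X, y, subjects):
    """Leave-one-subject-out evaluation: train on all subjects but one,
    test on the held-out subject, and pool the predictions into one
    overall recognition rate."""
    correct = 0
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subjects):
        clf = LinearSVC()
        clf.fit(X[train_idx], y[train_idx])
        correct += np.sum(clf.predict(X[test_idx]) == y[test_idx])
    return correct / len(y)
```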
Our experimental aim is to study the movement feature in different directions. In order to analyze the studied aim in detail and comprehensively, this experiment was carried out under two settings: with EVM and without EVM.

Figure 3: The SMIC-HS database: the line charts of the recognition rate and the dot charts of the relative recognition rate in the 18 directions. (a) The recognition rate without EVM. (b) The relative recognition rate without EVM. (c) The recognition rate with EVM. (d) The relative recognition rate with EVM.

Figure 4: The CASME II database: the line charts of the recognition rate and the dot charts of the relative recognition rate in the 18 directions. (a) The recognition rate without EVM. (b) The relative recognition rate without EVM. (c) The recognition rate with EVM. (d) The relative recognition rate with EVM.

The experimental results without EVM are reported and analyzed in this paragraph. Table 5 (see appendices) shows the recognition rate of LBP-SDG in the 18 directions. The maximum recognition rate of LBP-TOP (58.57% on SMIC, 51.48% on CASME II) was used as a reference line. In Figures 3(a) and 4(a), the 18 points (the recognition rates in the 18 directions) are connected to form a line chart; in Figures 3(b) and 4(b), the relative recognition rate is displayed as 18 independent points. As shown in the two figures, the movement feature in directions … on SMIC-HS and … on CASME II is very effective; the movement feature in directions … on SMIC-HS and … on CASME II is suboptimal; and the movement feature in the other directions not only cannot provide discriminative feature information but even produces disruptive feature information.

The experimental results with EVM are reported and analyzed in this paragraph. Tables 6 and 7 (see appendices) show the recognition rates of LBP-SDG in the 18 directions under different Alpha values. As shown in Figures 3(c) and 4(c), under different Alpha values, the trend of the line chart with the change of direction is consistent; in Figure 3(c), the recognition rate reaches a local peak in directions … and a local trough in directions …, which means that the effective directions also tend to be consistent under different Alpha values. As EVM under different Alpha values can affect the movement features, the effective directions under different Alpha values are somewhat unstable, but the overall trend is consistent. As shown in Figures 3(d) and 4(d), with the red line as a reference (the recognition rate of LBP-TOP, as shown in Table 1), the movement feature in direction … on SMIC-HS and directions … on CASME II is very effective, the movement feature in directions … on SMIC-HS and … on CASME II is suboptimal, and the movement feature in the other directions is invalid and even disruptive.

Overall, the studied results on the two databases are different. The difference is mainly due to the differences between the databases, such as the countries from which the participants came, the collection equipment and environment, etc. These differences and the different parameters lead to different effective directions for the two databases and cause the most suitable Alpha value to differ. Thus, on SMIC-HS, LBP-SDG in a few directions outperforms LBP-TOP whether EVM is employed or not, and these are the effective directions. On the CASME II database, the studied results were divided into two parts: without EVM, the movement feature in directions … is very effective, and with EVM, the movement feature in directions … is effective. Furthermore, the optimal direction is direction … on SMIC-HS and around direction … on CASME II, which means: on SMIC-HS, the variations of the pixel values along the time axis are optimal for distinguishing micro-expressions; on CASME II, the variations of the pixel values in the upward or horizontal direction are very effective for distinguishing micro-expressions.

The second experimental aim is to prove the effectiveness of LBP-SDG and to prove that HSDG can provide discriminative feature information. All experiments used EVM, and the results of LBP-SDG are reported in the optimal directions. We compared LBP-SDG with other features, including LVP-TOP, GDLBP-LVP-TOP, LBP-SIP, SIP-SDG and LBP-TOP. LVP and GDLBP-LVP are LBP variants and have been used effectively in face recognition.
LVP-TOP and GDLBP-LVP-TOP can be obtained by extracting LVP [53] and GDLBP-LVP [54] on three orthogonal planes for micro-expression recognition, and SIP-SDG can be obtained by concatenating LBP-SIP with HSDG. Table 1 shows the comparative results on both databases, and Table 2 shows the feature dimensions of the different features.

Overall, in comparison with the other features, LBP-SDG has the best performance under every Alpha value. In detail, SIP-SDG and LBP-SDG perform better than LBP-SIP and LBP-TOP, respectively, which demonstrates that HSDG provides discriminative feature information. As the performance of SIP-SDG is similar to that of LBP-TOP while its feature dimension is lower, SIP-SDG can be taken as an alternative to LBP-TOP. GDLBP-LVP-TOP concatenates LVP-TOP with three LBP-TOPs, but in comparison with LBP-TOP, both LVP-TOP and GDLBP-LVP-TOP perform worse, which demonstrates the following: 1) features that are effective in other fields may not be effective for micro-expression recognition, and may even worsen the recognition rate; and 2) the proposed HSDG not only has a low feature dimension but also enhances the performance of LBP-TOP and LBP-SIP, which shows that HSDG is a discriminative feature for representing micro-expressions. Furthermore, we compare our method with HOG-TOP and HIGO-TOP using EVM; the results of these two features can be found in [41]. As shown in Table 3, LBP-SDG is superior to HOG-TOP and HIGO-TOP. In terms of the highest recognition rate, LBP-SDG outperforms these features on both databases.

We further compare the confusion matrices of LVP-TOP, GDLBP-LVP-TOP, LBP-SIP, SIP-SDG, LBP-TOP and LBP-SDG in Figures 5 and 6 (see appendices). On SMIC, LBP-SDG does not have the best performance at recognizing any single class of expression, but it has the best average performance over the three classes. On CASME II, SIP-SDG has the best performance at recognizing "Happiness"; GDLBP-LVP-TOP has the best performance at recognizing "Surprise"; LVP-TOP, LBP-SIP and LBP-TOP have the best performance at recognizing "Others"; for the other two micro-expressions ("Disgust" and "Repression"), all features perform poorly, while LBP-SDG performs best (57% for "Disgust" and 56% for "Repression"). In terms of the average performance, LBP-SDG is optimal.
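The per-class figures discussed above correspond to row-normalized recall read off a confusion matrix; a minimal sketch (the matrix values here are illustrative, not the paper's data):

```python
import numpy as np

def per_class_recall(cm):
    """Row-normalized recall: cm[i, j] counts samples of true class i
    predicted as class j; recall of class i is cm[i, i] / sum(cm[i, :])."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)

# Illustrative 3-class matrix (Positive / Negative / Surprise), not real data.
cm = np.array([[40, 6, 5],
               [8, 55, 7],
               [4, 6, 33]])
recalls = per_class_recall(cm)
average = recalls.mean()  # the "average performance" compared in the text
```

Averaging the diagonal recalls over classes is what allows a feature to be best on average without being best on any single class, as observed for LBP-SDG on SMIC.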
Furthermore, according to Table 2, HSDG has a very small feature dimension, which makes the feature dimension of LBP-SDG (SIP-SDG) very close to that of LBP-TOP (LBP-SIP); LVP-TOP and GDLBP-LVP-TOP have very high feature dimensions, which seriously reduces the speed of recognition.
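The dimension arithmetic in Table 2 follows directly from concatenation: LBP-SDG is the LBP-TOP histogram joined with the 50-bin (CASME II) or 256-bin (SMIC-HS) HSDG vector. A sketch with placeholder histograms:

```python
import numpy as np

# Feature dimensions taken from Table 2 (CASME II column);
# the histogram contents here are random stand-ins.
lbp_top = np.random.rand(4425)   # stand-in LBP-TOP histogram
hsdg    = np.random.rand(50)     # stand-in HSDG histogram (one direction)

# LBP-SDG is simply the concatenation of the two histograms.
lbp_sdg = np.concatenate([lbp_top, hsdg])
assert lbp_sdg.shape == (4475,)  # matches the LBP-SDG row in Table 2
```

The same arithmetic gives SIP-SDG (1500 + 50 = 1550 on CASME II) and the SMIC-HS column (6144 + 256 = 6400).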
This section shows that EVM+LBP-SDG is competitive. The recognition rates of different methods are shown in Table 4. It can be seen that EVM+LBP-SDG has the best performance (69.68% on SMIC-HS and 71.32% on CASME II). CEF is the state-of-the-art traditional method; although CEF employs an additional model to integrate some processing, EVM+LBP-SDG still performs better. LEARNet, SSSN and TSCNN are the state-of-the-art deep learning-based methods. First, as shown in Table 4, LEARNet has the highest recognition rate, but it adopts N-fold cross-validation rather than the mainstream cross-validation protocol (LOSO). Second, compared with SSSN, EVM+LBP-SDG shows only a slight advantage on CASME II but an obvious advantage on SMIC-HS. Third, in terms of recognition rate, TSCNN has the best results, but, similar to SSSN, it needs to detect the apex frame before recognizing micro-expressions, and in terms of inputs, TSCNN needs to compute optical flow while our method does not. Finally, the deep learning-based methods train a feature extractor and a classifier for each subject validation, while the traditional methods (including ours) use the same feature extractor for all subject validations.

Table 1: Comparison of LBP-SDG with other features.

(a) On the SMIC-HS database
Alpha  LVP-TOP  GD-LVP-TOP  LBP-SIP  SIP-SDG  LBP-TOP  LBP-SDG
8      52.88%   61.15%      60.59%   65.71%
10     51.85%   59.58%      59.74%   64.79%
11     55.15%   60.36%      60.01%   67.17%
12     55.04%   59.09%      58.54%   66.62%

(b) On the CASME II database
Alpha  LVP-TOP  GD-LVP-TOP  LBP-SIP  SIP-SDG  LBP-TOP  LBP-SDG
17     62.53%   65.76%      58.93%   62.73%
20     62.27%   66.78%      61.55%   64.08%
23     65.48%   66.41%      62.92%   64.28%
26     64.90%   66.20%      63.50%   65.32%
29     65.24%   66.37%      64.06%   67.61%

Table 2: Comparison of feature dimension.
Methods      CASME II  SMIC-HS
HSDG         50        256
LVP-TOP      17700     90624
GD-LVP-TOP   30975     109056
LBP-SIP      1500      7680
SIP-SDG      1550      7930
LBP-TOP      4425      6144
LBP-SDG      4475      6400
Table 3: Comparison of accuracy rate between LBP-SDG, HOG-TOP and HIGO-TOP.

Methods    CASME II  SMIC-HS
HOG-TOP    63.97%    61.59%
HIGO-TOP   67.21%    68.29%
LBP-SDG    71.32%    69.68%
Table 4: Comparison of recognition rate between the proposed method and other methods.

Method                SMIC-HS  CASME II
FDM [36]              54.88%   45.93%
STCLQP [28]           64.02%   58.39%
STLBP-IP [29]         57.93%   59.51%
TIM+STRBP [55]        60.98%   64.37%
CNN+LSTM [20]         -        60.98%
ELRCN-TE [56]         -        52.44%
STLBP-IP+KGSL [22]    60.78%   63.83%
3DFCNN [44]           55.49%   59.11%
TIM+EVM+HIGO [41]     68.29%   67.21%
LEARNet* [46]         -        76.59%
SSSN† [47]            63.41%   71.19%
TSCNN† [48]           72.74%   74.05%
CEF [18]              68.90%   70.85%
The proposed method   69.68%   71.32%

"-": no result in the original paper. †: the method uses the apex frame. *: the results adopt a different cross-validation.

5. Future Works

In this work, HSDGs in 18 directions were tested in turn to manually select the effective HSDGs, which took considerable time and did not consider multiple HSDGs simultaneously. In addition, different parameters and databases lead to different conclusions about which HSDGs are effective. In future work, we will address this by employing a feature selection algorithm that automatically selects effective HSDGs and processes the 18 HSDGs simultaneously. According to the work in this paper, LBP-TOP and HSDG have different importance: LBP-TOP is the main feature and provides abundant feature information, while the HSDGs are supplementary features that provide limited feature information, and different HSDGs perform differently. Thus, the algorithm should select effective HSDGs while they are concatenated with LBP-TOP and should account for the different importance of LBP-TOP and HSDG. In this way, we need not test each HSDG in turn and can consider multiple HSDGs simultaneously. In addition, more discriminative features can be selected to recognize micro-expressions, further improving performance.
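One simple realization of the selection scheme sketched above is a greedy forward search that keeps only those HSDGs which improve a validation score when concatenated with the LBP-TOP base feature. The `evaluate` callback below is a hypothetical scoring function (e.g., LOSO recognition rate), not part of the paper:

```python
def select_hsdgs(base_feature, hsdg_list, evaluate):
    """Greedy forward selection over HSDG directions (sketch).
    `base_feature` is the LBP-TOP descriptor, always kept, reflecting its
    larger importance; `evaluate` maps a list of features to a score and
    is assumed to be supplied by the user."""
    selected = []
    best = evaluate([base_feature])
    for i in range(len(hsdg_list)):
        candidate = [base_feature] + [hsdg_list[j] for j in selected] + [hsdg_list[i]]
        score = evaluate(candidate)
        if score > best:        # keep direction i only if it helps
            best = score
            selected.append(i)
    return selected, best

# Toy check: with a score that just counts features, every direction helps.
sel, best = select_hsdgs("lbp_top", ["hsdg_%d" % k for k in range(18)],
                         evaluate=len)
```

This avoids testing each HSDG in isolation and considers the accumulated combination at every step, at the cost of a single pass over the 18 directions.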
6. Conclusions
This paper studies movement features in different directions and proposes HSDG, a new, low-dimensional, single-direction movement feature. HSDG actively extracts the movement feature in a given direction: first, the gradient values are calculated in that direction, and then these values are quantized before the histogram operation is performed. Concatenating HSDG with LBP-TOP yields LBP-SDG. In the experiments, LBP-SDG in 18 directions was tested to determine the effective and optimal directions, and LBP-SDG in an optimal direction was compared with other features and methods. On SMIC-HS, the movement features along the time axis are optimal for distinguishing micro-expressions; on CASME II, extracting the movement features in the upward or horizontal directions is very effective. Compared with other features, LBP-SDG in an optimal direction has the best performance. By comparing LBP-TOP and LBP-SIP before and after adding HSDG, it is shown that HSDG provides discriminative feature information. Additionally, the results show that EVM+LBP-SDG outperforms state-of-the-art methods.
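The HSDG pipeline summarized above (directional gradient, then quantization, then a histogram) can be sketched as follows; the direction vector, bin count, and gradient range are illustrative choices, not the paper's exact settings:

```python
import numpy as np

def hsdg(frames, direction, n_bins=50, g_range=(-255, 255)):
    """Histogram of Single Direction Gradient (sketch).
    frames: (T, H, W) grayscale clip; direction: (dt, dy, dx) vector.
    Per-axis gradients are projected onto the direction, then quantized
    into a fixed-length, normalized histogram."""
    frames = np.asarray(frames, dtype=float)
    gt, gy, gx = np.gradient(frames)          # finite differences per axis
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)                 # unit direction vector
    g = d[0] * gt + d[1] * gy + d[2] * gx     # gradient along the direction
    hist, _ = np.histogram(g, bins=n_bins, range=g_range)
    return hist / max(hist.sum(), 1)          # normalized descriptor

# Illustrative use on a random clip; direction (1, 0, 0) is the time axis,
# i.e., the direction found optimal on SMIC-HS.
clip = np.random.rand(10, 32, 32) * 255
h = hsdg(clip, direction=(1.0, 0.0, 0.0))
```

Varying `direction` over the 18 horizontal, vertical, and oblique choices, and concatenating the resulting histogram with an LBP-TOP vector, gives the LBP-SDG variants compared in the experiments.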
7. Acknowledgements
This work was partly supported by the Postgraduate Research and Practice Innovation Program of Jiangsu Province (Grant KYCX18 0899), partly by the National Natural Science Foundation of China (NSFC) under Grant 72074038, partly by the Key Research and Development Program of Jiangsu Province (Grant BE2016775), partly by the National Natural Science Foundation of China (NSFC) under Grant 61971236, and partly by the China Postdoctoral Science Foundation (Grant 2018M632348).
References

[1] E. Haggard, K. Isaacs, Methods of Research in Psychotherapy, 1966, pp. 154–165. doi:10.1007/978-1-4684-6045-2_14.
[2] P. Ekman, E. Rosenberg (Eds.), What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the Facial Action Coding System (FACS). doi:10.1093/acprof:oso/9780195179644.001.0001.
[3] S. Porter, L. Brinke, Reading between the lies, Psychological Science 19 (2008) 508–514. doi:10.1111/j.1467-9280.2008.02116.x.
[4] D. Matsumoto, H. Hwang, Evidence for training the ability to read microexpressions of emotion, Motivation and Emotion 35 (2011) 181–191. doi:10.1007/s11031-011-9212-2.
[5] W.-J. Yan, Q. Wu, Y.-H. Chen, J. Liang, X. Fu, How fast are the leaked facial expressions: The duration of micro-expressions, Journal of Nonverbal Behavior 37. doi:10.1007/s10919-013-0159-8.
[6] P. Ekman, Lie Catching and Microexpressions, 2009, pp. 118–136. doi:10.1093/acprof:oso/9780195327939.003.0008.
[7] M. O'Sullivan, M. Frank, C. Hurley, J. Tiwana, Police lie detection accuracy: The effect of lie scenario, Law and Human Behavior 33 (2009) 530–538. doi:10.1007/s10979-008-9166-4.
[8] S. Weinberger, Airport security: Intent to deceive?, Nature 465 (2010) 412–415. doi:10.1038/465412a.
[9] M. Frank, M. Herbasz, K. Sinuk, A. M. Keller, A. Kurylo, C. Nolan, I see how you feel: Training laypeople and professionals to recognize fleeting emotions, in: International Communication Association, 2009.
[10] P. Ekman, Telling Lies: Clues to Deceit in the Marketplace, Politics, and Marriage.
[11] P. Ekman, Micro Expression Training Tool (METT), University of California, San Francisco, CA (2002).
[12] M. G. Frank, C. J. Maccario, V. Govindaraju, Behavior and security, Protecting Airline Passengers in the Age of Terrorism (2009) 86–106.
[13] W. Yan, X. Li, S. Wang, G. Zhao, Y. Liu, Y. Chen, X. Fu, CASME II: An improved spontaneous micro-expression database and the baseline evaluation, PLOS ONE 9 (1) (2014) 1–8. doi:10.1371/journal.pone.0086041.
[14] X. Li, T. Pfister, X. Huang, G. Zhao, M. Pietikäinen, A spontaneous micro-expression database: Inducement, collection and baseline, in: 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG 2013), 2013, pp. 1–6. doi:10.1109/FG.2013.6553717.
[15] A. Davison, C. Lansley, N. Costen, K. Tan, M. H. Yap, SAMM: A spontaneous micro-facial movement dataset, IEEE Transactions on Affective Computing 9 (1) (2018) 116–129. doi:10.1109/TAFFC.2016.2573832.
[16] H.-Y. Wu, M. Rubinstein, E. Shih, J. Guttag, F. Durand, W. Freeman, Eulerian video magnification for revealing subtle changes in the world, ACM Transactions on Graphics 31.
[17] Z. Zhou, G. Zhao, M. Pietikäinen, Towards a practical lipreading system, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2011, pp. 137–144. doi:10.1109/CVPR.2011.5995345.
[18] W. Peng, X. Hong, Y. Xu, G. Zhao, A boost in revealing subtle facial expressions: A consolidated Eulerian framework, in: 14th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2019), 2019. doi:10.1109/FG.2019.8756541.
[19] A. C. Le Ngo, R. C.-W. Phan, J. See, Spontaneous subtle expression recognition: Imbalanced databases and solutions, in: Asian Conference on Computer Vision, 2015, pp. 33–48. doi:10.1007/978-3-319-16817-3_3.
[20] D. Kim, W. Baddar, Y. Ro, Micro-expression recognition with expression-state constrained spatio-temporal feature representations, in: ACM Multimedia Conference, 2016, pp. 382–386. doi:10.1145/2964284.2967247.
[21] X. Jia, X. Ben, H. Yuan, K. Kpalma, W. Meng, Macro-to-micro transformation model for micro-expression recognition, Journal of Computational Science (2017) 289–297. doi:10.1016/j.jocs.2017.03.016.
[22] Y. Zong, X. Huang, W. Zheng, Z. Cui, G. Zhao, Learning from hierarchical spatiotemporal descriptors for micro-expression recognition, IEEE Transactions on Multimedia 20 (2018) 3160–3172. doi:10.1109/TMM.2018.2820321.
[23] S. Zhang, B. Feng, Z. Chen, X. Huang, Micro-Expression Recognition by Aggregating Local Spatio-Temporal Patterns, Springer International Publishing, 2017. doi:10.1007/978-3-319-51811-4_52.
[24] T. Ojala, M. Pietikäinen, D. Harwood, A comparative study of texture measures with classification based on feature distributions, Pattern Recognition 29 (1996) 51–59. doi:10.1016/0031-3203(95)00067-4.
[25] G. Zhao, M. Pietikäinen, Dynamic texture recognition using local binary patterns with an application to facial expressions, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (6) (2007) 915–928. doi:10.1109/TPAMI.2007.1110.
[26] Y. Wang, J. See, R. Phan, Y.-H. Oh, LBP with six intersection points: Reducing redundant information in LBP-TOP for micro-expression recognition, in: ACCV, Vol. 9003, 2015. doi:10.1007/978-3-319-16865-4_34.
[27] Y. Wang, J. See, R. Phan, Y.-H. Oh, Efficient spatio-temporal local binary patterns for spontaneous facial micro-expression recognition, PLOS ONE 10 (2015) e0124674. doi:10.1371/journal.pone.0124674.
[28] X. Huang, G. Zhao, X. Hong, W. Zheng, M. Pietikäinen, Spontaneous facial micro-expression analysis using spatiotemporal completed local quantized patterns, Neurocomputing 175 (2015) 564–578. doi:10.1016/j.neucom.2015.10.096.
[29] X. Huang, S.-J. Wang, G. Zhao, M. Pietikäinen, Facial micro-expression recognition using spatiotemporal local binary pattern with integral projection, in: 2015 IEEE International Conference on Computer Vision Workshop, 2015, pp. 1–9. doi:10.1109/ICCVW.2015.10.
[30] X. Huang, S. Wang, X. Liu, G. Zhao, X. Feng, M. Pietikäinen, Discriminative spatiotemporal local binary pattern with revisited integral projection for spontaneous facial micro-expression recognition, IEEE Transactions on Affective Computing 10 (1) (2019) 32–47. doi:10.1109/TAFFC.2017.2713359.
[31] Y. Wang, J. See, Y.-H. Oh, R. C.-W. Phan, R. Rahulamathavan, H.-C. Ling, S.-W. Tan, X. Li, Effective recognition of facial micro-expressions with video motion magnification, Multimedia Tools and Applications 76 (2017) 21665–21690. doi:10.1007/s11042-016-4079-6.
[32] B. Horn, B. Schunck, Determining optical flow, Artificial Intelligence 17 (1981) 185–203. doi:10.1016/0004-3702(81)90024-2.
[33] S.-T. Liong, R. Phan, J. See, Y.-H. Oh, K. Wong, Optical strain based recognition of subtle emotions, in: 2014 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), 2015, pp. 180–184. doi:10.1109/ISPACS.2014.7024448.
[34] S. Liong, J. See, R. Phan, A. C. Le Ngo, Y.-H. Oh, K. Wong, Subtle expression recognition using optical strain weighted features, in: Computer Vision - ACCV 2014 Workshops, Revised Selected Papers, Part II, Vol. 9009, 2014, pp. 644–657. doi:10.1007/978-3-319-16631-5_47.
[35] S. Liong, J. See, R. Phan, K. Wong, Less is more: Micro-expression recognition from video using apex frame, Signal Processing: Image Communication 62 (2016) 82–92. doi:10.1016/j.image.2017.11.006.
[36] F. Xu, J. Zhang, J. Wang, Microexpression identification and categorization using a facial dynamics map, IEEE Transactions on Affective Computing 8 (2017) 254–267. doi:10.1109/TAFFC.2016.2518162.
[37] Y.-J. Liu, J.-K. Zhang, W.-J. Yan, S.-J. Wang, G. Zhao, X. Fu, A main directional mean optical flow feature for spontaneous micro-expression recognition, IEEE Transactions on Affective Computing 7 (4) (2016) 299–310. doi:10.1109/TAFFC.2015.2485205.
[38] Y.-J. Liu, B.-J. Li, Y.-K. Lai, Sparse MDMO: Learning a discriminative feature for spontaneous micro-expression recognition, IEEE Transactions on Affective Computing PP (2018) 1–1. doi:10.1109/TAFFC.2018.2854166.
[39] S. L. Happy, A. Routray, Fuzzy histogram of optical flow orientations for micro-expression recognition, IEEE Transactions on Affective Computing 10 (3) (2019) 394–406. doi:10.1109/TAFFC.2017.2723386.
[40] S. Polikovsky, Y. Kameda, Y. Ohta, Facial micro-expressions recognition using high speed camera and 3D-gradient descriptor, in: 3rd International Conference on Imaging for Crime Detection and Prevention (ICDP 2009), 2010, pp. 1–6. doi:10.1049/ic.2009.0244.
[41] X. Li, X. Hong, A. Moilanen, X. Huang, T. Pfister, G. Zhao, M. Pietikäinen, Towards reading hidden emotions: A comparative study of spontaneous micro-expression spotting and recognition methods, IEEE Transactions on Affective Computing 9 (4) (2018) 563–577. doi:10.1109/TAFFC.2017.2667642.
[42] Y. Li, X. Huang, G. Zhao, Can micro-expression be recognized based on single apex frame?, in: 2018 25th IEEE International Conference on Image Processing (ICIP), 2018, pp. 3094–3098. doi:10.1109/ICIP.2018.8451376.
[43] S.-J. Wang, B.-J. Li, Y.-J. Liu, W.-J. Yan, X. Ou, X. Huang, F. Xu, X. Fu, Micro-expression recognition with small sample size by transferring long-term convolutional neural network, Neurocomputing 312 (2018) 251–262. doi:10.1016/j.neucom.2018.05.107.
[44] J. Li, Y. Wang, J. See, W. Liu, Micro-expression recognition based on 3D flow convolutional neural network, Pattern Analysis and Applications 22 (2018) 1331–1339. doi:10.1007/s10044-018-0757-5.
[45] S. Liong, Y. Gan, J. See, H.-Q. Khor, Y.-C. Huang, Shallow triple stream three-dimensional CNN (STSTNet) for micro-expression recognition, in: 2019 14th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2019), 2019, pp. 1–5. doi:10.1109/FG.2019.8756567.
[46] M. Verma, S. Vipparthi, G. Singh, S. Murala, LEARNet: Dynamic imaging network for micro expression recognition, IEEE Transactions on Image Processing (2019). doi:10.1109/TIP.2019.2912358.
[47] H.-Q. Khor, J. See, S. Liong, R. Phan, W. Lin, Dual-stream shallow networks for facial micro-expression recognition, in: 2019 IEEE International Conference on Image Processing (ICIP), 2019, pp. 36–40. doi:10.1109/ICIP.2019.8802965.
[48] B. Song, K. Li, Y. Zong, Z. Jie, W. Zheng, J. Shi, L. Zhao, Recognizing spontaneous micro-expression using a three-stream convolutional neural network, IEEE Access PP (2019) 1–1. doi:10.1109/ACCESS.2019.2960629.
[49] K. Goh, C. Ng, L. Lim, U. Sheikh, Micro-expression recognition: an updated review of current trends, challenges and solutions, The Visual Computer 36 (2018) 1–1. doi:10.1007/s00371-018-1607-6.
[50] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2005), Vol. 1, 2005, pp. 886–893. doi:10.1109/CVPR.2005.177.
[51] X. Hong, Y. Xu, G. Zhao, LBP-TOP: a tensor unfolding revisit, in: ACCV Workshop on "Spontaneous Facial Behavior Analysis", 2016. doi:10.1007/978-3-319-54407-6_34.
[52] T. Ojala, M. Pietikäinen, T. Mäenpää, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (7) (2002) 971–987. doi:10.1109/tpami.2002.1017623.
[53] K. Fan, T. Hung, A novel local pattern descriptor - local vector pattern in high-order derivative space for face recognition, IEEE Transactions on Image Processing 23 (7) (2014) 2877–2891. doi:10.1109/TIP.2014.2321495.
[54] S. Chakraborty, S. K. Singh, P. Chakraborty, Performance enhancement of local vector pattern with generalized distance local binary pattern for face recognition, in: 2015 IEEE UP Section Conference on Electrical Computer and Electronics, 2015, pp. 1–5. doi:10.1109/UPCON.2015.7456681.
[55] X. Huang, G. Zhao, Spontaneous facial micro-expression analysis using spatiotemporal local radon-based binary pattern, in: International Conference on the Frontiers and Advances in Data Science, 2016, pp. 382–386. doi:10.1109/FADS.2017.8253219.
[56] H.-Q. Khor, J. See, R. Phan, W. Lin, Enriched long-term recurrent convolutional network for facial micro-expression recognition, in: 2018 13th IEEE International Conference on Automatic Face and Gesture Recognition, 2018, pp. 667–674. doi:10.1109/FG.2018.00105.

Appendices
[Appendix tables: direction (DT) and recognition rate (RR) in 18 directions, (a) on SMIC-HS under Alpha = 8, 9, 10, 11, 12 and (b) on CASME II under Alpha = 17, 20, 23, 26, 29; the table values are not recoverable here.]

Figure 5: The confusion matrices of different features on the SMIC-HS database (classes: Positive, Negative, Surprise; panels: (a) LVP-TOP, (b) GDLBP-LVP-TOP, (c) LBP-SIP, (d) SIP-SDG, (e) LBP-TOP, (f) LBP-SDG).

Figure 6: The confusion matrices of different features on the CASME II database (classes: Happiness, Disgust, Surprise, Repression, Others; panels as in Figure 5).