Exploring Facial Expressions and Affective Domains for Parkinson Detection
Luis Felipe Gómez-Gómez, Aythami Morales, Julian Fierrez, and Juan Rafael Orozco-Arroyave
MANUSCRIPT, DECEMBER 2020
Abstract—Parkinson's Disease (PD) is a neurological disorder that affects facial movements and non-verbal communication. Patients with PD present a reduction in facial movements called hypomimia, which is evaluated in item 3.2 of the MDS-UPDRS-III scale. In this work, we propose to use facial expression analysis from face images based on affective domains to improve PD detection. We propose different domain adaptation techniques to exploit the latest advances in face recognition and Face Action Unit (FAU) detection. The principal contributions of this work are: (1) a novel framework to exploit deep face architectures to model hypomimia in PD patients; (2) an experimental comparison of PD detection based on single images vs. image sequences recorded while various facial expressions are evoked in the patients; (3) an exploration of different domain adaptation techniques to exploit existing models, initially trained either for Face Recognition or for FAU detection, for the automatic discrimination between PD patients and healthy subjects; and (4) a new approach that uses triplet-loss learning to improve hypomimia modeling and PD detection. The results on real face images from PD patients show that we are able to properly model evoked emotions using image sequences (neutral, onset-transition, apex, offset-transition, and neutral), with accuracy improvements of up to 5.5% (from 72.9% to 78.4%) with respect to single-image PD detection. We also show that our proposed affective-domain adaptation provides improvements in PD detection of up to 8.9% (from 78.4% to 87.3% detection accuracy).
Index Terms—Parkinson's disease, Hypomimia, Facial expressions, Face Action Unit, Affective domains, Triplet loss.
INTRODUCTION

Parkinson's Disease (PD) is a neurological disorder characterized by motor and non-motor impairments that affects between 1 and 2 percent of people over 65 years old [7]. Motor deficits include bradykinesia, rigidity, postural instability, tremor, and dysarthria; non-motor deficits include depression, anxiety, sleep disorders, and slowing of thought. Besides this extensive list of symptoms, most patients with PD also exhibit difficulties to express emotions or specific expressions on their faces. Possible signs of those abnormalities include a reduced range of facial muscle movement, a wider opening of the eyes, a half-open mouth, and slower blinking. All of these phenomena in facial expression are grouped in the literature under the term hypomimia [5], which results from motor impairments at the level of the facial muscles. Hypomimia is typically not noticed in early stages of PD, but once there is significant deterioration, orofacial movements are highly reduced, which can result in expressionless faces with a very limited capability to smile or to express other emotions or feelings such as happiness, sadness, anger, fear, disgust, and surprise [13]. The main effect of these impairments is difficulty with non-verbal communication, which also produces social isolation in the mid to long term.

Clinical evaluation of PD patients is mainly performed by expert neurologists according to the Movement Disorder Society - Unified Parkinson's Disease Rating Scale (MDS-UPDRS) [22]. This scale is the global standard for the clinical evaluation of PD patients and considers both motor and non-motor symptoms. Items of the MDS-UPDRS scale range between 0 and 4, where 0 means completely healthy and 4 means completely impaired. Section III of the MDS-UPDRS has a maximum value of 132 and covers motor examination, including facial expression in one item.
According to the guidelines given by the Movement Disorder Society, the five levels of the item where hypomimia is evaluated can be used to assess facial expressions in PD patients [22]. The following list indicates the correspondence between possible values of the item and their meaning in terms of facial expression evaluation:

0) Normal: Normal facial expression.
1) Slight: Minimal masked facies, manifested only by decreased frequency of blinking.
2) Mild: In addition to decreased eye-blink frequency, masked facies present in the lower face as well, namely fewer movements around the mouth, such as less spontaneous smiling, but lips not parted.
3) Moderate: Masked facies with lips parted some of the time when the mouth is at rest.
4) Severe: Masked facies with lips parted most of the time when the mouth is at rest.

Neurological evaluation highly depends on the clinician's expertise, which causes variability and possible bias in the rating procedure. Therefore, the development of computerized systems to objectively support the evaluation of the disease progression is growing in importance. There are several contributions in the state of the art where computerized systems are introduced to evaluate different aspects of Parkinson's patients, including speech [35, 34], gait [20, 12], handwriting [45, 9, 16, 11], hand movements [49], and facial expression [4]. Among all of these, facial expression and hypomimia seem to be the least covered. Facial Expression Recognition (FER) refers to the evaluation of the capability of PD patients to effectively recognize different expressions or emotions when looking at faces. Facial Expressivity Evaluation (FEE) refers to the capability of the patient to produce different facial expressions or emotions. Both aspects have a very important role in social interaction and non-verbal communication.
The first one has been studied for several decades, mainly by psychologists, and the main findings are summarized in a relatively recent study [3]. On the other hand, FEE has become a popular field among engineers and computer scientists, which opens space for research in different applications related to Affective Computing.

During the past two decades, the Affective Computing community has made great advances in developing novel technologies to model facial expressions and emotional information [39, 32, 42]. One of the goals of affective technologies is to create computational models with the ability to recognize, interpret, and process human emotions, making human-computer interaction more useful. Sentiment analysis and affective computing have been continuously studied since the 20th century, helping in the development of computer vision systems [10, 31, 26], in the creation of entertainment [41], and in the development of systems to aid different areas of medicine, including neurology [52, 28, 37].

Our work focuses on the study of FEE in PD patients. The main aim is to consider videos collected from patients to evaluate their capability to produce specific emotions, and to compare such capability with respect to healthy subjects using recent advances in affective domains.
RELATED WORK
One of the earliest studies about FEE in PD patients was conducted in 2004 by Simons et al. [46]. The authors evaluated the capability of 19 PD patients and 25 healthy subjects to pose and imitate different facial expressions. Videos with social interactions were used to evoke emotional responses in the patients' faces. The videos were manually analyzed and the participants' expressiveness was rated according to subjective rating scales, objective facial measurements, and self-questionnaires. The objective measurement was based on the facial action coding system presented in [19], where the facial expression is decomposed according to specific facial muscle movements such as raising the eyebrows and wrinkling the nose. The results of the study indicated that patients with PD have a reduced capability to produce spontaneous facial expressions in all experimental situations. Two years later, in [6], the authors presented a work where expressivity and bradykinesia were studied. The authors hypothesized that intentional facial expressions are slowed (bradykinetic) and involve less movement in PD patients than in healthy controls. This hypothesis was inspired by other intentional movements performed by PD patients, e.g., walking, where bradykinesia is also observed. Digitized videos were evaluated frame-by-frame and the entropy in temporal changes of pixel intensity was measured [44]. The authors found that PD patients had reduced entropy compared to healthy controls and were significantly slower in reaching a peak expression, which is directly associated with bradykinesia.

In 2016, Almutiry et al. [2] presented perhaps the only longitudinal study about FEE in PD patients. A total of 8 subjects (4 PD and 4 healthy controls) participated in the study. Patients were recorded for five days per week (once per day) during six weeks, while controls were recorded for five days within one week.
Participants were requested to produce specific facial expressions while being recorded. The authors used two classical feature extraction methods to localize 27 facial features: Active Appearance Models (AAM) and Constrained Local Models (CLM). The results suggested that PD patients exhibit less movement than controls, which confirms the observations made ten years earlier by Bowers et al. [6].

In 2017, Gunnery et al. [27] studied the coordination of movements across regions of the face in 8 PD patients (4 female). They used the facial action coding system [19, 14] to measure spontaneous facial expressions. The number of activated frames per action unit and their intensity were manually labeled. Correlations were computed for activation values obtained across different regions of the face. The results showed that as the severity of the facial expression deficit increased, there was a decrease in the number, duration, intensity, and co-activation of facial muscle actions. In the same year, Bandini et al. [4] classified emotions expressed by 17 PD patients (13 male) and an equal number of healthy controls (6 male). Different emotions were evaluated, including happiness, anger, disgust, and sadness. Different areas of the face were modeled with 49 landmarks [50, 24], including eyes, eyebrows, mouth, and nose. A total of 20 features were extracted to define a linear combination of specific reference points. Acted and imitated facial expressions were considered. An SVM was trained to automatically detect the different emotions expressed by participants. The results with imitated expressions showed higher accuracies for healthy controls in most of the emotions. The only case where the PD patients displayed an expression better than the healthy subjects was sadness. When acted expressions were evaluated, the authors also found higher accuracies for healthy subjects than for PD patients.

Other contributions on the topic of FEE in PD include the study of Kang et al. [30].
The authors evaluated whether deficiencies in the orofacial movements of PD patients occur in spontaneous and voluntary expressions. Muscular activation (related to specific regions of the face) was studied considering electromyography signals. Data from the East Asian Dynamic Facial Expression Stimuli (EADFES) database was used [33]. A group with 20 PD patients and 20 healthy controls was evaluated; the authors report limitations of patients in expressing emotions spontaneously, although the observed dynamics in the movement of the face are similar across all subjects. The study also highlighted the deterioration in the patients' quality of life due to the presence of a "masked face", affecting social and psychological aspects and increasing their risk of developing depression-related symptoms.

More recently, in another line of work, Grammatikopoulou et al. [25] analyzed facial expressions from images captured with smartphones. Geometric features of the face were extracted and stored in the cloud. A total of 34 participants were recruited, 23 with PD and 11 healthy controls. Patients were divided into three groups according to the facial expression score of the MDS-UPDRS-III scale. The authors extracted two feature sets: one using the Google Face API and the other one using the Microsoft Face API [23]. The feature sets were composed of reference points on the faces; then, two linear regression models were developed (one per feature set) to estimate two different values of the Hypomimia Severity index, namely HSi1 and HSi2. These two indexes were used to classify between Parkinson's patients and healthy people, and sensitivity and specificity values were reported for both HSi1 and HSi2. In 2020, Sonawane and Sharma [48] presented a review of automatic techniques and the use of machine learning in detecting emotional facial expressions in PD patients.
The authors show that the use of deep learning in this field has not yet been adequately addressed for the classification between healthy people and PD patients. They also conducted a pilot experiment based on a CNN trained from scratch for masked face detection. The pilot experiment shows that deep learning-based models can be very useful to perform the classification.

As the literature review shows, there is a lack of work in the field of FEE for modeling hypomimia in Parkinson's Disease (PD) patients with the latest affective models, including deep learning techniques. One of the main reasons for this lack of deep approaches is the absence of large-scale databases with Parkinson's Disease patients. In contrast, the Face Recognition and Affective Computing research communities have made great efforts to release databases with millions of samples. In this work, we propose to use facial expression analysis and affective domains to improve PD detection. We propose different domain adaptation techniques [18, 47] to exploit the latest developments in face recognition and Face Action Unit (FAU) detection [43]. The main contributions of this paper are: (1) a novel framework to exploit deep face architectures to model hypomimia in PD patients; (2) an experimental comparison of PD detection based on single images vs. image sequences recorded while various facial expressions are evoked in the patients; (3) an exploration of different domain adaptation techniques to exploit existing models, initially trained either for Face Recognition or for FAU detection, for the automatic discrimination between PD patients and healthy subjects; and (4) a new approach that uses triplet-loss learning to improve hypomimia modeling and PD detection.
EXPERIMENTAL FRAMEWORK
Let us assume that w_FR is a model trained for Face Recognition tasks and that the representation x_FR is a feature vector generated by the model (typically from the last layers of a Convolutional Neural Network) from an input face image. This representation x_FR is learned to describe the face image in a projected space where faces from the same person remain closer than faces from different persons. Similarly, models and representations can be trained for different tasks such as affect recognition (w_AF), e.g., in the form of facial gestures, or Parkinson's Disease detection (w_PD). Domain adaptation refers to methods that adapt a representation x_A trained for a domain A to a new domain B (typically a domain with similar characteristics to A but less information to train on). The resulting representation x_B, adapted from x_A, is expected to perform better than a representation trained from scratch for domain B.

We propose an experimental framework where affective features are explored at different levels (or domains). The list of domains and the corresponding underlying hypotheses to be explored are presented below (see also Figure 1).

Face Recognition Domain (Level 1).
Our acquisition protocol introduces emotional tasks including evoked emotional states (smiling, anger, and surprise) and coordinated face gestures (right eye wink, left eye wink):

• Hypothesis (H1): evoked responses intensify the features necessary to model hypomimia in Parkinson's patients. The representation x_FR can be improved by incorporating different facial gestures during the acquisition protocol.
• Experiment: we evaluate the performance of PD detection for different sequences of face gestures using pre-trained Face Recognition models (w_FR trained with VGGFace2 [8]).

Affective Domain (Level 2).
We propose to improve the learned Face Recognition representations (x_FR) for Parkinson detection by incorporating an affective domain adaptation (w_AF) training process:

• Hypothesis (H2): automatic detection of hypomimia is improved when features from the emotion domain are incorporated into the representations. The representation x_AF performs better for Parkinson detection than the representation x_FR.
• Experiment: the pre-trained models (w_FR) are adapted to the affective domain (w_AF) using the EmotioNet database [15] and FAU detection. The performance of both x_FR and x_AF is evaluated for Parkinson detection.

Parkinson Domain (Level 3).
We evaluate the performance obtained by representations x_PD trained with healthy subjects and Parkinson patients using the Triplet Loss function:

• Hypothesis (H3.1): similarity learning functions designed to enhance the Parkinson features can serve to improve the capability to detect hypomimia.
• Experiment: the affective model (w_AF) is adapted to the Parkinson domain using the Triplet Loss function and the FacePark-GITA database (see Section 3.1.3 for details).

The best performing facial representations are also used to classify patients with different levels of neurological impairment according to the MDS-UPDRS-III scores:

• Hypothesis (H3.2): facial representations learned in previous models contain information to identify and evaluate different levels of neurological impairment in Parkinson's Disease patients.
• Experiment: models created to represent hypomimia are used to evaluate three different neurological states according to the MDS-UPDRS-III scores.

Details of the methods implemented to validate all hypotheses are presented in Section 3.2.
Figure 1: Experimental framework proposed for the development of this work. *SVM 1, SVM 2, and SVM 3.1 classify between PD and Healthy Controls (HC); **SVM 3.2 classifies PD patients into 3 impairment levels: PD-1, PD-2, and PD-3.
Three different databases are considered in this work: VGGFace2 [8] and EmotioNet [15], which are popular for Face Recognition and Face Action Unit detection, respectively; and a third, new database composed of PD patients and healthy subjects. It contains face videos of patients suffering from Parkinson's disease and age-matched healthy controls. This new corpus is called FacePark-GITA. Details of each database are presented below.
VGGFace2. This database comprises millions of face images from thousands of different subjects [8]. The images were downloaded from Google Image Search. The corpus has large variations in pose, age, lighting, ethnicity, and profession. This database is popular in the Face Recognition community and has been extensively used to train competitive recognition models [40, 29].

EmotioNet. This database was originally introduced by researchers from the Ohio State University, who released the EmotioNet Challenge in 2017 [15]. It contains one million facial expression images collected from the Internet. Part of the images were annotated by the automatic Action Unit (AU) detection model presented in [15], and the remaining images were manually annotated by experts. Annotations for a set of AUs are included in the corpus.
FacePark-GITA. This database was created by the GITA Lab. The recording of patients is still ongoing, and the most recent version of the corpus contains video recordings of 24 healthy participants and 30 PD patients. The videos were recorded at 30 frames per second in non-controlled environment conditions, i.e., lighting and background were not controlled prior to the recording and differ among participants. PD patients were diagnosed by an expert neurologist and were evaluated according to the MDS-UPDRS-III scale and the Hoehn and Yahr (H&Y) scale [21]. A summary of the clinical and demographic information is presented in Table 1. All participants gave written informed consent. The study is in accordance with the Declaration of Helsinki and was approved by the Ethical Research Committee of the University of Antioquia.

The participants in this study were asked to produce different facial expressions while being recorded. A total of five video-task recordings are included: right eye wink, left eye wink, smile, anger, and surprise. Patients have an average age of 69 years, and healthy subjects were chosen within a similar age range. Possible biases introduced by age or gender were discarded via a chi-square statistical test and a Welch's t-test, respectively.

Table 1: Demographic and clinical information of the participants included in the FacePark-GITA database.

                     PD patients          Healthy participants
                     Men       Women      Men       Women
Age [years]          … ± …     … ± …      … ± …     … ± …
t [years]            8.7 ± …   … ± …      —         —
t range [years]      2 – 20    1 – 45     —         —
MDS-UPDRS-III        35.4 ± …  … ± …      —         —

t: Years since diagnosis.

In this work we employ the ResNet50 architecture [29], with 50 layers and approximately 25.6M parameters. This model is used to generate an initial face representation. The ResNet50 model
was originally proposed for general image recognition tasks and later retrained with the VGGFace2 database [8] for Face Recognition. The model is used as a feature extractor by removing the final decision layer. For each face image, the model generates a 2048-dimensional feature vector.

In our experiments we apply Transfer Learning (TL) [38] to adapt from one domain to another (e.g., from Face Recognition to the affective domain). TL refers to methods where the weights of a model originally learned for one task are used as initialization before adjusting the model for a different task. One transfer learning technique consists in freezing the initial and intermediate layers to retain their capability to extract general characteristics, and retraining the last layers, closer to the network output. Re-training those last layers allows the original feature space to be adapted to the new task. These methods are suitable for problems where data is scarce and end-to-end learning approaches fail to find the optimal feature space. The databases available to model hypomimia in patients suffering from PD are very small, so we expect TL techniques to be very useful here to adapt to the Parkinson domain from the Face Recognition domain, where massive datasets (millions of images) are available for learning.

In addition to the ResNet50 Face Recognition model, in this work we employ two deep neural networks trained from scratch for Face Action Unit (FAU) detection. The architectures employed are based on the popular VGG and ResNet models [53, 43]. The two models are described below:
VGG-8: This model contains 8 convolutional layers divided into groups of 2 layers, each group followed by a Max-pooling layer. Convolutional layers apply a variety of filters to the images, and Max-pooling layers reduce the size of the filtered maps. Additionally, dropout is used in the regularization layers to randomly discard neurons and make the model less prone to overfitting. The final part of the architecture has six fully-connected layers before the decision layer.

ResNet-7: This model is composed of a total of 7 residual blocks. Each block can be defined as an identity-block or a conv-block. Identity-blocks are the standard blocks used in ResNet; they have a set of convolutional filters and a shortcut connection which bypasses these filters, and their input and output dimensions are the same. Conv-blocks are the blocks where the input and output dimensions do not match; the difference with the identity-block is a convolutional layer in the shortcut to the output. The benefit of these architectures is that, in traditional architectures with a large number of layers, the problem of error degradation appears during training; ResNet models, with their shortcut connections from previous layers, are effective in solving this problem [29].

Due to the limited number of samples in the FacePark-GITA database, for the Parkinson domain adaptation we opted for a Triplet Loss learning approach. The Triplet Loss function consists in applying a linear transformation over the data before taking the distance between samples. Given a training data set S = (x_i, y_i) with inputs x_i ∈ R^d and discrete class labels y_i ∈ Z, the goal is to find a transformation of the input data that reduces the distance between pairs from the same class while increasing the distance between pairs from different classes.
The Mahalanobis distance defined in Equation 1 is the similarity measure used in this work:

d_M(x_i, x_j) = (x_i − x_j)^T M (x_i − x_j),    (1)

where M is a positive semi-definite symmetric matrix that can be decomposed as M = T^T T, with T a linear transformation matrix. Equation 1 can be rewritten as:

d_M(x_i, x_j) = (T(x_i − x_j))^T T(x_i − x_j)    (2)
             = ‖T(x_i − x_j)‖² = ‖x_i′ − x_j′‖².    (3)

The linear transformation T can be generalized as Φ(x_i), where Φ indicates a kernel function. The resulting distance metric is as follows:

d_M(x_i, x_j) = ‖Φ(x_i) − Φ(x_j)‖².    (4)

Determining the transformed vector Φ(x) requires finding a transformation that makes the intra-class distance smaller than the inter-class distance. The general rule applied over the data set consists in forming the following triplets S_T:

S_T = {(x_a, y_a), (x_n, y_n), (x_p, y_p) | y_a = y_p, y_a ≠ y_n},    (5)

where a, p are samples belonging to the same class and n is a sample from a different class. In our Parkinson detection experiments, the number of classes is two (healthy and Parkinson). However, we propose to introduce an additional restriction in the triplet: in our experiments, a and p belong to the same class but present different facial expressions. In this way, we introduce facial gestures into the learning objective. The generation of the triplets S_T can be seen as a data augmentation technique: the high number of possible combinations of three elements in a dataset enriches the training process, especially when few samples are available. The triplet loss function to be minimized is defined as:

L = Σ_{S_T} [ d_M(x_a, x_p) − d_M(x_a, x_n) + α ]_+ ,    (6)

where [z]_+ = max(z, 0) and α ≥ 0 is the minimum margin required between classes.

The automatic classification between healthy people and PD patients is performed using Support Vector Machines (SVMs).
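A minimal numerical sketch of Equations 1–6, assuming a linear transform T (so that M = TᵀT) and the hinge form of the loss; the variable names and toy vectors are illustrative only:

```python
import numpy as np

def d_M(T, xi, xj):
    """Squared Mahalanobis distance of Eq. 3: ||T(xi - xj)||^2, with M = T^T T."""
    diff = T @ (xi - xj)
    return float(diff @ diff)

def triplet_loss(T, xa, xp, xn, alpha=0.2):
    """One term of Eq. 6: [d_M(xa, xp) - d_M(xa, xn) + alpha]_+ ."""
    return max(d_M(T, xa, xp) - d_M(T, xa, xn) + alpha, 0.0)

T = np.eye(4)                       # identity transform: plain Euclidean distance
xa = np.ones(4)                     # anchor
xp = np.ones(4)                     # positive: same class as the anchor
xn = np.zeros(4)                    # negative: the other class
print(triplet_loss(T, xa, xp, xn))  # 0.0 (well-separated triplet, loss clipped)
```

With the identity transform and a triplet whose negative is already farther than the margin, the hinge clips the loss to zero; learning T shapes the metric so that this holds for as many triplets as possible.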
The classification of patients with different degrees of impairment is performed using SVMs optimized in a one-vs-all strategy. In the binary classification experiments with SVMs, linear and Gaussian kernels are considered. The optimization of hyper-parameters is performed with a grid search over powers of ten for C and, in the case of the Gaussian kernel, also for γ; for the linear kernel the search considers only C. In the multi-class classification, only linear kernels are considered. Optimization and evaluation of the models are performed following a 5-fold cross-validation strategy. Results of the binary classification are reported in terms of accuracy (Acc), sensitivity (Sens), specificity (Spec), F1-score (F1), and Area Under the receiver operating characteristic Curve (AUC). Results of the multi-class classification are reported in terms of accuracy (Acc), F1-score (F1), Kappa coefficient (κ), and confusion matrix. In all cases, results include the values of the optimal hyper-parameters, computed as the mode across the test folds of each experiment.

EXPERIMENTS AND RESULTS
FacePark-GITA includes 5 videos for each participant, each corresponding to a different facial expression: smile, anger, surprise, left eye wink, and right eye wink. Five frames per video-task were extracted with the Affectiva software. The valence curve provided by the software is used as the criterion to select the following sequence of five images/frames per participant for each expression: (i) Neutral; (ii) transition from Neutral to the Apex (i.e., onset); (iii) Apex; (iv) transition from the Apex to Neutral (i.e., offset); and (v) Neutral. The sequence of images and their direct relation with the valence curve are illustrated in Figure 2.

Individual frames corresponding to each valence level shown in Figure 2 are considered to evaluate whether specific frames provide relevant information to discriminate between PD patients and healthy subjects. Feature vectors are obtained from the last layer of the ResNet50 model (see Section 3.2.1). Table 2 summarizes the results.

Table 2: Results of classification using a single image from the extracted image sequence.
E.S.      Kernel*            Acc [%]   Sens [%]   Spec [%]   F1 [%]
Neutral   C=1e+01; γ=1e-04   69.0      …          …          …
…         C=1e+01; γ=1e-04   70.0      …          …          …
…         C=1e+01; γ=1e-04   71.4      …          …          …
…         C=1e+01; γ=1e-04   71.6      …          …          …
…         C=1e-03            70.8      …          …          …
…         C=1e-03            70.8      …          …          …
Onset     C=1e-02            72.9      …          …          …
Offset    C=1e-01            72.8      …          …          …

E.S.: Expression stage. Upper rows: Gaussian kernel; lower rows: linear kernel.
*Column with optimal hyper-parameters.
Note that there is almost no difference among the accuracies obtained with the frames of each expression stage.
Table 3: Results of the classification using different combinations of the extracted frame sequences.

Sequence    Kernel*            Acc [%]   Sens [%]   Spec [%]   F1 [%]
NOnA        C=1e+02; γ=1e-04   77.4      …          …          …
AOffN       C=1e+01; γ=1e-04   76.3      …          …          …
NOnAOffN    C=1e+01; γ=1e-04   77.2      …          …          …
NOnA        C=1e-03            78.2      …          …          …
AOffN       C=1e-03            77.8      …          …          …
NOnAOffN    C=1e-03            78.4      …          …          …

First three rows: Gaussian kernel. Last three rows: linear kernel.
*Column with optimal hyper-parameters.
Perhaps the only thing to highlight is the high sensitivity (88.6%) of the Onset stage, which likely indicates that this stage may be a good choice to model hypomimia in specific frames within a video. This preliminary observation will be further elaborated in the next experiments.
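The five-stage frame selection used above can be sketched as follows; this is a heuristic illustration driven by a synthetic valence curve, not Affectiva's actual selection procedure:

```python
import numpy as np

def select_sequence(valence):
    """Pick frame indices for (Neutral, Onset, Apex, Offset, Neutral) from a
    per-frame valence curve; transition frames are taken halfway (heuristic)."""
    apex = int(np.argmax(valence))                         # peak of the expression
    n1 = int(np.argmin(valence[:apex])) if apex > 0 else 0 # neutral before the peak
    n2 = apex + int(np.argmin(valence[apex:]))             # neutral after the peak
    onset = (n1 + apex) // 2                               # mid-transition upward
    offset = (apex + n2) // 2                              # mid-transition downward
    return [n1, onset, apex, offset, n2]

# Synthetic valence curve: flat, rise, fall, flat (one evoked expression).
valence = np.concatenate([np.zeros(10), np.linspace(0, 1, 10),
                          np.linspace(1, 0, 10), np.zeros(10)])
print(select_sequence(valence))  # [0, 9, 19, 24, 29]
```

The returned indices are ordered in time and bracket the apex, mirroring the NOnAOffN structure used in the experiments.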
Given the small amount of information provided by individual frames, we evaluate the use of multi-frame sequences in a simple information fusion architecture based on score fusion [17] as a way to capture changes during the production of facial expressions. The general idea was already studied in [36] for speech signals, where the author hypothesized that PD patients have more difficulty starting or stopping the movement of muscles and limbs during speech production. The idea was later extended to other movements like handwriting and gait [51].

As in the case of speech, gait, and handwriting, we believe that the same hypothesis holds during the production of facial expressions. Thus, the analysis of multiple frames during the production of facial expressions should provide useful information to discriminate between PD patients and healthy subjects. The following multi-frame sequences are considered:

• NOnA: Neutral, Onset, and Apex.
• AOffN: Apex, Offset, and Neutral.
• NOnAOffN: Neutral, Onset, Apex, Offset, and Neutral.

Table 3 shows the results obtained when the changes in the production of facial expressions are incorporated via feature vectors extracted from multi-frame sequences. The results obtained with the affective sequences are better than those obtained with individual frames. The improvement is around 7%, and the best results are obtained in the two cases where the sequence NOnA is included, which focuses on modeling information in the transition between neutral and the production of a certain expression. It is also worth highlighting that sensitivity is near 90% in all of the cases, while specificity is rather low (around 64%). This indicates that the proposed approach is good at detecting patients but not as good at detecting healthy controls. This result validates hypothesis H1 about the existence of useful information related to hypomimia in the evoked facial expressions.
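A sketch of the score-level fusion idea, assuming a simple mean rule over the per-frame classifier scores of a sequence (the exact fusion rule from [17] is not reproduced here, and the scores below are hypothetical):

```python
import numpy as np

def fuse_scores(frame_scores):
    """Score-level fusion: average the per-frame SVM decision scores of a
    multi-frame sequence into one sequence-level score."""
    return float(np.mean(frame_scores))

# Hypothetical per-frame scores for one subject's NOnAOffN sequence:
# Neutral, Onset, Apex, Offset, Neutral.
scores = [0.2, 0.7, 0.9, 0.6, 0.1]
print(fuse_scores(scores))  # 0.5
```

Averaging is only one of several common fusion rules (max, product, weighted sums are alternatives); the key point is that the sequence-level decision aggregates evidence from the transitions rather than from a single frame.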
Given this clear improvement, the next experiments include only feature vectors extracted from multi-frame sequences.
Figure 2: Emotion stages according to the evoked valence measured with the Affectiva tool. (left) Healthy woman, 63 years old; (right) woman with Parkinson's disease, 67 years old, facial expression item = 2.
AU1: Inner Brow Raiser. AU2: Outer Brow Raiser. AU4: Brow Lowerer. AU5: Upper Lid Raiser. AU6: Cheek Raiser. AU12: Lip Corner Puller. AU25: Lips Part. AU26: Jaw Drop.
Figure 3: Action Units defined for Experiment 2. Source: [19].
This experiment intends to incorporate information from the Affective domain to improve Parkinson's Disease (PD) detection. In this case the EmotioNet database is used to create an appropriate facial representation space. The first step consists in selecting AUs that provide suitable information to perform the automatic classification between PD patients and healthy subjects. We selected a subset of AUs, according to [14], adequate for the facial expressions included in the recording tasks of the FacePark-GITA database. Figure 3 shows the set of selected AUs.
The process to adapt the convolutional models from one domain to another consists in freezing different percentages of the layers and retraining the remaining portion. The data with the selected AUs from the EmotioNet dataset are used here to retrain the models. In this case we evaluate three percentages of layers frozen during the retraining of the ResNet50 model (originally trained for Face Recognition): freezing 50% (Freeze 50), freezing 75% (Freeze 75), and freezing 100%. Note that the freezing 100% model is taken as the Baseline and corresponds to the case where no affective information is incorporated (x_FR). After the convolutional layers, a fully connected layer is added for the classification of the 8 selected AUs (see Figure 3). The result of the retraining process and its performance in classifying the AUs is shown in Table 4 in terms of AUC and EER values. The performance varies depending on the FAU and the percentage of layers frozen: FAUs 6, 12, and 25 reached AUC values of 0.90 or above, while the rest of the FAUs achieved performances around 0.85. The representations x_AF obtained by the retrained models are further used to classify between PD patients and healthy subjects of the FacePark-GITA corpus. The results obtained with the Freeze 75 and Freeze 50 models are shown in Table 5 and Table 6, respectively. The results for the Baseline model correspond to those previously shown in Table 3. The optimal hyper-parameters found in the 5-fold cross-validation process are also included for every experiment. Note that Freeze 75 exhibits higher accuracies than Freeze 50, indicating that considerable information from the Face Recognition domain is still useful to obtain good results in the classification between PD patients and healthy subjects. More interestingly, note that the best accuracy obtained with the Freeze 75 model in Table 5 (87.3%) is 8.9% higher than the best result obtained when only a Face Recognition model is considered (Table 3).
This result supports our second hypothesis (H2): incorporating information from the Affective domain into the Face Recognition domain improves the detection of hypomimia in PD patients. The benefits of including information from the Affective domain are also shown in Figure 4, where the ROC curves obtained with the Freeze 75, Freeze 50, and Baseline models are presented. Note that the models used up to this point of the study are based on architectures originally trained for Face Recognition tasks (ResNet50). Now we want to evaluate the importance of this initialization based on a Face Recognition training process.
Table 4: FAU detection results of the VGGFace2 model after retraining with the EmotioNet database.
Models             Metrics  AU1    AU2    AU4    AU5    AU6    AU12   AU25   AU26
Baseline (x_FR)    AUC      0.83   0.83   0.87   0.80   0.94   0.95   0.92   0.80
Baseline (x_FR)    EER [%]  24.58  23.78  21.01  27.13  12.82  12.11  15.38  27.32
Freeze 75 (x_AF)   AUC      0.84   0.84   0.86   0.84   0.92   0.93   0.95   0.85
Freeze 75 (x_AF)   EER [%]  21.84  20.80  19.90  21.65  14.34  10.42  8.63   22.48
Freeze 50 (x_AF)   AUC      0.84   0.87   0.87   0.87   0.93   0.95   0.90   0.83
Freeze 50 (x_AF)   EER [%]  20.56  19.29  18.92  19.53  13.22  10.58  10.99  24.32
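The AUC and EER values reported above can be computed from per-AU detection scores. The scores in this sketch are toy values, not the paper's data:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Toy labels/scores for one AU detector; the EER is the operating point
# where the false positive rate equals the false negative rate.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.3, 0.2, 0.6, 0.4, 0.8, 0.7, 0.9])

auc = roc_auc_score(y_true, y_score)
fpr, tpr, _ = roc_curve(y_true, y_score)
fnr = 1 - tpr
eer = fpr[np.nanargmin(np.abs(fpr - fnr))]   # closest point to the crossing

print(round(auc, 4), round(eer, 4))
```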
Figure 4: PD classification ROC curves obtained from the different input sequences with the retrained Freeze models: (a) NOnA; (b) AOffN; (c) NOnAOffN.

Table 5: PD classification results using the Freeze 75 model.
Sequence   Kernel*           Acc[%]  Sens[%]  Spec[%]  F1[%]
NOnA       C=1e+01; γ=1e-04  84.2    –        –        –
AOffN      C=1e+02; γ=1e-04  81.6    –        –        –
NOnAOffN   C=1e+02; γ=1e-04  86.7    –        –        –
NOnA       C=1e-01           84.7    –        –        –
AOffN      C=1e-01           82.6    –        –        –
NOnAOffN   C=1e-01           87.3    –        –        –
First three rows: Gaussian kernel. Last three rows: Linear kernel. *Column with optimal hyper-parameters.
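The 5-fold hyper-parameter search over C and γ reported in these tables can be sketched as follows; the exact grids are assumptions inferred from the values that appear in the tables, and the features are placeholders:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 32))        # placeholder sequence features
y = np.repeat([0, 1], 30)            # PD vs HC labels (balanced toy data)

# Grids inferred from the tables: C in powers of ten, and gamma
# likewise for the Gaussian (RBF) kernel.
grid = [
    {"kernel": ["rbf"], "C": [1e-1, 1e0, 1e1, 1e2, 1e3],
     "gamma": [1e-4, 1e-3, 1e-2]},
    {"kernel": ["linear"], "C": [1e-2, 1e-1, 1e0, 1e1]},
]
search = GridSearchCV(SVC(), grid, cv=StratifiedKFold(5), scoring="accuracy")
search.fit(X, y)
print(search.best_params_["kernel"], search.best_params_["C"])
```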
Table 6: PD classification results using the Freeze 50 model.
Sequence   Kernel*           Acc[%]  Sens[%]  Spec[%]  F1[%]
NOnA       C=1e+01; γ=1e-04  83.1    –        –        –
AOffN      C=1e+01; γ=1e-04  81.3    –        –        –
NOnAOffN   C=1e+00; γ=1e-04  81.9    –        –        –
NOnA       C=1e-01           82.1    –        –        –
AOffN      C=1e-01           80.0    –        –        –
NOnAOffN   C=1e-01           80.2    –        –        –
First three rows: Gaussian kernel. Last three rows: Linear kernel. *Column with optimal hyper-parameters.

The previous scenario studied the performance of pre-trained models, with a high number of parameters learned from the Face Recognition domain, after adaptation to the Affective domain. In this section we train FAU detection models from scratch. ResNet50 requires the optimization of tens of millions of parameters. Conversely, the VGG-8 and ResNet-7 architectures proposed in Section 3.2.2 require the optimization of orders of magnitude fewer parameters.

Table 7: FAU detection results of the VGG-8 and ResNet-7 models trained with the EmotioNet database.

Models     Metrics  AU1    AU2    AU4    AU5    AU6    AU12   AU25   AU26
ResNet-7   AUC      0.92   0.93   0.91   0.91   0.96   0.97   0.97   0.91
ResNet-7   EER [%]  15.25  14.21  16.20  13.58  10.05  8.42   7.39   16.32
VGG-8      AUC      0.89   0.87   0.89   0.90   0.96   0.96   0.96   0.90
VGG-8      EER [%]  16.59  16.08  16.88  14.87  9.51   8.11   7.83   16.55
Table 8: PD classification results using the VGG-8 model.
Sequence   Kernel*           Acc[%]  Sens[%]  Spec[%]  F1[%]
NOnA       C=1e+01; γ=1e-02  58.3    –        –        –
AOffN      C=1e+01; γ=1e-03  65.6    –        –        –
NOnAOffN   C=1e+01; γ=1e-04  62.7    –        –        –
NOnA       C=1e-02           67.4    –        –        –
AOffN      C=1e-02           67.6    –        –        –
NOnAOffN   C=1e-02           64.9    –        –        –
First three rows: Gaussian kernel. Last three rows: Linear kernel. *Column with optimal hyper-parameters.

These reduced architectures are trained with the same data as those considered previously to retrain the Freeze 50 and Freeze 75 models. Table 7 shows the AUC values obtained when the different AUs are detected. Note that these results are higher than those reported in Table 4, where a greater number of parameters is optimized. However, ResNet50 was originally trained for Face Recognition tasks, where face gestures are features to be excluded from the representation space. This result indicates that a simpler model might provide high enough AU discrimination performance to be used in the classification between PD patients and healthy controls.
Figure 5: Comparison of PD classification ROC curves obtained using the NOnAOffN sequence with the Freeze 75, ResNet-7, and VGG-8 models.

Table 9: PD classification results using the ResNet-7 model.
Sequence   Kernel*           Acc[%]  Sens[%]  Spec[%]  F1[%]
NOnA       C=1e+03; γ=1e-04  73.0    –        –        –
AOffN      C=1e+01; γ=1e-02  73.4    –        –        –
NOnAOffN   C=1e+03; γ=1e-04  78.8    –        –        –
NOnA       C=1e-02           74.1    –        –        –
AOffN      C=1e-02           72.4    –        –        –
NOnAOffN   C=1e-01           78.3    –        –        –
First three rows: Gaussian kernel. Last three rows: Linear kernel. *Column with optimal hyper-parameters.

Table 8 and Table 9 show the results obtained when the aforementioned models, created with the reduced architectures, are used to discriminate between PD patients and healthy subjects. Note that no additional training is performed with data from Parkinson's disease patients. The best results are obtained when the ResNet-7 architecture is considered with features extracted from the NOnAOffN sequence. Although 78.8% could be considered a good accuracy, it is still far from the best result obtained with the ResNet50 Freeze 75 model (87.3% in Table 5), indicating that the FAU domain is missing certain features present in the Face Recognition domain. Figure 5 shows three ROC curves where the results with Freeze 75, ResNet-7, and VGG-8 are compared. The superiority of the Freeze 75 model is clearly observed, supporting the advantages of initializing the models in the Face Recognition domain.

The triplet loss function is explored in this experiment with the aim of evaluating whether the classification performance of PD patients vs. Healthy Control (HC) subjects can be improved with respect to the previous experiments. The triplet loss function modifies the original representation space such that the inter-class separability is increased while the intra-class separability is reduced. The modified feature vectors are called embedded vectors.

Table 10: PD classification results using the Triplet 75 model.
Sequence   Kernel*           Acc[%]  Sens[%]  Spec[%]  F1[%]
NOnA       C=1e+01; γ=1e-04  85.2    –        –        –
AOffN      C=1e+01; γ=1e-04  86.0    –        –        –
NOnAOffN   C=1e+01; γ=1e-04  86.0    –        –        –
NOnA       C=1e-01           84.4    –        –        –
AOffN      C=1e-01           85.0    –        –        –
NOnAOffN   C=1e-01           86.1    –        –        –
First three rows: Gaussian kernel. Last three rows: Linear kernel. *Column with optimal hyper-parameters.
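The triplet-loss training used to obtain the embedded vectors can be sketched as follows. The tiny embedding network is a stand-in (in the paper the loss is applied on top of the adapted backbones), and all dimensions are illustrative:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in embedding network; anchors and positives share a class
# (e.g. PD samples) while negatives come from the other class (HC).
embed = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
loss_fn = nn.TripletMarginLoss(margin=1.0)

anchor   = embed(torch.randn(16, 128))   # PD samples
positive = embed(torch.randn(16, 128))   # other PD samples
negative = embed(torch.randn(16, 128))   # HC samples

loss = loss_fn(anchor, positive, negative)
loss.backward()                # gradients reshape the embedding space
print(tuple(anchor.shape))     # embedded vectors, here 32-dimensional
```

Minimizing this loss pulls anchor–positive pairs together and pushes anchor–negative pairs at least `margin` apart, which is exactly the inter-/intra-class separability effect described above; the embedded vectors are then fed to the SVM classifier.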
Table 11: PD classification results using the Triplet 50 model.

Sequence   Kernel*           Acc[%]  Sens[%]  Spec[%]  F1[%]
NOnA       C=1e+01; γ=1e-04  78.9    –        –        –
AOffN      C=1e+03; γ=1e-04  73.2    –        –        –
NOnAOffN   C=1e+02; γ=1e-04  75.8    –        –        –
NOnA       C=1e-01           80.7    –        –        –
AOffN      C=1e-01           76.3    –        –        –
NOnAOffN   C=1e-01           77.1    –        –        –
First three rows: Gaussian kernel. Last three rows: Linear kernel. *Column with optimal hyper-parameters.

The Freeze 75 and Freeze 50 models are trained with the triplet loss strategy, producing two new models, namely Triplet 75 and Triplet 50, respectively. The FacePark-GITA database is divided into a 5-fold partition for the training of each Triplet model and the SVM classifier. The classification results obtained when using the embedded vectors are shown in Table 10 for the Triplet 75 model and in Table 11 for the Triplet 50 model. Note that the Triplet 75 model exhibits better accuracy (86.1%) than the Triplet 50 (80.7%). Since the best accuracies in the previous experiments with the Freeze 75 and Freeze 50 models were 87.3% and 83.1%, these new results obtained with the triplet loss strategy indicate that the embedding approach does not provide advantages over the use of transfer learning and the freezing of layers. This observation is also supported by the fact that the number of parameters to be optimized has not been reduced, so in principle there is no reason to use the triplet loss function in these two scenarios.

In this experiment the VGG-8 and ResNet-7 models are retrained considering the triplet loss function, creating two new models, namely Triplet-VGG8 and Triplet-ResNet7, respectively. These new models are used to extract embedded vectors for further classification between PD patients and healthy subjects. The results obtained with the Triplet-VGG8 and Triplet-ResNet7 embedded vectors are shown in Table 12 and Table 13, respectively. Note that there is an improvement in both models compared to those based on VGG-8 and ResNet-7 where the triplet loss function was not applied. In the first case the improvement is around 5.1% (from 67.6% to 72.7%), and in the second case it is around 3.6% (from 78.8% to 82.4%).
Figure 6: (top) Principal component spaces generated from the features of the different models: (a) ResNet50 (x_FR); (b) ResNet50+FAU (x_AF); (c) Triplet-ResNet7 (x_PD); and (bottom) score distributions of PD patients and Healthy Control (HC) subjects obtained by the SVM classifier.

Table 12: PD classification results using the Triplet-VGG8 model.
Sequence   Kernel*           Acc[%]  Sens[%]  Spec[%]  F1[%]
NOnA       C=1e+01; γ=1e-04  71.2    –        –        –
AOffN      C=1e+03; γ=1e-03  69.9    –        –        –
NOnAOffN   C=1e+00; γ=1e-03  66.0    –        –        –
NOnA       C=1e-02           72.7    –        –        –
AOffN      C=1e+01           70.3    –        –        –
NOnAOffN   C=1e+01           65.3    –        –        –
First three rows: Gaussian kernel. Last three rows: Linear kernel. *Column with optimal hyper-parameters.

Table 13: PD classification results using the Triplet-ResNet7 model.
Sequence   Kernel*           Acc[%]  Sens[%]  Spec[%]  F1[%]
NOnA       C=1e+03; γ=1e-04  82.1    –        –        –
AOffN      C=1e+02; γ=1e-03  78.2    –        –        –
NOnAOffN   C=1e-01; γ=1e-03  69.9    –        –        –
NOnA       C=1e-01           82.4    –        –        –
AOffN      C=1e-01           76.2    –        –        –
NOnAOffN   C=1e-02           79.6    –        –        –
First three rows: Gaussian kernel. Last three rows: Linear kernel. *Column with optimal hyper-parameters.

These results partially validate our third hypothesis (H3), indicating that loss functions designed to learn from the PD domain serve to improve the performance of PD classification. It is interesting to highlight not only the improvement achieved when using the triplet loss function, but also that the best result obtained with the Triplet-ResNet7 model is competitive with the best accuracy previously obtained with the Freeze 75 model. Although the accuracy of the latter is 4.9% above the former, Freeze 75 requires many more parameters to be optimized than Triplet-ResNet7, which might indicate a better generalization capability of the lighter model. Further experiments with additional data are required to validate this hypothesis.

PCA is now used to create a 2D representation of the feature spaces learned in the previous experiments. Figure 6 shows the feature spaces and the distribution of the classification scores. The figure shows a superior discrimination capability of the x_AF feature space (ResNet50 adapted to the FAU domain). The representation obtained by the Triplet-ResNet7 model shows a larger margin between classes, but the misclassification errors decrease the performance.

Given the promising results obtained in the above experiments on the automatic discrimination between PD patients and healthy subjects, especially with the Freeze 75 and Triplet-ResNet7 models, with accuracies of 87.3% and 82.4%, respectively, we now evaluate the suitability of those models to discriminate between three different degrees of impairment: mild (PD-1), intermediate (PD-2), and severe (PD-3). These three groups are defined considering the scores of the MDS-UPDRS-III provided by the expert neurologist.
The mild group includes patients with scores in the range from 0 to 23, the intermediate group is defined for patients with scores between 23 and 33, and the severe group for patients with scores greater than 33. Figure 7 shows the distribution of the MDS-UPDRS-III scores for the three groups of patients. The tri-class classification experiments are performed considering the feature vectors extracted with the Freeze 75 model on the NOnAOffN sequence, and with the Triplet-ResNet7 model on the NOnA sequence. Optimization of the hyper-parameters is performed as indicated in Section 3.3.
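The patient grouping and the tri-class metrics reported below (accuracy, F1, Cohen's κ) can be sketched as follows. The handling of the exact boundary scores 23 and 33 is an assumption, and the toy predictions are illustrative only, not the paper's data:

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

def impairment_group(score: float) -> str:
    """Map an MDS-UPDRS-III score to the three severity groups
    (boundary assignment at 23/33 is an assumption)."""
    if score <= 23:
        return "PD-1"   # mild
    if score <= 33:
        return "PD-2"   # intermediate
    return "PD-3"       # severe

labels = [impairment_group(s) for s in [5, 20, 24, 33, 34, 50]]

# Toy tri-class predictions (0=PD-1, 1=PD-2, 2=PD-3) to illustrate
# how the reported metrics are computed.
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 0, 1, 2, 1, 2, 2, 1])
cm = confusion_matrix(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)
kap = cohen_kappa_score(y_true, y_pred)   # agreement beyond chance
print(labels, cm.shape, round(acc, 2), round(kap, 2))
```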
Figure 7: Distribution of the neurological state of the Parkinson's patients according to their score on the MDS-UPDRS-III scale.

Table 14: Confusion matrix for the classification of PD patients with different degrees of impairment using the Freeze 75 model with feature vectors extracted from the NOnAOffN sequence.
SVM: C=1e-03
        PD-1   PD-2   PD-3
PD-1    45.80  35.30  18.90
PD-2    33.47  42.26  24.27
PD-3    15.45  39.49  45.06
SVM: Acc = 44%, F1 = 0.45, κ = 0.17.

The confusion matrices and results obtained with the Freeze 75 and Triplet-ResNet7 models are shown in Table 14 and Table 15, respectively. Values of accuracy, F1 score, and the κ index are included in the bottom part of each table. These results show that the Freeze 75 model is better than the Triplet-ResNet7: there is a difference of 4 points in accuracy and 0.07 in the F1 score when using the SVM. The accuracies obtained are around 40% for a problem with three classes (33% random chance). These results are far from optimal, but they suggest that there is certain useful information in the models that can help to estimate the degree of impairment (our fourth hypothesis, H4). Figure 8 shows the projected space. Note that there is a relatively clear separability between mild and severe patients. Between these two groups lie the samples of the intermediate group, which are not accurately classified but clearly appear in between, which likely indicates that the proposed approach is able to find a trend regarding the neurological state of the patients.

Discussion and Conclusion
This study presents a novel approach in which deep learning methods are used to model hypomimia in PD patients. Videos of the faces of people evoking emotions are considered for the study. Frames of the recorded videos are segmented into different stages during the production of the evoked emotions: neutral, onset-transition, apex, offset-transition, and neutral.

Table 15: Confusion matrix for the classification of PD patients with different degrees of impairment using the Triplet-ResNet7 model with feature vectors extracted from the NOnA sequence.

SVM: C=1e-03
        PD-1   PD-2   PD-3
PD-1    30.07  39.60  30.33
PD-2    34.60  20.53  44.87
PD-3    10.28  18.62  71.10
SVM: Acc = 40%, F1 = 0.38, κ = 0.11.

Figure 8: Principal component space created with feature vectors of the Freeze 75 model for PD patients of the three groups with different degrees of impairment.

This approach exhibits improvements of up to 5.5% in accuracy (from 72.9% to 78.4%) with respect to classical approaches where single frames are considered. These results suggest that dynamic information is more suitable to model hypomimia in PD patients. We are aware that the presented approach does not completely exploit the video dynamics; however, the incorporation of frames from different stages during the production of emotions proves to be a good and computationally affordable approach.

Later, information from the Affective domain is incorporated into the model by means of transfer learning methods. Transfer learning was performed by considering the complete architecture of a base model previously trained with massive data, freezing some layers, and fine-tuning the remaining layers with the smaller emotion data. Results freezing 75% and 50% of the layers are reported. The results show that the Affective domain adaptation provides an improvement of 8.9%, from 78.4% to 87.3% accuracy in PD detection. These results confirm that domain adaptation via transfer learning is a good strategy to model hypomimia in PD patients. Considering the good results, and the fact that only up to four images per participant are considered in the experiments, we believe that this work is a step forward in the development of inexpensive computational systems suitable to model and quantify the problems of PD patients in expressing emotions.

With the aim of finding lighter approaches suitable to be
used in portable devices, other experiments with reduced architectures like VGG-8 and ResNet-7 were also carried out; however, the results were not as good as before: the maximum accuracies achieved in this case were 67.6% and 78.8%, respectively. These results were further improved, up to 72.7% and 82.4%, when the triplet loss strategy was considered for training the VGG-8 and ResNet-7 models, respectively.

Finally, the neurological state of the patients was evaluated considering the best approach found in the classification experiments. The patients were grouped into three groups according to their MDS-UPDRS-III scores, and a tri-class classification strategy showed a maximum classification accuracy of 44% (F1 = 0.45).

Future work includes more sophisticated methods to integrate the information provided by the full video sequences, including video tracking of facial features. We will also investigate multiple-classifier approaches to combine the information provided by face videos for PD detection and PD impairment estimation with other sources of information [17], like speech, gait, handwriting [16], and human-computer interaction signals [1].

Acknowledgements
The authors would like to thank the patients of the Parkinson's Foundation in Medellín, Colombia (Fundalianza) for their cooperation during the development of this study. The study was partially funded by grants from CODI at Universidad de Antioquia.

Compliance with Ethical Standards
Ethical approval
All of the signals considered in this work were collected in compliance with the Helsinki Declaration, and the procedure was approved by the Ethics Committee of the University of Antioquia in Medellín, Colombia. Written informed consent was signed by each participant.

References

[1] Alejandro Acien et al. "Smartphone Sensors For Modeling Human-Computer Interaction: General Outlook And Research Datasets For User Authentication". In:
IEEE Conf. on Computers, Software, and Applications (COMPSAC). July 2020. [2] R. Almutiry et al. "Facial Behaviour Analysis in Parkinson's Disease". In:
Lecture Notes in ComputerScience
Movement Disor-ders [4] A. Bandini, S. Orlandi, H. J. Escalante, F. Giovannelli,et al. “Analysis of facial expressions in Parkinson’sdisease through video-based automatic methods”. In:
Journal of Neuroscience Methods
281 (2017), pp. 7–20.[5] M. Bologna et al. “Facial bradykinesia”. In:
J NeurolNeurosurg Psychiatry
Journal of the Interna-tional Neuropsychological Society
12 (2006), pp. 765–773.[7] R. Cacabelos. “Parkinson’s disease: from pathogene-sis to pharmacogenomics”. In:
International Journal ofMolecular Sciences . IEEE. 2018, pp. 67–74.[9] R. Castrillon et al. “Characterization of the Handwrit-ing Skills as a Biomarker for Parkinson Disease”. In:
IEEE Intl. Conf. on Automatic Face and Gesture Recogni-tion (FG) . May 2019.[10] O. Celiktutan, E. Skordos, and H. Gunes. “Multimodalhuman-human-robot interactions (mhhri) dataset forstudying personality and engagement”. In:
IEEETransactions on Affective Computing (2017).[11] C. De Stefano et al. “Handwriting analysis to supportneurodegenerative diseases diagnosis: A review”. In:
Pattern Recognition Letters
121 (2019), pp. 37–45.[12] V. Dentamaro, D. Impedovo, and G. Pirlo. “Gait Anal-ysis for Early Neurodegenerative Diseases Classifica-tion Through the Kinematic Theory of Rapid HumanMovements”. In:
IEEE Access
Psychological Bulletin
115 2 (1994), pp. 268–87. [14] P. Ekman, W. V. Friesen, and J. C. Hager. "Facial action coding system: The manual on CD ROM". In:
A Human Face, Salt Lake City (2002), pp. 77–254. [15] C. Fabian Benitez-Quiroz, R. Srinivasan, and A. M. Martinez. "EmotioNet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild". In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016, pp. 5562–5570. [16] M. Faundez-Zanuy et al. "Handwriting Biometrics: Applications and Future Trends in e-Security and e-Health". In:
Cognitive Computation (Aug. 2020). [17] J. Fierrez, A. Morales, R. Vera-Rodriguez, and D. Camacho. "Multiple Classifiers in Biometrics. Part 1: Fundamentals and Review". In:
Information Fusion
Information Fusion
Facial Action Coding System:A technique for the measurement of facial movement.
Parkinson’s Disease”. In:
Journal of Parkinson’s Disease
Movement Disorders
Movement Disorders
IEEE Trans. on InformationForensics and Security
IEEEIntelligent Systems
Proceedings of the 12th ACMInternational Conference on PErvasive Technologies Re-lated to Assistive Environments . 2019, pp. 517–522.[26] S. S. Guill´en, L. L. Iacono, and C. Meder. “AffectiveRobots: Evaluation of Automatic Emotion RecognitionApproaches on a Humanoid Robot towards Emotion-ally Intelligent Machines”. In:
Energy
Cogent Psychology
Journal of Neuroscience Methods
Proceedings of IEEEComputer Society Conference on Computer Vision andPattern Recognition . 2016, pp. 770–778.[30] J. Kang, D. Derva, D. Y. Kwon, and C. Wallraven.“Voluntary and spontaneous facial mimicry towardother’s emotional expression in patients with Parkin-son’s disease”. In:
PloS One
IEEE Access
IEEE Transactions on AffectiveComputing
Mental Well-Being . Springer, 2013, pp. 91–109. [34] L. Moro-Velazquez et al. “Phonetic relevance andphonemic grouping of speech in the automatic detec-tion of Parkinson’s Disease”. In:
Scientific Reports
Digital Signal Processing
77 (2018), pp. 207–221.[36] J.R. Orozco-Arroyave.
Analysis of speech of people with Parkinson's disease. Logos-Verlag, Berlin, 2016. [37] A. Pampouchidou et al. "Automatic assessment of depression based on visual cues: A systematic review". In:
IEEE Transactions on Affective Computing (2017).[38] S. J. Pan and Q. Yang. “A survey on transfer learning”.In:
IEEE Transactions on Knowledge and Data Engineer-ing
Image andVision Computing
British Machine Vision Conference(BMVC) . Swansea, UK, 2015.[41] A. Parnandi and R. Gutierrez-Osuna. “Visual biofeed-back and game adaptation in relaxation skill transfer”.In:
IEEE Transactions on Affective Computing
IAPR Intl. Conf. on Pattern Recognition (ICPR) . Jan.2021.[43] R. Ranjan et al. “Deep learning for understandingfaces: Machines may be just as good, or better, thanhumans”. In:
IEEE Signal Processing Magazine
Neuropsy-chologia
38 (2000), pp. 1026–1037.[45] C. D. Rios-Urrego et al. “Analysis and evaluation ofhandwriting in patients with Parkinson’s disease us-ing kinematic, geometrical, and non-linear features”.In:
Computer Methods and Programs in Biomedicine
Jour-nal of the International Neuropsychological Society
DomainAdaptation for Visual Understanding . Springer, 2020.[48] B. Sonawane and P. Sharma. “Review of automatedemotion-based quantification of facial expression inParkinson’s patients”. In:
The Visual Computer (June2020).[49] S. Spasojevi´c et al. “Quantitative Assessment of theArm/Hand Movements in Parkinson’s Disease Usinga Wireless Armband Device”. In:
Frontiers in Neurology
Forensic ScienceInternational
233 (2013), pp. 75–83.
[51] J.C. Vásquez-Correa et al. "Multimodal assessment of Parkinson's disease: a deep learning approach". In:
IEEE Journal of Biomedical and Health Informatics
Computational and Mathematical Methods in Medicine (2014).[53] M. Yan et al. “Vargfacenet: An efficient variable groupconvolutional neural network for lightweight facerecognition”. In:
Proceedings of the IEEE InternationalConference on Computer Vision Workshops . 2019.
Luis Felipe Gómez-Gómez received the B.S. degree in Telecommunications Engineering from Universidad de Antioquia, Medellín, Colombia, in 2018. Currently, he is a Master's student at the GITA Lab of the Universidad de Antioquia. He has performed research activities related to signal processing and machine learning for biometric applications during the last three years, with both academic and industrial partners. His research interests include image processing, signal processing, pattern recognition, machine learning, deep learning, biometric signal processing, and their applications in health care.
Aythami Morales received the M.Sc. degree in Telecommunication Engineering from the Universidad de Las Palmas de Gran Canaria, Spain, in 2006, and the Ph.D. degree from the Universidad de Las Palmas de Gran Canaria in 2011. Since 2017, he has been an Associate Professor affiliated with the Biometrics and Data Pattern Analytics (BiDA) Lab at the Universidad Autónoma de Madrid. His research interests are focused on pattern recognition, computer vision, machine learning, and biometric signal processing.
Julián Fierrez received the M.Sc. and Ph.D. degrees in telecommunications engineering from the Universidad Politécnica de Madrid, Spain, in 2001 and 2006, respectively. Since 2004 he has been affiliated with the Biometrics and Data Pattern Analytics (BiDA) Lab at the Universidad Autónoma de Madrid, where he is an Associate Professor. His research interests include signal and image processing, pattern recognition, and biometrics, with an emphasis on multibiometrics, biometric evaluation, system security, forensics, and mobile applications of biometrics.