Sharifa Alghowinem
Australian National University
Publications
Featured research published by Sharifa Alghowinem.
Journal on Multimodal User Interfaces | 2013
Jyoti Joshi; Roland Goecke; Sharifa Alghowinem; Abhinav Dhall; Michael Wagner; Julien Epps; Gordon Parker; Michael Breakspear
Depression is a severe mental health disorder with high societal costs. Current clinical practice depends almost exclusively on self-report and clinical opinion, risking a range of subjective biases. The long-term goal of our research is to develop assistive technologies to support clinicians and sufferers in the diagnosis and monitoring of treatment progress in a timely and easily accessible format. In the first phase, we aim to develop a diagnostic aid using affective sensing approaches. This paper describes the progress to date and proposes a novel multimodal framework comprising audio-video fusion for depression diagnosis. We exploit the proposition that auditory and visual human communication complement each other, which is well known in auditory-visual speech processing, and investigate this hypothesis for depression analysis. For the video data analysis, intra-facial muscle movements and the movements of the head and shoulders are analysed by computing spatio-temporal interest points. In addition, various audio features (fundamental frequency f0, loudness, intensity and mel-frequency cepstral coefficients) are computed. Next, a bag of visual features and a bag of audio features are generated separately. In this study, we compare fusion methods at feature level, score level and decision level. Experiments are performed on an age- and gender-matched clinical dataset of 30 patients and 30 healthy controls. The results from the multimodal experiments show the proposed framework’s effectiveness in depression analysis.
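As a rough illustration of the fusion levels compared in this abstract, the sketch below contrasts feature-level, score-level and decision-level fusion of hypothetical audio and visual feature vectors. It is not the authors' implementation: all variable names, dimensions and the linear SVM back-end are assumptions made for the example, and train/test splits are omitted for brevity.

```python
# A minimal sketch (not the paper's code) of feature-, score- and decision-level
# fusion for audio-visual depression classification, using scikit-learn SVMs on
# hypothetical per-subject bag-of-audio-features and bag-of-visual-features matrices.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 60                                    # e.g. 30 patients + 30 healthy controls
X_audio = rng.normal(size=(n, 40))        # hypothetical audio feature vectors
X_video = rng.normal(size=(n, 50))        # hypothetical visual feature vectors
y = np.repeat([0, 1], n // 2)             # 0 = control, 1 = depressed

# Feature-level fusion: concatenate the modalities before classification.
X_fused = np.hstack([X_audio, X_video])
clf_feat = SVC(kernel="linear").fit(X_fused, y)
pred_feat = clf_feat.predict(X_fused)     # proper train/test split omitted here

# Score-level fusion: average the decision scores of per-modality classifiers.
clf_a = SVC(kernel="linear").fit(X_audio, y)
clf_v = SVC(kernel="linear").fit(X_video, y)
scores = (clf_a.decision_function(X_audio) + clf_v.decision_function(X_video)) / 2
pred_score = (scores > 0).astype(int)

# Decision-level fusion: combine the hard labels, here requiring both
# modalities to agree on "depressed" (logical AND).
pred_dec = clf_a.predict(X_audio) & clf_v.predict(X_video)
```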
affective computing and intelligent interaction | 2013
Sharifa Alghowinem; Roland Goecke; Michael Wagner; Gordon Parker; Michael Breakspear
Depression is a common and disabling mental health disorder, which impacts not only the sufferer but also their families, friends and the economy overall. Our ultimate aim is to develop an automatic objective affective sensing system that supports clinicians in their diagnosis and monitoring of clinical depression. Here, we analyse the performance of head pose and movement features extracted from face videos using a 3D face model projected on a 2D Active Appearance Model (AAM). In a binary classification task (depressed vs. non-depressed), we modelled low-level and statistical functional features for an SVM classifier using real-world clinically validated data. Although head pose and movement would be used as a complementary cue in detecting depression in practice, their recognition rate was impressive on its own, giving 71.2% on average, which illustrates that head pose and movement hold effective cues in diagnosing depression. When expressing positive and negative emotions, recognising depression using positive emotions was more accurate than using negative emotions. We conclude that positive emotions are expressed less in depressed subjects at all times, and that negative emotions have less discriminatory power than positive emotions in detecting depression. Analysing the functional features statistically reveals several behaviour patterns in depressed subjects: (1) slower head movements, (2) less change of head position, (3) longer duration of looking to the right, (4) longer duration of looking down, which may indicate fatigue and eye contact avoidance. We conclude that head movements are significantly different between depressed patients and healthy subjects, and could be used as a complementary cue.
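For readers who want a concrete picture of statistical functional features computed over head pose, here is a minimal sketch that assumes per-frame yaw/pitch/roll angles are already available from the AAM-based tracking; the feature set, thresholds and data are illustrative only and do not reproduce the paper's exact design.

```python
# A minimal sketch: turn a per-video head pose track (yaw, pitch, roll per frame)
# into statistical functionals and train an SVM on them. All names, thresholds
# and data below are hypothetical.
import numpy as np
from sklearn.svm import SVC

def pose_functionals(pose):
    """pose: (n_frames, 3) array of yaw, pitch, roll angles in degrees."""
    velocity = np.abs(np.diff(pose, axis=0))           # frame-to-frame change
    feats = [
        pose.mean(axis=0), pose.std(axis=0),           # average pose and variability
        pose.max(axis=0) - pose.min(axis=0),           # range of head position
        velocity.mean(axis=0),                         # average movement speed
        [(pose[:, 0] > 10).mean(),                     # fraction of frames turned right
         (pose[:, 1] < -10).mean()],                   # fraction of frames looking down
    ]
    return np.concatenate(feats)

# Hypothetical data: one pose track per video plus depressed/control labels.
rng = np.random.default_rng(1)
tracks = [rng.normal(scale=15, size=(300, 3)) for _ in range(60)]
labels = np.repeat([0, 1], 30)
X = np.vstack([pose_functionals(t) for t in tracks])
clf = SVC(kernel="linear").fit(X, labels)
```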
international conference on acoustics, speech, and signal processing | 2013
Sharifa Alghowinem; Roland Goecke; Michael Wagner; Julien Epps; Tamas Gedeon; Michael Breakspear; Gordon Parker
Accurate detection of depression from spontaneous speech could lead to an objective diagnostic aid to assist clinicians to better diagnose depression. Little thought has been given so far to which classifier performs best for this task. In this study, using a 60-subject real-world clinically validated dataset, we compare three popular classifiers from the affective computing literature - Gaussian Mixture Models (GMM), Support Vector Machines (SVM) and Multilayer Perceptron neural networks (MLP) - as well as the recently proposed Hierarchical Fuzzy Signature (HFS) classifier. Among these, a hybrid classifier using GMM models and SVM gave the best overall classification results. Comparing feature, score, and decision fusion, score fusion performed better for GMM, HFS and MLP, while decision fusion worked best for SVM (both for raw data and GMM models). Feature fusion performed worse than other fusion methods in this study. We found that loudness, root mean square, and intensity were the voice features that performed best to detect depression in this dataset.
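A minimal sketch of a GMM/SVM hybrid of the general kind the abstract reports as the best performer: per-class GMMs are fitted on frame-level voice features, and each recording's average log-likelihood under the two models is fed to an SVM. The feature dimensions, number of mixture components and data are invented for illustration and do not reproduce the paper's setup.

```python
# A minimal sketch of a GMM/SVM hybrid classifier; names and data are hypothetical.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(2)
# Hypothetical frame-level voice features (e.g. loudness, RMS, intensity) per recording.
recordings = [rng.normal(size=(200, 3)) for _ in range(60)]
labels = np.repeat([0, 1], 30)                        # 0 = control, 1 = depressed

# Fit one GMM per class on the pooled frames of that class's recordings.
gmms = {c: GaussianMixture(n_components=8, random_state=0).fit(
            np.vstack([r for r, l in zip(recordings, labels) if l == c]))
        for c in (0, 1)}

# Each recording is represented by its average log-likelihood under the two
# class models; an SVM is then trained on these two-dimensional score vectors.
X = np.array([[gmms[0].score(r), gmms[1].score(r)] for r in recordings])
clf = SVC(kernel="linear").fit(X, labels)
```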
Archive | 2014
Sharifa Alghowinem; Majdah Alshehri; Roland Goecke; Michael Wagner
The automatic detection of human emotional states has been of great interest lately for its applications not only in the Human-Computer Interaction field, but also in psychological studies. Using an emotion elicitation paradigm, we investigate whether eye activity holds discriminative power for detecting affective states. Our emotion elicitation paradigm includes emotions induced by watching emotional movie clips and spontaneous emotions elicited by interviewing participants about emotional events in their lives. To reduce gender variability, the selected participants were 60 female native Arabic speakers (30 young adults and 30 mature adults). In general, the automatic classification results using eye activity were reasonable, giving a 66% correct recognition rate on average. Statistical measures show statistically significant differences in eye activity patterns between positive and negative emotions. We conclude that eye activity, including eye movement, pupil dilation and pupil invisibility, could be used as complementary cues for the automatic recognition of human emotional states.
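The statistical comparison mentioned above can be illustrated with a simple two-sample t-test on one hypothetical eye-activity feature (mean pupil diameter per clip); the numbers below are made up and only show the shape of such a test.

```python
# A minimal sketch, with invented data, of a per-feature significance test
# comparing eye activity between positive and negative emotion conditions.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
# Hypothetical mean pupil diameter (mm) per clip, grouped by elicited valence.
pupil_positive = rng.normal(loc=3.6, scale=0.3, size=60)
pupil_negative = rng.normal(loc=3.9, scale=0.3, size=60)

t_stat, p_value = ttest_ind(pupil_positive, pupil_negative)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")   # p < 0.05 -> significant difference
```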
IEEE Transactions on Affective Computing | 2016
Sharifa Alghowinem; Roland Goecke; Michael Wagner; Julien Epps; Matthew P. Hyett; Gordon Parker; Michael Breakspear
An estimated 350 million people worldwide are affected by depression. Using affective sensing technology, our long-term goal is to develop an objective multimodal system that augments clinical opinion during the diagnosis and monitoring of clinical depression. This paper steps towards developing a classification system-oriented approach, where feature selection, classification and fusion-based experiments are conducted to infer which types of behaviour (verbal and nonverbal) and behaviour combinations can best discriminate between depression and non-depression. Using statistical features extracted from speaking behaviour, eye activity, and head pose, we characterise the behaviour associated with major depression and examine the performance of the classification of individual modalities and when fused. Using a real-world, clinically validated dataset of 30 severely depressed patients and 30 healthy control subjects, a Support Vector Machine is used for classification with several feature selection techniques. Given the statistical nature of the extracted features, feature selection based on T-tests performed better than other methods. Individual modality classification results were considerably higher than chance level (83 percent for speech, 73 percent for eye, and 63 percent for head). Fusing all modalities shows a remarkable improvement compared to unimodal systems, which demonstrates the complementary nature of the modalities. Among the different fusion approaches used here, feature fusion performed best with up to 88 percent average accuracy. We believe this is due to the compatible nature of the extracted statistical features.
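To make the T-test-based feature selection and feature fusion concrete, the sketch below concatenates hypothetical per-subject features from the three modalities, ranks each feature by a two-sample t-test between the groups, and trains a linear SVM on the top-ranked features; the feature counts, cut-off and data are assumptions for illustration only, not the paper's configuration.

```python
# A minimal sketch of T-test-based feature selection on feature-fused modalities.
import numpy as np
from scipy.stats import ttest_ind
from sklearn.svm import SVC

rng = np.random.default_rng(4)
y = np.repeat([0, 1], 30)                             # 0 = control, 1 = depressed
# Hypothetical per-subject statistical features, concatenated across modalities
# (feature-level fusion).
X = np.hstack([rng.normal(size=(60, 20)),             # speech functionals
               rng.normal(size=(60, 15)),             # eye-activity functionals
               rng.normal(size=(60, 10))])            # head-pose functionals

# Rank features by two-sample t-test p-value and keep the 10 most discriminative.
p_values = np.array([ttest_ind(X[y == 0, j], X[y == 1, j]).pvalue
                     for j in range(X.shape[1])])
selected = np.argsort(p_values)[:10]
clf = SVC(kernel="linear").fit(X[:, selected], y)
```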
conference of the international speech communication association | 2016
Sharifa Alghowinem; Roland Goecke; Julien Epps; Michael Wagner; Jeffrey F. Cohn
No studies have investigated cross-cultural and cross-language characteristics of depressed speech. We investigated the generalisability of a vocal biomarker-based approach to depression detection in clinical interviews recorded in three countries (Australia, the USA and Germany), two languages (German and English) and different accents (Australian and American). Several approaches to training and testing within and between datasets were evaluated. Using the same experimental protocol separately within each dataset, classification accuracy was high. When the datasets were combined, accuracy remained high and consistent across language, recording environment, and culture. Training and testing between datasets (cross-classification), however, attenuated accuracy. These findings emphasize the importance of heterogeneous training sets for robust depression detection.
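The three evaluation protocols compared in the abstract (within-dataset, combined-dataset, and cross-dataset training/testing) can be sketched as follows; the classifier, features and cross-validation settings are placeholders, not the paper's exact configuration.

```python
# A minimal sketch of within-, combined- and cross-corpus evaluation protocols,
# written against hypothetical (X, y) feature matrices for each corpus.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def within(X, y):
    """Train and test inside one corpus (cross-validated)."""
    return cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()

def combined(datasets):
    """Pool all corpora, then cross-validate on the combined set."""
    X = np.vstack([Xi for Xi, _ in datasets])
    y = np.concatenate([yi for _, yi in datasets])
    return cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()

def cross_corpus(train, test):
    """Train on one corpus, test on another (the setting that attenuated accuracy)."""
    clf = SVC(kernel="linear").fit(*train)
    return clf.score(*test)

# Usage (hypothetical): within(X_au, y_au); cross_corpus((X_au, y_au), (X_us, y_us))
```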
affective computing and intelligent interaction | 2013
Sharifa Alghowinem
Clinical depression is a critical public health problem, with high costs associated with a person's functioning, mortality, and social relationships, as well as the economy overall. Currently, there is no dedicated objective method to diagnose depression. Rather, its diagnosis depends on patient self-report and the clinician's observation, risking a range of subjective biases. Our aim is to develop an objective affective sensing system that supports clinicians in their diagnosis and monitoring of clinical depression. In this PhD work, my approach is based on multimodal analysis, i.e. combinations of vocal affect, head pose and eye movement extracted from real-world, clinically validated audio-video data. In addition, this work will investigate the cross-cultural generalisation of depression characteristics across different languages and countries.
international conference on human-computer interaction | 2014
Sharifa Alghowinem; Sarah Alghuwinem; Majdah Alshehri; Areej Al-Wabil; Roland Goecke; Michael Wagner
The automatic detection of human affective states has been of great interest lately for its applications not only in the field of Human-Computer Interaction, but also in physiological, neurobiological and sociological studies. Several standardized techniques to elicit emotions have been used, with emotion-eliciting movie clips being the most popular. To date, only four studies have been carried out to validate emotional movie clips, using three different languages (English, French, Spanish) and cultures (French, Italian, British / American). The context of language and culture is an underexplored area in affective computing. Considering cultural and language differences between Western and Arab countries, it is possible that some of the validated clips, even when dubbed, will not achieve similar results. Given the unique and conservative cultures of the Arab countries, a standardized and validated framework for affect studies is needed in order to be comparable with current studies of different cultures and languages. In this paper, we describe a framework and its prerequisites for eliciting emotions that could be used for affect studies on an Arab population. We present some aspects of Arab cultural values that might affect the selection and acceptance of emotion-eliciting video clips. Methods for rating and validating Arab emotional clips are presented to arrive at a list of clips that could be used in the proposed emotion elicitation framework. A pilot study was conducted to evaluate a basic version of our framework, which showed great potential for eliciting emotions.
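As one possible reading of the rating and validation step, a candidate clip could be scored by the proportion of viewers who report the intended emotion (hit rate) and by the mean reported intensity, and retained only above a threshold; the function below is a hypothetical sketch, not the paper's published procedure.

```python
# A minimal sketch of clip validation from viewer ratings; names, threshold and
# example ratings are invented for illustration.
def validate_clip(reported_emotions, intensities, target, min_hit_rate=0.7):
    """reported_emotions: emotion labels given by viewers; intensities: 1-9 ratings."""
    hit_rate = sum(e == target for e in reported_emotions) / len(reported_emotions)
    mean_intensity = sum(intensities) / len(intensities)
    return hit_rate >= min_hit_rate, hit_rate, mean_intensity

# Example: 10 viewers rated a candidate "sadness" clip.
keep, hit, intensity = validate_clip(
    ["sadness"] * 8 + ["fear", "neutral"],
    [6, 7, 5, 8, 6, 7, 6, 5, 4, 3],
    target="sadness")
```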
international conference on cross-cultural design | 2015
Nawal Al-Mutairi; Sharifa Alghowinem; Areej Al-Wabil
To study the variation in emotional responses to stimuli, different methods have been developed to elicit emotions in a replicable way. Video clips have been shown to be the most effective stimuli. However, differences in cultural background lead to different emotional responses to the same stimuli. Therefore, we compared the emotional responses of Saudi participants to commonly used emotion-eliciting video clips from Western cultures with their responses to an initial selection of emotion-eliciting Arabic video clips. We analysed skin physiological signals in response to video clips from 29 Saudi participants. The results for the validated English video clips and the initial Arabic video clips are comparable, which suggests a universal capability of the English set to elicit target emotions in a Saudi sample, and that a refined selection of Arabic emotion elicitation clips would improve the capability of inducing the target emotions with higher levels of intensity.
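A minimal sketch of the kind of skin-conductance features that could be extracted per clip for such a comparison is given below; the signal, sampling rate and peak-detection threshold are assumptions, and the paper's actual physiological feature set may differ.

```python
# A minimal sketch: summarise a raw skin-conductance trace recorded during one
# clip into a few simple features. All parameters here are illustrative.
import numpy as np
from scipy.signal import find_peaks

def scr_features(gsr, fs=32):
    """gsr: 1-D skin conductance signal (microsiemens); fs: sampling rate in Hz."""
    peaks, _ = find_peaks(gsr, prominence=0.05)        # candidate skin conductance responses
    duration_min = len(gsr) / fs / 60.0
    return {
        "mean_level": gsr.mean(),                      # tonic level during the clip
        "response_amplitude": gsr.max() - gsr.min(),   # overall phasic swing
        "responses_per_min": len(peaks) / duration_min,
    }
```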
the florida ai research society | 2012
Sharifa Alghowinem; Roland Goecke; Michael Wagner; Julien Epps; Michael Breakspear; Gordon Parker