Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Roland Goecke is active.

Publication


Featured research published by Roland Goecke.


IEEE International Conference on Automatic Face and Gesture Recognition | 2011

Emotion recognition using PHOG and LPQ features

Abhinav Dhall; Akshay Asthana; Roland Goecke; Tamas Gedeon

We propose a method for automatic emotion recognition as part of the FERA 2011 competition. The system extracts pyramid of histogram of gradients (PHOG) and local phase quantisation (LPQ) features for encoding the shape and appearance information. For selecting the key frames, K-means clustering is applied to the normalised shape vectors derived from constrained local model (CLM) based face tracking on the image sequences. Shape vectors closest to the cluster centers are then used to extract the shape and appearance features. We demonstrate the results on the SSPNET GEMEP-FERA dataset. It comprises both person specific and person independent partitions. For emotion classification, we use support vector machine (SVM) and largest margin nearest neighbour (LMNN) classifiers and compare our results to the pre-computed FERA 2011 emotion challenge baseline.
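
As a rough illustration of the key-frame selection step, the sketch below (Python with scikit-learn) clusters normalised shape vectors with K-means and keeps the frames closest to the cluster centres; the CLM tracking and PHOG/LPQ extraction are assumed to happen elsewhere, and `phog_lpq_features` is a hypothetical placeholder.

```python
# Sketch of the key-frame selection idea described above: cluster normalised
# shape vectors with K-means, keep the frames closest to each cluster centre,
# then train an SVM on features extracted from those key frames.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def select_key_frames(shape_vectors: np.ndarray, n_clusters: int = 5) -> np.ndarray:
    """Return indices of frames whose shape vectors lie closest to the
    K-means cluster centres (one key frame per cluster)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(shape_vectors)
    key_idx = []
    for c in range(n_clusters):
        dists = np.linalg.norm(shape_vectors - km.cluster_centers_[c], axis=1)
        key_idx.append(int(np.argmin(dists)))
    return np.array(sorted(set(key_idx)))

# Hypothetical usage (descriptor extraction not shown here):
# key_frames = select_key_frames(shape_vectors)
# X = phog_lpq_features(frames[key_frames])
# clf = SVC(kernel="rbf").fit(X_train, y_train)
```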


International Conference on Computer Vision | 2011

Static facial expression analysis in tough conditions: Data, evaluation protocol and benchmark

Abhinav Dhall; Roland Goecke; Simon Lucey; Tamas Gedeon

Quality data recorded in varied realistic environments is vital for effective human face related research. Currently available datasets for human facial expression analysis have been generated in highly controlled lab environments. We present a new static facial expression database Static Facial Expressions in the Wild (SFEW) extracted from a temporal facial expressions database Acted Facial Expressions in the Wild (AFEW) [9], which we have extracted from movies. In the past, many robust methods have been reported in the literature. However, these methods have been evaluated on different databases or using different protocols within the same databases. The lack of a standard protocol makes it difficult to compare systems and acts as a hindrance to the progress of the field. Therefore, we propose a person independent training and testing protocol for expression recognition as part of the BEFIT workshop. Further, we compare our dataset with the JAFFE and Multi-PIE datasets and provide baseline results.
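
A minimal sketch of what a person-independent protocol means in practice is given below; the subject IDs, labels and feature arrays are placeholders, not the actual SFEW partitions.

```python
# Sketch of a person-independent evaluation split: no subject appears in both
# the training and the test partition. All data here are placeholders.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.random.rand(200, 128)                    # image features (placeholder)
y = np.random.randint(0, 7, size=200)           # 7 expression labels (placeholder)
subjects = np.random.randint(0, 40, size=200)   # subject identity per image

splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=subjects))

# Sanity check: the two partitions share no subject identities.
assert set(subjects[train_idx]).isdisjoint(subjects[test_idx])
```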


International Conference on Computer Vision | 2007

A Nonlinear Discriminative Approach to AAM Fitting

Jason M. Saragih; Roland Goecke

The Active Appearance Model (AAM) is a powerful generative method for modeling and registering deformable visual objects. Most methods for AAM fitting utilize a linear parameter update model in an iterative framework. Despite its popularity, the scope of this approach is severely restricted, both in fitting accuracy and capture range, due to the simplicity of the linear update models used. In this paper, we present a new AAM fitting formulation, which utilizes a nonlinear update model. To motivate our approach, we compare its performance against two popular fitting methods on two publicly available face databases, in which this formulation boasts significant performance improvements.
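
The sketch below illustrates the general idea of iterative fitting with a learned, possibly nonlinear, parameter-update model; the residual function and the trained regressor are placeholders standing in for a real AAM implementation, not the paper's exact discriminative formulation.

```python
# Toy illustration of iterative model fitting with a learned, nonlinear
# parameter-update regressor; the warp/residual functions are placeholders
# for a real AAM implementation.
import numpy as np

def fit_aam(image, p0, residual_fn, update_model, n_iters=10):
    """Iteratively refine shape/appearance parameters p.

    residual_fn(image, p) -> appearance residual vector
    update_model.predict(residual) -> predicted parameter increment
    """
    p = p0.copy()
    for _ in range(n_iters):
        r = residual_fn(image, p).reshape(1, -1)
        delta_p = update_model.predict(r).ravel()
        p = p + delta_p
        if np.linalg.norm(delta_p) < 1e-6:   # converged
            break
    return p

# The update model would be trained offline on (residual, parameter
# perturbation) pairs generated by synthetically displacing ground-truth
# parameters, e.g. with a nonlinear regressor such as
# sklearn.neural_network.MLPRegressor.
```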


Affective Computing and Intelligent Interaction | 2009

Evaluating AAM fitting methods for facial expression recognition

Akshay Asthana; Jason M. Saragih; Michael Wagner; Roland Goecke

The human face is a rich source of information for the viewer and facial expressions are a major component in judging a person's affective state, intention and personality. Facial expressions are an important part of human-human interaction and have the potential to play an equally important part in human-computer interaction. This paper evaluates various Active Appearance Model (AAM) fitting methods, including both the original formulation as well as several state-of-the-art methods, for the task of automatic facial expression recognition. The AAM is a powerful statistical model for modelling and registering deformable objects. The results of the fitting process are used in a facial expression recognition task using a region-based intermediate representation related to Action Units, with the expression classification task realised using a Support Vector Machine. Experiments are performed for both person-dependent and person-independent setups. Overall, the best facial expression recognition results were obtained by using the Iterative Error Bound Minimisation method, which consistently resulted in accurate face model alignment and facial expression recognition even when the initial face detection used to initialise the fitting procedure was poor.
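
A hypothetical sketch of the downstream classification stage follows: fitted landmarks are turned into simple per-region features and classified with an SVM. The region definitions are illustrative only, not the paper's actual Action-Unit-related representation.

```python
# Hypothetical sketch of the downstream stage: turn fitted face landmarks into
# simple per-region features and classify the expression with an SVM. The
# region indices below are made up for illustration.
import numpy as np
from sklearn.svm import SVC

REGIONS = {                       # landmark indices per facial region (illustrative)
    "brows": range(0, 10),
    "eyes": range(10, 22),
    "mouth": range(22, 40),
}

def region_features(landmarks: np.ndarray) -> np.ndarray:
    """landmarks: (n_points, 2) fitted AAM shape; returns one feature vector."""
    feats = []
    for idx in REGIONS.values():
        pts = landmarks[list(idx)]
        centre = pts.mean(axis=0)
        feats.extend(centre)                                        # region position
        feats.append(np.linalg.norm(pts - centre, axis=1).mean())   # region spread
    return np.asarray(feats)

# X = np.stack([region_features(lm) for lm in fitted_shapes])
# clf = SVC(kernel="rbf").fit(X_train, y_train)
```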


ACM Multimedia | 2013

Diagnosis of depression by behavioural signals: a multimodal approach

Nicholas Cummins; Jyoti Joshi; Abhinav Dhall; Vidhyasaharan Sethu; Roland Goecke; Julien Epps

Quantifying behavioural changes in depression using affective computing techniques is the first step in developing an objective diagnostic aid, with clinical utility, for clinical depression. As part of the AVEC 2013 Challenge, we present a multimodal approach for the Depression Sub-Challenge using a GMM-UBM system with three different kernels for the audio subsystem and Space Time Interest Points in a Bag-of-Words approach for the vision subsystem. These are then fused at the feature level to form the combined AV system. Key results include the strong performance of acoustic audio features and the bag-of-words visual features in predicting an individual's level of depression using regression. Interestingly, given the small amount of literature on the subject, our feature-level multimodal fusion technique is able to outperform both the audio and visual challenge baselines.
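
The sketch below shows feature-level audio-visual fusion in its simplest form: per-recording audio and visual descriptors (assumed precomputed, e.g. GMM-UBM supervectors and STIP bag-of-words histograms) are concatenated and passed to a single regressor. All dimensions and data are placeholders.

```python
# Minimal sketch of feature-level fusion for depression-score regression:
# precomputed per-recording audio and visual descriptors are concatenated
# and a single regressor is trained on the joint vector.
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def fuse_features(audio_feats: np.ndarray, video_feats: np.ndarray) -> np.ndarray:
    """Concatenate per-recording audio and visual descriptors (feature-level fusion)."""
    return np.hstack([audio_feats, video_feats])

# Placeholder data: 50 recordings, 256-D audio and 500-D visual descriptors.
rng = np.random.default_rng(0)
X_audio, X_video = rng.random((50, 256)), rng.random((50, 500))
y = rng.uniform(0, 45, size=50)            # depression scores (placeholder)

X = fuse_features(X_audio, X_video)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
model.fit(X[:40], y[:40])
print(model.predict(X[40:]))
```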


Journal on Multimodal User Interfaces | 2013

Multimodal assistive technologies for depression diagnosis and monitoring

Jyoti Joshi; Roland Goecke; Sharifa Alghowinem; Abhinav Dhall; Michael Wagner; Julien Epps; Gordon Parker; Michael Breakspear

Depression is a severe mental health disorder with high societal costs. Current clinical practice depends almost exclusively on self-report and clinical opinion, risking a range of subjective biases. The long-term goal of our research is to develop assistive technologies to support clinicians and sufferers in the diagnosis and monitoring of treatment progress in a timely and easily accessible format. In the first phase, we aim to develop a diagnostic aid using affective sensing approaches. This paper describes the progress to date and proposes a novel multimodal framework comprising audio-video fusion for depression diagnosis. We exploit the proposition, well known in auditory-visual speech processing, that auditory and visual human communication complement each other, and investigate this hypothesis for depression analysis. For the video data analysis, intra-facial muscle movements and the movements of the head and shoulders are analysed by computing spatio-temporal interest points. In addition, various audio features (fundamental frequency f0, loudness, intensity and mel-frequency cepstral coefficients) are computed. Next, a bag of visual features and a bag of audio features are generated separately. In this study, we compare fusion methods at feature level, score level and decision level. Experiments are performed on an age and gender matched clinical dataset of 30 patients and 30 healthy controls. The results from the multimodal experiments show the proposed framework’s effectiveness in depression analysis.
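
The following sketch contrasts the three fusion levels compared in the paper on placeholder data: feature-level (concatenate inputs), score-level (combine classifier scores) and decision-level (combine hard decisions). The classifiers and combination rules are illustrative, not the paper's exact configuration.

```python
# Contrasting feature-, score- and decision-level fusion on placeholder data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
Xa, Xv = rng.random((60, 40)), rng.random((60, 80))   # audio / video features
y = rng.integers(0, 2, size=60)                       # depressed vs. control
tr, te = slice(0, 45), slice(45, 60)

# Feature-level fusion: one classifier on the concatenated features.
feat_clf = SVC(probability=True).fit(np.hstack([Xa, Xv])[tr], y[tr])
p_feat = feat_clf.predict(np.hstack([Xa, Xv])[te])

# Unimodal classifiers, reused for score- and decision-level fusion.
clf_a = SVC(probability=True).fit(Xa[tr], y[tr])
clf_v = SVC(probability=True).fit(Xv[tr], y[tr])

# Score-level fusion: average the class-1 probabilities, then threshold.
score = 0.5 * (clf_a.predict_proba(Xa[te])[:, 1] + clf_v.predict_proba(Xv[te])[:, 1])
p_score = (score >= 0.5).astype(int)

# Decision-level fusion: combine hard decisions (here: positive if either says so).
p_decision = np.maximum(clf_a.predict(Xa[te]), clf_v.predict(Xv[te]))
```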


International Conference on Acoustics, Speech, and Signal Processing | 2002

Noisy audio feature enhancement using audio-visual speech data

Roland Goecke; Gerasimos Potamianos; Chalapathy Neti

We investigate improving automatic speech recognition (ASR) in noisy conditions by enhancing noisy audio features using visual speech captured from the speaker's face. The enhancement is achieved by applying a linear filter to the concatenated vector of noisy audio and visual features, obtained by mean square error estimation of the clean audio features in a training stage. The performance of the enhanced audio features is evaluated on two ASR tasks: a connected digits task and speaker-independent, large-vocabulary, continuous speech recognition. In both cases and at sufficiently low signal-to-noise ratios (SNRs), ASR trained on the enhanced audio features significantly outperforms ASR trained on the noisy audio, achieving for example a 46% relative reduction in word error rate on the digits task at −3.5 dB SNR. However, the method fails to capture the full visual modality benefit to ASR, as demonstrated by its comparison to discriminant audio-visual feature fusion introduced in previous work.
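
A small sketch of the core estimation step follows: a linear filter obtained by least squares on training data maps the concatenated noisy-audio and visual feature vectors to estimates of the clean audio features. Feature dimensions and data are placeholders.

```python
# Sketch of the enhancement step: learn a linear map (least-squares estimate
# on training data) from concatenated noisy-audio + visual feature vectors to
# clean audio features, then apply it at test time.
import numpy as np

def train_enhancement_filter(noisy_audio, visual, clean_audio):
    """Least-squares filter W (with bias) such that clean ≈ [noisy, visual, 1] @ W."""
    Z = np.hstack([noisy_audio, visual, np.ones((len(noisy_audio), 1))])
    W, *_ = np.linalg.lstsq(Z, clean_audio, rcond=None)
    return W

def enhance(noisy_audio, visual, W):
    Z = np.hstack([noisy_audio, visual, np.ones((len(noisy_audio), 1))])
    return Z @ W

# Placeholder dimensions: 60-D audio features, 41-D visual speech features.
rng = np.random.default_rng(0)
noisy, vis = rng.random((1000, 60)), rng.random((1000, 41))
clean = rng.random((1000, 60))
W = train_enhancement_filter(noisy, vis, clean)
enhanced = enhance(noisy, vis, W)
```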


IEEE International Conference on Automatic Face and Gesture Recognition | 2013

Can body expressions contribute to automatic depression analysis?

Jyoti Joshi; Roland Goecke; Gordon Parker; Michael Breakspear

Depression is one of the most common mental health disorders with strong adverse effects on personal and social functioning. The absence of any objective diagnostic aid for depression leads to a range of subjective biases in initial diagnosis and ongoing monitoring. Psychologists use various visual cues in their assessment to quantify depression such as facial expressions, eye contact and head movements. This paper studies the contribution of (upper) body expressions and gestures for automatic depression analysis. A framework based on space-time interest points and bag of words is proposed for the analysis of upper body and facial movements. Salient interest points are selected using clustering. The major contribution of this paper lies in the creation of a bag of body expressions and a bag of facial dynamics for assessing the contribution of different body parts for depression analysis. Head movement analysis is performed by selecting rigid facial fiducial points and a new histogram of head movements is proposed. The experiments are performed on real-world clinical data where video clips of patients and healthy controls are recorded during interactive interview sessions. The results show the effectiveness of the proposed system in evaluating the contribution of various body parts in depression analysis.
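
The bag-of-words construction that underlies both the bag of body expressions and the bag of facial dynamics can be sketched as below; STIP descriptor extraction is assumed to have been done elsewhere, and the vocabulary size is illustrative.

```python
# Sketch of the bag-of-words backbone: build a codebook over space-time
# interest point (STIP) descriptors, then represent each clip as a histogram
# of codeword occurrences.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(all_descriptors: np.ndarray, n_words: int = 200) -> KMeans:
    """Cluster descriptors pooled over the training clips into a visual vocabulary."""
    return KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(all_descriptors)

def bag_of_words(clip_descriptors: np.ndarray, codebook: KMeans) -> np.ndarray:
    """L1-normalised histogram of codeword occurrences for one clip."""
    words = codebook.predict(clip_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# codebook = build_codebook(np.vstack(training_clip_descriptors))
# X = np.stack([bag_of_words(d, codebook) for d in clip_descriptors])
```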


IEEE Intelligent Vehicles Symposium | 2007

Visual Vehicle Egomotion Estimation using the Fourier-Mellin Transform

Roland Goecke; Akshay Asthana; Niklas Pettersson; Lars Petersson

This paper is concerned with the problem of estimating the motion of a single camera from a sequence of images, with an application scenario of vehicle egomotion estimation. Egomotion estimation has been an active area of research for many years and various solutions to the problem have been proposed. Many methods rely on optical flow or local image features to establish the spatial relationship between two images. A new method of egomotion estimation is presented which makes use of the Fourier-Mellin Transform for registering images in a video sequence, from which the rotation and translation of the camera motion can be estimated. The Fourier-Mellin Transform provides an accurate and efficient way of computing the camera motion parameters. It is a global method that takes the contributions from all pixels into account. The performance of the proposed approach is compared to two variants of optical flow methods and results are presented for a real-world video sequence taken from a moving vehicle.
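
A simplified sketch of Fourier-Mellin-style registration between two frames is given below: rotation appears as a shift along the angle axis of the log-polar-resampled magnitude spectra, and translation follows from ordinary phase correlation once the rotation is compensated. This illustrates the principle only, not the paper's full egomotion pipeline.

```python
# Simplified Fourier-Mellin-style registration between two grayscale frames.
import numpy as np
from scipy import ndimage

def phase_correlation(a, b):
    """Return the (row, col) shift of the phase-correlation peak between a and b."""
    F = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    corr = np.fft.ifft2(F / (np.abs(F) + 1e-12)).real
    peak = np.array(np.unravel_index(np.argmax(corr), corr.shape), dtype=float)
    shape = np.array(corr.shape, dtype=float)
    peak[peak > shape / 2] -= shape[peak > shape / 2]   # wrap to signed shifts
    return peak

def log_polar(img, n_angles=360, n_radii=256):
    """Resample a centred image onto a log-polar grid (rows: radius, cols: angle)."""
    cy, cx = np.array(img.shape, dtype=float) / 2.0
    theta = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    radius = np.exp(np.linspace(0.0, np.log(min(cy, cx)), n_radii))
    rows = cy + np.outer(radius, np.sin(theta))
    cols = cx + np.outer(radius, np.cos(theta))
    return ndimage.map_coordinates(img, [rows, cols], order=1)

def estimate_rotation_deg(frame1, frame2):
    """Rotation between frames from the angle-axis shift of their log-polar spectra."""
    m1 = np.abs(np.fft.fftshift(np.fft.fft2(frame1)))
    m2 = np.abs(np.fft.fftshift(np.fft.fft2(frame2)))
    shift = phase_correlation(log_polar(m1), log_polar(m2))
    return shift[1]   # columns index angle; with n_angles=360, 1 bin = 1 degree

# After rotating frame2 back by the estimated angle, the remaining translation
# follows from phase_correlation(frame1, rotation_compensated_frame2).
```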


International Conference on Acoustics, Speech, and Signal Processing | 2013

Detecting depression: A comparison between spontaneous and read speech

Sharifa Alghowinem; Roland Goecke; Michael Wagner; Julien Epps; Michael Breakspear; Gordon Parker

Major depressive disorders are mental disorders of high prevalence, leading to a high impact on individuals, their families, society and the economy. In order to assist clinicians to better diagnose depression, we investigate an objective diagnostic aid using affective sensing technology with a focus on acoustic features. In this paper, we hypothesise that (1) classifying the general characteristics of clinical depression using spontaneous speech will give better results than using read speech, (2) that there are some acoustic features that are robust and would give good classification results in both spontaneous and read speech, and (3) that a 'thin-slicing' approach using smaller parts of the speech data will perform similarly if not better than using the whole speech data. By examining and comparing recognition results for acoustic features on a real-world clinical dataset of 30 depressed and 30 control subjects using SVM for classification and a leave-one-out cross-validation scheme, we found that spontaneous speech has more variability, which increases the recognition rate of depression. We also found that jitter, shimmer, energy and loudness feature groups are robust in characterising both read and spontaneous depressive speech. Remarkably, thin-slicing the read speech, using either the beginning of each sentence or the first few sentences, performs better than using all reading task data.
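
A minimal sketch of the classification setup, an SVM evaluated with leave-one-out cross-validation over per-subject acoustic feature vectors, is shown below; all feature values are placeholders.

```python
# Minimal sketch: SVM with leave-one-out cross-validation over per-subject
# acoustic feature vectors (e.g. jitter, shimmer, energy, loudness statistics).
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((60, 24))                       # 30 depressed + 30 controls (placeholder)
y = np.array([1] * 30 + [0] * 30)              # 1 = depressed, 0 = control

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
print(f"Leave-one-out accuracy: {acc:.2f}")
```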

Collaboration


Dive into Roland Goecke's collaborations.

Top Co-Authors

Akshay Asthana, Australian National University
Julien Epps, University of New South Wales
Jyoti Joshi, University of Canberra
Michael Breakspear, QIMR Berghofer Medical Research Institute
Tamas Gedeon, Australian National University
Gordon Parker, University of New South Wales
Sharifa Alghowinem, Australian National University