Publication


Featured research published by Barry-John Theobald.


International Conference on Multimodal Interfaces | 2007

The painful face: pain expression recognition using active appearance models

Ahmed Bilal Ashraf; Simon Lucey; Jeffrey F. Cohn; Tsuhan Chen; Zara Ambadar; Kenneth M. Prkachin; Patty Solomon; Barry-John Theobald

Pain is typically assessed by patient self-report. Self-reported pain, however, is difficult to interpret and may be impaired or not even possible, as in young children or the severely ill. Behavioral scientists have identified reliable and valid facial indicators of pain. Until now they required manual measurement by highly skilled observers. We developed an approach that automatically recognizes acute pain. Adult patients with rotator cuff injury were video-recorded while a physiotherapist manipulated their affected and unaffected shoulder. Skilled observers rated pain expression from the video on a 5-point Likert-type scale. From these ratings, sequences were categorized as no-pain (rating of 0), pain (rating of 3, 4, or 5), and indeterminate (rating of 1 or 2). We explored machine learning approaches for pain-no pain classification. Active Appearance Models (AAM) were used to decouple shape and appearance parameters from the digitized face images. Support vector machines (SVM) were used with several representations from the AAM. Using a leave-one-out procedure, we achieved an equal error rate of 19% (hit rate = 81%) using canonical appearance and shape features. These findings suggest the feasibility of automatic pain detection from video.
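The classification stage described above lends itself to a small sketch: AAM shape and canonical appearance features per video sequence are fed to a support vector machine, evaluated with a leave-one-subject-out procedure. The feature layout, the per-subject grouping, and the use of scikit-learn below are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut

# Illustrative placeholders: X holds one AAM feature vector (shape plus
# canonical appearance) per example, y holds 0 = no-pain, 1 = pain, and
# subjects records which patient each example came from so the
# leave-one-out split is per subject rather than per frame.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
y = rng.integers(0, 2, size=200)
subjects = rng.integers(0, 10, size=200)

logo = LeaveOneGroupOut()
scores = []
for train_idx, test_idx in logo.split(X, y, groups=subjects):
    clf = SVC(kernel="linear")            # linear SVM on AAM-derived features
    clf.fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))

print(f"mean leave-one-subject-out accuracy: {np.mean(scores):.2f}")
```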


Symposium on Computer Animation | 2012

Dynamic units of visual speech

Sarah Taylor; Moshe Mahler; Barry-John Theobald; Iain A. Matthews

We present a new method for generating a dynamic, concatenative unit of visual speech that can produce realistic visual speech animation. We redefine visemes as temporal units that describe distinctive speech movements of the visual speech articulators. Traditionally, visemes have been defined as the set of static mouth shapes representing clusters of contrastive phonemes (e.g. /p, b, m/, and /f, v/). In this work, the motion of the visual speech articulators is used to generate discrete, dynamic visual speech gestures. These gestures are clustered, providing a finite set of movements that describe visual speech: the dynamic visemes. Dynamic visemes are applied to speech animation by simply concatenating viseme units. We compare dynamic visemes to static visemes using subjective evaluation. We find that dynamic visemes are able to produce more accurate and visually pleasing speech animation given phonetically annotated audio, reducing the amount of time that an animator needs to spend manually refining the animation.
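As a rough illustration of the clustering step, one could resample each articulator-motion gesture to a fixed length and cluster the resulting vectors to obtain a finite set of dynamic units. The resampling scheme, the feature layout and the use of k-means here are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def resample_gesture(gesture, length=10):
    """Linearly resample a (frames, dims) articulator trajectory to a fixed length."""
    frames, dims = gesture.shape
    src = np.linspace(0.0, 1.0, frames)
    dst = np.linspace(0.0, 1.0, length)
    return np.column_stack([np.interp(dst, src, gesture[:, d]) for d in range(dims)])

# Illustrative data: 300 gestures of varying length over 6 articulator parameters.
rng = np.random.default_rng(0)
gestures = [rng.normal(size=(rng.integers(5, 30), 6)) for _ in range(300)]

# Flatten each fixed-length gesture into a single vector and cluster;
# each cluster centre stands in for one dynamic viseme.
features = np.array([resample_gesture(g).ravel() for g in gestures])
kmeans = KMeans(n_clusters=20, n_init=10, random_state=0).fit(features)
dynamic_visemes = kmeans.cluster_centers_.reshape(20, 10, 6)
print(dynamic_visemes.shape)  # (visemes, frames, articulator dims)
```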


International Conference on Automatic Face and Gesture Recognition | 2006

Evaluating error functions for robust active appearance models

Barry-John Theobald; Iain A. Matthews; Simon Baker

Active appearance models (AAMs) are generative parametric models commonly used to track faces in video sequences. A limitation of AAMs is that they are not robust to occlusion. A recent extension reformulated the search as an iteratively re-weighted least-squares problem. In this paper we focus on the choice of error function for use in a robust AAM search. We evaluate eight error functions using two performance metrics: accuracy of occlusion detection and fitting robustness. We show that for any reasonable error function the performance in terms of occlusion detection is the same. However, this does not mean that fitting performance is the same. We describe experiments for measuring fitting robustness on images containing real occlusion. The best approach assumes the residuals at each pixel are Gaussian distributed, then estimates the parameters of the distribution from images that do not contain occlusion. In each iteration of the search, the error image is used to sample these distributions to obtain the pixel weights.
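A compact sketch of the per-pixel weighting idea: fit a Gaussian to each pixel's residual on unoccluded training images, then at each search iteration turn the current error image into weights so anomalous (likely occluded) pixels are down-weighted. The specific likelihood-based weight formula and variable layout below are assumptions for illustration.

```python
import numpy as np

def fit_residual_gaussians(residual_images):
    """Estimate per-pixel mean and variance of AAM residuals from unoccluded images.

    residual_images: array of shape (num_images, num_pixels)."""
    mu = residual_images.mean(axis=0)
    var = residual_images.var(axis=0) + 1e-8    # avoid division by zero
    return mu, var

def pixel_weights(error_image, mu, var):
    """Weight each pixel by its likelihood under the fitted Gaussian, so pixels
    with unusually large residuals (likely occluded) receive low weight."""
    z2 = (error_image - mu) ** 2 / var
    return np.exp(-0.5 * z2)

# Illustrative use inside one iteration of an iteratively re-weighted
# least-squares AAM update (the parameter update itself is omitted here).
rng = np.random.default_rng(0)
train_residuals = rng.normal(scale=0.1, size=(50, 1000))
mu, var = fit_residual_gaussians(train_residuals)

error_image = rng.normal(scale=0.1, size=1000)
error_image[:100] += 1.0                        # simulate an occluded region
w = pixel_weights(error_image, mu, var)
print(w[:100].mean(), w[100:].mean())           # occluded pixels get lower weight
```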


Speech Communication | 2004

Near-videorealistic synthetic talking faces: implementation and evaluation

Barry-John Theobald; Ja Bangham; Iain A. Matthews; Gavin C. Cawley

The application of two-dimensional (2D) shape and appearance models to the problem of creating realistic synthetic talking faces is presented. A sample-based approach is adopted, where the face of a talker articulating a series of phonetically balanced training sentences is mapped to a trajectory in a low-dimensional model-space that has been learnt from the training data. Segments extracted from this trajectory corresponding to the synthesis units (e.g. triphones) are temporally normalised, blended, concatenated and smoothed to form a new trajectory, which is mapped back to the image domain to provide a natural, realistic sequence corresponding to the desired (arbitrary) utterance. The system has undergone early subjective evaluation to determine the naturalness of this synthesis approach. We describe tests to determine the suitability of the parameter smoothing method used to remove discontinuities introduced at the concatenation boundaries during synthesis, and tests to determine how well long-term coarticulation effects are reproduced using the adopted unit selection scheme. The system has also been extended to animate the face of a 3D virtual character (avatar), and this is described.
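The unit-joining step can be sketched in a few lines: trajectory segments in the low-dimensional model-space are time-normalised, overlap-blended at each join, and smoothed. The linear cross-fade and moving-average smoother below are illustrative choices under that assumption, not the paper's exact blending and smoothing methods.

```python
import numpy as np

def time_normalise(segment, length):
    """Resample a (frames, dims) model-space trajectory segment to a target length."""
    src = np.linspace(0.0, 1.0, len(segment))
    dst = np.linspace(0.0, 1.0, length)
    return np.column_stack([np.interp(dst, src, segment[:, d])
                            for d in range(segment.shape[1])])

def blend_concatenate(segments, overlap=4):
    """Concatenate segments with a linear cross-fade over `overlap` frames at each join."""
    out = segments[0]
    for seg in segments[1:]:
        alpha = np.linspace(0.0, 1.0, overlap)[:, None]
        joined = (1 - alpha) * out[-overlap:] + alpha * seg[:overlap]
        out = np.vstack([out[:-overlap], joined, seg[overlap:]])
    return out

def smooth(trajectory, window=3):
    """Moving-average smoothing to suppress residual discontinuities at joins."""
    kernel = np.ones(window) / window
    return np.column_stack([np.convolve(trajectory[:, d], kernel, mode="same")
                            for d in range(trajectory.shape[1])])

# Illustrative: three candidate units, normalised to 12 frames each, then joined.
rng = np.random.default_rng(0)
units = [rng.normal(size=(rng.integers(8, 20), 5)) for _ in range(3)]
trajectory = smooth(blend_concatenate([time_normalise(u, 12) for u in units]))
print(trajectory.shape)
```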


International Conference on Multimodal Interfaces | 2007

Real-time expression cloning using appearance models

Barry-John Theobald; Iain A. Matthews; Jeffrey F. Cohn; Steven M. Boker

Active Appearance Models (AAMs) are generative parametric models commonly used to track, recognise and synthesise faces in images and video sequences. In this paper we describe a method for transferring dynamic facial gestures between subjects in real-time. The main advantages of our approach are that: (1) the mapping is computed automatically and does not require high-level semantic information describing facial expressions or visual speech gestures; (2) the mapping is simple and intuitive, allowing expressions to be transferred and rendered in real-time; (3) the mapped expression can be constrained to have the appearance of the target producing the expression, rather than the source expression imposed onto the target face; and (4) near-videorealistic talking faces for new subjects can be created without the cost of recording and processing a complete training corpus for each. Our system enables face-to-face interaction with an avatar driven by an AAM of an actual person in real-time, and we show examples of arbitrary expressive speech frames cloned across different subjects.
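One way to picture the transfer is as an offset in AAM parameter space: subtract the source subject's neutral-face parameters from the current frame and re-apply that offset to the target subject's neutral parameters. This difference-based mapping and the variable names below are an illustrative simplification, not necessarily the mapping used in the paper.

```python
import numpy as np

def clone_expression(source_params, source_neutral, target_neutral, scale=1.0):
    """Transfer an expression by re-applying the source's parameter offset
    (current frame minus neutral face) onto the target's neutral parameters."""
    offset = source_params - source_neutral
    return target_neutral + scale * offset

# Illustrative 20-dimensional combined shape/appearance parameter vectors.
rng = np.random.default_rng(0)
source_neutral = rng.normal(size=20)
target_neutral = rng.normal(size=20)
source_frame = source_neutral + rng.normal(scale=0.3, size=20)   # expressive frame

target_frame = clone_expression(source_frame, source_neutral, target_neutral)
# target_frame would then be rendered through the target subject's AAM,
# so the cloned expression carries the target's appearance.
print(target_frame.shape)
```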


Language and Speech | 2009

Mapping and Manipulating Facial Expression

Barry-John Theobald; Iain A. Matthews; Michael Mangini; Jeffrey R. Spies; Timothy R. Brick; Jeffrey F. Cohn; Steven M. Boker

Nonverbal visual cues accompany speech to supplement the meaning of spoken words, signify emotional state, indicate position in discourse, and provide back-channel feedback. This visual information includes head movements, facial expressions and body gestures. In this article we describe techniques for manipulating both verbal and nonverbal facial gestures in video sequences of people engaged in conversation. We are developing a system for use in psychological experiments, where the effects of manipulating individual components of nonverbal visual behavior during live face-to-face conversation can be studied. In particular, the techniques we describe operate in real-time at video frame-rate and the manipulation can be applied so both participants in a conversation are kept blind to the experimental conditions.


International Conference on Computer Vision | 2009

Robust facial feature tracking using selected multi-resolution linear predictors

Eng-Jon Ong; Yuxuan Lan; Barry-John Theobald; Richard W. Harvey; Richard Bowden

This paper proposes a learnt data-driven approach for accurate, real-time tracking of facial features using only intensity information. Constraints such as a-priori shape models or temporal models for dynamics are not required or used. Tracking facial features simply becomes the independent tracking of a set of points on the face. This allows us to cope with facial configurations not present in the training data. Tracking is achieved via linear predictors which provide a fast and effective method for mapping pixel-level information to tracked feature position displacements. To improve on this, a novel and robust biased linear predictor is proposed in this paper. Multiple linear predictors are grouped into a rigid flock to increase robustness. To further improve tracking accuracy, a novel probabilistic selection method is used to identify relevant visual areas for tracking a feature point. These selected flocks are then combined into a hierarchical multi-resolution LP model. Experimental results also show that this method performs more robustly and accurately than AAMs, without any a priori shape information and with minimal training examples.
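The core of a single linear predictor can be sketched as a least-squares mapping from sampled pixel-intensity differences to feature-point displacements, learned from synthetic perturbations around the true position. The training-data layout and variable names below are assumed for illustration; flock grouping, biasing and multi-resolution selection are omitted.

```python
import numpy as np

def train_linear_predictor(intensity_diffs, displacements):
    """Fit H so that displacement ≈ H @ intensity_diff, by least squares.

    intensity_diffs: (num_examples, num_support_pixels) differences between pixels
        sampled at a perturbed position and at the true feature position.
    displacements:   (num_examples, 2) the (dx, dy) perturbations to undo."""
    H, *_ = np.linalg.lstsq(intensity_diffs, displacements, rcond=None)
    return H.T                                   # shape (2, num_support_pixels)

def predict_displacement(H, intensity_diff):
    """Map a single intensity-difference vector to a predicted (dx, dy) correction."""
    return H @ intensity_diff

# Illustrative synthetic training set for one feature point.
rng = np.random.default_rng(0)
true_H = rng.normal(size=(2, 50))
diffs = rng.normal(size=(500, 50))
disps = diffs @ true_H.T + rng.normal(scale=0.01, size=(500, 2))

H = train_linear_predictor(diffs, disps)
print(predict_displacement(H, diffs[0]), disps[0])   # predictions should be close
```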


International Conference on Multimedia and Expo | 2012

View Independent Computer Lip-Reading

Yuxuan Lan; Barry-John Theobald; Richard W. Harvey

Computer lip-reading systems are usually designed to work using a full-frontal view of the face. However, many human experts tend to prefer to lip-read using an angled view. In this paper we consider issues related to the best viewing angle for an automated lip-reading system. In particular, we seek answers to the following questions: (1) Do computers lip-read better using a frontal or a non-frontal view of the face? (2) What is the best viewing angle for a computer lip-reading system? (3) How can a computer lip-reading system be made to work independently of viewing angle? We investigate these issues using a purpose built audio-visual dataset that contains simultaneous recordings of a speaker reciting continuous speech at five angles. We find that the system performs best on a non-frontal view, perhaps because lip gestures, such as lip-protrusion and lip-rounding, are more pronounced when viewing from an angle. We also describe a simple linear mapping that allows us to map any view of the face to the view that we find to be optimal. Hence we present a view-independent lip-reading system.
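The cross-view mapping can be pictured as a single linear transform learned from time-aligned feature vectors of the same utterances recorded at two angles. The least-squares fit with a bias term below is a sketch under that assumption; the feature dimensions and pairing scheme are illustrative, not taken from the paper.

```python
import numpy as np

def learn_view_mapping(source_feats, target_feats):
    """Least-squares linear map (with bias) from one viewing angle's features
    to the preferred view's features, using time-aligned paired frames."""
    ones = np.ones((len(source_feats), 1))
    A = np.hstack([source_feats, ones])          # append bias column
    W, *_ = np.linalg.lstsq(A, target_feats, rcond=None)
    return W                                     # shape (dims + 1, dims)

def map_view(W, feats):
    """Apply the learned mapping to new frames from the source view."""
    ones = np.ones((len(feats), 1))
    return np.hstack([feats, ones]) @ W

# Illustrative paired features from a frontal view and an angled view.
rng = np.random.default_rng(0)
frontal = rng.normal(size=(400, 30))
angled = frontal @ rng.normal(size=(30, 30)) + 0.1 + rng.normal(scale=0.05, size=(400, 30))

W = learn_view_mapping(frontal, angled)
mapped = map_view(W, frontal)
print(np.mean((mapped - angled) ** 2))           # small reconstruction error
```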


International Conference on Acoustics, Speech, and Signal Processing | 2012

Insights into machine lip reading

Yuxuan Lan; Richard W. Harvey; Barry-John Theobald

Computer lip-reading is one of the great signal processing challenges. Not only is the signal noisy, it is variable. However, comparisons of automatic performance with that of human lip-readers are almost unknown, partly because of the paucity of human lip-readers and partly because most automatic systems only handle data that are trivial and therefore not representative of human speech. Here we generate a multiview dataset using connected words that can be analysed by an automatic system, based on linear predictive trackers and active appearance models, and by human lip-readers. The automatic system we devise has a viseme accuracy of ≈ 46%, which is comparable to poor professional human lip-readers. However, unlike human lip-readers, our system is good at gauging its own fallibility.


Optics and Photonics for Counterterrorism, Crime Fighting and Defence IX; and Optical Materials and Biomaterials in Security and Defence Systems Technology X | 2013

Recent developments in automated lip-reading

Richard Bowden; Stephen J. Cox; Richard W. Harvey; Yuxuan Lan; Eng-Jon Ong; Gari Owen; Barry-John Theobald

Human lip-readers are increasingly being presented as useful in the gathering of forensic evidence but, like all humans, suffer from unreliability. Here we report the results of a long-term study in automatic lip-reading with the objective of converting video to text (V2T). The V2T problem is surprising in that some aspects that look tricky, such as real-time tracking of the lips on poor-quality interlaced video from hand-held cameras, prove to be relatively tractable, whereas speaker-independent lip-reading is very demanding due to unpredictable variation between people. Here we review the problem of automatic lip-reading for crime fighting and identify the critical parts of the problem.

Collaboration


Dive into Barry-John Theobald's collaboration.

Top Co-Authors

Yuxuan Lan

University of East Anglia

Gavin C. Cawley

University of East Anglia

Stephen J. Cox

University of East Anglia
