Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Yuxuan Lan is active.

Publication


Featured research published by Yuxuan Lan.


International Conference on Computer Vision | 2009

Robust facial feature tracking using selected multi-resolution linear predictors

Eng-Jon Ong; Yuxuan Lan; Barry-John Theobald; Richard W. Harvey; Richard Bowden

This paper proposes a learnt data-driven approach for accurate, real-time tracking of facial features using only intensity information. Constraints such as a priori shape models or temporal models for dynamics are not required or used. Tracking facial features simply becomes the independent tracking of a set of points on the face, which allows us to cope with facial configurations not present in the training data. Tracking is achieved via linear predictors, which provide a fast and effective method for mapping pixel-level information to tracked feature position displacements. To improve on this, a novel and robust biased linear predictor is proposed. Multiple linear predictors are grouped into a rigid flock to increase robustness. To further improve tracking accuracy, a novel probabilistic selection method is used to identify relevant visual areas for tracking a feature point. These selected flocks are then combined into a hierarchical multi-resolution LP model. Experimental results show that this method performs more robustly and accurately than AAMs, without any a priori shape information and with minimal training examples.
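
At its core, a linear predictor is a regression from sampled pixel intensities to a feature-point displacement. The sketch below is a minimal illustration of how such a predictor can be trained by least squares on synthetically displaced examples; the function names are hypothetical, and the biased and flocked variants the paper proposes are not shown.

```python
import numpy as np

def train_linear_predictor(intensity_diffs, displacements, lam=1e-3):
    """Fit a linear map from support-pixel intensity differences to 2-D
    feature displacements via ridge-regularised least squares.

    intensity_diffs: (N, P) array; each row samples P support pixels
                     around a synthetically displaced feature point
    displacements:   (N, 2) array of the known (dx, dy) offsets
    """
    X = np.hstack([intensity_diffs, np.ones((intensity_diffs.shape[0], 1))])  # bias term
    H = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ displacements)
    return H  # shape (P + 1, 2)

def predict_displacement(H, intensity_diff):
    """Map one intensity-difference vector to a predicted (dx, dy)."""
    return np.append(intensity_diff, 1.0) @ H
```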


International Conference on Multimedia and Expo | 2012

View Independent Computer Lip-Reading

Yuxuan Lan; Barry-John Theobald; Richard W. Harvey

Computer lip-reading systems are usually designed to work using a full-frontal view of the face. However, many human experts prefer to lip-read from an angled view. In this paper we consider issues related to the best viewing angle for an automated lip-reading system. In particular, we seek answers to the following questions: (1) Do computers lip-read better using a frontal or a non-frontal view of the face? (2) What is the best viewing angle for a computer lip-reading system? (3) How can a computer lip-reading system be made to work independently of viewing angle? We investigate these issues using a purpose-built audio-visual dataset that contains simultaneous recordings of a speaker reciting continuous speech at five angles. We find that the system performs best on a non-frontal view, perhaps because lip gestures, such as lip protrusion and lip rounding, are more pronounced when viewed from an angle. We also describe a simple linear mapping that allows us to map any view of the face to the view that we find to be optimal. Hence we present a view-independent lip-reading system.
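
The "simple linear mapping" between views can be read as an ordinary least-squares problem on time-aligned feature pairs from the simultaneous multi-angle recordings. A minimal sketch, assuming features of equal dimension at both views; function and variable names are illustrative, not the paper's.

```python
import numpy as np

def fit_view_map(feats_src, feats_dst, lam=1e-3):
    """Least-squares linear map A such that feats_src @ A ≈ feats_dst.

    feats_src: (T, D) lip-reading features at an arbitrary viewing angle
    feats_dst: (T, D) time-aligned features at the optimal view
    """
    A = np.linalg.solve(feats_src.T @ feats_src + lam * np.eye(feats_src.shape[1]),
                        feats_src.T @ feats_dst)
    return A  # shape (D, D)

# At test time, map features from any view before recognition:
# feats_mapped = feats_any_view @ fit_view_map(train_src, train_dst)
```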


International Conference on Acoustics, Speech, and Signal Processing | 2012

Insights into machine lip reading

Yuxuan Lan; Richard W. Harvey; Barry-John Theobald

Computer lip-reading is one of the great signal processing challenges. Not only is the signal noisy, it is also highly variable. However, comparisons of automatic systems with human lip-readers are almost unknown, partly because of the paucity of human lip-readers and partly because most automatic systems handle only data that are trivial and therefore not representative of human speech. Here we generate a multi-view dataset of connected words that can be analysed both by an automatic system, based on linear-predictor trackers and active appearance models, and by human lip-readers. The automatic system we devise has a viseme accuracy of ≈ 46%, which is comparable to poor professional human lip-readers. However, unlike human lip-readers, our system is good at gauging its own fallibility.
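
Under the usual speech-recognition convention, the quoted viseme accuracy is (N - S - D - I) / N computed from a minimum-edit-distance alignment; since the optimal edit distance equals S + D + I, a plain Levenshtein computation suffices. A minimal sketch, not the authors' scoring code:

```python
def viseme_accuracy(ref, hyp):
    """HTK-style accuracy (N - S - D - I) / N from the minimum-edit-
    distance alignment of reference and hypothesis viseme sequences."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return (n - d[n][m]) / n  # edit distance = S + D + I at the optimum

print(viseme_accuracy(list("abcab"), list("abab")))  # 0.8
```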


International Conference on Acoustics, Speech, and Signal Processing | 2016

Improved speaker independent lip reading using speaker adaptive training and deep neural networks

Ibrahim Almajai; Stephen J. Cox; Richard W. Harvey; Yuxuan Lan

Recent improvements in tracking and feature extraction mean that speaker-dependent lip-reading of continuous speech using a medium size vocabulary (around 1000 words) is realistic. However, the recognition of previously unseen speakers has been found to be a very challenging task, because of the large variation in lip-shapes across speakers and the lack of large, tracked databases of visual features, which are very expensive to produce. By adapting a technique that is established in speech recognition but has not previously been used in lip-reading, we show that error-rates for speaker-independent lip-reading can be very significantly reduced. Furthermore, we show that error-rates can be even further reduced by the additional use of Deep Neural Networks (DNN). We also find that there is no need to map phonemes to visemes for context-dependent visual speech transcription.
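
The abstract names the adaptation technique only as speaker adaptive training; in acoustic speech recognition this is typically realised with per-speaker affine feature transforms (CMLLR/fMLLR). The sketch below is a deliberately simplified, diagonal stand-in for such a transform, normalising each speaker's visual features toward a speaker-independent target; it is illustrative, not the paper's method.

```python
import numpy as np

def fit_speaker_transform(feats, target_mean=None, target_std=None):
    """Per-speaker affine normalisation: a diagonal simplification of
    the full-matrix CMLLR-style transforms used in speaker adaptive
    training. feats is a (T, D) array of one speaker's features."""
    mu = feats.mean(axis=0)
    sigma = feats.std(axis=0) + 1e-8
    t_mu = np.zeros_like(mu) if target_mean is None else target_mean
    t_sd = np.ones_like(sigma) if target_std is None else target_std
    A = np.diag(t_sd / sigma)           # per-dimension scaling
    b = t_mu - (t_sd / sigma) * mu      # per-dimension shift
    return A, b

def apply_transform(feats, A, b):
    """Map a speaker's features into the speaker-independent space."""
    return feats @ A.T + b
```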


Optics and Photonics for Counterterrorism, Crime Fighting and Defence IX; and Optical Materials and Biomaterials in Security and Defence Systems Technology X | 2013

Recent developments in automated lip-reading

Richard Bowden; Stephen J. Cox; Richard W. Harvey; Yuxuan Lan; Eng-Jon Ong; Gari Owen; Barry-John Theobald

Human lip-readers are increasingly being presented as useful in the gathering of forensic evidence but, like all humans, suffer from unreliability. Here we report the results of a long-term study in automatic lip-reading with the objective of converting video to text (V2T). The V2T problem is surprising in that some aspects that look tricky, such as real-time tracking of the lips on poor-quality interlaced video from hand-held cameras, prove to be relatively tractable, whereas speaker-independent lip-reading is very demanding due to unpredictable variation between people. Here we review the problem of automatic lip-reading for crime fighting and identify the critical parts of the problem.


International Symposium on Visual Computing | 2014

Which Phoneme-to-Viseme Maps Best Improve Visual-Only Computer Lip-Reading?

Helen L. Bear; Richard W. Harvey; Barry-John Theobald; Yuxuan Lan

A critical assumption of all current visual speech recognition systems is that there are visual speech units, called visemes, which can be mapped to the units of acoustic speech, the phonemes. Although a number of maps have been published, their effectiveness is rarely tested, particularly on visual-only lip-reading (many works use audio-visual speech). Here we examine 120 mappings and consider whether any are stable across talkers. We show a method for devising maps based on phoneme confusions from an automated lip-reading system, and we present new mappings that show improvements for individual talkers.
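
A phoneme-to-viseme map is simply a many-to-one lookup that collapses phoneme labels into viseme classes before training or scoring. The grouping below is a hypothetical illustration in the spirit of the maps evaluated, not one of the paper's 120 mappings:

```python
# Hypothetical phoneme-to-viseme grouping (illustrative only).
P2V = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "t": "V_alveolar", "d": "V_alveolar", "s": "V_alveolar", "z": "V_alveolar",
    "aa": "V_open", "ae": "V_open",
    "uw": "V_rounded", "ow": "V_rounded",
}

def phonemes_to_visemes(phoneme_seq, p2v=P2V, default="V_other"):
    """Collapse a phoneme transcription into viseme classes; the map is
    many-to-one, so several phonemes share one viseme label."""
    return [p2v.get(p, default) for p in phoneme_seq]

print(phonemes_to_visemes(["b", "ae", "t"]))
# ['V_bilabial', 'V_open', 'V_alveolar']
```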


International Conference on Image Processing | 2014

Resolution limits on visual speech recognition

Helen L. Bear; Richard W. Harvey; Barry-John Theobald; Yuxuan Lan

Visual-only speech recognition depends on a number of factors that can be difficult to control, such as lighting, identity, motion, emotion and expression. But some factors, such as video resolution, are controllable, so it is surprising that there is not yet a systematic study of the effect of resolution on lip-reading. Here we use a new data set, the Rosetta Raven data, to train and test recognizers so we can measure the effect of video resolution on recognition accuracy. We conclude that, contrary to common practice, resolution need not be that great for automatic lip-reading. However, it is highly unlikely that automatic lip-reading can work reliably when the distance between the bottom of the lower lip and the top of the upper lip is less than four pixels at rest.
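
The paper's four-pixel rule of thumb is easy to operationalise as a pre-flight check on tracked lip landmarks. A minimal sketch; landmark layouts are tracker-dependent, so this simply takes the vertical extent of the lip points in a resting frame:

```python
import numpy as np

def lip_height_ok(lip_landmarks, min_pixels=4):
    """Check the rule of thumb that automatic lip-reading is unlikely to
    work when the resting distance from the bottom of the lower lip to
    the top of the upper lip is under four pixels.

    lip_landmarks: (K, 2) array of (x, y) lip landmark positions in a
                   frame with the mouth at rest.
    """
    height = lip_landmarks[:, 1].max() - lip_landmarks[:, 1].min()
    return height >= min_pixels, float(height)
```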


Image and Vision Computing | 2010

Finding stable salient contours

Yuxuan Lan; Richard W. Harvey; Jose Roberto Perez Torres

Methods for generating maximally stable extremal regions are generalized to make intensity trees. Such trees may be computed quickly, but they are large, so there is a need to select useful nodes within the tree. Methods for simplifying the tree are developed, and it is shown that standard confidence tests may be applied to regions identified as parent and child nodes in the tree. These tests provide a principled way to edit the tree and hence control its size. One of the algorithms for simplifying trees is able to reduce the tree size by at least 90% while retaining important nodes. Furthermore, the tree can be parsed to identify salient contours, which are presented as generalisations of maximally stable extremal regions.
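
The intensity trees generalise maximally stable extremal regions, for which OpenCV provides an off-the-shelf detector. The snippet below shows only that raw MSER step, not the tree construction or the confidence-test pruning the paper develops; the input filename is hypothetical.

```python
import cv2

# Extract maximally stable extremal regions from a greyscale image.
img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input file
mser = cv2.MSER_create()                 # default stability parameters
regions, bboxes = mser.detectRegions(img)
print(f"{len(regions)} maximally stable extremal regions found")
```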


Optics and Photonics for Counterterrorism, Crime Fighting, and Defence VIII | 2012

Is automated conversion of video to text a reality?

Richard Bowden; Stephen J. Cox; Richard W. Harvey; Yuxuan Lan; Eng-Jon Ong; Gari Owen; Barry-John Theobald

A recent trend in law enforcement has been the use of forensic lip-readers. Criminal activities are often recorded on CCTV or other video-gathering systems, and knowledge of what suspects are saying enriches the evidence gathered, but lip-readers, by their own admission, are fallible. Based on long-term studies of automated lip-reading, we are therefore investigating the possibilities and limitations of applying this technique under realistic conditions. We have adopted a step-by-step approach and are developing a capability for when prior video information is available for the suspect of interest. We use the terminology video-to-text (V2T) for this technique, by analogy with speech-to-text (S2T), which also has applications in security and law enforcement.


International Conference on Scale Space and Variational Methods in Computer Vision | 2007

Salient regions from scale-space trees

Jose Roberto Perez Torres; Yuxuan Lan; Richard W. Harvey

Extracting regions that are noticeably different from their surroundings, so-called salient regions, is a topic of considerable interest for image retrieval. There are many current techniques, but it has been shown that SIFT and MSER regions are among the best. The SIFT methods have their basis in linear scale-space, but less well known is that MSERs are based on a non-linear scale-space. We demonstrate the connection between MSERs and morphological scale-space. Using this connection, MSERs can be enhanced to form a saliency tree, which we evaluate via its effectiveness at a standard image retrieval task. The tree outperforms scale-saliency methods. We also examine the robustness of the tree using another standard task in which patches are compared across image transformations such as illuminant change, perspective transformation and so on. The saliency tree is one of the best-performing methods.
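
The starting point for the saliency tree is the nesting of connected components as a grey-level threshold sweeps the image, which is also the non-linear (morphological) scale-space view of MSERs. A crude sketch of that first step; the full method organises these components into a tree and scores their stability, which is not shown here:

```python
import numpy as np
from scipy import ndimage

def threshold_component_stack(img, levels=8):
    """Label connected components of the image thresholded at a sweep
    of grey levels. Nesting these components by containment yields the
    kind of tree from which a saliency tree can be built."""
    stack = []
    for t in np.linspace(img.min(), img.max(), levels, endpoint=False):
        labels, n = ndimage.label(img >= t)
        stack.append((float(t), labels, n))
    return stack
```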

Collaboration


Dive into Yuxuan Lan's collaborations.

Top Co-Authors

Stephen J. Cox, University of East Anglia
Helen L. Bear, University of East Anglia
Gari Owen, United Kingdom Ministry of Defence
Barry Theobald, University of East Anglia
Gavin C. Cawley, University of East Anglia