John S. D. Mason | Researchain

Publication

Featured research published by John S. D. Mason.

multimedia signal processing | 1999

Lip signatures for automatic person recognition

John S. D. Mason; Jason Brand; Roland Auckenthaler; Farzin Deravi; Claude C. Chibelushi

This paper evaluates lip features for person recognition, and compares the performance with that of the acoustic signal. Recognition accuracy is found to be equivalent in the two domains, agreeing with the findings of Chibelushi (1997). The optimum dynamic window length for both acoustic and visual modalities is found to be about 100 ms. Recognition performance of the upper lip is considerably better than the lower lip, achieving 15% and 35% identification error rates respectively, using a single digit test and training token.

international conference on image processing | 2001

Skin probability map and its use in face detection

Jason Brand; John S. D. Mason

This paper is in two parts. The first part quantitatively assesses an approach to skin segmentation. The second part describes the development and quantitative assessment of an approach to face detection (FD), with the application of content-based image retrieval in mind. Skin detection is introduced as a front-end to an earlier approach to FD by Huang (1994). The baseline approach searches grey scale images only, and is found to be susceptible to variations in lighting conditions and complex backgrounds. It is hypothesised that by integrating colour information into Huang's approach, the number of false faces can be reduced. A skin probability map (SPM) is generated from a large quantity of labeled data (530 images containing faces and 714 images that do not) and is used to pre-process colour test images. Image regions are then ranked in terms of their skin content, thus removing improbable face regions. The performance improvements are shown in terms of false acceptance (FA) and false rejection (FR) scores. As a front-end to Huang's approach, the benefits of skin segmentation can be seen by a reduction in the FA score from 79% to 15% with a negligible impact on FR.
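
The idea of a skin probability map can be illustrated with a minimal sketch. It assumes a histogram-based estimate of P(skin | colour) over coarsely quantised RGB values; the abstract does not state the colour space, bin count, or ranking rule, so the names `quantise`, `build_spm`, `skin_content`, and `rank_regions`, and the choice of 8 bins per channel, are all hypothetical.

```python
from collections import Counter


def quantise(pixel, bins=8):
    """Map an (r, g, b) pixel with 0-255 channels to a coarse colour bin."""
    step = 256 // bins
    return tuple(channel // step for channel in pixel)


def build_spm(skin_pixels, non_skin_pixels, bins=8):
    """Estimate P(skin | colour bin) from labelled pixels using histogram counts."""
    skin_counts = Counter(quantise(p, bins) for p in skin_pixels)
    other_counts = Counter(quantise(p, bins) for p in non_skin_pixels)
    spm = {}
    for colour in set(skin_counts) | set(other_counts):
        s = skin_counts[colour]
        n = other_counts[colour]
        spm[colour] = s / (s + n)
    return spm


def skin_content(region, spm, bins=8, unseen=0.0):
    """Mean skin probability over the pixels of a region (assumed ranking score)."""
    probs = [spm.get(quantise(p, bins), unseen) for p in region]
    return sum(probs) / len(probs)


def rank_regions(regions, spm, bins=8):
    """Return region names ordered from most to least skin-like."""
    return sorted(
        regions,
        key=lambda name: skin_content(regions[name], spm, bins),
        reverse=True,
    )
```

In a pipeline of the kind described, low-ranked regions would be discarded before the grey-scale face detector runs, which is one plausible way the false acceptance score could fall without affecting false rejections much.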

international conference on acoustics, speech, and signal processing | 2001

Language dependency in text-independent speaker verification

Roland Auckenthaler; Michael J. Carey; John S. D. Mason

Speech technology built into appliances that are sold around the world cannot restrict its functionality to a single language. However, most of today's text-independent verification systems based on Gaussian mixture models (GMMs) use an adaptive approach for training the speaker model. This assumes that the world model incorporates the same language as that of the target speaker. We investigate language mismatches between the target speaker and the world model in a GMM speaker verification system. Experiments performed with different world model languages showed major degradations, in particular for Mandarin and Vietnamese when the target speakers spoke American English. Experiments with world models trained on data pooled from different languages revealed only minor performance degradations.
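
To make the dependence on the world model concrete, here is a minimal sketch of an adapted-GMM verifier on scalar features. It assumes mean-only MAP adaptation with a relevance factor and a log-likelihood-ratio score, which is a common formulation but is not confirmed by the abstract; the function names and the value `relevance=16.0` are illustrative assumptions.

```python
import math


def component_logs(x, weights, means, variances):
    """Log of weight * Gaussian density for each mixture component (1-D)."""
    return [
        math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for w, m, v in zip(weights, means, variances)
    ]


def gmm_log_likelihood(x, weights, means, variances):
    """Log density of x under the mixture, using log-sum-exp for stability."""
    logs = component_logs(x, weights, means, variances)
    peak = max(logs)
    return peak + math.log(sum(math.exp(t - peak) for t in logs))


def adapt_means(world, frames, relevance=16.0):
    """Derive a speaker model by MAP-adapting only the world-model means."""
    weights, means, variances = world
    counts = [0.0] * len(means)
    sums = [0.0] * len(means)
    for x in frames:
        logs = component_logs(x, weights, means, variances)
        total = gmm_log_likelihood(x, weights, means, variances)
        for k, t in enumerate(logs):
            gamma = math.exp(t - total)  # posterior of component k given x
            counts[k] += gamma
            sums[k] += gamma * x
    new_means = []
    for k, m in enumerate(means):
        if counts[k] == 0.0:
            new_means.append(m)  # no data assigned: keep the world mean
            continue
        alpha = counts[k] / (counts[k] + relevance)
        new_means.append(alpha * sums[k] / counts[k] + (1 - alpha) * m)
    return weights, new_means, variances


def llr_score(frames, speaker, world):
    """Average per-frame log-likelihood ratio of speaker model vs world model."""
    diffs = [
        gmm_log_likelihood(x, *speaker) - gmm_log_likelihood(x, *world)
        for x in frames
    ]
    return sum(diffs) / len(diffs)
```

Because the speaker model starts from the world model and the score is a ratio against it, any mismatch between the world model's training data and the target speaker's language enters both the adaptation and the scoring step.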

international conference on spoken language processing | 1996

On-line incremental adaptation for speaker verification using maximum likelihood estimates of CDHMM parameters

Kin Yu; John S. D. Mason

Investigates two approaches to the online incremental adaptation of continuous-density hidden Markov model (CDHMM) parameters. First, the popular maximum a-posteriori (MAP) approach is examined, highlighting difficulties in automatically setting the adaptation rate. To overcome these problems, we introduce a new approach, based on the multi-observation estimation equations of the forward-backward algorithm, called the cumulative likelihood estimate (CLE). Experimental results using these two approaches are compared with and without the use of a speech model for enrolment on isolated-word speaker models. In both enrolment procedures, the CLE approach can achieve an equal error rate (EER) of approximately 1% for six adaptation sequences using a single-digit test token.

international conference on acoustics, speech, and signal processing | 2000

Speaker-centric score normalisation and time pattern analysis for continuous speaker verification

Roland Auckenthaler; Michael J. Carey; John S. D. Mason

In this paper we introduce the concept of a continuous speaker verification system in a mobile phone environment. The system verifies the speaker during the phone call. We discuss speaker-centric score normalisation and time pattern analysis which extends a speaker verification system to allow continuous verification. Experiments showed that speaker-centric score normalisation improved performance. Moreover, it standardises the target score distribution to allow the accurate prediction of miss probabilities. The performance evaluation of time pattern analysis revealed an impostor locking of 95% while the target speakers remained unlocked.
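
The link between a standardised target score distribution and miss prediction can be sketched as follows. This is an illustration only: it assumes the normalisation shifts and scales each raw score by the mean and standard deviation of the target speaker's own scores, and that normalised target scores are approximately standard normal. The abstract does not give the actual formulation, and both function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def speaker_centric_norm(score, target_scores):
    """Standardise a raw score using the target speaker's own score statistics."""
    mu = mean(target_scores)
    sigma = stdev(target_scores)
    return (score - mu) / sigma


def predicted_miss_probability(threshold):
    """P(normalised target score < threshold) under a standard normal assumption."""
    return NormalDist().cdf(threshold)
```

Under these assumptions one decision threshold has the same meaning for every speaker, so a desired miss rate can be converted into a threshold with `NormalDist().inv_cdf`.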

Lecture Notes in Computer Science | 2001

Camera Motion Extraction Using Correlation for Motion-Based Video Classification

Pierre Martin-Granel; Matthew Roach; John S. D. Mason

This paper considers camera motion extraction with application to automatic video classification. Video motion is subdivided into three components, one of which, camera motion, is considered here. The extraction of the camera motion is based on correlation. Both subjective and objective measures of the performance of the camera motion extraction are presented. This approach is shown to be simple but efficient and effective. This form of motion is separated and extracted as a discriminant for video classification. In a simple classification experiment it is shown that sport and non-sport videos can be classified with an identification rate of 80%. The system is shown to be able to verify the genre of a short sequence (only 12 seconds), for sport and non-sport, with a false acceptance rate of 10% on arbitrarily chosen test sequences.
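
Correlation-based camera motion estimation can be sketched as an exhaustive search for the global translation that best aligns two consecutive frames. The abstract does not describe the paper's actual procedure, so this is only one plausible reading: frames are assumed to be small grey-scale images stored as lists of rows, and `correlation`, `estimate_shift`, and `max_shift` are hypothetical names.

```python
def correlation(prev, curr, dx, dy):
    """Normalised cross-correlation of prev with curr displaced by (dx, dy)."""
    height, width = len(prev), len(prev[0])
    a, b = [], []
    for y in range(height):
        for x in range(width):
            yy, xx = y + dy, x + dx
            if 0 <= yy < height and 0 <= xx < width:
                a.append(prev[y][x])
                b.append(curr[yy][xx])
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    den = (var_a * var_b) ** 0.5
    return num / den if den else 0.0  # flat overlap carries no motion evidence


def estimate_shift(prev, curr, max_shift=2):
    """Return the (dx, dy) within the search window with the highest correlation."""
    candidates = [
        (correlation(prev, curr, dx, dy), dx, dy)
        for dy in range(-max_shift, max_shift + 1)
        for dx in range(-max_shift, max_shift + 1)
    ]
    _, best_dx, best_dy = max(candidates)
    return best_dx, best_dy
```

Applying `estimate_shift` to successive frame pairs gives a sequence of displacement vectors, and statistics of that sequence could then serve as the motion feature passed to a genre classifier.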

conference of the international speech communication association | 2001