Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Andrew C. Morris is active.

Publication


Featured research published by Andrew C. Morris.


Computer Speech & Language | 2005

Recent advances in the multi-stream HMM/ANN hybrid approach to noise robust ASR

Astrid Hagen; Andrew C. Morris

In this article we review several successful extensions to the standard hidden-Markov-model/artificial neural network (HMM/ANN) hybrid, which have recently made important contributions to the field of noise robust automatic speech recognition. The first extension to the standard hybrid was the “multi-band hybrid”, in which a separate ANN is trained on each frequency sub-band, followed by some form of weighted combination of ANN state posterior probability outputs prior to decoding. However, due to the inaccurate assumption of sub-band independence, this system usually gives degraded performance, except in the case of narrow-band noise. All of the systems which we review overcome this independence assumption and give improved performance in noise, while also improving or not significantly degrading performance with clean speech. The “all-combinations multi-band” hybrid trains a separate ANN for each sub-band combination. This, however, typically requires a large number of ANNs. The “all-combinations multi-stream” hybrid trains an ANN expert for every combination of just a small number of complementary data streams. Combination of multiple ANN posteriors using maximum a posteriori (MAP) weighting gives rise to the further successful strategy of hypothesis-level combination by MAP selection. An alternative strategy for exploiting the classification capacity of ANNs is the “tandem hybrid” approach, in which one or more ANN classifiers are trained with multi-condition data to generate discriminative and noise robust features for input to a standard ASR system. The “multi-stream tandem hybrid” trains an ANN for a number of complementary feature streams, permitting multi-stream data fusion. The “narrow-band tandem hybrid” trains an ANN for a number of particularly narrow frequency sub-bands. This gives improved robustness to noises not seen during training. Of the systems presented, all of the multi-stream systems provide generic models for multi-modal data fusion. Test results for each system are presented and discussed.
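
As a minimal illustration of the posterior-combination step common to these hybrids (not the authors' code; the stream count, weights and array shapes are assumptions), per-stream ANN state posteriors can be merged by weighted log-linear combination before decoding:

```python
# Minimal sketch: weighted log-linear combination of per-stream ANN state
# posteriors before decoding. Stream count, weights and shapes are
# illustrative assumptions, not the systems reviewed in the article.
import numpy as np

def combine_posteriors(stream_posteriors, weights):
    """Merge per-stream (n_frames, n_states) posteriors with given weights."""
    log_combined = sum(w * np.log(p + 1e-10)
                       for w, p in zip(weights, stream_posteriors))
    combined = np.exp(log_combined)
    return combined / combined.sum(axis=1, keepdims=True)  # renormalise

# Two hypothetical stream experts over 5 frames and 3 HMM states.
rng = np.random.default_rng(0)
streams = [rng.dirichlet(np.ones(3), size=5) for _ in range(2)]
posteriors = combine_posteriors(streams, weights=[0.7, 0.3])
print(posteriors.sum(axis=1))  # each frame's posteriors sum to 1
```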


Proceedings of SPIE, the International Society for Optical Engineering | 2006

Nonintrusive multibiometrics on a mobile device: a comparison of fusion techniques

Lorene Allano; Andrew C. Morris; Harin Sellahewa; Sonia Garcia-Salicetti; Jacques Koreman; Sabah Jassim; Bao Ly-Van; Dalei Wu; Bernadette Dorizzi

In this article we test a number of score fusion methods for the purpose of multimodal biometric authentication. These tests were made for the SecurePhone project, whose aim is to develop a prototype mobile communication system enabling biometrically authenticated users to conclude legally binding m-contracts during a mobile phone call on a PDA. The three biometrics of voice, face and signature were selected because they are all traditional, non-intrusive and easy-to-use means of authentication which can readily be captured on a PDA. By combining multiple biometrics of relatively low security it may be possible to obtain a combined level of security which is at least as high as that provided by a PIN or handwritten signature, traditionally used for user authentication. As the relative success of different fusion methods depends on the database used and tests made, the database we used was recorded on a suitable PDA (the Qtek2020) and the test protocol was designed to reflect the intended application scenario, which is expected to use short text prompts. Not all of the fusion methods tested are original. They were selected for their suitability for implementation within the constraints imposed by the application. All of the methods tested are based on fusion of the match scores output by each modality. Though computationally simple, the methods tested have shown very promising results. All four fusion methods tested obtain a significant performance increase.
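
A minimal sketch of score-level fusion of this kind, assuming min-max normalised per-modality scores combined by a weighted sum; the normalisation bounds, weights and decision threshold below are illustrative assumptions, not the SecurePhone implementation:

```python
# Minimal sketch of score-level fusion of voice, face and signature match
# scores. Normalisation bounds, weights and the decision threshold are
# illustrative assumptions, not the SecurePhone project's values.
import numpy as np

def minmax_normalise(scores, lo, hi):
    """Map raw match scores into [0, 1] using bounds from development data."""
    return np.clip((scores - lo) / (hi - lo), 0.0, 1.0)

def fuse_scores(voice, face, signature, weights=(0.4, 0.4, 0.2)):
    """Weighted-sum fusion of normalised per-modality scores."""
    return np.stack([voice, face, signature], axis=-1) @ np.asarray(weights)

# Hypothetical raw scores for three access attempts, one array per modality.
voice = minmax_normalise(np.array([2.1, -0.5, 0.9]), lo=-1.0, hi=3.0)
face = minmax_normalise(np.array([0.8, 0.3, 0.5]), lo=0.0, hi=1.0)
sig = minmax_normalise(np.array([0.7, 0.1, 0.9]), lo=0.0, hi=1.0)
fused = fuse_scores(voice, face, sig)
print(fused, fused >= 0.5)  # accept/reject at an assumed threshold of 0.5
```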


Non-Linear Speech Processing | 2005

MLP internal representation as discriminative features for improved speaker recognition

Dalei Wu; Andrew C. Morris; Jacques Koreman

Feature projection by non-linear discriminant analysis (NLDA) can substantially increase classification performance. In automatic speech recognition (ASR) the projection provided by the pre-squashed outputs from a one-hidden-layer multi-layer perceptron (MLP) trained to recognise speech sub-units (phonemes) has previously been shown to significantly increase ASR performance. An analogous approach cannot be applied directly to speaker recognition because there is no recognised set of speaker sub-units to provide a finite set of MLP target classes, and for many applications it is not practical to train an MLP with one output for each target speaker. In this paper we show that the output from the second hidden layer (compression layer) of an MLP with three hidden layers, trained to identify a subset of 100 speakers selected at random from a set of 300 training speakers in TIMIT, can provide a 77% relative error reduction for common Gaussian mixture model (GMM) based speaker identification.
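
As a rough illustration of the idea (layer sizes, data and training are placeholder assumptions, not the paper's setup), the compression-layer activations of a speaker-classification MLP can serve as projected features for GMM speaker models:

```python
# Rough sketch: use the second hidden (compression) layer of a
# 3-hidden-layer speaker-classification MLP as a feature projection for
# GMM speaker modelling. All sizes and data here are placeholders.
import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture

class SpeakerMLP(nn.Module):
    def __init__(self, n_in=39, h1=256, h2=26, h3=256, n_speakers=100):
        super().__init__()
        # front ends at the compression layer whose output we reuse
        self.front = nn.Sequential(nn.Linear(n_in, h1), nn.Sigmoid(),
                                   nn.Linear(h1, h2), nn.Sigmoid())
        self.back = nn.Sequential(nn.Linear(h2, h3), nn.Sigmoid(),
                                  nn.Linear(h3, n_speakers))

    def forward(self, x):
        return self.back(self.front(x))

    def project(self, x):
        """Compression-layer activations used as discriminative features."""
        with torch.no_grad():
            return self.front(x)

mlp = SpeakerMLP()  # assume trained to classify a 100-speaker subset
frames = torch.randn(500, 39)          # stand-in acoustic feature frames
feats = mlp.project(frames).numpy()    # projected features
gmm = GaussianMixture(n_components=8).fit(feats)  # one target speaker's GMM
print(gmm.score(feats))                # average log-likelihood
```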


Mobile Multimedia/Image Processing for Military and Security Applications | 2006

Multimodal person authentication on a smartphone under realistic conditions

Andrew C. Morris; Sabah Jassim; Harin Sellahewa; Lorene Allano; Johan Ehlers; Dalei Wu; Jacques Koreman; Sonia Garcia-Salicetti; Bao Ly-Van; Bernadette Dorizzi

Verification of a person's identity by the combination of more than one biometric trait strongly increases the robustness of person authentication in real applications. This is particularly the case in applications involving signals of degraded quality, as for person authentication on mobile platforms. The context of mobility generates degradations of input signals due to the variety of environments encountered (ambient noise, lighting variations, etc.), while the sensors' lower quality further contributes to a decrease in system performance. Our aim in this work is to combine traits from the three biometric modalities of speech, face and handwritten signature in a concrete application, performing non-intrusive biometric verification on a personal mobile device (smartphone/PDA). Most available biometric databases have been acquired in more or less controlled environments, which makes it difficult to predict performance in a real application. Our experiments are performed on a database acquired on a PDA as part of the SecurePhone project (IST-2002-506883 project Secure Contracts Signed by Mobile Phone). This database contains 60 virtual subjects balanced in gender and age. Virtual subjects are obtained by coupling audio-visual signals from real English-speaking subjects with signatures from other subjects captured on the touch screen of the PDA. Video data for the PDA database was recorded in two sessions separated by at least one week. Each session comprises four acquisition conditions: two indoor and two outdoor recordings (with, in each case, a good-quality and a degraded-quality recording). Handwritten signatures were captured in one session in realistic conditions. Different scenarios of matching between training and test conditions are tested to measure the resistance of various fusion systems to different types of variability and different amounts of enrolment data.


Speaker Classification I | 2007

Enhancing Speaker Discrimination at the Feature Level

Jacques C. Koreman; Dalei Wu; Andrew C. Morris

This chapter describes a method for enhancing the differences between speaker classes at the feature level (feature enhancement) in an automatic speaker recognition system. The original Mel-frequency cepstral coefficient (MFCC) space is projected onto a new feature space by a neural network trained on a subset of speakers which is representative of the whole target population. The new feature space discriminates better between the target classes (speakers) than the original feature space. The chapter focuses on the method for selecting a representative subset of speakers, comparing several approaches to speaker selection. The effect of feature enhancement is tested for both clean speech and various types of noisy speech to evaluate its applicability under practical conditions. It is shown that the proposed method leads to a substantial improvement in speaker recognition performance. The method can also be applied to other automatic speaker classification tasks.
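
One plausible instance of the speaker-selection strategies the chapter compares might pick the speaker nearest each cluster centroid of the per-speaker mean vectors; the specific k-means choice below is an illustrative assumption, not necessarily one of the chapter's approaches:

```python
# Illustrative sketch of representative speaker selection via k-means over
# per-speaker mean feature vectors. This particular strategy is an
# assumption for illustration, not necessarily the chapter's method.
import numpy as np
from sklearn.cluster import KMeans

def select_representative_speakers(speaker_means, n_select):
    """Return indices of speakers closest to each k-means centroid.

    speaker_means: (n_speakers, dim) array of per-speaker mean MFCC vectors.
    """
    km = KMeans(n_clusters=n_select, n_init=10, random_state=0).fit(speaker_means)
    chosen = [int(np.argmin(np.linalg.norm(speaker_means - c, axis=1)))
              for c in km.cluster_centers_]
    return sorted(set(chosen))

rng = np.random.default_rng(1)
means = rng.normal(size=(300, 13))  # 300 hypothetical training speakers
print(select_representative_speakers(means, n_select=20))
```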


Speaker Classification I | 2007

Frame Based Features

Stefan Schacht; Jacques Koreman; Christoph Lauer; Andrew C. Morris; Dalei Wu; Dietrich Klakow

In this chapter we discuss feature extraction methods for speaker classification. We introduce linear predictive coding, mel-frequency cepstral coefficients and wavelets, and perform experimental studies on AURORA and TIMIT data. For the speaker identification task, we show that wavelets are beneficial.
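
For illustration, frame-based MFCC extraction of the kind discussed can be sketched with librosa; the library choice and the window, hop and sampling parameters are assumptions, not the chapter's toolchain:

```python
# Illustrative sketch of frame-based MFCC extraction with librosa. The
# chapter's experiments use AURORA/TIMIT data with their own toolchain;
# the library and parameters here are assumptions.
import librosa

def mfcc_features(path, n_mfcc=13):
    """Return (n_frames, n_mfcc) MFCCs with 25 ms windows and a 10 ms hop."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=int(0.025 * sr),
                                hop_length=int(0.010 * sr))
    return mfcc.T  # one row of cepstral coefficients per frame

# feats = mfcc_features("speaker_utterance.wav")  # hypothetical file
```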


Mobile Multimedia/Image Processing for Military and Security Applications | 2006

Comparison of weighting strategies in early and late fusion approaches to audio-visual person authentication

Harin Sellahewa; Naseer Al-Jawad; Andrew C. Morris; Dalei Wu; Jacques Koreman; Sabah Jassim

Person authentication can be strongly enhanced by the combination of different modalities. This is also true for the face and voice signals, which can be obtained with minimal inconvenience for the user. However, features from each modality can be combined at various levels of processing, and for face and voice signals the advantage of fusion depends strongly on the way they are combined. The aim of the work presented is to investigate the optimal strategy for combining voice and face modalities for signals of varying quality. The experimental data are taken from a newly acquired database using a PDA, which contains audio-visual recordings in different conditions. Voice features use mel-frequency cepstral coefficients, while the face signal is parameterised using wavelet coefficients in certain subbands. Results are presented for both early (feature-level) and late (score-level) fusion. At each level different fixed and variable weightings are used, both to weight between frames within each modality and to weight between modalities, where weights are based on some measure of signal reliability, such as the accuracy of automatic face detection or the audio signal-to-noise ratio. In addition, the contribution to authentication of information from different areas of the face is explored to determine a regional weighting for the face coefficients.
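
A minimal sketch of late fusion with a variable inter-modality weight driven by a reliability measure such as the audio signal-to-noise ratio; the linear SNR-to-weight mapping below is an illustrative assumption, not the paper's weighting:

```python
# Minimal sketch of reliability-weighted late (score-level) fusion of
# voice and face scores. The linear SNR-to-weight mapping is an
# illustrative assumption, not the weighting compared in the paper.
import numpy as np

def audio_weight_from_snr(snr_db, lo=0.0, hi=20.0):
    """Map an estimated audio SNR (dB) to a voice weight in [0, 1]."""
    return float(np.clip((snr_db - lo) / (hi - lo), 0.0, 1.0))

def late_fusion(voice_score, face_score, snr_db):
    """Reliability-weighted sum of per-modality match scores."""
    w = audio_weight_from_snr(snr_db)
    return w * voice_score + (1.0 - w) * face_score

print(late_fusion(0.8, 0.6, snr_db=18.0))  # clean audio: trust the voice
print(late_fusion(0.8, 0.6, snr_db=3.0))   # noisy audio: lean on the face
```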


International Conference on Security and Cryptography | 2018

The “SecurePhone”: A Mobile Phone with Biometric Authentication and e-Signature Support for Dealing Secure Transactions on the Fly

R. Ricci; G. Chollet; M. V. Crispino; Sabah Jassim; Jacques Koreman; Andrew C. Morris; M. Olivar-Dimas; P. Soria-Rodríguez


Conference of the International Speech Communication Association | 1999

A comparison of two strategies for ASR in additive noise: Missing Data and Spectral Subtraction

Christopher Kermorvant; Andrew C. Morris


Conference of the International Speech Communication Association | 2002

Low cost duration modelling for noise robust speech recognition

Andrew C. Morris; Simon Payne

Collaboration


Dive into Andrew C. Morris's collaboration.

Top Co-Authors

Sabah Jassim

University of Buckingham


Jacques C. Koreman

Norwegian University of Science and Technology


Johan Ehlers

University of Buckingham
