
Publication


Featured research published by Phil D. Green.


Speech Communication | 2001

Robust automatic speech recognition with missing and unreliable acoustic data

Martin Cooke; Phil D. Green; Ljubomir Josifovski; Ascension Vizinho

Human speech perception is robust in the face of a wide variety of distortions, both experimentally applied and naturally occurring. In these conditions, state-of-the-art automatic speech recognition (ASR) technology fails. This paper describes an approach to robust ASR which acknowledges the fact that some spectro-temporal regions will be dominated by noise. For the purposes of recognition, these regions are treated as missing or unreliable. The primary advantage of this viewpoint is that it makes minimal assumptions about any noise background. Instead, reliable regions are identified, and subsequent decoding is based on this evidence. We introduce two approaches for dealing with unreliable evidence. The first, marginalisation, computes output probabilities on the basis of the reliable evidence only. The second, state-based data imputation, estimates values for the unreliable regions by conditioning on the reliable parts and the recognition hypothesis. A further source of information is the bounds on the energy of any constituent acoustic source in an additive mixture. This additional knowledge can be incorporated into the missing data framework. These approaches are applied to continuous-density hidden Markov model (HMM)-based speech recognisers and evaluated on the TIDigits corpus for several noise conditions. Two criteria which use simple noise estimates are employed as a means of identifying reliable regions. The first treats regions which are negative after spectral subtraction as unreliable. The second uses the estimated noise spectrum to derive local signal-to-noise ratios, which are then thresholded to identify reliable data points. Both marginalisation and state-based data imputation produce a substantial performance advantage over spectral subtraction alone. The use of energy bounds leads to a further increase in performance for both approaches. While marginalisation outperforms data imputation, the latter allows the approach to act as a preprocessor for conventional recognisers, or in speech-enhancement applications.
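
A minimal sketch of the mask-and-marginalise idea described in this abstract, not the authors' code: reliable spectro-temporal points are found by thresholding a local SNR estimate, and unreliable components of a diagonal-covariance Gaussian are marginalised between zero and the observed value (the energy-bounds idea). Feature domain, array shapes and the threshold are assumptions.

    import numpy as np
    from scipy.stats import norm

    def reliability_mask(noisy_spec, noise_spec, snr_threshold_db=0.0):
        """Mark spectro-temporal points whose estimated local SNR exceeds a threshold."""
        # Local SNR from a stationary noise spectrum estimate (frames x channels).
        snr_db = 10.0 * np.log10(np.maximum(noisy_spec - noise_spec, 1e-10) / noise_spec)
        return snr_db > snr_threshold_db          # True = reliable

    def bounded_marginal_loglik(x, mask, mean, var):
        """Log-likelihood of one frame under a diagonal Gaussian (spectral-energy features assumed).

        Reliable components use the usual density; unreliable components are
        marginalised over [0, x], treating the observation as an upper bound on
        the clean-speech energy in that channel.
        """
        std = np.sqrt(var)
        reliable = norm.logpdf(x[mask], mean[mask], std[mask]).sum()
        lo = norm.cdf(0.0, mean[~mask], std[~mask])       # lower bound: zero energy
        hi = norm.cdf(x[~mask], mean[~mask], std[~mask])  # upper bound: observed value
        unreliable = np.log(np.maximum(hi - lo, 1e-10)).sum()
        return reliable + unreliable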


Computer Speech & Language | 2013

The PASCAL CHiME speech separation and recognition challenge

Jon Barker; Emmanuel Vincent; Ning Ma; Heidi Christensen; Phil D. Green

Distant microphone speech recognition systems that operate with human-like robustness remain a distant goal. The key difficulty is that operating in everyday listening conditions entails processing a speech signal that is reverberantly mixed into a noise background composed of multiple competing sound sources. This paper describes a recent speech recognition evaluation that was designed to bring together researchers from multiple communities in order to foster novel approaches to this problem. The task was to identify keywords from sentences reverberantly mixed into audio backgrounds binaurally recorded in a busy domestic environment. The challenge was designed to model the essential difficulties of the multisource environment problem while remaining on a scale that would make it accessible to a wide audience. Compared to previous ASR evaluations, a particular novelty of the task is that the utterances to be recognised were provided within a continuous audio background rather than as pre-segmented utterances, thus allowing a range of background modelling techniques to be employed. The challenge attracted thirteen submissions. This paper describes the challenge problem, gives an overview of the systems that were entered, and compares them with both a baseline recognition system and human performance. The paper discusses insights gained from the challenge and lessons learnt for the design of future such evaluations.


International Conference on Acoustics, Speech, and Signal Processing | 1997

Missing data techniques for robust speech recognition

Martin Cooke; Andrew C. Morris; Phil D. Green

In noisy listening conditions, the information available on which to base speech recognition decisions is necessarily incomplete: some spectro-temporal regions are dominated by other sources. We report on the application of a variety of techniques for missing data in speech recognition. These techniques may be based on marginal distributions or on reconstruction of missing parts of the spectrum. Application of these ideas to the Resource Management (RM) task shows performance which is robust to random removal of up to 80% of the frequency channels, but which falls off rapidly with deletions that more realistically simulate masked speech. We report on a vowel classification experiment designed to isolate some of the problems observed on the RM task for more detailed exploration. The results of this experiment confirm the general superiority of marginals-based schemes, demonstrate the viability of shared covariance statistics, and suggest several ways in which performance improvements on the larger task may be obtained.
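
A minimal sketch, under assumptions, of the reconstruction alternative mentioned in the abstract: missing spectral channels are filled with their conditional expectation under a single shared Gaussian model of the clean spectrum. It only illustrates the idea, not the paper's exact scheme.

    import numpy as np

    def impute_missing(frame, mask, mean, cov):
        """Fill unreliable channels with their conditional expectation.

        frame : observed spectral vector for one frame
        mask  : boolean, True where the channel is reliable (present)
        mean, cov : mean vector and shared covariance of the clean-speech model
        """
        m, p = ~mask, mask
        # E[x_missing | x_present] for a jointly Gaussian vector:
        #   mu_m + C_mp C_pp^{-1} (x_p - mu_p)
        cond = mean[m] + cov[np.ix_(m, p)] @ np.linalg.solve(cov[np.ix_(p, p)],
                                                             frame[p] - mean[p])
        out = frame.copy()
        out[m] = cond
        return out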


Clinical Linguistics & Phonetics | 2006

Automatic Speech Recognition and Training for Severely Dysarthric Users of Assistive Technology: The STARDUST Project.

Mark Parker; Stuart P. Cunningham; Pam Enderby; Mark Hawley; Phil D. Green

The STARDUST project developed robust computer speech recognizers for use by eight people with severe dysarthria and concomitant physical disability to access assistive technologies. Speaker-independent computer speech recognizers trained with normal speech are of limited functional use to those with severe dysarthria, due to limited and inconsistent proximity to "normal" articulatory patterns. Severe dysarthric output may also be characterized by a small set of distinguishable phonetic tokens, making the acoustic differentiation of target words difficult. Speaker-dependent computer speech recognition using hidden Markov models was achieved by the identification of robust phonetic elements within the individual speaker output patterns. A new system of speech training using computer-generated visual and auditory feedback reduced the inconsistent production of key phonetic tokens over time.
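
A minimal sketch of a speaker-dependent, small-vocabulary recogniser of the general kind the abstract describes: one whole-word Gaussian HMM per command, trained on a few recordings from a single user. This is not the STARDUST system; the library choices (librosa, hmmlearn) and all parameters are assumptions for illustration.

    import numpy as np
    import librosa
    from hmmlearn.hmm import GaussianHMM

    def mfcc_frames(wav_path, sr=16000):
        y, _ = librosa.load(wav_path, sr=sr)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # frames x coefficients

    def train_word_models(examples):
        """examples: dict mapping command word -> list of wav paths from one speaker."""
        models = {}
        for word, paths in examples.items():
            feats = [mfcc_frames(p) for p in paths]
            X = np.vstack(feats)
            lengths = [len(f) for f in feats]
            models[word] = GaussianHMM(n_components=5, covariance_type="diag",
                                       n_iter=25).fit(X, lengths)
        return models

    def recognise(models, wav_path):
        """Return the command whose model assigns the utterance the highest likelihood."""
        feats = mfcc_frames(wav_path)
        return max(models, key=lambda w: models[w].score(feats))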


Endeavour | 1993

Computational auditory scene analysis: listening to several things at once

Martin Cooke; Guy J. Brown; Malcolm Crawford; Phil D. Green

The problem of distinguishing particular sounds, such as conversation, against a background of irrelevant noise is a matter of common experience. Psychologists have studied it for some 40 years, but it is only comparatively recently that computer modelling of the phenomenon has been attempted. This article reviews progress made, possible practical applications, and prospects for the future.


International Conference on Acoustics, Speech, and Signal Processing | 2004

Speech enhancement with missing data techniques using recurrent neural networks

Shahla Parveen; Phil D. Green

This paper presents an application of missing data techniques in speech enhancement. The enhancement system consists of two stages: the first stage uses a recurrent neural network (RNN), which is supplied with noisy speech and produces enhanced speech; the second stage uses missing data techniques to further improve the quality of the enhanced speech. The results suggest that combining missing data techniques with RNN enhancement is an effective enhancement scheme, yielding a 16 dB background noise reduction for all input signal-to-noise ratio (SNR) conditions from -5 to 20 dB, improved spectral quality, and robust automatic speech recognition performance.
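
A minimal sketch of the first (RNN) stage only, under assumptions: a recurrent network trained to map noisy log-spectral frames to their clean counterparts with a mean-squared-error objective. The paper's architecture, features and training setup may differ; all names and sizes here are illustrative.

    import torch
    import torch.nn as nn

    class EnhancementRNN(nn.Module):
        def __init__(self, n_channels=64, hidden=128):
            super().__init__()
            self.rnn = nn.GRU(n_channels, hidden, batch_first=True)
            self.out = nn.Linear(hidden, n_channels)

        def forward(self, noisy):                 # noisy: (batch, frames, channels)
            h, _ = self.rnn(noisy)
            return self.out(h)                    # enhanced log-spectra

    # Training-loop sketch: minimise MSE between enhanced and clean log-spectra.
    model = EnhancementRNN()
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    noisy = torch.randn(8, 100, 64)               # stand-in for noisy training batches
    clean = torch.randn(8, 100, 64)               # stand-in for the clean targets
    for _ in range(10):
        optimiser.zero_grad()
        loss = loss_fn(model(noisy), clean)
        loss.backward()
        optimiser.step()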


IEEE Transactions on Neural Systems and Rehabilitation Engineering | 2013

A Voice-Input Voice-Output Communication Aid for People With Severe Speech Impairment

Mark Hawley; Stuart P. Cunningham; Phil D. Green; Pam Enderby; Rebecca Palmer; Siddharth Sehgal; Peter O'Neill

A new form of augmentative and alternative communication (AAC) device for people with severe speech impairment, the voice-input voice-output communication aid (VIVOCA), is described. The VIVOCA recognizes the disordered speech of the user and builds messages, which are converted into synthetic speech. System development was carried out employing user-centered design and development methods, which identified and refined key requirements for the device. A novel methodology for building small-vocabulary, speaker-dependent automatic speech recognizers with reduced amounts of training data was applied. Experiments showed that this method is successful in generating good recognition performance (mean accuracy 96%) on highly disordered speech, even when recognition perplexity is increased. The selected message-building technique traded off various factors, including speed of message construction and range of available message outputs. The VIVOCA was evaluated in a field trial by individuals with moderate to severe dysarthria, which confirmed that they can make use of the device to produce intelligible speech output from disordered speech input. The trial highlighted some issues which limit the performance and usability of the device when applied in real usage situations, with mean recognition accuracy of 67% in these circumstances. These limitations will be addressed in future work.


Speech Communication | 2013

Small-vocabulary speech recognition using a silent speech interface based on magnetic sensing

Robin Hofe; Stephen R. Ell; Michael J. Fagan; James M. Gilbert; Phil D. Green; Roger K. Moore; S. I. Rybchenko

This paper reports on word recognition experiments using a silent speech interface based on magnetic sensing of articulator movements. A magnetic field was generated by permanent magnet pellets fixed to relevant speech articulators. Magnetic field sensors mounted on a wearable frame measured the fluctuations of the magnetic field during speech articulation. These sensor data were used in place of conventional acoustic features for the training of hidden Markov models. Both small vocabulary isolated word recognition and connected digit recognition experiments are presented. Their results demonstrate the ability of the system to capture phonetic detail at a level that is surprising for a device without any direct access to voicing information.


Augmentative and Alternative Communication | 2011

Reconstructing the voice of an individual following laryngectomy.

Zahoor Ahmad Khan; Phil D. Green; Sarah Creer; Stuart P. Cunningham

This case study describes the generation of a synthetic voice resembling that of an individual before she underwent a laryngectomy. Recordings of this person speaking prior to the operation (6–7 min) were used to create the voice. Synthesis was based on statistical speech models; this method allows models pre-trained on many speakers to be adapted to resemble an individual voice. The results of a listening test in which participants were asked to judge the similarity of the synthetic voice to the pre-operation (target) voice are reported. Members of the patient's family were asked to make a similar judgment. These experiments show that, for most listeners, the voice is quite convincing despite the low quality and small quantity of adaptation data.
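
The abstract describes adapting statistical speech models trained on many speakers towards a target voice from a few minutes of data. A minimal sketch of one simple form of that idea, under assumptions: a single shared linear (MLLR-style) transform of the Gaussian means, estimated by least squares from adaptation frames aligned to average-voice states. The actual system will differ; this only illustrates the mechanism.

    import numpy as np

    def estimate_mean_transform(adapt_frames, state_means):
        """Estimate W such that W @ [mu; 1] approximates the adaptation frames.

        adapt_frames : (N, D) target-speaker feature vectors
        state_means  : (N, D) average-voice state means aligned to each frame
        """
        ext = np.hstack([state_means, np.ones((len(state_means), 1))])  # (N, D+1)
        W, *_ = np.linalg.lstsq(ext, adapt_frames, rcond=None)          # (D+1, D)
        return W.T                                                      # (D, D+1)

    def adapt_means(all_means, W):
        """Apply the shared transform to every state mean in the average-voice model."""
        ext = np.hstack([all_means, np.ones((len(all_means), 1))])
        return ext @ W.T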


Annual Meeting of the Special Interest Group on Discourse and Dialogue | 2015

Knowledge transfer between speakers for personalised dialogue management

Iñigo Casanueva; Thomas Hain; Heidi Christensen; Ricard Marxer; Phil D. Green

Model-free reinforcement learning has been shown to be a promising data-driven approach for automatic dialogue policy optimization, but a relatively large number of dialogue interactions is needed before the system reaches reasonable performance. Recently, Gaussian process based reinforcement learning methods have been shown to reduce the number of dialogues needed to reach optimal performance, and pre-training the policy with data gathered from different dialogue systems has further reduced this amount. Following this idea, a dialogue system designed for a single speaker can be initialised with data from other speakers, but if the dynamics of the speakers are very different the model will perform poorly. When data gathered from different speakers are available, selecting the data from the most similar speakers might improve performance. We propose a method which automatically selects the data to transfer by defining a similarity measure between speakers, and uses this measure to weight the influence of each speaker's data in the policy model. The methods are tested by simulating users with different severities of dysarthria interacting with a voice-enabled environmental control system.
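
A minimal sketch, under assumptions, of the similarity-weighted transfer idea in the abstract, reduced to its simplest form: pooled (state, action, return) samples from several source speakers contribute to the target speaker's value estimates in proportion to a speaker-similarity weight. The paper uses Gaussian process based reinforcement learning, which this toy estimate does not reproduce; the similarity function below is also a placeholder.

    import numpy as np

    def speaker_similarity(target_stats, source_stats):
        """Toy similarity: inverse distance between per-speaker feature summaries."""
        return 1.0 / (1.0 + np.linalg.norm(target_stats - source_stats))

    def weighted_q_estimate(samples, target_stats):
        """samples: iterable of (speaker_stats, state, action, observed_return)."""
        q, weight = {}, {}
        for spk_stats, state, action, ret in samples:
            w = speaker_similarity(target_stats, spk_stats)
            key = (state, action)
            q[key] = q.get(key, 0.0) + w * ret
            weight[key] = weight.get(key, 0.0) + w
        return {k: q[k] / weight[k] for k in q}   # similarity-weighted returns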

Collaboration

Top co-authors of Phil D. Green:

Martin Cooke, University of the Basque Country
Jon Barker, University of Sheffield
Ning Ma, University of Sheffield