Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Ben Milner is active.

Publication


Featured research published by Ben Milner.


IEEE Transactions on Speech and Audio Processing | 1997

Noise compensation methods for hidden Markov model speech recognition in adverse environments

Saeed Vaseghi; Ben Milner

Several noise compensation schemes for speech recognition in impulsive and nonimpulsive noise are considered. The noise compensation schemes are spectral subtraction, HMM-based Wiener filters, noise-adaptive HMMs, and front-end impulsive noise removal. The use of the cepstral-time matrix as an improved speech feature set is explored, and the noise compensation methods are extended for use with cepstral-time features. Experimental evaluations on a spoken digit database, in the presence of car noise, helicopter noise, and impulsive noise, demonstrate that the noise compensation methods achieve substantial improvements in recognition accuracy across a wide range of signal-to-noise ratios. The results also show that the cepstral-time matrix is more robust than a vector of identical size composed of a combination of cepstral and differential cepstral features.
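Of the compensation schemes listed, spectral subtraction is the most direct to illustrate. A minimal magnitude-domain sketch in Python (the frame values, noise estimate, and flooring constant are illustrative, not taken from the paper):

```python
def spectral_subtract(noisy_mag, noise_mag, floor=0.01):
    """Subtract a noise magnitude estimate from each spectral bin,
    flooring the result at a small fraction of the noisy magnitude
    so bins never go negative (a common guard against musical noise)."""
    return [max(n - d, floor * n) for n, d in zip(noisy_mag, noise_mag)]

# Illustrative single frame: noisy magnitudes and a flat noise estimate
noisy = [1.0, 0.8, 0.5, 0.2]
noise = [0.3, 0.3, 0.3, 0.3]
clean = spectral_subtract(noisy, noise)
```

In the last bin the subtraction would go negative, so the spectral floor takes over.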


ACM Transactions on Speech and Language Processing | 2006

Acoustic environment classification

Ling Ma; Ben Milner; Dan J. Smith

The acoustic environment provides a rich source of information on the types of activity, communication modes, and people involved in many situations. It can be accurately classified using recordings from microphones commonly found in PDAs and other consumer devices. We describe a prototype HMM-based acoustic environment classifier incorporating an adaptive learning mechanism and a hierarchical classification model. Experimental results show that we can accurately classify a wide variety of everyday environments. We also show good results classifying single sounds, although classification accuracy is influenced by the granularity of the classification.
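As a heavily simplified stand-in for the HMM-based classifier (one diagonal Gaussian per environment and no temporal modelling; all statistics invented for illustration), classification reduces to picking the environment model with the highest likelihood:

```python
import math

def gauss_loglik(x, mean, var):
    """Log-likelihood of a feature vector under a diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def classify(x, models):
    """Pick the environment whose model best explains the feature vector."""
    return max(models, key=lambda name: gauss_loglik(x, *models[name]))

# Illustrative 2-D feature models for two environments
models = {
    "office": ([0.0, 0.0], [1.0, 1.0]),
    "street": ([3.0, 3.0], [1.0, 1.0]),
}
label = classify([2.8, 3.2], models)
```

An HMM classifier extends this by scoring sequences of vectors through state-dependent densities rather than single frames.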


International Conference on Acoustics, Speech, and Signal Processing | 2000

Robust speech recognition over IP networks

Ben Milner; Shahram Semnani

This work looks at the issues involved in performing robust speech recognition over a packet-based network such as the IP network. This involves combining robust speech recognition with a reliable method of sending speech data over the IP network. The format in which the speech is sent over the network is considered, and results show that much better robustness is achieved when the front-end features are transmitted directly rather than encoding the speech with a codec. The problem of packet loss is addressed, and a novel detection and estimation scheme for missing frames is introduced to overcome it. At 50% packet loss this scheme is shown to recover performance from 33% to 90%, only 3% below the no-loss case.
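The missing-frame estimation idea can be sketched with simple linear interpolation between the nearest received frames (the paper's actual detection and estimation scheme is more sophisticated; frame values here are illustrative):

```python
def interpolate_lost(frames):
    """frames: list of feature vectors, with None marking lost packets.
    Fill each run of losses by linearly interpolating between the
    nearest received frames on either side (edge frames assumed received)."""
    out = list(frames)
    i = 0
    while i < len(out):
        if out[i] is None:
            j = i
            while j < len(out) and out[j] is None:
                j += 1                      # find end of the loss burst
            left, right = out[i - 1], out[j]
            span = j - i + 1
            for k in range(i, j):
                w = (k - i + 1) / span      # interpolation weight
                out[k] = [(1 - w) * l + w * r for l, r in zip(left, right)]
            i = j
        else:
            i += 1
    return out

stream = [[0.0], [1.0], None, None, [4.0]]
filled = interpolate_lost(stream)
```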


International Conference on Acoustics, Speech, and Signal Processing | 2002

A comparison of front-end configurations for robust speech recognition

Ben Milner

This paper presents a comparative analysis of the processing stages involved in feature extraction for speech recognition. Feature extraction is considered as comprising three different processing stages; namely static feature extraction, normalisation and inclusion of temporal information. In each stage a comparison of techniques is made, both theoretically and in terms of their comparative performance. The analysis shows that while some techniques may appear significantly different, upon analysis the effect they have on the signal can be similar. Comparative studies include MFCC and PLP analysis, RASTA filtering and cepstral mean normalisation, and temporal derivatives and cepstral-time matrices. Experimental results, on an unconstrained monophone task, compare recognition performance using different front-end configurations.
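Of the normalisation stages compared, cepstral mean normalisation is simple enough to sketch directly (illustrative two-frame, two-coefficient input):

```python
def cepstral_mean_normalise(frames):
    """Remove the utterance-level mean from each cepstral coefficient,
    cancelling stationary convolutional (channel) effects."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    return [[c - m for c, m in zip(f, means)] for f in frames]

frames = [[1.0, 2.0], [3.0, 4.0]]
norm = cepstral_mean_normalise(frames)
```

After normalisation each coefficient has zero mean over the utterance, which is why CMN and RASTA filtering can have similar effects despite looking quite different.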


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Prediction of Fundamental Frequency and Voicing From Mel-Frequency Cepstral Coefficients for Unconstrained Speech Reconstruction

Ben Milner; Xu Shao

This work proposes a method for predicting the fundamental frequency and voicing of a frame of speech from its mel-frequency cepstral coefficient (MFCC) vector representation. This information is subsequently used to enable a speech signal to be reconstructed solely from a stream of MFCC vectors and has particular application in distributed speech recognition systems. Prediction is achieved by modeling the joint density of fundamental frequency and MFCCs. This is first modeled using a Gaussian mixture model (GMM) and then extended by using a set of hidden Markov models to link together a series of state-dependent GMMs. Prediction accuracy is measured on unconstrained speech input for both a speaker-dependent system and a speaker-independent system. A fundamental frequency prediction error of 3.06% is obtained on the speaker-dependent system in comparison to 8.27% on the speaker-independent system. On the speaker-dependent system 5.22% of frames have voicing errors compared to 8.82% on the speaker-independent system. Spectrogram analysis of reconstructed speech shows that highly intelligible speech is produced, with the quality of the speaker-dependent speech being slightly higher owing to the more accurate fundamental frequency and voicing predictions.
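For a single Gaussian component, prediction from a joint density reduces to the standard conditional mean of jointly Gaussian variables. A one-dimensional sketch (a scalar feature x predicting a scalar fundamental frequency y; all statistics invented for illustration):

```python
def conditional_mean(x, mx, my, var_x, cov_xy):
    """E[y | x] for jointly Gaussian (x, y):
    my + (cov_xy / var_x) * (x - mx)."""
    return my + cov_xy / var_x * (x - mx)

# Illustrative joint statistics of a feature x and fundamental frequency y
f0_hat = conditional_mean(x=2.0, mx=1.0, my=120.0, var_x=4.0, cov_xy=8.0)
```

A GMM-based predictor forms a weighted sum of such per-component conditional means, with weights given by each component's posterior responsibility for the observed vector.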


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Visually Derived Wiener Filters for Speech Enhancement

Ibrahim Almajai; Ben Milner

The aim of this work is to examine whether visual speech information can be used to enhance audio speech that has been contaminated by noise. First, an analysis of audio and visual speech features is made, which identifies the pair with highest audio-visual correlation. The study also reveals that higher audio-visual correlation exists within individual phoneme sounds rather than globally across all speech. This correlation is exploited in the proposal of a visually derived Wiener filter that obtains clean speech and noise power spectrum statistics from visual speech features. Clean speech statistics are estimated from visual features using a maximum a posteriori framework that is integrated within the states of a network of hidden Markov models to provide phoneme localization. Noise statistics are obtained through a novel audio-visual voice activity detector which utilizes visual speech features to make robust speech/nonspeech classifications. The effectiveness of the visually derived Wiener filter is evaluated subjectively and objectively and is compared with three different audio-only enhancement methods over a range of signal-to-noise ratios.
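The per-bin Wiener gain at the heart of this approach is S(f)/(S(f)+N(f)); in the sketch below the clean-speech and noise power spectra are fixed illustrative values rather than the visually derived estimates the paper describes:

```python
def wiener_gain(speech_psd, noise_psd):
    """Per-bin Wiener gain: S(f) / (S(f) + N(f))."""
    return [s / (s + n) for s, n in zip(speech_psd, noise_psd)]

def apply_filter(noisy_spectrum, gains):
    """Scale each noisy spectral bin by its Wiener gain."""
    return [g * x for g, x in zip(gains, noisy_spectrum)]

speech_psd = [4.0, 1.0]           # illustrative clean-speech power per bin
noise_psd = [1.0, 1.0]            # illustrative noise power per bin
gains = wiener_gain(speech_psd, noise_psd)
enhanced = apply_filter([2.0, 2.0], gains)
```

Bins where the estimated speech power dominates are passed nearly unchanged; bins dominated by noise are attenuated.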


International Conference on Acoustics, Speech, and Signal Processing | 2001

Robust speech recognition in burst-like packet loss

Ben Milner

This paper examines problems associated with performing speech recognition over mobile and IP networks. The main problems are identified as codec-based distortion and speech vectors being lost through packet loss in the network. A realistic model for packet loss is developed, based on a three-state Markov model, and is shown to be capable of simulating the burst-like nature of packet loss. A two-stage packet loss detection and estimation scheme is proposed and is shown to improve recognition performance in the event of feature vectors being lost. Results on the Aurora database show that burst-like packet loss reduces digit accuracy from 99% to 57% at 50% packet loss; estimation of the lost packets recovers performance to 77%.
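The burst-like loss process captured by the three-state Markov model can be sketched as a simple chain simulator (the states and transition probabilities below are invented for illustration, not the paper's fitted values):

```python
import random

def simulate_loss(trans, start, n, lost_states, seed=0):
    """Walk an n-step Markov chain over channel states and mark a
    packet as lost whenever the chain is in a loss state.
    trans[s] maps state s to {next_state: probability}."""
    rng = random.Random(seed)
    state, pattern = start, []
    for _ in range(n):
        pattern.append(state in lost_states)
        r, acc = rng.random(), 0.0
        for nxt, p in trans[state].items():
            acc += p
            if r < acc:
                state = nxt
                break
    return pattern

# Illustrative 3-state channel: good, short-burst loss, long-burst loss
trans = {
    "good":  {"good": 0.9, "short": 0.08, "long": 0.02},
    "short": {"good": 0.5, "short": 0.5},
    "long":  {"long": 0.9, "good": 0.1},
}
pattern = simulate_loss(trans, "good", 1000, {"short", "long"})
loss_rate = sum(pattern) / len(pattern)
```

Because the loss states are sticky, losses arrive in runs rather than independently, which is the burst behaviour a single-parameter Bernoulli model cannot reproduce.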


Database and Expert Systems Applications | 2003

Environmental Noise Classification for Context-Aware Applications

Ling Ma; Dan J. Smith; Ben Milner

Context-awareness is essential to the development of adaptive information systems. Much work has been done on developing technologies and systems that are aware of absolute location in space and time; other aspects of context have been relatively neglected. We describe our approach to automatically sensing and recognising environmental noise as a contextual cue for context-aware applications. Environmental noise provides much valuable information about a user's current context. This paper describes an approach to classifying the noise context of typical everyday environments, such as the office, car, and city street, and presents our hidden Markov model based noise classifier. We describe the architecture of our system and our experimental results, and discuss the open issues in environmental noise classification for mobile computing.


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Robust speech recognition over mobile and IP networks in burst-like packet loss

Ben Milner; Alastair Bruce James

This paper addresses the problem of achieving robust distributed speech recognition in the presence of burst-like packet loss. To compensate for packet loss a number of techniques are investigated to provide estimates of lost vectors. Experimental results on both a connected digits task and a large vocabulary continuous speech recognition task show that simple methods, such as repetition, are not as effective as interpolation methods, which are better able to preserve the dynamics of the feature vector stream. Best performance is given by maximum a posteriori (MAP) estimation of lost vectors, which utilizes statistics of the feature vector stream. At longer burst lengths the performance of these compensation techniques deteriorates as the temporal correlation in the received feature vector stream reduces. To compensate for this, interleaving is proposed, which aims to disperse bursts of loss into a series of unconnected smaller bursts. Results show substantial gains in accuracy, to almost that of the no-loss condition, when interleaving is combined with estimation techniques, although this is at the expense of introducing delay. This leads to the proposal that, for a distributed speech recognition application, it is more beneficial to trade delay for accuracy rather than trading bit-rate for accuracy as in forward error correction schemes.


International Conference on Acoustics, Speech, and Signal Processing | 2004

An analysis of interleavers for robust speech recognition in burst-like packet loss

Alastair Bruce James; Ben Milner

An analysis into the effect of packet loss shows that a speech recogniser is able to tolerate large percentages of packet loss provided that burst lengths are relatively small. This leads to the analysis of three types of interleaver for distributing long bursts of packet loss into a series of shorter bursts. Cubic interpolation is then used to estimate lost feature vectors. Experimental results are presented for a range of channel conditions and demonstrate that interleaving offers significant increases in recognition accuracy under burst-like packet loss. Of the interleavers tested, decorrelated interleaving gives superior recognition performance and has the lowest delay. For example at a packet loss rate of 50% and average burst length 20 packets (40 vectors or 400ms) performance is increased from 49.6% with no compensation to 86% with interleaving and cubic interpolation.
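A block interleaver, the simplest member of the family analysed, makes the dispersal mechanism concrete: vectors are written into a block by rows and transmitted by columns, so a contiguous burst of loss on the channel maps back to isolated losses in the deinterleaved feature stream.

```python
def block_interleave(seq, rows, cols):
    """Write seq row-by-row into a rows x cols block, read it column-by-column."""
    assert len(seq) == rows * cols
    return [seq[r * cols + c] for c in range(cols) for r in range(rows)]

def block_deinterleave(seq, rows, cols):
    """Inverse operation: interleaving with transposed dimensions."""
    return block_interleave(seq, cols, rows)

seq = list(range(6))               # e.g. feature-vector indices 0..5
tx = block_interleave(seq, 2, 3)   # order on the channel: [0, 3, 1, 4, 2, 5]
rx = block_deinterleave(tx, 2, 3)  # receiver recovers the original order
```

A burst wiping out two consecutive transmitted positions (say 3 and 1) corresponds to two non-adjacent vectors after deinterleaving, which the interpolation stage can bridge. The decorrelated interleaver that performed best in the paper is not shown; the block version conveys the burst-dispersal principle with minimal machinery.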

Collaboration


Dive into Ben Milner's collaborations.

Top Co-Authors

Saeed Vaseghi (Brunel University London)
Jonathan Darch (University of East Anglia)
Xu Shao (University of East Anglia)
Ibrahim Almajai (University of East Anglia)
Dan J. Smith (University of East Anglia)
Danny Websdale (University of East Anglia)
Philip Harding (University of East Anglia)
Qin Yan (Brunel University London)