Publication


Featured research published by Martin J. Russell.


International Conference on Acoustics, Speech, and Signal Processing | 1985

Explicit modelling of state occupancy in hidden Markov models for automatic speech recognition

Martin J. Russell; Roger K. Moore

Semi-Markov models have been proposed as a mechanism for overcoming some of the limitations inherent in first-order Markov modelling of speech signals. Results have been presented which show that these models provide an appropriate framework for modelling durational structure and can lead to significant improvements in recognition accuracy.
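
In a conventional first-order HMM, state duration is an implicit by-product of the self-transition probability and is therefore geometrically distributed; the semi-Markov approach instead models duration explicitly. A minimal sketch of the contrast, in notation of my own rather than the paper's:

    P_{HMM}(d \mid i) = a_{ii}^{d-1} (1 - a_{ii})    (implicit, geometric)
    P_{HSMM}(d \mid i) = p_i(d)                      (explicit duration distribution)

so that the probability of state i generating a segment o_1, ..., o_d becomes p_i(d) \prod_{t=1}^{d} b_i(o_t), with p_i free to take a more plausible form (e.g. Poisson or gamma) than the geometric distribution.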


International Conference on Acoustics, Speech, and Signal Processing | 1996

Integrating audio and visual information to provide highly robust speech recognition

Michael J. Tomlinson; Martin J. Russell; N. M. Brooke

There is a requirement in many human-machine interactions to provide accurate automatic speech recognition in the presence of high levels of interfering noise. The paper shows that improvements in recognition accuracy can be obtained by including data derived from a speaker's lip images. We describe the combination of the audio and visual data in the construction of composite feature vectors and a hidden Markov model structure which allows for asynchrony between the audio and visual components. These ideas are applied to a speaker-dependent recognition task involving a small vocabulary and subject to interfering noise. The recognition results obtained using composite vectors and cross-product models are compared with those based on an audio-only feature vector. The benefit of this approach is shown to be increased performance over a very wide range of noise levels.
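
As a minimal illustration of the composite-feature idea (array names and dimensions are assumptions of this sketch, not taken from the paper), frame-synchronous audio and visual features can be concatenated per frame; the cross-product model structure then relaxes the assumption that the two streams change state at the same instant:

    import numpy as np

    def composite_vectors(audio: np.ndarray, visual: np.ndarray) -> np.ndarray:
        """Concatenate frame-synchronous audio and visual features.

        audio:  (T, Da) array, e.g. spectral features per frame
        visual: (T, Dv) array, e.g. lip-shape parameters per frame
        Returns a (T, Da + Dv) array of composite observation vectors.
        """
        assert audio.shape[0] == visual.shape[0], "streams must be frame-aligned"
        return np.concatenate([audio, visual], axis=1)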


Computer Speech & Language | 2000

The STAR system

Martin J. Russell; Robert W. Series; Julie Lynne Wallace; Catherine Brown; Adrian Skilling

Between 1990 and 1998, the Speech Research Unit at the Defence Evaluation and Research Agency (DERA) and Hereford and Worcester County Council Education Department, a U.K. local education authority, conducted research into the use of speech recognition technology in an interactive computer-based pronunciation tutor for 5–7 year-old primary school children. The goal of the project was to develop a robust, autonomous system that would enable a child to practice the pronunciation of a given set of words by speaking them to a computer, which provided immediate feedback on whether the pronunciation was acceptable. This paper describes the development of the underlying speech recognition technology, the prototype real-time system which was developed, and the results of pilot trials of the system in a U.K. primary school.


International Conference on Acoustics, Speech, and Signal Processing | 1993

A segmental HMM for speech pattern modelling

Martin J. Russell

A simple segmental hidden Markov model (HMM) which addresses some of the limitations of conventional HMM-based methods is proposed. The important features of this approach are the use of an underlying semi-Markov process, in which state transitions are segment-synchronous rather than frame-synchronous and state duration is modeled explicitly, and a state segment model in which separate statistical processes are used to characterize extra-segmental and intra-segmental variability. A basic mathematical analysis of Gaussian segmental HMMs is presented, and model parameter reestimation equations are derived. The relationship between the new type of model and variable frame rate analysis and conventional Gaussian mixture based HMMs is explained.
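
The extra-/intra-segmental decomposition can be sketched as a two-level Gaussian generative process (the symbols here are mine, not the paper's): a segment-level target is drawn once per segment, and the frames within the segment scatter about that target:

    \mu \sim N(m_i, \Sigma_i^e)                            (extra-segmental variability)
    o_t \mid \mu \sim N(\mu, \Sigma_i^a),  t = 1, ..., d   (intra-segmental variability)

The segment likelihood follows by integrating over \mu, and the duration d is governed by the underlying semi-Markov process rather than by frame-level self-transitions.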


International Conference on Acoustics, Speech, and Signal Processing | 1997

Modelling asynchrony in speech using elementary single-signal decomposition

Michael J. Tomlinson; Martin J. Russell; Roger K. Moore; Andrew P. Buckland; Martin A. Fawley

Although the possibility of asynchrony between different components of the speech spectrum has been acknowledged, its potential effect on automatic speech recogniser performance has only recently been studied. This paper presents the results of continuous speech recognition experiments in which such asynchrony is accommodated using a variant of HMM decomposition. The paper begins with an investigation of the effects of partitioning the speech spectrum explicitly into sub-bands. Asynchrony between these sub-bands is then accommodated, resulting in a significant decrease in word errors. The same decomposition technique has previously been used successfully to compensate for asynchrony between the two input streams in an audiovisual speech recognition system.
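
A hedged sketch of the decomposition idea: recognition runs over a composite state space that pairs a state index from each of two sub-band streams, with asynchrony permitted but bounded (the two-stream restriction, the bound, and all names are illustrative assumptions):

    from itertools import product

    def cross_product_states(n_states: int, max_async: int = 2):
        """Composite state space for two sub-band streams.

        Each composite state pairs a state index from stream A with one
        from stream B; the streams may drift apart by at most `max_async`
        states, so limited asynchrony is representable."""
        return [(a, b) for a, b in product(range(n_states), repeat=2)
                if abs(a - b) <= max_async]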


International Conference on Acoustics, Speech, and Signal Processing | 2001

Text-dependent speaker verification under noisy conditions using parallel model combination

Lit Ping Wong; Martin J. Russell

In real speaker verification applications, additive or convolutive noise creates a mismatch between training and recognition environments, degrading performance. Parallel model combination (PMC) has been used successfully to improve the noise robustness of hidden Markov model (HMM) based speech recognisers. The paper presents the results of applying PMC to compensate for additive noise in HMM-based text-dependent speaker verification. Speech and noise data were obtained from the YOHO and NOISEX-92 databases respectively. Speaker recognition equal error rates (EERs) are presented for noise-contaminated speech at different signal-to-noise ratios (SNRs) and different noise sources. For example, the average EER for speech in operations room noise at 6 dB SNR dropped from approximately 20% uncompensated to less than 5% using PMC. Finally, it is shown that speaker recognition performance is relatively insensitive to the exact value of the parameter that determines the relative amplitudes of the speech and noise components of the PMC model.
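
PMC combines the clean-speech and noise model parameters in the linear spectral domain. A simplified numpy sketch for a log-spectral mean vector, omitting the cepstral transforms and variance compensation of full PMC; g stands for the gain balancing the relative amplitudes of the speech and noise components that the abstract refers to:

    import numpy as np

    def pmc_compensate_mean(mu_speech_log: np.ndarray,
                            mu_noise_log: np.ndarray,
                            g: float = 1.0) -> np.ndarray:
        """Combine speech and noise means in the linear domain, then
        return to the log domain (log-normal-style approximation)."""
        linear = np.exp(mu_speech_log) + g * np.exp(mu_noise_log)
        return np.log(linear)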


IEEE Signal Processing Letters | 1997

Linear trajectory segmental HMMs

Martin J. Russell; Wendy J. Holmes

Much of the progress in automatic speech recognition is attributable to the use of hidden Markov models (HMMs) to characterize acoustic speech patterns. Despite their success, HMMs make little use of knowledge about the speech signal, and variation that may be explicable in terms of the physical properties of the human speech production system is treated as random. There is, therefore, a need to develop a speech modeling paradigm which reflects human speech processes more closely. Segmental HMMs are extended versions of conventional HMMs in which states are associated with sequences of observation vectors rather than individual vectors. By treating a segment as a homogeneous unit, dependencies between vectors within a segment can be modeled explicitly. This letter describes a segmental HMM in which a segment is modeled as a noisy function of a linear trajectory. The basic theory of the model is presented, together with formulae for model parameter optimization.
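
The modelling assumption can be stated compactly (in my notation, not the letter's): within a segment, the observations are a noisy function of a straight line,

    o_t = c_i + (t - \bar{t}) m_i + e_t,   e_t \sim N(0, \Sigma_i),

where c_i is the trajectory midpoint, m_i its slope and \bar{t} the centre frame of the segment; allowing c_i and m_i themselves to vary between segments preserves the extra-/intra-segmental split of earlier segmental HMMs.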


Computer Speech & Language | 2005

A multiple-level linear/linear segmental HMM with a formant-based intermediate layer

Martin J. Russell; Philip J. B. Jackson

A novel multiple-level segmental HMM (MSHMM) is presented in which the relationship between symbolic (phonetic) and surface (acoustic) representations of speech is regulated by an intermediate ‘articulatory’ representation. Speech dynamics are characterised as linear trajectories in the articulatory space, which are transformed into the acoustic space using an articulatory-to-acoustic mapping, where recognition is then performed. The results of phonetic classification experiments are presented for monophone and triphone MSHMMs using three formant-based ‘articulatory’ parameterisations and sets of between 1 and 49 linear articulatory-to-acoustic mappings. The NIST Matched Pair Sentence Segment (Word Error) test shows that, for a sufficiently rich combination of articulatory parameterisation and mappings, differences between these results and those obtained with an optimal classifier are not statistically significant. It is also shown that, compared with a conventional HMM, superior performance can be achieved using an MSHMM with 25% fewer parameters.
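
The layered structure can be sketched, again in my own notation, as a linear trajectory in the intermediate ‘articulatory’ space passed through a linear articulatory-to-acoustic mapping:

    x_t = c + (t - \bar{t}) m                               (intermediate trajectory)
    y_t = W x_t + b + e_t,   e_t \sim N(0, \Sigma)          (acoustic observation)

Sharing a small set of mappings (W, b) across many phone classes is what allows an MSHMM to match conventional HMM performance with substantially fewer parameters.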


International Conference on Acoustics, Speech, and Signal Processing | 1983

The discriminative network: A mechanism for focusing recognition in whole-word pattern matching

Roger K. Moore; Martin J. Russell; Michael J. Tomlinson

Whole-word pattern matching using dynamic time-warping (DTW) has achieved considerable success as an algorithm for automatic speech recognition. However, the performance of such an algorithm is ultimately limited by its inability to discriminate between similar sounding words. The problem arises because all differences between speech patterns are treated as being equally important, hence the algorithm is particularly susceptible to confusions caused by irrelevant differences. This paper presents an alternative DTW approach which is able to focus its attention on those parts of a speech pattern which serve to distinguish it from similar patterns. A network-type data structure is derived from reference speech patterns, and the separate paths through the network determine the regions where recognition takes place. Results indicate that discrimination between similar sounding words can be greatly improved.
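
Purely as an illustration of this kind of focusing (this is not the paper's network construction), a DTW local distance can be weighted per reference frame, so that regions which distinguish a word from its confusable neighbours dominate the accumulated score:

    import numpy as np

    def weighted_dtw(ref: np.ndarray, test: np.ndarray, w: np.ndarray) -> float:
        """DTW in which w[i] up-weights reference frames presumed to
        discriminate this word from similar-sounding ones; in practice
        such weights would be derived from comparisons between
        reference patterns (illustrative)."""
        R, T = len(ref), len(test)
        D = np.full((R + 1, T + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, R + 1):
            for j in range(1, T + 1):
                d = w[i - 1] * np.linalg.norm(ref[i - 1] - test[j - 1])
                D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[R, T]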


International Conference on Acoustics, Speech, and Signal Processing | 1982

Locally constrained dynamic programming in automatic speech recognition

Roger K. Moore; Martin J. Russell; Michael J. Tomlinson

Recent years have seen the emergence of dynamic programming as one of the most important tools available for overcoming temporal variability problems in automatic speech recognition. Considerable research effort has been devoted to the investigation of different dynamic programming algorithms, yet most work has been concerned with global, rather than local, constraints upon temporal variation. Clearly the likelihood of timescale distortion is not constant for the entire duration of an utterance; variability must be, at least to some extent, data dependent. It is therefore desirable that information to this effect should be made available to the dynamic time warping process. This paper describes a technique for obtaining an estimate of local timescale variability based on a bi-directional dynamic programming algorithm and basic fuzzy set theory. Results are presented which indicate that this technique will lead to improved discrimination, especially if the differences between classes are mainly due to temporal structure.
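
As an illustrative sketch of local, data-dependent constraints (the bi-directional dynamic programming and fuzzy-set estimation step is not reproduced here), the DTW search can be restricted, for each reference frame, to a precomputed band of admissible test frames:

    import numpy as np

    def locally_constrained_dtw(ref, test, lo, hi):
        """DTW restricted, for each reference frame i, to test frames in
        [lo[i], hi[i]] (1-indexed). The bands stand in for a local
        timescale-variability estimate; how they are derived is assumed,
        not reproduced from the paper's bi-directional method."""
        R, T = len(ref), len(test)
        D = np.full((R + 1, T + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, R + 1):
            for j in range(max(1, lo[i - 1]), min(T, hi[i - 1]) + 1):
                d = np.linalg.norm(ref[i - 1] - test[j - 1])
                D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[R, T]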

Collaboration


Dive into Martin J. Russell's collaborations.

Top Co-Authors

Peter Jancovic, University of Birmingham
Neil Cooke, University of Birmingham
Philip Weber, University of Birmingham
Linxue Bai, University of Birmingham