
Publications


Featured research published by Mark J. F. Gales.


Computer Speech & Language | 1998

Maximum likelihood linear transformations for HMM-based speech recognition

Mark J. F. Gales

This paper examines the application of linear transformations for speaker and environmental adaptation in an HMM-based speech recognition system. In particular, transformations that are trained in a maximum likelihood sense on adaptation data are investigated. Only model-based linear transforms are considered, since, for linear transforms, they subsume the appropriate feature-space transforms. The paper compares the two possible forms of model-based transforms: (i) unconstrained, where any combination of mean and variance transform may be used, and (ii) constrained, which requires the variance transform to have the same form as the mean transform. Re-estimation formulae for all appropriate cases of transform are given. This includes a new and efficient full variance transform and the extension of the constrained model-space transform from the simple diagonal case to the full or block-diagonal case. The constrained and unconstrained transforms are evaluated in terms of computational cost, recognition time efficiency, and use for speaker adaptive training. The recognition performance of the two model-space transforms on a large vocabulary speech recognition task using incremental adaptation is investigated. In addition, initial experiments using the constrained model-space transform for speaker adaptive training are detailed.
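The two transform forms can be illustrated with a toy sketch. This is not the paper's estimation procedure (the re-estimation formulae are derived in the paper itself); the matrices and vectors below are arbitrary assumed values for a single 2-D Gaussian component.

```python
import numpy as np

def mllr_mean(A, b, mu):
    # Unconstrained mean transform: mu_hat = A @ mu + b
    return A @ mu + b

def cmllr(A, b, mu, Sigma):
    # Constrained transform: the same A acts on both mean and covariance,
    # Sigma_hat = A @ Sigma @ A.T, which is why the constrained model-space
    # transform is equivalent to a feature-space transform on the observations.
    return A @ mu + b, A @ Sigma @ A.T

# Toy 2-D Gaussian component with hypothetical transform parameters
mu = np.array([1.0, -1.0])
Sigma = np.diag([0.5, 2.0])
A = np.array([[1.1, 0.0],
              [0.2, 0.9]])
b = np.array([0.3, -0.1])

mu_hat, Sigma_hat = cmllr(A, b, mu, Sigma)
```

Note that in the constrained case the transformed covariance stays symmetric by construction, so the adapted component remains a valid Gaussian.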


IEEE Transactions on Speech and Audio Processing | 1996

Robust continuous speech recognition using parallel model combination

Mark J. F. Gales; Steve J. Young

This paper addresses the problem of automatic speech recognition in the presence of interfering noise. It focuses on the parallel model combination (PMC) scheme, which has been shown to be a powerful technique for achieving noise robustness. Most experiments reported on PMC to date have been on small, 10-50 word vocabulary systems. Experiments on the Resource Management (RM) database, a 1000-word continuous speech recognition task, reveal compensation requirements not highlighted by the smaller vocabulary tasks. In particular, it is necessary to compensate the dynamic parameters as well as the static parameters to achieve good recognition performance. The database used for these experiments was the RM speaker-independent task with either Lynx helicopter noise or operations room noise from the NOISEX-92 database added. The experiments reported here used the HTK RM recognizer developed at CUED, modified to include PMC-based compensation for the static, delta and delta-delta parameters. After training on clean speech data, the performance of the recognizer was found to be severely degraded when noise was added to the speech signal at between 10 and 18 dB. However, using PMC the performance was restored to a level comparable with that obtained when training directly in the noise-corrupted environment.
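A heavily simplified sketch of the core PMC idea for static parameters: speech and noise means, expressed in the log domain, are combined by summing their linear-domain energies (the "log-add" approximation). The full scheme works on cepstra via a DCT, compensates variances and dynamic parameters, and handles the gain term more carefully; all of that is omitted here, and the numbers are hypothetical.

```python
import numpy as np

def pmc_log_add(mu_speech, mu_noise, gain=1.0):
    # Log-add approximation: the corrupted-speech mean in the log-spectral
    # domain is the log of the sum of the linear-domain speech energy and
    # the (gain-scaled) noise energy.
    return np.log(np.exp(mu_speech) + gain * np.exp(mu_noise))

mu_s = np.array([2.0, 0.5])   # hypothetical clean-speech log-energies
mu_n = np.array([1.0, 1.0])   # hypothetical noise log-energies
mu_corrupted = pmc_log_add(mu_s, mu_n)
```

A useful sanity property: the corrupted mean can never fall below either source, since energies only add in the linear domain.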


Computer Speech & Language | 1996

Mean and variance adaptation within the MLLR framework

Mark J. F. Gales; Philip C. Woodland

One of the key issues for adaptation algorithms is to modify a large number of parameters with only a small amount of adaptation data. Speaker adaptation techniques try to obtain near speaker-dependent (SD) performance with only small amounts of speaker-specific data, and are often based on initial speaker-independent (SI) recognition systems. Some of these speaker adaptation techniques may also be applied to the task of adaptation to a new acoustic environment. In this case an SI recognition system trained in, typically, a clean acoustic environment is adapted to operate in a new, noise-corrupted, acoustic environment. This paper examines the maximum likelihood linear regression (MLLR) adaptation technique. MLLR estimates linear transformations for groups of model parameters to maximize the likelihood of the adaptation data. Previously, MLLR has been applied to the mean parameters in mixture-Gaussian HMM systems. In this paper MLLR is extended to also update the Gaussian variances and re-estimation formulae are derived for these variance transforms. MLLR with variance compensation is evaluated on several large vocabulary recognition tasks. The use of mean and variance MLLR adaptation was found to give an additional 2% to 7% decrease in word error rate over mean-only MLLR adaptation.


IEEE Transactions on Speech and Audio Processing | 2000

Cluster adaptive training of hidden Markov models

Mark J. F. Gales

When performing speaker adaptation, there are two conflicting requirements. First, the speaker transform must be powerful enough to represent the speaker. Second, the transform must be quickly and easily estimated for any particular speaker. The most popular adaptation schemes have used many parameters to adapt the models to be representative of an individual speaker. This limits how rapidly the models may be adapted to a new speaker or the acoustic environment. This paper examines an adaptation scheme requiring very few parameters, cluster adaptive training (CAT). CAT may be viewed as a simple extension to speaker clustering. Rather than selecting a single cluster as representative of a particular speaker, a linear interpolation of all the cluster means is used as the mean of the particular speaker. This scheme naturally falls into an adaptive training framework. Maximum likelihood estimates of the interpolation weights are given. Furthermore, simple re-estimation formulae for cluster means, represented both explicitly and by sets of transforms of some canonical mean, are given. On a speaker-independent task CAT reduced the word error rate using very little adaptation data. In addition when combined with other adaptation schemes it gave a 5% reduction in word error rate over adapting a speaker-independent model set.
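The interpolation at the heart of CAT is simple enough to show directly. This sketch assumes three hypothetical cluster means for one Gaussian component; in the paper the interpolation weights are maximum-likelihood estimates made per speaker, whereas here they are just assumed values.

```python
import numpy as np

def cat_mean(cluster_means, weights):
    # CAT speaker mean: a linear interpolation of the cluster means,
    # mu_speaker = sum_c lambda_c * mu_c
    return np.asarray(weights) @ np.asarray(cluster_means)

# Three hypothetical cluster means for one component (2-D features)
clusters = np.array([[0.0,  1.0],
                     [2.0, -1.0],
                     [1.0,  0.0]])
lam = np.array([0.5, 0.3, 0.2])   # assumed per-speaker weights
mu_speaker = cat_mean(clusters, lam)
```

Because only the weight vector is speaker-specific, a new speaker needs just a handful of parameters to be estimated, which is exactly the rapid-adaptation property the paper exploits.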


International Conference on Acoustics, Speech, and Signal Processing | 1992

An improved approach to the hidden Markov model decomposition of speech and noise

Mark J. F. Gales; Steve J. Young

The authors address the problem of automatic speech recognition in the presence of interfering noise. The novel approach described decomposes the contaminated speech signal using a generalization of standard hidden Markov modeling, while utilizing a compact and effective parametrization of the speech signal. The technique is compared to some existing noise compensation techniques, using data recorded in noise, and is found to have improved performance compared to existing model decomposition techniques. Performance is comparable to existing noise subtraction techniques, but the technique is applicable to a wider range of noise environments and is not dependent on an accurate endpointing of the speech.


Speech Communication | 1993

Cepstral parameter compensation for HMM recognition in noise

Mark J. F. Gales; Steve J. Young

This paper describes a method of adapting a continuous density HMM recogniser trained on clean cepstral speech data to make it robust to noise. The technique is based on parallel model combination (PMC) in which the parameters of corresponding pairs of speech and noise states are combined to yield a set of compensated parameters. It improves on earlier cepstral mean compensation methods in that it also adapts the variances and as a result can deal with much lower SNRs. The PMC method is evaluated on the NOISEX-92 noise database and shown to work well down to 0 dB SNR and below for both stationary and non-stationary noises. Furthermore, for relatively constant noise conditions, there is no additional computational cost at run-time.


Computer Speech & Language | 1995

Robust speech recognition in additive and convolutional noise using parallel model combination

Mark J. F. Gales; Steve J. Young

The method of Parallel Model Combination (PMC) has been shown to be a powerful technique for compensating a speech recognizer for the effects of additive noise. In this paper, the PMC scheme is extended to include the effects of convolutional noise. This is done by introducing a modified "mismatch" function which allows an estimate to be made of the difference in channel conditions or tilt between training and test environments. Having estimated this tilt, Maximum Likelihood (ML) estimates of the corrupted speech model may then be obtained in the usual way. The scheme is evaluated using the NOISEX-92 database where the performance in the presence of both interfering additive noise and convolutional noise shows only slight degradation compared with that obtained when no convolutional noise is present.
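The modified mismatch function can be sketched in the log-spectral domain. This is a simplification: the paper works with cepstral parameters and estimates the tilt itself, whereas here the tilt is an assumed known value.

```python
import numpy as np

def corrupted_mean(mu_speech, mu_noise, tilt):
    # Modified mismatch function: the clean-speech mean is offset by the
    # channel tilt h before combination with the additive-noise mean,
    # y = log(exp(x + h) + exp(n)) in the log-spectral domain.
    return np.log(np.exp(mu_speech + tilt) + np.exp(mu_noise))

mu_s = np.array([2.0, 1.0])   # hypothetical clean-speech log-energies
mu_n = np.array([0.0, 0.5])   # hypothetical additive-noise log-energies
h = np.array([-0.2, 0.3])     # hypothetical channel tilt
mu_y = corrupted_mean(mu_s, mu_n, h)
```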


International Conference on Acoustics, Speech, and Signal Processing | 2007

Consensus Network Decoding for Statistical Machine Translation System Combination

Khe Chai Sim; William Byrne; Mark J. F. Gales; Hichem Sahbi; Philip C. Woodland

This paper presents a simple and robust consensus decoding approach for combining multiple machine translation (MT) system outputs. A consensus network is constructed from an N-best list by aligning the hypotheses against an alignment reference, where the alignment is based on minimising the translation edit rate (TER). The minimum Bayes risk (MBR) decoding technique is investigated for the selection of an appropriate alignment reference. Several alternative decoding strategies are proposed to retain coherent phrases in the original translations. Experimental results are presented primarily for three-way combination of Chinese-English translation outputs; results for six-way system combination are also given. It is shown that worthwhile improvements in translation performance can be obtained using the methods discussed.
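MBR selection of the alignment reference can be sketched as picking the hypothesis with minimum expected loss against the rest of the N-best list. The sketch below uses plain word-level edit distance as a stand-in for TER (real TER additionally allows block shifts), and the hypotheses and posteriors are invented toy values.

```python
def edit_distance(a, b):
    # Word-level Levenshtein distance (a simplified stand-in for TER)
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return d[m][n]

def mbr_select(nbest, posteriors):
    # Choose the hypothesis minimising expected edit distance to the list,
    # weighted by the (assumed) posterior of each competing hypothesis.
    def risk(h):
        return sum(p * edit_distance(h.split(), e.split())
                   for p, e in zip(posteriors, nbest))
    return min(nbest, key=risk)

nbest = ["the cat sat", "a cat sat", "the cat sat down"]
ref = mbr_select(nbest, [0.5, 0.3, 0.2])
```

The selected string then serves as the backbone against which the remaining hypotheses would be aligned to build the consensus network.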


International Conference on Spoken Language Processing | 1996

Use of Gaussian selection in large vocabulary continuous speech recognition using HMMs

Kate Knill; Mark J. F. Gales; Steve J. Young

This paper investigates the use of Gaussian Selection (GS) to reduce the state likelihood computation in HMM-based systems. These likelihood calculations contribute significantly (30 to 70%) to the computational load. Previously, it has been reported that when GS is used on large systems the recognition accuracy tends to degrade above a ×3 reduction in likelihood computation. To explain this degradation, this paper investigates the trade-offs necessary between achieving good state likelihoods and low computation. In addition, the problem of unseen states in a cluster is examined. It is shown that further improvements are possible. For example, using a different assignment measure, with a constraint on the number of components per state per cluster, enabled the recognition accuracy on a 5k speaker-independent task to be maintained up to a ×5 reduction in likelihood computation.
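The mechanics of Gaussian selection can be sketched in one dimension: Gaussians are grouped into clusters offline, and at decode time only the members of the cluster nearest the observation are evaluated exactly, while the rest receive a flat floor score. The clustering, component values and floor below are all invented for illustration.

```python
import math

def log_gauss(x, mu, var):
    # Log-likelihood of a scalar observation under a 1-D Gaussian
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def gaussian_selection(x, gaussians, clusters, floor=-20.0):
    # Evaluate only Gaussians whose cluster centroid is closest to x;
    # all other components get the floor score, cutting the likelihood
    # computation roughly by the number of clusters.
    best = min(clusters, key=lambda c: abs(x - c["centroid"]))
    scores = {}
    for i, (mu, var) in enumerate(gaussians):
        scores[i] = log_gauss(x, mu, var) if i in best["members"] else floor
    return scores

# Hypothetical components (mean, variance) pre-grouped into two clusters
gaussians = [(0.0, 1.0), (0.2, 1.0), (5.0, 1.0), (5.3, 1.0)]
clusters = [{"centroid": 0.1,  "members": {0, 1}},
            {"centroid": 5.15, "members": {2, 3}}]
scores = gaussian_selection(0.3, gaussians, clusters)
```

The paper's refinements, such as the alternative assignment measure and the per-state component constraint, address exactly the cases this naive scheme handles badly: states whose components all fall outside the selected cluster.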


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Statistical Parametric Speech Synthesis Based on Speaker and Language Factorization

Heiga Zen; Norbert Braunschweiler; Sabine Buchholz; Mark J. F. Gales; Katherine Mary Knill; Sacha Krstulovic; Javier Latorre

An increasingly common scenario in building speech synthesis and recognition systems is training on inhomogeneous data. This paper proposes a new framework for estimating hidden Markov models on data containing both multiple speakers and multiple languages. The proposed framework, speaker and language factorization, attempts to factorize speaker-/language-specific characteristics in the data and then model them using separate transforms. Language-specific factors in the data are represented by transforms based on cluster mean interpolation with cluster-dependent decision trees. Acoustic variations caused by speaker characteristics are handled by transforms based on constrained maximum-likelihood linear regression. Experimental results on statistical parametric speech synthesis show that the proposed framework enables data from multiple speakers in different languages to be used to: train a synthesis system; synthesize speech in a language using speaker characteristics estimated in a different language; and adapt to a new language.

Collaboration


Dive into Mark J. F. Gales's collaborations.

Top Co-Authors

Xunying Liu, University of Cambridge
Anton Ragni, University of Cambridge
Kate Knill, University of Cambridge
Xie Chen, University of Cambridge
Khe Chai Sim, National University of Singapore
Chao Zhang, University of Cambridge