Publication


Featured research published by Colin C. Goodyear.


Journal of the Acoustical Society of America | 1993

On the use of neural networks in articulatory speech synthesis

Mazin G. Rahim; Colin C. Goodyear; W. Bastiaan Kleijn; Juergen Schroeter; Man Mohan Sondhi

A long‐standing problem in the analysis and synthesis of speech by articulatory description is the estimation of the vocal tract shape parameters from natural input speech. Methods to relate spectral parameters to articulatory positions are feasible if a sufficiently large amount of data is available. This, however, results in a high computational load and large memory requirements. Further, one needs to accommodate ambiguities in this mapping due to the nonuniqueness problem (i.e., several vocal tract shapes can result in identical spectral envelopes). This paper describes the use of artificial neural networks for acoustic to articulatory parameter mapping. Experimental results show that a single feed‐forward neural net is unable to perform this mapping sufficiently well when trained on a large data set. An alternative procedure is proposed, based on an assembly of neural networks. Each network is designated to a specific region in the articulatory space, and performs a mapping from cepstral values into ...
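The region-per-network idea can be sketched as follows. This is only an illustration: a hypothetical nearest-centroid router stands in for whatever region-assignment scheme the paper actually uses, and the region names, centroids, and weights are made-up placeholders.

```python
import math

# Hypothetical sketch of an "assembly of networks": the articulatory
# space is split into regions, each with its own small mapper, and an
# input cepstral vector is routed to the region whose centroid it is
# closest to. All values below are illustrative, not from the paper.

REGIONS = {
    "front": {"centroid": [1.0, 0.0], "weights": [[0.5, 0.1], [0.2, 0.4]]},
    "back":  {"centroid": [-1.0, 0.0], "weights": [[-0.3, 0.2], [0.1, -0.5]]},
}

def route(cepstrum):
    """Pick the region whose centroid is nearest the input vector."""
    def dist(name):
        return math.dist(cepstrum, REGIONS[name]["centroid"])
    return min(REGIONS, key=dist)

def map_to_articulatory(cepstrum):
    """Apply the selected region's mapper (a linear map stands in
    for that region's trained network in this sketch)."""
    w = REGIONS[route(cepstrum)]["weights"]
    return [sum(wi * x for wi, x in zip(row, cepstrum)) for row in w]
```

Routing first and mapping second is what lets each small network specialise on one articulatory region instead of forcing a single net to cover the whole one-to-many mapping.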


International Conference on Spoken Language Processing | 1996

Enhanced shape-invariant pitch and time-scale modification for concatenative speech synthesis

Mat P. Pollard; Barry M. G. Cheetham; Colin C. Goodyear; Mike D. Edgington; A. Lowry

To preserve shape invariance when performing pitch or time-scale modification of sinusoidally-modelled voiced speech, the phases of the sinusoids used to model the glottal excitation are made to add coherently at estimated excitation points. Previous methods achieved this by estimating excitation phases at synthesis frame boundaries, disregarding the frequency modulation that may occur between the frame boundary and the nearest modified excitation point. This approximation can produce a significant misalignment of the excitation phases, leading to distortion of the temporal structure of the synthetic speech. In this paper, a shape-invariant technique is proposed which aligns the excitation phases at excitation points, whilst allowing for variations in the frequency of the sinusoidal components.


International Conference on Acoustics, Speech, and Signal Processing | 1991

A CELP codebook and search technique using a Hopfield net

M. G. Easton; Colin C. Goodyear

A ternary excitation codebook with a special structure for CELP (code-excited linear prediction) coding has been devised in which the main codebook is divided into several sub-codebooks. It is shown how a modified Hopfield neural net is capable of searching such a codebook and selecting a near-optimum code vector. Hence an algorithm is derived which offers considerable computational savings over previously proposed codebook methods. Results are presented demonstrating the performance of the net and the quality of the coded speech.
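A minimal sketch of the underlying idea of codebook search as energy minimisation. This is not the paper's network: a greedy ternary coordinate descent, loosely analogous to asynchronous Hopfield-style updates, stands in for the modified Hopfield net, and the basis and targets are illustrative.

```python
# Illustrative only: update one ternary code element at a time,
# keeping the value that lowers the squared synthesis error --
# an energy-descent search over {-1, 0, 1} code vectors.

def search_ternary(target, basis, n_sweeps=5):
    """Greedy ternary coordinate descent on ||target - basis @ code||^2."""
    n = len(basis[0])
    code = [0] * n

    def synth(c):
        return [sum(row[j] * c[j] for j in range(n)) for row in basis]

    def energy(c):
        return sum((t - s) ** 2 for t, s in zip(target, synth(c)))

    for _ in range(n_sweeps):
        for j in range(n):
            code[j] = min((-1, 0, 1),
                          key=lambda v: energy(code[:j] + [v] + code[j + 1:]))
    return code
```

Like a Hopfield net settling into a low-energy state, each single-element update can only keep the error the same or reduce it, so the search converges to a (possibly local) optimum far faster than exhaustive enumeration of all 3^n code vectors.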


Speech Communication | 1990

Estimation of vocal tract filter parameters using a neural net

Mazin G. Rahim; Colin C. Goodyear

A multilayer perceptron has been trained to perform an analogue mapping from the power spectra of vowels and nasal consonants, spoken by a single speaker, to the control parameters of a speech synthesiser based on an acoustic tube model. The model represents the vocal tract by ten lossless sections, whose areas are adjustable, coupled to a lossy nasal tract whose areas are fixed, except for the first area, which controls the degree of nasal coupling. The outputs of the neural network control these eleven areas, while its inputs are samples of the power spectrum which the synthesised speech spectrum is intended to copy. During training, the synthesiser is driven using exemplar sets of areas and the resulting synthetic speech provides the input spectra for the net. After training, natural speech, with this restricted phoneme set and by the same speaker, can be synthesised with good intelligibility.
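A minimal forward-pass sketch of such a spectrum-to-areas network: spectrum samples in, eleven area parameters out (ten tract sections plus the nasal-coupling area). The weights and the area scaling below are placeholders, not trained values from the paper.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlp_areas(spectrum, w_hidden, w_out, max_area=15.0):
    """One-hidden-layer MLP forward pass; sigmoid outputs are scaled
    into a plausible (0, max_area) range of cross-sectional areas."""
    hidden = [sigmoid(sum(w * s for w, s in zip(row, spectrum)))
              for row in w_hidden]
    return [max_area * sigmoid(sum(w * h for w, h in zip(row, hidden)))
            for row in w_out]
```

Bounding each output with a scaled sigmoid mirrors the physical constraint that tube areas must stay positive and finite, whatever the input spectrum looks like.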


International Conference on Acoustics, Speech, and Signal Processing | 1989

Articulatory synthesis with the aid of a neural net

M.G. Rahim; Colin C. Goodyear

The authors describe the training and use of a multilayer perceptron (MLP) which performs a mapping from the spectra of vowels and nasal consonants, using examples spoken by a single speaker, to sets of area parameters for use in the vocal-tract-modelling filter of a speech synthesizer. Different MLP structures have been investigated using as input data either PARCOR coefficients or sample values of the spectrum. The trained MLP can be used to estimate the driving parameters for speech synthesis from natural utterances using this restricted phoneme set.


International Conference on Acoustics, Speech, and Signal Processing | 1996

Articulatory copy synthesis using a nine-parameter vocal tract model

Colin C. Goodyear; Dongbing Wei

The parameters of a nine-parameter vocal tract model, in conjunction with a time-domain articulatory synthesiser, have been optimised to fit, for nine English vowels, the measured vocal tract shapes and the first three formant frequencies for a particular speaker. Techniques have also been developed for interpolating each of the parameters among these nine vowel points in the f1,f2 plane. The method effectively defines a two-dimensional subspace of the parameter space, which is accessed by f1 and f2. Segments of speech consisting of vowels or vowel-vowel diphones may then be synthesised using formant tracks to determine the selection of the parameters of the model. Parameter values for selected consonants have also been obtained, based on magnetic resonance images. A three-formant interpolation technique allows transitions for the corresponding VC and CV diphones to be copied from natural speech.
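The abstract does not spell out the interpolation formula, so the sketch below uses plain inverse-distance weighting over hypothetical (f1, f2) anchor vowels, purely as an illustration of addressing a parameter subspace by formant frequencies. The anchor coordinates and parameter vectors are made up.

```python
# Illustrative anchors: (f1 Hz, f2 Hz) -> a length-3 placeholder
# model-parameter vector (the paper's model has nine parameters).
VOWEL_POINTS = {
    (300.0, 2300.0): [0.2, 0.8, 0.5],
    (700.0, 1200.0): [0.9, 0.1, 0.4],
    (400.0, 800.0):  [0.6, 0.3, 0.7],
}

def interp_params(f1, f2, eps=1e-9):
    """Inverse-distance-weighted blend of the anchor parameter vectors;
    landing exactly on an anchor returns that vowel's parameters."""
    num = [0.0] * 3
    den = 0.0
    for (a1, a2), params in VOWEL_POINTS.items():
        d2 = (f1 - a1) ** 2 + (f2 - a2) ** 2
        if d2 < eps:
            return list(params)
        w = 1.0 / d2
        den += w
        num = [n + w * p for n, p in zip(num, params)]
    return [n / den for n in num]
```

Because the result is a convex combination of the anchors, every interpolated parameter stays inside the range spanned by the vowel points, which keeps the synthesiser within its calibrated subspace.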


Computer Speech & Language | 2000

Incorporating lip protrusion and larynx lowering into a time domain model for articulatory speech synthesis

Colin C. Goodyear

Acoustic transmission in the vocal tract may be simulated in the time domain using the model of Kelly and Lochbaum. A disadvantage of this simulation is that a fixed number of fixed-length sections must be used, so variability in vocal tract length, caused by lip protrusion or larynx lowering, cannot be modelled. This paper describes a simple modification in which digital filters, derived from transmission line T-sections and including glottal and lip impedance models, are appended at each end of a Kelly-Lochbaum filter. The lengths of these sections may be made continuously variable, allowing the lip and larynx segments of the model to be varied while maintaining a fixed sampling rate. This new technique is compared with the earlier method due to Strube and is found capable of longer extensions and reduced spectral amplitude distortion.
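For background, the standard Kelly-Lochbaum elements the paper builds on can be sketched as follows, using one common sign convention for pressure waves (where acoustic impedance is inversely proportional to area, so the reflection coefficient follows directly from adjacent section areas).

```python
def reflection_coeffs(areas):
    """Junction reflection coefficients for pressure waves:
    k_i = (A_i - A_{i+1}) / (A_i + A_{i+1})."""
    return [(a0 - a1) / (a0 + a1) for a0, a1 in zip(areas, areas[1:])]

def scatter(k, f_in, b_in):
    """One junction's scattering of a forward wave arriving from the
    left (f_in) and a backward wave arriving from the right (b_in).
    Returns (forward out to the right, backward out to the left)."""
    f_out = (1 + k) * f_in - k * b_in   # transmitted + reflected-from-right
    b_out = k * f_in + (1 - k) * b_in   # reflected-from-left + transmitted
    return f_out, b_out
```

When adjacent areas are equal the coefficient is zero and both waves pass through unchanged, which is why the fixed sections are individually transparent and all the spectral shaping happens at the junctions.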


International Conference on Acoustics, Speech, and Signal Processing | 1997

Shape-invariant pitch and time-scale modification of speech by variable order phase interpolation

Mat P. Pollard; Barry M. G. Cheetham; Colin C. Goodyear; Mike D. Edgington

To preserve the waveform shape and perceived quality of pitch and time-scale modified sinusoidally modelled voiced speech, the phases of the sinusoids used to model the glottal excitation are made to add coherently at estimated pitch pulse locations. The glottal excitation is therefore made to resemble a pseudoperiodic impulse train, a quality essential for shape-invariance. Conventional methods attempt to maintain the coherence once per synthesis frame by interpolating the phase through a single modified pitch pulse location, a time where all excitation phases are assumed to be integer multiples of 2π. Whilst this is adequate for small degrees of modification, the coherence is lost when the required amount of modification is increased. This paper presents a technique which is capable of better preserving the impulse-like nature of the glottal excitation whilst allowing its phases to evolve slowly through time.
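For context, the conventional per-frame phase interpolation that this paper improves on is typically the classic cubic phase model of McAulay and Quatieri. The sketch below shows that baseline (not the paper's variable-order method): a cubic phase track matching phase and frequency at both frame boundaries, with the 2π multiple chosen for the smoothest unwrapping.

```python
import math

def cubic_phase(theta0, omega0, theta1, omega1, T):
    """Coefficients of theta(t) = th0 + w0*t + a*t^2 + b*t^3 that match
    phase/frequency (theta0, omega0) at t=0 and (theta1, omega1) at
    t=T, using the 2*pi multiple M giving the smoothest phase track."""
    M = round(((theta0 + omega0 * T - theta1) + (omega1 - omega0) * T / 2)
              / (2 * math.pi))
    delta = theta1 + 2 * math.pi * M - theta0 - omega0 * T
    a = 3.0 / T ** 2 * delta - (omega1 - omega0) / T
    b = -2.0 / T ** 3 * delta + (omega1 - omega0) / T ** 2
    return theta0, omega0, a, b
```

When the boundary values are already consistent with a constant frequency, the cubic terms vanish and the phase track degenerates to a straight line, which is the sanity check the tests below exercise.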


International Symposium on Circuits and Systems | 1997

All-pass excitation phase modelling for low bit-rate speech coding

Barry M. G. Cheetham; H.B. Choi; Xiaoqin Sun; Colin C. Goodyear; Fabrice Plante; W.T.K. Wong

This paper is concerned with the phase spectra of voiced speech segments digitised at low bit-rates using sinusoidal interpolative coding techniques. Such phase spectra are either disregarded at the decoder or regenerated from the magnitude spectra according to assumptions about the human speech production mechanism. The inaccuracy of the commonly used minimum phase assumption is demonstrated and a means of correcting the phase thus obtained using a second order all-pass filter is presented. The application of the all-pass filter to three speech coding techniques, i.e. STC, PWI and IMBE is demonstrated.
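The key property being exploited is that an all-pass section has unit magnitude at every frequency, so it can adjust the phase obtained from the minimum-phase assumption without disturbing the magnitude spectrum. A sketch of a second-order all-pass frequency response, with arbitrary illustrative coefficients (stable poles inside the unit circle):

```python
import cmath

def allpass2_response(a1, a2, omega):
    """Frequency response at radian frequency omega of the second-order
    all-pass H(z) = (a2 + a1*z^-1 + z^-2) / (1 + a1*z^-1 + a2*z^-2),
    whose numerator mirrors the denominator coefficients."""
    z1 = cmath.exp(-1j * omega)          # z^-1 evaluated on the unit circle
    num = a2 + a1 * z1 + z1 * z1
    den = 1 + a1 * z1 + a2 * z1 * z1
    return num / den
```

Because the numerator is the denominator with its coefficients reversed, |H| is exactly 1 on the unit circle while the phase varies with a1 and a2, which is what makes such a section suitable as a pure phase corrector.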


International Conference on Acoustics, Speech, and Signal Processing | 1997

Experiments in female voice speech synthesis using a parametric articulatory model

Dongbing Wei; Colin C. Goodyear

A parametric vocal tract model and a two-dimensional articulatory parametric subspace for a female voice are presented. The parameters of the model, which determine the vocal tract shape, can be found uniquely for VV transitions by mapping directly from f1 and f2 onto this subspace, while a modified technique involving f3 is available for voiced VC and CV diphones. The area functions of the vocal tract, generated by these parameters, are used to drive a time-domain synthesiser. Female speech may then be synthesised by copying from either male or female natural speech.

Collaboration


Dive into Colin C. Goodyear's collaborations.

Top Co-Authors

Dongbing Wei (University of Liverpool)
M. G. Easton (University of Liverpool)
M.G. Rahim (University of Liverpool)