Publication


Featured research published by W.H. Holmes.


international conference on acoustics, speech, and signal processing | 1993

Use of an auditory model to improve speech coders

Deep Sen; D.H. Irving; W.H. Holmes

A method for incorporating an auditory masking model in a speech coder using traditional articulatory models is presented. The auditory model attempts to model the frequency selectivity and masking properties of the human cochlea. Coding gain is achieved by analyzing the perceptual content of each sample in the spectrum. The scheme is thus able to introduce selective distortion that is a direct function of human hearing perception and is thus optimally matched to the hearing process. It is shown that good coding gain can be obtained with excellent speech quality. The algorithm can be used on its own or as a front end for traditional vocoders. It can also be implemented with very little computational overhead and low coding delay.


international conference on acoustics, speech, and signal processing | 2003

Subband noise estimation for speech enhancement using a perceptual Wiener filter

L. Lin; W.H. Holmes; Eliathamby Ambikairajah

The paper proposes a fast noise estimation algorithm for speech enhancement using a perceptual Wiener filter. The noisy speech is decomposed using a critical-band-rate filterbank so that a perceptual modification of Wiener filtering can be applied in speech denoising. The subband noise estimate is updated by adaptively smoothing the noisy signal power. The smoothing parameter is chosen as a function of the estimated signal-to-noise ratio. This noise estimation technique gives accurate results even at very low signal-to-noise ratios, and works continuously, even in the presence of speech. It is effective for both non-stationary and coloured noise. Enhanced speech of good quality is obtained by the perceptual Wiener filter.
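The recursive update described in this abstract can be illustrated with a minimal sketch. The abstract only says that the smoothing parameter is a function of the estimated signal-to-noise ratio; the sigmoid mapping, the `slope` and `threshold` values, and the function names below are assumptions made for illustration, not the paper's published formula.

```python
import math


def update_noise_estimate(noise_prev, noisy_power, slope=4.0, threshold=1.5):
    """One recursive update of a subband noise power estimate.

    The sigmoid mapping and its constants are illustrative assumptions.
    """
    # Ratio of the current noisy power to the previous noise estimate.
    post_snr = noisy_power / max(noise_prev, 1e-12)
    # Smoothing parameter in (0, 1): close to 1 when speech is likely
    # present (estimate is held), smaller when the frame looks like noise.
    alpha = 1.0 / (1.0 + math.exp(-slope * (post_snr - threshold)))
    return alpha * noise_prev + (1.0 - alpha) * noisy_power


def track_noise(powers, initial):
    """Run the update over a sequence of subband power values."""
    estimate = initial
    history = []
    for p in powers:
        estimate = update_noise_estimate(estimate, p)
        history.append(estimate)
    return history
```

With this choice, a frame whose power is far above the current estimate leaves the estimate almost unchanged, which is one way to keep updating continuously in the presence of speech.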


international symposium on circuits and systems | 2001

Auditory filter bank inversion

L. Lin; W.H. Holmes; Eliathamby Ambikairajah

Models of auditory filtering using the Gammatone filter bank are useful tools in speech processing. A perceptually accurate auditory inversion model has applications in speech and audio coding. This paper proposes a new auditory filter bank inversion method using a least squares optimization technique. The proposed method is computationally efficient and its low delay makes it suitable for frame-by-frame processing. Three other approaches to Gammatone analysis/synthesis filter bank implementations are compared with the proposed method.


asia pacific conference on circuits and systems | 2002

Speech enhancement for nonstationary noise environment

L. Lin; Eliathamby Ambikairajah; W.H. Holmes

A speech denoising technique based on an auditory filterbank is proposed in this paper. The noisy speech is first decomposed by the auditory filterbank into critical band signals. The denoising gain is calculated from the variance estimate for the subband signal and noise. The time-varying subband noise variance is estimated by tracking the minimum variance of the subband noisy signal. This noise reduction technique is suitable for colored and nonstationary noise environments. It results in natural-sounding speech with a very low level of musical noise.
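As a rough illustration of minimum tracking, the sketch below smooths the power of one subband and takes the minimum over a sliding window as the noise estimate. The window length, the smoothing constant, and the function name are hypothetical; the abstract does not give the paper's actual tracking rule.

```python
from collections import deque


def min_tracked_noise(powers, window=8, smooth=0.8):
    """Estimate subband noise power as a sliding-window minimum.

    `window` and `smooth` are illustrative values, not taken from the paper.
    """
    smoothed = None
    recent = deque(maxlen=window)  # holds the last `window` smoothed values
    estimates = []
    for p in powers:
        # First-order recursive smoothing of the noisy subband power.
        smoothed = p if smoothed is None else smooth * smoothed + (1.0 - smooth) * p
        recent.append(smoothed)
        # The minimum over the window is taken as the noise estimate.
        estimates.append(min(recent))
    return estimates
```

Because a short speech burst raises the power only briefly, the windowed minimum still reflects the noise floor seen just before the burst.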


international conference on acoustics, speech, and signal processing | 1995

Low cost vector quantization methods for spectral coding in low rate speech coders

H.R.S. Mohammadi; W.H. Holmes

In low rate speech coders based on the linear prediction method, the quality of synthesized speech can be improved by enhancement of the short-term spectrum quantization stage. In this study, we propose two new efficient methods for coding the spectral parameters, namely sorted codebook vector quantization (SCVQ) and fine-coarse vector quantization (FCVQ). The principles of these methods are presented along with the methods of training and optimizing the related codebooks. The performance of the new schemes is compared experimentally with other efficient methods, such as tree-searched vector quantization (TSVQ) and multi-stage vector quantization (MSVQ). We demonstrate that the new methods offer significant cost reduction whilst achieving superior quality.


international conference on acoustics, speech, and signal processing | 2001

Log-magnitude modelling of auditory tuning curves

L. Lin; Eliathamby Ambikairajah; W.H. Holmes

We propose the novel application of a technique for filter design that can accurately fit measured tuning curves for the auditory fibres in the log-magnitude domain. This method provides pole-zero filters with guaranteed stability, and its log-magnitude domain criterion allows tuning curves with very steep slopes to be accurately modelled with an 8th- to 10th-order pole-zero filter. Thus, this technique can also be used to design a new set of critical band filters with superior frequency domain characteristics compared with the well-known gammatone filter bank. The filter bank designed using this technique has applications in auditory-based speech and audio analysis.


international conference on acoustics, speech, and signal processing | 1994

Comparison of ARMA modelling methods for low bit rate speech coding

S. Yim; Deep Sen; W.H. Holmes

There are two main parts of parametric speech coding algorithms such as codebook-excited linear prediction (CELP): the determination of the vocal tract filter parameters and the selection of the excitation signals based on a perceptual error criterion. The vocal tract includes the oral and nasal cavities depending on the type of speech segments (e.g. nasals and unvoiced fricatives). The contribution from the nasal tract suggests the need for an ARMA (or pole-zero) model instead of the conventional AR (pole only) model. The paper compares the performance of several ARMA modelling techniques in estimating the vocal tract filter parameters. The best method in terms of spectral fit and computational complexity is then applied to a CELP-type speech coding algorithm, with results which are superior to conventional AR models.


Archive | 2002

Wideband Speech and Audio Coding in the Perceptual Domain

L. Lin; Eliathamby Ambikairajah; W.H. Holmes

A new critical band auditory filterbank with superior auditory masking properties is proposed and is applied to wideband speech and audio coding. The analysis and synthesis are performed in the perceptual domain using this filterbank. The outputs of the analysis filters are processed to obtain a series of pulse trains that represent neural firing. Simultaneous and temporal masking models are applied to reduce the number of pulses in order to achieve a compact time-frequency parameterization. The pulse amplitudes and positions are then coded using a run-length coding algorithm. The new speech and audio coder produces high quality coded speech and audio, with both temporal and spectral fidelity.
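The final coding step can be pictured with a generic zero-run-length scheme for a sparse pulse train. This is only a sketch of the general idea: the abstract does not specify the coder's actual format, and the `(zero_run, amplitude)` pair layout and function names here are assumptions.

```python
def encode_pulses(train):
    """Encode a sparse pulse train as (zero_run, amplitude) pairs.

    Returns the pairs plus the number of trailing zeros, so the
    original length can be restored on decoding.
    """
    pairs = []
    run = 0
    for x in train:
        if x == 0:
            run += 1  # extend the current run of empty positions
        else:
            pairs.append((run, x))  # zeros before this pulse, then the pulse
            run = 0
    return pairs, run


def decode_pulses(pairs, trailing):
    """Rebuild the pulse train from the run-length representation."""
    out = []
    for run, amp in pairs:
        out.extend([0] * run)
        out.append(amp)
    out.extend([0] * trailing)
    return out
```

The fewer pulses that survive masking, the longer the zero runs and the fewer pairs need to be stored.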


international conference on acoustics, speech, and signal processing | 2000

Encoding sinusoidal amplitudes with a minimum phase rational model

N. Malik; W.H. Holmes

In this paper we present an algorithm for encoding sine wave amplitudes with a minimum phase rational model suitable for harmonic coding applications. Voiced speech frames from different male and female speakers show a very large variation in the number of sinusoids that are present, from more than forty to fewer than ten in 4 kHz speech. The method we propose involves minimization of the squared error, on a log scale, between this highly variable number of spectral amplitudes and the corresponding values of the magnitude response of a parametric pole-zero model. A weighted iterative approach is used to get very low distortion solutions to this nonlinear problem. This new variable-to-fixed encoding algorithm gives especially impressive results in the case of high-pitched female voices, where the number of spectral harmonics is less than the number of parameters in the rational model.


international conference on acoustics, speech, and signal processing | 1999

Log amplitude modeling of sinusoids in voiced speech

N. Malik; W.H. Holmes

We present an algorithm for all-pole (envelope) modeling of the amplitudes of sinusoids present in voiced speech segments which works even when the number of sinusoids is very small, as occurs with high-pitched speakers. In contrast to previous methods, this algorithm minimizes a squared error criterion in the log amplitude domain rather than the amplitude domain, and so is better matched to the properties of the human auditory system. A weighted iterative approach is used to get near optimal solutions to this otherwise nonlinear problem. This new frequency domain log amplitude modeling (LAM) algorithm gives impressive results, especially in the case of high pitched female voices where conventional linear prediction methods are inadequate. The algorithm can easily be generalized to develop pole-zero models.

Collaboration


Dive into W.H. Holmes's collaborations.

Top Co-Authors

L. Lin

University of New South Wales

Deep Sen

University of New South Wales

N. Malik

University of New South Wales

D.H. Irving

University of New South Wales

G.C. Hurst

University of New South Wales

H.R.S. Mohammadi

University of New South Wales

R.A. Zakarevicius

University of New South Wales

S. Yim

University of New South Wales
