Publication


Featured research published by Ben James Shannon.


Speech Communication | 2011

The importance of phase in speech enhancement

Kuldip Kumar Paliwal; Kamil Wojcicki; Ben James Shannon

Typical speech enhancement methods, based on the short-time Fourier analysis-modification-synthesis (AMS) framework, modify only the magnitude spectrum and keep the phase spectrum unchanged. In this paper our aim is to show that by modifying the phase spectrum in the enhancement process the quality of the resulting speech can be improved. For this we use analysis windows of 32 ms duration and investigate a number of approaches to phase spectrum computation. These include the use of matched or mismatched analysis windows for magnitude and phase spectra estimation during AMS processing, as well as the phase spectrum compensation (PSC) method. We consider four cases and conduct a series of objective and subjective experiments that examine the importance of the phase spectrum for speech quality in a systematic manner.
In the first (oracle) case, our goal is to determine the maximum speech quality improvements achievable when accurate phase spectrum estimates are available, but when no enhancement is performed on the magnitude spectrum. For this purpose speech stimuli are constructed, where (during AMS processing) the phase spectrum is computed from clean speech, while the magnitude spectrum is computed from noisy speech. While such a situation does not arise in practice, it does provide useful insight into how much precise knowledge of the phase spectrum can contribute towards speech quality. In this first case, matched and mismatched analysis window approaches are investigated. Particular attention is given to the choice of analysis window type used during phase spectrum computation, where the effect of spectral dynamic range on speech quality is examined.
In the second (non-oracle) case, we consider a more realistic scenario where only the noisy spectra (observable in practice) are available. We study the potential of the mismatched window approach for speech quality improvements in this non-oracle case. We would also like to determine how much room for improvement exists between this case and the best (oracle) case. In the third case, we use the PSC algorithm to enhance the phase spectrum. We compare this approach with the oracle and non-oracle matched and mismatched window techniques investigated in the preceding cases.
While in the first three cases we consider the usefulness of various approaches to phase spectrum computation within the AMS framework when the noisy magnitude spectrum is used, in the fourth case we examine the usefulness of these techniques when an enhanced magnitude spectrum is employed. Our aim (in the context of traditional magnitude spectrum-based enhancement methods) is to determine how much benefit in terms of speech quality can be attained by also processing the phase spectrum. For this purpose, the minimum mean-square error (MMSE) short-time spectral amplitude (STSA) estimates are employed instead of noisy magnitude spectra.
The results of the oracle experiments show that accurate phase spectrum estimates can contribute considerably towards speech quality, and that the use of mismatched analysis windows (in the computation of the magnitude and phase spectra) provides significant improvements in both objective and subjective speech quality, especially when the choice of analysis window used for phase spectrum computation is carefully considered. The mismatched window approach was also found to improve speech quality in the non-oracle case. While the improvements were found to be statistically significant, they were only modest compared to those observed in the oracle case. This suggests that research into better phase spectrum estimation algorithms, while a challenging task, could be worthwhile. The results of the PSC experiments indicate that the PSC method achieves better speech quality improvements than the other non-oracle methods considered. The results of the MMSE experiments suggest that accurate phase spectrum estimates have the potential to significantly improve the performance of existing magnitude spectrum-based methods. Out of the non-oracle approaches considered, the combination of the MMSE STSA method with the PSC algorithm produced significantly better speech quality improvements than those achieved by these methods individually.
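
As a rough illustration of the oracle case described in the abstract above, the following Python sketch runs a toy AMS loop that keeps the noisy magnitude spectrum and borrows the phase spectrum from the clean signal. It is not the authors' implementation: the function names, the naive DFT, the periodic Hann window, the 50% overlap and the 16-sample frame are all assumptions chosen for brevity (the paper works with 32 ms analysis windows), and only the matched-window variant is shown.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (fine for short toy frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def hann(n):
    """Periodic Hann window; at 50% overlap its shifted copies sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def ams_oracle_phase(noisy, clean, frame_len=16):
    """Toy AMS pass: noisy magnitude spectrum combined with clean phase.

    Assumes both signals have the same length and that frame_len is even.
    """
    hop = frame_len // 2
    win = hann(frame_len)
    out = [0.0] * len(noisy)
    for start in range(0, len(noisy) - frame_len + 1, hop):
        # Analysis: windowed short-time spectra of both signals.
        noisy_spec = dft([noisy[start + i] * win[i] for i in range(frame_len)])
        clean_spec = dft([clean[start + i] * win[i] for i in range(frame_len)])
        # Modification: keep |noisy|, take the phase from the clean spectrum.
        mixed = [abs(a) * cmath.exp(1j * cmath.phase(b))
                 for a, b in zip(noisy_spec, clean_spec)]
        # Synthesis: inverse transform followed by overlap-add.
        frame = idft(mixed)
        for i in range(frame_len):
            out[start + i] += frame[i]
    return out
```

When `noisy` and `clean` are the same signal, the interior of the output reproduces the input, which is a convenient sanity check on the overlap-add step.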


Speech Communication | 2006

Feature extraction from higher-lag autocorrelation coefficients for robust speech recognition

Ben James Shannon; Kuldip Kumar Paliwal

In this paper, a feature extraction method that is robust to additive background noise is proposed for automatic speech recognition. Since the background noise corrupts the autocorrelation coefficients of the speech signal mostly at the lower-time lags, while the higher-lag autocorrelation coefficients are least affected, this method discards the lower-lag autocorrelation coefficients and uses only the higher-lag autocorrelation coefficients for spectral estimation. The magnitude spectrum of the windowed higher-lag autocorrelation sequence is used here as an estimate of the power spectrum of the speech signal. This power spectral estimate is processed further (as in the well-known Mel frequency cepstral coefficient (MFCC) procedure) by the Mel filter bank, log operation and the discrete cosine transform to get the cepstral coefficients. These cepstral coefficients are referred to as the autocorrelation Mel frequency cepstral coefficients (AMFCCs). We evaluate the speech recognition performance of the AMFCC features on the Aurora and the Resource Management databases and show that they perform as well as the MFCC features for clean speech and better than the MFCC features for noisy speech. Finally, we show that the AMFCC features perform better than the features derived from the robust linear prediction-based methods for noisy speech.
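
The spectral estimation step in the abstract above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's algorithm: the function names and the `min_lag` and `n_bins` parameters are hypothetical, a Hamming window stands in for whatever window the authors use, and the Mel filter bank, log and discrete cosine transform stages that would turn the estimate into AMFCCs are omitted.

```python
import cmath
import math


def autocorrelation(frame):
    """Biased one-sided autocorrelation r[0..N-1] of a frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag)) / n
            for lag in range(n)]


def higher_lag_spectrum(frame, min_lag, n_bins):
    """Magnitude spectrum of the windowed higher-lag autocorrelation.

    Bin k corresponds to a frequency of k / (2 * n_bins) cycles per sample.
    """
    r = autocorrelation(frame)
    kept = r[min_lag:]  # discard the lower-lag coefficients
    m = len(kept)
    # Hamming window used as a stand-in (assumption, see lead-in).
    win = [0.54 - 0.46 * math.cos(2 * math.pi * i / (m - 1)) for i in range(m)]
    seq = [c * w for c, w in zip(kept, win)]
    return [abs(sum(seq[i] * cmath.exp(-2j * math.pi * k * i / (2 * n_bins))
                    for i in range(m)))
            for k in range(n_bins)]
```

For a pure tone, the estimate should peak at the bin matching the tone's frequency, since the autocorrelation of a sinusoid is itself a sinusoid of the same frequency at every lag.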


International Conference on Acoustics, Speech, and Signal Processing | 2005

Influence of autocorrelation lag ranges on robust speech recognition

Ben James Shannon; Kuldip Kumar Paliwal

It is generally believed that the lower-lag autocorrelation coefficients carry information about the spectral envelope and the higher-lag autocorrelation coefficients are more related to pitch information. In this paper, we use lower-lag and higher-lag ranges of the autocorrelation function separately for deriving speech recognition features, and investigate their role in terms of speech recognition performance. The state-of-the-art MFCC (mel frequency cepstral coefficient) features use the whole autocorrelation function in their computation and are used here as a benchmark in our experiments. Our recognition results from the Aurora II corpus show that the higher-lag autocorrelation coefficients perform as well as the whole autocorrelation function for clean speech, and provide better performance for noisy speech, while lower-lag autocorrelation coefficients are not as effective in this respect.
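
To make the lower-lag versus higher-lag distinction in the abstract above concrete, the Python sketch below splits a one-sided autocorrelation at a boundary lag and looks for the strongest coefficient in the higher-lag part. Everything here is illustrative: the function names, the `boundary` parameter and the toy pulse-train signal are assumptions, not taken from the paper.

```python
def autocorrelation(x):
    """Biased one-sided autocorrelation r[0..N-1]."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag)) / n
            for lag in range(n)]


def split_lag_ranges(r, boundary):
    """Split an autocorrelation into lower-lag and higher-lag parts."""
    return r[:boundary], r[boundary:]


def pitch_lag_from_higher_lags(r, boundary):
    """Lag of the largest coefficient found in the higher-lag range."""
    _, higher = split_lag_ranges(r, boundary)
    best = max(range(len(higher)), key=lambda i: higher[i])
    return boundary + best


def pulse_train(period, n_pulses, shape):
    """Toy 'voiced' signal: a short pulse shape repeated every `period` samples."""
    x = [0.0] * (period * n_pulses)
    for p in range(n_pulses):
        for i, h in enumerate(shape):
            x[p * period + i] += h
    return x


# Toy usage: the pulse shape plays the role of the spectral envelope,
# the repetition period plays the role of the pitch period.
signal = pulse_train(period=20, n_pulses=10, shape=[1.0, 0.6, 0.3])
r = autocorrelation(signal)
lower, higher = split_lag_ranges(r, boundary=10)
estimated_period = pitch_lag_from_higher_lags(r, boundary=10)
```

In this toy signal the first few lags reflect only the pulse shape, while the strongest higher-lag coefficient sits at the repetition period, in line with the general belief stated in the abstract.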


Information Sciences, Signal Processing and their Applications | 2005

Spectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition

Ben James Shannon; Kuldip Kumar Paliwal

In this paper, we introduce a noise robust spectral estimation technique for speech signals that is derived from a windowed one-sided higher-lag autocorrelation sequence. We also introduce a new high dynamic range window design method, and utilise both techniques in a modified Mel Frequency Cepstral Coefficient (MFCC) algorithm to produce noise robust speech recognition features. We call the new features Autocorrelation Mel Frequency Cepstral Coefficients (AMFCCs). We compare the recognition performance of AMFCCs to MFCCs for a range of stationary and non-stationary noises on the Aurora II database. We show that the AMFCC features perform as well as MFCCs in clean conditions and have higher noise robustness in noisy conditions.


International Conference on Acoustics, Speech, and Signal Processing | 2007

Effect of Speech and Noise Cross Correlation on AMFCC Speech Recognition Features

Ben James Shannon; Kuldip Kumar Paliwal

When designing noise robust speech recognition feature extraction algorithms, it is common to assume that the noise and speech signal are uncorrelated. This assumption allows the cross correlation terms to be ignored in the equations that describe the operation of these algorithms, thus making the mathematics more tractable. In this paper, we investigate the validity of this assumption in the context of the autocorrelation mel frequency cepstral coefficient (AMFCC) feature extraction algorithm. To carry out the investigation, we designed a modified AMFCC algorithm that forces the cross terms in the noisy signal autocorrelation equation to be zero. We then compared the performance of the modified algorithm to the unmodified algorithm in recognition experiments performed using the AURORA II database. From these evaluations, we show that the assumption is fair in five of the six tested noise cases. The difference in recognition accuracy between the AMFCC and modified AMFCC for these five noises was less than 5%.
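
The cross terms mentioned in the abstract above arise because the autocorrelation of a noisy signal y = s + n expands into four parts: the speech autocorrelation, the noise autocorrelation, and two speech-noise cross-correlations. The Python sketch below computes each part separately and shows one way the cross terms could be dropped when speech and noise are available separately, as they are in a simulation. The abstract does not say how the authors forced the cross terms to zero, so this is only an assumed illustration with hypothetical function names.

```python
def cross_correlation(a, b):
    """Biased one-sided cross-correlation: (1/N) * sum_t a[t] * b[t + lag]."""
    n = len(a)
    return [sum(a[t] * b[t + lag] for t in range(n - lag)) / n
            for lag in range(n)]


def noisy_autocorrelation_terms(speech, noise):
    """Return (r_ss, r_nn, cross), where cross = r_sn + r_ns."""
    r_ss = cross_correlation(speech, speech)
    r_nn = cross_correlation(noise, noise)
    r_sn = cross_correlation(speech, noise)
    r_ns = cross_correlation(noise, speech)
    cross = [u + v for u, v in zip(r_sn, r_ns)]
    return r_ss, r_nn, cross


def autocorrelation_without_cross_terms(speech, noise):
    """Noisy-signal autocorrelation with the cross terms set to zero.

    Needs speech and noise separately, so it only applies in simulation.
    """
    r_ss, r_nn, _ = noisy_autocorrelation_terms(speech, noise)
    return [u + v for u, v in zip(r_ss, r_nn)]
```

Comparing `autocorrelation_without_cross_terms(speech, noise)` with the autocorrelation of `speech + noise` shows, lag by lag, how much the uncorrelated-signals assumption leaves out for a given frame.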


Archive | 2003

A Comparative Study of Filter Bank Spacing for Speech Recognition

Ben James Shannon; Kuldip Kumar Paliwal


Conference of the International Speech Communication Association | 2006

Role of Phase Estimation in Speech Enhancement

Ben James Shannon; Kuldip Kumar Paliwal


Conference of the International Speech Communication Association | 2004

MFCC computation from magnitude spectrum of higher lag autocorrelation coefficients for robust speech recognition

Ben James Shannon; Kuldip Kumar Paliwal


Eleventh Australasian International Conference on Speech Science and Technology | 2006

Spectral Subtraction With Variance Reduced Noise Spectrum Estimates

Kamil Wojcicki; Ben James Shannon; Kuldip Kumar Paliwal


Conference of the International Speech Communication Association | 2006

Speech Enhancement Based on Spectral Estimation from Higher-lag Autocorrelation

Ben James Shannon; Kuldip Kumar Paliwal; Climent Nadeu

Collaboration


Dive into Ben James Shannon's collaborations.

Top Co-Authors


Climent Nadeu

Polytechnic University of Catalonia
