Chai Kiat Yeo
Nanyang Technological University
Publication
Featured research published by Chai Kiat Yeo.
Speech Communication | 1998
Ing Yann Soon; Soo Ngee Koh; Chai Kiat Yeo
Abstract This paper illustrates the advantages of using the Discrete Cosine Transform (DCT) over the standard Discrete Fourier Transform (DFT) for removing noise embedded in a speech signal. The derivation of the Minimum Mean Square Error (MMSE) filter based on statistical modelling of the DCT coefficients is shown, as is the derivation of an over-attenuation factor based on the fact that speech energy is not present in the noisy signal at all times or in all coefficients. This over-attenuation factor is useful in suppressing any musical residual noise which may be present. The proposed methods are evaluated against the noise reduction filter proposed by Y. Ephraim and D. Malah (1984), using both Gaussian-distributed white noise and recorded fan noise, with favourable results.
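The DCT-domain filtering the abstract describes can be sketched as follows: transform a frame, apply an MMSE/Wiener-style gain per coefficient, and floor the gain where speech is unlikely so musical residual noise is suppressed. This is an illustrative sketch, not the paper's exact filter; the spectral floor value and the crude SNR estimate are assumptions.

```python
import numpy as np

def dct_ii(x):
    # Orthonormal DCT-II via an explicit basis matrix (numpy has no built-in DCT)
    n = len(x)
    k = np.arange(n)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    scale = np.sqrt(2.0 / n) * np.ones(n)
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ x)

def idct_ii(c):
    # Inverse of the orthonormal DCT-II above (transpose of the forward matrix)
    n = len(c)
    k = np.arange(n)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    scale = np.sqrt(2.0 / n) * np.ones(n)
    scale[0] = np.sqrt(1.0 / n)
    return basis.T @ (scale * c)

def denoise_frame(noisy, noise_var, floor=0.1):
    c = dct_ii(noisy)
    # Wiener-style gain per DCT coefficient from a crude a priori SNR estimate
    snr = np.maximum(c ** 2 / noise_var - 1.0, 0.0)
    gain = snr / (1.0 + snr)
    # Floor the gain where speech is unlikely to be present; `floor` is a
    # tuning choice standing in for the paper's over-attenuation factor
    gain = np.maximum(gain, floor)
    return idct_ii(gain * c)
```

Applied to a sinusoid in white noise, the attenuated coefficients carry mostly noise, so the mean square error drops relative to the noisy input.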
Signal Processing | 1999
Ing Yann Soon; Soo Ngee Koh; Chai Kiat Yeo
In this paper, two estimators of the probability of speech absence are derived using the common assumption that the Fourier coefficients of a frame of speech and noise samples are statistically independent Gaussian random variables (Ephraim and Malah, 1984; McAulay and Malpass, 1980). The estimators are obtained directly from the noisy speech itself. The first estimator is obtained by binary classification of the received spectral amplitude into speech present or speech absent states. The second estimator is obtained by deriving the conditional probability of speech absence given the received spectral amplitude. Each of the time-adaptive estimators produces an estimate of the probability of speech absence for each spectral frequency. The estimated probability will be higher during speech periods and lower during silence periods. The estimated probability can be fed directly to any filter which requires such an estimate, e.g. the Ephraim and Malah noise suppressor (Ephraim and Malah, 1984), and the modified power subtraction method (Scalart and Vieira Filho, 1996), with significant improvements for various noise types. © 1999 Elsevier Science B.V. All rights reserved.
Speech Communication | 2009
Huijun Ding; Ing Yann Soon; Soo Ngee Koh; Chai Kiat Yeo
ieee region 10 conference | 1997
Ing Yann Soon; Soo Ngee Koh; Chai Kiat Yeo
It is well known that speech enhancement using spectral filtering will result in residual noise. Residual noise which is musical in nature is very annoying to human listeners. Many speech enhancement approaches assume that the transform coefficients are independent of one another and can thus be attenuated separately, thereby ignoring the correlations that exist between different time frames and within each frame. This paper proposes a single channel speech enhancement system which exploits such correlations between different time frames to further reduce residual noise. Unlike other 2D speech enhancement techniques which apply a post-processor after a classical algorithm such as spectral subtraction, the proposed approach uses a hybrid Wiener spectrogram filter (HWSF) for effective noise reduction, followed by a multi-blade post-processor which exploits the 2D features of the spectrogram to preserve speech quality and further reduce residual noise. This results in pleasant-sounding speech for human listeners. Spectrogram comparisons show that musical noise is significantly reduced in the proposed scheme. The effectiveness of the proposed algorithm is further confirmed through objective assessments and informal subjective listening tests.
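The inter-frame correlation argument above can be illustrated with a minimal stand-in: averaging a time-frequency gain matrix over a small 2D neighbourhood, so that isolated gain spikes (which sound like musical noise) are attenuated while consistent speech regions survive. This is not the authors' HWSF or multi-blade post-processor, just a sketch of the underlying idea; the neighbourhood radii are assumed values.

```python
import numpy as np

def smooth_gain_2d(gain, t_radius=1, f_radius=1):
    """Average a time-frequency gain matrix over a (2*f_radius+1) x
    (2*t_radius+1) neighbourhood.  `gain` has shape (frequency_bins, frames).
    Musical noise shows up as isolated gain spikes; averaging across
    neighbouring frames and bins exploits the 2D correlation of speech."""
    padded = np.pad(gain, ((f_radius, f_radius), (t_radius, t_radius)),
                    mode="edge")
    out = np.zeros_like(gain)
    norm = (2 * t_radius + 1) * (2 * f_radius + 1)
    for df in range(2 * f_radius + 1):
        for dt in range(2 * t_radius + 1):
            out += padded[df:df + gain.shape[0], dt:dt + gain.shape[1]]
    return out / norm
```

A lone unit spike in an otherwise zero gain map is reduced to 1/9 of its value by the default 3x3 window, while a uniform gain map passes through unchanged.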
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Huijun Ding; Ing Yann Soon; Chai Kiat Yeo
This paper presents the use of the wavelet transform for noise reduction in noisy speech signals. Different wavelets and different orders have been evaluated for their suitability as transforms for speech noise removal; the wavelets evaluated are the biorthogonal wavelets, Daubechies wavelets, coiflets and symlets. Two different means of filtering the noise in the transformed coefficients are also presented. The first method is based on magnitude subtraction while the second is based on the Wiener filter with a priori signal-to-noise ratio estimation.
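The magnitude-subtraction variant can be sketched with the simplest wavelet, a one-level Haar transform standing in for the biorthogonal/Daubechies/coiflet/symlet families the paper evaluates: each detail coefficient is shrunk toward zero by the estimated noise magnitude. This is an illustration under that substitution, not the paper's implementation.

```python
import numpy as np

def haar_dwt(x):
    # One-level orthonormal Haar transform (even-length input assumed)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
    return a, d

def haar_idwt(a, d):
    # Perfect-reconstruction inverse of haar_dwt
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def denoise_magnitude_subtraction(x, noise_level):
    # Magnitude subtraction in the wavelet domain: shrink each detail
    # coefficient toward zero by the estimated noise magnitude (soft threshold)
    a, d = haar_dwt(x)
    d = np.sign(d) * np.maximum(np.abs(d) - noise_level, 0.0)
    return haar_idwt(a, d)
```

With `noise_level = 0` the signal is reconstructed exactly; a large noise level zeroes all detail coefficients, leaving only the smooth approximation.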
IEEE Transactions on Audio, Speech, and Language Processing | 2010
Huijun Ding; Ing Yann Soon; Chai Kiat Yeo
Discrete cosine transform (DCT) has been proven to be a good approximation to the Karhunen-Loeve Transform (KLT) and has similar properties to the discrete Fourier transform (DFT). It also possesses a better energy compaction capability, which is advantageous for speech enhancement. However, frame-to-frame variations of DCT coefficients can be observed even for a perfectly stationary signal. Therefore, a DCT-based speech enhancement system with pitch synchronous analysis is proposed to overcome this problem. It avoids the drawbacks of a fixed window shift: the shift of the analysis window is now based on the pitch period, thus increasing inter-frame similarities. Furthermore, a Wiener filter using the a priori signal-to-noise ratio (SNR) with an adaptive parameter is also derived and implemented as an advanced noise reduction filter. The proposed speech enhancement system is evaluated in terms of several objective measures and the experimental results demonstrate its good performance.
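The a priori SNR Wiener filter mentioned above is commonly driven by the Ephraim-Malah decision-directed estimate; a sketch of that recursion follows. The paper's adaptive parameter is approximated here by a fixed smoothing constant `alpha = 0.98` (an assumed value), and the pitch-synchronous framing is not reproduced.

```python
import numpy as np

def wiener_gains(frames_power, noise_power, alpha=0.98):
    """Decision-directed a priori SNR estimate feeding a Wiener gain.

    frames_power: (n_frames, n_bins) noisy power per DCT/DFT bin.
    noise_power:  (n_bins,) noise power estimate.
    alpha:        decision-directed smoothing constant (fixed stand-in for
                  the paper's adaptive parameter)."""
    gains = np.zeros_like(frames_power)
    prev_clean = np.zeros(frames_power.shape[1])
    for t, p in enumerate(frames_power):
        snr_post = np.maximum(p / noise_power - 1.0, 0.0)
        # Mix the previous frame's clean-power estimate with the current
        # instantaneous estimate (the decision-directed recursion)
        snr_prio = alpha * prev_clean / noise_power + (1 - alpha) * snr_post
        g = snr_prio / (1.0 + snr_prio)
        gains[t] = g
        prev_clean = (g ** 2) * p   # clean power estimate for the next frame
    return gains
```

Gains stay in [0, 1); a bin carrying strong speech converges toward a gain near one over a few frames, while a noise-only bin stays near zero.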
international conference on information and communication security | 2009
Peng Dai; Ing Yann Soon; Chai Kiat Yeo
Despite the quality improvement of the speech signal with most traditional noise reduction (TNR) algorithms, the output is always distorted to some extent due to the over-attenuation of speech components. Weak speech components are usually regarded as noise in noise reduction processing and are therefore highly suppressed. In this paper, we propose a postprocessing technique which is based on the regeneration of both the voiced and unvoiced speech in the entire frequency domain to reduce this problem. A nonlinear transform is first applied to obtain the excitation signal, and a smooth envelope is then estimated. To utilize the information of the clean speech contained in the envelope, we combine the original TNR filter output with a weighted product of the excitation signal and the estimated envelope to generate the final synthesized speech. The synthesized speech is quite close to the clean speech and is more natural-sounding. Moreover, our algorithm can mask the residual musical noise effectively with the regenerated speech components. Experimental results demonstrate the excellent performance of our algorithm. In addition, we introduce two novel objective measures and further show the efficiency of our algorithm in maintaining the clean speech while reducing the noise as much as possible.
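The regeneration pipeline described above (nonlinear transform, envelope estimate, weighted recombination) can be sketched in a few lines. Half-wave rectification is one standard choice of regenerating nonlinearity and a moving-average amplitude envelope is a simplification; both, along with the mixing weight, are assumptions rather than the paper's exact design.

```python
import numpy as np

def regenerate(filtered, weight=0.2, env_win=32):
    """Harmonic-regeneration post-processing sketch on a TNR filter output.

    weight and env_win are hypothetical tuning values."""
    # Nonlinear transform: half-wave rectification recreates harmonics of
    # over-attenuated voiced components
    excitation = np.maximum(filtered, 0.0)
    # Smooth amplitude envelope estimated by moving average of |filtered|
    kernel = np.ones(env_win) / env_win
    envelope = np.convolve(np.abs(filtered), kernel, mode="same")
    regenerated = excitation * envelope
    # Normalise the regenerated part so the mix keeps the original scale
    scale = np.max(np.abs(filtered)) / (np.max(np.abs(regenerated)) + 1e-12)
    # Weighted combination of the TNR output and the regenerated speech
    return (1 - weight) * filtered + weight * scale * regenerated
```

With `weight = 0` the TNR output passes through unchanged; increasing the weight mixes in more regenerated energy, which is what masks the residual musical noise.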
Speech Communication | 2015
Huijun Ding; Tan Lee; Ing Yann Soon; Chai Kiat Yeo; Peng Dai; Guo Dan
One of the weaknesses of speech recognition systems is their lack of robustness to background noise compared to human listeners under similar conditions. This paper proposes a 2D psychoacoustic modeling algorithm which is integrated with the feature extraction front-end of a hidden Markov model (HMM) recogniser. The proposed algorithm incorporates properties of the human auditory system and applies them to the speech recognition system to enhance its robustness: it integrates forward masking, lateral inhibition and cepstral mean normalization into the ordinary mel-frequency cepstral coefficient (MFCC) feature extraction algorithm. Experiments carried out on the AURORA2 database show that the word recognition rate can be improved significantly at low computational cost.
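The three ingredients named above can be sketched as simple operators on a cepstral feature matrix. The masking and inhibition operators here are deliberately simplified stand-ins for the paper's 2D psychoacoustic filters, with assumed decay and inhibition constants; only cepstral mean normalization is the standard textbook form.

```python
import numpy as np

def robust_features(cepstra, fm_decay=0.5, li_strength=0.2):
    """Robustness front-end sketch on an (n_frames, n_coeffs) cepstral matrix.

    fm_decay and li_strength are hypothetical constants."""
    out = np.array(cepstra, dtype=float)
    # (1) Forward masking: earlier loud frames suppress later ones via an
    #     exponentially decaying running masker
    masker = np.zeros(out.shape[1])
    for t in range(out.shape[0]):
        out[t] = out[t] - fm_decay * masker
        masker = np.maximum(fm_decay * masker, out[t])
    # (2) Lateral inhibition: each coefficient is inhibited by its neighbour
    out[:, 1:] -= li_strength * out[:, :-1]
    # (3) Cepstral mean normalization over the utterance
    return out - out.mean(axis=0, keepdims=True)
```

After step (3) each coefficient has zero mean over the utterance, which removes stationary convolutional distortion such as a fixed channel response.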
international conference on acoustics, speech, and signal processing | 2009
Huijun Ding; Ing Yann Soon; Soo Ngee Koh; Chai Kiat Yeo
Three objective measures are proposed to separately evaluate the speech distortion, noise reduction and overall quality of a speech signal enhanced by single channel speech enhancement algorithms. The proposed measures are derived in both time and frequency domains. The high correlations between the results of the proposed approaches and subjective ratings demonstrate the effectiveness of the proposed evaluation methodology, and the analysis reveals some links between signal parameters and perceptual judgements. Among all the existing objective measures, few are able to provide a clearly specific indication of speech distortion or noise reduction, which are the two key metrics used to assess the performance of speech enhancement algorithms and evaluate noise-suppressed speech quality. In this paper, new quantitative quality assessments are proposed to separately evaluate the capabilities of single channel speech enhancement algorithms in terms of maintaining the clean speech, noise reduction and overall performance. Based on these aspects, three evaluation results can be provided for any test speech signal by analyzing the residual signal, which is the difference between the clean speech and the processed speech. Several common speech enhancement algorithms are compared by these objective measures as well as subjective listening tests. High correlations between the scores of the objective measures and subjective ratings clearly show the effectiveness of the proposed evaluation methodologies on the different speech enhancement algorithms.
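The residual-signal idea can be made concrete: with access to the clean reference, score speech-active frames for distortion and silent frames for noise reduction. The formulas and the energy-based activity decision below are illustrative assumptions, not the paper's actual measures.

```python
import numpy as np

def residual_measures(clean, processed, noisy, frame=160):
    """Residual-signal evaluation sketch.

    Returns (distortion, reduction_db): residual energy relative to clean
    energy over speech-active frames, and noise energy before vs after
    enhancement (in dB) over silent frames."""
    residual = clean - processed
    n = (len(clean) // frame) * frame
    def frames_of(x):
        return x[:n].reshape(-1, frame)
    ce = (frames_of(clean) ** 2).sum(axis=1)
    active = ce > 0.1 * ce.mean()          # crude speech-activity decision
    re = (frames_of(residual) ** 2).sum(axis=1)
    ne_before = ((frames_of(noisy) - frames_of(clean)) ** 2).sum(axis=1)
    ne_after = ((frames_of(processed) - frames_of(clean)) ** 2).sum(axis=1)
    distortion = re[active].sum() / ce[active].sum()
    reduction_db = 10 * np.log10(ne_before[~active].sum()
                                 / (ne_after[~active].sum() + 1e-12))
    return distortion, reduction_db
```

For an enhancer that scales the noise down by a factor of ten while leaving the speech intact, this yields a small distortion score and a noise reduction of about 20 dB.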
Electronics Letters | 2002
Ing Yann Soon; Soo Ngee Koh; Chai Kiat Yeo; W.H. Ngo
Despite the success of recent speech enhancement algorithms, the enhanced signals still suffer from undesirable speech distortion caused by over-attenuation of weak speech spectral components. In this paper, a post-processing technique based on the regeneration of both voiced and unvoiced speech is proposed to alleviate this problem. A non-linear transformation is first applied to the Wiener-filtered speech and the transformed signal is multiplied by a pre-estimated spectral envelope to form the regenerated speech. The resulting speech is then obtained as a weighted combination of the regenerated speech components and the filtered speech. This process significantly improves the resulting speech quality over the original filtered version: the speech sounds less low-pass filtered, and the residual musical noise is significantly masked by the regenerated speech components. Objective measures show that the quality of the resulting speech is much closer to the clean speech than the original Wiener-filtered speech.