Jeih-weih Hung
National Chi Nan University
Publications
Featured research published by Jeih-weih Hung.
IEEE Transactions on Audio, Speech, and Language Processing | 2006
Jeih-weih Hung; Lin-Shan Lee
Linear discriminant analysis (LDA) has long been used to derive data-driven temporal filters in order to improve the robustness of speech features used in speech recognition. In this paper, we propose the use of two new optimization criteria, principal component analysis (PCA) and minimum classification error (MCE), for constructing the temporal filters. Detailed comparative performance analysis for the features obtained using the three optimization criteria, LDA, PCA, and MCE, with various types of noise and a wide range of SNR values is presented. It was found that the new criteria lead to superior performance over the original MFCC features, just as the LDA-derived filters do. In addition, the newly proposed MCE-derived filters can often do better than the LDA-derived filters. Also, it is shown that further performance improvements are achievable if any of these LDA/PCA/MCE-derived filters are integrated with the conventional approach of cepstral mean and variance normalization (CMVN). The performance improvements obtained in recognition experiments are further supported by analyses conducted using two different distance measures.
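A minimal numpy sketch of the data-driven temporal-filter idea, using only the PCA criterion (the LDA and MCE derivations are not reproduced): sliding windows of one cepstral-coefficient trajectory are collected from training data, and the first principal component of those windows is taken as the FIR filter. The window length and the toy data below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def pca_temporal_filter(train_trajs, win_len=11):
    """Derive a temporal FIR filter as the first principal component of
    sliding windows taken from cepstral-coefficient trajectories.

    train_trajs : list of 1-D arrays, each the trajectory of one cepstral
                  coefficient over the frames of a training utterance.
    """
    windows = []
    for traj in train_trajs:
        for t in range(len(traj) - win_len + 1):
            windows.append(traj[t:t + win_len])
    X = np.asarray(windows, dtype=float)
    X -= X.mean(axis=0)                        # zero-mean before PCA
    cov = X.T @ X / len(X)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    return eigvecs[:, -1]                      # first principal component

def apply_temporal_filter(traj, fir):
    # Filter one coefficient trajectory along the time axis.
    return np.convolve(traj, fir, mode="same")

# Toy usage with random "trajectories" standing in for real MFCC streams.
rng = np.random.default_rng(0)
train = [rng.standard_normal(200) for _ in range(20)]
fir = pca_temporal_filter(train)
filtered = apply_temporal_filter(train[0], fir)
```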
International Conference on Acoustics, Speech, and Signal Processing | 2014
Hao-Teng Fan; Jeih-weih Hung; Xugang Lu; Syu-Siang Wang; Yu Tsao
The conventional NMF-based speech enhancement algorithm analyzes the magnitude spectrograms of both clean speech and noise in the training data via NMF and estimates a set of spectral basis vectors. These basis vectors are used to span a space to approximate the magnitude spectrogram of the noise-corrupted testing utterances. Finally, the components associated with the clean-speech spectral basis vectors are used to construct the updated magnitude spectrogram, producing an enhanced speech utterance. Considering that rich spectral-temporal structure can be exploited in local frequency- and time-varying spectral patches, this study proposes a segmental NMF (SNMF) speech enhancement scheme to improve the conventional frame-wise NMF-based method. Two algorithms are derived to decompose the original nonnegative matrix associated with the magnitude spectrogram; the first algorithm operates in the spectral domain and the second in the temporal domain. With these decomposition processes, noisy speech signals can be modeled more precisely, and the speech-related spectrograms can be reconstructed more accurately than with the conventional NMF-based method. Objective evaluations using perceptual evaluation of speech quality (PESQ) indicate that the proposed SNMF strategy improves speech quality in noisy conditions and outperforms the well-known MMSE log-spectral amplitude (LSA) estimation.
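For reference, a hedged sketch of the conventional frame-wise NMF enhancement baseline that SNMF improves upon, written with plain multiplicative updates in numpy; the rank, iteration count, and Wiener-style masking step are assumptions, and the paper's segmental (spectral/temporal patch) decompositions are not reproduced.

```python
import numpy as np

def nmf(V, rank, iters=200, eps=1e-9, W=None, rng=None):
    """Multiplicative-update NMF: V ~ W H with nonnegative entries.
    If W is given, it is held fixed and only H is updated."""
    if rng is None:
        rng = np.random.default_rng(0)
    n_freq, n_frames = V.shape
    fixed_W = W is not None
    if W is None:
        W = rng.random((n_freq, rank)) + eps
    H = rng.random((W.shape[1], n_frames)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        if not fixed_W:
            W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def nmf_enhance(noisy_mag, clean_mag_train, noise_mag_train, rank=32):
    """Baseline frame-wise NMF enhancement: learn clean-speech and noise
    spectral bases from training magnitude spectrograms, decompose the noisy
    spectrogram over their union, and keep the clean-speech part via a
    Wiener-style mask."""
    W_s, _ = nmf(clean_mag_train, rank)
    W_n, _ = nmf(noise_mag_train, rank)
    W = np.hstack([W_s, W_n])
    _, H = nmf(noisy_mag, rank=None, W=W)      # rank unused when W is fixed
    speech_part = W_s @ H[:rank]
    total = W @ H + 1e-9
    return noisy_mag * (speech_part / total)   # masked magnitude spectrogram
```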
IEEE Automatic Speech Recognition and Understanding Workshop | 2001
Shang-Ming Lee; Shi-Hau Fang; Jeih-weih Hung; Lin-Shan Lee
Although Mel-frequency cepstral coefficients (MFCC) have been proven to perform very well under most conditions, only limited efforts have been made to optimize the shapes of the filters in the filter-bank of the conventional MFCC approach. This paper presents a new feature extraction approach that designs the shapes of the filters in the filter-bank. In this new approach, the filter-bank coefficients are data-driven and obtained by applying principal component analysis (PCA) to the FFT spectra of the training data. The experimental results show that this method is robust in noisy environments and combines well with other noise-handling techniques.
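A small sketch of this idea under simplifying assumptions: PCA is applied to training-frame power spectra, and the leading eigenvectors are used as filter shapes in place of the fixed triangular Mel filters. The number of filters and the spectral representation are placeholders; the paper's exact post-processing (e.g., compression and decorrelation) is not reproduced.

```python
import numpy as np

def pca_filterbank(train_power_spectra, n_filters=23):
    """Data-driven filter-bank: leading principal components of the training
    frames' FFT power spectra serve as filter shapes.
    train_power_spectra: (n_frames, n_bins)."""
    X = train_power_spectra - train_power_spectra.mean(axis=0)
    cov = X.T @ X / len(X)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
    return eigvecs[:, ::-1][:, :n_filters].T        # (n_filters, n_bins)

def apply_filterbank(power_spectrum, fbank):
    # "Filter-bank outputs" for one frame's power spectrum.
    return fbank @ power_spectrum
```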
IEEE Transactions on Audio, Speech, and Language Processing | 2008
Jeih-weih Hung; Wei-Yi Tsai
Data-driven temporal filtering approaches based on a specific optimization technique have been shown to be capable of enhancing the discrimination and robustness of speech features in speech recognition. The filters in these approaches are often obtained with the statistics of the features in the temporal domain. In this paper, we derive new data-driven temporal filters that employ the statistics of the modulation spectra of the speech features. Three new temporal filtering approaches are proposed, based on constrained versions of linear discriminant analysis (LDA), principal component analysis (PCA), and minimum class distance (MCD), respectively. It is shown that these proposed temporal filters can effectively improve the speech recognition accuracy in various noise-corrupted environments. In experiments conducted on Test Set A of the Aurora-2 noisy digits database, these new temporal filters, together with cepstral mean and variance normalization (CMVN), provide average relative error reduction rates of over 40% and 27% when compared with baseline Mel frequency cepstral coefficient (MFCC) processing and CMVN alone, respectively.
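The sketch below illustrates only the application side under stated assumptions: CMVN followed by a weighting of each coefficient trajectory in the modulation-frequency domain. The band edges and gains are placeholders, and the constrained LDA/PCA/MCD derivation of the weights is not reproduced here.

```python
import numpy as np

def cmvn(feats):
    """Cepstral mean and variance normalization, per utterance.
    feats: (n_frames, n_coeffs)."""
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-9)

def filter_in_modulation_domain(feats, gains):
    """Weight each coefficient trajectory in the modulation-frequency domain.
    `gains` is a real nonnegative gain per rFFT bin; deriving it with the
    paper's constrained LDA/PCA/MCD criteria is not shown."""
    spec = np.fft.rfft(feats, axis=0)          # modulation spectrum per coeff
    return np.fft.irfft(spec * gains[:, None], n=len(feats), axis=0)

# Toy usage: emphasize the 1-16 Hz modulation band of a 100 frame/s stream.
rng = np.random.default_rng(0)
feats = cmvn(rng.standard_normal((300, 13)))
mod_freqs = np.fft.rfftfreq(len(feats), d=0.01)    # 10 ms frame shift
gains = np.where((mod_freqs >= 1) & (mod_freqs <= 16), 1.0, 0.1)
filtered = filter_in_modulation_domain(feats, gains)
```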
Signal Processing | 2012
Jeih-weih Hung; Wen-hsiang Tu; Chien-chou Lai
In this paper, we present two novel algorithms to improve the noise robustness of features in speech recognition: modulation spectrum replacement (MSR) and modulation spectrum filtering (MSF). The magnitude spectra of the feature streams are updated by referring to information collected from the clean training set, and the resulting feature streams are more noise-robust and achieve higher recognition accuracy. In experiments conducted on the Aurora-2 noisy digit database, we show that the proposed MSR achieves an average relative error reduction rate of nearly 57% compared to baseline processing, and that MSF is particularly effective at enhancing features preprocessed by conventional feature normalization methods, achieving even better recognition accuracy in noise-corrupted situations.
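A hedged sketch of the MSR idea: the magnitude of each coefficient trajectory's modulation spectrum is replaced by a reference magnitude gathered from clean training data, while the noisy phase is kept. The simple length handling and averaging scheme below are assumptions, and MSF is not shown.

```python
import numpy as np

def average_clean_mag(clean_feats_list, n_frames):
    """Average magnitude modulation spectrum over clean training utterances,
    each truncated or zero-padded to n_frames (length handling simplified)."""
    mags = []
    for f in clean_feats_list:
        f = np.pad(f[:n_frames], ((0, max(0, n_frames - len(f))), (0, 0)))
        mags.append(np.abs(np.fft.rfft(f, axis=0)))
    return np.mean(mags, axis=0)

def msr(noisy_feats, clean_ref_mag):
    """Modulation spectrum replacement (sketch): keep the noisy stream's
    modulation-spectral phase but substitute the clean reference magnitude.
    noisy_feats: (n_frames, n_coeffs); clean_ref_mag: (n_rfft_bins, n_coeffs)."""
    spec = np.fft.rfft(noisy_feats, axis=0)
    phase = np.angle(spec)
    new_spec = clean_ref_mag * np.exp(1j * phase)
    return np.fft.irfft(new_spec, n=len(noisy_feats), axis=0)
```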
IEEE Transactions on Speech and Audio Processing | 2001
Jeih-weih Hung; Jia-Lin Shen; Lin-Shan Lee
Parallel model combination (PMC) techniques have been very successful and widely used in many applications to improve the performance of speech recognition systems in noisy environments. However, it is believed that some assumptions and approximations made in this approach, primarily in the domain transformation and parameter combination processes, are not necessarily accurate enough in certain practical situations, which may degrade the achievable performance of PMC. In this paper, the possible sources of performance degradation in these processes are carefully analyzed and discussed. Three new approaches, including the truncated Gaussian approach and the split mixture approach for the domain transformation process and the estimated cross-term approach for the parameter combination process, are proposed in order to handle these problems, minimize such degradation, and improve the accuracy of the PMC techniques. These proposed approaches were analyzed and evaluated on two recognition tasks, one relatively simple and the other more complicated and realistic. Both sets of experiments showed that the proposed approaches provide significant improvements over the original PMC method, especially under lower SNR conditions.
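For context, a sketch of the standard log-normal PMC combination that the paper refines, written for diagonal-covariance Gaussians in the log-spectral domain; the truncated-Gaussian, split-mixture, and estimated cross-term refinements proposed in the paper are not reproduced.

```python
import numpy as np

def pmc_lognormal(mu_s, var_s, mu_n, var_n, gain=1.0):
    """Standard log-normal PMC: combine a clean-speech Gaussian and a noise
    Gaussian (diagonal covariance, log-spectral domain) by mapping to the
    linear domain, adding their moments, and mapping back."""
    # Log-normal moments in the linear spectral domain.
    m_s = np.exp(mu_s + var_s / 2.0)
    v_s = m_s ** 2 * (np.exp(var_s) - 1.0)
    m_n = np.exp(mu_n + var_n / 2.0)
    v_n = m_n ** 2 * (np.exp(var_n) - 1.0)
    # Speech and additive noise are assumed independent, so moments add.
    m_c = gain * m_s + m_n
    v_c = gain ** 2 * v_s + v_n
    # Back to the log-spectral domain under the log-normal assumption.
    var_c = np.log1p(v_c / m_c ** 2)
    mu_c = np.log(m_c) - var_c / 2.0
    return mu_c, var_c
```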
IEEE Signal Processing Letters | 2009
Jeih-weih Hung; Hao-Teng Fan
This letter proposes a novel scheme that applies feature statistics normalization techniques for robust speech recognition. In the proposed approach, the processed temporal-domain feature sequence is first decomposed into nonuniform subbands using the discrete wavelet transform (DWT), and then each subband stream is individually processed by well-known normalization methods, such as mean and variance normalization (MVN) and histogram equalization (HEQ). Finally, we reconstruct the feature stream with all of the modified subband streams using the inverse DWT. With this process, the components that correspond to more important modulation spectral bands in the feature sequence can be processed separately.
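A minimal sketch of the scheme using PyWavelets, with MVN as the per-subband normalizer; the wavelet family, decomposition level, and the choice to normalize the DWT coefficient streams directly are assumptions, and the HEQ variant is not shown.

```python
import numpy as np
import pywt  # PyWavelets

def subband_mvn(traj, wavelet="db4", level=3):
    """Decompose one feature-coefficient trajectory into nonuniform subbands
    with the DWT, apply mean-and-variance normalization to each subband
    stream separately, and reconstruct with the inverse DWT."""
    coeffs = pywt.wavedec(traj, wavelet, level=level)
    normed = [(c - c.mean()) / (c.std() + 1e-9) for c in coeffs]
    return pywt.waverec(normed, wavelet)[:len(traj)]
```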
International Conference on Acoustics, Speech, and Signal Processing | 2002
Jeih-weih Hung; Lin-Shan Lee
In deriving data-driven temporal filters for speech features, linear discriminant analysis (LDA) and principal component analysis (PCA) have been shown to be successful in improving feature robustness. In this paper, it is proposed that the criterion of minimum classification error (MCE) can also be used to obtain data-driven temporal filters. Two versions of MCE-derived temporal filters, feature-based and model-based, are proposed, and it is shown that both of them can significantly improve the recognition performance of the original MFCC features as the LDA/PCA-derived filters do. Detailed comparative analysis among the different temporal filtering approaches is presented. It is also shown that the proposed MCE filters can be integrated with the conventional temporal filters, RASTA or CMS, to obtain improved recognition performance regardless of whether the training and testing environments are matched or mismatched, compressed or noise-corrupted.
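A toy sketch of the feature-based MCE idea under strong simplifications: filtered values are scored against scalar class means, the usual sigmoid misclassification loss is averaged, and the filter is updated by a numerical gradient. The paper's GPD updates and HMM-based class models are much richer and are not reproduced here.

```python
import numpy as np

def mce_loss(h, windows, labels, class_means, gamma=1.0):
    """Smoothed classification-error loss for a candidate temporal filter h.
    Each row of `windows` is a length-L segment of one coefficient trajectory;
    the filtered value y = windows @ h is scored against per-class scalar
    means by negative squared distance, and the MCE sigmoid loss is formed."""
    y = windows @ h
    scores = -(y[:, None] - class_means[None, :]) ** 2   # (n, n_classes)
    idx = np.arange(len(y))
    correct = scores[idx, labels]
    rivals = scores.copy()
    rivals[idx, labels] = -np.inf
    d = rivals.max(axis=1) - correct                     # misclassification measure
    return 1.0 / (1.0 + np.exp(-gamma * d))              # per-sample sigmoid loss

def mce_filter(windows, labels, class_means, lr=0.05, iters=300, eps=1e-4):
    """Derive the filter by plain numerical-gradient descent on the MCE loss."""
    h = np.zeros(windows.shape[1])
    h[len(h) // 2] = 1.0                                 # start from identity filter
    for _ in range(iters):
        base = mce_loss(h, windows, labels, class_means).mean()
        grad = np.zeros_like(h)
        for i in range(len(h)):
            h2 = h.copy()
            h2[i] += eps
            grad[i] = (mce_loss(h2, windows, labels, class_means).mean() - base) / eps
        h -= lr * grad
    return h
```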
IEEE Signal Processing Letters | 2009
Jeih-weih Hung; Wen-hsiang Tu
Cepstral statistics normalization techniques have been shown to be very successful at improving the noise robustness of speech features. This letter proposes a hybrid-based scheme to achieve a more accurate estimate of the statistical information of features in these techniques. By properly integrating codebook and utterance knowledge, the resulting hybrid-based approach significantly outperforms conventional utterance-based, segment-based and codebook-based approaches in additive noise environments. Furthermore, the resulting high-performance method, CS-HEQ, can be implemented with a short delay and can thus be applied in real-time online systems.
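A hedged sketch of the hybrid-statistics idea for the simpler MVN case: the utterance's own mean and variance are blended with codebook-derived statistics before normalization. The blending weight and the MVN (rather than HEQ) setting are assumptions.

```python
import numpy as np

def hybrid_mvn(feats, cb_mean, cb_var, alpha=0.5):
    """MVN with hybrid statistics: blend the utterance's own mean/variance
    with mean/variance pre-computed from a training-data codebook.
    feats: (n_frames, n_coeffs); alpha is an illustrative blending weight."""
    utt_mean, utt_var = feats.mean(axis=0), feats.var(axis=0)
    mean = alpha * utt_mean + (1 - alpha) * cb_mean
    var = alpha * utt_var + (1 - alpha) * cb_var
    return (feats - mean) / (np.sqrt(var) + 1e-9)
```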
International Conference on Acoustics, Speech, and Signal Processing | 2003
Ni-chun Wang; Jeih-weih Hung; Lin-Shan Lee
It was previously proposed to use principal component analysis (PCA) to derive data-driven temporal filters for obtaining robust features in speech recognition, in which the first principal components are taken as the filter coefficients. In this paper, a multi-eigenvector approach is proposed instead, in which the first M eigenvectors obtained in PCA are weighted by their corresponding eigenvalues and summed to form the filter coefficients. Experimental results show that the multi-eigenvector filters offer significantly better recognition performance than the previously proposed PCA-derived filters under all conditions tested on the Aurora-2 database, especially when the training and testing environments are highly mismatched.
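A short sketch of the multi-eigenvector construction, reusing the windowed-trajectory setup from the PCA sketch earlier in this list; M and the simple covariance estimate are illustrative assumptions.

```python
import numpy as np

def multi_eigenvector_filter(windows, M=3):
    """Multi-eigenvector temporal filter (sketch): take the first M
    eigenvectors of the windowed-trajectory covariance, weight each by its
    eigenvalue, and sum them to form the FIR filter coefficients.
    windows: (n_windows, win_len)."""
    X = windows - windows.mean(axis=0)
    cov = X.T @ X / len(X)
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending order
    top_vals = eigvals[::-1][:M]
    top_vecs = eigvecs[:, ::-1][:, :M]
    return (top_vecs * top_vals).sum(axis=1)     # eigenvalue-weighted sum
```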