Publication


Featured research published by Hao-Teng Fan.


International Conference on Acoustics, Speech, and Signal Processing | 2014

Speech enhancement using segmental nonnegative matrix factorization

Hao-Teng Fan; Jeih-weih Hung; Xugang Lu; Syu-Siang Wang; Yu Tsao

The conventional NMF-based speech enhancement algorithm analyzes the magnitude spectrograms of both clean speech and noise in the training data via NMF and estimates a set of spectral basis vectors. These basis vectors are used to span a space that approximates the magnitude spectrograms of noise-corrupted testing utterances. Finally, the components associated with the clean-speech spectral basis vectors are used to construct the updated magnitude spectrogram, producing an enhanced speech utterance. Considering that rich spectral-temporal structure can be exploited in local frequency- and time-varying spectral patches, this study proposes a segmental NMF (SNMF) speech enhancement scheme to improve on the conventional frame-wise NMF-based method. Two algorithms are derived to decompose the original nonnegative matrix associated with the magnitude spectrogram; the first operates in the spectral domain and the second in the temporal domain. With these decomposition processes, noisy speech signals can be modeled more precisely, and the speech-part spectrograms can be reconstructed more accurately than with the conventional NMF-based method. Objective evaluations using the perceptual evaluation of speech quality (PESQ) measure indicate that the proposed SNMF strategy increases speech quality in noisy conditions and outperforms the well-known MMSE log-spectral amplitude (LSA) estimator.
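
As a concrete reference point, here is a minimal numpy sketch of the conventional frame-wise NMF enhancement pipeline that SNMF builds on; the Euclidean-cost multiplicative updates, the Wiener-style masking step, and all function names and parameters are illustrative assumptions rather than the paper's exact formulation. The segmental variant would apply the same steps to local spectral or temporal patches of the spectrogram.

    import numpy as np

    def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
        # Multiplicative-update NMF (Euclidean cost): V (freq x frames) ~ W @ H
        rng = np.random.default_rng(seed)
        W = rng.random((V.shape[0], rank)) + eps
        H = rng.random((rank, V.shape[1])) + eps
        for _ in range(n_iter):
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            W *= (V @ H.T) / (W @ H @ H.T + eps)
        return W, H

    def nmf_enhance(V_noisy, W_speech, W_noise, n_iter=200, eps=1e-10, seed=0):
        # Encode the noisy magnitude spectrogram on the fixed joint dictionary
        # [W_speech | W_noise], then keep only the speech components.
        W = np.concatenate([W_speech, W_noise], axis=1)
        rng = np.random.default_rng(seed)
        H = rng.random((W.shape[1], V_noisy.shape[1])) + eps
        for _ in range(n_iter):
            H *= (W.T @ V_noisy) / (W.T @ W @ H + eps)   # W stays fixed
        k = W_speech.shape[1]
        V_speech = W_speech @ H[:k]
        mask = V_speech / (W @ H + eps)   # Wiener-style mask from the two parts
        return mask * V_noisy             # enhanced magnitude spectrogram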


IEEE Signal Processing Letters | 2009

Subband Feature Statistics Normalization Techniques Based on a Discrete Wavelet Transform for Robust Speech Recognition

Jeih-weih Hung; Hao-Teng Fan

This letter proposes a novel scheme that applies feature statistics normalization techniques for robust speech recognition. In the proposed approach, the temporal-domain feature sequence is first decomposed into nonuniform subbands using the discrete wavelet transform (DWT), and then each subband stream is individually processed by well-known normalization methods, such as mean and variance normalization (MVN) and histogram equalization (HEQ). Finally, the feature stream is reconstructed from all of the modified subband streams using the inverse DWT. With this process, the components that correspond to the more important modulation spectral bands of the feature sequence can be processed separately.
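
A minimal sketch of the subband normalization idea, using PyWavelets and per-subband MVN; the wavelet family (db4), the decomposition level, and the function names are assumptions made for illustration, and HEQ could be substituted for MVN in the same place.

    import numpy as np
    import pywt  # PyWavelets

    def mvn(x):
        # Mean-and-variance normalization of one coefficient stream
        return (x - x.mean()) / (x.std() + 1e-10)

    def dwt_subband_mvn(track, wavelet="db4", level=3):
        # Decompose one cepstral trajectory into DWT subbands, normalize each
        # subband separately, then rebuild the trajectory with the inverse DWT.
        coeffs = pywt.wavedec(track, wavelet, level=level)
        coeffs = [mvn(c) for c in coeffs]
        rec = pywt.waverec(coeffs, wavelet)
        return rec[: len(track)]  # waverec may pad the output by one sample

    def normalize_features(features):
        # features: (n_frames, n_ceps); process each cepstral dimension independently
        return np.column_stack([dwt_subband_mvn(features[:, c])
                                for c in range(features.shape[1])])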


International Conference on System Science and Engineering | 2012

Enhancing the sub-band modulation spectra of speech features via nonnegative matrix factorization for robust speech recognition

Hao-Teng Fan; Yi-chang Tsai; Jeih-weih Hung

In this paper, we propose to enhance the noise robustness of the modulation spectra of speech features via nonnegative matrix factorization (NMF). With NMF, a set of nonnegative spectral basis vectors is derived from clean speech to represent the components that are important for speech recognition. In contrast to the original NMF-based scheme, which employs an iterative search to update the full-band modulation spectra, we propose to apply an orthogonal projection to update the low sub-band modulation spectra. The presented process significantly reduces the computational complexity without degrading recognition performance. In experiments conducted on the Aurora-2 database, we show that the presented NMF-based approach provides an average relative error reduction of over 65% compared with the baseline MFCC system.
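
The following sketch illustrates a low sub-band update by orthogonal projection, assuming a nonnegative basis W (shape low_bins x rank) has already been learned from clean training data; the band split, the nonnegativity clipping, and the function names are illustrative assumptions, not the paper's exact procedure.

    import numpy as np

    def project_onto_basis(mag_low, W):
        # Orthogonal projection of the low-band magnitude modulation spectrum onto
        # the column space of the clean-trained basis W, clipped to stay nonnegative.
        P = W @ np.linalg.pinv(W)
        return np.maximum(P @ mag_low, 0.0)

    def enhance_track(track, W, low_bins):
        # Update only the low modulation-frequency band of one feature trajectory;
        # the remaining band and the phase spectrum are left untouched.
        spec = np.fft.rfft(track)
        mag, phase = np.abs(spec), np.angle(spec)
        mag[:low_bins] = project_onto_basis(mag[:low_bins], W)
        return np.fft.irfft(mag * np.exp(1j * phase), n=len(track))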


International Journal of Innovation, Management and Technology | 2012

Several New DWT-Based Methods for Noise-Robust Speech Recognition

Jeih-weih Hung; Hao-Teng Fan; Syu-Siang Wang

This paper proposes three novel noise-robustness techniques for speech recognition based on the discrete wavelet transform (DWT): wavelet filter cepstral coefficients (WFCCs), sub-band power normalization (SBPN), and lowpass filtering plus zero interpolation (LFZI). According to our experiments, the proposed WFCCs provide a more robust c0 (the zeroth cepstral coefficient) for speech recognition, and with proper integration of WFCCs and conventional MFCCs, the resulting compound features can enhance recognition accuracy. Second, the SBPN procedure reduces the power mismatch within each modulation spectral sub-band and thus improves recognition accuracy significantly. Finally, the third technique, LFZI, reduces the storage space required for speech features while remaining helpful for speech recognition under noisy conditions.
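
Of the three techniques, LFZI is the easiest to sketch; the decimation factor, filter length, and simple FIR design below are illustrative assumptions rather than the paper's exact settings.

    import numpy as np
    from scipy.signal import firwin

    def lfzi_compress(track, factor=2, numtaps=31):
        # Lowpass-filter one feature trajectory, then keep every `factor`-th frame,
        # which roughly divides the storage by `factor`.
        lp = firwin(numtaps, 1.0 / factor)           # cutoff at (Nyquist / factor)
        smoothed = np.convolve(track, lp, mode="same")
        return smoothed[::factor]

    def lfzi_expand(compressed, n_frames, factor=2, numtaps=31):
        # Zero-interpolate back to the original frame rate and lowpass again.
        up = np.zeros(n_frames)
        up[::factor] = compressed * factor           # gain compensation for the inserted zeros
        lp = firwin(numtaps, 1.0 / factor)
        return np.convolve(up, lp, mode="same")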


International Conference on ITS Telecommunications | 2012

Modulation spectrum exponential weighting for robust speech recognition

Hao-Teng Fan; Yi-cheng Lian; Jeih-weih Hung

In this paper, we present a novel scheme to improve the noise robustness of speech recognition features in vehicle-noise environments. In the algorithm, termed modulation spectrum exponential weighting (MSEW), the magnitude spectra of feature streams are updated by combining a reference magnitude spectrum and the original magnitude spectrum with exponential weights that vary with the signal-to-noise ratio (SNR) of the operating environment. Specifically, we present three modes of MSEW, which can be viewed as generalizations of two existing algorithms, modulation spectrum replacement (MSR) and modulation spectrum filtering (MSF). In experiments conducted on the Aurora-2 noisy-digit database, the presented MSEW algorithms achieve better recognition accuracy than the original MSR and MSF in various vehicle-noise environments.
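
A hedged sketch of the exponential-weighting idea for a single feature trajectory; the linear SNR-to-weight mapping, the SNR limits, and the assumption that ref_mag is a clean-trained reference magnitude spectrum of matching length (len(track)//2 + 1) are illustrative choices, not the paper's exact modes.

    import numpy as np

    def msew(track, ref_mag, snr_db, snr_low=0.0, snr_high=20.0):
        # Blend the utterance's magnitude modulation spectrum with a reference
        # spectrum (e.g. averaged from clean training data) using an SNR-dependent
        # exponential weight; the phase spectrum is kept unchanged.
        spec = np.fft.rfft(track)
        mag, phase = np.abs(spec), np.angle(spec)
        # weight -> 1 (trust the reference) at low SNR, -> 0 (keep the original) at high SNR
        w = np.clip((snr_high - snr_db) / (snr_high - snr_low), 0.0, 1.0)
        new_mag = (ref_mag ** w) * (mag ** (1.0 - w))
        return np.fft.irfft(new_mag * np.exp(1j * phase), n=len(track))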


EURASIP Journal on Advances in Signal Processing | 2012

Enhancing the magnitude spectrum of speech features for robust speech recognition

Jeih-weih Hung; Hao-Teng Fan; Wen-hsiang Tu

In this article, we present an effective compensation scheme to improve the noise robustness of speech signal spectra. In this compensation scheme, called magnitude spectrum enhancement (MSE), a voice activity detection (VAD) process is performed on the frame sequence of the utterance. The magnitude spectra of non-speech frames are then reduced, while those of speech frames are amplified. In experiments conducted on the Aurora-2 noisy-digit database, MSE achieves an error reduction rate of nearly 42% relative to baseline processing. This method outperforms well-known spectral-domain speech enhancement techniques, including spectral subtraction (SS) and Wiener filtering (WF). In addition, the proposed MSE can be integrated with cepstral-domain robustness methods, such as mean and variance normalization (MVN) and histogram equalization (HEQ), to achieve further improvements in recognition accuracy in noise-corrupted environments.
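
A minimal sketch of the amplify/attenuate step on a magnitude spectrogram, with a crude energy-based VAD standing in for the paper's detector; the boost and cut factors and the VAD threshold are illustrative assumptions.

    import numpy as np

    def energy_vad(mag_spec, threshold_db=-30.0):
        # Crude energy-based VAD: flag frames whose log-energy lies within
        # `threshold_db` of the utterance maximum (a stand-in for the paper's VAD).
        energy_db = 10.0 * np.log10(np.sum(mag_spec ** 2, axis=0) + 1e-10)
        return energy_db > energy_db.max() + threshold_db

    def magnitude_spectrum_enhance(mag_spec, boost=1.2, cut=0.5):
        # mag_spec: (freq, frames). Amplify frames flagged as speech and
        # attenuate the non-speech frames.
        speech = energy_vad(mag_spec)
        gains = np.where(speech, boost, cut)
        return mag_spec * gains[np.newaxis, :]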


International Symposium on Chinese Spoken Language Processing | 2010

DCT-based processing of dynamic features for robust speech recognition

Wen-Chi Lin; Hao-Teng Fan; Jeih-weih Hung

In this paper, we explore various properties of cepstral time coefficients (CTC) in speech recognition and then propose several methods to refine the CTC construction process. It is found that CTC are a filtered version of mel-frequency cepstral coefficients (MFCC), where the filters are derived from the discrete cosine transform (DCT) matrix. We modify these DCT-based filters by windowing, removing the DC gain, and varying the filter length. Speech recognition experiments on the Aurora-2 digit database show that the proposed methods improve on the original CTC in recognition accuracy, with a relative error reduction of around 20%.
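
A short sketch of how CTC-style features can be built from DCT-derived FIR filters, including the windowing and DC-gain-removal refinements mentioned above; the filter length, number of filters, and function names are illustrative assumptions.

    import numpy as np

    def dct_filters(length=9, n_filters=3, window=True, remove_dc=True):
        # Build CTC-style FIR filters from the rows of a DCT-II matrix; the optional
        # Hamming windowing and DC-gain removal mirror the refinements discussed above.
        n = np.arange(length)
        filters = np.stack([np.cos(np.pi * k * (2 * n + 1) / (2 * length))
                            for k in range(n_filters)])
        if window:
            filters = filters * np.hamming(length)
        if remove_dc:
            filters = filters - filters.mean(axis=1, keepdims=True)
        return filters

    def cepstral_time_coefficients(mfcc, filters):
        # mfcc: (frames, ceps). Filter every cepstral trajectory with every DCT filter
        # along the time axis; the outputs are stacked as the CTC feature vector.
        outputs = [np.convolve(mfcc[:, c], f, mode="same")
                   for f in filters for c in range(mfcc.shape[1])]
        return np.stack(outputs, axis=1)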


International Conference on System Science and Engineering | 2012

The study of q-logarithmic modulation spectral normalization for robust speech recognition

Hao-Teng Fan; Che-hsien Hsu; Jeih-weih Hung

This paper presents a novel use of the generalized logarithm operation (q-logarithm) to refine the modulation spectrum of speech features for noise-robust speech recognition. The resulting method, generalized logarithmic modulation spectral mean normalization (GLMSMN), equalizes the average of the magnitude modulation spectrum in the q-logarithmic domain across different utterances in order to alleviate the effect of noise. On the Aurora-2 connected-digit database and evaluation task, the presented GLMSMN, operating on MVN-processed features, shows a significant improvement in recognition accuracy compared with the MFCC baseline and MVN alone. The overall average recognition accuracy achieved with GLMSMN is nearly 90%.
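
A hedged sketch of mean normalization in the q-log domain for one feature trajectory; the q value, the reference mean ref_mean (assumed to come from clean training data), and the function names are illustrative assumptions.

    import numpy as np

    def q_log(x, q):
        # Generalized (Tsallis) q-logarithm; reduces to the natural log as q -> 1
        return np.log(x) if np.isclose(q, 1.0) else (x ** (1.0 - q) - 1.0) / (1.0 - q)

    def q_exp(y, q):
        # Inverse of q_log
        return np.exp(y) if np.isclose(q, 1.0) \
            else np.maximum(1.0 + (1.0 - q) * y, 0.0) ** (1.0 / (1.0 - q))

    def glmsmn(track, ref_mean, q=0.5):
        # Equalize the mean of the magnitude modulation spectrum in the q-log domain:
        # subtract the utterance mean and move it to a reference (clean-trained) mean.
        spec = np.fft.rfft(track)
        mag, phase = np.abs(spec), np.angle(spec)
        qlog_mag = q_log(mag + 1e-10, q)
        qlog_mag = qlog_mag - qlog_mag.mean() + ref_mean
        return np.fft.irfft(q_exp(qlog_mag, q) * np.exp(1j * phase), n=len(track))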


Fuzzy Systems and Knowledge Discovery | 2013

Overlapped sub-band modulation spectrum normalization techniques for robust speech recognition

Hao-Teng Fan; Wei-jeih Yeh; Jeih-weih Hung

This paper proposes a novel approach to enhance the noise robustness of speech features for speech recognition. In the proposed approach, the speech feature time sequence is first converted into the modulation spectral domain via the discrete Fourier transform (DFT). The magnitude part of the modulation spectrum is decomposed into overlapped, non-uniform sub-band segments, and then each sub-band segment is individually processed by a specific statistics normalization method, such as mean and variance normalization (MVN) or histogram equalization (HEQ). Finally, the feature time sequence is reconstructed from all of the modified sub-band magnitude spectral segments, together with the original phase spectrum, using the inverse DFT. During this process, the components that correspond to the more important modulation spectral bands in the feature sequence can be processed separately, and because adjacent segments overlap, each band contains more spectral samples, which yields more accurate statistics estimates. On the Aurora-2 clean-condition training task, the proposed method gives a significant improvement in recognition accuracy over the baseline results and performs better than a similar technique that uses non-overlapped sub-bands.
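
A minimal sketch of the overlapped sub-band normalization for one feature trajectory, using per-utterance MVN as the per-segment normalizer (HEQ could be substituted); the segment length, overlap ratio, and averaging of overlapped regions are illustrative assumptions.

    import numpy as np

    def mvn(x):
        # Mean-and-variance normalization of one sub-band segment
        return (x - x.mean()) / (x.std() + 1e-10)

    def overlapped_subband_normalize(track, seg=16, overlap=0.5, normalize=mvn):
        # Normalize the magnitude modulation spectrum in overlapping segments of
        # `seg` bins; overlapped regions are averaged before the feature sequence
        # is rebuilt with the original phase spectrum.
        spec = np.fft.rfft(track)
        mag, phase = np.abs(spec), np.angle(spec)
        n = len(mag)
        hop = max(1, int(seg * (1.0 - overlap)))
        acc, cnt = np.zeros(n), np.zeros(n)
        for start in range(0, n, hop):
            stop = min(start + seg, n)
            acc[start:stop] += normalize(mag[start:stop])
            cnt[start:stop] += 1.0
            if stop == n:
                break
        return np.fft.irfft((acc / cnt) * np.exp(1j * phase), n=len(track))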


Fuzzy Systems and Knowledge Discovery | 2013

Robustifying cepstral features by mitigating the outlier effect for noisy speech recognition

Hao-Teng Fan; Kuan-wei Hsieh; Chien-hao Huang; Jeih-weih Hung

The performance of automatic speech recognition (ASR) systems is often seriously degraded by noise interference. Among the techniques for reducing the effect of noise, cepstral mean-and-variance normalization (CMVN) is a simple yet quite effective approach for processing MFCC speech features. However, the features processed by CMVN still contain a significant number of outliers, which very likely weakens the effect of CMVN. This paper proposes two directions for dealing with the outliers left by CMVN. The first applies a sigmoid function transformation, which provides explicit lower and upper bounds for the outliers, and the second exploits the well-known median filter to remove impulse-like outliers from the CMVN features. On the Aurora-2 digit recognition database and task, the two presented frameworks yield an absolute accuracy improvement of around 5% over CMVN, and the corresponding word error rate reduction relative to the MFCC baseline is as high as 50%.
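
A hedged sketch of the two outlier-handling directions applied after CMVN; the sigmoid scaling and the median-filter kernel size are illustrative values, not the paper's exact settings.

    import numpy as np
    from scipy.signal import medfilt

    def cmvn(features):
        # Per-utterance cepstral mean-and-variance normalization; features: (frames, ceps)
        return (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-10)

    def sigmoid_bound(features, scale=2.0):
        # Squash the CMVN features through a zero-centered, rescaled sigmoid so every
        # value is bounded in (-scale, scale); `scale` is an illustrative choice.
        return scale * (2.0 / (1.0 + np.exp(-features / scale)) - 1.0)

    def median_smooth(features, kernel=3):
        # Median-filter each cepstral trajectory to suppress impulse-like outliers.
        return np.column_stack([medfilt(features[:, c], kernel)
                                for c in range(features.shape[1])])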

Collaboration


Dive into Hao-Teng Fan's collaborations.

Top Co-Authors

Jeih-weih Hung (National Chi Nan University)
Wen-hsiang Tu (National Chi Nan University)
Wen-yu Tseng (National Chi Nan University)
Syu-Siang Wang (Center for Information Technology)
Che-hsien Hsu (National Chi Nan University)
Chien-hao Huang (National Chi Nan University)
Kuan-wei Hsieh (National Chi Nan University)
Pao-han Lin (National Chi Nan University)
Wei-jeih Yeh (National Chi Nan University)
Wen-Chi Lin (National Chi Nan University)