Tai-Shih Chi
National Chiao Tung University
Publications
Featured research published by Tai-Shih Chi.
international conference on acoustics, speech, and signal processing | 2013
Chung-Chien Hsu; Tse-En Lin; Jian-Hueng Chen; Tai-Shih Chi
In this paper, we propose a voice activity detection (VAD) algorithm based on the spectro-temporal modulation structure of input sounds. A multi-resolution spectro-temporal analysis framework is used to inspect prominent speech structures, and the proposed VAD distinguishes speech from non-speech by comparing the energy of the frequency modulation of harmonics against an adaptive threshold. The proposed VAD significantly outperforms three standard VADs (ITU-T G.729B, ETSI AMR1, and AMR2) under non-stationary noise, in terms of both receiver operating characteristic (ROC) curves and recognition rates from a practical distributed speech recognition (DSR) system.
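The adaptive-threshold decision described above can be sketched as follows. This is a minimal illustration, not the paper's method: the frame-energy input is a synthetic stand-in for the harmonic frequency-modulation energy, and the smoothing constant and margin are illustrative values.

```python
import numpy as np

def adaptive_threshold_vad(frame_energy, alpha=0.95, margin=2.0):
    """Label a frame as speech when its energy exceeds an
    exponentially smoothed noise-floor estimate by `margin`."""
    noise_floor = frame_energy[0]
    decisions = []
    for e in frame_energy:
        decisions.append(e > margin * noise_floor)
        if not decisions[-1]:  # update the floor only on non-speech frames
            noise_floor = alpha * noise_floor + (1 - alpha) * e
    return np.array(decisions)

# toy input: 50 low-energy (noise) frames followed by 50 high-energy (speech) frames
energy = np.concatenate([np.full(50, 1.0), np.full(50, 10.0)])
vad = adaptive_threshold_vad(energy)
```

Updating the noise floor only on frames judged as non-speech is what lets the threshold adapt to slowly varying noise without being dragged upward by speech itself.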
international conference on acoustics, speech, and signal processing | 2012
Chung-Chien Hsu; Tse-En Lin; Jian-Hueng Chen; Tai-Shih Chi
In this paper, we propose a single-channel speech enhancement algorithm that applies the conventional Wiener filter in the spectro-temporal modulation domain. The multi-resolution spectro-temporal analysis and synthesis framework for Fourier spectrograms [12] is extended to an analysis-modification-synthesis (AMS) framework for speech enhancement. Compared with two conventional speech enhancement algorithms, a Wiener filter and an extended minimum mean-square error (MMSE) algorithm, our proposed method outperforms both, by a large margin in white noise and a small margin in babble noise, in both objective and subjective evaluations.
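At the core of any such AMS scheme is the Wiener gain applied to the transform-domain coefficients. The sketch below shows only that gain on toy coefficients; the spectro-temporal analysis/synthesis transform from the paper is omitted, and the SNR here is assumed known rather than estimated.

```python
import numpy as np

def wiener_gain(snr_prior):
    """Wiener suppression gain for a given a-priori SNR."""
    return snr_prior / (1.0 + snr_prior)

# toy coefficients: unit "clean" values plus additive Gaussian noise
rng = np.random.default_rng(0)
clean = np.ones(1000)
noisy = clean + 0.5 * rng.standard_normal(1000)  # noise variance 0.25

snr = (clean ** 2).mean() / 0.25   # a-priori SNR = 4 (known here, estimated in practice)
enhanced = wiener_gain(snr) * noisy
```

With SNR = 4 the gain is 0.8, which trades a small bias on the clean component for a larger reduction of the noise power, lowering the overall mean-square error.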
international conference on acoustics, speech, and signal processing | 2011
Wen-Sheng Chou; Kah-Meng Cheong; Tai-Shih Chi
A binaural algorithm that simultaneously detects the azimuth angle and the pitch of a sound source is proposed in this paper. The algorithm extends the stereausis model with two-dimensional coincidence detectors in the joint space-pitch domain. In our simulations, sounds from different locations are produced by filtering through head-related transfer functions (HRTFs). Simulation results show that azimuth angles estimated by our proposed algorithm are more accurate than those from the stereausis model in the single-source testing condition. Pilot experiments also demonstrate satisfactory results in streaming sound sources from a two-sound mixture using the estimated space-pitch information.
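The coincidence-detector idea behind the spatial dimension can be reduced, for illustration only, to a cross-correlation between the two ear signals: the lag of maximum coincidence indicates the interaural delay, which maps to azimuth. Here HRTF filtering is replaced by a pure sample delay, which is a deliberate simplification.

```python
import numpy as np

rng = np.random.default_rng(1)
left = rng.standard_normal(4096)   # broadband source at the "left" ear
delay = 7                          # interaural delay in samples (stands in for azimuth)
right = np.roll(left, delay)       # delayed copy at the "right" ear

# cross-correlation as a bank of coincidence detectors over all lags
corr = np.correlate(right, left, mode="full")
lag = np.argmax(corr) - (len(left) - 1)   # lag of maximum coincidence
```

The stereausis model effectively performs this coincidence computation within each cochlear channel; the joint space-pitch extension adds a second, periodicity-sensitive axis on top.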
international conference on acoustics, speech, and signal processing | 2015
Chung-Chien Hsu; Kah-Meng Cheong; Jen-Tzung Chien; Tai-Shih Chi
This paper presents a single-channel high-dimensional Wiener filter in the spectro-temporal modulation domain. Unlike conventional noise reduction techniques, the proposed algorithm not only reduces noise but also enhances the “textures” of the speech signal. A non-iterative decision-directed noise estimation method is adopted to estimate the modulation SNR for the modulation-domain Wiener filter. The efficacy of the proposed algorithm in enhancing speech intelligibility is assessed using the short-time objective intelligibility (STOI) measure. Statistical analysis demonstrates that our proposed algorithm improves STOI scores in speech-shaped noise (SSN) and white noise conditions, but not in the babble noise condition, whereas the conventional Wiener filter fails to improve STOI scores in all three noise conditions.
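The decision-directed idea can be sketched as follows on generic power trajectories. This is the classic Ephraim-Malah-style recursion, shown as a stand-in; the paper applies the estimate to modulation-domain coefficients, and the smoothing factor here is a typical textbook value, not taken from the paper.

```python
import numpy as np

def decision_directed_snr(noisy_power, noise_power, beta=0.98):
    """A-priori SNR via the decision-directed recursion: a weighted mix of
    the previous frame's clean-power estimate and the current
    a-posteriori SNR estimate."""
    prev_clean = 0.0
    snr_prior = np.empty_like(noisy_power)
    for t, (p, n) in enumerate(zip(noisy_power, noise_power)):
        snr_post = max(p / n - 1.0, 0.0)            # instantaneous estimate
        snr_prior[t] = beta * prev_clean / n + (1 - beta) * snr_post
        gain = snr_prior[t] / (1.0 + snr_prior[t])  # Wiener gain
        prev_clean = (gain ** 2) * p                # clean-power estimate fed forward
    return snr_prior

# toy stationary case: noisy power 5, noise power 1 (true SNR = 4)
snr = decision_directed_snr(np.full(200, 5.0), np.ones(200))
```

The heavy smoothing (beta close to 1) trades a small bias in the steady-state SNR for strongly reduced frame-to-frame fluctuation, which is what suppresses musical noise.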
international conference on acoustics, speech, and signal processing | 2009
Ting-Yu Yen; Jian-Hueng Chen; Tai-Shih Chi
A joint spectro-temporal auditory model is utilized to assess speech quality objectively. The model mimics early and central auditory functions and serves as a spectro-temporal modulation filterbank. Three perceptually relevant parameters, intelligibility, clarity, and naturalness, are addressed by the model and combined to estimate the subjective mean opinion score (MOS) for speech quality measurement. Through a simple multiple linear regression analysis, we demonstrate that our proposed perception-based objective speech quality measure outperforms the state-of-the-art P.563 standard in estimating the MOS of codec-distorted speech in the ITU-T Supp. 23 database.
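The regression step, combining the three perceptual parameters into a single MOS estimate, can be sketched as an ordinary least-squares fit. The data below are synthetic with invented weights; the real parameters come from the auditory model and the fit is against subjective MOS labels.

```python
import numpy as np

rng = np.random.default_rng(2)
# 40 synthetic utterances x 3 parameters (intelligibility, clarity, naturalness)
X = rng.uniform(0.0, 1.0, size=(40, 3))
true_w = np.array([2.0, 1.0, 0.5])      # invented weights for this toy example
mos = X @ true_w + 1.0                  # noiseless toy MOS with intercept 1.0

A = np.hstack([X, np.ones((40, 1))])    # append an intercept column
w, *_ = np.linalg.lstsq(A, mos, rcond=None)
pred = A @ w                            # fitted MOS estimates
```

With real listening-test data the fit would of course not be exact; the fitted weights then indicate how much each perceptual dimension contributes to overall quality.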
conference of the international speech communication association | 2016
Chung-Chien Hsu; Tai-Shih Chi; Jen-Tzung Chien
This paper proposes a discriminative layered nonnegative matrix factorization (DL-NMF) for monaural speech separation. Standard NMF performs parts-based representation using a single layer of bases; it was recently extended to the layered NMF (L-NMF), in which a tree of bases is estimated for multi-level or multi-aspect decomposition of a complex mixed signal. In this study, we develop DL-NMF by extending the generative bases of L-NMF to discriminative bases estimated according to a discriminative criterion, which jointly optimizes the recovery of the mixed spectra from the separated spectra and minimizes the reconstruction errors between the separated spectra and the original source spectra. Experiments on single-channel speech separation show the superiority of DL-NMF over NMF and L-NMF in terms of the SDR, SIR, and SAR measures.
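For reference, the single-layer baseline that DL-NMF builds on can be sketched with the standard multiplicative updates for the Euclidean cost. This is plain NMF only; the layered tree of bases and the discriminative re-estimation are not reproduced here.

```python
import numpy as np

def nmf(V, r, iters=200, seed=0):
    """Factor a nonnegative matrix V into W (bases) and H (activations)
    by Lee-Seung multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.1, 1.0, (V.shape[0], r))
    H = rng.uniform(0.1, 1.0, (r, V.shape[1]))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)   # update bases
    return W, H

# toy nonnegative "spectrogram": 8 frequency bins x 20 frames
V = np.abs(np.random.default_rng(3).standard_normal((8, 20)))
W, H = nmf(V, r=4)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because the updates are multiplicative and the initialization is positive, W and H stay nonnegative throughout, which is what yields the parts-based (additive) representation.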
international conference on acoustics, speech, and signal processing | 2015
Pei-Chun Tsai; Shih-Ting Lin; Wen-Chung Lee; Chung-Chien Hsu; Tai-Shih Chi; Chia-Fone Lee
A hearing model, parameterized by the hearing thresholds, degrees of loudness recruitment, and reductions of frequency resolution of a hearing-impaired (HI) patient, is proposed in this paper. The model is developed in a filter-bank framework and is flexible enough to fit the hearing-loss conditions of individual HI patients. Psychoacoustic experiments were conducted under clean and noisy conditions to validate the model's capability in predicting Mandarin speech intelligibility for HI patients. Statistical analysis of the hearing-test results suggests that the proposed model can predict Mandarin speech intelligibility for HI patients to a certain degree.
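Of the three parameter types, loudness recruitment admits a simple textbook sketch: levels below the elevated threshold are inaudible, and above it loudness grows with a steeper-than-normal slope so that it catches up with normal loudness at the uncomfortable-loudness level. The formula and the threshold/UCL values below are a common simplification for illustration, not the paper's fitted per-channel parameters.

```python
import numpy as np

def recruited_loudness_db(level_db, threshold_db=50.0, ucl_db=100.0):
    """Map input level (dB) to perceived loudness under recruitment:
    zero below the elevated threshold, then linear-in-dB expansion
    reaching ucl_db at an input of ucl_db."""
    level_db = np.asarray(level_db, dtype=float)
    slope = ucl_db / (ucl_db - threshold_db)   # expansion ratio > 1
    out = slope * (level_db - threshold_db)
    return np.where(level_db <= threshold_db, 0.0, out)

levels = np.array([40.0, 50.0, 75.0, 100.0])
perceived = recruited_loudness_db(levels)      # 0, 0, 50, 100 dB
```

In a filter-bank model such a mapping would be applied per channel, with a different threshold and slope in each band according to the patient's audiogram.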
international conference on acoustics, speech, and signal processing | 2011
Chung-Chien Hsu; Ting-Han Lin; Tai-Shih Chi
The concept of two-dimensional spectro-temporal modulation filtering from the auditory model [1] is implemented for the FFT spectrogram. It analyzes the spectrogram in terms of the temporal dynamics and spectral structures of the sound. The overlap-and-add (OLA) method, which is more convenient and reliable than the iterative-projection method proposed in [1], is used to invert the FFT spectrogram back into sound. The non-negative sparse coding (NNSC) method is adopted to demonstrate the benefit of our analysis-synthesis procedures in a noise suppression application. Even without fine-tuning parameters, the proposed analysis-synthesis procedures offer de-noising benefits, especially under low-SNR conditions.
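The property that makes OLA inversion work can be shown in a minimal STFT round trip: with a Hann window at 50% overlap and no spectral modification, overlap-adding the inverse FFTs reconstructs the input (up to edge frames). Frame and hop sizes here are illustrative, not the paper's settings.

```python
import numpy as np

def stft(x, n=256, hop=128):
    """Hann-windowed FFT frames of a real signal."""
    w = np.hanning(n)
    return np.array([np.fft.rfft(w * x[i:i + n])
                     for i in range(0, len(x) - n + 1, hop)])

def ola(frames, n=256, hop=128):
    """Invert frames by overlap-adding their inverse FFTs."""
    out = np.zeros(hop * (len(frames) - 1) + n)
    for i, F in enumerate(frames):
        out[i * hop:i * hop + n] += np.fft.irfft(F, n)
    return out

x = np.sin(2 * np.pi * 440 * np.arange(4096) / 16000)  # 440 Hz tone at 16 kHz
y = ola(stft(x))   # interior samples match x closely
```

Because the shifted Hann windows sum to (approximately) a constant, the analysis windowing cancels out in the overlap-add, so any modification applied to the frames carries through to the resynthesized waveform without an iterative phase-projection step.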
international conference on acoustics, speech, and signal processing | 2017
Tzu-Hao Chen; Chun Huang; Tai-Shih Chi
Humans analyze sounds based not only on their frequency contents but also on the temporal variations of those contents. Inspired by auditory perception, in this paper we propose a deep neural network (DNN) based dereverberation algorithm in the rate domain, which represents the temporal variations of frequency contents. We show that convolutional noise in the time domain can be approximated as multiplicative noise in the rate domain. To remove the multiplicative noise, we adopt the rate-domain complex-valued ideal ratio mask (RDcIRM) as the training target of the DNN. Simulation results show that the proposed rate-domain DNN algorithm recovers highly intelligible, high-quality speech from reverberant speech more effectively than a compared state-of-the-art dereverberation algorithm, making it well suited to speech applications involving human listeners.
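The mask definition itself is simple to state: a complex ideal ratio mask is the complex factor that maps each degraded coefficient back to its clean counterpart, so multiplicative distortion is removed exactly by complex division. The sketch below shows this in a generic complex transform domain with an invented distortion; the paper's contribution is applying this idea in its rate (temporal-modulation) domain and training a DNN to predict the mask.

```python
import numpy as np

rng = np.random.default_rng(4)
# toy clean complex coefficients in some T-F/modulation representation
clean = rng.standard_normal(64) + 1j * rng.standard_normal(64)
# invented degradation: a complex multiplicative factor plus small noise
reverb = clean * (0.6 + 0.3j) + 0.05 * rng.standard_normal(64)

cirm = clean / reverb          # complex-valued ideal ratio mask (oracle)
recovered = cirm * reverb      # exact recovery when the true mask is known
```

A real system never has the oracle mask; the DNN is trained to approximate it from the degraded input, and the quality of that approximation bounds the dereverberation performance.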
international symposium on chinese spoken language processing | 2016
Yi-Ting Chen; Tzu-Hao Chen; Mao-Chang Huang; Tai-Shih Chi
We previously proposed a spatial-cue based binaural noise reduction algorithm for hearing aids; however, its decision parameters were selected empirically. In this paper, we extend that work and propose a supervised classification algorithm for binaural speech enhancement/separation and dereverberation, using a modified ideal binary mask (mIBM) as the training target and simple neural networks (NNs) as classifiers. The low complexity of the simple NNs makes the proposed algorithm practical for binaural hearing aids. The interaural time difference (ITD) and interaural level difference (ILD) of each T-F unit are extracted as the basic binaural features; for dereverberation, the interaural coherence (IC) is also considered when building the target mIBM and training the NNs. In separation evaluations, our method yields performance comparable to a more complicated benchmark system that cannot dereverberate the signal. For concurrent separation and dereverberation, our method offers a 4 to 5 dB improvement in the frequency-weighted segmental speech-to-noise ratio (SNRfw) over unprocessed speech.
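The classification step can be illustrated with the simplest possible stand-in: logistic regression (a one-layer NN) labelling T-F units as target- or interferer-dominated from two binaural-style features. The feature distributions below are synthetic and the classifier is not the paper's network; the real system uses ITD/ILD (and IC) per T-F unit and trains against the mIBM.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
# synthetic ITD-like and ILD-like features for two spatially separated sources
itd = np.r_[rng.normal(-0.5, 0.1, n // 2), rng.normal(0.5, 0.1, n // 2)]
ild = np.r_[rng.normal(-3.0, 0.5, n // 2), rng.normal(3.0, 0.5, n // 2)]
X = np.c_[itd, ild]
y = np.r_[np.zeros(n // 2), np.ones(n // 2)]   # 1 = target-dominated T-F unit

# logistic regression trained by batch gradient descent
w, b = np.zeros(2), 0.0
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.1 * X.T @ (p - y) / n
    b -= 0.1 * np.mean(p - y)

mask = (1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5   # binary mask decisions
accuracy = np.mean(mask == (y == 1))
```

Each binary decision corresponds to keeping or discarding one T-F unit; the full system makes such a decision for every unit of the spectrogram and then resynthesizes the masked signal.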