Syu-Siang Wang
Center for Information Technology
Network
Latest external collaborations at the country level.
Publication
Featured research published by Syu-Siang Wang.
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014
Hao-Teng Fan; Jeih-weih Hung; Xugang Lu; Syu-Siang Wang; Yu Tsao
The conventional NMF-based speech enhancement algorithm analyzes the magnitude spectrograms of both clean speech and noise in the training data via NMF and estimates a set of spectral basis vectors. These basis vectors span a space that approximates the magnitude spectrograms of the noise-corrupted testing utterances. Finally, the components associated with the clean-speech spectral basis vectors are used to construct the updated magnitude spectrogram, producing an enhanced speech utterance. Considering that rich spectral-temporal structure can be exploited in localized frequency- and time-varying spectral patches, this study proposes a segmental NMF (SNMF) speech enhancement scheme that improves on the conventional frame-wise NMF-based method. Two algorithms are derived to decompose the original nonnegative matrix associated with the magnitude spectrogram; the first operates in the spectral domain and the second in the temporal domain. With these decomposition processes, noisy speech signals can be modeled more precisely, and the speech-related spectrograms can be reconstructed more faithfully than with the conventional NMF-based method. Objective evaluations using the perceptual evaluation of speech quality (PESQ) measure indicate that the proposed SNMF strategy improves sound quality in noisy conditions and outperforms the well-known MMSE log-spectral amplitude (LSA) estimator.
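For orientation, here is a minimal sketch of the conventional frame-wise NMF enhancement baseline that SNMF refines. The spectrogram sizes, basis ranks, and iteration counts are illustrative assumptions rather than values from the paper, and random matrices stand in for real |STFT| data.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-10):
    """Factor a nonnegative matrix V (freq x frames) into W @ H with
    multiplicative updates minimizing the Euclidean distance."""
    rng = np.random.default_rng(0)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def activations(V, W, n_iter=200, eps=1e-10):
    """Estimate activations H with the basis W held fixed."""
    rng = np.random.default_rng(1)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

# Placeholder magnitude spectrograms (freq bins x frames); real usage
# would take |STFT| of clean speech, noise, and the noisy utterance.
V_speech = np.abs(np.random.randn(257, 400))
V_noise = np.abs(np.random.randn(257, 400))
V_noisy = np.abs(np.random.randn(257, 120))

W_s, _ = nmf(V_speech, rank=40)   # clean-speech basis vectors
W_n, _ = nmf(V_noise, rank=20)    # noise basis vectors

H = activations(V_noisy, np.hstack([W_s, W_n]))
V_enhanced = W_s @ H[:40]         # keep only clean-speech components
```

The segmental scheme described above would additionally factor localized spectral or temporal patches rather than whole frames.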
Neuroscience | 2010
Pei-Yi Lin; Syu-Siang Wang; Mei-Yun Tai; Yuan-Feen Tsai
Extinction reflects a decrease in the conditioned response (CR) following non-reinforcement of a conditioned stimulus. Behavioral evidence indicates that extinction involves an inhibitory learning mechanism in which the extinguished CR reappears with presentation of an unconditioned stimulus. However, recent studies on fear conditioning suggest that extinction erases the original conditioning if the time interval between fear acquisition and extinction is short. The present study examined the effects of different intervals between acquisition and extinction on the original memory in conditioned taste aversion (CTA). Male Long-Evans rats acquired CTA by associating a 0.2% sucrose solution with malaise induced by i.p. injection of 4 ml/kg 0.15 M LiCl. Two different time intervals, 5 and 24 h, between CTA acquisition and extinction were used. Five or 24 h after CTA acquisition, extinction trials were performed, in which a bottle containing 20 ml of a 0.2% sucrose solution was provided for 10 min without subsequent LiCl injection. If sucrose consumption during the extinction trials was greater than the average water consumption, then rats were considered to have reached CTA extinction. Rats subjected to extinction trials beginning 24 h, but not 5 h, after acquisition re-exhibited the extinguished CR following injection of 0.15 M LiCl alone 7 days after acquisition. Extracellular signal-regulated kinase (ERK) in the medial prefrontal cortex (mPFC) and basolateral nucleus of the amygdala (BLA) was examined by Western blot after the first extinction trial. ERK activation in the mPFC was induced after the extinction trial beginning 5 h after acquisition, whereas the extinction trial performed 24 h after acquisition induced ERK activation in the BLA. These data suggest that the original conditioning can be inhibited or retained by CTA extinction depending on the time interval between acquisition and extinction and that the ERK transduction pathway in the mPFC and BLA is differentially involved in these processes.
IEEE Transactions on Biomedical Engineering | 2017
Tien-En Chen; Shih-I Yang; Li-Ting Ho; Kun-Hsi Tsai; Yu-Hsuan Chen; Yun-Fan Chang; Ying-Hui Lai; Syu-Siang Wang; Yu Tsao; Chau-Chung Wu
OBJECTIVE This study focuses on first (S1) and second (S2) heart sound recognition based only on acoustic characteristics; the assumptions of the individual durations of S1 and S2 and the time intervals of S1-S2 and S2-S1 are not involved in the recognition process. The main objective is to investigate whether reliable S1 and S2 recognition performance can still be attained in situations where the duration and interval information might not be accessible. METHODS A deep neural network (DNN) method is proposed for recognizing S1 and S2 heart sounds. In the proposed method, heart sound signals are first converted into a sequence of Mel-frequency cepstral coefficients (MFCCs). The K-means algorithm is applied to cluster the MFCC features into two groups to refine their representation and discriminative capability. The refined features are then fed to a DNN classifier to perform S1 and S2 recognition. We conducted experiments using actual heart sound signals recorded with an electronic stethoscope. Precision, recall, F-measure, and accuracy are used as the evaluation metrics. RESULTS The proposed DNN-based method achieves high precision, recall, and F-measure scores with an accuracy rate of more than 91%. CONCLUSION The DNN classifier provides higher evaluation scores than other well-known pattern classification methods. SIGNIFICANCE The proposed DNN-based method can achieve reliable S1 and S2 recognition performance based on acoustic characteristics alone, without using an ECG reference or incorporating assumptions about the individual durations of S1 and S2 or the time intervals of S1-S2 and S2-S1.
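A rough sketch of this pipeline (MFCC extraction, K-means refinement, DNN classification) might look as follows. The sampling rate, feature sizes, network shape, and the random stand-in signal and labels are all assumptions for illustration, using librosa and scikit-learn rather than the authors' implementation.

```python
import numpy as np
import librosa
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

# Placeholder signal standing in for an electronic-stethoscope recording.
sr = 4000
y = np.random.randn(10 * sr).astype(np.float32)

# 1. Convert the heart sound signal into a sequence of MFCC frames.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # (frames, 13)

# 2. Cluster the MFCC frames into two groups to refine their
#    representation, as the paper describes.
cluster_id = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(mfcc)
features = np.hstack([mfcc, cluster_id[:, None]])

# 3. Feed the refined features to a small feed-forward network. The labels
#    here are random stand-ins; real labels would mark S1/S2 frames.
labels = np.random.randint(0, 2, len(features))
clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=50)
clf.fit(features, labels)
print(clf.predict(features[:5]))
```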
International Symposium on Chinese Spoken Language Processing (ISCSLP) | 2012
Syu-Siang Wang; Jeih-weih Hung; Yu Tsao
In this paper, we propose a cepstral subband normalization (CSN) approach for robust speech recognition. The CSN approach first applies the discrete wavelet transform (DWT) to decompose the original cepstral feature sequence into low- and high-frequency band (LFB and HFB) parts. Then, CSN normalizes the LFB components and zeros out the HFB components. Finally, an inverse DWT is applied to the LFB and HFB components to form the normalized cepstral features. When the Haar functions are used as the DWT bases, CSN can be computed efficiently, with a 50% reduction in the number of feature components. In addition, our experimental results on the Aurora-2 task show that CSN outperforms conventional cepstral mean subtraction (CMS), cepstral mean and variance normalization (CMVN), and histogram equalization (HEQ). We also integrate CSN with the advanced front-end (AFE) for feature extraction. Experimental results indicate that the integrated AFE+CSN achieves notable improvements over the original AFE. Its simple calculation, compact form, and effective noise robustness make CSN well suited to mobile applications.
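Since the abstract spells out the CSN steps, a compact sketch is straightforward. The Haar basis follows the paper, while the feature dimensions and the random stand-in features below are assumptions.

```python
import numpy as np
import pywt

def csn(cepstra):
    """Cepstral subband normalization: a one-level Haar DWT along time,
    mean/variance normalization of the low band, zeroing of the high band,
    then inverse DWT. A minimal sketch; feature extraction is assumed."""
    out = np.empty_like(cepstra)
    for d in range(cepstra.shape[1]):          # per cepstral dimension
        lfb, hfb = pywt.dwt(cepstra[:, d], 'haar')
        lfb = (lfb - lfb.mean()) / (lfb.std() + 1e-8)
        hfb = np.zeros_like(hfb)               # discard HFB components; only
        # the LFB half needs storing, giving the 50% reduction noted above.
        out[:, d] = pywt.idwt(lfb, hfb, 'haar')[:cepstra.shape[0]]
    return out

mfcc = np.random.randn(200, 13)                # placeholder MFCC sequence
print(csn(mfcc).shape)
```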
IEEE Transactions on Biomedical Engineering | 2017
Ying-Hui Lai; Fei Chen; Syu-Siang Wang; Xugang Lu; Yu Tsao; Chin-Hui Lee
Objective: In a cochlear implant (CI) speech processor, noise reduction (NR) is a critical component for enabling CI users to attain improved speech perception under noisy conditions. Identifying an effective NR approach has long been a key topic in CI research. Method: Recently, a deep denoising autoencoder (DDAE)-based NR approach was proposed and shown to be effective in restoring clean speech from noisy observations. It was also shown that the DDAE could provide better performance than several existing NR methods in standardized objective evaluations. Following this success with normal speech, this paper further investigates the performance of DDAE-based NR in improving the intelligibility of envelope-based vocoded speech, which simulates the speech signal processing in existing CI devices. Results: We compared the speech intelligibility performance of DDAE-based NR and conventional single-microphone NR approaches using the noise vocoder simulation. The results of both objective evaluations and listening tests showed that, under nonstationary noise distortion, DDAE-based NR yielded higher intelligibility scores than the conventional NR approaches. Conclusion and significance: This study confirmed that DDAE-based NR could potentially be integrated into a CI processor to provide more benefits to CI users under noisy conditions.
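As a point of reference, a minimal DDAE regression sketch in PyTorch is shown below. The layer sizes, optimizer settings, and random stand-in spectra are assumptions; the paper's actual architecture and vocoder processing are not reproduced here.

```python
import torch
import torch.nn as nn

# A toy deep denoising autoencoder mapping noisy log-magnitude spectra
# to paired clean ones.
ddae = nn.Sequential(
    nn.Linear(257, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 257),
)
opt = torch.optim.Adam(ddae.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

noisy = torch.randn(64, 257)   # placeholder noisy spectra (batch, bins)
clean = torch.randn(64, 257)   # placeholder paired clean spectra

for _ in range(10):            # toy training loop
    opt.zero_grad()
    loss = loss_fn(ddae(noisy), clean)
    loss.backward()
    opt.step()
```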
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) | 2015
Syu-Siang Wang; Hsin-Te Hwang; Ying-Hui Lai; Yu Tsao; Xugang Lu; Hsin-Min Wang; Borching Su
This paper investigates the use of the speech parameter generation (SPG) algorithm, which has been successfully adopted in deep neural network (DNN)-based voice conversion (VC) and speech synthesis (SS), to incorporate temporal information into deep denoising autoencoder (DDAE)-based speech enhancement. In our previous studies, we confirmed that the DDAE could effectively suppress noise components in noise-corrupted speech. However, because the DDAE converts speech in a frame-by-frame manner, the enhanced speech shows some discontinuity even when context features are used as its input. To handle this issue, this study proposes using the SPG algorithm as a post-processor to transform the DDAE-processed feature sequence into one with a smoothed trajectory. Two types of temporal information are investigated with SPG: static-dynamic features and context features. Experimental results show that SPG with context features outperforms both SPG with static-dynamic features and the baseline system, which uses context features without SPG, in terms of standardized objective tests across different noise types and SNRs.
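The core of SPG can be illustrated with a simplified, unit-variance version of its underlying least-squares trajectory problem. The delta-window definition and toy data below are assumptions; the full algorithm additionally weights the solution by the model's variances.

```python
import numpy as np

def spg_smooth(static, delta):
    """Simplified speech parameter generation for one feature dimension:
    find the trajectory c whose static and delta features jointly best
    match the observations, i.e. min_c ||c - static||^2 + ||D c - delta||^2.
    Unit variances are assumed for brevity."""
    T = len(static)
    I = np.eye(T)
    D = np.zeros((T, T))                      # first-order delta operator
    for t in range(T):
        D[t, max(t - 1, 0)] -= 0.5
        D[t, min(t + 1, T - 1)] += 0.5
    W = np.vstack([I, D])
    o = np.concatenate([static, delta])
    return np.linalg.solve(W.T @ W, W.T @ o)  # smoothed trajectory

static = np.random.randn(100)                 # stand-in enhanced features
delta = np.convolve(static, [0.5, 0, -0.5], mode='same')
print(spg_smooth(static, delta)[:5])
```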
IEEE Access | 2017
Tassadaq Hussain; Sabato Marco Siniscalchi; Chi-Chun Lee; Syu-Siang Wang; Yu Tsao; Wen-Hung Liao
In wireless telephony and audio data mining applications, it is desirable that noise suppression can be made robust against changing noise conditions and operates in real time (or faster). The learning effectiveness and speed of artificial neural networks are therefore critical factors in applications for speech enhancement tasks. To address these issues, we present an extreme learning machine (ELM) framework, aimed at the effective and fast removal of background noise from a single-channel speech signal, based on a set of randomly chosen hidden units and analytically determined output weights. Because feature learning with shallow ELM may not be effective for natural signals, such as speech, even with a large number of hidden nodes, hierarchical ELM (H-ELM) architectures are deployed by leveraging sparse auto-encoders. In this manner, we not only keep all the advantages of deep models in approximating complicated functions and maintaining strong regression capabilities, but we also overcome the cumbersome and time-consuming features of both greedy layer-wise pre-training and back-propagation (BP)-based fine-tuning schemes, which are typically adopted for training deep neural architectures. The proposed ELM framework was evaluated on the Aurora-4 speech database. The Aurora-4 task provides relatively limited training data, and test speech data corrupted with both additive noise and convolutive distortions for matched and mismatched channels and signal-to-noise ratio (SNR) conditions. In addition, the task includes a subset of testing data involving noise types and SNR levels that are not seen in the training data. The experimental results indicate that when the amount of training data is limited, both ELM- and H-ELM-based speech enhancement techniques consistently outperform the conventional BP-based shallow and deep learning algorithms, in terms of standardized objective evaluations, under various testing conditions.
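A basic single-hidden-layer ELM regressor is easy to sketch: hidden weights are drawn at random and only the output weights are solved analytically, with no back-propagation. The hidden-layer size, regularizer, and stand-in data below are assumptions; the paper's H-ELM additionally stacks sparse autoencoder layers, which this sketch omits.

```python
import numpy as np

def train_elm(X, T, n_hidden=1024, reg=1e-3, seed=0):
    """Basic extreme learning machine regression: random input weights,
    analytically solved (ridge-regularized) output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                      # random hidden layer
    # Output weights via regularized least squares: no back-propagation.
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Placeholder noisy -> clean spectral feature pairs.
X = np.random.randn(500, 257)
T = np.random.randn(500, 257)
W, b, beta = train_elm(X, T)
print(elm_predict(X[:3], W, b, beta).shape)
```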
IEEE Signal Processing Letters | 2016
Syu-Siang Wang; Alan Chern; Yu Tsao; Jeih-weih Hung; Xugang Lu; Ying-Hui Lai; Borching Su
For state-of-the-art speech enhancement (SE) techniques, a spectrogram is usually preferred to the raw time-domain data, since it provides a more compact representation together with conspicuous temporal information over a long time span. However, two problems can cause distortions in conventional nonnegative matrix factorization (NMF)-based SE algorithms. One is related to the overlap-and-add operation used in short-time Fourier transform (STFT)-based signal reconstruction, and the other concerns directly using the phase of the noisy speech as that of the enhanced speech during signal reconstruction. Both problems can cause information loss or discontinuity between the clean signal and the reconstructed signal. To solve these two problems, we propose a novel SE method that adopts the discrete wavelet packet transform (DWPT) together with NMF. In brief, the DWPT is first applied to split a time-domain speech signal into a series of subband signals. NMF is then exploited to highlight the speech component in each subband. Finally, these enhanced subband signals are joined together via the inverse DWPT to reconstruct a noise-reduced signal in the time domain. We evaluate the proposed DWPT-NMF-based SE method on the Mandarin hearing in noise test (MHINT) task. Experimental results show that the new method effectively enhances speech quality and intelligibility and outperforms the conventional STFT-NMF-based SE system.
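The overall split/enhance/reconstruct flow might be sketched as follows with PyWavelets. The wavelet choice, decomposition depth, and the placeholder per-subband operation (standing in for the paper's NMF step) are assumptions.

```python
import numpy as np
import pywt

def enhance_subband(x):
    """Placeholder for per-subband NMF enhancement; here it simply
    attenuates the subband as a stand-in."""
    return 0.8 * x

# Split a time-domain signal into subbands with a 3-level DWPT,
# process each subband, and reconstruct in the time domain.
signal = np.random.randn(8000)                # placeholder noisy speech
wp = pywt.WaveletPacket(data=signal, wavelet='db4', maxlevel=3)
for node in wp.get_level(3, order='freq'):
    wp[node.path] = enhance_subband(node.data)
enhanced = wp.reconstruct(update=False)
print(enhanced.shape)
```

Working on time-domain subbands this way sidesteps both the overlap-and-add and noisy-phase issues of STFT-based reconstruction noted above.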
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2015
Ying-Hui Lai; Syu-Siang Wang; Pei-Chun Li; Yu Tsao
For hearing aid (HA) devices, speech enhancement (SE) is an essential unit that aims to improve the signal-to-noise ratio (SNR) and quality of speech signals. Previous studies, however, indicated that user experience with current HAs is not fully satisfactory in noisy environments, suggesting that there is still room to improve SE in HA devices. This study proposes a novel discriminative post-filter (DPF) approach to further enhance the SNR and quality of SE-processed speech signals. The DPF uses a filter to increase the energy contrast (discrimination) between speech and noise segments in a noisy utterance. In this way, the SNR and sound quality of speech signals can be improved, and annoying musical noise can be suppressed. To verify its effectiveness, the present study integrates the DPF with a previously proposed generalized maximum a posteriori spectral amplitude estimation (GMAPA) SE method. Experimental results demonstrate that, compared with GMAPA alone, this integration further improves output SNR and perceptual evaluation of speech quality (PESQ) scores and effectively suppresses musical noise across various noisy conditions. Owing to its low complexity, low latency, and high performance, the DPF can be suitably integrated into HA devices, where computational efficiency, power consumption, and effectiveness are major considerations.
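The abstract does not give the DPF formula, so the following is only a hypothetical illustration of the idea of expanding the energy contrast between speech- and noise-dominant frames; the gain rule, the expansion factor alpha, and the stand-in frames are all invented for the example.

```python
import numpy as np

def contrast_postfilter(frames, alpha=1.5):
    """Illustrative post-filter: boost high-energy (speech-dominant)
    frames and attenuate low-energy (noise-dominant) frames, expanding
    their energy contrast by a power of alpha. Not the paper's DPF."""
    energy = (frames ** 2).mean(axis=1)
    ref = np.median(energy) + 1e-10
    gains = (energy / ref) ** ((alpha - 1) / 2)   # >1 above ref, <1 below
    return frames * gains[:, None]

# Placeholder: SE-processed speech split into 20 ms frames.
frames = np.random.randn(300, 320) * np.linspace(0.2, 1.5, 300)[:, None]
print(contrast_postfilter(frames).shape)
```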
International Journal of Innovation, Management and Technology | 2012
Jeih-weih Hung; Hao-Teng Fan; Syu-Siang Wang
This paper proposes three novel noise robustness techniques for speech recognition based on the discrete wavelet transform (DWT): wavelet filter cepstral coefficients (WFCCs), sub-band power normalization (SBPN), and lowpass filtering plus zero interpolation (LFZI). According to our experiments, the proposed WFCC provides a more robust c0 (the zeroth cepstral coefficient) for speech recognition, and with proper integration of WFCCs and conventional MFCCs, the resulting compound features can enhance recognition accuracy. Second, the SBPN procedure reduces the power mismatch within each modulation spectral sub-band and thus improves recognition accuracy significantly. Finally, the third technique, LFZI, reduces the storage space for speech features while remaining helpful for speech recognition under noisy conditions.
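Of the three techniques, LFZI is the easiest to sketch: lowpass-filter a feature trajectory, keep every other sample to halve storage, then restore by zero interpolation plus lowpass filtering. The filter order and cutoff below are assumptions, not the paper's settings.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lfzi_compress(traj, cutoff=0.5):
    """Lowpass filtering plus downsampling: keep every other sample of
    the lowpass-filtered feature trajectory, halving storage."""
    b, a = butter(4, cutoff)        # normalized cutoff (1.0 = Nyquist)
    return filtfilt(b, a, traj)[::2]

def lfzi_restore(comp, cutoff=0.5):
    """Zero interpolation followed by lowpass filtering approximately
    restores the original-rate trajectory."""
    up = np.zeros(2 * len(comp))
    up[::2] = comp
    b, a = butter(4, cutoff)
    return 2 * filtfilt(b, a, up)   # factor 2 offsets the inserted zeros

c0 = np.random.randn(200)           # placeholder cepstral trajectory
print(lfzi_restore(lfzi_compress(c0)).shape)
```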
Collaboration
Dive into Syu-Siang Wang's collaborations.
National Institute of Information and Communications Technology