Shabnam Ghaffarzadegan
University of Texas at Dallas
Publications
Featured research published by Shabnam Ghaffarzadegan.
international conference on acoustics, speech, and signal processing | 2014
Shabnam Ghaffarzadegan; Hynek Boril; John H. L. Hansen
This study focuses on acoustic variations in speech introduced by whispering, and proposes several strategies to improve the robustness of automatic speech recognition of whispered speech with neutral-trained acoustic models. In the analysis part, differences in neutral and whispered speech captured in the UT-Vocal Effort II corpus are studied in terms of energy, spectral slope, and formant center frequency and bandwidth distributions in silence, voiced, and unvoiced speech signal segments. In the part dedicated to speech recognition, several strategies involving front-end filter bank redistribution, cepstral dimensionality reduction, and lexicon expansion for alternative pronunciations are proposed. The proposed neutral-trained system employing a redistributed filter bank and reduced features provides a 7.7% absolute WER reduction over the baseline system trained on neutral speech, and a 1.3% reduction over a baseline system with whisper-adapted acoustic models.
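The general front-end idea can be illustrated with a short sketch (not the authors' implementation): compute MFCCs with a mel filter bank confined to a band where whispered speech retains most of its energy, and keep only the lower cepstral coefficients. The band limits, filter count, coefficient count, and the use of librosa below are illustrative assumptions.

```python
# Illustrative sketch only: a front end with a redistributed filter bank and
# reduced cepstral dimensionality for whisper-oriented recognition.
# Band limits and dimensions are assumptions, not the paper's settings.
import librosa

def whisper_frontend(wav_path, sr=16000, n_mels=26, n_keep=10,
                     fmin=300.0, fmax=7000.0):
    """Return a reduced MFCC matrix of shape (n_keep, frames)."""
    y, sr = librosa.load(wav_path, sr=sr)
    # Mel filter bank shifted away from the low-frequency region, where
    # whispered speech carries little voicing energy (hypothetical band).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_mels=n_mels, fmin=fmin, fmax=fmax)
    # Cepstral dimensionality reduction: keep only the first n_keep coefficients.
    return mfcc[:n_keep, :]

if __name__ == "__main__":
    feats = whisper_frontend("example.wav")  # placeholder file name
    print(feats.shape)
```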
international conference on acoustics, speech, and signal processing | 2015
Shabnam Ghaffarzadegan; Hynek Boril; John H. L. Hansen
The lack of large corpora of transcribed whispered speech is one of the major roadblocks to developing successful whisper recognition engines. Our recent study introduced a Vector Taylor Series (VTS) approach to pseudo-whisper sample generation which requires only a small number of real whispered utterances to produce large amounts of whisper-like samples from easily accessible transcribed neutral recordings. The pseudo-whisper samples were found particularly effective in adapting a neutral-trained recognizer to whisper. Our current study explores the use of denoising autoencoders (DAE) for pseudo-whisper sample generation. Two types of generative models are investigated: one which produces pseudo-whispered cepstral vectors on a frame basis and another which generates pseudo-whisper statistics of whole phone segments. It is shown that the DAE approach considerably reduces word error rates of the baseline system as well as of the system adapted on real whisper samples. The DAE approach provides results competitive with the VTS-based method while cutting its computational overhead nearly in half.
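As a rough illustration of the frame-based variant, the following PyTorch sketch trains a small autoencoder-style network to map neutral cepstral frames to whisper-like frames. The architecture, feature dimension, and the paired random tensors are assumptions for illustration, not the models or data described in the paper.

```python
# Illustrative sketch: a small DAE-style network mapping neutral MFCC frames
# to pseudo-whisper frames. Dimensions, architecture, and data are assumed.
import torch
import torch.nn as nn

FEAT_DIM = 13  # assumed cepstral dimensionality

class FramePseudoWhisperDAE(nn.Module):
    def __init__(self, dim=FEAT_DIM, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x):
        return self.net(x)

def train(model, neutral, whisper, epochs=50, lr=1e-3):
    """neutral, whisper: (n_frames, FEAT_DIM) tensors, assumed frame-aligned."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(neutral), whisper)
        loss.backward()
        opt.step()
    return model

if __name__ == "__main__":
    # Random placeholders standing in for a small set of paired real frames.
    neutral = torch.randn(1000, FEAT_DIM)
    whisper = torch.randn(1000, FEAT_DIM)
    model = train(FramePseudoWhisperDAE(), neutral, whisper)
    pseudo = model(neutral)  # whisper-like frames generated from neutral speech
    print(pseudo.shape)
```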
international conference on acoustics, speech, and signal processing | 2016
Jidong Tao; Shabnam Ghaffarzadegan; Lei Chen; Klaus Zechner
We investigate two deep learning architectures reported to outperform the conventional GMM system in ASR, with respect to automatic speech scoring. We use an approximately 800-hour large-vocabulary non-native spontaneous English corpus to build three ASR systems: one GMM-based, and two based on deep learning architectures, namely a DNN and a Tandem system with bottleneck features. The evaluation results show that both deep learning systems significantly outperform the GMM ASR. These ASR systems are used as the front-end in building an automated speech scoring system. To examine the effectiveness of the deep learning ASR systems for automated scoring, another non-native spontaneous speech corpus is used to train and evaluate the scoring models. With the deep learning architectures, ASR accuracies drop significantly on the scoring corpus, yet the performance of the resulting scoring systems moves closer to that of human raters and remains consistently better than that of the GMM-based system. Compared to the DNN ASR, the Tandem system performs slightly better on the scoring speech while being a little less accurate on the ASR evaluation dataset. Furthermore, given its improved scoring performance with fewer scoring features, the Tandem system is more robust for the scoring task than the DNN.
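The back-end scoring step can be sketched as fitting a regression model on ASR-derived features and checking agreement with human raters. The feature set and the synthetic data below are placeholders for illustration only, not the features or corpus used in the study.

```python
# Illustrative sketch: train a scoring model on ASR-derived response features
# and measure agreement with human rater scores. Data are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Hypothetical per-response features from the ASR front end, e.g. speaking
# rate, mean acoustic-model score, silence ratio, confidence statistics.
n_responses, n_features = 500, 6
X = rng.normal(size=(n_responses, n_features))
human = X @ rng.normal(size=n_features) + rng.normal(scale=0.5, size=n_responses)

# Simple train/evaluation split for the scoring model.
X_train, X_test = X[:400], X[400:]
y_train, y_test = human[:400], human[400:]

scorer = LinearRegression().fit(X_train, y_train)
pred = scorer.predict(X_test)

# Agreement with human raters, here measured by Pearson correlation.
r, _ = pearsonr(pred, y_test)
print(f"machine-human correlation: {r:.3f}")
```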
international conference on acoustics, speech, and signal processing | 2015
Oldooz Hazrati; Shabnam Ghaffarzadegan; John H. L. Hansen
Despite recent advancements in digital signal processing technology for cochlear implant (CI) devices, there remains a significant gap between the speech identification performance of CI users in reverberation and that in anechoic quiet conditions. Meanwhile, automatic speech recognition (ASR) systems have seen significant improvements in recent years, resulting in robust speech recognition in a variety of adverse environments, including reverberation. In this study, we exploit these advancements in ASR technology to formulate alternative solutions that benefit CI users. Specifically, an ASR system is developed using multi-condition training on speech data with different reverberation characteristics (e.g., T60 values), resulting in low word error rates (WER) in reverberant conditions. A speech synthesizer is then used to generate speech waveforms from the output of the ASR system, and the synthesized speech is presented to CI listeners. The effectiveness of this hybrid recognition-synthesis CI strategy is evaluated under moderate to highly reverberant conditions (i.e., T60 = 0.3, 0.6, 0.8, and 1.0 s) using speech material extracted from the TIMIT corpus. Experimental results confirm the effectiveness of multi-condition training on the performance of the ASR system in reverberation, which in turn yields substantial speech intelligibility gains for CI users in reverberant environments.
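Multi-condition training data of the kind described here can be approximated by convolving clean utterances with room impulse responses of different T60 values. The exponentially decaying noise impulse response below is a crude, assumed stand-in for measured or simulated RIRs; only the T60 values match those listed in the abstract.

```python
# Illustrative sketch: build multi-condition training audio by convolving clean
# speech with impulse responses of varying reverberation time (T60).
# The exponentially decaying noise RIR is a crude stand-in for real RIRs.
import numpy as np
from scipy.signal import fftconvolve

def synthetic_rir(t60, sr=16000, length_s=1.0, seed=0):
    """White noise shaped by an exponential decay reaching -60 dB at t = t60."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(length_s * sr)) / sr
    envelope = 10.0 ** (-3.0 * t / t60)  # amplitude down to -60 dB at t60
    rir = rng.standard_normal(t.size) * envelope
    return rir / np.max(np.abs(rir))

def reverberate(clean, t60, sr=16000):
    """Convolve clean speech with a synthetic RIR for the given T60."""
    rev = fftconvolve(clean, synthetic_rir(t60, sr))[: clean.size]
    return rev / (np.max(np.abs(rev)) + 1e-9)

if __name__ == "__main__":
    sr = 16000
    clean = np.random.randn(sr * 2)  # placeholder for a clean utterance
    # Reverberation conditions matching the abstract: T60 = 0.3, 0.6, 0.8, 1.0 s.
    multi_condition = [reverberate(clean, t60, sr) for t60 in (0.3, 0.6, 0.8, 1.0)]
    print([x.shape for x in multi_condition])
```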
conference of the international speech communication association | 2014
Shabnam Ghaffarzadegan; Hynek Boril; John H. L. Hansen
IEEE Transactions on Audio, Speech, and Language Processing | 2016
Shabnam Ghaffarzadegan; Hynek Boril; John H. L. Hansen
conference of the international speech communication association | 2017
Shabnam Ghaffarzadegan; Attila Reiss; Mirko Ruhs; Robert Duerichen; Zhe Feng
International Journal of Speech Technology | 2017
Shabnam Ghaffarzadegan; Hynek Bořil; John H. L. Hansen
international conference on acoustics, speech, and signal processing | 2018
Lei Chen; Jidong Tao; Shabnam Ghaffarzadegan; Yao Qian
conference of the international speech communication association | 2018
Ahmed Imtiaz Humayun; Md. Tauhiduzzaman Khan; Shabnam Ghaffarzadegan; Zhe Feng; Taufiq Hasan