Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Shabnam Ghaffarzadegan is active.

Publication


Featured research published by Shabnam Ghaffarzadegan.


International Conference on Acoustics, Speech, and Signal Processing | 2014

UT-Vocal Effort II: Analysis and constrained-lexicon recognition of whispered speech

Shabnam Ghaffarzadegan; Hynek Boril; John H. L. Hansen

This study focuses on acoustic variations in speech introduced by whispering, and proposes several strategies to improve the robustness of automatic speech recognition of whispered speech with neutral-trained acoustic models. In the analysis part, differences in neutral and whispered speech captured in the UT-Vocal Effort II corpus are studied in terms of energy, spectral slope, and formant center frequency and bandwidth distributions in silence, voiced, and unvoiced speech signal segments. In the part dedicated to speech recognition, several strategies involving front-end filter bank redistribution, cepstral dimensionality reduction, and lexicon expansion for alternative pronunciations are proposed. The proposed neutral-trained system employing a redistributed filter bank and reduced features provides a 7.7% absolute WER reduction over the baseline system trained on neutral speech, and a 1.3% reduction over a baseline system with whisper-adapted acoustic models.
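The two front-end ideas, filter bank redistribution and cepstral dimensionality reduction, can be pictured with a small sketch. This is a minimal illustration, not the paper's actual front end: the power-law warp, the filter count, and keeping eight coefficients are all assumptions made here for clarity.

```python
import numpy as np
from scipy.fftpack import dct

def redistributed_filterbank(n_filters=20, n_fft=512, warp=0.7):
    """Triangular filters with warped center frequencies; an exponent < 1
    crowds the triangles toward high frequencies, where whispered speech
    retains relatively more information (illustrative warp, not the paper's)."""
    pos = np.linspace(0.0, 1.0, n_filters + 2) ** warp
    bins = np.floor(pos * (n_fft // 2)).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        fb[i, left:center + 1] = np.linspace(0.0, 1.0, center - left + 1)
        fb[i, center:right + 1] = np.linspace(1.0, 0.0, right - center + 1)
    return fb

def reduced_cepstra(power_spectrum, fb, n_ceps=8):
    """Log filter-bank energies followed by a DCT, keeping only low-order
    coefficients (the cepstral dimensionality reduction step)."""
    energies = np.log(fb @ power_spectrum + 1e-10)
    return dct(energies, type=2, norm='ortho')[:n_ceps]

frame = np.random.randn(512) * np.hamming(512)   # stand-in for one windowed frame
spec = np.abs(np.fft.rfft(frame)) ** 2           # 257-bin power spectrum
features = reduced_cepstra(spec, redistributed_filterbank())
```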


International Conference on Acoustics, Speech, and Signal Processing | 2015

Generative modeling of pseudo-target domain adaptation samples for whispered speech recognition

Shabnam Ghaffarzadegan; Hynek Boril; John H. L. Hansen

The lack of available large corpora of transcribed whispered speech is one of the major roadblocks for development of successful whisper recognition engines. Our recent study has introduced a Vector Taylor Series (VTS) approach to pseudo-whisper sample generation which requires availability of only a small number of real whispered utterances to produce large amounts of whisper-like samples from easily accessible transcribed neutral recordings. The pseudo-whisper samples were found particularly effective in adapting a neutral-trained recognizer to whisper. Our current study explores the use of denoising autoencoders (DAE) for pseudo-whisper sample generation. Two types of generative models are investigated: one which produces pseudo-whispered cepstral vectors on a frame basis and another which generates pseudo-whisper statistics of whole phone segments. It is shown that the DAE approach considerably reduces word error rates of the baseline system as well as the system adapted on real whisper samples. The DAE approach provides competitive results to the VTS-based method while cutting its computational overhead nearly in half.
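The frame-based variant can be viewed as a small regression network trained on time-aligned neutral/whisper cepstral pairs and then applied to the large neutral corpus. Below is a hedged sketch under that assumption; the layer sizes, optimizer settings, and toy stand-in data are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

n_ceps = 13  # assumed cepstral dimension

# DAE-style mapping: neutral cepstral frame in, pseudo-whisper frame out
dae = nn.Sequential(
    nn.Linear(n_ceps, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, n_ceps),
)
opt = torch.optim.Adam(dae.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Toy stand-in for the small parallel corpus: in the paper these would be
# time-aligned (neutral, whispered) cepstral frames from real recordings.
neutral = torch.randn(1024, n_ceps)
whisper = 0.5 * neutral + 0.1 * torch.randn(1024, n_ceps)

for step in range(200):
    opt.zero_grad()
    loss = loss_fn(dae(neutral), whisper)
    loss.backward()
    opt.step()

# Once trained, the abundant transcribed neutral corpus is pushed through
# the network to mass-produce pseudo-whisper frames for adaptation.
pseudo_whisper = dae(torch.randn(4096, n_ceps)).detach()
```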


International Conference on Acoustics, Speech, and Signal Processing | 2016

Exploring deep learning architectures for automatically grading non-native spontaneous speech

Jidong Tao; Shabnam Ghaffarzadegan; Lei Chen; Klaus Zechner

We investigate two deep learning architectures reported to outperform the conventional GMM system in ASR, in the context of automatic speech scoring. We use an approximately 800-hour large-vocabulary non-native spontaneous English corpus to build three ASR systems: one GMM system and two deep learning systems, namely a DNN and a Tandem system with bottleneck features. The evaluation results show that both deep learning systems significantly outperform the GMM ASR. These ASR systems are then used as the front end of an automated speech scoring system. To examine the effectiveness of the deep learning ASR systems for automated scoring, another non-native spontaneous speech corpus is used to train and evaluate the scoring models. On the scoring corpus, ASR accuracy drops significantly for the deep learning architectures, yet the resulting scoring systems move closer to human raters and remain consistently better than the GMM-based one. Compared to the DNN, the Tandem system performs slightly better on the scoring speech while being a little less accurate on the ASR evaluation set. Furthermore, given its improved scoring performance with fewer scoring features, the Tandem system is more robust for the scoring task than the DNN.
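The Tandem system's distinguishing step, extracting bottleneck features for a conventional GMM-HMM from a phone-classifier network with one deliberately narrow layer, can be sketched as follows. The layer sizes, 40-dimensional input, and phone inventory here are assumptions, not the systems built in the paper.

```python
import torch
import torch.nn as nn

class BottleneckDNN(nn.Module):
    """Phone classifier with a narrow bottleneck layer whose activations
    serve as features for a GMM-HMM in a Tandem setup (illustrative sizes)."""
    def __init__(self, n_in=40, n_bn=40, n_phones=42):
        super().__init__()
        self.front = nn.Sequential(
            nn.Linear(n_in, 1024), nn.Sigmoid(),
            nn.Linear(1024, 1024), nn.Sigmoid(),
            nn.Linear(1024, n_bn),     # the bottleneck layer
        )
        self.back = nn.Sequential(
            nn.Sigmoid(),
            nn.Linear(n_bn, 1024), nn.Sigmoid(),
            nn.Linear(1024, n_phones), # phone posteriors, used only for training
        )

    def forward(self, x):
        return self.back(self.front(x))

    def bottleneck(self, x):
        # Features handed to the GMM-HMM once the network is trained.
        return self.front(x)

model = BottleneckDNN()
frames = torch.randn(8, 40)          # a batch of acoustic feature frames
bn_feats = model.bottleneck(frames)  # shape (8, 40)
```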


International Conference on Acoustics, Speech, and Signal Processing | 2015

Leveraging automatic speech recognition in cochlear implants for improved speech intelligibility under reverberation

Oldooz Hazrati; Shabnam Ghaffarzadegan; John H. L. Hansen

Despite recent advancements in digital signal processing technology for cochlear implant (CI) devices, there remains a significant gap between the speech identification performance of CI users in reverberation and their performance in anechoic quiet conditions. Automatic speech recognition (ASR) systems, on the other hand, have seen significant improvements in recent years, resulting in robust recognition in a variety of adverse environments, including reverberation. In this study, we exploit these advancements in ASR technology to formulate alternative solutions that benefit CI users. Specifically, an ASR system is developed using multi-condition training on speech data with different reverberation characteristics (e.g., T60 values), resulting in low word error rates (WER) in reverberant conditions. A speech synthesizer is then utilized to generate speech waveforms from the output of the ASR system, and the synthesized speech is presented to CI listeners. The effectiveness of this hybrid recognition-synthesis CI strategy is evaluated under moderate to highly reverberant conditions (i.e., T60 = 0.3, 0.6, 0.8, and 1.0 s) using speech material extracted from the TIMIT corpus. Experimental results confirm the effectiveness of multi-condition training on the performance of the ASR system in reverberation, which consequently yields substantial speech intelligibility gains for CI users in reverberant environments.
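Multi-condition training amounts to pooling copies of the clean corpus convolved with room impulse responses at the target T60 values. The sketch below is a rough illustration of that data preparation step; the exponentially decaying white-noise impulse response is a crude stand-in for measured or simulated rooms, and the function names and normalization are assumptions.

```python
import numpy as np

def toy_rir(t60, sr=16000, length_s=1.0):
    """White noise under an exponential envelope reaching -60 dB at t = t60
    (a simplistic room impulse response model, for illustration only)."""
    t = np.arange(int(length_s * sr)) / sr
    return np.random.randn(t.size) * 10 ** (-3.0 * t / t60)

def reverberate(clean, t60, sr=16000):
    wet = np.convolve(clean, toy_rir(t60, sr))[: len(clean)]
    return wet / (np.max(np.abs(wet)) + 1e-9)   # simple peak normalization

clean = np.random.randn(16000)                  # placeholder for a TIMIT utterance
train_pool = [reverberate(clean, t60) for t60 in (0.3, 0.6, 0.8, 1.0)]
```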


Conference of the International Speech Communication Association | 2014

Model and Feature Based Compensation for Whispered Speech Recognition

Shabnam Ghaffarzadegan; Hynek Boril; John H. L. Hansen


IEEE Transactions on Audio, Speech, and Language Processing | 2016

Generative Modeling of Pseudo-Whisper for Robust Whispered Speech Recognition

Shabnam Ghaffarzadegan; Hynek Boril; John H. L. Hansen


Conference of the International Speech Communication Association | 2017

Occupancy Detection in Commercial and Residential Environments Using Audio Signal

Shabnam Ghaffarzadegan; Attila Reiss; Mirko Ruhs; Robert Duerichen; Zhe Feng


International Journal of Speech Technology | 2017

Deep neural network training for whispered speech recognition using small databases and generative model sampling

Shabnam Ghaffarzadegan; Hynek Bořil; John H. L. Hansen


International Conference on Acoustics, Speech, and Signal Processing | 2018

End-to-End Neural Network Based Automated Speech Scoring

Lei Chen; Jidong Tao; Shabnam Ghaffarzadegan; Yao Qian


Conference of the International Speech Communication Association | 2018

An Ensemble of Transfer, Semi-supervised and Supervised Learning Methods for Pathological Heart Sound Classification

Ahmed Imtiaz Humayun; Md. Tauhiduzzaman Khan; Shabnam Ghaffarzadegan; Zhe Feng; Taufiq Hasan

Collaboration


Dive into Shabnam Ghaffarzadegan's collaborations.

Top Co-Authors

John H. L. Hansen, University of Texas at Dallas
Hynek Bořil, University of Texas at Dallas
Taufiq Hasan, University of Texas at Dallas
Lei Chen, Princeton University
Oldooz Hazrati, University of Texas at Dallas
Yao Qian, Princeton University