Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Marzieh Razavi is active.

Publication


Featured research published by Marzieh Razavi.


International Conference on Acoustics, Speech, and Signal Processing | 2014

On Modeling Context-Dependent Clustered States: Comparing HMM/GMM, Hybrid HMM/ANN and KL-HMM Approaches

Marzieh Razavi; Ramya Rasipuram; Mathew Magimai-Doss

Deep architectures have recently been explored in the hybrid hidden Markov model/artificial neural network (HMM/ANN) framework, where the ANN outputs are usually the clustered states of context-dependent phones derived from the best-performing HMM/Gaussian mixture model (GMM) system. A hybrid HMM/ANN system can be viewed as a special case of the recently proposed Kullback-Leibler divergence based hidden Markov model (KL-HMM) approach, in which a probabilistic relationship between the ANN outputs and the context-dependent HMM states is modeled. In this paper, we show that the KL-HMM framework may not require as many clustered states in the ANN output layer as the best HMM/GMM system. Our experimental results on the German part of the MediaParl database show that the KL-HMM system achieves better performance than the hybrid HMM/ANN and HMM/GMM systems with far fewer clustered states than the HMM/GMM system requires. The reduction in the number of clustered states has broader implications for model complexity and data sparsity.
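The comparison above rests on how the KL-HMM replaces a likelihood with a divergence-based local score between each state's categorical distribution over ANN outputs and the frame-level ANN posterior. Below is a minimal illustrative sketch in Python of one common (forward-KL) variant of that score; the function name and toy vectors are assumptions, not taken from the paper.

import numpy as np

def kl_local_score(y_state, z_frame, eps=1e-12):
    # Forward-KL local score between a state's categorical distribution
    # y_state and an ANN posterior vector z_frame (both sum to 1).
    y = np.asarray(y_state) + eps
    z = np.asarray(z_frame) + eps
    return float(np.sum(y * np.log(y / z)))

# Toy 3-class example: a state peaked on class 0 scores a frame that
# agrees with it (small divergence) vs. one that disagrees (larger).
y = np.array([0.8, 0.1, 0.1])
print(kl_local_score(y, np.array([0.7, 0.2, 0.1])))   # small (~0.04)
print(kl_local_score(y, np.array([0.1, 0.2, 0.7])))   # larger (~1.4)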


International Conference on Acoustics, Speech, and Signal Processing | 2015

An HMM-based formalism for automatic subword unit derivation and pronunciation generation

Marzieh Razavi; Mathew Magimai-Doss

We propose a novel hidden Markov model (HMM) formalism for automatic derivation of subword units and pronunciation generation using only transcribed speech data. In this approach, the subword units are derived from the clustered context-dependent units in a grapheme-based system using a maximum-likelihood criterion. The subword-unit-based pronunciations are then learned in the framework of the Kullback-Leibler divergence based HMM. Automatic speech recognition (ASR) experiments on the WSJ0 English corpus show that the approach leads to a 12.7% relative reduction in word error rate compared to the grapheme-based system. Our approach can be beneficial in reducing the need for expert knowledge in the development of ASR as well as text-to-speech systems.


IEEE Automatic Speech Recognition and Understanding Workshop | 2013

Probabilistic lexical modeling and unsupervised training for zero-resourced ASR

Ramya Rasipuram; Marzieh Razavi; Mathew Magimai-Doss

Standard automatic speech recognition (ASR) systems rely on transcribed speech, language models, and pronunciation dictionaries to achieve state-of-the-art performance. The unavailability of these resources keeps ASR technology out of reach for many languages. In this paper, we propose a novel zero-resourced ASR approach to train acoustic models that uses only a list of probable words from the language of interest. The proposed approach is based on the Kullback-Leibler divergence based hidden Markov model (KL-HMM), grapheme subword units, knowledge of grapheme-to-phoneme mapping, and graphemic constraints derived from the word list. The approach also exploits existing acoustic and lexical resources available in other, resource-rich languages. Furthermore, we propose unsupervised adaptation of the KL-HMM acoustic model parameters when untranscribed speech data in the target language are available. We demonstrate the potential of the proposed approach through a simulated study on Greek.


International Conference on Acoustics, Speech, and Signal Processing | 2015

Integrated pronunciation learning for automatic speech recognition using probabilistic lexical modeling

Ramya Rasipuram; Marzieh Razavi; Mathew Magimai-Doss

Standard automatic speech recognition (ASR) systems use a phoneme-based pronunciation lexicon prepared by linguistic experts. When the hand-crafted pronunciations fail to cover the vocabulary of a new domain, a grapheme-to-phoneme (G2P) converter is used to extract pronunciations for the new words, and a phoneme-based ASR system is then trained. G2P converters are typically trained only on the existing lexicons. In this paper, we propose a grapheme-based ASR approach in the framework of probabilistic lexical modeling that integrates pronunciation learning as a stage in ASR system training and exploits both acoustic and lexical resources (not necessarily from the domain or language of interest). The proposed approach is evaluated on four lexical-resource-constrained ASR tasks and compared with the conventional two-stage approach in which G2P training is followed by ASR system development.
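As a rough, illustrative sketch of the probabilistic lexical modeling idea (not this paper's exact training recipe): each lexical unit, e.g. a grapheme HMM state, holds a categorical distribution over acoustic units, and one simple way to estimate it is to average the acoustic-unit posteriors of the frames aligned to that state. The function, the alignment variable, and the averaging choice below are assumptions for illustration.

import numpy as np

def estimate_lexical_model(posteriors, state_ids, n_states):
    # posteriors: (T, K) frame-level acoustic-unit posteriors
    # state_ids:  (T,)  index of the lexical (grapheme) state aligned to each frame
    # Returns one categorical distribution over the K acoustic units per state.
    K = posteriors.shape[1]
    model = np.zeros((n_states, K))
    for s in range(n_states):
        frames = posteriors[state_ids == s]
        if len(frames):
            model[s] = frames.mean(axis=0)
            model[s] /= model[s].sum()   # keep it a proper distribution
    return model

# Toy example: 4 frames, 3 acoustic units, 2 lexical states.
post = np.array([[0.7, 0.2, 0.1],
                 [0.6, 0.3, 0.1],
                 [0.1, 0.2, 0.7],
                 [0.2, 0.1, 0.7]])
align = np.array([0, 0, 1, 1])
print(estimate_lexical_model(post, align, n_states=2))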


IEEE Signal Processing Letters | 2017

A Posterior-Based Multistream Formulation for G2P Conversion

Marzieh Razavi; Mathew Magimai-Doss

In the literature, a number of approaches have been proposed for learning the grapheme-to-phoneme (G2P) relationship and inferring pronunciations. In this letter, we present a novel multistream framework for G2P conversion, in which various machine learning techniques providing different estimates of the probability of phonemes given graphemes can be effectively combined during pronunciation inference. More precisely, analogous to multistream automatic speech recognition, the framework involves obtaining different streams of estimates of the probability of phonemes given graphemes, combining them based on probability combination rules, and inferring pronunciations by decoding the combined probabilities. We demonstrate the potential of the proposed approach by combining probabilities estimated by a state-of-the-art conditional random field based G2P conversion approach and an acoustic data-driven G2P conversion approach in the Kullback-Leibler divergence based hidden Markov model framework, on the PhoneBook 600-word task.
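The combination step described above can be pictured as merging two per-grapheme phoneme-posterior streams with a probability combination rule before decoding. The sketch below shows hypothetical weighted sum and product rules; the function name, the weight, and the toy streams are assumptions, not the letter's configuration.

import numpy as np

def combine_streams(p1, p2, rule="sum", w=0.5):
    # p1, p2: (T, K) streams of phoneme-posterior estimates for T graphemes.
    # Combine them with a simple probability combination rule, then renormalize.
    if rule == "sum":            # weighted arithmetic mean
        p = w * p1 + (1.0 - w) * p2
    elif rule == "product":      # weighted geometric mean
        p = (p1 ** w) * (p2 ** (1.0 - w))
    else:
        raise ValueError("unknown rule: %s" % rule)
    return p / p.sum(axis=1, keepdims=True)

# Toy 2-grapheme, 3-phoneme example: a CRF-based stream and an
# acoustic data-driven stream combined before pronunciation decoding.
crf_stream = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
acoustic_stream = np.array([[0.5, 0.4, 0.1], [0.1, 0.7, 0.2]])
print(combine_streams(crf_stream, acoustic_stream, rule="product"))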


Conference of the International Speech Communication Association | 2016

Improving Under-Resourced Language ASR Through Latent Subword Unit Space Discovery

Marzieh Razavi; Mathew Magimai-Doss

Development of state-of-the-art automatic speech recognition (ASR) systems requires acoustic resources (i.e., transcribed speech) as well as lexical resources (i.e., phonetic lexicons). It has been shown that acoustic and lexical resource constraints can be overcome by first training an acoustic model that captures acoustic-to-multilingual phone relationships on language-independent data, and then training a lexical model that captures grapheme-to-multilingual phone relationships on the target language data. In this paper, we show that such an approach can be employed to discover a latent space of subword units for under-resourced languages, and subsequently to improve the performance of the ASR system through both acoustic and lexical model adaptation. Specifically, we present two approaches to discover the latent space: (1) inference of a subset of the multilingual phone set based on the learned grapheme-to-multilingual phone relationships, and (2) derivation of an automatic subword unit space based on clustering of the grapheme-to-multilingual phone relationships. Experimental studies on Scottish Gaelic, a truly under-resourced language, show that both approaches lead to significant performance improvements, with the latter approach yielding the best system.
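The second discovery approach, clustering the grapheme-to-multilingual phone relationships, can be pictured as grouping the per-state probability vectors of the lexical model so that each cluster becomes an automatic subword unit. The toy k-means sketch below uses Euclidean distance and a fixed cluster count; both choices, and all names, are assumptions for illustration rather than the paper's settings.

import numpy as np

def cluster_subword_units(lexical_model, n_units, n_iter=50, seed=0):
    # lexical_model: (S, K) matrix, one probability vector over K
    # multilingual phones per grapheme HMM state. Returns a subword-unit
    # label per state from a plain k-means over those vectors.
    rng = np.random.default_rng(seed)
    centers = lexical_model[rng.choice(len(lexical_model), n_units, replace=False)]
    labels = np.zeros(len(lexical_model), dtype=int)
    for _ in range(n_iter):
        dists = ((lexical_model[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for u in range(n_units):
            if np.any(labels == u):
                centers[u] = lexical_model[labels == u].mean(axis=0)
    return labels

# Toy lexical model: 6 grapheme states over 4 multilingual phones.
M = np.array([[0.7, 0.1, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1],
              [0.1, 0.7, 0.1, 0.1], [0.1, 0.6, 0.2, 0.1],
              [0.1, 0.1, 0.1, 0.7], [0.1, 0.1, 0.2, 0.6]])
print(cluster_subword_units(M, n_units=3))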


Conference of the International Speech Communication Association | 2014

On Recognition of Non-Native Speech Using Probabilistic Lexical Model

Marzieh Razavi; Mathew Magimai-Doss


4th Biennial Workshop on Less-Resourced Languages | 2015

Pronunciation Lexicon Development for Under-Resourced Languages Using Automatically Derived Subword Units: A Case Study on Scottish Gaelic

Marzieh Razavi; Ramya Rasipuram; Mathew Magimai-Doss


Archive | 2015

On the Application of Automatic Subword Unit Derivation and Pronunciation Generation for Under-Resourced Language ASR: A Study on Scottish Gaelic

Marzieh Razavi; Ramya Rasipuram; Mathew Magimai-Doss


Archive | 2015

Posterior-Based Multi-Stream Formulation To Combine Multiple Grapheme-to-Phoneme Conversion Techniques

Marzieh Razavi; Mathew Magimai-Doss

Collaboration


Dive into Marzieh Razavi's collaborations.
