Publications


Featured research published by Maider Lehr.


International Conference on Acoustics, Speech, and Signal Processing | 2010

Discriminatively estimated joint acoustic, duration, and language model for speech recognition

Maider Lehr; Izhak Shafran

We introduce a discriminative model for speech recognition that integrates acoustic, duration, and language components. In the framework of finite state machines, a general model for speech recognition G is a finite state transduction from acoustic state sequences to word sequences (e.g., the search graph in many speech recognizers). The lattices from a baseline recognizer can be viewed as an a posteriori version of G after having observed an utterance. So far, discriminative language models have been proposed to correct the output side of G and are applied to the lattices. The acoustic state sequences on the input side of these lattices can also be exploited to improve the choice of the best hypothesis through the lattice. Taking this view, the model proposed in this paper jointly estimates the parameters for the acoustic and language components in a discriminative setting. The resulting model can be factored as corrections for the input and output sides of the general model G. This formulation allows us to incorporate duration cues seamlessly. Empirical results on a large-vocabulary Arabic GALE task demonstrate that the proposed model improves word error rate substantially, with a gain of 1.6% absolute. Through a series of experiments, we analyze the contributions from and interactions between the acoustic, duration, and language components, and find that duration cues play an important role in the Arabic task.
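
To make the joint estimation concrete, here is a minimal Python sketch of the kind of perceptron-style update such a model could use: each lattice hypothesis contributes features from the input side (acoustic states and bucketed durations) and the output side (word n-grams), and one shared weight vector is nudged toward the oracle hypothesis. The field names, feature templates, and data layout are hypothetical rather than the paper's implementation.

from collections import Counter

def features(hyp):
    """Joint features from one hypothesis; `hyp` is assumed to carry
    parallel sequences: "states" (acoustic state IDs, input side of G),
    "durations" (frames per state), and "words" (output side of G)."""
    f = Counter()
    for s, d in zip(hyp["states"], hyp["durations"]):
        f[("state", s)] += 1             # input-side correction feature
        f[("dur", s, min(d, 20))] += 1   # bucketed duration cue
    words = ["<s>"] + hyp["words"] + ["</s>"]
    for a, b in zip(words, words[1:]):
        f[("bigram", a, b)] += 1         # output-side correction feature
    return f

def perceptron_epoch(utterances, w):
    """One pass: rescore each utterance's hypotheses and update the
    Counter of weights `w` toward the oracle (lowest-WER) hypothesis
    whenever it is not ranked first."""
    for hyps, oracle in utterances:
        best = max(hyps, key=lambda h: h["baseline_score"] +
                   sum(w[k] * v for k, v in features(h).items()))
        if best is not oracle:
            for k, v in features(oracle).items():
                w[k] += v
            for k, v in features(best).items():
                w[k] -= v
    return w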


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Learning a Discriminative Weighted Finite-State Transducer for Speech Recognition

Maider Lehr; Izhak Shafran

Weighted finite-state transducers (WFSTs) have been widely adopted as efficient representations of a general speech recognition model. The WFST for a speech recognizer is typically assembled or composed from several components (the language model, the pronunciation mapping, and the acoustic model), which are estimated separately without any end-to-end optimization. This paper examines how the weights of such transducers can be learned in a manner that captures the interaction between the components. The paths in the transducer are represented as n-grams defined over the input and output sequences, whose linear weights are learned using a discriminative criterion. The resulting linear model factors into two weighted finite-state acceptors (WFSAs) which can be applied as corrections to the input and output sides of the initial WFST. This formulation allows duration cues to be incorporated seamlessly. Empirical results on a large-vocabulary Arabic GALE task demonstrate that the proposed model improves word error rate substantially, with a gain of 1.5%-1.7% absolute. Through a series of experiments, we analyze the contributions from and interactions between the acoustic, duration, and language components, and find that duration cues play an important role in a large-vocabulary Arabic speech recognition task. Although this paper focuses on speech recognition, the proposed framework for learning the weights of a finite-state transducer is more general in nature and can be applied to other tasks such as utterance classification.
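
The factorization lends itself to a short illustration. The sketch below reuses the hypothetical feature templates from the previous example: it splits one learned weight vector into an input-side and an output-side correction and applies both at rescoring time. Plain dicts stand in for the WFSAs that a toolkit such as OpenFst would build.

def factor(weights):
    """Split a joint weight vector into input- and output-side
    corrections, keyed by which sequence each feature template reads."""
    input_side, output_side = {}, {}
    for feat, w in weights.items():
        if feat[0] in ("state", "dur"):   # reads the input (acoustic) side
            input_side[feat] = w
        else:                             # reads the output (word) side
            output_side[feat] = w
    return input_side, output_side

def rescore(hyp, input_side, output_side, ngrams):
    """Add both corrections to one hypothesis' baseline score; `ngrams`
    is assumed to yield the same features used during training."""
    corr = sum(input_side.get(f, 0.0) + output_side.get(f, 0.0)
               for f in ngrams(hyp))
    return hyp["baseline_score"] + corr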


International Conference on Acoustics, Speech, and Signal Processing | 2012

Hallucinated n-best lists for discriminative language modeling

Kenji Sagae; Maider Lehr; Emily Prud'hommeaux; Puyang Xu; Nathan Glenn; Damianos Karakos; Sanjeev Khudanpur; Brian Roark; Murat Saraclar; Izhak Shafran; Daniel M. Bikel; Chris Callison-Burch; Yuan Cao; Keith B. Hall; Eva Hasler; Philipp Koehn; Adam Lopez; Matt Post; Darcey Riley

This paper investigates semi-supervised methods for discriminative language modeling, whereby n-best lists are "hallucinated" for given reference text and are then used for training n-gram language models using the perceptron algorithm. We perform controlled experiments on a very strong baseline English CTS system, comparing three methods for simulating ASR output, and compare the results against training with "real" n-best list output from the baseline recognizer. We find that methods based on extracting phrasal cohorts (similar to methods from machine translation for extracting phrase tables) yielded the largest gains of our three methods, achieving over half of the WER reduction of the fully supervised methods.
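
As a rough illustration of the phrasal-cohort idea, the Python sketch below corrupts a reference sentence by probabilistically swapping phrases for confusable alternatives drawn from a cohort table, yielding pseudo n-best lists. The table entries, probabilities, and swap rate are invented for illustration; a real system extracts cohorts from aligned recognizer output.

import random

# Hypothetical cohort table: phrase -> [(confusable phrase, probability)]
COHORTS = {
    ("their",): [(("there",), 0.6), (("they're",), 0.4)],
    ("want", "to"): [(("wanna",), 0.7), (("went", "to"), 0.3)],
}

def hallucinate(reference, n=10, swap_prob=0.3):
    """Sample n corrupted hypotheses from one tokenized reference."""
    hyps = []
    for _ in range(n):
        out, i = [], 0
        while i < len(reference):
            for span in (2, 1):  # try the longer cohort key first
                key = tuple(reference[i:i + span])
                if key in COHORTS and random.random() < swap_prob:
                    alts, probs = zip(*COHORTS[key])
                    out.extend(random.choices(alts, probs)[0])
                    i += span
                    break
            else:
                out.append(reference[i])  # no swap; copy the word through
                i += 1
        hyps.append(out)
    return hyps

print(hallucinate("they want to sell their car".split()))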


International Conference on Acoustics, Speech, and Signal Processing | 2012

Semi-supervised discriminative language modeling for Turkish ASR

Arda Çelebi; Hasim Sak; Erinç Dikici; Murat Saraclar; Maider Lehr; Emily Prud'hommeaux; Puyang Xu; Nathan Glenn; Damianos Karakos; Sanjeev Khudanpur; Brian Roark; Kenji Sagae; Izhak Shafran; Daniel M. Bikel; Chris Callison-Burch; Yuan Cao; Keith B. Hall; Eva Hasler; Philipp Koehn; Adam Lopez; Matt Post; Darcey Riley

We present our work on semi-supervised learning of discriminative language models where the negative examples for sentences in a text corpus are generated using confusion models for Turkish at various granularities, specifically the word, sub-word, syllable, and phone levels. We experiment with different language models and various sampling strategies to select competing hypotheses for training with a variant of the perceptron algorithm. We find that morph-based confusion models with a sample selection strategy aiming to match the error distribution of the baseline ASR system give the best performance. We also observe that substituting half of the supervised training examples with those obtained in a semi-supervised manner gives similar results.
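
The error-matching selection strategy can be sketched in a few lines: among hypotheses produced by a confusion model, keep a subset whose simulated word error rates follow a target histogram taken from the baseline ASR system. The bucket width and histogram format below are illustrative assumptions.

import random
from collections import defaultdict

def select_matching(candidates, target_hist, bucket=0.05):
    """candidates: (hypothesis, simulated_WER) pairs.
    target_hist: desired fraction of samples per WER bucket index,
    e.g. {0: 0.2, 1: 0.3, 2: 0.3, 3: 0.2} for 5%-wide buckets."""
    by_bucket = defaultdict(list)
    for hyp, wer in candidates:
        by_bucket[int(wer / bucket)].append(hyp)
    total, selected = len(candidates), []
    for b, frac in target_hist.items():
        pool = by_bucket.get(b, [])
        selected.extend(random.sample(pool, min(len(pool), int(frac * total))))
    return selected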


International Conference on Acoustics, Speech, and Signal Processing | 2012

Continuous space discriminative language modeling

Puyang Xu; Sanjeev Khudanpur; Maider Lehr; Emily Prud'hommeaux; Nathan Glenn; Damianos Karakos; Brian Roark; Kenji Sagae; Murat Saraclar; Izhak Shafran; Daniel M. Bikel; Chris Callison-Burch; Yuan Cao; Keith B. Hall; Eva Hasler; Philipp Koehn; Adam Lopez; Matt Post; Darcey Riley

Discriminative language modeling is a structured classification problem. Log-linear models have been previously used to address this problem. In this paper, the standard dot-product feature representation used in log-linear models is replaced by a non-linear function parameterized by a neural network. Embeddings are learned for each word, and features are extracted automatically through the use of convolutional layers. Experimental results show that as a stand-alone model the continuous-space model yields a significantly lower word error rate (1% absolute) while having a much more compact parameterization (60%-90% smaller). When combined with the baseline scores, our approach performs equally well.
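
A minimal PyTorch sketch of this kind of scorer: word embeddings feed a convolutional layer whose max-pooled output is mapped to a single hypothesis score, replacing the dot product of a log-linear model. The vocabulary size, dimensions, and pooling choice are illustrative, not the paper's configuration.

import torch
import torch.nn as nn

class ConvHypothesisScorer(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=50, n_filters=100, width=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=width, padding=1)
        self.out = nn.Linear(n_filters, 1)  # replaces the dot product

    def forward(self, word_ids):                        # (batch, seq_len)
        x = self.emb(word_ids).transpose(1, 2)          # (batch, emb, seq)
        h = torch.relu(self.conv(x)).max(dim=2).values  # max-pool over time
        return self.out(h).squeeze(-1)                  # one score per hypothesis

scorer = ConvHypothesisScorer()
scores = scorer(torch.randint(0, 10000, (8, 20)))  # 8 hypotheses, length 20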


International Conference on Acoustics, Speech, and Signal Processing | 2011

Discriminatively estimated discrete, parametric and smoothed-discrete duration models for speech recognition

Maider Lehr; Izhak Shafran

The durations of phonemic segments provide important cues for distinguishing words in languages such as Arabic. Recently, we proposed a discriminatively estimated joint acoustic, duration, and language model for large-vocabulary speech recognition [1]. In that work, we found simple discrete models to be effective for modeling duration, although they were neither smoothed nor parsimonious. These limitations are addressed here with two alternative models: parametric and smoothed-discrete models. Unlike previous work on parametric duration models, we estimate their parameters discriminatively and derive an analytical expression for estimating the parameters of a log-normal distribution using a recent approach [2]. On a large-vocabulary Arabic task, we empirically evaluated different segmental units and duration models. Our results show that bigrams of clustered states modeled with smoothed-discrete duration models are more accurate and efficient than the other models considered.
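
The two alternatives can be illustrated with short scoring functions: a parametric log-normal log-density for a unit's duration, and a smoothed discrete model that interpolates a unit-specific duration histogram with a global backoff. The parameters and interpolation weight below are placeholders; the paper estimates such parameters discriminatively rather than by maximum likelihood.

import math

def lognormal_logprob(dur, mu, sigma):
    """Log-density of a log-normal duration model at duration `dur`."""
    z = (math.log(dur) - mu) / sigma
    return -0.5 * z * z - math.log(dur * sigma * math.sqrt(2 * math.pi))

def smoothed_discrete_logprob(dur, unit_hist, global_hist, lam=0.8):
    """Interpolate a unit-specific histogram with a global one."""
    p = lam * unit_hist.get(dur, 0.0) + (1 - lam) * global_hist.get(dur, 0.0)
    return math.log(max(p, 1e-12))  # floor to avoid log(0)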


Conference of the International Speech Communication Association | 2012

Fully Automated Neuropsychological Assessment for Detecting Mild Cognitive Impairment.

Maider Lehr; Emily Prud'hommeaux; Izhak Shafran; Brian Roark


North American Chapter of the Association for Computational Linguistics | 2013

Discriminative Joint Modeling of Lexical Variation and Acoustic Confusion for Automated Narrative Retelling Assessment

Maider Lehr; Izhak Shafran; Emily Prud'hommeaux; Brian Roark


Conference of the International Speech Communication Association | 2014

Discriminative pronunciation modeling for dialectal speech recognition

Maider Lehr; Kyle Gorman; Izhak Shafran


Conference of the International Speech Communication Association | 2012

Deriving conversation-based features from unlabeled speech for discriminative language modeling.

Damianos Karakos; Brian Roark; Izhak Shafran; Kenji Sagae; Maider Lehr; Emily Prud'hommeaux; Puyang Xu; Nathan Glenn; Sanjeev Khudanpur; Murat Saraclar; Daniel M. Bikel; Mark Dredze; Chris Callison-Burch; Yuan Cao; Keith B. Hall; Eva Hasler; Philipp Koehn; Adam Lopez; Matt Post; Darcey Riley

Collaboration


Maider Lehr's top co-authors include:

Kenji Sagae (University of Southern California)
Darcey Riley (University of Rochester)