Publication


Featured research published by Michael Levit.


EELC'06: Proceedings of the Third International Conference on Emergence and Evolution of Linguistic Communication: Symbol Grounding and Beyond | 2006

The human speechome project

Deb Roy; Rupal Patel; Philip DeCamp; Rony Kubat; Michael Fleischman; Brandon Cain Roy; Nikolaos Mavridis; Stefanie Tellex; Alexia Salata; Jethran Guinness; Michael Levit; Peter Gorniak

The Human Speechome Project is an effort to observe and computationally model the longitudinal course of language development for a single child at an unprecedented scale. We are collecting audio and video recordings for the first three years of one child's life, in its near entirety, as it unfolds in the child's home. A network of ceiling-mounted video cameras and microphones is generating approximately 300 gigabytes of observational data each day from the home. One of the world's largest single-volume disk arrays is under construction to house the approximately 400,000 hours of audio and video recordings that will accumulate over the three-year study. To analyze the massive data set, we are developing new data mining technologies to help human analysts rapidly annotate and transcribe recordings using semi-automatic methods, and to detect and visualize salient patterns of behavior and interaction. To make sense of large-scale patterns that span months or even years of observations, we are developing computational models of language acquisition that are able to learn from the child's experiential record. By creating and evaluating machine learning systems that step into the shoes of the child and sequentially process long stretches of perceptual experience, we will investigate possible language learning strategies used by children, with an emphasis on early word learning.


IEEE Automatic Speech Recognition and Understanding Workshop | 2013

Accelerating recurrent neural network training via two stage classes and parallelization

Zhiheng Huang; Geoffrey Zweig; Michael Levit; Benoit Dumoulin; Barlas Oguz; Shawn Chang

Recurrent neural network (RNN) language models have proven successful at lowering the perplexity and word error rate in automatic speech recognition (ASR). However, one challenge in adopting RNN language models is their heavy computational cost in training. In this paper, we propose two techniques to accelerate RNN training: 1) two-stage class RNNs and 2) parallel RNN training. In experiments on a Microsoft internal short message dictation (SMD) data set, two-stage class RNNs and parallel RNNs not only result in equal or lower WERs compared to original RNNs but also accelerate training by 2 and 10 times, respectively. It is worth noting that the two-stage class RNN speedup also applies at test time, which is essential for reducing latency in real-time ASR applications.
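
The speedup from class-factored outputs comes from never scoring the full vocabulary at once. Below is a minimal NumPy sketch of the underlying class-based factorization with a single class stage (dimensions and the frequency-style class assignment are illustrative; the paper's two-stage variant adds a second class level on top, and its parallelization is not shown):

```python
import numpy as np

# Sketch of a class-factorized output layer for an RNN LM:
# P(w | h) = P(class(w) | h) * P(w | class(w), h),
# so each step scores |C| classes plus only the words inside one class
# instead of the full vocabulary.

rng = np.random.default_rng(0)
hidden_dim, vocab_size, num_classes = 64, 10_000, 100

# Hypothetical class assignment: word w belongs to class w // 100.
word_to_class = np.arange(vocab_size) // (vocab_size // num_classes)

W_class = rng.standard_normal((num_classes, hidden_dim)) * 0.01
W_word = rng.standard_normal((vocab_size, hidden_dim)) * 0.01

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def word_prob(h, w):
    """P(w | h) under the factorized output layer."""
    c = word_to_class[w]
    p_class = softmax(W_class @ h)[c]
    members = np.flatnonzero(word_to_class == c)   # words sharing class c
    p_word = softmax(W_word[members] @ h)[np.where(members == w)[0][0]]
    return p_class * p_word

h = rng.standard_normal(hidden_dim)
print(word_prob(h, w=4242))
```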


IEEE Automatic Speech Recognition and Understanding Workshop | 2009

Garbage modeling with decoys for a sequential recognition scenario

Michael Levit; Shuangyu Chang; Bruce Melvin Buntschuh

This paper is concerned with a speech recognition scenario where two unequal ASR systems, one fast with constrained resources, the other significantly slower but also much more powerful, work together in a sequential manner. In particular, we focus on deciding when to accept the results of the first recognizer and when the second recognizer needs to be consulted. As a kind of application-dependent garbage modeling, we suggest an algorithm that augments the grammar of the first recognizer with those valid paths through the language model of the second recognizer that are confusable with the phrases from this grammar. We show that this algorithm outperforms a system that only looks at recognition confidences by about 20% relative.
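
As a rough illustration of the accept/defer decision this enables, here is a hedged Python sketch; the phrase sets, confidence threshold, and routing logic are hypothetical stand-ins, not the paper's actual grammar or decision rule:

```python
# Sketch of routing in a sequential two-recognizer setup: the grammar of
# the fast recognizer is augmented with "decoy" phrases (valid paths of the
# big LM that are confusable with in-grammar phrases); landing on a decoy
# routes the audio to the slower, more powerful recognizer.

IN_GRAMMAR = {"play music", "call home", "stop"}
DECOYS = {"play musical", "call harold"}   # confusable big-LM phrases

def route(fast_result: str, confidence: float, threshold: float = 0.6) -> str:
    if fast_result in DECOYS:
        return "defer"          # decoy matched: likely out-of-grammar speech
    if fast_result in IN_GRAMMAR and confidence >= threshold:
        return "accept"         # trust the cheap recognizer's answer
    return "defer"              # low confidence: consult the second recognizer

print(route("play musical", 0.9))   # defer, despite high confidence
print(route("call home", 0.8))      # accept
```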


International Conference on Acoustics, Speech, and Signal Processing | 1999

Discriminative estimation of interpolation parameters for language model classifiers

Volker Warnke; Stefan Harbeck; Elmar Nöth; Heinrich Niemann; Michael Levit

In this paper we present a new approach for estimating the interpolation parameters of language models (LMs) which are used as classifiers. Classical maximum likelihood (ML) estimation theoretically requires a huge amount of data, and the fundamental density assumption has to be correct. Usually one of these conditions is violated, so different optimization techniques such as maximum mutual information (MMI) and minimum classification error (MCE) can be used instead, where the interpolation parameters are not optimized on their own but in consideration of all models together. We show how MCE and MMI techniques can be applied to two different kinds of interpolation strategies: linear interpolation, the standard interpolation method, and rational interpolation. We compare ML, MCE, and MMI on the German part of the Verbmobil corpus, where we obtain a 3% reduction in classification error when discriminating between 18 dialog-act classes.
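
To make the setting concrete, here is a small Python sketch of an interpolated LM used as a classifier; the toy component models and weights are illustrative, and the discriminative (MMI/MCE) optimization of the weights is not shown:

```python
import numpy as np

# Each class k owns component models combined linearly,
# P_k(w | h) = sum_i lam_k[i] * p_ki(w | h),
# and an utterance is assigned to the class whose interpolated LM
# gives it the highest likelihood.

def interpolated_logprob(sentence, components, lam):
    """components: list of functions (w, h) -> p_i(w | h); lam: simplex weights."""
    logp, history = 0.0, []
    for w in sentence:
        p = sum(l * comp(w, tuple(history)) for l, comp in zip(lam, components))
        logp += np.log(max(p, 1e-12))
        history.append(w)
    return logp

def classify(sentence, class_models):
    """class_models: {label: (components, lam)}; returns max-likelihood label."""
    return max(class_models,
               key=lambda k: interpolated_logprob(sentence, *class_models[k]))

# Toy usage: two dialog-act classes, each mixing a unigram model with a
# uniform background model (all names here are illustrative).
uniform = lambda w, h: 1.0 / 5
unigram_a = lambda w, h: {"yes": 0.6, "no": 0.1}.get(w, 0.075)
unigram_b = lambda w, h: {"no": 0.6, "yes": 0.1}.get(w, 0.075)

models = {"ACCEPT": ([unigram_a, uniform], [0.8, 0.2]),
          "REJECT": ([unigram_b, uniform], [0.8, 0.2])}
print(classify(["yes", "yes"], models))   # -> ACCEPT
```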


IEEE Automatic Speech Recognition and Understanding Workshop | 2007

Integrating several annotation layers for statistical information distillation

Michael Levit; Dilek Hakkani-Tür; Gökhan Tür; Daniel Gillick

We present a sentence extraction algorithm for Information Distillation, a task where, for a given templated query, relevant passages must be extracted from massive audio and textual document sources. For each sentence of the relevant documents (which are assumed to be known from the upstream stages), we employ statistical classification methods to estimate the extent of its relevance to the query, whereby two aspects of relevance are taken into account: the template (type) of the query and its slots (free-text descriptions of names, organizations, topics, events and so on, around which templates are centered). The distinguishing feature of the presented method is the choice of features used for classification. We extract our features from charts, compilations of elements from various annotation levels, such as word transcriptions, syntactic and semantic parses, and Information Extraction annotations. In our experiments we show that this integrated approach outperforms a purely lexical baseline by as much as 30% relative in terms of F-measure. We also investigate the algorithm's behavior under noisy conditions by comparing its performance on ASR output and on corresponding manual transcriptions.
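
To illustrate the chart idea, the following sketch combines several annotation layers over one sentence and reads off simple overlap counts as features; the spans, layer names, and feature definitions are invented for illustration, not taken from the paper:

```python
# Sketch: several annotation layers over a sentence, each mapping
# (start, end) token spans to labels, fused into one "chart" from which
# cross-layer overlap features are extracted.

sentence = "The minister met protest organizers in Ankara on Tuesday".split()

layers = {
    "ne":     {(1, 2): "PER_DESC", (6, 7): "GPE", (8, 9): "DATE"},   # IE spans
    "syntax": {(0, 2): "NP", (2, 9): "VP"},                          # parse chunks
    "slot":   {(3, 5): "TOPIC_MATCH", (6, 7): "LOCATION_MATCH"},     # query slots
}

def chart_features(layers):
    """Count, per layer pair, how many spans from one layer overlap the other."""
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]
    feats = {}
    names = sorted(layers)
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            feats[f"{x}_overlaps_{y}"] = sum(
                overlaps(sa, sb) for sa in layers[x] for sb in layers[y])
    return feats

print(chart_features(layers))   # e.g. {'ne_overlaps_slot': 1, ...}
```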


International Conference on Acoustics, Speech, and Signal Processing | 2015

Token-level interpolation for class-based language models

Michael Levit; Andreas Stolcke; Shuangyu Chang; Sarangarajan Parthasarathy

We describe a method for interpolation of class-based n-gram language models. Our algorithm is an extension of the traditional EM-based approach that optimizes perplexity of the training set with respect to a collection of n-gram language models linearly combined in the probability space. However, unlike prior work, it naturally supports context-dependent interpolation for class-based LMs. In addition, the method integrates seamlessly with the recently introduced word-phrase-entity (WPE) language models that unify words, phrases and entities into a single statistical framework. Applied to the Calendar scenario of the Personal Assistant domain, our method achieved significant perplexity reductions and improved word error rates.
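
For reference, here is a minimal sketch of the classic EM loop for linear LM interpolation that this method extends; the context-dependent, class-based generalization itself is not shown, and the toy data is synthetic:

```python
import numpy as np

# Each row of probs holds the component-model probabilities p_i(w_t | h_t)
# for one training token; EM re-estimates the mixture weights to maximize
# training-set likelihood (i.e. minimize perplexity).

def em_interpolation_weights(probs, iters=50):
    n_tokens, n_models = probs.shape
    lam = np.full(n_models, 1.0 / n_models)
    for _ in range(iters):
        weighted = probs * lam                      # lam_i * p_i(w_t | h_t)
        posteriors = weighted / weighted.sum(axis=1, keepdims=True)
        lam = posteriors.mean(axis=0)               # M-step: mean responsibility
    return lam

# Toy data: model 0 tends to assign higher probability to these tokens.
rng = np.random.default_rng(1)
probs = np.column_stack([rng.uniform(0.05, 0.5, 1000),
                         rng.uniform(0.01, 0.2, 1000)])
print(em_interpolation_weights(probs))   # weight on model 0 should dominate
```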


International Conference on Acoustics, Speech, and Signal Processing | 2012

End-to-end speech recognition accuracy metric for voice-search tasks

Michael Levit; Shuangyu Chang; Bruce Melvin Buntschuh; Nick Kibre

We introduce a novel metric for speech recognition success in voice search tasks, designed to reflect the impact of speech recognition errors on users' overall experience with the system. The computation of the metric is seeded with intuitive labels from human subjects and subsequently automated by replacing human annotations with a machine learning algorithm. The results show that search-based recognition accuracy is significantly higher than accuracy based on sentence error rate computation, and that the automated system is very successful in replicating human judgments of search result quality.
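
The contrast between sentence-level accuracy and search-based success can be sketched as follows; the toy search engine and queries are hypothetical, and the paper's learned metric (trained to replicate human labels) is not reproduced here:

```python
# Sketch: recognition counts as successful if the hypothesis retrieves the
# same top-k results as the reference query would, even if the words differ.

def sentence_accurate(hyp: str, ref: str) -> bool:
    return hyp.strip().lower() == ref.strip().lower()

def search_success(hyp: str, ref: str, search, k: int = 3) -> bool:
    return search(hyp)[:k] == search(ref)[:k]

# Toy search engine: ranks documents by word overlap with the query.
docs = ["cheap flights to boston", "boston weather forecast", "pizza near me"]
def toy_search(query):
    qwords = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(qwords & set(d.split())))

hyp, ref = "flights boston cheap", "cheap flights to boston"
print(sentence_accurate(hyp, ref))           # False: the word strings differ
print(search_success(hyp, ref, toy_search))  # True: same retrieved results
```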


Computer Speech & Language | 2009

IXIR: A statistical information distillation system

Michael Levit; Dilek Hakkani-Tür; Gökhan Tür; Daniel Gillick

The task of information distillation is to extract snippets from massive multilingual audio and textual document sources that are relevant for a given templated query. We present an approach that focuses on the sentence extraction phase of the distillation process. It selects document sentences with respect to their relevance to a query via statistical classification with support vector machines. The distinguishing contribution of the approach is a novel method for generating classification features. The features are extracted from charts, compilations of elements from various annotation layers, such as word transcriptions, syntactic and semantic parses, and information extraction (IE) annotations. We describe a procedure for creating charts from documents and queries, paying special attention to query slots (free-text descriptions of names, organizations, topics, events and so on, around which templates are centered), and suggest various types of classification features that can be extracted from these charts. While observing a 30% relative improvement due to non-lexical annotation layers, we perform a detailed analysis of the contribution of each of these layers to classification performance.
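
Assuming chart-derived feature vectors have already been computed, the relevance-classification step might look like the following scikit-learn sketch; the feature semantics and synthetic training data are stand-ins for the paper's actual setup:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical features per sentence: [lexical overlap, slot/NE overlap,
# parse-chunk match count] -- relevant sentences score higher on average.
X_rel = rng.normal(loc=[3.0, 2.0, 1.5], scale=1.0, size=(200, 3))
X_irr = rng.normal(loc=[1.0, 0.2, 0.3], scale=1.0, size=(200, 3))
X = np.vstack([X_rel, X_irr])
y = np.array([1] * 200 + [0] * 200)   # 1 = relevant to the query

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict([[2.8, 1.9, 1.2], [0.5, 0.1, 0.0]]))   # -> [1 0]
```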


International Conference on Acoustics, Speech, and Signal Processing | 2008

An iterative unsupervised learning method for information distillation

Kamand Kamangar; Dilek Hakkani-Tür; Gökhan Tür; Michael Levit

Information distillation techniques are used to analyze and interpret large volumes of speech and text archives in multiple languages and produce structured information of interest to the user. In this work, we propose an iterative unsupervised sentence extraction method to answer open-ended natural language queries about an event. The approach consists of finding the subset of sentences in candidate documents that are very likely to be relevant or irrelevant to the query, and iteratively training a classification model using these examples. Our results indicate that the proposed method improves system performance by around 30% relative in terms of F-measure.
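
A minimal sketch of such an iterative self-training loop, assuming sentence features and an initial query-similarity score per sentence are given (the thresholds and classifier below are illustrative, not the paper's exact method):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def iterative_extraction(X, seed_scores, iters=5, margin=0.9):
    """X: sentence feature matrix; seed_scores: initial similarity in [0, 1]."""
    # Bootstrap labels from the extremes of the seed similarity scores.
    labels = np.full(len(X), -1)                  # -1 = unlabeled
    labels[seed_scores > 0.8] = 1                 # very likely relevant
    labels[seed_scores < 0.2] = 0                 # very likely irrelevant
    for _ in range(iters):
        mask = labels != -1
        clf = LogisticRegression().fit(X[mask], labels[mask])
        proba = clf.predict_proba(X)[:, 1]
        labels[(proba > margin) & ~mask] = 1      # add confident positives
        labels[(proba < 1 - margin) & ~mask] = 0  # add confident negatives
    return clf.predict(X)                         # final relevance decisions
```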


IEEE Automatic Speech Recognition and Understanding Workshop | 2015

Discriminative training of context-dependent language model scaling factors and interpolation weights

Shuangyu Chang; Abhik Lahiri; Issac Alphonso; Barlas Oguz; Michael Levit; Benoit Dumoulin

We demonstrate how context-dependent language model scaling factors and interpolation weights can be unified in a single formulation where free parameters are discriminatively trained using linear and non-linear optimization. The objective functions of the optimization are defined on pairs of superior and inferior recognition hypotheses and correlate well with recognition error metrics. Experiments on a large, real-world application demonstrated the effectiveness of the solution in significantly reducing recognition errors by leveraging the benefits of both context-dependent weighting and discriminative training.
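
The pairwise idea can be sketched as follows; here a hinge loss over superior/inferior score-vector pairs stands in for the paper's objective functions, and the feature layout and optimizer are hypothetical:

```python
import numpy as np

# Each hypothesis carries a vector of component scores (acoustic score plus
# per-context LM scores); a single weight vector plays the role of unified
# scaling factors and interpolation weights. Training pushes the combined
# score of the superior hypothesis in each pair above the inferior one.

def train_weights(pairs, dim, lr=0.1, epochs=100):
    """pairs: list of (scores_superior, scores_inferior), each of length dim."""
    w = np.ones(dim)
    for _ in range(epochs):
        for sup, inf in pairs:
            if w @ sup - w @ inf < 1.0:   # hinge: superior must win by >= 1
                w += lr * (sup - inf)     # subgradient step on the loss
    return w

# Toy pairs: component 1 (say, a context-specific LM score) separates
# superior from inferior hypotheses; training should up-weight it.
rng = np.random.default_rng(2)
pairs = [(np.array([s, s + 0.2]), np.array([s, s - 0.2]))
         for s in rng.normal(size=50)]
print(train_weights(pairs, dim=2))   # weight on the informative component grows
```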

Collaboration


Dive into Michael Levit's collaborations.

Top Co-Authors

Elmar Nöth

University of Erlangen-Nuremberg

Barlas Oguz

University of California
