Publication


Featured research published by Karen Livescu.


International Conference on Machine Learning | 2009

Multi-view clustering via canonical correlation analysis

Kamalika Chaudhuri; Sham M. Kakade; Karen Livescu; Karthik Sridharan

Clustering data in high dimensions is believed to be a hard problem in general. A number of efficient clustering algorithms developed in recent years address this problem by projecting the data into a lower-dimensional subspace, e.g., via Principal Components Analysis (PCA) or random projections, before clustering. Here, we consider constructing such projections using multiple views of the data, via Canonical Correlation Analysis (CCA). Under the assumption that the views are uncorrelated given the cluster label, we show that the separation conditions required for the algorithm to be successful are significantly weaker than prior results in the literature. We provide results for mixtures of Gaussians and mixtures of log-concave distributions. We also provide empirical support from audio-visual speaker clustering (where we desire the clusters to correspond to speaker ID) and from hierarchical Wikipedia document clustering (where one view is the words in the document and the other is the link structure).
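
To make the recipe concrete, here is a minimal sketch of the projection-then-cluster pipeline using scikit-learn's CCA and k-means; the random views, dimensionalities, and cluster count are illustrative placeholders, not the paper's data or algorithmic details.

```python
# Minimal sketch of the projection-then-cluster recipe: fit CCA on two
# paired views, project into the shared low-dimensional subspace, then
# cluster there. Views, sizes, and cluster count are placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n, d1, d2 = 500, 40, 30
view1 = rng.normal(size=(n, d1))   # e.g., audio features
view2 = rng.normal(size=(n, d2))   # e.g., video features

cca = CCA(n_components=5)
z1, z2 = cca.fit_transform(view1, view2)   # canonical projections of each view

# Cluster in the low-dimensional canonical subspace instead of raw features.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(z1)
```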


Meeting of the Association for Computational Linguistics | 2014

Tailoring Continuous Word Representations for Dependency Parsing

Mohit Bansal; Kevin Gimpel; Karen Livescu

Word representations have proven useful for many NLP tasks, e.g., Brown clusters as features in dependency parsing (Koo et al., 2008). In this paper, we investigate the use of continuous word representations as features for dependency parsing. We compare several popular embeddings to Brown clusters, via multiple types of features, in both news and web domains. We find that all embeddings yield significant parsing gains, including some recent ones that can be trained in a fraction of the time of others. Explicitly tailoring the representations for the task leads to further improvements. Moreover, an ensemble of all representations achieves the best results, suggesting their complementarity.
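
As a loose illustration of how continuous vectors can feed a feature-based parser (one of several feature types the paper compares), the sketch below clusters stand-in embeddings and emits discrete cluster-ID features, by analogy with Brown-cluster features; the vocabulary and vectors are invented.

```python
# Illustrative sketch, not the paper's exact recipe: cluster continuous
# word embeddings and use the cluster IDs as discrete features, analogous
# to Brown-cluster features in a feature-based dependency parser.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

vocab = ["the", "a", "dog", "cat", "barked", "meowed"]
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in vocab}   # stand-in embeddings

X = np.stack([emb[w] for w in vocab])
cluster_id = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# Discrete indicator features a parser can fire on head/modifier words.
features = {w: f"cluster={c}" for w, c in zip(vocab, cluster_id)}
print(features)
```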


Journal of the Acoustical Society of America | 2007

Speech production knowledge in automatic speech recognition

Simon King; Joe Frankel; Karen Livescu; Erik McDermott; Korin Richmond; Mirjam Wester

Although much is known about how speech is produced, and research into speech production has resulted in measured articulatory data, feature systems of different kinds, and numerous models, speech production knowledge is almost totally ignored in current mainstream approaches to automatic speech recognition. Representations of speech production allow simple explanations for many phenomena observed in speech which cannot be easily analyzed from either the acoustic signal or the phonetic transcription alone. This article surveys a growing body of work in which such representations are used to improve automatic speech recognition.


International Conference on Acoustics, Speech, and Signal Processing | 2005

Landmark-based speech recognition: report of the 2004 Johns Hopkins summer workshop

Mark Hasegawa-Johnson; James Baker; Sarah Borys; Ken Chen; Emily Coogan; Steven Greenberg; Katrin Kirchhoff; Karen Livescu; Srividya Mohan; Jennifer Muller; M. Kemal Sönmez; Tianyu Wang

Three research prototype speech recognition systems are described, all of which use recently developed methods from artificial intelligence (specifically, support vector machines (SVMs), dynamic Bayesian networks, and maximum entropy classification) in order to implement, in the form of an ASR system, current theories of human speech perception and phonology. All systems begin with a high-dimensional multiframe acoustic-to-distinctive-feature transformation, implemented using SVMs trained to detect and classify acoustic phonetic landmarks. Distinctive feature probabilities estimated by the SVMs are then integrated using one of three pronunciation models: a dynamic programming algorithm that assumes canonical pronunciation of each word, a dynamic Bayesian network implementation of articulatory phonology, or a discriminative pronunciation model trained using the methods of maximum entropy classification. Log probability scores computed by these models are then combined, using log-linear combination, with other word scores available in the lattice output of a first-pass recognizer, and the resulting combination score is used to compute a second-pass speech recognition output.
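
The final rescoring step can be pictured as a weighted sum in the log domain. The toy sketch below combines invented acoustic, language-model, and pronunciation scores with assumed weights, purely to illustrate log-linear combination; the workshop tuned such weights on real lattices.

```python
# Hedged sketch of the second-pass rescoring step: log-linearly combine
# the pronunciation-model log-probability with first-pass lattice scores.
# Weights, score names, and hypotheses are illustrative, not the
# workshop's actual configuration.

def combined_score(acoustic_lp, lm_lp, pron_lp, w_ac=1.0, w_lm=0.8, w_pron=0.5):
    # Log-linear combination = weighted sum in the log domain.
    return w_ac * acoustic_lp + w_lm * lm_lp + w_pron * pron_lp

# Rescore lattice hypotheses and keep the best one.
hyps = [("set the alarm", -120.3, -14.2, -6.1),
        ("sell the alarm", -121.0, -15.8, -9.4)]
best = max(hyps, key=lambda h: combined_score(h[1], h[2], h[3]))
print(best[0])
```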


Allerton Conference on Communication, Control, and Computing | 2012

Stochastic optimization for PCA and PLS

Raman Arora; Andrew Cotter; Karen Livescu; Nathan Srebro

We study PCA, PLS, and CCA as stochastic optimization problems, in which a population objective is optimized based on a sample. We suggest several stochastic approximation (SA) methods for PCA and PLS, and investigate their empirical performance.
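
As one concrete example of a stochastic approximation method for PCA, the sketch below runs Oja's update for the top principal direction on a synthetic sample stream; the data, step-size schedule, and dimensions are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch of a stochastic-approximation method for PCA (Oja's rule):
# one gradient-like update per streamed sample, then renormalization.
import numpy as np

rng = np.random.default_rng(0)
d = 20
# Stream of samples with one dominant direction u_true.
u_true = rng.normal(size=d)
u_true /= np.linalg.norm(u_true)

w = rng.normal(size=d)
w /= np.linalg.norm(w)
for t in range(1, 5001):
    x = u_true * rng.normal(scale=3.0) + rng.normal(scale=0.5, size=d)
    eta = 1.0 / t                  # decaying step size
    w += eta * x * (x @ w)         # stochastic update on the current sample
    w /= np.linalg.norm(w)         # project back to the unit sphere

print(abs(w @ u_true))             # approaches 1 as w aligns with u_true
```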


International Conference on Computer Vision | 2005

Visual speech recognition with loosely synchronized feature streams

Kate Saenko; Karen Livescu; Michael R. Siracusa; Kevin W. Wilson; James R. Glass; Trevor Darrell

We present an approach to detecting and recognizing spoken isolated phrases based solely on visual input. We adopt an architecture that first employs discriminative detection of visual speech and articulatory features, and then performs recognition using a model that accounts for the loose synchronization of the feature streams. Discriminative classifiers detect the subclass of lip appearance corresponding to the presence of speech, and further decompose it into features corresponding to the physical components of articulatory production. These components often evolve in a semi-independent fashion, and conventional viseme-based approaches to recognition fail to capture the resulting co-articulation effects. We present a novel dynamic Bayesian network with a multi-stream structure and observations consisting of articulatory feature classifier scores, which can model varying degrees of co-articulation in a principled way. We evaluate our visual-only recognition system on a command utterance task. We show comparative results on lip detection and speech/non-speech classification, as well as recognition performance against several baseline systems.
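
To give a flavor of what "loose synchronization" means computationally, here is a toy Viterbi pass over the product state space of two streams, where a soft penalty discourages the streams' state indices from drifting apart; the scores, sizes, and penalty are invented, and this is a simplification rather than the paper's DBN.

```python
# Toy sketch of loosely synchronized streams: each stream advances through
# its own state sequence, and a penalty discourages (but does not forbid)
# the two state indices drifting apart.
import itertools
import numpy as np

T, S = 6, 3                     # frames, states per stream
rng = np.random.default_rng(0)
ll1 = rng.normal(size=(T, S))   # per-frame state log-likelihoods, stream 1
ll2 = rng.normal(size=(T, S))   # per-frame state log-likelihoods, stream 2
async_penalty = 1.5             # cost per unit of state-index mismatch

# Viterbi over the product state space (s1, s2); each stream may stay or
# advance one state per frame, independently of the other.
delta = np.full((T, S, S), -np.inf)
delta[0, 0, 0] = ll1[0, 0] + ll2[0, 0]
for t in range(1, T):
    for s1, s2 in itertools.product(range(S), repeat=2):
        preds = [(p1, p2) for p1 in (s1 - 1, s1) for p2 in (s2 - 1, s2)
                 if p1 >= 0 and p2 >= 0]
        best = max(delta[t - 1, p1, p2] for p1, p2 in preds)
        delta[t, s1, s2] = (best + ll1[t, s1] + ll2[t, s2]
                            - async_penalty * abs(s1 - s2))

print(delta[T - 1, S - 1, S - 1])   # best path ending with both streams final
```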


Speech Communication | 2005

Pronunciation Modeling Using a Finite-State Transducer Representation

Timothy J. Hazen; I. Lee Hetherington; Han Shu; Karen Livescu

The MIT SUMMIT speech recognition system models pronunciation using a phonemic baseform dictionary along with rewrite rules for modeling phonological variation and multi-word reductions. Each pronunciation component is encoded within a finite-state transducer (FST) representation whose transition weights can be trained using an EM algorithm for finite-state networks. This paper explains the modeling approach we use and the details of its realization. We demonstrate the benefits and weaknesses of the approach both conceptually and empirically using the recognizer for our JUPITER weather information system. Our experiments demonstrate that the use of phonological rewrite rules within our system achieves word error rate reductions between 4% and 9% over different test sets when compared against a system using no phonological rewrite rules.
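
To make the idea concrete, the toy sketch below expands a phonemic baseform through one weighted phonological rewrite rule in plain Python; the actual system encodes this as weighted FST composition, and the rule, phones, and probabilities here are invented.

```python
# Toy expansion of a phonemic baseform through weighted rewrite rules,
# yielding alternate pronunciations with probabilities. Rule and weights
# are invented for illustration; the paper uses weighted FSTs.
import math

baseform = ["d", "ih", "d", "y", "uw"]                     # "did you"
rules = {("d", "y"): [(("jh",), 0.6), (("d", "y"), 0.4)]}  # palatalization

def expand(phones):
    """Yield (pronunciation, log-probability) pairs."""
    if not phones:
        yield [], 0.0
        return
    matched = False
    for lhs, options in rules.items():
        if tuple(phones[:len(lhs)]) == lhs:
            matched = True
            for replacement, prob in options:
                for rest, lp in expand(phones[len(lhs):]):
                    yield list(replacement) + rest, lp + math.log(prob)
    if not matched:                    # no rule fires: copy one phone through
        for rest, lp in expand(phones[1:]):
            yield [phones[0]] + rest, lp

for pron, lp in expand(baseform):
    print(" ".join(pron), round(math.exp(lp), 2))
# -> "d ih jh uw" 0.6 and "d ih d y uw" 0.4
```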


International Conference on Acoustics, Speech, and Signal Processing | 2000

Lexical modeling of non-native speech for automatic speech recognition

Karen Livescu; James R. Glass

The paper examines the recognition of non-native speech in JUPITER, a speaker-independent, spontaneous-speech conversational system. Because the non-native speech in this domain is limited and varied, speaker- and accent-specific methods are impractical. We therefore chose to model all of the non-native data with a single model. In particular, the paper describes an attempt to better model non-native lexical patterns. These patterns are incorporated by applying context-independent phonetic confusion rules, whose probabilities are estimated from training data. Using this approach, the word error rate on a non-native test set is reduced from 20.9% to 18.8%.
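
As a sketch of how such confusion-rule probabilities could be estimated from training data, the snippet below counts aligned (canonical, observed) phone pairs and normalizes; the alignment pairs are invented for illustration.

```python
# Hedged sketch: estimate context-independent confusion probabilities
# P(observed | canonical) by counting aligned phone pairs from training
# data. The alignments below are made up.
from collections import Counter, defaultdict

aligned = [("th", "t"), ("th", "th"), ("th", "t"),
           ("ih", "iy"), ("ih", "ih"), ("ih", "ih")]

counts = defaultdict(Counter)
for canonical, observed in aligned:
    counts[canonical][observed] += 1

conf_prob = {c: {o: round(n / sum(obs.values()), 2) for o, n in obs.items()}
             for c, obs in counts.items()}
print(conf_prob["th"])   # {'t': 0.67, 'th': 0.33}
```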


North American Chapter of the Association for Computational Linguistics | 2015

Deep Multilingual Correlation for Improved Word Embeddings

Ang Lu; Weiran Wang; Mohit Bansal; Kevin Gimpel; Karen Livescu

Word embeddings have been found useful for many NLP tasks, including part-of-speech tagging, named entity recognition, and parsing. Adding multilingual context when learning embeddings can improve their quality, for example via canonical correlation analysis (CCA) on embeddings from two languages. In this paper, we extend this idea to learn deep non-linear transformations of word embeddings of the two languages, using the recently proposed deep canonical correlation analysis. The resulting embeddings, when evaluated on multiple word and bigram similarity tasks, consistently improve over monolingual embeddings and over embeddings transformed with linear CCA.


International Conference on Acoustics, Speech, and Signal Processing | 2013

Multi-view CCA-based acoustic features for phonetic recognition across speakers and domains

Raman Arora; Karen Livescu

Canonical correlation analysis (CCA) and kernel CCA can be used for unsupervised learning of acoustic features when a second view (e.g., articulatory measurements) is available for some training data, and such projections have been used to improve phonetic frame classification. Here we study the behavior of CCA-based acoustic features on the task of phonetic recognition, and investigate to what extent they are speaker-independent or domain-independent. The acoustic features are learned using data drawn from the University of Wisconsin X-ray Microbeam Database (XRMB). The features are evaluated within and across speakers on XRMB data, as well as on out-of-domain TIMIT and MOCHA-TIMIT data. Experimental results show consistent improvement with the learned acoustic features over baseline MFCCs and PCA projections. In both speaker-dependent and cross-speaker experiments, phonetic error rates are improved by 4-9% absolute (10-23% relative) using CCA-based features over baseline MFCCs. In cross-domain phonetic recognition (training on XRMB and testing on MOCHA or TIMIT), the learned projections provide smaller improvements.

Collaboration


Dive into Karen Livescu's collaborations.

Top Co-Authors

Weiran Wang (Toyota Technological Institute at Chicago)
Kevin Gimpel (Toyota Technological Institute at Chicago)
Hao Tang (Toyota Technological Institute at Chicago)
James R. Glass (Massachusetts Institute of Technology)
Raman Arora (Johns Hopkins University)
Mohit Bansal (Toyota Technological Institute at Chicago)
Jeff A. Bilmes (University of Washington)
Joe Frankel (University of Edinburgh)