Publication


Featured research published by Eric Fosler-Lussier.


Journal of the Acoustical Society of America | 2003

Effects of disfluencies, predictability, and utterance position on word form variation in English conversation

Alan Bell; Daniel Jurafsky; Eric Fosler-Lussier; Cynthia Girand; Michelle L. Gregory; Daniel Gildea

Function words, especially frequently occurring ones such as the, that, and, and of, vary widely in pronunciation. Understanding this variation is essential both for cognitive modeling of lexical production and for computer speech recognition and synthesis. This study investigates which factors affect the forms of function words, especially whether they have a fuller pronunciation (e.g., [ði], [ðæt], [ænd], [ʌv]) or a more reduced or lenited pronunciation (e.g., [ðə], [ðɨt], [n], [ə]). It is based on over 8000 occurrences of the ten most frequent English function words in a 4-hour sample of conversations from the Switchboard corpus. Ordinary linear and logistic regression models were used to examine variation in the length of the words, in the form of their vowel (basic, full, or reduced), and in whether final obstruents were present. For all these measures, after controlling for segmental context, rate of speech, and other important factors, there are strong independent effects that make high-frequency monosyllabic function words more likely to be longer or have a fuller form (1) when neighboring disfluencies (such as the filled pauses uh and um) indicate that the speaker was encountering problems in planning the utterance; (2) when the word is unexpected, i.e., less predictable in context; and (3) when the word is either utterance initial or utterance final. Viewed the other way around, frequent function words are more likely to be shorter and to have less-full forms in fluent speech, in predictable positions or multiword collocations, and utterance internally. Also considered are other factors such as sex (women are more likely to use fuller forms, even after controlling for rate of speech, for example) and some of the differences among the ten function words in their response to these factors.
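The regression setup described in this abstract can be sketched with synthetic data. Everything below is invented for illustration: the feature names follow the abstract, the effect sizes are made up, and a plain gradient-ascent logistic fit stands in for whatever regression software the authors used.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
predictability = rng.normal(0.0, 1.0, n)            # log conditional probability in context
near_disfluency = rng.integers(0, 2, n).astype(float)  # adjacent filled pause (uh/um)?
utt_edge = rng.integers(0, 2, n).astype(float)         # utterance-initial or -final?

# Simulate the reported effect directions: predictable, fluent,
# utterance-internal tokens reduce more often.
true_logit = 1.5 * predictability - 1.2 * near_disfluency - 1.0 * utt_edge
reduced = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)

# Fit a logistic regression by plain gradient ascent on the log-likelihood.
X = np.column_stack([np.ones(n), predictability, near_disfluency, utt_edge])
w = np.zeros(4)
for _ in range(500):
    p = 1 / (1 + np.exp(-X @ w))
    w += 0.1 * X.T @ (reduced - p) / n

effects = dict(zip(["intercept", "predictability", "near_disfluency", "utt_edge"], w))
```

The recovered coefficient signs mirror the abstract's findings: higher predictability favors reduction, while disfluency and utterance-edge position favor fuller forms.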


Meeting of the Association for Computational Linguistics | 2003

Discourse Segmentation of Multi-Party Conversation

Michel Galley; Kathleen R. McKeown; Eric Fosler-Lussier; Hongyan Jing

We present a domain-independent topic segmentation algorithm for multi-party speech. Our feature-based algorithm combines knowledge about content using a text-based algorithm as a feature and about form using linguistic and acoustic cues about topic shifts extracted from speech. This segmentation algorithm uses automatically induced decision rules to combine the different features. The embedded text-based algorithm builds on lexical cohesion and has performance comparable to state-of-the-art algorithms based on lexical information. A significant error reduction is obtained by combining the two knowledge sources.
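The lexical-cohesion idea at the core of the embedded text-based algorithm can be illustrated with a TextTiling-style sketch (this is not the authors' algorithm): score each candidate boundary by the cosine similarity of word counts on either side, and hypothesize a topic shift at the weakest gap. The sentences below are invented.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def lowest_cohesion_boundary(sentences, window=2):
    """Return the gap index where lexical cohesion is weakest."""
    bags = [Counter(s.lower().split()) for s in sentences]
    scores = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        scores.append((cosine(left, right), gap))
    return min(scores)[1]

talk = [
    "the budget meeting covered quarterly spending",
    "spending on the budget rose again this quarter",
    "next we tried the new speech recognizer",
    "the recognizer improved word error rate",
]
boundary = lowest_cohesion_boundary(talk)   # weakest cohesion: between 2 and 3
```

The paper goes further by treating this lexical score as one feature among linguistic and acoustic cues, combined with induced decision rules.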


Journal of the American Medical Informatics Association | 2014

A review of approaches to identifying patient phenotype cohorts using electronic health records

Chaitanya Shivade; Preethi Raghavan; Eric Fosler-Lussier; Peter J. Embi; Noémie Elhadad; Stephen B. Johnson; Albert M. Lai

Objective: To summarize literature describing approaches aimed at automatically identifying patients with a common phenotype. Materials and Methods: We performed a review of studies describing systems or reporting techniques developed for identifying cohorts of patients with specific phenotypes. Every full text article published in (1) Journal of American Medical Informatics Association, (2) Journal of Biomedical Informatics, (3) Proceedings of the Annual American Medical Informatics Association Symposium, and (4) Proceedings of Clinical Research Informatics Conference within the past 3 years was assessed for inclusion in the review. Only articles using automated techniques were included. Results: Ninety-seven articles met our inclusion criteria. Forty-six used natural language processing (NLP)-based techniques, 24 described rule-based systems, 41 used statistical analyses, data mining, or machine learning techniques, while 22 described hybrid systems. Nine articles described the architecture of large-scale systems developed for determining cohort eligibility of patients. Discussion: We observe that there is a rise in the number of studies associated with cohort identification using electronic medical records. Statistical analyses or machine learning, followed by NLP techniques, are gaining popularity over the years in comparison with rule-based systems. Conclusions: There are a variety of approaches for classifying patients into a particular phenotype. Different techniques and data sources are used, and good performance is reported on datasets at respective institutions. However, no system makes comprehensive use of electronic medical records addressing all of their known weaknesses.
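The hybrid pattern the review describes can be sketched in a few lines: a structured rule over billing codes, backed off to keyword spotting in free-text notes. The codes, terms, and patient records below are all hypothetical.

```python
# Hypothetical phenotype definition: structured ICD-10 codes plus note keywords.
DIABETES_CODES = {"E11.9", "E11.65"}
DIABETES_TERMS = {"diabetes", "metformin", "hyperglycemia"}

def in_cohort(record: dict) -> bool:
    if DIABETES_CODES & set(record.get("icd10", [])):
        return True                       # rule-based: structured codes suffice
    note = record.get("note", "").lower()
    hits = sum(term in note for term in DIABETES_TERMS)
    return hits >= 2                      # NLP-style backoff: require two term hits

patients = [
    {"id": 1, "icd10": ["E11.9"], "note": ""},
    {"id": 2, "icd10": [], "note": "started metformin for type 2 diabetes"},
    {"id": 3, "icd10": ["I10"], "note": "hypertension follow-up"},
]
cohort = [p["id"] for p in patients if in_cohort(p)]   # → [1, 2]
```

Real phenotyping systems add negation handling, temporal constraints, and learned classifiers, which is precisely the variety the review catalogs.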


International Conference on Acoustics, Speech, and Signal Processing | 1998

Combining multiple estimators of speaking rate

Nelson Morgan; Eric Fosler-Lussier

We report progress in the development of a measure of speaking rate that is computed from the acoustic signal. The newest form of our analysis incorporates multiple estimates of rate; besides the spectral moment for a full-band energy envelope that we have previously reported, we also used pointwise correlation between pairs of compressed sub-band energy envelopes. The complete measure, called mrate, has been compared to a reference syllable rate derived from a manually transcribed subset of the Switchboard database. The correlation with transcribed syllable rate is significantly higher than our earlier measure; estimates are typically within 1-2 syllables/second of the reference syllable rate. We conclude by assessing the use of mrate as a detector for rapid speech.
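A toy version of the spectral-moment component can make the idea concrete: frame the signal, take the energy envelope, and compute the first spectral moment of its low-frequency spectrum as a rate estimate. This is not the actual mrate implementation; the frame size and 16 Hz band limit are assumptions, and the test signals are amplitude-modulated noise standing in for speech.

```python
import numpy as np

def spectral_moment_rate(signal, sr, frame_ms=10):
    """First spectral moment (Hz) of the low-frequency energy-envelope
    spectrum: a crude stand-in for the full-band component of mrate."""
    hop = int(sr * frame_ms / 1000)
    frames = len(signal) // hop
    env = np.array([np.sum(signal[i*hop:(i+1)*hop] ** 2) for i in range(frames)])
    env = env - env.mean()                      # drop the DC component
    spec = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(len(env), d=frame_ms / 1000)
    band = (freqs > 0) & (freqs <= 16)          # syllable rates live below ~16 Hz
    return np.sum(freqs[band] * spec[band]) / np.sum(spec[band])

sr = 8000
t = np.arange(sr * 2) / sr                      # two seconds of "audio"
rng = np.random.default_rng(0)
carrier = rng.normal(size=t.size)
slow = (1 + np.sin(2 * np.pi * 3 * t)) * carrier    # ~3 "syllables"/second
fast = (1 + np.sin(2 * np.pi * 6 * t)) * carrier    # ~6 "syllables"/second
```

The faster modulation pushes the envelope's spectral moment upward, which is the property the full mrate measure exploits (combined, in the paper, with sub-band envelope correlations).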


Speech Communication | 1999

Effects of speaking rate and word frequency on pronunciations in conversational speech

Eric Fosler-Lussier; Nelson Morgan

Automatic speech recognition (ASR) systems typically have a static dictionary of word pronunciations for matching acoustic models to words. In this work, we argue that, in fact, pronunciations in spontaneous speech are dynamic and that ASR systems should change models in accordance with contextual factors. Two variables, speaking rate and word frequency, should be particularly promising for determining dynamic pronunciations, according to the linguistic literature. We analyze the relationship between these factors and realized pronunciations through a statistical exploration of the effects of these factors at the word, syllable, and phone levels in the Switchboard corpus. Both increased speaking rate and word likelihood can induce a significant shift in probabilities of the pronunciations of frequent words. However, the interplay between all of these variables in the realization of pronunciations is complex. We also confirm the intuition that variations in these factors correlate with changes in ASR system performance for both the Switchboard and Broadcast News corpora.
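The kind of corpus statistic this analysis rests on is easy to sketch: bin tokens by speaking rate and compare pronunciation-variant probabilities across bins. The tokens, ARPAbet-style pronunciations, and the 5 syllables/second cutoff below are all hypothetical.

```python
from collections import Counter

# Hypothetical token list: (word, observed pronunciation, syllables/second).
tokens = [
    ("because", "b iy k ah z", 3.2), ("because", "k ah z", 6.1),
    ("because", "b iy k ah z", 3.8), ("because", "k ah z", 5.7),
    ("because", "k ah z", 6.4),      ("because", "b iy k ah z", 4.0),
]

def pron_dist(tokens, word, fast_cutoff=5.0):
    """Pronunciation probabilities for a word, split at a rate cutoff."""
    bins = {"slow": Counter(), "fast": Counter()}
    for w, pron, rate in tokens:
        if w == word:
            bins["fast" if rate >= fast_cutoff else "slow"][pron] += 1
    return {b: {p: c / sum(cnt.values()) for p, c in cnt.items()}
            for b, cnt in bins.items()}

dist = pron_dist(tokens, "because")
# In this toy data, fast speech puts all mass on the reduced variant "k ah z".
```

A dynamic-dictionary ASR system would then swap or reweight pronunciation entries according to an online rate estimate.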


International Conference on Acoustics, Speech, and Signal Processing | 2002

Discriminative training of language models for speech recognition

Hong-Kwang Jeff Kuo; Eric Fosler-Lussier; Hui Jiang; Chin-Hui Lee

In this paper we describe how discriminative training can be applied to language models for speech recognition. Language models are important to guide the speech recognition search, particularly in compensating for mistakes in acoustic decoding. A frequently used measure of the quality of language models is the perplexity; however, what is more important for accurate decoding is not necessarily having the maximum likelihood hypothesis, but rather the best separation of the correct string from the competing, acoustically confusable hypotheses. Discriminative training can help to improve language models for the purpose of speech recognition by improving the separation of the correct hypothesis from the competing hypotheses. We describe the algorithm and demonstrate modest improvements in word and sentence error rates on the DARPA Communicator task without any increase in language model complexity.
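The separation idea can be demonstrated with a perceptron-style n-best reranker, a simpler relative of the training described here (not this paper's exact algorithm): when the acoustically preferred hypothesis is wrong, shift bigram weights toward the reference and away from the rival. The hypotheses and scores below are invented.

```python
from collections import Counter

def bigrams(sentence):
    words = ["<s>"] + sentence.split() + ["</s>"]
    return Counter(zip(words, words[1:]))

def score(weights, sentence, acoustic_score):
    lm = sum(weights.get(bg, 0.0) * c for bg, c in bigrams(sentence).items())
    return acoustic_score + lm

def perceptron_train(weights, nbest, reference, epochs=5):
    """Push the reference above its acoustically confusable competitors."""
    for _ in range(epochs):
        best = max(nbest, key=lambda h: score(weights, h[0], h[1]))[0]
        if best != reference:
            for bg, c in bigrams(reference).items():
                weights[bg] = weights.get(bg, 0.0) + c
            for bg, c in bigrams(best).items():
                weights[bg] = weights.get(bg, 0.0) - c
    return weights

# Hypothetical 2-best list: (hypothesis, acoustic score); the string the
# acoustic model prefers is the wrong one.
nbest = [("flights to boston", 1.0), ("flights two boston", 1.2)]
w = perceptron_train({}, nbest, "flights to boston")
```

After training, the combined score ranks the correct string first even though its acoustic score is lower, which is exactly the separation the paper argues perplexity fails to capture.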


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Conditional Random Fields for Integrating Local Discriminative Classifiers

Jeremy Morris; Eric Fosler-Lussier

Conditional random fields (CRFs) are a statistical framework that has recently gained in popularity in both the automatic speech recognition (ASR) and natural language processing communities because of the different nature of assumptions that are made in predicting sequences of labels compared to the more traditional hidden Markov model (HMM). In the ASR community, CRFs have been employed in a method similar to that of HMMs, using the sufficient statistics of input data to compute the probability of label sequences given acoustic input. In this paper, we explore the application of CRFs to combine local posterior estimates provided by multilayer perceptrons (MLPs) corresponding to the frame-level prediction of phone classes and phonological attribute classes. We compare phonetic recognition using CRFs to an HMM system trained on the same input features and show that the monophone label CRF is able to achieve superior performance to a monophone-based HMM and performance comparable to a 16 Gaussian mixture triphone-based HMM; in both of these cases, the CRF obtains these results with far fewer free parameters. The CRF is also able to better combine these posterior estimators, achieving a substantial increase in performance over an HMM-based triphone system by mixing the two highly correlated sets of phone class and phonetic attribute class posteriors.
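The decoding side of this setup can be sketched directly: treat log MLP posteriors as per-frame state-feature scores, add learned transition weights, and run Viterbi over the label sequence. The posteriors and transition weights below are hypothetical numbers, not trained values.

```python
import numpy as np

def viterbi(state_scores, trans):
    """Best label path for a linear-chain model with per-frame state scores
    and a matrix of transition weights trans[i, j] (label i to label j)."""
    T, N = state_scores.shape
    delta = state_scores[0].copy()
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + trans + state_scores[t][None, :]
        back[t] = np.argmax(cand, axis=0)
        delta = np.max(cand, axis=0)
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy example: 2 phone classes, MLP posteriors favoring class 0 then class 1,
# with transition weights mildly discouraging label changes.
posteriors = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]])
state_scores = np.log(posteriors)
trans = np.array([[0.0, -0.5], [-0.5, 0.0]])
path = viterbi(state_scores, trans)   # → [0, 0, 1, 1]
```

In the paper's system the state features also include phonological-attribute posteriors, and all weights are trained jointly under the CRF's conditional likelihood.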


Speech Communication | 2002

Connectionist speech recognition of Broadcast News

Anthony J. Robinson; Gary D. Cook; Daniel P. W. Ellis; Eric Fosler-Lussier; Steve Renals; D. A. G. Williams

This paper describes connectionist techniques for recognition of Broadcast News. The fundamental difference between connectionist systems and more conventional mixture-of-Gaussian systems is that connectionist models directly estimate posterior probabilities as opposed to likelihoods. Access to posterior probabilities has enabled us to develop a number of novel approaches to confidence estimation, pronunciation modelling and search. In addition we have investigated a new feature extraction technique based on the modulation-filtered spectrogram (MSG), and methods for combining multiple information sources. We have incorporated all of these techniques into a system for the transcription of Broadcast News, and we present results on the 1998 DARPA Hub-4E Broadcast News evaluation data.
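The posterior-versus-likelihood distinction has a standard one-line consequence in hybrid connectionist/HMM systems: dividing the network's posterior by the class prior yields a "scaled likelihood" usable in HMM decoding. The numbers below are hypothetical.

```python
import numpy as np

# One frame of hypothetical MLP output P(phone | acoustics) over 3 phone classes,
# and phone priors estimated from training alignments.
posteriors = np.array([0.5, 0.4, 0.1])
priors = np.array([0.6, 0.2, 0.2])

# Bayes' rule: p(x | phone) ∝ P(phone | x) / P(phone), so this ratio can
# replace the Gaussian likelihood in HMM decoding.
scaled_likelihoods = posteriors / priors
```

Note how the best class changes: the posterior favors class 0, but after dividing out the priors the scaled likelihood favors class 1, because class 0's high posterior was partly an artifact of its high prior.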


International Conference on Natural Language Generation | 2006

Noun Phrase Generation for Situated Dialogs

Laura Stoia; Darla Magdalene Shockley; Donna K. Byron; Eric Fosler-Lussier

We report on a study examining the generation of noun phrases within a spoken dialog agent for a navigation domain. The task is to provide real-time instructions that direct the user to move between a series of destinations within a large interior space. A subtask within sentence planning is determining what form to choose for noun phrases. This choice is driven by both the discourse history and spatial context features derived from the direction-follower's position, e.g., their view angle, distance from the target referent, and the number of similar items in view. The algorithm was developed as a decision tree and its output was evaluated by a group of human judges who rated 62.6% of the expressions generated by the system to be as good as or better than the language originally produced by human dialog partners.
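The shape of such a decision tree is easy to illustrate. The rules, thresholds, and surface templates below are hypothetical, chosen only to show how discourse history and spatial features could jointly select an NP form; they are not the learned tree from the paper.

```python
def np_form(target: str, mentioned_before: bool, in_view: bool,
            similar_in_view: int) -> str:
    """Pick a referring-expression form from discourse and spatial features."""
    if mentioned_before and in_view and similar_in_view == 0:
        return "it"                              # salient, unambiguous referent
    if in_view and similar_in_view > 0:
        return f"the {target} on your left"      # disambiguate among look-alikes
    if in_view:
        return f"the {target}"
    return f"the {target} through the doorway"   # not visible: locate it first

# e.g. a previously mentioned door among several visible doors:
form = np_form("door", mentioned_before=True, in_view=True, similar_in_view=2)
```

In the study, rules like these were induced from human direction-giving data rather than hand-written.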


International Conference on Acoustics, Speech, and Signal Processing | 2010

Backpropagation training for multilayer conditional random field based phone recognition

Rohit Prabhavalkar; Eric Fosler-Lussier

Conditional random fields (CRFs) have recently found increased popularity in automatic speech recognition (ASR) applications. CRFs have previously been shown to be effective combiners of posterior estimates from multilayer perceptrons (MLPs) in phone and word recognition tasks. In this paper, we describe a novel hybrid Multilayer-CRF structure (ML-CRF), where an MLP-like hidden layer serves as input to the CRF; moreover, we propose a technique for directly training the ML-CRF to optimize a conditional log-likelihood criterion via error backpropagation. The proposed technique thus allows for the implicit learning of suitable feature functions for the CRF. We present results for initial phone recognition experiments on the TIMIT database that indicate that our proposed method is a promising approach for training CRFs.
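A runnable sketch of the training idea, under a large simplifying assumption: transition features are dropped, so the CRF collapses to a per-frame softmax and the conditional log-likelihood gradient takes the (expected minus empirical) form, which is then backpropagated through a tanh hidden layer. All dimensions and data below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, K, T = 5, 8, 3, 50            # input dim, hidden units, labels, frames
X = rng.normal(size=(T, D))
y = rng.integers(0, K, size=T)

W1 = 0.1 * rng.normal(size=(D, H))  # hidden ("MLP-like") layer weights
W2 = 0.1 * rng.normal(size=(H, K))  # state-feature weights

losses = []
for _ in range(200):
    h = np.tanh(X @ W1)
    scores = h @ W2
    scores -= scores.max(axis=1, keepdims=True)
    p = np.exp(scores)
    p /= p.sum(axis=1, keepdims=True)
    losses.append(-np.mean(np.log(p[np.arange(T), y])))  # neg. cond. log-likelihood
    err = p.copy()
    err[np.arange(T), y] -= 1.0       # expected minus empirical label counts
    gW2 = h.T @ err / T
    gH = (err @ W2.T) * (1 - h ** 2)  # backprop through the tanh hidden layer
    gW1 = X.T @ gH / T
    W2 -= 0.5 * gW2
    W1 -= 0.5 * gW1
```

The full ML-CRF keeps the transition terms, so the "expected counts" come from forward-backward over label sequences rather than a per-frame softmax, but the backpropagation path through the hidden layer is the same.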

Collaboration


Dive into Eric Fosler-Lussier's collaborations.

Top Co-Authors:

Alexandros Potamianos (National Technical University of Athens)
Karen Livescu (Toyota Technological Institute at Chicago)
Chin-Hui Lee (Georgia Institute of Technology)
Nelson Morgan (University of California)