Publication


Featured research published by Katrin Kirchhoff.


North American Chapter of the Association for Computational Linguistics (NAACL) | 2003

Factored language models and generalized parallel backoff

Jeff A. Bilmes; Katrin Kirchhoff

We introduce factored language models (FLMs) and generalized parallel backoff (GPB). An FLM represents words as bundles of features (e.g., morphological classes, stems, data-driven clusters, etc.), and induces a probability model covering sequences of bundles rather than just words. GPB extends standard backoff to general conditional probability tables where variables might be heterogeneous types, where no obvious natural (temporal) backoff order exists, and where multiple dynamic backoff strategies are allowed. These methodologies were implemented during the JHU 2002 workshop as extensions to the SRI language modeling toolkit. This paper provides initial perplexity results on both CallHome Arabic and Penn Treebank Wall Street Journal data. Notably, FLMs with GPB can produce bigrams with significantly lower perplexity, sometimes lower than that of highly optimized baseline trigrams. In a multi-pass speech recognition context, where bigrams are used to create first-pass bigram lattices or N-best lists, these results are highly relevant.
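
As a rough illustration of the backoff idea, the sketch below estimates a word's probability from the previous token's factor bundle and drops parent factors one at a time when evidence is thin. It is a toy with a single fixed backoff path, not the SRILM FLM implementation (GPB's multiple dynamic paths are only noted in the comments), and all names (train_counts, factor_keys) are invented for this example.

```python
from collections import defaultdict

# Minimal, illustrative sketch of FLM-style backoff. This is NOT the SRILM
# FLM implementation: it uses a single fixed backoff path for brevity,
# whereas generalized parallel backoff (GPB) allows several paths through
# a backoff graph to be combined dynamically.

def train_counts(sentences, factor_keys):
    """Count (parent-factors, word) pairs at every backoff level.

    Each token is a dict of factors, e.g.
    {"word": "kataba", "stem": "ktb", "class": "VERB"}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        for i in range(1, len(sent)):
            word = sent[i]["word"]
            prev = sent[i - 1]
            # One count entry per level: full parent set, then with the
            # last factor dropped, and so on down to the unigram.
            for n in range(len(factor_keys), -1, -1):
                key = tuple((k, prev[k]) for k in factor_keys[:n])
                counts[key][word] += 1
    return counts

def backoff_prob(counts, prev, word, factor_keys, min_count=1):
    """Use the richest parent set with enough evidence, else back off."""
    for n in range(len(factor_keys), -1, -1):
        key = tuple((k, prev[k]) for k in factor_keys[:n])
        total = sum(counts[key].values())
        if total and counts[key][word] >= min_count:
            return counts[key][word] / total
    return 1e-10  # floor for completely unseen words

# Usage: with factor_keys = ["stem", "class"], the probability of a word
# given the previous token's stem and morphological class backs off to
# stem only, then to the unigram distribution.
```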


Speech Communication | 2002

Combining acoustic and articulatory feature information for robust speech recognition

Katrin Kirchhoff; Gernot A. Fink; Gerhard Sagerer

The idea of using articulatory representations for automatic speech recognition (ASR) continues to attract much attention in the speech community. Representations which are grouped under the label “articulatory” include articulatory parameters derived by means of acoustic-articulatory transformations (inverse filtering), direct physical measurements or classification scores for pseudo-articulatory features. In this study, we revisit the use of features belonging to the third category. In particular, we concentrate on the potential benefits of pseudo-articulatory features in adverse acoustic environments and on their combination with standard acoustic features. Systems based on articulatory features only and combined acoustic-articulatory systems are tested on two different recognition tasks: telephone-speech continuous numbers recognition and conversational speech recognition. We show that articulatory feature (AF) systems are capable of achieving a superior performance at high noise levels and that the combination of acoustic and AFs consistently leads to a significant reduction of word error rate across all acoustic conditions.
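
One common way to combine stream information, shown here only as a hedged sketch and not the paper's exact system, is a per-frame log-linear interpolation of state posteriors from the two classifiers; the weight alpha and the dummy data are assumptions.

```python
import numpy as np

# Hypothetical sketch of one common fusion scheme: a weighted log-linear
# combination of per-frame state posteriors from an acoustic-feature
# classifier and an articulatory-feature classifier. In practice, `alpha`
# would be tuned on held-out data per acoustic condition; nothing here
# reproduces the paper's actual system.

def combine_streams(log_post_acoustic, log_post_articulatory, alpha=0.5):
    """log P_combined ∝ alpha * log P_ac + (1 - alpha) * log P_af."""
    combined = alpha * log_post_acoustic + (1.0 - alpha) * log_post_articulatory
    # Renormalize per frame so each row is again a log-distribution.
    combined -= np.logaddexp.reduce(combined, axis=1, keepdims=True)
    return combined

# Example: 100 frames, 40 HMM states, dummy posteriors.
rng = np.random.default_rng(0)
ac = np.log(rng.dirichlet(np.ones(40), size=100))
af = np.log(rng.dirichlet(np.ones(40), size=100))
fused = combine_streams(ac, af, alpha=0.6)
```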


Speech Communication | 2005

Error-correction detection and response generation in a spoken dialogue system

Ivan Bulyko; Katrin Kirchhoff; Mari Ostendorf; J. Goldberg

Speech understanding errors can be frustrating for users and difficult to recover from in a mixed-initiative spoken dialogue system. Handling such errors requires both detecting error conditions and adjusting the response generation strategy accordingly. In this paper, we show that different response wording choices tend to be associated with different user behaviors that can impact word recognition performance in a telephone-based dialogue system. We leverage these findings in a system that integrates an error-correction detection module with a modified dialogue strategy in order to drive the response generation module. In a user study, we find slight preferences for a dialogue system using this error-handling strategy over a simple reprompting strategy.
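
A minimal sketch of the kind of policy this implies is given below: once a correction is detected (or confidence is low), the system switches to a more constrained, explicitly confirming prompt. The thresholds, function names, and wording are all invented for illustration.

```python
# Illustrative sketch (not the paper's system): a dialogue manager that
# switches from an open prompt to a constrained, explicitly confirming
# prompt once an error-correction condition is detected, since user
# corrections (e.g., hyperarticulated repeats) degrade recognition.

def choose_response(nbest_confidence, is_correction, slot, value):
    """Pick a response wording based on confidence and correction status."""
    if is_correction or nbest_confidence < 0.4:
        # Constrained wording: invite a short yes/no answer, which is
        # easier to recognize than a re-spoken full utterance.
        return (f"I may have misheard. Did you say {value} for {slot}? "
                "Please answer yes or no.")
    if nbest_confidence < 0.7:
        # Implicit confirmation embedded in the next question.
        return f"Okay, {value}. What else can I help you with?"
    return "What else can I help you with?"

print(choose_response(0.35, is_correction=True, slot="city", value="Boston"))
```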


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2005

Landmark-based speech recognition: report of the 2004 Johns Hopkins summer workshop

Mark Hasegawa-Johnson; James Baker; Sarah Borys; Ken Chen; Emily Coogan; Steven Greenberg; Katrin Kirchhoff; Karen Livescu; Srividya Mohan; Jennifer Muller; M. Kemal Sönmez; Tianyu Wang

Three research prototype speech recognition systems are described, all of which use recently developed methods from artificial intelligence (specifically, support vector machines (SVMs), dynamic Bayesian networks, and maximum entropy classification) in order to implement, in the form of an automatic speech recognition (ASR) system, current theories of human speech perception and phonology. All systems begin with a high-dimensional multiframe acoustic-to-distinctive-feature transformation, implemented using SVMs trained to detect and classify acoustic phonetic landmarks. Distinctive feature probabilities estimated by the SVMs are then integrated using one of three pronunciation models: a dynamic programming algorithm that assumes canonical pronunciation of each word, a dynamic Bayesian network implementation of articulatory phonology, or a discriminative pronunciation model trained using the methods of maximum entropy classification. Log probability scores computed by these models are then combined, using log-linear combination, with other word scores available in the lattice output of a first-pass recognizer, and the resulting combination score is used to compute a second-pass speech recognition output.
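
The final rescoring step can be illustrated with a small log-linear combination sketch; the model names and weights below are hypothetical placeholders, not the workshop systems' actual scores.

```python
# A small sketch of the second-pass rescoring idea: a log-linear
# combination of the first-pass lattice score with scores from several
# pronunciation models. The feature names and weights are illustrative
# placeholders, and the lambdas would be tuned on development data.

def combined_score(scores, lambdas):
    """scores and lambdas are dicts keyed by model name; scores hold
    log-probabilities and lambdas their combination weights."""
    return sum(lambdas[name] * scores[name] for name in scores)

hyp_scores = {
    "first_pass_lm": -42.3,   # baseline recognizer lattice score
    "canonical_dp": -17.9,    # canonical-pronunciation DP model
    "artic_dbn": -19.4,       # articulatory-phonology DBN
    "maxent_pron": -16.1,     # discriminative pronunciation model
}
lambdas = {"first_pass_lm": 1.0, "canonical_dp": 0.3,
           "artic_dbn": 0.4, "maxent_pron": 0.3}
print(combined_score(hyp_scores, lambdas))
```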


Computer Speech & Language | 2006

Morphology-based language modeling for conversational Arabic speech recognition

Katrin Kirchhoff; Dimitra Vergyri; Jeff A. Bilmes; Kevin Duh; Andreas Stolcke

Language modeling for large-vocabulary conversational Arabic speech recognition is faced with the problem of the complex morphology of Arabic, which increases the perplexity and out-of-vocabulary rate. This problem is compounded by the enormous dialectal variability and differences between spoken and written language. In this paper, we investigate improvements in Arabic language modeling by developing various morphology-based language models. We present four different approaches to morphology-based language modeling, including a novel technique called factored language models. Experimental results are presented for both rescoring and first-pass recognition experiments.
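
To make the morpheme-based direction concrete, here is a toy sketch that models the morpheme stream instead of the word stream; the '+'-based splitter and add-one smoothing are stand-ins for a real morphological analyzer and proper discounting, and nothing here reproduces the paper's models.

```python
import math
from collections import Counter

# Toy sketch of one morphology-based strategy: model the morpheme stream
# instead of the word stream, which shrinks the vocabulary and the
# out-of-vocabulary rate for morphologically rich languages.

def split_morphemes(word):
    """Hypothetical splitter: tokens like 'wa+kitab' split on '+'."""
    return word.split("+")

def train_bigram(sentences):
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        morphs = ["<s>"] + [m for w in sent for m in split_morphemes(w)]
        unigrams.update(morphs)
        bigrams.update(zip(morphs, morphs[1:]))
    return unigrams, bigrams

def logprob(unigrams, bigrams, sent, vocab_size):
    """Add-one-smoothed bigram log-probability of the morpheme stream."""
    morphs = ["<s>"] + [m for w in sent for m in split_morphemes(w)]
    lp = 0.0
    for h, m in zip(morphs, morphs[1:]):
        lp += math.log((bigrams[(h, m)] + 1) / (unigrams[h] + vocab_size))
    return lp

uni, bi = train_bigram([["wa+kitab", "jadid"], ["al+kitab"]])
print(logprob(uni, bi, ["al+kitab", "jadid"], vocab_size=len(uni)))
```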


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Recent innovations in speech-to-text transcription at SRI-ICSI-UW

Andreas Stolcke; Barry Y. Chen; H. Franco; Venkata Ramana Rao Gadde; Martin Graciarena; Mei-Yuh Hwang; Katrin Kirchhoff; Arindam Mandal; Nelson Morgan; Xin Lei; Tim Ng; Mari Ostendorf; M. Kemal Sönmez; Anand Venkataraman; Dimitra Vergyri; Wen Wang; Jing Zheng; Qifeng Zhu

We summarize recent progress in automatic speech-to-text transcription at SRI, ICSI, and the University of Washington. The work encompasses all components of speech modeling found in a state-of-the-art recognition system, from acoustic features, to acoustic modeling and adaptation, to language modeling. In the front end, we experimented with nonstandard features, including various measures of voicing, discriminative phone posterior features estimated by multilayer perceptrons, and a novel phone-level macro-averaging for cepstral normalization. Acoustic modeling was improved with combinations of front ends operating at multiple frame rates, as well as by modifications to the standard methods for discriminative Gaussian estimation. We show that acoustic adaptation can be improved by predicting the optimal regression class complexity for a given speaker. Language modeling innovations include the use of a syntax-motivated almost-parsing language model, as well as principled vocabulary-selection techniques. Finally, we address portability issues, such as the use of imperfect training transcripts, and language-specific adjustments required for recognition of Arabic and Mandarin.
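
For background on the cepstral-normalization thread, the sketch below shows only the conventional per-utterance mean/variance normalization; the paper's phone-level macro-averaging variant differs and is not reproduced here.

```python
import numpy as np

# Standard per-utterance cepstral mean/variance normalization, shown as
# background for the front-end work mentioned above. The paper's novel
# variant macro-averages statistics at the phone level; this sketch is
# only the conventional baseline, not that method.

def cmvn(cepstra, eps=1e-8):
    """cepstra: (num_frames, num_coeffs) array of e.g. MFCCs."""
    mean = cepstra.mean(axis=0)
    std = cepstra.std(axis=0)
    return (cepstra - mean) / (std + eps)

frames = np.random.default_rng(1).normal(size=(300, 13))
normalized = cmvn(frames)
```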


Semitic '04 Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages | 2004

Automatic diacritization of Arabic for acoustic modeling in speech recognition

Dimitra Vergyri; Katrin Kirchhoff

Automatic recognition of Arabic dialectal speech is a challenging task because Arabic dialects are essentially spoken varieties. Only a few dialectal resources are available to date; moreover, most available acoustic data collections are transcribed without diacritics. Such a transcription omits essential pronunciation information about a word, such as short vowels. In this paper we investigate various procedures that enable us to use such training data by automatically inserting the missing diacritics into the transcription. These procedures use acoustic information in combination with different levels of morphological and contextual constraints. We evaluate their performance against manually diacritized transcriptions. In addition, we demonstrate the effect of their accuracy on the recognition performance of acoustic models trained on automatically diacritized training data.
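
A hedged sketch of the selection step: among candidate diacritizations of a transcript line, pick the one the acoustics support best, optionally weighted by a contextual constraint. forced_alignment_score and lm_score are hypothetical placeholders, not functions from the paper.

```python
# Illustrative sketch of the selection step: given several candidate
# diacritizations for an undiacritized transcript, pick the one whose
# implied pronunciation best matches the audio under the current acoustic
# model. `forced_alignment_score` is a hypothetical stand-in for a real
# forced aligner.

def forced_alignment_score(audio, diacritized_text):
    """Placeholder: would return the acoustic log-likelihood of aligning
    the audio against the pronunciation implied by the diacritics."""
    raise NotImplementedError

def best_diacritization(audio, candidates, lm_score=None, lm_weight=0.0):
    """Combine acoustic evidence with optional morphological/contextual
    constraints expressed as a language-model score."""
    def score(cand):
        s = forced_alignment_score(audio, cand)
        if lm_score is not None:
            s += lm_weight * lm_score(cand)  # contextual constraint
        return s
    return max(candidates, key=score)
```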


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2008

Learning to rank with partially-labeled data

Kevin Duh; Katrin Kirchhoff

Ranking algorithms, whose goal is to appropriately order a set of objects/documents, are an important component of information retrieval systems. Previous work on ranking algorithms has focused on cases where only labeled data is available for training (i.e., supervised learning). In this paper, we consider the question of whether unlabeled (test) data can be exploited to improve ranking performance. We present a framework for transductive learning of ranking functions and show that the answer is affirmative. Our framework is based on generating better features from the test data (via Kernel PCA) and incorporating such features via boosting, thus learning different ranking functions adapted to the individual test queries. We evaluate this method on the LETOR (TREC, OHSUMED) dataset and demonstrate significant improvements.
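
Under simplifying assumptions (a pointwise gradient-boosted regressor standing in for the paper's boosting-based ranker, toy random data), the transductive feature-generation step might look like this:

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import GradientBoostingRegressor

# Sketch of the transductive idea: fit Kernel PCA on the *unlabeled test*
# query's documents, append the resulting nonlinear features to the
# original ones, and train a ranker on the augmented representation.
# Relevance labels and feature dimensions are toy data.

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 20)), rng.integers(0, 3, 200)
X_test = rng.normal(size=(50, 20))          # one test query's documents

kpca = KernelPCA(n_components=5, kernel="rbf").fit(X_test)
X_train_aug = np.hstack([X_train, kpca.transform(X_train)])
X_test_aug = np.hstack([X_test, kpca.transform(X_test)])

ranker = GradientBoostingRegressor().fit(X_train_aug, y_train)
ranking = np.argsort(-ranker.predict(X_test_aug))  # best documents first
```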


Empirical Methods in Natural Language Processing (EMNLP) | 2005

The Vocal Joystick: A Voice-Based Human-Computer Interface for Individuals with Motor Impairments

Jeff A. Bilmes; Xiao Li; Jonathan Malkin; Kelley Kilanski; Richard Wright; Katrin Kirchhoff; Amarnag Subramanya; Susumu Harada; James A. Landay; Patricia Dowden; Howard Jay Chizeck

We present a novel voice-based human-computer interface designed to enable individuals with motor impairments to use vocal parameters for continuous control tasks. Since discrete spoken commands are ill-suited to such tasks, our interface exploits a large set of continuous acoustic-phonetic parameters such as pitch, loudness, and vowel quality. Their selection is optimized with respect to automatic recognizability, communication bandwidth, learnability, suitability, and ease of use. Parameters are extracted in real time, transformed via adaptation and acceleration, and converted into continuous control signals. This paper describes the basic engine, prototype applications (in particular, voice-based web browsing and a controlled trajectory-following task), and initial user studies confirming the feasibility of this technology.
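
A toy mapping sketch of the control idea: vowel quality (reduced here to two formants) sets direction and loudness sets speed with a nonlinear acceleration curve. All constants and the parameter reduction are invented, not the Vocal Joystick engine.

```python
import math

# Hypothetical mapping sketch: turn per-frame vocal parameters into a 2-D
# cursor velocity. Vowel quality (crudely reduced to the first two
# formants) picks the direction; loudness sets the speed via a nonlinear
# "acceleration" curve so small changes allow fine control. Constants are
# invented for illustration.

def control_signal(f1_hz, f2_hz, loudness_db, gain=2.0):
    # Direction from vowel quality, centered on a rough neutral vowel.
    dx = (f2_hz - 1500.0) / 1000.0
    dy = (f1_hz - 500.0) / 400.0
    norm = math.hypot(dx, dy) or 1.0
    # Nonlinear acceleration: louder voice moves the cursor faster.
    speed = gain * max(0.0, (loudness_db - 40.0) / 30.0) ** 2
    return (speed * dx / norm, speed * dy / norm)

print(control_signal(f1_hz=700, f2_hz=1100, loudness_db=65))
```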


International Conference on Spoken Language Processing (ICSLP) | 1996

Syllable-level desynchronisation of phonetic features for speech recognition

Katrin Kirchhoff

This paper describes a novel approach to speech recognition based on phonetic features as basic recognition units and the delayed synchronisation of these features within a higher-level prosodic domain, viz. the syllable. The aim of this approach is to avoid the rigid segmentation of the speech signal usually carried out by standard segment-based recognition systems. The architectural setup of the system is described, as well as evaluation tests carried out on a medium-sized corpus of spontaneous speech (German). Syllable and phoneme recognition results are given and compared to recognition rates obtained by a standard triphone-based HMM recogniser trained and tested on the same data set.
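
The core idea can be caricatured in a few lines: feature streams are recognized independently and only need to agree somewhere within a syllable window, rather than frame-by-frame. Stream contents and the matching rule below are invented for illustration.

```python
# Toy sketch of the desynchronisation idea: phonetic feature streams
# (voicing, manner, place, ...) are detected independently and need only
# re-synchronize at syllable boundaries rather than frame-by-frame.

def streams_match_syllable(streams, syllable_target, window):
    """Each stream is a list of (value, time); a syllable matches if every
    target feature value appears somewhere inside the syllable window."""
    start, end = window
    for feat, target in syllable_target.items():
        values = [v for v, t in streams[feat] if start <= t <= end]
        if target not in values:
            return False
    return True

streams = {
    "voicing": [("voiced", 0.02), ("voiceless", 0.11)],
    "manner": [("stop", 0.01), ("vowel", 0.06)],
}
target = {"voicing": "voiced", "manner": "vowel"}
print(streams_match_syllable(streams, target, window=(0.0, 0.10)))
```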

Collaboration


Dive into Katrin Kirchhoff's collaboration network.

Top Co-Authors

Jeff A. Bilmes (University of Washington)
Anne M. Turner (University of Washington)
Kevin Duh (Nara Institute of Science and Technology)
Yuzong Liu (University of Washington)
Mei Yang (University of Washington)
Kristin Dew (University of Washington)
Mari Ostendorf (University of Washington)