Thilo Pfau
University of California, Berkeley
Publication
Featured research published by Thilo Pfau.
international conference on acoustics, speech, and signal processing | 2003
Adam Janin; Don Baron; Jane Edwards; Daniel P. W. Ellis; David Gelbart; Nelson Morgan; Barbara Peskin; Thilo Pfau; Elizabeth Shriberg; Andreas Stolcke; Chuck Wooters
We have collected a corpus of data from natural meetings that occurred at the International Computer Science Institute (ICSI) in Berkeley, California over the last three years. The corpus contains audio recorded simultaneously from head-worn and table-top microphones, word-level transcripts of meetings, and various metadata on participants, meetings, and hardware. Such a corpus supports work in automatic speech recognition, noise robustness, dialog modeling, prosody, rich transcription, information retrieval, and more. We present details on the contents of the corpus, as well as rationales for the decisions that led to its configuration. The corpus was delivered to the Linguistic Data Consortium (LDC).
international conference on human language technology research | 2001
Nelson Morgan; Don Baron; Jane Edwards; Daniel P. W. Ellis; David Gelbart; Adam Janin; Thilo Pfau; Elizabeth Shriberg; Andreas Stolcke
In collaboration with colleagues at UW, OGI, IBM, and SRI, we are developing technology to process spoken language from informal meetings. The work includes a substantial data collection and transcription effort, and has required a nontrivial degree of infrastructure development. We are undertaking this because the new task area provides a significant challenge to current HLT capabilities, while offering the promise of a wide range of potential applications. In this paper, we give our vision of the task, the challenges it represents, and the current state of our development, with particular attention to automatic transcription.
ieee automatic speech recognition and understanding workshop | 2001
Thilo Pfau; Daniel P. W. Ellis; Andreas Stolcke
As part of a project into speech recognition in meeting environments, we have collected a corpus of multichannel meeting recordings. We expected the identification of speaker activity to be straightforward given that the participants had individual microphones, but simple approaches yielded unacceptably erroneous labelings, mainly due to crosstalk between nearby speakers and wide variations in channel characteristics. Therefore, we have developed a more sophisticated approach for multichannel speech activity detection using a simple hidden Markov model (HMM). A baseline HMM speech activity detector has been extended to use mixtures of Gaussians to achieve robustness for different speakers under different conditions. Feature normalization and crosscorrelation processing are used to increase the channel independence and to detect crosstalk. The use of both energy normalization and crosscorrelation based postprocessing results in a 35% relative reduction of the frame error rate. Speech recognition experiments show that it is beneficial in this multispeaker setting to use the output of the speech activity detector for presegmenting the recognizer input, achieving word error rates within 10% of those achieved with manual turn labeling.
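The cross-correlation idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' HMM detector or their postprocessing: it only shows how a peak in the normalized cross-correlation between two channels, combined with an energy comparison, might flag a frame as crosstalk. The function names, the lag range, and the threshold value are all assumptions chosen for illustration.

```python
import math


def normalized_crosscorrelation(x, y, max_lag):
    """Peak absolute normalized cross-correlation of x and y for lags in [-max_lag, max_lag]."""
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0.0:
        return 0.0  # at least one frame is silent, so no correlation can be measured
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(len(x)):
            j = i + lag
            if 0 <= j < len(y):
                acc += x[i] * y[j]
        best = max(best, abs(acc) / norm)
    return best


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(v * v for v in frame) / len(frame)


def looks_like_crosstalk(own, other, max_lag=8, corr_threshold=0.8):
    """Hypothetical rule: `own` is crosstalk if it strongly correlates with a louder channel."""
    peak = normalized_crosscorrelation(own, other, max_lag)
    return peak >= corr_threshold and frame_energy(other) > frame_energy(own)


# A speaker on `other` leaks, attenuated and delayed by one sample, into `own`.
other = [0.0, 1.0, 0.0, -1.0] * 8
own = [0.0] + [0.3 * v for v in other[:-1]]
print(looks_like_crosstalk(own, other))
```

A fixed threshold like this is exactly the kind of simple approach the abstract reports as insufficient on its own; the sketch is meant only to make the cross-correlation feature concrete.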
international conference on acoustics, speech, and signal processing | 2003
Nelson Morgan; Don Baron; Sonali Bhagat; Hannah Carvey; Rajdip Dhillon; Jane Edwards; David Gelbart; Adam Janin; Ashley Krupski; Barbara Peskin; Thilo Pfau; Elizabeth Shriberg; Andreas Stolcke; Chuck Wooters
In early 2001, we reported (at the Human Language Technology meeting) the early stages of an ICSI (International Computer Science Institute) project on processing speech from meetings (in collaboration with other sites, principally SRI, Columbia, and UW). We report our progress from the first few years of this effort, including: the collection and subsequent release of a 75-meeting corpus (over 70 meeting-hours and up to 16 channels for each meeting); the development of a prosodic database for a large subset of these meetings, and its subsequent use for punctuation and disfluency detection; the development of a dialog annotation scheme and its implementation for a large subset of the meetings; and the improvement of both near-mic and far-mic speech recognition results for meeting speech test sets.
international conference on acoustics speech and signal processing | 1998
Thilo Pfau; Günther Ruske
We present a new feature-based method for estimating the speaking rate by detecting vowels in continuous speech. The features used are the modified loudness and the zero-crossing rate, both of which are calculated in the standard preprocessing unit of our speech recognition system. As vowels in general correspond to syllable nuclei, the feature-based vowel rate is comparable to an estimate of the lexically-based syllable rate. The vowel detector presented is tested on the spontaneously spoken German Verbmobil task and is evaluated using manually transcribed data. The lowest vowel error rate (including insertions) on the defined test set is 22.72% on average over all vowels. Additionally, correlation coefficients between our estimates and reference rates are calculated. These coefficients reach up to 0.796 and are therefore comparable to those for lexically-based measures (like the phone rate) on other tasks. The accuracy is sufficient to use our measurement for speaking rate adaptation.
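One of the two features named in the abstract above, the zero-crossing rate, can be sketched in a few lines. This is a generic textbook formulation, not the authors' preprocessing unit, and the modified loudness feature is not reproduced. The function names, frame length, and hop size are assumptions for illustration.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs in the frame whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:])
        if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


# Hypothetical frame length and hop, in samples.
signal = [1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
rates = [zero_crossing_rate(f) for f in frame_signal(signal, frame_len=4, hop=4)]
print(rates)
```

Voiced sounds such as vowels tend to show a low zero-crossing rate compared with fricatives and noise, which is why the feature is a common ingredient in vowel and voicing detectors.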
Tagungsband ITG-Fachtagung "Sprachkommunikation" | 2011
Thilo Pfau; Günther Ruske
The incremental method presented here for generating word hypothesis graphs (WHGs) makes it possible to build word hypothesis graphs in step with the search process and to pass them directly to higher processing stages. Buffering the word sequence hypotheses emitted by the search makes it possible to take sensible local decisions about which word hypotheses to include in the WHG. Further measures, such as additional pruning at the word level, reduce the redundancy of the information contained in the word hypothesis graph and lower the system's memory and computation time requirements. In addition, the mechanism developed allows the size of the WHG to be set according to the application area, independently of the size of the search space.
Archive | 2001
Thilo Pfau; Daniel P. W. Ellis
conference of the international speech communication association | 1999
Robert Faltlhauser; Thilo Pfau; Günther Ruske
international conference on acoustics, speech, and signal processing | 2000
Matthias Thomae; Günther Ruske; Thilo Pfau
Forum phoneticum | 2000
Thilo Pfau; Manfred Beham; Wolfgang Reichl; Günther Ruske