Publication


Featured research published by David Gelbart.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2003

The ICSI Meeting Corpus

Adam Janin; Don Baron; Jane Edwards; Daniel P. W. Ellis; David Gelbart; Nelson Morgan; Barbara Peskin; Thilo Pfau; Elizabeth Shriberg; Andreas Stolcke; Chuck Wooters

We have collected a corpus of data from natural meetings that occurred at the International Computer Science Institute (ICSI) in Berkeley, California, over the last three years. The corpus contains audio recorded simultaneously from head-worn and table-top microphones, word-level transcripts of meetings, and various metadata on participants, meetings, and hardware. Such a corpus supports work in automatic speech recognition, noise robustness, dialog modeling, prosody, rich transcription, information retrieval, and more. We present details on the contents of the corpus, as well as rationales for the decisions that led to its configuration. The corpus was delivered to the Linguistic Data Consortium (LDC).


International Conference on Human Language Technology Research | 2001

The meeting project at ICSI

Nelson Morgan; Don Baron; Jane Edwards; Daniel P. W. Ellis; David Gelbart; Adam Janin; Thilo Pfau; Elizabeth Shriberg; Andreas Stolcke

In collaboration with colleagues at UW, OGI, IBM, and SRI, we are developing technology to process spoken language from informal meetings. The work includes a substantial data collection and transcription effort, and has required a nontrivial degree of infrastructure development. We are undertaking this because the new task area provides a significant challenge to current HLT capabilities, while offering the promise of a wide range of potential applications. In this paper, we give our vision of the task, the challenges it represents, and the current state of our development, with particular attention to automatic transcription.


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Automatic speech recognition with an adaptation model motivated by auditory processing

Marcus Holmberg; David Gelbart; Werner Hemmert

The mel-frequency cepstral coefficient (MFCC) and perceptual linear prediction (PLP) feature extraction methods typically used for automatic speech recognition (ASR) employ several principles which have known counterparts in the cochlea and auditory nerve: frequency decomposition, mel- or bark-warping of the frequency axis, and compression of amplitudes. It seems natural to ask if one can profitably employ a counterpart of the next physiological processing step, synaptic adaptation. We therefore incorporated a simplified model of short-term adaptation into MFCC feature extraction. We evaluated the resulting ASR performance on the AURORA 2 and AURORA 3 tasks in comparison to ordinary MFCCs, MFCCs processed by RASTA, and MFCCs processed by cepstral mean subtraction (CMS), both in comparison to and in combination with Wiener filtering. The results suggest that our approach offers a simple, causal robustness strategy which is competitive with RASTA, CMS, and Wiener filtering and performs well in combination with Wiener filtering. Compared to the structurally related RASTA, our adaptation model provides superior performance on AURORA 2 and, if Wiener filtering is used prior to both approaches, on AURORA 3 as well.
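As a rough illustration of the kinds of feature post-processing compared above, the following minimal numpy sketch shows cepstral mean subtraction alongside a toy first-order adaptation filter that emphasizes onsets. The toy_adaptation function and its alpha constant are illustrative assumptions that stand in loosely for the idea of short-term adaptation; they are not the adaptation model from the paper.

    import numpy as np

    def cepstral_mean_subtraction(mfcc):
        # Remove the per-utterance mean of each cepstral coefficient.
        # mfcc: array of shape (num_frames, num_coeffs).
        return mfcc - mfcc.mean(axis=0, keepdims=True)

    def toy_adaptation(feats, alpha=0.95):
        # Illustrative first-order adaptation: subtract a running average
        # from each frame so that changes (onsets) are emphasized. This is
        # a stand-in for short-term adaptation, not the paper's model.
        feats = np.asarray(feats, dtype=float)
        adapted = np.empty_like(feats)
        avg = feats[0].copy()
        for t, frame in enumerate(feats):
            avg = alpha * avg + (1.0 - alpha) * frame
            adapted[t] = frame - avg
        return adapted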


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2003

Meetings about meetings: research at ICSI on speech in multiparty conversations

Nelson Morgan; Don Baron; Sonali Bhagat; Hannah Carvey; Rajdip Dhillon; Jane Edwards; David Gelbart; Adam Janin; Ashley Krupski; Barbara Peskin; Thilo Pfau; Elizabeth Shriberg; Andreas Stolcke; Chuck Wooters

In early 2001, we reported (at the Human Language Technology meeting) on the early stages of an ICSI (International Computer Science Institute) project on processing speech from meetings (in collaboration with other sites, principally SRI, Columbia, and UW). We report our progress from the first few years of this effort, including the collection and subsequent release of a 75-meeting corpus (over 70 meeting-hours and up to 16 channels for each meeting); the development of a prosodic database for a large subset of these meetings, and its subsequent use for punctuation and disfluency detection; the development of a dialog annotation scheme and its implementation for a large subset of the meetings; and the improvement of both near-mic and far-mic speech recognition results for meeting speech test sets.


IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | 2001

Evaluating long-term spectral subtraction for reverberant ASR

David Gelbart; Nelson Morgan

Even a modest degree of room reverberation can greatly increase the difficulty of automatic speech recognition. We have observed large increases in speech recognition word error rates when using a far-field (3-6 feet) microphone in a conference room, in comparison with recordings from head-mounted microphones. In this paper, we describe experiments with a proposed remedy based on the subtraction of an estimate of the log spectrum from a long-term (e.g., 2 s) analysis window, followed by overlap-add resynthesis. Since the technique is essentially one of enhancement, the processed signal it generates can be used as input for complete speech recognition systems. Here we report results with both the HTK and SRI Hub-5 recognizers. For simpler recognizer configurations and/or moderate-sized training, the improvements are very large, while moderate improvements are still observed for more complex configurations under a number of conditions.
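The following is a minimal numpy sketch of the general idea, under assumed parameters (512-sample Hann-windowed frames, 50% overlap) and with the per-utterance mean log spectrum standing in for the paper's long (e.g., 2 s) analysis window; it is an illustration, not the authors' implementation.

    import numpy as np

    def long_term_log_spectral_subtraction(x, frame_len=512, hop=256):
        # x: 1-D float array (mono audio), len(x) >= frame_len.
        window = np.hanning(frame_len)
        n_frames = 1 + (len(x) - frame_len) // hop
        idx = [slice(i * hop, i * hop + frame_len) for i in range(n_frames)]
        frames = np.stack([x[s] * window for s in idx])
        spec = np.fft.rfft(frames, axis=1)
        log_mag = np.log(np.abs(spec) + 1e-10)
        # Subtract an estimate of the long-term log spectrum; here the
        # utterance mean approximates the paper's long analysis window.
        log_mag -= log_mag.mean(axis=0, keepdims=True)
        # Resynthesize with the original phase.
        new_spec = np.exp(log_mag) * np.exp(1j * np.angle(spec))
        new_frames = np.fft.irfft(new_spec, n=frame_len, axis=1)
        # Overlap-add, normalizing by the summed analysis window.
        y = np.zeros(len(x))
        wsum = np.zeros(len(x))
        for s, f in zip(idx, new_frames):
            y[s] += f
            wsum[s] += window
        return y / np.maximum(wsum, 1e-10)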


Speech Communication | 2007

Speech encoding in a model of peripheral auditory processing: Quantitative assessment by means of automatic speech recognition

Marcus Holmberg; David Gelbart; Werner Hemmert

Our notion of how speech is processed is still very much dominated by von Helmholtz's theory of hearing. He deduced that the human inner ear decomposes the spectrum of sound signals. However, physiological recordings of auditory nerve fibers (ANF) showed that the rate-place code, which is thought to transmit spectral information to the brain, is at least complemented by a temporal code. In this paper we challenge the rate-place code using a complex but realistic scenario: speech in noise. We used a detailed model of human auditory processing that closely replicates key aspects of auditory nerve spike trains. We performed quantitative evaluations of coding strategies using standard automatic speech recognition (ASR) tools. Our test data consisted of spoken letters of the whole English alphabet from a variety of speakers, with and without background noise. We evaluated a purely rate-place-based encoding strategy, a temporal strategy based on interspike intervals, and a combination thereof. The results suggest that as few as 4% of the total number of ANFs would be sufficient to code speech information in a rate-place fashion. Rate-place coding performed best for speech in clean conditions at normal sound level, but broke down at higher-than-normal levels, and failed dramatically in noise at high levels. Low-spontaneous-rate fibers improved the rate-place code, mainly for vowels and at higher-than-normal levels. At high speech levels, and in particular in the presence of background noise, combining rate-place coding with the temporal coding strategy greatly improved recognition accuracy. We therefore conclude that the human auditory system does not rely on a rate-place code alone but requires the abundance of fibers for precise temporal coding.
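To make the two encoding strategies concrete, here is a minimal numpy sketch of how spike trains might be reduced to rate-place features (per-fiber spike counts in short frames) and to interspike-interval histograms (a simple temporal representation). The function names, frame length, and histogram bins are assumptions for illustration, not the paper's feature pipeline.

    import numpy as np

    def rate_place_features(spike_times, duration, frame_len=0.01):
        # Rate-place code: per-fiber spike counts in short time frames.
        # spike_times: list with one array of spike times (s) per fiber.
        edges = np.arange(0.0, duration + frame_len, frame_len)
        return np.stack([np.histogram(st, bins=edges)[0]
                         for st in spike_times])

    def isi_histogram(fiber_spike_times, max_isi=0.02, n_bins=40):
        # Temporal code: histogram of intervals between successive
        # spikes on one fiber.
        isis = np.diff(np.sort(fiber_spike_times))
        return np.histogram(isis, bins=n_bins, range=(0.0, max_isi))[0]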


Machine Learning for Multimodal Interaction (MLMI) | 2004

The 2004 ICSI-SRI-UW meeting recognition system

Chuck Wooters; Nikki Mirghafori; Andreas Stolcke; Tuomo W. Pirinen; Ivan Bulyko; David Gelbart; Martin Graciarena; Scott Otterson; Barbara Peskin; Mari Ostendorf

This paper describes our system for recognizing speech in meetings, which was an entry in the NIST Spring 2004 Meeting Recognition Evaluation. The system was developed as a collaborative effort between ICSI, SRI, and UW and was based on SRI's 5xRT Conversational Telephone Speech (CTS) recognizer. The CTS system was adapted to the meeting domain by adapting its acoustic and language models, adding noise reduction and delay-and-sum array processing for far-field recognition, and adding postprocessing for cross-talk suppression on close-talking microphones. A modified MAP adaptation procedure was developed to make best use of discriminatively trained (MMIE) prior models. These meeting-specific changes yielded overall relative improvements of 9% and 22% compared to the original CTS system, and 16% and 29% compared to our 2002 Meeting Evaluation system, for the individual-headset and multiple-distant-microphone conditions, respectively.
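To give a sense of what delay-and-sum array processing does, here is a minimal numpy sketch that aligns each distant-microphone channel to a reference channel using an integer-sample cross-correlation delay estimate and then averages the aligned channels; this is an illustration, not the array processing used in the evaluated system.

    import numpy as np

    def delay_and_sum(channels):
        # channels: array of shape (num_mics, num_samples).
        channels = np.asarray(channels, dtype=float)
        ref = channels[0]
        out = np.zeros_like(ref)
        for ch in channels:
            # Integer-sample delay of this channel relative to the
            # reference, via cross-correlation (O(N^2) here; a real
            # system would use FFT-based correlation on short blocks).
            corr = np.correlate(ch, ref, mode="full")
            delay = int(np.argmax(corr)) - (len(ref) - 1)
            # Align and accumulate (np.roll wraps around the edges;
            # acceptable for this illustration only).
            out += np.roll(ch, -delay)
        return out / len(channels)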


SmartKom | 2006

SmartKom-English: From Robust Recognition to Felicitous Interaction

David Gelbart; John Bryant; Andreas Stolcke; Robert Porzel; Manja Baudis; Nelson Morgan

This chapter describes the English-language SmartKom-Mobile system and related research. We explain the work required to support a second language in SmartKom and the design of the English speech recognizer. We then discuss research carried out on signal processing methods for robust speech recognition and on language analysis using the Embodied Construction Grammar formalism. Finally, the results of human-subject experiments using a novel Wizard and Operator model are analyzed with an eye to creating more felicitous interaction in dialogue systems.


Conference of the International Speech Communication Association | 2002

Improving Word Accuracy with Gabor Feature Extraction

Michael Kleinschmidt; David Gelbart


Conference of the International Speech Communication Association | 2002

Double the trouble: handling noise and reverberation in far-field automatic speech recognition

David Gelbart; Nelson Morgan

Collaboration


Dive into David Gelbart's collaborations.

Top Co-Authors

Nelson Morgan
University of California

Barbara Peskin
University of California

Chuck Wooters
International Computer Science Institute

Adam Janin
University of California

Don Baron
University of California

Jane Edwards
University of California

Thilo Pfau
University of California