Publication


Featured research published by Chuck Wooters.


International Conference on Acoustics, Speech, and Signal Processing | 2003

The ICSI Meeting Corpus

Adam Janin; Don Baron; Jane Edwards; Daniel P. W. Ellis; David Gelbart; Nelson Morgan; Barbara Peskin; Thilo Pfau; Elizabeth Shriberg; Andreas Stolcke; Chuck Wooters

We have collected a corpus of data from natural meetings that occurred at the International Computer Science Institute (ICSI) in Berkeley, California, over the last three years. The corpus contains audio recorded simultaneously from head-worn and table-top microphones, word-level transcripts of meetings, and various metadata on participants, meetings, and hardware. Such a corpus supports work in automatic speech recognition, noise robustness, dialog modeling, prosody, rich transcription, information retrieval, and more. We present details on the contents of the corpus, as well as rationales for the decisions that led to its configuration. The corpus was delivered to the Linguistic Data Consortium (LDC).


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Acoustic Beamforming for Speaker Diarization of Meetings

Xavier Anguera; Chuck Wooters; Javier Hernando

When performing speaker diarization on recordings from meetings, multiple microphones of different qualities are usually available, distributed around the meeting room. Although several approaches have been proposed in recent years to take advantage of multiple microphones, they are either too computationally expensive and not easily scalable, or they cannot outperform the simpler case of using the best single microphone. In this paper, the use of classic acoustic beamforming techniques is proposed, together with several novel algorithms, to create a complete frontend for speaker diarization in the meeting-room domain. The new techniques presented include blind reference-channel selection, a two-step time-delay-of-arrival (TDOA) Viterbi postprocessing, and a dynamic output-signal weighting algorithm, together with the use of the TDOA values in the diarization itself to complement the acoustic information. Tests on speaker diarization show a 25% relative improvement on the test set compared to using the single most centrally located microphone. Additional experimental results show improvements using these techniques in a speech recognition task.
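As a reading aid, a minimal NumPy sketch of the delay-and-sum core follows (hypothetical code, not the authors' implementation): TDOAs are estimated against a reference channel with the GCC-PHAT cross-correlation, and the aligned channels are averaged. The blind reference-channel selection, Viterbi postprocessing, and dynamic weighting described above are omitted.

import numpy as np

def gcc_phat_delay(sig, ref, max_delay):
    # Estimate the delay (in samples) of `sig` relative to `ref`
    # using the GCC-PHAT weighted cross-correlation.
    n = len(sig) + len(ref)
    spec = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    spec /= np.abs(spec) + 1e-12                 # PHAT weighting
    cc = np.fft.irfft(spec, n)
    cc = np.concatenate((cc[-max_delay:], cc[:max_delay + 1]))
    return int(np.argmax(np.abs(cc))) - max_delay

def delay_and_sum(channels, ref_idx=0, max_delay=800):
    # Align every channel to the reference channel and average.
    ref = channels[ref_idx]
    out = np.zeros(len(ref))
    for ch in channels:
        d = gcc_phat_delay(ch, ref, max_delay)
        out += np.roll(ch, -d)                   # crude circular-shift alignment
    return out / len(channels)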


International Conference on Acoustics, Speech, and Signal Processing | 1992

CDNN: a context dependent neural network for continuous speech recognition

Nelson Morgan; Chuck Wooters; Steve Renals

A series of theoretical and experimental results have suggested that multilayer perceptrons (MLPs) are an effective family of algorithms for the smooth estimation of highly dimensioned probability density functions that are useful in continuous speech recognition. All of these systems have exclusively used context-independent phonetic models, in the sense that the probabilities or costs are estimated for simple speech units such as phonemes or words, rather than biphones or triphones. Numerous conventional systems based on hidden Markov models (HMMs) have been reported that use triphone or triphone-like context-dependent models. In one case, the outputs of many context-dependent MLPs (one per context class) were used to help choose the best sentence from the N best sentences as determined by a context-dependent HMM system. It is shown how, without any simplifying assumptions, one can estimate likelihoods for context-dependent phonetic models with nets that are not substantially larger than context-independent MLPs.
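The hybrid MLP/HMM machinery underlying this work can be illustrated with the standard scaled-likelihood trick: the MLP estimates state posteriors p(q|x), and dividing by the state priors p(q) yields a quantity proportional to the emission likelihood p(x|q). A toy sketch (illustrative only; the context-dependent extension further factors the posterior using one small net per context class):

import numpy as np

def scaled_log_likelihoods(posteriors, priors):
    # posteriors: (frames, states) MLP outputs; priors: (states,).
    # Returns log p(x|q) up to a constant, usable as HMM emission scores.
    eps = 1e-10
    return np.log(posteriors + eps) - np.log(priors + eps)

# Toy usage: 2 frames, 3 phone states.
post = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.6, 0.3]])
prior = np.array([0.5, 0.3, 0.2])
print(scaled_log_likelihoods(post, prior))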


International Conference on Acoustics, Speech, and Signal Processing | 1995

Using a stochastic context-free grammar as a language model for speech recognition

Daniel Jurafsky; Chuck Wooters; Jonathan Segal; Andreas Stolcke; Eric Fosler; G. Tajchaman; Nelson Morgan

This paper describes a number of experiments in adding new grammatical knowledge to the Berkeley Restaurant Project (BeRP), our medium-vocabulary (1300 word), speaker-independent, spontaneous continuous-speech understanding system. We describe an algorithm for using a probabilistic Earley parser and a stochastic context-free grammar (SCFG) to generate word transition probabilities at each frame for a Viterbi decoder. We show that using an SCFG as a language model improves the word error rate from 34.6% (bigram) to 29.6% (SCFG), and the semantic sentence recognition error from 39.0% (bigram) to 34.1% (SCFG). In addition, we get a further reduction to 28.8% word error by mixing the bigram and SCFG LMs. We also report on our preliminary results from using discourse-context information in the LM.
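The LM mixing mentioned in the last result is plain linear interpolation; a minimal sketch follows, where the probability functions are hypothetical stand-ins (in the real system the SCFG term comes from the probabilistic Earley parser):

def mixed_word_prob(word, history, p_bigram, p_scfg, lam=0.5):
    # P(word | history) = lam * P_bigram + (1 - lam) * P_scfg
    return lam * p_bigram(word, history) + (1.0 - lam) * p_scfg(word, history)

# Toy usage with uniform stand-in models over a 10-word vocabulary.
uniform = lambda w, h: 0.1
print(mixed_word_prob("table", ("a",), uniform, uniform, lam=0.7))  # 0.1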


IEEE Transactions on Computers | 2007

Speaker Diarization for Multiple-Distant-Microphone Meetings Using Several Sources of Information

José M. Pardo; Xavier Anguera; Chuck Wooters

Human-machine interaction in meetings requires the localization and identification of the speakers interacting with the system as well as the recognition of the words spoken. An important step toward this goal is rich transcription research, which includes speaker diarization together with the annotation of sentence boundaries and the elimination of speaker disfluencies. The sub-area of speaker diarization attempts to identify the number of participants in a meeting and create a list of speech time intervals for each such participant. In this paper, we analyze the correlation between signals coming from multiple microphones and propose an improved method for carrying out speaker diarization for meetings with multiple distant microphones. The proposed algorithm makes use of acoustic information and information from the delays between signals coming from the different sources. Using this procedure, we were able to achieve state-of-the-art performance in the NIST spring 2006 rich transcription evaluation, improving the Diarization Error Rate (DER) by 15% to 20% relative to previous systems.
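A common way to combine the two information sources, and a reasonable reading of the approach here, is a fixed-weight fusion of per-stream log-likelihoods; the sketch below is illustrative, with made-up weights and scores rather than the paper's trained models:

import numpy as np

def combined_log_lik(ll_acoustic, ll_delay, w_acoustic=0.9):
    # Per-frame log-likelihood fusion of the acoustic (e.g., MFCC)
    # and delay (TDOA) feature streams for one speaker cluster.
    return w_acoustic * ll_acoustic + (1.0 - w_acoustic) * ll_delay

# Toy usage with made-up per-frame log-likelihoods.
ll_mfcc = np.array([-52.1, -48.7, -50.3])
ll_tdoa = np.array([-8.2, -7.9, -8.5])
print(combined_log_lik(ll_mfcc, ll_tdoa))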


International Conference on Machine Learning | 2005

Further progress in meeting recognition: the ICSI-SRI spring 2005 speech-to-text evaluation system

Andreas Stolcke; Xavier Anguera; Kofi Boakye; Özgür Çetin; Frantisek Grezl; Adam Janin; Arindam Mandal; Barbara Peskin; Chuck Wooters; Jing Zheng

We describe the development of our speech recognition system for the National Institute of Standards and Technology (NIST) Spring 2005 Meeting Rich Transcription (RT-05S) evaluation, highlighting improvements made since last year [1]. The system is based on the SRI-ICSI-UW RT-04F conversational telephone speech (CTS) recognition system, with meeting-adapted models and various audio preprocessing steps. This year's system features better delay-sum processing of distant microphone channels and energy-based crosstalk suppression for close-talking microphones. Acoustic modeling is improved by virtue of various enhancements to the background (CTS) models, including added training data, decision-tree based state tying, and the inclusion of discriminatively trained phone posterior features estimated by multilayer perceptrons. In particular, we make use of adaptation of both acoustic models and MLP features to the meeting domain. For distant microphone recognition we obtained considerable gains by combining and cross-adapting narrow-band (telephone) acoustic models with broadband (broadcast news) models. Language models (LMs) were improved with the inclusion of new meeting and web data. In spite of a lack of training data, we created effective LMs for the CHIL lecture domain. Results are reported on RT-04S and RT-05S meeting data. Measured on RT-04S conference data, we achieved an overall improvement of 17% relative in both the multiple-distant-microphone (MDM) and individual-headset-microphone (IHM) conditions compared to last year's evaluation system. Results on lecture data are comparable to the best reported results for that task.


International Conference on Acoustics, Speech, and Signal Processing | 2003

Meetings about meetings: research at ICSI on speech in multiparty conversations

Nelson Morgan; Don Baron; Sonali Bhagat; Hannah Carvey; Rajdip Dhillon; Jane Edwards; David Gelbart; Adam Janin; Ashley Krupski; Barbara Peskin; Thilo Pfau; Elizabeth Shriberg; Andreas Stolcke; Chuck Wooters

In early 2001, we reported (at the Human Language Technology meeting) the early stages of an ICSI (International Computer Science Institute) project on processing speech from meetings (in collaboration with other sites, principally SRI, Columbia, and UW). We report our progress from the first few years of this effort, including: the collection and subsequent release of a 75-meeting corpus (over 70 meeting-hours and up to 16 channels for each meeting); the development of a prosodic database for a large subset of these meetings, and its subsequent use for punctuation and disfluency detection; the development of a dialog annotation scheme and its implementation for a large subset of the meetings; and the improvement of both near-mic and far-mic speech recognition results for meeting speech test sets.


IEEE Automatic Speech Recognition and Understanding Workshop | 2007

A fast-match approach for robust, faster than real-time speaker diarization

Yan Huang; Oriol Vinyals; Gerald Friedland; Christian A. Müller; Nikki Mirghafori; Chuck Wooters

During the past few years, speaker diarization has achieved satisfying accuracy in terms of speaker Diarization Error Rate (DER). The most successful approaches, based on agglomerative clustering, however, exhibit an inherent computational complexity which makes real-time processing, especially in combination with further processing steps, almost impossible. In this article we present a framework to speed up agglomerative-clustering speaker diarization. The basic idea is to adopt a computationally cheap method to reduce the hypothesis space of the more expensive and accurate model selection via the Bayesian Information Criterion (BIC). Two strategies, based on the pitch correlogram and the unscented-transform-based approximation of the KL divergence, are used independently as a fast-match approach to select the most likely clusters to merge. We performed the experiments using the existing ICSI speaker diarization system. The new system using the KL-divergence fast-match strategy performs only 14% of the BIC comparisons needed in the baseline system and speeds up the system by 41% without affecting the DER. The result is a robust and faster-than-real-time speaker diarization system.
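The fast-match idea reduces to scoring all cluster pairs with a cheap divergence and running the expensive BIC test only on the most promising fraction. In the sketch below, single diagonal Gaussians with a closed-form symmetric KL stand in for the paper's unscented-transform GMM approximation; the keep fraction is illustrative:

import numpy as np

def sym_kl_diag(m1, v1, m2, v2):
    # Symmetric KL divergence between two diagonal Gaussians.
    kl12 = 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1)
    kl21 = 0.5 * np.sum(np.log(v1 / v2) + (v2 + (m2 - m1) ** 2) / v1 - 1)
    return kl12 + kl21

def fast_match_pairs(clusters, keep_fraction=0.15):
    # clusters: list of (mean, var) arrays. Returns the index pairs with
    # the smallest divergence, to be passed on to full BIC scoring.
    pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    scored = sorted(pairs, key=lambda p: sym_kl_diag(*clusters[p[0]], *clusters[p[1]]))
    keep = max(1, int(keep_fraction * len(scored)))
    return scored[:keep]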


International Conference on Machine Learning | 2005

Robust speaker segmentation for meetings: the ICSI-SRI spring 2005 diarization system

Xavier Anguera; Chuck Wooters; Barbara Peskin; Mateu Aguilo

In this paper we describe the ICSI-SRI entry in the Rich Transcription 2005 Spring Meeting Recognition Evaluation. The current system is based on the ICSI-SRI clustering system for Broadcast News (BN), with extra modules to process the different meeting tasks in which we participated. Our base system uses agglomerative clustering with a modified Bayesian Information Criterion (BIC) measure to determine when to stop merging clusters and to decide which pairs of clusters to merge. This approach does not require any pre-trained models, thus increasing robustness and simplifying the port from BN to the meetings domain. For the meetings domain, we have added several features to our baseline clustering system, including a "purification" module that tries to keep the clusters acoustically homogeneous throughout the clustering process, and a delay&sum beamforming algorithm which enhances signal quality for the multiple-distant-microphone (MDM) sub-task. In post-evaluation work we further improved the delay&sum algorithm, experimented with a new speech/non-speech detector, and proposed a new system for the lecture room environment.
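The merge decision at the heart of such a clustering system is a delta-BIC test; the sketch below shows the classic form with single full-covariance Gaussians and an explicit penalty weight lam. The modified BIC used by the ICSI system instead matches model complexity across the two hypotheses, which removes the lam tuning parameter; that refinement is not shown here:

import numpy as np

def gauss_loglik(x):
    # Log-likelihood of data x (frames, dims) under its own ML Gaussian.
    n, d = x.shape
    cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * n * (logdet + d * (1 + np.log(2 * np.pi)))

def delta_bic(xa, xb, lam=1.0):
    # Classic delta-BIC merge test: a positive value favors merging.
    n, d = len(xa) + len(xb), xa.shape[1]
    p_saved = d + d * (d + 1) / 2     # parameters saved by merging
    return (gauss_loglik(np.vstack([xa, xb])) - gauss_loglik(xa)
            - gauss_loglik(xb) + 0.5 * lam * p_saved * np.log(n))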


International Conference on Machine Learning | 2006

Speaker diarization for multi-microphone meetings using only between-channel differences

Jose M. Pardo; Xavier Anguera; Chuck Wooters

We present a method to extract speaker-turn segmentation from multiple distant microphones (MDM) using only the delay values found via cross-correlation between the available channels. The method is robust against the number of speakers (which is unknown to the system), the number of channels, and the acoustics of the room. The delays between channels are processed and clustered to obtain a segmentation hypothesis. We have obtained a 31.2% diarization error rate (DER) for NIST's RT05s MDM conference room evaluation set. For an MDM subset of NIST's RT04s development set, we have obtained 36.93% DER and 35.73% DER*. Comparing these results with the ones presented by Ellis and Liu [8], who also used between-channel differences for the same data, we obtain a 43% relative improvement in the error rate.
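A toy rendering of the delays-only idea (illustrative, not the authors' code): each segment is represented by a vector of TDOAs, one per channel pair, and segments are grouped by clustering those vectors. For brevity this uses k-means with a known speaker count, whereas the actual system must infer the number of speakers:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Fake per-segment TDOA vectors (in samples, 3 channel pairs) for two
# speakers sitting at fixed positions in the room.
spk1 = rng.normal([12.0, -4.0, 7.0], 0.5, size=(20, 3))
spk2 = rng.normal([-9.0, 15.0, -2.0], 0.5, size=(20, 3))
segments = np.vstack([spk1, spk2])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(segments)
print(labels)  # segment-to-speaker assignment hypothesis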

Collaboration


Dive into Chuck Wooters's collaborations.

Top Co-Authors

Xavier Anguera (International Computer Science Institute)

Nelson Morgan (University of California)

Barbara Peskin (University of California)

Adam Janin (University of California)

José M. Pardo (International Computer Science Institute)

Javier Hernando (Polytechnic University of Catalonia)

David Gelbart (University of California)