Walter D. Andrews

Massachusetts Institute of Technology

Publication

Featured research published by Walter D. Andrews.

international conference on acoustics, speech, and signal processing | 2003

The SuperSID project: exploiting high-level information for high-accuracy speaker recognition

Douglas A. Reynolds; Walter D. Andrews; Joseph P. Campbell; Jiri Navratil; Barbara Peskin; André Gustavo Adami; Qin Jin; David Klusacek; Joy S. Abramson; Radu Mihaescu; John J. Godfrey; Douglas A. Jones; Bing Xiang

The area of automatic speaker recognition has been dominated by systems using only short-term, low-level acoustic information, such as cepstral features. While these systems have indeed produced very low error rates, they ignore other levels of information beyond low-level acoustics that convey speaker information. Recently published work has shown examples that such high-level information can be used successfully in automatic speaker recognition systems and has the potential to improve accuracy and add robustness. For the 2002 JHU CLSP summer workshop, the SuperSID project (http://www.clsp.jhu.edu/ws2002/groups/supersid/) was undertaken to exploit these high-level information sources and dramatically increase speaker recognition accuracy on a defined NIST evaluation corpus and task. The paper provides an overview of the structure, data, task, tools, and accomplishments of this project. Wide-ranging approaches using pronunciation models, prosodic dynamics, pitch and duration features, phone streams, and conversational interactions were explored and developed. We show how these novel features and classifiers indeed provide complementary information and can be fused together to drive down the equal error rate on the 2001 NIST extended data task to 0.2%, a 71% relative reduction in error over the previous state of the art.

international conference on acoustics, speech, and signal processing | 2002

Gender-dependent phonetic refraction for speaker recognition

Walter D. Andrews; Mary A. Kohler; Joseph P. Campbell; John J. Godfrey; Jaime Hernandez-Cordero

This paper describes improvements to an innovative high-performance speaker recognition system. Recent experiments showed that with sufficient training data phone strings from multiple languages are exceptional features for speaker recognition. The prototype phonetic speaker recognition system used phone sequences from six languages to produce an equal error rate of 11.5% on Switchboard-I audio files. The improved system described in this paper reduces the equal error rate to less than 4%. This is accomplished by incorporating gender-dependent phone models, pre-processing the speech files to remove cross-talk, and developing more sophisticated fusion techniques for the multi-language likelihood scores.

international conference on acoustics, speech, and signal processing | 2003

Phonetic speaker recognition using maximum-likelihood binary-decision tree models

Jiri Navratil; Qin Jin; Walter D. Andrews; Joseph P. Campbell

Recent work in phonetic speaker recognition has shown that modeling phone sequences using n-grams is a viable and effective approach to speaker recognition, primarily aiming at capturing speaker-dependent pronunciation and also word usage. The paper describes a method involving binary-tree-structured statistical models for extending the phonetic context beyond that of standard n-grams (particularly bigrams) by exploiting statistical dependencies within a longer sequence window without exponentially increasing the model complexity, as is the case with n-grams. Two ways of dealing with data sparsity are also studied; namely, model adaptation and a recursive bottom-up smoothing of symbol distributions. Results obtained under a variety of experimental conditions using the NIST 2001 Speaker Recognition Extended Data Task indicate consistent improvements in equal-error rate performance as compared to standard bigram models. The described approach confirms the relevance of long phonetic context in phonetic speaker recognition and represents an intermediate stage between short phone context and word-level modeling without the need for any lexical knowledge, which suggests its language independence.

international conference on acoustics, speech, and signal processing | 2003

Combining cross-stream and time dimensions in phonetic speaker recognition

Qin Jin; Jiri Navratil; Douglas A. Reynolds; Joseph P. Campbell; Walter D. Andrews; Joy S. Abramson

Recent studies show that phonetic sequences from multiple languages can provide effective features for speaker recognition. So far, only pronunciation dynamics in the time dimension, i.e., n-gram modeling on each of the phone sequences, have been examined. In the JHU 2002 Summer Workshop, we explored modeling the statistical pronunciation dynamics across streams in multiple languages (cross-stream dimension) as an additional component to the time dimension. We found that bigram modeling in the cross-stream dimension achieves improved performance over that in the time dimension on the NIST 2001 Speaker Recognition Evaluation Extended Data Task. Moreover, a linear combination of information from both dimensions at the score level further improves the performance, showing that the two dimensions contain complementary information.

asilomar conference on signals, systems and computers | 2001

Phonetic speaker recognition

Mary A. Kohler; Walter D. Andrews; Joseph P. Campbell; J. Hernández-Cordero

This paper introduces a novel language-independent speaker-recognition system based on differences in dynamic realization of phonetic features (i.e., pronunciation) between speakers rather than spectral differences in voice quality. The system exploits phonetic information from six languages to perform text independent speaker recognition. All experiments were performed on the NIST 2001 Speaker Recognition Evaluation Extended Data Task. Recognition results are provided for unigram, bigram, and trigram models. Performance for each of the three models is examined for phones from each individual language and the final multilanguage fused system. Additional fusion experiments demonstrate that speaker recognition capability is maintained even without phonetic information in the language of the speaker.

conference of the international speech communication association | 2001

Phonetic speaker recognition.

Walter D. Andrews; Mary A. Kohler; Joseph P. Campbell

Odyssey | 2001

Phonetic, idiolectal and acoustic speaker recognition.

Walter D. Andrews; Mary A. Kohler; Joseph P. Campbell; John J. Godfrey

Archive | 2002

Method of and device for phone-based speaker recognition

Mary A. Kohler; Walter D. Andrews; Joseph P. Campbell

language resources and evaluation | 2006

The Mixer and Transcript Reading Corpora: Resources for Multilingual, Crosschannel Speaker Recognition Research

Christopher Cieri; Walter D. Andrews; Joseph P. Campbell; George R. Doddington; John J. Godfrey; Shudong Huang; Mark Liberman; Alvin F. Martin; Hirotaka Nakasone; Mark A. Przybocki; Kevin Walker

Archive | 2003

Exploiting high-level information for high-accuracy speaker recognition

Douglas A. Reynolds; Walter D. Andrews; Joseph P. Campbell; Jiri Navratil; Barbara Peskin; André Gustavo Adami; Qin Jin; David Klusacek; Joy S. Abramson; Radu Mihaescu; John J. Godfrey; D. Brian Jones; Bing Xiang
