Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Wade Shen is active.

Publication


Featured research published by Wade Shen.


IEEE Automatic Speech Recognition and Understanding Workshop | 2009

Query-by-example spoken term detection using phonetic posteriorgram templates

Timothy J. Hazen; Wade Shen; Christopher M. White

This paper examines a query-by-example approach to spoken term detection in audio files. The approach is designed for low-resource situations in which limited or no in-domain training material is available and accurate word-based speech recognition capability is unavailable. Instead of using word or phone strings as search terms, the user presents the system with audio snippets of desired search terms to act as the queries. Query and test materials are represented using phonetic posteriorgrams obtained from a phonetic recognition system. Query matches in the test data are located using a modified dynamic time warping search between query templates and test utterances. Experiments using this approach are presented using data from the Fisher corpus.
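The matching step described above can be sketched as follows. This is a simplified illustration, not the paper's exact configuration: it assumes dense posterior vectors per frame, a negative-log inner-product frame distance, and an unconstrained warp path.

```python
import math

def frame_distance(p, q):
    # Distance between two posterior vectors: -log of their inner
    # product, so well-matched phonetic distributions cost near 0.
    return -math.log(max(sum(a * b for a, b in zip(p, q)), 1e-10))

def dtw_cost(query, test):
    # Classic dynamic time warping: query frames may stretch or
    # compress against test frames; returns the minimal accumulated
    # alignment cost between the two posteriorgrams.
    n, m = len(query), len(test)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # query frame repeated
                                 cost[i][j - 1],      # test frame skipped
                                 cost[i - 1][j - 1])  # one-to-one match
    return cost[n][m]
```

A low `dtw_cost` between a query template and a window of a test utterance would flag a putative hit; the paper's system additionally handles subsequence search within longer utterances.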


IEEE Signal Processing Magazine | 2009

Forensic speaker recognition

Joseph P. Campbell; Wade Shen; William M. Campbell; Reva Schwartz; Jean-François Bonastre; Driss Matrouf

Looking at the different points highlighted in this article, we affirm that forensic applications of speaker recognition should still be approached with necessary caution. Disseminating this message remains one of the most important responsibilities of speaker recognition researchers.


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Speaker Verification Using Support Vector Machines and High-Level Features

William M. Campbell; Joseph P. Campbell; Terry P. Gleason; Douglas A. Reynolds; Wade Shen

High-level characteristics such as word usage, pronunciation, phonotactics, prosody, etc., have seen a resurgence for automatic speaker recognition over the last several years. With the availability of many conversation sides per speaker in current corpora, high-level systems now have the amount of data needed to sufficiently characterize a speaker. Although a significant amount of work has been done in finding novel high-level features, less work has been done on modeling these features. We describe a method of speaker modeling based upon support vector machines. Current high-level feature extraction produces sequences or lattices of tokens for a given conversation side. These sequences can be converted to counts and then n-gram frequencies for a given conversation side. We use support vector machine modeling of these n-gram frequencies for speaker verification. We derive a new kernel based upon linearizing a log likelihood ratio scoring system. Generalizations of this method are shown to produce excellent results on a variety of high-level features. We demonstrate that our methods produce results significantly better than standard log-likelihood ratio modeling. We also demonstrate that our system can perform well in conjunction with standard cepstral speaker recognition systems.
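A minimal sketch of the token-to-feature step: relative bigram frequencies computed from one conversation side's token stream, compared with a plain linear kernel. The paper's kernel additionally applies a likelihood-ratio-derived per-term weighting, which is omitted here for brevity; the token strings below are hypothetical.

```python
from collections import Counter

def bigram_frequencies(tokens):
    # Turn a token sequence (e.g. a phone stream for one conversation
    # side) into relative bigram frequencies.
    bigrams = list(zip(tokens, tokens[1:]))
    counts = Counter(bigrams)
    total = len(bigrams)
    return {bg: c / total for bg, c in counts.items()}

def linear_kernel(f1, f2):
    # Inner product of two sparse frequency vectors; speaker models
    # are then trained with a linear SVM over these vectors.
    return sum(v * f2.get(bg, 0.0) for bg, v in f1.items())
```

Two sides that share n-gram usage patterns score a large inner product, which is the quantity the linearized log-likelihood-ratio kernel builds on.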


2006 IEEE Odyssey - The Speaker and Language Recognition Workshop | 2006

Advanced Language Recognition using Cepstra and Phonotactics: MITLL System Performance on the NIST 2005 Language Recognition Evaluation

William M. Campbell; Terry P. Gleason; Jiri Navratil; Douglas A. Reynolds; Wade Shen; Elliot Singer; Pedro A. Torres-Carrasquillo

This paper presents a description of the MIT Lincoln Laboratory submissions to the 2005 NIST Language Recognition Evaluation (LRE05). As was true in 2003, the 2005 submissions were combinations of core cepstral and phonotactic recognizers whose outputs were fused to generate final scores. For the 2005 evaluation, Lincoln Laboratory had five submissions built upon fused combinations of six core systems. Major improvements included the generation of phone streams using lattices, SVM-based language models using lattice-derived phonotactics, and binary tree language models. In addition, a development corpus was assembled that was designed to test robustness to unseen languages and sources. Language recognition trends based on NIST evaluations conducted since 1996 show a steady improvement in language recognition performance.


2006 IEEE Odyssey - The Speaker and Language Recognition Workshop | 2006

Experiments with Lattice-based PPRLM Language Identification

Wade Shen; William M. Campbell; Terry P. Gleason; Douglas A. Reynolds; Elliot Singer

In this paper we describe experiments conducted during the development of a lattice-based PPRLM language identification system as part of the NIST 2005 language recognition evaluation campaign. In experiments following LRE05, the PPRLM-lattice sub-system presented here achieved a 30s/primary condition EER of 4.87%, making it the single best performing recognizer developed by the MIT-LL team. Details of implementation issues and experimental results are presented and interactions with backend score normalization are explored.
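For readers unfamiliar with the equal error rate (EER) quoted above, a threshold-sweep sketch of the metric follows; the score lists are hypothetical toy values, and real evaluations compute this over many thousands of trials.

```python
def equal_error_rate(target_scores, impostor_scores):
    # Sweep a decision threshold over every observed score and return
    # the operating point where the false-rejection rate (targets
    # scoring below threshold) and false-acceptance rate (impostors
    # scoring at or above it) are closest; EER is their midpoint.
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(target_scores + impostor_scores):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        if abs(frr - far) < best_gap:
            best_gap, best_eer = abs(frr - far), (frr + far) / 2
    return best_eer
```

A lower EER means the score distributions of true and false trials are better separated, which is why it serves as the headline number in the evaluation conditions above.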


International Conference on Acoustics, Speech, and Signal Processing | 2010

A linguistically-informative approach to dialect recognition using dialect-discriminating context-dependent phonetic models

Nancy F. Chen; Wade Shen; Joseph P. Campbell

We propose supervised and unsupervised learning algorithms to extract dialect discriminating phonetic rules and use these rules to adapt biphones to identify dialects. Despite many challenges (e.g., sub-dialect issues and no word transcriptions), we discovered dialect discriminating biphones compatible with the linguistic literature, while outperforming a baseline monophone system by 7.5% (relative). Our proposed dialect discriminating biphone system achieves similar performance to a baseline all-biphone system despite using 25% fewer biphone models. In addition, our system complements PRLM (Phone Recognition followed by Language Modeling), verified by obtaining relative gains of 15–29% when fused with PRLM. Our work is an encouraging first step towards a linguistically-informative dialect recognition system, with potential applications in forensic phonetics, accent training, and language learning.


International Conference on Acoustics, Speech, and Signal Processing | 2007

The MIT-LL/IBM 2006 Speaker Recognition System: High-Performance Reduced-Complexity Recognition

William M. Campbell; Douglas E. Sturim; Wade Shen; Douglas A. Reynolds; Jiri Navratil

Many powerful methods for speaker recognition have been introduced in recent years - high-level features, novel classifiers, and channel compensation methods. A common arena for evaluating these methods has been the NIST speaker recognition evaluation (SRE). In the NIST SRE from 2002-2005, a popular approach was to fuse multiple systems based upon cepstral features and different linguistic tiers of high-level features. With enough enrollment data, this approach produced dramatic error rate reductions and showed conceptually that better performance was attainable. A drawback in this approach is that many high-level systems were being run independently requiring significant computational complexity and resources. In 2006, MIT Lincoln Laboratory focused on a new system architecture which emphasized reduced complexity. This system was a carefully selected mixture of high-level techniques, new classifier methods, and novel channel compensation techniques. This new system has excellent accuracy and has substantially reduced complexity. The performance and computational aspects of the system are detailed on a NIST 2006 SRE task.


International Conference on Acoustics, Speech, and Signal Processing | 2005

Measuring human readability of machine generated text: three case studies in speech recognition and machine translation

Douglas A. Jones; Edward Gibson; Wade Shen; Neil Granoien; Martha Herzog; Douglas A. Reynolds; Clifford J. Weinstein

We present highlights from three experiments that test the readability of current state-of-the art system output from: (1) an automated English speech-to-text (STT) system; (2) a text-based Arabic-to-English machine translation (MT) system; and (3) an audio-based Arabic-to-English MT process. We measure readability in terms of reaction time and passage comprehension in each case, applying standard psycholinguistic testing procedures and a modified version of the standard defense language proficiency test for Arabic called the DLPT*. We learned that: (1) subjects are slowed down by about 25% when reading STT system output; (2) text-based MT systems enable an English speaker to pass Arabic Level 2 on the DLPT*; and (3) audio-based MT systems do not enable English speakers to pass Arabic Level 2. We intend for these generic measures of readability to predict performance of more application-specific tasks.


Archive | 2008

Automatic Language Recognition Via Spectral and Token Based Approaches

Douglas A. Reynolds; William M. Campbell; Wade Shen; Elliot Singer

Automatic language recognition from speech consists of algorithms and techniques that model and classify the language being spoken. Current state-of-the-art language recognition systems fall into two broad categories: spectral- and token-sequence-based approaches. In this chapter, we describe algorithms for extracting features and models representing these types of language cues and systems for making recognition decisions using one or more of these language cues. A performance assessment of these systems is also provided, in terms of both accuracy and computation considerations, using the National Institute of Standards and Technology (NIST) language recognition evaluation benchmarks.


International Conference on Acoustics, Speech, and Signal Processing | 2008

Improved GMM-based language recognition using constrained MLLR transforms

Wade Shen; Douglas A. Reynolds

In this paper we describe the application of a feature-space transform based on constrained maximum likelihood linear regression for unsupervised compensation of channel and speaker variability to the language recognition problem. We show that use of such transforms can improve baseline GMM-based language recognition performance on the 2005 NIST Language Recognition Evaluation (LRE05) task by 38%. Furthermore, gains from CMLLR are additive with other modeling enhancements such as vocal tract length normalization (VTLN). Further improvement is obtained using discriminative training, and it is shown that a system using only CMLLR adaptation produces state-of-the-art accuracy at lower test-time computational cost than systems using VTLN.

Collaboration


Dive into Wade Shen's collaborations.

Top Co-Authors

Joseph P. Campbell, Massachusetts Institute of Technology
Douglas A. Reynolds, Massachusetts Institute of Technology
Douglas A. Jones, Massachusetts Institute of Technology
Nancy F. Chen, Massachusetts Institute of Technology
Reva Schwartz, United States Secret Service
William M. Campbell, Massachusetts Institute of Technology
Timothy R. Anderson, Air Force Research Laboratory
Brian Delaney, Massachusetts Institute of Technology
Elliot Singer, Massachusetts Institute of Technology
Terry P. Gleason, Massachusetts Institute of Technology