
Publications


Featured research published by William M. Campbell.


IEEE Signal Processing Letters | 2006

Support vector machines using GMM supervectors for speaker verification

William M. Campbell; Douglas E. Sturim; Douglas A. Reynolds

Gaussian mixture models (GMMs) have proven extremely successful for text-independent speaker recognition. The standard training method for GMM models is to use MAP adaptation of the means of the mixture components based on speech from a target speaker. Recent methods in compensation for speaker and channel variability have proposed the idea of stacking the means of the GMM model to form a GMM mean supervector. We examine the idea of using the GMM supervector in a support vector machine (SVM) classifier. We propose two new SVM kernels based on distance metrics between GMM models. We show that these SVM kernels produce excellent classification accuracy in a NIST speaker recognition evaluation task.
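The supervector construction described in this abstract is simple enough to sketch. The following is a minimal NumPy illustration, not the authors' implementation; the per-component scaling shown is one common way to make a plain dot product between two supervectors act as a KL-divergence-motivated kernel, assuming diagonal covariances.

```python
import numpy as np

def gmm_supervector(means, weights, variances):
    """Stack the C adapted component means (a C x D array) into one
    C*D-dimensional vector, scaled per component so the dot product
    of two supervectors serves as the SVM kernel."""
    scaled = np.sqrt(weights)[:, None] * means / np.sqrt(variances)
    return scaled.reshape(-1)

# Toy example: 3 mixture components in 2 dimensions.
means = np.array([[0.1, 0.2], [0.0, -0.1], [0.3, 0.0]])
weights = np.array([0.5, 0.3, 0.2])
variances = np.ones((3, 2))

sv = gmm_supervector(means, weights, variances)
# Each utterance becomes one C*D-dimensional point; a linear SVM
# kernel then compares utterances with a single inner product.
```

With this scaling folded into the vector itself, off-the-shelf linear SVM tooling can be used unchanged on the supervectors.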


International Conference on Acoustics, Speech, and Signal Processing | 2006

SVM Based Speaker Verification using a GMM Supervector Kernel and NAP Variability Compensation

William M. Campbell; Douglas E. Sturim; Douglas A. Reynolds; Alex Solomonoff

Gaussian mixture models (GMMs) with a universal background model (UBM) have become the standard method for speaker recognition. Typically, a speaker model is constructed by MAP adaptation of the means of the UBM. A GMM supervector is constructed by stacking the means of the adapted mixture components. A recent discovery is that latent factor analysis of this GMM supervector is an effective method for variability compensation. We consider this GMM supervector in the context of support vector machines. We construct a support vector machine kernel using the GMM supervector. We show similarities based on this kernel between the method of SVM nuisance attribute projection (NAP) and the recent results in latent factor analysis. Experiments on a NIST SRE 2005 corpus demonstrate the effectiveness of the new technique.
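Nuisance attribute projection, mentioned in this abstract, removes a low-rank subspace of unwanted (e.g. channel) variability from each supervector before SVM training. A minimal sketch, assuming the nuisance basis U has already been estimated elsewhere and has orthonormal columns:

```python
import numpy as np

def nap_project(x, U):
    """Apply x <- (I - U U^T) x, removing the component of x that
    lies in the nuisance subspace span(U)."""
    return x - U @ (U.T @ x)

# Toy check: with a 1-D nuisance direction along the first axis,
# the projection zeroes that coordinate and leaves the rest intact.
U = np.array([[1.0], [0.0], [0.0]])   # orthonormal column
x = np.array([2.0, 3.0, 4.0])
x_clean = nap_project(x, U)
```

In practice the projection is folded into the SVM kernel, so training and scoring both operate on the compensated vectors.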


International Conference on Acoustics, Speech, and Signal Processing | 2002

Generalized linear discriminant sequence kernels for speaker recognition

William M. Campbell

Support Vector Machines have recently shown dramatic performance gains in many application areas. We show that the same gains can be realized in the area of speaker recognition via sequence kernels. A sequence kernel provides a numerical comparison of speech utterances as entire sequences rather than a probability at the frame level. We introduce a novel sequence kernel derived from generalized linear discriminants. The kernel has several advantages. First, the kernel uses an explicit expansion into “feature space”; this property allows all of the support vectors to be collapsed into a single vector, creating a small speaker model. Second, the kernel retains the computational advantage of generalized linear discriminants trained using mean-squared error training. Finally, the kernel shows dramatic reductions in equal error rates over standard mean-squared error training in matched and mismatched conditions on a NIST speaker recognition task.
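The "collapsed model" property from this abstract can be illustrated concretely. In the sketch below (an illustration with a toy expansion, not the paper's exact generalized linear discriminant terms), each frame passes through an explicit feature-space expansion, the expansions are averaged over the utterance, and two utterances are compared with a single inner product, so a trained model reduces to one fixed-size vector:

```python
import numpy as np

def expand(frame):
    """Toy explicit feature-space expansion of one frame:
    a constant term, the raw features, and their squares."""
    return np.concatenate(([1.0], frame, frame ** 2))

def utterance_vector(frames):
    """Average the per-frame expansions, collapsing the whole
    variable-length utterance into a single vector."""
    return np.mean([expand(f) for f in frames], axis=0)

def sequence_kernel(frames_a, frames_b):
    # Comparing two utterances is then just one dot product.
    return float(utterance_vector(frames_a) @ utterance_vector(frames_b))

frames_a = np.array([[1.0, 2.0], [3.0, 4.0]])
v = utterance_vector(frames_a)
```

Because the expansion is explicit, all support vectors can likewise be summed into one weight vector, which is what keeps the speaker model small.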


IEEE Transactions on Speech and Audio Processing | 2002

Speaker recognition with polynomial classifiers

William M. Campbell; Khaled T. Assaleh; Charles C. Broun

Modern speaker recognition applications require high accuracy at low complexity. We propose the use of a polynomial-based classifier to achieve these objectives. This approach has several advantages. First, polynomial classifier scoring yields a system which is highly computationally scalable with the number of speakers. Second, a new training algorithm is proposed which is discriminative, handles large data sets, and has low memory usage. Third, the output of the polynomial classifier is easily incorporated into a statistical framework allowing it to be combined with other techniques such as hidden Markov models. Results are given for the application of the new methods to the YOHO speaker recognition database.
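The scoring step this abstract describes can be sketched briefly (an illustration, not the paper's discriminative training algorithm): each frame is expanded into monomial terms up to a chosen degree, and a speaker's score is the average expansion dotted with that speaker's weight vector, which is why scoring cost grows only linearly with the number of speakers.

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_expand(x, degree=2):
    """All monomials of the entries of x up to the given degree,
    including the constant term."""
    terms = [1.0]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            terms.append(float(np.prod([x[i] for i in idx])))
    return np.array(terms)

def speaker_score(weights, frames, degree=2):
    """One dot product per speaker model: average the per-frame
    expansions, then score against the speaker's weight vector."""
    mean_expansion = np.mean([poly_expand(f, degree) for f in frames], axis=0)
    return float(mean_expansion @ weights)

# Toy usage with a hypothetical trained weight vector.
e = poly_expand(np.array([1.0, 2.0]))
frames = np.array([[1.0, 2.0]])
w = np.eye(len(e))[0]          # stand-in for learned weights
s = speaker_score(w, frames)
```

The mean expansion is computed once per utterance, so adding another enrolled speaker costs only one more dot product.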


International Conference on Acoustics, Speech, and Signal Processing | 2008

A covariance kernel for SVM language recognition

William M. Campbell

Discriminative training for language recognition has been a key tool for improving system performance. In addition, recognition directly from shifted-delta cepstral features has proven effective. A successful example of this paradigm is SVM-based discrimination of languages based on GMM mean supervectors (GSVs). GSVs are created through MAP adaptation of a universal background model (UBM) GMM. This work proposes a novel extension to this idea by extending the supervector framework to the covariances of the UBM. We demonstrate a new SVM kernel including this covariance structure. In addition, we propose a method for pushing SVM model parameters back to GMM models. These GMM models can be used as an alternate form of scoring. The new approach is demonstrated on a fourteen language task with substantial performance improvements over prior techniques.


International Conference on Acoustics, Speech, and Signal Processing | 2008

Language recognition with discriminative keyword selection

Fred Richardson; William M. Campbell

One commonly used approach for language recognition is to convert the input speech into a sequence of tokens such as words or phones and then to use these token sequences to determine the target language. The language classification is typically performed by extracting N-gram statistics from the token sequences and then using an N-gram language model or support vector machine (SVM) to perform the classification. One problem with these approaches is that the number of N-grams grows exponentially as the order N is increased. This is especially problematic for an SVM classifier as each utterance is represented as a distinct N-gram vector. In this paper we propose a novel approach for modeling higher order N-grams using an SVM via an alternating filter-wrapper feature selection method. We demonstrate the effectiveness of this technique on the NIST 2007 language recognition task.
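The N-gram statistics underlying each utterance vector are easy to sketch (illustrative only; the alternating filter-wrapper selection method is the paper's contribution and is not reproduced here). The number of possible entries grows as |vocabulary|^N, which is exactly the blow-up that motivates feature selection:

```python
from collections import Counter

def ngram_frequencies(tokens, n):
    """Relative frequency of each token N-gram in one utterance;
    the resulting sparse vector is what the SVM consumes."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    total = max(len(grams), 1)
    return {g: c / total for g, c in Counter(grams).items()}

# Toy phone sequence and its bigram statistics.
phones = ["ah", "b", "ah", "b", "ah"]
bigrams = ngram_frequencies(phones, 2)
```

Only N-grams actually observed get entries, so each utterance vector is sparse even though the ambient dimensionality is exponential in N.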


International Conference on Acoustics, Speech, and Signal Processing | 2007

Language Recognition with Word Lattices and Support Vector Machines

William M. Campbell; Fred Richardson; Douglas A. Reynolds

Language recognition is typically performed with methods that exploit phonotactics: a phone recognition language modeling (PRLM) system. A PRLM system converts speech to a lattice of phones and then scores a language model. A standard extension to this scheme is to use multiple parallel phone recognizers (PPRLM). In this paper, we modify this approach in two distinct ways. First, we replace the phone tokenizer by a powerful speech-to-text system. Second, we use a discriminative support vector machine for language modeling. Our goals are twofold. First, we explore the ability of a single speech-to-text system to distinguish multiple languages. Second, we fuse the new system with an SVM PRLM system to see if it complements current approaches. Experiments on the 2005 NIST language recognition corpus show the new word system accomplishes these goals and has significant potential for language recognition.


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Exploiting Nonacoustic Sensors for Speech Encoding

Thomas F. Quatieri; Kevin Brady; Dave Messing; Joseph P. Campbell; William M. Campbell; Michael S. Brandstein; Clifford J. Weinstein; John D. Tardelli; Paul D. Gatewood

The intelligibility of speech transmitted through low-rate coders is severely degraded when high levels of acoustic noise are present in the environment. Recent advances in nonacoustic sensors, including microwave radar, skin vibration, and bone conduction sensors, provide the exciting possibility of both glottal excitation and, more generally, vocal tract measurements that are relatively immune to acoustic disturbances and can supplement the acoustic speech waveform. We are currently investigating methods of combining the output of these sensors for use in low-rate encoding according to their capability in representing specific speech characteristics in different frequency bands. Nonacoustic sensors have the ability to reveal certain speech attributes lost in the noisy acoustic signal; for example, low-energy consonant voice bars, nasality, and glottalized excitation. By fusing nonacoustic low-frequency and pitch content with acoustic-microphone content, we have achieved significant intelligibility performance gains using the Diagnostic Rhyme Test (DRT) across a variety of environments over the government standard 2400-bps MELPe coder. By fusing quantized high-band 4-to-8-kHz speech, requiring only an additional 116 bps, we obtain further DRT performance gains by exploiting the ear's insensitivity to fine spectral detail in this frequency region.


International Conference on Acoustics, Speech, and Signal Processing | 2005

The 2004 MIT Lincoln Laboratory speaker recognition system

Douglas A. Reynolds; William M. Campbell; Terry T. Gleason; Carl Quillen; Douglas E. Sturim; Pedro A. Torres-Carrasquillo; André Gustavo Adami

The MIT Lincoln Laboratory submission for the 2004 NIST speaker recognition evaluation (SRE) was built upon seven core systems using speaker information from short-term acoustics, pitch and duration prosodic behavior, and phoneme and word usage. These different levels of information were modeled and classified using Gaussian mixture models, support vector machines and N-gram language models and were combined using a single layer perceptron fuser. The 2004 SRE used a new multi-lingual, multi-channel speech corpus that provided a challenging speaker detection task for the above systems. We describe the core systems used and provide an overview of their performance on the 2004 SRE detection tasks.


2006 IEEE Odyssey - The Speaker and Language Recognition Workshop | 2006

Advanced Language Recognition using Cepstra and Phonotactics: MITLL System Performance on the NIST 2005 Language Recognition Evaluation

William M. Campbell; Terry P. Gleason; Jiri Navratil; Douglas A. Reynolds; Wade Shen; Elliot Singer; Pedro A. Torres-Carrasquillo

This paper presents a description of the MIT Lincoln Laboratory submissions to the 2005 NIST Language Recognition Evaluation (LRE05). As was true in 2003, the 2005 submissions were combinations of core cepstral and phonotactic recognizers whose outputs were fused to generate final scores. For the 2005 evaluation, Lincoln Laboratory had five submissions built upon fused combinations of six core systems. Major improvements included the generation of phone streams using lattices, SVM-based language models using lattice-derived phonotactics, and binary tree language models. In addition, a development corpus was assembled that was designed to test robustness to unseen languages and sources. Language recognition trends based on NIST evaluations conducted since 1996 show a steady improvement in language recognition performance.

Collaboration


William M. Campbell's most frequent collaborators.

Top Co-Authors

Douglas A. Reynolds, Massachusetts Institute of Technology
Douglas E. Sturim, Massachusetts Institute of Technology
Joseph P. Campbell, Massachusetts Institute of Technology
Zahi N. Karam, Massachusetts Institute of Technology
Fred Richardson, Massachusetts Institute of Technology
Clifford J. Weinstein, Massachusetts Institute of Technology
Pedro A. Torres-Carrasquillo, Massachusetts Institute of Technology
Elliot Singer, Massachusetts Institute of Technology