
Publications


Featured research published by Joseph P. Campbell.


Computer Speech & Language | 2006

Support vector machines for speaker and language recognition

William M. Campbell; Joseph P. Campbell; Douglas A. Reynolds; Elliot Singer; Pedro A. Torres-Carrasquillo

Support vector machines (SVMs) have proven to be a powerful technique for pattern classification. SVMs map inputs into a high-dimensional space and then separate classes with a hyperplane. A critical aspect of using SVMs successfully is the design of the inner product, the kernel, induced by the high-dimensional mapping. We consider the application of SVMs to speaker and language recognition. A key part of our approach is the use of a kernel that compares sequences of feature vectors and produces a measure of similarity. Our sequence kernel is based upon generalized linear discriminants. We show that this strategy has several important properties. First, the kernel uses an explicit expansion into SVM feature space; this property makes it possible to collapse all support vectors into a single model vector and have low computational complexity. Second, the SVM builds upon a simpler mean-squared error classifier to produce a more accurate system. Finally, the system is competitive and complementary to other approaches, such as Gaussian mixture models (GMMs). We give results for the 2003 NIST speaker and language evaluations of the system and also show fusion with the traditional GMM approach.
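The first property the abstract names, collapsing all support vectors into a single model vector via an explicit expansion, can be illustrated numerically. The sketch below is not the paper's system: it substitutes a toy degree-2 monomial expansion for the generalized linear discriminant basis, and all data and names are invented. It shows why averaging per-frame expansions lets a linear model score a whole sequence with one dot product.

```python
import numpy as np

def expand(x):
    """Toy explicit feature expansion (bias, linear, and degree-2 monomials).
    Stands in for the paper's generalized linear discriminant basis."""
    return np.concatenate(([1.0], x, np.outer(x, x).ravel()))

def sequence_expansion(frames):
    """Average the per-frame expansions: the whole utterance becomes a
    single vector, so a linear model scores it with one dot product."""
    return np.mean([expand(f) for f in frames], axis=0)

rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 3))             # 50 feature vectors of dim 3
w = rng.normal(size=expand(frames[0]).shape)  # a "collapsed" model vector

# By linearity, scoring the averaged expansion equals averaging
# the per-frame scores, so no per-support-vector work is needed:
score_fast = sequence_expansion(frames) @ w
score_slow = np.mean([expand(f) @ w for f in frames])
assert np.isclose(score_fast, score_slow)
```

The equality holds because both the expansion average and the dot product are linear, which is exactly what makes the single-model-vector trick cheap at test time.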


Archive | 2005

The NIST speaker recognition evaluation program

Alvin F. Martin; Mark A. Przybocki; Joseph P. Campbell

The National Institute of Standards and Technology (NIST) has coordinated annual scientific evaluations of text-independent speaker recognition since 1996. These evaluations aim to provide important contributions to the direction of research efforts and the calibration of technical capabilities. They are intended to be of interest to all researchers working on the general problem of text-independent speaker recognition. To this end, the evaluations are designed to be simple, fully supported, accessible, and focused on core technology issues. The evaluations have focused primarily on speaker detection in the context of conversational telephone speech. More recent evaluations have also included related tasks, such as speaker segmentation, and have used data in addition to conversational telephone speech. The evaluations are designed to foster research progress toward several stated objectives.


International Conference on Acoustics, Speech, and Signal Processing | 2003

The SuperSID project: exploiting high-level information for high-accuracy speaker recognition

Douglas A. Reynolds; Walter D. Andrews; Joseph P. Campbell; Jiri Navratil; Barbara Peskin; André Gustavo Adami; Qin Jin; David Klusacek; Joy S. Abramson; Radu Mihaescu; John J. Godfrey; Douglas A. Jones; Bing Xiang

The area of automatic speaker recognition has been dominated by systems using only short-term, low-level acoustic information, such as cepstral features. While these systems have indeed produced very low error rates, they ignore other levels of information beyond low-level acoustics that convey speaker information. Recently published work has shown examples that such high-level information can be used successfully in automatic speaker recognition systems and has the potential to improve accuracy and add robustness. For the 2002 JHU CLSP summer workshop, the SuperSID project (http://www.clsp.jhu.edu/ws2002/groups/supersid/) was undertaken to exploit these high-level information sources and dramatically increase speaker recognition accuracy on a defined NIST evaluation corpus and task. The paper provides an overview of the structure, data, task, tools, and accomplishments of this project. Wide ranging approaches using pronunciation models, prosodic dynamics, pitch and duration features, phone streams, and conversational interactions were explored and developed. We show how these novel features and classifiers indeed provide complementary information and can be fused together to drive down the equal error rate on the 2001 NIST extended data task to 0.2% - a 71% relative reduction in error over the previous state of the art.
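The equal error rate (EER) the abstract reports is the standard operating point at which the miss rate equals the false-alarm rate. A minimal sketch of computing EER by sweeping a threshold over target and impostor scores follows; the score distributions are synthetic and purely illustrative, not the paper's data.

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Sweep the decision threshold over all observed scores and return
    the error rate at the point where miss and false-alarm rates
    are closest to equal."""
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    best_diff, best_eer = np.inf, 1.0
    for t in thresholds:
        miss = np.mean(target_scores < t)    # true speakers rejected
        fa = np.mean(impostor_scores >= t)   # impostors accepted
        if abs(miss - fa) < best_diff:
            best_diff, best_eer = abs(miss - fa), (miss + fa) / 2
    return best_eer

rng = np.random.default_rng(1)
tgt = rng.normal(2.0, 1.0, 1000)  # synthetic target-trial scores
imp = rng.normal(0.0, 1.0, 1000)  # synthetic impostor-trial scores
eer = equal_error_rate(tgt, imp)  # roughly 0.16 for this separation
```

Real evaluations use many trials and calibrated scores; this sketch only shows the definition of the metric being driven down to 0.2% in the paper.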


International Conference on Acoustics, Speech, and Signal Processing | 1995

Testing with the YOHO CD-ROM voice verification corpus

Joseph P. Campbell

A standard database for testing voice verification systems, called YOHO, is now available from the Linguistic Data Consortium (LDC). The purpose of this database is to enable research, spark competition, and provide a means for comparative performance assessments between various voice verification systems. A test plan is presented for the suggested use of the LDC's YOHO CD-ROM for testing voice verification systems. This plan is based upon ITT's voice verification test methodology as described by Higgins et al. (1992), but differs slightly in order to match the LDC's CD-ROM version of YOHO and to accommodate different systems. Test results of several algorithms using YOHO are also presented.


International Conference on Acoustics, Speech, and Signal Processing | 1999

Corpora for the evaluation of speaker recognition systems

Joseph P. Campbell; Douglas A. Reynolds

Using standard speech corpora for development and evaluation has proven to be very valuable in promoting progress in speech and speaker recognition research. In this paper, we present an overview of current publicly available corpora intended for speaker recognition research and evaluation. We outline the corpora's salient features with respect to their suitability for conducting speaker recognition experiments and evaluations. We hope to increase the awareness and use of these standard corpora and corresponding evaluation procedures throughout the speaker recognition community.


International Conference on Acoustics, Speech, and Signal Processing | 2001

Speaker indexing in large audio databases using anchor models

Douglas E. Sturim; Douglas A. Reynolds; Elliot Singer; Joseph P. Campbell

This paper introduces the technique of anchor modeling in the applications of speaker detection and speaker indexing. The anchor modeling algorithm is refined by pruning the number of models needed. The system is applied to the speaker detection problem, where its performance is shown to fall short of the state-of-the-art Gaussian mixture model with universal background model (GMM-UBM) system. However, it is further shown that its computational efficiency lends itself to speaker indexing for searching large audio databases for desired speakers, where excessive computation may prohibit the use of the GMM-UBM recognition system. Finally, the paper presents a method for cascading anchor model and GMM-UBM detectors for speaker indexing. This approach benefits from the efficiency of anchor modeling and the high accuracy of GMM-UBM recognition.
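The indexing idea can be sketched as follows: each speaker is summarized by a vector of scores against a fixed set of anchor models, and database search reduces to a cheap similarity comparison in that anchor space. The toy illustration below uses invented three-anchor score vectors and cosine similarity; the actual scoring and distance measures in the paper may differ.

```python
import numpy as np

def anchor_vector(scores):
    """Length-normalize a vector of scores against N anchor models so
    speakers can be compared by cosine similarity in anchor space."""
    v = np.asarray(scores, dtype=float)
    return v / np.linalg.norm(v)

def search(query_scores, database):
    """Rank database speakers by cosine similarity to the query's
    anchor-space representation (higher = more similar)."""
    q = anchor_vector(query_scores)
    sims = {name: float(q @ anchor_vector(s)) for name, s in database.items()}
    return sorted(sims, key=sims.get, reverse=True)

# Hypothetical anchor-score vectors, one score per anchor model:
db = {
    "spk_a": [0.9, 0.1, 0.2],
    "spk_b": [0.1, 0.8, 0.3],
    "spk_c": [0.2, 0.2, 0.9],
}
ranking = search([0.85, 0.15, 0.25], db)  # query resembles spk_a's profile
```

The efficiency argument is that anchor scores are computed once per utterance at indexing time, so a query touches only small fixed-length vectors rather than running a full GMM-UBM recognizer over the database.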


IEEE Signal Processing Magazine | 2009

Forensic speaker recognition

Joseph P. Campbell; Wade Shen; William M. Campbell; Reva Schwartz; Jean-François Bonastre; Driss Matrouf

Looking at the different points highlighted in this article, we affirm that forensic applications of speaker recognition should still be approached with the necessary caution. Disseminating this message remains one of the most important responsibilities of speaker recognition researchers.


International Conference on Acoustics, Speech, and Signal Processing | 2002

Gender-dependent phonetic refraction for speaker recognition

Walter D. Andrews; Mary A. Kohler; Joseph P. Campbell; John J. Godfrey; Jaime Hernandez-Cordero

This paper describes improvements to an innovative high-performance speaker recognition system. Recent experiments showed that, with sufficient training data, phone strings from multiple languages are exceptional features for speaker recognition. The prototype phonetic speaker recognition system used phone sequences from six languages to produce an equal error rate of 11.5% on Switchboard-I audio files. The improved system described in this paper reduces the equal error rate to less than 4%. This is accomplished by incorporating gender-dependent phone models, pre-processing the speech files to remove cross-talk, and developing more sophisticated fusion techniques for the multi-language likelihood scores.
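Score-level fusion of the six per-language likelihood scores can be sketched as a weighted linear combination. The paper's fusion techniques are described as more sophisticated than this, so treat the equal-weight average below, with invented scores, as a baseline illustration only.

```python
import numpy as np

def fuse(scores_per_language, weights=None):
    """Weighted linear fusion of per-language likelihood scores for one
    trial. Equal weights reproduce plain score averaging; trained
    weights would emphasize the more reliable language systems."""
    s = np.asarray(scores_per_language, dtype=float)
    if weights is None:
        weights = np.full(len(s), 1.0 / len(s))
    return float(np.dot(weights, s))

# Scores for one trial from six hypothetical language-dependent systems:
trial = [1.2, 0.8, 1.5, 0.9, 1.1, 1.0]
fused = fuse(trial)  # equal-weight average of the six scores
```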


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Speaker Verification Using Support Vector Machines and High-Level Features

William M. Campbell; Joseph P. Campbell; Terry P. Gleason; Douglas A. Reynolds; Wade Shen

High-level characteristics such as word usage, pronunciation, phonotactics, prosody, etc., have seen a resurgence for automatic speaker recognition over the last several years. With the availability of many conversation sides per speaker in current corpora, high-level systems now have the amount of data needed to sufficiently characterize a speaker. Although a significant amount of work has been done in finding novel high-level features, less work has been done on modeling these features. We describe a method of speaker modeling based upon support vector machines. Current high-level feature extraction produces sequences or lattices of tokens for a given conversation side. These sequences can be converted to counts and then to n-gram frequencies for a given conversation side. We use support vector machine modeling of these n-gram frequencies for speaker verification. We derive a new kernel based upon linearizing a log likelihood ratio scoring system. Generalizations of this method are shown to produce excellent results on a variety of high-level features. We demonstrate that our methods produce results significantly better than standard log-likelihood ratio modeling. We also demonstrate that our system can perform well in conjunction with standard cepstral speaker recognition systems.
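The conversion the abstract describes, from a token sequence to n-gram frequencies that a kernel can consume, can be sketched directly. The phone labels below are invented for illustration and are not from the paper's corpora.

```python
from collections import Counter

def ngram_frequencies(tokens, n=2):
    """Convert a token sequence (e.g. a phone stream from one
    conversation side) into relative n-gram frequencies."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

phones = ["ah", "b", "ah", "b", "ah", "t"]
freqs = ngram_frequencies(phones, n=2)
# The bigram ("ah", "b") occurs 2 of 5 times -> frequency 0.4
```

In the SVM system, each conversation side becomes one such frequency vector, and a kernel over these vectors (the paper derives one by linearizing a log likelihood ratio) does the speaker comparison.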


International Conference on Acoustics, Speech, and Signal Processing | 2003

Phonetic speaker recognition using maximum-likelihood binary-decision tree models

Jiri Navratil; Qin Jin; Walter D. Andrews; Joseph P. Campbell

Recent work in phonetic speaker recognition has shown that modeling phone sequences using n-grams is a viable and effective approach to speaker recognition, primarily aiming at capturing speaker-dependent pronunciation and also word usage. The paper describes a method involving binary-tree-structured statistical models for extending the phonetic context beyond that of standard n-grams (particularly bigrams) by exploiting statistical dependencies within a longer sequence window without exponentially increasing the model complexity, as is the case with n-grams. Two ways of dealing with data sparsity are also studied; namely, model adaptation and a recursive bottom-up smoothing of symbol distributions. Results obtained under a variety of experimental conditions using the NIST 2001 Speaker Recognition Extended Data Task indicate consistent improvements in equal-error rate performance as compared to standard bigram models. The described approach confirms the relevance of long phonetic context in phonetic speaker recognition and represents an intermediate stage between short phone context and word-level modeling without the need for any lexical knowledge, which suggests its language independence.
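The standard bigram baseline that the tree-structured models are compared against can be sketched with add-one smoothing, one simple answer to the data-sparsity problem the abstract mentions (the paper studies adaptation and recursive bottom-up smoothing instead). The vocabulary and phone sequences below are invented toy data.

```python
import math
from collections import Counter

def train_bigram(tokens, vocab):
    """Speaker-dependent bigram phone model with add-one smoothing:
    p(b | a) = (count(a, b) + 1) / (count(a) + |V|)."""
    counts = Counter(zip(tokens, tokens[1:]))
    context = Counter(tokens[:-1])
    V = len(vocab)
    return lambda a, b: (counts[(a, b)] + 1) / (context[a] + V)

def log_likelihood(model, tokens):
    """Average log-probability of a phone sequence under the model."""
    lp = [math.log(model(a, b)) for a, b in zip(tokens, tokens[1:])]
    return sum(lp) / len(lp)

vocab = ["a", "b", "t"]
spk_train = ["a", "b", "a", "b", "a", "t", "a", "b"]
model = train_bigram(spk_train, vocab)

# Sequences matching the speaker's phone habits score higher:
own = log_likelihood(model, ["a", "b", "a", "b"])
other = log_likelihood(model, ["t", "t", "t", "t"])
```

The binary-tree models in the paper extend the context window beyond the single preceding phone used here, without the exponential parameter growth a higher-order n-gram would incur.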

Collaboration


Top co-authors of Joseph P. Campbell:

- Douglas A. Reynolds, Massachusetts Institute of Technology
- William M. Campbell, Massachusetts Institute of Technology
- Wade Shen, Massachusetts Institute of Technology
- Walter D. Andrews, Massachusetts Institute of Technology
- Reva Schwartz, United States Secret Service
- Nancy F. Chen, Massachusetts Institute of Technology
- Thomas F. Quatieri, Massachusetts Institute of Technology
- Robert B. Dunn, Massachusetts Institute of Technology
- Alvin F. Martin, National Institute of Standards and Technology
- Christopher Cieri, University of Pennsylvania