Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Elliot Singer is active.

Publications


Featured research published by Elliot Singer.


Computer Speech & Language | 2006

Support vector machines for speaker and language recognition

William M. Campbell; Joseph P. Campbell; Douglas A. Reynolds; Elliot Singer; Pedro A. Torres-Carrasquillo

Support vector machines (SVMs) have proven to be a powerful technique for pattern classification. SVMs map inputs into a high-dimensional space and then separate classes with a hyperplane. A critical aspect of using SVMs successfully is the design of the inner product, the kernel, induced by the high-dimensional mapping. We consider the application of SVMs to speaker and language recognition. A key part of our approach is the use of a kernel that compares sequences of feature vectors and produces a measure of similarity. Our sequence kernel is based upon generalized linear discriminants. We show that this strategy has several important properties. First, the kernel uses an explicit expansion into SVM feature space—this property makes it possible to collapse all support vectors into a single model vector and have low computational complexity. Second, the SVM builds upon a simpler mean-squared error classifier to produce a more accurate system. Finally, the system is competitive and complementary to other approaches, such as Gaussian mixture models (GMMs). We give results for the 2003 NIST speaker and language evaluations of the system and also show fusion with the traditional GMM approach.
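The collapsed-model property described in the abstract can be sketched briefly: because the kernel is an explicit expansion into feature space, all support vectors can be summed into one model vector, so scoring an utterance costs a single dot product. The second-order expansion, weights, and dimensions below are illustrative placeholders, not the paper's full generalized linear discriminant sequence kernel.

```python
import numpy as np

def expand(frame):
    """Hypothetical 2nd-order monomial expansion of a single feature frame."""
    x = np.asarray(frame, dtype=float)
    outer = np.outer(x, x)[np.triu_indices(len(x))]
    return np.concatenate(([1.0], x, outer))

def utterance_vector(frames):
    """Average the per-frame expansions; the sequence kernel between two
    utterances is then just a dot product of these averaged vectors."""
    return np.mean([expand(f) for f in frames], axis=0)

def collapse_model(support_vectors, alphas, bias):
    """Collapse all support vectors into a single model vector w, so scoring
    an utterance needs one dot product instead of one kernel per SV."""
    w = np.sum([a * sv for a, sv in zip(alphas, support_vectors)], axis=0)
    return w, bias

def score(frames, w, bias):
    """Score an utterance against the collapsed model."""
    return float(utterance_vector(frames) @ w + bias)
```

Because the kernel is linear in the expanded space, the collapsed score is exactly equal to the usual sum of kernel evaluations against each support vector.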


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1986

A new application of adaptive noise cancellation

William Harrison; Jae S. Lim; Elliot Singer

A new application of Widrow's adaptive noise cancellation (ANC) is presented in this paper. Specifically, the method is applied to the case where an acoustic barrier exists between the primary and reference microphones. By updating the coefficients of the noise estimation filter only during silence, it is shown that ANC can provide substantial noise reduction with little speech distortion even when the acoustic barrier provides only moderate attenuation of acoustic signals. The use of the modified ANC method is evaluated using an oxygen facemask worn by fighter aircraft pilots. Experiments demonstrate that if a noise field is created using a single source, 11 dB signal-to-noise ratio improvements can be achieved by attaching a reference microphone to the exterior of the facemask. The length of the ANC filter required for this particular environment is only 50 points.
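The silence-gated adaptation idea can be sketched with a standard LMS update; the tap count, step size, and voice-activity flag below are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def anc_silence_gated(primary, reference, speech_present, n_taps=50, mu=0.01):
    """LMS adaptive noise canceller whose filter coefficients are updated
    only while speech is absent, as in the modified ANC method above.
    primary: noisy speech mic; reference: noise-field mic;
    speech_present: per-sample boolean voice-activity flags."""
    w = np.zeros(n_taps)
    out = np.zeros(len(primary))
    for i in range(len(primary)):
        # Most-recent-first window of the reference signal.
        x = reference[max(0, i - n_taps + 1): i + 1][::-1]
        x = np.pad(x, (0, n_taps - len(x)))
        out[i] = primary[i] - w @ x        # subtract the noise estimate
        if not speech_present[i]:          # adapt only during silence
            w += 2 * mu * out[i] * x
    return out, w
```

During speech the filter is frozen, so the canceller keeps subtracting its last noise estimate without being driven to cancel the speech itself.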


international conference on acoustics, speech, and signal processing | 1994

Automatic language identification of telephone speech messages using phoneme recognition and N-gram modeling

Marc A. Zissman; Elliot Singer

The paper compares the performance of four approaches to automatic language identification (LID) of telephone speech messages: Gaussian mixture model classification (GMM), language-independent phoneme recognition followed by language-dependent language modeling (PRLM), parallel PRLM (PRLM-P), and language-dependent parallel phoneme recognition (PPR). These approaches span a wide range of training requirements and levels of recognition complexity. All approaches were tested on the development test subset of the OGI multi-language telephone speech corpus. Generally, system performance was directly related to system complexity, with PRLM-P and PPR performing best. On 45 second test utterances, average two language, closed-set, forced-choice classification performance reached 94.5% correct. The best 10 language, closed-set, forced-choice performance was 79.2% correct.
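After the phone recognizer, the PRLM back end reduces to scoring the decoded phone sequence against one N-gram model per language. A toy bigram version might look like the following; the add-one smoothing is an illustrative assumption, not the paper's exact smoothing scheme.

```python
import math
from collections import Counter

def train_bigram_lm(phone_sequences, vocab):
    """Train a bigram model over phone tokens with add-one smoothing; in PRLM
    one such model is built per target language from the output of a single
    language-independent phone recognizer."""
    bigrams, unigrams = Counter(), Counter()
    for seq in phone_sequences:
        for a, b in zip(seq, seq[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    vocab_size = len(vocab)

    def log_prob(seq):
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
            for a, b in zip(seq, seq[1:]))
    return log_prob

def identify(phone_seq, language_models):
    """Pick the language whose phone LM scores the sequence highest."""
    return max(language_models, key=lambda lang: language_models[lang](phone_seq))
```

Parallel PRLM simply runs several such phone-recognizer/LM front ends and fuses their scores.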


international conference on acoustics, speech, and signal processing | 2001

Speaker indexing in large audio databases using anchor models

Douglas E. Sturim; Douglas A. Reynolds; Elliot Singer; Joseph P. Campbell

Introduces the technique of anchor modeling in the applications of speaker detection and speaker indexing. The anchor modeling algorithm is refined by pruning the number of models needed. The system is applied to the speaker detection problem where its performance is shown to fall short of the state-of-the-art Gaussian mixture model with universal background model (GMM-UBM) system. However, it is further shown that its computational efficiency lends itself to speaker indexing for searching large audio databases for desired speakers. Here, excessive computation may prohibit the use of the GMM-UBM recognition system. Finally, the paper presents a method for cascading anchor model and GMM-UBM detectors for speaker indexing. This approach benefits from the efficiency of anchor modeling and high accuracy of GMM-UBM recognition.
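The indexing step can be sketched as follows: each utterance is represented by its vector of scores against the anchor models, and cheap vector comparisons rank the database before any expensive verification pass. The scores are assumed to be precomputed (the GMM scoring itself is omitted), and cosine similarity is an illustrative choice of comparison metric.

```python
import numpy as np

def anchor_vector(utterance_scores):
    """Project an utterance into 'anchor space': its coordinates are its
    scores against each of the pre-trained anchor speaker models."""
    v = np.asarray(utterance_scores, dtype=float)
    return v / np.linalg.norm(v)

def index_search(query_scores, database, top_k=3):
    """Rank database utterances by cosine similarity in anchor space; only
    the top hits would then go to a full GMM-UBM verification pass."""
    q = anchor_vector(query_scores)
    sims = {uid: float(q @ anchor_vector(s)) for uid, s in database.items()}
    return sorted(sims, key=sims.get, reverse=True)[:top_k]
```

This is the cascade structure the abstract describes: fast anchor-space search for recall, accurate GMM-UBM scoring for precision.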


international conference on acoustics, speech, and signal processing | 2010

The MITLL NIST LRE 2009 language recognition system

Pedro A. Torres-Carrasquillo; Elliot Singer; Terry P. Gleason; Alan McCree; Douglas A. Reynolds; Fred Richardson; Douglas E. Sturim

This paper presents a description of the MIT Lincoln Laboratory language recognition system submitted to the NIST 2009 Language Recognition Evaluation (LRE). This system consists of a fusion of three core recognizers, two based on spectral similarity and one based on tokenization. The 2009 LRE differed from previous ones in that test data included narrowband segments from worldwide Voice of America broadcasts as well as conventional recorded conversational telephone speech. Results are presented for the 23-language closed-set and open-set detection tasks at the 30, 10, and 3 second durations along with a discussion of the language-pair task. On the 30 second 23-language closed-set detection task, the system achieved a 1.64% average error rate.


2006 IEEE Odyssey - The Speaker and Language Recognition Workshop | 2006

Advanced Language Recognition using Cepstra and Phonotactics: MITLL System Performance on the NIST 2005 Language Recognition Evaluation

William M. Campbell; Terry P. Gleason; Jiri Navratil; Douglas A. Reynolds; Wade Shen; Elliot Singer; Pedro A. Torres-Carrasquillo

This paper presents a description of the MIT Lincoln Laboratory submissions to the 2005 NIST Language Recognition Evaluation (LRE05). As was true in 2003, the 2005 submissions were combinations of core cepstral and phonotactic recognizers whose outputs were fused to generate final scores. For the 2005 evaluation, Lincoln Laboratory had five submissions built upon fused combinations of six core systems. Major improvements included the generation of phone streams using lattices, SVM-based language models using lattice-derived phonotactics, and binary tree language models. In addition, a development corpus was assembled that was designed to test robustness to unseen languages and sources. Language recognition trends based on NIST evaluations conducted since 1996 show a steady improvement in language recognition performance.


international conference on acoustics, speech, and signal processing | 1992

A speech recognizer using radial basis function neural networks in an HMM framework

Elliot Singer; Richard P. Lippmann

A high performance speaker-independent isolated-word speech recognizer was developed which combines hidden Markov models (HMMs) and radial basis function (RBF) neural networks. RBF networks in this recognizer use discriminant training techniques to estimate Bayesian probabilities for each speech frame while HMM decoders estimate overall word likelihood scores for network outputs. RBF training is performed after the HMM recognizer has automatically segmented training tokens using forced Viterbi alignment. In recognition experiments using a speaker-independent E-set database, the hybrid recognizer had an error rate of 11.5% compared to 15.7% for the robust unimodal Gaussian HMM recognizer upon which the hybrid system was based. The error rate was also lower than that of a tied-mixture HMM recognizer with the same number of centers. These results demonstrate that RBF networks can be successfully incorporated in hybrid recognizers and suggest that they may be capable of good performance with fewer parameters than required by Gaussian mixture classifiers.
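A minimal sketch of how an RBF network's outputs can serve a hybrid recognizer: Gaussian basis functions over the frame, a linear output layer normalized into posterior-like scores, and division by class priors to produce scaled likelihoods for the Viterbi decoder. The architecture details below are standard hybrid-system practice assumed for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def rbf_posteriors(frame, centers, widths, weights):
    """One forward pass of a hypothetical RBF network: Gaussian bases over
    the input frame, a linear output layer, then normalization so the
    outputs can be treated as per-state posterior estimates."""
    frame = np.asarray(frame, dtype=float)
    d2 = np.sum((centers - frame) ** 2, axis=1)     # squared distance to each center
    hidden = np.exp(-d2 / (2 * widths ** 2))        # Gaussian basis activations
    out = np.clip(weights @ hidden, 1e-8, None)     # keep outputs positive
    return out / out.sum()

def scaled_likelihoods(posteriors, priors):
    """Divide posterior estimates by class priors: likelihoods up to a
    constant, the usual form fed to an HMM/Viterbi decoder."""
    return posteriors / np.asarray(priors, dtype=float)
```

The HMM then combines these frame-level scores into overall word likelihoods, as the abstract describes.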


2006 IEEE Odyssey - The Speaker and Language Recognition Workshop | 2006

Experiments with Lattice-based PPRLM Language Identification

Wade Shen; William M. Campbell; Terry P. Gleason; Douglas A. Reynolds; Elliot Singer

In this paper we describe experiments conducted during the development of a lattice-based PPRLM language identification system as part of the NIST 2005 language recognition evaluation campaign. In experiments following LRE05, the PPRLM-lattice sub-system presented here achieved a 30s/primary condition EER of 4.87%, making it the single best performing recognizer developed by the MIT-LL team. Details of implementation issues and experimental results are presented, and interactions with backend score normalization are explored.


Odyssey 2016 | 2016

The MITLL NIST LRE 2015 Language Recognition System.

Pedro A. Torres-Carrasquillo; Najim Dehak; Elizabeth Godoy; Douglas A. Reynolds; Fred Richardson; Stephen Shum; Elliot Singer; Douglas E. Sturim

In this paper we describe the most recent MIT Lincoln Laboratory language recognition system developed for the NIST 2015 Language Recognition Evaluation (LRE). The submission features a fusion of five core classifiers, with most systems developed in the context of an i-vector framework. The 2015 evaluation presented new paradigms. First, the evaluation included fixed training and open training tracks for the first time; second, language classification performance was measured across 6 language clusters using 20 language classes instead of an N-way language task; and third, performance was measured across a nominal 3-30 second range. Results are presented for the overall performance across the six language clusters for both the fixed and open training tasks. On the 6-cluster metric the Lincoln system achieved overall costs of 0.173 and 0.168 for the fixed and open tasks, respectively.


international conference on acoustics, speech, and signal processing | 1993

Hybrid neural-network/HMM approaches to wordspotting

Richard P. Lippmann; Elliot Singer

Two approaches to integrating neural network and hidden Markov model (HMM) algorithms into one hybrid wordspotter are being explored. One approach uses neural network secondary testing to analyze putative hits produced by a high-performance HMM wordspotter. This has provided consistent but small reductions in the number of false alarms required to obtain a given detection rate. In one set of experiments using the NIST Road Rally database, secondary testing reduced the false alarm rate by an average of 16.4%. A second approach uses radial basis function (RBF) neural networks to produce local machine scores for a Viterbi decoder. Network weights and RBF centers are trained at the word level to produce a high score for the correct keyword hits and a low score for false alarms generated by nonkeyword speech. Preliminary experiments using this approach are exploring a constructive approach which adds RBF centers to model nonkeyword near-misses and a cost function which attempts to maximize directly average detection accuracy over a specified range of false alarm rates.

Collaboration


Dive into Elliot Singer's collaborations.

Top Co-Authors

Douglas A. Reynolds, Massachusetts Institute of Technology
Pedro A. Torres-Carrasquillo, Massachusetts Institute of Technology
William M. Campbell, Massachusetts Institute of Technology
Thomas F. Quatieri, Massachusetts Institute of Technology
Douglas E. Sturim, Massachusetts Institute of Technology
Joseph P. Campbell, Massachusetts Institute of Technology
Robert B. Dunn, Massachusetts Institute of Technology
Terry P. Gleason, Massachusetts Institute of Technology
Alan McCree, Massachusetts Institute of Technology
Craig S. Greenberg, National Institute of Standards and Technology