Publication


Featured research published by Alan L. Higgins.


Digital Signal Processing | 1991

Speaker verification using randomized phrase prompting

Alan L. Higgins; Lawrence G. Bahler; Jack E. Porter

Speaker verification is a viable method of controlling access to computer and communications networks and physical places. The system described here is capable of accurately verifying an individual’s claimed identity from a short sample of his or her speech. Design requirements for the system included the following: (1) an enrollment session should take no longer than 3 min; (2) an acceptance or rejection decision should be reached within 20 s; (3) the probabilities of false rejection and false acceptance should be as low as possible; and (4) prompts should be chosen at random from a large number of possibilities. A number of systems have been designed to satisfy the first three requirements [1-3], but do not adequately address requirement (4). Several systems use a single fixed word or phrase as a “voice password” [4,5]. To address requirement (4), the system described here uses a prompting strategy in which phrases are composed at random using a small vocabulary of words. The spoken phrases are compared with word templates derived from enrollment sessions. This strategy introduces the following difficulty. Words occur in the test material in contexts that did not occur in the enrollment material. The context in which a word is spoken influences its pronunciation through coarticulation, caused by limitations in the movement of the speech articulators. These unmodeled coarticulations contribute to the measured dissimilarity between the input speech and the claimant’s word templates, increasing the likelihood of rejecting valid users. A scoring method called likelihood ratio scoring partially overcomes this difficulty. Likelihood ratio scoring is based on an approximation to the likelihood ratio, the ratio of the probability of the observed input assuming it was spoken by the claimant to the probability of the observed input assuming it was spoken by someone else.
To compute this approximation, input speech is compared with the claimant’s templates and with the templates of a set of other speakers (who may be other enrolled users) called the ratio set. All templates are derived from enrollment sessions in which the same scripts were read; therefore, the same contexts are represented in the templates. When a new context is encountered in test material, all raw match scores to templates are affected similarly. The likelihood ratio score, however, by virtue of its definition as a ratio, is relatively unaffected. The template matching algorithm, which is otherwise conventional [6], is modified for computation of the likelihood ratio score. In particular, a specialized syntax is used, and additional data from the template match, other than the “recognized” word sequence, are saved. The syntax is constructed to make certain that the algorithm is quite robust with respect to false starts and nonspeech sounds. A reliable enrollment method, in which speaker-specific word templates are computed automatically using several incremental steps, starting with speaker-pooled templates for male and female speakers, was developed. Two real-time prototype systems were developed using Sun-3 workstations with additional processor boards. One such system was tested, simultaneously collecting a large speaker verification database. A rationale was developed for determining the size of the test required to allow hypotheses regarding the system’s true error rates to be tested with stated confidence levels. The remainder of the paper is organized as follows. The speech material used for verification is described in Section 2. The algorithm is then presented in Sec-
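
The ratio-set idea described above can be illustrated with a minimal sketch. It is not the paper's scoring formula, which this excerpt does not give. It assumes that a template match distance behaves like a negative log-likelihood and that the "someone else" likelihood is the average over the ratio set; the function names and the threshold are hypothetical.

```python
import math


def log_likelihood_ratio(claimant_distance, ratio_set_distances):
    """Approximate log likelihood ratio from template match distances.

    Assumption (not from the paper): a match distance d behaves like a
    negative log-likelihood, so exp(-d) is proportional to a likelihood, and
    the "spoken by someone else" likelihood is the mean over the ratio set.
    """
    if not ratio_set_distances:
        raise ValueError("ratio set must contain at least one speaker")
    # log(mean(exp(-d_i))), computed stably by factoring out the best match.
    best = min(ratio_set_distances)
    total = sum(math.exp(-(d - best)) for d in ratio_set_distances)
    log_other = -best + math.log(total / len(ratio_set_distances))
    return -claimant_distance - log_other


def verify(claimant_distance, ratio_set_distances, threshold=0.0):
    """Accept the claimed identity when the log ratio reaches the threshold."""
    return log_likelihood_ratio(claimant_distance, ratio_set_distances) >= threshold
```

Under these assumptions, adding the same offset to every match distance, as an unseen word context might, leaves the score unchanged. That is the property the abstract attributes to scoring by a ratio.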


Journal of the Acoustical Society of America | 1992

Real-time speech processing development system

Edwin H. Wrench; Alan L. Higgins

A real-time speech processing development system has a control subsystem (CS) and a recognition subsystem (RS) interconnected by a CS/RS interface. The control subsystem includes a control processor, an operator interface, a user interface, and a control program module for loading any one of a plurality of control programs which employ speech recognition processes. The recognition subsystem includes a master processor, a speech signal processor, and template matching processors, all interconnected on a common bus that communicates with the control subsystem through the CS/RS interface. The two-part configuration allows the control subsystem to be accessed by the operator for non-real-time system functions, and the recognition subsystem to be accessed by the user for real-time speech processing functions. An embodiment of a speaker verification system includes template enrollment, template training, recognition by template concatenation and time alignment, silence and filler template generation, and speaker monitoring modes.


International Conference on Acoustics, Speech, and Signal Processing | 1993

Voice identification using nearest-neighbor distance measure

Alan L. Higgins; Lawrence G. Bahler; Jack E. Porter

An algorithm for attributing a sample of unconstrained speech to one of several known speakers is described. The algorithm is based on measurement of the similarity of distributions of features extracted from reference speech samples and from the sample to be attributed. The measure of feature distribution similarity employed is not based on any assumed form of the distributions involved. The theoretical basis of the algorithm is examined, and a plausible connection is shown to the divergence statistic of Kullback (1972). Experimental results are presented for the King telephone database and the Switchboard database. The performance of the algorithm is better than that reported for algorithms based on Gaussian modeling and robust discrimination.


International Conference on Acoustics, Speech, and Signal Processing | 1994

Improved voice identification using a nearest-neighbor distance measure

Lawrence G. Bahler; Jack E. Porter; Alan L. Higgins

This paper describes recent improvements to an algorithm for identifying an unknown voice from a set of known voices using unconstrained speech material. These algorithms compare the underlying probability distributions of speech utterances using a method that is free of assumptions regarding the form of the distributions (e.g., Gaussian). In comparing two utterances, the algorithms accumulate minimum inter-frame distances between frames of the utterances. In recognition tests on the Switchboard database, using a closed population of speakers, we show that the new algorithm performs substantially better than the baseline algorithm. The modifications are segment-based scoring, limiting likelihood ratio estimates for robustness, and estimating biases associated with reference files.
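
Accumulating minimum inter-frame distances can be sketched as follows. This is an illustrative baseline only: the squared Euclidean frame distance, the averaging, and the symmetric sum are assumptions, and the modifications listed in the abstract (segment-based scoring, limited likelihood ratio estimates, bias estimation) are not included. All function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature frames (an assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def directed_nn_distance(source, target):
    """Average, over source frames, of the distance to the nearest target frame."""
    return sum(
        min(squared_distance(f, g) for g in target) for f in source
    ) / len(source)


def nn_distance(u, v):
    """Symmetric nearest-neighbor distance between two utterances."""
    return directed_nn_distance(u, v) + directed_nn_distance(v, u)


def identify(test, references):
    """Return the name of the reference speaker closest to the test utterance."""
    return min(references, key=lambda name: nn_distance(test, references[name]))
```

No distribution is fitted to the frames here; the score depends only on distances between observed frames, which is the sense in which such a measure avoids assuming a distributional form.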


Journal of the Acoustical Society of America | 2000

System for voice verification of telephone transactions

William Y. Huang; Lawrence G. Bahler; Alan L. Higgins

A system and a method are disclosed for verifying the voice of a user conducting a telephone transaction. The system and method include a mechanism for prompting the user to speak in a limited vocabulary. A feature extractor converts the limited vocabulary into a plurality of speech frames. A pre-processor is coupled to the feature extractor for processing the plurality of speech frames to produce a plurality of processed frames. The processing includes frame selection, which eliminates each of the plurality of speech frames having an absence of words. A Viterbi decoder is also coupled to the feature extractor for assigning a frame label to each of the plurality of speech frames to produce a plurality of frame labels. The processed frames and frame labels are then combined to produce a voice model, which includes each of the plurality of frame labels that correspond to the number of plurality of processed frames. A mechanism is also provided for comparing the voice model with the claimant's voice model, derived during a previous enrollment session. The voice model is also compared with an alternative voice model set, derived during previous enrollment sessions. The identity claimed is accepted if the voice model matches the claimant's voice model better than the alternative voice model set.
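
The combination of frame selection and frame labels might look like the sketch below. It is a simplified illustration, not the patented method: the energy-based test for "absence of words", the `energy_floor` parameter, and the dictionary layout of the voice model are all assumptions, and the frame labels are supplied directly rather than produced by a Viterbi decoder.

```python
from collections import defaultdict


def frame_energy(frame):
    """Sum of squared feature values, used here as a crude speech indicator."""
    return sum(x * x for x in frame)


def select_frames(frames, energy_floor):
    """Return indices of frames kept by frame selection (assumed energy test)."""
    return [i for i, f in enumerate(frames) if frame_energy(f) >= energy_floor]


def build_voice_model(frames, labels, energy_floor):
    """Group the selected frames by their frame label.

    `labels` holds one label per input frame (hypothetical decoder output).
    """
    if len(frames) != len(labels):
        raise ValueError("need exactly one label per frame")
    model = defaultdict(list)
    for i in select_frames(frames, energy_floor):
        model[labels[i]].append(frames[i])
    return dict(model)
```

A model built this way keeps only the labels whose frames survive selection, so a later comparison against the claimant's model and the alternative model set would operate on speech frames alone.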


Journal of the Acoustical Society of America | 2006

Method for speech processing involving whole-utterance modeling

Alan L. Higgins; Lawrence G. Bahler

A speech verification process involves comparison of enrollment and test speech data. An improved method of comparing the data is disclosed, wherein segmented frames of speech are analyzed jointly rather than independently. The enrollment and test speech are both subjected to a feature extraction process to derive fixed-length feature vectors, and the feature vectors are compared using a linear discriminant analysis, with no dependence upon the order of the words spoken or the speaking rate. The discriminant analysis is made possible, despite a relatively high dimensionality of the feature vectors, by a mathematical procedure provided for finding an eigenvector to simultaneously diagonalize the between-speaker and between-channel covariances of the enrollment and test data.
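
Simultaneous diagonalization of two covariance matrices is a standard linear-algebra construction, outlined below for reference. The abstract does not give the patent's own procedure, so this is a generic derivation; the symbols $S_b$ (between-speaker covariance) and $S_c$ (between-channel covariance) are notation assumed here, and $S_c$ is assumed to be positive definite.

```latex
% S_b: between-speaker covariance, S_c: between-channel covariance (assumed notation)
\begin{aligned}
S_c &= L L^{\top} && \text{(Cholesky factor; assumes } S_c \text{ is positive definite)} \\
L^{-1} S_b L^{-\top} &= U \Lambda U^{\top} && \text{(symmetric eigendecomposition, } U^{\top} U = I\text{)} \\
V &= L^{-\top} U \\
V^{\top} S_c V &= U^{\top} L^{-1} L L^{\top} L^{-\top} U = I \\
V^{\top} S_b V &= U^{\top} \left( L^{-1} S_b L^{-\top} \right) U = \Lambda
\end{aligned}
```

Each column $v_i$ of $V$ then satisfies the generalized eigenproblem $S_b v_i = \lambda_i S_c v_i$, and the column with the largest $\lambda_i$ gives the direction in which between-speaker variation is largest relative to between-channel variation.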


Journal of the Acoustical Society of America | 1995

System and method for passive voice verification in a telephone network

Lawrence G. Bahler; Alan L. Higgins


Archive | 1998

Noise suppression and channel equalization preprocessor for speech and speaker recognizers: method and apparatus

Alan L. Higgins; Steven F. Boll; Jack E. Porter


Archive | 1992

Keyword recognition system and method using template concatenation model

Alan L. Higgins; Robert E. Wohlford; Lawrence G. Bahler


Journal of the Acoustical Society of America | 1995

Speaker verifier using nearest-neighbor distance measure

Alan L. Higgins

Collaboration



Top Co-Authors

William Y. Huang

Massachusetts Institute of Technology
