Kevin R. Farrell
Rutgers University
Publication
Featured research published by Kevin R. Farrell.
IEEE Transactions on Speech and Audio Processing | 1994
Kevin R. Farrell; Richard J. Mammone; Khaled T. Assaleh
An evaluation of various classifiers for text-independent speaker recognition is presented. In addition, a new classifier is examined for this application. The new classifier is called the modified neural tree network (MNTN). The MNTN is a hierarchical classifier that combines the properties of decision trees and feedforward neural networks. The MNTN differs from the standard NTN in both the new learning rule used and the pruning criteria. The MNTN is evaluated for several speaker recognition experiments. These include closed- and open-set speaker identification and speaker verification. The database used is a subset of the TIMIT database consisting of 38 speakers from the same dialect region. The MNTN is compared with nearest neighbor classifiers, full-search and tree-structured vector quantization (VQ) classifiers, multilayer perceptrons (MLPs), and decision trees. For closed-set speaker identification experiments, the full-search VQ classifier and MNTN demonstrate comparable performance. Both methods perform significantly better than the other classifiers for this task. The MNTN and full-search VQ classifiers are also compared for several speaker verification and open-set speaker identification experiments. The MNTN is found to perform better than full-search VQ classifiers for both of these applications. In addition to matching or exceeding the performance of the VQ classifier for these applications, the MNTN also provides a logarithmic saving for retrieval.
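As a rough illustration of the full-search VQ baseline mentioned in this abstract, the sketch below scores a test utterance against one codebook per enrolled speaker and picks the speaker with the lowest average distortion. The Euclidean distance, the toy two-dimensional codebooks, and the names `avg_distortion` and `identify_speaker` are assumptions made for the example; codebook training is omitted, and this is not the paper's implementation.

```python
import math


def avg_distortion(frames, codebook):
    """Mean distance from each frame to its nearest codeword (full search)."""
    return sum(min(math.dist(f, c) for c in codebook) for f in frames) / len(frames)


def identify_speaker(frames, codebooks):
    """Closed-set identification: return the speaker whose codebook fits best."""
    return min(codebooks, key=lambda spk: avg_distortion(frames, codebooks[spk]))


# Hypothetical two-dimensional codebooks for two enrolled speakers.
codebooks = {
    "spk_a": [(0.0, 0.0), (1.0, 1.0)],
    "spk_b": [(5.0, 5.0), (6.0, 6.0)],
}
test_frames = [(0.1, -0.1), (0.9, 1.2), (0.2, 0.1)]
best = identify_speaker(test_frames, codebooks)
```

Every codeword is compared with every frame, so the cost grows linearly with codebook size. That linear cost is what a tree-structured search is meant to reduce.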
Pattern Recognition | 2002
Kevin R. Farrell; Richard J. Mammone
Speaker recognition refers to the concept of recognizing a speaker by his/her voice or speech samples. Some of the important applications of speaker recognition include customer verification for bank transactions, access to bank accounts through telephones, control over the use of credit cards, and security purposes in the army, navy, and air force. This paper is purely a tutorial that presents a review of the classifier-based methods used for speaker recognition. Both unsupervised and supervised classifiers are described. In addition, practical approaches that utilize diversity, redundancy, and fusion strategies are discussed with the aim of improving performance.
Military Communications Conference | 1992
Khaled T. Assaleh; Kevin R. Farrell; Richard J. Mammone
A modulation model representation of a signal is used to provide a convenient form for subsequent analysis. The modulation model is formed by estimating the instantaneous frequency and bandwidth using autoregressive spectrum analysis. In particular, the instantaneous bandwidth and derivative of the instantaneous frequency prove to be valuable parameters in estimating modulation type. This method performed extremely well for input carrier-to-noise ratios as low as 15 dB. Additionally, since the autoregressive fit to the frequency spectrum is second order, the autoregressive polynomial coefficients and corresponding roots can be computed with closed-form expressions. Thus, the method is computationally efficient.
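The closed-form property described in this abstract can be sketched as follows. The example fits a second-order autoregressive model to a real signal with the Yule-Walker equations, solves the quadratic for the pole, and maps the pole angle and radius to a frequency and bandwidth. The biased autocorrelation estimator, the pole-radius bandwidth formula, and all function names are assumptions chosen for illustration; the paper's exact estimator may differ.

```python
import cmath
import math


def autocorr(x, lag):
    """Biased autocorrelation estimate at the given lag."""
    n = len(x)
    return sum(x[i] * x[i - lag] for i in range(lag, n)) / n


def ar2_coefficients(x):
    """Yule-Walker fit of x[n] + a1*x[n-1] + a2*x[n-2] = e[n] (2x2 solve)."""
    r0, r1, r2 = (autocorr(x, k) for k in range(3))
    det = r0 * r0 - r1 * r1
    a1 = r1 * (r2 - r0) / det
    a2 = (r1 * r1 - r0 * r2) / det
    return a1, a2


def pole_from_ar2(a1, a2):
    """Root of z^2 + a1*z + a2 with non-negative imaginary part: (angle, radius)."""
    root = (-a1 + cmath.sqrt(a1 * a1 - 4.0 * a2)) / 2.0
    return cmath.phase(root), abs(root)


def frequency_and_bandwidth(x, fs):
    """Map the AR(2) pole to a frequency and a bandwidth, both in Hz."""
    a1, a2 = ar2_coefficients(x)
    angle, radius = pole_from_ar2(a1, a2)
    freq_hz = angle * fs / (2.0 * math.pi)
    bandwidth_hz = -math.log(radius) * fs / math.pi  # assumed pole-radius mapping
    return freq_hz, bandwidth_hz


# Hypothetical test signal: a 1 kHz tone sampled at 8 kHz.
fs = 8000.0
tone = [math.cos(2.0 * math.pi * 1000.0 * n / fs) for n in range(8000)]
freq_hz, bandwidth_hz = frequency_and_bandwidth(tone, fs)
```

Applying the same fit to short sliding windows would give frequency and bandwidth tracks over time, from which a derivative of the frequency could be taken.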
International Conference on Acoustics, Speech, and Signal Processing | 1995
Kevin R. Farrell
A new system is presented for text-dependent speaker verification. The system uses data fusion concepts to combine the results of distortion-based and discriminant-based classifiers. Hence, both intraspeaker and interspeaker information are utilized in the final decision. The distortion-based and discriminant-based classifiers are based on dynamic time warping (DTW) and the neural tree network (NTN), respectively. The system is evaluated with several hundred two-word utterances collected over a telephone channel. The combined classifier yields an equal error rate of two percent for this task, which is better than the individual performance of either classifier.
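For readers unfamiliar with the distortion-based component, a minimal sketch of a textbook dynamic time warping distance between two feature sequences is given below. The Euclidean local cost, the unconstrained three-way step pattern, and the name `dtw_distance` are assumptions for the example; the system described in the abstract may use different constraints and normalization.

```python
import math


def dtw_distance(ref, test):
    """Accumulated Euclidean cost of the best monotonic alignment of two sequences."""
    n, m = len(ref), len(test)
    inf = float("inf")
    # cost[i][j] is the best accumulated cost aligning ref[:i] with test[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(ref[i - 1], test[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # test frame repeated
                cost[i][j - 1],      # reference frame repeated
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Hypothetical one-dimensional feature frames: same shape, different timing.
reference = [(1.0,), (2.0,), (3.0,)]
utterance = [(1.0,), (2.0,), (2.0,), (3.0,)]
distance = dtw_distance(reference, utterance)
```

A lower distance means the utterance matches the enrolled template more closely, so a verification decision would compare the distance against a threshold.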
International Conference on Acoustics, Speech, and Signal Processing | 1998
Kevin R. Farrell; Richard J. Mammone
We analyse the diversity of information as provided by several modeling approaches for speaker verification. This information is used to facilitate the fusion of the individual results into an overall result that provides advantages in accuracy over the individual models. The modeling methods that are evaluated consist of the neural tree network (NTN), Gaussian mixture model (GMM), hidden Markov model (HMM), and dynamic time warping (DTW). With the exception of DTW, all methods utilize subword-based approaches. The phrase-level scores for each modeling approach are used for combination. Several data fusion methods are evaluated for combining the model results, including the linear and log opinion pool approaches along with voting. The results of the above analysis have been integrated into a system that has been tested with several databases collected within landline and cellular environments. We have found the linear and log opinion pool methods to consistently reduce the error rate from that obtained when the models are used individually.
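The three fusion rules named in this abstract can be sketched in a few lines. The example assumes each model's phrase-level score has already been mapped to a probability in (0, 1) that the claimant is the target speaker, and that the weights sum to one; both are assumptions for illustration, as are the function names and the sample scores.

```python
import math


def linear_opinion_pool(probs, weights):
    """Weighted arithmetic mean of the per-model probabilities."""
    return sum(w * p for w, p in zip(weights, probs))


def log_opinion_pool(probs, weights):
    """Weighted geometric mean, renormalized over the target/impostor classes."""
    target = math.prod(p ** w for w, p in zip(weights, probs))
    impostor = math.prod((1.0 - p) ** w for w, p in zip(weights, probs))
    return target / (target + impostor)


def majority_vote(probs, threshold=0.5):
    """Accept when more than half of the models individually accept."""
    accepts = sum(p > threshold for p in probs)
    return accepts * 2 > len(probs)


# Hypothetical probabilities from four models (e.g. NTN, GMM, HMM, DTW).
model_probs = [0.9, 0.7, 0.6, 0.4]
equal_weights = [0.25] * 4
linear_score = linear_opinion_pool(model_probs, equal_weights)
log_score = log_opinion_pool(model_probs, equal_weights)
vote = majority_vote(model_probs)
```

The linear pool averages the probabilities directly. The log pool multiplies them, so a single model that is very confident of an impostor pulls the fused score down more strongly.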
International Conference on Acoustics, Speech, and Signal Processing | 1998
William Mistretta; Kevin R. Farrell
Model adaptation methods for a text-dependent speaker verification system are evaluated. The speaker verification system uses a discriminant model and a statistical model to represent each enrolled speaker. These modeling approaches consist of a neural tree network and Gaussian mixture model. Adaptation methods are evaluated for both modeling approaches. We show that the overall system performance with adaptation is comparable to that obtained by training the model with the additional information. However, the adaptation can be performed within a fraction of the time required to retrain a model. Additionally, we have evaluated the adapted and non-adapted models with data recorded six months after the initial enrolment. The adaptation reduced the error rate for the aged data by 40%.
International Conference on Acoustics, Speech, and Signal Processing | 1994
Kevin R. Farrell; Richard J. Mammone
A modified neural tree network (NTN) is examined for use in text-independent speaker identification. The NTN is a hierarchical classifier that combines the properties of decision trees and feed-forward neural networks. The modified NTN uses discriminant learning to partition feature space as opposed to the more common clustering approaches, such as vector quantization. The modified NTN also uses forward pruning to avoid overfitting the training data. The modified NTN is evaluated for both closed- and open-set speaker identification experiments using the TIMIT database. The performance of the modified NTN is compared to that of vector quantization classifiers. The results presented show the modified NTN to provide comparable performance to the vector quantization classifier for closed-set speaker identification while providing improved performance for the open-set problem.
Archive | 1995
Kevin R. Farrell; Richard J. Mammone
Speaker recognition refers to the capability of recognizing a person based on his or her voice. Specifically, this consists of either speaker verification or speaker identification. The objective of speaker verification is to verify a person’s claimed identity based on a sample of speech from that person. The objective of speaker identification is to use a person’s voice to identify that person among a predetermined set of people.
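The difference between the two tasks can be made concrete with a small sketch. It assumes some upstream model has already produced a similarity score per enrolled speaker, with higher meaning a better match; the names `verify` and `identify`, the scores, and the threshold are hypothetical.

```python
def verify(claimed_score, threshold):
    """Verification: accept or reject one claimed identity."""
    return claimed_score >= threshold


def identify(scores_by_speaker):
    """Identification: choose the best match among a predetermined set."""
    return max(scores_by_speaker, key=scores_by_speaker.get)


# Hypothetical similarity scores for one test utterance.
scores = {"alice": 0.82, "bob": 0.35, "carol": 0.51}
accepted = verify(scores["bob"], threshold=0.6)  # the claimant says "I am bob"
best_match = identify(scores)
```

Verification is a two-way decision whose cost does not depend on the number of enrolled speakers, while identification compares the utterance against every enrolled speaker.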
Military Communications Conference | 1993
Kevin R. Farrell; Richard J. Mammone
A new classifier is presented for estimating the modulation type for digitally modulated signals. The new classifier is known as the neural tree network (NTN). The NTN is a self-organizing, hierarchical classifier that implements a sequential linear decision strategy. The NTN does not require a statistical analysis of the features, as do Bayesian methods or decision trees. The NTN also allows for a more flexible partitioning of feature space than the prior classification methods. The features used for modulation classification are obtained from an autoregressive model of the signal. These features include the instantaneous frequency, bandwidth, and derivative of the instantaneous frequency. The modulation types to be estimated are continuous wave, binary and quadrature phase shift keying, and binary and quadrature frequency shift keying. The experimental results show the NTN to perform well for low carrier-to-noise ratio (CNR) input signals, in addition to outperforming the decision tree classifier.
International Conference on Acoustics, Speech, and Signal Processing | 1993
Kevin R. Farrell; Richard J. Mammone; Allen L. Gorin
An incremental approach to solving an algebraic formulation of the language acquisition problem is presented. This problem consists of solving a system of linear equations, where each equation represents a sentence/action pair and each variable denotes a word/action association. The algebraic model for language acquisition has been shown to provide advantages over the relative frequency estimate models when dealing with small-sample statistics. Two incremental methods are investigated to solve the system of linear equations. The incremental methods provide a regularized solution that is shown experimentally to be advantageous over the pseudo-inverse solution for classifying test data. In addition, the methods are more efficient with respect to computational and memory requirements.
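The abstract does not name the two incremental methods. Purely as an illustration of what an incremental, one-equation-at-a-time solver can look like, the sketch below uses a Kaczmarz-style row-action update, which is an assumption and may not be what the paper uses. Each row stands for one sentence (which words occur), each target for the observed action value, and the function name and toy system are hypothetical.

```python
def kaczmarz(rows, targets, sweeps=200):
    """Row-action solver: project the estimate onto one equation at a time."""
    x = [0.0] * len(rows[0])
    for _ in range(sweeps):
        for a, b in zip(rows, targets):
            norm_sq = sum(v * v for v in a)
            if norm_sq == 0.0:
                continue  # an all-zero row carries no information
            residual = b - sum(v * xi for v, xi in zip(a, x))
            step = residual / norm_sq
            # Move x just far enough to satisfy this single equation.
            x = [xi + step * v for xi, v in zip(x, a)]
    return x


# Hypothetical 2x2 system with exact solution x = [1, 3].
rows = [[2.0, 1.0], [1.0, 3.0]]
targets = [5.0, 10.0]
solution = kaczmarz(rows, targets)
```

Only one row is held in memory per update and no matrix is inverted, which is the general kind of computational and memory saving an incremental scheme can offer.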