Richard J. Mammone
Rutgers University
Publications
Featured research published by Richard J. Mammone.
IEEE Transactions on Speech and Audio Processing | 1994
Kevin R. Farrell; Richard J. Mammone; Khaled T. Assaleh
An evaluation of various classifiers for text-independent speaker recognition is presented. In addition, a new classifier is examined for this application. The new classifier is called the modified neural tree network (MNTN). The MNTN is a hierarchical classifier that combines the properties of decision trees and feedforward neural networks. The MNTN differs from the standard NTN in both the new learning rule used and the pruning criteria. The MNTN is evaluated for several speaker recognition experiments. These include closed- and open-set speaker identification and speaker verification. The database used is a subset of the TIMIT database consisting of 38 speakers from the same dialect region. The MNTN is compared with nearest neighbor classifiers, full-search and tree-structured vector quantization (VQ) classifiers, multilayer perceptrons (MLPs), and decision trees. For closed-set speaker identification experiments, the full-search VQ classifier and MNTN demonstrate comparable performance. Both methods perform significantly better than the other classifiers for this task. The MNTN and full-search VQ classifiers are also compared for several speaker verification and open-set speaker identification experiments. The MNTN is found to perform better than full-search VQ classifiers for both of these applications. In addition to matching or exceeding the performance of the VQ classifier for these applications, the MNTN also provides a logarithmic saving for retrieval.
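As a rough illustration of the full-search VQ baseline mentioned in this abstract, the sketch below scores a set of test frames against one codebook per speaker and picks the speaker with the lowest average distortion. The function names, the squared Euclidean distortion, and the toy two-dimensional codebooks are assumptions made for illustration; the abstract does not specify the features, codebook sizes, or training procedure used in the paper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def avg_distortion(frames, codebook):
    """Full search: each frame is compared against every codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)

def identify(frames, codebooks):
    """Closed-set identification: return the speaker with the lowest distortion."""
    return min(codebooks, key=lambda spk: avg_distortion(frames, codebooks[spk]))

# Hypothetical toy codebooks (real systems would train these, e.g. by clustering).
codebooks = {
    "spk_a": [(0.0, 0.0), (1.0, 0.0)],
    "spk_b": [(5.0, 5.0), (6.0, 5.0)],
}
test_frames = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.1)]
best = identify(test_frames, codebooks)
```

The full search costs one distance evaluation per codeword per frame, which is the retrieval cost that a tree-structured classifier can reduce to roughly logarithmic in the number of leaves.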
IEEE Transactions on Computers | 1993
A. Sakar; Richard J. Mammone
A pattern classification method called neural tree networks (NTNs) is presented. The NTN consists of neural networks connected in a tree architecture. The neural networks are used to recursively partition the feature space into subregions. Each terminal subregion is assigned a class label which depends on the training data routed to it by the neural networks. The NTN is grown by a learning algorithm, as opposed to multilayer perceptrons (MLPs), where the architecture must be specified before learning can begin. A heuristic learning algorithm based on minimizing the L1 norm of the error is used to grow the NTN. It is shown that this method has better performance in terms of minimizing the number of classification errors than the squared error minimization method used in backpropagation. An optimal pruning algorithm is given to enhance the generalization of the NTN. Simulation results are presented on Boolean function learning tasks and a speaker-independent vowel recognition task. The NTN compares favorably to both neural networks and decision trees.
Journal of the Optical Society of America A: Optics, Image Science, and Vision | 1990
Christine Podilchuk; Richard J. Mammone
We introduce a new convex constraint for image recovery using the method of projection onto convex sets. The set of least-squares solutions to the image-recovery problem is shown to form a convex set. The projection operator onto this set is presented. The resulting least-squares projection method is shown to provide improved performance over conventional projection techniques.
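To make the idea in this abstract concrete, here is a minimal sketch of projection onto convex sets that alternates between two constraints: the set of least-squares solutions of a linear model A x = b, and nonnegativity. The projection onto the least-squares set is written with the pseudoinverse, x + A⁺(b − A x), which is a standard result; whether the paper uses this exact operator or these companion constraints is an assumption. All names and the toy system are hypothetical.

```python
import numpy as np

def project_least_squares(x, A, A_pinv, b):
    """Project x onto {z : A^T A z = A^T b}, the set of least-squares solutions."""
    return x + A_pinv @ (b - A @ x)

def project_nonnegative(x):
    """Project x onto the convex set of nonnegative vectors."""
    return np.maximum(x, 0.0)

def pocs(A, b, n_iter=200):
    """Alternate the two projections, starting from the zero vector."""
    A_pinv = np.linalg.pinv(A)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = project_least_squares(x, A, A_pinv, b)
        x = project_nonnegative(x)
    return x

# Underdetermined toy problem: two measurements of a three-sample signal.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
x_true = np.array([1.0, 0.0, 2.0])
b = A @ x_true
x_hat = pocs(A, b)
```

Because the problem is underdetermined, the result is some nonnegative vector consistent with the measurements, not necessarily `x_true`; additional constraints would narrow the feasible set further.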
IEEE Transactions on Speech and Audio Processing | 1994
Khaled T. Assaleh; Richard J. Mammone
A new set of features is introduced that has been found to improve the performance of automatic speaker identification systems. The new set of features is referred to as the adaptive component weighting (ACW) cepstral coefficients. The new features emphasize the formant structure of the speech spectrum while attenuating the broad-bandwidth spectral components. The attenuated components correspond to the variations in spectral tilt of transmission and recording environment, and other characteristics that are irrelevant to speaker identification. The resulting ACW spectrum introduces zeros into the usual all-pole linear prediction (LP) spectrum. This is equivalent to applying a finite impulse response (FIR) filter that normalizes the narrow-band modes of the spectrum. Unlike existing fixed cepstral weighting schemes, the ACW cepstrum provides an adaptively weighted version of the LP cepstrum. The adaptation results in deemphasizing the irrelevant variations of the LP cepstral coefficients on a frame-by-frame basis. The ACW features are evaluated for text-independent speaker identification and are shown to yield improved performance.
Pattern Recognition | 2002
Kevin R. Farrell; Richard J. Mammone
Speaker recognition refers to the concept of recognizing a speaker by his/her voice or speech samples. Important applications of speaker recognition include customer verification for bank transactions, telephone access to bank accounts, control of credit card use, and security in the army, navy, and air force. This paper is purely a tutorial that presents a review of the classifier-based methods used for speaker recognition. Both unsupervised and supervised classifiers are described. In addition, practical approaches that utilize diversity, redundancy, and fusion strategies are discussed with the aim of improving performance.
Military Communications Conference | 1992
Khaled T. Assaleh; Kevin R. Farrell; Richard J. Mammone
A modulation model representation of a signal is used to provide a convenient form for subsequent analysis. The modulation model is formed by estimating the instantaneous frequency and bandwidth using autoregressive spectrum analysis. In particular, the instantaneous bandwidth and derivative of the instantaneous frequency prove to be valuable parameters in estimating modulation type. This method performed extremely well for input carrier-to-noise ratios as low as 15 dB. Additionally, since the autoregressive fit to the frequency spectrum is second order, the autoregressive polynomial coefficients and corresponding roots can be computed with closed-form expressions. Thus, the method is computationally efficient.
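The closed-form point in this abstract can be illustrated with a short sketch: for a second-order autoregressive polynomial 1 + a1 z^-1 + a2 z^-2, the roots follow from the quadratic formula, and each root maps to a frequency (its angle) and a bandwidth (from its radius). The pole-to-bandwidth mapping used here is the common one from linear prediction analysis and is an assumption about how the paper defines these quantities; the function names and the sampling rate are hypothetical.

```python
import cmath
import math

def ar2_poles(a1, a2):
    """Roots of z^2 + a1*z + a2, i.e. the poles of 1 / (1 + a1 z^-1 + a2 z^-2)."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return (-a1 + disc) / 2.0, (-a1 - disc) / 2.0

def pole_to_freq_bw(pole, fs):
    """Map a pole to a center frequency and a bandwidth, both in Hz."""
    freq = cmath.phase(pole) * fs / (2.0 * math.pi)
    bw = -math.log(abs(pole)) * fs / math.pi
    return freq, bw

# Build coefficients from a known pole pair r * exp(+/- j*theta) to check the mapping.
fs = 8000.0
r, theta = 0.9, math.pi / 4.0
a1, a2 = -2.0 * r * math.cos(theta), r * r
pole, _ = ar2_poles(a1, a2)
freq, bw = pole_to_freq_bw(pole, fs)
```

No iterative root finder is needed at second order, which is what keeps the per-frame cost low.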
IEEE Transactions on Speech and Audio Processing | 1998
Mihailo S. Zilovic; Richard J. Mammone
A common problem in speaker identification systems is that a mismatch between the training and testing conditions substantially degrades performance. We attempt to alleviate this problem by proposing new features that show less variation when speech is corrupted by convolutional noise (channel) and/or additive noise. The conventional feature used is the linear predictive (LP) cepstrum that is derived from an all-pole transfer function which, in turn, achieves a good approximation to the spectral envelope of the speech. A different cepstral feature based on a pole-zero function (called the adaptive component weighted or ACW cepstrum) was previously introduced. We propose four additional new cepstral features based on pole-zero transfer functions. One is an alternative way of doing adaptive component weighting and is called the ACW2 cepstrum. Two others (known as the PFL1 cepstrum and the PFL2 cepstrum) are based on a pole-zero postfilter used in speech enhancement. Finally, an autoregressive moving-average (ARMA) analysis of speech results in a pole-zero transfer function describing the spectral envelope. The cepstrum of this transfer function is the feature. Experiments involving a closed-set, text-independent, vector-quantizer-based speaker identification system are done to compare the various features. The TIMIT and King databases are used. The ACW and PFL1 features are the preferred features, since they do as well as or better than the LP cepstrum for all the test conditions. The corresponding spectra show a clear emphasis of the formants and no spectral tilt.
IEEE Transactions on Speech and Audio Processing | 1995
Mihailo S. Zilovic; Richard J. Mammone
Various linear predictive (LP) analysis methods are studied and compared from the points of view of robustness to noise and of application to speaker identification. The key to the success of the LP techniques is in separating the vocal tract information from the pitch information present in a speech signal even under noisy conditions. In addition to considering the conventional, one-shot weighted least-squares methods, the authors propose three other approaches with the above point as a motivation. The first is an iterative approach that leads to the weighted least absolute value solution. The second is an extension of the one-shot least-squares approach and achieves an iterative update of the weights. The update is a function of the residual and is based on minimizing a Mahalanobis distance. Third, the weighted total least-squares formulation is considered. A study of the deviations in the LP parameters is done when noise (white Gaussian and impulsive) is added to the speech. It is revealed that the most robust method depends on the type of noise. Closed-set speaker identification experiments with 20 speakers are conducted using a vector quantizer classifier trained on clean speech. The relative performance of the various LP approaches depends on the type of speech material used for testing.
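One common way to reach a least absolute value solution iteratively is iteratively reweighted least squares, where each pass solves a weighted least-squares problem with weights inversely proportional to the current residual magnitudes. The sketch below applies that idea to LP coefficient estimation. It is offered as an assumption about the general flavor of such an iteration, not as the authors' algorithm; the function name, the `eps` floor, and the toy signal are hypothetical.

```python
import numpy as np

def lp_irls(signal, order, n_iter=20, eps=1e-6):
    """Estimate LP coefficients a_k in s[n] ~ sum_k a_k * s[n-k] by IRLS."""
    s = np.asarray(signal, dtype=float)
    # Row for sample n holds the past samples [s[n-1], ..., s[n-order]].
    X = np.column_stack([s[order - k: len(s) - k] for k in range(1, order + 1)])
    y = s[order:]
    # Start from the ordinary (one-shot) least-squares solution.
    a = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ a
        # Weights 1/|r| turn the squared-error cost into an absolute-error cost.
        w = 1.0 / np.maximum(np.abs(r), eps)
        sw = np.sqrt(w)
        a = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return a

# Noise-free second-order recursion used only to check the estimator.
clean = [1.0, 0.5]
for n in range(2, 40):
    clean.append(1.5 * clean[n - 1] - 0.8 * clean[n - 2])
a_hat = lp_irls(clean, order=2)
```

On clean data every residual is near zero, so the weights are uniform and the result coincides with ordinary least squares; the reweighting only changes the solution when some residuals are much larger than others.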
North American Chapter of the Association for Computational Linguistics | 2001
Abraham Ittycheriah; Martin Franz; Wei-Jing Zhu; Adwait Ratnaparkhi; Richard J. Mammone
We present in detail a statistical question answering system developed for TREC-9. The system is an application of maximum entropy classification for question/answer type prediction and named entity marking. We describe our information retrieval system, which first retrieves documents from a local encyclopedia, then expands the query words, and finally retrieves passages from the TREC collection. We also discuss the answer selection algorithm, which determines the best sentence given both the question and the occurrence of a phrase belonging to the answer class desired by the question. A new method of analyzing system performance via a transition matrix is shown.
Journal of the Acoustical Society of America | 2004
Manish Sharma; Xiaoyu Zhang; Richard J. Mammone
The voice print system of the present invention is a subword-based, text-dependent automatic speaker verification system that embodies the capability of user-selectable passwords with no constraints on the choice of vocabulary words or the language. Automatic blind speech segmentation allows speech to be segmented into subword units without any linguistic knowledge of the password. Subword modeling is performed using multiple classifiers. The system also takes advantage of such concepts as multiple classifier fusion and data resampling to successfully boost the performance. Key word/key phrase spotting is used to optimally locate the password phrase. Numerous adaptation techniques increase the flexibility of the base system, and include channel adaptation, fusion adaptation, model adaptation, and threshold adaptation.