Khaled T. Assaleh
Rutgers University
Publication
Featured research published by Khaled T. Assaleh.
IEEE Transactions on Speech and Audio Processing | 1994
Kevin R. Farrell; Richard J. Mammone; Khaled T. Assaleh
An evaluation of various classifiers for text-independent speaker recognition is presented. In addition, a new classifier is examined for this application. The new classifier is called the modified neural tree network (MNTN). The MNTN is a hierarchical classifier that combines the properties of decision trees and feedforward neural networks. The MNTN differs from the standard NTN in both the new learning rule used and the pruning criteria. The MNTN is evaluated in several speaker recognition experiments. These include closed- and open-set speaker identification and speaker verification. The database used is a subset of the TIMIT database consisting of 38 speakers from the same dialect region. The MNTN is compared with nearest-neighbor classifiers, full-search and tree-structured vector quantization (VQ) classifiers, multilayer perceptrons (MLPs), and decision trees. For closed-set speaker identification experiments, the full-search VQ classifier and MNTN demonstrate comparable performance. Both methods perform significantly better than the other classifiers for this task. The MNTN and full-search VQ classifiers are also compared for several speaker verification and open-set speaker identification experiments. The MNTN is found to perform better than full-search VQ classifiers for both of these applications. In addition to matching or exceeding the performance of the VQ classifier for these applications, the MNTN also provides a logarithmic saving for retrieval.
IEEE Transactions on Speech and Audio Processing | 2002
William M. Campbell; Khaled T. Assaleh; Charles C. Broun
Modern speaker recognition applications require high accuracy at low complexity. We propose the use of a polynomial-based classifier to achieve these objectives. This approach has several advantages. First, polynomial classifier scoring yields a system which is highly computationally scalable with the number of speakers. Second, a new training algorithm is proposed which is discriminative, handles large data sets, and has low memory usage. Third, the output of the polynomial classifier is easily incorporated into a statistical framework allowing it to be combined with other techniques such as hidden Markov models. Results are given for the application of the new methods to the YOHO speaker recognition database.
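The training and scoring steps described in this abstract can be illustrated with a small sketch. It assumes a plain least-squares formulation: each speaker's weight vector is fitted so that the polynomial output is 1 on that speaker's frames and 0 on all others, and an utterance is scored by averaging the polynomial output over its frames. The function names, the ridge term, and the toy 2-D "features" are hypothetical and are not taken from the paper; the authors' actual training algorithm may differ in detail.

```python
from itertools import combinations_with_replacement


def poly_expand(x, degree=2):
    """Return all monomials of the entries of x up to the given degree."""
    terms = [1.0]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            prod = 1.0
            for i in idx:
                prod *= x[i]
            terms.append(prod)
    return terms


def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (aug[r][n] - s) / aug[r][r]
    return w


def train(frames_by_speaker, degree=2, ridge=1e-6):
    """Fit one weight vector per speaker (target 1 on own frames, 0 elsewhere)."""
    expanded = {s: [poly_expand(x, degree) for x in frames]
                for s, frames in frames_by_speaker.items()}
    dim = len(next(iter(expanded.values()))[0])
    # Shared matrix R = sum of p(x) p(x)^T over every training frame.
    # The small ridge term is an assumption added here for numerical safety.
    r = [[ridge if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for rows in expanded.values():
        for p in rows:
            for i in range(dim):
                for j in range(dim):
                    r[i][j] += p[i] * p[j]
    models = {}
    for s, rows in expanded.items():
        # Normal equations: R w = sum of p(x) over this speaker's frames.
        rhs = [sum(p[i] for p in rows) for i in range(dim)]
        models[s] = solve(r, rhs)
    return models


def score(w, frames, degree=2):
    """Average the polynomial output w . p(x) over the frames of an utterance."""
    total = 0.0
    for x in frames:
        total += sum(wi * pi for wi, pi in zip(w, poly_expand(x, degree)))
    return total / len(frames)


def identify(models, frames, degree=2):
    """Return the speaker whose model gives the highest average score."""
    return max(models, key=lambda s: score(models[s], frames, degree))


# Toy data: two "speakers" with 2-D feature frames (illustrative only).
speaker_a = [(0.0, 0.0), (0.2, 0.0), (-0.2, 0.0), (0.0, 0.2), (0.0, -0.2)]
speaker_b = [(3.0, 3.0), (3.2, 3.0), (2.8, 3.0), (3.0, 3.2), (3.0, 2.8)]
models = train({"a": speaker_a, "b": speaker_b})
print(identify(models, [(0.1, 0.0), (0.0, -0.1)]))
```

Because the matrix R is shared by all speakers, adding a speaker only requires a new right-hand side, which is one way to read the abstract's claim about scalability with the number of speakers.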
IEEE Transactions on Speech and Audio Processing | 1994
Khaled T. Assaleh; Richard J. Mammone
A new set of features is introduced that has been found to improve the performance of automatic speaker identification systems. The new set of features is referred to as the adaptive component weighting (ACW) cepstral coefficients. The new features emphasize the formant structure of the speech spectrum while attenuating the broad-bandwidth spectral components. The attenuated components correspond to the variations in spectral tilt of transmission and recording environment, and other characteristics that are irrelevant to speaker identification. The resulting ACW spectrum introduces zeros into the usual all-pole linear prediction (LP) spectrum. This is equivalent to applying a finite impulse response (FIR) filter that normalizes the narrow-band modes of the spectrum. Unlike existing fixed cepstral weighting schemes, the ACW cepstrum provides an adaptively weighted version of the LP cepstrum. The adaptation results in deemphasizing the irrelevant variations of the LP cepstral coefficients on a frame-by-frame basis. The ACW features are evaluated for text-independent speaker identification and are shown to yield improved performance.
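The way zeros enter the all-pole LP spectrum can be sketched as follows. The sketch assumes that the ACW spectrum is obtained by expanding 1/A(z) into partial fractions and setting every residue to one, which gives N(z)/A(z) with N(z) = sum_i prod_{j != i} (1 - p_j z^-1). That construction is an assumption here, since the abstract does not spell it out. Under it, the numerator can also be written directly from the LP coefficients as N(z) = sum_k (P - k) a_k z^-k with a_0 = 1, and the code checks that both forms agree. All function names are hypothetical.

```python
import cmath


def poly_from_roots(roots):
    """Coefficients, in powers of z^-1, of prod_i (1 - p_i z^-1)."""
    coeffs = [1.0 + 0j]
    for p in roots:
        nxt = coeffs + [0j]
        for k in range(1, len(nxt)):
            nxt[k] -= p * coeffs[k - 1]
        coeffs = nxt
    return coeffs


def acw_numerator_from_poles(poles):
    """N(z) built as a sum of unit-residue terms, one per pole."""
    order = len(poles)
    total = [0j] * order
    for i in range(order):
        others = poles[:i] + poles[i + 1:]
        for k, c in enumerate(poly_from_roots(others)):
            total[k] += c
    return total


def acw_numerator_from_lp(a):
    """Same numerator from LP coefficients a = [1, a_1, ..., a_P]."""
    order = len(a) - 1
    return [(order - k) * a[k] for k in range(order)]


# Two conjugate pole pairs standing in for formant resonances (toy values).
p1 = cmath.rect(0.9, 0.4)
p2 = cmath.rect(0.7, 1.5)
poles = [p1, p1.conjugate(), p2, p2.conjugate()]
a = poly_from_roots(poles)            # LP polynomial A(z)
n_poles = acw_numerator_from_poles(poles)
n_lp = acw_numerator_from_lp(a)
print(max(abs(u - v) for u, v in zip(n_poles, n_lp)))
```

The second form avoids root finding, so the FIR numerator can be computed frame by frame from the LP coefficients alone.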
Military Communications Conference | 1992
Khaled T. Assaleh; Kevin R. Farrell; Richard J. Mammone
A modulation model representation of a signal is used to provide a convenient form for subsequent analysis. The modulation model is formed by estimating the instantaneous frequency and bandwidth using autoregressive spectrum analysis. In particular, the instantaneous bandwidth and the derivative of the instantaneous frequency prove to be valuable parameters in estimating modulation type. This method performed extremely well for input carrier-to-noise ratios as low as 15 dB. Additionally, since the autoregressive fit to the frequency spectrum is second order, the autoregressive polynomial coefficients and corresponding roots can be computed with closed-form expressions. Thus, the method is computationally efficient.
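The closed-form computation mentioned at the end of this abstract can be sketched for a generic second-order autoregressive model. The sketch assumes a real-coefficient model A(z) = 1 + a1 z^-1 + a2 z^-2 fitted from the autocorrelation lags r0, r1, r2 by the Yule-Walker equations, and the usual mapping from pole angle to frequency and from pole radius to bandwidth. The function names, the sampling rate, and the choice of autocorrelation-based fitting are assumptions for illustration, not details from the paper.

```python
import cmath
import math


def ar2_from_autocorr(r0, r1, r2):
    """Solve the 2x2 Yule-Walker system in closed form for a1 and a2."""
    det = r0 * r0 - r1 * r1
    a1 = -r1 * (r0 - r2) / det
    a2 = -(r0 * r2 - r1 * r1) / det
    return a1, a2


def ar2_frequency_bandwidth(a1, a2, fs):
    """Map one root of z^2 + a1 z + a2 to a frequency and bandwidth in Hz."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)       # quadratic formula
    pole = (-a1 + disc) / 2.0
    freq = abs(cmath.phase(pole)) * fs / (2.0 * math.pi)
    bandwidth = -math.log(abs(pole)) * fs / math.pi
    return freq, bandwidth


# Toy check: a damped resonance at 1 kHz with pole radius 0.95, fs = 8 kHz.
fs = 8000.0
theta = 2.0 * math.pi * 1000.0 / fs
freq, bandwidth = ar2_frequency_bandwidth(-2.0 * 0.95 * math.cos(theta),
                                          0.95 ** 2, fs)
print(freq, bandwidth)
```

No iterative root finder is needed at this model order, which is consistent with the abstract's remark on computational efficiency.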
International Conference on Acoustics, Speech, and Signal Processing | 1999
William M. Campbell; Khaled T. Assaleh
Modern speaker verification applications require high accuracy at low complexity. We propose the use of a polynomial-based classifier to achieve this objective. We demonstrate a new combination of techniques which makes polynomial classification accurate and powerful for speaker verification. We show that discriminative training of polynomial classifiers can be performed on large data sets. A prior probability compensation method is detailed which increases accuracy and normalizes the output score range. Results are given for the application of the new methods to YOHO.
International Conference on Acoustics, Speech, and Signal Processing | 1994
Khaled T. Assaleh; Richard J. Mammone
In this paper we introduce a new set of features that provides improved performance for speaker identification. This feature set is referred to as the adaptive component weighting (ACW) cepstral coefficients. The ACW scheme modifies the linear predictive (LP) spectral components (resonances) so as to emphasize the formant structure by attenuating the broad-bandwidth spectral components. Such components are found to introduce undesired variability in the LP spectra of speech signals due to environmental factors. The ACW cepstral coefficients represent an adaptively weighted version of the LP cepstrum. The adaptation results in deemphasizing the irrelevant variations of the LP cepstral coefficients on a frame-by-frame basis. Experiments are presented using the San Diego portion of the King database. The ACW cepstrum is shown to offer improved speaker identification performance as compared to other common methods of cepstral weighting.
Information Sciences, Signal Processing and their Applications | 1999
William M. Campbell; Khaled T. Assaleh; C. C. Brown
Low-complexity technology is a natural focal point for today's portable devices. Low complexity increases battery life and facilitates implementation on small-memory platforms. We propose the use of polynomial classifiers for portable speech recognition implementation. Polynomial classifiers have a simple architecture which fits well with modern DSP chips. We demonstrate several novel methods for implementing speech recognition. We describe a novel training technique. We also show a technique for fast scoring and prior normalization. We illustrate some of the properties of the methods on a command-and-control database.
International Conference on Acoustics, Speech, and Signal Processing | 1993
Khaled T. Assaleh; Richard J. Mammone; Mazin Rahim; James L. Flanagan
A set of features and a classification metric based on the modulation model (MM) are presented for speech recognition. The MM decomposes speech into amplitude- and frequency-modulated components. Computer simulations are presented that demonstrate the robustness to aging of the MM-based features when these features are employed for speech recognition.
Visual Communications and Image Processing | 1991
Khaled T. Assaleh; Yehoshua Y. Zeevi; Izidor Gertner
A stable Gabor-type representation of an image requires that the Zak transform (ZT) of the reference function does not vanish over the fundamental cube. We prove that the discrete ZT of any symmetric set of reference data points has a zero. To overcome the computational problem, which is due to the zero plane generated by the ZT of the Gaussian reference function, the Gaussian is translated by a sub-pixel distance. We show that the absolute value of the minimum of the ZT of the Gaussian is a function of the sub-pixel distance of translation and that the optimum value of such translation is 1/2 pixel.
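The effect of the sub-pixel translation can be illustrated in one dimension. The sketch below assumes a common definition of the discrete Zak transform, Z[n][k] = sum_l f[n + l K] exp(-2 pi i k l / M) for a sequence of length K M, and a periodic Gaussian exp(-pi t^2) sampled K times per unit of t. The paper treats images, so this 1-D version, the sizes K = M = 8, and all function names are simplifying assumptions.

```python
import cmath
import math


def discrete_zak(f, K):
    """Discrete Zak transform of f (length K*M), returned as a K x M table."""
    M = len(f) // K
    return [[sum(f[n + l * K] * cmath.exp(-2j * math.pi * k * l / M)
                 for l in range(M))
             for k in range(M)]
            for n in range(K)]


def sampled_gaussian(K, M, shift):
    """Periodic Gaussian with K samples per unit, centred at `shift` samples."""
    N = K * M
    samples = []
    for m in range(N):
        d = (m - shift + N / 2) % N - N / 2     # circular distance to centre
        samples.append(math.exp(-math.pi * (d / K) ** 2))
    return samples


def min_abs_zak(K, M, shift):
    """Smallest magnitude of the Zak transform over the whole (n, k) grid."""
    table = discrete_zak(sampled_gaussian(K, M, shift), K)
    return min(abs(v) for row in table for v in row)


# Compare the symmetric sampling with quarter- and half-sample translations.
for shift in (0.0, 0.25, 0.5):
    print(shift, min_abs_zak(8, 8, shift))
```

With no translation the samples are symmetric about a grid point and the minimum magnitude collapses to rounding error, while a translation moves the sampling grid away from the zero of the transform.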
SPIE's 1994 International Symposium on Optics, Imaging, and Instrumentation | 1994
Khaled T. Assaleh; Kevin R. Farrell; Mihailo S. Zilovic; Manish Sharma; Devang Naik; Richard J. Mammone
A new system is presented for text-dependent speaker verification. The system uses data fusion concepts to combine the results of distortion-based and discriminant-based classifiers. Hence, both intraspeaker and interspeaker information are utilized in the final decision. The distortion-based and discriminant-based classifiers used are dynamic time warping and the neural tree network, respectively. The system is evaluated with several hundred one-word utterances collected over a telephone channel. All handsets considered in this experiment use electret microphones. The new system is found to perform exceptionally well for this task. A second experiment uses handsets having both electret and carbon-button microphones. Here, a channel detection scheme is proposed that improves performance under these conditions.