Publication


Featured research published by Zaki B. Nossair.


Journal of the Acoustical Society of America | 1991

Dynamic spectral shape features as acoustic correlates for initial stop consonants

Zaki B. Nossair; Stephen A. Zahorian

A comprehensive investigation of two acoustic feature sets for English stop consonants spoken in syllable initial position was conducted to determine the relative invariance of the features that cue place and voicing. The features evaluated were overall spectral shape, encoded as the cosine transform coefficients of the nonlinearly scaled amplitude spectrum, and formants. In addition, features were computed both for the static case, i.e., from one 25-ms frame starting at the burst, and for the dynamic case, i.e., as parameter trajectories over several frames of speech data. All features were evaluated with speaker-independent automatic classification experiments using the data from 15 speakers to train the classifier and the data from 15 different speakers for testing. The primary conclusions from these experiments, as measured via automatic recognition rates, are as follows: (1) spectral shape features are superior to both formants and formants plus amplitudes; (2) features extracted from the dynamic sp...
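
A minimal sketch of the spectral-shape encoding described above: cosine-transform coefficients of a nonlinearly scaled amplitude spectrum. The Bark-style warp, bin count, and coefficient count are illustrative assumptions, not the paper's exact settings.

```python
# Spectral-shape features: cosine-transform coefficients of a warped
# log-amplitude spectrum. The warp and scaling here are assumed stand-ins.
import numpy as np

def spectral_shape_features(frame, sample_rate, num_coeffs=10, num_bins=64):
    """Cosine-transform coefficients of a nonlinearly scaled spectrum."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Nonlinear (Bark-like) frequency warp, then resample onto a uniform
    # grid along the warped axis.
    bark = 6.0 * np.arcsinh(freqs / 600.0)
    warped_axis = np.linspace(bark[0], bark[-1], num_bins)
    warped = np.interp(warped_axis, bark, spectrum)

    # Log amplitude scaling, then a cosine (DCT-II style) basis.
    log_spec = np.log(warped + 1e-8)
    n = np.arange(num_bins)
    basis = np.cos(np.pi * np.outer(np.arange(num_coeffs), n + 0.5) / num_bins)
    return basis @ log_spec

# Static case: one 25-ms frame at the burst. Dynamic case: stack the
# coefficients of several successive frames into a trajectory.
rate = 16000
frame = np.random.randn(int(0.025 * rate))     # placeholder for real speech
print(spectral_shape_features(frame, rate).shape)   # (10,)
```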


IEEE Transactions on Speech and Audio Processing | 1999

A partitioned neural network approach for vowel classification using smoothed time/frequency features

Stephen A. Zahorian; Zaki B. Nossair

A novel pattern classification technique and a new feature extraction method are described and tested for vowel classification. The pattern classification technique partitions an N-way classification task into N*(N-1)/2 two-way classification tasks. Each two-way classification task is performed using a neural network classifier that is trained to discriminate the two members of one pair of categories. Multiple two-way classification decisions are then combined to form an N-way decision. Some of the advantages of the new classification approach include the partitioning of the task allowing independent feature and classifier optimization for each pair of categories, lowered sensitivity of classification performance to network parameters, a reduction in the amount of training data required, and potential for superior performance relative to a single large network. The features described in this paper, closely related to the cepstral coefficients and delta cepstra commonly used in speech analysis, are developed using a unified mathematical framework which allows arbitrary nonlinear frequency, amplitude, and time scales to compactly represent the spectral/temporal characteristics of speech. This classification approach, combined with a feature ranking algorithm which selected the 35 most discriminative spectral/temporal features for each vowel pair, resulted in 71.5% accuracy for classification of 16 vowels extracted from the TIMIT database. These results, significantly higher than other published results for the same task, illustrate the potential for the methods presented in this paper.
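
A minimal sketch of the partitioned classification structure: one binary classifier per unordered pair of classes, with the two-way decisions combined by majority vote. Simple logistic units stand in for the paper's pairwise neural networks; the toy data and training settings are assumptions.

```python
# Pairwise partitioning of an N-way task into N*(N-1)/2 two-way tasks,
# combined by voting. Logistic units stand in for the per-pair networks.
import numpy as np
from itertools import combinations

def train_logistic(X, y, lr=0.1, epochs=200):
    """Tiny two-class logistic classifier (stand-in for one pairwise net)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w += lr * Xb.T @ (y - p) / len(X)       # gradient ascent step
    return w

def train_pairwise(X, labels, num_classes):
    """One binary classifier per unordered pair of classes."""
    models = {}
    for a, b in combinations(range(num_classes), 2):
        mask = (labels == a) | (labels == b)
        models[(a, b)] = train_logistic(X[mask],
                                        (labels[mask] == b).astype(float))
    return models

def predict_pairwise(models, x, num_classes):
    """Combine the two-way decisions into an N-way decision by voting."""
    votes = np.zeros(num_classes)
    xb = np.append(x, 1.0)
    for (a, b), w in models.items():
        votes[b if xb @ w > 0 else a] += 1
    return int(np.argmax(votes))

# Toy demo with three Gaussian classes in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, (50, 2)) for c in ([0, 0], [2, 0], [0, 2])])
y = np.repeat([0, 1, 2], 50)
models = train_pairwise(X, y, 3)
print(predict_pairwise(models, np.array([1.9, 0.1]), 3))   # expect class 1
```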


International Conference on Acoustics, Speech, and Signal Processing | 1995

Signal modeling enhancements for automatic speech recognition

Zaki B. Nossair; Peter L. Silsbee; Stephen A. Zahorian

Experiments in modeling speech signals for phoneme classification are described. Enhancements to standard speech processing methods include basis vector representations of dynamic feature trajectories, morphological smoothing (dilation) of spectral features, and the use of many closely spaced, short analysis windows. Experiments using the TIMIT database yielded up to 71.0% correct classification of 16 presegmented vowels in a noise-free environment, and 54.5% correct classification at a 10 dB signal-to-noise ratio.
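
One of the named enhancements, morphological smoothing of spectral features by dilation, can be sketched as follows. The 1-D per-frame form and the structuring-element width are assumptions; the abstract does not give the exact configuration.

```python
# Morphological smoothing by grayscale dilation: each spectral value is
# replaced by the maximum over a small neighborhood, filling narrow dips.
import numpy as np

def dilate_spectrum(spectrum, width=3):
    """Dilation of one spectral frame with a flat structuring element."""
    half = width // 2
    padded = np.pad(spectrum, half, mode="edge")
    return np.array([padded[i:i + width].max()
                     for i in range(len(spectrum))])

spec = np.array([0.1, 0.9, 0.2, 0.8, 0.1, 0.7])
print(dilate_spectrum(spec))   # narrow valleys between peaks are raised
```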


IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis | 1994

A warped time-frequency expansion for speech signal representation

Peter L. Silsbee; Stephen A. Zahorian; Zaki B. Nossair

A novel representation for speech signals is proposed. The time-varying frequency content of a speech segment is represented as a weighted sum of two-dimensional basis vectors; these incorporate both frequency warping and frequency-dependent time warping. The representation is quite flexible; for example, any arbitrary time or frequency warping function can easily be implemented, and any time-frequency representation can be used as the starting point. Examples are presented which demonstrate desirable characteristics of the representation: (1) explicit quantification of parameter trajectories, (2) time resolution which varies with respect to time and frequency, and (3) the ability to reconstruct a time-frequency plot which reflects the resolution characteristics of the representation.
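
A minimal sketch of the weighted-sum-of-2-D-basis-vectors structure, including the reconstruction the abstract lists as item (3). The particular warping functions and basis orders below are illustrative assumptions; separable cosine bases on warped axes stand in for the paper's basis set.

```python
# Warped time-frequency expansion: a spectrogram approximated as a weighted
# sum of separable 2-D cosine basis vectors on warped time/frequency axes.
import numpy as np

def cosine_basis(warped_axis, order):
    """Cosine basis sampled on a (possibly nonuniform) axis in [0, 1]."""
    return np.cos(np.pi * np.outer(np.arange(order), warped_axis))

def warped_expansion(spectrogram, t_order=5, f_order=8):
    T, F = spectrogram.shape
    t_warp = np.linspace(0, 1, T) ** 0.8                  # assumed time warp
    f_warp = np.log1p(np.arange(F)) / np.log1p(F - 1)     # assumed freq warp
    Bt = cosine_basis(t_warp, t_order)                    # (t_order, T)
    Bf = cosine_basis(f_warp, f_order)                    # (f_order, F)
    # Least-squares weights of the separable 2-D basis vectors.
    coeffs = np.linalg.pinv(Bt.T) @ spectrogram @ np.linalg.pinv(Bf)
    recon = Bt.T @ coeffs @ Bf       # reconstructed (smoothed) t-f plot
    return coeffs, recon

S = np.abs(np.random.randn(40, 64))   # placeholder spectrogram
coeffs, recon = warped_expansion(S)
print(coeffs.shape, recon.shape)      # (5, 8) (40, 64)
```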


International Conference on Acoustics, Speech, and Signal Processing | 1990

Dynamic spectral shape features for speaker-independent automatic recognition of stop consonants

Stephen A. Zahorian; Zaki B. Nossair

Several acoustic feature sets and automatic classifiers were investigated to determine a combination of features and classifiers which would permit accurate bottom-up speaker- and vowel-independent automatic recognition of initial stop consonants in English. The features evaluated included a form of cepstral coefficients and formants, each computed both for one static frame and as spectral trajectories over various segments of the speech signal. The classifiers investigated included Bayesian maximum-likelihood (BML), artificial neural network (NN), and K-nearest-neighbor (KNN) classifiers. The most accurate results, over 93% of the six stops correctly identified with a speaker-independent classifier, were obtained with the BML classifier using cepstral coefficient trajectories as a 20-dimensional feature vector. These results for stop recognition are higher than any results previously reported for a database of similar diversity.
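
A minimal sketch of a Bayesian maximum-likelihood classifier of the kind named above: one Gaussian per class over the feature vectors, with classification by highest log-likelihood. Feature extraction (the 20-dimensional cepstral trajectories) is assumed done elsewhere; the toy data are illustrative.

```python
# Bayesian maximum-likelihood (Gaussian) classification: fit one Gaussian
# per class, classify by the largest class log-likelihood.
import numpy as np

class GaussianML:
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.params = {}
        for c in self.classes:
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            self.params[c] = (mu, np.linalg.inv(cov),
                              np.linalg.slogdet(cov)[1])
        return self

    def predict(self, X):
        scores = []
        for c in self.classes:
            mu, inv_cov, logdet = self.params[c]
            d = X - mu
            # Gaussian log-likelihood up to a shared constant.
            scores.append(-0.5 * (np.sum(d @ inv_cov * d, axis=1) + logdet))
        return self.classes[np.argmax(scores, axis=0)]

# Toy demo: two well-separated classes of 20-dimensional vectors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 1.0, (100, 20)) for m in (0.0, 3.0)])
y = np.repeat([0, 1], 100)
print((GaussianML().fit(X, y).predict(X) == y).mean())   # near 1.0
```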


IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis | 1994

Smoothed time/frequency features for vowel classification

Zaki B. Nossair; Stephen A. Zahorian

A novel signal modeling technique is described to compute smoothed time-frequency features for encoding speech information. These time-frequency features compactly and accurately model phonetic information, while accounting for the main effects of contextual variations. These segment-level features are computed such that more emphasis is given to the center of the segment and less to the end regions. For phonetic classification, the features are relatively insensitive to both time and frequency resolution, at least insofar as changes in window length and frame spacing are concerned. A 60-dimensional feature space based on this modeling technique resulted in 70.9% accuracy for classification of 16 vowels extracted from the TIMIT database in speaker-independent experiments. These results are higher than any other results reported in the literature for the same task.
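
A sketch of the center-emphasis idea: each coefficient trajectory over the segment is fit with a few cosine terms under a weight that is largest at mid-segment. The raised-cosine weight and basis order are illustrative assumptions; the abstract does not specify the kernel.

```python
# Segment-level trajectory features with center emphasis: weighted
# least-squares cosine fit of one coefficient trajectory.
import numpy as np

def trajectory_features(traj, order=3):
    """Center-weighted cosine-basis fit of one coefficient trajectory."""
    T = len(traj)
    t = np.linspace(0, 1, T)
    weights = 0.5 - 0.5 * np.cos(2 * np.pi * t)     # peaks at segment center
    basis = np.cos(np.pi * np.outer(np.arange(order), t))   # (order, T)
    # Normal equations of min_c || W^(1/2) (basis.T @ c - traj) ||^2.
    A = basis @ np.diag(weights) @ basis.T
    b = basis @ (weights * traj)
    return np.linalg.solve(A, b)

traj = np.sin(np.linspace(0, 2, 30))   # placeholder coefficient trajectory
print(trajectory_features(traj))        # 3 smoothed time/frequency terms
```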


Journal of the Acoustical Society of America | 1989

Spectral shape factors for speaker‐independent automatic recognition of stop consonants

Zaki B. Nossair; Stephen A. Zahorian

A series of automatic recognition experiments was conducted with naturally produced English stop consonants /b,p,d,t,g,k/ in syllable initial position. The objective of the experiments was to investigate in detail the effectiveness of spectral shape factors, as both dynamic and static features, for automatic recognition of stop consonants. Spectral shape factors were computed as the discrete cosine transform coefficients (DCTCs) of the magnitude spectra. The database used in these experiments consisted of 2481 CVC syllables spoken in isolation by ten males, ten females, and ten children. In all experiments, 15 speakers were used to train the classifier and the other 15 speakers were used for evaluation. For the case of dynamic features, DCTCs were computed over a 50‐ms interval beginning with the burst using 7‐ms frames spaced every 5 ms. For the static case, the DCTCs were computed from one 25.6‐ms frame beginning at the burst. Automatic classification results, based on the test data, were 87.2% for the...


Journal of the Acoustical Society of America | 1987

Evidence against acoustic invariance in initial voiced stop consonants

Stephen A. Zahorian; Zaki B. Nossair; Robert F. Coleman

A perceptual experiment was conducted with naturally spoken initial stop consonants in order to test recent claims for acoustic invariance in the initial portions of stop consonants [for example, S. E. Blumstein and K. N. Stevens, J. Acoust. Soc. Am. 67, 648–662 (1980)]. The original stimuli consisted of tokens of the stops /b,d,g/ with the vowels /i,a,u/ as spoken by two male and two female speakers. A computer graphics waveform editor was used to locate the initial portions of the waveform up to the first, second, fourth, and sixth voicing pulses. Three experimental conditions were evaluated: (a) the initial segment only; (b) the initial segment plus the original steady‐state vowel; and (c) the initial segment plus an alternate steady‐state vowel. Ten listeners participated in a forced‐choice recognition experiment to determine the conditions for which perceptual cues to consonant identity are retained. Recognition accuracy was consistently highest for condition (b), lower for condition (a), and lowes...


Journal of the Acoustical Society of America | 1994

Spectral envelope features for vowel classification in clean and noisy speech

Zaki B. Nossair; Zhong‐Jian Zhang; Stephen A. Zahorian

In previous synthesis experiments with multi‐tone vowels it was found that vowel perception is more closely correlated with acoustic cues derived from the envelope of the magnitude spectrum than with cues derived from the overall magnitude spectrum [Zahorian et al., J. Acoust. Soc. Am. 93, 2298–2299 (1993)]. In the present study automatic vowel classification experiments were used to compare spectral magnitude features versus spectral envelope features. The features computed were cepstral coefficients, similar to those typically used in automatic speech recognition, versus cepstral coefficients computed from the envelope spectrum. The two feature sets were compared for four conditions: a single frame of clean speech, a single frame of noisy speech with varying signal‐to‐noise ratios, multi‐frames of clean speech, and multi‐frames of noisy speech. For clean speech conditions, the automatic classification results were nearly identical for the two feature sets. However, for both of the noisy speech condition...
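
A minimal sketch of the two feature sets compared: cepstral coefficients from the raw log-magnitude spectrum versus from its envelope. The paper's envelope estimator is not specified in the abstract; linear interpolation across local peaks of the log spectrum is an assumed stand-in.

```python
# Magnitude-spectrum cepstra vs. envelope-spectrum cepstra for one frame.
import numpy as np

def log_spectrum(frame):
    return np.log(np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) + 1e-8)

def envelope(log_spec):
    """Assumed envelope: interpolate across local maxima of the log spectrum."""
    peaks = [0] + [i for i in range(1, len(log_spec) - 1)
                   if log_spec[i] >= log_spec[i - 1]
                   and log_spec[i] >= log_spec[i + 1]] + [len(log_spec) - 1]
    return np.interp(np.arange(len(log_spec)), peaks, log_spec[peaks])

def cepstra(log_spec, num_coeffs=12):
    """Cosine-transform (cepstral) coefficients of a log spectrum."""
    n = np.arange(len(log_spec))
    basis = np.cos(np.pi * np.outer(np.arange(num_coeffs), n + 0.5)
                   / len(log_spec))
    return basis @ log_spec

rate = 16000
frame = np.random.randn(int(0.025 * rate))   # placeholder for a vowel frame
ls = log_spectrum(frame)
print(cepstra(ls)[:3], cepstra(envelope(ls))[:3])
```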


Journal of the Acoustical Society of America | 1993

Spectral shape cues for vowels that predict the perception of multiple‐tone steady‐state stimuli

Stephen A. Zahorian; Zhong‐Jiang Zhang; Zaki B. Nossair

In previous experiments for which multiple tone stimuli were synthesized such that either the formants or global spectral shape were matched to that of naturally spoken vowel tokens, it was found for both cases that vowel identity and quality were not well preserved [S. A. Zahorian and Z.‐J. Zhong, J. Acoust. Soc. Am. 92, 2414–2415 (1992)]. In the present study several additional criteria were tested for selecting the amplitudes and frequencies of sinusoids with the objective that stimuli synthesized from these sinusoids would be perceived as most similar to original ‘‘target’’ vowel tokens. Of the methods investigated, vowel quality from stimuli synthesized from N sinusoids was best preserved if these sinusoids match the N largest peaks in the magnitude spectrum of the original vowels. Depending on the vowel, between 5 and 10 sinusoids are required such that the synthesized token is perceived as sounding nearly identical to the original token. A new metric for spectral shape that yields acoustically invar...
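
A sketch of the best-performing selection criterion: pick the N largest local peaks of the magnitude spectrum and synthesize a steady-state multi-tone stimulus from them. N, the analysis window, and the amplitude normalization are illustrative assumptions.

```python
# Synthesize a vowel-like stimulus from the N largest spectral peaks.
import numpy as np

def n_largest_peaks(frame, sample_rate, n=8):
    """Frequencies and amplitudes of the n largest local spectral peaks."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    peaks = [i for i in range(1, len(spectrum) - 1)
             if spectrum[i] >= spectrum[i - 1]
             and spectrum[i] >= spectrum[i + 1]]
    top = sorted(peaks, key=lambda i: spectrum[i], reverse=True)[:n]
    return freqs[top], spectrum[top]

def synthesize(freqs, amps, duration, sample_rate):
    """Steady-state multi-tone stimulus from the selected peaks."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    return sum(a * np.cos(2 * np.pi * f * t) for f, a in zip(freqs, amps))

rate = 16000
frame = np.random.randn(int(0.025 * rate))   # placeholder for a vowel frame
f, a = n_largest_peaks(frame, rate)
tone = synthesize(f, a / a.max(), 0.3, rate)
print(tone.shape)                             # (4800,)
```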
