Publication


Featured research published by Amir J. Jagharghi.


Journal of the Acoustical Society of America | 1993

Spectral‐shape features versus formants as acoustic correlates for vowels

Stephen A. Zahorian; Amir J. Jagharghi

The first three formants, i.e., the first three spectral prominences of the short-time magnitude spectra, have been the most commonly used acoustic cues for vowels ever since the work of Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)]. However, spectral shape features, which encode the global smoothed spectrum, provide a more complete spectral description, and therefore might be even better acoustic correlates for vowels. In this study, automatic vowel classification experiments were used to compare formants and spectral-shape features for monophthongal vowels spoken in the context of isolated CVC words, under a variety of conditions. The roles of static and time-varying information for vowel discrimination were also compared. Spectral shape was encoded using the coefficients in a cosine expansion of the nonlinearly scaled magnitude spectrum. Under almost all conditions investigated, in the absence of fundamental frequency (F0) information, automatic vowel classification based on spectral-shape features was superior to that based on formants. If F0 was used as an additional feature, vowel classification based on spectral-shape features was still superior to that based on formants, but the differences between the two feature sets were reduced. It was also found that the error pattern of perceptual confusions was more closely correlated with errors in automatic classification obtained from spectral-shape features than with classification errors from formants. Therefore it is concluded that spectral-shape features are a more complete set of acoustic correlates for vowel identity than are formants. In comparing static and time-varying features, static features were the most important for vowel discrimination, but feature trajectories were valuable secondary sources of information.
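
As a rough illustration of the spectral-shape representation described above, the sketch below computes low-order cosine-expansion coefficients of a nonlinearly scaled log magnitude spectrum. The mel-style warping, the Hamming window, and the choice of 10 coefficients are illustrative assumptions, not parameters taken from the paper.

```python
# Minimal sketch: spectral-shape features as cosine-expansion coefficients of a
# nonlinearly scaled log magnitude spectrum (not the authors' implementation).
import numpy as np

def spectral_shape_features(frame, sr, n_coeffs=10, n_warped=64):
    """Cosine-expansion coefficients of the warped log-magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)

    # Nonlinear frequency scaling: resample the log spectrum on a grid that is
    # uniform on a mel-style warped axis (an assumed choice of warping).
    mel = 2595.0 * np.log10(1.0 + freqs / 700.0)
    mel_grid = np.linspace(mel[0], mel[-1], n_warped)
    warped = np.interp(mel_grid, mel, np.log(spectrum + 1e-10))

    # Cosine expansion: coefficient k is the projection of the warped spectrum
    # onto cos(pi * k * (n + 0.5) / N), i.e., a DCT-II-style basis.
    n = np.arange(n_warped)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), n + 0.5) / n_warped)
    return basis @ warped
```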


International Conference on Acoustics, Speech, and Signal Processing | 1991

Acoustic-phonetic transformations for improved speaker-independent isolated word recognition

Stephen A. Zahorian; D. Qian; Amir J. Jagharghi

The authors present a method for improving HMM (hidden Markov model) phonetic discrimination capability: a linear discriminant transform of acoustic features to a continuous-valued feature space in which phonetic distinctions correlate closely with Euclidean distance. Experimental testing with a 30-word, single-syllable, highly confusable vocabulary showed that the acoustic-phonetic transform could be used to reduce word error rates by approximately 25%. In general, results based on the LDA2 transform, i.e., linear discriminant analysis with whitening of the within-class covariance matrices, are superior to those obtained with LDA1, linear discriminant analysis without whitening. Recognition results also improve if a block transform of several frames per block is used rather than a transform based on one frame per block.
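
The LDA1/LDA2 distinction in the abstract (linear discriminant analysis without and with whitening of the within-class covariance) can be pictured roughly as below. This is a generic discriminant-analysis recipe under my own reading, not code from the paper, and the variable names are illustrative.

```python
# Rough sketch of a discriminant transform with ("LDA2"-style) and without
# ("LDA1"-style) whitening of the within-class covariance. Generic recipe only.
import numpy as np

def lda_transform(X, y, n_dims, whiten=True):
    """X: (n_samples, n_features), y: integer class labels. Returns a projection matrix."""
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))   # within-class scatter
    Sb = np.zeros_like(Sw)                    # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - grand_mean, mc - grand_mean)

    if whiten:
        # Whiten the within-class scatter first, so Euclidean distance in the
        # output space reflects class separability, then diagonalize Sb there.
        evals, evecs = np.linalg.eigh(Sw)
        W_white = evecs / np.sqrt(np.maximum(evals, 1e-10))
        eb, Vb = np.linalg.eigh(W_white.T @ Sb @ W_white)
        return W_white @ Vb[:, np.argsort(eb)[::-1][:n_dims]]

    # Without whitening: leading eigenvectors of Sw^{-1} Sb (Fisher criterion),
    # spanning the same subspace but without the distance-normalizing scaling.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    return evecs[:, np.argsort(evals.real)[::-1][:n_dims]].real
```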


IEEE Transactions on Signal Processing | 1992

Minimum mean-square error transformations of categorical data to target positions

Stephen A. Zahorian; Amir J. Jagharghi

A new algorithm is described for transforming multidimensional data such that all the data points in each of several predefined categories map toward a category target position in the transformed space. The procedure is based on minimizing the mean-square error between specified category target positions and actual transformed locations of the data. Least squares estimation techniques are used to derive linear equations for computing the transformation coefficients and for determining an origin offset in the transformed space. However, for additional flexibility in the transformation, a method is presented for combining the linear transformation with a nonlinear connectionist network transformation. This procedure can, among other things, be used as a tool to evaluate the precision with which physical measurements of psychophysical stimuli correlate with the perceptual configuration of those stimuli. Potential speech science applications are identified. Experimental results illustrate some of these applications with vowel data.
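
A minimal sketch of the least-squares formulation described above, assuming the transform is estimated by ordinary least squares on an augmented design matrix (variable names are mine); the nonlinear connectionist extension mentioned in the abstract is not shown.

```python
# Minimal sketch (assumed formulation): least-squares estimate of a linear
# transform plus origin offset that maps each point toward its category's
# target position. Variable names are illustrative, not from the paper.
import numpy as np

def fit_target_transform(X, labels, targets):
    """
    X:       (n_samples, n_features) data in the original measurement space
    labels:  (n_samples,) integer category index per sample
    targets: (n_categories, n_target_dims) desired position for each category
    Returns (A, b) so that transformed points are X @ A + b.
    """
    T = targets[labels]                            # desired output per sample
    X_aug = np.hstack([X, np.ones((len(X), 1))])   # extra column for the offset
    W, *_ = np.linalg.lstsq(X_aug, T, rcond=None)  # minimizes ||X_aug W - T||^2
    return W[:-1], W[-1]                           # linear part A, origin offset b
```

A transformed point could then be scored, for example, by its Euclidean distance to each category target, which is one way the mapping supports the evaluation role described in the abstract.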


Journal of the Acoustical Society of America | 1986

Matching of “physical” and “perceptual” spaces for vowels

Stephen A. Zahorian; Amir J. Jagharghi

An algorithm for matching physical and perceptual spaces for psychological stimuli will be described. Target points for each stimulus class must be chosen in a multidimensional perceptual space. The physical space consists of a multidimensional measurement space, in which measurements are made of each stimulus for a large number of subjects. A linear transformation from the measurement space to the perceptual space is determined such that the mean square distance between target points and transformed measurement points is minimized. There is no requirement that the dimensionality of the measurement and perceptual spaces be the same. Thus the algorithm can be used to redefine the measurement space with fewer dimensions such that the correspondence with predefined stimulus categories is maximized. This procedure has been tested using vowels spoken in an /hVd/ context, six principal components for measurement parameters, and a three‐dimensional perceptual space. Target positions in the perceptual space were ...


Journal of the Acoustical Society of America | 1987

Speaker‐independent automatic vowel recognition based on overall spectral shape versus formants

Stephen A. Zahorian; Amir J. Jagharghi

Automatic recognition experiments were performed to compare overall spectral shape versus formants as speaker‐independent acoustic parameters for vowel identity. Stimuli consisted of four repetitions of 11 vowels spoken by 17 female speakers and 12 male speakers (29*11*4 = 1276 total stimuli). Formants were computed automatically by peak picking of 12th‐order LP model spectra. Spectral shape was represented using three methods: (1) by a cosine basis vector expansion of the power spectrum; (2) as the output of a 16‐channel, 1/3‐oct filter bank; and (3) as the output of a 16‐channel mel‐spaced filter bank. Automatic recognition was based on maximum likelihood estimation in a multidimensional space. For all cases considered, the representations based on spectral shape resulted in significantly higher recognition accuracy than for recognition based on only three formants. For example, using the entire database of all speakers and 11 vowels, recognition based on spectral shape was about 85% vs 69% for three for...
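
The "maximum likelihood estimation in a multidimensional space" step can be pictured as a per-class Gaussian classifier; the sketch below is a generic stand-in under that assumption, with the feature extraction (spectral shape or formants) assumed to be done elsewhere.

```python
# Illustrative sketch: maximum-likelihood classification with one multivariate
# Gaussian per vowel class. A generic stand-in, not the study's exact classifier.
import numpy as np

class GaussianMLClassifier:
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = [X[y == c].mean(axis=0) for c in self.classes_]
        self.covs_ = [np.cov(X[y == c], rowvar=False) for c in self.classes_]
        return self

    def predict(self, X):
        scores = []
        for mu, cov in zip(self.means_, self.covs_):
            inv = np.linalg.inv(cov)
            _, logdet = np.linalg.slogdet(cov)
            diff = X - mu
            # Log-likelihood up to a constant: -0.5 * (log|C| + d^T C^{-1} d)
            scores.append(-0.5 * (logdet + np.einsum('ij,jk,ik->i', diff, inv, diff)))
        return self.classes_[np.argmax(np.stack(scores, axis=1), axis=1)]
```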


Journal of the Acoustical Society of America | 1990

Vowel perception: Spectral shape versus formants

Amir J. Jagharghi; Stephen A. Zahorian

Traditional theories of vowel perception favor formants over global spectral shape as the primary perceptual cues to vowel identity. In previous ASA meetings, results of speaker‐independent automatic recognition experiments for vowels were reported that contrasted global spectral shape versus formants [A. J. Jagharghi and S. A. Zahorian, J. Acoust. Soc. Am. Suppl. 1 81, S18 (1987); S. A. Zahorian and A. J. Jagharghi, J. Acoust. Soc. Am. Suppl. 1 82, S37 (1987)]. These results indicate that automatic recognition rates based on global spectral shape are generally slightly superior to recognition rates based on formants. In the present study, the perception of vowels is investigated for vowels synthesized such that the synthesized tokens contain conflicting cues to vowel identity based on overall spectral shape versus formants. Two distinct but close vowels are selected. The spectral shape of the first vowel is modified to match, to the extent possible, the spectral shape of the second vowel without any chan...


Journal of the Acoustical Society of America | 1989

Linear transformations for vowel normalization

Stephen A. Zahorian; Amir J. Jagharghi

The results of an evaluation of multivariable linear regression techniques for speaker normalization of vowel data for 11 vowel classes will be presented. The database for the study consisted of the central vowel portions of 2922 CVC syllables obtained from ten males, ten females, and ten children. Each stimulus was represented both by three formants and in terms of overall spectral shape, via the discrete cosine transform coefficients (DCTCs) of the magnitude spectra. In all classification experiments, half the database was used to train the classifier and the other half was used for evaluation. For the case of formants, the classification accuracy on the evaluation data was 63.3% if different speakers were used for training and testing (and thus no speaker normalization), 63.9% if the same speakers were used in the training and test sets but without explicit normalization, and 75.2% with speaker‐normalized data. The rates for the corresponding conditions, but with DCTCs as parameters, were 58.0%, 66.1%,...
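
One plausible reading of the multivariable-linear-regression normalization is sketched below: for each speaker, an affine map is fit by least squares that moves that speaker's vowel tokens toward the all-speaker class means. This is a hedged reconstruction; the study's exact regression setup may differ.

```python
# Hedged sketch of per-speaker linear-regression normalization: fit an affine
# map per speaker toward the pooled class means. Not the study's exact method.
import numpy as np

def speaker_normalize(X, vowel_labels, speaker_ids):
    classes = np.unique(vowel_labels)
    pooled_means = {c: X[vowel_labels == c].mean(axis=0) for c in classes}
    X_norm = np.empty_like(X)
    for s in np.unique(speaker_ids):
        idx = speaker_ids == s
        T = np.stack([pooled_means[c] for c in vowel_labels[idx]])  # regression targets
        A = np.hstack([X[idx], np.ones((idx.sum(), 1))])            # affine design matrix
        W, *_ = np.linalg.lstsq(A, T, rcond=None)
        X_norm[idx] = A @ W
    return X_norm
```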


Journal of the Acoustical Society of America | 1987

Vowel identification: Are formants really necessary?

Amir J. Jagharghi; Stephen A. Zahorian

It has generally been assumed at least since the time of the comprehensive study by Peterson and Barney [J. Acoust. Soc. Am. 24, 175–184 (1952)] that the formant locations in vowel spectra are the most significant cues to vowel identity. In this experiment vowel spectra were represented by two methods: (A) by the locations of the first three formants, and (B) by the overall smoothed spectral shape in terms of a discrete cosine transform of the power spectra. Stimuli consisted of four repetitions of the widely separated vowels /u/, /i/, /a/, spoken by each of 12 female and 12 male speakers (4⋅24⋅3 = 288 stimuli total). For each of the two spectral encoding methods, A and B, the vowel data were projected to a three‐dimensional space such that the vowel categories would be well separated and the vowels within each category well clustered [S. A. Zahorian and A. J. Jagharghi, J. Acoust. Soc. Am. Suppl. 1 79, S8 (1986)]. Significantly better clustering was obtained with method B, based on overall spectral shape...


Journal of the Acoustical Society of America | 1985

Color display of vowel spectra as a speech training aid for the deaf

Stephen A. Zahorian; Amir J. Jagharghi

In this paper, the development and use of a vowel to color converter as a speech training aid for the deaf will be discussed. The vowel spectra are encoded in terms of six spectral shape factors called spectral principal components. The basis vectors used to compute the principal‐components parameters have been optimized for use with the 16 bandpass filters which are the first processing stage of the speech training aid. Measurements based on vowels spoken in an /hVd/ context illustrate the clustering of vowel spectra in this parameter space. A linear transformation is used to convert the spectral parameters to control the red, green, and blue inputs of a color CRT display. The transformation is designed to maximize the distance between the three widely separated vowels /a/, /u/, and /i/. A “flow‐mode” display is used. The results of preliminary training experiments to aid in vowel articulation will be presented. [Work supported by the Whitaker Foundation.]
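
A toy sketch of the vowel-to-color idea: a linear map from the spectral principal-component parameters to (R, G, B), fit here by least squares so that the three anchor vowels land on widely separated colors, then clipped to the displayable range. The anchor colors and the fitting procedure are illustrative assumptions, not details from the paper.

```python
# Toy sketch only: linear map from spectral principal components to RGB, with
# assumed anchor colors for /a/, /u/, /i/; not the aid's actual transformation.
import numpy as np

ANCHOR_RGB = {'a': (1.0, 0.0, 0.0),   # red   (assumed)
              'u': (0.0, 0.0, 1.0),   # blue  (assumed)
              'i': (0.0, 1.0, 0.0)}   # green (assumed)

def fit_color_map(pc_features, vowel_labels):
    """Least-squares affine map from principal-component features to RGB anchors."""
    targets = np.array([ANCHOR_RGB[v] for v in vowel_labels])
    A = np.hstack([pc_features, np.ones((len(pc_features), 1))])
    W, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return W

def to_rgb(pc_frame, W):
    rgb = np.append(pc_frame, 1.0) @ W
    return np.clip(rgb, 0.0, 1.0)   # keep within the display's RGB range
```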


Journal of the Acoustical Society of America | 1991

Speaker normalization of static and dynamic vowel spectral features

Stephen A. Zahorian; Amir J. Jagharghi

Collaboration


Dive into Amir J. Jagharghi's collaborations.

Top Co-Authors

D. Qian

Old Dominion University
