Publication


Featured research published by Augustine H. Gray.


Language | 1982

Linear Prediction of Speech

John D. Markel; Augustine H. Gray

1. Introduction.- 1.1 Basic Physical Principles.- 1.2 Acoustical Waveform Examples.- 1.3 Speech Analysis and Synthesis Models.- 1.4 The Linear Prediction Model.- 1.5 Organization of Book.

2. Formulations.- 2.1 Historical Perspective.- 2.2 Maximum Likelihood.- 2.3 Minimum Variance.- 2.4 Prony's Method.- 2.5 Correlation Matching.- 2.6 PARCOR (Partial Correlation).- 2.6.1 Inner Products and an Orthogonality Principle.- 2.6.2 The PARCOR Lattice Structure.

3. Solutions and Properties.- 3.1 Introduction.- 3.2 Vector Spaces and Inner Products.- 3.2.1 Filter or Polynomial Norms.- 3.2.2 Properties of Inner Products.- 3.2.3 Orthogonality Relations.- 3.3 Solution Algorithms.- 3.3.1 Correlation Matrix.- 3.3.2 Initialization.- 3.3.3 Gram-Schmidt Orthogonalization.- 3.3.4 Levinson Recursion.- 3.3.5 Updating Am(z).- 3.3.6 A Test Example.- 3.4 Matrix Forms.

4. Acoustic Tube Modeling.- 4.1 Introduction.- 4.2 Acoustic Tube Derivation.- 4.2.1 Single Section Derivation.- 4.2.2 Continuity Conditions.- 4.2.3 Boundary Conditions.- 4.3 Relationship between Acoustic Tube and Linear Prediction.- 4.4 An Algorithm, Examples, and Evaluation.- 4.4.1 An Algorithm.- 4.4.2 Examples.- 4.4.3 Evaluation of the Procedure.- 4.5 Estimation of Lip Impedance.- 4.5.1 Lip Impedance Derivation.- 4.6 Further Topics.- 4.6.1 Losses in the Acoustic Tube Model.- 4.6.2 Acoustic Tube Stability.

5. Speech Synthesis Structures.- 5.1 Introduction.- 5.2 Stability.- 5.2.1 Step-up Procedure.- 5.2.2 Step-down Procedure.- 5.2.3 Polynomial Properties.- 5.2.4 A Bound on |Fm(z)|.- 5.2.5 Necessary and Sufficient Stability Conditions.- 5.2.6 Application of Results.- 5.3 Recursive Parameter Evaluation.- 5.3.1 Inner Product Properties.- 5.3.2 Equation Summary with Program.- 5.4 A General Synthesis Structure.- 5.5 Specific Speech Synthesis Structures.- 5.5.1 The Direct Form.- 5.5.2 Two-Multiplier Lattice Model.- 5.5.3 Kelly-Lochbaum Model.- 5.5.4 One-Multiplier Models.- 5.5.5 Normalized Filter Model.- 5.5.6 A Test Example.

6. Spectral Analysis.- 6.1 Introduction.- 6.2 Spectral Properties.- 6.2.1 Zero Mean All-Pole Model.- 6.2.2 Gain Factor for Spectral Matching.- 6.2.3 Limiting Spectral Match.- 6.2.4 Non-uniform Spectral Weighting.- 6.2.5 Minimax Spectral Matching.- 6.3 A Spectral Flatness Model.- 6.3.1 A Spectral Flatness Measure.- 6.3.2 Spectral Flatness Transformations.- 6.3.3 Numerical Evaluation.- 6.3.4 Experimental Results.- 6.3.5 Driving Function Models.- 6.4 Selective Linear Prediction.- 6.4.1 Selective Linear Prediction (SLP) Algorithm.- 6.4.2 A Selective Linear Prediction Program.- 6.4.3 Computational Considerations.- 6.5 Considerations in Choice of Analysis Conditions.- 6.5.1 Choice of Method.- 6.5.2 Sampling Rates.- 6.5.3 Order of Filter.- 6.5.4 Choice of Analysis Interval.- 6.5.5 Windowing.- 6.5.6 Pre-emphasis.- 6.6 Spectral Evaluation Techniques.- 6.7 Pole Enhancement.

7. Automatic Formant Trajectory Estimation.- 7.1 Introduction.- 7.2 Formant Trajectory Estimation Procedure.- 7.2.1 Introduction.- 7.2.2 Raw Data from A(z).- 7.2.3 Examples of Raw Data.- 7.3 Comparison of Raw Data from Linear Prediction and Cepstral Smoothing.- 7.4 Algorithm 1.- 7.5 Algorithm 2.- 7.5.1 Definition of Anchor Points.- 7.5.2 Processing of Each Voiced Segment.- 7.5.3 Final Smoothing.- 7.5.4 Results and Discussion.- 7.6 Formant Estimation Accuracy.- 7.6.1 An Example of Synthetic Speech Analysis.- 7.6.2 An Example of Real Speech Analysis.- 7.6.3 Influence of Voice Periodicity.

8. Fundamental Frequency Estimation.- 8.1 Introduction.- 8.2 Preprocessing by Spectral Flattening.- 8.2.1 Analysis of Voiced Speech with Spectral Regularity.- 8.2.2 Analysis of Voiced Speech with Spectral Irregularities.- 8.2.3 The STREAK Algorithm.- 8.3 Correlation Techniques.- 8.3.1 Autocorrelation Analysis.- 8.3.2 Modified Autocorrelation Analysis.- 8.3.3 Filtered Error Signal Autocorrelation Analysis.- 8.3.4 Practical Considerations.- 8.3.5 The SIFT Algorithm.

9. Computational Considerations in Analysis.- 9.1 Introduction.- 9.2 Ill-Conditioning.- 9.2.1 A Measure of Ill-Conditioning.- 9.2.2 Pre-emphasis of Speech Data.- 9.2.3 Prefiltering before Sampling.- 9.3 Implementing Linear Prediction Analysis.- 9.3.1 Autocorrelation Method.- 9.3.2 Covariance Method.- 9.3.3 Computational Comparison.- 9.4 Finite Word Length Considerations.- 9.4.1 Finite Word Length Coefficient Computation.- 9.4.2 Finite Word Length Solution of Equations.- 9.4.3 Overall Finite Word Length Implementation.

10. Vocoders.- 10.1 Introduction.- 10.2 Techniques.- 10.2.1 Coefficient Transformations.- 10.2.2 Encoding and Decoding.- 10.2.3 Variable Frame Rate Transmission.- 10.2.4 Excitation and Synthesis Gain Matching.- 10.2.5 A Linear Prediction Synthesizer Program.- 10.3 Low Bit Rate Pitch Excited Vocoders.- 10.3.1 Maximum Likelihood and PARCOR Vocoders.- 10.3.2 Autocorrelation Method Vocoders.- 10.3.3 Covariance Method Vocoders.- 10.4 Base-Band Excited Vocoders.

11. Further Topics.- 11.1 Speaker Identification and Verification.- 11.2 Isolated Word Recognition.- 11.3 Acoustical Detection of Laryngeal Pathology.- 11.4 Pole-Zero Estimation.- 11.5 Summary and Future Directions.

References.


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1980

Speech coding based upon vector quantization

Andres Buzo; Augustine H. Gray; Robert M. Gray; John D. Markel

With rare exception, all presently available narrow-band speech coding systems implement scalar quantization (independent quantization) of the transmission parameters (such as reflection coefficients or transformed reflection coefficients in LPC systems). This paper presents a new approach called vector quantization. For very low data rates, realistic experiments have shown that vector quantization can achieve a given level of average distortion with 15 to 20 fewer bits/frame than that required for the optimized scalar quantizing approaches presently in use. The vector quantizing approach is shown to be a mathematically and computationally tractable method which builds upon knowledge obtained in linear prediction analysis studies. This paper introduces the theory in a nonrigorous form, along with practical results to date and an extensive list of research topics for this new area of speech coding.
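As a rough illustration of the idea, the sketch below quantizes a whole parameter vector at once by picking the nearest entry of a codebook, instead of quantizing each parameter independently. The codebook values, the function names, and the use of squared error as the distortion are all assumptions made for this example; the abstract does not specify them.

```python
def squared_error(x, y):
    """Squared Euclidean distance, used here as a stand-in distortion (assumption)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def vq_encode(vector, codebook, distortion=squared_error):
    """Return the index of the codeword with the smallest distortion to `vector`."""
    return min(range(len(codebook)), key=lambda i: distortion(vector, codebook[i]))


def vq_decode(index, codebook):
    """Look the transmitted index up in the shared codebook."""
    return codebook[index]


# Hypothetical 2-dimensional codebook with four entries (made-up values).
codebook = [(-0.9, 0.5), (-0.5, 0.1), (0.2, -0.3), (0.7, 0.4)]
frame = (0.25, -0.2)           # hypothetical parameter vector for one frame

index = vq_encode(frame, codebook)
reconstruction = vq_decode(index, codebook)
```

With four codewords the transmitted index costs 2 bits per frame whatever the vector dimension, which is the sense in which a vector quantizer spends bits on the vector as a whole.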


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1976

Distance measures for speech processing

Augustine H. Gray; John D. Markel

The properties and interrelationships among four measures of distance in speech processing are theoretically and experimentally discussed. The root mean square (rms) log spectral distance, cepstral distance, likelihood ratio (minimum residual principle or delta coding (DELCO) algorithm), and a cosh measure (based upon two nonsymmetrical likelihood ratios) are considered. It is shown that the cepstral measure bounds the rms log spectral measure from below, while the cosh measure bounds it from above. A simple nonlinear transformation of the likelihood ratio is shown to be highly correlated with the rms log spectral measure over expected ranges. Relationships between distance measure values and perception are also considered. The likelihood ratio, cepstral measure, and cosh measure are easily evaluated recursively from linear prediction filter coefficients, and each has a meaningful and interrelated frequency domain interpretation. Fortran programs are presented for computing the recursively evaluated distance measures.
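The lower-bound relationship between the cepstral and rms log spectral measures can be checked numerically. The sketch below assumes unit-gain all-pole models with A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p, works in natural-log units, and uses the standard recursion for the cepstrum of 1/A(z); the function names and the test filters are hypothetical and are not taken from the paper's Fortran programs.

```python
import cmath
import math


def lpc_cepstrum(a, n_terms):
    """Cepstral coefficients c_1..c_n of ln(1/A(z)), evaluated recursively."""
    p = len(a)
    c = []
    for n in range(1, n_terms + 1):
        a_n = a[n - 1] if n <= p else 0.0
        acc = 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += k * c[k - 1] * a[n - k - 1]
        c.append(-a_n - acc / n)
    return c


def cepstral_distance(a1, a2, n_terms):
    """Truncated cepstral distance; the c_0 term vanishes for unit-gain models."""
    c1 = lpc_cepstrum(a1, n_terms)
    c2 = lpc_cepstrum(a2, n_terms)
    return math.sqrt(2.0 * sum((x - y) ** 2 for x, y in zip(c1, c2)))


def rms_log_spectral_distance(a1, a2, n_points=512):
    """RMS difference of ln(1/|A|^2), sampled uniformly around the unit circle."""
    def log_spectrum(a, theta):
        A = 1.0 + sum(ak * cmath.exp(-1j * theta * (k + 1)) for k, ak in enumerate(a))
        return -2.0 * math.log(abs(A))

    total = 0.0
    for i in range(n_points):
        theta = 2.0 * math.pi * i / n_points
        total += (log_spectrum(a1, theta) - log_spectrum(a2, theta)) ** 2
    return math.sqrt(total / n_points)


# Two hypothetical stable second-order inverse filters.
a_ref = [-1.2, 0.8]
a_test = [-1.0, 0.5]
d_cep = cepstral_distance(a_ref, a_test, n_terms=10)
d_rms = rms_log_spectral_distance(a_ref, a_test)
```

Because the cepstral sum is truncated after `n_terms` coefficients, `d_cep` cannot exceed `d_rms`. Multiplying either value by 10 / ln 10 converts it to decibels.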


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1980

Distortion measures for speech processing

Robert M. Gray; Andres Buzo; Augustine H. Gray; Yasuo Matsuyama

Several properties, interrelations, and interpretations are developed for various speech spectral distortion measures. The principal results are 1) the development of notions of relative strength and equivalence of the various distortion measures both in a mathematical sense corresponding to subjective equivalence and in a coding sense when used in minimum distortion or nearest neighbor speech processing systems; 2) the demonstration that the Itakura-Saito and related distortion measures possess a property similar to the triangle inequality when used in nearest neighbor systems such as quantization and cluster analysis; and 3) that the Itakura-Saito and normalized model distortion measures yield efficient computation algorithms for generalized centroids or minimum distortion points of groups or clusters of speech frames, an important computation in both classical cluster analysis techniques and in algorithms for optimal quantizer design. We also argue that the Itakura-Saito and related distortions are well-suited computationally, mathematically, and intuitively for such applications.
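For readers unfamiliar with the Itakura-Saito distortion, the sketch below evaluates a discretized version of it on sampled power spectra and shows a simple centroid computation. It makes several assumptions: the spectra are plain lists of positive samples, the centroid is taken over unconstrained spectra (where it reduces to the arithmetic mean) rather than over the model spectra treated in the paper, and all names and numbers are hypothetical.

```python
import math


def itakura_saito(p, q):
    """Discretized Itakura-Saito distortion between power spectra p and q."""
    return sum(pi / qi - math.log(pi / qi) - 1.0 for pi, qi in zip(p, q)) / len(p)


def centroid(frames):
    """Arithmetic mean spectrum; minimizes the total distortion sum of d(p, q) over q."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def total_distortion(frames, q):
    """Sum of distortions from every frame in a cluster to the candidate q."""
    return sum(itakura_saito(p, q) for p in frames)


# Hypothetical cluster of two sampled power spectra.
frames = [[1.0, 2.0, 4.0], [3.0, 2.0, 1.0]]
mean_spectrum = centroid(frames)
geometric_mean = [math.sqrt(a * b) for a, b in zip(*frames)]
```

The distortion is non-negative, is zero only when the two spectra coincide, and is not symmetric in its arguments, so the order of `p` and `q` matters.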


IEEE Transactions on Audio and Electroacoustics | 1973

Digital lattice and ladder filter synthesis

Augustine H. Gray; J. Markel

There is evidence that in addition to standard digital filter forms such as the direct, parallel, and cascade forms, digital lattice and ladder filters may play an important role in finite word length implementation problems. In this paper, techniques are developed in detail for efficiently synthesizing digital lattice and ladder filters from any stable direct form. In one form, a lattice filter canonic in terms of multiplies and delays is obtained. An internal scaling procedure is also introduced that will be of importance for optimizing one of the lattice forms for finite word length implementation.
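One ingredient of such a synthesis is converting a direct-form denominator into lattice (reflection) coefficients. The sketch below uses the standard step-down recursion for an all-pole denominator A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p; it does not cover the ladder tap coefficients or the scaling procedure mentioned in the abstract, the function name is hypothetical, and sign conventions for reflection coefficients vary between texts.

```python
def direct_to_reflection(a):
    """Step-down recursion: direct-form coefficients -> reflection coefficients.

    Raises ValueError if any coefficient has magnitude >= 1, which means
    1/A(z) is not stable.
    """
    a = list(a)
    ks = []
    for m in range(len(a), 0, -1):
        k = a[m - 1]                      # k_m is the last coefficient at order m
        if abs(k) >= 1.0:
            raise ValueError("1/A(z) is not stable")
        ks.append(k)
        # Reduce the polynomial from order m to order m - 1.
        a = [(a[i] - k * a[m - 2 - i]) / (1.0 - k * k) for i in range(m - 1)]
    ks.reverse()
    return ks


# Hypothetical second-order example: A(z) = 1 + 0.35 z^-1 - 0.3 z^-2.
ks = direct_to_reflection([0.35, -0.3])
```

The recursion doubles as a stability test, since the direct form is stable exactly when every reflection coefficient has magnitude below one.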


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1979

Least squares glottal inverse filtering from the acoustic speech waveform

David Y. Wong; John D. Markel; Augustine H. Gray

Covariance analysis as a least squares approach for accurately performing glottal inverse filtering from the acoustic speech waveform is discussed. Best results are obtained by situating the analysis window within a stable closed glottis interval. Based on a linear model of speech production, it is shown that both the moment of glottal closure and opening can be determined from the normalized total squared error with proper choices of analysis window length and filter order. Results from actual speech are presented to illustrate the technique.


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1975

A normalized digital filter structure

Augustine H. Gray; J. Markel

A normalized digital filter structure is presented, based upon an orthonormal polynomial expansion. This structure is recursively designed, has several predictable stability properties in the presence of time-varying parameters, and appears to have roundoff noise properties which are superior to other known filter structures, particularly in the presence of clustered poles. Each section of the filter can be precisely implemented by one complex multiply.


IEEE Transactions on Audio and Electroacoustics | 1973

On autocorrelation equations as applied to speech analysis

John D. Markel; Augustine H. Gray

The mathematical theory underlying one of the techniques currently being used in linear prediction of speech is developed from an inner product formulation. This formulation produces a unified framework for studying the properties of autocorrelation equations. In particular, the solution and stability of the auto-correlation equations are studied in detail. Experimental stability results for finite word length analysis of speech are also presented.
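The autocorrelation equations are usually solved with the Levinson recursion. The sketch below is a minimal version under stated assumptions: `r` holds autocorrelation values r[0..p], the inverse filter is written A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p, and the function name and return layout are hypothetical rather than taken from the paper.

```python
def levinson(r, order):
    """Solve the autocorrelation normal equations by the Levinson recursion.

    Returns (a, ks, err): predictor coefficients, reflection coefficients,
    and the final prediction error energy.
    """
    err = r[0]
    a = []
    ks = []
    for m in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("prediction error is not positive")
        acc = r[m] + sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        k = -acc / err
        # Order update: a_i <- a_i + k * a_(m-i), then append a_m = k.
        a = [a[i] + k * a[m - 2 - i] for i in range(m - 1)] + [k]
        ks.append(k)
        err *= 1.0 - k * k
    return a, ks, err


# Autocorrelation of a first-order process with correlation 0.5 (illustrative).
a, ks, err = levinson([1.0, 0.5, 0.25], order=2)
```

For a valid (positive definite) autocorrelation sequence every reflection coefficient has magnitude below one, which is the stability property the abstract refers to; finite word length arithmetic can break this guarantee.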


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1976

Quantization and bit allocation in speech processing

Augustine H. Gray; J. Markel

The topic of quantization and bit allocation in speech processing is studied using an L2 norm. Closed-form expressions are derived for the root mean square (rms) spectral deviation due to variations in one, two, or multiple parameters. For one-parameter variation, the reflection coefficients, log area ratios, and inverse sine coefficients are studied. It is shown that, depending upon the criterion chosen, either log area ratios or inverse sine quantization can be viewed as optimal. From a practical point of view, it is shown experimentally that very little difference exists among the various quantization methods beyond the second coefficient. Two-parameter variations are studied in terms of formant frequency and bandwidth movement and in terms of a two-pair quantization scheme. A lower bound on the number of quantization levels required to satisfy a given maximum spectral deviation is derived along with the two-pair quantization scheme which approximately satisfies the bound. It is shown theoretically that the two-pair quantization scheme has a 10-bit superiority over other above-mentioned quantization schemes in the sense of theoretically assuring that a maximum overall log spectral deviation will not be exceeded.
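The two one-parameter transformations named above can be sketched as follows: a reflection coefficient is mapped to a log area ratio or an inverse sine value, quantized uniformly in that domain, and mapped back. The step size, the function names, and the sign convention of the log area ratio are assumptions for illustration; the paper's actual quantizer designs are not reproduced here.

```python
import math


def to_lar(k):
    """Log area ratio of a reflection coefficient (one common sign convention)."""
    return math.log((1.0 + k) / (1.0 - k))


def from_lar(g):
    """Inverse of to_lar."""
    return math.tanh(g / 2.0)


def to_arcsine(k):
    """Inverse sine coefficient of a reflection coefficient."""
    return math.asin(k)


def from_arcsine(s):
    """Inverse of to_arcsine."""
    return math.sin(s)


def quantize_uniform(x, step):
    """Round x to the nearest multiple of step."""
    return step * round(x / step)


def quantize_reflection(k, step, forward, inverse):
    """Quantize k uniformly in a transformed domain, then map back."""
    return inverse(quantize_uniform(forward(k), step))


k = 0.9                                   # hypothetical reflection coefficient
k_lar = quantize_reflection(k, 0.1, to_lar, from_lar)
k_asin = quantize_reflection(k, 0.1, to_arcsine, from_arcsine)
```

Both transformations spend finer resolution on coefficients near ±1, where the spectrum is most sensitive, and both map back to values of magnitude at most one.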


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1976

A computer program for designing digital elliptic filters

Augustine H. Gray; John D. Markel

A computer program is presented for designing digital elliptic filters of the four most common types: low-pass, high-pass, bandpass, and bandstop. The program is presented in Fortran IV. The various numerical algorithms required for implementing the elliptic functions are referenced in the program and are obtained from readily available tables of mathematical functions. Examples for each filter type are presented.

Collaboration


Dive into Augustine H. Gray's collaborations.

Top Co-Authors


Andres Buzo

National Autonomous University of Mexico
