John Makhoul | Researchain

Publication

Featured research published by John Makhoul.

Proceedings of the IEEE | 1985

Vector quantization in speech coding

John Makhoul; Salim Roucos; H. Gish

Quantization, the process of approximating continuous-amplitude signals by digital (discrete-amplitude) signals, is an important aspect of data compression or coding, the field concerned with the reduction of the number of bits necessary to transmit or store analog data, subject to a distortion or fidelity criterion. The independent quantization of each signal value or parameter is termed scalar quantization, while the joint quantization of a block of parameters is termed block or vector quantization. This tutorial review presents the basic concepts employed in vector quantization and gives a realistic assessment of its benefits and costs when compared to scalar quantization. Vector quantization is presented as a process of redundancy removal that makes effective use of four interrelated properties of vector parameters: linear dependency (correlation), nonlinear dependency, shape of the probability density function (pdf), and vector dimensionality itself. In contrast, scalar quantization can utilize effectively only linear dependency and pdf shape. The basic concepts are illustrated by means of simple examples and the theoretical limits of vector quantizer performance are reviewed, based on results from rate-distortion theory. Practical issues relating to quantizer design, implementation, and performance in actual applications are explored. While many of the methods presented are quite general and can be used for the coding of arbitrary signals, this paper focuses primarily on the coding of speech signals and parameters.
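The contrast between scalar and vector quantization can be seen on a toy source. The sketch below is an illustration only, not an example from the paper: the data (perfectly correlated pairs), both codebooks, and the function names are assumptions chosen for clarity. Both quantizers spend 2 bits per vector. The scalar quantizer is applied to each component independently, with no decorrelating transform, so two of its four product codewords are never used.

```python
def squared_error(c, x):
    """Squared Euclidean distance between codeword c and vector x."""
    return sum((ci - xi) ** 2 for ci, xi in zip(c, x))

def mean_distortion(codebook, data):
    """Average squared error when each vector maps to its nearest codeword."""
    return sum(min(squared_error(c, x) for c in codebook) for x in data) / len(data)

# Hypothetical toy source: pairs (t, t) with t on a uniform grid in [-1, 1].
data = [(t / 100.0, t / 100.0) for t in range(-100, 101)]

# Scalar quantization: 1 bit per component, levels -0.5 and +0.5,
# which is equivalent to a product codebook of 4 points.
levels = (-0.5, 0.5)
scalar_codebook = [(a, b) for a in levels for b in levels]

# Vector quantization: the same 4 codewords (2 bits per vector),
# but placed along the diagonal where the data actually lies.
vector_codebook = [(v, v) for v in (-0.75, -0.25, 0.25, 0.75)]

scalar_mse = mean_distortion(scalar_codebook, data)
vector_mse = mean_distortion(vector_codebook, data)
print(scalar_mse, vector_mse)
```

At equal bit rate the vector codebook gives a clearly lower distortion here, because it places codewords only where the joint distribution has mass.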

IEEE Transactions on Acoustics, Speech, and Signal Processing | 1977

Stable and efficient lattice methods for linear prediction

John Makhoul

A class of stable and efficient recursive lattice methods for linear prediction is presented. These methods guarantee the stability of the all-pole filter, with or without windowing of the signal, with finite wordlength computations, and at a computational cost comparable to the traditional autocorrelation and covariance methods. In addition, for data-compression purposes, quantization of the reflection coefficients can be accomplished within the recursion, if desired.
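As a rough illustration of a lattice recursion, the sketch below implements Burg's harmonic-mean estimate of the reflection coefficients. The abstract does not spell out its recursions, so this is a stand-in under stated assumptions: the function name, the sign convention (prediction-error filter A(z) = 1 + ...), and the synthetic test signal are all illustrative. Each coefficient satisfies |k| <= 1 by construction, which is the property that keeps the corresponding all-pole filter stable.

```python
import random

def burg_reflection_coefficients(x, order):
    """Estimate reflection coefficients with Burg's lattice recursion."""
    f = list(x[1:])    # forward errors  f[n],   n = 1 .. N-1
    b = list(x[:-1])   # backward errors b[n-1], n = 1 .. N-1
    ks = []
    for _ in range(order):
        num = -2.0 * sum(fi * bi for fi, bi in zip(f, b))
        den = sum(fi * fi + bi * bi for fi, bi in zip(f, b))
        k = num / den if den > 0.0 else 0.0
        ks.append(k)
        # Lattice update of both error sequences for the next stage.
        f_new = [fi + k * bi for fi, bi in zip(f, b)]
        b_new = [bi + k * fi for fi, bi in zip(f, b)]
        # Re-align so that f[n] is again paired with b[n-1].
        f, b = f_new[1:], b_new[:-1]
    return ks

# Hypothetical test signal: first-order autoregressive process with pole 0.9.
rng = random.Random(0)
signal = [0.0]
for _ in range(499):
    signal.append(0.9 * signal[-1] + rng.gauss(0.0, 1.0))

ks = burg_reflection_coefficients(signal, 4)
print(ks)
```

Because the coefficients are produced one stage at a time, each could in principle be quantized before the error sequences are updated, which is the kind of in-recursion quantization the abstract mentions.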

IEEE Transactions on Signal Processing | 1991

Discrete all-pole modeling

Amro El-Jaroudi; John Makhoul

A method for parametric modeling of spectral envelopes when only a discrete set of spectral points is given is introduced. This method, called discrete all-pole (DAP) modeling, uses a discrete version of the Itakura-Saito distortion measure as its error criterion. One result is an autocorrelation matching condition that overcomes the limitations of linear prediction and produces better fitting spectral envelopes for spectra that are representable by a relatively small discrete set of values, such as in voiced speech. An iterative algorithm for DAP modeling that is shown to converge to a unique global minimum is presented. Results of applying DAP modeling to real and synthetic speech are also presented. DAP modeling is extended to allow frequency-dependent weighting of the error measure, so that spectral accuracy can be enhanced in certain frequency regions.
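The error criterion named above can be sketched in a few lines. This is the standard Itakura-Saito form evaluated on a discrete set of frequencies; the function name, the uniform averaging, and the sample values are assumptions, and the iterative DAP fitting algorithm itself is not shown.

```python
import math

def discrete_itakura_saito(target, model):
    """Mean Itakura-Saito distortion over a discrete set of spectral points.

    target and model hold positive power values at the same frequencies.
    """
    total = 0.0
    for p, q in zip(target, model):
        ratio = p / q
        total += ratio - math.log(ratio) - 1.0
    return total / len(target)

# Hypothetical harmonic power values and a candidate envelope.
harmonics = [4.0, 2.5, 1.0, 0.4]
envelope = [3.5, 2.8, 1.1, 0.5]

d_same = discrete_itakura_saito(harmonics, harmonics)
d_fit = discrete_itakura_saito(harmonics, envelope)
print(d_same, d_fit)
```

The measure is zero only when the model matches the target at every point, and it is not symmetric in its two arguments.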

Meeting of the Association for Computational Linguistics | 2014

Fast and Robust Neural Network Joint Models for Statistical Machine Translation

Jacob Devlin; Rabih Zbib; Zhongqiang Huang; Thomas Lamar; Richard M. Schwartz; John Makhoul

Recent work has shown success in using neural network language models (NNLMs) as features in MT systems. Here, we present a novel formulation for a neural network joint model (NNJM), which augments the NNLM with a source context window. Our model is purely lexicalized and can be integrated into any MT decoder. We also present several variations of the NNJM which provide significant additive improvements.

IEEE Transactions on Acoustics, Speech, and Signal Processing | 1980

A fast cosine transform in one and two dimensions

John Makhoul

The discrete cosine transform (DCT) of an N-point real signal is derived by taking the discrete Fourier transform (DFT) of a 2N-point even extension of the signal. It is shown that the same result may be obtained using only an N-point DFT of a reordered version of the original signal, with a resulting saving of 1/2. If the fast Fourier transform (FFT) is used to compute the DFT, the result is a fast cosine transform (FCT) that can be computed using on the order of N log₂ N real multiplications. The method is then extended to two dimensions, with a saving of 1/4 over the traditional method that uses the DFT.
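The one-dimensional reordering idea can be sketched as follows. The code assumes an unnormalized DCT-II with a leading factor of 2 (one common convention) and an input length that is a power of two; the function names and the small recursive FFT are illustrative choices, not the paper's implementation.

```python
import cmath
import math

def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fast_dct(x):
    """DCT-II from a single N-point FFT of a reordered copy of x."""
    x = list(x)
    n = len(x)
    # Even-indexed samples in order, then odd-indexed samples reversed.
    v = x[0::2] + x[1::2][::-1]
    spectrum = fft(v)
    return [2.0 * (cmath.exp(-1j * cmath.pi * k / (2 * n)) * spectrum[k]).real
            for k in range(n)]

def direct_dct(x):
    """Reference O(N^2) evaluation of the same DCT-II definition."""
    n = len(x)
    return [2.0 * sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                      for i in range(n))
            for k in range(n)]

samples = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0, 0.0, 4.0]
fast = fast_dct(samples)
slow = direct_dct(samples)
print(max(abs(a - b) for a, b in zip(fast, slow)))
```

The two results agree to rounding error, which checks the reordering identity numerically for this input.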

IEEE Transactions on Pattern Analysis and Machine Intelligence | 1999

An omnifont open-vocabulary OCR system for English and Arabic

Issam Bazzi; Richard M. Schwartz; John Makhoul

We present an omnifont, unlimited-vocabulary OCR system for English and Arabic. The system is based on hidden Markov models (HMM), an approach that has proven to be very successful in the area of automatic speech recognition. We focus on two aspects of the OCR system. First, we address the issue of how to perform OCR on omnifont and multi-style data, such as plain and italic, without the need to have a separate model for each style. The amount of training data from each style, which is used to train a single model, becomes an important issue in the face of the conditional independence assumption inherent in the use of HMMs. We demonstrate mathematically and empirically how to allocate training data among the different styles to alleviate this problem. Second, we show how to use a word-based HMM system to perform character recognition with unlimited vocabulary. The method includes the use of a trigram language model on character sequences. Using all these techniques, we have achieved character error rates of 1.1 percent on data from the University of Washington English Document Image Database and 3.3 percent on data from the DARPA Arabic OCR Corpus.

IEEE Transactions on Acoustics, Speech, and Signal Processing | 1978

A class of all-zero lattice digital filters: Properties and applications

John Makhoul

A class of minimum- or maximum-phase all-zero lattice digital filters, based on the two-multiplier lattice of Itakura and Saito, is developed. Different lattice forms with different numbers of multipliers are derived, including two one-multiplier forms. Many of the properties of these lattice filters are given, including the important orthogonalization and decoupling properties of successive stages in optimal inverse filtering of signals. These properties lead to important applications in the areas of adaptive linear prediction and adaptive Wiener filtering. As a specific example, the design of a new fast start-up equalizer is presented.
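A two-multiplier all-zero lattice can be sketched as below. The sign convention, the function names, and the coefficient values are assumptions for illustration; conventions differ between texts, and the one-multiplier forms mentioned in the abstract are not shown. The step-up recursion converts the same reflection coefficients into direct-form FIR taps, which gives a simple consistency check.

```python
def lattice_fir(x, ks):
    """Filter x through an all-zero lattice with reflection coefficients ks."""
    delayed = [0.0] * len(ks)   # backward error of each stage, one sample old
    out = []
    for sample in x:
        f = b = sample
        for m, k in enumerate(ks):
            b_prev = delayed[m]
            delayed[m] = b      # keep the current backward error for the next sample
            # Two multiplications per stage: one for f, one for b.
            f, b = f + k * b_prev, k * f + b_prev
        out.append(f)
    return out

def step_up(ks):
    """Convert reflection coefficients to direct-form FIR coefficients."""
    a = [1.0]
    for k in ks:
        ext = a + [0.0]
        a = [ext[i] + k * ext[len(ext) - 1 - i] for i in range(len(ext))]
    return a

# Hypothetical coefficients, all with magnitude below 1.
reflection = [0.5, -0.3, 0.2]
impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

response = lattice_fir(impulse, reflection)
taps = step_up(reflection)
print(response, taps)
```

With every reflection coefficient below 1 in magnitude, the resulting FIR filter under this convention is minimum phase.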

IEEE Transactions on Acoustics, Speech, and Signal Processing | 1975

Spectral linear prediction: Properties and applications

John Makhoul

Linear prediction (LP) is presented as a spectral modeling technique in which the signal spectrum is modeled by an all-pole spectrum. The method allows for arbitrary spectral shaping in the frequency domain, and for modeling of continuous as well as discrete spectra (such as filter bank spectra). In addition, using the method of selective linear prediction, all-pole modeling is applied to selected portions of the spectrum, with applications to speech recognition and speech compression. LP is compared with traditional analysis-by-synthesis (AbS) techniques for spectral modeling. It is found that linear prediction offers computational advantages over AbS, as well as better modeling properties if the variations of the signal spectrum from the desired spectral model are large. For relatively smooth spectra and for filter bank spectra, AbS is judged to give better results. Finally, a sub-optimal solution to the problem of all-zero modeling using LP is given.

International Conference on Acoustics, Speech, and Signal Processing | 1984

Improved hidden Markov modeling of phonemes for continuous speech recognition

Richard M. Schwartz; Yen-Lu Chow; S. Roucos; Michael A. Krasner; John Makhoul

This paper discusses the use of the Hidden Markov Model (HMM) in phonetic recognition. In particular, we present improvements that deal with the problems of modeling the effect of phonetic context and the problem of robust pdf estimation. The effect of phonetic context is taken into account by conditioning the probability density functions (pdfs) of the acoustic parameters on the adjacent phonemes, only to the extent that there are sufficient tokens of the phoneme in that context. This partial conditioning is achieved by combining the conditioned and unconditioned pdf models with weights that depend on the confidence in each pdf estimate. This combination is shown to result in better performance than either model by itself. We also show that it is possible to obtain the computational advantages of using discrete probability densities without the usual requirement for large amounts of training data.
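The weighted combination of conditioned and unconditioned discrete pdfs might look like the sketch below. The abstract does not give the weighting formula, so the count-based weight n / (n + tau), the parameter `tau`, and the function name are purely hypothetical stand-ins for "weights that depend on the confidence in each pdf estimate".

```python
def combine_pdfs(context_pdf, generic_pdf, n_tokens, tau=10.0):
    """Interpolate a context-dependent and a context-independent discrete pdf.

    The weight on the context-dependent pdf grows with the number of
    training tokens seen in that context (hypothetical weighting rule).
    """
    w = n_tokens / (n_tokens + tau)
    return [w * c + (1.0 - w) * g for c, g in zip(context_pdf, generic_pdf)]

# Hypothetical discrete pdfs over four codebook symbols.
context = [0.70, 0.20, 0.05, 0.05]
generic = [0.25, 0.25, 0.25, 0.25]

rare = combine_pdfs(context, generic, n_tokens=0)
frequent = combine_pdfs(context, generic, n_tokens=90)
print(rare, frequent)
```

With no tokens in the context the result falls back to the context-independent pdf, and with many tokens it approaches the context-dependent one.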

International Conference on Acoustics, Speech, and Signal Processing | 1979

High-frequency regeneration in speech coding systems

John Makhoul; Michael G. Berouti

The traditional method of high-frequency regeneration (HFR) of the excitation signal in baseband coders has been to rectify the transmitted baseband, followed by spectral flattening. In addition, a noise source is added at high frequencies to compensate for lack of energy during certain sounds. In this paper, we reexamine the whole HFR process. We show that the degree of rectification does not affect the output speech, and that, with proper processing, the high-frequency noise source may be eliminated. We introduce a new type of HFR based on spectral duplication of the baseband. Two types of spectral duplication are presented: spectral folding and spectral translation. Finally, in order to eliminate the problem of breaking the harmonic structure due to spectral duplication, we propose a pitch-adaptive spectral duplication scheme in the frequency domain by using adaptive transform coding to code the baseband.
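Spectral folding is commonly realized by inserting zeros between baseband samples, which raises the sampling rate and places an image of the baseband spectrum in the upper band. The sketch below assumes that realization (the abstract does not give implementation details); the function names and sample values are illustrative, and the spectral flattening and pitch-adaptive steps are not shown.

```python
import cmath

def spectral_fold(baseband, factor=2):
    """Upsample by inserting (factor - 1) zeros after every sample."""
    out = []
    for s in baseband:
        out.append(s)
        out.extend([0.0] * (factor - 1))
    return out

def dft(x):
    """Direct O(N^2) discrete Fourier transform, used here only for checking."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

# Hypothetical baseband excitation samples.
base = [1.0, 0.5, -0.25, 0.75]
wide = spectral_fold(base)

base_spectrum = dft(base)
wide_spectrum = dft(wide)
print(wide)
```

The spectrum of the zero-stuffed signal repeats the baseband spectrum with period N bins. For a real signal, conjugate symmetry makes the upper band a mirror image of the baseband about the baseband edge, hence the name folding.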
