David L. Thomson
Bell Labs
Publications
Featured research published by David L. Thomson.
Journal of the Acoustical Society of America | 1994
David L. Thomson
A harmonic coding arrangement is described in which the magnitude spectrum of the input speech is modeled at the analyzer by a relatively small set of parameters and, significantly, as a continuous rather than only a line magnitude spectrum. The synthesizer, rather than the analyzer, determines the magnitude, frequency, and phase of the large number of sinusoids that are summed to generate synthetic speech. Rather than receiving information that explicitly defines the sinusoids, the synthesizer receives the small set of parameters, uses them to determine a spectrum, and uses that spectrum in turn to determine the sinusoids for synthesis.
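The division of labor the abstract describes — only envelope parameters cross the channel, and the synthesizer itself places the sinusoids — can be sketched as follows. This is a minimal illustration, not the patented arrangement: the `envelope` callable stands in for the transmitted continuous magnitude spectrum, phases are taken as zero, and all names are illustrative.

```python
import math

def synthesize_frame(envelope, f0, fs, n_samples):
    """Sum-of-sinusoids synthesis: the synthesizer places harmonics at
    multiples of f0 and reads each amplitude from the transmitted
    continuous magnitude envelope (zero phase for simplicity)."""
    n_harmonics = int((fs / 2) // f0)   # harmonics up to the Nyquist rate
    out = []
    for n in range(n_samples):
        t = n / fs
        s = 0.0
        for k in range(1, n_harmonics + 1):
            amp = envelope(k * f0)      # sample the continuous spectrum
            s += amp * math.cos(2 * math.pi * k * f0 * t)
        out.append(s)
    return out
```

Because the envelope is continuous, the same small parameter set serves any fundamental frequency the synthesizer chooses.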
international conference on acoustics speech and signal processing | 1998
David L. Thomson; Rathinavelu Chengalvarayan
We investigate a class of features related to voicing parameters that indicate whether the vocal cords are vibrating. Features describing the voicing characteristics of speech signals are integrated with an existing 38-dimensional feature vector consisting of the first- and second-order time derivatives of the frame energy together with the cepstral coefficients and their first and second derivatives. HMM-based connected-digit recognition experiments comparing the traditional and extended feature sets show that voicing features and spectral information are complementary, and that improved speech recognition performance is obtained by combining the two sources of information.
international conference on acoustics speech and signal processing | 1988
David L. Thomson
A method is described for representing magnitude and phase in a sinusoidal transform coder. Instead of transmitting individual sinusoids, the entire speech spectrum is transmitted. The synthesizer estimates the frequency, amplitude, and phase of each harmonic from the spectrum. Relatively high-quality speech in the 4.8-9.6 kb/s range is obtained by modeling the magnitude/phase spectrum with a combination of pole-zero analysis, phase prediction and vector quantization. A window subtraction method ensures proper synthesis of unvoiced speech. The system is robust since it does not depend on pitch estimates or voicing decisions.
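Phase prediction, one of the three modeling ingredients listed, exploits the fact that a harmonic's phase advances deterministically between frames, so only a small residual needs quantizing. A minimal sketch of that idea, under the assumption of a linear phase advance of 2*pi*k*f0*T for harmonic k (the function name and interface are illustrative, not the paper's):

```python
import math

def predict_phases(prev_phases, f0, frame_period):
    """Predict each harmonic's phase one frame ahead: harmonic k
    (1-indexed) advances by 2*pi*k*f0*T, wrapped to [0, 2*pi)."""
    return [(p + 2.0 * math.pi * (k + 1) * f0 * frame_period) % (2.0 * math.pi)
            for k, p in enumerate(prev_phases)]
```

At the coder, the transmitted quantity would be the difference between the measured and predicted phase, which is far cheaper to vector-quantize than the raw phase.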
international conference on acoustics, speech, and signal processing | 1986
David L. Thomson; Dimitrios Panos Prezas
This paper presents a new method of modeling the LPC residual during unvoiced speech for voice coding at 4.8 kb/s. With this method, speech is synthesized using one of three excitation types: periodic pitch pulses, random noise, or multipulse. By using multipulse excitation it is possible to accurately produce speech which is difficult to model using noise and pitch pulses alone [1]. Since multipulse is only used where appropriate, efficient, sub-optimal methods of calculating the pulse amplitudes and positions are adequate, simplifying the implementation into a real-time system. The synthetic speech may be coded at 4.8 kb/s since multipulse, used only where appropriate, suffers little quality loss when quantized. A method of determining which excitation type is to be used is discussed. Formal listening test results are also presented.
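The paper's key economy is that a sub-optimal multipulse search suffices because multipulse is used only where noise and pitch pulses fail. The sketch below illustrates one such greedy simplification and a three-way excitation choice; the threshold and the greedy placement rule are my assumptions for illustration, not the paper's actual decision logic.

```python
def multipulse_excitation(residual, n_pulses):
    """Sub-optimal greedy multipulse: place each pulse at the residual
    sample of largest remaining magnitude (a simplification of the
    full cross-correlation search against the synthesis filter)."""
    excitation = [0.0] * len(residual)
    remaining = list(residual)
    for _ in range(n_pulses):
        pos = max(range(len(remaining)), key=lambda i: abs(remaining[i]))
        excitation[pos] = remaining[pos]
        remaining[pos] = 0.0
    return excitation

def choose_excitation(is_voiced, noise_fit_error, threshold=0.5):
    """Pick one of the three excitation types; the threshold is
    illustrative only."""
    if is_voiced:
        return "pitch_pulses"
    return "noise" if noise_fit_error < threshold else "multipulse"
```

Restricting multipulse to the frames that need it is also what keeps the quantized bit budget at 4.8 kb/s.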
international conference on acoustics, speech, and signal processing | 1986
Dimitrios P. Prezas; Joseph Picone; David L. Thomson
A method of determining pitch and voicing information from speech signals is presented. The algorithm, which employs time-domain analysis and pattern recognition techniques, is fast and yields accurate pitch and voicing estimates. A search routine is employed to find periodicity in each of four signals derived from the speech waveform and the results are combined to form a pitch estimate. The voicing decision uses linear discriminant analysis, and declares speech frames voiced or unvoiced based on a weighted sum of 13 parameters. Performance comparisons with other pitch detectors are reported.
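The two combination steps described — merging period estimates from four derived signals, and a weighted-sum voicing rule — can be sketched as below. The abstract does not say how the four estimates are combined; the median shown is one plausible combiner (it rejects a pitch-doubled outlier), and the weights in the voicing rule stand in for those produced by linear discriminant analysis.

```python
def combine_pitch(candidate_periods):
    """Fuse per-signal period estimates; the median discards an
    outlier such as a pitch-doubled estimate from one signal."""
    s = sorted(candidate_periods)
    return s[len(s) // 2]

def voicing_decision(params, weights, threshold=0.0):
    """Declare the frame voiced when the weighted sum of the measured
    parameters exceeds a threshold (weights from discriminant
    analysis on labeled frames; 13 parameters in the paper)."""
    return sum(w * p for w, p in zip(weights, params)) > threshold
```
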
international conference on acoustics, speech, and signal processing | 1987
Edward C. Bronson; Douglas A. Carlone; W. Bastiaan Kleijn; Kevin M. O'Dell; Joseph Picone; David L. Thomson
This paper describes a new speech coding technique which yields improved speech quality over existing 2.4 kb/s LPC vocoders. The method is computationally efficient and operates at a data rate of 4.8 kb/s. Each speech frame is initially classified as voiced or unvoiced. Unvoiced frames are synthesized using a linear predictive coding filter with noise or multipulse excitation. Voiced frames are synthesized using a sum of sinusoids. The frequency of each sinusoid is defined by peaks in the frequency spectrum. A new interpolation technique provides a computationally efficient method of locating the spectral peaks. A real-time, fully quantized version has been implemented in hardware.
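A computationally cheap way to locate a spectral peak to sub-bin accuracy is to fit a parabola through the magnitude at a local-maximum bin and its two neighbors. The abstract does not detail its interpolation technique, so the standard quadratic-fit formula below is offered only as a representative example of this class of method.

```python
def parabolic_peak(mag, k):
    """Refine a spectral peak at DFT bin k by fitting a parabola
    through the magnitudes at bins k-1, k, k+1; returns the
    fractional-bin position and interpolated height of the peak."""
    a, b, c = mag[k - 1], mag[k], mag[k + 1]
    denom = a - 2.0 * b + c
    if denom == 0.0:            # flat neighborhood: keep the integer bin
        return float(k), b
    delta = 0.5 * (a - c) / denom
    return k + delta, b - 0.25 * (a - c) * delta
```

Three magnitude reads and a handful of arithmetic operations per peak replace a finely zero-padded FFT, which is what makes the peak search computationally efficient.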
international conference on acoustics, speech, and signal processing | 1987
David L. Thomson
A new approach to making voiced/unvoiced decisions is presented. The technique is very accurate and dynamically adapts to a wide range of environments. Reliable decisions are achieved by using a weighted sum of multiple speech parameters. Instead of using discriminant analysis to determine the optimal weights, voiced and unvoiced frames are separated into two clusters by a multivariate clustering algorithm. Since cluster analysis requires no prior voicing information, the decision rule is computed from the incoming speech rather than from a training set. An adaptive clustering algorithm is derived which continuously adjusts the weights in response to changing speech characteristics.
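The core idea — learn the voiced/unvoiced boundary from the incoming speech itself, with no labeled training set — is essentially unsupervised two-cluster analysis. The sketch below reduces it to scalar 2-means on a single voicing parameter; the paper clusters a multivariate weighted combination and adapts continuously, so treat this as a simplified, assumed formulation.

```python
def cluster_vuv(voicing_params, iters=10):
    """Unsupervised two-cluster split of frames by a scalar voicing
    parameter: no prior voicing labels are needed, so the decision
    rule adapts to the incoming speech and noise conditions."""
    c_unv, c_voi = min(voicing_params), max(voicing_params)
    for _ in range(iters):
        unv = [p for p in voicing_params if abs(p - c_unv) <= abs(p - c_voi)]
        voi = [p for p in voicing_params if abs(p - c_unv) > abs(p - c_voi)]
        if unv:
            c_unv = sum(unv) / len(unv)
        if voi:
            c_voi = sum(voi) / len(voi)
    return c_unv, c_voi

def classify(p, c_unv, c_voi):
    """Label a new frame by its nearest cluster center."""
    return "voiced" if abs(p - c_voi) < abs(p - c_unv) else "unvoiced"
```

Re-running (or incrementally updating) the clustering as new frames arrive is what lets the decision boundary track changing speakers and environments.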
Journal of the Acoustical Society of America | 1990
T. E. Jacobs; Dimitrios Panos Prezas; David L. Thomson; Jay G. Wilpon
There are many potential applications for speaker-independent speech recognition in the telephone network. These include automation of operator services, custom calling feature selection, and call routing. For such applications, recognition under varying background noise levels and speaker characteristics is necessary. In addition, the behavior of untrained users requires the recognizer to be able to: (1) spot keywords embedded in extraneous speech; (2) identify a word as soon as it is spoken, rather than waiting until the end of the recognition window; and (3) determine whether or not the user has spoken a word in the recognition vocabulary. These issues are discussed in this paper in the framework of a hidden Markov model-based speech recognition system incorporated into the AT&T Intelligent Network in Spain. This system is AT&T's first application of speech recognition in the telephone network. A description of how the Spanish speech data used for training the recognizer were collected and tested for a...
Journal of the Acoustical Society of America | 1993
David L. Thomson