Publication


Featured research published by Dennis H. Klatt.


Journal of the Acoustical Society of America | 1980

Software for a cascade/parallel formant synthesizer

Dennis H. Klatt

A software formant synthesizer is described that can generate synthetic speech using a laboratory digital computer. A flexible synthesizer configuration permits the synthesis of sonorants by either a cascade or parallel connection of digital resonators, but frication spectra must be synthesized by a set of resonators connected in parallel. A control program lets the user specify variable control parameter data, such as formant frequencies as a function of time, as a sequence of 〈time, value〉 points. The synthesizer design is described and motivated in Secs. I–III, and FORTRAN listings for the synthesizer and control program are provided in an appendix. Computer requirements and necessary support software are described in Sec. IV. Strategies for the imitation of any speech utterance are described in Sec. V, and suggested values of control parameters for the synthesis of many English sounds are presented in tabular form.
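The building block behind both configurations is a second-order digital resonator. The Python sketch below shows one resonator plus cascade and parallel connections of several of them. It uses the standard second-order resonator coefficients (unity gain at 0 Hz), the form usually cited for this synthesizer; the function names, the sample rate, and the formant values are illustrative assumptions, not a transcription of the paper's FORTRAN listings.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Return (A, B, C) for the difference equation y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    t = 1.0 / fs_hz  # sample period in seconds
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # scales the resonator to unity gain at 0 Hz
    return a, b, c


def resonate(signal, freq_hz, bw_hz, fs_hz):
    """Filter a list of samples through one second-order digital resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(signal, formants, fs_hz):
    """Cascade connection: each resonator filters the previous resonator's output."""
    for freq_hz, bw_hz in formants:
        signal = resonate(signal, freq_hz, bw_hz, fs_hz)
    return signal


def parallel(signal, formants, gains, fs_hz):
    """Parallel connection: every resonator filters the same input; outputs are weighted and summed."""
    outputs = [resonate(signal, f, bw, fs_hz) for f, bw in formants]
    return [sum(g * out[n] for g, out in zip(gains, outputs)) for n in range(len(signal))]
```

For example, `cascade(source, [(500.0, 60.0), (1500.0, 90.0), (2500.0, 150.0)], 10000.0)` shapes a source signal with three formants; those frequencies, bandwidths, and the 10 kHz rate are placeholders, not the tabulated values from the paper. In the cascade form the relative formant amplitudes follow from the filter chain itself, while the parallel form exposes an explicit gain for each resonator.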


Journal of the Acoustical Society of America | 1976

Linguistic uses of segmental duration in English: acoustic and perceptual evidence.

Dennis H. Klatt

The pattern of durations of individual phonetic segments and pauses conveys information about the linguistic content of an utterance. Acoustic measures of segmental timing have been used by many investigators to determine the variables that influence the durational structure of a sentence. The literature on segmental duration is reviewed and related to perceptual data on the discrimination of duration and to psychophysical data on the ability of listeners to make linguistic decisions on the basis of durational cues alone. We conclude that, in English, duration often serves as a primary perceptual cue in the distinctions between (1) inherently long versus short vowels, (2) voiced versus voiceless fricatives, (3) phrase‐final versus non‐final syllables, (4) voiced versus voiceless postvocalic consonants, as indicated by changes to the duration of the preceding vowel in phrase‐final positions, (5) stressed versus unstressed or reduced vowels, and (6) the presence or absence of emphasis.


Journal of the Acoustical Society of America | 1987

Review of text‐to‐speech conversion for English

Dennis H. Klatt

The automatic conversion of English text to synthetic speech is presently being performed, remarkably well, by a number of laboratory systems and commercial devices. Progress in this area has been made possible by advances in linguistic theory, acoustic-phonetic characterization of English sound patterns, perceptual psychology, mathematical modeling of speech production, structured programming, and computer hardware design. This review traces the early work on the development of speech synthesizers, discovery of minimal acoustic cues for phonetic contrasts, evolution of phonemic rule programs, incorporation of prosodic rules, and formulation of techniques for text analysis. Examples of rules are used liberally to illustrate the state of the art. Many of the examples are taken from Klattalk, a text-to-speech system developed by the author. A number of scientific problems are identified that prevent current systems from achieving the goal of completely human-sounding speech. While the emphasis is on rule programs that drive a formant synthesizer, alternatives such as articulatory synthesis and waveform concatenation are also reviewed. An extensive bibliography has been assembled to show both the breadth of synthesis activity and the wealth of phenomena covered by rules in the best of these programs. A recording of selected examples of the historical development of synthetic speech, enclosed as a 33 1/3-rpm record, is described in the Appendix.


Journal of the Acoustical Society of America | 1974

Role of formant transitions in the voiced‐voiceless distinction for stops

Kenneth N. Stevens; Dennis H. Klatt

Previous research on acoustic cues responsible for the voiced‐voiceless distinction in prestressed English plosives has emphasized the importance of voicing onset time with respect to plosive release (VOT). Voiced plosives in English normally have a short VOT (less than 20–30 msec) and a significant formant transition is present following voice onset. Voiceless plosives in prestressed position, on the other hand, have relatively long VOTs (greater than about 50 msec) and the formant transitions are essentially completed prior to voice onset. Our experiments with synthetic speech compare the role of VOT and the presence or absence of a significant formant transition following voicing onset as cues for the voiced‐voiceless distinction. The data indicate that there is a significant trading relationship between these two cues. The presence or absence of a rapid spectral change following voice onset produces up to 15‐msec change in the location of the perceived phoneme boundary as measured in terms of absolut...


International Conference on Acoustics, Speech, and Signal Processing | 1982

Prediction of perceived phonetic distance from critical-band spectra: A first step

Dennis H. Klatt

Judgements of phonetic distance between pairs of static synthetic vowels and fricatives have been collected in which the stimulus ensemble included formant frequency changes and a number of acoustic changes that turn out to have little phonetic relevance (e.g. spectral tilt, relative formant amplitudes, high-pass, low-pass, and notch filtering). These data can be used to evaluate a spectral distance metric. For example, distance calculations based on the sum-of-squares of differences in critical-band filter bank outputs and those based on the linear-prediction residual correlate poorly with the vowel distance-judgement data. On the other hand, a metric based on spectral slope differences near the peaks in the critical-band spectra to be compared can be made to correlate very well (0.93) with the perceptual data.
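Evaluating a metric against the judgement data means computing a distance for each stimulus pair and correlating those distances with the perceived distances. The sketch below shows the sum-of-squares metric on critical-band spectra (in dB per band), a simple slope-based alternative, and a Pearson correlation for comparing either one with listener judgements. The function names are hypothetical, and the slope metric here is a simplification: it omits the emphasis on regions near spectral peaks that the abstract describes.

```python
import math


def sum_of_squares_distance(spec_a_db, spec_b_db):
    """Sum of squared differences between two critical-band spectra (dB per band)."""
    if len(spec_a_db) != len(spec_b_db):
        raise ValueError("spectra must have the same number of bands")
    return sum((a - b) ** 2 for a, b in zip(spec_a_db, spec_b_db))


def spectral_slopes(spec_db):
    """Slope estimate: level difference between each pair of adjacent bands."""
    return [spec_db[i + 1] - spec_db[i] for i in range(len(spec_db) - 1)]


def slope_distance(spec_a_db, spec_b_db):
    """Sum of squared slope differences (no peak weighting in this sketch)."""
    return sum_of_squares_distance(spectral_slopes(spec_a_db), spectral_slopes(spec_b_db))


def pearson_r(xs, ys):
    """Pearson correlation between metric distances and perceived distances."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)
```

One property worth noting: adding a constant number of decibels to every band changes the sum-of-squares distance but leaves the slope distance at zero, which is one way a slope-based metric can ignore phonetically irrelevant changes.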


Journal of the Acoustical Society of America | 1976

Review of the ARPA speech understanding project

Dennis H. Klatt

In September of 1976, four speech understanding systems were demonstrated, signifying the end of a five‐year program of research and development sponsored by the Advanced Research Projects Agency (ARPA). The best performance was displayed by the Harpy system developed at Carnegie–Mellon University. Harpy satisfied a set of design goals that were specified at the beginning of the program, including the goal of understanding over 90% of a set of naturally spoken sentences composed from a 1000‐word lexicon. After the nature of the speech understanding problem is defined, the four systems are described and critically evaluated. Based on this review, a structure for a next‐generation speech understanding system is proposed and parts of it are considered as a possible model of the early stages of speech perception. The perceptual model addresses the issue of lexical access and includes a decoding network composed of expected spectral sequences for all word strings of English.


Journal of Verbal Learning and Verbal Behavior | 1979

The Limited Use of Distinctive Features and Markedness in Speech Production: Evidence from Speech Error Data.

Stefanie Shattuck-Hufnagel; Dennis H. Klatt

Analysis of a phoneme confusion matrix consisting of 1620 spontaneous speech errors shows that each consonant segment appears as an intrusion just about as often as it appears as a target, with the exception of a small set of four segments (/s/, /t/, /š/, /č/) for which there is a target-intrusion asymmetry in the direction of more frequent “palatalizing” errors. With this qualification, it is shown that there is no tendency for linguistically unmarked consonants to replace marked consonants. It is also shown that sound segment errors almost always involve the movement of unitary segments and not the movement of component distinctive features. These results, confirmed in an independently collected corpus of 1369 errors, are compatible with a model of the speech production process in which (1) most phoneme errors occur as the result of a mis-selection between two similar planning segments competing for a single location in an utterance, although (2) voiceless alveolar consonants are subject to a palatalizing mechanism which is the source of further segmental errors.
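The target-intrusion comparison amounts to counting, for each segment, how often it is the intended sound and how often it is the sound that replaced it, and then comparing the two directions of substitution for a pair of segments. A minimal sketch, assuming each error is recorded as a hypothetical (target, intrusion) pair of segment labels; the function names and data layout are illustrative, not the study's own procedure.

```python
from collections import Counter


def target_intrusion_counts(errors):
    """errors: iterable of (target, intrusion) pairs, one per speech error.

    Returns {segment: (times as target, times as intrusion)}.
    """
    as_target = Counter()
    as_intrusion = Counter()
    for target, intrusion in errors:
        as_target[target] += 1
        as_intrusion[intrusion] += 1
    segments = set(as_target) | set(as_intrusion)
    return {s: (as_target[s], as_intrusion[s]) for s in segments}


def asymmetry(errors, a, b):
    """Net substitution direction: positive means a -> b occurs more often than b -> a."""
    pair_counts = Counter(errors)
    return pair_counts[(a, b)] - pair_counts[(b, a)]
```

A segment whose two counts are roughly equal behaves like most consonants in the corpus; for a pair such as /s/ and /š/, a positive `asymmetry(errors, "s", "š")` would indicate the palatalizing direction described above.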


Journal of the Acoustical Society of America | 1973

Interaction Between Two Factors that Influence Vowel Duration

Dennis H. Klatt

It is well known that stressed vowels are shorter before voiceless consonants than before voiced ones, and that stressed vowels are shorter before an unstressed syllable in a bisyllabic word than in a monosyllabic word. A set of test materials was designed to study the interaction between these two rules. Results suggest that vowels become strongly incompressible beyond a certain amount of shortening and that vowel duration modification rules should have the form Do = k(Di − Dmin) + Dmin, where Di is the input duration to the rule, Do is the output duration of the rule, Dmin is the minimum duration for the vowel, and the scale factor k is greater than zero and depends on the particular rule. The data indicate that Dmin is about 45% of the inherent duration of a given vowel.
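The rule form above can be applied directly, with Dmin taken as 45% of the inherent duration as the data indicate. In the sketch below the function names and the k values in the check are hypothetical; the abstract states only that k is greater than zero and rule-specific.

```python
def min_duration(inherent_ms, fraction=0.45):
    """Dmin: roughly 45% of the vowel's inherent duration."""
    return fraction * inherent_ms


def apply_rule(d_in_ms, k, d_min_ms):
    """One duration rule: Do = k * (Di - Dmin) + Dmin."""
    if k <= 0:
        raise ValueError("scale factor k must be greater than zero")
    return k * (d_in_ms - d_min_ms) + d_min_ms


def apply_rules(inherent_ms, ks):
    """Apply several rules in sequence; each rule scales only the compressible part."""
    d_min = min_duration(inherent_ms)
    d = inherent_ms
    for k in ks:
        d = apply_rule(d, k, d_min)
    return d
```

With an inherent duration of 200 ms, Dmin is 90 ms, and two hypothetical shortening rules with k = 0.6 and k = 0.8 give 90 + 0.6 × 0.8 × 110 = 142.8 ms in either order. Because each rule multiplies only the part above Dmin, successive rules shorten the vowel less than their separate effects would suggest, which is the interaction the test materials were designed to probe.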


Journal of the Acoustical Society of America | 1973

Discrimination of fundamental frequency contours in synthetic speech: implications for models of pitch perception

Dennis H. Klatt

The just‐noticeable difference (JND) for selected aspects of voice fundamental frequency (F0) contours was determined by varying the F0 control parameter of a digitally simulated terminal analog speech synthesizer. Data were obtained from three subjects for a number of 250‐msec segments of the synthetic vowel /ɛ/ differing only in fundamental frequency. Results indicate that the subjects can detect a change of 0.3 Hz in a constant F0 contour when F0 = 120 Hz, but the JND is an order of magnitude larger (2.0 Hz) when the F0 contour is a linear descending ramp (32 Hz/sec). Sensitivity to rate of change of F0 in linear ramps is surprisingly good; greatest sensitivity occurs when one ramp increases and the other decreases (JND = 12 Hz/sec). High‐pass filtering of the stimuli improves performance slightly, suggesting that the fundamental component is not involved in the detection of changes in F0. Substitution of the synthetic stimulus /ya/ with its dynamic formant contours in place of /ɛ/ degrades performance...


International Conference on Acoustics, Speech, and Signal Processing | 1976

A digital filter bank for spectral matching

Dennis H. Klatt

A new digital filter bank design is proposed for the processing of speech waveforms where spectral pattern matching techniques are applicable. Outputs in decibels from the 30 channels of the filter bank are computed every 12 ms. Care has been taken to select a time window and filter center frequency and bandwidth values that take into account the acoustic characteristics of speech. A distance metric is proposed for comparing a spectral frame with previously derived reference patterns. The metric incorporates procedures for crude speaker/microphone normalization, signal level normalization, background noise normalization, and procedures for emphasizing differences in the region of spectral peaks.
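Spectral matching with such a filter bank compares each 30-channel frame (in dB, one every 12 ms) with stored reference patterns. The sketch below illustrates two of the listed ingredients, a crude signal-level normalization and extra weight on differences near spectral peaks, plus a nearest-reference lookup. The normalization (subtracting the frame's mean dB value), the peak-weighting rule, its thresholds, and all function names are assumptions made for illustration; the paper's actual procedures, including speaker/microphone and background-noise normalization, are not reproduced here.

```python
def normalize_level(frame_db):
    """Crude level normalization (assumed): subtract the frame's mean dB value."""
    mean = sum(frame_db) / len(frame_db)
    return [v - mean for v in frame_db]


def peak_weights(ref_db, within_db=10.0, peak_weight=2.0):
    """Hypothetical weighting: channels within `within_db` of the reference maximum count more."""
    top = max(ref_db)
    return [peak_weight if v >= top - within_db else 1.0 for v in ref_db]


def frame_distance(frame_db, ref_db, within_db=10.0, peak_weight=2.0):
    """Weighted squared difference between level-normalized frame and reference."""
    if len(frame_db) != len(ref_db):
        raise ValueError("frame and reference must have the same number of channels")
    f = normalize_level(frame_db)
    r = normalize_level(ref_db)
    w = peak_weights(ref_db, within_db, peak_weight)
    return sum(wi * (fi - ri) ** 2 for wi, fi, ri in zip(w, f, r))


def best_match(frame_db, references):
    """Return the name of the reference pattern closest to the frame."""
    return min(references, key=lambda name: frame_distance(frame_db, references[name]))
```

Because of the level normalization, a frame that differs from a reference only by a uniform gain change has distance zero to it.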

Collaboration


Dive into Dennis H. Klatt's collaborations.

Top Co-Authors

Kenneth N. Stevens

Massachusetts Institute of Technology

Stefanie Shattuck-Hufnagel

Massachusetts Institute of Technology

David B. Pisoni

Indiana University Bloomington

Jonathan Allen

Massachusetts Institute of Technology

Victor W. Zue

Massachusetts Institute of Technology

Rolf Carlson

Royal Institute of Technology
