Publication


Featured research published by Susan R. Hertz.


Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002. | 2002

Integration of rule-based formant synthesis and waveform concatenation: a hybrid approach to text-to-speech synthesis

Susan R. Hertz

This paper describes an approach to speech synthesis in which waveform fragments dynamically produced with a set of formant-based synthesis rules are concatenated with pre-stored natural speech waveform fragments to produce a synthetic utterance. While this hybrid approach was originally implemented as a tool for research into improved voice quality in formant-based synthesis, it has produced such good results that we now view it as a potentially viable and advantageous approach for a text-to-speech product. Possible advantages of the approach include smaller speech databases for waveform concatenation, enhancement of certain speech cues for sub-optimal listening environments, and improved and more efficient unit selection/production. In addition, the approach has already proven its utility as a tool for research and development in both concatenative and formant-based synthesis.
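As a rough illustration of the hybrid idea in this abstract, the sketch below concatenates pre-stored fragments with fragments generated on demand. Everything in it is an assumption made for illustration: the fragment table, the sine-burst stand-in for formant synthesis, and the policy of preferring a stored fragment when one exists are not taken from the paper.

```python
import math

SAMPLE_RATE = 8000  # assumed sample rate in Hz

# Hypothetical store of pre-recorded natural fragments, keyed by unit name.
NATURAL_FRAGMENTS = {"s": [0.1, -0.1, 0.1, -0.1]}


def synthesize_fragment(unit, n_samples=4, freq_hz=1000.0):
    """Stand-in for rule-based formant synthesis: returns a short sine burst."""
    return [
        math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
        for n in range(n_samples)
    ]


def build_utterance(units):
    """Concatenate natural fragments where stored, synthetic ones otherwise."""
    waveform = []
    sources = []
    for unit in units:
        if unit in NATURAL_FRAGMENTS:
            waveform.extend(NATURAL_FRAGMENTS[unit])
            sources.append("natural")
        else:
            waveform.extend(synthesize_fragment(unit))
            sources.append("synthetic")
    return waveform, sources
```

Calling `build_utterance(["s", "a"])` returns one waveform plus a per-unit record of where each fragment came from, which is the kind of bookkeeping a hybrid system would need when deciding which units to store and which to generate.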


International Conference on Acoustics, Speech, and Signal Processing | 1981

SRS text-to-phoneme rules: A three-level rule strategy

Susan R. Hertz

SRS (Speech Research System), the interactive speech synthesis system at Cornell University, now allows users to work with three kinds of text-to-phoneme rules: text-modification rules, conversion rules, and feature-modification rules. These rules apply in succession to convert a text utterance into a string of phonemes. The SRS strategy is being used to develop a practical set of text-to-phoneme rules for English. Performance tests show very promising results.
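The three rule types applied in succession can be pictured as a three-stage pipeline. The sketch below is a toy under stated assumptions: the rule contents, phoneme symbols, and function names are hypothetical and are not the SRS rule set or its notation.

```python
# Toy three-stage text-to-phoneme pipeline; all rules here are hypothetical.


def text_modification(text):
    """Stage 1: normalize the text (lowercase, expand one sample abbreviation)."""
    return text.lower().replace("dr.", "doctor")


# Ordered (spelling, phoneme) pairs; multi-letter spellings are listed first.
CONVERSION_RULES = [
    ("ph", "f"), ("c", "k"), ("a", "ae"), ("t", "t"),
    ("o", "ow"), ("n", "n"), ("e", "eh"),
]


def conversion(text):
    """Stage 2: convert spellings to phonemes, left to right."""
    phonemes = []
    i = 0
    while i < len(text):
        for spelling, phoneme in CONVERSION_RULES:
            if text.startswith(spelling, i):
                phonemes.append({"symbol": phoneme, "stress": 0})
                i += len(spelling)
                break
        else:
            i += 1  # no rule matched: skip this character
    return phonemes


VOWELS = {"ae", "ow", "eh"}


def feature_modification(phonemes):
    """Stage 3: adjust phoneme features; here, stress the first vowel."""
    for phoneme in phonemes:
        if phoneme["symbol"] in VOWELS:
            phoneme["stress"] = 1
            break
    return phonemes


def text_to_phonemes(text):
    """Apply the three stages in succession."""
    return feature_modification(conversion(text_modification(text)))
```

Keeping each stage as its own function mirrors the idea that the rule types operate on different representations: raw text first, then a phoneme string, then features attached to those phonemes.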


Journal of the Acoustical Society of America | 1985

A versatile dictionary for speech synthesis by rule

Susan R. Hertz

All synthesis rule developers are faced with the problem of handling phenomena that cannot easily be captured in rules. The Delta System [J. Acoust. Soc. Am. Suppl. 1 75, S60 (1984)] provides the rule writer with an especially versatile exception dictionary. The dictionary has two parts: the active dictionary and sets. The active dictionary can store token sequences representing units of any kind (e.g., phrases, words, demisyllables) and associate arbitrary actions with them. For example, an action might specify a pronunciation, as in conventional dictionaries, or it might invoke a rule. An action can be restricted to the portion of an entry that is an exception. Sets contain token sequences but no actions. They provide an especially compact way to group together items that behave similarly. Rules can test a token sequence for membership in a set to determine whether to apply to the sequence. Delta's dictionary is fully integrated into the rule system it accompanies. It can set variables for the rule program, influence the program's flow of control, and manipulate the utterance being synthesized.
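The two-part dictionary described here can be sketched as a mapping from token sequences to actions, plus named sets that hold token sequences without actions. This is only an illustrative model: the entries, the action helpers, and the `state` structure are hypothetical and do not reproduce Delta's actual syntax or behavior.

```python
# Illustrative model of an exception dictionary with actions and sets.
# All entries and names below are hypothetical.


def pronounce(phonemes):
    """Build an action that appends a fixed pronunciation to the output."""
    def action(state):
        state["output"].extend(phonemes)
    return action


def defer_to_rule(rule_name):
    """Build an action that records a rule to be invoked by the rule program."""
    def action(state):
        state["invoked_rules"].append(rule_name)
    return action


# Active dictionary: token sequences (of any unit size) mapped to actions.
ACTIVE_DICTIONARY = {
    ("of",): pronounce(["ah", "v"]),
    ("new", "york"): defer_to_rule("place_name_stress"),
}

# Sets: token sequences grouped by shared behavior, with no actions attached.
SETS = {"silent_e_exceptions": {("have",), ("give",)}}


def in_set(set_name, tokens):
    """Membership test a rule could use to decide whether it applies."""
    return tuple(tokens) in SETS[set_name]


def apply_dictionary(tokens, state):
    """Run the action for a token sequence if present; report whether it fired."""
    action = ACTIVE_DICTIONARY.get(tuple(tokens))
    if action is None:
        return False
    action(state)
    return True
```

A rule program would start with a state such as `{"output": [], "invoked_rules": []}`, call `apply_dictionary` on a candidate token sequence, and fall back to ordinary rules when it returns `False`.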


Journal of the Acoustical Society of America | 1989

A new approach to English text‐to‐phoneme conversion using Delta, version 2

Susan McCormick; Susan R. Hertz

The Delta System, version 2 is being used to develop a new set of text‐to‐speech rules for English. This paper focuses on the text‐to‐phoneme portion of these rules. These text‐to‐phoneme rules are based on an earlier rule set developed by Hertz [J. Acoust. Soc. Am. Suppl. 1 69, S83 (1981)], but are much more elegant and complete, taking advantage of Delta's flexible facilities for building multilevel utterance representations and matching patterns against them. The rules convert an input text into an utterance representation containing the lexical, morphological, and phonological structure of the utterance. Sophisticated algorithms are used for identifying prefixes (productive and nonproductive ones), suffixes, and roots (including the roots of compound words), for predicting lexical stress, for assigning grammatical categories, and for predicting pronunciations. Portions of these algorithms will be described, with a focus on how the Delta programming language allows them to be expressed succinctly and c...


International Conference on Acoustics, Speech, and Signal Processing | 1983

A look at the SRS synthesis rules for Japanese

Susan R. Hertz; Mary E. Beckman

SRS (Speech Research System), Hertz's interactive synthesis rule development system [1], has been used recently to develop a set of phoneme-based rules for Japanese. This paper describes these rules, showing how they transform text through successive stages to produce natural-sounding Japanese utterances. Although the paper presents our overall synthesis strategy for Japanese, it leans quite heavily in its examples toward the description of our strategy for sentence prosody, this being an area where the lack of standard linguistic models has made our approach necessarily novel.


Journal of the Acoustical Society of America | 2009

Challenges in evaluating the intelligibility of text‐to‐speech.

Ann K. Syrdal; Murray F. Spiegel; Deborah Rekart; Susan R. Hertz; Thomas D. Carrell; H. Timothy Bunnell; Corine Bickley

Text‐to‐speech (TTS) technology imposes different constraints on intelligibility than those sufficient for the evaluation of other speech communication systems. For example, the newly revised standard S.2‐2009 explicitly excludes TTS from the speech communication systems it covers. Since there is no current standard appropriate for evaluating TTS intelligibility, the ASA Standards Bioacoustics (S3) working group on Text‐to‐Speech Technology (WG91) was formed with the initial goal of developing such a standard. We describe several ways in which standard methods of testing speech intelligibility are unsuitable for TTS technology and outline our approach to overcoming these limitations. We present an overview of our proposed standard, which is currently nearing its final draft stages.


Journal of the Acoustical Society of America | 2006

Speaker identification in hybrid synthesis: Implications for speech perception

Susan R. Hertz; Isaac C. Spencer

As part of an evaluation of hybrid synthesis [Hertz, Proc. IEEE Workshop on Speech Synthesis (2002)], perceptual experiments were conducted that tested the hypothesis that stressed vowels are the primary cues to speaker identification. Hybrid sentences were constructed for eight voices, including child and adult and male and female, in which stressed vowels were taken from a single human speaker, but other segments were replaced by surrogates from different sources. Some surrogates were natural speech segments; others were formant synthesized. Some matched the age or gender of the stressed vowel speaker; others did not. After being trained on six human target voices, listeners were asked to identify the hybrid stimuli, and also fully synthetic and natural stimuli (for target and nontarget voices), in terms of age, gender, and whether and how much they matched a target voice. For all categories, hybrid stimuli, in contrast to synthetic ones, were identified as accurately as natural speech, both by listener...


Journal of the Acoustical Society of America | 2004

Perceptual consequences of nasal consonant ‘‘surrogates’’ in English: Implications for speech synthesis

Susan R. Hertz; Isaac C. Spencer; Thomas F. Church; Richard Goldhor

Experiments indicate that non‐nasal obstruents in human utterances can be replaced by ‘‘surrogate’’ segments, either produced by formant synthesis or recorded from other speakers, with virtually no change in speech quality or speaker identity [Hertz, Proc. IEEE 2002 Workshop on Speech Synthesis (2002)]. While the durational and spectral properties of the surrogate segments must be broadly appropriate to their target context, no speaker‐specific tailoring is required. This paper describes follow‐on experiments studying the perceptual consequences of replacing nasal consonants in human utterances with surrogate segments from different phonetic contexts, either synthesized or spoken by other speakers. These experiments indicate that the manipulated speech sounds natural when surrogate segment durations, and the formant transitions and nasalization characteristics of adjacent vowels, are appropriate. In certain contexts F0 is also perceptually salient. The spectral characteristics of surrogate nasal murmurs a...


Journal of the Acoustical Society of America | 1994

Syllt for building deltas: Simple speech synthesis for teaching and research

Susan R. Hertz; Elizabeth C. Zsiga; Marie K. Huffman

Syllt is a new computer software tool for speech synthesis on PCs and Sun workstations. Using the Delta System [Hertz, Papers in Laboratory Phonology I (1990)], Syllt implements a phone‐to‐speech rule set that synthesizes high‐quality CVC syllables from a string of phones entered by the user. The program allows rapid creation of synthetic stimuli for perception experiments, and provides a tool for teaching acoustic phonetics. In the process of creating speech output, Syllt produces a multi‐tiered utterance representation (a ‘‘delta’’) that coordinates phonological units (such as phonemes), phonetic units (such as bursts), and quantitative parameter values (such as formant frequencies) for a Klatt synthesizer. Users can interactively manipulate the value and relative timing of any of these elements—for example, timing of voicing relative to stop release, amplitude and frequency of burst noise, or formant trajectories. The result of any change can be immediately heard and evaluated. Delta manipulation can b...


Journal of the Acoustical Society of America | 1994

Annotation and prosodic control in the Eloquence text‐to‐speech system

Kenneth deJong; Susan R. Hertz

A simple set of text annotations that enables users to produce sophisticated prosodic effects is being developed as part of the Eloquence text‐to‐speech system. The annotations relate to concepts such as ‘‘emphasis’’ and ‘‘level of excitement.’’ The rules interpret the typically sparse annotations in the process of building up a rich, ‘‘multi‐stream’’ phonological and phonetic representation from which the final values for synthesis are derived. This structure includes prosodic phrases with associated tones, words with associated pitch accents, syllables and their nuclei, fundamental frequency values, and durations. For example, marking a word for emphasis triggers several actions: the rules place an accent on the word; they associate tones appropriate to the level of emphasis and phrase type; they attract nuclear stress to the word and deaccent following words in the phrase; they increase the pitch range at the emphasized word; and they lengthen the accented syllable in accordance with our nucleus‐based ...

Collaboration


Top Co-Authors

H. Timothy Bunnell

Alfred I. duPont Hospital for Children
