Jürgen Trouvain
Saarland University
Publications
Featured research published by Jürgen Trouvain.
SSW | 2003
Marc Schröder; Jürgen Trouvain
This paper introduces the German text-to-speech synthesis system MARY. The system's main features, namely a modular design and an XML-based system-internal data representation, are pointed out, and the properties of the individual modules are briefly presented. An interface allowing the user to access and modify intermediate processing steps without the need for a technical understanding of the system is described, along with examples of how this interface can be put to use in research, development and teaching. The usefulness of the modular and transparent design approach is further illustrated with an early prototype of an interface for emotional speech synthesis.
Archive | 2007
Jürgen Trouvain; Ulrike Gut
This volume presents an overview of the state of the art in second language prosody learning and teaching. The first part comprises descriptions of non-native intonation, stress and speech rhythm written by experts in the field in a format accessible to language teachers. In the second part, leading teaching practitioners present a variety of methods and exercises in the area of prosody. The volume is accompanied by a CD-ROM with audio examples.
Neurocase | 2009
Ingo Hertrich; Susanne Dietrich; Anja Moos; Jürgen Trouvain; Hermann Ackermann
Blind individuals may learn to understand ultra-fast synthetic speech at a rate of up to about 25 syllables per second (syl/s), an accomplishment by far exceeding the maximum performance level of normal-sighted listeners (8–10 syl/s). The present study indicates that this exceptional skill engages distinct regions of the central-visual system. Hemodynamic brain activation during listening to moderately fast (8 syl/s) and ultra-fast speech (16 syl/s) was measured in a blind individual and six normal-sighted controls. Moderately fast speech activated posterior and anterior ‘language zones’ in all subjects. Regarding ultra-fast tokens, the controls showed exclusive activation of supratemporal regions, whereas the blind participant exhibited enhanced left inferior frontal and temporoparietal responses as well as significant hemodynamic activation of the left fusiform gyrus (FG) and right primary visual cortex. Since the left FG is known to be involved in phonological processing, this structure presumably provides the functional link between the central-auditory and -visual systems.
Psychophysiology | 2012
Ingo Hertrich; Susanne Dietrich; Jürgen Trouvain; Anja Moos; Hermann Ackermann
During speech perception, acoustic correlates of syllable structure and pitch periodicity are directly reflected in electrophysiological brain activity. Magnetoencephalography (MEG) recordings were made while 10 participants listened to natural or formant-synthesized speech at moderately fast or ultrafast rate. Cross-correlation analysis was applied to show brain activity time-locked to the speech envelope, to an acoustic marker of syllable onsets, and to pitch periodicity. The envelope yielded a right-lateralized M100-like response, syllable onsets gave rise to M50/M100-like fields with an additional anterior M50 component, and pitch (ca. 100 Hz) elicited a neural resonance bound to a central auditory source at a latency of 30 ms. The strength of these MEG components showed differential effects of syllable rate and natural versus synthetic speech. Presumably, such phase-locking mechanisms serve as neuronal triggers for the extraction of information-bearing elements.
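The cross-correlation analysis described above, which searches for brain activity time-locked to a stimulus feature such as the speech envelope, can be sketched as follows. This is a toy illustration with synthetic signals, not the authors' MEG pipeline; the signal lengths, noise level, and lag range are arbitrary choices for demonstration.

```python
import numpy as np

def cross_correlate(neural, stimulus, max_lag):
    """Normalized cross-correlation of a neural signal with a stimulus
    feature (e.g. the speech envelope) over lags 0..max_lag samples.
    A peak at lag k suggests the neural signal tracks the stimulus
    with a delay of k samples."""
    neural = (neural - neural.mean()) / neural.std()
    stimulus = (stimulus - stimulus.mean()) / stimulus.std()
    n = len(neural)
    lags = np.arange(max_lag + 1)
    xcorr = np.array([
        np.dot(stimulus[: n - lag], neural[lag:]) / (n - lag)
        for lag in lags
    ])
    return lags, xcorr

# Toy demo: a "neural" signal that echoes the stimulus 30 samples later,
# buried in noise.
rng = np.random.default_rng(0)
stim = rng.standard_normal(2000)
neural = np.roll(stim, 30) + 0.5 * rng.standard_normal(2000)

lags, xc = cross_correlate(neural, stim, max_lag=100)
best = lags[np.argmax(xc)]  # recovers the 30-sample latency
```

At an MEG sampling rate, a peak lag in samples converts directly to a response latency in milliseconds, which is how a value like the 30 ms pitch-resonance latency above would be read off.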
Archive | 2007
Eva Lasarcyk; Jürgen Trouvain
In this study we present initial efforts to model laughter with an articulatory speech synthesizer. We aimed at imitating a real laugh taken from a spontaneous speech database and created several synthetic versions of it using articulatory synthesis and diphone synthesis. In modeling laughter with articulatory synthesis, we also approximated features like breathing noises that do not normally occur in speech. Evaluation with respect to the perceived degree of naturalness indicated that the laugh stimuli would pass as “laughs” in an appropriate conversational context. In isolation, though, significant differences could be measured with regard to the degree of variation (durational patterning, fundamental frequency, intensity) within each laugh.
Journal of Phonetics | 2015
Petra Wagner; Jürgen Trouvain; Frank Zimmerer
Recently, the debate about what kind of speech data is most appropriate for linguistic research has intensified. Two seemingly clearly distinct phonetic data types have been identified, with ‘laboratory speech’ defenders on the one hand and ‘natural speech’ proponents on the other. In this article, this dichotomy is called into question. Results from previous studies on segmental phonetics, prosody and paralinguistics indicate that the data we use may indeed have an immense influence on our results. The research papers in the present Special Issue of the Journal of Phonetics provide further evidence for the style-dependency of speech data and hence of our theories and models. Importantly, they also show that some results remain stable independently of the speaking style under investigation. We claim that these findings do not point to an inherent superiority of one particular type of data used in phonetics research. Instead, we argue for a stronger methodological awareness in investigations of speech phenomena and for more cautious interpretation of our findings. We also believe that we need a much better understanding of the extent to which our methods and our ways of collecting speech data influence our results. A generally increased methodological awareness and a greater variety of investigated speaking styles will advance our research further than a continuing argument for or against one particular type of speech data.
agent-directed simulation | 2004
Jürgen Trouvain; Marc Schröder
Laughter is a powerful means of emotion expression which has not yet been used in speech synthesis. The current paper reports on a pilot study in which differently created types of laughter were combined with synthetic speech in a dialogical situation. A perception test assessed the effect on perceived social bonding as well as the appropriateness of the laughter. Results indicate that it is crucial to carefully model the intensity of the laughter, whereas speaker identity and generation method appear less important.
Archive | 2007
Jürgen Trouvain
We report on a pilot study testing the subjective comprehension of tempo-scaled synthetic speech with 9 sighted and 2 blind students. German texts (length, 100 words) were generated with a formant synthesizer and a diphone synthesizer at seven different tempo steps from 3.5 syllables per second (s/s) to 17.5 s/s. The results show that the blind subjects can understand formant synthesis at all offered rates, whereas the performance of their sighted peers declines at a rate of 10.5 s/s. Contrary to our expectations, diphone synthesis is less easy to understand than formant synthesis for both groups at rates faster than 7.5 s/s. The potential reasons for these two main findings are discussed.
Journal of the International Phonetic Association | 2008
William J. Barry; Jürgen Trouvain
The present discussion re-opens an old issue that was ‘officially discussed’ in Kiel in 1989 but has not been offered for debate in the wider phonetic community. It is argued that there is a logical and practical gap in the present IPA vowel chart. The lack of a central open vowel is unsatisfactory, in particular because many languages have a single open vowel with an apparently more central than fronted or backed quality. Arguments and suggestions for a number of alternative solutions to the problem are presented for discussion.
International Journal of Speech Technology | 2003
Caren Brinckmann; Jürgen Trouvain
In order to determine priorities for improving timing in synthetic speech, this study examines the role of segmental duration prediction and of the phonological symbolic representation in the perceived quality of a text-to-speech system. In perception experiments using German speech synthesis, two standard duration models (Klatt rules and CART) were tested. The input to these models was a symbolic representation derived either from a database or from a text-to-speech system. Results of the perception experiments show that different duration models can only be distinguished when the symbolic representation is appropriate. Given the relative importance of the symbolic representation, post-lexical segmental rules were investigated, with the outcome that listeners differ in their preferences regarding the degree of segmental reduction. In conclusion, before fine-tuning duration prediction, it is important to derive an appropriate phonological symbolic representation in order to improve timing in synthetic speech.
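The Klatt rule model tested above predicts a segment's duration by applying multiplicative contextual factors to the stretchable part of its inherent duration. A minimal sketch of that scheme, assuming the standard formulation; the segment durations and factor values below are hypothetical illustrations, not figures from the study.

```python
def klatt_duration(inherent_ms, min_ms, factors):
    """Klatt-style duration rule: each applicable contextual rule
    contributes a multiplicative factor, applied only to the portion
    of the inherent duration above the incompressible minimum."""
    prcnt = 1.0
    for f in factors:
        prcnt *= f
    return (inherent_ms - min_ms) * prcnt + min_ms

# Hypothetical example: a vowel with inherent duration 120 ms and
# minimum 60 ms, subject to phrase-final lengthening (factor 1.4)
# and unstressed shortening (factor 0.7).
dur = klatt_duration(120, 60, [1.4, 0.7])
```

A CART model, by contrast, would learn the duration (or the factors) from labeled data rather than from hand-written rules; the study's point is that neither model can be judged fairly if the symbolic input representation feeding it is wrong.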