Peter Birkholz
Dresden University of Technology
Publications
Featured research published by Peter Birkholz.
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2006
Peter Birkholz; Dietmar Jackèl; Bernd J. Kröger
We present a novel 3D vocal tract model and a method to control the articulatory movements of the model. The vocal tract model consists of 7 wireframe meshes that represent the three-dimensional surfaces of the articulators and the vocal tract walls. 23 parameters determine the shape of the meshes. The articulatory movements, in terms of the parameter curves, are generated from a gestural description of an utterance. The work presented here is an integral part of a complete articulatory speech synthesizer for high-quality synthesis.
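As a concrete illustration of this architecture, here is a minimal Python sketch (not the authors' code; the mesh names and the interface are assumptions, and only the counts of 7 meshes and 23 parameters come from the abstract):

import numpy as np

# 7 wireframe surfaces of the model (names assumed for illustration)
MESHES = ["upper_lip", "lower_lip", "tongue", "lower_jaw",
          "velum", "pharynx_wall", "palate"]

class VocalTractModel:
    N_PARAMS = 23  # shape parameters, per the abstract

    def __init__(self):
        self.params = np.zeros(self.N_PARAMS)

    def sample_parameter_curves(self, curves, t):
        """Evaluate the 23 parameter curves (callables of time) at time t.
        In the synthesizer, these curves are generated from a gestural
        description of the utterance and deform the wireframe meshes."""
        assert len(curves) == self.N_PARAMS
        self.params = np.array([c(t) for c in curves])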
IEEE Transactions on Audio, Speech, and Language Processing | 2007
Peter Birkholz; Dietmar Jackèl; Bernd J. Kröger
Flow separation in the vocal system at the outlet of a constriction causes turbulence and a fluid dynamic pressure loss. In articulatory synthesizers, the pressure drop associated with such a loss is usually assumed to be concentrated at one specific position near the constriction and is represented by a lumped nonlinear resistance to the flow. This paper highlights discontinuity problems of this simplified loss treatment when the constriction location changes during dynamic articulation. The discontinuities can manifest as undesirable acoustic artifacts in the synthetic speech signal that need to be avoided for high-quality articulatory synthesis. We present a solution to this problem based on a more realistic distributed treatment of fluid dynamic pressure changes. The proposed method was implemented in an articulatory synthesizer, where it proved effective in preventing such acoustic artifacts.
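For orientation, a standard lumped Bernoulli-type formulation (shown here as a representative example, not necessarily the exact expression used in the paper) concentrates the kinetic pressure loss at a single point near the constriction outlet:

\Delta p \approx \frac{\rho}{2} \left( \frac{U}{A_c} \right)^2, \qquad R(U) = \frac{\Delta p}{U} = \frac{\rho\, U}{2 A_c^2},

where U is the volume velocity, A_c the constriction area, and \rho the air density. Because the loss depends on where A_c is evaluated, a shift of the constriction location between synthesis frames makes \Delta p jump, which is the source of the discontinuities described above; distributing the pressure change over several adjacent tube sections removes the jump.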
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Peter Birkholz; Bernd J. Kröger; Christiane Neuschaefer-Rube
We present a novel quantitative model for the generation of articulatory trajectories based on the concept of sequential target approximation. The model was applied for the detailed reproduction of movements in repeated consonant-vowel syllables measured by electromagnetic articulography (EMA). The trajectories for the constrictor (lower lip, tongue tip, or tongue dorsum) and the jaw were reproduced. In doing so, we tested the following hypotheses about invariant properties of articulatory commands: (1) The target of the primary articulator for a consonant is invariant with respect to phonetic context, stress, and speaking rate. (2) Vowel targets are invariant with respect to speaking rate and stress. (3) The onsets of articulatory commands for the jaw and the constrictor are synchronized. Our results in terms of high-quality matches between observed and model-generated trajectories support these hypotheses. The findings of this study can be applied to the development of control models for articulatory speech synthesis.
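To illustrate the target approximation idea, the Python sketch below (the system order, rate constant, and target values are illustrative assumptions, not the paper's) integrates a critically damped second-order system driven by a sequence of step targets, with position and velocity carried across segment boundaries so the trajectory remains smooth even when the target jumps:

import numpy as np

def target_approximation(targets, durations, lam=30.0, dt=0.001):
    """Critically damped 2nd-order system driven by step targets:
    x'' = -2*lam*x' - lam**2 * (x - T). State is carried over at
    segment boundaries, so the trajectory is continuous."""
    x, v = targets[0], 0.0  # start at the first target, at rest
    trajectory = []
    for T, dur in zip(targets, durations):
        for _ in range(int(round(dur / dt))):
            a = -2.0 * lam * v - lam**2 * (x - T)
            v += a * dt
            x += v * dt
            trajectory.append(x)
    return np.array(trajectory)

# e.g. a constrictor-like trace for repeated CV syllables (arbitrary values):
y = target_approximation(targets=[1.0, 0.0, 1.0, 0.0],
                         durations=[0.12, 0.18, 0.12, 0.18])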
Proceedings of the 2007 COST Action 2102 International Conference on Verbal and Nonverbal Communication Behaviours (COST 2102'07) | 2007
Bernd J. Kröger; Peter Birkholz
An articulatory speech synthesizer comprising a three-dimensional vocal tract model and a gesture-based concept for the control of articulatory movements is introduced and discussed in this paper. A modular learning concept based on speech perception is outlined for the creation of gestural control rules. The learning concept draws on sensory feedback information for articulatory states produced by the model itself, as well as on auditory and visual information from speech items produced by external speakers. The complete model (control module and synthesizer) is capable of producing high-quality synthetic speech signals and provides a scheme for modeling the natural speech production and speech perception processes.
PLOS ONE | 2013
Peter Birkholz
A central challenge for articulatory speech synthesis is the simulation of realistic articulatory movements, which is critical for the generation of highly natural and intelligible speech. This includes modeling coarticulation, i.e., the context-dependent variation of the articulatory and acoustic realization of phonemes, especially of consonants. Here we propose a method to simulate the context-sensitive articulation of consonants in consonant-vowel syllables. To achieve this, the vocal tract target shape of a consonant in the context of a given vowel is derived as the weighted average of three measured and acoustically optimized reference vocal tract shapes for that consonant in the context of the corner vowels /a/, /i/, and /u/. The weights are determined by mapping the target shape of the given context vowel into the vowel subspace spanned by the corner vowels. The model was applied for the synthesis of consonant-vowel syllables with the consonants /b/, /d/, /g/, /l/, /r/, /m/, /n/ in all combinations with the eight long German vowels. In a perception test, the mean recognition rate for the consonants in the isolated syllables was 82.4%. This demonstrates the potential of the approach for highly intelligible articulatory speech synthesis.
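A minimal sketch of this interpolation scheme, assuming a two-dimensional vowel space with illustrative corner coordinates (the paper operates on full vocal tract shapes): the context vowel is expressed in barycentric coordinates with respect to /a/, /i/, /u/, and those coordinates weight the three reference consonant shapes.

import numpy as np

# Positions of the corner vowels in an assumed 2D vowel subspace
# (illustrative values, not the paper's):
CORNERS = {"a": np.array([1.0, 0.0]),
           "i": np.array([0.0, 1.0]),
           "u": np.array([0.0, 0.0])}

def corner_weights(v):
    """Barycentric coordinates of vowel position v in the /a/-/i/-/u/ triangle."""
    A = np.column_stack([CORNERS["a"], CORNERS["i"], CORNERS["u"]])
    A = np.vstack([A, np.ones(3)])      # enforce w_a + w_i + w_u = 1
    b = np.append(v, 1.0)
    return np.linalg.lstsq(A, b, rcond=None)[0]

def consonant_target(shapes_aiu, vowel_pos):
    """Weighted average of the three reference consonant shapes (each an
    array of vocal tract parameters) for a given context vowel position."""
    w = corner_weights(vowel_pos)
    return w[0] * shapes_aiu[0] + w[1] * shapes_aiu[1] + w[2] * shapes_aiu[2]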
PLOS ONE | 2013
Yi Xu; Albert Lee; Wing Li Wu; Xuan Liu; Peter Birkholz
Voice, as a secondary sexual characteristic, is known to affect the perceived attractiveness of human individuals. But the underlying mechanism of vocal attractiveness has remained unclear. Here, we presented human listeners with acoustically altered natural sentences and fully synthetic sentences with systematically manipulated pitch, formants and voice quality based on a principle of body size projection reported for animal calls and emotional human vocal expressions. The results show that male listeners preferred a female voice that signals a small body size, with relatively high pitch, wide formant dispersion and breathy voice, while female listeners preferred a male voice that signals a large body size with low pitch and narrow formant dispersion. Interestingly, however, male vocal attractiveness was also enhanced by breathiness, which presumably softened the aggressiveness associated with a large body size. These results, together with the additional finding that the same vocal dimensions also affect emotion judgment, indicate that humans still employ a vocal interaction strategy used in animal calls despite the development of complex language.
NeuroImage | 2013
Jessica Junger; Katharina Pauly; Sabine Bröhr; Peter Birkholz; Christiane Neuschaefer-Rube; Christian G. Kohler; Frank Schneider; Birgit Derntl; Ute Habel
The basis for different neural activations in response to male and female voices, as well as the question of whether men and women perceive male and female voices differently, has not been thoroughly investigated. Therefore, the aim of the present study was to examine the behavioral and neural correlates of gender-related voice perception in healthy male and female volunteers. fMRI data were collected while 39 participants (19 female) were asked to indicate the gender of 240 voice stimuli. These stimuli included recordings of 3-syllable nouns as well as the same recordings pitch-shifted in 2, 4 and 6 semitone steps in the direction of the other gender. Data analysis revealed a) equal voice discrimination sensitivity in men and women but better performance in the categorization of opposite-sex stimuli, at least in men, b) increased responses to increasing gender ambiguity in the mid cingulate cortex and bilateral inferior frontal gyri, and c) stronger activation in a fronto-temporal neural network in response to voices of the opposite sex. Our results indicate gender-specific processing of male and female voices on a behavioral and neuronal level. We suggest that this reflects a heightened sensitivity, probably due to the evolutionary relevance of voice perception in mate selection.
PLOS ONE | 2014
Jessica Junger; Ute Habel; Sabine Bröhr; Josef Neulen; Christiane Neuschaefer-Rube; Peter Birkholz; Christian G. Kohler; Frank Schneider; Birgit Derntl; Katharina Pauly
Gender dysphoria (also known as "transsexualism") is characterized as a discrepancy between anatomical sex and gender identity. Research points towards neurobiological influences. Due to the sexually dimorphic characteristics of the human voice, voice gender perception provides a biologically relevant function, e.g. in the context of mate selection. There is evidence for a better recognition of voices of the opposite sex and a differentiation between the sexes in the underlying functional cerebral correlates, namely the prefrontal and middle temporal areas. This fMRI study investigated the neural correlates of voice gender perception in 32 male-to-female gender dysphoric individuals (MtFs) compared to 20 non-gender dysphoric men and 19 non-gender dysphoric women. Participants indicated the sex of 240 voice stimuli modified in semitone steps in the direction of the other gender. Compared to men and women, MtFs showed differences in a neural network including the medial prefrontal gyrus, the insula, and the precuneus when responding to male vs. female voices. With increased voice morphing, men recruited more prefrontal areas compared to women and MtFs, while MtFs revealed a pattern more similar to that of women. On a behavioral and neuronal level, our results support the reports of MtFs that they cannot identify with their assigned sex.
ISCA Speech Synthesis Workshop (SSW) | 2007
Peter Birkholz; Ingmar Steiner; Stefan Breuer
We present two concepts for the generation of gestural scores to control an articulatory speech synthesizer. Gestural scores are the common input to the synthesizer and constitute an organized pattern of articulatory gestures. The first concept generates the gestures for an utterance using the phonetic transcriptions, phone durations, and intonation commands predicted by the Bonn Open Synthesis System (BOSS) from an arbitrary input text. This concept extends the synthesizer to a text-to-speech synthesis system. The idea of the second concept is to use timing information extracted from electromagnetic articulography signals to generate the articulatory gestures; it is therefore a concept for the re-synthesis of natural utterances. Finally, application prospects for the presented synthesizer are discussed.
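For readers unfamiliar with gestural scores, the following sketch shows one way such an organized pattern of gestures could be represented (the field names, tiers, and values are assumptions for illustration; the synthesizer's actual score format may differ):

from dataclasses import dataclass

@dataclass
class Gesture:
    tier: str        # e.g. "lips", "tongue", "velum", "glottis", "f0"
    target: str      # articulatory target the gesture moves toward
    onset: float     # start time in seconds
    duration: float  # time available to approach the target

# a fragment of a score for a syllable like /ba/ (values illustrative):
score = [
    Gesture("lips",    "bilabial closure", 0.00, 0.08),
    Gesture("tongue",  "/a/ shape",        0.00, 0.25),
    Gesture("glottis", "voiced",           0.00, 0.33),
]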
Journal of the Acoustical Society of America | 2015
Matthias Echternach; Peter Birkholz; Louisa Traser; Tabea Flügge; Robert Kamberger; Fabian Burk; Michael Burdumy; Bernhard Richter
The role of the vocal tract for phonation at very high soprano fundamental frequencies (F0s) is not yet understood in detail. In this investigation, two experiments were carried out with a single professional high soprano subject. First, using two-dimensional (2D) dynamic real-time magnetic resonance imaging (MRI) at 24 fps, midsagittal and coronal vocal tract shapes were analyzed while the subject sang a scale from Bb5 (932 Hz) to G6 (1568 Hz). In a second experiment, volumetric vocal tract MRI data were recorded from sustained phonations (13 s) for the pitches C6 (1047 Hz) and G6 (1568 Hz). Formant frequencies were measured in physical models created by 3D printing, and calculated from area functions obtained from the 3D vocal tract shapes. The data showed that there were only minor modifications of the vocal tract shape. These changes involved a decrease of the piriform sinus as well as small changes of tongue position. Formant frequencies did not exhibit major differences between C6 and G6 for F1 and F3. Only F2 was slightly raised for G6; for G6, however, F2 is not excited by any voice source partial. Therefore, this investigation was not able to confirm that the analyzed professional soprano subject adjusted formants to voice source partials for the analyzed F0s.
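As a rough sanity check on the magnitudes involved (an assumption-laden estimate, not a computation from the paper): a uniform tube closed at the glottis and open at the lips has resonances at

F_n = (2n - 1) c / (4L),

so with c ≈ 350 m/s and a tract length L ≈ 15 cm, F1 ≈ 583 Hz and F2 ≈ 1750 Hz. An F0 of 1568 Hz (G6) thus lies far above an untuned first formant, which is why tuning formants to source partials is a plausible strategy at such pitches and why its apparent absence in this subject is noteworthy.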