Hansjörg Mixdorff
Humboldt University of Berlin
Publication
Featured research published by Hansjörg Mixdorff.
International Conference on Acoustics, Speech, and Signal Processing | 2000
Hansjörg Mixdorff
The generation of naturally-sounding F0 contours in TTS is important for the intelligibility and perceived naturalness of synthetic speech. In earlier works the author developed a linguistically motivated model of German intonation based on the quantitative Fujisaki model of the production process of F0. The extraction of parameters for this model from the extracted F0 contour, however, poses problems since model components are superimposed in a particular contour and cannot be calculated directly. The current paper introduces a novel fully-automatic multi-stage approach which was applied to a larger speech database of German. After explaining the modeling procedure in detail, the paper presents first results concerning the relationship between the Fujisaki parameters and the linguistic information underlying an utterance.
Speech Communication | 2005
Hansjörg Mixdorff; Hartmut R. Pfitzinger
The current paper reports first results from the analysis of task-oriented dialogs using a Fujisaki model-based parameterization of F0 contours, as well as a model of the perceptual local speech rate. Two versions of map task style dialogs were examined: (1) the recordings made during the map task proper, (2) readings from scripts of the original dialogs by the same subjects. The first part of this paper presents an analysis of phrase boundaries with respect to form and function. A second issue is the problem of processing fillers, hesitations and repairs within the framework of the Fujisaki model-based analysis. The second part of the paper describes the comparative analysis of spontaneous and read versions of the same dialog fragments with respect to Fujisaki model parameters, contours of the perceptual local speech rate, and other features. In a perception test we asked listeners to identify the speaking style of dialog fragments. Apparently this was possible only for part of the data. Analysis of accent commands and perceptual local speech rate contours still suggested differences between the two speaking styles. The number of accented syllables, the associated accent commands’ amplitudes, and the perceptual local speech rate were generally higher in the read than in the spontaneous utterances. These results were almost significant despite the fact that the read version had been well re-enacted by the subjects and therefore did not exactly exhibit typical reading style characteristics. Despite this drawback, the methodology presented here has strong potential for further comparative prosodic studies of speaking styles.
International Journal of Speech Technology | 2003
Hansjörg Mixdorff; Oliver Jokisch
The perceived quality of synthetic speech strongly depends on its prosodic naturalness. Departing from earlier works by Mixdorff on a linguistically motivated model of German intonation based on the Fujisaki model, an integrated approach to predicting F0 along with syllable duration and energy was developed. The current paper first presents some statistical results concerning the relationship between linguistic and phonetic information underlying an utterance and its prosodic features. These results were employed for training the MFN-based integrated prosodic model predicting syllable duration and energy along with syllable-aligned Fujisaki control parameters. The paper then focusses on the method of perceptual evaluation developed, comparing resynthesis stimuli created by controlled prosodic degrading of natural speech with stimuli created using the integrated model. The results indicate that the integrated model generally receives better ratings than degraded stimuli with comparable durational and F0 deviations from the original. An important outcome is the observation that the accuracy of the predicted syllable durations appears to be a stronger factor with respect to the perceived quality than the accuracy of the predicted F0 contour.
2009 Oriental COCOSDA International Conference on Speech Database and Assessments | 2009
Chia-yu Chiu; Yuan-fu Liao; Daniel Külls; Hansjörg Mixdorff; Shing-lung Chen
This paper reports on the progress of a joint German-Taiwan computer assisted language learning (CALL) project. One major goal of this project is to collect a bi-lingual (both native and second language, i.e., L1 and L2) speech corpus of L2 learners of German and Mandarin across Germany and Taiwan. In the preparation phase of the database collection, contrastive analysis of German and Mandarin phonetic and prosodic systems is performed, and the potential pronunciation errors predicted to be made by L2 learners are hypothesized in a set of confusion tables. We expect to apply the set of confusion tables to database design. The eventual database collection will be conducted during the next three years.
9th International Conference on Speech Prosody 2018 | 2018
Tan Lee; King Hang Matthew Ma; Albert Rilliard; Hansjörg Mixdorff; Angelika Hönemann
This paper reports results from a free labeling experiment employing short audio-visual utterances of Cantonese produced with varying attitudinal expressions. It is part of a series of such experiments with a cross-language setting between German and Cantonese. Cantonese-speaking perceivers were asked to specify a single word that best described these stimuli, which were presented in audio-visual, audio-only, and video-only modalities. The resulting terms were analyzed with respect to the emotional dimensions of valence, activation and dominance, as well as the linguistic dimension of assertion/interrogation. The analysis results are compared with the outcomes from similar experiments employing German stimuli with Cantonese perceivers, as well as German perceivers assessing both German and Cantonese stimuli. It is found that Cantonese perceivers judge the Cantonese stimuli as more activated than German perceivers do. The valence judgments agree relatively well; however, “polite” stimuli were judged less positively by Cantonese perceivers. Generally speaking, valence judgments are mostly influenced by the stimuli whereas activation and dominance judgments depend more on the perceiver group.
Archive | 2015
Hansjörg Mixdorff
The Fujisaki model provides a parsimonious representation of naturally observed fundamental frequency contours. It decomposes these contours into a constant base frequency onto which are superimposed global phrase and local accent gestures whose amplitudes and timings are given by underlying phrase and accent commands. These commands are estimated in a process of Analysis-by-Synthesis. The current chapter describes methods for extraction of Fujisaki model parameters, and then presents applications of the model parameters in several fields of research, especially speech synthesis.
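The decomposition described in this abstract can be illustrated with a minimal Python sketch. It follows the commonly published form of the Fujisaki model, in which the logarithm of F0 is the sum of the log base frequency, impulse-driven phrase components, and step-driven accent components. The equations are not taken from this chapter's text, and the function names, the command tuple layouts, and the default constants (alpha = 2.0/s, beta = 20.0/s, gamma = 0.9) are assumptions chosen for illustration only.

```python
import math


def phrase_response(t, alpha=2.0):
    """Phrase component: impulse response of a critically damped
    second-order system. alpha (1/s) is an assumed default."""
    if t < 0:
        return 0.0
    return alpha ** 2 * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Accent component: step response of a critically damped
    second-order system, clipped at the ceiling gamma.
    beta (1/s) and gamma are assumed defaults."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrase_cmds, accent_cmds,
                alpha=2.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds).

    fb          : base frequency in Hz
    phrase_cmds : list of (onset_time, amplitude) impulses
    accent_cmds : list of (onset_time, offset_time, amplitude) steps
    (tuple layouts are hypothetical, chosen for this sketch)
    """
    log_f0 = math.log(fb)
    # Global phrase gestures: one decaying bump per phrase command.
    for t0, ap in phrase_cmds:
        log_f0 += ap * phrase_response(t - t0, alpha)
    # Local accent gestures: rise at onset, fall at offset.
    for t1, t2, aa in accent_cmds:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return math.exp(log_f0)


# Example: a 2-second contour sampled every 10 ms, with one phrase
# command at 0.0 s and one accent command from 0.3 s to 0.6 s.
contour = [
    fujisaki_f0(i * 0.01, 80.0, [(0.0, 0.5)], [(0.3, 0.6, 0.4)])
    for i in range(200)
]
```

Analysis-by-Synthesis, as mentioned above, would then amount to adjusting the command timings and amplitudes until a contour generated this way matches the measured one; that search step is not shown here.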
Conference of the International Speech Communication Association | 2003
Hansjörg Mixdorff; Hiroya Fujisaki; Gao Peng Chen; Yu Hu
Conference of the International Speech Communication Association | 2001
Hansjörg Mixdorff; Oliver Jokisch
Archive | 2002
Hansjörg Mixdorff
Conference of the International Speech Communication Association | 2003
Hansjörg Mixdorff; Nguyen Bach; Hiroya Fujisaki; Chi Mai Luong