
Publication


Featured research published by Harald Romsdorfer.


IEEE Transactions on Multimedia | 2010

A 3-D Audio-Visual Corpus of Affective Communication

Gabriele Fanelli; Jürgen Gall; Harald Romsdorfer; Thibaut Weise; Luc Van Gool

Communication between humans relies deeply on the capability of expressing and recognizing feelings. For this reason, research on human-machine interaction needs to focus on the recognition and simulation of emotional states, a prerequisite of which is the collection of affective corpora. Currently available datasets remain a bottleneck because of the difficulties arising during the acquisition and labeling of affective data. In this work, we present a new audio-visual corpus for possibly the two most important modalities used by humans to communicate their emotional states, namely speech and facial expression in the form of dense dynamic 3-D face geometries. We acquire high-quality data by working in a controlled environment and use video clips to induce affective states. The annotation of the speech signal includes: transcription of the corpus text into a phonological representation, accurate phone segmentation, fundamental frequency extraction, and signal intensity estimation. We employ a real-time 3-D scanner to acquire dense dynamic facial geometries and track the faces throughout the sequences, achieving full spatial and temporal correspondences. The corpus is a valuable tool for applications like affective visual speech synthesis or view-independent facial expression recognition.
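The fundamental frequency extraction step mentioned in the annotation pipeline can be illustrated with a minimal autocorrelation-based estimator. This is a generic textbook method sketched on a synthetic tone, not the tool actually used to annotate the corpus; the frame length, sample rate, and pitch range are illustrative assumptions.

```python
import numpy as np

def f0_autocorr(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency of one frame via autocorrelation."""
    frame = frame - frame.mean()
    # Keep only non-negative lags of the full autocorrelation
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)   # plausible pitch-period lags
    lag = lo + np.argmax(ac[lo:hi])           # lag of the strongest periodicity
    return sr / lag

sr = 16000
t = np.arange(int(0.04 * sr)) / sr            # one 40 ms frame
frame = np.sin(2 * np.pi * 120.0 * t)         # synthetic 120 Hz "voiced" signal
f0 = f0_autocorr(frame, sr)
```

On this clean tone the estimate lands close to 120 Hz; real speech additionally needs voicing decisions and octave-error handling, which this sketch omits.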


Speech Communication | 2007

Text analysis and language identification for polyglot text-to-speech synthesis

Harald Romsdorfer; Beat Pfister

In multilingual countries, text-to-speech synthesis systems often have to deal with texts containing inclusions from multiple other languages in the form of phrases, words, or even parts of words. In such multilingual cultural settings, listeners expect a high-quality text-to-speech synthesis system to read such texts in a way that the origin of the inclusions can be heard, i.e., with correct language-specific pronunciation and prosody. The challenge for the text analysis component of a text-to-speech synthesis system is to derive from mixed-lingual sentences the correct polyglot phone sequence and all information necessary to generate natural-sounding polyglot prosody. This article presents a new approach to analyzing mixed-lingual sentences. The approach centers on a modular, mixed-lingual morphological and syntactic analyzer, which additionally provides accurate language identification at the morpheme level as well as word and sentence boundary identification in mixed-lingual texts. The approach can also be applied to word identification in languages without a designated word boundary symbol, such as Chinese or Japanese. To date, this mixed-lingual text analysis supports any mixture of English, French, German, Italian, and Spanish, and because of its modular design it is easily extensible to additional languages.
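The idea of language identification at the morpheme level can be sketched, very loosely, as lexicon lookup over a token sequence. The lexicons, tags, and example sentence below are toy assumptions; the paper's system uses full morphological and syntactic analyzers per language, not flat word lists.

```python
# Hypothetical morpheme lexicons; the actual system uses per-language
# morphological analyzers rather than word lists like these.
lexicons = {
    "de": {"der", "startet", "in"},
    "en": {"download", "the", "in"},
}

def tag_morphemes(morphemes):
    """Assign a language tag to each morpheme by lexicon membership."""
    tags = []
    for m in morphemes:
        hits = [lang for lang in sorted(lexicons) if m.lower() in lexicons[lang]]
        if len(hits) == 1:
            tags.append(hits[0])
        elif hits:
            tags.append("ambiguous")   # would need syntactic context to resolve
        else:
            tags.append("unknown")
    return tags

# German sentence with an English inclusion: "Der Download startet"
print(tag_morphemes(["Der", "Download", "startet"]))  # → ['de', 'en', 'de']
```

The "ambiguous" case is exactly where the paper's syntactic analysis earns its keep: membership tests alone cannot decide words shared across languages.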


International Conference on Machine Learning | 2004

A mixed-lingual phonological component which drives the statistical prosody control of a polyglot TTS synthesis system

Harald Romsdorfer; Beat Pfister; René Beutler

A polyglot text-to-speech synthesis system that is able to read mixed-lingual text aloud must first derive the correct pronunciation. This is achieved with an accurate morpho-syntactic analyzer that simultaneously works as a language detector, followed by a phonological component that performs various phonological transformations. The result of these symbol-processing steps is a complete phonological description of the speech to be synthesized. The subsequent processing step, prosody control, has to generate numerical values for the physical prosodic parameters from this description, a task that is very different from the former ones. This article presents appropriate solutions to both types of tasks, namely a particular rule-based approach for the phonological component and a statistical or machine learning approach to prosody control.
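A rule-based phonological transformation of the kind the phonological component performs can be illustrated with a single toy rule. The German final-devoicing rule, the phone symbols, and the "#" word-boundary marker below are illustrative assumptions, not the system's actual rule set or notation.

```python
# Illustrative voiced-to-voiceless obstruent pairs (toy phone symbols)
DEVOICE = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s"}

def final_devoicing(phones):
    """Devoice the last obstruent of each word ('#' marks word boundaries)."""
    out = list(phones)
    for i, ph in enumerate(out):
        at_word_end = (i + 1 == len(out)) or (out[i + 1] == "#")
        if at_word_end and ph in DEVOICE:
            out[i] = DEVOICE[ph]
    return out

# /hUnd/ 'Hund' + /tag/ 'Tag' → final /d/ and /g/ are devoiced
print(final_devoicing(["h", "U", "n", "d", "#", "t", "a", "g"]))
```

A real phonological component chains many such context-dependent rules, including cross-word and cross-language effects, over the analyzer's full morpho-syntactic output.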


Sensor Array and Multichannel Signal Processing Workshop | 2010

Combining multiband joint position-pitch algorithm and particle filters for speaker localization

Tania Habib; Harald Romsdorfer

We present a combination of the multiband joint position-pitch (M-PoPi) estimation algorithm with a particle filtering framework to enhance localization accuracy when tracking multiple concurrent speakers. A new likelihood function derived from the M-PoPi algorithm is proposed for the particle filter framework. The performance of the particle-filter-based tracker is compared with that of the M-PoPi algorithm. The proposed framework improves localization accuracy in all cases, ranging from a single speaker up to three concurrent speakers.


International Workshop on Machine Learning for Signal Processing | 2009

Speech prosody control using weighted neural network ensembles

Harald Romsdorfer

Ensembles of artificial neural networks (ANNs) show improved generalization capabilities that outperform those of single networks. However, for aggregation to be effective, the individual networks must be as accurate and diverse as possible. This paper presents a new statistical model for prosody control that combines weighted ensembles of ANNs with feature relevance determination, allowing the individual networks to be both accurate and diverse. The weighted neural network ensemble model was applied to both phone duration modeling and fundamental frequency modeling. A comparison with state-of-the-art prosody models based on classification and regression trees (CART), multivariate adaptive regression splines (MARS), or ANNs shows a 12% improvement over the best duration model and a 24% improvement over the best F0 model. The neural network ensemble model also outperforms a recently presented ensemble model based on gradient tree boosting.
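A weighted ANN ensemble of the kind described can be sketched with tiny networks trained on bootstrap resamples (for diversity) and combined with weights derived from validation error (so more accurate members count more). The regression task, network sizes, training schedule, and inverse-MSE weighting below are all assumptions for illustration, not the paper's model or features.

```python
import numpy as np

rng = np.random.default_rng(1)

class TinyMLP:
    """One-hidden-layer regression net trained by full-batch gradient descent."""
    def __init__(self, n_in, n_hidden, seed):
        r = np.random.default_rng(seed)
        self.W1 = r.normal(0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = r.normal(0, 0.5, (n_hidden, 1))
        self.b2 = 0.0

    def predict(self, X):
        return np.tanh(X @ self.W1 + self.b1) @ self.W2 + self.b2

    def fit(self, X, y, lr=0.1, epochs=2000):
        y = y.reshape(-1, 1)
        for _ in range(epochs):
            H = np.tanh(X @ self.W1 + self.b1)
            err = H @ self.W2 + self.b2 - y            # gradient of 0.5*MSE
            dH = (err @ self.W2.T) * (1 - H ** 2)      # backprop through tanh
            self.W2 -= lr * H.T @ err / len(X)
            self.b2 -= lr * err.mean()
            self.W1 -= lr * X.T @ dH / len(X)
            self.b1 -= lr * dH.mean(axis=0)
        return self

# Toy stand-in for a prosody target (e.g. a duration-like quantity)
X = rng.uniform(-1, 1, (200, 2))
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.05, 200)
X_tr, y_tr, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

# Diverse members via bootstrap resampling; weights from validation accuracy
members, weights = [], []
for seed in range(5):
    idx = rng.choice(len(X_tr), len(X_tr))
    net = TinyMLP(2, 8, seed).fit(X_tr[idx], y_tr[idx])
    mse = np.mean((net.predict(X_val).ravel() - y_val) ** 2)
    members.append(net)
    weights.append(1.0 / (mse + 1e-9))     # better members get larger weight
weights = np.array(weights) / np.sum(weights)

def ensemble_predict(X):
    preds = np.stack([m.predict(X).ravel() for m in members])
    return weights @ preds                  # weighted aggregation

ens_mse = np.mean((ensemble_predict(X_val) - y_val) ** 2)
```

The paper additionally prunes irrelevant input features per member (feature relevance determination), which this sketch leaves out.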


Conference of the International Speech Communication Association | 2003

Mixed-lingual text analysis for polyglot TTS synthesis

Beat Pfister; Harald Romsdorfer


Conference of the International Speech Communication Association | 2005

Phonetic Labeling and Segmentation of Mixed-Lingual Prosody Databases

Harald Romsdorfer; Beat Pfister


Computer Speech & Language | 2013

Auditory inspired methods for localization of multiple concurrent speakers

Tania Habib; Harald Romsdorfer


Proceedings of the International Workshop on Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality | 2010

3D vision technology for capturing multimodal corpora: chances and challenges

Gabriele Fanelli; Jürgen Gall; Harald Romsdorfer; Thibaut Weise; Luc Van Gool


BIWI Technical Report | 2010

Acquisition of a 3D Audio-Visual Corpus of Affective Speech

Gabriele Fanelli; Jürgen Gall; Harald Romsdorfer; Thibaut Weise; Luc Van Gool

Collaboration


Dive into Harald Romsdorfer's collaborations.

Top Co-Authors

Tania Habib
Graz University of Technology

Thibaut Weise
École Polytechnique Fédérale de Lausanne

Hannes Pessentheiner
Graz University of Technology