
Publication


Featured research published by Joachim Stegmann.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2007

Comparison of Four Approaches to Age and Gender Recognition for Telephone Applications

Florian Metze; Jitendra Ajmera; Roman Englert; Udo Bub; Felix Burkhardt; Joachim Stegmann; Christian A. Müller; Richard Huber; Bernt Andrassy; Josef Bauer; Bernhard Littel

This paper presents a comparative study of four different approaches to automatic age and gender classification using seven classes on a telephony speech task and also compares the results with human performance on the same data. The automatic approaches compared are based on (1) a parallel phone recognizer, derived from an automatic language identification system; (2) a system using dynamic Bayesian networks to combine several prosodic features; (3) a system based solely on linear prediction analysis; and (4) Gaussian mixture models based on MFCCs for separate recognition of age and gender. On average, the parallel phone recognizer performs as well as human listeners do, while losing performance on short utterances. The system based on prosodic features, however, shows very little dependence on the length of the utterance.
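Approach (4) above can be sketched with class-conditional Gaussian densities over acoustic feature vectors. The following is a minimal, illustrative stand-in: a single diagonal Gaussian per class instead of a full GMM, and random vectors instead of real MFCCs. The class labels and data are assumptions for the sketch, not the paper's actual setup.

```python
import numpy as np

class GaussianClassClassifier:
    """One diagonal Gaussian per class over feature vectors (a 1-component
    stand-in for the MFCC-based Gaussian mixture models in the paper)."""

    def fit(self, features_by_class):
        self.params = {}
        for label, X in features_by_class.items():
            X = np.asarray(X, dtype=float)
            # Per-dimension mean and variance; small floor avoids division by zero.
            self.params[label] = (X.mean(axis=0), X.var(axis=0) + 1e-6)
        return self

    def log_likelihood(self, x, label):
        mean, var = self.params[label]
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

    def predict(self, x):
        # Pick the class whose density assigns the highest log-likelihood.
        x = np.asarray(x, dtype=float)
        return max(self.params, key=lambda label: self.log_likelihood(x, label))

rng = np.random.default_rng(0)
train = {
    "adult_female": rng.normal(0.0, 1.0, size=(200, 12)),  # fake 12-dim "MFCCs"
    "adult_male": rng.normal(3.0, 1.0, size=(200, 12)),
}
clf = GaussianClassClassifier().fit(train)
print(clf.predict(np.full(12, 3.1)))  # → adult_male
```

A real system would model frame sequences with multi-component mixtures and sum frame log-likelihoods per utterance; the decision rule (maximum class-conditional likelihood) is the same.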


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2009

Detecting real life anger

Felix Burkhardt; Tim Polzehl; Joachim Stegmann; Florian Metze; Richard Huber

Acoustic anger detection in voice portals can help to enhance human-computer interaction. A comprehensive voice portal data collection has been carried out and gives new insight into the nature of real-life data. Manual labeling revealed a high percentage of non-classifiable data. Experiments with a statistical classifier indicate that, in contrast to pitch- and energy-related features, duration measures do not play an important role for this data, while cepstral information does. Also, in a direct comparison between Gaussian mixture models and support vector machines, the latter gave better results.
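As an illustration of what an energy-related utterance-level feature might look like, here is a toy sketch that summarizes short-time log energy per utterance (pitch and cepstral features omitted). The frame sizes and synthetic signals are assumptions for the example, not the paper's feature set.

```python
import numpy as np

def utterance_features(signal, frame_len=400, hop=160):
    """Utterance-level statistics over short-time log energy, a toy example
    of the energy-related features the paper found useful."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    log_energy = np.array([np.log(np.sum(f ** 2) + 1e-10) for f in frames])
    # Summary statistics like these would feed a classifier such as an SVM.
    return np.array([log_energy.mean(), log_energy.std(),
                     log_energy.max() - log_energy.min()])

rng = np.random.default_rng(1)
calm = rng.normal(0, 0.1, 16000)   # quiet synthetic "utterance", 1 s at 16 kHz
loud = rng.normal(0, 0.8, 16000)   # loud synthetic "utterance"
print(utterance_features(loud)[0] > utterance_features(calm)[0])  # → True
```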


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2001

A candidate for the ITU-T 4 kbit/s speech coding standard

Jes Thyssen; Yang Gao; Adil Benyassine; Eyal Shlomot; Carlo Murgia; Huan-Yu Su; Kazunori Mano; Yusuke Hiwasaki; Hiroyuki Ehara; Kazutoshi Yasunaga; Claude Lamblin; Balazs Kovesi; Joachim Stegmann; Hong-Goo Kang

This paper presents the 4 kbit/s speech coding candidate submitted by AT&T, Conexant, Deutsche Telekom, France Telecom, Matsushita, and NTT for the ITU-T 4 kbit/s selection phase. The algorithm was developed jointly based on the qualification version of Conexant. This paper focuses on the development carried out during the collaboration in order to narrow the gap to the requirements in an attempt to provide toll quality at 4 kbit/s. This objective is currently being verified in independent subjective tests coordinated by ITU-T and carried out in multiple languages. Subjective tests carried out during the development indicate that the collaboration work has been successful in improving the quality, and that meeting a majority of the requirements in the extensive selection phase test is a realistic goal.


Universal Access in the Information Society | 2009

Getting closer: tailored human–computer speech dialog

Florian Metze; Roman Englert; Udo Bub; Felix Burkhardt; Joachim Stegmann

This paper presents an advanced call center, which adapts presentation and interaction strategy to properties of the caller such as age, gender, and emotional state. User studies on interactive voice response (IVR) systems have shown that these properties can be used effectively to “tailor” services to users or user groups who do not maintain personal preferences, e.g., because they do not use the service on a regular basis. The adopted approach to achieve individualization of services, without being able to personalize them, is based on the analysis of a caller’s voice. This paper shows how this approach benefits service providers by being able to target entertainment and recommendation options. It also shows how this analysis at the same time benefits the customer, as it can increase accessibility of IVR systems to user segments which have particular expectations or which do not cope well with a “one size fits all” system. The paper summarizes the authors’ current work on component technologies, such as emotion detection, age and gender recognition on telephony speech, and presents results of usability and acceptability tests as well as an architecture to integrate these technologies in future multi-modal contact centers. It is envisioned that these will eventually serve customers with an avatar representation of an agent and tailored interaction strategies, matching powerful output capabilities with advanced analysis of the user’s input.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 1996

Robust classification of speech based on the dyadic wavelet transform with application to CELP coding

Joachim Stegmann; Gerhard Schröder; Kyrill A. Fischer

This paper describes a new algorithm for the classification of telephone-bandwidth speech that is designed for efficient control of bit allocation in low bit-rate speech coders. The algorithm is based on the dyadic wavelet transform (DyWT) and classifies each subframe into one of three categories: background noise/unvoiced, transients/voicing onsets, or periodic/voiced. A set of three parameters is derived from the DyWT coefficients, each giving a decision score that the associated class is active. Taking the history into account, a finite-state model controlled by these parameters computes the classifier's decision. The proposed algorithm is robust to various types of background noise. In comparison with a classifier based on the long-term autocorrelation function, the DyWT classifier proves to be superior. To evaluate its performance in CELP-type speech coders, a variety of excitation coding schemes with bit rates between 2200 and 4800 bit/s is investigated.
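The history-aware, score-driven finite-state decision described above can be sketched roughly as follows. The scores, margin threshold, and exact switching rule are illustrative assumptions, not the paper's actual parameters.

```python
# Three per-subframe decision scores drive a state machine that only switches
# class when the winning score clearly beats the current state's score, so
# history suppresses spurious single-subframe decisions.

CLASSES = ("noise_unvoiced", "transient_onset", "periodic_voiced")

def classify(score_sequence, margin=0.2, start="noise_unvoiced"):
    """score_sequence: per-subframe dicts mapping class name -> decision score."""
    state = start
    decisions = []
    for scores in score_sequence:
        best = max(scores, key=scores.get)
        # Stay in the current state unless another class wins by a clear margin.
        if best != state and scores[best] - scores[state] >= margin:
            state = best
        decisions.append(state)
    return decisions

frames = [
    {"noise_unvoiced": 0.8, "transient_onset": 0.1, "periodic_voiced": 0.1},
    {"noise_unvoiced": 0.4, "transient_onset": 0.5, "periodic_voiced": 0.1},  # weak win: ignored
    {"noise_unvoiced": 0.1, "transient_onset": 0.2, "periodic_voiced": 0.7},
]
print(classify(frames))  # → ['noise_unvoiced', 'noise_unvoiced', 'periodic_voiced']
```

In the paper the scores come from DyWT coefficients rather than being given directly; only the finite-state smoothing idea is shown here.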


Affective Computing and Intelligent Interaction | 2009

Emotion detection in dialog systems: Applications, strategies and challenges

Felix Burkhardt; Markus Van Ballegooy; Klaus-Peter Engelbrecht; Tim Polzehl; Joachim Stegmann

Emotion plays an important role in human communication, and therefore human-machine dialog systems can also benefit from affective processing. We present in this paper an overview of our work from the past few years and discuss general considerations, potential applications, and experiments that we did with the emotional classification of human-machine dialogs. Anger in voice portals as well as problematic dialog situations can be detected to some degree, but the noise in real-life data and the issue of unambiguous emotion definition are still challenging. Also, a dialog system reacting emotionally might raise expectations with respect to its intellectual abilities that it cannot fulfill.


Archive | 1999

Method for signal controlled switching between different audio coding schemes

Raif Kirchherr; Joachim Stegmann


Archive | 2011

System and method for hand gesture recognition for remote control of an internet protocol TV

Helman Stern; Sigal Berman; Daria Frolova; Frank Lonczewski; Joachim Stegmann; Christian Weiss


Language Resources and Evaluation | 2010

A Database of Age and Gender Annotated Telephone Speech

Felix Burkhardt; Martin Eckert; Wiebke Johannsen; Joachim Stegmann


Conference of the International Speech Communication Association (Interspeech) | 2006

Detecting anger in automated voice portal dialogs

Felix Burkhardt; Jitendra Ajmera; Roman Englert; Joachim Stegmann; Winslow Burleson

Collaboration


Dive into Joachim Stegmann's collaboration.

Top co-author: Florian Metze (Carnegie Mellon University)