Felix Burkhardt
Deutsche Telekom
Publications
Featured research published by Felix Burkhardt.
international conference on acoustics, speech, and signal processing | 2007
Florian Metze; Jitendra Ajmera; Roman Englert; Udo Bub; Felix Burkhardt; Joachim Stegmann; Christian A. Müller; Richard Huber; Bernt Andrassy; Josef Bauer; Bernhard Littel
This paper presents a comparative study of four different approaches to automatic age and gender classification using seven classes on a telephony speech task and also compares the results with human performance on the same data. The automatic approaches compared are based on (1) a parallel phone recognizer, derived from an automatic language identification system; (2) a system using dynamic Bayesian networks to combine several prosodic features; (3) a system based solely on linear prediction analysis; and (4) Gaussian mixture models based on MFCCs for separate recognition of age and gender. On average, the parallel phone recognizer performs as well as human listeners do, while losing performance on short utterances. The system based on prosodic features, however, shows very little dependence on the length of the utterance.
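The fourth approach from the abstract, frame-level MFCCs modeled by per-class Gaussian mixture models, can be illustrated with a minimal sketch. This is not the authors' implementation; the file lists, sampling rate and model sizes are assumptions chosen for a telephony-style setup.

```python
# Minimal sketch of approach (4): per-class GMMs over MFCC frames.
# Not the paper's code; files_per_class, sample rate and model size are assumed.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, sr=8000, n_mfcc=13):
    """Return an (n_frames, n_mfcc) MFCC matrix for one telephony recording."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_class_gmms(files_per_class, n_components=32):
    """Fit one diagonal-covariance GMM per age/gender class on pooled frames."""
    gmms = {}
    for label, files in files_per_class.items():
        frames = np.vstack([mfcc_frames(f) for f in files])
        gmms[label] = GaussianMixture(n_components=n_components,
                                      covariance_type="diag").fit(frames)
    return gmms

def classify(path, gmms):
    """Pick the class whose GMM assigns the highest average log-likelihood."""
    frames = mfcc_frames(path)
    return max(gmms, key=lambda label: gmms[label].score(frames))
```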
Journal of Web Semantics | 2007
Daniel Oberle; Anupriya Ankolekar; Pascal Hitzler; Philipp Cimiano; Michael Sintek; Malte Kiesel; Babak Mougouie; Stephan Baumann; Shankar Vembu; Massimo Romanelli; Paul Buitelaar; Ralf Engel; Daniel Sonntag; Norbert Reithinger; Berenike Loos; Hans-Peter Zorn; Vanessa Micelli; Robert Porzel; Christian Schmidt; Moritz Weiten; Felix Burkhardt; Jianshen Zhou
Increased availability of mobile computing devices, such as personal digital assistants (PDAs), creates the potential for constant and intelligent access to up-to-date, integrated and detailed information from the Web, regardless of one's actual geographical position. Intelligent question-answering requires the representation of knowledge from various domains, such as the navigational and discourse context of the user, potential user questions, the information provided by Web services and so on, for example in the form of ontologies. Within the context of the SmartWeb project, we have developed a number of domain-specific ontologies that are relevant for mobile and intelligent user interfaces to open-domain question-answering and information services on the Web. To integrate the various domain-specific ontologies, we have developed a foundational ontology, the SmartSUMO ontology, on the basis of the DOLCE and SUMO ontologies. This allows us to combine all the developed ontologies into a single SmartWeb Integrated Ontology (SWIntO) having a common modeling basis with conceptual clarity and the provision of ontology design patterns for modeling consistency. In this paper, we present SWIntO, describe the design choices we made in its construction, illustrate the use of the ontology through a number of applications, and discuss some of the lessons learned from our experiences.
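As a rough illustration of the kind of integration described above, the following sketch anchors a domain-ontology class under a foundational class using rdflib. The namespaces and class names are invented placeholders, not terms taken from SWIntO or SmartSUMO.

```python
# Hypothetical sketch of aligning a domain-ontology class with a foundational
# ontology class; namespaces and class names are placeholders, not SWIntO terms.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

SUMO = Namespace("http://example.org/smartsumo#")   # placeholder foundational ontology
NAV = Namespace("http://example.org/navigation#")   # placeholder domain ontology

g = Graph()
g.bind("sumo", SUMO)
g.bind("nav", NAV)

# Declare a domain class and anchor it under a foundational concept.
g.add((NAV.PointOfInterest, RDF.type, OWL.Class))
g.add((NAV.PointOfInterest, RDFS.subClassOf, SUMO.GeographicRegion))
g.add((NAV.PointOfInterest, RDFS.label, Literal("Point of interest")))

print(g.serialize(format="turtle"))
```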
international conference on acoustics, speech, and signal processing | 2008
Tobias Bocklet; Andreas K. Maier; Josef Bauer; Felix Burkhardt; Elmar Nöth
This paper compares two approaches to automatic age and gender classification with 7 classes. The first approach uses Gaussian mixture models (GMMs) with universal background models (UBMs), a technique well known from speaker identification/verification. The training is performed with the EM algorithm or MAP adaptation, respectively. In the second approach, a GMM is trained for each speaker of the training and test sets. The means of each model are extracted and concatenated, which results in a GMM supervector for each speaker. These supervectors are then used in a support vector machine (SVM). Three different kernels were employed for the SVM approach: a polynomial kernel (of different degrees), an RBF kernel and a linear GMM distance kernel based on the KL divergence. With the SVM approach we improved the recognition rate to 74% (p < 0.001) and are in the same range as humans.
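The supervector construction in the second approach is easy to sketch: fit a small GMM per speaker, flatten its means into a single vector, and classify those vectors with an SVM. This is a hedged illustration with assumed feature matrices, not the paper's implementation.

```python
# Illustrative sketch, not the paper's code: GMM-supervector + SVM classification.
# frames_per_speaker is an assumed list of (n_frames, n_dims) feature matrices.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

def supervector(frames, n_components=16):
    """Fit a per-speaker GMM and concatenate its component means into one vector."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag").fit(frames)
    return gmm.means_.ravel()   # length: n_components * n_dims

def train_supervector_svm(frames_per_speaker, labels, kernel="linear"):
    X = np.vstack([supervector(f) for f in frames_per_speaker])
    # "poly" and "rbf" kernels are available directly; the KL-based GMM distance
    # kernel from the paper would have to be passed as a precomputed kernel matrix.
    return SVC(kernel=kernel).fit(X, labels)
```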
international conference on acoustics, speech, and signal processing | 2009
Felix Burkhardt; Tim Polzehl; Joachim Stegmann; Florian Metze; Richard Huber
Acoustic anger detection in voice portals can help to enhance human computer interaction. A comprehensive voice portal data collection has been carried out and gives new insight into the nature of real-life data. Manual labeling revealed a high percentage of non-classifiable data. Experiments with a statistical classifier indicate that, in contrast to pitch- and energy-related features, duration measures do not play an important role for this data, while cepstral information does. Also, in a direct comparison between Gaussian mixture models and support vector machines, the latter gave better results.
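A rough sketch of the comparison described above: simple pitch, energy and cepstral functionals per utterance, classified once with an SVM and once with per-class GMMs. The feature choices and helper names are assumptions, not the system's actual feature set.

```python
# Hedged sketch of the SVM-vs-GMM comparison on pitch, energy and cepstral
# functionals; feature choices and parameters are assumptions for illustration.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.mixture import GaussianMixture

def utterance_features(path, sr=8000):
    """Mean and standard deviation of pitch, energy and MFCC contours."""
    y, sr = librosa.load(path, sr=sr)
    contours = [librosa.yin(y, fmin=60, fmax=400, sr=sr),   # pitch
                librosa.feature.rms(y=y)[0]]                # energy
    contours += list(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13))  # cepstral
    return np.array([stat for c in contours for stat in (np.mean(c), np.std(c))])

def train_svm(paths, labels):
    X = np.vstack([utterance_features(p) for p in paths])
    return SVC(kernel="rbf").fit(X, labels)

def train_gmms(paths, labels, n_components=4):
    X = np.vstack([utterance_features(p) for p in paths])
    y = np.array(labels)
    return {c: GaussianMixture(n_components).fit(X[y == c]) for c in set(labels)}
```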
affective computing and intelligent interaction | 2011
Marc Schröder; Paolo Baggia; Felix Burkhardt; Catherine Pelachaud; Christian Peter; Enrico Zovato
The present paper describes the specification of Emotion Markup Language (EmotionML) 1.0, which is undergoing standardisation at the World Wide Web Consortium (W3C). The language aims to strike a balance between practical applicability and scientific well-foundedness. We briefly review the history of the process leading to the standardisation of EmotionML. We describe the syntax of EmotionML as well as the vocabularies that are made available to describe emotions in terms of categories, dimensions, appraisals and/or action tendencies. The paper concludes with a number of relevant aspects of emotion that are not covered by the current specification.
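To make the syntax concrete, here is a small hypothetical EmotionML 1.0 document assembled with Python's standard library, annotating one emotion with a category and a dimension. The values are invented, and the vocabulary URIs are quoted from memory and should be checked against the W3C specification.

```python
# Hypothetical EmotionML 1.0 snippet built with the standard library; values are
# invented and the vocabulary URIs should be verified against the W3C specification.
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/2009/10/emotionml"
ET.register_namespace("", NS)

root = ET.Element(f"{{{NS}}}emotionml")
emotion = ET.SubElement(root, f"{{{NS}}}emotion", {
    "category-set": "http://www.w3.org/TR/emotion-voc/xml#big6",
    "dimension-set": "http://www.w3.org/TR/emotion-voc/xml#pad-dimensions",
})
ET.SubElement(emotion, f"{{{NS}}}category", {"name": "anger", "value": "0.8"})
ET.SubElement(emotion, f"{{{NS}}}dimension", {"name": "arousal", "value": "0.9"})

print(ET.tostring(root, encoding="unicode"))
```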
Archive | 2011
Marc Schröder; Hannes Pirker; Myriam Lamolle; Felix Burkhardt; Christian Peter; Enrico Zovato
In many cases when technological systems are to operate on emotions and related states, they need to represent these states. Existing representations are limited to application-specific solutions that fall short of representing the full range of concepts that have been identified as relevant in the scientific literature. The present chapter presents a broad conceptual view on the possibility to create a generic representation of emotions that can be used in many contexts and for many purposes. Potential use cases and resulting requirements are identified and compared to the scientific literature on emotions. Options for the practical realisation of an Emotion Markup Language are discussed in the light of the requirement to extend the language to different emotion concepts and vocabularies, and ontologies are investigated as a means to provide limited “mapping” mechanisms between different emotion representations.
international conference on acoustics, speech, and signal processing | 2010
Björn W. Schuller; Felix Burkhardt
Data sparseness is an ever-dominating problem in automatic emotion recognition. Using artificially generated speech for training or adapting models could potentially ease this: though less natural than human speech, one could synthesize the exact spoken content in different emotional nuances - of many speakers and even in different languages. To investigate the potential, the phonemisation components Txt2Pho and openMary are used with Emofilt and Mbrola for emotional speech synthesis. Analysis is realized with our Munich open Emotion and Affect Recognition toolkit. As test sets, we restrict ourselves to the acted Berlin and eNTERFACE databases for the moment. As a result, synthesized speech can indeed be used for the recognition of human emotional speech.
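The evaluation setup, training on synthesized speech and testing on acted human speech, can be sketched as a simple cross-corpus experiment. Feature extraction is left abstract here; the matrices below are placeholders, not the actual toolkit output used in the paper.

```python
# Sketch of the synthesized-to-human cross-corpus setup; X_synth/X_human are
# assumed acoustic feature matrices, not the toolkit output from the paper.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import recall_score

def synth_to_human_uar(X_synth, y_synth, X_human, y_human):
    """Train on synthesized emotional speech, evaluate on human emotional speech."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    clf.fit(X_synth, y_synth)
    # unweighted average recall (macro-averaged recall) is the common measure here
    return recall_score(y_human, clf.predict(X_human), average="macro")
```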
Universal Access in The Information Society | 2009
Florian Metze; Roman Englert; Udo Bub; Felix Burkhardt; Joachim Stegmann
This paper presents an advanced call center, which adapts presentation and interaction strategy to properties of the caller such as age, gender, and emotional state. User studies on interactive voice response (IVR) systems have shown that these properties can be used effectively to “tailor” services to users or user groups who do not maintain personal preferences, e.g., because they do not use the service on a regular basis. The adopted approach to achieve individualization of services, without being able to personalize them, is based on the analysis of a caller’s voice. This paper shows how this approach benefits service providers by being able to target entertainment and recommendation options. It also shows how this analysis at the same time benefits the customer, as it can increase accessibility of IVR systems to user segments which have particular expectations or which do not cope well with a “one size fits all” system. The paper summarizes the authors’ current work on component technologies, such as emotion detection, age and gender recognition on telephony speech, and presents results of usability and acceptability tests as well as an architecture to integrate these technologies in future multi-modal contact centers. It is envisioned that these will eventually serve customers with an avatar representation of an agent and tailored interaction strategies, matching powerful output capabilities with advanced analysis of the user’s input.
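As a toy illustration of the adaptation idea, the following sketch maps hypothetical classifier outputs for age group, gender and emotional state to a prompt style and an escalation decision. The categories and thresholds are invented, not taken from the described system.

```python
# Toy sketch of classifier-driven dialog adaptation; categories and thresholds
# are invented for illustration and not taken from the described call center.
from dataclasses import dataclass

@dataclass
class CallerEstimate:
    age_group: str      # e.g. "child", "youth", "adult", "senior"
    gender: str         # e.g. "female", "male" (could steer avatar/voice choice)
    anger_score: float  # 0.0 .. 1.0 from an acoustic anger detector

def choose_strategy(est: CallerEstimate) -> dict:
    strategy = {"prompt_style": "default", "escalate_to_agent": False}
    if est.anger_score > 0.7:
        strategy["escalate_to_agent"] = True        # hand over before the caller hangs up
    if est.age_group == "senior":
        strategy["prompt_style"] = "slow_verbose"   # longer, more explicit prompts
    elif est.age_group in ("child", "youth"):
        strategy["prompt_style"] = "casual_short"
    return strategy
```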
affective computing and intelligent interaction | 2009
Felix Burkhardt; Markus Van Ballegooy; Klaus-Peter Engelbrecht; Tim Polzehl; Joachim Stegmann
Emotion plays an important role in human communication, and therefore human-machine dialog systems can also benefit from affective processing. In this paper we present an overview of our work from the past few years and discuss general considerations, potential applications and experiments that we did with the emotional classification of human-machine dialogs. Anger in voice portals as well as problematic dialog situations can be detected to some degree, but the noise in real-life data and the issue of unambiguous emotion definition are still challenging. Also, a dialog system reacting emotionally might raise expectations with respect to its intellectual abilities that it cannot fulfill.
Computer Speech & Language | 2015
Björn W. Schuller; Stefan Steidl; Anton Batliner; E. Nöth; Alessandro Vinciarelli; Felix Burkhardt; R.J.J.H. van Son; Felix Weninger; Florian Eyben; Tobias Bocklet; Gelareh Mohammadi; Benjamin Weiss
The INTERSPEECH 2012 Speaker Trait Challenge aimed at a unified test-bed for perceived speaker traits – the first challenge of this kind: personality in the five OCEAN personality dimensions, likability of speakers, and intelligibility of pathologic speakers. In the present article, we give a brief overview of the state-of-the-art in these three fields of research and describe the three sub-challenges in terms of the challenge conditions, the baseline results provided by the organisers, and a new openSMILE feature set, which has been used for computing the baselines and which has been provided to the participants. Furthermore, we summarise the approaches and the results presented by the participants to show the various techniques that are currently applied to solve these classification tasks.
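The openSMILE functionals that underpin such challenge baselines can be extracted with the opensmile Python package. The sketch below uses the ComParE_2016 set as a stand-in, since the 2012 Speaker Trait configuration is not bundled under that name in the package; the file name is hypothetical.

```python
# Hedged sketch: openSMILE functionals via the opensmile Python package.
# ComParE_2016 is used as a stand-in for the 2012 challenge feature set.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)
# one row of functionals per processed file, returned as a pandas DataFrame
features = smile.process_file("speaker_001.wav")   # hypothetical file name
print(features.shape)
```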