Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Anton Batliner is active.

Publication


Featured research published by Anton Batliner.


Book, 1st edition | 2013

Computational Paralinguistics: Emotion, Affect and Personality in Speech and Language Processing

Björn W. Schuller; Anton Batliner

This book presents the methods, tools and techniques that are currently used to automatically recognise the affect, emotion, personality and everything else beyond linguistics ('paralinguistics') expressed by or embedded in human speech and language. It is the first book to provide such a systematic survey of paralinguistics in speech and language processing. The technology described has evolved mainly from automatic speech and speaker recognition and processing, but also takes into account recent developments within speech signal processing, machine intelligence and data mining. Moreover, the book offers a hands-on approach by integrating actual data sets, software, and open-source utilities, which makes it invaluable as a teaching tool and similarly useful for professionals already in the field.

Key features:

- Provides an integrated presentation of basic research (in phonetics/linguistics and the humanities) with state-of-the-art engineering approaches for speech signal processing and machine intelligence.
- Explains the history and state of the art of all of the sub-fields which contribute to the topic of computational paralinguistics.
- Covers the signal processing and machine learning aspects of the actual computational modelling of emotion and personality, and explains the detection process from corpus collection to feature extraction and from model testing to system integration.
- Details aspects of real-world system integration, including distribution, weakly supervised learning and confidence measures.
- Outlines machine learning approaches, including static, dynamic and context-sensitive algorithms for classification and regression.
- Includes a tutorial on freely available toolkits, such as the open-source openEAR toolkit for emotion and affect recognition co-developed by one of the authors, and a listing of standard databases and feature sets used in the field, allowing for immediate experimentation and enabling the reader to build an emotion detection model on an existing corpus.


International Conference on Spoken Language Processing | 1996

Consistency in transcription and labelling of German intonation with GToBI

Martine Grice; Matthias Reyelt; Ralf Benzmüller; Jörg Mayer; Anton Batliner

A diverse set of speech data was labelled in three sites by 13 transcribers with differing levels of expertise, using GToBI, a consensus transcription system for German intonation. Overall inter-transcriber-consistency suggests that, with training, labellers can acquire sufficient skill with GToBI for large-scale database labelling.
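Inter-transcriber consistency of this kind is typically quantified with chance-corrected agreement measures. As an illustration (the labels below are invented GToBI-style tones, and the paper's exact metric is not stated here), Cohen's kappa between two transcribers can be computed as:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: inter-annotator agreement corrected for chance."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement expected from each transcriber's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[lab] * freq_b.get(lab, 0) for lab in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Invented tone labels from two transcribers for the same six syllables.
t1 = ["H*", "L*", "H*", "L-%", "H*", "L*"]
t2 = ["H*", "L*", "H*", "H-^H%", "H*", "H*"]
print(round(cohens_kappa(t1, t2), 3))  # → 0.455
```

A kappa of 1 means perfect agreement; values well above 0 indicate agreement beyond what the transcribers' label frequencies alone would produce.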


Archive | 2011

The Automatic Recognition of Emotions in Speech

Anton Batliner; Björn W. Schuller; Dino Seppi; Stefan Steidl; Laurence Devillers; Laurence Vidrascu; Thurid Vogt; Vered Aharonson; Noam Amir

In this chapter, we focus on the automatic recognition of emotional states using acoustic and linguistic parameters as features and classifiers as tools to predict the ‘correct’ emotional states. We first sketch history and state of the art in this field; then we describe the process of ‘corpus engineering’, i.e. the design and the recording of databases, the annotation of emotional states, and further processing such as manual or automatic segmentation. Next, we present an overview of acoustic and linguistic features that are extracted automatically or manually. In the section on classifiers, we deal with topics such as the curse of dimensionality and the sparse data problem, classifiers, and evaluation. At the end of each section, we point out important aspects that should be taken into account for the planning or the assessment of studies. The subject area of this chapter is not emotions in some narrow sense but in a wider sense encompassing emotion-related states such as moods, attitudes, or interpersonal stances as well. We do not aim at an in-depth treatise of some specific aspects or algorithms but at an overview of approaches and strategies that have been used or should be used.
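The feature-plus-classifier pipeline described above can be reduced to a toy sketch. The feature set and data here are invented for illustration; real systems extract hundreds to thousands of acoustic descriptors and use far stronger classifiers. A minimal nearest-centroid classifier over per-utterance acoustic statistics might look like:

```python
import numpy as np

# Hypothetical per-utterance acoustic statistics: [mean F0 in Hz, mean energy in dB,
# syllables per second]. Values and labels are invented for this sketch.
train_x = np.array([
    [220.0, 72.0, 5.8],   # labelled 'angry'
    [235.0, 75.0, 6.1],   # labelled 'angry'
    [140.0, 58.0, 3.2],   # labelled 'neutral'
    [150.0, 60.0, 3.5],   # labelled 'neutral'
])
train_y = np.array(["angry", "angry", "neutral", "neutral"])

def nearest_centroid_predict(x, train_x, train_y):
    """Assign x to the class whose mean feature vector (centroid) is closest."""
    classes = np.unique(train_y)
    centroids = np.array([train_x[train_y == c].mean(axis=0) for c in classes])
    return classes[np.argmin(np.linalg.norm(centroids - x, axis=1))]

print(nearest_centroid_predict(np.array([230.0, 74.0, 6.0]), train_x, train_y))  # → angry
```

With only a handful of training utterances and thousands of features, such a model runs straight into the curse of dimensionality and sparse-data problems the chapter discusses, which is why feature selection and careful evaluation matter.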


Archive | 1994

MÜSLI: A Classification Scheme For Laryngealizations

Anton Batliner; Susanne Burger; B. Johne; Andreas Kießling

We developed a classification scheme for laryngealizations that can be used to discriminate the many different shapes of laryngealizations with different feature values. Potential applications are phonetic transcription and automatic detection. The scheme was developed and tested with a database from 4 speakers that contains more than 1200 laryngealizations.


Archive | 1995

Filled pauses in spontaneous speech

Anton Batliner; Andreas Kießling; Susanne Burger; Elmar Nöth

Filled pauses, e.g. "uh", "eh", signal disfluencies, i.e. hesitations or repairs. They do not normally occur in read speech and have therefore rarely been investigated up to now; they must, however, be accounted for in the automatic processing of spontaneous speech. We present descriptive statistics and the results of an automatic classification of filled pauses in the database of the VERBMOBIL project, and discuss the relevance of different prosodic features for the marking of different types.


International Conference on Spoken Language Processing | 1996

Syntactic-prosodic labeling of large spontaneous speech data-bases

Anton Batliner; R. Kompe; A. Kiessling; Heinrich Niemann; Elmar Nöth

In automatic speech understanding, the division of continuously running speech into syntactic chunks is a major problem. Syntactic boundaries are often marked by prosodic means. Large databases are necessary for training statistical models of prosodic boundaries. For the German VERBMOBIL project (automatic speech-to-speech translation), we developed a syntactic-prosodic labeling scheme in which two main types of boundaries (major syntactic boundaries and syntactically ambiguous boundaries) and some other special boundaries are labeled for a large VERBMOBIL spontaneous speech corpus. We compare the results of classifiers (multilayer perceptrons and language models) trained on these syntactic-prosodic boundary labels with classifiers trained on perceptual-prosodic and purely syntactic labels. The main advantage of the rough syntactic-prosodic labels presented in this paper is that large amounts of data could be labeled within a short time. As a consequence, the classifiers trained with these labels turned out to be superior (recognition rates of up to 96%).
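The language-model side of such boundary classification can be sketched in miniature. The assumption below (not from the paper, where full n-gram models over word/boundary sequences are used) is that unigram word evidence alone predicts whether a major boundary follows; the training pairs are invented:

```python
# Toy training data: (word, followed-by-major-boundary) pairs, invented for illustration.
train = [
    ("thanks", True), ("thanks", True), ("okay", True), ("okay", False),
    ("the", False), ("the", False), ("a", False), ("meeting", True),
]

counts = {}  # word -> [no-boundary count, boundary count]
for word, is_boundary in train:
    c = counts.setdefault(word, [0, 0])
    c[int(is_boundary)] += 1

def predict_boundary(word):
    """Predict a major syntactic-prosodic boundary after `word`."""
    no_b, yes_b = counts.get(word, (0, 0))
    # Add-one smoothing: an unseen word gets P = 0.5, i.e. no predicted boundary.
    return (yes_b + 1) / (no_b + yes_b + 2) > 0.5

print(predict_boundary("thanks"), predict_boundary("the"))  # → True False
```

A real system combines such word-sequence evidence with the prosodic features (pauses, final lengthening, F0 movement) that the multilayer perceptrons model.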


Archive | 2011

Why sentence modality in spontaneous speech is more difficult to classify and why this fact is not too bad for prosody

Anton Batliner; C. Weiand; Andreas Kießling; Elmar Nöth

We show in this paper that the labeling of sentence modality in German, especially of questions vs. non-questions, is more difficult for spontaneous than for read speech, and easier for non-elliptic than for elliptic utterances. However, the prosodic marking of sentence modality is more important in elliptic utterances, which occur more often in spontaneous speech.


Journal of Pattern Recognition Research | 2009

QMOS - A Robust Visualization Method for Speaker Dependencies with Different Microphones

Andreas K. Maier; Maria Schuster; Ulrich Eysholdt; Tino Haderlein; Tobias Cincarek; Stefan Steidl; Anton Batliner; Stefan Wenhardt; Elmar Nöth

There are several methods to create visualizations of speech data. All of them, however, lack the ability to remove microphone-dependent distortions. In this work we examined the use of Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and the COmprehensive Space Map of Objective Signal (COSMOS) method. To solve the problem of the lacking microphone independence of PCA, LDA, and COSMOS, we present two methods to reduce the influence of the recording conditions on the visualization. The first one is a rigid registration of maps created from identical speakers recorded under different conditions, i.e. different microphones and distances. The second method is an extension of the COSMOS method, which performs a non-rigid registration during the mapping procedure. As a measure for the quality of the visualization, we computed the mapping error which occurs during the dimension reduction, and the grouping error, defined as the average distance between the representations of the same speaker recorded by different microphones. The best linear method in leave-one-speaker-out evaluation is PCA plus rigid registration, with a mapping error of 47% and a grouping error of 18%. The proposed method, however, surpasses this even further, with a mapping error of 24% and a grouping error which is close to zero.

Keywords: speech intelligibility, speech and voice disorders, speech evaluation, dimensionality reduction, Sammon mapping, QMOS, COSMOS, COmprehensive Space Map of Objective Signal
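The PCA-plus-rigid-registration idea can be sketched as follows. This is a minimal illustration with synthetic data, not the authors' implementation: the "microphone distortion" is simulated as an orthogonal transform of hypothetical speaker features, so an orthogonal Procrustes registration recovers the original map almost exactly:

```python
import numpy as np

def pca_2d(x):
    """Project feature vectors onto their top two principal components."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

def rigid_register(src, dst):
    """Orthogonal Procrustes: find the rotation/reflection + translation that
    best aligns `src` onto `dst` (same speakers, two recording conditions)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    u, _, vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
    return (src - mu_s) @ (u @ vt) + mu_d

rng = np.random.default_rng(0)
feats_a = rng.normal(size=(12, 6))            # hypothetical speaker features, microphone A
q, _ = np.linalg.qr(rng.normal(size=(6, 6)))  # microphone B modeled as an orthogonal distortion
feats_b = feats_a @ q.T

map_a, map_b = pca_2d(feats_a), pca_2d(feats_b)  # 2-D maps differ by a rotation/reflection
aligned = rigid_register(map_b, map_a)
print(np.allclose(aligned, map_a, atol=1e-6))    # → True: grouping error ≈ 0
```

Real microphone effects are of course not purely orthogonal, which is why the residual grouping error in the paper is nonzero for the linear methods and why the non-rigid COSMOS extension helps.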


Natural Language Processing and Speech Technology, Results of the 3rd KONVENS Conference | 1996

Prosodische Etikettierung des Deutschen mit ToBI (Prosodic labelling of German with ToBI)

Matthias Reyelt; Martine Grice; Ralf Benzmüller; Jörg Mayer; Anton Batliner

Part of proposal section 14.6, Prosodic Labelling. This work was funded within the Verbmobil joint project by the German Federal Ministry of Education, Science, Research and Technology (BMBF) under grant number 01 IV 101 N0. Responsibility for the content of this work lies with the authors.


Speech Communication | 1994

Prosody takes over: towards a prosodically guided dialog system

Ralf Kompe; Elmar Nöth; Andreas Kießling; Thomas Kuhn; Marion Mast; Heinrich Niemann; K. Ott; Anton Batliner

The domain of the speech recognition and dialog system EVAR is train timetable inquiry. We observed that in real human-human dialogs, when the officer transmits the information, the customer very often interrupts. Many of these interruptions are just repetitions of the time of day given by the officer. The functional role of these interruptions is often determined by prosodic cues only. An important result of experiments in which naive persons used the EVAR system is that it is hard to follow a train connection given via speech synthesis. In this case it is even more important than in human-human dialogs that the user has the opportunity to interact during the answer phase. We therefore extended the dialog module to allow the user to repeat the time of day, and we added a prosody module that guides the continuation of the dialog by analyzing the intonation contour of this utterance.

Collaboration


Dive into Anton Batliner's collaborations.

Top Co-Authors

Elmar Nöth
University of Erlangen-Nuremberg

Andreas Kießling
University of Erlangen-Nuremberg

Ralf Kompe
University of Erlangen-Nuremberg

Björn W. Schuller
Centre national de la recherche scientifique

Stefan Steidl
University of Erlangen-Nuremberg

Volker Warnke
University of Erlangen-Nuremberg