Elmar Nöth
University of Erlangen-Nuremberg
                                 Network
                            
                            Latest external collaboration on country level. Dive into details by clicking on the dots.
                                 Publication
                            
                            Featured researches published by Elmar Nöth.
Speech Communication | 2009
Andreas K. Maier; Tino Haderlein; Ulrich Eysholdt; Frank Rosanowski; Anton Batliner; Maria Schuster; Elmar Nöth
We present a novel system for the automatic evaluation of speech and voice disorders. The system can be accessed via the internet platform-independently. The patient reads a text or names pictures. His or her speech is then analyzed by automatic speech recognition and prosodic analysis. For patients who had their larynx removed due to cancer and for children with cleft lip and palate we show that we can achieve significant correlations between the automatic analysis and the judgment of human experts in a leave-one-out experiment (p<.001). A correlation of .90 for the evaluation of the laryngectomees and .87 for the evaluation of the childrens data was obtained. This is comparable to human inter-rater correlations.
SmartKom | 2006
Viktor Zeißler; Johann Adelhardt; Anton Batliner; Carmen Frank; Elmar Nöth; Rui Ping Shi; Heinrich Niemann
In multimodal dialogue systems, several input and output modalities are used for user interaction. The most important modality for human computer interaction is speech. Similar to human human interaction, it is necessary in the human computer interaction that the machine recognizes the spoken word chain in the user’s utterance. For better communication with the user it is advantageous to recognize his internal emotional state because it is then possible to adapt the dialogue strategy to the situation in order to reduce, for example, anger or uncertainty of the user.
Bioinformatics | 1999
Uwe Ohler; Stefan Harbeck; Heinrich Niemann; Elmar Nöth; Martin G. Reese
MOTIVATION We describe a new content-based approach for the detection of promoter regions of eukaryotic protein encoding genes. Our system is based on three interpolated Markov chains (IMCs) of different order which are trained on coding, non-coding and promoter sequences. It was recently shown that the interpolation of Markov chains leads to stable parameters and improves on the results in microbial gene finding (Salzberg et al., Nucleic Acids Res., 26, 544-548, 1998). Here, we present new methods for an automated estimation of optimal interpolation parameters and show how the IMCs can be applied to detect promoters in contiguous DNA sequences. Our interpolation approach can also be employed to obtain a reliable scoring function for human coding DNA regions, and the trained models can easily be incorporated in the general framework for gene recognition systems. RESULTS A 5-fold cross-validation evaluation of our IMC approach on a representative sequence set yielded a mean correlation coefficient of 0.84 (promoter versus coding sequences) and 0.53 (promoter versus non-coding sequences). Applied to the task of eukaryotic promoter region identification in genomic DNA sequences, our classifier identifies 50% of the promoter regions in the sequences used in the most recent review and comparison by Fickett and Hatzigeorgiou ( Genome Res., 7, 861-878, 1997), while having a false-positive rate of 1/849 bp.
international conference on acoustics, speech, and signal processing | 2008
Tobias Bocklet; Andreas K. Maier; Josef Bauer; Felix Burkhardt; Elmar Nöth
This paper compares two approaches of automatic age and gender classification with 7 classes. The first approach are Gaussian mixture models (GMMs) with universal background models (UBMs), which is well known for the task of speaker identification/verification. The training is performed by the EM algorithm or MAP adaptation respectively. For the second approach for each speaker of the test and training set a GMM model is trained. The means of each model are extracted and concatenated, which results in a GMM supervector for each speaker. These supervectors are then used in a support vector machine (SVM). Three different kernels were employed for the SVM approach: a polynomial kernel (with different polynomials), an RBF kernel and a linear GMM distance kernel, based on the KL divergence. With the SVM approach we improved the recognition rate to 74% (p < 0.001) and are in the same range as humans.
User Modeling and User-adapted Interaction | 2008
Anton Batliner; Stefan Steidl; Christian Hacker; Elmar Nöth
The ‘traditional’ first two dimensions in emotion research are VALENCE and AROUSAL. Normally, they are obtained by using elicited, acted data. In this paper, we use realistic, spontaneous speech data from our ‘AIBO’ corpus (human-robot communication, children interacting with Sony’s AIBO robot). The recordings were done in a Wizard-of-Oz scenario: the children believed that AIBO obeys their commands; in fact, AIBO followed a fixed script and often disobeyed. Five labellers annotated each word as belonging to one of eleven emotion-related states; seven of these states which occurred frequently enough are dealt with in this paper. The confusion matrices of these labels were used in a Non-Metrical Multi-dimensional Scaling to display two dimensions; the first we interpret as VALENCE, the second, however, not as AROUSAL but as INTERACTION, i.e., addressing oneself (angry, joyful) or the communication partner (motherese, reprimanding). We show that it depends on the specifity of the scenario and on the subjects’ conceptualizations whether this new dimension can be observed, and discuss impacts on the practice of labelling and processing emotional data. Two-dimensional solutions based on acoustic and linguistic features that were used for automatic classification of these emotional states are interpreted along the same lines.
international conference on spoken language processing | 1996
M. Mast; R. Kompe; S. Harbeck; A. Kiessling; Heinrich Niemann; Elmar Nöth; E.G. Schukat-Talamazzini; V. Warnke
This paper presents automatic methods for the segmentation and classification of dialog acts (DA). In VERBMOBIL it is often sufficient to recognize the sequence of DAs occurring during a dialog between the two partners. Since a turn can consist of one or more successive DAs we conduct the classification of DAs in a two step procedure. First each turn has to be segmented into units which correspond to a DA and second the DA categories have to be identified. For the segmentation we use polygrams and multi-layer perceptrons, using prosodic features. The classification of DAs is done with semantic classification trees and polygrams.
Speech Communication | 1998
Anton Batliner; Ralf Kompe; Andreas Kießling; Marion Mast; Heinrich Niemann; Elmar Nöth
In automatic speech understanding, division of continuous running speech into syntactic chunks is a great problem. Syntactic boundaries are often marked by prosodic means. For the training of statistical models for prosodic boundaries large databases are necessary. For the German Verbmobil (VM) project (automatic speech-to-speech translation), we developed a syntactic‐prosodic labelling scheme where diAerent types of syntactic boundaries are labelled for a large spontaneous speech corpus. This labelling scheme is presented and compared with other labelling schemes for perceptual‐prosodic, syntactic, and dialogue act boundaries. Interlabeller consistencies and estimation of eAort needed are discussed. We compare the results of classifiers (multi-layer perceptrons (MLPs) and n-gram language models) trained on these syntactic‐prosodic boundary labels with classifiers trained on perceptual‐prosodic and pure syntactic labels. The main advantage of the rough syntactic‐prosodic labels presented in this paper is that large amounts of data can be labelled with relatively little eAort. The classifiers trained with these labels turned out to be superior with respect to purely prosodic or syntactic labelling schemes, yielding recognition rates of up to 96% for the two-class-problem ‘boundary versus no boundary’. The use of boundary information leads to a marked improvement in the syntactic processing of the VM system. ” 1998 Elsevier Science B.V. All rights reserved.
European Archives of Oto-rhino-laryngology | 2006
Maria Schuster; Tino Haderlein; Elmar Nöth; Jörg Lohscheller; Ulrich Eysholdt; Frank Rosanowski
Substitute speech after laryngectomy is characterized by restricted aero-acoustic properties in comparison with laryngeal speech and has therefore lower intelligibility. Until now, an objective means to determine and quantify the intelligibility has not existed, although the intelligibility can serve as a global outcome parameter of voice restoration after laryngectomy. An automatic speech recognition system was applied on recordings of a standard text read by 18 German male laryngectomees with tracheoesophageal substitute speech. The system was trained with normal laryngeal speakers and not adapted to severely disturbed voices. Substitute speech was compared to laryngeal speech of a control group. Subjective evaluation of intelligibility was performed by a panel of five experts and compared to automatic speech evaluation. Substitute speech showed lower syllables/s and lower word accuracy than laryngeal speech. Automatic speech recognition for substitute speech yielded word accuracy between 10.0 and 50% (28.7±12.1%) with sufficient discrimination. It complied with experts’ subjective evaluations of intelligibility. The multi-rater kappa of the experts alone did not differ from the multi-rater kappa of experts and the recognizer. Automatic speech recognition serves as a good means to objectify and quantify global speech outcome of laryngectomees. For clinical use, the speech recognition system will be adapted to disturbed voices and can also be applied in other languages.
Verbmobil: Foundations of Speech-to-Speech Translation | 2000
Anton Batliner; Richard Huber; Heinrich Niemann; Elmar Nöth; Jörg Spilker; Kerstin Fischer
To detect emotional user behavior, particularly anger, can be very useful for successful automatic dialog processing. We present databases and prosodic classifiers implemented for the recognition of emotion in Verbmobil. Using a prosodic feature vector alone is, however, not sufficient for the modelling of emotional user behavior. Therefore, a module is described that combines several knowledge sources within an integrated classification of trouble in communication.
Speech Communication | 2002
Florian Gallwitz; Heinrich Niemann; Elmar Nöth; Volker Warnke
Abstract In this paper, we present an integrated approach for recognizing both the word sequence and the syntactic–prosodic structure of a spontaneous utterance. The approach aims at improving the performance of the understanding component of speech understanding systems by exploiting not only acoustic–phonetic and syntactic information, but also prosodic information directly within the speech recognition process. Whereas spoken utterances are typically modelled as unstructured word sequences in the speech recognizer, our approach includes phrase boundary information in the language model and provides HMMs to model the acoustic and prosodic characteristics of phrase boundaries. This methodology has two major advantages compared to purely word-based speech recognizers. First, additional syntactic–prosodic boundaries are determined by the speech recognizer which facilitates parsing and resolve syntactic and semantic ambiguities. Second – after having removed the boundary information from the result of the recognizer – the integrated model yields a 4% relative word error rate (WER) reduction compared to a traditional word recognizer. The boundary classification performance is equal to that of a separate prosodic classifier operating on the word recognizer output, thus making a separate classifier unnecessary for this task and saving the computation time involved. Compared to the baseline word recognizer, the integrated word-and-boundary recognizer does not involve any computational overhead.
