Stephan Reiter
Technische Universität München
Publications
Featured research published by Stephan Reiter.
International Conference on Multimedia and Expo | 2005
Björn W. Schuller; Stephan Reiter; Ronald Müller; Marc Al-Hames; Manfred K. Lang; Gerhard Rigoll
Emotion recognition is becoming an important factor in future media retrieval and man-machine interfaces. However, even humans often have difficulty recognizing another person's emotions, especially a stranger's. In this work we strive to recognize emotion independently of the speaker, concentrating on the speech channel. The relevance of individual acoustic features is a critical point, which we address by filter-based gain ratio calculation, starting from a basis of 276 features. Since optimizing a minimal feature set as a whole generally saves more extraction effort, we furthermore apply an SVM-SFFS wrapper-based search. For a more robust estimation, we also integrate spoken content information by a Bayesian net analysis of ASR outputs. Overall classification is realized by early feature fusion with stacked ensembles of diverse base classifiers. Tests were run on a database of 3,947 movie and automotive-interaction dialog turns from 35 speakers. Remarkable overall performance is reported in discriminating seven discrete emotion classes: those named in the MPEG-4 standard, plus neutrality.
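The filter step ranks each feature on its own by gain ratio: the information a feature carries about the emotion class, normalized by how finely the feature splits the data. A minimal sketch of that criterion, assuming each acoustic feature has already been discretized into bins (the abstract does not say how); the function names `entropy` and `gain_ratio` are illustrative, not taken from the paper.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of a discretized feature, divided by its split information."""
    n = len(labels)
    # Conditional entropy H(Y | X): class entropy inside each feature bin,
    # weighted by the fraction of samples falling into that bin.
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    info_gain = entropy(labels) - conditional
    # Split information H(X) penalizes features that scatter samples over many bins.
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0
```

A feature whose bins line up exactly with the classes scores 1.0, and a feature whose bins are independent of the classes scores 0.0. Ranking all candidate features by this value gives the single-feature relevance list; it ignores interactions between features, which is what the wrapper search then addresses.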
International Conference on Multimedia and Expo | 2006
Björn W. Schuller; Stephan Reiter; Gerhard Rigoll
Feature sets are widely discussed in speech emotion recognition by acoustic analysis. While popular filter- and wrapper-based searches help to retrieve relevant features, we believe that generating features automatically allows for more flexibility throughout the search. The basis is formed by dynamic low-level descriptors covering intonation, intensity, formants, spectral information, and others. Next, prosodic, articulatory, and voice-quality high-level functionals are systematically derived by descriptive statistical analysis. From there, feature alterations are performed automatically to find an optimal representation in feature space for a given target classifier. To avoid an NP-hard exhaustive search, we suggest the use of evolutionary programming. A significant overall performance improvement over former works is reported on two public databases.
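The step from low-level descriptors to high-level functionals turns a variable-length per-frame contour into a fixed-length vector that a classifier can consume. A minimal sketch, assuming a per-frame pitch contour as input; the particular functionals chosen here are illustrative and not the paper's exact set, and the evolutionary search over feature alterations is not shown.

```python
import statistics
from typing import Dict, Sequence


def functionals(contour: Sequence[float]) -> Dict[str, float]:
    """Map a variable-length low-level descriptor contour to fixed-length statistics."""
    if len(contour) < 2:
        raise ValueError("need at least two frames")
    # First-order differences approximate the contour's dynamics (its slope).
    deltas = [b - a for a, b in zip(contour, contour[1:])]
    peak = max(contour)
    return {
        "mean": statistics.fmean(contour),
        "stdev": statistics.pstdev(contour),
        "min": min(contour),
        "max": peak,
        "range": peak - min(contour),
        "mean_delta": statistics.fmean(deltas),
        # Relative position of the maximum within the utterance (0 = start, 1 = end).
        "rel_pos_max": contour.index(peak) / (len(contour) - 1),
    }
```

Applying the same functionals to every descriptor (intensity, formants, spectral bands, and so on) yields one feature vector per utterance regardless of its duration.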
International Conference on Multimedia and Expo | 2007
Stephan Reiter; Björn W. Schuller; Gerhard Rigoll
Automatic segmentation and classification of recorded meetings provides a basis for understanding the content of a meeting. It enables effective browsing and querying of a meeting archive. However, existing approaches are often not robust enough. We therefore strive to improve on this task by applying conditional random fields augmented by hidden states. These hidden conditional random fields have proven efficient in low-level pattern recognition tasks. We propose using these novel models to segment a pre-recorded meeting into meeting events. Since they can also be seen as an extension of hidden Markov models, an elaborate comparison of the two approaches is provided. Extensive test runs on the public M4 Scripted Meeting Corpus show the strong performance of our approach compared to other similar methods.
International Conference on Pattern Recognition | 2004
Stephan Reiter; Gerhard Rigoll
In this paper, the segmentation of a meeting into meeting events is investigated, as well as the recognition of the detected segments. First, the classification of meeting events is examined: five different classifiers are combined through multiple classifier fusion. Then, a method for finding the optimal segment boundaries is presented. With a dynamic programming approach, quite encouraging results are obtained. The results further show that classifier fusion yields more stable results than a single classifier.
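The abstract does not spell out the dynamic programming formulation, so the following is only a minimal sketch of one common variant, assuming per-frame class log-scores (for example, the output of the fused classifiers) and a constant penalty per segment to discourage over-segmentation; both inputs and the name `segment` are hypothetical. It finds the labeled segmentation with the highest total score in O(T²·C) time for T frames and C classes.

```python
import math
from typing import List, Sequence, Tuple


def segment(frame_scores: Sequence[Sequence[float]],
            penalty: float) -> List[Tuple[int, int, int]]:
    """Return optimal (start, end_exclusive, class) segments.

    frame_scores[t][c] is a log-score of class c at frame t; every segment
    costs `penalty`.
    """
    n = len(frame_scores)
    num_classes = len(frame_scores[0])
    # prefix[t][c] = sum of frame_scores[0..t-1][c], so any segment score is O(1).
    prefix = [[0.0] * num_classes]
    for row in frame_scores:
        prefix.append([p + s for p, s in zip(prefix[-1], row)])
    best = [-math.inf] * (n + 1)   # best[t]: best total score for frames 0..t-1
    back = [(0, 0)] * (n + 1)      # back[t]: (segment start, class) achieving best[t]
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            for c in range(num_classes):
                score = best[start] + prefix[end][c] - prefix[start][c] - penalty
                if score > best[end]:
                    best[end] = score
                    back[end] = (start, c)
    # Trace the optimal segmentation back from the last frame.
    segments = []
    end = n
    while end > 0:
        start, c = back[end]
        segments.append((start, end, c))
        end = start
    return segments[::-1]
```

A larger `penalty` merges short segments into longer ones; with a very large penalty the whole recording collapses into a single meeting event.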
International Conference on Acoustics, Speech, and Signal Processing | 2005
Stephan Reiter; Sascha Schreiber; Gerhard Rigoll
This paper addresses the analysis of meetings for segmentation into sub-genres. To this end, an approach on a higher semantic level has been chosen: the algorithms use the results of specialized recognizers such as a speaker turn detector and a gesture recognizer. The goal of this investigation was to determine how well meeting analysis is possible when only the results of these recognizers are available. After briefly introducing the basics of these recognizers, two slightly different segmentation methods are presented. The results show the potential of these methods to find the segment boundaries and to categorize the detected segments into sub-genres (also called meeting events or group actions). Based on this segmentation, further analysis regarding topic detection and content extraction can be accomplished.
Helvetica Chimica Acta | 2002
Stephan Reiter; Bernd Assmann; Stefan Nogai; Norbert W. Mitzel; Hubert Schmidbaur
The prolonged photo-Arbuzov reaction (3 weeks, Hg lamp) of 1,3,5-trichlorobenzene with a large excess of trimethyl phosphite (as a solvent) at 50° gives moderate yields of dimethyl (3,5-dichlorophenyl)phosphonate (1; 14.5%), tetramethyl (5-chloro-1,3-phenylene)bis[phosphonate] (2; 35.4%), and hexamethyl (benzene-1,3,5-triyl)tris[phosphonate] (3; 30.1%). The products can be separated by fractional distillation. Acid hydrolysis of the esters gives almost quantitative yields of the corresponding phosphonic acids 4-6. Reduction of the esters 1-3 by LiAlH4 in tetrahydrofuran affords the primary phosphines (3,5-dichlorophenyl)phosphine (7; 46.5%), (5-chloro-1,3-phenylene)bis[phosphine] (8; 34.5%), and (benzene-1,3,5-triyl)tris[phosphine] (9; 25.2%). In the crude reduction products from 2 (preparation of 8) and from 3 (preparation of 9), (3-chlorophenyl)phosphine and (1,3-phenylene)bis[phosphine], respectively, are observed as by-products. All compounds are characterized by standard analytical, spectroscopic, and (for 1, 7, and 8) structural techniques. The arrangement of the molecules in the crystal structures of 7 and 8 suggests that H-bonding between the primary arylphosphines is virtually insignificant for the packing of the components. This is in marked contrast to the importance of H-bonding for the supramolecular chemistry of arylamines. The new primary polyphosphines and polyphosphonic acids are intended for use in the construction of extended arrays.
International Conference on Multimedia and Expo | 2005
Björn W. Schuller; Bernardo José Brüning Schmitt; Dejan Arsic; Stephan Reiter; Manfred K. Lang; Gerhard Rigoll
In this work we strive to find an optimal set of acoustic features for discriminating speech, monophonic singing, and polyphonic music in order to robustly segment acoustic media streams for annotation and interaction purposes. Furthermore, we introduce ensemble-based classification approaches for this task. From a basis of 276 attributes, we select the most efficient set by SVM-SFFS. Additionally, the relevance of single features is assessed by calculating the information gain ratio. As a basis of comparison, we reduce dimensionality by PCA. We present an extensive analysis of different classifiers for this task, among them kernel machines, decision trees, and Bayesian classifiers. Moreover, we improve single-classifier performance by bagging and boosting, and finally combine the strengths of the classifiers by StackingC. The database comprises 2,114 samples of speech and singing from 58 persons; 1,000 music clips were taken from the MTV-Europe-Top-20 1980-2000. The outstanding discrimination results of a working, real-time-capable implementation stress the practicability of the proposed ideas.
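SFFS (sequential floating forward selection) grows a feature subset one feature at a time, but after each addition it "floats" back by dropping features whenever that beats the best subset already found at the smaller size. A minimal sketch, assuming a generic subset-scoring function `criterion`; in the SVM-SFFS wrapper described above that score would come from an SVM evaluated on the subset, which is not reproduced here. The name `sffs` and its return format are illustrative.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, Tuple

Subset = FrozenSet[Hashable]


def sffs(features: Iterable[Hashable],
         criterion: Callable[[Subset], float],
         max_size: int) -> Dict[int, Tuple[Subset, float]]:
    """Sequential floating forward selection.

    Returns the best (subset, score) found for every subset size up to max_size.
    """
    pool = list(features)
    if not 1 <= max_size <= len(pool):
        raise ValueError("max_size must be between 1 and the number of features")
    best: Dict[int, Tuple[Subset, float]] = {}
    current: Subset = frozenset()

    def record(subset: Subset, score: float) -> None:
        # Remember the best subset seen so far for this subset size.
        if len(subset) not in best or score > best[len(subset)][1]:
            best[len(subset)] = (subset, score)

    while len(current) < max_size:
        # Inclusion step: add the single feature that scores best.
        add = max((f for f in pool if f not in current),
                  key=lambda f: criterion(current | {f}))
        current = current | {add}
        record(current, criterion(current))
        # Conditional exclusion ("floating") step: drop a feature only if the
        # reduced subset beats the best subset previously found at that size.
        while len(current) > 2:
            drop = max(current, key=lambda f: criterion(current - {f}))
            reduced = current - {drop}
            reduced_score = criterion(reduced)
            if reduced_score <= best[len(reduced)][1]:
                break
            current = reduced
            record(current, reduced_score)
    return best
```

With a toy criterion where features 1 and 2 are individually weak but strong together, plain forward selection keeps whichever pair it builds first from the strongest single feature, whereas the exclusion step lets SFFS later discover the better pair {1, 2}. Because a feature is only dropped when that strictly improves on a recorded score, the search cannot cycle.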
International Conference on Multimedia and Expo | 2007
Florian Eyben; Björn W. Schuller; Stephan Reiter; Gerhard Rigoll
Automated retrieval of high-level information from ballroom dance music is challenging but has many practical applications, for example a fully automatic ballroom dance DJ, robots capable of performing ballroom dances, or wearable dance assistance, as considered herein. Such a system must retrieve the song's quarter-note tempo, meter, and beat positions. Further, it must be able to discriminate between the nine standard and Latin ballroom dances. In this paper we present a model that combines all these requirements in one holistic approach. The polyphonic input is processed by a simplified psychoacoustic model. Tatum, tempo, and meter features are extracted using resonant filters, and the filter output is used for beat tracking. The extracted features are then used for ballroom dance-style classification by support vector machines. To show the high effectiveness of dance-style recognition and beat tracking, test runs are carried out on a database containing 1.8k titles.
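Resonant filters estimate tempo by letting a bank of resonators, one per candidate beat period, "ring" on an onset envelope: the resonator whose delay matches the beat period accumulates the most energy. The abstract does not give the paper's filter design, so this is a minimal sketch of a generic feedback comb-filter bank (in the style of Scheirer's resonator approach, swapped in for illustration), assuming an onset-strength envelope sampled at a fixed frame rate; the feedback gain `alpha` and the function names are hypothetical.

```python
from typing import Sequence


def comb_energy(envelope: Sequence[float], period: int, alpha: float = 0.9) -> float:
    """Output energy of a feedback comb filter y[n] = (1 - alpha) x[n] + alpha y[n - period]."""
    y = [0.0] * len(envelope)
    for n, x in enumerate(envelope):
        feedback = y[n - period] if n >= period else 0.0
        y[n] = (1.0 - alpha) * x + alpha * feedback
    return sum(v * v for v in y)


def estimate_period(envelope: Sequence[float], min_period: int, max_period: int,
                    alpha: float = 0.9) -> int:
    """Return the candidate beat period (in envelope frames) whose resonator rings most."""
    return max(range(min_period, max_period + 1),
               key=lambda p: comb_energy(envelope, p, alpha))
```

For an impulse train with a period of 50 frames, the 50-frame resonator wins. A resonator at twice the period also builds up, only more slowly, so practical systems typically add normalization or meter analysis to resolve such octave ambiguities; the filter outputs at the winning period can then be inspected to locate beat phases for tracking.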
International Conference on Machine Learning | 2005
Marc Al-Hames; Alfred Dielmann; Daniel Gatica-Perez; Stephan Reiter; Steve Renals; Gerhard Rigoll; Dong Zhang
We address the problem of segmentation and recognition of sequences of multimodal human interactions in meetings. These interactions can be seen as a rough structure of a meeting, and can be used either as input for a meeting browser or as a first step towards a higher semantic analysis of the meeting. A common lexicon of multimodal group meeting actions, a shared meeting data set, and a common evaluation procedure enable us to compare the different approaches. We compare three different multimodal feature sets and our modelling infrastructures: a higher semantic feature approach, multi-layer HMMs, a multi-stream DBN, as well as a multi-stream mixed-state DBN for disturbed data.
International Conference on Image Processing | 2007
Marc Al-Hames; Claus Lenz; Stephan Reiter; Joachim Schenk; Frank Wallhoff; Gerhard Rigoll
The asynchronous hidden Markov model (AHMM) models the joint likelihood of two observation sequences, even if the streams are not synchronised. We explain this concept and how the model is trained by the EM algorithm. We then show how the AHMM can be applied to the analysis of group action events in meetings from both clear and disturbed data. The AHMM outperforms an early fusion HMM by 5.7% absolute in recognition rate (a relative error reduction of 38.5%) for clear data. For occluded data, the improvement is on average 6.5% in recognition rate (a relative error reduction of 40%). Thus asynchrony is a dominant factor in meeting analysis, even if the data is disturbed. The AHMM exploits this and is therefore much more robust against disturbances.
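The absolute and relative figures quoted for clear data together imply the baseline's error rate. A short worked check, assuming "relative error reduction" r means the absolute gain divided by the early-fusion HMM's error rate (the abstract does not define it):

$$
r = \frac{e_{\text{HMM}} - e_{\text{AHMM}}}{e_{\text{HMM}}}
\;\Rightarrow\;
e_{\text{HMM}} = \frac{e_{\text{HMM}} - e_{\text{AHMM}}}{r} = \frac{5.7\,\%}{0.385} \approx 14.8\,\%,
\qquad
e_{\text{AHMM}} \approx 14.8\,\% - 5.7\,\% = 9.1\,\%
$$

The same reasoning applied to the occluded-data averages gives roughly 6.5 % / 0.40 ≈ 16.3 % error for the HMM and about 9.8 % for the AHMM. These are back-calculated estimates under the stated assumption, not values reported in the abstract.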