Björn W. Schuller | Researchain


Publication

Featured research published by Björn W. Schuller.

Image and Vision Computing | 2013

Categorical and dimensional affect analysis in continuous input: Current trends and future directions

Hatice Gunes; Björn W. Schuller

In the context of affective human behavior analysis, we use the term continuous input to refer to naturalistic settings where explicit or implicit input from the subject is continuously available and where, in a human-human or human-computer interaction setting, the subject acts as either a producer or a recipient of the communicative behavior. As a result, the analysis and the response provided by the automatic system are also envisioned to be continuous over the course of time, within the boundaries of digital machine output. The term continuous affect analysis covers both analysis that is continuous in time and analysis that uses affect phenomena represented in a dimensional space. The former refers to acquiring and processing long unsegmented recordings for detection of an affective state or event (e.g., nod, laughter, pain), and the latter refers to prediction of an affect dimension (e.g., valence, arousal, power). In line with the Special Issue on Affect Analysis in Continuous Input, this survey paper aims to put the continuity aspect of affect under the spotlight by investigating current trends and providing guidance towards possible future directions.

Neurocomputing | 2009

A multidimensional dynamic time warping algorithm for efficient multimodal fusion of asynchronous data streams

Martin Wöllmer; Marc Al-Hames; Florian Eyben; Björn W. Schuller; Gerhard Rigoll

To overcome the computational complexity of the asynchronous hidden Markov model (AHMM), we present a novel multidimensional dynamic time warping (DTW) algorithm for hybrid fusion of asynchronous data. We show that our newly introduced multidimensional DTW concept requires significantly less decoding time while providing the same data fusion flexibility as the AHMM. Thus, it can be applied in a wide range of real-time multimodal classification tasks. Optimally exploiting mutual information during decoding even if the input streams are not synchronous, our algorithm outperforms late and early fusion techniques in a challenging bimodal speech and gesture fusion experiment.
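
For readers unfamiliar with the underlying technique, the following is a minimal sketch of classical one-dimensional dynamic time warping, not the multidimensional fusion algorithm proposed in the paper. The function name `dtw_distance` and the default absolute-difference local cost are assumptions made for illustration only.

```python
from math import inf


def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classical DTW cost between two sequences (illustrative sketch).

    `dist` is a hypothetical local cost function; absolute difference is
    assumed here and is not taken from the paper.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # Allowed steps: advance in a only, in b only, or in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    # The second sequence repeats one sample, so the warping absorbs the delay.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

The warping path lets one stream run ahead of or lag behind the other, which is the property that makes DTW attractive for streams that are not synchronous.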

International Conference on Acoustics, Speech, and Signal Processing | 2009

Robust discriminative keyword spotting for emotionally colored spontaneous speech using bidirectional LSTM networks

Martin Wöllmer; Florian Eyben; Joseph Keshet; Alex Graves; Björn W. Schuller; Gerhard Rigoll

In this paper we propose a new technique for robust keyword spotting that uses bidirectional Long Short-Term Memory (BLSTM) recurrent neural nets to incorporate contextual information in speech decoding. Our approach overcomes the drawbacks of generative HMM modeling by applying a discriminative learning procedure that non-linearly maps speech features into an abstract vector space. By incorporating the outputs of a BLSTM network into the speech features, it is able to make use of past and future context for phoneme predictions. The robustness of the approach is evaluated on a keyword spotting task using the HUMAINE Sensitive Artificial Listener (SAL) database, which contains accented, spontaneous, and emotionally colored speech. The test is particularly stringent because the system is not trained on the SAL database, but only on the TIMIT corpus of read speech. We show that our method prevails over a discriminative keyword spotter without BLSTM-enhanced feature functions, which in turn has been proven to outperform HMM-based techniques.

Signal Processing Systems | 2012

Optimization and Parallelization of Monaural Source Separation Algorithms in the openBliSSART Toolkit

Felix Weninger; Björn W. Schuller

We describe the implementation of monaural audio source separation algorithms in our toolkit openBliSSART (Blind Source Separation for Audio Recognition Tasks). To our knowledge, it provides the first freely available C++ implementation of Non-Negative Matrix Factorization (NMF) supporting the Compute Unified Device Architecture (CUDA) for fast parallel processing on graphics processing units (GPUs). Besides integrating parallel processing, openBliSSART introduces several numerical optimizations of commonly used monaural source separation algorithms that reduce both computation time and memory usage. By illustrating a variety of use-cases from audio effects in music processing to speech enhancement and feature extraction, we demonstrate the wide applicability of our application framework for a multiplicity of research and end-user applications. We evaluate the toolkit by benchmark results of the NMF algorithms and discuss the influence of their parameterization on source separation quality and real-time factor. As a result, the GPU parallelization in openBliSSART introduces double-digit speedups with respect to conventional CPU computation, enabling real-time processing on a desktop PC even for high matrix dimensions.
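
As a rough illustration of what NMF computes, the sketch below uses the standard multiplicative update rules for the Euclidean cost in plain Python. It is not openBliSSART's implementation or API; the helper names (`matmul`, `transpose`, `nmf`, `frobenius_error`), the random initialisation, and the iteration count are all assumptions chosen for readability.

```python
import random


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, rank, iterations=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V as W @ H with non-negative factors.

    Uses multiplicative updates for the Euclidean cost (assumed for this
    sketch); `eps` guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * p / (q + eps) for h, p, q in zip(hr, pr, qr)]
             for hr, pr, qr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * p / (q + eps) for w, p, q in zip(wr, pr, qr)]
             for wr, pr, qr in zip(W, num, den)]
    return W, H


def frobenius_error(V, W, H):
    """Frobenius norm of the reconstruction residual V - W H."""
    WH = matmul(W, H)
    return sum((v - a) ** 2 for vr, ar in zip(V, WH) for v, a in zip(vr, ar)) ** 0.5


if __name__ == "__main__":
    V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0]]
    W, H = nmf(V, rank=2)
    print(frobenius_error(V, W, H))
```

In an audio setting, V would typically be a magnitude spectrogram, with the columns of W acting as spectral bases and the rows of H as their activations over time. Each update consists of dense matrix products followed by element-wise operations, which is the kind of workload that parallelises well.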

IEEE Automatic Speech Recognition and Understanding Workshop | 2009

Robust vocabulary independent keyword spotting with graphical models

Martin Wöllmer; Florian Eyben; Björn W. Schuller; Gerhard Rigoll

This paper introduces a novel graphical model architecture for robust and vocabulary independent keyword spotting which does not require the training of an explicit garbage model. We show how a graphical model structure for phoneme recognition can be extended to a keyword spotter that is robust with respect to phoneme recognition errors. We use a hidden garbage variable together with the concept of switching parents to model keywords as well as arbitrary speech. This implies that keywords can be added to the vocabulary without having to re-train the model. Thereby the design of our model architecture is optimised to reliably detect keywords rather than to decode keyword phoneme sequences as arbitrary speech, while offering a parameter to adjust the operating point on the receiver operating characteristics curve. Experiments on the TIMIT corpus reveal that our graphical model outperforms a comparable hidden Markov model based keyword spotter that uses conventional garbage modelling.

Neurocomputing | 2014

Probabilistic speech feature extraction with context-sensitive Bottleneck neural networks

Martin Wöllmer; Björn W. Schuller

We introduce a novel context-sensitive feature extraction approach for spontaneous speech recognition. As bidirectional Long Short-Term Memory (BLSTM) networks are known to enable improved phoneme recognition accuracies by incorporating long-range contextual information into speech decoding, we integrate the BLSTM principle into a Tandem front-end for probabilistic feature extraction. Unlike the previously proposed approaches which exploit BLSTM modeling by generating a discrete phoneme prediction feature, our feature extractor merges continuous high-level probabilistic BLSTM features with low-level features. By combining BLSTM modeling and Bottleneck (BN) feature generation, we propose a novel front-end that allows us to produce context-sensitive probabilistic feature vectors of arbitrary size, independent of the network training targets. Evaluations on challenging spontaneous, conversational speech recognition tasks show that this concept prevails over recently published architectures for feature-level context modeling.

Archive | 2013

Audio Source Separation

Björn W. Schuller

In order to enhance the (audio) signal of interest in the presence of additional audio sources, one can aim at their separation. Although very demanding, Audio Source Separation has many interesting applications: for example, in Music Information Retrieval, it allows for polyphonic transcription or recognition of lyrics in singing after decomposing the original recording into voices and/or instruments such as drums, guitars, or vocals, e.g., for 'query by humming'. Here, non-negative matrix factorisation-based (NMF) approaches are explained. Further, 'NMF Activation Features' are introduced and exemplified in the speech processing domain.

Conference of the International Speech Communication Association | 2009