
Publication


Featured research published by Jörn Anemüller.


Neural Networks | 2003

Complex independent component analysis of frequency-domain electroencephalographic data

Jörn Anemüller; Terrence J. Sejnowski; Scott Makeig

Independent component analysis (ICA) has proven useful for modeling brain and electroencephalographic (EEG) data. Here, we present a new, generalized method to better capture the dynamics of brain signals than previous ICA algorithms. We regard EEG sources as eliciting spatio-temporal activity patterns, corresponding to, e.g., trajectories of activation propagating across cortex. This leads to a model of convolutive signal superposition, in contrast with the commonly used instantaneous mixing model. In the frequency domain, convolutive mixing is equivalent to multiplicative mixing of complex signal sources within distinct spectral bands. We decompose the recorded spectral-domain signals into independent components by a complex infomax ICA algorithm. First results from a visual attention EEG experiment exhibit: (1) sources of spatio-temporal dynamics in the data, (2) links to subject behavior, (3) sources with a limited spectral extent, and (4) a higher degree of independence compared to sources derived by standard ICA.
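As an illustration of the frequency-domain view above, the following is a minimal sketch (not the authors' published code) of per-band complex ICA: the multichannel recording is transformed by an STFT, and a complex natural-gradient (infomax-style) update unmixes one spectral band. The nonlinearity, learning rate, and random stand-in data are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import stft

def complex_ica(X, n_iter=300, lr=0.05, eps=1e-8):
    """X: (n_channels, n_frames) complex. Returns an unmixing matrix W."""
    n = X.shape[0]
    W = np.eye(n, dtype=complex)
    for _ in range(n_iter):
        Y = W @ X
        G = np.tanh(np.abs(Y)) * Y / (np.abs(Y) + eps)        # complex nonlinearity
        dW = (np.eye(n) - (G @ Y.conj().T) / X.shape[1]) @ W  # natural gradient
        W += lr * dW
    return W

fs = 250.0
rng = np.random.default_rng(0)
eeg = rng.standard_normal((4, 60 * int(fs)))     # stand-in for real EEG data

f, t, Z = stft(eeg, fs=fs, nperseg=256)          # Z: (channels, freq, frames)
band = np.argmin(np.abs(f - 10.0))               # e.g. the ~10 Hz bin
W = complex_ica(Z[:, band, :])                   # unmix this spectral band
sources = W @ Z[:, band, :]                      # complex band-limited sources
```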


Journal of the Acoustical Society of America | 2000

Convolutive blind source separation of speech signals based on amplitude modulation decorrelation

Jörn Anemüller; Birger Kollmeier

The problem of blind separation of a convolutive mixture of speech signals is considered. Signal separation is performed in the frequency domain. The amplitude modulation decorrelation algorithm for convolutive blind source separation of speech is described, the first ideas of which were presented by Anemüller at the joint ASA/EAA/DAGA meeting, Berlin, 1999. The algorithm is based on the fact that the frequency-specific envelopes of speech signals exhibit correlation across different frequencies. This feature can be used to solve the "permutation problem" of frequency-domain-based blind source separation algorithms. Furthermore, it leads to separation of good quality since it results in a high number of constraints which must be fulfilled by the unmixed signals. Results for the separation of speech signals are presented for different mixing scenarios, including real-room reverberant mixing.
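The envelope-correlation idea behind the permutation fix can be sketched as follows; this is a simplified illustration, not the amplitude modulation decorrelation algorithm itself. `S` is a hypothetical array of per-bin separated signals, and each bin's outputs are permuted to maximize envelope correlation with the previous, already-aligned bin.

```python
import numpy as np
from itertools import permutations

def align_permutations(S):
    """S: (n_bins, n_sources, n_frames) complex per-bin separated signals."""
    n_bins, n_src, _ = S.shape
    env = np.abs(S)                                # amplitude envelopes per bin
    for b in range(1, n_bins):
        best, best_perm = -np.inf, None
        for perm in permutations(range(n_src)):
            # total envelope correlation with the already-aligned previous bin
            score = sum(np.corrcoef(env[b - 1, i], env[b, p])[0, 1]
                        for i, p in enumerate(perm))
            if score > best:
                best, best_perm = score, perm
        S[b] = S[b, list(best_perm)]
        env[b] = env[b, list(best_perm)]
    return S
```

The factorial search over permutations is only viable for a handful of sources; for larger problems a greedy or globally optimized assignment would replace it.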


Neurocomputing | 2006

Spatio-temporal dynamics in fMRI recordings revealed with complex independent component analysis

Jörn Anemüller; Jeng-Ren Duann; Terrence J. Sejnowski; Scott Makeig

Independent component analysis (ICA) of functional magnetic resonance imaging (fMRI) data is commonly carried out under the assumption that each source may be represented as a spatially fixed pattern of activation, which leads to the instantaneous mixing model. To allow modeling patterns of spatio-temporal dynamics, in particular, the flow of oxygenated blood, we have developed a convolutive ICA approach: spatial complex ICA applied to frequency-domain fMRI data. In several frequency-bands, we identify components pertaining to activity in primary visual cortex (V1) and blood supply vessels. One such component, obtained in the 0.10 Hz band, is analyzed in detail and found to likely reflect flow of oxygenated blood in V1.
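A minimal sketch of the preprocessing step this approach implies, under assumed data shapes: voxel time courses are transformed to the frequency domain and the complex coefficients of the ~0.10 Hz band are extracted, to which a spatial complex ICA (as in the EEG sketch above) could then be applied. The repetition time and data below are placeholders.

```python
import numpy as np
from scipy.signal import stft

tr = 2.0                                        # hypothetical repetition time (s)
rng = np.random.default_rng(1)
bold = rng.standard_normal((500, 300))          # stand-in voxel time courses

f, t, Z = stft(bold, fs=1.0 / tr, nperseg=64)   # Z: (voxels, freq, frames)
band = np.argmin(np.abs(f - 0.10))              # the ~0.10 Hz bin
coeffs = Z[:, band, :]                          # complex data for spatial ICA
```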


International Conference on Acoustics, Speech, and Signal Processing | 2011

Amplitude modulation spectrogram based features for robust speech recognition in noisy and reverberant environments

Niko Moritz; Jörn Anemüller; Birger Kollmeier

In this contribution we present a feature extraction method that relies on the modulation-spectral analysis of amplitude fluctuations within sub-bands of the acoustic spectrum, computed by an STFT. The experimental results indicate that the optimal temporal filter extent for amplitude modulation analysis is around 310 ms. It is also demonstrated that the phase information of the modulation spectrum contains important cues for speech recognition. In this context, the advantage of an odd analysis basis function is considered. The best presented features reached a total relative improvement of 53.5% for clean-condition training on Aurora-2. Furthermore, it is shown that modulation features are more robust against room reverberation than conventional cepstral and dynamic features and that they strongly benefit from a high early-to-late energy ratio of the room impulse response (RIR).
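The general AMS recipe described above can be sketched as two cascaded STFTs. This is a rough reconstruction, not the paper's exact front-end: mel filtering and normalization are omitted, and the ~310 ms modulation window is approximated by 31 frames at a 100 Hz frame rate.

```python
import numpy as np
from scipy.signal import stft

fs = 16000
rng = np.random.default_rng(2)
x = rng.standard_normal(fs * 2)                      # stand-in for a speech signal

# First STFT: acoustic spectrogram, 25 ms frames with a 10 ms hop.
f, t, X = stft(x, fs=fs, nperseg=400, noverlap=240)
env = np.log(np.abs(X) + 1e-8)                       # log sub-band envelopes

# Second STFT across time: frame rate is 100 Hz, so 31 frames ≈ 310 ms.
frame_rate = 100.0
mf, mt, M = stft(env, fs=frame_rate, nperseg=31, noverlap=23)
# M: (acoustic_freq, modulation_freq, time) complex modulation spectrogram;
# keeping complex values preserves the modulation phase discussed above.
ams = np.concatenate([np.abs(M), np.angle(M)], axis=1)
```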


International Conference on Acoustics, Speech, and Signal Processing | 2010

Modulation-based detection of speech in real background noise: Generalization to novel background classes

Jörg-Hendrik Bach; Birger Kollmeier; Jörn Anemüller

Robust detection of speech embedded in real acoustic background noise is considered using an approach based on subband amplitude modulation spectral (AMS) features and trained discriminative classifiers. Performance is evaluated in particular for situations in which speech is embedded in acoustic backgrounds not presented during classifier training, and for signal-to-noise ratios (SNR) from -10 dB to 20 dB. The results show that (1) Generalization to novel background classes with AMS features yields better performance in 84% of investigated situations, corresponding to an SNR benefit of about 10 dB compared to mel-frequency cepstral coefficient (MFCC) features. (2) On known backgrounds, AMS and MFCCs achieve similar performance, with a small advantage for AMS in negative SNR regimes. (3) Standard voice activity detection (ITU G729.B) performs significantly worse than the classification-based approach.
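A hedged sketch of the classification setup described above: a discriminative classifier is trained on speech vs. background feature vectors (e.g., flattened AMS patterns, assumed pre-computed) and then tested on a background class unseen during training. All data below are random placeholders.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
# Hypothetical pre-computed features: 200 frames x 64 dims per condition.
speech_train = rng.standard_normal((200, 64)) + 0.5
noise_known = rng.standard_normal((200, 64))
noise_novel = rng.standard_normal((200, 64)) - 0.2   # unseen background class

X = np.vstack([speech_train, noise_known])
y = np.array([1] * 200 + [0] * 200)
clf = SVC(kernel="linear").fit(X, y)

# Generalization test: the detector must reject a background it never saw.
print("novel-background rejection rate:",
      np.mean(clf.predict(noise_novel) == 0))
```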


International Conference on Acoustics, Speech, and Signal Processing | 2013

Automatic acoustic siren detection in traffic noise by part-based models

Jens Schröder; Stefan Goetze; Volker Grützmacher; Jörn Anemüller

State-of-the-art classifiers like hidden Markov models (HMMs) in combination with mel-frequency cepstral coefficients (MFCCs) are flexible in time but rigid in the spectral dimension. In contrast, part-based models (PBMs), originally proposed in computer vision, consist of parts in a fully deformable configuration. The present contribution proposes to employ PBMs in the spectro-temporal domain for detection of emergency siren sounds in traffic noise, with standard generative training resulting in a classifier that is robust to shifts in frequency induced, e.g., by Doppler-shift effects. Two improvements over standard machine learning techniques for PBM estimation are proposed: (i) spectro-temporal part ("appearance") extraction is initialized by interest point detection instead of random initialization and (ii) a discriminative training approach is implemented in addition to standard generative training. Evaluation with self-recorded police sirens and traffic noise gathered on-line demonstrates that PBMs are successful in acoustic siren detection. One hand-labeled and two machine-learned PBMs are compared to standard HMMs employing mel-spectrograms and MFCCs in clean and multi-condition (multiple SNR) training settings. Results show that in clean-condition training, hand-labeled PBMs and HMMs outperform machine-learned PBMs already for test data with moderate additive noise. In multi-condition training, the machine-learned PBMs outperform HMMs on most SNRs, achieving high accuracies and being nearly optimal up to 5 dB SNR. Thus, our simulation results show that PBMs are a promising approach for acoustic event detection (AED).
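As a rough illustration of part-based scoring (heavily simplified from the paper's estimator), each part is a small spectro-temporal patch anchored at a frequency bin; the model score sums the best local appearance match per part minus a quadratic penalty on the frequency shift, which is what makes the detector tolerant to Doppler-type shifts. Temporal deformation is omitted for brevity, and all names here are illustrative.

```python
import numpy as np

def part_score(excerpt, patch, anchor_f, search=3, penalty=0.5):
    """Best appearance match of one part near its anchor frequency bin.

    excerpt: spectro-temporal window (n_freq, n_time); patch: (ph, pw) part
    template. The part may shift by up to `search` frequency bins, paying a
    quadratic deformation cost; time offsets are fixed for brevity.
    """
    ph, pw = patch.shape
    best = -np.inf
    for df in range(-search, search + 1):
        f0 = anchor_f + df
        if f0 < 0 or f0 + ph > excerpt.shape[0]:
            continue
        corr = float(np.sum(excerpt[f0:f0 + ph, :pw] * patch))
        best = max(best, corr - penalty * df ** 2)
    return best

def pbm_score(excerpt, parts):
    """parts: list of (patch, anchor_frequency_bin); higher = more siren-like."""
    return sum(part_score(excerpt, p, f0) for p, f0 in parts)

# Usage: slide pbm_score over spectrogram excerpts and threshold the result.
```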


Speech Communication | 2011

Robust speech detection in real acoustic backgrounds with perceptually motivated features

Jörg-Hendrik Bach; Jörn Anemüller; Birger Kollmeier

The current study presents an analysis of the robustness of a speech detector in real background sounds. One of the most important aspects of automatic speech/nonspeech classification is robustness in the presence of strongly varying external conditions. These include variations of the signal-to-noise ratio as well as fluctuations of the background noise. These variations are systematically evaluated by choosing different mismatched conditions between training and testing of the speech/nonspeech classifiers. The detection performance of the classifier with respect to these mismatched conditions is used as a measure of robustness and generalisation. The generalisation towards untrained SNR conditions and unknown background noises is evaluated and compared to a matched baseline condition. The classifier consists of a feature front-end, which computes amplitude modulation spectral features (AMS), and a support vector machine (SVM) back-end. The AMS features are based on Fourier decomposition over time of short-term spectrograms. Mel-frequency cepstral coefficients (MFCC) as well as relative spectral features (RASTA) based on perceptual linear prediction (PLP) serve as baseline. The results show that RASTA-filtered PLP features perform best in the matched task. In the generalisation tasks, however, the AMS features emerge as more robust in most cases, while MFCC features are outperformed by both other feature types. In a second set of experiments, a hierarchical approach is analysed which employs a background classification step prior to the speech/nonspeech classifier in order to improve the robustness of the detection scores in novel backgrounds. The background sounds used are recorded in typical everyday scenarios. The hierarchy provides a benefit in overall performance if the robust AMS features are employed. The generalisation capabilities of the hierarchy towards novel backgrounds and SNRs are found to be optimal when a limited number of training backgrounds is used (compared to the inclusion of all available background data). The best backgrounds in terms of generalisation capabilities are found to be backgrounds in which some component of speech (such as unintelligible background babble) is present, which corroborates the hypothesis that the AMS features provide a decomposition of signals which is by itself very suitable for training very general speech/nonspeech detectors. This is also supported by the finding that the SVMs combined with RASTA-PLPs require nonlinear kernels to reach a similar performance as the AMS patterns with linear kernels.
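A minimal sketch of the hierarchical scheme analysed above, with placeholder features and background classes: a first-stage classifier predicts the background type, and a background-specific speech/nonspeech SVM then makes the final decision.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
backgrounds = ["babble", "traffic"]              # hypothetical background classes

# Stage 1: background classifier (placeholder features).
Xb = np.vstack([rng.standard_normal((100, 32)) + i for i in range(2)])
yb = np.repeat(backgrounds, 100)
bg_clf = SVC(kernel="linear").fit(Xb, yb)

# Stage 2: one speech/nonspeech detector per background class.
detectors = {}
for i, bg in enumerate(backgrounds):
    Xs = np.vstack([rng.standard_normal((100, 32)) + i + 0.5,  # speech in bg
                    rng.standard_normal((100, 32)) + i])       # bg alone
    ys = np.array([1] * 100 + [0] * 100)
    detectors[bg] = SVC(kernel="linear").fit(Xs, ys)

def detect_speech(frame):
    bg = bg_clf.predict(frame[None, :])[0]       # route by predicted background
    return detectors[bg].predict(frame[None, :])[0]

print(detect_speech(rng.standard_normal(32) + 1.5))   # -> 1 (speech) or 0
```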


IEEE Transactions on Audio, Speech, and Language Processing | 2015

An auditory inspired amplitude modulation filter bank for robust feature extraction in automatic speech recognition

Niko Moritz; Jörn Anemüller; Birger Kollmeier

The human ability to classify acoustic sounds is still unmatched by recent methods in machine learning. Psychoacoustic and physiological studies indicate that the auditory system of mammals decomposes audio signals into their acoustic and modulation frequency components prior to further analysis. Since it is known that most linguistic information is coded in amplitude fluctuations, mimicking temporal processing strategies of the auditory system in automatic speech recognition (ASR) promises to increase recognition accuracies. We present an amplitude modulation filter bank (AMFB) that is used as a feature extraction scheme in ASR systems. The time-frequency resolution of the employed FIR filters, i.e., the bandwidth and modulation frequency settings, is adopted from a psychophysically inspired model of Dau (1997) that was originally proposed to describe data from human psychoacoustics. Investigations of modulation phase indicate the need for preserving such information in amplitude modulation features. We show that the filter symmetry has an important impact on ASR performance. The proposed feature extraction scheme exhibits significant word error rate (WER) reductions on the Aurora-2, Aurora-4, and REVERB ASR tasks compared to other recent feature extraction methods, such as MFCC, FDLP, and PNCC features. Thereby, AMFB features reveal high robustness against additive noise, different transmission channel characteristics, and room reverberation. Using the Aurora-4 benchmark, for instance, an average WER of 12.33% with raw and 11.31% with bottleneck-transformed features is attained, which constitutes a relative improvement of 19.6% and 29.2% over raw MFCC features, respectively.
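A hedged sketch of an amplitude modulation filter bank: complex-exponential modulated, Hann-windowed FIR filters applied to a sub-band envelope. The center frequencies, bandwidths, and filter lengths below are illustrative choices, not the Dau-model settings used in the paper.

```python
import numpy as np
from scipy.signal import lfilter

def am_filterbank(envelope, frame_rate, centers=(2.0, 4.0, 8.0, 16.0)):
    """Filter a 1-D sub-band envelope (sampled at frame_rate Hz) with
    complex-modulated Hann FIR filters, one per modulation center frequency."""
    outputs = []
    for fc in centers:
        n_taps = int(2 * frame_rate / fc) | 1           # ~2 cycles, odd length
        n = np.arange(n_taps) - n_taps // 2
        h = np.hanning(n_taps) * np.exp(2j * np.pi * fc * n / frame_rate)
        h /= np.sum(np.hanning(n_taps))                 # rough gain normalization
        outputs.append(lfilter(h, 1.0, envelope))       # complex band output
    return np.array(outputs)                            # (n_bands, n_frames)

# Example: a 4 Hz amplitude fluctuation should excite mainly the 4 Hz band.
frames = np.arange(500)
env = 1.0 + 0.5 * np.sin(2 * np.pi * 4.0 * frames / 100.0)
bands = am_filterbank(env, frame_rate=100.0)
print(np.abs(bands).mean(axis=1))          # expected largest for the 4 Hz band
```

Keeping the complex outputs (rather than their magnitudes) is one way to preserve the modulation phase information the abstract argues for.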


International Conference on Computer Vision Systems | 2008

Object category detection using audio-visual cues

Jie Luo; Barbara Caputo; Alon Zweig; Jörg-Hendrik Bach; Jörn Anemüller

Categorization is one of the fundamental building blocks of cognitive systems. Object categorization has traditionally been addressed in the vision domain, even though cognitive agents are intrinsically multimodal. Indeed, biological systems combine several modalities in order to achieve robust categorization. In this paper we propose a multimodal approach to object category detection, using audio and visual information. The auditory channel is modeled on biologically motivated spectral features via a discriminative classifier. The visual channel is modeled by a state-of-the-art part-based model. Multimodality is achieved using two fusion schemes, one high-level and the other low-level. Experiments on six different object categories, under increasingly difficult conditions, show strengths and weaknesses of the two approaches, and clearly underline the open challenges for multimodal category detection.
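The two fusion schemes mentioned above can be sketched with placeholder per-modality features: low-level fusion concatenates audio and visual feature vectors before a single classifier, while high-level fusion combines the two classifiers' decision scores. This is an illustrative reconstruction, not the paper's implementation.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)
Xa = rng.standard_normal((200, 20))            # audio features (placeholder)
Xv = rng.standard_normal((200, 50))            # visual features (placeholder)
y = np.array([1] * 100 + [0] * 100)

# Low-level fusion: one classifier on concatenated features.
low = SVC(kernel="linear").fit(np.hstack([Xa, Xv]), y)

# High-level fusion: weighted sum of per-modality decision values.
ca = SVC(kernel="linear").fit(Xa, y)
cv = SVC(kernel="linear").fit(Xv, y)

def high_level(xa, xv, w=0.5):
    score = w * ca.decision_function(xa) + (1 - w) * cv.decision_function(xv)
    return (score > 0).astype(int)

print(high_level(Xa[:5], Xv[:5]))              # fused category decisions
```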


International Workshop on Acoustic Signal Enhancement | 2014

A discriminative learning approach to probabilistic acoustic source localization

Hendrik Kayser; Jörn Anemüller

Sound source localization algorithms commonly include assessment of inter-sensor (generalized) correlation functions to obtain direction-of-arrival estimates. Here, we present a classification-based method for source localization that uses discriminative support vector machine learning of correlation patterns that are indicative of source presence or absence. Subsequent probabilistic modeling generates a map of sound source presence probability in given directions. Being data-driven, the method adapts during training to characteristics of the sensor setup, such as convolution effects in non-free-field situations, and to target-signal-specific acoustic properties. Experimental evaluation was conducted with algorithm training in anechoic single-talker scenarios and test data from several reverberant multi-talker situations, together with diffuse and real-recorded background noise, respectively. Results demonstrate that the method successfully generalizes from training to test conditions. Improvement over the best of five investigated state-of-the-art angular-spectrum-based reference methods was on average about 45% in terms of relative F-measure-related error reduction.
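A hedged sketch of the pipeline the abstract outlines: GCC-PHAT correlation patterns between a microphone pair serve as features, and an SVM with probabilistic (Platt-scaled) outputs maps them to a source presence probability; in the paper's setting one would train such a classifier per candidate direction. Signals and labels below are random placeholders.

```python
import numpy as np
from sklearn.svm import SVC

def gcc_phat(x1, x2, max_lag=16):
    """Generalized cross-correlation with phase transform, kept to ±max_lag."""
    n = len(x1) + len(x2)
    X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
    cross = X1 * np.conj(X2)
    r = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n)
    return np.concatenate([r[-max_lag:], r[:max_lag + 1]])   # lags -max..+max

rng = np.random.default_rng(6)
# One classifier per candidate direction; a single illustrative one here,
# trained on GCC patterns with source present (1) vs. absent (0).
feats = np.array([gcc_phat(rng.standard_normal(512),
                           rng.standard_normal(512)) for _ in range(200)])
labels = rng.integers(0, 2, 200)                 # placeholder labels
clf = SVC(probability=True).fit(feats, labels)   # Platt-scaled outputs
p_present = clf.predict_proba(feats[:1])[0, 1]   # source presence probability
```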

Collaboration


Dive into Jörn Anemüller's collaborations.

Top Co-Authors

Frank W. Ohl
Leibniz Institute for Neurobiology

Scott Makeig
University of California

Terrence J. Sejnowski
Salk Institute for Biological Studies

Jan-Philipp Diepenbrock
Leibniz Institute for Neurobiology

Max F. K. Happel
Otto-von-Guericke University Magdeburg