Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Radoslaw Mazur is active.

Publication


Featured research published by Radoslaw Mazur.


IEEE Transactions on Audio, Speech, and Language Processing | 2015

Random regression forests for acoustic event detection and classification

Huy Phan; Marco Maaß; Radoslaw Mazur; Alfred Mertins

Despite the success of the automatic speech recognition framework in its own application field, its adaptation to the problem of acoustic event detection has resulted in limited success. In this paper, instead of treating the problem like the segmentation and classification tasks in speech recognition, we pose it as a regression task and propose an approach based on random forest regression. Furthermore, event localization in time can be efficiently handled as a joint problem. We first decompose the training audio signals into multiple interleaved superframes, which are annotated with the corresponding event class labels and their displacements to the temporal onsets and offsets of the events. For a specific event category, a random-forest regression model is learned using the displacement information. Given an unseen superframe, the learned regressor outputs continuous estimates of the onset and offset locations of the events. To deal with multiple event categories, prior to the category-specific regression phase, a superframe-wise recognition phase is performed to reject background superframes and to classify event superframes into the different event categories. Jointly posing event detection and localization as a regression problem is novel in itself, and the superior performance on the ITC-Irst and UPC-TALP databases demonstrates the efficiency and potential of the proposed approach.
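
A minimal sketch of the regression idea described above, using scikit-learn's random forest regressor. The feature extraction, superframe decomposition, and background-rejection stage are assumed to exist elsewhere; all names, shapes, and values here are illustrative placeholders, not the paper's actual pipeline:

```python
# Hedged sketch: per-category random-forest regression of event boundaries.
# Features and displacement targets are random placeholders; in the paper,
# each superframe is annotated with displacements to the event onset/offset.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 40))          # superframe features (illustrative)
y = np.abs(rng.standard_normal((200, 2)))   # [d_onset, d_offset] in frames

reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# For an unseen superframe at frame index t, the predicted displacements
# give continuous onset/offset estimates for the event containing it.
t = 50
d_on, d_off = reg.predict(X[:1])[0]
print(f"estimated onset ≈ {t - d_on:.1f}, offset ≈ {t + d_off:.1f}")
```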


IEEE Transactions on Audio, Speech, and Language Processing | 2009

An Approach for Solving the Permutation Problem of Convolutive Blind Source Separation Based on Statistical Signal Models

Radoslaw Mazur; Alfred Mertins

In this paper, we present a new algorithm for solving the permutation ambiguity in convolutive blind source separation. When transformed to the frequency domain, the convolutive source separation problem reduces to independent instantaneous separation problems in each frequency bin, which existing algorithms can solve efficiently. However, this independence leads to the problem of correctly aligning the individual bins. The new algorithm models the frequency-domain separated signals by means of the generalized Gaussian distribution and exploits the small deviation of the distribution parameters between neighboring bins to detect the correct permutations. The performance of the algorithm is demonstrated on synthetic and real-world data.
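
A toy sketch of the neighboring-bin idea, assuming bin-wise separated signals are already available. The moment ratio below is a crude stand-in for a proper generalized-Gaussian shape estimate, and the greedy bin-by-bin alignment is a simplification of the paper's procedure:

```python
# Hedged sketch: align permutations across frequency bins by tracking a
# generalized-Gaussian shape statistic and choosing, per bin, the source
# ordering that deviates least from the previous bin.
import numpy as np
from itertools import permutations

def shape_stat(x):
    # E|x| / sqrt(E|x|^2): monotone in the GGD exponent (rough surrogate)
    return np.mean(np.abs(x)) / np.sqrt(np.mean(np.abs(x) ** 2) + 1e-12)

def align_bins(S):
    # S: (n_bins, n_sources, n_frames) bin-wise separated signals (complex)
    n_bins, n_src, _ = S.shape
    prev = np.array([shape_stat(S[0, i]) for i in range(n_src)])
    for k in range(1, n_bins):
        stats = np.array([shape_stat(S[k, i]) for i in range(n_src)])
        best = min(permutations(range(n_src)),
                   key=lambda p: np.sum((stats[list(p)] - prev) ** 2))
        S[k] = S[k, list(best)]
        prev = stats[list(best)]
    return S
```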


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Combined Acoustic MIMO Channel Crosstalk Cancellation and Room Impulse Response Reshaping

Jan Ole Jungmann; Radoslaw Mazur; Markus Kallinger; Tiemin Mei; Alfred Mertins

Virtual 3-D sound can easily be delivered to a listener by binaural audio signals reproduced via headphones, which guarantees that only the correct signals reach the corresponding ears. Reproducing the binaural audio signals with two or more loudspeakers introduces the problems of crosstalk on the one hand and reverberation on the other. In crosstalk cancellation, the audio signals are fed through a network of prefilters prior to loudspeaker reproduction to ensure that only the designated signal reaches the corresponding ear of the listener. Since room impulse responses are very sensitive to spatial mismatch, and since listeners might slightly move while listening, robust designs are needed. In this paper, we present a method that jointly handles the three problems of crosstalk, reverberation reduction, and spatial robustness with respect to varying listening positions for one or more binaural source signals and multiple listeners. The proposed method is based on a multichannel room impulse response reshaping approach that optimizes a p-norm based criterion. Replacing the well-known least-squares technique with a p-norm based method employing a large value for p allows us to explicitly control the amount of crosstalk and to shape the remaining reverberation effects according to a desired decay.
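
A rough single-channel sketch of the reshaping idea under stated assumptions: the global response g = Hc (convolution of the room impulse response with the prefilter) is split by a "desired" and an "unwanted" window, and a log-ratio of weighted p-norms with a large p is reduced by gradient descent. Window shapes, lengths, and the step size are all illustrative, not the paper's design:

```python
# Hedged sketch: p-norm impulse-response reshaping by gradient descent.
import numpy as np

def reshape_prefilter(h, L=512, p=10, steps=500, lr=1e-2):
    N = len(h) + L - 1
    H = np.zeros((N, L))                       # convolution matrix: g = H @ c
    for i in range(L):
        H[i:i + len(h), i] = h
    w_d = (np.arange(N) < 64).astype(float)    # desired: early response
    w_u = 1.0 - w_d                            # unwanted: late reverberation
    c = np.zeros(L)
    c[0] = 1.0

    def grad_log_pnorm(w, g):
        # gradient of log(sum |w*g|^p) w.r.t. c, up to the constant factor p
        norm_p = np.sum(np.abs(w * g) ** p) + 1e-20
        return H.T @ (w ** p * np.sign(g) * np.abs(g) ** (p - 1)) / norm_p

    for _ in range(steps):
        g = H @ c
        # push energy out of the unwanted window, into the desired one
        c -= lr * (grad_log_pnorm(w_u, g) - grad_log_pnorm(w_d, g))
    return c
```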


workshop on applications of signal processing to audio and acoustics | 2011

On CUDA implementation of a multichannel room impulse response reshaping algorithm based on p-norm optimization

Radoslaw Mazur; Jan Ole Jungmann; Alfred Mertins

By using room impulse response shortening and shaping, it is possible to reduce reverberation effects and thereby improve speech intelligibility. This may be achieved by a prefilter that modifies the overall impulse response to have a stronger attenuation. To achieve spatial robustness, multichannel approaches have been proposed. Unfortunately, these approaches suffer from a very high computational cost and are far too slow to be of practical use in applications where filters have to be designed in real time. In this work, we tackle this drawback with a CUDA implementation and achieve a speedup of over 130 times.
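
The per-iteration bottleneck of such reshaping designs is large dense linear algebra, which maps naturally onto a GPU. A minimal illustration of the porting idea (not the paper's actual CUDA kernels), using CuPy as a drop-in NumPy replacement; all sizes are placeholders:

```python
# Hedged sketch: run the inner-loop matrix products on the GPU when a
# CUDA device and CuPy are available; otherwise fall back to NumPy.
import numpy as np
try:
    import cupy as xp                      # GPU arrays with a NumPy-like API
except ImportError:
    xp = np                                # CPU fallback keeps this runnable

L, N = 1024, 8192
H = xp.asarray(np.random.randn(N, L))      # convolution matrix of the RIR
c = xp.zeros(L)
c[0] = 1.0
g = H @ c                                  # dominant cost of each iteration
```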


international conference on acoustics, speech, and signal processing | 2009

Using the scaling ambiguity for filter shortening in convolutive blind source separation

Radoslaw Mazur; Alfred Mertins

In this paper, we propose to use the scaling ambiguity of convolutive blind source separation for shortening the unmixing filters. An often-used approach for separating convolutive mixtures is the transformation to the time-frequency domain, where an instantaneous ICA algorithm can be applied to each frequency separately. This approach leads to the so-called permutation and scaling ambiguities. While different methods for the permutation problem have been widely studied, the solution to the scaling problem is usually based on the minimal distortion principle. We propose an alternative approach that allows the unmixing filters to be as short as possible. Shorter unmixing filters suffer less from the circular-convolution effects that are inherent to unmixing approaches based on bin-wise ICA followed by permutation and scaling correction. Results for the new algorithm are shown on a real-world example.
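
A toy numerical illustration of why the scaling ambiguity matters for filter length: multiplying each frequency bin of an unmixing filter by a free complex factor leaves the bin-wise separation intact but reshapes the time-domain filter. The phase-flattening choice below is purely illustrative and not the paper's method:

```python
# Hedged toy example: per-bin scaling changes the temporal spread of the
# unmixing filter obtained by inverse FFT across frequency bins.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal(256) + 1j * rng.standard_normal(256)  # bin-wise filter

w_long = np.fft.ifft(W)                                 # random phases: spread out
w_short = np.fft.ifft(np.exp(-1j * np.angle(W)) * W)    # one admissible rescaling

tail = lambda w: float(np.sum(np.abs(w[32:]) ** 2) / np.sum(np.abs(w) ** 2))
print(f"tail energy fraction: {tail(w_long):.3f} -> {tail(w_short):.3f}")
```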


workshop on applications of signal processing to audio and acoustics | 2015

A multi-channel fusion framework for audio event detection

Huy Phan; Marco Maass; Lars Hertel; Radoslaw Mazur; Alfred Mertins

In this paper, we propose a simple yet efficient multi-channel fusion framework for joint acoustic event detection and classification. The joint problem on individual channels is posed as a regression problem to estimate event onset and offset positions. As an intermediate result, we also obtain posterior probabilities that measure the confidence that event onsets and offsets are present at a temporal position. This facilitates fusion, since the posterior probabilities of the different channels can simply be accumulated. The detection hypotheses are then determined based on the summed posterior probabilities. While the proposed fusion framework is simple and natural, it significantly outperforms all single-channel baseline systems on the ITC-Irst database. We also show that adding channels one by one to the fusion system yields performance improvements, and that the performance of the fusion system is always better than that of its individual-channel counterparts.
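
The fusion rule itself fits in a few lines. A minimal sketch, assuming each channel already produces a per-frame posterior that an event onset is present; the function name and threshold are illustrative:

```python
# Hedged sketch: accumulate per-channel posteriors, then threshold.
import numpy as np

def fuse_and_detect(posteriors, threshold=0.5):
    # posteriors: (n_channels, n_frames) onset confidences per channel
    fused = posteriors.mean(axis=0)            # accumulate evidence
    return np.flatnonzero(fused > threshold)   # candidate onset frames
```

Adding a channel simply appends a row to `posteriors`, which mirrors the paper's observation that channels can be added to the fusion system one by one.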


IEEE Transactions on Audio, Speech, and Language Processing | 2016

Learning representations for nonspeech audio events through their similarities to speech patterns

Huy Phan; Lars Hertel; Marco Maass; Radoslaw Mazur; Alfred Mertins

The human auditory system is very well matched to both human speech and environmental sounds. Therefore, the question arises whether human speech material may provide useful information for training systems that analyze nonspeech audio signals, e.g., in a classification task. To answer this question, we consider speech patterns as basic acoustic concepts that embody and represent the target nonspeech signal. To find out how similar the nonspeech signal is to speech, we classify it with a classifier trained on the speech patterns and use the classification posteriors to represent its closeness to the speech bases. The speech similarities are finally employed as a descriptor to represent the target signal. We further show that a better descriptor can be obtained by learning to organize the speech categories hierarchically with a tree structure. Furthermore, these descriptors are generic: once the speech classifier has been learned, it can be employed as a feature extractor for different datasets without retraining. Lastly, we propose an algorithm to select a subset that approximates the representation capability of the entire set of available speech patterns. We conduct experiments for the application of audio event analysis. Phone triplets from the TIMIT dataset were used as speech patterns to learn descriptors for audio events of three datasets of different complexity: UPC-TALP, Freiburg-106, and NAR. The experimental results on the event classification task show that good performance can be obtained even with a simple linear classifier. Furthermore, fusing the learned descriptors as an additional source leads to state-of-the-art performance on all three target datasets.
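
A compact sketch of the descriptor construction. The linear classifier and random placeholder features stand in for the paper's actual speech front end; only the reuse-as-feature-extractor pattern is the point here:

```python
# Hedged sketch: reuse a classifier trained on speech patterns as a feature
# extractor; its class posteriors on a nonspeech sound form the descriptor.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_speech = rng.standard_normal((500, 40))      # speech-pattern features
y_speech = rng.integers(0, 20, 500)            # 20 pattern categories

speech_clf = LogisticRegression(max_iter=1000).fit(X_speech, y_speech)

x_event = rng.standard_normal((1, 40))         # a nonspeech audio event
descriptor = speech_clf.predict_proba(x_event)[0]  # 20-dim similarity vector
```

Once trained, `speech_clf` needs no retraining per target dataset, matching the genericity claim in the abstract.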


international conference on multimedia and expo | 2015

Early event detection in audio streams

Huy Phan; Marco Maass; Radoslaw Mazur; Alfred Mertins

Audio event detection has been an active field of research in recent years. However, most of the proposed methods, if not all, analyze and detect complete events, and little attention has been paid to early detection. In this paper, we present a system that enables early audio event detection in continuous audio recordings, in which an event can be reliably recognized when only a partial duration has been observed. Our evaluation on the ITC-Irst database, one of the standard databases of the CLEAR 2006 evaluation, shows that, on the one hand, the proposed system outperforms the best baseline system by 16% and 8% in terms of detection error rate and detection accuracy, respectively; on the other hand, even partially observed events are sufficient to achieve the performance obtainable when whole events are observed.


IEEE Transactions on Audio, Speech, and Language Processing | 2017

Improved Audio Scene Classification Based on Label-Tree Embeddings and Convolutional Neural Networks

Huy Phan; Lars Hertel; Marco Maass; Philipp Koch; Radoslaw Mazur; Alfred Mertins

In this paper, we present an efficient approach for audio scene classification. We aim at learning representations for scene examples by exploring the structure of their class labels. A category taxonomy is automatically learned by collectively optimizing a tree-structured clustering of the given labels into multiple metaclasses. A scene recording is then transformed into a label-tree embedding image whose elements represent the likelihoods that the scene instance belongs to the metaclasses. We investigate classification with label-tree embedding features learned from different low-level features as well as their fusion, and show that the combination of multiple features is essential to obtain good performance. While averaging label-tree embedding images over time yields good performance, we argue that average pooling possesses an intrinsic shortcoming, and we propose an improved classification scheme to bypass this limitation. We aim at automatically learning, from these images, common templates that are useful for the classification task using simple but tailored convolutional neural networks. The trained networks are then employed as a feature extractor that matches the learned templates across a label-tree embedding image and produces the maximum matching scores as features for classification. Since audio scenes exhibit rich content, template learning and matching on low-level features would be inefficient. With label-tree embedding features, the low-level features are quantized and reduced to the likelihoods of the metaclasses, on which template learning and matching are efficient. We study both convolutional neural networks trained on stacked label-tree embedding images and multistream networks. Experimental results on the DCASE2016 and LITIS Rouen datasets demonstrate the efficiency of the proposed methods.
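
A bare-bones sketch of the matching step described above. The "templates" stand in for the convolution kernels the paper learns with a CNN; the loop implements valid cross-correlation along time followed by max pooling:

```python
# Hedged sketch: slide learned templates over a label-tree embedding image
# and keep the maximum matching score per template as a feature.
import numpy as np

def max_matching_scores(E, templates):
    # E: (n_metaclasses, n_frames);  templates: list of (n_metaclasses, width)
    scores = []
    for T in templates:
        w = T.shape[1]
        s = max(float(np.sum(E[:, t:t + w] * T))
                for t in range(E.shape[1] - w + 1))
        scores.append(s)                     # max pooling over time
    return np.array(scores)                  # features for the final classifier
```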


international conference on independent component analysis and signal separation | 2007

Solving the permutation problem in convolutive blind source separation

Radoslaw Mazur; Alfred Mertins

This paper presents a new algorithm for solving the permutation ambiguity in convolutive blind source separation. When transformed to the frequency domain, the source separation problem reduces to independent instantaneous separation problems in each frequency bin, which can be efficiently solved by existing algorithms. However, this independence leads to the problem of correctly aligning the individual bins, which is still not entirely solved. The algorithm proposed in this paper models the frequency-domain separated signals using the generalized Gaussian distribution and utilizes the small deviation of the exponent between neighboring bins for the detection of correct permutations.

Collaboration


Dive into Radoslaw Mazur's collaborations.

Top Co-Authors

Huy Phan

University of Lübeck

Tiemin Mei

Shenyang Ligong University
