Steven van de Par
University of Oldenburg
Publications
Featured research published by Steven van de Par.
EURASIP Journal on Advances in Signal Processing | 2005
Steven van de Par; Ag Armin Kohlrausch; Richard Heusdens; Jesper Jensen; Søren Holdt Jensen
Psychoacoustical models have been used extensively within audio coding applications over the past decades. Recently, parametric coding techniques have been applied to general audio, and this has created the need for a psychoacoustical model that is specifically suited for sinusoidal modelling of audio signals. In this paper, we present a new perceptual model that predicts masked thresholds for sinusoidal distortions. The model relies on signal detection theory and incorporates more recent insights about spectral and temporal integration in auditory masking. As a consequence, the model is able to predict the distortion detectability. In fact, the distortion detectability defines a (perceptually relevant) norm on the underlying signal space, which is beneficial for optimisation algorithms such as rate-distortion optimisation or linear predictive coding. We evaluate the merits of the model by combining it with a sinusoidal extraction method and compare the results with those obtained with the ISO MPEG-1 Layer I-II recommended model. Listening tests show a clear preference for the new model. More specifically, the model presented here leads to a reduction of more than 20% in the number of sinusoids needed to represent signals at a given quality level.
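To make the notion of a detectability-based norm concrete, here is a minimal Python sketch (not the published model): per-band distortion energy is divided by masker energy plus an absolute-threshold term and summed across bands, so the result behaves like a weighted squared norm of the coding error. The band grouping, calibration constant and threshold value are illustrative assumptions, not the model's parameters.

import numpy as np

def detectability(original, distorted, n_bands=32, c=1.0, athresh=1e-6):
    """Scalar detectability of the distortion (distorted - original)."""
    error = distorted - original
    spec_masker = np.abs(np.fft.rfft(original)) ** 2
    spec_error = np.abs(np.fft.rfft(error)) ** 2
    # crude auditory-like bands: logarithmically spaced groups of FFT bins
    edges = np.unique(np.logspace(0, np.log10(len(spec_masker) - 1),
                                  n_bands + 1).astype(int))
    d = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        masker_energy = spec_masker[lo:hi].sum()
        error_energy = spec_error[lo:hi].sum()
        d += error_energy / (masker_energy + athresh)  # per-band distortion-to-masker ratio
    return c * d  # acts as a weighted squared norm of the error signal

A rate-distortion loop could, for instance, compare two candidate sinusoidal approximations of the same frame and keep the one with the lower detectability.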
international conference on acoustics, speech, and signal processing | 2002
Steven van de Par; Ag Armin Kohlrausch; Ghassan Charestan; Richard Heusdens
The use of psychoacoustical masking models for audio coding applications has been widespread over the past decades. In such applications, it is typically assumed that the original input signal serves as a masker for the distortions that are introduced by the lossy coding method that is used. Such masking models are based on the peripheral bandpass filtering properties of the auditory system and basically evaluate the distortion-to-masker ratio within each auditory filter. Up to now, these models have been based on the assumption that the masking of distortions is governed by the auditory filter for which the ratio between distortion and masker is largest. This assumption, however, is not in line with some newer findings within the field of psychoacoustics. A more accurate assumption would be that the human auditory system is able to integrate distortions that are present within a range of auditory filters. In this contribution, a new model is presented which is in line with these psychoacoustical studies and which is suitable for application within an audio codec. Although this model can be used to derive a masking curve, it also gives a measure for the detectability of distortions, provided that the distortions are not too large.
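The key difference can be stated in a couple of lines. The following sketch contrasts the classical "worst filter" assumption with across-filter integration; the per-filter ratios are placeholder values, since a real model would obtain them from an auditory filterbank analysis of masker and distortion.

import numpy as np

dmr = np.array([0.02, 0.4, 0.3, 0.05])  # distortion-to-masker ratio per auditory filter (assumed values)

detectability_worst_filter = dmr.max()  # classical assumption: only the worst filter matters
detectability_integrated = dmr.sum()    # across-filter integration, in line with newer psychoacoustic data

With integration, several individually sub-threshold distortions can together become audible, which a max-based rule would miss.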
EURASIP Journal on Advances in Signal Processing | 2016
Joachim Thiemann; Menno Müller; Daniel Marquardt; Simon Doclo; Steven van de Par
Modern binaural hearing aids utilize multi-microphone speech enhancement algorithms to enhance signals in terms of signal-to-noise ratio, but they may distort the interaural cues that allow the user to localize sources, in particular those of suppressed interfering sources or background noise. In this paper, we present a novel algorithm that enhances the target signal while aiming to maintain the correct spatial rendering of both the target signal and the background noise. We use a bimodal approach, where a signal-to-noise ratio (SNR) estimator controls a binary decision mask, switching between the output signals of a binaural minimum variance distortionless response (MVDR) beamformer and scaled reference microphone signals. We show that the proposed selective binaural beamformer (SBB) can enhance the target signal while maintaining the overall spatial rendering of the acoustic scene.
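A minimal sketch of the selection idea, under the assumption that the SNR estimator and the binaural MVDR beamformer are given: per time-frequency bin, a binary mask selects either the beamformer output or a scaled reference-microphone signal. The threshold and scaling factor below are illustrative, not the paper's values.

import numpy as np

def selective_binaural_beamformer(mvdr_out, ref_mics, snr_est_db, thresh_db=0.0, scale=0.3):
    """mvdr_out, ref_mics: complex STFT tensors of shape (2, frames, bins).
       snr_est_db: estimated SNR per (frames, bins) bin."""
    mask = snr_est_db > thresh_db      # True -> target-dominated bin, use beamformer output
    mask = mask[np.newaxis, :, :]      # broadcast the decision over the two ears
    return np.where(mask, mvdr_out, scale * ref_mics)

Because noise-dominated bins pass the reference microphone signals through (only attenuated), the interaural cues of the background are preserved in those bins.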
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Eleftheria Georganti; Tobias May; Steven van de Par; Aki Härmä; John Mourjopoulos
A method to detect the distance of a speaker from a single microphone in a room environment is proposed. Several features, related to statistical parameters of speech source excitation signals, are introduced and are shown to depend on the distance between source and receiver. These features are used to train a pattern recognizer for distance detection. The method is tested using a database of speech recordings in four rooms with different acoustical properties. Performance is shown to be independent of the signal gain and level, but depends on the reverberation time and the characteristics of the room. Overall, the system performs well, especially for close distances and for rooms with low reverberation time, and it appears to be robust to small distance mismatches. Finally, a listening test is conducted in order to compare the results of the proposed method to the performance of human listeners.
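As an illustration of the overall pipeline, here is a minimal sketch: statistical features are extracted from each recording and fed to a pattern recognizer trained on labelled distances. The features used here (skewness and kurtosis of the waveform) are simple stand-ins for the paper's excitation-signal statistics, and the classifier choice is an assumption.

import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.svm import SVC

def features(x):
    return np.array([skew(x), kurtosis(x)])

def train_distance_detector(recordings, labels):
    """recordings: list of 1-D waveforms; labels: distance class per recording."""
    X = np.vstack([features(x) for x in recordings])
    return SVC(kernel="rbf").fit(X, labels)

def detect_distance(clf, x):
    return clf.predict(features(x)[np.newaxis, :])[0]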
acm symposium on applied perception | 2017
Andreas Löcken; Sarah Blum; Tim Claudius Stratmann; Uwe Gruenefeld; Wilko Heuten; Susanne Boll; Steven van de Par
While performing multiple competing tasks at the same time, e.g., when driving, assistant systems can be used to create cues to direct attention towards required information. However, poorly designed cues will interrupt or annoy users and affect their performance. Therefore, we aim to identify cues that are not missed and that trigger a quick reaction without changing the primary-task performance. We conducted a dual-task experiment in an anechoic chamber with LED-based stimuli that faded in or turned on abruptly and were placed in the periphery or in front of the subject. Additionally, a white noise sound was triggered in a third of the trials. The primary task was to react to visual stimuli presented on a screen in front. We observed significant effects on the response times in the screen task when adding sound. Further, participants responded faster to LED stimuli when they faded in.
IEEE Transactions on Audio, Speech, and Language Processing | 2016
Eugen Rasumow; Martin Hansen; Steven van de Par; Dirk Püschel; Volker Mellert; Simon Doclo; Matthias Blau
As an alternative to traditional artificial heads, it is possible to synthesize individual head-related transfer functions (HRTFs) using a so-called virtual artificial head (VAH), consisting of a microphone array with an appropriate topology and filter coefficients optimized using a narrowband least-squares cost function. The resulting spatial directivity pattern of such a VAH is known to be sensitive to small deviations of the assumed microphone characteristics, e.g., the gain, phase and/or positions of the microphones. In many beamformer design procedures, this sensitivity is reduced by imposing a white noise gain (WNG) constraint on the filter coefficients for a single desired look direction. In this paper, this constraint is shown to be inappropriate for regularizing the HRTF synthesis with multiple desired directions, and three different regularization approaches are proposed and evaluated. In the first approach, the measured deviations of the microphone characteristics are taken into account in the filter design. In the second approach, the filter coefficients are regularized using the mean WNG over all directions. The third approach additionally incorporates several frequency bins into both the optimization and the regularization. The proposed regularization approaches are compared using analytic and measured transfer functions, including random deviations. Experimental results show that the approach using multiple frequency bands, mimicking the spectral resolution of the human auditory system, yields the best robustness among the considered regularization approaches.
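To give a flavour of the narrowband least-squares design behind the VAH, the sketch below solves a regularized least-squares problem at a single frequency; the simple Tikhonov term that limits the white noise gain is a crude stand-in for the constrained designs compared in the paper, and the steering matrix, desired HRTFs and regularization weight are assumed inputs.

import numpy as np

def vah_filter(G, d, mu=1e-2):
    """G: (n_directions, n_mics) microphone transfer matrix at one frequency.
       d: (n_directions,) desired individual HRTF values at that frequency.
       Returns filter coefficients w minimizing ||G w - d||^2 + mu ||w||^2."""
    n_mics = G.shape[1]
    A = G.conj().T @ G + mu * np.eye(n_mics)
    return np.linalg.solve(A, G.conj().T @ d)

Increasing mu trades synthesis accuracy for robustness against microphone gain, phase and position deviations (equivalently, it improves the mean WNG over the desired directions).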
workshop on applications of signal processing to audio and acoustics | 2011
Tobias May; Steven van de Par; Ag Armin Kohlrausch
In this paper, we present a novel system that is able to simultaneously localize and detect a predefined number of speech sources in complex acoustic scenes based on binaural signals. The system operates in two steps: First, the acoustic scene is analyzed by a binaural front-end that detects relevant sound source activity. Second, a speech detection module selects, from a set of candidate positions, those positions that are most likely to correspond to speech. The proposed method is evaluated in simulated multi-source scenarios consisting of two speech sources, three interfering noise sources and reverberation.
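The two-step structure can be sketched as follows: the front-end yields candidate positions with activity estimates, and the speech detection module keeps the N candidates that are most likely speech. The scores below are placeholders; the actual system derives them from binaural features and a speech/non-speech model.

import numpy as np

def select_speech_sources(candidate_azimuths, speech_likelihood, n_speakers=2):
    """candidate_azimuths: detected active positions (degrees).
       speech_likelihood: speech probability per candidate, same length."""
    order = np.argsort(speech_likelihood)[::-1]   # most speech-like first
    return candidate_azimuths[order[:n_speakers]]

# Example: five active positions, two of which are speech sources
az = np.array([-60, -20, 0, 35, 80])
p_s = np.array([0.1, 0.85, 0.2, 0.9, 0.15])
print(select_speech_sources(az, p_s))             # -> [ 35 -20]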
european signal processing conference | 2015
Joachim Thiemann; Simon Doclo; Steven van de Par
Modern hearing aids often contain multiple microphones to enable the use of spatial filtering techniques for signal enhancement. To steer the spatial filtering algorithm it is necessary to localize sources of interest, which can be intelligently achieved using computational auditory scene analysis (CASA). In this article, we describe a CASA system using a binaural auditory processing model that has been extended to six channels to allow reliable localization in both azimuth and elevation, thus also distinguishing between front and back. The features used to estimate the direction are one level difference and five inter-microphone time differences of arrival (TDOAs). Initial experiments are presented that show the localization errors that can be expected with this set of features on a typical multichannel hearing aid in anechoic conditions with diffuse noise.
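A minimal sketch of that feature set: one broadband level difference between the left and right devices plus five time differences of arrival, each estimated here with GCC-PHAT between a reference microphone and one of the other five microphones. The sampling rate, microphone ordering and the use of GCC-PHAT are assumptions for illustration.

import numpy as np

def gcc_phat_tdoa(x, ref, fs=16000):
    n = len(x) + len(ref)
    X, R = np.fft.rfft(x, n), np.fft.rfft(ref, n)
    cross = X * np.conj(R)
    cc = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n)
    cc = np.concatenate((cc[-(n // 2):], cc[:n // 2 + 1]))  # center zero lag
    return (np.argmax(np.abs(cc)) - n // 2) / fs            # delay in seconds

def localization_features(mics, fs=16000, ref_idx=0, left_idx=0, right_idx=3):
    """mics: (6, n_samples) array of hearing-aid microphone signals."""
    ild = 10 * np.log10(np.sum(mics[left_idx] ** 2) /
                        (np.sum(mics[right_idx] ** 2) + 1e-12))
    tdoas = [gcc_phat_tdoa(mics[i], mics[ref_idx], fs)
             for i in range(mics.shape[0]) if i != ref_idx]
    return np.array([ild] + tdoas)  # 1 level difference + 5 TDOAs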
international conference on multimedia and expo | 2017
Christopher Seifert; Joachim Thiemann; Lukas Gerlach; Tobias Volkmar; Guillermo Payá-Vayá; Holger Blume; Steven van de Par
Localization algorithms have become of considerable interest for robot audition, acoustic navigation, teleconferencing, speaker localization, and many other applications over the last decade. In this paper, we present a real-time implementation of a Gaussian mixture model (GMM) based probabilistic sound source localization algorithm on a low-power VLIW-SIMD processor for hearing devices. The algorithm has been shown to allow for robust localization of multiple simultaneous sound sources in reverberant and noisy environments. Real-time computation for audio frames of 512 samples at 16 kHz was achieved by introducing algorithmic optimizations and hardware customizations. To the best of our knowledge, this is the first real-time capable implementation of a computationally complex GMM-based sound source localization algorithm on a low-power processor. The resulting estimated core area, excluding memory, in 40 nm low-power TSMC technology is 188,511 µm².
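For context, a minimal sketch of the GMM-based localization principle (not the optimized fixed-point implementation described above): one Gaussian mixture model per candidate direction is trained offline on frame-wise features, and at run time the direction whose GMM assigns the highest log-likelihood to the observed features is selected. The feature definition and number of mixture components are assumptions.

import numpy as np
from sklearn.mixture import GaussianMixture

def train_direction_models(features_per_direction, n_components=4):
    """features_per_direction: dict {azimuth: (n_frames, n_features) array}."""
    return {az: GaussianMixture(n_components=n_components).fit(f)
            for az, f in features_per_direction.items()}

def localize(models, frame_features):
    """frame_features: (n_frames, n_features) array for the current block."""
    scores = {az: gmm.score_samples(frame_features).sum()
              for az, gmm in models.items()}
    return max(scores, key=scores.get)  # azimuth with the highest log-likelihood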
international workshop on machine learning for signal processing | 2016
Asger Heidemann Andersen; Esther Schoenmaker; Steven van de Par
Speech Intelligibility Prediction (SIP) algorithms are becoming increasingly popular for objective evaluation of speech processing algorithms and transmission systems. Most often, SIP algorithms aim to predict the average intelligibility of an average listener in some specific listening condition. In the present work, we instead consider the aim of predicting the intelligibility of single words, i.e., we attempt to predict whether or not a subject in a listening experiment was able to correctly repeat a particular word. We base the prediction on a noisy and potentially processed/degraded recording of the spoken word (as presented to the subject), as well as a clean reference recording of the spoken word. The problem can be treated as a supervised binary classification problem of predicting whether a specific word will or will not be understood. We investigate a number of different ways to extract features from the degraded and clean speech samples. The classification is carried out by means of Fisher discriminant analysis. Despite the large variability of speech intelligibility experiments, it is possible to obtain a considerable degree of predictive power.
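A minimal sketch of the classification step: features derived from the clean and degraded recordings of each word are classified as "understood" or "not understood" with Fisher discriminant analysis. The feature extraction shown here (band-wise correlation between clean and degraded spectral envelopes) is only one illustrative possibility, not the paper's feature set, and it assumes the two recordings are time-aligned and equally long.

import numpy as np
from scipy.signal import stft
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def word_features(clean, degraded, fs=16000, n_bands=8):
    _, _, C = stft(clean, fs=fs, nperseg=512)
    _, _, D = stft(degraded, fs=fs, nperseg=512)
    C, D = np.abs(C), np.abs(D)
    bands = np.array_split(np.arange(C.shape[0]), n_bands)
    # per band: correlation over time between clean and degraded envelopes
    return np.array([np.corrcoef(C[b].mean(0), D[b].mean(0))[0, 1] for b in bands])

# X: (n_words, n_bands) feature matrix, y: 1 if the word was repeated correctly, else 0
def train_word_intelligibility_predictor(X, y):
    return LinearDiscriminantAnalysis().fit(X, y)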