Friedrich Faubel
Saarland University
Publications
Featured research published by Friedrich Faubel.
IEEE Signal Processing Letters | 2009
Friedrich Faubel; John W. McDonough; Dietrich Klakow
In this work we present a novel approach to nonlinear, non-Gaussian tracking problems based on splitting and merging Gaussian filters in order to increase the level of detail of the filtering density in likely regions of the state space and reduce it in unlikely ones. As this is only effective in the presence of nonlinearities, we describe a split control technique that prevents filters from being split if they operate in linear regions of state space. In simulations with polar measurements, the new algorithm reduced the mean square error by nearly 50% compared to the unscented Kalman filter.
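The splitting step relies on replacing one Gaussian by a mixture whose first two moments match the original. A minimal one-dimensional sketch of such a moment-preserving split (the spread parameter `delta` is an illustrative assumption, not the paper's exact split rule):

```python
import numpy as np

def split_gaussian(mu, var, delta=0.5):
    """Split N(mu, var) into two equally weighted components whose
    mixture preserves the original mean and variance exactly."""
    sigma = np.sqrt(var)
    comp_var = var * (1.0 - delta**2)  # shrink per-component variance
    return [(0.5, mu - delta * sigma, comp_var),
            (0.5, mu + delta * sigma, comp_var)]

def mixture_moments(components):
    """First two moments of a 1-D Gaussian mixture [(w, mu, var), ...]."""
    w = np.array([c[0] for c in components])
    m = np.array([c[1] for c in components])
    v = np.array([c[2] for c in components])
    mean = np.sum(w * m)
    var = np.sum(w * (v + m**2)) - mean**2
    return mean, var
```

Because the between-component variance `delta**2 * var` exactly compensates the per-component shrinkage, the split changes only the shape detail of the density, not its overall mean or variance.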
international conference on acoustics, speech, and signal processing | 2009
Friedrich Faubel; John W. McDonough; Dietrich Klakow
In this work we show how conditional mean imputation can be bounded through the use of box-truncated Gaussian distributions. That is of interest when signals or features are partly occluded by a superimposed interference, as then the noisy observation poses an upper bound. Unfortunately, the occurring integrals are not analytic. Hence an approximate solution has to be used. In the experimental section we apply the bounded approach to the reconstruction of partly occluded speech spectra and demonstrate its superiority over the unbounded case with respect to automatic speech recognition performance.
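For a single scalar Gaussian, the mean of a box-truncated distribution is available in closed form, which illustrates the quantity being approximated (this sketch covers only the scalar single-Gaussian case, not the paper's full mixture treatment):

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def truncated_mean(mu, sigma, a, b):
    """Mean of N(mu, sigma^2) truncated to the box [a, b]."""
    alpha = (a - mu) / sigma
    beta = (b - mu) / sigma
    z = Phi(beta) - Phi(alpha)  # probability mass inside the box
    return mu + sigma * (phi(alpha) - phi(beta)) / z
```

With the noisy observation acting as the upper bound `b`, the reconstructed value is pulled below `b` toward the clean-speech prior, which is the effect the bounded imputation exploits.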
international conference on acoustics, speech, and signal processing | 2010
Friedrich Faubel; John W. McDonough; Dietrich Klakow
In this work, we show how expectation maximization based simultaneous channel and noise estimation can be derived without a vector Taylor series expansion. The central idea is to approximate the distribution of all the random variables involved — that is noisy speech, clean speech, channel and noise — as one large, joint Gaussian distribution. Consequently, instantaneous estimates of the noise and channel distribution parameters can be obtained by conditioning the joint distribution on observed, noisy speech spectra. This approach allows for the combination of expectation maximization based channel and noise estimation with the unscented transform.
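The core operation described above, conditioning a joint Gaussian on an observed subvector, can be sketched as follows (variable names and index conventions are illustrative):

```python
import numpy as np

def condition_gaussian(mu, Sigma, idx_x, idx_y, y_obs):
    """Condition a joint Gaussian N(mu, Sigma) on observing the
    components idx_y at value y_obs; returns the conditional mean
    and covariance of the components idx_x."""
    mu = np.asarray(mu, float)
    Sigma = np.asarray(Sigma, float)
    mx, my = mu[idx_x], mu[idx_y]
    Sxx = Sigma[np.ix_(idx_x, idx_x)]
    Sxy = Sigma[np.ix_(idx_x, idx_y)]
    Syy = Sigma[np.ix_(idx_y, idx_y)]
    gain = Sxy @ np.linalg.inv(Syy)          # regression of x on y
    cond_mean = mx + gain @ (np.asarray(y_obs, float) - my)
    cond_cov = Sxx - gain @ Sxy.T
    return cond_mean, cond_cov
```

In the paper's setting, `idx_y` would index the observed noisy speech spectra and `idx_x` the channel and noise parameters, so each observation yields instantaneous estimates of their conditional distribution.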
2011 Joint Workshop on Hands-free Speech Communication and Microphone Arrays | 2011
Friedrich Faubel; Munir Georges; Kenichi Kumatani; Andrés Bruhn; Dietrich Klakow
In this work, we show how speech recognition performance in a noisy car environment can be improved by combining audio-visual voice activity detection (VAD) with microphone array processing techniques. This is accomplished by enhancing the multi-channel audio signal in the speaker localization step through per-channel power spectral subtraction, whose noise estimates are obtained from the non-speech segments identified by the VAD. This noise reduction step improves the accuracy of the estimated speaker positions and thereby the quality of the beamformed signal in the subsequent array processing step. Audio-visual voice activity detection has the advantage of being more robust in acoustically demanding environments. This claim is substantiated through speech recognition experiments on the AVICAR corpus, where the proposed localization framework gave a WER of 7.1% in combination with delay-and-sum beamforming. This compares to a WER of 8.9% for speaker localization with audio-only VAD, 11.6% without VAD, and 15.6% for a single distant channel.
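A minimal sketch of the per-channel power spectral subtraction step, with the noise power spectrum estimated from VAD-labelled non-speech frames (the oversubtraction factor `alpha` and flooring constant `floor` are assumed defaults, not the paper's settings):

```python
import numpy as np

def spectral_subtraction(power_spec, vad_flags, alpha=1.0, floor=0.01):
    """power_spec: (frames, bins) magnitude-squared spectra of one channel.
    vad_flags: boolean per frame, True = speech.
    The noise PSD is estimated from the non-speech frames and subtracted
    from every frame, with spectral flooring to avoid negative power."""
    noise_psd = power_spec[~vad_flags].mean(axis=0)
    enhanced = power_spec - alpha * noise_psd
    return np.maximum(enhanced, floor * power_spec)
```

Applying this per channel before time-delay estimation is what sharpens the speaker localization, since the cross-correlation peaks are less corrupted by stationary car noise.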
international conference on acoustics, speech, and signal processing | 2013
Erich Zwyssig; Friedrich Faubel; Steve Renals; Mike Lincoln
This paper presents a new corpus comprising single and overlapping speech recorded using digital MEMS and analogue microphone arrays, and reports results from speech separation and recognition experiments on this data. The corpus is a reproduction of the multi-channel Wall Street Journal audio-visual corpus (MC-WSJ-AV), containing speech recorded in both a meeting room and an anechoic chamber using two different microphone types as well as two different array geometries. The speech separation and speech recognition experiments were performed using SRP-PHAT-based speaker localisation, superdirective beamforming and multiple post-processing schemes, such as residual echo suppression and binary masking. Our simple, cMLLR-based recognition system matches the performance of state-of-the-art ASR systems on the single-speaker task and outperforms them on overlapping speech. The corpus will be made publicly available via the LDC in spring 2013.
international conference on acoustics, speech, and signal processing | 2013
Youssef Oualil; Mathew Magimai-Doss; Friedrich Faubel; Dietrich Klakow
This paper presents a novel probabilistic framework for localizing multiple speakers with a microphone array. In this framework, the generalized cross correlation function (GCC) of each microphone pair is interpreted as a probability distribution of the time difference of arrival (TDOA) and subsequently approximated as a Gaussian mixture. The distribution parameters are estimated with a weighted expectation maximization algorithm. Then, the joint distribution of the TDOA Gaussian mixtures is mapped to a multimodal distribution in the location space, where each mode represents a potential source location. The approach taken here performs the localization by 1) reducing the search space to some regions that are likely to contain a source and then 2) extracting the actual speaker locations with a numerical optimization algorithm. The effectiveness of the proposed approach is shown using the AV16.3 corpus.
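The first stage, fitting a Gaussian mixture to a GCC function interpreted as a TDOA distribution, can be sketched with a weighted EM algorithm over a TDOA grid (the deterministic initialization, iteration count, and two-component setting are illustrative assumptions, not the paper's choices):

```python
import numpy as np

def weighted_em_gmm_1d(x, w, K=2, iters=100):
    """Fit a K-component 1-D Gaussian mixture to grid locations x
    carrying non-negative weights w (e.g. a clipped, normalized GCC
    over a TDOA grid) using a weighted EM algorithm."""
    x = np.asarray(x, float)
    w = np.asarray(w, float)
    w = w / w.sum()                               # treat GCC as a pmf
    mu = np.linspace(x.min(), x.max(), K)         # deterministic init
    var = np.full(K, np.var(x))
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: component responsibilities, scaled by sample weights
        d2 = (x[:, None] - mu[None, :]) ** 2
        log_p = -0.5 * d2 / var - 0.5 * np.log(2 * np.pi * var) + np.log(pi)
        p = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        r = p / p.sum(axis=1, keepdims=True) * w[:, None]
        # M-step: weighted parameter updates
        nk = r.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / nk + 1e-9
    return pi, mu, var
```

Each fitted mean is a candidate TDOA; in the paper's framework the per-pair mixtures are then combined and mapped into the location space, where the surviving modes are refined numerically.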
international conference on acoustics, speech, and signal processing | 2010
Friedrich Faubel; Dietrich Klakow
In this work, we present a general method for approximating non-linear transformations of Gaussian mixture random variables. It is based on transforming the individual Gaussians with the unscented transform. The level of detail is adapted by iteratively splitting those components of the initial mixture that exhibited a high degree of nonlinearity during transformation. After each splitting operation, the affected components are re-transformed. This procedure gives more accurate results in cases where a Gaussian fit does not well represent the true distribution. Hence, it is of interest in a number of signal processing fields, ranging from nonlinear adaptive filtering to speech feature enhancement. In simulations, the proposed approach achieved a 48-fold reduction of the approximation error, compared to a single unscented transform.
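The building block applied to each mixture component before any splitting is the unscented transform of a single Gaussian. A one-dimensional sketch (`kappa = 2` is a conventional scaling choice, not necessarily the paper's):

```python
import numpy as np

def unscented_transform_1d(mu, var, f, kappa=2.0):
    """Propagate N(mu, var) through the nonlinearity f using the
    standard three sigma points of the 1-D unscented transform."""
    n = 1
    spread = np.sqrt((n + kappa) * var)
    points = np.array([mu, mu - spread, mu + spread])
    weights = np.array([kappa / (n + kappa),
                        0.5 / (n + kappa),
                        0.5 / (n + kappa)])
    y = f(points)
    mean = np.sum(weights * y)
    out_var = np.sum(weights * (y - mean) ** 2)
    return mean, out_var
```

For the quadratic `f(x) = x**2` and a standard normal input, the transform recovers the true output mean and variance exactly; for stronger nonlinearities the residual error is what triggers a component split in the proposed scheme.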
2009 IEEE/SP 15th Workshop on Statistical Signal Processing | 2009
Friedrich Faubel; Dietrich Klakow
In the unscented Kalman filter (UKF), the state vector is typically augmented with process and measurement noise in order to approximate the joint predictive distribution of state and observation. For that, the unscented transform is used. As its point selection mechanism changes the higher order moments between the random variables, statistical independence is not preserved. In this work, we show how statistical independence can be preserved by representing independent variables by separate point-sets. In addition to that, we show how the Kalman filter (KF) can be derived based on a particular type of linear transform that allows for a more uniform treatment of KF and UKF.
international conference on acoustics, speech, and signal processing | 2013
Kenichi Kumatani; Rita Singh; Friedrich Faubel; John W. McDonough; Youssef Oualil
Adaptation techniques for speech recognition are very effective in single-speaker scenarios. However, when distant microphones capture overlapping speech from multiple speakers, conventional speaker adaptation methods are less effective. The putative signal for any speaker contains interference from the other speakers. Consequently, any adaptation technique adapts the model to the interfering speakers as well, which degrades recognition performance for the desired speaker. In this work, we develop a new feature-space adaptation method for overlapping speech. We first build a beamformer to enhance the speech of each active speaker. After that, we compute speech feature vectors from the output of each beamformer. We then jointly transform the feature vectors of all speakers to maximize the likelihood of their respective acoustic models. Experiments run on the speech separation challenge data collected under the AMI project demonstrate the effectiveness of our adaptation method. An absolute word error rate (WER) reduction of up to 14% was achieved in the case of delay-and-sum beamforming. With minimum mutual information (MMI) beamforming, our adaptation method achieved a WER of 31.5%. To the best of our knowledge, this is the lowest WER reported on this task.
2011 Joint Workshop on Hands-free Speech Communication and Microphone Arrays | 2011
Barbara Rauch; Friedrich Faubel; Dietrich Klakow
This work extends a beamforming algorithm intended for automatic recognition of speech captured with an array of distant microphones. In addition to enforcing a distortionless constraint in the desired direction, the algorithm adjusts the sensor weights so as to maximize a negentropy criterion. Negentropy measures how non-Gaussian the probability density function (pdf) of a random variable is, and its computation therefore depends on a number of pdf parameters. Here, time-dependent pdf parameters are introduced to account for the nonstationarity of speech. Several estimation methods are evaluated in a set of far-field ASR experiments. It is found that phone-length estimation windows lead to an increase in word error rate, and an analysis is provided that clarifies the reason for this behavior. Most importantly, we provide evidence that negentropy may not be an ideal cost criterion, not only when using phone-dependent parameters but also in the original system.