Balázs Fodor
Braunschweig University of Technology
Publication
Featured research published by Balázs Fodor.
International Conference on Acoustics, Speech, and Signal Processing | 2012
Balázs Fodor; Tim Fingscheidt
Several investigations have shown that speech enhancement approaches can be improved by speech presence uncertainty (SPU) estimation. Although there has been a strong focus over the last few decades on using correct statistical models for spectral weighting rules, there are only a few publications about SPU estimation based on a speech prior that is consistent with the spectral weighting rule. This contribution presents a new consistent solution for MMSE speech amplitude (SA) estimation under SPU, based on the generalized gamma distribution, which represents a variety of speech priors. Employing the gamma speech model, which is a special case of the generalized gamma distribution, the new approach is shown to outperform both the SPU-based MMSE-SA estimator relying on a Gaussian speech prior and gamma MMSE-SA estimation without SPU.
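The general idea of estimation under SPU can be sketched as follows. This is a minimal illustration, not the estimator from the paper: it assumes the speech amplitude is zero when speech is absent, so the MMSE estimate is the speech-present estimate scaled by the a posteriori speech presence probability. The names `wiener_gain` and `soft_gain` are hypothetical, and the Wiener gain is only a simple stand-in for the gamma-prior MMSE-SA gain, which is not reproduced here.

```python
def wiener_gain(xi):
    """Stand-in spectral gain under speech presence (NOT the gamma MMSE-SA gain).

    xi is the a priori SNR (linear scale, not dB).
    """
    return xi / (1.0 + xi)


def soft_gain(xi, spp, gain_h1=wiener_gain):
    """Gain under speech presence uncertainty.

    Assumes E[A | y] = P(H1 | y) * E[A | y, H1] + P(H0 | y) * 0,
    i.e. the clean amplitude A is zero when speech is absent (H0).
    spp is the a posteriori speech presence probability P(H1 | y).
    """
    return spp * gain_h1(xi)


# Usage: scale the noisy amplitude of one time-frequency bin.
noisy_amplitude = 2.0
enhanced_amplitude = soft_gain(xi=1.0, spp=0.5) * noisy_amplitude
```

Any other speech-present gain function can be passed as `gain_h1`, provided the speech presence probability is derived from the same speech prior, which is the consistency requirement the abstract emphasizes.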
International Conference on Acoustics, Speech, and Signal Processing | 2011
Balázs Fodor; Tim Fingscheidt
In many applications, non-stationary Gaussian or stationary non-Gaussian noise can be observed. In this paper, we present a joint maximum a posteriori (JMAP) estimation of spectral amplitude and phase. In principle, it allows for arbitrary speech models (Gaussian, super-Gaussian, …), while the pdf of the noise DFT coefficients is modeled as a Gaussian mixture (GMM). Such a GMM covers both a non-Gaussian stationary noise process and a non-stationary process that switches between Gaussian noise modes of different variance, with probabilities given by the GMM weights. Accordingly, we provide results for these two types of noise, showing superiority over the Gaussian noise model JMAP estimator even in the case of ideal noise power estimation.
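The GMM noise model described above can be illustrated with a short sketch. It assumes zero-mean, circularly symmetric complex Gaussian modes, which the abstract does not state explicitly; the function names and the example weights and variances are hypothetical. It shows only the noise model, not the JMAP estimator itself.

```python
import math
import random


def gmm_noise_pdf(n, weights, variances):
    """pdf of a complex noise DFT coefficient n under a zero-mean Gaussian mixture.

    Each mode m is a circular complex Gaussian with variance variances[m],
    selected with probability weights[m] (weights must sum to 1).
    """
    power = abs(n) ** 2
    return sum(
        c / (math.pi * v) * math.exp(-power / v)
        for c, v in zip(weights, variances)
    )


def sample_gmm_noise(weights, variances, rng):
    """Draw one noise coefficient: pick a mode by its weight, then sample it."""
    variance = rng.choices(variances, weights=weights, k=1)[0]
    sigma = math.sqrt(variance / 2.0)  # variance splits evenly over real/imag
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


# Usage: two noise modes of different variance (illustrative values only).
rng = random.Random(0)
weights, variances = [0.5, 0.5], [1.0, 2.0]
coefficient = sample_gmm_noise(weights, variances, rng)
density = gmm_noise_pdf(coefficient, weights, variances)
```

Drawing the mode independently for every coefficient yields a stationary non-Gaussian process, while holding a mode over a stretch of frames corresponds to the non-stationary interpretation mentioned in the abstract.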
EURASIP Journal on Advances in Signal Processing | 2015
Balázs Fodor; Florian Pflug; Tim Fingscheidt
Speech enhancement and error concealment have seen considerable progress over the past decades. Although both fields deal with distorted speech signals, there has rarely been an attempt to relate the respective approaches to each other. In this paper, for the first time, a clear synopsis of recursive minimum mean square error (MMSE) estimation in both fields is provided. Our work intentionally neither proposes an algorithm that advances the state of the art nor provides simulation results of algorithms. Instead, our aim is threefold: First, we revisit the basics of Bayes estimation in a recursive manner, covering both kinds of distortion: acoustic noise and transmission channel noise. Second, we present recursive MMSE estimation applied to speech enhancement (in the frequency domain, as typical) and applied to error concealment (in the time domain, as typical) in strictly coherent notation and provide respective overview diagrams. Finally, we discuss commonalities and differences between both approaches, identify a particular strength of error concealment in general, and provide possible research directions for speech enhancement. A particularly interesting observation is that noise introduced by error concealment is far from being Gaussian and that additive acoustic noise can be expressed in terms of bit errors in DFT coefficients, providing a potential interface to error concealment approaches.
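Recursive Bayes estimation with an MMSE output can be sketched generically for a parameter that takes one of a few discrete values and follows a first-order Markov model. This is a textbook-style illustration under those assumptions, not the notation or any specific algorithm from the paper; `recursive_mmse` and its arguments are hypothetical names.

```python
def recursive_mmse(values, initial, transition, likelihoods):
    """Recursive Bayes filter over discrete states with MMSE estimates.

    values:      candidate parameter values, one per state
    initial:     belief over states before the first observation's prediction step
    transition:  transition[i][j] = P(state j now | state i previously)
    likelihoods: per time step, a list with p(observation | state j)
    Returns the MMSE estimate (posterior mean) for each time step.
    """
    num_states = len(values)
    posterior = list(initial)
    estimates = []
    for lik in likelihoods:
        # Prediction: propagate the previous posterior through the Markov model.
        predicted = [
            sum(posterior[i] * transition[i][j] for i in range(num_states))
            for j in range(num_states)
        ]
        # Update: weight by the observation likelihood and normalize.
        unnormalized = [predicted[j] * lik[j] for j in range(num_states)]
        total = sum(unnormalized)
        posterior = [u / total for u in unnormalized]
        # MMSE estimate: mean of the parameter under the posterior.
        estimates.append(sum(v * p for v, p in zip(values, posterior)))
    return estimates


# Usage: two states that never change, observed twice with the same likelihood.
estimates = recursive_mmse(
    values=[0.0, 1.0],
    initial=[0.5, 0.5],
    transition=[[1.0, 0.0], [0.0, 1.0]],
    likelihoods=[[0.25, 0.75], [0.25, 0.75]],
)
```

In this sketch the observation likelihood is the only place where the type of distortion enters, so acoustic noise and transmission channel noise would differ in how `likelihoods` is computed rather than in the recursion itself.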
International Workshop on Acoustic Signal Enhancement | 2014
Balázs Fodor; Timo Gerkmann
Explicit information about speech presence or absence is needed in many speech processing applications. In a Bayesian estimation framework, this information can be provided by an a posteriori speech presence probability (SPP) estimator. Recent improvements in SPP estimation include likelihoods of speech presence based on a super-Gaussian speech model or, alternatively, based on averaged observations. In this paper, we combine these aspects and derive a closed-form solution for the likelihood of speech presence based on both averaged observations and a super-Gaussian speech model. The new approach is shown to outperform competing methods that include either averaging or super-Gaussian speech models, but not both.
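For orientation, the a posteriori SPP takes a simple closed form under the baseline assumption that both speech and noise DFT coefficients are complex Gaussian. The sketch below implements only that Gaussian baseline for a single, non-averaged observation; it is not the super-Gaussian, averaged-observation solution derived in the paper. The function and parameter names are hypothetical.

```python
import math


def spp_gaussian(gamma, xi, p_h1=0.5):
    """A posteriori speech presence probability P(H1 | y), Gaussian baseline.

    gamma: a posteriori SNR |y|^2 / noise_variance (linear scale)
    xi:    a priori SNR assumed under speech presence H1 (linear scale)
    p_h1:  prior probability of speech presence
    Derived from Bayes' rule with complex Gaussian likelihoods:
      p(y | H0) ~ exp(-gamma), p(y | H1) ~ exp(-gamma / (1 + xi)) / (1 + xi).
    """
    prior_ratio = (1.0 - p_h1) / p_h1
    inverse_likelihood_ratio = (1.0 + xi) * math.exp(-gamma * xi / (1.0 + xi))
    return 1.0 / (1.0 + prior_ratio * inverse_likelihood_ratio)


# Usage: a weak observation gives a low SPP, a strong one an SPP close to 1.
low = spp_gaussian(gamma=0.0, xi=1.0)
high = spp_gaussian(gamma=50.0, xi=10.0)
```

Replacing the Gaussian speech likelihood with a super-Gaussian one, or replacing the single observation with an averaged one, changes the likelihood ratio inside this expression while the Bayes' rule structure stays the same.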
Archive | 2012
Balázs Fodor; David Scheler; Tim Fingscheidt
The obligation to press a push-to-speak button before issuing a voice command to a speech dialog system is not only inconvenient but also decreases recognition accuracy if the user starts speaking prematurely. In this chapter, we investigate the performance of a so-called talk-and-push (TAP) system, which permits the user to begin an utterance within a certain time frame before or after pressing the button. This is achieved using a speech signal buffer in conjunction with an acoustic echo cancellation unit and a combined noise reduction and start-of-utterance detection. In comparison with a state-of-the-art system employing loudspeaker muting, the TAP system delivers significant improvements in the word error rate.
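The speech signal buffer mentioned above can be pictured as a fixed-length ring buffer that always holds the most recent audio, so speech uttered shortly before the button press is still available. The following is a minimal sketch of that idea only; the class name, the frame representation, and the buffer length are assumptions, and echo cancellation, noise reduction, and start-of-utterance detection are not modeled.

```python
from collections import deque


class PrePressBuffer:
    """Keeps the most recent audio frames so early speech is not lost."""

    def __init__(self, max_frames):
        # A deque with maxlen silently discards the oldest frame when full.
        self._frames = deque(maxlen=max_frames)

    def push(self, frame):
        """Store one incoming audio frame (called continuously)."""
        self._frames.append(frame)

    def on_button_press(self):
        """Return the buffered frames, oldest first, and empty the buffer."""
        frames = list(self._frames)
        self._frames.clear()
        return frames


# Usage: with room for three frames, only the last three survive.
buffer = PrePressBuffer(max_frames=3)
for frame_id in range(1, 6):
    buffer.push(frame_id)
buffered = buffer.on_button_press()
```

In a complete system, the returned frames would be passed on to the recognizer ahead of the audio captured after the button press.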
Speech Communication; 10. ITG Symposium; Proceedings of | 2012
Balázs Fodor; Tim Fingscheidt
European Signal Processing Conference | 2011
Balázs Fodor; Tim Fingscheidt
European Signal Processing Conference | 2014
Balázs Fodor; Timo Gerkmann
Speech Communication; 10. ITG Symposium; Proceedings of | 2012
Balázs Fodor; Tim Fingscheidt
Acoustic Signal Enhancement; Proceedings of IWAENC 2012; International Workshop on | 2012
Balázs Fodor; Tim Fingscheidt