Maja Taseska
University of Erlangen-Nuremberg
Publications
Featured research published by Maja Taseska.
IEEE Transactions on Audio, Speech, and Language Processing | 2014
Oliver Thiergart; Maja Taseska; Emanuel A. P. Habets
Extracting desired source signals in noisy and reverberant environments is required in many hands-free communication systems. In practical situations, where the position and number of active sources may be unknown and time-varying, conventional implementations of spatial filters do not provide sufficiently good performance. Recently, informed spatial filters have been introduced that incorporate almost instantaneous parametric information on the sound field, thereby enabling adaptation to new acoustic conditions and moving sources. In this contribution, we propose a spatial filter which generalizes the recently proposed informed linearly constrained minimum variance filter and informed minimum mean square error filter. The proposed filter uses multiple direction-of-arrival estimates and second-order statistics of the noise and diffuse sound. To determine those statistics, an optimal diffuse power estimator is proposed that outperforms state-of-the-art estimators. Extensive performance evaluation demonstrates the effectiveness of the proposed filter in dynamic acoustic conditions. For this purpose, we have considered a challenging scenario which consists of quickly moving sound sources during double-talk. The performance of the proposed spatial filter was evaluated in terms of objective measures including segmental signal-to-reverberation ratio and log spectral distance, and by means of a listening test confirming the objective results.
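The informed filter described above generalizes the informed LCMV filter, combining multiple DOA constraints with second-order statistics of noise and diffuse sound. As a rough illustration of the LCMV building block (not the paper's filter), the closed-form solution w = R⁻¹C(CᴴR⁻¹C)⁻¹g can be sketched in NumPy; the linear array geometry, frequency, and identity noise PSD matrix below are placeholder assumptions:

```python
import numpy as np

def steering_vector(doa_rad, mic_pos, freq, c=343.0):
    """Far-field steering vector for a linear array (assumed plane-wave model)."""
    delays = mic_pos * np.sin(doa_rad) / c
    return np.exp(-2j * np.pi * freq * delays)

def lcmv_filter(R, C, g):
    """LCMV weights w = R^-1 C (C^H R^-1 C)^-1 g, so that C^H w = g."""
    Ri_C = np.linalg.solve(R, C)
    return Ri_C @ np.linalg.solve(C.conj().T @ Ri_C, g)

# example: 4-mic linear array, distortionless response at DOA 0 rad,
# null at DOA 0.5 rad (both DOAs and the geometry are illustrative)
mic_pos = np.array([0.0, 0.04, 0.08, 0.12])
f = 1000.0
C = np.stack([steering_vector(0.0, mic_pos, f),
              steering_vector(0.5, mic_pos, f)], axis=1)
g = np.array([1.0, 0.0])
R = np.eye(4)  # placeholder noise-plus-diffuse PSD matrix
w = lcmv_filter(R, C, g)
print(np.abs(w.conj() @ C[:, 0]))  # ~1: desired DOA passed undistorted
print(np.abs(w.conj() @ C[:, 1]))  # ~0: interfering DOA nulled
```

In the informed filters, R and the constraint set C would be updated per time-frequency bin from the estimated DOAs and diffuse-power statistics, rather than fixed as here.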
Journal of the Acoustical Society of America | 2012
Giovanni Del Galdo; Maja Taseska; Oliver Thiergart; Jukka Ahonen; Ville Pulkki
Measuring the degree of diffuseness of a sound field is crucial in many modern parametric spatial audio techniques. In these applications, intensity-based diffuseness estimators are particularly convenient, as the sound intensity can also be used to obtain, e.g., the direction of arrival of the sound. This contribution reviews different diffuseness estimators comparing them under the conditions found in practice, i.e., with arrays of noisy microphones and with the expectation operators substituted by finite temporal averages. The estimators show a similar performance, however, each with specific advantages and disadvantages depending on the scenario. Furthermore, the paper derives an estimator and highlights the possibility of using spatial averaging to improve the temporal resolution of the estimates.
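A common intensity-based estimator of the form reviewed here is Ψ = 1 − ‖E{Re{p*v}}‖ / E{(|p|² + ‖v‖²)/2}, with the expectations replaced by finite temporal averages. A minimal sketch, assuming pressure and particle velocity already in consistent normalized units:

```python
import numpy as np

def diffuseness(p, v):
    """Intensity-based diffuseness estimate over a block of T snapshots.
    p: (T,) complex pressure; v: (T, 3) complex particle velocity
    (assumed normalized to the same units as p)."""
    intensity = np.real(np.conj(p)[:, None] * v)   # (T, 3) active intensity
    energy = 0.5 * (np.abs(p) ** 2 + np.sum(np.abs(v) ** 2, axis=1))
    # expectation operators substituted by finite temporal averages
    return 1.0 - np.linalg.norm(intensity.mean(axis=0)) / energy.mean()

# single plane wave: velocity fully aligned with one direction
rng = np.random.default_rng(0)
p = rng.standard_normal(512) + 1j * rng.standard_normal(512)
d = np.array([1.0, 0.0, 0.0])
v = p[:, None] * d
print(diffuseness(p, v))  # ≈ 0 for an ideal plane wave
```

A fully diffuse field, with intensity vectors of random orientation, would instead drive the averaged intensity toward zero and the estimate toward 1.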
IEEE Transactions on Audio, Speech, and Language Processing | 2014
Maja Taseska; Emanuel A. P. Habets
Hands-free acquisition of speech is required in many human-machine interfaces and communication systems. The signals received by integrated microphones contain a desired speech signal, spatially coherent interfering signals, and background noise. In order to enhance the desired speech signal, state-of-the-art techniques apply data-dependent spatial filters which require the second-order statistics (SOS) of the desired signal, the interfering signals, and the background noise. As the number of sources and the reverberation time increase, the estimation accuracy of the SOS deteriorates, often resulting in insufficient noise and interference reduction. In this paper, a signal extraction framework with distributed microphone arrays is developed. An expectation maximization (EM)-based algorithm detects the number of coherent speech sources and estimates source clusters using time-frequency (TF) bin-wise position estimates. Subsequently, the SOS are estimated using a bin-wise speech presence probability (SPP) and a source probability for each source. Finally, a desired source is extracted using a minimum variance distortionless response (MVDR) filter, a multichannel Wiener filter (MWF), and a parametric multichannel Wiener filter (PMWF). The same framework can be employed for source separation, where a spatial filter is computed for each source considering the remaining sources as interferers. Evaluation using simulated and measured data demonstrates the effectiveness of the framework in estimating the number of sources, clustering, signal enhancement, and source separation.
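The clustering step can be illustrated with a plain EM algorithm for a spherical Gaussian mixture over 2-D position estimates. This is a simplified stand-in for the paper's method (which also detects the number of sources); the data, initialization, and component count here are illustrative assumptions:

```python
import numpy as np

def em_cluster(pos, K, iters=50):
    """EM for a K-component spherical Gaussian mixture over bin-wise
    position estimates. Init: K evenly spaced data points (naive but
    deterministic); variances from distances to the previous means."""
    mu = pos[np.linspace(0, len(pos) - 1, K).astype(int)].copy()
    var = np.ones(K)
    w = np.full(K, 1.0 / K)
    for _ in range(iters):
        d2 = ((pos[:, None, :] - mu[None]) ** 2).sum(-1)   # (N, K) sq. distances
        logp = np.log(w) - d2 / (2 * var) - np.log(var)    # 2-D spherical Gaussian
        g = np.exp(logp - logp.max(1, keepdims=True))
        g /= g.sum(1, keepdims=True)                       # posterior source probs
        nk = g.sum(0)
        mu = (g.T @ pos) / nk[:, None]
        var = (g * d2).sum(0) / (2 * nk) + 1e-6
        w = nk / len(pos)
    return mu, g

rng = np.random.default_rng(1)
pos = np.vstack([rng.normal([0, 0], 0.1, (200, 2)),   # source 1 positions
                 rng.normal([2, 1], 0.1, (200, 2))])  # source 2 positions
mu, g = em_cluster(pos, K=2)
print(np.round(mu[np.argsort(mu[:, 0])], 1))  # ≈ [[0 0], [2 1]]
```

In the full framework, the posteriors g would then weight the recursive SOS estimates for each detected source.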
IEEE Signal Processing Magazine | 2015
Konrad Kowalczyk; Oliver Thiergart; Maja Taseska; Giovanni Del Galdo; Ville Pulkki; Emanuel A. P. Habets
Flexible and efficient spatial sound acquisition and subsequent processing are of paramount importance in communication and assisted listening devices such as mobile phones, hearing aids, smart TVs, and emerging wearable devices (e.g., smart watches and glasses). In application scenarios where the number of sound sources quickly varies, sources move, and nonstationary noise and reverberation are commonly encountered, it remains a challenge to capture sounds in such a way that they can be reproduced with a high and invariable sound quality. In addition, the objective in terms of what needs to be captured, and how it should be reproduced, depends on the application and on the user's preferences. Parametric spatial sound processing has been around for two decades and provides a flexible and efficient solution to capture, code, and transmit, as well as manipulate and reproduce spatial sounds.
IEEE Transactions on Audio, Speech, and Language Processing | 2013
Oliver Thiergart; Giovanni Del Galdo; Maja Taseska; Emanuel A. P. Habets
Traditional spatial sound acquisition aims at capturing a sound field with multiple microphones such that at the reproduction side a listener can perceive the sound image as it was at the recording location. Standard techniques for spatial sound acquisition usually use spaced omnidirectional microphones or coincident directional microphones. Alternatively, microphone arrays and spatial filters can be used to capture the sound field. From a geometric point of view, the perspective of the sound field is fixed when using such techniques. In this paper, a geometry-based spatial sound acquisition technique is proposed to compute virtual microphone signals that manifest a different perspective of the sound field. The proposed technique uses a parametric sound field model that is formulated in the time-frequency domain. It is assumed that each time-frequency instant of a microphone signal can be decomposed into one direct and one diffuse sound component. It is further assumed that the direct component is the response of a single isotropic point-like source (IPLS) of which the position is estimated for each time-frequency instant using distributed microphone arrays. Given the sound components and the position of the IPLS, it is possible to synthesize a signal that corresponds to a virtual microphone at an arbitrary position and with an arbitrary pick-up pattern.
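The synthesis step can be sketched for one time-frequency instant: given the direct and diffuse components and the IPLS position, the virtual microphone applies a distance gain and a first-order pickup pattern. This is a hypothetical simplification of the paper's geometry-based technique; the 1/r attenuation model, reference distance, and pattern parameter are assumptions:

```python
import numpy as np

def virtual_mic(direct, diffuse, ipls_pos, vm_pos, vm_look, alpha=0.5, ref_dist=1.0):
    """One TF instant of a virtual microphone signal (simplified sketch).
    direct: direct-sound component referenced to ref_dist from the IPLS;
    alpha sets a first-order pickup pattern (0.5 = cardioid)."""
    r = ipls_pos - vm_pos
    dist = np.linalg.norm(r)
    u = r / dist                                   # unit vector toward the IPLS
    pattern = alpha + (1 - alpha) * np.dot(vm_look / np.linalg.norm(vm_look), u)
    # 1/r distance attenuation for the point source; diffuse part passed through
    return pattern * (ref_dist / dist) * direct + diffuse

s = virtual_mic(direct=1.0 + 0j, diffuse=0.1 + 0j,
                ipls_pos=np.array([2.0, 0.0]), vm_pos=np.array([0.0, 0.0]),
                vm_look=np.array([1.0, 0.0]))
print(s)  # cardioid facing the source at 2 m: 1.0 * 0.5 * 1 + 0.1 = 0.6
```

Repeating this per TF instant, with the IPLS position re-estimated each time, yields the virtual microphone signal at an arbitrary position and pick-up pattern.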
IEEE Transactions on Audio, Speech, and Language Processing | 2016
Maja Taseska; Emanuel A. P. Habets
Hands-free capture of speech often requires extraction of sources from a certain spot of interest (SOI), while reducing interferers and background noise. Although state-of-the-art spatial filters are fully data-dependent and computed using the power spectral density (PSD) matrices of the desired and the undesired signals, the existing solutions to extract sources from a SOI are only partially data-dependent. Estimating the time-varying PSD matrices from the data is a challenging problem, especially in dynamic and quickly time-varying acoustic scenes. Hence, the spot signal statistics are often pre-computed based on a near-field propagation model, resulting in suboptimal filters. In this work, we propose a fully data-dependent spatial filtering framework for extraction of speech signals that originate from a SOI. To achieve position-based spatial selectivity, distributed arrays are used, which offer larger spatial diversity compared to arrays of closely spaced microphones. The PSD matrices of the desired and the undesired signals are updated at each time-frequency bin by using a minimum Bayes risk detector that is based on a probabilistic model of narrowband position estimates. The proposed framework is applicable in challenging multitalk situations, without requiring any prior information, except the geometry, location, and orientation of the arrays.
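The per-bin decision driving the PSD-matrix updates can be illustrated with a two-hypothesis minimum-Bayes-risk detector over a narrowband position estimate. The Gaussian position model, variances, and cost values below are illustrative assumptions, not the paper's exact detector:

```python
import numpy as np

def bayes_detect(z, mu_s, var_s, mu_i, var_i, prior_s=0.5, c_fa=1.0, c_miss=1.0):
    """Minimum-Bayes-risk decision for one TF bin: is the 2-D position
    estimate z generated by the spot of interest (mu_s) or by the
    undesired region (mu_i)? c_fa / c_miss weight the two error types."""
    def gauss(z, mu, var):
        d2 = np.sum((z - mu) ** 2)
        return np.exp(-d2 / (2 * var)) / (2 * np.pi * var)
    ls = gauss(z, mu_s, var_s) * prior_s          # SOI likelihood x prior
    li = gauss(z, mu_i, var_i) * (1 - prior_s)    # undesired likelihood x prior
    # decide "desired" when the expected cost of a miss exceeds a false alarm
    return c_miss * ls > c_fa * li

z_near = np.array([0.1, 0.0])
z_far = np.array([1.9, 1.0])
mu_s, mu_i = np.array([0.0, 0.0]), np.array([2.0, 1.0])
print(bayes_detect(z_near, mu_s, 0.2, mu_i, 0.2))  # True: bin assigned to the SOI
print(bayes_detect(z_far, mu_s, 0.2, mu_i, 0.2))   # False: assigned to the interferer
```

Raising c_miss relative to c_fa makes the detector more protective of the desired source at the price of slower undesired-statistics updates.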
workshop on applications of signal processing to audio and acoustics | 2013
Maja Taseska; Emanuel A. P. Habets
Extracting sounds that originate from a specific location, while reducing noise and interferers, is required in many hands-free communication systems. We propose a spotforming approach that uses distributed microphone arrays and aims at extracting sounds that originate from a pre-defined spot of interest (SOI), while reducing background noise and sounds that originate from outside the SOI. The spotformer is realized as a linear spatial filter, which is based on the signal statistics of sounds from the SOI, the signal statistics of sounds outside the SOI, and the background noise signal statistics. The required signal statistics are estimated from the microphone signals, while taking into account the uncertainty in the location estimates of the desired and the interfering sound sources. The applicability of the method is demonstrated by simulations and the quality of the extracted signal is evaluated in different scenarios.
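Estimating such signal statistics from the microphone signals is typically done with a probability-weighted recursive average per time-frequency bin. A minimal sketch of that standard update rule (the smoothing constant and signals below are placeholders):

```python
import numpy as np

def update_psd(R, x, p, alpha=0.9):
    """Probability-weighted recursive PSD-matrix update for one TF bin.
    Effective smoothing alpha' = alpha + (1 - alpha) * (1 - p): with p = 0
    the estimate is left untouched, with p = 1 it is fully updated."""
    a = alpha + (1 - alpha) * (1 - p)
    return a * R + (1 - a) * np.outer(x, x.conj())

R = np.eye(2, dtype=complex)                 # current PSD-matrix estimate
x = np.array([1.0 + 1j, 0.5 - 0.5j])         # multichannel STFT vector
R1 = update_psd(R, x, p=1.0)  # bin assigned to this class: smooth toward x x^H
R0 = update_psd(R, x, p=0.0)  # bin assigned elsewhere: estimate unchanged
print(np.allclose(R0, R))     # True
```

In a spotformer, separate matrices for SOI sound, out-of-SOI sound, and background noise would each be updated this way with their own per-bin probabilities.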
International Conference on Acoustics, Speech, and Signal Processing | 2013
Maja Taseska; Emanuel A. P. Habets
A scenario with multiple talkers and additive background noise is considered, where some talkers are active simultaneously and the activity of the talkers changes with time. We propose a minimum mean square error (MMSE)-based method to blindly extract any talker using bin-wise position estimates obtained from distributed microphone arrays. In order to distinguish between different talkers, the position estimates are clustered using the expectation maximization algorithm. The resulting posterior probabilities are used to estimate the power spectral density (PSD) matrices of the talkers and to compute an MMSE-optimal linear filter for extracting each talker. We evaluate the performance of the proposed method in terms of noise and interference reduction and distortion of the desired speech signal at the output of a multichannel Wiener filter.
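Given the estimated PSD matrices, the multichannel Wiener filter for a reference channel has the closed form h = (R_s + R_n)⁻¹ R_s e_ref. A sketch under a rank-1 desired-source model, with an illustrative steering vector and noise level:

```python
import numpy as np

def mwf(R_s, R_n, ref=0):
    """Multichannel Wiener filter for the reference channel:
    h = (R_s + R_n)^-1 R_s e_ref."""
    return np.linalg.solve(R_s + R_n, R_s[:, ref])

# rank-1 desired-source PSD matrix from a 2-mic steering vector d
d = np.array([1.0, np.exp(-1j * 0.8)])
R_s = 4.0 * np.outer(d, d.conj())   # source PSD 4 at the reference mic
R_n = 0.1 * np.eye(2)               # white noise-plus-interference PSD matrix
h = mwf(R_s, R_n)
print(np.abs(h.conj() @ d))  # ≈ 0.99: slightly below 1, since the MMSE
                             # solution trades a little distortion for noise reduction
```

Replacing each talker's R_n with the sum of the other talkers' PSD matrices plus noise turns the same formula into the source-separation filter described above.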
IEEE Transactions on Audio, Speech, and Language Processing | 2014
Daniel P. Jarrett; Maja Taseska; Emanuel A. P. Habets; Patrick A. Naylor
In noise reduction, a common approach is to use a microphone array with a beamformer that combines the individual microphone signals to extract a desired speech signal. The beamformer weights usually depend on the statistics of the noise and desired speech signals, which cannot be directly observed and must be estimated. Estimators based on the speech presence probability (SPP) seek to update the statistics estimates only when desired speech is known to be absent or present. However, they do not normally distinguish between desired and undesired speech sources. In this contribution, an algorithm is proposed to distinguish between these two types of sources using additional spatial information, by estimating a desired speech presence probability based on the combination of a multichannel SPP and a direction of arrival (DOA) based probability. The DOA-based probability is computed using DOA estimates for each time-frequency bin. The estimated statistics are then used to compute the weights of a spherical harmonic domain tradeoff beamformer, which achieves a balance between noise reduction and speech distortion. The performance evaluation demonstrates the effectiveness of the proposed approach at suppressing both background noise and spatially coherent noise. A number of audio examples and sample spectrograms are also provided.
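The combination of the multichannel SPP with the DOA-based probability can be sketched as a simple product, with an angular weighting concentrated around the desired DOA. The von Mises-style weighting and its concentration parameter are illustrative assumptions, not the paper's exact model:

```python
import numpy as np

def desired_speech_prob(p_spp, doa_est, doa_des, kappa=10.0):
    """Desired-speech presence probability for one TF bin: multichannel
    SPP gated by a DOA-based probability (von Mises-style weighting
    around the desired DOA; kappa sets the angular selectivity)."""
    p_doa = np.exp(kappa * (np.cos(doa_est - doa_des) - 1.0))
    return p_spp * p_doa

print(round(desired_speech_prob(0.9, 0.0, 0.0), 3))       # 0.9: on-target speech
print(round(desired_speech_prob(0.9, np.pi / 2, 0.0), 3))  # 0.0: undesired direction
```

Bins dominated by undesired speech thus contribute to the noise statistics rather than the desired-speech statistics, even though plain SPP would flag them as speech.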
International Conference on Latent Variable Analysis and Signal Separation | 2015
Affan Hasan Khan; Maja Taseska; Emanuel A. P. Habets
In this paper, an online constrained independent vector analysis (IVA) algorithm is proposed that extracts the desired speech signal given the direction of arrival (DOA) of the desired source and the array geometry. The far-field array steering vector calculated using the DOA of the desired source is used to add a penalty term to the standard cost function of IVA. The penalty term ensures that the speech signal originating from the given DOA is extracted with small distortion. In contrast to unconstrained IVA, the proposed algorithm can be used to extract the desired speech signal online when the number of interferers is unknown or time-varying. The applicability of the algorithm in various scenarios is demonstrated using simulations.
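The two ingredients above, a far-field steering vector from the DOA and array geometry, and a quadratic penalty pushing the demixing filter toward a distortionless response, can be sketched as follows. The penalty weight and array layout are illustrative assumptions:

```python
import numpy as np

def far_field_steering(doa, mic_pos, freq, c=343.0):
    """Far-field steering vector for azimuth doa (rad) and 2-D mic positions."""
    u = np.array([np.cos(doa), np.sin(doa)])
    tau = mic_pos @ u / c                     # per-microphone propagation delays
    return np.exp(-2j * np.pi * freq * tau)

def constraint_penalty(w, d, lam=10.0):
    """Penalty added to the IVA cost: lam * |w^H d - 1|^2, pushing the
    demixing filter w toward a distortionless response at the given DOA."""
    return lam * np.abs(w.conj() @ d - 1.0) ** 2

mic_pos = np.array([[0.0, 0.0], [0.05, 0.0], [0.10, 0.0]])  # 3-mic linear array
d = far_field_steering(np.pi / 4, mic_pos, freq=1000.0)
w = d / (d.conj() @ d)   # matched filter satisfies w^H d = 1 exactly
print(constraint_penalty(w, d))  # ≈ 0: constraint met, no penalty incurred
```

In the online algorithm, this penalty's gradient is added to the standard IVA update at every frequency bin, so the filter stays anchored to the desired DOA while the unconstrained part adapts to the (possibly time-varying) interferers.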