Nasser Mohammadiha
Royal Institute of Technology
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Nasser Mohammadiha.
IEEE Transactions on Audio, Speech, and Language Processing | 2013
Nasser Mohammadiha; Paris Smaragdis; Arne Leijon
Reducing the interference noise in a monaural noisy speech signal has been a challenging task for many years. Compared to traditional unsupervised speech enhancement methods, e.g., Wiener filtering, supervised approaches, such as algorithms based on hidden Markov models (HMM), lead to higher-quality enhanced speech signals. However, the main practical difficulty of these approaches is that for each noise type a model is required to be trained a priori. In this paper, we investigate a new class of supervised speech denoising algorithms using nonnegative matrix factorization (NMF). We propose a novel speech enhancement method that is based on a Bayesian formulation of NMF (BNMF). To circumvent the mismatch problem between the training and testing stages, we propose two solutions. First, we use an HMM in combination with BNMF (BNMF-HMM) to derive a minimum mean square error (MMSE) estimator for the speech signal with no information about the underlying noise type. Second, we suggest a scheme to learn the required noise BNMF model online, which is then used to develop an unsupervised speech enhancement system. Extensive experiments are carried out to investigate the performance of the proposed methods under different conditions. Moreover, we compare the performance of the developed algorithms with state-of-the-art speech enhancement schemes using various objective measures. Our simulations show that the proposed BNMF-based methods outperform the competing algorithms substantially.
IEEE Signal Processing Magazine | 2014
Paris Smaragdis; Cédric Févotte; Gautham J. Mysore; Nasser Mohammadiha; Matthew D. Hoffman
Source separation models that make use of nonnegativity in their parameters have been gaining increasing popularity in the last few years, spawning a significant number of publications on the topic. Although these techniques are conceptually similar to other matrix decompositions, they are surprisingly more effective in extracting perceptually meaningful sources from complex mixtures. In this article, we will examine the various methodologies and extensions that make up this family of approaches and present them under a unified framework. We will begin with a short description of the basic concepts and in the subsequent sections we will delve in more details and explore some of the latest extensions.
international conference on acoustics, speech, and signal processing | 2011
Jalal Taghia; Jalil Taghia; Nasser Mohammadiha; Jinqiu Sang; Vaclav Bouse; Rainer Martin
Noise power spectral density estimation is an important component of speech enhancement systems due to its considerable effect on the quality and the intelligibility of the enhanced speech. Recently, many new algorithms have been proposed and significant progress in noise tracking has been made. In this paper, we present an evaluation framework for measuring the performance of some recently proposed and some well-known noise power spectral density estimators and compare their performance in adverse acoustic environments. In this investigation we do not only consider the performance in the mean of a spectral distance measure but also evaluate the variance of the estimators as the latter is related to undesirable fluctuations also known as musical noise. By providing a variety of different non-stationary noises, the robustness of noise estimators in adverse environments is examined.
workshop on applications of signal processing to audio and acoustics | 2011
Nasser Mohammadiha; Timo Gerkmann; Arne Leijon
In this paper, a linear MMSE filter is derived for single-channel speech enhancement which is based on Nonnegative Matrix Factorization (NMF). Assuming an additive model for the noisy observation, an estimator is obtained by minimizing the mean square error between the clean speech and the estimated speech components in the frequency domain. In addition, the noise power spectral density (PSD) is estimated using NMF and the obtained noise PSD is used in a Wiener filtering framework to enhance the noisy speech. The results of the both algorithms are compared to the result of the same Wiener filtering framework in which the noise PSD is estimated using a recently developed MMSE-based method. NMF based approaches outperform the Wiener filter with the MMSE-based noise PSD tracker for different measures. Compared to the NMF-based Wiener filtering approach, Source to Distortion Ratio (SDR) is improved for the evaluated noise types for different input SNRs using the proposed linear MMSE filter.
international conference on acoustics, speech, and signal processing | 2012
Nasser Mohammadiha; Jalil Taghia; Arne Leijon
We present a speech enhancement algorithm which is based on a Bayesian Nonnegative Matrix Factorization (NMF). Both Minimum Mean Square Error (MMSE) and Maximum a-Posteriori (MAP) estimates of the magnitude of the clean speech DFT coefficients are derived. To exploit the temporal continuity of the speech and noise signals, a proper prior distribution is introduced by widening the posterior distribution of the NMF coefficients at the previous time frames. To do so, a recursive temporal update scheme is proposed to obtain the mean value of the prior distribution; also, the uncertainty of the prior information is governed by the shape parameter of the distribution which is learnt automatically based on the nonstationarity of the signals. Simulations show a considerable improvement compared to the maximum likelihood NMF based speech enhancement algorithm for different input SNRs.
IEEE Transactions on Audio, Speech, and Language Processing | 2013
Nasser Mohammadiha; Arne Leijon
Deriving a good model for multitalker babble noise can facilitate different speech processing algorithms, e.g., noise reduction, to reduce the so-called cocktail party difficulty. In the available systems, the fact that the babble waveform is generated as a sum of N different speech waveforms is not exploited explicitly. In this paper, first we develop a gamma hidden Markov model for power spectra of the speech signal, and then formulate it as a sparse nonnegative matrix factorization (NMF). Second, the sparse NMF is extended by relaxing the sparsity constraint, and a novel model for babble noise (gamma nonnegative HMM) is proposed in which the babble basis matrix is the same as the speech basis matrix, and only the activation factors (weights) of the basis vectors are different for the two signals over time. Finally, a noise reduction algorithm is proposed using the derived speech and babble models. All of the stationary model parameters are estimated using the expectation-maximization (EM) algorithm, whereas the time-varying parameters, i.e., the gain parameters of speech and babble signals, are estimated using a recursive EM algorithm. The objective and subjective listening evaluations show that the proposed babble model and the final noise reduction algorithm significantly outperform the conventional methods.
international conference on acoustics, speech, and signal processing | 2012
Jalil Taghia; Nasser Mohammadiha; Arne Leijon
In this paper, we propose a variational Bayes approach to the underdetermined blind source separation and show how a variational treatment can open up the possibility of determining the actual number of sources. The procedure is performed in a frequency bin-wise manner. In every frequency bin, we model the time-frequency mixture by a variational mixture of Gaussians with a circular-symmetric complex-Gaussian density function. In the Bayesian inference, we consider appropriate conjugate prior distributions for modeling the parameters of this distribution. The learning task consists of estimating the hyper-parameters characterizing the parameter distributions for the optimization of the variational posterior distribution. The proposed approach requires no prior knowledge on the number of sources in a mixture.
IEEE Signal Processing Letters | 2013
Nasser Mohammadiha; Rainer Martin; Arne Leijon
The derivation of MMSE estimators for the DFT coefficients of speech signals, given an observed noisy signal and super-Gaussian prior distributions, has received a lot of interest recently. In this letter, we look at the distribution of the periodogram coefficients of different phonemes, and show that they have a gamma distribution with shape parameters less than one. This verifies that the DFT coefficients for not only the whole speech signal but also for individual phonemes have super-Gaussian distributions. We develop a spectral domain speech enhancement algorithm, and derive hidden Markov model (HMM) based MMSE estimators for speech periodogram coefficients under this gamma assumption in both a high uniform resolution and a reduced-resolution Mel domain. The simulations show that the performance is improved using a gamma distribution compared to the exponential case. Moreover, we show that, even though beneficial in some aspects, the Mel-domain processing does not lead to better results than the algorithms in the high-resolution domain.
international conference on acoustics, speech, and signal processing | 2013
Nasser Mohammadiha; Paris Smaragdis; Arne Leijon
Nonnegative matrix factorization is an appealing technique for many audio applications. However, in its basic form it does not use temporal structure, which is an important source of information in speech processing. In this paper, we propose NMF-based filtering and smoothing algorithms that are related to Kalman filtering and smoothing. While our prediction step is similar to that of Kalman filtering, we develop a multiplicative update step which is more convenient for nonnegative data analysis and in line with existing NMF literature. The proposed smoothing approach introduces an unavoidable processing delay, but the filtering algorithm does not and can be readily used for on-line applications. Our experiments using the proposed algorithms show a significant improvement over the baseline NMF approaches. In the case of speech denoising with factory noise at 0 dB input SNR, the smoothing algorithm outperforms NMF with 3.2 dB in SDR and around 0.5 MOS in PESQ, likewise source separation experiments result in improved performance due to taking advantage of the temporal regularities in speech.
intelligent robots and systems | 2012
Ghazaleh Panahandeh; Nasser Mohammadiha; Magnus Jansson
In this paper, a method for determining ground plane features in a sequence of images captured by a mobile camera is presented. The hardware of the mobile system consists of a monocular camera that is mounted on an inertial measurement unit (IMU). An image processing procedure is proposed, first to extract image features and match them across consecutive image frames, and second to detect the ground plane features using a two-step algorithm. In the first step, the planar homography of the ground plane is constructed using an IMU-camera motion estimation approach. The obtained homography constraints are used to detect the most likely ground features in the sequence of images. To reject the remaining outliers, as the second step, a new plane normal vector computation approach is proposed. To obtain the normal vector of the ground plane, only three pairs of corresponding features are used for a general camera transformation. The normal-based computation approach generalizes the existing methods that are developed for specific camera transformations. Experimental results on real data validate the reliability of the proposed method.