Jalal Taghia
Ruhr University Bochum
Publications
Featured research published by Jalal Taghia.
international conference on acoustics, speech, and signal processing | 2011
Jalal Taghia; Jalil Taghia; Nasser Mohammadiha; Jinqiu Sang; Vaclav Bouse; Rainer Martin
Noise power spectral density estimation is an important component of speech enhancement systems because of its considerable effect on the quality and intelligibility of the enhanced speech. Recently, many new algorithms have been proposed and significant progress in noise tracking has been made. In this paper, we present an evaluation framework for measuring the performance of some recently proposed and some well-known noise power spectral density estimators, and we compare their performance in adverse acoustic environments. We consider not only the mean of a spectral distance measure but also the variance of the estimators, as the latter is related to undesirable fluctuations also known as musical noise. Using a variety of non-stationary noises, we examine the robustness of the noise estimators in adverse environments.
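The idea of scoring a noise estimator by both the mean and the variance of a spectral distance can be illustrated with a minimal sketch. The abstract does not say which distance measure the paper uses; the per-frame absolute log-spectral error in dB below, and the names `log_error_db` and `summarize`, are assumptions chosen for illustration only.

```python
import math
from statistics import mean, pvariance


def log_error_db(estimated_psd, reference_psd, floor=1e-12):
    """Per-frame mean absolute log-spectral error in dB (assumed measure).

    Both arguments are lists of frames; each frame is a list of per-bin
    noise powers. `floor` avoids taking the log of zero.
    """
    errors = []
    for est_frame, ref_frame in zip(estimated_psd, reference_psd):
        per_bin = [
            abs(10.0 * math.log10(max(e, floor) / max(r, floor)))
            for e, r in zip(est_frame, ref_frame)
        ]
        errors.append(mean(per_bin))
    return errors


def summarize(errors):
    """Return (mean, variance) of the per-frame errors."""
    return mean(errors), pvariance(errors)


# Toy data: a reference noise PSD and two hypothetical estimators.
reference = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
steady = [[2.0, 4.0], [2.0, 4.0], [2.0, 4.0]]  # constant factor-of-2 bias
jumpy = [[1.0, 2.0], [4.0, 8.0], [1.0, 2.0]]   # exact, except one large jump

steady_mean, steady_var = summarize(log_error_db(steady, reference))
jumpy_mean, jumpy_var = summarize(log_error_db(jumpy, reference))
```

In this toy case the fluctuating estimator has the lower mean error but the higher variance, which is the kind of trade-off that a mean-only comparison would hide.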
IEEE Transactions on Audio, Speech, and Language Processing | 2014
Jalal Taghia; Rainer Martin
We propose a novel method for objective speech intelligibility prediction, which can be useful in many application domains such as hearing instruments and forensics. Most objective intelligibility measures available in the literature employ some kind of signal-to-noise ratio (SNR) or a correlation-based comparison between the spectro-temporal representations of clean and processed speech. In this paper, we investigate speech intelligibility prediction from the viewpoint of information theory and introduce novel objective intelligibility measures based on the estimated mutual information between the temporal envelopes of clean speech and processed speech in the subband domain. Mutual information makes it possible to account for higher-order statistics and hence to consider dependencies beyond the conventional second-order statistics. Using data from three different listening tests, we show that the proposed objective intelligibility measures provide promising results for speech intelligibility prediction in different speech enhancement scenarios where speech is processed by non-linear modification strategies.
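As a rough illustration of estimating mutual information between two envelope sequences, the sketch below uses a plug-in histogram estimator. The abstract does not specify the estimator used in the paper, so the histogram approach, the bin count, and the names `quantize` and `mutual_information` are assumptions; subband filtering and envelope extraction are left out.

```python
import math
from collections import Counter


def quantize(values, n_bins):
    """Map each value to an integer bin index using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=8):
    """Plug-in histogram estimate of I(X;Y) in bits (assumed estimator)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    qx, qy = quantize(x, n_bins), quantize(y, n_bins)
    joint = Counter(zip(qx, qy))
    px, py = Counter(qx), Counter(qy)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[a] * py[b]))
    return mi


# Toy "envelopes": a ramp compared with itself and with a constant.
clean = [float(i) for i in range(64)]
mi_same = mutual_information(clean, clean)
mi_const = mutual_information(clean, [1.0] * 64)
```

With these toy inputs the 64 ramp samples fill the eight bins evenly, so comparing the sequence with itself gives log2(8) = 3 bits, while comparing it with a constant gives 0 bits. Plug-in estimates are biased for short sequences, so this should be read as a sketch of the concept rather than a usable intelligibility measure.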
congress on image and signal processing | 2008
Jalil Taghia; Mohammad Ali Doostari; Jalal Taghia
In this paper, we propose a blind image watermarking scheme based on bidimensional empirical mode decomposition (BEMD), a possible 2D extension of empirical mode decomposition (EMD). We employ BEMD in both watermark embedding and watermark extraction. In the embedding scheme, the original image is first divided into K sub-images; BEMD is then applied to each sub-image and to the watermark to obtain a set of 2D intrinsic mode functions (2D-IMFs). To embed the watermark, each 2D-IMF extracted from the watermark replaces one of the 2D-IMFs extracted from each sub-image according to a special procedure. Watermark extraction, in turn, is based on BEMD and a clustering method with metric, local linear structure and affine symmetry, which extracts the watermark blindly. We perform two classes of tests in our experiments: first, we measure the imperceptibility of the watermark, and then we examine the performance against different kinds of attacks.
international conference on acoustics, speech, and signal processing | 2012
Jalal Taghia; Rainer Martin; Richard C. Hendriks
Speech intelligibility prediction for noisy and processed noisy speech is important in a number of application domains such as hearing instruments and forensics. Most available objective intelligibility measures employ either a signal-to-noise ratio (SNR)-based or a correlation-based comparison between frequency bands of the clean and the processed speech. In this paper, we approach speech intelligibility prediction from the angle of information theory and show that an information-theoretic concept provides a unified viewpoint on both the SNR-based and the correlation-based approaches. Two objective intelligibility measures are introduced based on the estimated mutual information between the clean speech and the processed speech in the time and the frequency subband domain. Our proposed measures show high correlation with subjective intelligibility measures (i.e., word-correct scores) and results comparable to those of the short-term objective intelligibility measure (STOI).
IEEE Transactions on Audio, Speech, and Language Processing | 2016
Jalal Taghia; Rainer Martin
We propose an adaptive line enhancer with a frequency-dependent step-size. The proposed frequency-domain adaptive line enhancer is used as a single-channel noise reduction system for removing harmonic noise from noisy speech. Our main contribution is to exploit the temporal dependence in the log-magnitude and phase spectra of the noisy speech using mutual information, and to derive a frequency-dependent step-size which detects the presence of harmonic noise in different frequency bins. Our proposed step-size control allows the suppression of harmonic noise and the preservation of speech components. The experiments are performed with different real-life acoustic noises which contain harmonic components. Using instrumental speech intelligibility and quality measures, we demonstrate that the proposed approach can outperform the conventional frequency-domain adaptive line enhancer with a fixed step-size for harmonic noise reduction.
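For readers unfamiliar with adaptive line enhancers, the sketch below shows the textbook time-domain form with a normalized LMS update and a single fixed step-size. It is not the frequency-domain, frequency-dependent step-size method described in the abstract; the function name `adaptive_line_enhancer` and all parameter values are assumptions made for illustration.

```python
import math
import random


def adaptive_line_enhancer(x, n_taps=16, delay=1, mu=0.5, eps=1e-8):
    """Textbook time-domain NLMS adaptive line enhancer (illustrative only).

    A linear predictor estimates x[n] from delayed past samples. The
    prediction captures the predictable (harmonic) part of the signal,
    and the prediction error keeps the unpredictable remainder.
    Returns (predicted, error), each the same length as x.
    """
    w = [0.0] * n_taps
    predicted = [0.0] * len(x)
    error = list(x)  # samples before the filter is fully primed pass through
    for n in range(delay + n_taps - 1, len(x)):
        # Delayed regressor: x[n-delay], x[n-delay-1], ...
        u = [x[n - delay - k] for k in range(n_taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))
        e = x[n] - y
        norm = eps + sum(uk * uk for uk in u)
        # NLMS update with one fixed step-size mu for all taps
        w = [wk + mu * e * uk / norm for wk, uk in zip(w, u)]
        predicted[n] = y
        error[n] = e
    return predicted, error


def energy(samples):
    return sum(s * s for s in samples)


# A pure tone is predictable, so the error output should vanish.
tone = [math.sin(2.0 * math.pi * 0.05 * n) for n in range(2000)]
_, tone_err = adaptive_line_enhancer(tone)

# White noise is not predictable, so it should stay in the error output.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(2000)]
_, noise_err = adaptive_line_enhancer(noise)
```

When the goal is to remove harmonic noise from speech, the error output is the enhanced signal, because the harmonic component is the part the predictor can track. A fixed step-size treats all frequencies alike, which is the limitation that the frequency-dependent step-size in the abstract is meant to address.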
international conference on acoustics, speech, and signal processing | 2013
Aleksej Chinaev; Jalal Taghia; Rainer Martin
In this paper, we present an improved version of the recently proposed maximum a posteriori (MAP) based noise power spectral density estimator. An empirical bias compensation and a bandwidth adjustment reduce the bias and variance of the noise variance estimates. The main advantage of the MAP-based postprocessor is its low estimation variance. The estimator is employed in the second stage of a two-stage single-channel speech enhancement system, where eight different state-of-the-art noise tracking algorithms were tested in the first stage. While the postprocessor hardly affects the results in stationary noise scenarios, it becomes more effective as the noise becomes more nonstationary. The proposed postprocessor was able to improve all systems in babble noise with respect to the perceptual evaluation of speech quality (PESQ) performance.
international conference on acoustics, speech, and signal processing | 2013
Jalal Taghia; Rainer Martin; Arne Leijon
In this paper, we present a model-based dual-channel noise reduction approach that is an alternative to conventional noise reduction algorithms, essentially because it is independent of noise power spectral density estimation and of any prior knowledge about the spatial noise field characteristics. We use a mixture of circular-symmetric complex-Gaussian distributions projected on the unit hypersphere to model the complex discrete Fourier transform coefficients of noisy speech signals in the frequency domain. According to the derived mixture model, the noise and the target speech components are clustered depending on their direction of arrival. A soft masking strategy is proposed for speech enhancement based on the responsibilities assigned to the target speech class in each time-frequency bin. Our experimental results show that the proposed approach is more robust than conventional dual-channel noise reduction systems based on single- and dual-channel noise power spectral density estimators.
international conference on acoustics, speech, and signal processing | 2014
Jalal Taghia; Rainer Martin
In this paper, we propose an adaptive line enhancer based on negentropy for single-channel noise reduction. Our proposed approach can be integrated into a speech enhancement system as a preprocessor to be combined with other noise reduction approaches. The proposed method performs noise reduction by splitting the noisy speech components into deterministic and stochastic parts through the minimization of negentropy in an adaptive manner. We consider the negentropy as a cost function, and we derive a learning rule via Newton's method to minimize the negentropy of the error signal. Our experimental results demonstrate that the proposed approach is potentially useful as a preprocessor for improving the performance of conventional single-channel noise reduction approaches at low signal-to-noise ratio (SNR) conditions. Moreover, we show that our approach by itself can also enhance noisy speech in an adverse noisy environment.
ieee international conference on intelligent systems and knowledge engineering | 2011
Hongmei Hu; Jalil Taghia; Jinqiu Sang; Jalal Taghia; Nasser Mohammadiha; Masoumeh Azarpour; Rajyalakshmi Dokku; Shouyan Wang; Mark E. Lutman; Stefan Bleeck
Automatic speech recognition (ASR) often fails in acoustically noisy environments. To improve the speech recognition scores of an ASR system in a realistic acoustic environment, this paper proposes a speech pre-processing system that consists of several stages. First, a convolutive blind source separation (BSS) is applied to the spectrogram of the signals that are pre-processed by binaural Wiener filtering (BWF). Second, the target speech is detected using the recognition rate of an ASR system based on a Hidden Markov Model (HMM). To evaluate the performance of the proposed algorithm, the signal-to-interference ratio (SIR), the improvement in signal-to-noise ratio (ISNR) and the speech recognition rates of the output signals were calculated using the signal corpus of the CHiME database. The results show an improvement in SIR and ISNR, but no obvious improvement in speech recognition scores. Improvements for future research are suggested.
international workshop on acoustic signal enhancement | 2016
Jalal Taghia; Dorothea Kolossa; Rainer Martin
In this paper we investigate an Adaptive Line Enhancer (ALE) for the cancellation of robot self-noise. In contrast to many other methods, it requires only a single microphone and can be combined with any other single- and multi-channel noise reduction method. The proposed ALE is implemented in the frequency domain (FDALE) and performs noise cancellation with respect to magnitude and phase. It therefore has the potential to reduce noise components without introducing distortions of the target signal. We combine the ALE with a traditional single-channel noise reduction filter, where the former cancels predictable noise components and the latter suppresses the random noise components. We apply this approach to an automatic speech recognition task and show that significant improvements can be obtained.