Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Pejman Mowlaee is active.

Publication


Featured research published by Pejman Mowlaee.


IEEE Signal Processing Letters | 2013

Iterative Closed-Loop Phase-Aware Single-Channel Speech Enhancement

Pejman Mowlaee; Rahim Saeidi

Many short-time Fourier transform (STFT) based single-channel speech enhancement algorithms focus on estimating the clean speech spectral amplitude from the noisy observed signal in order to suppress the additive noise. To this end, they utilize the noisy amplitude information and the corresponding a priori and a posteriori SNRs, while they employ the observed noisy phase when reconstructing the enhanced speech signal. This paper presents two contributions: i) reconsidering the relation between the phase group-delay deviation and the phase deviation, and ii) proposing a closed-loop single-channel speech enhancement approach to estimate both the amplitude and phase spectra of the speech signal. To this end, we combine a group-delay-based phase estimator with a phase-aware amplitude estimator in a closed-loop design. Our experimental results on various noise scenarios show considerable improvement in objective perceived signal quality obtained by the proposed iterative phase-aware approach compared to conventional Wiener filtering, which uses the noisy phase in signal reconstruction.
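For context, the conventional amplitude-only pipeline that this paper improves on can be sketched as follows: a Wiener gain derived from an a priori SNR estimate scales the noisy amplitude, and the noisy phase is reused unchanged in reconstruction. This is a minimal NumPy illustration under assumed names (`wiener_enhance_frame`, a caller-supplied `noise_psd`), not the authors' closed-loop algorithm.

```python
import numpy as np

def wiener_enhance_frame(noisy_frame, noise_psd):
    """Conventional amplitude-only enhancement of one STFT frame.

    A Wiener gain derived from an a priori SNR estimate scales the
    noisy amplitude; the noisy phase is reused for reconstruction.
    `noise_psd` is a hypothetical noise power estimate (one value per
    rfft bin) supplied by the caller.
    """
    spec = np.fft.rfft(noisy_frame * np.hanning(len(noisy_frame)))
    noisy_power = np.abs(spec) ** 2
    # Crude a priori SNR estimate, floored to stay positive.
    snr_prior = np.maximum(noisy_power / noise_psd - 1.0, 1e-3)
    gain = snr_prior / (1.0 + snr_prior)   # Wiener gain
    amplitude = gain * np.abs(spec)        # enhanced amplitude
    phase = np.angle(spec)                 # noisy phase kept as-is
    return np.fft.irfft(amplitude * np.exp(1j * phase))
```

The closed-loop approach of the paper replaces the `phase = np.angle(spec)` step with a group-delay-based phase estimate that is fed back into a phase-aware amplitude estimator.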


IEEE Transactions on Audio, Speech, and Language Processing | 2011

New Results on Single-Channel Speech Separation Using Sinusoidal Modeling

Pejman Mowlaee; Mads Græsbøll Christensen; Søren Holdt Jensen

We present new results on single-channel speech separation and suggest a new separation approach to improve the speech quality of separated signals from an observed mixture. The key idea is to derive a mixture estimator based on sinusoidal parameters. The proposed estimator is aimed at finding sinusoidal parameters in the form of codevectors from vector quantization (VQ) codebooks pre-trained for speakers that, when combined, best fit the observed mixed signal. The selected codevectors are then used to reconstruct the recovered signals for the speakers in the mixture. Compared to the log-max mixture estimator used in binary masks and the Wiener filtering approach, it is observed that the proposed method achieves an acceptable perceptual speech quality with less cross-talk at different signal-to-signal ratios. Moreover, the method is independent of pitch estimates and reduces the computational complexity of the separation by replacing the short-time Fourier transform (STFT) feature vectors of high dimensionality with sinusoidal feature vectors. We report separation results for the proposed method and compare them with respect to other benchmark methods. The improvements made by applying the proposed method over other methods are confirmed by employing perceptual evaluation of speech quality (PESQ) as an objective measure and a MUSHRA listening test as a subjective evaluation for both speaker-dependent and gender-dependent scenarios.
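The codebook-search idea above can be pictured with a brute-force toy version: pick one codevector per speaker so that their combination best fits the observed mixture. The sketch below uses a plain least-squares fit over spectral vectors as a stand-in for the paper's sinusoidal-domain mixture estimator; the function and variable names are hypothetical.

```python
import numpy as np

def best_codevector_pair(mix, codebook_a, codebook_b):
    """Exhaustive version of the mixture-estimator idea: choose one
    codevector from each speaker's pre-trained (K, D) codebook so that
    their sum best matches the observed mixture vector, here in a
    simple least-squares sense.
    """
    # All pairwise sums: shape (Ka, Kb, D).
    sums = codebook_a[:, None, :] + codebook_b[None, :, :]
    err = np.sum((sums - mix) ** 2, axis=-1)
    ia, ib = np.unravel_index(np.argmin(err), err.shape)
    return codebook_a[ia], codebook_b[ib]
```

The selected codevectors would then drive sinusoidal synthesis of each recovered speaker signal; the paper's contribution lies in doing this search over low-dimensional sinusoidal parameters rather than full STFT vectors.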


IEEE Transactions on Audio, Speech, and Language Processing | 2012

A Joint Approach for Single-Channel Speaker Identification and Speech Separation

Pejman Mowlaee; Rahim Saeidi; Mads Græsbøll Christensen; Zheng-Hua Tan; Tomi Kinnunen; Pasi Fränti; Søren Holdt Jensen

In this paper, we present a novel system for joint speaker identification and speech separation. For speaker identification, a single-channel speaker identification algorithm is proposed which provides an estimate of the signal-to-signal ratio (SSR) as a by-product. For speech separation, we propose a sinusoidal model-based algorithm. The speech separation algorithm consists of a double-talk/single-talk detector followed by a minimum mean square error estimator of sinusoidal parameters for finding optimal codevectors from pre-trained speaker codebooks. In evaluating the proposed system, we start from a situation where we have prior information on codebook indices, speaker identities and SSR level, and then, by relaxing these assumptions one by one, we demonstrate the efficiency of the proposed fully blind system. In contrast to previous studies that mostly focus on automatic speech recognition (ASR) accuracy, here we report objective and subjective results as well. The results show that the proposed system performs as well as the best of the state of the art in terms of perceived quality, while its speaker identification and automatic speech recognition performance is generally lower. It outperforms the state of the art in terms of intelligibility, showing that the ASR results are not conclusive. The proposed method achieves, on average, 52.3% ASR accuracy, 41.2 points in MUSHRA and 85.9% in speech intelligibility.


Speech Communication | 2016

Advances in phase-aware signal processing in speech communication

Pejman Mowlaee; Rahim Saeidi; Yannis Stylianou

During the past three decades, the issue of processing spectral phase has been largely neglected in speech applications. There is no doubt that the interest of the speech processing community in the use of phase information across a broad spectrum of speech technologies, from automatic speech and speaker recognition to speech synthesis, from speech enhancement and source separation to speech coding, is constantly increasing. In this paper, we elaborate on why phase was believed to be unimportant in each application. We provide an overview of advancements in phase-aware signal processing with applications to speech, showing that phase-aware speech processing can be beneficial in many cases and can complement the solutions that magnitude-only methods suggest. Our goal is to show that phase-aware signal processing is an important emerging field with high potential in current speech communication applications. The paper provides an extended and up-to-date bibliography on the topic of phase-aware speech processing, aiming to give interested readers the background necessary to follow the recent advancements in the area. Our review expands the step initiated by our organized special session and exemplifies the usefulness of spectral phase information in a wide range of speech processing applications. Finally, the overview provides some directions for future work.


IEEE Transactions on Audio, Speech, and Language Processing | 2015

Phase estimation in single-channel speech enhancement: limits-potential

Pejman Mowlaee; Josef Kulmer

In this paper, we present an overview of previous and recent methods proposed to estimate the clean spectral phase from a noisy observation in the context of single-channel speech enhancement. The importance of phase estimation in speech enhancement is inspired by recent reports on its usefulness in finding a phase-sensitive amplitude estimate. We present a comparative study of the recent phase estimation methods and elaborate on their limits. We propose a new phase enhancement method relying on phase decomposition and time-frequency smoothing filters. We demonstrate that the proposed time-frequency phase smoothing method successfully reduces the variance of the noisy phase at the harmonics. Our results on different speech and noise databases and at different signal-to-noise ratios show that, in contrast to the existing benchmark methods, only the proposed method achieves, by phase-only enhancement, a joint improvement of 0.2 in PESQ score for perceived quality and 2% in speech intelligibility.


IEEE Signal Processing Letters | 2015

Phase Estimation in Single Channel Speech Enhancement Using Phase Decomposition

Josef Kulmer; Pejman Mowlaee

Conventional speech enhancement methods typically utilize the noisy phase spectrum for signal reconstruction. This letter presents a novel method to estimate the clean speech phase spectrum, given the noisy speech observation, in single-channel speech enhancement. The proposed method relies on the phase decomposition of the instantaneous noisy phase spectrum followed by temporal smoothing in order to reduce the large variance of the noisy phase, and consequently reconstructs an enhanced instantaneous phase spectrum for signal reconstruction. The effectiveness of the proposed method is evaluated in two ways: phase-only enhancement, and quantifying the additional improvement on top of a conventional amplitude enhancement scheme in which the noisy phase is often used in signal reconstruction. The instrumental metrics predict a consistent improvement in perceived speech quality and speech intelligibility when the noisy phase is enhanced using the proposed phase estimation method.
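The variance-reduction step can be pictured with a toy circular smoother: averaging unit-magnitude phasors along a harmonic phase track and taking the angle of the result shrinks the variance of the noisy phase while handling phase wrapping implicitly. This is a minimal sketch assuming a single precomputed phase trajectory and an illustrative window length, not the letter's full decomposition.

```python
import numpy as np

def smooth_harmonic_phase(noisy_phase, win=7):
    """Temporal smoothing of one harmonic phase trajectory.

    Unit phasors are averaged over a short window (a circular mean),
    and the angle of the averaged phasor is taken as the smoothed
    phase, so 2*pi wrap-around needs no special handling.  `win` is an
    illustrative window length, not a value from the paper.
    """
    phasors = np.exp(1j * noisy_phase)
    kernel = np.ones(win) / win
    smoothed = np.convolve(phasors, kernel, mode="same")
    return np.angle(smoothed)
```

In the letter, this kind of smoothing is applied to the unwrapped phase obtained from the decomposition of the instantaneous noisy phase, not to the raw phase spectrum directly.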


international conference on acoustics, speech, and signal processing | 2014

Modeling speech with sum-product networks: Application to bandwidth extension

Robert Peharz; Georg Kapeller; Pejman Mowlaee; Franz Pernkopf

Sum-product networks (SPNs) are a recently proposed type of probabilistic graphical model allowing complex variable interactions while still granting efficient inference. In this paper we demonstrate the suitability of SPNs for modeling log-spectra of speech signals using the application of artificial bandwidth extension, i.e., artificially restoring the high-frequency content which is lost in telephone signals. We use SPNs as observation models in hidden Markov models (HMMs), which model the temporal evolution of log short-time spectra. Missing frequency bins are replaced by the SPNs using most-probable-explanation inference, where the state-dependent reconstructions are weighted with the HMM state posterior. According to subjective listening and objective evaluation, our system consistently and significantly improves the state of the art.


IEEE Transactions on Audio, Speech, and Language Processing | 2015

Harmonic phase estimation in single-channel speech enhancement using phase decomposition and SNR information

Pejman Mowlaee; Josef Kulmer

In conventional single-channel speech enhancement, typically the noisy spectral amplitude is modified while the noisy phase is used to reconstruct the enhanced signal. Several recent attempts have shown the effectiveness of utilizing an improved spectral phase for phase-aware speech enhancement and consequently its positive impact on the perceived speech quality. In this paper, we present a harmonic phase estimation method relying on fundamental frequency and signal-to-noise ratio (SNR) information estimated from noisy speech. The proposed method relies on SNR-based time-frequency smoothing of the unwrapped phase obtained from the decomposition of the noisy phase. To incorporate the uncertainty in the estimated phase due to unreliable voicing decision and SNR estimate, we propose a binary hypothesis test assuming speech-present and speech-absent classes representing high and low SNRs. The effectiveness of the proposed phase estimation method is evaluated for both phase-only enhancement of noisy speech and in combination with an amplitude-only enhancement scheme. We show that by enhancing the noisy phase both perceived speech quality as well as speech intelligibility are improved as predicted by the instrumental metrics and justified by subjective listening tests.


international conference on pattern recognition | 2010

Signal-to-Signal Ratio Independent Speaker Identification for Co-channel Speech Signals

Rahim Saeidi; Pejman Mowlaee; Tomi Kinnunen; Zheng-Hua Tan; Mads Græsbøll Christensen; Søren Holdt Jensen; Pasi Fränti

In this paper, we consider speaker identification for the co-channel scenario, in which the speech mixture from two speakers is recorded by a single microphone. The goal is to identify both speakers from their mixed signal. High recognition accuracies have already been reported when an accurately estimated signal-to-signal ratio (SSR) is available. In this paper, we approach the problem without estimating the SSR. We show that a simple method, based on the fusion of adapted Gaussian mixture models and the Kullback-Leibler divergence calculated between models, achieves accuracies of 97% and 93% when the two target speakers are ranked among the three and two most probable speakers, respectively.
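The model-to-model distance underlying this method can be illustrated for a single Gaussian component: a symmetric Kullback-Leibler divergence between two diagonal Gaussians. The paper works with full adapted GMMs and fuses such scores across components; the sketch below is only a single-component stand-in with hypothetical names.

```python
import numpy as np

def symmetric_kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Symmetric Kullback-Leibler divergence between two diagonal
    Gaussians with means `mu_*` and variances `var_*`, using the
    closed-form KL for Gaussians, symmetrized by summing the two
    directed divergences.
    """
    def kl(mu0, v0, mu1, v1):
        return 0.5 * np.sum(np.log(v1 / v0) + (v0 + (mu0 - mu1) ** 2) / v1 - 1.0)
    return kl(mu_p, var_p, mu_q, var_q) + kl(mu_q, var_q, mu_p, var_p)
```

A small divergence between a speaker model adapted on the mixture and a pre-trained speaker model then serves as evidence that that speaker is present.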


international conference on acoustics, speech, and signal processing | 2013

On phase importance in parameter estimation in single-channel speech enhancement

Pejman Mowlaee; Rahim Saeidi

In this paper, we study the impact of exploiting spectral phase information to further improve the speech quality of single-channel speech enhancement algorithms. In particular, we focus on the two required steps in a typical single-channel speech enhancement system: parameter estimation, solved by a minimum mean square error (MMSE) estimator of the speech spectral amplitude, followed by a signal reconstruction stage, where the observed noisy phase is often used. For the parameter estimation stage, in contrast to the conventional Wiener filter, a new MMSE estimator is derived which takes the clean phase information into account as prior information. In our experiments, we show that by including the phase information in the two steps, it is possible to improve the perceived quality of the enhanced signal significantly with respect to methods that do not employ phase information.

Collaboration


Dive into Pejman Mowlaee's collaborations.

Top Co-Authors

Josef Kulmer | Graz University of Technology

Johannes Stahl | Graz University of Technology

Mario Kaoru Watanabe | Graz University of Technology

Pasi Fränti | University of Eastern Finland

Tomi Kinnunen | University of Eastern Finland