Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Mohammad H. Radfar is active.

Publication


Featured research published by Mohammad H. Radfar.


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Single-Channel Speech Separation Using Soft Mask Filtering

Mohammad H. Radfar; Richard M. Dansereau

We present an approach for separating two speech signals when only a single recording of their linear mixture is available. For this purpose, we derive a filter, which we call the soft mask filter, using minimum mean square error (MMSE) estimation of the log spectral vectors of the sources given the mixture's log spectral vectors. The soft mask filter's parameters are estimated using the mean and variance of the underlying sources, which are modeled using the Gaussian composite source modeling (CSM) approach. It is also shown that the binary mask filter, which has been empirically and extensively used in single-channel speech separation techniques, is in fact a simplified form of the soft mask filter. The soft mask filtering technique is compared with the binary mask and Wiener filtering approaches when the input consists of male+male, female+female, and male+female mixtures. The experimental results in terms of signal-to-noise ratio (SNR) and segmental SNR show that soft mask filtering outperforms binary mask and Wiener filtering.
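
As a rough illustration of the idea in this abstract, the sketch below computes a per-bin soft mask as the posterior probability that one source dominates a mixture bin under a log-max (MIXMAX) mixing assumption with Gaussian log-spectral source models, and derives the binary mask as its hard-thresholded special case. The Gaussian-posterior formulation, function names, and parameters are illustrative assumptions, not the paper's exact MMSE derivation.

```python
import numpy as np
from scipy.stats import norm

def soft_mask(mix_logspec, mu1, var1, mu2, var2):
    """Illustrative soft mask: per-bin posterior that source 1 dominates the
    mixture under a log-max (MIXMAX) mixing model with Gaussian log-spectral
    source models. A hypothetical simplification of the paper's MMSE filter."""
    # Likelihood that source 1 reaches the mixture level while source 2 stays below it
    l1 = norm.pdf(mix_logspec, mu1, np.sqrt(var1)) * norm.cdf(mix_logspec, mu2, np.sqrt(var2))
    # ... and vice versa for source 2
    l2 = norm.pdf(mix_logspec, mu2, np.sqrt(var2)) * norm.cdf(mix_logspec, mu1, np.sqrt(var1))
    return l1 / (l1 + l2 + 1e-12)

def binary_mask(mix_logspec, mu1, var1, mu2, var2):
    """The binary mask as the hard-thresholded soft mask."""
    return (soft_mask(mix_logspec, mu1, var1, mu2, var2) > 0.5).astype(float)
```

In use, the mask would be applied per time-frequency bin to the mixture magnitude spectrogram before resynthesizing each source.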


EURASIP Journal on Audio, Speech, and Music Processing | 2006

A maximum likelihood estimation of vocal-tract-related filter characteristics for single channel speech separation

Mohammad H. Radfar; Richard M. Dansereau; Abolghasem Sayadiyan

We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and those that model the human auditory system, that is, computational auditory scene analysis (CASA). For this purpose, we decompose the speech signal into the excitation signal and the vocal-tract-related filter and then estimate these components from the mixed speech using a hybrid model. We first express the probability density function (PDF) of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signals' vocal-tract-related filters. Then, the mean vectors of the PDFs of the vocal-tract-related filters are obtained using a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal-tract-related filters, along with the extracted fundamental frequencies, are used to reconstruct estimates of the individual speech signals. The proposed technique effectively adds vocal-tract-related filter characteristics as a new cue to CASA models using a new grouping technique based on underdetermined blind source separation. We compare our model with both an underdetermined blind source separation method and a CASA method. The experimental results show that our model outperforms both techniques in terms of SNR improvement and the percentage of crosstalk suppression.
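
The abstract's first step, splitting a speech frame into a vocal-tract-related filter (spectral envelope) and an excitation component, can be pictured with a generic cepstral-liftering decomposition like the one below. This is a textbook source-filter split under assumed frame and lifter settings, not the estimator described in the paper.

```python
import numpy as np

def source_filter_split(frame, n_fft=512, lifter=30):
    """Illustrative source-filter decomposition via cepstral liftering: the
    low-quefrency cepstrum approximates the vocal-tract-related filter
    (spectral envelope); the residual carries the excitation fine structure.
    A generic sketch, not the paper's estimator."""
    spec = np.fft.rfft(frame * np.hanning(len(frame)), n_fft)
    log_mag = np.log(np.abs(spec) + 1e-12)
    ceps = np.fft.irfft(log_mag)
    ceps_env = ceps.copy()
    ceps_env[lifter:-lifter] = 0.0            # keep only the low-quefrency part
    envelope = np.fft.rfft(ceps_env).real     # log spectral envelope (filter)
    excitation = log_mag - envelope           # fine structure (excitation)
    return envelope, excitation
```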


Speech Communication | 2007

Monaural speech segregation based on fusion of source-driven with model-driven techniques

Mohammad H. Radfar; Richard M. Dansereau; Abolghasem Sayadiyan

In this paper, by exploiting prevalent methods in speech coding and synthesis, a new single-channel speech segregation technique is presented. The technique integrates a model-driven method with a source-driven method to take advantage of both approaches and significantly reduce their individual pitfalls. We apply harmonic modelling, in which the pitch and spectral envelope are the main components of the analysis and synthesis stages. The pitch values of the two speakers are obtained using a source-driven method. The spectral envelope is obtained using a new model-driven technique consisting of four components: a trained codebook of vector-quantized envelopes (VQ-based separation), a mixture-maximum (MIXMAX) approximation, a minimum mean square error (MMSE) estimator, and a harmonic synthesizer. In contrast with previous model-driven techniques, this approach is speaker independent and can separate out the unvoiced regions as well as suppress the crosstalk effect, both of which are drawbacks of source-driven, or equivalently computational auditory scene analysis (CASA), models. We compare our fused model with both model- and source-driven techniques by conducting subjective and objective experiments. The results show that, although model-based separation delivers the best quality in the speaker-dependent case, the integrated model outperforms the individual approaches in the speaker-independent scenario. This result supports the idea that the human auditory system draws on both grouping cues (e.g., pitch tracking) and a priori knowledge (e.g., trained quantized envelopes) to segregate speech signals.
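
The harmonic synthesizer mentioned as the fourth component can be sketched as follows: given one pitch value and a log spectral envelope for a frame, harmonics are summed with amplitudes read off the envelope. The sampling rate, frame length, and interpolation scheme are assumptions for illustration; the paper's synthesis stage is more elaborate.

```python
import numpy as np

def harmonic_synth(f0, log_envelope, sr=16000, frame_len=512):
    """Illustrative harmonic synthesizer: rebuild a voiced frame from a pitch
    value (f0 > 0, in Hz) and a log spectral envelope by summing harmonics
    whose amplitudes are sampled from the envelope. Assumed parameterization,
    not the paper's exact synthesis stage."""
    freqs = np.linspace(0, sr / 2, len(log_envelope))
    t = np.arange(frame_len) / sr
    frame = np.zeros(frame_len)
    k = 1
    while k * f0 < sr / 2:
        amp = np.exp(np.interp(k * f0, freqs, log_envelope))  # envelope at the harmonic
        frame += amp * np.cos(2 * np.pi * k * f0 * t)
        k += 1
    return frame
```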


International Conference on Acoustics, Speech, and Signal Processing | 2010

Scaled factorial hidden Markov models: A new technique for compensating gain differences in model-based single channel speech separation

Mohammad H. Radfar; Willy Wong; Richard M. Dansereau; Wai-Yip Chan

In model-based single-channel speech separation, factorial hidden Markov models (FHMMs) have been successfully applied to model the mixture signal Y(t) = X(t) + V(t) in terms of trained patterns of the speech signals X(t) and V(t). Nonetheless, when the test signals are scaled versions of the trained patterns (i.e., g_x X(t) and g_v V(t)), the performance of the FHMM degrades significantly. In this paper, we introduce a modification to the FHMM, called the scaled FHMM, which compensates for gain differences. In this technique, the scale factors are first expressed in terms of the target-to-interference ratio (TIR). Then, an iterative quadratic optimization approach is coupled with the FHMM to estimate the TIR that, together with the decoded HMM sequences, maximizes the likelihood of the mixture signal. Experimental results, conducted on 180 mixtures with TIRs from 0 to 15 dB, show that the proposed technique significantly outperforms the unscaled FHMM, as well as scaled and unscaled vector quantization speech separation techniques.
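
A minimal sketch of the gain-compensation idea: once a TIR value is hypothesized, the two scale factors follow from the observed mixture power and the (trained) source powers. The closed-form solve below is an illustrative assumption standing in for the paper's iterative quadratic optimization; the variable names and unit-power defaults are hypothetical.

```python
import numpy as np

def gains_from_tir(tir_db, mix_power=1.0, x_power=1.0, v_power=1.0):
    """Illustrative gain recovery: given a hypothesized TIR (dB) and the
    observed mixture power, solve for g_x and g_v such that
        g_x^2 * P_x + g_v^2 * P_v = P_mix   and
        10*log10(g_x^2 * P_x / (g_v^2 * P_v)) = TIR.
    A sketch of the gain-compensation idea, not the paper's optimization."""
    r = 10.0 ** (tir_db / 10.0)                # target-to-interferer power ratio
    gv2 = mix_power / (v_power * (1.0 + r))    # interferer gain squared
    gx2 = r * gv2 * v_power / x_power          # target gain squared
    return np.sqrt(gx2), np.sqrt(gv2)
```

In a log-spectral model, applying a gain g simply shifts that speaker's trained mean vectors by log g^2, which is what makes a search over TIR values feasible.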


Signal Processing Systems | 2010

Monaural Speech Separation Based on Gain Adapted Minimum Mean Square Error Estimation

Mohammad H. Radfar; Richard M. Dansereau; Wai-Yip Chan

We present a new model-based monaural speech separation technique for separating two speech signals from a single recording of their mixture. This work is an attempt to address a fundamental limitation of current model-based monaural speech separation techniques, in which it is assumed that the data used in the training and test phases of the separation model have the same energy level. To overcome this limitation, a gain-adapted minimum mean square error estimator is derived which estimates the sources under different signal-to-signal ratios. Specifically, the speakers' gains are incorporated as unknown parameters into the separation model, and the estimator is then derived in terms of the source distributions and the signal-to-signal ratio. Experimental results show that the proposed system improves the separation performance significantly compared with a similar model without gain adaptation, as well as a maximum likelihood estimator with gain estimation.


International Symposium on Circuits and Systems | 2005

An FPGA based implementation of G.729

N. Mobini; Mohammad H. Radfar

The main objective of this article is to present the implementation and simulation of a conjugate-structure algebraic code-excited linear prediction (CS-ACELP) speech coder based on the ITU-T G.729 recommendation and to optimize it for real-time implementation on an FPGA. The suggested architecture is characterized by pipelining and parallel operation of functional units, using a fixed-point two's complement representation. The design was functionally verified using the ModelSim software package from Mentor Graphics Corporation and then synthesized with the Xilinx Integrated Software Environment (ISE) 6.1. Preliminary results show that the overall system delay is less than 2 ms per frame.
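
The fixed-point two's complement arithmetic mentioned above is typically Q15 (16-bit words with 15 fractional bits) in ACELP-style codecs. The helpers below illustrate that representation in a generic way; they are a sketch for orientation, not part of the G.729 reference code or the FPGA design.

```python
def to_q15(x):
    """Illustrative Q15 conversion: 16-bit two's complement, 15 fractional bits.
    Generic sketch of the fixed-point format, assumed for illustration."""
    v = int(round(x * 32768.0))
    v = max(-32768, min(32767, v))   # saturate to the 16-bit range
    return v & 0xFFFF                # two's complement bit pattern

def from_q15(v):
    """Interpret a 16-bit two's complement word as a Q15 value in [-1, 1)."""
    if v & 0x8000:
        v -= 0x10000
    return v / 32768.0
```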


International Symposium on Signal Processing and Information Technology | 2006

A Novel Low Complexity VQ-Based Single Channel Speech Separation Technique

Mohammad H. Radfar; Richard M. Dansereau; Abolghasem Sayadiyan

In this paper, a new single-channel speech separation technique based on vector quantization (VQ) and the MIXMAX approximation is presented. At the core of this approach are two trained codebooks of quantized feature vectors of the speakers, over which the main evaluation for separation is performed. The performance of the VQ-based approach is evaluated with three separate features: the log spectrum, modulated lapped transform (MLT) coefficients, and a fusion of pitch and envelope information. The experiments are conducted in two different scenarios: speaker dependent and speaker independent. The results show that the log spectrum outperforms the other features in the speaker-dependent scenario. However, in the speaker-independent scenario, the best results are obtained with the pitch-envelope feature.
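
Under the MIXMAX approximation, separation essentially reduces to finding the pair of codewords, one from each speaker's codebook, whose element-wise maximum best explains the mixture's log spectrum. The exhaustive squared-error search below is a simplified sketch of that step; the codebook contents and distortion measure are assumptions.

```python
import numpy as np

def vq_mixmax_separate(mix_logspec, codebook_a, codebook_b):
    """Illustrative VQ/MIXMAX search: approximate the mixture log spectrum by
    the element-wise maximum of one codeword from each speaker's codebook and
    return the best-fitting pair (exhaustive squared-error fit). A simplified
    sketch of the codebook search, not the paper's full system."""
    best, best_err = (0, 0), np.inf
    for i, ca in enumerate(codebook_a):
        for j, cb in enumerate(codebook_b):
            err = np.sum((mix_logspec - np.maximum(ca, cb)) ** 2)
            if err < best_err:
                best, best_err = (i, j), err
    return codebook_a[best[0]], codebook_b[best[1]]
```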


16th IEEE Signal Processing Society Workshop on Machine Learning for Signal Processing | 2006

A Joint Probabilistic-Deterministic Approach using Source-Filter Modeling of Speech Signal for Single Channel Speech Separation

Mohammad H. Radfar; Richard M. Dansereau; Abolghasem Sayadiyan

In this paper, we present a new technique for separating two speech signals from a single recording. For this purpose, we decompose the speech signal into the excitation signal and the vocal tract function and then estimate these components from the mixed speech using a hybrid model. We first express the probability density function (PDF) of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signals' vocal tract functions. Then, the mean vectors of the PDFs of the vocal tract functions are obtained using a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal tract functions, along with the extracted pitch values, are used to reconstruct estimates of the individual speech signals. We compare our model with both an underdetermined blind source separation method and a CASA method. The experimental results show that our model outperforms both techniques in terms of SNR improvement and the percentage of crosstalk suppression.


International Conference on Acoustics, Speech, and Signal Processing | 2011

MPtracker: A new multi-pitch detection and separation algorithm for mixed speech signals

Mohammad H. Radfar; Richard M. Dansereau; Wai-Yip Chan; Willy Wong

We present MPtracker, a new algorithm for tracking and separating the pitch frequencies of two speakers from their mixture. The pitch frequencies are detected by introducing a novel spectral distortion optimization that takes into account a sinusoidal model of the speech signal. The detected pitch frequencies are grouped and separated, and finally an interpolation method is applied to estimate missing pitch frequencies. We evaluated the performance of the proposed technique on 196 mixtures, including 48 male-male, 48 female-female, and 96 male-female mixtures, with target-to-interference ratios (TIRs) ranging from 0 dB to +18 dB. The results show that our simple yet effective and fast technique significantly outperforms two widely used approaches.
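
As a crude stand-in for MPtracker's spectral-distortion optimization, the sketch below scores candidate pitch pairs by how much mixture energy their two harmonic combs leave unexplained and keeps the best pair. The comb width, grid, and scoring are illustrative assumptions; the actual algorithm relies on sinusoidal modeling and a refined distortion measure.

```python
import numpy as np

def pick_pitch_pair(mix_mag, sr, f0_grid):
    """Illustrative two-pitch candidate search: for each candidate pitch pair,
    measure the mixture magnitude-spectrum energy not covered by the two
    harmonic combs and keep the pair with the smallest residual. A crude
    stand-in for the paper's spectral-distortion optimization."""
    freqs = np.linspace(0, sr / 2, len(mix_mag))

    def comb_mask(f0, width=20.0):
        # Mark bins within `width` Hz of any harmonic of f0 (f0 > 0 assumed).
        mask = np.zeros(len(mix_mag), dtype=bool)
        k = 1
        while k * f0 < sr / 2:
            mask |= np.abs(freqs - k * f0) < width
            k += 1
        return mask

    best, best_err = None, np.inf
    for f1 in f0_grid:
        for f2 in f0_grid:
            covered = comb_mask(f1) | comb_mask(f2)
            err = np.sum(mix_mag[~covered] ** 2)   # energy left unexplained
            if err < best_err:
                best, best_err = (f1, f2), err
    return best
```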


Canadian Conference on Electrical and Computer Engineering | 2011

A voice activated device for insulin dosage calculations for visually impaired diabetics

Mohammad H. Radfar; M. Hamilton; A. Ming

Diabetes places dietary burdens on those who suffer from it, and managing these burdens often requires performing calculations. Bolus calculators have been developed to facilitate these calculations, but they fail to address the accessibility issues presented by visual impairment, which is often associated with diabetes. The device presented herein is designed to provide an accessible bolus calculator for the visually impaired, primarily through a voice-based interface. The user may speak the name and portion size of the food to be consumed. A word recognition algorithm matches the spoken name with one stored in a food database on the device, from which the quantity of carbohydrates contained in the food is found. This data, along with a blood sugar measurement, is used to compute a recommended bolus dose, which is then presented to the user. It is expected that this device will improve the ability of visually impaired users to manage their diabetes.
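
The dose computation described above can be illustrated with a common textbook bolus formula: a meal component from the carbohydrate count plus a correction component from the blood glucose reading. The formula, units (mmol/L), and default parameters below are assumptions for illustration, not values taken from the paper or the device.

```python
def bolus_dose(carbs_g, blood_glucose, carb_ratio=10.0,
               correction_factor=2.0, target_glucose=6.0):
    """Illustrative bolus calculation of the kind the device performs: a meal
    dose from the carbohydrate content plus a correction dose from the current
    blood glucose (mmol/L). Formula and defaults are assumed, not the paper's."""
    meal_dose = carbs_g / carb_ratio                                   # grams of carbs per unit of insulin
    correction = max(0.0, (blood_glucose - target_glucose) / correction_factor)
    return round(meal_dose + correction, 1)
```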

Collaboration


Dive into Mohammad H. Radfar's collaboration.

Top Co-Authors

A. Ming (University of Toronto)
W. Wong (University of Toronto)