Publication


Featured research published by Richard M. Dansereau.


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Single-Channel Speech Separation Using Soft Mask Filtering

Mohammad H. Radfar; Richard M. Dansereau

We present an approach for separating two speech signals when only a single recording of their linear mixture is available. For this purpose, we derive a filter, which we call the soft mask filter, using minimum mean square error (MMSE) estimation of the log spectral vectors of the sources given the mixture's log spectral vectors. The soft mask filter's parameters are estimated using the mean and variance of the underlying sources, which are modeled using the Gaussian composite source modeling (CSM) approach. It is also shown that the binary mask filter, which has been used empirically and extensively in single-channel speech separation techniques, is in fact a simplified form of the soft mask filter. The soft mask filtering technique is compared with the binary mask and Wiener filtering approaches when the input consists of male+male, female+female, and male+female mixtures. The experimental results in terms of signal-to-noise ratio (SNR) and segmental SNR show that soft mask filtering outperforms binary mask and Wiener filtering.
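
As a toy illustration of the soft/binary mask relationship described above, the sketch below builds a per-bin mask from hypothetical Gaussian log-spectral models of the two sources. The Gaussian dominance approximation, the 0.5 threshold, and all names are illustrative assumptions, not the paper's exact MMSE derivation.

```python
# Minimal sketch, assuming per-bin Gaussian models (mu, var) of each
# source's log spectrum are available, e.g. from a trained composite
# source model. Illustrative only; not the paper's exact MMSE filter.
import numpy as np
from scipy.stats import norm

def soft_mask(mu1, var1, mu2, var2):
    # P(source 1 dominates a bin) for independent Gaussian log spectra:
    # X1 - X2 ~ N(mu1 - mu2, var1 + var2)
    return norm.cdf((mu1 - mu2) / np.sqrt(var1 + var2))

def separate(Y_mag, mu1, var1, mu2, var2, hard=False):
    m = soft_mask(mu1, var1, mu2, var2)
    if hard:                          # the binary mask is the thresholded
        m = (m > 0.5).astype(float)   # (simplified) form of the soft mask
    return m * Y_mag, (1.0 - m) * Y_mag
```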


EURASIP Journal on Audio, Speech, and Music Processing | 2006

A maximum likelihood estimation of vocal-tract-related filter characteristics for single channel speech separation

Mohammad H. Radfar; Richard M. Dansereau; Abolghasem Sayadiyan

We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and those techniques that model the human auditory system, that is, computational auditory scene analysis (CASA). For this purpose, we decompose the speech signal into the excitation signal and the vocal-tract-related filter and then estimate the components from the mixed speech using a hybrid model. We first express the probability density function (PDF) of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signals' vocal-tract-related filters. Then, the mean vectors of the PDFs of the vocal-tract-related filters are obtained using a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal-tract-related filters along with the extracted fundamental frequencies are used to reconstruct estimates of the individual speech signals. The proposed technique effectively adds vocal-tract-related filter characteristics as a new cue to CASA models using a new grouping technique based on underdetermined blind source separation. We compare our model with both an underdetermined blind source separation technique and a CASA method. The experimental results show that our model outperforms both techniques in terms of SNR improvement and the percentage of crosstalk suppression.
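
To make the likelihood search concrete, here is a minimal sketch of maximum likelihood pairing of vocal-tract-related filter codewords under a log-max mixture approximation. The exhaustive pair search, the equal-variance assumption (which reduces maximizing likelihood to least squares), and the codebook itself are assumptions for illustration.

```python
import numpy as np

def ml_filter_pair(y_log, codebook):
    # Exhaustive search for the codeword pair whose elementwise maximum
    # best explains the mixture's log spectrum, using the approximation
    # log|Y| ~ max(log|X1|, log|X2|). With equal Gaussian variances,
    # maximizing likelihood amounts to minimizing this squared error.
    best, best_err = None, np.inf
    for i, c1 in enumerate(codebook):
        for j, c2 in enumerate(codebook):
            err = np.sum((y_log - np.maximum(c1, c2)) ** 2)
            if err < best_err:
                best, best_err = (i, j), err
    return best
```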


instrumentation and measurement technology conference | 2003

Robust joint audio-video localization in video conferencing using reliability information

David Lo; Rafik A. Goubran; Richard M. Dansereau; Graham Thompson; Dieter Schulz

This paper proposes a new method for performing joint audio-video localization that exploits the reliability of the individual localization estimates. The reliability information is estimated from the audio and video data separately. The proposed method uses this reliability information in conjunction with a simple summing voter to dynamically discriminate erroneous outputs from the audio and video localizers while performing fusion on the localization results. Based on the voter output, a majority rule is then used to make the final decision on the active talker's current location. The results show that adding the reliability information during fusion improves localization performance when compared with audio-only, video-only, and joint audio-video straight-summing fusion localization methods. The computational complexity of the proposed method is comparable to that of existing methods.
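
A minimal sketch of the reliability-weighted summing voter with a final argmax (majority-style) decision follows; the score-grid representation of the localizer outputs and the [0, 1] reliability weights are assumptions for illustration.

```python
import numpy as np

def fuse_locations(candidate_scores, reliabilities):
    # candidate_scores: (n_localizers, n_positions) votes per localizer
    # reliabilities:    (n_localizers,) estimated reliability weights in [0, 1]
    fused = np.einsum('l,lp->p', np.asarray(reliabilities),
                      np.asarray(candidate_scores))
    return int(np.argmax(fused))   # final decision on the talker position
```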


Speech Communication | 2007

Monaural speech segregation based on fusion of source-driven with model-driven techniques

Mohammad H. Radfar; Richard M. Dansereau; Abolghasem Sayadiyan

In this paper, by exploiting prevalent methods in speech coding and synthesis, a new single-channel speech segregation technique is presented. The technique integrates a model-driven method with a source-driven method to take advantage of both individual approaches and significantly reduce their pitfalls. We apply harmonic modelling, in which the pitch and spectrum envelope are the main components for the analysis and synthesis stages. Pitch values of the two speakers are obtained using a source-driven method. The spectrum envelope is obtained using a new model-driven technique consisting of four components: a trained codebook of vector-quantized envelopes (VQ-based separation), a mixture-maximum (MIXMAX) approximation, a minimum mean square error (MMSE) estimator, and a harmonic synthesizer. In contrast with previous model-driven techniques, this approach is speaker independent and can separate out unvoiced regions as well as suppress the crosstalk effect, both of which are drawbacks of source-driven, or equivalently computational auditory scene analysis (CASA), models. We compare our fused model with both model-driven and source-driven techniques by conducting subjective and objective experiments. The results show that although model-based separation delivers the best quality in the speaker-dependent case, the integrated model outperforms the individual approaches in a speaker-independent scenario. This result supports the idea that the human auditory system draws on both grouping cues (e.g., pitch tracking) and a priori knowledge (e.g., trained quantized envelopes) to segregate speech signals.
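
The synthesis stage can be pictured with a minimal harmonic synthesizer: given one speaker's estimated pitch and spectrum envelope, a voiced frame is rebuilt as a sum of harmonics. The sampling rate, frame length, and linear envelope interpolation are illustrative assumptions.

```python
import numpy as np

def harmonic_synth(f0, envelope, sr=8000, n=1024):
    # Rebuild a voiced frame as a sum of harmonics of f0, each harmonic's
    # amplitude read off the estimated spectrum envelope (assumed sampled
    # uniformly from 0 Hz to sr/2).
    t = np.arange(n) / sr
    bins = np.linspace(0.0, sr / 2, len(envelope))
    x = np.zeros(n)
    for k in range(1, int((sr / 2) // f0) + 1):
        x += np.interp(k * f0, bins, envelope) * np.cos(2 * np.pi * k * f0 * t)
    return x
```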


IEEE Transactions on Image Processing | 2012

Image Registration Under Illumination Variations Using Region-Based Confidence Weighted M-Estimators

Mohamed M. Fouad; Richard M. Dansereau; Anthony Whitehead

We present an image registration model for image sets with arbitrarily shaped local illumination variations between images. Any nongeometric variations tend to degrade geometric registration precision and impact subsequent processing. Traditional image registration approaches do not typically account for changes and movement of light sources, which result in interimage illumination differences with arbitrary shape. In addition, these approaches typically use a least-squares estimator that is sensitive to outliers, and interimage illumination variations are often large enough to act as outliers. In this paper, we propose an image registration approach that compensates for arbitrarily shaped interimage illumination variations by segmenting them into regions, each processed using a robust M-estimator tuned to that region. Each M-estimator for each illumination region has a distinct cost function by which small and large interimage residuals are unevenly penalized. Since the segmentation of the interimage illumination variations may not be perfect, a segmentation confidence weighting is also imposed to reduce the negative effect of mis-segmentation around illumination region boundaries. The proposed approach is cast in an iterative coarse-to-fine framework, which allows a convergence rate similar to competing intensity-based image registration approaches. The overall approach is presented in a general framework, but the experimental results use the bisquare M-estimator with region segmentation confidence weighting. A nearly tenfold improvement in subpixel registration precision is seen with the proposed technique when convergence is attained, as compared with competing techniques, using both simulated and real data sets with interimage illumination variations.
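
For concreteness, a minimal sketch of the bisquare M-estimator weighting with segmentation confidence follows; c = 4.685 is the standard bisquare tuning constant, and the pixelwise confidence array is an assumption standing in for the paper's region segmentation confidence.

```python
import numpy as np

def bisquare_weights(r, c=4.685):
    # Tukey bisquare: small residuals keep weight ~1, large residuals are
    # downweighted, and residuals beyond the cutoff c get zero weight.
    u = np.clip(np.abs(r) / c, 0.0, 1.0)
    return (1.0 - u ** 2) ** 2

def robust_cost(residuals, confidence):
    # Segmentation confidence scales each pixel's robust weight, so pixels
    # near uncertain illumination-region boundaries influence the fit less.
    w = bisquare_weights(residuals) * confidence
    return np.sum(w * residuals ** 2)
```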


instrumentation and measurement technology conference | 2004

Robust joint audio-video talker localization in video conferencing using reliability information-II: Bayesian network fusion

David Lo; Rafik A. Goubran; Richard M. Dansereau

This study builds on our previous IMTC03 paper (D. Lo, R. Goubran et al., Proc. 20th IEEE Instrum. Meas. Technol. Conf., vol. 2, pp. 1414-1418, 2003). Both this study and our previous paper use data fusion to combine results from multiple audio and video localizers. The two studies differ in the type of data fusion engine used: the former study explored the use of a summing voter, whereas the current study employs a Bayesian network. The novelty of both papers is the use of reliability estimates to improve overall localization performance and robustness. Reliability estimates, derived from known physical properties of each individual localizer, were introduced into the fusion engines to achieve better performance. Although the summing voter fusion engine used in the previous paper improves overall localization performance, it does not take into account the unique characteristics of each localizer. The Bayesian network allows these characteristics to be included as part of the fusion process. In this study, we investigate the impact of (1) using a Bayesian network as the data fusion engine, and (2) adding reliability estimates into the fusion engine.
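
A minimal sketch of reliability-aware Bayesian fusion over a discrete grid of talker positions follows; treating each localizer's reliability as an exponent on its likelihood (log-linear pooling) is an illustrative assumption, not necessarily the paper's network structure.

```python
import numpy as np

def bayes_fuse(prior, likelihoods, reliabilities):
    # prior:       (n_positions,) positive prior over candidate positions
    # likelihoods: (n_localizers, n_positions) per-localizer evidence
    # A reliability near 0 flattens that localizer's evidence toward
    # "uninformative"; a reliability of 1 uses it at full strength.
    log_post = np.log(prior)
    for lik, rel in zip(likelihoods, reliabilities):
        log_post += rel * np.log(lik)
    post = np.exp(log_post - log_post.max())   # normalize stably
    return post / post.sum()
```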


international conference on acoustics, speech, and signal processing | 2010

Scaled factorial hidden Markov models: A new technique for compensating gain differences in model-based single channel speech separation

Mohammad H. Radfar; Willy Wong; Richard M. Dansereau; Wai-Yip Chan

In model-based single channel speech separation, factorial hidden Markov models (FHMMs) have been successfully applied to model the mixture signal Y(t) = X(t) + V(t) in terms of trained patterns of the speech signals X(t) and V(t). Nonetheless, when the test signals are scaled versions of the trained patterns (i.e., gxX(t) and gvV(t)), the performance of the FHMM degrades significantly. In this paper, we introduce a modification to the FHMM, called the scaled FHMM, which compensates for gain differences. In this technique, the scale factors are first expressed in terms of the target-to-interference ratio (TIR). Then, an iterative quadratic optimization approach is coupled with the FHMM to estimate the TIR that, together with the decoded HMM state sequences, maximizes the likelihood of the mixture signal. Experimental results, conducted on 180 mixtures with TIRs from 0 to 15 dB, show that the proposed technique significantly outperforms unscaled FHMM and scaled/unscaled vector quantization speech separation techniques.
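
The gain-compensation idea can be sketched as an alternating least-squares fit of the two scale factors, from which the TIR follows; working on hypothetical decoded source estimates x_hat and v_hat (rather than the paper's log-spectral likelihood) is a simplifying assumption.

```python
import numpy as np

def estimate_tir(y, x_hat, v_hat, iters=10):
    # Alternately refit the gains so that gx*x_hat + gv*v_hat ~ y in the
    # least-squares sense, then report TIR = 20*log10(gx/gv) in dB.
    gx = gv = 1.0
    for _ in range(iters):
        gx = np.dot(y - gv * v_hat, x_hat) / np.dot(x_hat, x_hat)
        gv = np.dot(y - gx * x_hat, v_hat) / np.dot(v_hat, v_hat)
    return 20.0 * np.log10(abs(gx) / abs(gv))
```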


Speech Communication | 2007

Noise estimation using speech/non-speech frame decision and subband spectral tracking

Zhong Lin; Rafik A. Goubran; Richard M. Dansereau

As a fundamental part of single-microphone speech quality enhancement, noise power spectrum estimation is particularly challenging in adverse environments with low signal-to-noise ratio (SNR) and highly non-stationary background noise. In this paper, we propose a novel scheme that incorporates human speech properties, such as the pitch properties of voiced speech and the statistical properties of the durations of unvoiced speech, into subband spectral tracking to estimate the power spectrum of non-stationary noise. We show that the proposed method estimates the power spectrum more accurately and more quickly when the noise is highly non-stationary, tracking bursts of noise 4-6 times faster than competitive methods. We also show that the mean square error of the noise spectrum estimated by the proposed method is 15% lower on average than that of competitive methods. The proposed algorithm is then combined with a conventional MMSE-STSA estimator and its overall performance is tested in a speech enhancement application. Simulation results show that the segmental SNR improvement of the proposed system is on average 0.9 dB higher than the competitive system, and the mean opinion score (MOS) improvement is on average 0.17 higher.
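
A minimal sketch of the speech/non-speech gated subband tracking idea: freeze the noise estimate while a frame is flagged as speech, and recursively smooth it otherwise. The smoothing constant and the externally supplied speech/non-speech flags are illustrative assumptions; the paper's tracker additionally uses pitch and unvoiced-duration properties to make that decision.

```python
import numpy as np

def track_noise(frame_powers, speech_flags, alpha=0.9):
    # frame_powers: sequence of (n_subbands,) power spectra per frame
    # speech_flags: sequence of bools, True when the frame contains speech
    noise = np.array(frame_powers[0], dtype=float)
    for p, is_speech in zip(frame_powers[1:], speech_flags[1:]):
        if not is_speech:                       # update only in noise frames
            noise = alpha * noise + (1.0 - alpha) * np.asarray(p)
    return noise
```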


international conference on image processing | 2009


Mohamed M. Fouad; Richard M. Dansereau; Anthony Whitehead

In this paper, we focus on the sub-pixel geometric registration of images with arbitrarily-shaped local intensity variations, particularly those due to shadows. Intensity variations tend to degrade the performance of geometric registration, thereby degrading subsequent processing. To handle them, we propose a registration model with illumination correction that can handle arbitrarily-shaped regions of local intensity variations. The approach is set in an iterative coarse-to-fine framework with steps to estimate the geometric registration with illumination correction and steps to refine the arbitrarily-shaped local intensity regions. The results show that this model outperforms a linear scalar model by a factor of 6.8 in sub-pixel registration accuracy.
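
The alternation described above can be sketched as a coarse-to-fine loop; the callbacks estimate_motion and correct_illum are hypothetical placeholders for the paper's geometric and illumination-correction steps, and the six-parameter affine motion vector is an assumption.

```python
import numpy as np

def coarse_to_fine_register(ref_pyr, tgt_pyr, estimate_motion, correct_illum):
    # ref_pyr, tgt_pyr: image pyramids ordered coarsest to finest.
    # At each level, refine the illumination-corrected target, then refine
    # the geometric parameters, and carry both refinements to the next level.
    params = np.zeros(6)                 # e.g. affine motion parameters
    for ref, tgt in zip(ref_pyr, tgt_pyr):
        tgt = correct_illum(ref, tgt)    # refine local intensity regions
        params = estimate_motion(ref, tgt, params)
    return params
```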


IEEE Transactions on Circuits and Systems for Video Technology | 2012


Jie Zhu; Richard M. Dansereau

In this paper, we present a multiple description video coding algorithm based on error-resilient and error-concealment set partitioning in hierarchical trees (ERC-SPIHT). In the proposed approach, additional redundancy is generated by wavelet decomposing the spatial root subband, and this redundancy is then intentionally inserted into the substreams. As a result, the novelty of the proposed approach is that root subband coefficients lost during transmission in any substream can be reconstructed by exploiting both the inherent redundancy and the inserted redundancy. This reconstruction procedure is implemented in two steps: first using existing 2-D error concealment techniques, and second with the proposed root subband recovery approach. The former step estimates the missing coefficients in the spatial root and high-frequency subbands by exploiting the inherent redundancy, while the latter utilizes the inserted redundancy to further improve the precision of the estimated missing spatial root subband coefficients. The proposed root subband recovery method can be applied iteratively, and the accuracy of the reconstruction gradually increases with each iteration. Experimental results on different video sequences show that the proposed method maintains error resilience with high coding efficiency. In particular, our results demonstrate that the proposed algorithm improves video quality by up to 2.5753 dB in the presence of a substream loss compared to ERC-SPIHT.
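
The inserted-redundancy step can be pictured with a minimal sketch: lost root subband coefficients are re-estimated by upsampling a duplicated coarse copy carried in a surviving substream. The nearest-neighbour upsampling and the array layout are illustrative assumptions.

```python
import numpy as np

def recover_root(received_root, lost_mask, redundant_coarse):
    # received_root:    (h, w) root subband with holes where data was lost
    # lost_mask:        (h, w) boolean mask of lost coefficients
    # redundant_coarse: (h//2, w//2) inserted-redundancy coarse root copy
    est = received_root.copy()
    up = np.kron(redundant_coarse, np.ones((2, 2)))   # crude 2x upsampling
    est[lost_mask] = up[:est.shape[0], :est.shape[1]][lost_mask]
    return est
```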
