Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Jean Rouat is active.

Publication


Featured research published by Jean Rouat.


Robotics and Autonomous Systems | 2007

Robust localization and tracking of simultaneous moving sound sources using beamforming and particle filtering

Jean-Marc Valin; François Michaud; Jean Rouat

Mobile robots in real-life settings would benefit from being able to localize and track sound sources. Such a capability can help localize a person or an interesting event in the environment, and also provides enhanced processing for other capabilities such as speech recognition. To give this capability to a robot, the challenge is not only to localize simultaneous sound sources, but to track them over time. In this paper we propose a robust sound source localization and tracking method using an array of eight microphones. The method is based on a frequency-domain implementation of a steered beamformer along with a particle filter-based tracking algorithm. Results show that a mobile robot can localize and track in real time multiple moving sources of different types over a range of 7 m. These new capabilities allow a mobile robot to interact using more natural means with people in real-life settings.
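The steered-beamformer idea above can be sketched in a few lines: scan candidate directions, undo the inter-microphone delays each direction implies, and keep the direction whose aligned sum carries the most energy. The snippet below is a minimal illustrative sketch with a 4-microphone linear array and a simulated far-field tone, not the paper's 8-microphone system or its particle filter; all function names and parameter values are invented for the example.

```python
import numpy as np

C = 343.0   # speed of sound (m/s)
FS = 16000  # sample rate (Hz)

def steered_response_energy(frames, mic_x, directions):
    """Frequency-domain delay-and-sum: for each candidate direction,
    compensate the inter-microphone delays and measure output energy.
    frames: (n_mics, n_samples) time-domain snapshot."""
    n_mics, n = frames.shape
    spectra = np.fft.rfft(frames, axis=1)
    freqs = np.fft.rfftfreq(n, d=1.0 / FS)
    energies = []
    for theta in directions:
        # far-field delay of each mic of a linear array on the x axis
        delays = mic_x * np.cos(theta) / C
        shifts = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
        beam = (spectra * shifts).sum(axis=0)
        energies.append(np.sum(np.abs(beam) ** 2))
    return np.array(energies)

# Simulate a 1 kHz source at 60 degrees hitting a 4-mic linear array
mic_x = np.array([0.0, 0.05, 0.10, 0.15])
theta_true = np.deg2rad(60)
t = np.arange(512) / FS
frames = np.stack([np.sin(2 * np.pi * 1000 * (t - x * np.cos(theta_true) / C))
                   for x in mic_x])
grid = np.deg2rad(np.arange(0, 181, 5))
est = np.rad2deg(grid[np.argmax(steered_response_energy(frames, mic_x, grid))])
```

The grid search over directions is the "steering" step; the paper's method additionally feeds these per-frame estimates to a particle filter to track the sources over time.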


IEEE Signal Processing Letters | 2001

Wavelet speech enhancement based on the Teager energy operator

Mohammed Bahoura; Jean Rouat

We propose a new speech enhancement method based on the time adaptation of wavelet thresholds. The time dependence is introduced by approximating the Teager energy of the wavelet coefficients. This technique does not require an explicit estimation of the noise level or a priori knowledge of the SNR, which are usually needed in most popular enhancement methods. Performance of the proposed method is evaluated on speech recorded in real conditions and with artificial noise.
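The Teager energy operator underlying the method has a simple discrete form, psi[x](n) = x(n)^2 - x(n-1)x(n+1). The sketch below computes it and uses it to modulate a soft threshold on a toy coefficient sequence; the exact adaptation rule here (an exponential shrink of the threshold where energy is high) is an illustrative stand-in, not the paper's formula.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: psi[x](n) = x(n)^2 - x(n-1)*x(n+1).
    Tracks the instantaneous energy of an oscillation."""
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    psi[0], psi[-1] = psi[1], psi[-2]
    return psi

def soft_threshold(coeffs, thresholds):
    """Soft-threshold coefficients with a per-sample threshold."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - thresholds, 0.0)

# Toy coefficients: a short 'speech' burst over background noise
rng = np.random.default_rng(0)
coeffs = 0.1 * rng.standard_normal(256)
coeffs[100:140] += np.sin(2 * np.pi * 0.2 * np.arange(40))

# Smooth the Teager energy, then shrink the threshold where energy is high
energy = np.convolve(teager_energy(coeffs), np.ones(9) / 9, mode="same")
thr = 0.3 * np.exp(-energy / (energy.max() + 1e-12))
denoised = soft_threshold(coeffs, thr)
```

The effect is the one the abstract describes: where the Teager energy indicates speech, the threshold backs off and the coefficients survive; elsewhere the fixed threshold wipes out the noise, with no explicit noise-level estimate required.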


intelligent robots and systems | 2004

Enhanced robot audition based on microphone array source separation with post-filter

Jean-Marc Valin; Jean Rouat; François Michaud

We propose a system that gives a mobile robot the ability to separate simultaneous sound sources. A microphone array is used along with a real-time dedicated implementation of geometric source separation and a post-filter that further reduces interference from other sources. We present results and comparisons for the separation of multiple non-stationary speech sources combined with noise sources. The main advantage of our approach for mobile robots is that both the frequency-domain geometric source separation algorithm and the post-filter adapt rapidly to new sources and non-stationarity. Separation results are presented for three simultaneous interfering speakers in the presence of noise. A reduction in log spectral distortion (LSD) of approximately 10 dB and an increase in signal-to-noise ratio (SNR) of approximately 14 dB are observed.


Speech Communication | 1997

A pitch determination and voiced/unvoiced decision algorithm for noisy speech

Jean Rouat; Yong Chun Liu; Daniel Morissette

The design of a pitch tracking system for noisy speech is a challenging and still unsolved issue, combining "traditional" pitch determination problems with those of noise processing. We have developed a multi-channel pitch determination algorithm (PDA) that has been tested on three speech databases (0 dB SNR telephone speech, speech recorded in a car, and clean speech) involving fifty-eight speakers. Our system has been compared to a multi-channel PDA based on auditory modelling (AMPEX), and to hand-labelled and laryngograph pitch contours. Our PDA comprises an automatic channel selection module and a pitch extraction module that relies on a pseudo-periodic histogram (a combination of normalised scalar products over the less corrupted channels) to find pitch. Our PDA outperformed the reference system on 0 dB telephone and car speech. The automatic selection of channels was effective on the very noisy telephone speech (0 dB) but contributed less on car speech, where the robustness of the system relative to AMPEX is mainly due to the pitch extraction module. This paper reports in detail the voiced/unvoiced and unvoiced/voiced performance and the pitch estimation errors for the proposed PDA and the reference system on the three speech databases.
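The normalised scalar product at the heart of the pitch extraction module can be illustrated directly: correlate a frame with a lagged copy of itself, normalised by both norms, and pick the lag that scores highest. This single-channel toy (a synthetic 125 Hz frame, with invented frame and lag parameters) omits the multi-channel selection and histogram combination described above.

```python
import numpy as np

FS = 8000  # sample rate (Hz)

def normalized_scalar_products(x, min_lag, max_lag, n=256):
    """Normalised scalar product of a frame with its lagged copy,
    used as a periodicity score for each candidate pitch period."""
    scores = np.zeros(max_lag + 1)
    for lag in range(min_lag, max_lag + 1):
        a, b = x[:n], x[lag:lag + n]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        scores[lag] = float(a @ b) / denom if denom > 0 else 0.0
    return scores

# A 'voiced' frame at 125 Hz: the period is exactly 64 samples at 8 kHz
t = np.arange(1024) / FS
frame = np.sin(2 * np.pi * 125 * t) + 0.3 * np.sin(2 * np.pi * 250 * t)

scores = normalized_scalar_products(frame, min_lag=40, max_lag=100)
period = int(np.argmax(scores))
f0 = FS / period
```

A voiced/unvoiced decision can then be taken by thresholding the winning score; in the paper's system, scores from the less corrupted channels are pooled into a pseudo-periodic histogram before the pitch is read off.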


international conference on robotics and automation | 2004

Localization of simultaneous moving sound sources for mobile robot using a frequency-domain steered beamformer approach

Jean-Marc Valin; François Michaud; Brahim Hadjou; Jean Rouat

Mobile robots in real-life settings would benefit from being able to localize sound sources. Such a capability can nicely complement vision to help localize a person or an interesting event in the environment, and also to provide enhanced processing for other capabilities such as speech recognition. We present a robust sound source localization method in three-dimensional space using an array of 8 microphones. The method is based on a frequency-domain implementation of a steered beamformer along with a probabilistic post-processor. Results show that a mobile robot can localize in real time multiple moving sources of different types over a range of 5 meters with a response time of 200 ms.


IEEE Transactions on Robotics | 2007

Robust Recognition of Simultaneous Speech by a Mobile Robot

Jean-Marc Valin; Shunichi Yamamoto; Jean Rouat; François Michaud; Kazuhiro Nakadai; Hiroshi G. Okuno

This paper describes a system that gives a mobile robot the ability to perform automatic speech recognition with simultaneous speakers. A microphone array is used along with a real-time implementation of geometric source separation (GSS) and a postfilter that gives a further reduction of interference from other sources. The postfilter is also used to estimate the reliability of spectral features and compute a missing feature mask. The mask is used in a missing feature theory-based speech recognition system to recognize the speech from simultaneous Japanese speakers in the context of a humanoid robot. Recognition rates are presented for three simultaneous speakers located at 2 m from the robot. The system was evaluated on a 200-word vocabulary at different azimuths between sources, ranging from 10° to 90°. Compared to the use of the microphone array source separation alone, we demonstrate an average reduction in relative recognition error rate of 24% with the postfilter and of 42% when the missing features approach is combined with the postfilter. We demonstrate the effectiveness of our multisource microphone array postfilter and the improvement it provides when used in conjunction with the missing features theory.
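A missing feature mask of the kind described can be sketched as a simple threshold on the estimated local SNR of each time-frequency cell; cells below the threshold are flagged as unreliable and marginalised by the MFT recogniser. The numbers and the SNR rule below are illustrative, not the paper's postfilter-based estimate.

```python
import numpy as np

def missing_feature_mask(separated_power, noise_power, snr_threshold_db=0.0):
    """Binary reliability mask over spectral features: a time-frequency
    cell is marked reliable (1) when its estimated local SNR clears the
    threshold; unreliable cells (0) are ignored by an MFT recogniser."""
    snr_db = 10 * np.log10(separated_power / np.maximum(noise_power, 1e-12))
    return (snr_db >= snr_threshold_db).astype(float)

# Toy spectrogram: 2 frames x 4 spectral bands of separated speech power,
# against a flat residual-interference estimate
sep = np.array([[4.0, 0.5, 9.0, 0.1],
                [1.0, 8.0, 0.2, 2.0]])
noise = np.ones_like(sep)  # assumed residual interference estimate
mask = missing_feature_mask(sep, noise)
```

In the paper's system the reliability estimate comes from the postfilter's per-bin gains rather than from a separate noise estimate, but the thresholding step is the same in spirit.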


international conference on robotics and automation | 2005

Enhanced Robot Speech Recognition Based on Microphone Array Source Separation and Missing Feature Theory

Shun’ichi Yamamoto; Jean-Marc Valin; Kazuhiro Nakadai; Jean Rouat; François Michaud; Tetsuya Ogata; Hiroshi G. Okuno

A humanoid robot in real-world environments usually hears mixtures of sounds, so three capabilities are essential for robot audition: sound source localization, separation, and recognition of the separated sounds. While the first two are frequently addressed, the last has received much less attention. We present a system that gives a humanoid robot the ability to localize, separate and recognize simultaneous sound sources. A microphone array is used along with a real-time dedicated implementation of Geometric Source Separation (GSS) and a multi-channel post-filter that further reduces interference from other sources. An automatic speech recognizer (ASR) based on Missing Feature Theory (MFT) recognizes the separated sounds in real time by generating missing feature masks automatically from the post-filtering step. The main advantage of this approach for humanoid robots is that an ASR with a clean acoustic model can adapt to the distortion of the separated sound by consulting the post-filter feature masks. Recognition rates are presented for three simultaneous speakers located at 2 m from the robot. Use of both the post-filter and the missing feature mask results in an average relative reduction in error rate of 42%.


international conference on acoustics, speech, and signal processing | 2006

Robust 3D Localization and Tracking of Sound Sources Using Beamforming and Particle Filtering

Jean-Marc Valin; François Michaud; Jean Rouat

In this paper we present a new robust sound source localization and tracking method using an array of eight microphones (US patent pending). The method uses a steered beamformer based on the reliability-weighted phase transform (RWPHAT) along with a particle filter-based tracking algorithm. The proposed system is able to estimate both the direction and the distance of the sources. In a videoconferencing context, the direction was estimated with an accuracy better than one degree while the distance was accurate within 10% RMS. Tracking of up to three simultaneous moving speakers is demonstrated in a noisy environment.
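RWPHAT builds on the classic GCC-PHAT cross-correlation, adding a per-frequency-bin reliability weight. The sketch below implements plain GCC-PHAT time-difference-of-arrival estimation between two microphones (the reliability weighting itself is omitted); the signal lengths and test delay are invented for illustration.

```python
import numpy as np

def gcc_phat(x, y, fs):
    """Phase-transform cross-correlation between two microphone signals.
    PHAT whitening keeps only the phase, sharpening the correlation peak;
    RWPHAT additionally weights each bin by an SNR-based reliability.
    Returns the TDOA in seconds (positive: y arrives later than x)."""
    n = len(x) + len(y)
    X, Y = np.fft.rfft(x, n), np.fft.rfft(y, n)
    cross = np.conj(X) * Y
    cross /= np.maximum(np.abs(cross), 1e-12)  # PHAT weighting
    cc = np.fft.irfft(cross, n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs

# A broadband signal arriving 12 samples later at the second microphone
fs = 16000
rng = np.random.default_rng(1)
sig = rng.standard_normal(512)
delayed = np.concatenate((np.zeros(12), sig))[:512]
tdoa = gcc_phat(sig, delayed, fs)
```

Scanning a grid of candidate source positions and summing such whitened correlations across all microphone pairs yields the steered-beamformer energy map from which direction and distance are read off.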


international symposium on neural networks | 2005

Exploration of rank order coding with spiking neural networks for speech recognition

Stéphane Loiselle; Jean Rouat; Daniel Pressnitzer; Simon J. Thorpe

Speech recognition is very difficult in the context of noisy and corrupted speech. Most conventional techniques need huge databases to estimate speech (or noise) density probabilities to perform recognition. We discuss the potential of perceptive speech analysis and processing in combination with biologically plausible neural network processors. We illustrate the potential of such non-linear processing of speech by means of a preliminary test on the recognition of spoken French digits from a small speech database.
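Rank order coding keeps only the order in which neurons fire, with the strongest input firing first and earlier spikes weighted more heavily at the decoder. Below is a minimal sketch of encoding and Thorpe-style rank decoding; the geometric desensitisation factor and the toy weights are invented for illustration.

```python
import numpy as np

def rank_order_code(activations):
    """Rank order code: keep only the ORDER in which units fire
    (strongest input fires first), discarding exact magnitudes."""
    return np.argsort(-np.asarray(activations, dtype=float), kind="stable")

def rank_order_similarity(order, weights, shunt=0.9):
    """Decode by reading weights in firing order, with a geometrically
    decaying gain so early spikes count more than late ones."""
    gains = shunt ** np.arange(len(order))
    return float(np.dot(gains, weights[order]))

# Toy: the stored pattern's weights favour units firing in order 2, 0, 1
weights = np.array([0.5, 0.2, 1.0])
matching = rank_order_similarity(rank_order_code([0.3, 0.1, 0.9]), weights)
scrambled = rank_order_similarity(rank_order_code([0.9, 0.3, 0.1]), weights)
```

An input whose firing order matches the stored pattern scores higher than a scrambled one, which is what makes a single wave of first spikes sufficient for fast recognition.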


non linear speech processing | 2006

Wavelet speech enhancement based on time-scale adaptation

Mohammed Bahoura; Jean Rouat

We propose a new speech enhancement method based on time and scale adaptation of wavelet thresholds. The time dependency is introduced by approximating the Teager energy of the wavelet coefficients, while the scale dependency is introduced by extending the principle of level-dependent thresholds to wavelet packet thresholding. This technique does not require an explicit estimation of the noise level or a priori knowledge of the SNR, as is usually needed in most popular enhancement methods. Performance of the proposed method is evaluated on speech recorded in real conditions (plane, sawmill, tank, subway, babble, car, exhibition hall, restaurant, street, airport, and train station) and with artificially added noise. Mel-scale decomposition based on wavelet packets is also compared to the common wavelet packet scale. Comparison in terms of signal-to-noise ratio (SNR) is reported for time adaptation and time-scale adaptation of the wavelet coefficient thresholds. Visual inspection of spectrograms and listening experiments are also used to support the results. Hidden Markov Model speech recognition experiments conducted on the AURORA-2 database show that the proposed method improves speech recognition rates at low SNRs.

Collaboration


Dive into Jean Rouat's collaborations.

Top Co-Authors

Ramin Pichevar (Université de Sherbrooke)

Hassan Ezzaidi (Université du Québec à Chicoutimi)

Lyes Bachatene (Université de Montréal)

Simon Brodeur (Université de Sherbrooke)

Sarah Cattan (Université de Montréal)

Mohammed Bahoura (Université du Québec à Rimouski)

Jean-Marc Valin (Université de Sherbrooke)