Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Driss Matrouf is active.

Publication


Featured research published by Driss Matrouf.


international conference on multimedia and expo | 2012

Bi-Modal Person Recognition on a Mobile Phone: Using Mobile Phone Data

Chris McCool; Sébastien Marcel; Abdenour Hadid; Matti Pietikäinen; Pavel Matejka; Jan Černocký; Norman Poh; Josef Kittler; Anthony Larcher; Christophe Lévy; Driss Matrouf; Jean-François Bonastre; Phil Tresadern; Timothy F. Cootes

This paper presents a novel, fully automatic bi-modal (face and speaker) recognition system which runs in real time on a mobile phone. The implemented system runs in real time on a Nokia N900 and demonstrates the feasibility of performing both automatic face and speaker recognition on a mobile phone. We evaluate this recognition system on a novel, publicly available mobile phone database and provide a well-defined evaluation protocol. This database was captured almost exclusively using mobile phones and aims to improve research into deploying biometric techniques to mobile devices. We show, on this mobile phone database, that face and speaker recognition can be performed in a mobile environment, and that score fusion can improve performance by more than 25% in terms of error rates.
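The abstract does not state which fusion rule was used, so the following is only a minimal sketch of one common score-fusion scheme: min-max normalise each modality's scores with bounds estimated on development data, then take a weighted sum. The weight and bounds below are hypothetical placeholders, not values from the paper.

```python
import numpy as np

def minmax_normalize(scores, lo, hi):
    """Map raw scores into [0, 1] using score bounds estimated on development data."""
    return (np.asarray(scores, dtype=float) - lo) / (hi - lo)

def fuse(face_scores, speaker_scores, w=0.6,
         face_bounds=(-10.0, 10.0), spk_bounds=(-5.0, 5.0)):
    """Weighted sum of normalized face and speaker scores (higher = more client-like)."""
    f = minmax_normalize(face_scores, *face_bounds)
    s = minmax_normalize(speaker_scores, *spk_bounds)
    return w * f + (1.0 - w) * s

fused = fuse([2.0, -4.0], [1.0, 3.0])
```

A single threshold on the fused score then drives the accept/reject decision.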


IEEE Signal Processing Magazine | 2009

Forensic speaker recognition

Joseph P. Campbell; Wade Shen; William M. Campbell; Reva Schwartz; Jean-François Bonastre; Driss Matrouf

Looking at the different points highlighted in this article, we affirm that forensic applications of speaker recognition should still be approached with the necessary caution. Disseminating this message remains one of the most important responsibilities of speaker recognition researchers.


text speech and dialogue | 2007

The LIA speech recognition system: from 10xRT to 1xRT

Georges Linarès; Pascal Nocera; Dominique Massonié; Driss Matrouf

The LIA developed a speech recognition toolkit providing most of the components required by speech-to-text systems. This toolbox allowed us to build a Broadcast News (BN) transcription system that was involved in the ESTER evaluation campaign ([1]), on unconstrained-transcription and real-time transcription tasks. In this paper, we describe the techniques we used to reach real time, starting from our baseline 10xRT system. We focus on some aspects of the A* search algorithm which are critical for both efficiency and accuracy. Then, we evaluate the impact of the different system components (lexicon, language models and acoustic models) on the trade-off between efficiency and accuracy. Experiments are carried out in the framework of the ESTER evaluation campaign. Our results show that the real-time system performs about 5.6% absolute WER (Word Error Rate) worse than the standard 10xRT system, reaching an absolute WER of about 26.8%.
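The decoder described above relies on A* search. The paper's decoder operates on lexicon trees and acoustic/language-model scores, which is far more involved than can be shown here; as a minimal illustration of the underlying algorithm only, here is a generic best-first A* over an explicit graph with an admissible heuristic (graph and heuristic are toy examples, not from the paper).

```python
import heapq

def a_star(graph, start, goal, h):
    """Generic A* search. graph maps node -> [(neighbor, edge_cost)];
    h(node) is an admissible estimate of the remaining cost to goal."""
    frontier = [(h(start), 0.0, start, [start])]  # (f = g + h, g, node, path)
    best_g = {}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if node in best_g and best_g[node] <= g:
            continue  # already expanded with a cheaper partial path
        best_g[node] = g
        for nxt, cost in graph.get(node, []):
            heapq.heappush(frontier, (g + cost + h(nxt), g + cost, nxt, path + [nxt]))
    return None

graph = {'a': [('b', 1.0), ('c', 4.0)], 'b': [('c', 1.0), ('d', 5.0)], 'c': [('d', 1.0)]}
cost, path = a_star(graph, 'a', 'd', lambda n: 0.0)
```

In a real-time decoder the trade-off the paper studies comes from how aggressively the frontier is pruned and how informative the heuristic is.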


international conference on acoustics, speech, and signal processing | 2012

I-vectors in the context of phonetically-constrained short utterances for speaker verification

Anthony Larcher; Pierre-Michel Bousquet; Kong Aik Lee; Driss Matrouf; Haizhou Li; Jean-François Bonastre

Short speech duration remains a critical factor of performance degradation when deploying a speaker verification system. To overcome this difficulty, a large number of commercial applications impose the use of fixed pass-phrases. In this context, we show that the performance of the popular i-vector approach can be greatly improved by taking advantage of the phonetic information that such utterances convey. Moreover, as i-vectors require a conditioning process to reach high accuracy, we show that further improvements are possible by exploiting this phonetic information within the normalisation process. We compare two methods, Within Class Covariance Normalization (WCCN) and Eigen Factor Radial (EFR), both relying on parameters estimated on the same development data. Our study suggests that WCCN is more robust to data mismatch but less effective than EFR when the development data closely matches the test data.
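As a rough sketch of the two conditioning steps compared above: WCCN estimates the within-class (within-speaker) covariance on development i-vectors and whitens with its inverse, while EFR-style conditioning ends with a projection onto the unit sphere. This is a simplified, assumption-laden version (single WCCN pass, no iterative EFR standardisation), not the paper's exact pipeline.

```python
import numpy as np

def wccn_projection(ivectors, labels):
    """Estimate the WCCN projection B from labelled development i-vectors,
    chosen so that B @ B.T equals the inverse within-class covariance."""
    ivectors = np.asarray(ivectors, dtype=float)
    labels = np.asarray(labels)
    dim = ivectors.shape[1]
    W = np.zeros((dim, dim))
    speakers = np.unique(labels)
    for spk in speakers:
        X = ivectors[labels == spk]
        Xc = X - X.mean(axis=0)          # centre per speaker
        W += Xc.T @ Xc / len(X)
    W /= len(speakers)
    return np.linalg.cholesky(np.linalg.inv(W))

def length_normalize(v):
    """Project an i-vector onto the unit sphere (final EFR-style step)."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)
```

A test i-vector would be mapped as `length_normalize(B.T @ ivec)` before scoring.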


international conference on acoustics, speech, and signal processing | 2006

Effect of Speech Transformation on Impostor Acceptance

Driss Matrouf; Jean-François Bonastre; Corinne Fredouille

This paper investigates the effect of voice transformation on automatic speaker recognition system performance. We focus on increasing the impostor acceptance rate by modifying the voice of an impostor in order to target a specific speaker. This paper is based on the following idea: in several applications, and particularly in forensic situations, it is reasonable to think that some organizations have knowledge of the speaker recognition method used and could impersonate a given, well-known speaker. This paper presents some experiments based on the NIST SRE 2005 protocol and a simple impostor voice transformation method. The results show that this simple voice transformation allows a drastic increase of the false acceptance rate without degrading the natural aspect of the voice.


international conference on pattern recognition | 2010

Model and Score Adaptation for Biometric Systems: Coping With Device Interoperability and Changing Acquisition Conditions

Norman Poh; Josef Kittler; Sébastien Marcel; Driss Matrouf; Jean-François Bonastre

The performance of biometric systems can be significantly affected by changes in signal quality. In this paper, two types of changes are considered: changes in the acquisition environment and in the sensing devices. We investigated three solutions: (i) model-level adaptation, (ii) score-level adaptation (normalisation), and (iii) the combination of the two, called “compound” adaptation. In order to cope with the above changing conditions, model-level adaptation attempts to update the parameters of the expert systems (classifiers). This approach requires that the authenticity of the candidate samples used for adaptation be known (supervised adaptation) or estimated (unsupervised adaptation). In comparison, score-level adaptation merely involves post-processing the expert output, with the objective of rendering the associated decision threshold dependent only on the class priors despite the changing acquisition conditions. Since the above adaptation strategies treat the underlying biometric experts/classifiers as a black box, they can be applied to any unimodal or multimodal biometric system, thus facilitating system-level integration and performance optimisation. Our contributions are: (i) the proposal of compound adaptation; (ii) the investigation and comparison of two different quality-dependent score normalisation strategies; and (iii) an empirical comparison of the merits of the above three solutions on the BANCA face (video) and speech database.
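The abstract does not give the exact normalisation formulas, so the following is only a minimal sketch of the general idea behind quality-dependent score normalisation: shift and scale each raw score using impostor statistics estimated per acquisition condition, so that a single prior-dependent threshold can be shared across conditions. The condition names and statistics below are hypothetical.

```python
def condition_znorm(score, condition, impostor_stats):
    """Score-level adaptation sketch: normalise a raw expert score with
    impostor mean/std estimated for the sample's acquisition condition."""
    mu, sigma = impostor_stats[condition]
    return (score - mu) / sigma

# Hypothetical development-set impostor statistics per condition.
stats = {"controlled": (0.0, 1.0), "degraded": (-2.0, 2.0)}
z = condition_znorm(2.0, "degraded", stats)
```

Because this treats the expert as a black box, it composes freely with model-level adaptation, which is the essence of the “compound” strategy.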


international conference on acoustics speech and signal processing | 1996

Developments in continuous speech dictation using the 1995 ARPA NAB news task

Jean-Luc Gauvain; Lori Lamel; Gilles Adda; Driss Matrouf

We report on the LIMSI recognizer evaluated in the ARPA 1995 North American Business (NAB) news benchmark test. In contrast to previous evaluations, the new Hub 3 test aims at improving basic SI CSR performance on unlimited-vocabulary read speech recorded under more varied acoustical conditions (background environmental noise and unknown microphones). The LIMSI recognizer is an HMM-based system with Gaussian mixture densities. Decoding is carried out in multiple forward acoustic passes, where more refined acoustic and language models are used in successive passes and information is transmitted via word graphs. In order to deal with the varied acoustic conditions, channel compensation is performed iteratively, refining the noise estimates before the first three decoding passes. The final decoding pass is carried out with speaker-adapted models obtained via unsupervised adaptation using the MLLR method. On the Sennheiser microphone (average SNR 29 dB) a word error rate of 9.1% was obtained, compared to 17.5% on the secondary-microphone data (average SNR 15 dB) using the same recognition system.
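The MLLR adaptation mentioned in the final pass estimates an affine transform of the Gaussian means, adapted_mean ≈ A @ mean + b. Proper MLLR weights each Gaussian by its occupancy counts and covariance; the sketch below is a simplified, unweighted least-squares version of the mean update only, with toy data, and is not the LIMSI implementation.

```python
import numpy as np

def mllr_mean_transform(means, adapted_targets):
    """Estimate a global mean transform W = [b | A] by least squares,
    so that target ≈ A @ mean + b for every Gaussian mean."""
    means = np.asarray(means, dtype=float)
    targets = np.asarray(adapted_targets, dtype=float)
    ext = np.hstack([np.ones((len(means), 1)), means])  # prepend a bias column
    W, *_ = np.linalg.lstsq(ext, targets, rcond=None)
    return W.T  # shape (dim, dim + 1): first column is b, the rest is A

def adapt(mean, W):
    """Apply the estimated transform to one Gaussian mean."""
    return W[:, 1:] @ np.asarray(mean, dtype=float) + W[:, 0]
```

Because one transform is shared by many Gaussians, a small amount of adaptation speech can move the whole model toward the test speaker.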


2006 IEEE Odyssey - The Speaker and Language Recognition Workshop | 2006

Transfer Function-Based Voice Transformation for Speaker Recognition

Jean-François Bonastre; Driss Matrouf; Corinne Fredouille

This paper investigates the effect of a transfer function-based voice transformation on automatic speaker recognition system performance. We focus on increasing the impostor acceptance rate by modifying the voice of an impostor in order to target a specific speaker. This paper is based on the following idea: in several applications, and particularly in forensic situations, it is reasonable to think that some organizations have knowledge of the speaker recognition method used and could impersonate a given, well-known speaker. We also evaluate the effect of the voice transformation when it is applied to both client and impostor trials. This paper presents some experiments based on the NIST SRE 2005 protocol. The results show that the voice transformation allows a drastic increase of the false acceptance rate without damaging the natural aspect of the voice. It also seems that this kind of voice transformation could be effective for reducing inter-session mismatch.
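The paper's transformation is driven by the recognizer's statistical models; as a much cruder illustration of the transfer-function idea only, the sketch below estimates a per-frequency gain that maps the impostor's average magnitude spectrum onto the target speaker's, then applies it as a filter. All function names and parameters here are hypothetical, not from the paper.

```python
import numpy as np

def transfer_function(impostor_frames, target_frames, n_fft=512):
    """Per-frequency gain mapping the impostor's average magnitude spectrum
    onto the target's (a crude spectral-envelope match)."""
    imp = np.abs(np.fft.rfft(impostor_frames, n_fft, axis=1)).mean(axis=0)
    tgt = np.abs(np.fft.rfft(target_frames, n_fft, axis=1)).mean(axis=0)
    return tgt / np.maximum(imp, 1e-10)  # guard against division by zero

def apply_transform(frame, gain, n_fft=512):
    """Filter one speech frame in the frequency domain with the gain."""
    spec = np.fft.rfft(frame, n_fft)
    return np.fft.irfft(spec * gain, n_fft)
```

Applying such a filter frame by frame preserves the impostor's prosody while shifting the spectral envelope toward the target, which is why the result can still sound natural.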


international conference on acoustics, speech, and signal processing | 2008

Frame-based acoustic feature integration for speech understanding

Loïc Barrault; Christophe Servan; Driss Matrouf; Georges Linarès; R. De Mori

With the purpose of improving spoken language understanding (SLU) performance, a combination of different automatic speech recognition (ASR) systems is proposed. State a posteriori probabilities obtained with systems using different acoustic feature sets are combined with log-linear interpolation. In order to perform a coherent combination of these probabilities, the acoustic models must have the same topology (i.e. the same set of states). For this purpose, a fast and efficient twin model training protocol is proposed. By a wise choice of acoustic feature sets and log-linear interpolation of their likelihood ratios, a substantial concept error rate (CER) reduction has been observed on the test part of the French MEDIA corpus.
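The log-linear interpolation step can be sketched as follows: with both systems sharing the same state set, the combined posterior is proportional to a weighted geometric mean of the per-system posteriors. This is a minimal two-system sketch with an assumed interpolation weight, not the paper's trained configuration.

```python
import numpy as np

def loglinear_combine(posteriors_a, posteriors_b, lam=0.5):
    """Log-linear interpolation of two state posterior distributions over
    the same set of HMM states: p ∝ a**lam * b**(1 - lam), renormalised."""
    a = np.asarray(posteriors_a, dtype=float)
    b = np.asarray(posteriors_b, dtype=float)
    log_p = lam * np.log(a) + (1.0 - lam) * np.log(b)
    p = np.exp(log_p - log_p.max())  # subtract max for numerical stability
    return p / p.sum()

p = loglinear_combine([0.7, 0.2, 0.1], [0.5, 0.4, 0.1])
```

The shared topology requirement in the abstract is exactly what makes the element-wise combination above well defined.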


international conference on acoustics, speech, and signal processing | 2015

Additive noise compensation in the i-vector space for speaker recognition

Waad Ben Kheder; Driss Matrouf; Jean-François Bonastre; Moez Ajili; Pierre-Michel Bousquet

The performance of state-of-the-art speaker recognition systems degrades considerably in noisy environments, even though they achieve very good results in clean conditions. In order to deal with this strong limitation, we aim in this work to remove the noisy part of an i-vector directly in the i-vector space. Our approach offers the advantage of operating only at the i-vector extraction level, leaving the other steps of the system unchanged. A maximum a posteriori (MAP) procedure is applied in order to obtain a clean version of the noisy i-vectors, taking advantage of prior knowledge about the clean i-vector distribution. To perform this MAP estimation, Gaussian assumptions are made over the clean and noise i-vector distributions. Operating on NIST 2008 data, we show a relative improvement of up to 60% compared with the baseline system. Our approach also outperforms the “multi-style” backend training technique. The efficiency of the proposed method comes at the price of a relatively high computational cost. We conclude with some ideas to improve this aspect.
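Under the Gaussian assumptions stated above, one simple additive model (a sketch of the idea, not necessarily the paper's exact formulation) is y = x + n with x ~ N(mu_x, S_x) and n ~ N(mu_n, S_n); the MAP estimate of the clean i-vector is then the Gaussian posterior mean:

```python
import numpy as np

def map_denoise(y, mu_x, S_x, mu_n, S_n):
    """MAP estimate of the clean i-vector x from a noisy observation
    y = x + n, with Gaussian priors x ~ N(mu_x, S_x), n ~ N(mu_n, S_n)."""
    y, mu_x, mu_n = (np.asarray(v, dtype=float) for v in (y, mu_x, mu_n))
    gain = S_x @ np.linalg.inv(S_x + S_n)  # how much of the deviation to trust
    return mu_x + gain @ (y - mu_n - mu_x)
```

When the noise covariance dominates, the estimate shrinks toward the clean prior mean; when it is small, the estimate stays close to the observed i-vector, which matches the intuition behind the method.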

Collaboration


Dive into Driss Matrouf's collaborations.

Top Co-Authors
