Jean-François Bonastre
University of Avignon
Publications
Featured research published by Jean-François Bonastre.
EURASIP Journal on Advances in Signal Processing | 2004
Frédéric Bimbot; Jean-François Bonastre; Corinne Fredouille; Guillaume Gravier; Ivan Magrin-Chagnolleau; Sylvain Meignier; Teva Merlin; Javier Ortega-Garcia; Dijana Petrovska-Delacrétaz; Douglas A. Reynolds
This paper presents an overview of a state-of-the-art text-independent speaker verification system. First, an introduction proposes a modular scheme of the training and test phases of a speaker verification system. Then, the speech parameterization most commonly used in speaker verification, namely cepstral analysis, is detailed. Gaussian mixture modeling, which is the speaker modeling technique used in most systems, is then explained. A few speaker modeling alternatives, namely neural networks and support vector machines, are mentioned. Normalization of scores is then explained, as this is a very important step for dealing with real-world data. The evaluation of a speaker verification system is then detailed, and the detection error trade-off (DET) curve is explained. Several extensions of speaker verification are then enumerated, including speaker tracking and segmentation by speakers. Then, some applications of speaker verification are proposed, including on-site applications, remote applications, applications related to structuring audio information, and games. Issues concerning the forensic area are then recalled, as we believe it is very important to inform people about the actual performance and limitations of speaker verification systems. This paper concludes by giving a few research trends in speaker verification for the next couple of years.
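The DET evaluation this abstract mentions can be illustrated with a short sketch: sweeping a decision threshold over target and impostor scores yields the miss and false-alarm rates that form the DET curve. The score distributions and function names below are illustrative, not taken from the paper.

```python
import numpy as np

def det_points(target_scores, impostor_scores):
    """Compute (false-alarm, miss) operating points for a DET curve.

    At each candidate threshold, a miss is a target trial scored below
    it and a false alarm is an impostor trial scored at or above it.
    """
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    miss = np.array([(target_scores < t).mean() for t in thresholds])
    fa = np.array([(impostor_scores >= t).mean() for t in thresholds])
    return fa, miss

# Synthetic scores: target trials score higher on average.
rng = np.random.default_rng(0)
targets = rng.normal(2.0, 1.0, 500)
impostors = rng.normal(0.0, 1.0, 500)
fa, miss = det_points(targets, impostors)
# Equal error rate: operating point where the two error rates cross.
eer = fa[np.argmin(np.abs(fa - miss))]
```

Plotting `miss` against `fa` on normal-deviate axes gives the usual DET presentation; the equal error rate is a common single-number summary.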
International Conference on Acoustics, Speech, and Signal Processing | 2005
Jean-François Bonastre; Frédéric Wils; Sylvain Meignier
This paper presents the ALIZE free speaker recognition toolkit. ALIZE is designed and developed within the framework of the ALIZE project, a part of the French Research Ministry Technolangue program. The paper focuses on the innovative aspects of ALIZE and illustrates them by some examples. An experimental validation of the toolkit during the NIST 2004 speaker recognition evaluation campaign is also proposed.
International Conference on Multimedia and Expo | 2012
Chris McCool; Sébastien Marcel; Abdenour Hadid; Matti Pietikäinen; Pavel Matejka; Jan Černocký; Norman Poh; Josef Kittler; Anthony Larcher; Christophe Lévy; Driss Matrouf; Jean-François Bonastre; Phil Tresadern; Timothy F. Cootes
This paper presents a novel fully automatic bi-modal, face and speaker, recognition system which runs in real-time on a mobile phone. The implemented system runs in real-time on a Nokia N900 and demonstrates the feasibility of performing both automatic face and speaker recognition on a mobile phone. We evaluate this recognition system on a novel publicly-available mobile phone database and provide a well-defined evaluation protocol. This database was captured almost exclusively using mobile phones and aims to improve research into deploying biometric techniques to mobile devices. We show, on this mobile phone database, that face and speaker recognition can be performed in a mobile environment and that score fusion can improve performance by more than 25% in terms of error rates.
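The score fusion mentioned above can be sketched as a normalised weighted sum of the per-trial face and speaker scores. The min-max normalisation and equal weighting below are illustrative assumptions, not the paper's exact fusion rule.

```python
def fuse_scores(face_scores, speaker_scores, w_face=0.5):
    """Fuse per-trial face and speaker scores by weighted sum.

    Each modality is first min-max normalised to [0, 1] so that
    neither dominates purely through scale; w_face balances the two.
    """
    def minmax(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) for x in xs]
    f, s = minmax(face_scores), minmax(speaker_scores)
    return [w_face * a + (1 - w_face) * b for a, b in zip(f, s)]

# Toy trials: raw face and speaker scores on very different scales.
fused = fuse_scores([0.2, 0.9, 0.4], [10.0, 30.0, 20.0])
```

In practice the fusion weight would be tuned on a development set rather than fixed at 0.5.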
Computer Speech & Language | 2006
Sylvain Meignier; Daniel Moraru; Corinne Fredouille; Jean-François Bonastre; Laurent Besacier
This paper summarizes the collaboration of the LIA and CLIPS laboratories on speaker diarization of broadcast news during the spring NIST Rich Transcription 2003 evaluation campaign (NIST-RT03S). The speaker diarization task consists of segmenting a conversation into homogeneous segments which are then grouped into speaker classes. Two approaches are described and compared for speaker diarization. The first one relies on a classical two-step speaker diarization strategy based on a detection of speaker turns followed by a clustering process, while the second one uses an integrated strategy where both segment boundaries and speaker tying of the segments are extracted simultaneously and challenged during the whole process. These two methods are used to investigate various strategies for the fusion of diarization results. Furthermore, segmentation into acoustic macro-classes is proposed and evaluated as an a priori step to speaker diarization. The objective is to take advantage of the a priori acoustic information in the diarization process and to enrich the resulting segmentation with information about speaker gender.
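The speaker-turn detection step of the two-step strategy described above is commonly implemented as a Bayesian Information Criterion (BIC) test between adjacent windows. The sketch below is a generic ΔBIC change detector under single full-covariance Gaussians, not the exact LIA/CLIPS system.

```python
import numpy as np

def delta_bic(x, y, penalty=1.0):
    """ΔBIC for a candidate speaker change between feature blocks x, y.

    Models each block, and their union, with a full-covariance
    Gaussian; positive values favour placing a change point.
    """
    def loglik_term(z):
        cov = np.cov(z, rowvar=False) + 1e-6 * np.eye(z.shape[1])
        return 0.5 * len(z) * np.log(np.linalg.det(cov))
    n, d = len(x) + len(y), x.shape[1]
    complexity = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (loglik_term(np.vstack([x, y]))
            - loglik_term(x) - loglik_term(y)
            - penalty * complexity)

rng = np.random.default_rng(1)
# Same speaker on both sides vs. a clear shift in feature space.
same = delta_bic(rng.normal(0, 1, (200, 4)), rng.normal(0, 1, (200, 4)))
diff = delta_bic(rng.normal(0, 1, (200, 4)), rng.normal(3, 1, (200, 4)))
```

A full detector would slide this test along the recording and keep local maxima of ΔBIC as speaker-turn candidates before clustering.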
IEEE Signal Processing Magazine | 2009
Joseph P. Campbell; Wade Shen; William M. Campbell; Reva Schwartz; Jean-François Bonastre; Driss Matrouf
In light of the different points highlighted in this article, we affirm that forensic applications of speaker recognition should still be approached with a necessary degree of caution. Disseminating this message remains one of the most important responsibilities of speaker recognition researchers.
International Conference on Acoustics, Speech, and Signal Processing | 2000
Jean-François Bonastre; Perrine Delacourt; Corinne Fredouille; Teva Merlin; Christian Wellekens
A speaker tracking system (STS) is built by using successively a speaker change detector and a speaker verification system. The aim of the STS is to find, in a conversation between several persons (some of whom have already enrolled and others who are totally unknown), target speakers chosen from a set of enrolled users. In a first step, speech is segmented into homogeneous segments containing only one speaker, without any use of a priori knowledge about the speakers. Then, the resulting segments are checked to determine whether they belong to one of the target speakers. The system was used in a NIST evaluation test with satisfactory results.
International Conference on Acoustics, Speech, and Signal Processing | 2004
Daniel Moraru; Sylvain Meignier; Corinne Fredouille; Laurent Besacier; Jean-François Bonastre
The paper presents the ELISA consortium activities in automatic speaker segmentation, also known as speaker diarization, during the NIST Rich Transcription (RT) 2003 evaluation. The experiments were conducted on real broadcast news data (HUB4). Two different approaches from the CLIPS and LIA laboratories are presented and different possibilities of combining them are investigated, in the framework of the ELISA consortium. The system submitted as the ELISA primary system obtained the second-lowest segmentation error rate among the RT03 participants' primary systems. Another ELISA system, submitted as a secondary system, outperformed the best primary system and obtained the lowest speaker segmentation error rate.
Speech Communication | 2000
Laurent Besacier; Jean-François Bonastre; Corinne Fredouille
Statistical modeling of the speech signal has been widely used in speaker recognition. The performance obtained with this type of modeling is excellent in laboratories but decreases dramatically for telephone or noisy speech. Moreover, it is difficult to know which piece of information is taken into account by the system. In order to solve this problem and to improve the current systems, a better understanding of the nature of the information used by statistical methods is needed. This knowledge should make it possible to select only the relevant information or to add new sources of information. The first part of this paper presents experiments that aim at localizing the most useful acoustic events for speaker recognition. The relation between discriminant ability and the nature of the speech events is studied. In particular, the phonetic content, the signal stability and the frequency domain are explored. Finally, the potential of dynamic information contained in the relation between a frame and its p neighbours is investigated. In the second part, the authors suggest a new selection procedure designed to select the pertinent features. Conventional feature selection techniques (ascendant selection, knock-out) allow only global and a posteriori knowledge about the relevance of an information source. However, some speech clusters may be very efficient for recognizing a particular speaker, whereas they can be non-informative for another one. Moreover, some information classes may be corrupted or even missing under particular recording conditions. This need for speaker-specific processing and for adaptability to the environment (with no a priori knowledge of the degradation affecting the signal) leads the authors to propose a system that automatically selects the most discriminant parts of a speech utterance. The proposed architecture divides the signal into different time–frequency blocks. The likelihood is calculated after dynamically selecting the most useful blocks. This information selection leads to a significant error rate reduction (up to a 41% relative decrease on TIMIT) for short training and test durations. Finally, experiments with simulated noise degradation show that this approach is a very efficient way to deal with partially corrupted speech.
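The dynamic block-selection idea above can be sketched as follows: per-frame, per-band log-likelihoods are pooled into time-frequency blocks, and only the highest-scoring fraction of blocks contributes to the final score, so corrupted regions are dropped automatically. The grouping and keep ratio are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def select_blocks(frame_loglik, block_len, keep_ratio=0.5):
    """Score an utterance from its most useful time-frequency blocks.

    frame_loglik: (n_frames, n_bands) log-likelihoods under the
    claimed speaker's model. Frames are grouped into blocks of
    block_len frames per band; the best keep_ratio fraction of blocks
    is averaged into the final score.
    """
    n_frames = frame_loglik.shape[0] - frame_loglik.shape[0] % block_len
    n_bands = frame_loglik.shape[1]
    blocks = frame_loglik[:n_frames].reshape(-1, block_len, n_bands).mean(axis=1)
    flat = np.sort(blocks.ravel())[::-1]               # best blocks first
    kept = flat[: max(1, int(keep_ratio * flat.size))]
    return kept.mean()

rng = np.random.default_rng(2)
ll = rng.normal(-5, 1, (100, 8))
ll[:20] -= 10                                          # corrupt the first 20 frames
score_sel = select_blocks(ll, block_len=10)
score_all = ll.mean()                                  # naive all-frames score
```

With the corrupted frames present, the selective score stays close to the clean-speech likelihood while the naive average is dragged down.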
International Conference on Acoustics, Speech, and Signal Processing | 2012
Anthony Larcher; Pierre-Michel Bousquet; Kong Aik Lee; Driss Matrouf; Haizhou Li; Jean-François Bonastre
Short speech duration remains a critical factor of performance degradation when deploying a speaker verification system. To overcome this difficulty, a large number of commercial applications impose the use of fixed pass-phrases. In this context, we show that the performance of the popular i-vector approach can be greatly improved by taking advantage of the phonetic information conveyed by these pass-phrases. Moreover, as i-vectors require a conditioning process to reach high accuracy, we show that further improvements are possible by taking advantage of this phonetic information within the normalisation process. We compare two methods, Within Class Covariance Normalization (WCCN) and Eigen Factor Radial (EFR), both relying on parameters estimated on the same development data. Our study suggests that WCCN is more robust to data mismatch but less efficient than EFR when the development data has a better match with the test data.
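WCCN, one of the two conditioning methods compared above, can be sketched on toy two-dimensional "i-vectors": the average within-speaker covariance W is estimated on development data, and vectors are mapped through the Cholesky factor of W⁻¹ so that within-speaker variation is whitened. The data dimensions and speaker counts below are illustrative.

```python
import numpy as np

def wccn_transform(ivectors, labels):
    """Within Class Covariance Normalization (WCCN).

    Estimates the average within-speaker covariance W on development
    data and returns B = chol(W^{-1}); mapping an i-vector x to B^T x
    whitens the within-speaker variation.
    """
    dim = ivectors.shape[1]
    W = np.zeros((dim, dim))
    speakers = sorted(set(labels))
    for spk in speakers:
        xs = ivectors[[i for i, l in enumerate(labels) if l == spk]]
        W += np.cov(xs, rowvar=False)
    W /= len(speakers)
    return np.linalg.cholesky(np.linalg.inv(W))

rng = np.random.default_rng(3)
# Toy development set: 5 speakers, anisotropic within-speaker scatter.
labels = [s for s in range(5) for _ in range(40)]
ivecs = np.vstack([rng.normal(rng.normal(0, 5, 2), [3.0, 0.5], (40, 2))
                   for _ in range(5)])
B = wccn_transform(ivecs, labels)

# After the mapping, the average within-speaker covariance is ~identity.
W_after = sum(np.cov(ivecs[np.array(labels) == s] @ B, rowvar=False)
              for s in range(5)) / 5
```

Cosine scoring between WCCN-mapped i-vectors then weights directions of high within-speaker variability less heavily.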
International Conference on Acoustics, Speech, and Signal Processing | 2006
Driss Matrouf; Jean-François Bonastre; Corinne Fredouille
This paper investigates the effect of voice transformation on automatic speaker recognition system performance. We focus on increasing the impostor acceptance rate by modifying the voice of an impostor in order to target a specific speaker. This paper is based on the following idea: in several applications, and particularly in forensic situations, it is reasonable to think that some organizations have knowledge of the speaker recognition method used and could impersonate a given, well-known speaker. This paper presents some experiments based on the NIST SRE 2005 protocol and a simple impostor voice transformation method. The results show that this simple voice transformation allows a drastic increase of the false acceptance rate without degrading the natural quality of the voice.
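The mechanism behind such attacks can be illustrated with a deliberately crude feature-domain toy: pulling impostor feature vectors toward a target speaker's mean raises their likelihood under the target model. This is only a sketch of why transformed impostor speech scores higher; the paper's actual transformation operates at the GMM level, not as a simple linear pull.

```python
import numpy as np

def pull_toward_target(impostor_feats, target_mean, alpha=0.7):
    """Toy voice transformation: move impostor features toward the
    target speaker's mean. alpha controls how far they move."""
    return (1 - alpha) * impostor_feats + alpha * target_mean

def avg_loglik(x, mean):
    """Average log-likelihood under a unit-variance Gaussian target
    model (additive constants dropped)."""
    return -0.5 * ((x - mean) ** 2).sum(axis=1).mean()

rng = np.random.default_rng(4)
target_mean = np.array([2.0, -1.0, 0.5])        # hypothetical target model
impostor = rng.normal(0, 1, (100, 3))           # impostor feature frames
transformed = pull_toward_target(impostor, target_mean)

before = avg_loglik(impostor, target_mean)
after = avg_loglik(transformed, target_mean)    # scores rise after transformation
```

The gap between `before` and `after` is what drives the false acceptance rate up once the decision threshold is fixed.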
Collaboration
French Institute for Research in Computer Science and Automation