Publication


Featured research published by Corinne Fredouille.


EURASIP Journal on Advances in Signal Processing | 2004

A tutorial on text-independent speaker verification

Frédéric Bimbot; Jean-François Bonastre; Corinne Fredouille; Guillaume Gravier; Ivan Magrin-Chagnolleau; Sylvain Meignier; Teva Merlin; Javier Ortega-Garcia; Dijana Petrovska-Delacrétaz; Douglas A. Reynolds

This paper presents an overview of a state-of-the-art text-independent speaker verification system. First, an introduction proposes a modular scheme of the training and test phases of a speaker verification system. Then, the speech parameterization most commonly used in speaker verification, namely, cepstral analysis, is detailed. Gaussian mixture modeling, which is the speaker modeling technique used in most systems, is then explained. A few speaker modeling alternatives, namely, neural networks and support vector machines, are mentioned. Normalization of scores is then explained, as this is a very important step to deal with real-world data. The evaluation of a speaker verification system is then detailed, and the detection error trade-off (DET) curve is explained. Several extensions of speaker verification are then enumerated, including speaker tracking and segmentation by speakers. Then, some applications of speaker verification are proposed, including on-site applications, remote applications, applications relative to structuring audio information, and games. Issues concerning the forensic area are then recalled, as we believe it is very important to inform people about the actual performance and limitations of speaker verification systems. This paper concludes by giving a few research trends in speaker verification for the next couple of years.
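The GMM-based scoring described in this tutorial can be sketched as follows. This is a minimal illustration, not code from the paper: it assumes diagonal-covariance models and pre-extracted cepstral features, and all function names are our own. The verification score is the average log-likelihood ratio between the target speaker model and a universal background model (UBM).

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D) cepstral feature vectors; weights: (M,);
    means, variances: (M, D) for M mixture components."""
    diff = frames[:, None, :] - means[None, :, :]                    # (T, M, D)
    log_norm = -0.5 * (np.log(2 * np.pi * variances).sum(axis=1)
                       + (diff ** 2 / variances[None]).sum(axis=2))  # (T, M)
    # log-sum-exp over components, weighted by the mixture weights
    return np.logaddexp.reduce(np.log(weights)[None, :] + log_norm, axis=1)

def llr_score(frames, target_gmm, ubm):
    """Average log-likelihood ratio: target speaker model vs. UBM."""
    return float(np.mean(gmm_loglik(frames, *target_gmm) - gmm_loglik(frames, *ubm)))
```

In a full system the score would then be normalized (e.g. Z-norm or T-norm, as the paper discusses) before being compared to a decision threshold.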


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Speaker Diarization: A Review of Recent Research

Xavier Anguera Miro; Simon Bozonnet; Nicholas W. D. Evans; Corinne Fredouille; Gerald Friedland; Oriol Vinyals

Speaker diarization is the task of determining “who spoke when?” in an audio or video recording that contains an unknown amount of speech and also an unknown number of speakers. Initially, it was proposed as a research topic related to automatic speech recognition, where speaker diarization serves as an upstream processing step. Over recent years, however, speaker diarization has become an important key technology for many tasks, such as navigation, retrieval, or higher level inference on audio data. Accordingly, many important improvements in accuracy and robustness have been reported in journals and conferences in the area. The application domains, from broadcast news, to lectures and meetings, vary greatly and pose different problems, such as having access to multiple microphones and multimodal information or overlapping speech. The most recent review of existing technology dates back to 2006 and focuses on the broadcast news domain. In this paper, we review the current state-of-the-art, focusing on research developed since 2006 that relates predominantly to speaker diarization for conference meetings. Finally, we present an analysis of speaker diarization performance as reported through the NIST Rich Transcription evaluations on meeting data and identify important areas for future research.
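The bottom-up (agglomerative) systems this review covers commonly decide whether two segment clusters belong to the same speaker with a Bayesian Information Criterion (ΔBIC) test. A minimal sketch of that merge decision, assuming full-covariance Gaussian cluster models (the function name and the default penalty weight λ are our own choices, not the review's):

```python
import numpy as np

def delta_bic(x, y, lam=1.0):
    """ΔBIC merge criterion for two feature clusters x (n1, d) and y (n2, d),
    each modelled as a full-covariance Gaussian.
    A negative value favours merging (same speaker)."""
    n1, n2 = len(x), len(y)
    n, d = n1 + n2, x.shape[1]
    logdet = lambda a: np.linalg.slogdet(np.cov(a, rowvar=False))[1]
    # penalty for the extra parameters of keeping two separate models
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * logdet(np.vstack([x, y]))
            - 0.5 * n1 * logdet(x) - 0.5 * n2 * logdet(y) - penalty)
```

An agglomerative pass would repeatedly merge the cluster pair with the lowest ΔBIC until no pair scores below zero.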


Computer Speech & Language | 2006

Step-by-step and integrated approaches in broadcast news speaker diarization

Sylvain Meignier; Daniel Moraru; Corinne Fredouille; Jean-François Bonastre; Laurent Besacier

This paper summarizes the collaboration of the LIA and CLIPS laboratories on speaker diarization of broadcast news during the spring NIST Rich Transcription 2003 evaluation campaign (NIST-RT03S). The speaker diarization task consists of segmenting a conversation into homogeneous segments which are then grouped into speaker classes. Two approaches are described and compared for speaker diarization. The first one relies on a classical two-step speaker diarization strategy based on a detection of speaker turns followed by a clustering process, while the second one uses an integrated strategy where both segment boundaries and speaker tying of the segments are extracted simultaneously and challenged during the whole process. These two methods are used to investigate various strategies for the fusion of diarization results. Furthermore, segmentation into acoustic macro-classes is proposed and evaluated as an a priori step to speaker diarization. The objective is to take advantage of the a priori acoustic information in the diarization process, along with enriching the resulting segmentation with information about speaker gender.


international conference on acoustics, speech, and signal processing | 2000

A speaker tracking system based on speaker turn detection for NIST evaluation

Jean-François Bonastre; Perrine Delacourt; Corinne Fredouille; Teva Merlin; Christian Wellekens

A speaker tracking system (STS) is built by using successively a speaker change detector and a speaker verification system. The aim of the STS is to find, in a conversation between several persons (some of them already enrolled and others totally unknown), target speakers chosen from a set of enrolled users. In a first step, speech is segmented into homogeneous segments containing only one speaker, without any use of a priori knowledge about the speakers. Then, the resulting segments are checked to determine whether they belong to one of the target speakers. The system has been used in a NIST evaluation test with satisfactory results.


international conference on acoustics, speech, and signal processing | 2004

The ELISA consortium approaches in broadcast news speaker segmentation during the NIST 2003 rich transcription evaluation

Daniel Moraru; Sylvain Meignier; Corinne Fredouille; Laurent Besacier; Jean-François Bonastre

The paper presents the ELISA consortium activities in automatic speaker segmentation, also known as speaker diarization, during the NIST Rich Transcription (RT) 2003 evaluation. The experiments were conducted on real broadcast news data (HUB4). Two different approaches from the CLIPS and LIA laboratories are presented, and different possibilities of combining them are investigated within the framework of the ELISA consortium. The system submitted as the ELISA primary system obtained the second-lowest segmentation error rate among the RT03 participants' primary systems. Another ELISA system, submitted as a secondary system, outperformed the best primary system and obtained the lowest speaker segmentation error rate.


Speech Communication | 2000

Localization and selection of speaker-specific information with statistical modeling

Laurent Besacier; Jean-François Bonastre; Corinne Fredouille

Statistical modeling of the speech signal has been widely used in speaker recognition. The performance obtained with this type of modeling is excellent in laboratories but decreases dramatically for telephone or noisy speech. Moreover, it is difficult to know which piece of information is taken into account by the system. In order to solve this problem and to improve the current systems, a better understanding of the nature of the information used by statistical methods is needed. This knowledge should make it possible to select only the relevant information or to add new sources of information. The first part of this paper presents experiments that aim at localizing the most useful acoustic events for speaker recognition. The relation between the discriminant ability and the nature of the speech events is studied. In particular, the phonetic content, the signal stability and the frequency domain are explored. Finally, the potential of dynamic information contained in the relation between a frame and its p neighbours is investigated. In the second part, the authors suggest a new selection procedure designed to select the pertinent features. Conventional feature selection techniques (ascendant selection, knock-out) allow only global and a posteriori knowledge about the relevance of an information source. However, some speech clusters may be very efficient for recognizing a particular speaker, whereas they can be non-informative for another one. Moreover, some information classes may be corrupted or even missing under particular recording conditions. This need for speaker-specific processing and for adaptability to the environment (with no a priori knowledge of the degradation affecting the signal) leads the authors to propose a system that automatically selects the most discriminant parts of a speech utterance. The proposed architecture divides the signal into different time–frequency blocks. The likelihood is calculated after dynamically selecting the most useful blocks. This information selection leads to a significant error rate reduction (up to a 41% relative decrease on TIMIT) for short training and test durations. Finally, experiments with simulated noise degradation show that this approach is a very efficient way to deal with partially corrupted speech.
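The dynamic block-selection idea can be caricatured in a few lines. This is a deliberately simplified, hypothetical sketch (the paper's actual selection criterion over time-frequency blocks is richer): given per-block log-likelihood-ratio scores, keep only the most discriminant fraction before averaging.

```python
import numpy as np

def selective_llr(block_llrs, keep_frac=0.5):
    """Toy version of dynamic block selection: keep only the fraction of
    time-frequency blocks with the largest absolute LLR (the most
    discriminant ones) and average those.  Hypothetical simplification,
    not the paper's algorithm."""
    k = max(1, int(len(block_llrs) * keep_frac))
    idx = np.argsort(np.abs(block_llrs))[-k:]
    return float(np.mean(np.asarray(block_llrs)[idx]))
```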


international conference on acoustics, speech, and signal processing | 2006

Effect of Speech Transformation on Impostor Acceptance

Driss Matrouf; Jean-François Bonastre; Corinne Fredouille

This paper investigates the effect of voice transformation on automatic speaker recognition system performance. We focus on increasing the impostor acceptance rate by modifying the voice of an impostor in order to target a specific speaker. This paper is based on the following idea: in several applications, and particularly in forensic situations, it is reasonable to think that some organizations have knowledge of the speaker recognition method used and could impersonate a given, well-known speaker. This paper presents some experiments based on the NIST SRE 2005 protocol and a simple impostor voice transformation method. The results show that this simple voice transformation allows a drastic increase of the false acceptance rate, without degrading the natural aspect of the voice.


international conference on acoustics, speech, and signal processing | 2010

The lia-eurecom RT'09 speaker diarization system: Enhancements in speaker modelling and cluster purification

Simon Bozonnet; Nicholas W. D. Evans; Corinne Fredouille

There are two main approaches to speaker diarization: bottom-up and top-down. Our work on top-down systems shows that they can deliver results competitive with bottom-up systems and that they are extremely computationally efficient, but also that they are particularly prone to poor model initialisation and cluster impurities. In this paper we present enhancements to our state-of-the-art, top-down approach to speaker diarization that deliver improved stability across three different datasets composed of conference meetings from five standard NIST RT evaluations. We report an improved approach to speaker modelling which, despite having greater chances for cluster impurities, delivers a 35% relative improvement in DER for the MDM condition. We also describe new work to incorporate cluster purification into a top-down system, which delivers relative improvements of 44% over the baseline system without compromising computational efficiency.


international conference on acoustics, speech, and signal processing | 2011

Speaker diarization of heterogeneous web video files: A preliminary study

Pierre Clément; Thierry Bazillon; Corinne Fredouille

In the last ten years, the internet and its applications have changed significantly, mainly thanks to the growth of available personal resources. Concerning multimedia, the most impressive evolution is the continuously growing success of video sharing websites. But with this success come difficulties in efficiently searching, indexing and accessing relevant information about these documents. Speaker diarization is an important task in the overall information retrieval process. This paper describes an audio/video database, especially built for the speaker diarization task, based on different video genres. Through some preliminary experiments, it highlights the difficulties encountered in this context, mainly linked to the heterogeneity of the database.


international conference on acoustics, speech, and signal processing | 2009

Speaker diarization using unsupervised discriminant analysis of inter-channel delay features

Nicholas W. D. Evans; Corinne Fredouille; Jean-François Bonastre

When multiple microphones are available, estimates of inter-channel delay, which characterise a speaker's location, can be used as features for speaker diarization. Background noise and reverberation can, however, lead to noisy features and poor performance. To ameliorate these problems, this paper presents a new approach to the discriminant analysis of delay features for speaker diarization. This novel and nonetheless unsupervised approach aims to increase speaker separability in delay-space. We assess the approach on subsets of four standard NIST RT datasets and demonstrate a relative improvement in diarization error rate of 25% on a separate evaluation set using delay features alone.
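Inter-channel delay features of this kind are commonly estimated with the generalized cross-correlation with phase transform (GCC-PHAT). The sketch below is a standard textbook version of that estimator, not the paper's own implementation:

```python
import numpy as np

def gcc_phat_delay(sig, ref, fs=1.0):
    """Estimate the inter-channel delay (in seconds) of `sig` relative to
    `ref` using GCC-PHAT: whiten the cross-spectrum so only phase remains,
    then pick the lag of the cross-correlation peak."""
    n = len(sig) + len(ref)
    X = np.fft.rfft(sig, n=n)
    Y = np.fft.rfft(ref, n=n)
    cross = X * np.conj(Y)
    cross /= np.maximum(np.abs(cross), 1e-12)      # PHAT weighting: phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate([cc[-max_shift:], cc[:max_shift + 1]])  # centre lag 0
    delay = np.argmax(np.abs(cc)) - max_shift
    return delay / fs
```

Applied per analysis window across microphone pairs, such delay estimates form the delay-space feature vectors that the discriminant analysis above operates on.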

Collaboration


Dive into Corinne Fredouille's collaboration.

Top Co-Authors

Alain Ghio (Aix-Marseille University)

Antoine Giovanni (Centre national de la recherche scientifique)

Laurent Besacier (Centre national de la recherche scientifique)