
Publication


Featured research published by Hervé Bredin.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2007

Audio-Visual Speech Synchrony Measure for Talking-Face Identity Verification

Hervé Bredin; Gérard Chollet

We investigate the use of an audio-visual speech synchrony measure in the framework of identity verification based on talking faces. Two synchrony measures, based respectively on canonical correlation analysis and co-inertia analysis, are introduced, and their performance is evaluated on the specific task of detecting synchronized and non-synchronized audio-visual speech sequences. The notion of high-effort impostor attacks is also introduced as a dangerous threat to current biometric systems based on speaker verification and face recognition. A novel biometric modality based on synchrony measures is introduced to improve the overall performance of identity verification, and more specifically its robustness to replay attacks.
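As an illustration of the first of these measures, here is a minimal sketch of a CCA-based synchrony score: the leading canonical correlation between time-aligned audio and visual feature sequences. The two-dimensional toy features stand in for real acoustic (e.g., MFCC) and lip-region descriptors; this is a sketch of the general technique, not the paper's implementation.

```python
import numpy as np

def first_canonical_correlation(X, Y, reg=1e-6):
    """First canonical correlation between two feature sequences.

    X: (n_frames, dx) audio features, Y: (n_frames, dy) visual features.
    Higher values indicate stronger audio-visual synchrony.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    # After whitening both views, the singular values of the
    # cross-covariance are exactly the canonical correlations.
    M = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    return np.linalg.svd(M, compute_uv=False)[0]

# Toy check: a shared latent signal drives one "audio" and one "video" dim.
rng = np.random.default_rng(0)
latent = rng.standard_normal(500)
audio = np.c_[latent + 0.1 * rng.standard_normal(500),
              rng.standard_normal(500)]
video = np.c_[latent + 0.1 * rng.standard_normal(500),
              rng.standard_normal(500)]
sync = first_canonical_correlation(audio, video)
# Desynchronize by shifting the video stream by 100 frames.
desync = first_canonical_correlation(audio, np.roll(video, 100, axis=0))
```

A shifted (replayed) video stream scores markedly lower than the synchronized one, which is what makes such a measure usable against replay attacks.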


Pattern Analysis and Applications | 2009

Audio-visual speech asynchrony detection using co-inertia analysis and coupled hidden Markov models

Enrique Argones Rúa; Hervé Bredin; Carmen García Mateo; Gérard Chollet; Daniel Jiménez

This paper addresses the subject of liveness detection, which is a test that ensures that biometric cues are acquired from a live person who is actually present at the time of capture. The liveness check is performed by measuring the degree of synchrony between the lips and the voice extracted from a video sequence. Three new methods for asynchrony detection based on co-inertia analysis (CoIA) and a fourth based on coupled hidden Markov models (CHMMs) are derived. Experimental comparisons are made with several methods previously used in the literature for asynchrony detection and speaker location. The reported results demonstrate the effectiveness and superiority of the proposed new methods based on both CoIA and CHMMs as asynchrony detection methods.
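Co-inertia analysis differs from CCA in that it maximizes the covariance, rather than the correlation, of the two projected views, which makes it better behaved when the per-view covariance matrices are near-singular. A minimal sketch with synthetic features standing in for lip and audio descriptors (not the paper's implementation):

```python
import numpy as np

def coia_axes(X, Y):
    """Co-inertia analysis: directions maximizing the *covariance*
    (not the correlation, as in CCA) between the projected views.
    They are the leading singular vectors of the cross-covariance."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    U, s, Vt = np.linalg.svd(Xc.T @ Yc / len(X), full_matrices=False)
    return U[:, 0], Vt[0]

def synchrony_score(X, Y):
    # Correlation of the two sequences projected on the CoIA axes;
    # a low score suggests the audio and the lips are not in sync.
    a, b = coia_axes(X, Y)
    u = (X - X.mean(0)) @ a
    v = (Y - Y.mean(0)) @ b
    return np.corrcoef(u, v)[0, 1]

# Toy liveness check: a shared "lip opening" signal drives both views.
rng = np.random.default_rng(1)
lip = rng.standard_normal(400)
audio = np.c_[lip + 0.2 * rng.standard_normal(400),
              rng.standard_normal(400)]
video = np.c_[lip + 0.2 * rng.standard_normal(400),
              rng.standard_normal(400)]
live = synchrony_score(audio, video)
replay = synchrony_score(audio, np.roll(video, 50, axis=0))
```

Thresholding such a score is the essence of the liveness check: the desynchronized (replayed) pair scores much lower than the live one.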


International Conference on Pattern Recognition (ICPR) | 2006

GMM-based SVM for face recognition

Hervé Bredin; Najim Dehak; Gérard Chollet

A new face recognition algorithm is presented. It assumes that a video sequence of a person is available both at enrollment and test time. During enrollment, a client Gaussian mixture model (GMM) is adapted from a world GMM using eigenface features extracted from each frame of the video. Then, a support vector machine (SVM) is used to find a decision border between the client GMM and pseudo-impostor GMMs. At test time, a GMM is adapted from the test video and a decision is taken using the previously learned client SVM. This algorithm brings a 3.5% equal error rate (EER) improvement over the BioSecure reference system on the Pooled protocol of the BANCA database.
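The enrollment pipeline (means-only MAP adaptation of a world GMM, then a linear SVM separating the client's adapted model from pseudo-impostor models) can be sketched roughly as follows. The shared-variance world GMM, the Pegasos-style SVM training, and all data are simplifications introduced for the sketch, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

def responsibilities(X, means, var=1.0):
    # Soft assignment of frames to the components of a shared-variance,
    # uniform-weight world GMM (a deliberately simplified model).
    d2 = ((X[:, None, :] - means[None]) ** 2).sum(-1) / (2 * var)
    p = np.exp(-(d2 - d2.min(1, keepdims=True)))
    return p / p.sum(1, keepdims=True)

def supervector(X, world_means, r=16.0):
    """MAP-adapt the world means to one video's frame features and
    stack the adapted means into a single vector."""
    g = responsibilities(X, world_means)
    n = g.sum(0)
    xbar = (g.T @ X) / np.maximum(n, 1e-9)[:, None]
    adapted = (n[:, None] * xbar + r * world_means) / (n[:, None] + r)
    return adapted.ravel()

def train_linear_svm(S, y, epochs=300, lam=0.01):
    # Minimal linear SVM: hinge loss + L2, Pegasos-style subgradient steps.
    w = np.zeros(S.shape[1])
    t = 0
    for _ in range(epochs):
        for i in range(len(S)):
            t += 1
            eta = 1.0 / (lam * t)
            w *= (1 - eta * lam)
            if y[i] * (S[i] @ w) < 1:
                w += eta * y[i] * S[i]
    return w

world_means = np.array([[0.0, 0.0], [5.0, 5.0]])

def video(shift):  # a toy "video": 200 frames of 2-D eigenface-like features
    return np.vstack([m + shift + 0.3 * rng.standard_normal((100, 2))
                      for m in world_means])

# One client enrollment video vs. five pseudo-impostor videos.
client_shift = np.array([1.5, -1.5])
S_train = np.array([supervector(video(client_shift), world_means)] +
                   [supervector(video(0.2 * rng.standard_normal(2)), world_means)
                    for _ in range(5)])
y = np.array([1.0] + [-1.0] * 5)
w = train_linear_svm(S_train, y)

# Test time: adapt a GMM to the test video and score with the client SVM.
score_client = w @ supervector(video(client_shift), world_means)
score_impostor = w @ supervector(video(0.2 * rng.standard_normal(2)), world_means)
```

The client's test video scores above the impostor's, mirroring the accept/reject decision described in the abstract.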


International Conference on Computer Vision (ICCV) | 2012

Fusion of speech, faces and text for person identification in TV broadcast

Hervé Bredin; Johann Poignant; Makarand Tapaswi; Guillaume Fortier; Viet Bac Le; Thibault Napoléon; Hua Gao; Claude Barras; Sophie Rosset; Laurent Besacier; Jakob J. Verbeek; Georges Quénot; Frédéric Jurie; Hazim Kemal Ekenel

The REPERE challenge is a project aiming at the evaluation of systems for supervised and unsupervised multimodal recognition of people in TV broadcast. In this paper, we describe, evaluate and discuss the QCompere consortium's submissions to the 2012 REPERE evaluation campaign dry run. Speaker identification (and face recognition) can be greatly improved when combined with name detection through video optical character recognition. Moreover, we show that unsupervised multimodal person recognition systems can achieve performance nearly as good as that of supervised monomodal ones (with several hundred identity models).


International Journal of Multimedia Information Retrieval | 2014

Person instance graphs for mono-, cross- and multi-modal person recognition in multimedia data: application to speaker identification in TV broadcast

Hervé Bredin; Anindya Roy; Viet Bac Le; Claude Barras

This work introduces a unified framework for mono-, cross- and multi-modal person recognition in multimedia data. Dubbed the person instance graph, it models the person recognition task as a graph mining problem: finding the best mapping between person-instance vertices and identity vertices. Practically, we describe how the approach can be applied to speaker identification in TV broadcast. Then, a solution to the above-mentioned mapping problem is proposed. It relies on integer linear programming to model the problem of clustering person instances based on their identity. We provide an in-depth theoretical definition of the optimization problem. Moreover, we improve two fundamental aspects of our previous related work: the problem constraints and the optimized objective function. Finally, a thorough experimental evaluation of the proposed framework is performed on a publicly available benchmark database. Depending on the graph configuration (i.e., the choice of its vertices and edges), we show that multiple tasks can be addressed interchangeably (e.g., speaker diarization, supervised or unsupervised speaker identification), significantly outperforming state-of-the-art mono-modal approaches.
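The paper solves the mapping with an integer linear programming solver; the toy sketch below brute-forces the same 0-1 assignment (each person-instance vertex mapped to exactly one identity vertex, maximizing the total weight of the selected edges) on a hypothetical four-turn graph, just to show the shape of the objective and the constraint. The edge weights are made-up numbers, not real scores:

```python
from itertools import product

# Toy person-instance graph: speech turns (instances) and identities.
instances = ["turn1", "turn2", "turn3", "turn4"]
identities = ["Alice", "Bob"]

# Edge weights: confidence that an instance belongs to an identity,
# e.g. from acoustic similarity or overlaid-name detection (hypothetical).
p = {
    ("turn1", "Alice"): 0.9, ("turn1", "Bob"): 0.1,
    ("turn2", "Alice"): 0.2, ("turn2", "Bob"): 0.8,
    ("turn3", "Alice"): 0.7, ("turn3", "Bob"): 0.3,
    ("turn4", "Alice"): 0.4, ("turn4", "Bob"): 0.6,
}

def objective(mapping):
    # ILP objective: total weight of the selected instance->identity edges.
    # The "exactly one identity per instance" ILP constraint is enforced
    # here by construction of the candidate mappings.
    return sum(p[(i, mapping[i])] for i in instances)

best = max(
    (dict(zip(instances, assign))
     for assign in product(identities, repeat=len(instances))),
    key=objective,
)
# best -> {'turn1': 'Alice', 'turn2': 'Bob', 'turn3': 'Alice', 'turn4': 'Bob'}
```

A real ILP formulation replaces the exhaustive search with binary variables and linear constraints, which scales to the thousands of instances found in a TV broadcast.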


Applied Artificial Intelligence | 2012

A Public Audio Identification Evaluation Framework for Broadcast Monitoring

Mathieu Ramona; Sébastien Fenet; Raphaël Blouet; Hervé Bredin; Thomas Fillon; Geoffroy Peeters

This paper presents the first public framework for the evaluation of audio fingerprinting techniques. Although the domain of audio identification is very active, both in industry and academia, there is at present no common basis for comparing the proposed techniques, because corpora and evaluation protocols differ among authors. The framework presented here corresponds to a use case in which audio excerpts have to be detected in a radio broadcast stream. This scenario naturally provides a large variety of audio distortions that make the task a real challenge for fingerprinting systems. Scoring metrics are discussed with regard to this particular scenario. We then describe a whole evaluation framework including an audio corpus, the related ground-truth annotation, and a toolkit for computing the score metrics. Finally, we detail an example application of this framework, which took place during the evaluation campaign of the Quaero project. This evaluation framework is publicly available for download and constitutes a simple, yet thorough, platform that the audio identification community can use to encourage reproducible results.
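A scoring metric of the kind discussed here can be sketched as occurrence-level precision and recall, where a hypothesized occurrence counts as correct when the same excerpt is annotated at a nearby position in the stream. The tolerance value and the data below are illustrative assumptions, not the framework's actual protocol:

```python
def score_detections(reference, hypothesis, tolerance=0.5):
    """Match hypothesized occurrences against the ground truth.

    Each occurrence is (excerpt_id, start_time_in_stream). A hypothesis
    is correct if the same excerpt is annotated within `tolerance`
    seconds; each reference occurrence may be matched at most once.
    Returns (precision, recall).
    """
    matched = set()
    correct = 0
    for ident, t in hypothesis:
        for j, (ref_id, ref_t) in enumerate(reference):
            if j not in matched and ident == ref_id and abs(t - ref_t) <= tolerance:
                matched.add(j)
                correct += 1
                break
    precision = correct / len(hypothesis) if hypothesis else 1.0
    recall = correct / len(reference) if reference else 1.0
    return precision, recall

# Toy broadcast annotation: excerpt id + start time (seconds) in the stream.
ref = [("jingle-07", 12.0), ("ad-33", 340.5), ("song-12", 1022.0)]
hyp = [("jingle-07", 12.2), ("ad-33", 500.0), ("song-99", 1022.0)]
precision, recall = score_detections(ref, hyp)
# one of three hypotheses is correct, one of three references is found
```

A false alarm on a wrong time (`ad-33` at 500.0) and a wrong excerpt id (`song-99`) both count against precision, while the missed references count against recall.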


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2008

Making talking-face authentication robust to deliberate imposture

Hervé Bredin; Gérard Chollet

We expose the limitations of existing frameworks designed for the evaluation of audiovisual biometric authentication algorithms. The weakness of a classical audiovisual authentication system is uncovered when it is confronted with realistic deliberate impostors. A client-dependent audiovisual synchrony measure is used to deal with deliberate impostors, and three new fusion strategies are studied, along with their performance against random and deliberate impostors.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2012

Segmentation of TV shows into scenes using speaker diarization and speech recognition

Hervé Bredin

We investigate the use of speaker diarization (SD) and automatic speech recognition (ASR) for the segmentation of audiovisual documents into scenes. We introduce multiple monomodal and multimodal approaches based on a state-of-the-art algorithm called the generalized scene transition graph (GSTG). First, we extend the latter with semantic information derived from both SD and ASR. Then, multimodal fusion of color histograms, SD and ASR is investigated at various points of the GSTG pipeline (early, late or intermediate fusion). Experiments on a few episodes of a popular TV show indicate that SD and ASR can be successfully combined with visual information, bringing an additional +11% relative increase in F1-measure for scene boundary detection over the state-of-the-art baseline.
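The scene-transition-graph idea can be sketched as follows: shots sharing a label (a color-histogram cluster, or equally a speaker label from SD) within a temporal window are linked, and a scene boundary is any shot transition that no link spans; fusing modalities then amounts to combining links from several label streams. This is a deliberate simplification of GSTG, not the paper's algorithm:

```python
def scene_boundaries(shot_labels, window=4):
    """Link shots i < j that share a label within `window` shots of each
    other; a scene boundary is any transition no link spans."""
    n = len(shot_labels)
    covered = [False] * (n - 1)  # transition k lies between shots k and k+1
    for i in range(n):
        for j in range(i + 1, min(n, i + window + 1)):
            if shot_labels[i] == shot_labels[j]:
                for k in range(i, j):
                    covered[k] = True
    # Report boundaries as the index of the first shot of the new scene.
    return [k + 1 for k in range(n - 1) if not covered[k]]

# Shots labeled by dominant color cluster: an A-B-A dialogue scene,
# then a new C-D-C scene (hypothetical labels).
shots = ["A", "B", "A", "B", "C", "D", "C"]
boundaries = scene_boundaries(shots)
# one boundary, before shot index 4
```

With labels coming from SD or ASR instead of color histograms, the same grouping yields the monomodal variants; taking the union of the links from each modality is one way to realize the intermediate fusion mentioned above.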


EURASIP Journal on Advances in Signal Processing | 2009

Talking-face identity verification, audiovisual forgery, and robustness issues

Walid Karam; Hervé Bredin; Hanna Greige; Gérard Chollet; Chafic Mokbel

The robustness of a biometric identity verification (IV) system is best evaluated by monitoring its behavior under impostor attacks. Such attacks may include the transformation of one, many, or all of the biometric modalities. In this paper, we present the transformation of both the speech and the visual appearance of a speaker and evaluate its effects on the IV system. We propose MixTrans, a novel method for voice transformation. MixTrans is a mixture-structured bias voice transformation technique in the cepstral domain, which allows a transformed audio signal to be estimated and reconstructed in the temporal domain. We also propose a face transformation technique that allows a frontal face image of a client speaker to be animated. This technique employs principal warps to deform MPEG-4 facial feature points according to facial animation parameters (FAPs). The robustness of the IV system is evaluated under these attacks.


ACM Multimedia | 2016

Improving Speaker Diarization of TV Series using Talking-Face Detection and Clustering

Hervé Bredin; Grégory Gelly

While successful on broadcast news, meetings or telephone conversations, state-of-the-art speaker diarization techniques tend to perform poorly on TV series or movies. In this paper, we propose to rely on state-of-the-art face clustering techniques to guide acoustic speaker diarization. Two approaches are tested and evaluated on the first season of the Game of Thrones TV series. The second (better) approach relies on a novel talking-face detection module based on a bi-directional long short-term memory (BLSTM) recurrent neural network. Both audio-visual approaches outperform the audio-only baseline. A detailed study of the behavior of these approaches is also provided and paves the way for future improvements.
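One simple way to let face clusters guide diarization, in the spirit of (though not identical to) the approaches evaluated here, is to relabel each speech turn with the face cluster whose talking-face segments overlap it most, keeping the acoustic label when no talking face co-occurs. All segments and labels below are hypothetical:

```python
def overlap(a, b):
    # Length of the intersection of two (start, end) intervals, in seconds.
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def relabel_turns(speech_turns, talking_faces):
    """speech_turns: list of (start, end, acoustic_label).
    talking_faces: list of (start, end, face_cluster) for segments where
    a talking-face detector decided that face is currently speaking.
    Each turn takes the face cluster with the largest total overlap."""
    out = []
    for s, e, acoustic in speech_turns:
        scores = {}
        for fs, fe, face in talking_faces:
            ov = overlap((s, e), (fs, fe))
            if ov > 0:
                scores[face] = scores.get(face, 0.0) + ov
        out.append((s, e, max(scores, key=scores.get) if scores else acoustic))
    return out

# An over-clustered acoustic label ("spk1" covers two people) gets split
# by the visual evidence.
turns = [(0.0, 3.0, "spk1"), (3.0, 6.0, "spk1"), (6.0, 9.0, "spk2")]
faces = [(0.0, 2.5, "faceA"), (3.2, 5.8, "faceB"), (6.1, 8.9, "faceA")]
relabelled = relabel_turns(turns, faces)
```

The talking-face constraint matters: overlapping a *visible* face is not enough, since TV series often show listeners while someone else speaks.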

Collaboration


Dive into Hervé Bredin's collaborations.

Top Co-Authors

Claude Barras (Centre national de la recherche scientifique)
Johann Poignant (Centre national de la recherche scientifique)
Georges Quénot (Centre national de la recherche scientifique)
Laurent Besacier (Centre national de la recherche scientifique)
Anindya Roy (Centre national de la recherche scientifique)
Alexandre Benoit (Centre national de la recherche scientifique)
Bahjat Safadi (Centre national de la recherche scientifique)
Denis Pellerin (Centre national de la recherche scientifique)