Simon Bozonnet
Institut Eurécom
Publication
Featured research published by Simon Bozonnet.
IEEE Transactions on Audio, Speech, and Language Processing | 2012
Xavier Anguera Miro; Simon Bozonnet; Nicholas W. D. Evans; Corinne Fredouille; Gerald Friedland; Oriol Vinyals
Speaker diarization is the task of determining “who spoke when?” in an audio or video recording that contains an unknown amount of speech and also an unknown number of speakers. Initially, it was proposed as a research topic related to automatic speech recognition, where speaker diarization serves as an upstream processing step. Over recent years, however, speaker diarization has become an important key technology for many tasks, such as navigation, retrieval, or higher level inference on audio data. Accordingly, many important improvements in accuracy and robustness have been reported in journals and conferences in the area. The application domains, from broadcast news, to lectures and meetings, vary greatly and pose different problems, such as having access to multiple microphones and multimodal information or overlapping speech. The most recent review of existing technology dates back to 2006 and focuses on the broadcast news domain. In this paper, we review the current state-of-the-art, focusing on research developed since 2006 that relates predominantly to speaker diarization for conference meetings. Finally, we present an analysis of speaker diarization performance as reported through the NIST Rich Transcription evaluations on meeting data and identify important areas for future research.
International Conference on Acoustics, Speech, and Signal Processing | 2010
Simon Bozonnet; Nicholas W. D. Evans; Corinne Fredouille
There are two approaches to speaker diarization: bottom-up and top-down. Our work on top-down systems shows that they can deliver competitive results compared to bottom-up systems and that they are extremely computationally efficient, but also that they are particularly prone to poor model initialisation and cluster impurities. In this paper we present enhancements to our state-of-the-art, top-down approach to speaker diarization that deliver improved stability across three different datasets composed of conference meetings from five standard NIST RT evaluations. We report an improved approach to speaker modelling which, despite having greater chances for cluster impurities, delivers a 35% relative improvement in DER for the MDM condition. We also describe new work to incorporate cluster purification into a top-down system which delivers relative improvements of 44% over the baseline system without compromising computational efficiency.
International Conference on Acoustics, Speech, and Signal Processing | 2012
Ravichander Vipperla; Jürgen T. Geiger; Simon Bozonnet; Dong Wang; Nicholas W. D. Evans; Björn W. Schuller; Gerhard Rigoll
Overlapping speech is known to degrade speaker diarization performance with impacts on speaker clustering and segmentation. While previous work made important advances in detecting overlapping speech intervals and in attributing them to relevant speakers, the problem remains largely unsolved. This paper reports the first application of convolutive non-negative sparse coding (CNSC) to the overlap problem. CNSC aims to decompose a composite signal into its underlying contributory parts and is thus naturally suited to overlap detection and attribution. Experimental results on NIST RT data show that the CNSC approach gives comparable results to a state-of-the-art hidden Markov model based overlap detector. In a practical diarization system, CNSC based speaker attribution is shown to reduce the speaker error by over 40% relative in overlapping segments.
International Conference on Acoustics, Speech, and Signal Processing | 2011
Simon Bozonnet; Dong Wang; Nicholas W. D. Evans; Raphaël Troncy
While bottom-up approaches have emerged as the standard, default approach to clustering for speaker diarization, we have always found the top-down approach gives equivalent or superior performance. Our recent work shows that significant gains in performance can be obtained when cluster purification is applied to the output of top-down systems but that it can degrade performance when applied to the output of bottom-up systems. This paper demonstrates that these observations can be accounted for by factors unrelated to the speaker and that they can impact more strongly on the performance of bottom-up clustering strategies than top-down strategies. Experimental results confirm that clusters produced through top-down clustering are better normalized against phone variation than those produced through bottom-up clustering and that this accounts for the observed inconsistencies in purification performance. The work highlights the need for marginalization strategies which should encourage convergence toward different speakers rather than toward nuisance factors such as those related to the linguistic content.
European Signal Processing Conference | 2015
Giovanni Soldi; Simon Bozonnet; Christophe Beaugeant; Nicholas W. D. Evans
Phone adaptive training (PAT) aims to derive a new acoustic feature space in which the influence of phone variation is minimised while that of speaker variation is maximised. Originally proposed in the context of speaker diarization, our most recent work showed the utility of PAT in short-duration, automatic speaker verification where phone variation typically degrades performance. New to this contribution is the assessment of PAT utilising automatically generated acoustic class transcriptions whose number is controlled by regression tree analysis. Experimental results using a standard database show that PAT delivers significant improvements in the performance of a state-of-the-art iVector speaker verification system.
Conference of the International Speech Communication Association | 2010
Simon Bozonnet
Conference of the International Speech Communication Association | 2012
Jürgen T. Geiger; Ravichander Vipperla; Simon Bozonnet; Nicholas W. D. Evans; Björn W. Schuller; Gerhard Rigoll
Conference of the International Speech Communication Association | 2012
Simon Bozonnet
European Signal Processing Conference | 2010
Simon Bozonnet; Félicien Vallet; Nicholas W. D. Evans; Slim Essid; Gaël Richard; Jean Carrive
Conference of the International Speech Communication Association | 2010
Simon Bozonnet