Héctor Delgado
Institut Eurécom
Publications
Featured research published by Héctor Delgado.
Odyssey 2016 | 2016
Massimiliano Todisco; Héctor Delgado; Nicholas W. D. Evans
Efforts to develop new countermeasures to protect automatic speaker verification from spoofing have intensified over recent years. The ASVspoof 2015 initiative showed that there is great potential to detect spoofing attacks, but also that the detection of previously unforeseen spoofing attacks remains challenging. This paper argues that there is more to be gained from the study of features than of classifiers and introduces a new feature for spoofing detection based on the constant Q transform, a perceptually-inspired time-frequency analysis tool popular in the study of music. Experimental results obtained using the standard ASVspoof 2015 database show that, when coupled with a standard Gaussian mixture model-based classifier, the proposed constant Q cepstral coefficients (CQCCs) outperform all previously reported results by a significant margin. In particular, the equal error rate for a subset of unknown spoofing attacks (for which no matched training data was used) is 0.46%, a relative improvement of 72% over the best previously reported result.
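ASVspoof 2015 results such as the 0.46% figure above are reported as equal error rates (EER): the operating point at which the false acceptance rate for spoofed trials equals the false rejection rate for genuine trials. The following is a minimal Python sketch of that metric, assuming higher scores mean "more likely genuine". The function name `compute_eer` and the simple threshold sweep are illustrative choices, not the official ASVspoof scoring tool.

```python
def compute_eer(genuine_scores, spoof_scores):
    """Approximate equal error rate (EER) in [0, 1].

    Assumption: higher scores mean "more likely genuine". Every observed
    score is tried as a threshold; the EER is read off where the false
    acceptance rate (spoofs accepted) and the false rejection rate
    (genuine trials rejected) are closest, averaging the two there.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, best_eer = float("inf"), None
    for t in thresholds:
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Overlapping score distributions: one error of each kind at the EER point.
print(compute_eer([0.4, 0.6, 0.8], [0.2, 0.5, 0.7]))
```

With only a few trials the sweep is coarse. Larger evaluations typically interpolate between neighbouring thresholds (for example along a ROC or DET curve) rather than snapping to observed scores.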
Conference of the International Speech Communication Association (Interspeech) | 2016
Tomi Kinnunen; Md. Sahidullah; Ivan Kukanov; Héctor Delgado; Massimiliano Todisco; Achintya Kumar Sarkar; Nicolai Bæk Thomsen; Ville Hautamäki; Nicholas W. D. Evans; Zheng-Hua Tan
Text-dependent automatic speaker verification naturally calls for the simultaneous verification of speaker identity and spoken content. These two tasks can be achieved with automatic speaker verification (ASV) and utterance verification (UV) technologies. While both have been addressed previously in the literature, a treatment of simultaneous speaker and utterance verification with a modern, standard database is so far lacking. This is despite the burgeoning demand for voice biometrics in a plethora of practical security applications. With the goal of improving overall verification performance, this paper reports different strategies for simultaneous ASV and UV in the context of short-duration, text-dependent speaker verification. Experiments performed on the recently released RedDots corpus are reported for three different ASV systems and four different UV systems. Results show that the combination of utterance verification with automatic speaker verification is (almost) universally beneficial with significant performance improvements being observed.
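One simple way to combine the two technologies described above is to accept a text-dependent trial only when both the ASV and the UV subsystems accept it. Under this rule, a genuine speaker saying the wrong passphrase is still rejected. The sketch below assumes that each subsystem already produces a score for which higher means "accept". `Trial`, `joint_accept` and the thresholds are hypothetical, and this decision rule is not necessarily one of the strategies evaluated in the paper.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    asv_score: float  # speaker verification score (higher = claimed speaker)
    uv_score: float   # utterance verification score (higher = expected text)


def joint_accept(trial, asv_threshold, uv_threshold):
    """Accept only if the claimed speaker AND the expected text are both verified."""
    return trial.asv_score >= asv_threshold and trial.uv_score >= uv_threshold


trials = [
    Trial(2.0, 1.5),   # right speaker, right text
    Trial(2.0, -1.0),  # right speaker, wrong text
    Trial(-0.5, 3.0),  # wrong speaker, right text
]
print([joint_accept(t, 0.0, 0.0) for t in trials])
```

A hard AND is the bluntest option. Score-level combinations let a very confident subsystem compensate for a marginal one, at the cost of needing calibrated scores.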
Conference of the International Speech Communication Association (Interspeech) | 2016
Md. Sahidullah; Héctor Delgado; Massimiliano Todisco; Hong Yu; Tomi Kinnunen; Nicholas W. D. Evans; Zheng-Hua Tan
It is well known that automatic speaker verification (ASV) systems can be vulnerable to spoofing. The community has responded to the threat by developing dedicated countermeasures aimed at detecting spoofing attacks. Progress in this area has accelerated over recent years, partly as a result of the first standard evaluation, ASVspoof 2015, which focused on spoofing detection in isolation from ASV. This paper investigates the integration of state-of-the-art spoofing countermeasures with ASV. Two general strategies for countermeasure integration are reported: cascaded and parallel. The paper reports the first comparative evaluation of each approach performed with the ASVspoof 2015 corpus. Results indicate that, even in the case of varying spoofing attack algorithms, ASV performance remains robust when protected with a diverse set of integrated countermeasures.
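The two integration strategies named above can be contrasted in a few lines. The sketch assumes one common reading of them. In the cascaded case the countermeasure (CM) acts as a gate and ASV is consulted only for trials judged bona fide. In the parallel case both subsystems score every trial and their scores are fused linearly before a single threshold. The function names, the fusion weight and the thresholds are hypothetical, not the paper's configuration.

```python
def cascaded_decision(cm_score, asv_score, cm_threshold, asv_threshold):
    """CM first: ASV is only consulted for trials judged bona fide."""
    if cm_score < cm_threshold:
        return False  # rejected as a suspected spoofing attack
    return asv_score >= asv_threshold


def parallel_decision(cm_score, asv_score, weight, threshold):
    """Both subsystems score every trial; a weighted sum is thresholded once."""
    fused = weight * cm_score + (1.0 - weight) * asv_score
    return fused >= threshold


# A convincing spoof: high ASV score, low CM (bona fide) score.
print(cascaded_decision(-2.0, 5.0, cm_threshold=0.0, asv_threshold=1.0))
print(parallel_decision(-2.0, 5.0, weight=0.5, threshold=1.0))
```

The example shows the key design difference. The cascade rejects the spoof outright, whereas with an equal fusion weight the strong ASV score outweighs the weak CM score and the parallel system accepts it. In the parallel case, the weight therefore controls how much protection the countermeasure provides.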
spoken language technology workshop | 2016
Héctor Delgado; Massimiliano Todisco; Md. Sahidullah; Achintya Kumar Sarkar; Nicholas W. D. Evans; Tomi Kinnunen; Zheng-Hua Tan
Many authentication applications involving automatic speaker verification (ASV) demand robust performance using short-duration, fixed or prompted text utterances. Text constraints not only reduce the phonetic mismatch between enrolment and test utterances, which generally leads to improved performance, but also provide an ancillary level of security. This can take the form of explicit utterance verification (UV). An integrated UV + ASV system should then verify access attempts which contain not just the expected speaker, but also the expected text content. This paper presents such a system and introduces new features which are used for both UV and ASV tasks. Based upon multi-resolution, spectro-temporal analysis and when fused with more traditional parameterisations, the new features not only generally outperform Mel-frequency cepstral coefficients, but are also shown to be complementary when fusing systems at score level. Finally, the joint operation of UV and ASV greatly decreases false acceptances for unmatched text trials.
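Score-level fusion of systems built on different features, such as MFCCs and the new features, is often done by normalising each system's scores and taking a weighted sum. The sketch below assumes z-normalisation with per-system mean and standard deviation, and fixed, hypothetical weights. In practice the normalisation statistics come from development data rather than the test trials, and the weights are often learned (for example by logistic regression). The abstract does not say which scheme the paper used.

```python
from statistics import mean, pstdev


def znorm(scores):
    """Standardise one system's scores to zero mean, unit variance.

    Assumes the scores are not all identical (pstdev would be zero).
    """
    mu, sigma = mean(scores), pstdev(scores)
    return [(s - mu) / sigma for s in scores]


def fuse(systems, weights):
    """Weighted sum of z-normalised scores.

    systems[i][j] is system i's score for trial j; weights[i] is system i's weight.
    """
    normed = [znorm(s) for s in systems]
    n_trials = len(systems[0])
    return [sum(w * sys_scores[j] for w, sys_scores in zip(weights, normed))
            for j in range(n_trials)]


# Two systems whose raw scores live on very different scales.
mfcc_scores = [1.0, 2.0, 3.0]
new_feature_scores = [10.0, 20.0, 30.0]
print(fuse([mfcc_scores, new_feature_scores], [0.5, 0.5]))
```

Normalising first keeps the system with the larger raw score range from dominating the sum regardless of its accuracy.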
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2017
Tomi Kinnunen; Md. Sahidullah; Mauro Falcone; Luca Costantini; Rosa González Hautamäki; Dennis Alexander Lehmann Thomsen; Achintya Kumar Sarkar; Zheng-Hua Tan; Héctor Delgado; Massimiliano Todisco; Nicholas W. D. Evans; Ville Hautamäki; Kong Aik Lee
This paper describes a new database for the assessment of automatic speaker verification (ASV) vulnerabilities to spoofing attacks. In contrast to other recent data collection efforts, the new database has been designed to support the development of replay spoofing countermeasures tailored towards the protection of text-dependent ASV systems from replay attacks in the face of variable recording and playback conditions. Derived from the re-recording of the original RedDots database, the effort is aligned with that in text-dependent ASV and thus well positioned for future assessments of replay spoofing countermeasures, not just in isolation, but in integration with ASV. The paper describes the database design and re-recording, a protocol and some early spoofing detection results. The new “RedDots Replayed” database is publicly available through a Creative Commons license.
Computer Speech & Language | 2017
Massimiliano Todisco; Héctor Delgado; Nicholas W. D. Evans
Highlights: Broad evaluation of constant Q cepstral coefficients for spoofing detection. Linearisation of geometric space enables constant Q cepstral processing. Variable spectro-temporal resolution is key to detection performance. State-of-the-art performance across three standard databases. Cross-database results point towards a new approach for generalisation.
Recent evaluations such as ASVspoof 2015 and the similarly named AVspoof have stimulated a great deal of progress in developing spoofing countermeasures for automatic speaker verification. This paper reports an approach which combines speech signal analysis using the constant Q transform with traditional cepstral processing. The resulting constant Q cepstral coefficients (CQCCs) were introduced recently and have proven to be an effective spoofing countermeasure. An extension of previous work, the paper reports an assessment of CQCC generalisation across three different databases and shows that CQCCs deliver state-of-the-art performance in each case. The benefit of CQCC features stems from a variable spectro-temporal resolution which, while fundamentally different from that used by most automatic speaker verification front-ends, also reliably captures the tell-tale signs of manipulation artefacts which are indicative of spoofing attacks. The second contribution relates to a cross-database evaluation. Results show that CQCC configuration is sensitive to the general form of spoofing attack and use case scenario. This finding suggests that the past pursuit of generalised spoofing detection with a single system may need rethinking.
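The "linearisation of geometric space" step can be illustrated for a single frame. CQT bins are spaced geometrically in frequency, so the log power spectrum is resampled onto a uniform frequency grid before a DCT turns it into cepstral coefficients. The sketch below assumes that the CQT coefficients and their centre frequencies for one frame are already available. It substitutes plain linear interpolation for whatever resampling scheme the paper uses, and applies an unnormalised DCT-II. All function names and parameters are hypothetical.

```python
import math


def uniform_resample(freqs, values, n_points):
    """Linearly interpolate values sampled at strictly increasing (e.g. geometric)
    freqs onto n_points uniformly spaced frequencies spanning the same range."""
    lo, hi = freqs[0], freqs[-1]
    grid = [lo + (hi - lo) * i / (n_points - 1) for i in range(n_points)]
    out, k = [], 0
    for f in grid:
        # Advance to the source interval [freqs[k], freqs[k + 1]] containing f.
        while k < len(freqs) - 2 and freqs[k + 1] < f:
            k += 1
        f0, f1 = freqs[k], freqs[k + 1]
        t = (f - f0) / (f1 - f0)
        out.append(values[k] + t * (values[k + 1] - values[k]))
    return out


def dct_ii(x, n_coeffs):
    """Unnormalised DCT-II, keeping only the first n_coeffs coefficients."""
    n = len(x)
    return [sum(x[m] * math.cos(math.pi * k * (m + 0.5) / n) for m in range(n))
            for k in range(n_coeffs)]


def cqcc_frame(cqt_frame, bin_freqs, n_uniform, n_coeffs, eps=1e-12):
    """CQT coefficients of one frame -> log power -> uniform grid -> DCT."""
    log_power = [math.log(abs(c) ** 2 + eps) for c in cqt_frame]
    return dct_ii(uniform_resample(bin_freqs, log_power, n_uniform), n_coeffs)


# Nine bins at four bins per octave starting at 100 Hz (illustrative values).
bin_freqs = [100.0 * 2 ** (k / 4) for k in range(9)]
print(cqcc_frame([0.5 + 0.5j] * 9, bin_freqs, n_uniform=16, n_coeffs=5))
```

Without the resampling step, a DCT applied directly to geometrically spaced bins would treat unequal frequency intervals as if they were equal. Resampling first gives the cepstral analysis the uniform spacing it expects.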
IEEE Transactions on Audio, Speech, and Language Processing | 2015
Héctor Delgado; Xavier Anguera; Corinne Fredouille; Javier Serrano
Speaker diarization has become a key process within other speech processing systems which take advantage of single-speaker speech signals. Furthermore, finding recurrent speakers among a set of audio recordings, known as cross-show diarization, has gained attention in recent years. Current state-of-the-art systems provide good performance, but usually at the cost of long processing times. This limitation may make current systems unsuitable for real-life applications. In this context, the speaker diarization approach based on binary key modeling provides a very fast yet accurate alternative. In this paper, we present the latest improvements to binary key speaker diarization, with the aim of further speeding up the process and improving performance. In addition, we propose a novel method for cross-show speaker diarization based on binary keys. Experimental results show the effectiveness of the proposed improvements for single-show speaker diarization, both in terms of speed and performance, obtaining a real-time factor of 0.0354xRT and a 16.8% relative improvement in performance. Furthermore, our proposed cross-show approach provides very competitive performance, only slightly worse than its single-show diarization counterpart, and exhibits a real-time factor of 0.04xRT.
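Part of the speed of binary key modeling comes from representing a segment as a fixed-length binary vector, which is cheap to compare. The sketch below follows the general idea. For each frame, the best-matching Gaussians of a background model (often called a KBM) are counted, and the most frequently selected Gaussians across the segment are set to 1. It assumes the per-frame Gaussian likelihoods are already computed. The parameters `top_n_per_frame` and `key_fraction`, and the intersection-over-union similarity, are illustrative assumptions rather than the paper's exact configuration.

```python
def binary_key(frame_likelihoods, top_n_per_frame, key_fraction):
    """Binary key for one segment.

    frame_likelihoods[t][g] is the likelihood of frame t under Gaussian g.
    """
    n_gauss = len(frame_likelihoods[0])
    counts = [0] * n_gauss
    for frame in frame_likelihoods:
        # Count the top-N best-matching Gaussians for this frame.
        best = sorted(range(n_gauss), key=lambda g: frame[g], reverse=True)
        for g in best[:top_n_per_frame]:
            counts[g] += 1
    # Activate the most frequently selected Gaussians over the segment.
    n_active = max(1, round(key_fraction * n_gauss))
    active = sorted(range(n_gauss), key=lambda g: counts[g], reverse=True)
    key = [0] * n_gauss
    for g in active[:n_active]:
        key[g] = 1
    return key


def key_similarity(a, b):
    """Shared active bits over all active bits: |a AND b| / |a OR b|."""
    inter = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    return inter / union if union else 0.0


frames = [[0.9, 0.1, 0.5, 0.2], [0.8, 0.2, 0.6, 0.1]]
key = binary_key(frames, top_n_per_frame=2, key_fraction=0.5)
print(key, key_similarity(key, [1, 1, 0, 0]))
```

Comparing two keys needs only bitwise operations over a vector of fixed length, independent of segment duration. This is what makes clustering over many segments fast.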
IberSPEECH 2014 Proceedings of the Second International Conference on Advances in Speech and Language Technologies for Iberian Languages - Volume 8854 | 2014
Héctor Delgado; Xavier Anguera; Corinne Fredouille; Javier Serrano
The recently proposed speaker diarization technique based on binary keys provides a very fast alternative to state-of-the-art systems with little increase in Diarization Error Rate (DER). Although the approach shows great potential, it also presents issues, mainly in the stopping criterion. Exploring alternative clustering and stopping criterion approaches is therefore needed. Recent work has addressed speaker clustering as a global optimization problem in order to tackle the intrinsic issues of Agglomerative Hierarchical Clustering (AHC), mainly its local-maximum-based decision making. This paper adapts and applies this new framework to the binary key diarization system. In addition, an analysis of cluster purity across the AHC iterations is performed using reference speaker ground-truth labels in order to select the purest clustering as input for the global framework. Experiments on the REPERE phase 1 test database show improvements of around 6% absolute DER compared to the baseline system output.
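Cluster purity against reference labels can be computed per AHC iteration from frame-level cluster assignments. The sketch below uses the common frame-weighted definition: for each cluster, count the frames of its majority speaker, then divide the total by the number of frames. The function name and the frame-level input format are assumptions. The abstract does not give the exact purity measure or selection rule used.

```python
from collections import Counter


def clustering_purity(cluster_ids, speaker_labels):
    """Frame-weighted purity: share of frames that belong to their cluster's
    majority reference speaker (1.0 means every cluster is single-speaker)."""
    per_cluster = {}
    for c, s in zip(cluster_ids, speaker_labels):
        per_cluster.setdefault(c, Counter())[s] += 1
    majority = sum(counts.most_common(1)[0][1] for counts in per_cluster.values())
    return majority / len(cluster_ids)


# Cluster 0 mixes speakers a and b; cluster 1 is pure.
print(clustering_purity([0, 0, 0, 1, 1], ["a", "a", "b", "b", "b"]))
```

Merging two clusters can never increase this quantity, because the majority count of the merged cluster is at most the sum of the two majority counts. Purity therefore tends to fall over AHC iterations and is useful for spotting when impure merges begin.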
Conference of the International Speech Communication Association (Interspeech) | 2016
Massimiliano Todisco; Héctor Delgado; Nicholas W. D. Evans
This paper introduces a new articulation rate filter and reports its combination with recently proposed constant Q cepstral coefficients (CQCCs) in their first application to automatic speaker verification (ASV). CQCC features are extracted with the constant Q transform (CQT), a perceptually-inspired alternative to Fourier-based approaches to time-frequency analysis. The CQT offers greater frequency resolution at lower frequencies and greater time resolution at higher frequencies. When coupled with cepstral analysis and the new articulation rate filter, the resulting CQCC features are readily modelled using conventional techniques. A comparative assessment of CQCCs and mel frequency cepstral coefficients (MFCCs) for a short-duration speaker verification scenario shows that CQCCs generally outperform MFCCs and that the two feature representations are highly complementary; fusion experiments with the RSR2015 and RedDots databases show relative reductions in equal error rates of as much as 60% compared to an MFCC baseline.
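The resolution trade-off described above follows from the standard constant Q definitions. Bin centre frequencies are $f_k = f_{\min} 2^{k/B}$ for $B$ bins per octave, the quality factor is $Q = (2^{1/B} - 1)^{-1}$, and bin $k$ uses an analysis window of $N_k = \lceil Q f_s / f_k \rceil$ samples. Low-frequency bins therefore get long windows (fine frequency resolution), and high-frequency bins get short windows (fine time resolution). The Python sketch below only tabulates these quantities. The parameter values are illustrative and are not those used in the paper.

```python
import math


def cqt_bins(f_min, f_max, bins_per_octave, sample_rate):
    """Return (Q, [(centre_frequency_hz, window_length_samples), ...])."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    n_bins = math.floor(bins_per_octave * math.log2(f_max / f_min)) + 1
    bins = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)   # geometric spacing
        n_k = math.ceil(q * sample_rate / f_k)       # shorter windows at higher f_k
        bins.append((f_k, n_k))
    return q, bins


q, bins = cqt_bins(f_min=100.0, f_max=800.0, bins_per_octave=12, sample_rate=16000)
print(round(q, 3), bins[0], bins[-1])
```

Across this three-octave range the window shrinks by roughly a factor of eight, matching the eightfold rise in centre frequency, while the ratio of centre frequency to bandwidth stays fixed at Q.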
European Signal Processing Conference (EUSIPCO) | 2015
Héctor Delgado; Xavier Anguera; Corinne Fredouille; Javier Serrano
The recently proposed speaker diarization technique based on binary keys provides a very fast alternative to state-of-the-art systems. However, this speed-up comes at the cost of a small increase in Diarization Error Rate (DER). This paper proposes a series of improvements to the original algorithm with the aim of getting closer to state-of-the-art performance. First, several alternative similarity measures between binary key speaker/segment models are introduced. Second, we make a first attempt at applying Intra-Session and Intra-Speaker Variability (ISISV) compensation within the binary key diarization approach through Nuisance Attribute Projection (NAP). Experimental results show the benefits of the newly introduced similarity metrics, as well as the potential of Nuisance Attribute Projection for ISISV compensation in the binary key speaker diarization framework.
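Nuisance Attribute Projection removes an estimated nuisance subspace from a feature vector using the projection $x' = (I - UU^\top)x$, where the columns of $U$ are an orthonormal basis of the nuisance directions. The sketch below applies that projection without forming the matrix. It assumes the basis is already given, for example as leading eigenvectors of a within-speaker scatter matrix. The abstract does not describe how the paper estimates $U$ or which binary key representation it is applied to, and `nap_project` is a hypothetical name.

```python
def nap_project(x, nuisance_basis):
    """Remove the components of x that lie in span(nuisance_basis).

    nuisance_basis must be a list of orthonormal vectors. Implements
    x' = (I - U U^T) x one basis vector at a time.
    """
    out = list(x)
    for u in nuisance_basis:
        # With an orthonormal basis, u . x equals u . out at this point.
        dot = sum(ui * xi for ui, xi in zip(u, x))
        out = [oi - dot * ui for oi, ui in zip(out, u)]
    return out


print(nap_project([3.0, 4.0, 5.0], [[1.0, 0.0, 0.0]]))
```

Because the result is orthogonal to every nuisance direction, projecting it a second time leaves it unchanged. Similarities computed after projection therefore ignore variation along those directions.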