Publication


Featured research published by Massimiliano Todisco.


Odyssey 2016 | 2016

A new feature for automatic speaker verification anti-spoofing: Constant Q cepstral coefficients

Massimiliano Todisco; Héctor Delgado; Nicholas W. D. Evans

Efforts to develop new countermeasures in order to protect automatic speaker verification from spoofing have intensified over recent years. The ASVspoof 2015 initiative showed that there is great potential to detect spoofing attacks, but also that the detection of previously unforeseen spoofing attacks remains challenging. This paper argues that there is more to be gained from the study of features rather than classifiers and introduces a new feature for spoofing detection based on the constant Q transform, a perceptually-inspired time-frequency analysis tool popular in the study of music. Experimental results obtained using the standard ASVspoof 2015 database show that, when coupled with a standard Gaussian mixture model-based classifier, the proposed constant Q cepstral coefficients (CQCCs) outperform all previously reported results by a significant margin. In particular, the error rate for a subset of unknown spoofing attacks (for which no matched training data was used) is 0.46%, a relative improvement of 72% over the best previously reported results.
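
The two figures quoted above can be related with a few lines of code. The sketch below assumes the reported percentage is an equal error rate (EER), the metric used by ASVspoof 2015, and that higher scores indicate genuine speech. The function names and the toy scores are illustrative and are not taken from the paper.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate EER: sweep thresholds over all observed scores and return the
    operating point where false-acceptance and false-rejection rates are closest."""
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(genuine_scores) | set(spoof_scores)):
        # A trial is accepted as genuine when its score is >= threshold.
        far = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def relative_improvement(baseline_error, new_error):
    """Fractional reduction in error relative to the baseline."""
    return (baseline_error - new_error) / baseline_error


# Toy scores (hypothetical): one spoofed trial overlaps the genuine range.
genuine = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine, spoof))
```

Under this definition, an error rate of 0.46% that represents a 72% relative improvement implies a previous best of roughly 0.46 / (1 - 0.72), or about 1.6%. That baseline is derived here from the two quoted figures and is not stated in the abstract.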


Conference of the International Speech Communication Association | 2016

Utterance Verification for Text-Dependent Speaker Recognition: A Comparative Assessment Using the RedDots Corpus

Tomi Kinnunen; Sahidullah; Ivan Kukanov; Héctor Delgado; Massimiliano Todisco; Achintya Kumar Sarkar; Nicolai Bæk Thomsen; Ville Hautamäki; Nicholas W. D. Evans; Zheng-Hua Tan

Text-dependent automatic speaker verification naturally calls for the simultaneous verification of speaker identity and spoken content. These two tasks can be achieved with automatic speaker verification (ASV) and utterance verification (UV) technologies. While both have been addressed previously in the literature, a treatment of simultaneous speaker and utterance verification with a modern, standard database is so far lacking. This is despite the burgeoning demand for voice biometrics in a plethora of practical security applications. With the goal of improving overall verification performance, this paper reports different strategies for simultaneous ASV and UV in the context of short-duration, text-dependent speaker verification. Experiments performed on the recently released RedDots corpus are reported for three different ASV systems and four different UV systems. Results show that the combination of utterance verification with automatic speaker verification is (almost) universally beneficial with significant performance improvements being observed.


Conference of the International Speech Communication Association | 2016

Integrated Spoofing Countermeasures and Automatic Speaker Verification: an Evaluation on ASVspoof 2015

Md. Sahidullah; Héctor Delgado; Massimiliano Todisco; Hong Yu; Tomi Kinnunen; Nicholas W. D. Evans; Zheng-Hua Tan

It is well known that automatic speaker verification (ASV) systems can be vulnerable to spoofing. The community has responded to the threat by developing dedicated countermeasures aimed at detecting spoofing attacks. Progress in this area has accelerated over recent years, partly as a result of the first standard evaluation, ASVspoof 2015, which focused on spoofing detection in isolation from ASV. This paper investigates the integration of state-of-the-art spoofing countermeasures with ASV. Two general strategies for countermeasure integration are reported: cascaded and parallel. The paper reports the first comparative evaluation of each approach performed with the ASVspoof 2015 corpus. Results indicate that, even in the case of varying spoofing attack algorithms, ASV performance remains robust when protected with a diverse set of integrated countermeasures.
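
The difference between the two integration strategies can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not give the decision rules, so the gating logic, the weighted-sum fusion, the thresholds, and the function names are all hypothetical. Higher scores are assumed to mean "more likely genuine" for the countermeasure (CM) and "more likely the target speaker" for ASV.

```python
def cascaded_decision(cm_score, asv_score, cm_threshold, asv_threshold):
    """Cascade: the countermeasure gates the trial, then ASV decides."""
    if cm_score < cm_threshold:
        return False  # rejected as a suspected spoof; the ASV score is ignored
    return asv_score >= asv_threshold


def parallel_decision(cm_score, asv_score, threshold, cm_weight=0.5):
    """Parallel: both subsystems score the trial and a weighted sum is thresholded."""
    fused = cm_weight * cm_score + (1.0 - cm_weight) * asv_score
    return fused >= threshold


# The same trial (weak CM score, strong ASV score) under both strategies.
print(cascaded_decision(0.2, 0.9, cm_threshold=0.5, asv_threshold=0.5))
print(parallel_decision(0.2, 0.9, threshold=0.5))
```

In this toy setting the cascade rejects the trial outright because the countermeasure fires first, whereas the parallel rule lets a strong ASV score compensate for a weak countermeasure score. Which behaviour is preferable depends on the operating point, which the abstract does not specify.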


Spoken Language Technology Workshop | 2016

Further optimisations of constant Q cepstral processing for integrated utterance and text-dependent speaker verification

Héctor Delgado; Massimiliano Todisco; Sahidullah; Achintya Kumar Sarkar; Nicholas W. D. Evans; Tomi Kinnunen; Zheng-Hua Tan

Many authentication applications involving automatic speaker verification (ASV) demand robust performance using short-duration, fixed or prompted text utterances. Text constraints not only reduce the phone-mismatch between enrolment and test utterances, which generally leads to improved performance, but also provide an ancillary level of security. This can take the form of explicit utterance verification (UV). An integrated UV + ASV system should then verify access attempts which contain not just the expected speaker, but also the expected text content. This paper presents such a system and introduces new features which are used for both UV and ASV tasks. Based upon multi-resolution, spectro-temporal analysis and when fused with more traditional parameterisations, the new features not only generally outperform Mel-frequency cepstral coefficients, but also are shown to be complementary when fusing systems at score level. Finally, the joint operation of UV and ASV greatly decreases false acceptances for unmatched text trials.


International Conference on Acoustics, Speech, and Signal Processing | 2017

RedDots replayed: A new replay spoofing attack corpus for text-dependent speaker verification research

Tomi Kinnunen; Sahidullah; Mauro Falcone; Luca Costantini; Rosa González Hautamäki; Dennis Alexander Lehmann Thomsen; Achintya Kumar Sarkar; Zheng-Hua Tan; Héctor Delgado; Massimiliano Todisco; Nicholas W. D. Evans; Ville Hautamäki; Kong Aik Lee

This paper describes a new database for the assessment of automatic speaker verification (ASV) vulnerabilities to spoofing attacks. In contrast to other recent data collection efforts, the new database has been designed to support the development of replay spoofing countermeasures tailored towards the protection of text-dependent ASV systems from replay attacks in the face of variable recording and playback conditions. Derived from the re-recording of the original RedDots database, the effort is aligned with that in text-dependent ASV and thus well positioned for future assessments of replay spoofing countermeasures, not just in isolation, but in integration with ASV. The paper describes the database design and re-recording, a protocol and some early spoofing detection results. The new “RedDots Replayed” database is publicly available through a Creative Commons license.


IEEE Journal of Selected Topics in Signal Processing | 2017

ASVspoof: The Automatic Speaker Verification Spoofing and Countermeasures Challenge

Zhizheng Wu; Junichi Yamagishi; Tomi Kinnunen; Cemal Hanilçi; Mohammed Sahidullah; Aleksandr Sizov; Nicholas W. D. Evans; Massimiliano Todisco

Concerns regarding the vulnerability of automatic speaker verification (ASV) technology against spoofing can undermine confidence in its reliability and form a barrier to exploitation. The absence of competitive evaluations and the lack of common datasets have hampered progress in developing effective spoofing countermeasures. This paper describes the ASV Spoofing and Countermeasures (ASVspoof) initiative, which aims to fill this void. Through the provision of a common dataset, protocols, and metrics, ASVspoof promotes a sound research methodology and fosters technological progress. This paper also describes the ASVspoof 2015 dataset, evaluation, and results with detailed analyses. A review of post-evaluation studies conducted using the same dataset illustrates the rapid progress stemming from ASVspoof and outlines the need for further investigation. Priority future research directions are presented in the scope of the next ASVspoof evaluation planned for 2017.


Computer Speech & Language | 2017

Constant Q cepstral coefficients

Massimiliano Todisco; Héctor Delgado; Nicholas W. D. Evans

Highlights: Broad evaluation of constant Q cepstral coefficients for spoofing detection. Linearisation of geometric space enables constant Q cepstral processing. Variable spectro-temporal resolution key to detection performance. State-of-the-art performance across three standard databases. Cross-database results point towards new approach for generalisation.

Recent evaluations such as ASVspoof 2015 and the similarly-named AVspoof have stimulated a great deal of progress to develop spoofing countermeasures for automatic speaker verification. This paper reports an approach which combines speech signal analysis using the constant Q transform with traditional cepstral processing. The resulting constant Q cepstral coefficients (CQCCs) were introduced recently and have proven to be an effective spoofing countermeasure. An extension of previous work, the paper reports an assessment of CQCC generalisation across three different databases and shows that they deliver state-of-the-art performance in each case. The benefit of CQCC features stems from a variable spectro-temporal resolution which, while being fundamentally different to that used by most automatic speaker verification system front-ends, also reliably captures the tell-tale signs of manipulation artefacts which are indicative of spoofing attacks. The second contribution relates to a cross-database evaluation. Results show that CQCC configuration is sensitive to the general form of spoofing attack and use case scenario. This finding suggests that the past single-system pursuit of generalised spoofing detection may need rethinking.
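
The step from a geometrically spaced constant Q spectrum to cepstral coefficients can be sketched as below. This is a simplified illustration, not the paper's implementation: it assumes the constant Q magnitudes for one frame are already available, uses plain linear interpolation for the linearisation step (a deliberate simplification), and applies an unnormalised DCT-II. All function names and parameter values are hypothetical.

```python
import math


def resample_to_linear(freqs, values, n_out):
    """Interpolate values sampled at increasing freqs onto n_out uniform frequencies."""
    lo, hi = freqs[0], freqs[-1]
    out, j = [], 0
    for i in range(n_out):
        f = lo + (hi - lo) * i / (n_out - 1)
        # Advance to the segment [freqs[j], freqs[j + 1]] that contains f.
        while j < len(freqs) - 2 and freqs[j + 1] < f:
            j += 1
        t = (f - freqs[j]) / (freqs[j + 1] - freqs[j])
        out.append(values[j] + t * (values[j + 1] - values[j]))
    return out


def dct_ii(x, n_coeffs):
    """Unnormalised DCT-II: c[p] = sum_l x[l] * cos(pi * p * (l + 0.5) / L)."""
    length = len(x)
    return [
        sum(x[l] * math.cos(math.pi * p * (l + 0.5) / length) for l in range(length))
        for p in range(n_coeffs)
    ]


def cqcc_frame(cqt_magnitudes, freqs, n_linear, n_coeffs):
    """Log power spectrum -> uniform frequency grid -> cepstral coefficients."""
    log_power = [math.log(m * m + 1e-12) for m in cqt_magnitudes]
    linear = resample_to_linear(freqs, log_power, n_linear)
    return dct_ii(linear, n_coeffs)


# Hypothetical frame: 48 geometrically spaced bins, 12 per octave, from 15 Hz.
freqs = [15.0 * 2.0 ** (k / 12) for k in range(48)]
print(cqcc_frame([2.0] * 48, freqs, n_linear=32, n_coeffs=4))
```

The resampling is needed because the DCT assumes uniformly spaced samples, whereas constant Q bins are spaced geometrically. A flat input spectrum, as in the example, puts all of its energy in the zeroth coefficient.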


Conference of the International Speech Communication Association | 2016

Articulation rate filtering of CQCC features for automatic speaker verification

Massimiliano Todisco; Héctor Delgado; Nicholas W. D. Evans

This paper introduces a new articulation rate filter and reports its combination with recently proposed constant Q cepstral coefficients (CQCCs) in their first application to automatic speaker verification (ASV). CQCC features are extracted with the constant Q transform (CQT), a perceptually-inspired alternative to Fourier-based approaches to time-frequency analysis. The CQT offers greater frequency resolution at lower frequencies and greater time resolution at higher frequencies. When coupled with cepstral analysis and the new articulation rate filter, the resulting CQCC features are readily modelled using conventional techniques. A comparative assessment of CQCCs and mel frequency cepstral coefficients (MFCCs) for a short-duration speaker verification scenario shows that CQCCs generally outperform MFCCs and that the two feature representations are highly complementary; fusion experiments with the RSR2015 and RedDots databases show relative reductions in equal error rates of as much as 60% compared to an MFCC baseline.
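
The resolution trade-off described above follows directly from the definition of a constant Q analysis, in which the ratio of centre frequency to bandwidth is the same for every bin. The sketch below tabulates centre frequency, bandwidth, and analysis window duration per bin. The minimum frequency and the number of bins per octave are illustrative values, and the function name is hypothetical.

```python
def constant_q_resolution(f_min, bins_per_octave, n_bins):
    """Return (centre_hz, bandwidth_hz, window_seconds) for each constant Q bin."""
    # Q = f_k / bandwidth_k is identical for every bin.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    bins = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)  # geometric spacing
        bandwidth = f_k / q     # narrow at low frequencies, wide at high
        window = q / f_k        # long at low frequencies, short at high
        bins.append((f_k, bandwidth, window))
    return bins


for centre, bandwidth, window in constant_q_resolution(15.0, 12, 25)[::12]:
    print(f"{centre:8.2f} Hz  bandwidth {bandwidth:6.3f} Hz  window {window:6.3f} s")
```

Each octave doubles the bandwidth and halves the window duration, which is why low-frequency bins resolve frequency finely while high-frequency bins resolve time finely.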


International Conference on Biometrics | 2017

Impact of Bandwidth and Channel Variation on Presentation Attack Detection for Speaker Verification

Héctor Delgado; Massimiliano Todisco; Nicholas W. D. Evans; Sahidullah; Wei Ming Liu; Federico Alegre; Tomi Kinnunen; Benoit G. B. Fauve

Vulnerabilities to presentation attacks can undermine confidence in automatic speaker verification (ASV) technology. While efforts to develop countermeasures, known as presentation attack detection (PAD) systems, are now under way, the majority of past work has been performed with high-quality speech data. Many practical ASV applications are narrowband and encompass various coding and other channel effects. PAD performance is largely untested in such scenarios. This paper reports an assessment of the impact of bandwidth and channel variation on PAD performance. Assessments using two current PAD solutions and two standard databases show that such variation provokes significant degradations in performance. Encouragingly, relative performance improvements of 98% can nonetheless be achieved through feature optimisation. This performance gain is achieved by optimising the spectro-temporal decomposition in the feature extraction process to compensate for narrowband speech. However, compensating for channel variation is considerably more challenging.


International Conference on Acoustics, Speech, and Signal Processing | 2017

Artificial bandwidth extension using the constant Q transform

Pramod B. Bachhav; Massimiliano Todisco; Moctar Mossi Mossi; Christophe Beaugeant; Nicholas W. D. Evans

Most artificial bandwidth extension (ABE) algorithms are based on the classical source-filter model of speech production. This approach generally requires the dual extension of each component through independent processing. Alternative approaches reported recently operate on the spectrum. With human perception thought to be largely insensitive to phase, most such approaches focus on the extension of the magnitude spectrum alone and rely on Fourier spectral analysis. This paper reports an approach to ABE based on the constant Q transform (CQT), a more perceptually motivated approach to spectral analysis. A Gaussian mixture model is used to estimate missing highband components from available narrowband components before resynthesis with phase estimates obtained from the upsampled narrowband signal. Objective assessment shows that energy normalisation is critical to performance. These findings and the appeal of CQT for ABE are confirmed through informal subjective tests based on the mean opinion score.
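
One common way to estimate missing components with a Gaussian mixture model is the minimum mean-square error (conditional mean) estimator under a joint density. The abstract does not state which estimator the paper uses, so the sketch below is an assumption, reduced to scalar narrowband and highband features for readability. The function name and the component values are hypothetical.

```python
import math


def gmm_conditional_mean(x, components):
    """MMSE estimate E[y | x] under a joint GMM over scalar (x, y).

    Each component is (weight, mean_x, mean_y, var_x, cov_xy).
    """
    posteriors, predictions = [], []
    for weight, mean_x, mean_y, var_x, cov_xy in components:
        # Unnormalised posterior: prior weight times the marginal likelihood of x.
        likelihood = math.exp(-0.5 * (x - mean_x) ** 2 / var_x) / math.sqrt(
            2.0 * math.pi * var_x
        )
        posteriors.append(weight * likelihood)
        # Per-component linear regression of y (highband) on x (narrowband).
        predictions.append(mean_y + cov_xy / var_x * (x - mean_x))
    total = sum(posteriors)  # not guarded against underflow for distant x
    return sum(p * y for p, y in zip(posteriors, predictions)) / total


# Hypothetical two-component model of (narrowband, highband) feature pairs.
model = [(0.5, -1.0, -2.0, 1.0, 0.0), (0.5, 1.0, 2.0, 1.0, 0.0)]
print(gmm_conditional_mean(1.0, model))
```

The estimate blends the per-component predictions according to how well each component explains the observed narrowband value. A practical system would use vector features and full covariance matrices in place of the scalars here.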

Collaboration


Top Co-Authors

Tomi Kinnunen

University of Eastern Finland

Sahidullah

University of Eastern Finland

Junichi Yamagishi

National Institute of Informatics

Md. Sahidullah

University of Eastern Finland
