Publications


Featured research published by Nicholas W. D. Evans.


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Speaker Diarization: A Review of Recent Research

Xavier Anguera Miro; Simon Bozonnet; Nicholas W. D. Evans; Corinne Fredouille; Gerald Friedland; Oriol Vinyals

Speaker diarization is the task of determining “who spoke when?” in an audio or video recording that contains an unknown amount of speech and an unknown number of speakers. It was initially proposed as a research topic related to automatic speech recognition, where speaker diarization serves as an upstream processing step. Over recent years, however, speaker diarization has become a key technology for many tasks, such as navigation, retrieval, or higher-level inference on audio data. Accordingly, many important improvements in accuracy and robustness have been reported in journals and conferences in the area. The application domains, from broadcast news to lectures and meetings, vary greatly and pose different problems, such as access to multiple microphones and multimodal information, or overlapping speech. The most recent review of existing technology dates back to 2006 and focuses on the broadcast news domain. In this paper, we review the current state of the art, focusing on research developed since 2006 that relates predominantly to speaker diarization for conference meetings. Finally, we present an analysis of speaker diarization performance as reported through the NIST Rich Transcription evaluations on meeting data and identify important areas for future research.


Speech Communication | 2015

Spoofing and countermeasures for speaker verification

Zhizheng Wu; Nicholas W. D. Evans; Tomi Kinnunen; Junichi Yamagishi; Federico Alegre; Haizhou Li

While biometric authentication has advanced significantly in recent years, evidence shows that the technology can be susceptible to malicious spoofing attacks. The research community has responded with dedicated countermeasures which aim to detect and deflect such attacks. Even though the literature shows that they can be effective, the problem is far from solved; biometric systems remain vulnerable to spoofing. There is growing momentum to develop spoofing countermeasures for automatic speaker verification now that the technology has matured sufficiently to support mass deployment in an array of diverse applications, yet greater effort will be needed in the future to ensure adequate protection against spoofing. This article provides a survey of past work and identifies priority research directions for the future. We summarise previous studies involving impersonation, replay, speech synthesis and voice conversion spoofing attacks, and more recent efforts to develop dedicated countermeasures. The survey shows that future research should address the lack of standard datasets and the over-fitting of existing countermeasures to specific, known spoofing attacks.


International Conference on Biometrics: Theory, Applications and Systems | 2013

A one-class classification approach to generalised speaker verification spoofing countermeasures using local binary patterns

Federico Alegre; Asmaa Amehraye; Nicholas W. D. Evans

The vulnerability of automatic speaker verification systems to spoofing is now well accepted. While recent work has shown the potential to develop countermeasures capable of detecting spoofed speech signals, existing solutions typically function well only for the specific attacks on which they are optimised. Since the exact nature of spoofing attacks can never be known in practice, there is thus a need for generalised countermeasures which can detect previously unseen spoofing attacks. This paper presents a novel countermeasure based on the analysis of speech signals using local binary patterns followed by a one-class classification approach. The new countermeasure captures differences in the spectro-temporal texture of genuine and spoofed speech, but relies only on a model of the former. We report experiments with three different approaches to spoofing and with a state-of-the-art i-vector speaker verification system which uses probabilistic linear discriminant analysis for intersession compensation. Although the support vector machine classifier is tuned only with examples of converted voice, it delivers reliable detection of spoofing attacks based on synthesised speech and artificial signals, attacks for which it is not optimised.
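
The idea can be illustrated with a short sketch: a log-magnitude spectrogram is summarised by a histogram of local binary patterns, and a one-class model trained on genuine speech alone scores unseen utterances. The frame settings, LBP parameters and the use of scikit-learn's OneClassSVM are illustrative assumptions rather than the paper's exact configuration, and the random placeholder signals stand in for real speech.

# Hypothetical sketch of an LBP-based one-class spoofing countermeasure.
# Feature settings (frame size, LBP radius, SVM nu) are illustrative guesses,
# not the parameters used in the paper.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import OneClassSVM

def log_spectrogram(x, frame=512, hop=256):
    """Frame the signal and return a log-magnitude spectrogram (freq x time)."""
    frames = [x[i:i + frame] * np.hanning(frame)
              for i in range(0, len(x) - frame, hop)]
    mags = np.abs(np.fft.rfft(np.asarray(frames), axis=1))
    return np.log(mags + 1e-8).T

def lbp_histogram(spec, points=8, radius=1):
    """Describe the spectro-temporal texture with a uniform-LBP histogram."""
    codes = local_binary_pattern(spec, points, radius, method="uniform")
    hist, _ = np.histogram(codes, bins=points + 2, range=(0, points + 2),
                           density=True)
    return hist

# Train on genuine speech only (one-class), then score unseen utterances:
# low decision scores suggest spoofed (unnatural texture) speech.
rng = np.random.default_rng(0)
genuine = [rng.standard_normal(16000) for _ in range(20)]   # placeholder audio
X_train = np.array([lbp_histogram(log_spectrogram(x)) for x in genuine])

detector = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X_train)

test_utterance = rng.standard_normal(16000)                 # placeholder audio
score = detector.decision_function(
    lbp_histogram(log_spectrogram(test_utterance)).reshape(1, -1))
print("countermeasure score:", float(score))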


International Conference on Acoustics, Speech, and Signal Processing | 2013

Spoofing countermeasures to protect automatic speaker verification from voice conversion

Federico Alegre; Asmaa Amehraye; Nicholas W. D. Evans

This paper presents a new countermeasure for the protection of automatic speaker verification systems from spoofed, converted voice signals. The new countermeasure exploits the common shift applied to the spectral slope of consecutive speech frames involved in the mapping of a spoofer's voice signal towards a statistical model of a given target. While the countermeasure exploits prior knowledge of the attack in an admittedly unrealistic sense, it is shown to detect almost all spoofed signals which otherwise provoke significant increases in false acceptance. The work also discusses the need for formal evaluations to develop new countermeasures which are less reliant on prior knowledge.
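
A loose illustration of frame-level spectral-slope analysis follows: the slope is estimated by a least-squares fit to each frame's log-magnitude spectrum, and frame-to-frame slope changes are then summarised. The single detection statistic used here is a placeholder assumption, not the published algorithm, and the random signal is a stand-in for speech.

# Loose illustration of frame-level spectral-slope analysis; the detection
# heuristic below is a placeholder assumption, not the published algorithm.
import numpy as np

def spectral_slopes(x, sr=16000, frame=512, hop=256):
    """Least-squares slope of each frame's log-magnitude spectrum vs. frequency."""
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    slopes = []
    for i in range(0, len(x) - frame, hop):
        mag = np.abs(np.fft.rfft(x[i:i + frame] * np.hanning(frame)))
        slope, _ = np.polyfit(freqs, np.log(mag + 1e-8), 1)
        slopes.append(slope)
    return np.asarray(slopes)

def slope_shift_statistic(x):
    """Spread of frame-to-frame slope changes; converted voice is expected to
    show a more uniform (common) shift than natural speech."""
    return np.std(np.diff(spectral_slopes(x)))

rng = np.random.default_rng(0)
print(slope_shift_statistic(rng.standard_normal(16000)))  # placeholder audio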


International Conference on Acoustics, Speech, and Signal Processing | 2006

An Assessment on the Fundamental Limitations of Spectral Subtraction

Nicholas W. D. Evans; John S. D. Mason; Wei Ming Liu; Benoit G. B. Fauve

As with many approaches to noise-robust automatic speech recognition (ASR), the benefits of spectral subtraction tend to diminish as noise levels of the order of 0 dB are approached. Whilst the majority of related work focuses on reducing magnitude errors, a number of new approaches addressing the often overlooked, additional sources of error have appeared in the literature in recent years. Relatively lacking in the literature, however, is an empirical assessment which compares the effects of each error when noisy speech is processed by spectral subtraction. Such studies are vital in order to appreciate the potential penalty in performance when sources of error are overlooked. The objective in this paper is to assess, through ASR, the performance penalty associated with each source of error when noisy speech is treated with spectral subtraction. Experimental evidence based on two standard European databases and ASR protocols illustrates that, perhaps contrary to popular belief, for noise levels of the order of 0 dB and below, these often overlooked sources of error can lead to non-negligible degradations in performance. Whilst not a new idea, the original emphasis here is a thorough assessment that empirically highlights both the fundamental limitations and the potential benefit of including the full complement of errors in the spectral subtraction model.
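
For context, a minimal, textbook-style magnitude-domain spectral subtraction sketch follows; the oversubtraction factor, spectral floor and placeholder signals are assumptions, and the noisy phase is reused unchanged, a common simplification.

# Minimal magnitude-domain spectral subtraction sketch (textbook form, not the
# paper's exact model); the oversubtraction factor and floor are assumptions.
import numpy as np

def spectral_subtraction(noisy, noise_excerpt, frame=512, hop=256,
                         alpha=1.0, floor=0.01):
    """Subtract an average noise magnitude spectrum from each frame and
    resynthesise with the noisy phase (phase is left untouched)."""
    window = np.hanning(frame)
    # Average noise magnitude spectrum estimated from a noise-only excerpt.
    noise_mag = np.mean([np.abs(np.fft.rfft(noise_excerpt[i:i + frame] * window))
                         for i in range(0, len(noise_excerpt) - frame, hop)],
                        axis=0)
    out = np.zeros(len(noisy))
    for i in range(0, len(noisy) - frame, hop):
        spec = np.fft.rfft(noisy[i:i + frame] * window)
        mag, phase = np.abs(spec), np.angle(spec)
        # Floor negative estimates rather than allowing them to go to zero.
        clean_mag = np.maximum(mag - alpha * noise_mag, floor * mag)
        out[i:i + frame] += np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame)
    return out

rng = np.random.default_rng(0)
speech_plus_noise = rng.standard_normal(16000)   # placeholder signals
noise_only = rng.standard_normal(8000)
enhanced = spectral_subtraction(speech_plus_noise, noise_only)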


Pattern Recognition Letters | 2014

A subspace co-training framework for multi-view clustering

Xuran Zhao; Nicholas W. D. Evans; Jean-Luc Dugelay

Highlights: we combine LDA and the k-means algorithm with co-training in a unified framework; we extend co-training from classifiers to subspaces; LDA projections can be learnt from data with random label noise; the method significantly outperforms alternatives on three real-world datasets.

This paper addresses the problem of unsupervised clustering with multi-view data of high dimensionality. We propose a new algorithm which learns discriminative subspaces in an unsupervised fashion based upon the assumption that a reliable clustering should assign same-class samples to the same cluster in each view. The framework combines the simplicity of k-means clustering and Linear Discriminant Analysis (LDA) within a co-training scheme which exploits labels learned automatically in one view to learn discriminative subspaces in another. The effectiveness of the proposed algorithm is demonstrated empirically under scenarios where the conditional independence assumption is either fully satisfied (audio-visual speaker clustering) or only partially satisfied (handwritten digit clustering and document clustering). Significant improvements over alternative multi-view clustering approaches are reported in both cases. The new algorithm is flexible and can be readily adapted to use different distance measures, semi-supervised learning, and non-linear problems.
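
The alternating scheme can be sketched as follows: cluster one view with k-means, use the resulting labels to learn an LDA subspace for the other view, re-cluster there, and repeat. The dimensions, iteration count and toy two-view data below are assumptions for illustration, not the paper's experimental setup.

# Hypothetical sketch of the subspace co-training idea; parameters and toy
# data are assumptions, not the configuration used in the paper.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def subspace_cotrain(view_a, view_b, n_clusters=3, n_iter=10, dim=2, seed=0):
    # Initial labels from k-means in the first view.
    labels = KMeans(n_clusters, n_init=10, random_state=seed).fit_predict(view_a)
    views = [view_a, view_b]
    for it in range(n_iter):
        # Labels currently come from views[it % 2]; use them to learn a
        # discriminative (LDA) subspace for the other view, then re-cluster there.
        student = views[(it + 1) % 2]
        lda = LinearDiscriminantAnalysis(n_components=dim).fit(student, labels)
        labels = KMeans(n_clusters, n_init=10, random_state=seed).fit_predict(
            lda.transform(student))
    return labels

# Toy two-view data with three latent clusters.
rng = np.random.default_rng(0)
centers = rng.standard_normal((3, 5)) * 4
truth = np.repeat(np.arange(3), 50)
view_a = centers[truth] + rng.standard_normal((150, 5))
view_b = centers[truth][:, ::-1] + rng.standard_normal((150, 5))
print(subspace_cotrain(view_a, view_b))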


International Conference on Acoustics, Speech, and Signal Processing | 2010

The lia-eurecom RT'09 speaker diarization system: Enhancements in speaker modelling and cluster purification

Simon Bozonnet; Nicholas W. D. Evans; Corinne Fredouille

There are two main approaches to speaker diarization: bottom-up and top-down. Our work on top-down systems shows that they can deliver competitive results compared to bottom-up systems and that they are extremely computationally efficient, but also that they are particularly prone to poor model initialisation and cluster impurities. In this paper we present enhancements to our state-of-the-art, top-down approach to speaker diarization that deliver improved stability across three different datasets composed of conference meetings from five standard NIST RT evaluations. We report an improved approach to speaker modelling which, despite having greater chances for cluster impurities, delivers a 35% relative improvement in DER for the MDM condition. We also describe new work to incorporate cluster purification into a top-down system which delivers relative improvements of 44% over the baseline system without compromising computational efficiency.


Odyssey: The Speaker and Language Recognition Workshop | 2016

A new feature for automatic speaker verification anti-spoofing: Constant Q cepstral coefficients

Massimiliano Todisco; Héctor Delgado; Nicholas W. D. Evans

Efforts to develop new countermeasures to protect automatic speaker verification from spoofing have intensified over recent years. The ASVspoof 2015 initiative showed that there is great potential to detect spoofing attacks, but also that the detection of previously unforeseen spoofing attacks remains challenging. This paper argues that there is more to be gained from the study of features rather than classifiers and introduces a new feature for spoofing detection based on the constant Q transform, a perceptually-inspired time-frequency analysis tool popular in the study of music. Experimental results obtained using the standard ASVspoof 2015 database show that, when coupled with a standard Gaussian mixture model-based classifier, the proposed constant Q cepstral coefficients (CQCCs) outperform all previously reported results by a significant margin. In particular, the error rate for a subset of unknown spoofing attacks (for which no matched training data was used) is 0.46%, a relative improvement of 72% over the best previously reported results.
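
A rough sketch of the feature extraction chain is: constant Q transform, log power, uniform resampling of the geometric frequency scale, then a DCT. The parameter values and the simple resampling step below are assumptions rather than the reference CQCC implementation, and a synthetic tone stands in for speech.

# Rough CQCC sketch: constant Q transform -> log power -> uniform resampling
# -> DCT. Parameters and the resampling step are simplified assumptions.
import numpy as np
import librosa
from scipy.fft import dct
from scipy.signal import resample

def cqcc(y, sr=16000, n_bins=84, bins_per_octave=12, n_coeffs=20):
    # Constant Q power spectrogram (geometrically spaced frequency bins).
    C = np.abs(librosa.cqt(y, sr=sr, n_bins=n_bins,
                           bins_per_octave=bins_per_octave)) ** 2
    log_power = np.log(C + 1e-8)
    # Resample the log-power spectrum onto a uniform frequency scale so that
    # the following DCT behaves like conventional cepstral analysis.
    uniform = resample(log_power, 2 * n_bins, axis=0)
    return dct(uniform, type=2, axis=0, norm="ortho")[:n_coeffs]

sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440 * t)        # placeholder tone instead of real speech
features = cqcc(y, sr=sr)
print(features.shape)                  # (n_coeffs, n_frames)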


IEEE Signal Processing Magazine | 2015

Biometrics Systems Under Spoofing Attack [An evaluation methodology and lessons learned]

Abdenour Hadid; Nicholas W. D. Evans; Sébastien Marcel; Julian Fierrez

Biometrics already form a significant component of current and emerging identification technologies. Biometrics systems aim to determine or verify the identity of an individual from their behavioral and/or biological characteristics. Despite significant progress, some biometric systems fail to meet the multitude of stringent security and robustness requirements to support their deployment in some practical scenarios. Among current concerns are vulnerabilities to spoofing: persons who masquerade as others to gain illegitimate access to protected data, services, or facilities. While the study of spoofing, or rather antispoofing, has attracted growing interest in recent years, the problem is far from being solved and will require far greater attention in the coming years. This tutorial article presents an introduction to spoofing and antispoofing research. It describes the vulnerabilities, presents an evaluation methodology for the assessment of spoofing and countermeasures, and outlines research priorities for the future.


International Conference on Acoustics, Speech, and Signal Processing | 2006

Assessment of Objective Quality Measures for Speech Intelligibility Estimation

Wei Ming Liu; Keith A. Jellyman; John S. D. Mason; Nicholas W. D. Evans

This paper investigates the accuracy of automatic speech recognition (ASR) and six other well-reported objective quality measures for the task of estimating speech intelligibility. It is believed to be the first assessment of such a range of measures side-by-side and in the context of intelligibility. A total of 39 degradation conditions, including those from a newly proposed low bit rate (0.3 to 1.5 kbps) codec and a noise suppression system, are considered. They provide real and varied scenarios with which to assess the measures. The objective scores are compared to subjective listening scores, and their correlation is used to assess each approach. All tests are conducted on the European standard Aurora 2 corpus. Experiments show that ASR and perceptual evaluation of speech quality (PESQ) are potentially reliable estimators of intelligibility, with subjective correlation as high as 0.99 and 0.96 respectively. Furthermore, ASR gives a trend corresponding to that of subjective intelligibility assessment for the different configurations of the new codec, while most other measures fail.
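
The assessment protocol amounts to correlating per-condition objective scores with subjective listening scores, as in the toy example below; all numbers there are fictitious and serve only to illustrate the computation, not to reproduce the paper's results.

# Toy illustration of the assessment protocol: correlate per-condition objective
# scores with subjective listening scores. The values are made up.
from scipy.stats import pearsonr

subjective_intelligibility = [0.95, 0.88, 0.72, 0.55, 0.31]   # fictitious listening scores
asr_word_accuracy          = [0.97, 0.90, 0.70, 0.52, 0.28]   # fictitious ASR scores
pesq_scores                = [4.1, 3.6, 2.9, 2.2, 1.5]        # fictitious PESQ scores

for name, scores in [("ASR", asr_word_accuracy), ("PESQ", pesq_scores)]:
    r, _ = pearsonr(scores, subjective_intelligibility)
    print(f"{name} vs subjective intelligibility: r = {r:.2f}")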

Collaboration


Dive into Nicholas W. D. Evans's collaborations.

Top Co-Authors

Tomi Kinnunen

University of Eastern Finland

Junichi Yamagishi

National Institute of Informatics

Sahidullah

University of Eastern Finland
