Achintya Kumar Sarkar
Aalborg University
Publications
Featured research published by Achintya Kumar Sarkar.
Conference of the International Speech Communication Association (INTERSPEECH) | 2016
Tomi Kinnunen; Sahidullah; Ivan Kukanov; Héctor Delgado; Massimiliano Todisco; Achintya Kumar Sarkar; Nicolai Bæk Thomsen; Ville Hautamäki; Nicholas W. D. Evans; Zheng-Hua Tan
Text-dependent automatic speaker verification naturally calls for the simultaneous verification of speaker identity and spoken content. These two tasks can be achieved with automatic speaker verification (ASV) and utterance verification (UV) technologies. While both have been addressed previously in the literature, a treatment of simultaneous speaker and utterance verification with a modern, standard database is so far lacking. This is despite the burgeoning demand for voice biometrics in a plethora of practical security applications. With the goal of improving overall verification performance, this paper reports different strategies for simultaneous ASV and UV in the context of short-duration, text-dependent speaker verification. Experiments performed on the recently released RedDots corpus are reported for three different ASV systems and four different UV systems. Results show that the combination of utterance verification with automatic speaker verification is (almost) universally beneficial with significant performance improvements being observed.
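As an illustration of the combination idea above, a minimal sketch of score-level fusion of ASV and UV scores follows; the fusion weight and decision threshold are hypothetical choices, not values from the paper.

```python
# Hypothetical sketch of score-level fusion of automatic speaker
# verification (ASV) and utterance verification (UV) scores; the
# fusion weight and decision threshold are illustrative only.

def fuse_scores(asv_score: float, uv_score: float, w: float = 0.5) -> float:
    """Linear score-level fusion of the two verifier scores."""
    return w * asv_score + (1.0 - w) * uv_score

def accept(asv_score: float, uv_score: float, threshold: float = 0.0) -> bool:
    """Accept a trial only if the fused evidence supports both the
    claimed speaker identity and the expected spoken content."""
    return fuse_scores(asv_score, uv_score) > threshold
```

A trial is accepted only when the weighted evidence from both verifiers clears the threshold, which is the joint-verification behaviour the abstract describes.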
Spoken Language Technology Workshop (SLT) | 2016
Héctor Delgado; Massimiliano Todisco; Sahidullah; Achintya Kumar Sarkar; Nicholas W. D. Evans; Tomi Kinnunen; Zheng-Hua Tan
Many authentication applications involving automatic speaker verification (ASV) demand robust performance using short-duration, fixed or prompted text utterances. Text constraints not only reduce the phone-mismatch between enrolment and test utterances, which generally leads to improved performance, but also provide an ancillary level of security. This can take the form of explicit utterance verification (UV). An integrated UV + ASV system should then verify access attempts which contain not just the expected speaker, but also the expected text content. This paper presents such a system and introduces new features which are used for both UV and ASV tasks. Based upon multi-resolution, spectro-temporal analysis and when fused with more traditional parameterisations, the new features not only generally outperform Mel-frequency cepstral coefficients, but also are shown to be complementary when fusing systems at score level. Finally, the joint operation of UV and ASV greatly decreases false acceptances for unmatched text trials.
First International Workshop on Sensing, Processing and Learning for Intelligent Machines (SPLINE) | 2016
Hong Yu; Achintya Kumar Sarkar; Dennis Alexander Lehmann Thomsen; Zheng-Hua Tan; Zhanyu Ma; Jun Guo
Many researchers have demonstrated the good performance of spoofing detection systems under clean training and testing conditions. However, it is well known that the performance of speaker and speech recognition systems degrades significantly in noisy conditions. Therefore, it is of great interest to investigate the effect of noise on the performance of spoofing detection systems. In this paper, we investigate a multi-conditional training method where spoofing detection models are trained with a mix of clean and noisy data. In addition, we study the effect of different noise types as well as speech enhancement methods on a state-of-the-art spoofing detection system based on the dynamic linear frequency cepstral coefficients (LFCC) feature and a Gaussian mixture model maximum-likelihood (GMM-ML) classifier. In the experiments, we consider three additive noise types, Cantine, Babble and white Gaussian, at different signal-to-noise ratios, and two mainstream speech enhancement methods, Wiener filtering and minimum mean-square error. The experimental results show that the enhancement methods are not suitable for the spoofing detection task, as spoofing detection accuracy is reduced after speech enhancement. Multi-conditional training, however, shows potential for reducing error rates in spoofing detection.
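The GMM-ML classifier mentioned above can be sketched as follows. This is a simplified illustration in which random vectors stand in for real LFCC features; the feature dimension, component count, and data sizes are assumptions, not the paper's settings.

```python
# Sketch of a GMM maximum-likelihood (GMM-ML) spoofing detector: one GMM
# for genuine speech, one for spoofed speech, scored by the average
# per-frame log-likelihood ratio. Random vectors replace real LFCC
# features; 20 dimensions and 4 components are illustrative choices.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
genuine_train = rng.normal(0.0, 1.0, size=(500, 20))  # stand-in for genuine LFCC frames
spoofed_train = rng.normal(2.0, 1.0, size=(500, 20))  # stand-in for spoofed LFCC frames

gmm_genuine = GaussianMixture(n_components=4, random_state=0).fit(genuine_train)
gmm_spoofed = GaussianMixture(n_components=4, random_state=0).fit(spoofed_train)

def spoofing_llr(frames: np.ndarray) -> float:
    """Average frame log-likelihood ratio; positive favours genuine speech."""
    return float(np.mean(gmm_genuine.score_samples(frames)
                         - gmm_spoofed.score_samples(frames)))
```

An utterance is classified as genuine or spoofed by thresholding this ratio; the multi-conditional training studied in the paper corresponds to fitting the two GMMs on a mix of clean and noisy feature frames.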
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2017
Tomi Kinnunen; Sahidullah; Mauro Falcone; Luca Costantini; Rosa González Hautamäki; Dennis Alexander Lehmann Thomsen; Achintya Kumar Sarkar; Zheng-Hua Tan; Héctor Delgado; Massimiliano Todisco; Nicholas W. D. Evans; Ville Hautamäki; Kong Aik Lee
This paper describes a new database for the assessment of automatic speaker verification (ASV) vulnerabilities to spoofing attacks. In contrast to other recent data collection efforts, the new database has been designed to support the development of replay spoofing countermeasures tailored towards the protection of text-dependent ASV systems from replay attacks in the face of variable recording and playback conditions. Derived from the re-recording of the original RedDots database, the effort is aligned with that in text-dependent ASV and thus well positioned for future assessments of replay spoofing countermeasures, not just in isolation, but in integration with ASV. The paper describes the database design and re-recording, a protocol and some early spoofing detection results. The new “RedDots Replayed” database is publicly available through a creative commons license.
IEEE Journal of Translational Engineering in Health and Medicine | 2017
Andriy Temko; Achintya Kumar Sarkar; Geraldine B. Boylan; Sean Mathieson; William P. Marnane; Gordon Lightbody
The problem of creating a personalized seizure detection algorithm for newborns is tackled in this paper. A probabilistic framework for semi-supervised adaptation of a generic patient-independent neonatal seizure detector is proposed. A system that is based on a combination of patient-adaptive (generative) and patient-independent (discriminative) classifiers is designed and evaluated on a large database of unedited continuous multichannel neonatal EEG recordings of over 800 h in duration. It is shown that an improvement in the detection of neonatal seizures over the course of long EEG recordings is achievable with on-the-fly incorporation of patient-specific EEG characteristics. In the clinical setting, the employment of the developed system will maintain a seizure detection rate at 70% while halving the number of false detections per hour, from 0.4 to 0.2 FD/h. This is the first study to propose the use of online adaptation without clinical labels, to build a personalized diagnostic system for the detection of neonatal seizures.
Computer Speech & Language | 2018
Achintya Kumar Sarkar; Zheng-Hua Tan
Highlights:
- Proposes pass-phrase dependent background models (PBMs) for text-dependent speaker verification (SV).
- Considers two approaches to building PBMs: speaker-independent and speaker-dependent.
- PBMs significantly reduce the error rates of TD-SV for target-wrong and impostor-wrong trials.
- Performance is demonstrated in the GMM-UBM, HMM-UBM and i-vector paradigms.
- Experiments are conducted on the RedDots and RSR2015 databases, which consist of short utterances.

In this paper, we propose pass-phrase dependent background models (PBMs) for text-dependent (TD) speaker verification (SV) to integrate the pass-phrase identification process into the conventional TD-SV system, where a PBM is derived from a text-independent background model through adaptation using the utterances of a particular pass-phrase. During training, pass-phrase specific target speaker models are derived from the particular PBM using the training data for the respective target model. During testing, the best PBM is first selected for the test utterance in the maximum likelihood (ML) sense, and the selected PBM is then used for the log likelihood ratio (LLR) calculation with respect to the claimant model. The proposed method incorporates the pass-phrase identification step in the LLR calculation, which is not considered in conventional standalone TD-SV systems. The performance of the proposed method is compared to conventional text-independent background model based TD-SV systems using the Gaussian mixture model (GMM)-universal background model (UBM), hidden Markov model (HMM)-UBM and i-vector paradigms. In addition, we consider two approaches to building PBMs: speaker-independent and speaker-dependent. We show that the proposed method significantly reduces the error rates of text-dependent speaker verification for the non-target trial types target-wrong and impostor-wrong, while maintaining comparable TD-SV performance when impostors speak a correct utterance, with respect to the conventional system. Experiments are conducted on the RedDots challenge and the RSR2015 databases, which consist of short utterances.
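The two-step scoring described in the abstract (ML selection of the best PBM, then the LLR against the claimant model) can be sketched as below. Model details are abstracted into log-likelihood functions, and the unit-variance Gaussian models in the usage lines are purely illustrative stand-ins.

```python
# Sketch of PBM-based scoring: pick the pass-phrase dependent background
# model (PBM) with the highest likelihood for the test utterance, then
# compute the log-likelihood ratio (LLR) against the claimant model.
import numpy as np

def pbm_llr(test_frames, claimant_loglik, pbm_logliks):
    """claimant_loglik and each entry of pbm_logliks map a frame matrix
    to per-frame log-likelihoods; returns the LLR of the claimant model
    against the best-matching PBM."""
    # Step 1: ML selection of the PBM for this utterance.
    pbm_scores = [float(np.mean(ll(test_frames))) for ll in pbm_logliks]
    best_pbm_score = max(pbm_scores)
    # Step 2: LLR of the claimant model vs. the selected PBM.
    return float(np.mean(claimant_loglik(test_frames))) - best_pbm_score

# Illustrative models: unit-variance Gaussians identified by their mean.
def gauss_ll(mean):
    return lambda x: -0.5 * np.sum((x - mean) ** 2, axis=1)

frames = np.random.default_rng(1).normal(0.0, 1.0, size=(50, 10))
```

In the actual system each log-likelihood function would come from a GMM, HMM, or i-vector back-end; the sketch only shows how pass-phrase identification enters the LLR.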
Conference of the International Speech Communication Association (INTERSPEECH) | 2016
Achintya Kumar Sarkar; Zheng-Hua Tan
Conference of the International Speech Communication Association (INTERSPEECH) | 2010
Achintya Kumar Sarkar; Srinivasan Umesh
Conference of the International Speech Communication Association (INTERSPEECH) | 2009
Shakti P. Rath; Srinivasan Umesh; Achintya Kumar Sarkar
Conference of the International Speech Communication Association (INTERSPEECH) | 2017
Kong Aik Lee; Ville Hautamäki; Anthony Larcher; Chunlei Zhang; Andreas Nautsch; T. Stafylakis; Gang Liu; Mickael Rouvier; Wei Rao; Federico Alegre; Jianbo Ma; Man-Wai Mak; Achintya Kumar Sarkar; Héctor Delgado; Rahim Saeidi; Hagai Aronowitz; Aleksandr Sizov; Hanwu Sun; Trung Hieu Nguyen; Guangsen Wang; Bin Ma; Ville Vestman; Md. Sahidullah; M. Halonen; Anssi Kanervisto; G. Le Lan; Fahimeh Bahmaninezhad; S. Isadskiy; Christian Rathgeb; Christoph Busch