Rohan Kumar Das

Indian Institute of Technology Guwahati

Publication

Featured research published by Rohan Kumar Das.

National Conference on Communications | 2014

Speech biometric based attendance system

Subhadeep Dey; Sujit Barman; Ramesh K. Bhukya; Rohan Kumar Das; B C Haris; S. R. M. Prasanna; Rohit Sinha

In this paper we present the development and implementation of a speech biometric based attendance system. Users access the system by calling from a few pre-designated mobile phones. An interactive voice response (IVR) system guides a new user through enrollment and an enrolled user through verification. The system authenticates the user with text-independent speaker verification, using MFCC features and i-vector based speaker modeling. Linear discriminant analysis and within-class covariance normalization are used to compensate for session and environment variations. Simple cosine distance scoring with score normalization serves as the classifier, and a fixed threshold is used to make the decision. The developed system was used regularly by a group of 110 students for about two months. The recognition rate of the system is 94.2%, and the average response time for test data of 50 seconds duration is 26 seconds.
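
The scoring step described above (cosine scoring followed by a fixed-threshold decision) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three-dimensional vectors stand in for real i-vectors, the threshold value 0.5 is hypothetical, and the LDA, WCCN and score normalization stages mentioned in the abstract are left out.

```python
import math

def cosine_score(enrolled, test):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(enrolled, test))
    norm_e = math.sqrt(sum(a * a for a in enrolled))
    norm_t = math.sqrt(sum(b * b for b in test))
    return dot / (norm_e * norm_t)

def verify(enrolled, test, threshold=0.5):
    """Accept the identity claim when the score reaches the threshold.

    The default threshold is a hypothetical value chosen for this toy example.
    """
    return cosine_score(enrolled, test) >= threshold

# Toy vectors standing in for i-vectors (assumption: real ones have hundreds of dimensions).
enrolled_ivector = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]   # scaled copy of the enrolled vector
orthogonal = [3.0, 0.0, -1.0]      # dot product with the enrolled vector is zero

accepted = verify(enrolled_ivector, same_direction)
rejected = verify(enrolled_ivector, orthogonal)
```

Because cosine scoring ignores vector length, the scaled copy scores as a perfect match, while the orthogonal vector scores zero and is rejected.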

International Journal of Speech Technology | 2012

Multivariability speaker recognition database in Indian scenario

Haris B C; Gayadhar Pradhan; A. Misra; S. R. M. Prasanna; Rohan Kumar Das; Rohit Sinha

In this paper we describe the collection and organization of a speaker recognition database for the Indian scenario, named the IITG Multivariability Speaker Recognition Database. The database contains speech from 451 speakers speaking English and other Indian languages, in both conversational and read speech styles, recorded using various sensors in parallel under different environmental conditions. The database is organized into four phases according to the recording conditions. Results of initial studies on a speaker verification system, exploring the impact of mismatch between training and test conditions using the collected data, are also included. A copy of the database can be obtained by contacting the authors.

International Journal of Speech Technology | 2013

Development and evaluation of online text-independent speaker verification system for remote person authentication

Debmalya Chakrabarty; S. R. M. Prasanna; Rohan Kumar Das

This paper describes an online text-independent speaker verification system developed at IIT Guwahati under multivariability conditions for remote person authentication. The system runs on a voice server accessible over the telephone network through an interactive voice response (IVR) system, so both enrollment and testing can be done online. It uses Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction and a Gaussian Mixture Model-Universal Background Model (GMM-UBM) for modeling. The performance of the system under multivariability conditions is evaluated using online enrollments and tests from the subjects. The evaluation helps in understanding the impact, in an online setting, of several well-known issues in speaker verification, such as environmental noise, the duration of test speech, and robustness against playback of recorded speech. These issues need to be addressed when developing and deploying speaker verification systems in real-life applications.
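
A GMM-UBM system typically scores a test utterance by the average log-likelihood ratio between the claimed speaker's model and the universal background model. The sketch below shows that scoring step only, under stated assumptions: diagonal covariances, two-dimensional toy frames in place of MFCC vectors, and hand-written model parameters instead of trained or adapted ones. All names and values are illustrative.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def llr_score(frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio: speaker model versus UBM."""
    total = sum(
        gmm_log_likelihood(f, speaker_gmm) - gmm_log_likelihood(f, ubm)
        for f in frames
    )
    return total / len(frames)

# Hypothetical models: the speaker model is the UBM with its first mean shifted.
ubm = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [4.0, 4.0], [1.0, 1.0])]
speaker = [(0.5, [1.0, 1.0], [1.0, 1.0]), (0.5, [4.0, 4.0], [1.0, 1.0])]

target_score = llr_score([[1.0, 1.0], [1.1, 0.9]], speaker, ubm)
nontarget_score = llr_score([[0.0, 0.0]], speaker, ubm)
```

Frames close to the shifted speaker mean give a positive ratio, and frames close to the unshifted UBM mean give a negative one; a deployed system would compare the score with a tuned threshold.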

Signal Processing Systems | 2017

Development of Multi-Level Speech based Person Authentication System

Rohan Kumar Das; Sarfaraz Jelil; S. R. Mahadeva Prasanna

This work presents the development of a multi-level speech based person authentication system, with attendance marking as an application. The multi-level system consists of three speaker verification modules: voice-password, text-dependent and text-independent. The three modules are combined sequentially into a multi-level framework, which is deployed over a telephone network through an interactive voice response (IVR) system to support remote authentication. Users call from a fixed set of mobile handsets to verify their claim against their respective models, and the claim is then authenticated in multi-level mode using the three modules. The performance of the multi-level system in attendance marking is analyzed over a period of two months. The combination of the three modules performs better than any individual module, which shows the potential of the framework for practical deployment.

National Conference on Communications | 2015

Different aspects of source information for limited data speaker verification

Rohan Kumar Das; Debadatta Pati; S. R. Mahadeva Prasanna

Limited data speaker verification is significant in practical, system-oriented applications. This paper shows the importance of different aspects of voice source features in the limited test data scenario. A baseline speaker verification system using conventional mel frequency cepstral coefficient (MFCC) features is developed, and its performance under limited test data conditions (≤10 s) is evaluated. A parallel system based on the source feature mel power difference of spectrum in subband (M-PDSS) is developed in the i-vector based speaker verification framework. The two systems are fused at the score level for short segments of test speech, which demonstrates that the source feature becomes more important as the test data duration is reduced. The M-PDSS feature is then compared with our earlier work using the discrete cosine transform of the integrated linear prediction residual (DCTILPR) feature, and the two source features, M-PDSS and DCTILPR, are fused with MFCC features. An absolute improvement of 5.19% is obtained for 2 s of test data, which shows the value of multiple source features in limited data speaker verification, since each carries a different aspect of source information.
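
Score-level fusion of this kind is often implemented as a weighted sum of the scores produced by each subsystem for the same trial. The sketch below illustrates that general idea only; the abstract does not state the fusion rule or the weights the authors used, so the function name, the example scores and the equal weights are all assumptions.

```python
def fuse_scores(scores, weights):
    """Weighted average of per-system scores for a single verification trial."""
    if len(scores) != len(weights):
        raise ValueError("one weight is required per score")
    total_weight = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total_weight

# Hypothetical scores for one trial from an MFCC system and a source-feature system.
mfcc_score = 0.2
source_score = 0.8

# Equal weights are an arbitrary choice for this illustration.
fused = fuse_scores([mfcc_score, source_score], [1.0, 1.0])
```

In practice the scores of each system are usually normalized to a common range before fusion, and the weights are tuned on held-out data.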

Archive | 2015

Speaker Verification for Variable Duration Segments and the Effect of Session Variability

Rohan Kumar Das; S. R. M. Prasanna

With current advances in speaker verification, good performance is obtained when sufficient data is available. When the amount of speech data is constrained, however, performance degrades directly. This paper first presents speaker verification studies with variable duration test segments on a standard database, and then on a database collected from a practical speaker verification system. The latter case helps to explore session variability and its impact on speaker verification. This information is used to remodel the enrolled speaker models, which in turn improves system performance significantly.

Journal of the Acoustical Society of America | 2016

Exploring different attributes of source information for speaker verification with limited test data

Rohan Kumar Das; S. R. Mahadeva Prasanna

This work explores mel power difference of spectrum in subband, residual mel frequency cepstral coefficient, and discrete cosine transform of the integrated linear prediction residual for speaker verification under limited test data conditions. These three source features are found to capture different attributes of source information, namely, periodicity, smoothed spectrum information, and shape of the glottal signal, respectively. On the NIST SRE 2003 database, the proposed combination of the three source features performs better [equal error rate (EER): 20.19%, decision cost function (DCF): 0.3759] than the mel frequency cepstral coefficient feature (EER: 22.31%, DCF: 0.4128) for 2 s duration of test segments.
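
The equal error rate quoted above is the operating point at which the false acceptance rate and the false rejection rate coincide. The following sketch estimates it from lists of genuine and impostor scores by sweeping every observed score as a threshold; the score values are invented for illustration and have no connection to the NIST SRE 2003 results.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER: sweep thresholds and take the point where FAR and FRR are closest."""
    best_gap = None
    eer = None
    for threshold in sorted(set(genuine) | set(impostor)):
        # False acceptance: impostor trials scoring at or above the threshold.
        far = sum(s >= threshold for s in impostor) / len(impostor)
        # False rejection: genuine trials scoring below the threshold.
        frr = sum(s < threshold for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer

# Hypothetical scores: one genuine trial scores below one impostor trial.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]

eer = equal_error_rate(genuine_scores, impostor_scores)
```

With these toy scores, a threshold of 0.6 rejects one of four genuine trials and accepts one of four impostor trials, so both error rates equal 25%.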

International Conference on Signal Processing | 2014

Significance of glottal activity detection and glottal signature for text dependent speaker verification

K. Ramesh; S. R. Mahadeva Prasanna; Rohan Kumar Das

This paper presents the significance of glottal activity and glottal signature for capturing speaker characteristics in a text-dependent speaker verification (SV) system. A Dynamic Time Warping (DTW) based SV system is used as the baseline for matching, with end-point detection based on the Glottal Activity (GA) regions obtained from the zero frequency filter signal (ZFFS). The GA regions are detected by exploiting the strength of excitation and the periodic nature of the speech and glottal signals. The identified GA regions are then packed together by removing the silence between them, and the packed GA regions are used for template matching. Using the periodic nature of the glottal signal in the GA region, each cycle is divided into four quadrants at its peaks and zero crossings to produce the glottal signatures, which are used as excitation source features. The proposed end-point detection method is tested in text-dependent SV experiments and compared with an energy based end-point detection method. System performance improves further when features are selected in the packed GA regions using the ZFFS algorithm rather than with energy based Voice Activity Detection (VAD).
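
The template matching at the core of a DTW based baseline can be sketched with the classic dynamic programming recurrence. This is a generic illustration and not the authors' system: frames are plain lists of floats standing in for feature vectors, the local cost is Euclidean distance, and no end-point detection, path constraint or length normalization is applied.

```python
import math

def frame_distance(u, v):
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def dtw_distance(template, test):
    """Accumulated cost of the best monotonic alignment between two frame sequences."""
    n, m = len(template), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = frame_distance(template[i - 1], test[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Toy one-dimensional "utterances" (assumption: real frames would be feature vectors).
template = [[0.0], [1.0], [2.0]]
slower_version = [[0.0], [0.0], [1.0], [1.0], [2.0]]  # same content, spoken more slowly
different = [[5.0], [5.0], [5.0]]

same_cost = dtw_distance(template, slower_version)
diff_cost = dtw_distance(template, different)
```

The slower rendition aligns with zero cost because DTW absorbs the difference in timing, whereas the unrelated sequence accumulates a large cost. A verification decision would compare this cost with a threshold.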

International Conference on Signal Processing | 2016

Significance of constraining text in limited data text-independent speaker verification

Rohan Kumar Das; Sarfaraz Jelil; S. R. Mahadeva Prasanna

This work highlights the importance of a phonetic match between the train and test sessions for a text-independent framework under limited test data conditions. The robustness of text-independent speaker verification (SV) tends to decline as the amount of speech is reduced. For a deployable, application-oriented system, however, the amount of speech required should be small to ensure user comfort. With this as a priority, and based on earlier studies in this direction, a framework is proposed for a text constrained text-independent SV system that emulates the structure of the text-dependent framework. The framework recommends a text constrained speaker model built from limited data of around 10 s, with the user speaking the same content during testing. A baseline system is built on data collected in a practical scenario, using sufficient train data and limited test data. When the two systems are evaluated with an i-vector based SV system, the text constrained model is found to perform considerably better than the conventional method under limited data conditions. Further, including the phonetic content of the test session in the training session is observed to improve the performance of the baseline system.

International Conference on Signal Processing | 2016

Countermeasure to handle replay attacks in practical speaker verification systems

Anupama Paul; Rohan Kumar Das; Rohit Sinha; S. R. Mahadeva Prasanna

In this work, a novel countermeasure is proposed to protect speaker verification (SV) systems against replay based spoofing attacks. In a replay attack, recorded speech of a particular speaker is played back to the system by someone claiming to be that speaker. An analysis of live and recorded speech examples showed that the low frequency content is suppressed in recorded speech, which can serve as a distinguishing characteristic. The proposed approach creates average spectral bitmaps for live and recorded speech that capture the difference between the two categories in the low frequency range. The spectral bitmap of the test speech is compared with the averaged spectral bitmaps of live and recorded speech using cosine kernel distance for identification. The approach is further compared with a contrast method based on Gaussian mixture models (GMM), in which mel-frequency cepstral coefficient (MFCC) features of live and recorded speech data are extracted and a separate GMM is trained for each category. MFCC features of the test speech are extracted, and the likelihoods with respect to the live and recorded models are computed for classification. The experiments use a text-independent SV framework over a telephone channel with two types of recorded speech, namely close and distant recordings. The proposed countermeasure is found to handle the replay attacks effectively.
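
The comparison step of the proposed approach (matching a test representation against averaged live and recorded templates) can be sketched as below. The abstract does not specify how a spectral bitmap is computed, so this example treats each bitmap as a flat list of numbers and uses plain cosine similarity; the four-element vectors, whose first two entries play the role of low frequency bins, are invented for illustration.

```python
import math

def average_template(examples):
    """Element-wise mean of equal-length example vectors."""
    count = len(examples)
    return [sum(column) / count for column in zip(*examples)]

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def classify(test, live_template, recorded_template):
    """Label the test vector by whichever averaged template it resembles more."""
    live_sim = cosine_similarity(test, live_template)
    recorded_sim = cosine_similarity(test, recorded_template)
    return "live" if live_sim >= recorded_sim else "recorded"

# Hypothetical flattened bitmaps: live speech keeps low-frequency energy, replayed speech loses it.
live_examples = [[1, 1, 1, 0], [1, 1, 0, 1]]
recorded_examples = [[0, 0, 1, 1], [0, 1, 1, 1]]

live_template = average_template(live_examples)
recorded_template = average_template(recorded_examples)

label_a = classify([1, 1, 0, 0], live_template, recorded_template)
label_b = classify([0, 0, 1, 1], live_template, recorded_template)
```

A test vector with strong low frequency entries lands closer to the live template, and one with suppressed low frequency entries lands closer to the recorded template.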