Sarfaraz Jelil
Indian Institute of Technology Guwahati
Publication
Featured research published by Sarfaraz Jelil.
signal processing systems | 2017
Rohan Kumar Das; Sarfaraz Jelil; S. R. Mahadeva Prasanna
This work presents the development of a multi-level speech-based person authentication system with attendance marking as an application. The multi-level system consists of three speaker verification modules: voice-password, text-dependent and text-independent speaker verification. The three modules are combined sequentially into a multi-level framework, which is ported over a telephone network through an interactive voice response (IVR) system to support remote authentication. Users call from a fixed set of mobile handsets to verify their claim against their respective models, and the claim is then authenticated in multi-level mode using the three modules. An analysis of the performance of the multi-level system in attendance marking over a period of two months is presented. Combining the three modules in the multi-level framework yields better performance than any of the individual modules, which shows its potential for practical deployment.
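As a rough illustration of how a sequential multi-level decision could be organized, the sketch below runs the stages in order and stops at the first confident accept or reject. The abstract does not state the actual decision rule, so the two-threshold policy, the default rejection, and the name `multi_level_verify` are all assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple

# One stage = (name, scorer, reject_below, accept_above).
# The scorer returns a score for the claimed identity; higher means
# "more likely genuine". All thresholds here are hypothetical.
Stage = Tuple[str, Callable[[], float], float, float]


def multi_level_verify(stages: Sequence[Stage]) -> Tuple[bool, str]:
    """Run the stages in order and stop at the first confident decision.

    Assumed policy (not taken from the paper): accept if the score reaches
    accept_above, reject if it falls below reject_below, otherwise defer
    to the next stage. If no stage is confident, the claim is rejected.
    Returns (decision, name of the stage that decided).
    """
    if not stages:
        raise ValueError("at least one verification stage is required")
    last_name = ""
    for name, scorer, reject_below, accept_above in stages:
        score = scorer()
        last_name = name
        if score >= accept_above:
            return True, name
        if score < reject_below:
            return False, name
    return False, last_name
```

With this policy, an uncertain voice-password score simply passes the claim on to the text-dependent stage, and then to the text-independent stage, so later modules are only consulted when earlier ones cannot decide.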
international conference on signal processing | 2016
Rohan Kumar Das; Sarfaraz Jelil; S. R. Mahadeva Prasanna
This work highlights the importance of a phonetic match between training and test sessions for a text-independent framework under limited test data conditions. The robustness of text-independent speaker verification (SV) tends to degrade as the amount of speech involved is reduced. From the point of view of a deployable, application-oriented system, the amount of speech required is expected to be small to ensure user comfort. With this as a priority, and based on earlier studies in this direction, a framework is proposed for developing a text-constrained text-independent SV system that emulates the structure of the text-dependent framework. This framework recommends building a text-constrained speaker model from limited data of around 10 s. The same content is spoken by the user during testing. A baseline system is built on data collected in a practical scenario, using sufficient training data and limited test data. When the two systems are evaluated with an i-vector based SV system, the text-constrained model is found to work considerably better than the conventional method under limited data conditions. Further, it was observed that having the phonetic content of the test session in the training session helps improve the baseline system performance.
conference of the international speech communication association | 2016
Rohan Kumar Das; Sarfaraz Jelil; S. R. Mahadeva Prasanna
This work highlights the impact of session variability and template aging on speaker verification (SV) using fixed-phrase short utterances from the RedDots database. These utterances were collected over a period of one year and cover a large number of sessions per speaker. Session variation has been found to have a direct influence on SV performance, and its significance is even greater for fixed-phrase short utterances, as only a very small amount of speech data is involved in speaker modeling and testing. Similarly, for a practical deployable SV system with large session variation over a period of time, the template aging of the speakers may affect SV performance. This work attempts to address some issues related to session variability and template aging of speakers that arise in data with large session variability and that, if taken into account, can be used to improve the performance of an SV system.
ieee region 10 conference | 2016
Kuruvachan K. George; Rohan Kumar Das; Sarfaraz Jelil; K. Arun Das; C. Santhosh Kumar; S. R. Mahadeva Prasanna; Ashish Panda
In this work, the details of the AMRITA-TCS and IITGUWAHATI speaker recognition systems submitted to the Speakers in the Wild (SITW) speaker recognition challenge are presented. The AMRITA-TCS system is a fusion of an i-vector system with a probabilistic linear discriminant analysis backend (i-PLDA) and a cosine distance features (CDF) system with a support vector machine classifier backend (CDF-SVM), developed using the short-term cepstral features mel frequency cepstral coefficients (MFCC) and power normalized cepstral coefficients (PNCC), respectively. The IITGUWAHATI system is an i-PLDA system using MFCC with vowel-like region (VLR) based feature selection (i-PLDA-VLR). The experimental results reported in this work are based on the core-core condition of the challenge. Finally, a fusion of the AMRITA-TCS and IITGUWAHATI speaker recognition systems is carried out, which improves performance over each of the subsystems.
International Journal of Speech Technology | 2018
Rohan Kumar Das; Sarfaraz Jelil; S. R. Mahadeva Prasanna
This work describes the collection and organization of a multi-style database for speaker recognition. The database is organized around three categories of speaker recognition: the voice-password, text-dependent and text-independent frameworks. Three Indian institutes collaborated to collect the database at their respective sites. The database is collected over an online telephone network that is deployed for a speech-based student attendance system. This enables the collection of data over a longer period from different speakers with session variability, which is useful for speaker verification (SV) studies in practical scenarios. The database contains data from 923 speakers for the three modes of SV and is hence termed a multi-style speaker recognition database. It is useful for studies on session variability, multi-style speaker recognition and short-utterance based SV. Initial results are reported on the database for the three modes of SV. A copy of the database can be obtained by contacting the authors.
International Journal of Image, Graphics and Signal Processing | 2018
Archita Hore; S. R. Nirmala; Rohan Kumar Das; Sarfaraz Jelil; S. R. M. Prasanna
This work focuses on a text-dependent speaker verification system in which a source feature, specifically residual Mel frequency cepstral coefficients (RMFCC), is extracted in addition to a vocal tract system feature, namely Mel frequency cepstral coefficients (MFCC). The RMFCC features are derived from the LP residual, whereas the MFCC features are derived from cepstral analysis of the speech signal. Thus, these two features carry different information about the speaker. A four-cohort speaker set has been prepared using these two features, and dynamic time warping (DTW) is used as the classifier. The performance of the text-dependent speaker verification model using MFCC and RMFCC features is compared. Experimental results show that using the RMFCC feature alone does not give satisfactory results in comparison to MFCC. Also, the system's performance obtained using the MFCC features is not optimal. So, to improve the performance of the system, these two features are combined using different combination algorithms. The proposed lowest ranking method yields good performance with an equal error rate (EER) of 7.50%. To further improve the system, the proposed method is combined with the strength voting and weighted ranking methods in a hierarchical combination method to obtain an EER of 3.75%.
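To make the DTW classifier mentioned above more concrete, here is a minimal sketch of the textbook DTW distance between two sequences of feature frames (for example, MFCC vectors). It is not the authors' implementation: the Euclidean local cost, the unconstrained warping path, and the function name `dtw_distance` are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Accumulated DTW cost between two sequences of feature frames.

    Uses the Euclidean distance between frames as the local cost and the
    standard step pattern (insertion, deletion, match) with no path
    constraints. Both choices are assumptions, not taken from the paper.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")

    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b frame repeats
                cost[i][j - 1],      # b advances, a frame repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

In a text-dependent setup of this kind, a test utterance would typically be compared against the claimed speaker's template and against cohort templates, with a smaller DTW distance indicating a closer match. How the distances are turned into a decision is not specified in the abstract.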
Circuits Systems and Signal Processing | 2018
Rohan Kumar Das; Sarfaraz Jelil; S. R. Mahadeva Prasanna
This work focuses on long-enrollment with short-test speaker verification (SV) from the perspective of application-oriented systems. The importance of a phonetic match between train and test models is explored using a text-constraint model-based framework on Part IV of the RedDots database. This database has text-dependent and text-prompted enrollment conditions for speaker modeling. Two different text-constraint setups are formalized for evaluating the effect of text match between train and test sessions. Further, the excitation source features, namely mel power difference of spectrum in subbands, residual mel frequency cepstral coefficients and discrete cosine transform of the integrated linear prediction residual, are investigated to determine their significance for the text-constraint-based framework. Although the source features individually perform worse than the conventional mel frequency cepstral coefficient (MFCC) features, their significance is reflected in fusion because of the complementary nature of the information they carry. Additionally, the source features become important for text-constraint-based models in long-enrollment with short-test SV when fused with MFCC features, and achieve a considerable improvement over the baseline framework for the text-prompted enrollment condition. This narrows the performance difference between the text-dependent and text-prompted enrollment conditions, showing the importance of text-constraint models and source information in a long-enrollment with short-test framework, which is favorable for field-deployable systems.
national conference on communications | 2017
Sarfaraz Jelil; Rohan Kumar Das; S. R. Mahadeva Prasanna; Rohit Sinha
One of the major reasons for the performance degradation of a speaker verification (SV) system in real-world conditions is its inability to spot speech regions in the presence of noise. This work focuses on the role of voice activity detection (VAD) methods in alleviating this shortcoming. The experiments are conducted on the core-core task of the Speakers in the Wild (SITW) challenge. Two VAD approaches are explored in this work. One is the recently proposed self-adaptive VAD and the other is based on vowel-like region (VLR) detection. To evaluate the effectiveness of these approaches, the SV systems are developed using the i-vector framework in the front-end and probabilistic linear discriminant analysis (PLDA) in the back-end. The self-adaptive VAD based system performs better than the VLR based system in high-SNR conditions. Under degraded conditions, the VLR based method is relatively more robust than self-adaptive VAD. Exploiting these complementary characteristics, significant improvements in SV performance are obtained by fusing the scores of the two systems.
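The score fusion described above can be sketched as a weighted sum of normalized trial scores from the two systems. The abstract does not specify the fusion rule, so the z-normalization, the equal default weight, and the names `z_normalize` and `fuse_scores` are assumptions made for illustration only.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def z_normalize(scores: Sequence[float]) -> List[float]:
    """Shift and scale scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores are identical, so they carry no ranking information.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(
    sys_a: Sequence[float], sys_b: Sequence[float], weight_a: float = 0.5
) -> List[float]:
    """Fuse per-trial scores of two systems by a weighted sum.

    Scores are z-normalized first so that the two systems are on a
    comparable scale. weight_a is a hypothetical tuning parameter that
    would normally be chosen on a development set.
    """
    if len(sys_a) != len(sys_b):
        raise ValueError("both systems must score the same trials")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    za, zb = z_normalize(sys_a), z_normalize(sys_b)
    return [weight_a * x + (1.0 - weight_a) * y for x, y in zip(za, zb)]
```

Here `sys_a` and `sys_b` would hold the scores of, say, the self-adaptive VAD based system and the VLR based system on the same list of trials. The weight could then be shifted toward whichever system is more reliable under the expected noise conditions.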
conference of the international speech communication association | 2015
Sarfaraz Jelil; Rohan Kumar Das; Rohit Sinha; S. R. Mahadeva Prasanna
conference of the international speech communication association | 2018
Sarfaraz Jelil; Sishir Kalita; S. R. Mahadeva Prasanna; Rohit Sinha