Publication


Featured research published by Dipanjan Nandi.


International Conference Oriental COCOSDA held jointly with Conference on Asian Spoken Language Research and Evaluation | 2013

Language identification using Hilbert envelope and phase information of linear prediction residual

Dipanjan Nandi; Debadatta Pati; K. Sreenivasa Rao

In this paper, the magnitude and phase components of excitation source information are explored for language identification (LID). The linear prediction (LP) residual of the speech signal represents the excitation source information. The magnitude and phase components of the LP residual are processed individually at sub-segmental, segmental and supra-segmental levels. Evidence from both the magnitude and phase components of the LP residual is combined to capture the language-specific excitation source information. The LID studies are carried out on the IITKGP-MLILSC speech database. The segmental level information yields better performance than the sub-segmental and supra-segmental level information. The combined evidence from the three levels represents the excitation source information. This study shows that both the magnitude and phase of the LP residual contain significant language-specific excitation source information. The LID performance in this study indicates that the phase component of the LP residual contains more language-discriminative information than the magnitude component.
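The abstract above does not spell out how the magnitude and phase components are obtained. A minimal sketch, assuming the common formulation in which the analytic signal of the LP residual gives the Hilbert envelope (magnitude) and the cosine of the analytic phase (residual divided by envelope), could look as follows. The function names, the LP order and the toy signal are illustrative assumptions and are not taken from the paper; a plain DFT is used only to keep the sketch dependency-free.

```python
import cmath
import math


def lp_coefficients(x, order):
    """Prediction-error filter [1, a1, ..., ap] via autocorrelation + Levinson-Durbin."""
    n = len(x)
    r = [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 1e-12:  # nothing left to predict; stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lp_residual(x, a):
    """Residual e[n] = sum_j a[j] * x[n - j], with samples before n = 0 taken as zero."""
    order = len(a) - 1
    return [sum(a[j] * x[n - j] for j in range(order + 1) if n - j >= 0)
            for n in range(len(x))]


def analytic_signal(x):
    """Analytic signal via a naive DFT: keep DC/Nyquist, double positive bins, zero the rest."""
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
    for k in range(1, (n + 1) // 2):
        gain[k] = 2.0
    return [sum(gain[k] * spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def envelope_and_phase(residual):
    """Hilbert envelope |z[n]| and cosine of the analytic phase Re(z[n]) / |z[n]|."""
    z = analytic_signal(residual)
    env = [abs(v) for v in z]
    cos_phase = [v.real / h if h > 1e-12 else 0.0 for v, h in zip(z, env)]
    return env, cos_phase


# Toy input (illustrative only): two sinusoids plus a small periodic ripple.
signal = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) + 0.1 * ((7 * n) % 5 - 2)
          for n in range(64)]
coeffs = lp_coefficients(signal, order=4)
residual = lp_residual(signal, coeffs)
envelope, cos_phase = envelope_and_phase(residual)
```

In this formulation the envelope carries the strength of the excitation and the cosine of the phase carries its sequence information with the amplitude normalised away, which is one way to process the two components separately.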


International Journal of Speech Technology | 2015

Implicit excitation source features for robust language identification

Dipanjan Nandi; Debadatta Pati; K. Sreenivasa Rao

In the present work, the robustness of excitation source features is analyzed for the language identification (LID) task. The raw samples of the linear prediction (LP) residual signal and its magnitude and phase components are processed at sub-segmental, segmental and supra-segmental levels to capture robust language-specific phonotactic information. The LID study is carried out on 27 Indian languages from the Indian Institute of Technology Kharagpur-Multi Lingual Indian Language Speech Corpus (IITKGP-MLILSC). Gaussian mixture models are used to develop the LID systems using robust language-specific excitation source information. The robustness of the excitation source information is evaluated with respect to (i) background noise, (ii) varying amounts of training data and (iii) varying lengths of test samples. Finally, the robustness of the proposed excitation source features is compared with that of the well-known spectral features, using the LID performance obtained on the IITKGP-MLILSC database. Segmental level excitation source features obtained from the raw samples of the LP residual signal and from its phase component perform better than the vocal tract features at low SNR levels.
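Robustness to background noise is typically tested by mixing noise into clean speech at a chosen signal-to-noise ratio (SNR). The abstract does not say which noise types or SNR values were used, so the sketch below only illustrates the scaling step, assuming white Gaussian noise and an arbitrary 5 dB target; the function names are hypothetical.

```python
import math
import random


def mean_power(x):
    """Average power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(signal, noise, snr_db):
    """Scale `noise` so that 10*log10(P_signal / P_noise) equals snr_db, then add it."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    gain = math.sqrt(mean_power(signal) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * v for s, v in zip(signal, noise)]


# Toy data (illustrative only): a sinusoid standing in for speech, plus white noise.
rng = random.Random(0)
clean = [math.sin(0.05 * n) for n in range(1000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
noisy = add_noise_at_snr(clean, noise, snr_db=5.0)
```

Repeating the LID test on `noisy` at several SNR values, for both source and spectral features, is the kind of comparison the abstract describes.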


International Conference on Contemporary Computing | 2014

Significance of CV transition and steady vowel regions for language identification

Dipanjan Nandi; Arup Kumar Dutta; K. Sreenivasa Rao

The present work explores the significance of the consonant-vowel (CV) transition and steady vowel (SV) regions for the language identification (LID) task. The language-specific vocal tract information is represented by Mel-frequency cepstral coefficients (MFCCs) extracted from the CV transition and steady vowel regions. The durations of the CV transition and steady vowel regions are varied to analyze LID performance. The evidence obtained from the CV transition and steady vowel regions is combined to investigate the existence of complementary information in these two regions. The LID study is carried out on 27 Indian languages from the IITKGP-MLILSC speech database. The Gaussian mixture modelling (GMM) technique is used to develop the language models. The average LID performances obtained by processing the CV transition and steady vowel regions are 70% and 71% respectively. In contemporary works, LID systems have been developed by processing whole speech utterances, which provides 72% recognition accuracy.


Computer Speech & Language | 2017

Implicit processing of LP residual for language identification

Dipanjan Nandi; Debadatta Pati; K. Sreenivasa Rao

Excitation source information is explored for language identification. Implicit relations present in the LP residual samples are examined for the LID task. The magnitude component of the LP residual is explored for discriminating languages. The phase information present among LP residual samples is explored for the LID task. Combined LID systems are developed using source features to enhance LID accuracy.

The present work explores the excitation source information for the language identification (LID) task. In this work, excitation source information is captured by implicit processing of the linear prediction (LP) residual signal for discriminating the languages. Raw samples of the LP residual signal and its magnitude and phase components are processed independently at sub-segmental, segmental and supra-segmental levels to extract the language-specific excitation source information. The LID studies are carried out using 27 Indian languages from the Indian Institute of Technology Kharagpur-Multi Lingual Indian Language Speech Corpus (IITKGP-MLILSC) and 11 international languages from the OGI-MLTS corpus. Gaussian mixture models (GMMs) are used to model the language-specific excitation source information for the LID task. The experimental results show that features extracted at the segmental level yield better identification accuracy (50.92%) than those from the sub-segmental (47.77%) and supra-segmental (43.88%) levels. Further, the evidence from all three levels is combined to obtain the complete excitation source information. Finally, we have investigated the existence of non-overlapping language-specific information in the excitation source and vocal tract features.


Computer Speech & Language | 2017

Parametric representation of excitation source information for language identification

Dipanjan Nandi; Debadatta Pati; K. Sreenivasa Rao

Excitation source information is explored for language identification. RMFCC and MPDSS features represent segmental level language-specific information. GFD parameters capture sub-segmental level language-specific information. PC and ESC represent supra-segmental level excitation source information. Complementary information from source and system features is examined.

In this work, the linear prediction (LP) residual signal is parameterized to capture the excitation source information for language identification (LID). The LP residual signal is processed at three levels (sub-segmental, segmental and supra-segmental) to demonstrate different aspects of language-specific excitation source information. The proposed excitation source features are evaluated on 27 Indian languages from the Indian Institute of Technology Kharagpur-Multi Lingual Indian Language Speech Corpus (IITKGP-MLILSC), and on the Oregon Graduate Institute Multi-Language Telephone-based Speech (OGI-MLTS) and National Institute of Standards and Technology Language Recognition Evaluation (NIST LRE) 2011 corpora. LID systems were developed using Gaussian mixture model (GMM) and i-vector based approaches. Experimental results show that segmental level parametric features provide better identification accuracy (62%) than sub-segmental (40%) and supra-segmental level (34%) features. Excitation source features obtained from the three levels show distinct language-specific evidence. Therefore, the scores from all three levels are combined to obtain the complete excitation source information for the LID task. The LID performances achieved with the excitation source and the vocal tract system are compared. Finally, the scores obtained by processing the vocal tract and excitation source features are combined to achieve a further improvement in LID accuracy. The best recognition accuracies obtained from stage-IV integrated LID systems I, II and III are 69%, 70% and 72% respectively.
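The abstract says that scores from the three levels are combined but does not give the combination rule. A minimal sketch, assuming a normalised weighted sum of per-language scores followed by an arg-max decision, is shown below. The weights, language labels and score values are made-up placeholders for illustration, not results from the paper.

```python
def fuse_scores(level_scores, weights):
    """Weighted average of per-language scores across levels.

    level_scores: {level: {language: score}}, e.g. log-likelihoods from one model per level.
    weights:      {level: non-negative weight}; normalised here so they sum to one.
    """
    total = sum(weights[level] for level in level_scores)
    languages = next(iter(level_scores.values())).keys()
    return {
        lang: sum(weights[level] * level_scores[level][lang]
                  for level in level_scores) / total
        for lang in languages
    }


def identify(fused):
    """Pick the language with the highest fused score."""
    return max(fused, key=fused.get)


# Placeholder scores (illustrative only) for two hypothetical languages.
scores = {
    "sub_segmental": {"lang_a": -10.0, "lang_b": -12.0},
    "segmental": {"lang_a": -11.0, "lang_b": -9.0},
    "supra_segmental": {"lang_a": -8.0, "lang_b": -9.5},
}
equal_weights = {"sub_segmental": 1.0, "segmental": 1.0, "supra_segmental": 1.0}
segmental_heavy = {"sub_segmental": 1.0, "segmental": 3.0, "supra_segmental": 1.0}

decision_equal = identify(fuse_scores(scores, equal_weights))
decision_heavy = identify(fuse_scores(scores, segmental_heavy))
```

With these placeholder numbers the decision changes when the segmental level is weighted more heavily, which illustrates why the choice of weights matters when the levels carry different evidence.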


International Conference on Signal Processing | 2014

Sub-segmental, segmental and supra-segmental analysis of linear prediction residual signal for language identification

Dipanjan Nandi; Debadatta Pati; K. Sreenivasa Rao

In this work, excitation source information is explored for the language identification (LID) task. The excitation signal is represented by the linear prediction (LP) residual. Different aspects of the excitation source information can be captured by processing the LP residual signal at sub-segmental, segmental and supra-segmental levels. The Gaussian mixture modelling (GMM) technique is used to build the language models. The LID study is carried out on the IITKGP-MLILSC speech database. Individually, the segmental level information provides the highest LID accuracy, followed by the sub-segmental and supra-segmental level information. The combined evidence from all three levels represents the complete excitation source information. Finally, a comparative study between the vocal tract and excitation source features shows the distinct nature of these two feature types. Combining both features yields an improvement of 10.01% in LID accuracy over excitation source information alone. This observation indicates the significance of excitation source information for the LID task.


Archive | 2015

Language Identification Using Excitation Source Features

K. Sreenivasa Rao; Dipanjan Nandi

This chapter introduces the basic goal of language identification (LID) and its impact on real-life applications. It gives a brief overview of the basic features used for developing LID systems and discusses the different categories of LID systems. Finally, it highlights the primary issues in developing LID systems and the major contributions of this book towards solving them.


IEEE India Conference | 2013

Multilingual speaker recognition on Indian languages

Sourjya Sarkar; K. Sreenivasa Rao; Dipanjan Nandi; Sunil Kumar

In this paper we explore the performance of multilingual speaker recognition systems developed on the IITKGP-MLILSC speech corpus. Closed-set speaker identification and speaker verification experiments are individually conducted on 13 widely spoken Indian languages. In particular, we focus on the effect of language mismatch in the speaker recognition performance of individual languages and all languages together. The standard GMM-based speaker recognition framework is used. While the average language-independent speaker identification rate is as high as 95.21%, an average equal error rate of 11.71% shows scope for further improvement in speaker verification performance.


Archive | 2015

Language Identification—A Brief Review

K. Sreenivasa Rao; Dipanjan Nandi

This chapter provides a concise review of both the explicit and implicit LID systems present in the literature. Existing work on language identification in the Indian context is briefly discussed. Related work on excitation source features is also presented. Various speech features and models proposed in the context of language identification are briefly reviewed. The motivation for the present work, drawn from the existing literature, is briefly discussed.


Archive | 2015

Complementary and Robust Nature of Excitation Source Features for Language Identification

K. Sreenivasa Rao; Dipanjan Nandi

This chapter discusses the combination of implicit and parametric features of the excitation source to enhance LID accuracy. Further, the complementary nature of excitation source and vocal tract features is exploited to improve LID accuracy. The robustness of the proposed language-specific excitation source features is investigated in various noisy background environments.

Collaboration


Top Co-Authors

K. Sreenivasa Rao
Indian Institute of Technology Kharagpur

Debadatta Pati
Balasore College of Engineering

Arup Kumar Dutta
Indian Institute of Technology Kharagpur

Sourjya Sarkar
Indian Institute of Technology Kharagpur

Sunil Kumar
Indian Institute of Technology Kharagpur