Publication


Featured research published by Biswajit Dev Sarma.


National Conference on Communications | 2013

Assamese spoken query system to access the price of agricultural commodities

Syed Shahnawazuddin; Deepak Thotappa; Biswajit Dev Sarma; A. Deka; S. R. M. Prasanna; Rohit Sinha

In this work, a spoken query system developed for accessing the price of agricultural commodities in the Assamese language is described. The system enables a user to access the latest price of a commodity by calling the system from a landline or mobile phone. The spoken query system consists of interactive voice response (IVR) and automatic speech recognition (ASR) modules, both developed using open source resources. For the development of the ASR models, task-specific speech data is collected from different dialect regions of Assam. The issues in test data adaptation are highlighted, and a constrained data-unseen speaker adaptation approach is implemented, which is found to give a relative improvement of 8% over the baseline performance.


IEEE India Conference | 2013

Development of Assamese Phonetic Engine: Some issues

Biswajit Dev Sarma; Mousmita Sarma; Meghamallika Sarma; S. R. Mahadeva Prasanna

The phonetic engine is a system that performs speech signal to symbol transformation. This work describes some issues in the development of an Assamese Phonetic Engine (PE). The International Phonetic Alphabet (IPA) is used as the phonetic unit to transcribe the speech database, which is collected in three different modes, namely, reading, lecture and conversation. Only reading mode data is used for training, and a hidden Markov model (HMM) is used to model each phonetic unit without imposing any language or contextual constraint. The trained HMMs are used to derive a sequence of phonetic units from a test speech signal. Accuracies of 47.31%, 45.30% and 36.13% are achieved in reading, lecture and conversation modes, respectively. Confusions among the phonetic units specific to Assamese are discussed, along with issues related to different recording modes, language and native speaker dependencies. Speech data is also collected in Hindi from three different sets of speakers to study speaker, language and native dependencies. Accuracies of 40.5%, 36.10% and 29.61% are achieved in the native speaker dependent, native speaker independent and non-native speaker independent cases, respectively.
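The abstract does not say how the reported accuracies are scored. A common convention for phone recognizers counts substitutions, deletions and insertions from a minimum-edit alignment of the decoded sequence against the reference, and the sketch below assumes that convention. The function names and the example phone strings are hypothetical and are not taken from the paper.

```python
def edit_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) of a minimum-edit alignment."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    # cost[i][j] = (total, subs, dels, inss) for reference[:i] vs hypothesis[:j]
    cost = [[None] * cols for _ in range(rows)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, rows):
        cost[i][0] = (i, 0, i, 0)          # delete every reference phone
    for j in range(1, cols):
        cost[0][j] = (j, 0, 0, j)          # insert every hypothesis phone
    for i in range(1, rows):
        for j in range(1, cols):
            t, s, d, n = cost[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                match_or_sub = (t, s, d, n)
            else:
                match_or_sub = (t + 1, s + 1, d, n)
            t, s, d, n = cost[i - 1][j]
            deletion = (t + 1, s, d + 1, n)
            t, s, d, n = cost[i][j - 1]
            insertion = (t + 1, s, d, n + 1)
            # Tuples compare by total edit count first.
            cost[i][j] = min(match_or_sub, deletion, insertion)
    _, subs, dels, inss = cost[-1][-1]
    return subs, dels, inss


def phone_accuracy(reference, hypothesis):
    """Accuracy in percent: 100 * (N - S - D - I) / N, with N reference phones."""
    if not reference:
        raise ValueError("reference sequence must not be empty")
    subs, dels, inss = edit_counts(reference, hypothesis)
    return 100.0 * (len(reference) - subs - dels - inss) / len(reference)


# Made-up example: one substitution and one deletion against five reference phones.
print(phone_accuracy(list("kolom"), list("koro")))
```

Because insertions are penalized, this measure can fall below zero for very noisy output, which is one reason it is usually reported alongside a plain percent-correct figure.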


IEEE India Conference | 2013

Analysis of spurious vowel-like regions (VLRs) detected by excitation source information

Biswajit Dev Sarma; S. R. M. Prasanna

This work treats vowels and semivowels as vowel-like regions (VLRs). An analysis of the spurious VLRs detected by a signal processing based method using excitation source information is presented. The limitation of excitation information in identifying some of the nasals and voiced consonants as non-VLRs is discussed. An attempt is made to reduce spurious VLRs compared to the existing signal processing based method for VLR detection [1]. A multi-class statistical phone classifier that classifies speech into broad vowel, consonant and silence categories is trained. The outputs of the classifier are suitably combined to get evidence for vowel-like regions, different broad categories of consonants and silence regions. The output from the existing signal processing method is compared with the different evidence from the statistical method, and the spurious VLRs are eliminated using that evidence. Experimental studies conducted on the TIMIT and in-house databases demonstrate a significant reduction in spurious VLRs with a small loss in VLR detection performance. A net gain of 4.21% and 7.71% in frame error rate is achieved for the TIMIT and in-house databases, respectively.


International Conference on Signal Processing | 2014

Improved Vowel Onset and Offset Points Detection using Bessel Features

Biswajit Dev Sarma; S Supreeth Prajwal; S. R. Mahadeva Prasanna

This work presents a method for improving the accuracy of Vowel Onset Point (VOP) and Vowel End Point (VEP) detection in continuous speech. The VOP and VEP are the instants at which the onset and offset of a vowel take place, respectively, during speech production. The speech signal is represented using Bessel functions, whose basis functions resemble damped sinusoids. Bessel expansion is used to emphasize the vowel regions by an appropriate choice of the range of Bessel coefficients. The bandpass filtered narrow-band signal is modeled as a monocomponent amplitude modulated-frequency modulated (AM-FM) signal. The amplitude envelope (AE) function of this vowel-emphasized AM-FM signal gives strong evidence for the VOP and VEP. Combining this evidence with existing evidence that carries source and system information increases both the detection rate and the accuracy of detection.
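The amplitude envelope step can be illustrated on its own. The sketch below assumes the envelope is taken as the magnitude of the analytic signal, which is a standard way to demodulate a monocomponent AM-FM signal; the abstract does not state which estimator the authors use, and the Bessel expansion itself is not reproduced here. The helper names `dft` and `amplitude_envelope` are hypothetical, and the plain DFT is quadratic in the signal length, so it is only suitable for short illustrative signals.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)), sufficient for a short demo."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def amplitude_envelope(x):
    """Magnitude of the analytic signal of a real sequence x."""
    n = len(x)
    if n == 0:
        return []
    spectrum = dft(x)
    # Keep DC (and Nyquist when n is even), double positive frequencies,
    # and zero the negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        upper = n // 2
    else:
        upper = (n + 1) // 2
    for k in range(1, upper):
        weights[k] = 2.0
    analytic = dft([s * w for s, w in zip(spectrum, weights)], inverse=True)
    return [abs(z) for z in analytic]


# Synthetic AM signal: an 8-cycle carrier with a slowly varying amplitude.
n = 64
amplitude = [1.0 + 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
signal = [a * math.cos(2 * math.pi * 8 * t / n) for t, a in enumerate(amplitude)]
envelope = amplitude_envelope(signal)
print(max(abs(e - a) for e, a in zip(envelope, amplitude)))
```

On this synthetic input the recovered envelope tracks the imposed amplitude to within floating-point error, because every component of the signal sits on an exact DFT bin. Real speech will not be this clean.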


IEEE India Conference | 2014

Exploration of Deep Belief Networks for Vowel-like regions detection

Banriskhem K. Khonglah; Biswajit Dev Sarma; S. R. M. Prasanna

This work explores Deep Belief Networks (DBNs) for the task of detecting vowel-like regions (VLRs). Vowels and semivowels are considered as VLRs. Using vocal tract features at the input layer of the DBN, evidence for VLRs is extracted by transforming the features through multiple non-linear hidden layers. A linear classifier is used to predict the class of the evidence, i.e., whether it is a VLR or not. The DBN method is then combined with an excitation source (ES) based method for VLR detection. Although the DBN method alone performs comparably to the existing methods, the combination provides improved performance, confirming that the DBN models VLR information in a different way.


IEEE Signal Processing Letters | 2014

Analysis of Vocal Tract Constrictions using Zero Frequency Filtering

Biswajit Dev Sarma; S. R. Mahadeva Prasanna

This work proposes evidence using zero frequency filtering (ZFF) that gives an approximate measure of vocal tract constriction in terms of the low frequency component present in the speech signal. The vocal tract is completely closed in the case of voice bars and nasals and is wide open for low vowels. Intermediate cases are high vowels, semivowels, laterals, voiced fricatives and other sounds. Vocal tract constriction affects the spectrum by lowering the first formant and attenuating the amplitude of the spectrum. The attenuation is relatively high at higher frequencies, resulting in an increase in the low frequency component. The proposed method exploits the sinusoid-like nature of the ZFF signal (ZFFS) to obtain the evidence. Epoch synchronous analysis is performed, and the ZFFS between successive epochs is compared with the corresponding speech segment using a cosine kernel. The low frequency dominant voiced regions match the ZFFS more closely than other regions and hence give a higher value. When used as a feature, this evidence gives relatively higher performance for the constricted phones in an HMM-based phoneme recognizer.
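The epoch synchronous comparison can be sketched as follows. It reads "cosine kernel" as the normalized inner product (cosine similarity) of the two segments, which is an assumption, and it does not implement zero frequency filtering: a synthetic sinusoid stands in for the ZFFS, and the epoch locations are supplied by hand. All names and values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Normalized inner product of two equal-length segments."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm > 0 else 0.0


def epoch_synchronous_evidence(speech, zffs, epochs):
    """One similarity value per interval between successive epochs."""
    return [
        cosine_similarity(speech[start:end], zffs[start:end])
        for start, end in zip(epochs, epochs[1:])
    ]


period = 40                      # samples per pitch period (illustrative)
epochs = [0, 40, 80]             # hand-placed epoch locations
# Stand-in for the ZFFS: a sinusoid at the fundamental frequency.
zffs = [math.sin(2 * math.pi * t / period) for t in range(80)]
# First interval: low-frequency dominant. Second: dominated by the 5th harmonic.
speech = [
    math.sin(2 * math.pi * t / period) + 0.1 * math.sin(2 * math.pi * 5 * t / period)
    for t in range(40)
] + [
    0.1 * math.sin(2 * math.pi * t / period) + math.sin(2 * math.pi * 5 * t / period)
    for t in range(40, 80)
]
evidence = epoch_synchronous_evidence(speech, zffs, epochs)
print(evidence)
```

The first interval scores close to 1 and the second close to 0.1, mirroring the claim that low frequency dominant regions match the ZFFS more closely.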


Speech Communication | 2017

Consonant-vowel unit recognition using dominant aperiodic and transition region detection

Biswajit Dev Sarma; S. R. Mahadeva Prasanna; Priyankoo Sarmah

This work reports a method of Consonant-Vowel (CV) unit recognition by detecting the Dominant Aperiodic component Regions (DARs) and by predicting the Duration of Transition Regions (DTRs) in speech. DAR detection is performed using complementary information from the source and the vocal tract. Source information is extracted using sub-fundamental frequency filtering of speech, while vocal tract information is extracted using a) the Dominant Resonant Frequency (DRF) and b) the High to Low Frequency component Ratio (HLFR), both computed from the Hilbert envelope of the Numerator Group Delay (HNGD) spectrum of the zero-time windowed signal. The DTR is predicted using vocal tract constriction information. Subsequently, the detected DARs and predicted DTRs are compared with manually marked regions and finally used for CV unit recognition of Indian languages. Conventionally, CV unit recognition is performed by anchoring at the Vowel Onset Point (VOP) and assuming fixed durations for the transition and consonant regions on either side of the VOP. However, in speech, the durations of the transition and consonantal regions vary depending on the type of consonants and vowels. In the proposed method, the use of dynamic values for the consonant and transition region durations has resulted in better consonant recognition, improving CV unit recognition.


Archive | 2015

Semi-automatic Syllable Labelling for Assamese Language Using HMM and Vowel Onset-Offset Points

Biswajit Dev Sarma; Mousmita Sarma; S. R. M. Prasanna

Syllables play an important role in speech synthesis and recognition, and prosodic information is embedded in the syllable units of speech. Here we present a method for semi-automatic syllable labelling of Assamese speech utterances using Hidden Markov Models (HMMs) and vowel onset-offset points. Semi-automatic syllable labelling means labelling the syllables of a speech signal when the transcription, or the text corresponding to the speech file, is provided. HMMs for 15 broad classes of phones are built. Time labels for the transcription are obtained by forced alignment using these HMMs. A parser converts the word transcription to a syllable transcription using certain syllabification rules. This syllable transcription and the time labels of the phones are used to get the time labels of the syllables. The syllable labelling output is then refined using knowledge of the vowel onset point and vowel offset point, derived from the speech signal using different signal processing techniques. This refinement improves both syllable detection and the average deviation in the syllable onset and offset.
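The step that turns phone time labels into syllable time labels can be sketched as below, assuming the forced alignment yields one `(phone, start, end)` tuple per phone and the parser yields each syllable as a list of phones in the same order. The data layout, the function name and the example values are hypothetical; the syllabification rules and the vowel onset-offset refinement are not reproduced.

```python
def syllable_time_labels(phone_labels, syllables):
    """Merge phone-level time labels into syllable-level time labels.

    phone_labels: list of (phone, start_time, end_time) from forced alignment.
    syllables:    list of syllables, each a list of phones, in utterance order.
    Returns a list of (syllable, start_time, end_time).
    """
    labels = []
    index = 0
    for syllable in syllables:
        if not syllable:
            raise ValueError("empty syllable in syllable transcription")
        segment = phone_labels[index:index + len(syllable)]
        if [phone for phone, _, _ in segment] != list(syllable):
            raise ValueError("syllable transcription does not match phone labels")
        # A syllable starts with its first phone and ends with its last phone.
        labels.append(("".join(syllable), segment[0][1], segment[-1][2]))
        index += len(syllable)
    if index != len(phone_labels):
        raise ValueError("phone labels left over after the last syllable")
    return labels


# Made-up alignment for a two-syllable word (times in seconds).
phones = [("k", 0.00, 0.08), ("a", 0.08, 0.21),
          ("m", 0.21, 0.30), ("o", 0.30, 0.45), ("l", 0.45, 0.52)]
result = syllable_time_labels(phones, [["k", "a"], ["m", "o", "l"]])
print(result)
```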


IETE Technical Review | 2018

Acoustic–Phonetic Analysis for Speech Recognition: A Review

Biswajit Dev Sarma; S. R. Mahadeva Prasanna

This paper reviews the literature related to the acoustic–phonetic analysis of speech and the speech recognition approaches that use this type of knowledge. First, the acoustic–phonetic cues that are important for recognition of different sound units are presented. This includes a description of the acoustic–phonetic events, the literature related to analysis and automatic detection of the events, and the significance of the events in automatic speech recognition. Next, different speech recognition approaches are discussed, and the literature related to the use of acoustic–phonetic knowledge by these approaches is reviewed. Finally, the different approaches are compared and a framework suitable for recognition of phones present in syllable-like units is proposed.


IEEE Region 10 Conference | 2015

Exploration of vowel onset and offset points for hybrid speech segmentation

Biswajit Dev Sarma; Bidisha Sharma; S. Aswin Shanmugam; S. R. Mahadeva Prasanna; Hema A. Murthy

Automatic segmentation of speech using embedded reestimation of monophone hidden Markov models (HMMs) followed by forced alignment may not give accurate boundaries. Group delay (GD) processing for refining the boundaries at the syllable level has been attempted earlier. This paper explores the vowel onset point (VOP) and the vowel offset or end point (VEP) for correcting the boundaries obtained using HMM alignment. An HMM models the class information well but may not detect the exact boundary. VOP and VEP detection can produce spurious detections or misses, but the detected boundaries are more accurate. Combining HMM alignment with VOP/VEP information improves the log likelihood scores of the forced aligned phoneme boundaries. The HMM boundaries are corrected using VOPs/VEPs, and the model parameters are reestimated at the syllable level. The results are compared with those of GD based correction, and the overall performance is found to be comparable. Performance for vowels is higher than with GD based refinement, since the refinement in this case is mainly at the vowel boundaries. HMM based speech synthesis systems (HTS) are developed with the phone as the basic unit, using the proposed segmentation method. Subjective evaluation indicates an improvement in the quality of synthesis.
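The abstract does not give the rule used to correct an HMM boundary with a VOP or VEP. One plausible reading, sketched below purely as an assumption, moves each HMM boundary to the nearest detected landmark when that landmark lies within a tolerance, and otherwise leaves the boundary alone so that missed or spurious detections do less harm. The function name, the 20 ms tolerance and the example times are hypothetical.

```python
def correct_boundaries(hmm_boundaries, landmarks, tolerance=0.02):
    """Snap each HMM boundary to the nearest VOP/VEP within `tolerance` seconds.

    hmm_boundaries: boundary times from forced alignment (seconds).
    landmarks:      detected VOP/VEP times (seconds); may be empty.
    """
    corrected = []
    for boundary in hmm_boundaries:
        nearest = min(landmarks, key=lambda t: abs(t - boundary), default=None)
        if nearest is not None and abs(nearest - boundary) <= tolerance:
            corrected.append(nearest)     # trust the signal-derived landmark
        else:
            corrected.append(boundary)    # no nearby landmark: keep the HMM boundary
    return corrected


# Made-up times: only the first boundary has a landmark within 20 ms.
corrected = correct_boundaries([0.10, 0.25, 0.40], [0.112, 0.300])
print(corrected)
```

The tolerance trades the two error sources against each other: a larger value lets more landmarks override the alignment, while a smaller one relies more on the HMM.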

Collaboration

Top Co-Authors

S. R. Mahadeva Prasanna, Indian Institute of Technology Guwahati
S. R. M. Prasanna, Indian Institute of Technology Guwahati
Priyankoo Sarmah, Indian Institute of Technology Guwahati
Abhishek Dey, Indian Institute of Technology Guwahati
Rohit Sinha, Indian Institute of Technology Guwahati
A. Deka, Indian Institute of Technology Guwahati
Syed Shahnawazuddin, Indian Institute of Technology Guwahati
Banriskhem K. Khonglah, Indian Institute of Technology Guwahati