Mousmita Sarma
Gauhati University
Publication
Featured research published by Mousmita Sarma.
Applied Soft Computing | 2013
Mousmita Sarma; Kandarpa Kumar Sarma
Spoken word recognition models use the initial phoneme to activate words starting with that phoneme, so classifying the initial phoneme into a phonetic group is critical. This paper describes an artificial neural network (ANN) based approach to recognize the initial consonant phonemes of Assamese words. A self-organizing map (SOM) based algorithm is developed to segment the initial phoneme from its word counterpart. Using a combination of three types of ANN structures, namely a recurrent neural network (RNN), an SOM and a probabilistic neural network (PNN), the proposed algorithm is shown to outperform the conventional discrete wavelet transform (DWT) based phoneme segmentation. The algorithm is designed exclusively around the Assamese phonemic structure, which has certain unique features and whose phonemes are grouped into six distinct phoneme families. Before the SOM-based segmentation is applied, an RNN makes localized decisions to classify the words into the six phoneme families. Next, the SOM-segmented phonemes are classified into individual phonemes. A two-class PNN classification, trained with clean Assamese phonemes, is performed to recognize the segmented phonemes. The recognized phonemes are validated by matching the first formant frequency of the phoneme. The formant frequencies of Assamese phonemes, estimated by determining the pole (formant) locations from the linear prediction model of the vocal tract, are used as a priori knowledge in the proposed algorithm.
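The formant check mentioned in this abstract relies on reading a resonance frequency off the poles of a linear prediction model. The following is a minimal sketch of that idea, not the authors' implementation: it assumes a second-order LP model fitted by the autocorrelation method, and the helper names (`lpc_order2`, `first_formant_hz`, `synth_resonance`) and the toy 700 Hz resonance are illustrative assumptions.

```python
import cmath
import math


def autocorr(x, lag):
    """Deterministic autocorrelation of x at the given lag."""
    return sum(x[n] * x[n - lag] for n in range(lag, len(x)))


def lpc_order2(x):
    """Solve the 2x2 normal equations for x[n] ~ a1*x[n-1] + a2*x[n-2]."""
    r0, r1, r2 = autocorr(x, 0), autocorr(x, 1), autocorr(x, 2)
    det = r0 * r0 - r1 * r1
    a1 = r1 * (r0 - r2) / det
    a2 = (r0 * r2 - r1 * r1) / det
    return a1, a2


def first_formant_hz(x, fs):
    """Frequency (Hz) of the pole pair of the order-2 LP model."""
    a1, a2 = lpc_order2(x)
    # The poles are the roots of z^2 - a1*z - a2 = 0.
    pole = (a1 + cmath.sqrt(a1 * a1 + 4 * a2)) / 2
    return abs(cmath.phase(pole)) * fs / (2 * math.pi)


def synth_resonance(freq_hz, fs, radius=0.95, n=400):
    """Impulse response of one two-pole resonator (toy stand-in for a vowel)."""
    theta = 2 * math.pi * freq_hz / fs
    a1, a2 = 2 * radius * math.cos(theta), -radius * radius
    x = [1.0, a1]
    for _ in range(n - 2):
        x.append(a1 * x[-1] + a2 * x[-2])
    return x


signal = synth_resonance(700.0, 8000)
estimate = first_formant_hz(signal, 8000)  # close to the 700 Hz resonance
```

Real speech needs a higher model order and a polynomial root finder to separate several formants; the order-2 case is used here only because its poles follow from the quadratic formula.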
Soft Computing | 2015
Chayashree Patgiri; Mousmita Sarma; Kandarpa Kumar Sarma
In this work, a class of neuro-computational classifiers is used for classification of fricative phonemes of the Assamese language. Initially, a Recurrent Neural Network (RNN) based classifier is used for classification. Later, a neuro-fuzzy classifier is used instead. Two different feature sets are used: one based on specific acoustic-phonetic characteristics, and another based on temporal attributes obtained using linear prediction cepstral coefficients (LPCC) and a Self Organizing Map (SOM). Here, we present the experimental details and the performance difference obtained by replacing the RNN-based classifier with an adaptive neuro-fuzzy inference system (ANFIS) based block for both feature sets to recognize Assamese fricative sounds.
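LPCC features are commonly derived from linear prediction coefficients through a standard recursion. The sketch below shows that recursion under the assumption that the all-pole model is written as H(z) = 1 / (1 - sum_k a_k z^-k); the function name `lpc_to_cepstrum` is hypothetical, and the abstract does not say which variant the authors used.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LP coefficients a_1..a_p (stored as a[0..p-1]) into c_1..c_n_ceps.

    Assumes the model H(z) = 1 / (1 - sum_k a_k z^-k); the gain term c_0 is omitted.
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        # Direct term exists only while n does not exceed the model order.
        acc = a[n - 1] if n <= p else 0.0
        # Recursive term: sum over k of (k/n) * c_k * a_(n-k), with n-k <= p.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c


# For a single pole at z = 0.5 the cepstrum is known in closed form: c_n = 0.5**n / n.
ceps = lpc_to_cepstrum([0.5], 4)
```

The single-pole case is a convenient sanity check because its cepstrum is the Taylor series of -ln(1 - a z^-1).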
Archive | 2014
Mousmita Sarma; Kandarpa Kumar Sarma
The book discusses intelligent system design using soft computing and similar systems and their interdisciplinary applications. It also focuses on the recent trends to use soft computing as a versatile tool for designing a host of decision support systems.
2012 3rd National Conference on Emerging Trends and Applications in Computer Science | 2012
Mousmita Sarma; Kandarpa Kumar Sarma
Phonemes are the smallest distinguishable units of a speech signal. Segmenting a phoneme from its word counterpart is a fundamental and crucial part of speech processing, since the initial phoneme is used to activate words starting with that phoneme. This work describes an Artificial Neural Network (ANN) based algorithm developed for segmentation and classification of consonant phonemes of the Assamese language. The algorithm uses weight vectors obtained by training a Self Organizing Map (SOM) with different numbers of iterations. The SOM is trained on the LPC samples of a word, and the segments of the different phonemes constituting that word are obtained from the SOM weights. A two-class Probabilistic Neural Network (PNN) trained with clean Assamese phonemes is used to identify the phoneme segments. The classification of phoneme segments follows the consonant phoneme structure of the Assamese language, which consists of six phoneme families. Experimental results establish the superiority of the SOM-based segmentation over the speaker-independent phoneme segmentation methods reported so far, including those based on the Discrete Wavelet Transform (DWT).
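A two-class PNN of the kind mentioned above can be viewed as a Parzen-window classifier: each class score is the average of Gaussian kernels centred on that class's training patterns. The sketch below illustrates that decision rule only; the feature vectors, the labels and the smoothing parameter `sigma` are made-up assumptions, not values from the paper.

```python
import math


def pnn_classify(x, class_patterns, sigma=0.5):
    """Return the label whose Gaussian Parzen-window score at x is largest.

    class_patterns maps each label to a list of training vectors.
    """
    best_label, best_score = None, -1.0
    for label, patterns in class_patterns.items():
        score = 0.0
        for p in patterns:
            d2 = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
            score += math.exp(-d2 / (2 * sigma ** 2))
        score /= len(patterns)  # average so class sizes do not bias the score
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy two-class setup: "target" phoneme patterns versus everything else.
training = {
    "target": [[1.0, 1.0], [1.2, 0.9]],
    "other": [[-1.0, -1.0], [-0.8, -1.1]],
}
label = pnn_classify([0.9, 1.1], training)
```

In practice the input vectors would be features of the SOM-derived segments, and `sigma` would be tuned on held-out data.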
IEEE India Conference | 2013
Biswajit Dev Sarma; Mousmita Sarma; Meghamallika Sarma; S. R. Mahadeva Prasanna
The phonetic engine is a system that performs speech-signal-to-symbol transformation. This work describes some issues in the development of an Assamese Phonetic Engine (PE). The International Phonetic Alphabet (IPA) is used as the phonetic unit to transcribe the speech database collected in three different modes, namely reading, lecture and conversation. Only reading-mode data is used for training, and a hidden Markov model (HMM) is used to model each phonetic unit without imposing any language or contextual constraint. The trained HMMs are used to derive a sequence of phonetic units from a test speech signal. Accuracies of 47.31%, 45.30% and 36.13% are achieved in reading, lecture and conversation modes, respectively. Confusions among the phonetic units specific to Assamese are discussed, as are issues related to different recording modes and to language and native-speaker dependencies. Speech data is also collected in Hindi from three different sets of speakers to study speaker, language and native-speaker dependencies. Accuracies of 40.5%, 36.10% and 29.61% are achieved in the native speaker dependent, native speaker independent and non-native speaker independent cases, respectively.
Journal of Intelligent Systems | 2013
Mousmita Sarma; Kandarpa Kumar Sarma
Vowel phonemes are a part of any acoustic speech signal. Vowel sounds occur in speech more frequently and with higher energy. Therefore, vowel phonemes can be used to extract different amounts of speaker-discriminative information in situations where the acoustic information is corrupted by noise. This article presents an approach to identify a speaker using the vowel sound segmented out from words spoken by the speaker. The work uses a combined self-organizing map (SOM) and probabilistic neural network (PNN) based approach to segment the vowel phoneme. The segmented vowel is later used to identify the speaker of the word by matching its patterns against a learning vector quantization (LVQ) based code book. The LVQ code book is prepared from features of clean vowel phonemes uttered by the male and female speakers to be identified. The proposed work formulates a framework for the design of a speaker-recognition model for the Assamese language, which is spoken by ∼3 million people in the Northeast Indian state of Assam. The experimental results show that the segmentation success rates obtained using the SOM-based technique are at least 7% higher than those of the discrete wavelet transform-based technique. This increase contributes to an improvement of ∼3% in the overall speaker identification performance compared with earlier related works.
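The code-book matching step can be illustrated with the basic LVQ1 rule, in which the nearest prototype is pulled toward a training sample with the same label and pushed away otherwise. This is a minimal sketch under assumed toy data; the speaker labels, the two-dimensional features, the learning rate and the function names are hypothetical, and the article may use a different LVQ variant.

```python
def nearest(codebook, x):
    """Index of the codebook entry closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(codebook[i][0], x)),
    )


def train_lvq1(codebook, samples, lr=0.1, epochs=20):
    """LVQ1 update over (vector, label) pairs; returns the updated codebook."""
    codebook = list(codebook)  # work on a copy of the list
    for _ in range(epochs):
        for x, label in samples:
            i = nearest(codebook, x)
            w, w_label = codebook[i]
            # Attract the winner if labels agree, repel it otherwise.
            sign = 1.0 if w_label == label else -1.0
            moved = [wi + sign * lr * (xi - wi) for wi, xi in zip(w, x)]
            codebook[i] = (moved, w_label)
    return codebook


def identify(codebook, x):
    """Label of the prototype nearest to x."""
    return codebook[nearest(codebook, x)][1]


# Toy vowel features for two hypothetical speakers, A and B.
samples = [([0.0, 0.1], "A"), ([0.2, 0.0], "A"), ([1.0, 0.9], "B"), ([0.9, 1.1], "B")]
initial = [([0.3, 0.3], "A"), ([0.7, 0.7], "B")]
book = train_lvq1(initial, samples)
speaker = identify(book, [0.1, 0.0])
```

One prototype per speaker keeps the example short; a realistic code book would hold several prototypes per speaker and use higher-dimensional vowel features.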
Soft Computing | 2012
Mousmita Sarma; Kandarpa Kumar Sarma
In spoken word recognition, one of the crucial steps is to identify the vowel phonemes. This paper describes an Artificial Neural Network (ANN) based algorithm developed for the segmentation and recognition of the vowel phonemes of the Assamese language from words containing those vowels. A Self-Organizing Map (SOM) trained with varying numbers of iterations is used to segment the word into its constituent phonemes. Later, a Probabilistic Neural Network (PNN) trained with clean vowel phonemes is used to recognize the vowel segment among the six different SOM-segmented phonemes. An important aspect of the proposed algorithm is that it validates the recognized vowel by checking its first formant frequency. The first formant frequency of each Assamese vowel is predetermined by estimating the pole (formant) locations from the linear prediction (LP) model of the vocal tract. The proposed algorithm shows higher recognition performance than the conventional Discrete Wavelet Transform (DWT) based segmentation.
FICTA (1) | 2015
Munmi Dutta; Chayashree Patgiri; Mousmita Sarma; Kandarpa Kumar Sarma
This paper presents an Artificial Neural Network (ANN) based algorithm designed to identify speakers of a specific dialect using features obtained from various speaker-dependent parameters of voiced speech. Speakers can be identified from their voiced sounds, which have higher energy. Voiced sounds are extracted from the continuous speech signals of a set of trained male and female speakers. Feature vectors are generated from speaker-specific characteristics such as pitch, the linear prediction (LP) residual and the empirical mode decomposition (EMD) residual of the speech. Using these feature vectors, three different ANN classifiers are designed using Multilayer Perceptron (MLP) and Recurrent Neural Network (RNN) structures to identify the speakers along with their dialect. The experiments show that a hybrid classifier designed by combining all three classifiers correctly identifies more than 90% of the enrolled speakers.
International Symposium on Neural Networks | 2013
Mousmita Sarma; Kandarpa Kumar Sarma
This paper presents a neural model of speaker identification using the vowel sound segmented out from words spoken by a speaker. Vowel sounds occur in speech more frequently and with higher energy. Therefore, in situations where the acoustic information is corrupted by noise, vowel sounds can be used to extract different amounts of speaker-discriminative information. The model uses a neural framework formed with a Probabilistic Neural Network (PNN) and Learning Vector Quantization (LVQ), in which a novel Self Organizing Map (SOM) based vowel segmentation technique is used. The work extracts glottal source information of the speakers by Empirical Mode Decomposition (EMD) of the speech signal, from which an LVQ-based speaker code book is formed. The work shows the use of the residual signal obtained from EMD of speech as a speaker-discriminative feature. The neural approach to speaker identification gives superior performance in comparison to conventional statistical approaches found in the literature, such as Hidden Markov Models (HMMs) and Gaussian Mixture Models (GMMs). The work formulates a framework for the design of an ANN-based speaker recognition model for the Assamese language, which is spoken by around three million people in the North East Indian state of Assam. Although the proposed model has been tested on speakers of Assamese, it should also be suitable for other Devanagari based languages, provided the speaker database contains samples of that specific language.
International Conference on Computer and Communication Technology | 2010
Mousmita Sarma; Krishna Dutta; Kandarpa Kumar Sarma
The quality and details captured in a speech corpus directly affect the precision of an Automatic Speech Recognition (ASR) system. The current work proposes a platform for speech corpus generation using an adaptive LMS filter and the LPC cepstrum, as part of an Artificial Neural Network (ANN) based speech recognition system designed exclusively to recognize isolated numerals of Assamese, a major language in the North Eastern part of India. The paper describes the use of an adaptive filter configured as a pre-emphasis block for generating LPC cepstrum features to apply in an ANN-based speech recognition system for the Assamese language.
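One way an adaptive filter can act as a pre-emphasis block is as a one-tap LMS linear predictor whose prediction error is passed on as the flattened signal. The sketch below shows that configuration as an assumption, since the abstract does not give the filter structure; the step size `mu`, the synthetic first-order test signal and the function names are all illustrative.

```python
import random


def lms_preemphasis(x, mu=0.001):
    """One-tap LMS predictor: e[n] = x[n] - w * x[n-1], with w adapted online.

    Returns the prediction-error (pre-emphasised) signal and the final weight.
    """
    w = 0.0
    out = [x[0]]
    for n in range(1, len(x)):
        e = x[n] - w * x[n - 1]   # prediction error is the filter output
        w += mu * e * x[n - 1]    # LMS gradient step on the squared error
        out.append(e)
    return out, w


def toy_signal(n=20000, a=0.9, seed=0):
    """Synthetic spectrally tilted signal: a first-order autoregressive process."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(a * x[-1] + rng.gauss(0.0, 1.0))
    return x


x = toy_signal()
y, w = lms_preemphasis(x)  # w drifts toward the AR coefficient used in toy_signal
```

A fixed pre-emphasis filter uses a constant coefficient (often around 0.95 to 0.97) in the same difference equation; the adaptive version lets the coefficient track the signal instead.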