Publication


Featured research published by Basil Abraham.


Spoken Language Technology Workshop | 2014

A data-driven phoneme mapping technique using interpolation vectors of phone-cluster adaptive training

Basil Abraham; Neethu Mariam Joy; K Navneeth; S. Umesh

One of the major problems in acoustic modeling for a low-resource language is data sparsity. In recent years, cross-lingual acoustic modeling techniques have been employed to overcome this problem. In this paper we propose multiple cross-lingual techniques to address the problem of data insufficiency. The first method, which we call the cross-lingual phone-CAT, uses the principles of phone-cluster adaptive training (phone-CAT), where the parameters of context-dependent states are obtained by linear interpolation of monophone cluster models. The second method uses the interpolation vectors of phone-CAT, which are known to capture phonetic context information, to map phonemes between two languages. Finally, the data-driven phoneme-mapping technique is incorporated into the cross-lingual phone-CAT to obtain what we call the phoneme-mapped cross-lingual phone-CAT. The proposed techniques are employed in acoustic modeling of three Indian languages, namely Bengali, Hindi and Tamil. The phoneme-mapped cross-lingual phone-CAT gave relative improvements of 15.14% for Bengali, 16.4% for Hindi and 11.3% for Tamil over the conventional cross-lingual subspace Gaussian mixture model (SGMM) in a low-resource scenario.
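To make the second method concrete, here is a minimal sketch of data-driven phoneme mapping via interpolation vectors. The vectors below are random placeholders standing in for those learned by phone-CAT, and the choice of cosine similarity as the matching score is an illustrative assumption, not necessarily the paper's exact formulation.

```python
# Sketch: map each target-language phoneme to the closest source-language
# phoneme by cosine similarity between phone-CAT interpolation vectors.
# Vectors are random stand-ins; real ones come from phone-CAT training.
import numpy as np

def map_phonemes(src_vecs, tgt_vecs):
    """Return {target phone: best-matching source phone}."""
    mapping = {}
    for tgt_phone, v in tgt_vecs.items():
        best = max(
            src_vecs,
            key=lambda p: np.dot(src_vecs[p], v)
            / (np.linalg.norm(src_vecs[p]) * np.linalg.norm(v)),
        )
        mapping[tgt_phone] = best
    return mapping

rng = np.random.default_rng(0)
dim = 40  # one weight per monophone cluster in phone-CAT (assumed size)
src = {p: rng.random(dim) for p in ["aa", "iy", "k", "t"]}   # e.g. a high-resource language
tgt = {p: rng.random(dim) for p in ["a", "i", "ka", "ta"]}   # e.g. a low-resource language
print(map_phonemes(src, tgt))
```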


National Conference on Communications | 2014

Cross-lingual acoustic modeling for Indian languages based on Subspace Gaussian Mixture Models

Neethu Mariam Joy; Basil Abraham; K Navneeth; S. Umesh

Cross-lingual acoustic modeling using the subspace Gaussian mixture model (SGMM) for low-resource languages of Indian origin is investigated. We focus on building an acoustic model for a low-resource language with a limited vocabulary by leveraging resources from another language with comparatively larger resources. Experiments were done on the Bengali and Tamil corpora of the MANDI database, with Tamil having greater resources than Bengali. We observed that the word accuracy of the cross-lingual acoustic model for Bengali was approximately 2.5% above that of its CDHMM model, and equivalent to that of its monolingual SGMM model.


Speech Communication | 2017

An automated technique to generate phone-to-articulatory label mapping

Basil Abraham; S. Umesh

Recent studies have shown that in the case of under-resourced languages, the use of articulatory features (AF) emerging from an articulatory model results in improved automatic speech recognition (ASR) compared to conventional mel frequency cepstral coefficient (MFCC) features. Articulatory features are more robust to noise and pronunciation variability than conventional acoustic features. To extract articulatory features, one method is to take conventional acoustic features like MFCC and build an articulatory classifier that outputs articulatory features (known as pseudo-AF). However, these classifiers require a mapping from phones to the different articulatory labels (AL) (e.g., place of articulation and manner of articulation), which is not readily available for many under-resourced languages. In this article, we propose an automated technique to generate a phone-to-articulatory label (phone-to-AL) mapping for a new target language based on the phone-to-AL mapping of a well-resourced language. The proposed mapping technique is based on the center-phone capturing property of the interpolation vectors emerging from the recently proposed phone cluster adaptive training (Phone-CAT) method. Phone-CAT is an acoustic modeling technique that belongs to the broad category of canonical state models (CSM), which includes the subspace Gaussian mixture model (SGMM). In Phone-CAT, the interpolation vector belonging to a particular context-dependent state has maximum weight for the center-phone in the case of monophone clusters, or for the AL of the center-phone in the case of AL clusters. These relationships from the various context-dependent states are used to generate a phone-to-AL mapping. The Phone-CAT technique makes use of all the speech data belonging to a particular context-dependent state. Therefore, multiple segments of speech are used to generate the mapping, which makes it more robust to noise and other variations. In this study, we have obtained a phone-to-AL mapping for three under-resourced Indian languages, namely Assamese, Hindi and Tamil, based on the phone-to-AL mapping available for English. With the generated mappings, articulatory features are extracted for these languages using varying amounts of data in order to build an articulatory classifier. Experiments were also performed in a cross-lingual scenario assuming a small training data set (approximately 2 h) from each of the Indian languages, with articulatory classifiers built using a large amount of training data (approximately 22 h) from other languages including English (Switchboard task). Interestingly, cross-lingual performance is comparable to that of an articulatory classifier built with large amounts of native training data. Using articulatory features, more than 30% relative improvement was observed over conventional MFCC features for all three languages in a DNN framework.
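As a rough illustration of the center-phone capturing property, the sketch below derives a phone-to-AL mapping by majority vote over the per-state argmax of interpolation vectors taken over AL clusters. The label set, the synthetic vectors, and the voting scheme are illustrative assumptions; real vectors come from Phone-CAT training with AL clusters.

```python
# Sketch: each context-dependent state of a phone has an interpolation
# vector over articulatory-label clusters that peaks at the center phone's
# label; aggregate per-state argmaxes by majority vote to label the phone.
import numpy as np
from collections import Counter

PLACE_LABELS = ["bilabial", "alveolar", "velar", "vowel"]  # assumed label set

def phone_to_al(state_vectors):
    """state_vectors: {phone: [interp. vector per CD state]} ->
    {phone: articulatory label}."""
    mapping = {}
    for phone, vecs in state_vectors.items():
        votes = Counter(PLACE_LABELS[int(np.argmax(v))] for v in vecs)
        mapping[phone] = votes.most_common(1)[0][0]
    return mapping

rng = np.random.default_rng(1)
states = {"p": [rng.dirichlet(np.ones(4)) for _ in range(5)],   # synthetic vectors
          "t": [rng.dirichlet(np.ones(4)) for _ in range(5)]}
print(phone_to_al(states))
```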


Conference of the International Speech Communication Association | 2016

Overcoming Data Sparsity in Acoustic Modeling of Low-Resource Language by Borrowing Data and Model Parameters from High-Resource Languages

Basil Abraham; S. Umesh; Neethu Mariam Joy

In this paper, we propose two techniques to improve the acoustic model of a low-resource language: (i) pooling data from closely related languages using a phoneme-mapping algorithm to build acoustic models such as the subspace Gaussian mixture model (SGMM), phone cluster adaptive training (Phone-CAT), deep neural network (DNN) and convolutional neural network (CNN), and then adapting the aforementioned models towards the low-resource language using its data; and (ii) borrowing subspace model parameters from SGMM/Phone-CAT, or hidden layers from DNN/CNN, built on high-resource languages, and then estimating the language-specific parameters using the low-resource language data. The experiments were performed on four Indian languages, namely Assamese, Bengali, Hindi and Tamil. Relative improvements of 10 to 30% were obtained over the corresponding monolingual models in each case.
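A minimal PyTorch sketch of technique (ii) for the DNN case follows: hidden layers are borrowed from a high-resource model, and only a new language-specific output layer is estimated on low-resource data. PyTorch, the layer sizes, and the decision to freeze (rather than fine-tune) the borrowed layers are our assumptions.

```python
# Sketch: reuse hidden layers from a high-resource DNN, train only a new
# senone (output) layer on low-resource data.
import torch
import torch.nn as nn

hidden = nn.Sequential(            # borrowed: pretrained on high-resource data
    nn.Linear(440, 1024), nn.ReLU(),   # 440 = 40-dim features x 11 spliced frames (assumed)
    nn.Linear(1024, 1024), nn.ReLU(),
)
for p in hidden.parameters():
    p.requires_grad = False        # keep the shared representation fixed

output = nn.Linear(1024, 3000)     # new senone layer for the low-resource language
model = nn.Sequential(hidden, output)

opt = torch.optim.SGD(output.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

feats = torch.randn(32, 440)               # stand-in spliced features
senones = torch.randint(0, 3000, (32,))    # stand-in alignments
loss = loss_fn(model(feats), senones)
loss.backward()
opt.step()
print(float(loss))
```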


Conference of the International Speech Communication Association | 2016

DNNs for Unsupervised Extraction of Pseudo FMLLR Features Without Explicit Adaptation Data

Neethu Mariam Joy; Murali Karthick Baskar; S. Umesh; Basil Abraham

In this paper, we propose the use of deep neural networks (DNN) as a regression model to estimate feature-space maximum likelihood linear regression (FMLLR) features from unnormalized features. During training, pairs of unnormalized features as input and the corresponding FMLLR features as target are provided, and the network is optimized to reduce the mean-square error between the output and target FMLLR features. During testing, the unnormalized features are passed through this DNN feature extractor to obtain FMLLR-like features without any supervision or first-pass decode. Further, the FMLLR-like features are generated frame by frame, requiring no explicit adaptation data, unlike FMLLR or i-vectors. Our proposed approach is therefore suitable for scenarios where there is little adaptation data. The proposed approach provides sizable improvements over basis-FMLLR and conventional FMLLR when normalization is done at the utterance level on the TIMIT and Switchboard 33-hour data sets.
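The following is a minimal PyTorch sketch of the regression idea: a DNN trained with a mean-square-error loss to map unnormalized frames to FMLLR frames, then applied frame by frame at test time with no adaptation data or first-pass decode. The 40-dimensional features, layer sizes, and random stand-in data are illustrative assumptions.

```python
# Sketch: DNN regression from unnormalized features to FMLLR features.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(40, 512), nn.Tanh(),
    nn.Linear(512, 512), nn.Tanh(),
    nn.Linear(512, 40),            # regress to FMLLR-like features
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
mse = nn.MSELoss()

# Training: paired (unnormalized, FMLLR) frames; random stand-ins here.
x_train = torch.randn(256, 40)
y_fmllr = torch.randn(256, 40)
for _ in range(10):
    opt.zero_grad()
    loss = mse(net(x_train), y_fmllr)
    loss.backward()
    opt.step()

# Test: pass unnormalized frames through; no speaker transform is estimated.
with torch.no_grad():
    pseudo_fmllr = net(torch.randn(100, 40))
print(pseudo_fmllr.shape)
```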


International Conference on Signal Processing | 2016

Improved phone-cluster adaptive training acoustic model

Neethu Mariam Joy; S. Umesh; Basil Abraham; K Navneeth

Phone-cluster adaptive training (Phone-CAT) is a subspace-based acoustic modeling technique inspired by cluster adaptive training (CAT) and the subspace Gaussian mixture model (SGMM). This paper explores three extensions to the basic Phone-CAT model to improve its recognition performance: increasing the phonetic subspace dimension, including sub-states, and including a speaker subspace. The latter two extensions are similar in implementation to those of SGMM, as both acoustic models share a similar subspace framework. However, since the phonetic subspace dimension of Phone-CAT is constrained to equal the number of monophones, the first extension is not straightforward to implement. We propose a two-stage Phone-CAT model in which we increase the phonetic subspace dimension to the number of monophone states. This model still retains the center-phone capturing property of the state-specific vectors in basic Phone-CAT. Experiments on the 33-hour training subset of the Switchboard database show improvements in the recognition performance of the basic Phone-CAT model with the inclusion of the proposed extensions.
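The subspace view shared by both models can be written as mu_j = M v_j, where the columns of M are cluster models and v_j is the state-specific interpolation vector. The numpy sketch below contrasts the basic model, whose subspace dimension equals the number of monophones, with the two-stage model, which enlarges it to the number of monophone states; all sizes are illustrative assumptions.

```python
# Sketch: a context-dependent state mean as a linear interpolation of
# cluster means, mu_j = M @ v_j, for basic vs. two-stage Phone-CAT.
import numpy as np

rng = np.random.default_rng(2)
feat_dim, n_phones = 39, 40

# Basic Phone-CAT: one cluster per monophone.
M_basic = rng.standard_normal((feat_dim, n_phones))
v_basic = rng.dirichlet(np.ones(n_phones))          # peaks at the center phone
mu_basic = M_basic @ v_basic

# Two-stage Phone-CAT: one cluster per monophone HMM state (3 per phone here).
n_states = 3 * n_phones
M_two_stage = rng.standard_normal((feat_dim, n_states))
v_two_stage = rng.dirichlet(np.ones(n_states))
mu_two_stage = M_two_stage @ v_two_stage

print(mu_basic.shape, v_basic.shape, v_two_stage.shape)
```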


Conference of the International Speech Communication Association | 2016

Articulatory Feature Extraction Using CTC to Build Articulatory Classifiers Without Forced Frame Alignments for Speech Recognition

Basil Abraham; Srinivasan Umesh; Neethu Mariam Joy

Articulatory features provide robustness to speaker and environment variability by incorporating speech production knowledge. Pseudo-articulatory features are a way of extracting articulatory features using articulatory classifiers trained from speech data. One of the major problems in building articulatory classifiers is the requirement of speech data aligned in terms of articulatory feature values at the frame level. Manually aligning data at the frame level is a tedious task, and alignments obtained from phone alignments using a phone-to-articulatory feature mapping are prone to errors. In this paper, a technique is proposed that uses the connectionist temporal classification (CTC) criterion to train an articulatory classifier based on a bidirectional long short-term memory (BLSTM) recurrent neural network (RNN). The CTC criterion eliminates the need for forced frame-level alignments. Articulatory classifiers were also built using different neural network architectures such as deep neural networks (DNN), convolutional neural networks (CNN) and BLSTM with frame-level alignments, and were compared to the proposed approach of using CTC. Among the different architectures, articulatory features extracted using classifiers built with BLSTM gave the best recognition performance. Further, the proposed approach of BLSTM with CTC gave the best overall performance on both the SVitchboard (6-hour) and Switchboard (33-hour) data sets.
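A minimal PyTorch sketch of the proposed setup follows: a BLSTM over acoustic frames trained with the CTC criterion against unaligned, per-utterance articulatory label sequences, so no frame-level alignment is required. The label inventory size, network sizes, and random stand-in data are illustrative assumptions.

```python
# Sketch: BLSTM articulatory classifier trained with CTC, no frame alignments.
import torch
import torch.nn as nn

n_labels = 25                      # assumed articulatory label inventory
blstm = nn.LSTM(40, 128, num_layers=2, bidirectional=True)
proj = nn.Linear(2 * 128, n_labels + 1)   # +1 for the CTC blank (index 0)
ctc = nn.CTCLoss(blank=0)

T, B = 200, 4                      # frames per utterance, batch size
x = torch.randn(T, B, 40)          # acoustic features (time, batch, dim)
log_probs = proj(blstm(x)[0]).log_softmax(-1)

targets = torch.randint(1, n_labels + 1, (B, 30))   # unaligned label sequences
input_lens = torch.full((B,), T, dtype=torch.long)
target_lens = torch.full((B,), 30, dtype=torch.long)

loss = ctc(log_probs, targets, input_lens, target_lens)
loss.backward()
print(float(loss))
```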


Twenty-Second National Conference on Communications (NCC) | 2016

Improved acoustic modeling of low-resource languages using shared SGMM parameters of high-resource languages

Neethu Mariam Joy; Basil Abraham; K Navneeth; S. Umesh

In this paper, we investigate methods to improve the recognition performance of low-resource languages with limited training data by borrowing subspace parameters from a high-resource language in the subspace Gaussian mixture model (SGMM) framework. As a first step, only the state-specific vectors are updated using the low-resource language data, while retaining all the globally shared parameters from the high-resource language. This approach gave improvements only in some cases. However, when both the state-specific vectors and the weight projection vectors are re-estimated with the low-resource language data, we get consistent improvements in performance over the conventional monolingual SGMM of the low-resource language. Further, we conducted experiments to investigate the effect of the different shared parameters on the acoustic model built using the proposed method. Experiments were done on the Tamil, Hindi and Bengali corpora of the MANDI database. Relative improvements of 16.17% for Tamil, 13.74% for Hindi and 12.5% for Bengali over the respective monolingual SGMMs were obtained.
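A numpy sketch of the parameter split follows: the globally shared phonetic subspace matrices are retained from the high-resource SGMM, while the state-specific vectors and weight projections are the quantities re-estimated on low-resource data. All sizes are illustrative, and the "re-estimation" is shown only as a placeholder initialization.

```python
# Sketch: SGMM parameter borrowing. Mean of Gaussian i in state j is
# mu_ji = M_i @ v_j; mixture weights come from projections w_i.
import numpy as np

rng = np.random.default_rng(3)
feat_dim, subspace_dim, n_gauss, n_states = 39, 40, 400, 1500

# Retained from the high-resource SGMM (globally shared):
M = rng.standard_normal((n_gauss, feat_dim, subspace_dim))  # phonetic subspace

# Re-estimated on low-resource data (placeholder initializations here):
v = rng.standard_normal((n_states, subspace_dim))           # state-specific vectors
w = rng.standard_normal((n_gauss, subspace_dim))            # weight projections

# Model quantities for Gaussian i in state j:
j, i = 0, 0
mu_ji = M[i] @ v[j]                                  # mean: mu_ji = M_i v_j
logits = w @ v[j]
log_w_j = logits - np.logaddexp.reduce(logits)       # log mixture weights
print(mu_ji.shape, log_w_j.shape)
```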


Conference of the International Speech Communication Association | 2018

Articulatory and Stacked Bottleneck Features for Low Resource Speech Recognition

Vishwas M. Shetty; Rini A. Sharon; Basil Abraham; Tejaswi Seeram; Anusha Prakash; Nithya Ravi; S. Umesh


Conference of the International Speech Communication Association | 2017

Joint Estimation of Articulatory Features and Acoustic Models for Low-Resource Languages

Basil Abraham; S. Umesh; Neethu Mariam Joy

Collaboration


Dive into Basil Abraham's collaborations.

Top Co-Authors

Neethu Mariam Joy
Indian Institute of Technology Madras

S. Umesh
Indian Institute of Technology Madras

K Navneeth
Indian Institute of Technology Madras

Srinivasan Umesh
Indian Institute of Technology Kanpur

Anusha Prakash
Indian Institute of Technology Madras