Tulika Basu
Centre for Development of Advanced Computing
Publication
Featured research published by Tulika Basu.
2009 Oriental COCOSDA International Conference on Speech Database and Assessments | 2009
Joyanta Basu; Tulika Basu; Mridusmita Mitra; Shyamal Kr. Das Mandal
The automatic conversion of text to phonemes is a necessary step in all current approaches to Text-to-Speech (TTS) synthesis and Automatic Speech Recognition (ASR) systems. This paper presents a methodology for Grapheme to Phoneme (G2P) conversion for Bangla based on orthographic rules. In Bangla, G2P conversion sometimes depends not only on orthographic information but also on Part of Speech (POS) information and semantics. This paper also addresses these issues along with their implementation methodology. The Bangla G2P conversion system is tested on 1000 Bangla sentences of different types containing 9294 words. The percentage of correct conversion is 91.58% without considering semantics and contextual POS, with an exception table of 333 words. If the errors that occur because words are missing from the exception table are accounted for, the percentage of correct conversion increases to 98%.
international conference oriental cocosda held jointly with conference on asian spoken language research and evaluation | 2013
Hemant A. Patil; Tanvina B. Patel; Nirmesh J. Shah; Hardik B. Sailor; Raghava Krishnan; G. R. Kasthuri; T. Nagarajan; Lilly Christina; Naresh Kumar; Veera Raghavendra; S P Kishore; S. R. M. Prasanna; Nagaraj Adiga; Sanasam Ranbir Singh; Konjengbam Anand; Pranaw Kumar; Bira Chandra Singh; S L Binil Kumar; T G Bhadran; T. Sajini; Arup Saha; Tulika Basu; K. Sreenivasa Rao; N P Narendra; Anil Kumar Sao; Rakesh Kumar; Pranhari Talukdar; Purnendu Acharyaa; Somnath Chandra; Swaran Lata
In this paper, we discuss a consortium effort on building text to speech (TTS) systems for 13 Indian languages. There are about 1652 Indian languages. A unified framework is therefore required for building TTS systems for Indian languages. As Indian languages are syllable-timed, a syllable-based framework is developed. As quality of speech synthesis is of paramount interest, unit-selection synthesizers are built. Building TTS systems for low-resource languages requires that the data be carefully collected and annotated, as the database has to be built from scratch. Various criteria have to be addressed while building the database, namely speaker selection, pronunciation variation, optimal text selection, handling of out-of-vocabulary words and so on. The various characteristics of the voice that affect speech synthesis quality are first analysed. Next, the design of the corpus for each of the Indian languages is tabulated. The collected data is labeled at the syllable level using a semiautomatic labeling tool. Text to speech synthesizers are built for all 13 languages, namely Hindi, Tamil, Marathi, Bengali, Malayalam, Telugu, Kannada, Gujarati, Rajasthani, Assamese, Manipuri, Odia and Bodo, using the same common framework. The TTS systems are evaluated using Degradation Mean Opinion Score (DMOS) and Word Error Rate (WER). An average DMOS of ≈3.0 and an average WER of about 20% are observed across all the languages.
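As an illustration of the WER metric mentioned in the abstract above, the following is a minimal sketch of the standard definition (word-level edit distance divided by the number of reference words). The function name and the example sentences are assumptions made for illustration; the abstract does not describe how the authors computed WER from listener transcriptions.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: a listener drops one word out of six.
example = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Here one deleted word against a six-word reference gives a WER of 1/6; a WER of about 20% corresponds to roughly one word error in every five reference words.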
international conference on asian language processing | 2012
Hemant A. Patil; Purushotam G. Radadia; Tulika Basu
One of the challenging problems in Music Information Retrieval (MIR) is to identify the singer of a given song under the strong influence of instrumental sounds. The performance of a Singer Identification (SID) system is also severely affected by the quality of recording devices, transmission channels and the singing voice(s) of other singer(s). We have proposed a large database of 500 songs, prepared from Hindi Bollywood songs. State-of-the-art Mel Frequency Cepstral Coefficients (MFCC) are used as feature vectors and a 2nd order polynomial classifier is employed as the pattern classifier in our work. We also used Cepstral Mean Subtraction (CMS) based MFCC (CMSMFCC) feature vectors for SID, which were found to give better results than MFCC on the proposed database. The SID accuracy for MFCC and CMSMFCC was found to be 75.75% and 84.5%, respectively, and the Equal Error Rate (EER) for MFCC and CMSMFCC was found to be 9.48% and 8.45%, respectively. Score-level fusion of the two improved SID accuracy and EER by 10.25% and 2.08%, respectively, over MFCC alone.
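The Cepstral Mean Subtraction step named in the abstract above can be sketched as follows. This is a minimal illustration of the general technique (subtracting the per-coefficient mean over time from every frame), not the authors' implementation; the function name and the toy feature values are assumptions.

```python
def cepstral_mean_subtraction(frames):
    """Subtract each coefficient's mean over time from every frame.

    `frames` is a list of equal-length cepstral vectors, one per frame.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Toy "MFCC" matrix: 3 frames x 2 coefficients (made-up values).
toy = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
normalized = cepstral_mean_subtraction(toy)

# The same features with a constant offset added to every frame, a crude
# stand-in for a fixed channel or recording-device effect.
shifted = [[c + 0.5 for c in frame] for frame in toy]
normalized_shifted = cepstral_mean_subtraction(shifted)
```

Because a constant offset in the cepstral domain is removed along with the mean, `normalized` and `normalized_shifted` come out identical. This is the usual motivation for CMS when recording devices and channels vary.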
2012 International Conference on Speech Database and Assessments | 2012
Anal Haque Warsi; Tulika Basu; Keikichi Hirose; Hiroya Fujisaki
This study first examines the differences in the gross features of the fundamental frequency contour (the F0 contour) responsible for discriminating utterances of three sentence types, namely declarative, imperative and interrogative, in Bangla. In order to realize these differences in speech synthesis, these differences are then interpreted in terms of differences in the parameters of the command-response model for F0 contour generation. Finally, the results of model-based analysis were used to generate synthetic speech stimuli for a perceptual experiment in order to verify the results of analysis. The result of the experiment indicates that the synthesized F0 contours are quite satisfactory for the perception of utterances of three sentence types in Bangla, and thus can be successfully used in a concatenative Text-to-Speech System of Standard Colloquial Bangla (SCB) developed by C-DAC, Kolkata.
2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA) | 2014
Soma Khan; Joyanta Basu; Tulika Basu; Milton Samirakshma Bepari; Madhab Pal; Rajib Roy
The Japanese-English aligned Basic Travel Expression Corpus (BTEC) has been used as a basic dataset for the development of real-world Speech-to-Speech Translation (S2ST) systems in related prior studies. This paper presents a detailed statistical analysis of the Bengali-translated BTEC text and its phonetic transcriptions for the development of English-Bengali speech translation applications in the travel domain. At different levels of the analysis hierarchy, the study focuses on the lexical and phonetic status of the analyzed corpus based on frequency spectra, estimated population size, coverage ratio, goodness of fit of the Large Number of Rare Events (LNRE) model and transition patterns. The experimental observations provide necessary insights into the sufficiency of the analyzed corpus with respect to the travel domain as well as for building basic components of an English-Bengali S2ST system.
2016 Conference of The Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA) | 2016
Joyanta Basu; Soma Khan; Rajib Roy; Madhab Pal; Tulika Basu; Milton Samirakshma Bepari; Tapan Kumar Basu
The speaker diarization task consists of inferring “who spoke when” in an audio stream without any prior knowledge. It is an important task in audio processing and retrieval. A concise overview of the speaker diarization problem and available solutions is presented in this paper. Efforts have been made to summarize different approaches and practices in speaker diarization, highlighting existing resources such as toolkits and standard datasets, evaluation metrics, main application areas, and associated challenges. The study can serve as basic material providing a necessary preliminary idea of the topic. Though most of the related research initiatives have been reported in several prior studies, this study will be useful for focusing on the real-time application development perspective.
international conference oriental cocosda held jointly with conference on asian spoken language research and evaluation | 2015
Tulika Basu; Arup Saha; Somnath Chandra
The aim of the study is to experimentally verify the characteristics of Assamese consonants. All the previous work has been done subjectively. This study is the first attempt to objectively verify the characteristics of Assamese consonants. There are eight vowels, two semivowels and twenty-one consonants in standard colloquial Assamese. Semivowels are excluded from this study. The study of place and manner of articulation is based on an Electropalatography system and acoustic data of the phonemes. The phonation process is determined by Electroglottograph. The study reveals that in Assamese there are contrasts in three distinct places of articulation: the lips, the alveolar ridge and the velum. It is also observed that alveolar plosives in the context of the vowels /u/ and /a/ show signs of retroflexion.
international conference oriental cocosda held jointly with conference on asian spoken language research and evaluation | 2013
Tulika Basu; Arup Saha
In speech synthesis, the role of prosody is crucial. To make synthesized speech more natural and pleasing to the human ear, various prosody and intonation models, together with emotion models, have been experimented with over the last few decades. Apart from segmental quality and voice characteristics, the naturalness of any TTS system depends mostly on the quality of its prosody model. But as it is very hard to evaluate a prosody model objectively, a perceptual comparison method is adopted in this work to evaluate the prosody model.
ICFCE | 2012
Hemant A. Patil; Pallavi N. Baljekar; Tulika Basu
In this paper, various temporal features (i.e., zero crossing rate and short-time energy) and spectral features (spectral flux and spectral centroid) have been derived from the Teager energy operator (TEO) profile of the speech waveform. The efficacy of these features has been analyzed for the classification of normal and dysphonic voices by comparing their performance with the features derived from the linear prediction (LP) residual and the speech waveform. In addition, the effectiveness of fusing these features with state-of-the-art Mel frequency cepstral coefficients (MFCC) feature-set has also been investigated to understand whether these features provide complementary results. The classifier that has been used is the 2nd order polynomial classifier, with experiments being carried out on a subset of the Massachusetts Eye and Ear Infirmary (MEEI) database.
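For readers unfamiliar with the Teager energy operator, the sketch below uses the commonly cited discrete form, psi[x(n)] = x(n)^2 - x(n-1) * x(n+1), and then computes a short-time energy over the resulting profile. The function names, frame length and test tone are assumptions for illustration; the abstract does not give the authors' exact framing or parameters.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: x[n]^2 - x[n-1] * x[n+1].

    The output is two samples shorter than the input because the
    operator needs one neighbour on each side.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def short_time_energy(profile, frame_len):
    """Sum of squares over consecutive non-overlapping frames."""
    return [
        sum(v * v for v in profile[start:start + frame_len])
        for start in range(0, len(profile) - frame_len + 1, frame_len)
    ]


# Made-up test tone: amplitude 2, digital frequency 0.3 rad/sample.
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(50)]
teo_profile = teager_energy(tone)
frame_energies = short_time_energy(teo_profile, 16)
```

For a pure sinusoid the operator returns the constant amplitude^2 * sin^2(omega), so the profile tracks both amplitude and frequency. Deviations from such a smooth profile are what features computed on it would pick up.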
international conference on asian language processing | 2011
Prakhar Kant Jain; Robin Jain; Hemant A. Patil; Tulika Basu
A query-by-humming (QBH) system deals with retrieving the original song or music from the knowledge of its hummed tune. In this paper, we present a novel Derivative Dynamic Time Warping (DDTW) based method for querying desired songs in Hindi (an Indian language) from a database by humming the tune. The system presented here uses both intuitive and performance-specific criteria to process the hummed query. The pitch contour extracted from the humming signal is used as the source feature for all the experiments. The approach presented here can be exploited to improve the performance of a QBH system that indexes music for fast retrieval. The results obtained using the proposed DDTW approach are compared with the standard Dynamic Time Warping (DTW) approach and are found to be better for smaller window sizes.
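The difference between DTW and DDTW described in the abstract above can be sketched as follows. This is a generic illustration, not the authors' system: the derivative estimate is the one commonly given in the DDTW literature, the `window` parameter is assumed to be a band constraint on the warping path, and the pitch values are made up.

```python
def dtw_distance(a, b, window=None):
    """Classic DTW with squared-difference cost and an optional band constraint."""
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide or no warping path exists.
    w = max(n, m) if window is None else max(window, abs(n - m))
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def derivative(q):
    """Average of the left slope and the slope through both neighbours."""
    return [
        ((q[i] - q[i - 1]) + (q[i + 1] - q[i - 1]) / 2) / 2
        for i in range(1, len(q) - 1)
    ]


def ddtw_distance(a, b, window=None):
    """DDTW: run DTW on the estimated derivatives instead of the raw values."""
    return dtw_distance(derivative(a), derivative(b), window)


# Hypothetical pitch contours (Hz): the hummed query has the same shape
# as the stored tune but is sung 20 Hz higher.
song = [100, 110, 120, 110, 100]
hum = [p + 20 for p in song]
raw = dtw_distance(hum, song, window=1)
shape = ddtw_distance(hum, song, window=1)
```

In this toy case `shape` is zero while `raw` is not, because the derivative discards the constant pitch offset and keeps only the contour shape. That property is one plausible reason a derivative-based match suits hummed queries.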
Collaboration
Dhirubhai Ambani Institute of Information and Communication Technology