Bhavik Vachhani
Harvard University
Publications
Featured research published by Bhavik Vachhani.
international conference on acoustics, speech, and signal processing | 2014
Nirmesh J. Shah; Bhavik Vachhani; Hardik B. Sailor; Hemant A. Patil
In this paper, the use of a Viterbi-based algorithm and a spectral transition measure (STM)-based algorithm is investigated for the task of speech data labeling. In the STM framework, we propose the use of several spectral features, such as the recently proposed cochlear filter cepstral coefficients (CFCC), perceptual linear prediction cepstral coefficients (PLPCC), and RelAtive SpecTrAl (RASTA)-based PLPCC, in addition to Mel frequency cepstral coefficients (MFCC), for the phonetic segmentation task. Evaluating the effectiveness of these segmentation algorithms requires accurate manually labeled phoneme-level data, which are not available for low-resource languages such as Gujarati (one of the official languages of India). To measure the effectiveness of the various segmentation algorithms, an HMM-based speech synthesis system (HTS) for Gujarati has been built. From the subjective and objective evaluations, the Viterbi-based and PLPCC-based STM segmentation algorithms are observed to work better than the others.
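As background for the STM framework used here and in several of the papers below, a minimal sketch of a spectral transition measure contour computed from MFCC trajectories with librosa; the window length, hop, and feature settings are illustrative, not the paper's:

```python
import numpy as np
import librosa

def spectral_transition_measure(cepstra, window=3):
    """STM contour from a (n_coeffs, n_frames) cepstral matrix.

    STM(n) is the mean squared slope of a straight line fitted to each
    cepstral-coefficient trajectory over a +/- `window` frame neighbourhood
    (Furui-style regression coefficients).
    """
    n_coeffs, n_frames = cepstra.shape
    k = np.arange(-window, window + 1)          # regression time indices
    denom = np.sum(k ** 2)
    stm = np.zeros(n_frames)
    for n in range(window, n_frames - window):
        segment = cepstra[:, n - window:n + window + 1]  # (n_coeffs, 2W+1)
        slopes = segment @ k / denom                     # slope per coefficient
        stm[n] = np.mean(slopes ** 2)
    return stm

y, sr = librosa.load("utterance.wav", sr=16000)                     # any 16 kHz file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=160)  # 10 ms hop
stm = spectral_transition_measure(mfcc)
```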
international conference on asian language processing | 2012
Hemant A. Patil; Maulik C. Madhavi; Kewal D. Malde; Bhavik Vachhani
This paper addresses phonetic transcription issues in Gujarati and Marathi (Indian languages). Ad hoc approaches that fix the relationship between general alphabetical symbols and phonetic symbols may not always work. Hence, research issues such as the ambiguity between frication and aspirated plosives are addressed in this paper. The anusvara in both of these languages is produced based on the immediately following consonant; the implication of this finding for the problem of phonetic transcription is presented. Furthermore, the effect of dialectal variations on phonetic transcription is analyzed for Marathi. Finally, example phonetic transcriptions of sentences in the two languages are presented.
international conference oriental cocosda held jointly with conference on asian spoken language research and evaluation | 2013
Hemant A. Patil; Tanvina B. Patel; Swati Talesara; Nirmesh J. Shah; Hardik B. Sailor; Bhavik Vachhani; Janki Akhani; Bhargav Kanakiya; Yashesh Gaur; Vibha Prajapati
Text-to-speech (TTS) synthesizers have been an effective tool for many visually challenged people, enabling reading through auditory feedback. TTS synthesizers built with the Festival framework require a large speech corpus, which needs to be labeled either at the phoneme level or at the syllable level. TTS systems are mostly available in English; however, people feel more comfortable hearing their own native language. Keeping this in mind, a Gujarati TTS synthesizer has been built. As Indian languages are syllabic in nature, the syllable is taken as the basic speech sound unit. Building the unit selection-based Gujarati TTS system requires a large labeled Gujarati corpus, and manual labeling is time-consuming and tedious. Therefore, in this work, an attempt has been made to reduce this effort by automatically generating a nearly accurate syllable-level labeled speech corpus. To that end, group delay-based, spectral transition measure (STM)-based, and Gaussian filter-based segmentation methods are presented and compared. The percentage of correctly labeled data for the Gaussian filter-based method is around 83% for both male and female voices, compared to 70% for group delay-based labeling and 78% for STM-based labeling. In addition, the systems built from the label files generated by these methods were evaluated by a visually challenged subject. The word correctness rate increases by 5% (3%) and 10% (12%) for the Gaussian filter-based TTS system as compared to the group delay-based and STM-based systems built on the female (male) voice. Similarly, the Gaussian-based approach gives an overall reduction in word error rate (WER) of 8% (2%) and 6% (-5%) relative to the group delay-based and STM-based systems built on the female (male) voice.
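The abstract does not spell out the Gaussian filter-based method; one plausible reading, offered only as a hedged sketch, is to smooth a short-time energy contour with a Gaussian filter and take its minima as candidate syllable boundaries. All names and parameter values below are illustrative assumptions:

```python
import librosa
from scipy.ndimage import gaussian_filter1d
from scipy.signal import find_peaks

def syllable_boundaries(y, sr, hop=160, sigma=5):
    """Candidate syllable boundaries from minima of Gaussian-smoothed energy."""
    energy = librosa.feature.rms(y=y, hop_length=hop)[0]  # short-time energy
    smooth = gaussian_filter1d(energy, sigma=sigma)       # Gaussian filtering
    valleys, _ = find_peaks(-smooth)                      # energy minima
    return valleys * hop / sr                             # boundary times (s)
```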
international conference oriental cocosda held jointly with conference on asian spoken language research and evaluation | 2013
Kewal D. Malde; Bhavik Vachhani; Maulik C. Madhavi; Nirav H. Chhayani; Hemant A. Patil
There has been growing interest in using speech technology in rural areas. In this context, this paper describes the development of speech corpora in Indian languages (viz., Gujarati and Marathi, recorded in remote villages) for the task of phonetic transcription, along with a related analysis of the transcription. Manual phonetic transcription was carried out for the two languages for 8 hours of field-recorded speech data in real-life settings. Dialectal variations are analyzed using spectrograms and the phonetic transcription. In addition, among the consonant sounds, plosives were found to have large coverage within their broad phonetic category. The collected speech corpora can be very useful for speech and speaker recognition tasks.
international conference on asian language processing | 2013
Bhavik Vachhani; Hemant A. Patil
Phonetic segmentation finds potential applications in Text-to-Speech (TTS) synthesis and Automatic Speech Recognition (ASR) systems. In this paper, we propose the use of the Perceptual Linear Prediction Cepstral Coefficients (PLPCC) feature for the phonetic segmentation task. To detect phonetic boundaries, we use the spectral transition measure (STM). Using the proposed approach, we achieve 85% accuracy (i.e., 3% better than the state-of-the-art Mel-frequency Cepstral Coefficients (MFCC) for a 20 ms agreement duration) and a 15% over-segmentation rate (i.e., 8% less than MFCC) for automatic detection of the 234,925 phone boundaries corresponding to the 630 speakers of the entire TIMIT database.
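Given an STM contour (for instance, one computed from PLPCC or MFCC trajectories as sketched earlier), boundary detection reduces to peak picking. A minimal sketch, assuming scipy's find_peaks; the minimum-gap parameter tied to the agreement duration is an assumption, not the paper's stated procedure:

```python
from scipy.signal import find_peaks

def detect_boundaries(stm, hop_s=0.010, min_gap_s=0.020, height=None):
    """Pick STM peaks as phone-boundary candidates.

    `min_gap_s` suppresses peaks closer together than the agreement
    duration used in scoring; `height` is an optional STM threshold.
    """
    distance = max(1, int(round(min_gap_s / hop_s)))
    peaks, _ = find_peaks(stm, distance=distance, height=height)
    return peaks * hop_s  # boundary times in seconds
```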
european signal processing conference | 2015
Maulik C. Madhavi; Hemant A. Patil; Bhavik Vachhani
Obstruents are very important acoustical events (i.e., abrupt-consonantal landmarks) in the speech signal. This paper presents the use of the Spectral Transition Measure (STM) to locate obstruents in the continuous speech signal; the problem of obstruent detection involves detecting the phonetic boundaries associated with obstruent sounds. In this paper, we propose the use of STM information derived from the state-of-the-art Mel Frequency Cepstral Coefficients (MFCC) feature set and from a newly developed feature set, viz., MFCC-TMP (which uses the Teager Energy Operator (TEO) to implicitly exploit magnitude and phase information in the MFCC framework), for obstruent detection. The key idea is to exploit the ability of the STM to capture the highly dynamic transitional characteristics associated with obstruent sounds. The experimental setup is developed on the entire TIMIT database. For a 20 ms agreement (tolerance) duration, the obstruent detection rate is 97.59% with 17.65% false acceptance using the state-of-the-art MFCC-STM, and 96.42% with 12.88% false acceptance using MFCC-TMP-STM. Finally, the STM-based features along with the static representations (i.e., MFCC-STM and MFCC-TMP-STM) are evaluated on a phone recognition task.
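The paper's full MFCC-TMP pipeline is not reproduced here, but the Teager Energy Operator at its core is standard; a minimal sketch:

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager Energy Operator: psi[x](n) = x(n)^2 - x(n-1)*x(n+1)."""
    x = np.asarray(x, dtype=float)
    teo = np.empty_like(x)
    teo[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    teo[0], teo[-1] = teo[1], teo[-2]   # replicate the edge samples
    return teo
```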
international conference on asian language processing | 2014
Bhavik Vachhani; Kewal D. Malde; Maulik C. Madhavi; Hemant A. Patil
Obstruents are key landmark events found in the speech signal. In this paper, we propose the use of the spectral transition measure (STM) to locate obstruents in continuous speech. The proposed approach does not take into account any prior information (such as the phonetic sequence, speech transcription, or number of obstruents in the speech); hence, it is an unsupervised and unconstrained approach. We propose the use of state-of-the-art Mel Frequency Cepstral Coefficients (MFCC)-based features to capture spectral transitions for the obstruent detection task, since more spectral transition is expected in the vicinity of obstruents. The entire experimental setup is developed on the TIMIT database. The detection efficiency and estimated probability are around 77% and 0.77, respectively (with a 30 ms agreement duration and a 0.4 STM threshold).
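A hedged sketch of the thresholding step implied by the abstract: normalize the STM contour and mark rising crossings of the 0.4 threshold as obstruent-landmark candidates. The min-max normalization is an assumption, not taken from the paper:

```python
import numpy as np

def obstruent_landmarks(stm, threshold=0.4, hop_s=0.010):
    """Rising crossings of a normalised STM contour as landmark candidates."""
    norm = (stm - stm.min()) / (stm.max() - stm.min() + 1e-12)  # scale to [0, 1]
    above = norm > threshold
    rises = np.flatnonzero(np.diff(above.astype(int)) == 1) + 1
    return rises * hop_s  # candidate times in seconds
```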
international conference on acoustics, speech, and signal processing | 2017
Chitralekha Bhat; Bhavik Vachhani; Sunil Kumar Kopparapu
Dysarthria is a motor speech impairment, often characterized by speech that is generally indiscernible to human listeners. Assessment of the severity level of dysarthria provides an understanding of the patient's progression in the underlying cause and is essential for planning therapy, as well as for improving automatic dysarthric speech recognition. In this paper, we propose a non-linguistic method for automatic assessment of severity levels using audio descriptors, a set of features traditionally used to define the timbre of musical instruments, modified here to suit this purpose. Multitaper spectral estimation-based features were computed and used for classification, in addition to the timbre audio descriptors. An Artificial Neural Network (ANN) was trained to classify speech into severity levels within the Universal Access (UA) dysarthric speech corpus and the TORGO database. Average classification accuracies of 96.44% and 98.7% were obtained for the UA speech corpus and the TORGO database, respectively.
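A minimal sketch of multitaper spectral estimation with DPSS (Slepian) tapers via scipy; the taper count and time-bandwidth product are illustrative, and the eigenvalue weighting used in more careful estimators is omitted for brevity:

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_psd(frame, n_tapers=6, nw=3.5):
    """Multitaper PSD of one frame: mean periodogram over DPSS tapers."""
    tapers = dpss(len(frame), NW=nw, Kmax=n_tapers)            # (K, N) tapers
    spectra = np.abs(np.fft.rfft(tapers * frame, axis=1)) ** 2  # per-taper periodograms
    return spectra.mean(axis=0)                                 # average over tapers
```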
international conference on speech and computer | 2016
Chitralekha Bhat; Bhavik Vachhani; Sunil Kumar Kopparapu
Dysarthria is a motor speech disorder, characterized by slurred or slow speech resulting in low intelligibility. Automatic recognition of dysarthric speech enables people with dysarthria to use speech as a mode of interaction with electronic devices. In this paper, we propose a mechanism to adapt the tempo of the sonorant part of dysarthric speech to match that of normal speech, based on the severity of dysarthria. We show a significant improvement in the recognition of tempo-adapted dysarthric speech using both a Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) recognition system and a Deep Neural Network (DNN)-HMM based system. All evaluations were done on the Universal Access speech corpus.
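The paper adapts only the sonorant portions with a severity-dependent factor; the sketch below shows just the core tempo modification, applied to a whole segment, using librosa's phase-vocoder time stretch as a stand-in. The severity-to-rate mapping and file name are hypothetical:

```python
import librosa

# Hypothetical severity-to-rate mapping; the paper derives the factor from
# the severity level, and these values are purely illustrative.
SEVERITY_RATE = {"low": 1.1, "mid": 1.3, "high": 1.6}

def tempo_adapt(y, severity):
    """Speed up (rate > 1) a dysarthric segment toward typical tempo."""
    return librosa.effects.time_stretch(y, rate=SEVERITY_RATE[severity])

y, sr = librosa.load("dysarthric_utt.wav", sr=16000)  # hypothetical file
y_adapted = tempo_adapt(y, "mid")
```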
european signal processing conference | 2016
Bhavik Vachhani; Chitralekha Bhat; Sunil Kumar Kopparapu
Robust phonetic segmentation is extremely important for several speech processing tasks, such as phone-level articulation analysis and error detection, speech synthesis, and annotation. In this paper, we present an unsupervised phonetic segmentation approach and its application to noisy and clipped speech such as mobile phone recordings. We propose a multitaper-based Perceptual Linear Prediction (PLP) speech processing front-end, together with the Spectral Transition Measure (STM) and a novel post-processing technique, to improve on the baseline STM technique. The performance of the proposed technique has been evaluated using precision, recall, and F-score measures. Experimental results show an absolute improvement of 11% for TIMIT and 18% for (clean) Hindi speech data over the baseline approach. Significant improvement in phonetic segmentation was also observed for noisy speech, both simulated and from mobile phone recordings.
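A minimal sketch of the boundary-level precision/recall/F-score scoring used to evaluate segmentation here; the greedy one-to-one matching within a tolerance window is an assumption, not the paper's stated protocol:

```python
def boundary_prf(detected, reference, tol=0.020):
    """Precision/recall/F-score for boundary times within a +/- tol (s) window."""
    unmatched = list(reference)
    hits = 0
    for t in detected:
        match = next((r for r in unmatched if abs(r - t) <= tol), None)
        if match is not None:
            hits += 1
            unmatched.remove(match)    # each reference boundary matches once
    precision = hits / max(len(detected), 1)
    recall = hits / max(len(reference), 1)
    f_score = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f_score
```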
Collaboration
Dive into Bhavik Vachhani's collaborations.
Dhirubhai Ambani Institute of Information and Communication Technology