Publications


Featured research published by Bidisha Sharma.


IEEE Transactions on Audio, Speech, and Language Processing | 2017

Sonority Measurement Using System, Source, and Suprasegmental Information

Bidisha Sharma; S. R. Mahadeva Prasanna

Sonorant sounds are characterized by regions with a prominent formant structure, high energy, and a high degree of periodicity. In this work, vocal-tract system, excitation-source, and suprasegmental features derived from the speech signal are analyzed to measure the sonority information present in each of them. Vocal-tract system information is extracted from the Hilbert envelope of the numerator of the group-delay function, derived from a zero-time-windowed speech signal, which provides better resolution of the formants. A five-dimensional feature set is computed from the estimated formants to measure the prominence of the spectral peaks. A feature representing the strength of excitation is derived from the Hilbert envelope of the linear prediction residual, which represents the source information. The correlation of speech over ten consecutive pitch periods is used as the suprasegmental feature representing periodicity information. The combination of evidence from these three different aspects of speech provides better discrimination among sonorant classes than the baseline mel-frequency cepstral coefficient features. The usefulness of the proposed sonority feature is demonstrated in phoneme recognition and sonorant classification tasks.
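As a rough illustration of one of these cues, the strength-of-excitation feature rests on the Hilbert envelope of the linear prediction residual. A minimal sketch of that step, computed over a whole synthetic signal rather than frame-wise as in the paper, might look like:

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import hilbert, lfilter

def lp_residual(x, order=10):
    """Linear prediction residual via the autocorrelation method.
    Simplified full-signal sketch; the paper works frame-wise."""
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    a = solve_toeplitz(r[:order], r[1 : order + 1])   # predictor coefficients
    return lfilter(np.concatenate(([1.0], -a)), [1.0], x)  # inverse filter

fs = 8000
rng = np.random.default_rng(0)
t = np.arange(fs) / fs
# Voiced-like test signal: two harmonics plus a little noise for stability.
x = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 240 * t) \
    + 0.01 * rng.standard_normal(fs)
res = lp_residual(x)
env = np.abs(hilbert(res))  # Hilbert envelope of the LP residual
```

The envelope peaks near glottal excitation instants; the paper derives its strength-of-excitation feature from this envelope.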


IEEE Region 10 Conference | 2015

Development of Assamese Text-to-speech synthesis system

Bidisha Sharma; Nagaraj Adiga; S. R. Mahadeva Prasanna

This paper presents the design and development of an Assamese text-to-speech (TTS) synthesis system. In particular, the work focuses on designing language-specific rules, developing a quality database, data segmentation, and handling bilingual sound units. No prior study has constructed grapheme-to-phoneme conversion rules for Assamese; such rules are proposed in this work. The database is recorded while monitoring speaking rate, variation in amplitude level, dc wandering, and clipping during data collection. A significant improvement in the synthesized voice is observed by ensuring a uniform speaking rate, controlling variation in the signal amplitude level, and avoiding dc wandering and clipping. A semi-automatic segmentation approach is developed: segmentation is first done automatically, and the boundaries are then corrected manually to improve quality and intelligibility, which also reduces the time required for the segmentation process. The developed TTS can work in bilingual mode: it switches smoothly between Assamese and English and maintains sentence-level intonation even for mixed text.
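The grapheme-to-phoneme rules themselves are defined in the paper; purely as a sketch of the lookup machinery, with invented example mappings rather than the actual Assamese rules:

```python
# Hypothetical mappings for illustration only; the real Assamese
# grapheme-to-phoneme rules are those proposed in the paper.
G2P_RULES = {"ka": "k", "kha": "kʰ", "o": "ɔ"}

def g2p(graphemes, rules=G2P_RULES):
    # Per-grapheme lookup; unknown symbols pass through unchanged.
    return [rules.get(g, g) for g in graphemes]
```

A full system would add context-sensitive rules (the paper's contribution) rather than a flat table.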


IETE Technical Review | 2017

Polyglot Speech Synthesis: A Review

Bidisha Sharma; S. R. Mahadeva Prasanna

The term polyglot speech synthesis refers to producing speech in multiple languages, in a single speaker's voice, from a single text-to-speech synthesis (TTS) system. This review surveys existing efforts in the literature to develop a polyglot TTS. The methods described mainly focus on developing a natural, intelligible, and cost-effective TTS system for multilingual text input. Since multilingual text is becoming common in all applications of TTS, recent work has focused on developing cost-effective polyglot TTS systems instead of conventional monolingual ones. The review also discusses the pros and cons of the different methods and suggests possible directions to overcome their limitations.


IEEE Region 10 Conference | 2015

Exploration of vowel onset and offset points for hybrid speech segmentation

Biswajit Dev Sarma; Bidisha Sharma; S. Aswin Shanmugam; S. R. Mahadeva Prasanna; Hema A. Murthy

Automatic segmentation of speech using embedded re-estimation of monophone hidden Markov models (HMMs), followed by forced alignment, may not give accurate boundaries. Group-delay (GD) processing for refining the boundaries at the syllable level has been attempted earlier. This paper explores the vowel onset point (VOP) and the vowel offset or end point (VEP) for correcting the boundaries obtained from HMM alignment. HMMs model class information well but may not detect the exact boundary; VOP/VEP detection may suffer from spurious detections or misses, but the boundaries it does detect are more accurate. Combining HMM alignment with VOP/VEP evidence improves the log-likelihood scores of force-aligned phoneme boundaries: HMM boundaries are corrected using VOP/VEP, and the model parameters are re-estimated at the syllable level. Compared with GD-based correction, the overall performance is comparable, while performance on vowels is higher, since the refinement in this case occurs mainly at vowel boundaries. HMM-based speech synthesis systems (HTS) are developed using the phone as the basic unit with the proposed segmentation method, and subjective evaluation indicates an improvement in synthesis quality.
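The boundary-correction step can be sketched as snapping an HMM boundary to the nearest detected VOP/VEP when one lies close enough; the tolerance below is an assumed value, not the paper's:

```python
def correct_boundary(hmm_boundary, events, tol=0.03):
    """Snap an HMM phone boundary (in seconds) to the nearest detected
    VOP/VEP event if one lies within +/- tol seconds; otherwise keep the
    HMM boundary unchanged."""
    if not events:
        return hmm_boundary
    nearest = min(events, key=lambda e: abs(e - hmm_boundary))
    return nearest if abs(nearest - hmm_boundary) <= tol else hmm_boundary
```

Boundaries with a nearby VOP/VEP are moved to the more accurate event location; the rest are left to the HMM, after which the models can be re-estimated as described above.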


IEEE India Conference | 2015

Improvement of syllable-based TTS system in Assamese using prosody modification

Bidisha Sharma; S. R. Mahadeva Prasanna

A unit selection synthesis (USS) system selects appropriate units from an inventory based on target cost and join cost. For a syllable-based USS system, it is not practically feasible to cover all possible syllables of a language in all possible contexts. Hence, some discontinuity is observed at the concatenation points in the synthesized speech, due to the unavailability of units with the proper target specifications in the inventory. In this work, prosody modification is applied to the syllables before concatenation to match the fundamental frequency (F0) and intensity of adjacent syllables. This approach also helps to build a small-footprint syllable-level USS system, where many examples of the basic units with varying prosodic and spectral features are not available. The discontinuity in pitch and intensity at the concatenation point is reduced after prosody modification, and the naturalness and intelligibility of the synthesized speech are found to improve with the proposed method.
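One simple way to derive matching targets at a join is a meet-in-the-middle rule; this is an assumed strategy for illustration, and the paper's exact matching rule may differ:

```python
import math

def join_targets(f0_left, f0_right, db_left, db_right):
    """Scale factors for prosody modification of the two syllables at a
    concatenation point: a geometric-mean target for F0 (Hz) and an
    arithmetic-mean target for intensity (dB). Illustrative only."""
    f0_target = math.sqrt(f0_left * f0_right)
    db_target = 0.5 * (db_left + db_right)
    return f0_target / f0_left, f0_target / f0_right, db_target
```

Each syllable is then prosody-modified by its returned F0 factor (and gain-adjusted toward the dB target) before concatenation, shrinking the discontinuity at the join.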


IEEE India Conference | 2014

Faster prosody modification using time scaling of epochs

Bidisha Sharma; S. R. M. Prasanna

The objective of this work is to propose a simple and faster method for prosody modification that avoids the complex procedure of deriving a modified epoch sequence. The proposed method processes the speech signal by zero-frequency filtering (ZFF) to extract epochs, which are then time scaled using the given prosody modification factor. The prosody-modified speech is obtained by copying speech-signal samples around the original epochs, using the time-scaled epochs as a guiding sequence; since no modified pitch markers are derived, the method has a computational advantage. Experimental results show that the proposed method provides the same perceptual quality as earlier epoch-based prosody modification methods, but with a faster process.
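The time-scaling of the epoch sequence reduces to rescaling the inter-epoch intervals by the modification factor; a sketch of just that step (ZFF-based epoch extraction is not shown):

```python
import numpy as np

def time_scale_epochs(epochs, alpha):
    """Scale inter-epoch intervals by 1/alpha (alpha > 1 shortens the
    pitch period, i.e. raises pitch). Epoch locations are sample indices;
    the first epoch is kept as the anchor."""
    epochs = np.asarray(epochs, dtype=float)
    scaled = np.diff(epochs) / alpha
    return np.concatenate(([epochs[0]], epochs[0] + np.cumsum(scaled)))
```

The scaled sequence then only guides which original-epoch neighborhoods are copied into the output, as described above.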


Speech Communication | 2018

Significance of sonority information for voiced/unvoiced decision in speech synthesis

Bidisha Sharma; S. R. Mahadeva Prasanna

The quality of synthesized speech obtained from statistical parametric speech synthesis (SPSS) relies significantly on excitation-source generation, for which the voiced/unvoiced decision is an essential component. In the existing literature, this decision is obtained from the fundamental frequency and other excitation-source evidence. The discontinuity at the point of contact of the vocal folds excites energy into the vocal tract, resulting in the voicing effect in the produced speech signal. The perceptual reflection of voicing is correlated with sonority, which is associated with less vocal-tract constriction and significant glottal vibration. Therefore, the possible variation in voicing with changes in supraglottal pressure due to vocal-tract constriction, the rate of closing of the vocal folds, and the regularity in the structure of the signal are all preserved in the sonority of a sound unit. Voicing and the degree of vocal-tract opening are the two most effective correlates of sonority, contributing to the sonority hierarchy for sonorants and obstruents alike. The voicing effect can therefore be captured by a sonority measure derived from system, source, and suprasegmental information in the speech signal. In this work, a novel voiced/unvoiced decision method using sonority information is proposed and integrated into the SPSS framework for excitation-source generation. It leads to better voicing decisions than the existing methods, resulting in synthesized speech of improved quality, as confirmed by objective and subjective analysis.
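Once a frame-wise sonority contour is available, the voiced/unvoiced decision can be sketched as a thresholding step; the threshold and the median smoothing below are assumptions for illustration, not the paper's calibrated method:

```python
import numpy as np
from scipy.signal import medfilt

def vuv_from_sonority(sonority, threshold=0.5, smooth=5):
    """Frame-wise voiced (1) / unvoiced (0) flags from a sonority contour,
    with median smoothing to suppress isolated single-frame flips."""
    flags = (np.asarray(sonority, dtype=float) >= threshold).astype(float)
    return medfilt(flags, kernel_size=smooth).astype(int)
```

The resulting flag sequence would drive excitation-source generation: periodic excitation for voiced frames, noise-like excitation for unvoiced frames.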


International Journal of Speech Technology | 2018

Significance of duration modification for speaker verification under mismatch speech tempo condition

Rohan Kumar Das; Bidisha Sharma; S. R. Mahadeva Prasanna

This work explores the scope of duration modification for speaker verification (SV) under mismatched speech-tempo conditions. SV performance is found to depend on the speaking rate of a speaker; a mismatch in speaking rate can degrade system performance and is crucial from the perspective of deployable systems. In this work, SV performance is analyzed by varying the speaking rate of the train and test speech. Based on these studies, a framework is proposed to compensate for the mismatch in speech tempo: the duration of the test speech is changed according to the mismatch factor derived between the train and test speech, so that the speech tempo of the test speech matches that of the claimed speaker model. The proposed approach is found to have a significant impact on SV performance under mismatched conditions. A set of practical data with mismatched speech tempo is also used to cross-validate the framework.
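The compensation step hinges on a mismatch factor between the train and test speaking rates; a minimal sketch, assuming duration scales inversely with speaking rate (the paper's exact derivation of the factor may differ):

```python
def tempo_mismatch_factor(train_rate, test_rate):
    """Duration-modification factor to apply to the test utterance so
    that its speaking rate (e.g. syllables/second) matches the claimed
    speaker model's. A factor > 1 means the test speech is stretched."""
    return test_rate / train_rate
```

For example, test speech at 6 syllables/s scored against a model trained at 4 syllables/s would be stretched by a factor of 1.5 before verification.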


National Conference on Communications | 2017

Pause insertion in Assamese synthesized speech using speech-specific features

Bidisha Sharma; S. R. Mahadeva Prasanna

Research in text-to-speech synthesis continues to pursue greater naturalness in the synthesized speech, and pause prediction from the text to be synthesized plays a vital role in achieving it. From the perspective of speech production, some speech-specific features before and after a pause may be coordinated with the pause prediction process. Based on this hypothesis, the patterns of three features, namely the modulation spectrum energy, the strength of excitation, and the peak-to-dip ratio of the smoothed Hilbert envelope of the linear prediction residual, are analyzed relative to the presence or absence of a pause at word junctures in a manually pause-marked database. While most existing works rely only on linguistic aspects to predict pause positions, in this work support vector machines (SVMs) are trained on both speech-based and linguistic features to predict pause positions at word junctures. The accuracy of pause prediction improves to 96.57% when speech-based evidence is added, compared with 90.07% when only semantic features are used. The same SVM classifier is used for pause insertion in the synthesized speech: speech is first synthesized without any pause prediction, signal-based features are derived at each word juncture, and pauses are inserted according to the classifier's output for these features. Subjective evaluation shows an improvement in the naturalness and intelligibility of the synthesized speech with the proposed pause insertion method.
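The classifier itself is a standard SVM over the combined feature vector; a toy sketch with synthetic stand-in features (the real modulation-spectrum, excitation, and linguistic features come from the analysis described above):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-ins for the per-juncture feature vector: [modulation
# spectrum energy, strength of excitation, peak-to-dip ratio, linguistic
# feature]. Values here are invented and well separated on purpose.
X_pause = rng.normal(1.0, 0.1, size=(50, 4))    # junctures with a pause
X_nopause = rng.normal(0.0, 0.1, size=(50, 4))  # junctures without
X = np.vstack([X_pause, X_nopause])
y = np.array([1] * 50 + [0] * 50)               # 1 = insert a pause here

clf = SVC(kernel="rbf").fit(X, y)
pred = clf.predict([[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]])
```

At synthesis time, the same features are computed at each word juncture of the pause-free synthesized utterance, and a pause is inserted wherever the classifier predicts 1.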


IEEE Signal Processing Letters | 2017

Enhancement of Spectral Tilt in Synthesized Speech

Bidisha Sharma; S. R. Mahadeva Prasanna

Research in statistical parametric speech synthesis aims at improving naturalness and intelligibility. In this work, the deviation in spectral tilt between natural and synthesized speech is analyzed, and a large gap between the two is observed. The deviation is further analyzed for different classes of sounds, namely low vowels, mid vowels, high vowels, semi-vowels, and nasals, and is found to vary with the category of sound unit. Based on this variation, a novel method for spectral tilt enhancement is proposed, in which the amount of enhancement differs across classes of sound units. The proposed method yields improvements in the intelligibility, naturalness, and speaker similarity of the synthesized speech.
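A crude way to realize class-dependent tilt enhancement is a first-order pre-emphasis filter whose coefficient depends on the sound class; the coefficients below are illustrative placeholders, not the enhancement amounts derived in the paper:

```python
import numpy as np
from scipy.signal import lfilter

# Illustrative per-class emphasis coefficients (placeholders only).
TILT_COEF = {"low-vowel": 0.3, "mid-vowel": 0.4, "high-vowel": 0.5,
             "semi-vowel": 0.45, "nasal": 0.6}

def enhance_tilt(segment, sound_class):
    """Class-dependent first-order pre-emphasis y[n] = x[n] - b*x[n-1]
    as a crude spectral-tilt modifier; unknown classes pass through."""
    b = TILT_COEF.get(sound_class, 0.0)
    return lfilter([1.0, -b], [1.0], np.asarray(segment, dtype=float))
```

A larger coefficient lifts high frequencies more, so classes with a larger observed tilt gap receive stronger enhancement.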

Collaboration

Top co-authors of Bidisha Sharma:

S. R. Mahadeva Prasanna | Indian Institute of Technology Guwahati
Nagaraj Adiga | Indian Institute of Technology Guwahati
S. R. M. Prasanna | Indian Institute of Technology Guwahati
Biswajit Dev Sarma | Indian Institute of Technology Guwahati
Deepshikha Mahanta | Indian Institute of Technology Guwahati
Hema A. Murthy | Indian Institute of Technology Madras
Loitongbam Gyanendro Singh | Indian Institute of Technology Guwahati
Priyankoo Sarmah | Indian Institute of Technology Guwahati
Rohan Kumar Das | Indian Institute of Technology Guwahati
S. Aswin Shanmugam | Indian Institute of Technology Madras