
Publication


Featured research published by Jainath Yadav.


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Vowel Onset Point Detection for Low Bit Rate Coded Speech

Anil Kumar Vuppala; Jainath Yadav; Saswat Chakrabarti; K. S. Rao

In this paper, we propose a method for detecting vowel onset points (VOPs) in low bit rate coded speech. The VOP is the instant at which the onset of a vowel takes place in the speech signal. VOPs play an important role in applications such as consonant-vowel (CV) unit recognition and speech rate modification. The proposed VOP detection method is based on the spectral energy present in the glottal closure region of the speech signal. The speech coders considered in this study are Global System for Mobile Communications (GSM) full rate, code-excited linear prediction (CELP), and mixed-excitation linear prediction (MELP). The TIMIT database and CV units collected from a broadcast news corpus are used for evaluation. The performance of the proposed method is compared with an existing method, which uses the combination of evidence from the excitation source, spectral peak energy, and the modulation spectrum. The proposed VOP detection method shows a significant improvement in performance over the existing method for both clean and coded speech. The effectiveness of the proposed VOP detection method is further analyzed in CV recognition by using the VOP as an anchor point.
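The core idea described above, tracking short-time spectral energy and flagging a sharp rise as an onset candidate, can be sketched as follows. This is a simplified NumPy illustration, not the authors' algorithm: the glottal-closure-region weighting and coder-specific handling are omitted, and the toy signal stands in for real speech.

```python
import numpy as np

def frame_spectral_energy(x, frame_len=256, hop=128):
    """Short-time spectral energy per frame (sum of |FFT|^2)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len] * np.hanning(frame_len)
        energies[i] = np.sum(np.abs(np.fft.rfft(frame)) ** 2)
    return energies

def candidate_vop(x, frame_len=256, hop=128):
    """Sample index of the sharpest rise in spectral energy,
    a crude stand-in for a vowel onset point."""
    e = frame_spectral_energy(x, frame_len, hop)
    jump = np.diff(e)
    return int(np.argmax(jump) + 1) * hop

# Toy signal: weak noise (consonant-like), then a strong vowel-like tone.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(4000) / fs
sig = np.concatenate([0.01 * rng.standard_normal(4000),
                      0.8 * np.sin(2 * np.pi * 200 * t)])
onset = candidate_vop(sig)  # near sample 4000, where the "vowel" begins
```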


International Conference on Contemporary Computing | 2014

Designing prosody rule-set for converting neutral TTS speech to storytelling style speech for Indian languages: Bengali, Hindi and Telugu

Parakrant Sarkar; Arijul Haque; Arup Kumar Dutta; Gurunath Reddy M; Harikrishna D M; Prasenjit Dhara; Rashmi Verma; Narendra N P; Sunil Kr. S B; Jainath Yadav; K. Sreenivasa Rao

This paper presents the design of a prosody rule-set for transforming the neutral speech synthesized by a Text-to-Speech (TTS) system into storytelling style speech. The objective of this work is to synthesize storyteller speech from a neutral TTS system for a given story text as input. Here, a neutral TTS refers to a TTS system developed in the Festival framework with a neutral speech corpus. For generating storyteller speech from neutral TTS, we propose modifications to various prosodic parameters of the neutral synthesized speech: (i) the pitch contour, (ii) duration patterns, (iii) intensity patterns, (iv) pause patterns and (v) tempo. We have designed individual rule-sets for these prosodic parameters, separately for three Indian languages: Bengali, Hindi and Telugu. The rule-sets are designed by carefully analyzing the perceptual differences between synthesized neutral speech utterances and their respective natural (original) spoken utterances narrated by a storyteller. The designed prosody rule-sets are evaluated using subjective listening tests. The results of the perceptual evaluation indicate that the designed prosody rule-sets play a significant role in achieving a story-specific style during the conversion from neutral to storytelling style speech.
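A rule-set of this kind boils down to a table of prosody modification factors applied per unit. The sketch below is purely illustrative; the scale factors and parameter names are invented placeholders, not the values designed in the paper.

```python
import math

# Hypothetical rule-set: the real values are language- and parameter-specific
# and derived from perceptual analysis; these numbers are placeholders.
STORYTELLING_RULES = {
    "pitch_scale": 1.15,       # raise the pitch contour by 15%
    "duration_scale": 1.10,    # slow down: lengthen syllables by 10%
    "intensity_scale": 1.05,   # slightly louder
}

def apply_rules(syllable, rules):
    """Apply multiplicative prosody rules to one syllable's parameters."""
    return {
        "pitch_hz": [f0 * rules["pitch_scale"] for f0 in syllable["pitch_hz"]],
        "duration_ms": syllable["duration_ms"] * rules["duration_scale"],
        # a power ratio maps to decibels via 10*log10
        "intensity_db": syllable["intensity_db"]
                        + 10 * math.log10(rules["intensity_scale"]),
    }

neutral = {"pitch_hz": [120.0, 125.0, 118.0],
           "duration_ms": 180.0, "intensity_db": 62.0}
story = apply_rules(neutral, STORYTELLING_RULES)
```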


International Conference on Devices and Communications | 2011

IITKGP-SEHSC: Hindi Speech Corpus for Emotion Analysis

Shashidhar G. Koolagudi; Ramu Reddy; Jainath Yadav; K. Sreenivasa Rao

In this paper, a simulated-emotion Hindi speech corpus is introduced for analyzing the emotions present in speech signals. The proposed database was recorded using professional artists from the Gyanavani FM radio station, Varanasi, India. The speech corpus was collected by simulating eight different emotions using neutral (emotion-free) text prompts. The emotions present in the database are anger, disgust, fear, happy, neutral, sad, sarcastic and surprise. The corpus is named the Indian Institute of Technology Kharagpur Simulated Emotion Hindi Speech Corpus (IITKGP-SEHSC). Emotion classification is performed on IITKGP-SEHSC using prosodic and spectral features. Mel frequency cepstral coefficients (MFCCs) are used to represent spectral information; energy, pitch and duration are used to represent prosodic information. The average emotion recognition performance using prosodic and spectral features is found to be around 77% and 81%, respectively, for female speech utterances. This paper describes the design, acquisition, post-processing and evaluation of the proposed speech corpus. The quality of the emotions expressed in the database is evaluated using subjective listening tests, which yield an emotion recognition performance of around 74%, grossly on par with the results obtained using prosodic analysis of the database.


National Conference on Communications | 2013

Speaker dependent, speaker independent and cross language emotion recognition from speech using GMM and HMM

Manav Bhaykar; Jainath Yadav; K. Sreenivasa Rao

In this paper, we analyse emotion recognition performance in speaker-dependent, text-dependent, text-independent, speaker-independent, language-dependent and cross-language settings. These studies are carried out using the Gaussian Mixture Model (GMM) and Hidden Markov Model (HMM) as classification models, with the IITKGP-SESC and IITKGP-SEHSC emotional speech corpora. The emotions considered in this study are anger, disgust, fear, happy, neutral, sarcastic, and surprise. Mel Frequency Cepstral Coefficient (MFCC) features are used for identifying the emotions. Emotion recognition performance in the speaker-dependent mode is better than in the speaker-independent and cross-language modes. From the results, it is observed that emotion recognition performance depends on both the speaker and the language.
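In its simplest form, GMM-based emotion classification fits one model per emotion over the feature vectors and picks the class with the highest likelihood. A minimal NumPy sketch, reduced to a single diagonal-covariance Gaussian per class, with synthetic features standing in for MFCCs:

```python
import numpy as np

class DiagonalGaussian:
    """Single-component, diagonal-covariance 'GMM' for one emotion class."""
    def fit(self, X):
        self.mean = X.mean(axis=0)
        self.var = X.var(axis=0) + 1e-6  # variance floor avoids division by zero
        return self

    def log_likelihood(self, X):
        # Per-dimension Gaussian log-densities, summed over dims and frames.
        ll = -0.5 * (np.log(2 * np.pi * self.var)
                     + (X - self.mean) ** 2 / self.var)
        return ll.sum()

def classify(models, X):
    """Pick the emotion whose model gives the highest total log-likelihood."""
    return max(models, key=lambda label: models[label].log_likelihood(X))

# Synthetic 13-dimensional "MFCC" frames for two emotions.
rng = np.random.default_rng(1)
train = {
    "anger": rng.normal(+1.0, 1.0, size=(200, 13)),
    "sad":   rng.normal(-1.0, 1.0, size=(200, 13)),
}
models = {label: DiagonalGaussian().fit(X) for label, X in train.items()}
test_utt = rng.normal(+1.0, 1.0, size=(50, 13))  # anger-like frames
pred = classify(models, test_utt)
```

A real system would use several mixture components per class and EM training; the single-Gaussian reduction keeps the decision rule visible.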


International Conference on Contemporary Computing | 2011

Text Independent Emotion Recognition Using Spectral Features

Rahul Chauhan; Jainath Yadav; Shashidhar G. Koolagudi; K. Sreenivasa Rao

This paper presents text-independent emotion recognition from speech using mel frequency cepstral coefficients (MFCCs) along with their velocity and acceleration coefficients. The simulated Hindi emotional speech corpus IITKGP-SEHSC is used for conducting the emotion recognition studies. The emotions considered are anger, disgust, fear, happy, neutral, sad, sarcastic, and surprise. Gaussian mixture models are used for developing the emotion recognition models. Emotion recognition performance for the text-independent and text-dependent cases is compared: around 72% and 82% recognition rates are observed for the text-independent and text-dependent cases, respectively.
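The velocity (delta) coefficients mentioned above are conventionally computed by a regression over neighbouring frames, and acceleration coefficients are deltas of deltas. A sketch of the standard regression formula (assumed here, since the abstract does not spell it out):

```python
import numpy as np

def delta(features, N=2):
    """Velocity coefficients via the standard regression formula:
    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2).
    `features` is (frames, coeffs); edges repeat the first/last frame."""
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    d = np.zeros_like(features, dtype=float)
    for n in range(1, N + 1):
        d += n * (padded[N + n : N + n + len(features)]
                  - padded[N - n : N - n + len(features)])
    return d / denom

# A linear ramp has constant velocity: each frame increases by 1,
# so the interior delta values come out as exactly 1.
ramp = np.arange(10, dtype=float).reshape(-1, 1)
vel = delta(ramp)          # velocity
acc = delta(vel)           # acceleration = delta of delta
```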


Circuits Systems and Signal Processing | 2016

Prosodic Mapping Using Neural Networks for Emotion Conversion in Hindi Language

Jainath Yadav; K. Sreenivasa Rao

An emotion is made up of several components, such as physiological changes in the body, subjective feelings and expressive behaviors. In speech, these changes are mainly observed in prosody parameters such as pitch, duration and energy. The Hindi language is mostly syllabic in nature, and syllables are the most suitable basic units for the analysis and synthesis of speech; therefore, a vowel onset point detection method is used to segment the speech utterance into syllable-like units. In this work, prosody parameters are modified using instants of significant excitation (epochs), detected using a zero frequency filtering-based method. Epoch locations in voiced speech correspond to instants of glottal closure; in unvoiced regions, they correspond to some random instants of significant excitation. Anger, happiness and sadness are considered as target emotions in the proposed emotion conversion framework. Feedforward neural network models are explored for mapping the prosodic parameters between neutral speech and the target emotions. The predicted prosodic parameters of the target emotion are incorporated into the neutral speech at the syllable level to produce the desired emotional speech. After incorporating the emotion-specific prosody, the perceptual quality of the transformed speech is evaluated through subjective tests.
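The prosodic mapping step can be pictured as a small regression network trained on paired neutral/target prosody values. The NumPy sketch below trains a one-hidden-layer feedforward network on a synthetic affine mapping; the data, network size and training schedule are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic paired data: a neutral prosody value mapped to a "target
# emotion" value; real mappings are learned from parallel utterances.
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = 0.5 * X + 0.2

# One hidden layer of 8 tanh units, linear output.
W1 = rng.normal(0, 0.5, size=(1, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, size=(8, 1)); b2 = np.zeros(1)
lr = 0.1
for _ in range(2000):
    H = np.tanh(X @ W1 + b1)          # hidden activations
    pred = H @ W2 + b2                # linear output
    err = pred - y
    # Backpropagation of the mean-squared-error gradient.
    dpred = 2 * err / len(X)
    dW2 = H.T @ dpred; db2 = dpred.sum(axis=0)
    dH = dpred @ W2.T
    dZ = dH * (1 - H ** 2)            # derivative of tanh
    dW1 = X.T @ dZ; db1 = dZ.sum(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

mse = float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2))
```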


International Conference on Cognitive Computing and Information Processing | 2015

Generation of emotional speech by prosody imposition on sentence, word and syllable level fragments of neutral speech

Jainath Yadav; K. Sreenivasa Rao

In emotional speech, some words and phrases are spoken more prominently than in neutral speech. This prominence is reflected in prosodic features such as the duration, intonation and intensity patterns of the words or phrases; the basic difference between neutral and emotional speech lies in these prosodic aspects. Three acoustic aspects of prosody are examined: the pitch contour, durations, and the intensity contour. These prosodic features from Hindi emotional speech are imposed on Hindi neutral speech at three different levels: sentence, word and syllable. The pitch contour, durations, and intensity contour are imposed on the neutral speech using the Praat tool with the help of a Praat script. Subjective results indicate that syllable-level fragments are a better choice than word- or sentence-level fragments for generating emotional speech from neutral speech.


International Conference on Devices and Communications | 2011

Effect of Low Bit Rate Speech Coding on Epoch Extraction

Anil Kumar Vuppala; Jainath Yadav; Saswat Chakrabarti; K. Sreenivasa Rao

Speech coding is one of the major degradations involved in building speech systems for the mobile environment. In this paper, we explore the effect of low bit rate speech coding on the accuracy of epoch detection. An epoch is the instant of significant excitation of the vocal-tract system during the production of speech. Many speech applications depend on the accurate estimation of epoch locations. Epoch extraction from the speech signal is challenging due to the time-varying characteristics of the excitation source and the vocal-tract system. For determining the epochs, two recently developed accurate methods are used: (i) zero frequency filtering (ZFF) and (ii) the dynamic programming projected phase-slope algorithm (DYPSA). Most epoch extraction methods, except the ZFF method, attempt to remove the characteristics of the vocal-tract system in order to emphasize the excitation characteristics in the residual; the ZFF method extracts the epoch locations directly from the speech signal using the impulse-like nature of the excitation. The speech coders used in this study are GSM full rate (ETSI 06.10), CELP (FS-1016), and MELP (TI 2.4 kbps). The performance of the epoch extraction methods is evaluated on the CMU-Arctic data, using the epoch locations from the electro-glottograph as reference.
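Zero frequency filtering, as used above, passes the differenced signal through a cascade of two zero-frequency resonators and removes the resulting polynomial trend; epochs then appear as positive-going zero crossings of the filtered output. A NumPy sketch on a toy impulse train (the window size and number of trend-removal passes are illustrative choices, not the published settings):

```python
import numpy as np

def zff_epochs(x, pitch_win=100):
    """Epoch candidates via zero-frequency filtering (sketch).
    1. Difference the signal to remove low-frequency bias.
    2. Pass it through two ideal zero-frequency resonators
       y[n] = 2*y[n-1] - y[n-2] + x[n], which together equal
       four cumulative sums.
    3. Repeatedly subtract a local mean (window of about one
       pitch period) to remove the polynomial trend.
    4. Epoch candidates are the positive-going zero crossings."""
    d = np.diff(x, prepend=x[0]).astype(float)
    y = d
    for _ in range(4):            # 1/(1 - z^-1)^4: two double integrators
        y = np.cumsum(y)
    kernel = np.ones(pitch_win) / pitch_win
    for _ in range(3):            # repeated mean subtraction kills the trend
        y = y - np.convolve(y, kernel, mode="same")
    return np.where((y[:-1] < 0) & (y[1:] >= 0))[0]

# Toy excitation: one impulse ("epoch") every 100 samples.
x = np.zeros(4000)
x[::100] = 1.0
crossings = zff_epochs(x, pitch_win=100)
# Detected crossings should be spaced about one pitch period apart.
```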


International Conference on Contemporary Computing | 2011

Effect of Noise on Vowel Onset Point Detection

Anil Kumar Vuppala; Jainath Yadav; K. Sreenivasa Rao; Saswat Chakrabarti

This paper discusses the effect of noise on vowel onset point (VOP) detection performance. Noise is one of the major degradations in real-time environments. Initially, the effect of noise on VOP detection is studied using a recently developed VOP detection method, in which VOPs are detected by combining complementary evidence from the excitation source, spectral peaks and the modulation spectrum. Later, spectral-processing-based speech enhancement methods, namely spectral subtraction and minimum mean square error (MMSE) estimation, are used as preprocessing to improve VOP detection performance under noise. The performance of VOP detection is analyzed on the TIMIT database for white and vehicle noise. In general, VOP detection performance is degraded by noise; in particular, it is affected significantly by spurious VOPs introduced at low SNR values. Experimental results indicate that the speech enhancement techniques improve VOP detection performance by eliminating spurious VOPs under noise.
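Spectral subtraction, one of the enhancement methods used here, estimates the noise magnitude spectrum from assumed speech-free frames and subtracts it from each frame's magnitude, keeping the noisy phase for reconstruction. A minimal NumPy sketch (frame sizes and the spectral floor are illustrative choices):

```python
import numpy as np

def spectral_subtraction(noisy, noise_frames=5, frame_len=256, hop=128,
                         floor=0.01):
    """Basic magnitude spectral subtraction with overlap-add resynthesis.
    The noise spectrum is estimated from the first `noise_frames` frames,
    assumed to contain no speech."""
    win = np.hanning(frame_len)
    frames = [noisy[i:i + frame_len] * win
              for i in range(0, len(noisy) - frame_len + 1, hop)]
    specs = [np.fft.rfft(f) for f in frames]
    noise_mag = np.mean([np.abs(s) for s in specs[:noise_frames]], axis=0)
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for i, s in enumerate(specs):
        # Subtract the noise magnitude; keep a small spectral floor.
        mag = np.maximum(np.abs(s) - noise_mag, floor * np.abs(s))
        clean = np.fft.irfft(mag * np.exp(1j * np.angle(s)), n=frame_len)
        start = i * hop
        out[start:start + frame_len] += clean * win
        norm[start:start + frame_len] += win ** 2
    return out / np.maximum(norm, 1e-8)

# Toy input: a noise-only lead-in, then a tone buried in the same noise.
rng = np.random.default_rng(3)
fs = 8000
noisy = 0.1 * rng.standard_normal(4096)
t = np.arange(2048) / fs
noisy[2048:] += 0.5 * np.sin(2 * np.pi * 300 * t)
enhanced = spectral_subtraction(noisy)
```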


Speech Communication | 2018

Epoch detection from emotional speech signal using zero time windowing

Jainath Yadav; Md. Shah Fahad; K. Sreenivasa Rao

The main objective of this work is to enhance the performance of epoch detection for emotional speech. Existing epoch estimation methods require either modeling of the vocal-tract system or a priori information about the average pitch period, and their performance degrades significantly due to the rapid variation of the pitch period in emotional speech. In the present work, we utilize the zero time windowing method, which provides instantaneous spectral information at each sample point due to the contribution of that sample point itself. The amplitudes of the spectral peaks are higher at the instants of epochs than at neighbouring sample points. The proposed method uses the sum of the three most prominent spectral peaks at each sampling instant of the Hilbert envelope of the Numerator Group Delay (HNGD) spectrum for accurate detection of epochs in emotional speech. Experimental results show that the accuracy of the proposed method is better than that of existing methods for emotional speech. It is also observed that the proposed method works well even for aperiodic regions of the speech signal and is robust to emotional variations.

Collaboration


Dive into Jainath Yadav's collaboration.

Top Co-Authors

K. Sreenivasa Rao (Indian Institute of Technology Kharagpur)
Anil Kumar Vuppala (International Institute of Information Technology)
Saswat Chakrabarti (Indian Institute of Technology Kharagpur)
Arijul Haque (Indian Institute of Technology Kharagpur)
Arup Kumar Dutta (Indian Institute of Technology Kharagpur)
Gurunath Reddy M (Indian Institute of Technology Kharagpur)
Harikrishna D M (Indian Institute of Technology Kharagpur)
K E Manjunath (Indian Institute of Technology Kharagpur)
K. S. Rao (Indian Institute of Technology Kharagpur)
Manav Bhaykar (Indian Institute of Technology Kharagpur)