Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Hari Krishna Vydana is active.

Publication


Featured research published by Hari Krishna Vydana.


Circuits, Systems, and Signal Processing | 2016

Vowel-Based Non-uniform Prosody Modification for Emotion Conversion

Hari Krishna Vydana; Sudarsana Reddy Kadiri; Anil Kumar Vuppala

The objective of this work is to develop a rule-based emotion conversion method for better emotional perception. In this work, the performance of emotion conversion using the linear modification model is improved by vowel-based non-uniform prosody modification. In the present approach, features such as vowel position and identity are integrated to address the non-uniformity in prosody generated by the emotional state of the speaker. We mainly concentrate on parameters such as the strength, duration, and pitch contour of vowels at different parts of the sentence. The influence of emotions on these parameters is exploited to convert speech from the neutral state to the target emotion. The non-uniform prosody modification factors for emotion conversion are based on the position of the vowel in the word and the position of the word in the sentence. The study is carried out using the Indian Institute of Technology Simulated Emotion speech corpus, and the proposed algorithm is evaluated through subjective listening tests. The listening tests show that the proposed approach performs better than existing approaches.
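
A minimal sketch of the modification step, assuming vowel boundaries and position labels are already available: position-dependent duration and strength factors are applied per vowel, with duration changed by linear resampling. The factor values and segment boundaries below are made up, and the paper's pitch-contour modification (which would need an epoch-synchronous method such as PSOLA) is omitted.

```python
import numpy as np

# Hypothetical position-dependent factors (not the paper's actual values):
# (duration_factor, strength_factor) for vowels at different positions.
FACTORS = {"initial": (0.85, 1.20), "medial": (0.90, 1.10), "final": (0.80, 1.25)}

def modify_vowel(segment: np.ndarray, position: str) -> np.ndarray:
    """Apply non-uniform duration and strength modification to one vowel.

    Duration is changed by linear resampling; strength by amplitude
    scaling. Pitch-contour modification is omitted in this sketch.
    """
    dur_f, str_f = FACTORS[position]
    n_out = max(2, int(round(len(segment) * dur_f)))
    x_old = np.linspace(0.0, 1.0, num=len(segment))
    x_new = np.linspace(0.0, 1.0, num=n_out)
    return str_f * np.interp(x_new, x_old, segment)

# Usage: modify each labelled vowel region, keep other samples as-is.
fs = 16000
speech = np.random.randn(fs)                                # stand-in utterance
vowels = [(2000, 3200, "initial"), (9000, 10500, "final")]  # made-up boundaries
out, prev = [], 0
for start, end, pos in vowels:
    out.append(speech[prev:start])
    out.append(modify_vowel(speech[start:end], pos))
    prev = end
out.append(speech[prev:])
converted = np.concatenate(out)
```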


International Conference on Mining Intelligence and Knowledge Exploration | 2015

Significance of Emotionally Significant Regions of Speech for Emotive to Neutral Conversion

Hari Krishna Vydana; V. V. Vidyadhara Raju; Suryakanth V. Gangashetty; Anil Kumar Vuppala

Most speech processing applications suffer a degradation in performance when operated in emotional environments, largely due to a mismatch between the development and operating environments. Model adaptation and feature adaptation schemes have been employed to adapt speech systems developed in neutral environments to emotional environments. In this study, only the anger emotion is considered, and signal-level conversion from anger to neutral speech is studied. Emotion in human speech is concentrated over a small region of the entire utterance; the regions that are highly influenced by the emotive state of the speaker are considered the emotionally significant regions of an utterance. Physiological constraints of the human speech production mechanism are explored to detect these regions. The variation of prosody parameters (pitch, duration, and energy) with position in the sentence is analyzed to obtain modification factors, and the speech signal in the emotionally significant regions is modified using the corresponding factors to generate a neutral version of the anger speech. Speech samples from the Indian Institute of Technology Kharagpur Simulated Emotion Speech Corpus (IITKGP-SESC) are used in this study. A subjective listening test is performed to evaluate the effectiveness of the proposed conversion.
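
The paper derives the region detection from production constraints; the sketch below is a simplified stand-in that flags frames whose energy and pitch deviate most from the utterance average. The scoring rule and the retained fraction are assumptions for illustration, not the paper's criterion.

```python
import numpy as np

def significant_regions(energy: np.ndarray, pitch: np.ndarray,
                        top_fraction: float = 0.3) -> np.ndarray:
    """Flag the frames most influenced by the emotive state (a heuristic
    stand-in for the paper's production-based detection).

    Frames are scored by how far their energy and pitch deviate from the
    utterance mean (in standard deviations); the top `top_fraction` of
    frames is marked as emotionally significant.
    """
    def zdev(x: np.ndarray) -> np.ndarray:
        s = x.std()
        return np.abs(x - x.mean()) / s if s > 0 else np.zeros_like(x)

    score = zdev(energy) + zdev(pitch)
    thresh = np.quantile(score, 1.0 - top_fraction)
    return score >= thresh

# Usage with made-up frame-level contours (10 ms frames).
energy = np.abs(np.random.randn(300))
pitch = 120 + 30 * np.random.rand(300)
mask = significant_regions(energy, pitch)
print(f"{mask.mean():.0%} of frames marked emotionally significant")
```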


IEEE India Conference | 2015

Detection of emotionally significant regions of speech for emotion recognition

Hari Krishna Vydana; Peddakota Vikash; Tallam Vamsi; Kolla Pavan Kumar; Anil Kumar Vuppala

Emotions in human speech are short-lived: in an emotive utterance, the emotive gestures produced by the emotive state of the speaker persist only for a short duration. In this study, the regions of an utterance that are highly influenced by the emotive state of the speaker are detected and labeled as emotionally significant regions. Physiological constraints of the human speech production system are explored to detect these regions, and spectral features extracted from them are used to develop an emotion recognition system based on the Gaussian mixture modelling (GMM) technique. A significant improvement in recognition performance is observed with the present approach: an average improvement of 11% is noted owing to the use of data from the emotionally significant regions. Speech samples from the Berlin emotion speech database (EMO-DB) are used, and four basic emotions, namely anger, happiness, neutral, and fear, are considered for the study.
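
A minimal sketch of the recognition stage, assuming frame-level features from the detected regions are already extracted: one diagonal-covariance GMM per emotion (via scikit-learn), with classification by average frame log-likelihood. The component count and the synthetic stand-in features are illustrative, not the paper's settings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# One GMM per emotion, trained only on features from the detected
# emotionally significant regions (random stand-in MFCC-like data here).
EMOTIONS = ["anger", "happiness", "neutral", "fear"]
rng = np.random.default_rng(0)
train = {e: rng.normal(loc=i, size=(500, 13)) for i, e in enumerate(EMOTIONS)}

models = {}
for emo, feats in train.items():
    gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
    gmm.fit(feats)
    models[emo] = gmm

def recognize(feats: np.ndarray) -> str:
    """Classify an utterance by the average frame log-likelihood."""
    scores = {emo: gmm.score(feats) for emo, gmm in models.items()}
    return max(scores, key=scores.get)

test = rng.normal(loc=2, size=(120, 13))   # drawn near the "neutral" class
print(recognize(test))
```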


International Symposium on Neural Networks | 2017

Significance of neural phonotactic models for large-scale spoken language identification

Brij Mohan Lal Srivastava; Hari Krishna Vydana; Anil Kumar Vuppala; Manish Shrivastava

Language identification (LID) is a vital front-end for spoken dialogue systems operating in diverse linguistic settings, reducing recognition and understanding errors. Existing LID systems that use low-level signal information for classification do not scale well, due to the exponential growth of parameters as the number of classes increases, and they suffer performance degradation due to the inherent variabilities of the speech signal. In the proposed approach, we model the language-specific phonotactic information in speech using a recurrent neural network to develop an LID system. The input speech signal is tokenized into phone sequences by a common language-independent phone recognizer with varying phonetic coverage, and we establish a causal relationship between phonetic coverage and LID performance. The phonotactics of the observed phone sequences are modeled using statistical and recurrent neural network language models to predict the language-specific symbol from a universal phonetic inventory. The proposed approach is robust, computationally lightweight, and highly scalable. Experiments show that the convex combination of statistical and recurrent neural network language model (RNNLM) phonotactic models significantly outperforms a strong deep neural network (DNN) baseline, which is itself shown to surpass an i-vector based approach for LID. The proposed approach outperforms the baseline models in terms of mean F1 score over 176 languages. Further, we provide information-theoretic evidence to analyze the mechanism of the proposed approach.
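
A sketch of the combination step, under the assumption that the convex combination operates on per-language log-likelihoods produced by the two phonotactic models; the weight and the scores below are made up for illustration.

```python
import numpy as np

def lid_score(ngram_ll: np.ndarray, rnnlm_ll: np.ndarray, lam: float = 0.5):
    """Convex combination of two phonotactic model scores.

    ngram_ll, rnnlm_ll: per-language log-likelihoods of the phone
    sequence under the statistical and the recurrent language model.
    lam in [0, 1] weights the two models; argmax gives the language.
    """
    combined = lam * ngram_ll + (1.0 - lam) * rnnlm_ll
    return int(combined.argmax()), combined

# Usage with made-up scores over three candidate languages.
ngram = np.array([-210.4, -198.7, -225.1])
rnn = np.array([-205.9, -201.3, -219.8])
best, _ = lid_score(ngram, rnn, lam=0.4)
print("predicted language index:", best)
```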


Journal of the Acoustical Society of America | 2016

Detection of fricatives using S-transform

Hari Krishna Vydana; Anil Kumar Vuppala

Two prime acoustic characteristics of fricatives are the concentration of spectral energy above 3 kHz and their noisy nature. Spectral-domain approaches for detecting fricatives rely on capturing the spectral energy distribution. In this work, an S-transform based time-frequency representation is explored for detecting fricatives in continuous speech. The S-transform exhibits a progressive resolution that is tailored for localizing high-frequency events (i.e., the onsets and offsets of fricative regions) in time. Spectral evidence computed from the S-transform based representation is observed to perform better than evidence computed from the short-time Fourier transform. The existing predictability-measure based approach, by contrast, relies on capturing the noisy nature of fricatives. A phone-level comparative analysis between the S-transform and predictability-measure based approaches shows that the phone distributions of the detected fricatives are complementary, so a combination of the two approaches is put forth for detecting fricatives in continuous speech. Apart from detecting the presence of a fricative, the proposed S-transform based and combined approaches exhibit better accuracy in detecting the boundaries of fricatives, i.e., in extracting their durational information.
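
Since SciPy ships no standard S-transform, the sketch below computes the same kind of evidence, the frame-wise fraction of spectral energy above 3 kHz, from an ordinary STFT as a stand-in; the progressive resolution that motivates the paper's choice is exactly what this simplification lacks. Window sizes and the threshold are illustrative.

```python
import numpy as np
from scipy.signal import stft

def fricative_evidence(x: np.ndarray, fs: int = 16000) -> np.ndarray:
    """Frame-wise ratio of spectral energy above 3 kHz to total energy.

    The paper derives this evidence from an S-transform; the STFT here
    is a simpler stand-in for illustration.
    """
    f, t, Z = stft(x, fs=fs, nperseg=512, noverlap=384)
    power = np.abs(Z) ** 2
    hi = power[f >= 3000].sum(axis=0)
    total = power.sum(axis=0) + 1e-12
    return hi / total

x = np.random.randn(16000)           # stand-in for one second of speech
evidence = fricative_evidence(x)
frames = evidence > 0.5              # simple threshold on the evidence
print(f"{frames.mean():.0%} of frames flagged as fricative-like")
```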


International Conference on Signal Processing | 2015

Significance of speech enhancement and sonorant regions of speech for robust language identification

Anil Kumar Vuppala; K. V. Mounika; Hari Krishna Vydana

A high degree of robustness is a prerequisite for operating speech and language processing systems in practical environments, where performance is highly influenced by varying and mixed background conditions. In this paper, we put forward a robust method for automatic language identification in various background environments. A combined temporal and spectral processing method is used as a preprocessing technique for enhancing the degraded speech, and language-discriminative information in high-sonority regions of speech is used for the identification task. Sonority regions are regions of speech with high signal energy, and they are less influenced by background environments. The spectral energy of formants in the glottal closure regions is employed as an acoustic correlate for detecting the sonority regions of speech. The performance of the LID system is studied in various background environments: clean, car, factory, high-frequency, pink noise, and white noise. The Indian Institute of Technology Kharagpur Multilingual Indian Language Speech Corpus (IITKGP-MLILSC) is used for building the language identification system, and noise samples from the NOISEX database are employed in the present study. The performance of the proposed method is quite satisfactory compared to existing approaches.
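
A simplified stand-in for the sonority evidence: frames with high normalized energy in a typical formant band are retained for LID. The paper's actual correlate restricts the measurement to glottal closure regions, which this sketch does not detect; the band edges and threshold are assumptions.

```python
import numpy as np
from scipy.signal import stft

def sonorant_mask(x: np.ndarray, fs: int = 16000,
                  band=(300.0, 2500.0), threshold: float = 0.4) -> np.ndarray:
    """Mark frames with high energy in a typical formant band as
    high-sonority regions (a simplified stand-in for the paper's
    glottal-closure-based formant-energy evidence)."""
    f, t, Z = stft(x, fs=fs, nperseg=400, noverlap=240)   # 25 ms / 10 ms hop
    power = np.abs(Z) ** 2
    band_e = power[(f >= band[0]) & (f <= band[1])].sum(axis=0)
    band_e = band_e / (band_e.max() + 1e-12)              # normalize to [0, 1]
    return band_e >= threshold

x = np.random.randn(16000)                                # stand-in speech
mask = sonorant_mask(x)
print(f"{mask.mean():.0%} of frames retained for LID feature extraction")
```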


International Conference on Mining Intelligence and Knowledge Exploration | 2015

Improved Language Identification in Presence of Speech Coding

Ravi Kumar Vuddagiri; Hari Krishna Vydana; Jiteesh Varma Bhupathiraju; Suryakanth V. Gangashetty; Anil Kumar Vuppala

Automatically identifying the language being spoken plays a vital role in operating multilingual speech processing applications, and the rapid growth of mobile communication devices has created the need to operate these applications in mobile environments. Degradation in the performance of speech processing applications is mainly due to varying background environments, speech coding, and transmission errors. In this work, we focus on developing a language identification (LID) system for the Indian scenario that is robust to degradations caused by coding environments. Spectral features (MFCCs) extracted from high-sonority regions of speech are used for language identification. Sonorant regions of speech are perceptually loud and carry a clear pitch, and the quality of coded speech in highly sonorant regions is higher than in less sonorant regions. A GMM-UBM based modelling technique is employed to develop the LID system. The present study is carried out on the IITKGP-MLILSC speech database.
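
A sketch of the GMM-UBM modelling step: a universal background model fitted on pooled data, mean-only MAP adaptation per language using the standard relevance-factor update, and scoring by average frame log-likelihood. The component count, relevance factor, and synthetic data are illustrative, not the paper's configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def map_adapt_means(ubm: GaussianMixture, feats: np.ndarray,
                    relevance: float = 16.0) -> np.ndarray:
    """Mean-only MAP adaptation of a UBM (Reynolds-style relevance factor)."""
    post = ubm.predict_proba(feats)            # responsibilities, (frames, comps)
    n_k = post.sum(axis=0)                     # soft counts per component
    x_bar = (post.T @ feats) / np.maximum(n_k[:, None], 1e-8)
    alpha = (n_k / (n_k + relevance))[:, None]
    return alpha * x_bar + (1.0 - alpha) * ubm.means_

def avg_loglik_diag(feats, weights, means, covars) -> float:
    """Average frame log-likelihood of a diagonal-covariance GMM."""
    d = feats.shape[1]
    diff2 = ((feats[:, None, :] - means[None]) ** 2 / covars[None]).sum(-1)
    log_comp = -0.5 * (d * np.log(2 * np.pi) + np.log(covars).sum(1) + diff2)
    return float(np.logaddexp.reduce(log_comp + np.log(weights), axis=1).mean())

# Usage: one UBM on pooled speech, MAP-adapted means per language,
# classification by the highest average log-likelihood.
rng = np.random.default_rng(1)
ubm = GaussianMixture(n_components=16, covariance_type="diag",
                      random_state=0).fit(rng.normal(size=(3000, 13)))
lang_feats = rng.normal(loc=0.4, size=(400, 13))   # stand-in language data
means_lang = map_adapt_means(ubm, lang_feats)
print(avg_loglik_diag(lang_feats, ubm.weights_, means_lang, ubm.covariances_))
```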


Expert Systems with Applications | 2018

Curriculum learning based approach for noise robust language identification using DNN with attention

Ravi Kumar Vuddagiri; Hari Krishna Vydana; Anil Kumar Vuppala

Automatic language identification (LID) in practical environments is gaining scientific attention due to rapid developments in multilingual speech processing applications. When an LID system is operated in noisy environments, a degradation in performance can be observed, mostly attributable to the mismatch between training and operating environments. This work aims at developing an LID system that operates robustly in both clean and noisy environments. Traditionally, to reduce the mismatch between training and operating environments, noise is synthetically added to the training corpus; the resulting models are termed multi-SNR models. In this work, various curriculum learning strategies are explored for training multi-SNR models so that the trained models generalize better over varying background environments. I-vector, deep neural network (DNN), and DNN with attention (DNN-WA) architectures are used for developing the LID systems. Experimental verification of the proposed approach is carried out using the IIIT-H Indian database and the AP17-OLR database, with performance tested at different signal-to-noise ratio (SNR) levels using white and vehicular noises from the NOISEX dataset. In comparison to conventional multi-SNR models, the LID systems trained with curriculum learning perform better in terms of equal error rate (EER) and generalize better in EER across varying background environments. The degradation in LID performance due to environmental noise is thus effectively reduced by training multi-SNR models with curriculum learning.
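
A minimal sketch of one possible curriculum schedule, assuming conditions are presented from clean to progressively lower SNR rather than mixed randomly. The stage list, toy model, label count, and synthetic features are all assumptions, and the noise is white rather than NOISEX recordings.

```python
import torch
from torch import nn

# Hypothetical curriculum: clean first, then decreasing SNR stages.
snr_stages_db = [None, 20, 10, 5, 0]          # None = clean speech

model = nn.Sequential(nn.Linear(39, 256), nn.ReLU(), nn.Linear(256, 7))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

clean = torch.randn(2048, 39)                 # stand-in utterance features
labels = torch.randint(0, 7, (2048,))         # 7 language classes (assumed)

def add_noise(x: torch.Tensor, snr_db):
    """Add white noise at a target SNR (a simple stand-in for NOISEX)."""
    if snr_db is None:
        return x
    noise = torch.randn_like(x)
    scale = (x.pow(2).mean() / (noise.pow(2).mean() * 10 ** (snr_db / 10))).sqrt()
    return x + scale * noise

for stage, snr in enumerate(snr_stages_db):
    x = add_noise(clean, snr)
    for _ in range(3):                        # a few epochs per stage
        opt.zero_grad()
        loss = loss_fn(model(x), labels)
        loss.backward()
        opt.step()
    print(f"stage {stage} (SNR={snr}): loss {loss.item():.3f}")
```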


National Conference on Communications | 2017

Investigative study of various activation functions for speech recognition

Hari Krishna Vydana; Anil Kumar Vuppala

Significant developments in deep learning have come from the capability to train deeper networks, and the performance of speech recognition systems has been greatly improved by deep learning techniques. Many of these developments are associated with new activation functions and their corresponding initializations. The development of rectified linear units (ReLU) revolutionized the use of supervised deep learning for speech recognition, and there has recently been a great deal of research interest in activation functions such as Leaky-ReLU (LReLU), Parametric-ReLU (PReLU), Exponential Linear Units (ELU), and Parametric-ELU (PELU). This work studies the influence of these activation functions on speech recognition performance. A hidden Markov model - deep neural network (HMM-DNN) based speech recognition system is used, where deep neural networks with different activation functions provide the emission probabilities of the hidden Markov model. Two datasets, TIMIT and WSJ, are employed to study the behavior of the systems at different dataset sizes. It is observed that ReLU networks are superior on the smaller dataset (TIMIT), while on a sufficiently large dataset (WSJ) ELU networks are superior.
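
For reference, numpy definitions of the activation families compared in the paper. In the networks themselves the PReLU and PELU parameters are learned per layer; the fixed values below are illustrative defaults, not the paper's tuned settings.

```python
import numpy as np

def relu(x):           return np.maximum(0.0, x)
def lrelu(x, a=0.01):  return np.where(x >= 0, x, a * x)
def prelu(x, a=0.25):  return np.where(x >= 0, x, a * x)   # `a` learned in training
def elu(x, a=1.0):     return np.where(x >= 0, x, a * (np.exp(x) - 1.0))
def pelu(x, a=1.0, b=1.0):
    # Parametric ELU: both a and b are learned, constrained to a, b > 0.
    return np.where(x >= 0, (a / b) * x, a * (np.exp(x / b) - 1.0))

x = np.linspace(-3, 3, 7)
for f in (relu, lrelu, prelu, elu, pelu):
    print(f.__name__, np.round(f(x), 2))
```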


International Conference on Mining Intelligence and Knowledge Exploration | 2017

DNN-HMM Acoustic Modeling for Large Vocabulary Telugu Speech Recognition

Vishnu Vidyadhara Raju Vegesna; Krishna Gurugubelli; Hari Krishna Vydana; Bhargav Pulugandla; Manish Shrivastava; Anil Kumar Vuppala

The main focus of this paper is the development of a large-vocabulary Telugu speech database. Telugu is a low-resource language for which no standardized database exists for building an automatic speech recognition (ASR) system. The database, named the IIIT-H Telugu speech corpus, consists of neutral speech samples collected from 100 speakers for building a Telugu ASR system. The design of the speech and text corpus and the procedure followed for data collection are discussed in detail, and preliminary ASR results for models built on this database are reported. The architectural choices of deep neural networks (DNNs) play a crucial role in improving ASR performance: ASR systems trained with hybrid DNNs (DNN-HMM) with more hidden layers show better performance than conventional GMMs (GMM-HMM). The Kaldi toolkit is used to build the acoustic models required for the ASR system.
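
One detail worth making concrete is how a hybrid DNN-HMM consumes the network's outputs: senone posteriors p(s|x) are converted to scaled emission likelihoods p(x|s) ∝ p(s|x) / p(s) before HMM decoding, with priors estimated from the training alignments. A minimal sketch with made-up numbers:

```python
import numpy as np

def scaled_log_likelihoods(posteriors: np.ndarray, priors: np.ndarray) -> np.ndarray:
    """Standard hybrid DNN-HMM trick: divide senone posteriors by senone
    priors (in the log domain) to obtain scaled emission likelihoods."""
    eps = 1e-10
    return np.log(posteriors + eps) - np.log(priors + eps)

# Usage with made-up numbers: 4 frames, 3 senones.
post = np.array([[0.70, 0.20, 0.10],
                 [0.10, 0.80, 0.10],
                 [0.30, 0.30, 0.40],
                 [0.05, 0.15, 0.80]])
priors = np.array([0.5, 0.3, 0.2])    # senone priors from training alignments
print(scaled_log_likelihoods(post, priors))
```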

Collaboration


Dive into Hari Krishna Vydana's collaborations.

Top Co-Authors

Anil Kumar Vuppala, International Institute of Information Technology
Manish Shrivastava, International Institute of Information Technology
Ravi Kumar Vuddagiri, International Institute of Information Technology
Suryakanth V. Gangashetty, International Institute of Information Technology
Brij Mohan Lal Srivastava, International Institute of Information Technology
Krishna Gurugubelli, International Institute of Information Technology
K. V. Mounika, International Institute of Information Technology
V. V. Vidyadhara Raju, International Institute of Information Technology
Vishnu Vidyadhara Raju Vegesna, International Institute of Information Technology
Bhargav Pulugandla, International Institute of Information Technology