Publication


Featured research published by V. Ramu Reddy.


International Journal of Speech Technology | 2013

Pitch synchronous and glottal closure based speech analysis for language recognition

K. Sreenivasa Rao; Sudhamay Maity; V. Ramu Reddy

This paper explores pitch synchronous and glottal closure (GC) based spectral features for analyzing the language-specific information present in speech. For determining pitch cycles (for pitch synchronous analysis) and GC regions, instants of significant excitation (ISE) are used. The ISE correspond to the instants of glottal closure (epochs) in the case of voiced speech, and to random excitations such as the onset of a burst in the case of nonvoiced speech. For analyzing the language-specific information in the proposed features, the Indian language speech database (IITKGP-MLILSC) is used. Gaussian mixture models are used to capture the language-specific information from the proposed features. The proposed pitch synchronous and glottal closure spectral features are evaluated through language recognition studies. The evaluation results indicate that language recognition performance is better with pitch synchronous and GC based spectral features than with conventional spectral features derived through block processing. GC based spectral features are found to be more robust against degradations due to background noise. The performance of the proposed features is also analyzed on the standard Oregon Graduate Institute Multi-Language Telephone-based Speech (OGI-MLTS) database.
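The recognition step can be sketched as follows: frame-level spectral features are scored against per-language generative models, and the language with the highest average log-likelihood wins. This is a minimal stand-in using a single diagonal-covariance Gaussian per language rather than the full GMMs of the paper; all names and parameters are illustrative.

```python
import numpy as np

def diag_gauss_loglik(frames, mean, var):
    """Log-likelihood of each feature frame under a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (frames - mean) ** 2 / var, axis=1)

def recognize_language(frames, models):
    """Score the utterance's frames against each language model and
    return the language with the highest average log-likelihood."""
    scores = {lang: diag_gauss_loglik(frames, m["mean"], m["var"]).mean()
              for lang, m in models.items()}
    return max(scores, key=scores.get)
```

A full GMM would replace the single Gaussian with a weighted mixture of such components, but the scoring and arg-max decision rule are the same.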


International Conference on Embedded Networked Sensor Systems | 2014

Person identification from arbitrary position and posture using Kinect

V. Ramu Reddy; Tanushyam Chattopadhyay; Kingshuk Chakravarty; Aniruddha Sinha

In this paper, the authors propose a person identification method that is independent of the subject's position with respect to the input sensor. The proposed method works for various postures or states, namely standing, sitting, and walking. The method first identifies the person's state, and separate SVM-based models are then used for person identification (PI) for each of these three states.
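The two-stage pipeline described above can be sketched as follows, with a nearest-centroid classifier standing in for the paper's SVMs; all labels, features, and centroids are illustrative.

```python
import numpy as np

class NearestCentroid:
    """Toy stand-in for an SVM: predicts the label of the closest centroid."""
    def __init__(self, centroids):
        self.centroids = centroids  # {label: feature vector}

    def predict(self, x):
        return min(self.centroids,
                   key=lambda label: np.linalg.norm(x - self.centroids[label]))

def identify_person(x, state_clf, pi_models):
    """Stage 1: recognize the posture state (standing/sitting/walking).
    Stage 2: run the person-identification model trained for that state."""
    state = state_clf.predict(x)
    return state, pi_models[state].predict(x)
```

The design point is that each per-state PI model only ever sees features from its own posture, which sidesteps the variability a single pose-agnostic model would have to absorb.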


International Conference on Computer and Communication Technology | 2011

Intonation modeling using FFNN for syllable based Bengali text to speech synthesis

V. Ramu Reddy; K. Sreenivasa Rao

This paper proposes an intonation model using a feedforward neural network (FFNN) for a syllable-based text-to-speech (TTS) synthesis system for the Indian language Bengali. The features used to train the neural network include positional, contextual and phonological features. The proposed intonation model predicts three F0 values corresponding to the initial, middle and final positions of each syllable. These three F0 values capture the broad shape of the F0 contour of the syllable. The prediction performance of the neural network model is compared with the Classification and Regression Tree (CART) model used by Festival for building the TTS system. Both CART and FFNN models are evaluated by means of objective measures such as average prediction error (μ), root mean squared error (RMSE) and correlation coefficient (γX,Y). The models are also evaluated using subjective listening tests.
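The prediction step can be sketched as a single forward pass of such a network; the input dimensionality, hidden-layer size and weights below are illustrative placeholders, not the paper's trained model.

```python
import numpy as np

def predict_syllable_f0(x, W1, b1, W2, b2):
    """Map a syllable's positional/contextual/phonological feature vector x
    to three F0 values (initial, middle, final) with one hidden layer."""
    h = np.tanh(x @ W1 + b1)   # nonlinear hidden layer
    return h @ W2 + b2         # linear output: [F0_init, F0_mid, F0_fin]
```

In practice the weights would be learned by backpropagation against F0 values measured from a labelled speech corpus; the three outputs are then interpolated to reconstruct the syllable's F0 contour.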


International Conference on Multimedia and Expo | 2014

Recognition of who is doing what from the stick model obtained from Kinect

V. Ramu Reddy; Kingshuk Chakravarty; Tanushyam Chattopadhyay; Aniruddha Sinha; Arpan Pal

In this demo, the authors demonstrate a method for identifying a person and his/her activities, such as sitting, standing and walking, using the skeleton information (stick model) obtained from Kinect. The setup is deployed in a drawing room for real-time Television Rating Point (TRP) measurement.


Proceedings of the 2nd Workshop on Emotion Representations and Modelling for Companion Systems | 2016

Emotion detection and recognition using HRV features derived from photoplethysmogram signals

Raj Rakshit; V. Ramu Reddy; Parijat Deshpande

Detection of true human emotions has attracted a lot of interest in recent years. Applications range from e-retail to health care, for developing effective companion systems with reliable emotion recognition. This paper proposes heart rate variability (HRV) features extracted from the photoplethysmogram (PPG) signal obtained from a cost-effective PPG device, such as a pulse oximeter, for detecting and recognizing emotions on the basis of physiological signals. HRV features from both the time and frequency domains are used for classification of emotions. These features are extracted from the entire PPG signal obtained during the emotion elicitation and baseline neutral phases. For analyzing emotion recognition using the proposed HRV features, standard video stimuli are used. We have considered three emotions, namely happy, sad, and neutral (null). Support vector machines are used for developing the models, and features are explored to achieve an average emotion recognition rate of 83.8% for the above model and listed features.
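The time-domain HRV features mentioned above are standard quantities computed from the inter-beat (RR) intervals extracted from PPG peak times; a minimal sketch (the exact feature set used in the paper may differ):

```python
import numpy as np

def hrv_time_features(rr_ms):
    """Standard time-domain HRV features from a series of inter-beat
    (RR) intervals in milliseconds."""
    rr = np.asarray(rr_ms, dtype=float)
    diff = np.diff(rr)
    return {
        "mean_rr": rr.mean(),
        "sdnn": rr.std(ddof=1),                     # overall variability
        "rmssd": np.sqrt(np.mean(diff ** 2)),       # short-term variability
        "pnn50": np.mean(np.abs(diff) > 50) * 100,  # % successive diffs > 50 ms
    }
```

Frequency-domain features (LF/HF band powers) would additionally require resampling the RR series to a uniform rate and estimating its power spectrum.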


International Conference on Automation, Robotics and Applications | 2015

Robotics audition using Kinect

V. Ramu Reddy; Parijat Deshpande; Ranjan Dasgupta

Robot audition systems are expected to support a variety of civilian and rescue applications in hazardous situations. Sensed data can only be interpreted meaningfully when referenced to the location of the sensor, making localization an important area of research. In this paper, we develop a sound source localization system for our Fire Bird VI robot. Localization algorithms such as cross correlation (CC), phase transform (PHAT) and maximum likelihood (ML) are explored to find the direction of arrival (DOA) by estimating the time delay of arrival (TDOA) from the signals received by the linear microphone array of the Kinect sensor. Sound signals comprising different sample pings, with pause durations ranging from 3 ms to 3 s and frequencies varying from 100 Hz to 5 kHz, are tested using different microphone pairs of the Kinect at azimuths (DOA) ranging from 0° to 180° and at distances of 1 m, 1.5 m and 2 m. The performance of the localization algorithms is evaluated by computing the error between the estimated DOA and the actual DOA. The PHAT algorithm is found to outperform the others; however, some errors remain, likely caused by inherent room reverberation effects. The authors also present practical limitations which result in errors at different azimuths, based on the distance between microphone pairs and the sampling frequency of the signals.
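The PHAT variant can be sketched as follows: whiten the cross-power spectrum of a microphone pair, take the peak of the resulting generalized cross-correlation as the TDOA, and convert it to an azimuth under a far-field assumption. The sampling rate, microphone spacing and speed of sound below are illustrative.

```python
import numpy as np

def gcc_phat_tdoa(sig, ref, fs):
    """TDOA between two microphone signals via the phase transform (PHAT):
    whiten the cross-spectrum, then peak-pick the cross-correlation."""
    n = len(sig) + len(ref)
    X = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    X /= np.abs(X) + 1e-12                 # PHAT weighting (unit magnitude)
    cc = np.fft.irfft(X, n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

def doa_degrees(tdoa, mic_dist, c=343.0):
    """Azimuth from TDOA for a far-field source and a mic pair mic_dist apart."""
    return np.degrees(np.arccos(np.clip(tdoa * c / mic_dist, -1.0, 1.0)))
```

A zero TDOA corresponds to a broadside source at 90°; the practical limits noted in the paper arise because the resolvable delay is quantized by the sampling period and bounded by mic_dist/c.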


Intelligent Systems Design and Applications | 2013

Fusion of spectral and time domain features for crowd noise classification system

V. Ramu Reddy; Aniruddha Sinha; Guruprasad Seshadri

In this paper, we explore spectral and time domain features for classification of crowd noise. Spectral information is represented by mel-frequency cepstral coefficients (MFCC) and the spectral flatness measure (SFM), whereas time domain information is represented by short-time energy (STE) and zero-crossing rate (ZCR). For these studies, crowd noise data collected from railway stations and book fairs have been used. Two categories of crowd noise, namely no crowd and crowd, are used. Support Vector Machines (SVM) are used to capture the discriminative information between the above-mentioned noise categories from the spectral and time domain features. The SVM models are developed separately using spectral and time domain features. The classification performance of the developed SVM models using spectral and time domain features is observed to be 91.35% and 84.65%, respectively. In this work, we have also examined the performance of the crowd noise classification system by combining the spectral and time domain information at the feature and score levels. The classification performance using feature and score level fusion is observed to be 93.10% and 96.25%, respectively.
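The two fusion strategies compared above can be sketched as follows; the weight `w` and the score vectors are illustrative, and the stand-in functions are not the paper's exact formulation.

```python
import numpy as np

def feature_fusion(spectral, temporal):
    """Feature-level fusion: concatenate the MFCC/SFM and STE/ZCR feature
    vectors and train a single classifier on the result."""
    return np.concatenate([spectral, temporal])

def score_fusion(scores_spectral, scores_temporal, w=0.5):
    """Score-level fusion: weighted sum of the per-class scores produced
    by two separately trained models."""
    return w * np.asarray(scores_spectral) + (1 - w) * np.asarray(scores_temporal)
```

Score-level fusion keeps the two classifiers decoupled, which is consistent with the paper's observation that it outperformed feature-level fusion here.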


Intelligent Human Computer Interaction | 2012

Better human computer interaction by enhancing the quality of text-to-speech synthesis

V. Ramu Reddy; K. Sreenivasa Rao

In this paper, we propose high-quality prosody models for enhancing the quality of text-to-speech (TTS) synthesis and providing better human computer interaction. In this study, prosody refers to the duration and intonation patterns of the sequence of syllables. Prosody models are developed using feedforward neural networks, and prosodic information is predicted from the linguistic and production constraints of syllables. The prediction accuracy of the proposed neural network based prosody models is compared objectively with the Classification and Regression Tree (CART) based prosody models used by Festival. Subjective listening tests are also performed to evaluate the quality of the synthesized speech generated by incorporating the predicted prosodic features. From the evaluation studies, it is observed that prediction accuracy is better for the neural network models than for the CART models.


Neurocomputing | 2016

Prosody modeling for syllable based text-to-speech synthesis using feedforward neural networks

V. Ramu Reddy; K. Sreenivasa Rao

Prosody plays an important role in improving the quality of a text-to-speech synthesis (TTS) system. In this paper, features related to linguistic and production constraints are proposed for modeling prosodic parameters such as the duration, intonation and intensity of syllables. The linguistic constraints are represented by positional, contextual and phonological features, and the production constraints are represented by articulatory features. Neural network models are explored to capture the implicit duration, F0 and intensity knowledge using the above-mentioned features. The prediction performance of the proposed neural network models is evaluated using objective measures such as average prediction error (μ), standard deviation (σ) and linear correlation coefficient (γX,Y). The prediction accuracy of the proposed neural network models is compared with other state-of-the-art prosody models used in TTS systems. The prediction accuracy of the proposed prosody models is also verified by conducting listening tests, after integrating the proposed prosody models into the baseline TTS system.


Special Session on Smart Medical Devices - From Lab to Clinical Practice | 2017

PerDMCS: Weighted Fusion of PPG Signal Features for Robust and Efficient Diabetes Mellitus Classification

V. Ramu Reddy; Anirban Dutta Choudhury; Srinivasan Jayaraman; Naveen Kumar Thokala; Parijat Deshpande; Venkatesh Kaliaperumal

Non-invasive detection of Diabetes Mellitus (DM) has attracted a lot of interest in recent years in pervasive health care. In this paper, we explore features related to heart rate variability (HRV) and the signal pattern of the waveform from the photoplethysmogram (PPG) signal for classifying DM (Type 2). The HRV features include time-domain (F1), frequency-domain (F2) and non-linear (F3) features, whereas the waveform features (F4) comprise quantities such as the height, width, slope and durations of the pulse. The study was carried out on 50 healthy subjects and 50 DM patients. Support Vector Machines (SVM) are used to capture the discriminative information between the healthy and DM categories from the proposed features. The SVM models are developed separately using the feature sets F1, F2, F3 and F4. The classification performance of the developed SVM models using time-domain, frequency-domain, non-linear and waveform features is observed to be 73%, 78%, 80% and 77%, respectively. The performance of the system using the combination of all features is 82%. In this work, the performance of the DM classification system obtained by combining the above-mentioned feature sets with different percentages of discriminative features from each set is also examined. Furthermore, weight-based fusion is performed using confidence values obtained from each model to find the optimal subset of features from each set, with optimal weights for each set. The best accuracy of 89% is obtained by score fusion, combining 90% of the features from sets F1 and F2 with 100% of the features from sets F3 and F4, using optimal fusion weights of 0.3 and 0.7, respectively.

Collaboration


Explore V. Ramu Reddy's collaborations.

Top Co-Authors

K. Sreenivasa Rao (Indian Institute of Technology Kharagpur)
Sudhamay Maity (Indian Institute of Technology Kharagpur)
Ranjan Dasgupta (Tata Consultancy Services)