Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Sudarsana Reddy Kadiri is active.

Publication


Featured research published by Sudarsana Reddy Kadiri.


Speech Communication | 2017

Epoch extraction from emotional speech using single frequency filtering approach

Sudarsana Reddy Kadiri; B. Yegnanarayana

Epochs are instants of significant excitation of the vocal tract system during the production of voiced speech. Existing methods for epoch extraction provide good results on neutral speech, but their effectiveness has not been examined carefully for emotional speech, where the emotion characteristics are embedded mainly in the source component of the signal. The performance of state-of-the-art epoch extraction methods on emotional speech may be affected by large variations in the pitch period. This paper explores an approach that exploits the nature of impulse-like excitation in the speech signal, rather than pitch period information. Three properties of an impulse are used to extract impulse-like discontinuities: (a) an impulse in the time domain results in a flat spectrum in the frequency domain; (b) an impulse event in the time domain is a high-energy event, i.e., the strength of the impulse is substantially larger than the strengths of the samples in its neighborhood; and (c) the effect of an impulse is spread across all frequencies. The approach uses single frequency filtering (SFF) analysis of speech signals, which provides high temporal resolution for some features of the excitation source (such as impulse-like events) and high spectral resolution for some features of the spectrum (such as harmonics and resonances). The Berlin emotional speech database (EMO-DB), which contains simultaneous electroglottograph (EGG) recordings, is used as the ground truth. For comparison, several epoch extraction methods are evaluated in terms of both reliability and accuracy measures for six emotion categories and neutral speech. The results indicate that the performance of the proposed SFF-based methods on emotional speech is comparable to that on neutral speech, and better than that of many standard methods.
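For readers unfamiliar with single frequency filtering, the sketch below illustrates the basic SFF operation at one frequency as it is commonly described: the signal is frequency-shifted so that the frequency of interest lands at fs/2 and is then passed through a single-pole filter with its pole on the negative real axis close to the unit circle. This is only a minimal Python/NumPy illustration; the value of r, the bank of analysis frequencies, and the epoch-detection steps built on top of the SFF envelopes are assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.signal import lfilter

def sff_analysis(x, fs, f_k, r=0.995):
    """Minimal single frequency filtering (SFF) sketch at one frequency f_k.

    The signal is frequency-shifted so that f_k maps to fs/2 and then passed
    through H(z) = 1 / (1 + r z^{-1}), a single-pole filter whose pole lies at
    z = -r, close to the unit circle.  The magnitude of the complex output is
    the SFF envelope at f_k; its angle is the SFF phase.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    omega_shift = np.pi - 2.0 * np.pi * f_k / fs        # shift f_k to fs/2
    x_shifted = x * np.exp(1j * omega_shift * n)         # complex frequency shift
    y = lfilter([1.0], [1.0, r], x_shifted)              # pole at z = -r
    return np.abs(y), np.angle(y)                        # envelope, phase
```

In the paper, epoch candidates are then derived from impulse-like behaviour of such envelopes across a bank of frequencies; that step is omitted here.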


Toward Robotic Socially Believable Behaving Systems (I) | 2016

Analysis of Emotional Speech—A Review

P. Gangamohan; Sudarsana Reddy Kadiri; B. Yegnanarayana

Speech carries information not only about the lexical content, but also about the age, gender, signature and emotional state of the speaker. Speech in different emotional states is accompanied by distinct changes in the production mechanism. In this chapter, we present a review of analysis methods used for emotional speech. In particular, we focus on the issues in data collection, feature representations and development of automatic emotion recognition systems. The significance of the excitation source component of speech production in emotional states is examined in detail. The derived excitation source features are shown to carry the emotion correlates.


International Conference on Acoustics, Speech, and Signal Processing | 2015

Analysis of singing voice for epoch extraction using Zero Frequency Filtering method

Sudarsana Reddy Kadiri; B. Yegnanarayana

Epoch is the instant of significant excitation of the vocal tract system during the production of voiced speech. Estimation of epochs, or glottal closure instants (GCIs), is a well studied topic in speech analysis. Recent studies on GCI detection from singing voice using state-of-the-art methods proposed for speech show a clear gap in accuracy between speech and singing voice. This is because of the stronger source-filter interaction in singing voice compared to speech. The performance of existing algorithms deteriorates because most of the techniques depend on the ability to model the vocal tract system in order to emphasize the excitation characteristics in the residual. The objective of this paper is to analyze singing voice for the estimation of epochs by studying the characteristics of the source-filter interaction and the effect of the wider range of pitch using the Zero Frequency Filtering (ZFF) method. It is observed that high source-filter interaction can be captured in the form of impulse-like excitation by passing the signal through three ideal digital resonators having poles at zero frequency, and the effect of the wider range of pitch can be controlled by processing short signal segments (0.4-0.5 s).
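As a rough illustration of the zero frequency filtering operation described above, the sketch below cascades resonators with poles at 0 Hz and removes the resulting polynomial trend by local-mean subtraction; positive-going zero crossings of the trend-removed output are taken as epoch candidates. The trend-removal window, the choice to remove the trend after every resonator stage, and the NumPy/SciPy formulation are illustrative assumptions, not the paper's exact procedure (which additionally processes short 0.4-0.5 s segments to handle the wide pitch range of singing).

```python
import numpy as np
from scipy.signal import lfilter

def zff_epochs(x, fs, win_ms=5.0, n_resonators=3):
    """Sketch of zero frequency filtering (ZFF) for epoch (GCI) estimation.

    Each zero-frequency resonator is the filter 1 / (1 - z^{-1})^2, i.e. a
    double pole at 0 Hz.  The polynomial trend it introduces is removed by
    subtracting a local mean; positive-going zero crossings of the
    trend-removed signal are taken as epoch locations.
    """
    win = max(1, int(round(win_ms * 1e-3 * fs)))
    kernel = np.ones(2 * win + 1) / (2 * win + 1)

    y = np.asarray(x, dtype=float)
    y = np.diff(y, prepend=y[0])                          # remove DC bias
    for _ in range(n_resonators):
        y = lfilter([1.0], [1.0, -2.0, 1.0], y)           # resonator with double pole at 0 Hz
        y = y - np.convolve(y, kernel, mode="same")       # local-mean trend removal

    # positive-going zero crossings of the trend-removed signal
    return np.where((y[:-1] < 0) & (y[1:] >= 0))[0] + 1
```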


Circuits, Systems, and Signal Processing | 2016

Vowel-Based Non-uniform Prosody Modification for Emotion Conversion

Hari Krishna Vydana; Sudarsana Reddy Kadiri; Anil Kumar Vuppala

The objective of this work is to develop a rule-based emotion conversion method for better perception of emotion. In this work, the performance of emotion conversion using the linear modification model is improved by vowel-based non-uniform prosody modification. In the present approach, features such as vowel position and identity are integrated to address the non-uniformity in prosody generated by the emotional state of the speaker. We mainly concentrate on parameters such as the strength, duration and pitch contour of vowels at different parts of the sentence. The influence of emotions on these parameters is exploited to convert speech from the neutral state to the target emotion. Non-uniform prosody modification factors for emotion conversion are based on the position of the vowel in the word and the position of the word in the sentence. This study is carried out using the Indian Institute of Technology Simulated Emotion speech corpus. The proposed algorithm is evaluated by a subjective listening test, from which it is observed that the performance of the proposed approach is better than that of existing approaches.
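The rule-based nature of the method can be pictured as a lookup of modification factors keyed by vowel position. The sketch below is purely illustrative: the position categories, the factor values and the function name are hypothetical placeholders, not the factors derived in the paper.

```python
# Hypothetical non-uniform prosody modification factors for one target emotion.
# Keys: (position of the vowel in the word, position of the word in the sentence).
# The numeric values are placeholders, not the factors reported in the paper.
MOD_FACTORS = {
    ("initial", "initial"): {"duration": 0.90, "pitch": 1.15, "strength": 1.10},
    ("initial", "final"):   {"duration": 0.95, "pitch": 1.05, "strength": 1.05},
    ("final",   "initial"): {"duration": 0.85, "pitch": 1.20, "strength": 1.15},
    ("final",   "final"):   {"duration": 1.00, "pitch": 1.00, "strength": 1.00},
}

def factors_for(vowel_pos_in_word, word_pos_in_sentence):
    """Return (duration, pitch, strength) modification factors for a vowel,
    falling back to no modification when the position pair is not listed."""
    return MOD_FACTORS.get((vowel_pos_in_word, word_pos_in_sentence),
                           {"duration": 1.0, "pitch": 1.0, "strength": 1.0})
```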


Conference of the International Speech Communication Association | 2016

Robust Estimation of Fundamental Frequency Using Single Frequency Filtering Approach.

Vishala Pannala; G. Aneeja; Sudarsana Reddy Kadiri; B. Yegnanarayana

A new method for robust estimation of the fundamental frequency (F0) from the speech signal is proposed in this paper. The method exploits the high-SNR regions of speech in the time and frequency domains in the outputs of single frequency filtering (SFF) of the speech signal. The high resolution in the frequency domain brings out the harmonic characteristics of speech clearly, and the harmonic spacing in the high-SNR regions of the spectrum determines F0. The concept of the root cepstrum is used to reduce the effects of vocal tract resonances in the F0 estimation. The proposed method is evaluated on clean speech and on noisy speech simulated with 15 different degradations at different noise levels, and its performance is compared with four other standard methods of F0 extraction. From the results it is evident that the proposed method is robust for most types of degradation.
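The root-cepstrum idea mentioned in the abstract can be sketched as follows: instead of the logarithm, the magnitude spectrum is raised to a small power before the inverse transform, which de-emphasises vocal tract resonances relative to the harmonic structure. This frame-level sketch uses a plain FFT magnitude in place of the SFF spectrum, and the root exponent and search range are assumptions; the paper's method additionally restricts the analysis to high-SNR time-frequency regions of the SFF output.

```python
import numpy as np

def f0_root_cepstrum(frame, fs, fmin=60.0, fmax=400.0, root=0.3):
    """Illustrative frame-level F0 estimate from a root cepstrum.

    The windowed frame's magnitude spectrum is raised to a small power
    ('root') rather than log-compressed, and the inverse FFT of that root
    spectrum is searched for a peak in the quefrency range corresponding to
    plausible pitch periods.  The frame should be long enough (e.g. 30-50 ms)
    to contain the lowest expected pitch period.
    """
    frame = np.asarray(frame, dtype=float) * np.hanning(len(frame))
    spec = np.abs(np.fft.rfft(frame)) ** root       # root spectrum
    ceps = np.fft.irfft(spec)                       # root cepstrum
    lo, hi = int(fs / fmax), int(fs / fmin)         # lag range for F0 search
    lag = lo + np.argmax(ceps[lo:hi])
    return fs / lag
```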


International Conference on Industrial and Information Systems | 2014

Neutral to anger speech conversion using non-uniform duration modification

Anil Kumar Vuppala; Sudarsana Reddy Kadiri

In this paper, non-uniform duration modification is exploited along with other prosody features for neutral-to-anger speech conversion. The non-uniform duration modification method modifies the durations of vowel and pause segments by different modification factors: vowel segments are modified by factors based on their identities, pause segments by uniform factors, and consonant and transition segments are left unmodified. These modification factors are derived from the analysis of neutral and anger speech. For this purpose, a well-known Indian database, the Indian Institute of Technology Kharagpur Simulated Emotion Speech Corpus (IITKGP-SESC), is chosen for the analysis of emotions and the synthesis of emotions from neutral speech. The prosodic features used in this study for emotion conversion are the pitch contour, intensity contour, and duration contour. Subjective listening test results show that the perception of emotion is more effective with non-uniform duration modification than with uniform duration modification.
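A segment-wise view of non-uniform duration modification is sketched below: vowels and pauses are time-stretched by type-dependent factors while consonant and transition segments are left unchanged. The factor values are placeholders (the paper's vowel factors depend on vowel identity), the segment labels are assumed to come from an external alignment, and librosa's phase-vocoder time stretch stands in for the duration modification technique actually used in such work.

```python
import numpy as np
import librosa

# Placeholder duration factors per segment type (not the paper's values).
DURATION_FACTORS = {"vowel": 0.85, "pause": 0.70, "consonant": 1.0, "transition": 1.0}

def modify_durations(y, fs, segments):
    """Non-uniform duration modification sketch.

    `segments` is a list of (start_sec, end_sec, seg_type) tuples that covers
    the utterance contiguously, e.g. from a forced alignment.  Each segment is
    time-stretched by a factor depending on its type.
    """
    out = []
    for start, end, seg_type in segments:
        chunk = y[int(start * fs):int(end * fs)].astype(float)
        factor = DURATION_FACTORS.get(seg_type, 1.0)
        # librosa's rate > 1 shortens a segment, so rate = 1/factor gives a
        # new duration of factor * original duration.
        out.append(librosa.effects.time_stretch(chunk, rate=1.0 / factor))
    return np.concatenate(out)
```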


Computer Speech & Language | 2019

Spectral and temporal manipulations of SFF envelopes for enhancement of speech intelligibility in noise

Nivedita Chennupati; Sudarsana Reddy Kadiri; B. Yegnanarayana

This paper presents a method for modifying speech to enhance its intelligibility in noise. The features contributing to intelligibility are analyzed using the recently proposed single frequency filtering (SFF) analysis of speech signals. In the SFF method, the spectral and temporal resolutions can be controlled using a single parameter of the filter, corresponding to the location of the pole on the negative real axis with respect to the unit circle in the z-plane. The SFF magnitude (envelope) and phase at several frequencies can be used to synthesize the original speech signal. Analysis of highly intelligible speech shows that the speech signal is more intelligible when it has a higher dynamic range of amplitude locally (fine structure) and/or a lower dynamic range of amplitude globally (gross structure) in both the spectral and temporal domains. Some features of normal speech are modified at the fine and gross temporal and spectral levels, and the modified SFF envelopes are used to synthesize speech. The proposed method gives higher objective scores of intelligibility than the original speech and the reference method (spectral shaping and dynamic range compression) under different noise conditions. In subjective evaluation, although the word accuracies are not significantly different between the proposed and reference methods, listeners seem to prefer the proposed method as it gives louder and crisper sound.
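One way to picture the envelope manipulation is as separate control of the gross (slowly varying) and fine (local) dynamic ranges of an SFF envelope, compressing the former and expanding the latter. The smoothing window, the exponents, and the power-law form itself are illustrative assumptions; the paper defines its own fine and gross modifications in both the spectral and temporal domains.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

def manipulate_envelope(env, fs, smooth_ms=50.0, gross_gamma=0.6, fine_gamma=1.5):
    """Illustrative manipulation of a non-negative SFF amplitude envelope.

    The envelope is split into a slowly varying (gross) component and a fine
    residual.  The gross component is compressed (exponent < 1, reducing the
    global dynamic range) and the fine component is expanded (exponent > 1,
    increasing the local dynamic range), loosely following the trends reported
    for highly intelligible speech.
    """
    win = max(1, int(round(smooth_ms * 1e-3 * fs)))
    gross = uniform_filter1d(np.asarray(env, dtype=float), size=win)  # slow part
    fine = env / np.maximum(gross, 1e-12)                             # local modulation
    return (gross ** gross_gamma) * (fine ** fine_gamma)
```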


Speech Communication | 2018

Significance of phase in single frequency filtering outputs of speech signals

Nivedita Chennupati; Sudarsana Reddy Kadiri; B. Yegnanarayana

Studies on the phase component of signals are important due to the complementary information it provides beyond the amplitude information. Though most studies have focused on the phase of the short-time Fourier transform (STFT), there are other forms of phase, such as the phase of an analytic signal and the phase of signals obtained through filtering operations. In this paper, the significance of the phase of the single frequency filtering (SFF) output of signals is examined. The single frequency filter has a pole on the negative real axis, close to the unit circle in the z-plane, and the pole location parameter (r) controls the bandwidth of the filter at each frequency. Using a sufficient number of filters in the SFF analysis, the speech signal can be reconstructed from the filtered signals without distortion. The relative importance of the SFF magnitude and SFF phase for reconstruction of the signal is examined for different values of r, by interchanging the magnitude and phase components obtained for two different values of r, as well as for two different utterances with the same value of r. It is observed that the intelligibility of the reconstructed signal is high if the SFF phase for values of r close to unity is used. The information in the reconstructed signal is dominated by the phase component of the SFF output.
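The magnitude/phase swap experiment can be pictured with the sketch below, which combines the SFF magnitude of one analysis with the SFF phase of another and resynthesises by shifting each channel back to its original frequency and summing. It assumes complex SFF outputs of shape (n_freqs, n_samples), computed for a bank of frequencies as in the earlier SFF sketch, and the simple average-over-channels synthesis is an approximation of the reconstruction used in the paper.

```python
import numpy as np

def swap_and_reconstruct(sff_mag_src, sff_phase_src, freqs, fs):
    """Reconstruct a signal from the SFF magnitude of one analysis and the
    SFF phase of another.

    Both inputs are complex arrays of shape (n_freqs, n_samples), e.g. the
    filtered outputs of an SFF filter bank.  Each combined channel is shifted
    back from fs/2 to its original frequency and the real parts are averaged.
    """
    combined = np.abs(sff_mag_src) * np.exp(1j * np.angle(sff_phase_src))
    n = np.arange(combined.shape[1])
    out = np.zeros(combined.shape[1])
    for k, f_k in enumerate(freqs):
        omega_shift = np.pi - 2.0 * np.pi * f_k / fs      # undo the analysis shift
        out += np.real(combined[k] * np.exp(-1j * omega_shift * n))
    return out / len(freqs)
```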


International Conference on Acoustics, Speech, and Signal Processing | 2017

Speech polarity detection using strength of impulse-like excitation extracted from speech epochs

Sudarsana Reddy Kadiri; B. Yegnanarayana

In this paper, we address the issue of speech polarity detection using the strength of impulse-like excitation around epochs. Correct detection of speech polarity is a crucial preliminary step for many speech processing algorithms, and errors in polarity detection can affect the performance of speech systems. We propose a method based on the knowledge of impulse-like excitation in the speech production mechanism: the impulse-like excitation is reflected across all frequencies, including the zero frequency (0 Hz). Using the slope around the zero crossings of the zero frequency filtered signal, an automatic speech polarity detection method is proposed. The performance of the proposed method is demonstrated on 8 different speech corpora and compared with three existing techniques: gradient of the spurious glottal waveforms (GSGW), oscillating moments-based polarity detection (OMPD) and residual excitation skewness (RESKEW). From the experimental results, it is observed that the performance of the proposed method is comparable to or better than that of the existing methods for the experiments considered.
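A simplified version of the slope-based decision can be sketched as below: compute a zero frequency filtered signal (for example with the ZFF sketch shown earlier), then compare slope magnitudes at its positive-going and negative-going zero crossings. The specific decision rule here is an assumption for illustration; the paper's measure of the strength of impulse-like excitation around epochs is more refined.

```python
import numpy as np

def detect_polarity(zff_signal):
    """Illustrative polarity decision from a zero frequency filtered signal.

    Compares the mean slope magnitude at positive-going versus negative-going
    zero crossings of the ZFF signal of a voiced utterance.  Returns +1 for
    positive polarity, -1 for negative polarity.
    """
    y = np.asarray(zff_signal, dtype=float)
    slope = np.diff(y)
    pos_zc = np.where((y[:-1] < 0) & (y[1:] >= 0))[0]   # positive-going crossings
    neg_zc = np.where((y[:-1] > 0) & (y[1:] <= 0))[0]   # negative-going crossings
    return 1 if np.mean(np.abs(slope[pos_zc])) >= np.mean(np.abs(slope[neg_zc])) else -1
```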


Conference of the International Speech Communication Association | 2015

Analysis of excitation source features of speech for emotion recognition.

Sudarsana Reddy Kadiri; P. Gangamohan; Suryakanth V. Gangashetty; B. Yegnanarayana

Collaboration


Dive into Sudarsana Reddy Kadiri's collaborations.

Top Co-Authors

B. Yegnanarayana (International Institute of Information Technology)
P. Gangamohan (International Institute of Information Technology)
Suryakanth V. Gangashetty (International Institute of Information Technology)
Anil Kumar Vuppala (International Institute of Information Technology)
K. N. R. K. Raju Alluri (International Institute of Information Technology)
Nivedita Chennupati (International Institute of Information Technology)
Sivanand Achanta (International Institute of Information Technology)
Hari Krishna Vydana (International Institute of Information Technology)
Vinay Kumar Mittal (International Institute of Information Technology)