D. Govind
Amrita Vishwa Vidyapeetham
Publication
Featured research published by D. Govind.
International Journal of Speech Technology | 2013
D. Govind; S. R. M. Prasanna
The objective of the present work is to provide a detailed review of expressive speech synthesis (ESS). Among the various approaches to ESS, the present paper focuses on the development of ESS systems by explicit control. In this approach, ESS is achieved by modifying the parameters of neutral speech synthesized from text. The paper reviews the work addressing the various issues involved in developing ESS systems by explicit control: the major approaches to text-to-speech synthesis, studies on the analysis and estimation of expressive parameters, and methods for incorporating expressive parameters. Finally, the review concludes by outlining the scope of future work on ESS by explicit control.
international conference on signal processing | 2012
D. Govind; S. R. M. Prasanna
This work proposes a modified zero frequency filtering (ZFF) method for epoch extraction from emotional speech. Epochs refer to the instants of maximum excitation of the vocal tract. In the conventional ZFF method, epochs are estimated by trend-removing the output of the zero frequency resonator (ZFR) using a window length equal to the average pitch period of the utterance. Using this fixed window length causes spurious or missed epoch estimates in speech signals with rapid pitch variations, as in emotional speech. This work therefore proposes a refined ZFF method that trend-removes the ZFR output using variable windows, obtained by computing the average pitch period for every fixed block of speech, and then low-pass filters the resulting trend-removed signal segments using the estimated pitch as the cutoff frequency. Epoch estimation performance is evaluated for five emotions in the German emotional speech corpus, which includes simultaneous electroglottograph (EGG) recordings. The improved epoch estimation performance indicates the robustness of the proposed method against rapid pitch variations in emotional speech signals. The effectiveness of the proposed method is further confirmed by improved epoch estimation performance on the Hindi emotional speech database.
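The block-wise idea behind this method can be illustrated with a minimal sketch: pass the signal through a cascade of two zero-frequency (0 Hz) resonators, then remove the trend per block with a moving-average window tied to a crude per-block pitch estimate, and read epochs off the zero crossings. The block size, pitch search range and autocorrelation pitch estimator below are illustrative assumptions, not the paper's exact procedure (and the paper's final low-pass refinement step is omitted).

```python
import numpy as np

def zff_epochs(x, fs, block_s=0.5, default_pitch_hz=150.0):
    """Sketch of block-wise ZFF-style epoch extraction.

    The signal is passed twice through an ideal zero-frequency
    resonator (a double integrator), the slowly varying trend is
    removed per block with a moving-average window derived from a
    crude autocorrelation pitch estimate, and epoch candidates are
    the negative-to-positive zero crossings of the result.
    """
    x = np.asarray(x, dtype=float)
    x = np.diff(x, prepend=x[0])        # difference to suppress DC offset
    y = np.cumsum(np.cumsum(x))         # two passes of the 0-Hz resonator

    z = np.empty_like(y)
    block = max(1, int(block_s * fs))
    for start in range(0, len(y), block):
        seg = y[start:start + block]
        xs = x[start:start + block]
        xs = xs - xs.mean()
        T = int(fs / default_pitch_hz)  # fallback average pitch period
        if len(xs) > 4 * T:
            # crude average pitch period for this block (60-400 Hz search)
            ac = np.correlate(xs, xs, 'full')[len(xs) - 1:]
            lo, hi = int(fs / 400), int(fs / 60)
            T = lo + int(np.argmax(ac[lo:hi]))
        w = 2 * T + 1                   # trend-removal window for this block
        trend = np.convolve(seg, np.ones(w) / w, mode='same')
        z[start:start + block] = seg - trend

    # epochs: negative-to-positive zero crossings of the filtered signal
    return np.where((z[:-1] < 0) & (z[1:] >= 0))[0] + 1
```

On a synthetic 100 Hz impulse train at 8 kHz, the detected epochs land roughly one per pitch period, which is the behavior the variable-window trend removal is meant to preserve under pitch variation.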
International Journal of Speech Technology | 2013
D. Govind; S. R. Mahadeva Prasanna
Modifying prosodic parameters such as the pitch, duration and strength of excitation by desired factors is termed prosody modification. The objective of this work is to develop a dynamic prosody modification method based on the zero frequency filtered signal (ZFFS), a byproduct of zero frequency filtering (ZFF). Existing epoch-based prosody modification techniques use epochs as pitch markers, and the required prosody modification is achieved by interpolating the plot of epoch intervals. Alternatively, this work proposes a method for prosody modification by resampling the ZFFS. The existing epoch-based prosody modification method is also refined to modify the prosodic parameters at the level of every epoch, providing more flexibility for prosody modification. The general framework for deriving the modified epoch locations can also be used to obtain dynamic prosody modification from the existing PSOLA and epoch-based prosody modification methods. The quality of the prosody-modified speech is evaluated using waveforms, spectrograms and subjective studies. The usefulness of the proposed dynamic prosody modification is demonstrated for a neutral-to-emotional conversion task. The subjective evaluations performed for emotion conversion indicate the effectiveness of dynamic prosody modification over fixed prosody modification. The dynamic prosody-modified speech files synthesized using the proposed, epoch-based and TD-PSOLA methods are available at http://www.iitg.ac.in/eee/emstlab/demos/demo5.php.
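The core of epoch-based pitch modification, with epochs as pitch markers, can be sketched as follows: scale each epoch interval by a (possibly per-epoch, i.e. dynamic) pitch factor and accumulate the scaled intervals into new epoch locations. This is a minimal illustration of the interval-manipulation step only; the waveform synthesis around the new epochs is omitted.

```python
import numpy as np

def modify_epochs(epochs, pitch_factors):
    """Sketch of epoch-based pitch modification via interval scaling.

    `epochs` are sample indices of epochs in the original speech.
    `pitch_factors` is either a single scale factor (static
    modification) or one factor per epoch interval (dynamic
    modification). New intervals are the original intervals divided
    by the factor, so a factor > 1 shortens the intervals and raises
    the pitch; new epoch locations accumulate the scaled intervals.
    """
    epochs = np.asarray(epochs, dtype=float)
    intervals = np.diff(epochs)                      # original pitch periods
    factors = np.broadcast_to(np.asarray(pitch_factors, float),
                              intervals.shape)
    new_intervals = intervals / factors
    return np.concatenate(([epochs[0]],
                           epochs[0] + np.cumsum(new_intervals)))
```

For example, epochs at every 80 samples with a static factor of 2.0 become epochs every 40 samples, while a per-interval factor list yields a time-varying contour without the piecewise-constant steps of repeating one fixed factor.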
Signal, Image and Video Processing | 2017
V. Sowmya; D. Govind; K. P. Soman
This paper provides an alternative framework for color-to-grayscale image conversion that exploits the chrominance information in the color image using singular value decomposition (SVD). In the proposed technique, a weight matrix corresponding to the chrominance components is derived by reconstructing the chrominance data matrix (planes a* and b*) from the singular values and singular vectors computed using SVD. The final grayscale image is obtained by adding the weighted chrominance data to the lightness component, which is kept intact, in the CIEL*a*b* color space of the given color image. The effectiveness of the proposed grayscale conversion is confirmed by a comparative analysis on the color-to-gray benchmark dataset against 10 existing algorithms using the standard objective measures, namely normalized cross-correlation, color contrast preservation ratio, color content fidelity ratio and E score, as well as subjective evaluation.
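The overall pipeline can be sketched with a toy implementation: stack the a* and b* planes into one chrominance data matrix, rebuild a rank-k approximation from its SVD, fold the reconstruction back into a per-pixel chrominance weight, and add it to the untouched lightness. The rank `k`, the simple additive folding of the two planes and the blend weight `alpha` are illustrative assumptions, not the paper's actual weighting scheme.

```python
import numpy as np

def svd_gray(L, a, b, k=1, alpha=0.1):
    """Sketch of an SVD-weighted color-to-gray conversion.

    L, a, b are the CIELAB planes as 2-D arrays of equal shape. The
    two chrominance planes are stacked side by side, a rank-k
    approximation is rebuilt from the SVD of that data matrix, the two
    halves are folded back into a single chrominance weight image, and
    the weighted chrominance is added to the intact lightness.
    """
    C = np.hstack([a, b])                        # chrominance data matrix
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    Ck = (U[:, :k] * s[:k]) @ Vt[:k]             # rank-k reconstruction
    w = Ck[:, :a.shape[1]] + Ck[:, a.shape[1]:]  # fold halves to image shape
    return np.clip(L + alpha * w, 0.0, 100.0)    # L* range is 0..100
```

With opposite-signed constant chrominance planes the folded weight cancels and the output equals the lightness plane, which makes the role of the chrominance weight easy to see.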
International Journal of Speech Technology | 2017
D. Pravena; D. Govind
The work presented in this paper focuses on the development of a simulated emotion database, particularly for excitation source analysis. The presence of simultaneous electroglottogram (EGG) recordings for each emotional utterance helps to accurately analyze the variations in the source parameters across emotions. The paper describes the development of a comparatively large simulated emotion database for three emotions (anger, happiness and sadness), along with neutrally spoken utterances, in three languages (Tamil, Malayalam and Indian English). The utterances in each language are recorded from 10 speakers, in multiple sessions for Tamil and Malayalam. Unlike existing simulated emotion databases, emotionally biased rather than emotionally neutral utterances are used for recording. Based on the emotion recognition experiments, the emotions elicited from emotionally biased utterances show greater emotion discrimination than those from emotionally neutral utterances. A comparative experimental analysis also shows that the speech and EGG utterances of the proposed database preserve the general trend in the excitation source characteristics (instantaneous F0 and strength of excitation) across emotions, as in the classical German emotion speech-EGG database (EmoDb). Finally, the emotion recognition rates obtained for the proposed speech-EGG emotion database using a conventional mel frequency cepstral coefficient (MFCC) and Gaussian mixture model (GMM) based emotion recognition system are found to be comparable with those of the existing German (EmoDb) and IITKGP-SESC Telugu speech emotion databases.
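The MFCC-GMM recognition step used for the validation experiments can be sketched generically: one diagonal-covariance GMM is trained per emotion, an utterance's MFCC frames are scored against every model, and the best-scoring emotion wins. The sketch below shows only the scoring and decision; MFCC extraction and EM training are omitted, and the toy model parameters in the usage example are purely illustrative.

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of feature vectors X (N x D)
    under a diagonal-covariance GMM with K components."""
    lls = []
    for w, mu, var in zip(weights, means, variances):
        d = X - mu
        ll = (np.log(w)
              - 0.5 * np.sum(np.log(2 * np.pi * var))
              - 0.5 * np.sum(d * d / var, axis=1))
        lls.append(ll)
    lls = np.stack(lls)                 # (K, N) per-component log-liks
    m = lls.max(axis=0)                 # log-sum-exp over components
    return float(np.mean(m + np.log(np.exp(lls - m).sum(axis=0))))

def classify(X, models):
    """Pick the emotion whose GMM scores the feature frames highest.
    `models` maps emotion name -> (weights, means, variances)."""
    return max(models, key=lambda e: gmm_loglik(X, *models[e]))
```

In a real system the means and variances would come from EM training on MFCC frames of each emotion class; here a frame sequence near one model's mean is simply assigned to that emotion.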
Circuits Systems and Signal Processing | 2016
D. Govind; Tinu T. Joy
Modification of suprasegmental features such as the pitch and duration of original speech by fixed scaling factors is referred to as static prosody modification. In dynamic prosody modification, the prosodic scaling factors (time-varying modification factors) are defined for all the pitch cycles present in the original speech. The present work focuses on improving the naturalness of the prosody-modified speech by reducing the generation of piecewise constant segments in the modified pitch contour. The prosody modification is performed by anchoring around the accurate instants of significant excitation estimated from the original speech. The division of longer pitch intervals into many equal intervals over long speech segments introduces step-like discontinuities, in the form of piecewise constant segments, in the modified pitch contours. The effectiveness of the proposed dynamic modification method is initially confirmed from the smooth modified pitch contour plots obtained for finer static prosody scaling factors, waveforms, spectrogram plots and comparative subjective evaluations. Also, the average F0 jitter computed from the pitch segments of each glottal activity region in the modified speech is proposed as an objective measure for prosody modification. The naturalness of the prosody-modified speech produced by the proposed method is objectively and subjectively compared with that of the existing zero frequency filtered signal-based dynamic prosody modification. The proposed algorithm also effectively preserves the dynamics of the prosodic patterns in singing voices.
international conference on signal processing | 2014
D. Govind; Anju Susan Biju; Aguthu Smily
international conference on signal processing | 2014
Nagaraj Adiga; D. Govind; S. R. Mahadeva Prasanna
F_0
2016 Twenty Second National Conference on Communication (NCC) | 2016
D. Govind; P.M. Hisham; D. Pravena
international conference on signal and image processing applications | 2015
D. Govind; R. Vishnu; D. Pravena