Vinay Kumar Mittal
International Institute of Information Technology, Hyderabad
Publication
Featured research published by Vinay Kumar Mittal.
Journal of the Acoustical Society of America | 2013
Vinay Kumar Mittal; B. Yegnanarayana
In this paper characteristics of speech produced at different loudness levels are analyzed in terms of changes in the glottal excitation. Four loudness levels are considered in this study, namely, soft, normal, loud, and shout. The distinct changes in the excitation of the shout signal are analyzed using electroglottograph signals. The open and closed phases of the glottal vibration are distinctly different for shout signals, in comparison with those for normal speech. It is generally difficult to derive the glottal pulse information from the speech signal due to limitations in inverse filtering. Hence, the effects of changes in the excitation are examined by analyzing the speech signal using methods that can capture the temporal variations of the spectral features. In particular, the recently proposed methods of zero-frequency filtering and zero-time liftering are used in this analysis. It is shown that the closed phase behavior of the excitation at different loudness levels can be seen in the temporal variation of spectral energy in the low frequency (LF) (<400 Hz) region. The ratio of the LF to high frequency energy clearly discriminates the speech produced at different loudness levels. These distinctions in the excitation features are also observed in different vowel contexts and across several speakers.
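The low-to-high frequency energy ratio mentioned in the abstract can be illustrated with a minimal sketch. It uses a plain FFT over a whole frame rather than the zero-time liftering method used in the paper, and the function name `band_energy_ratio`, the synthetic test signals, and the 8 kHz sampling rate are assumptions made only for illustration. The 400 Hz split follows the abstract.

```python
import numpy as np

def band_energy_ratio(signal, sample_rate, split_hz=400.0):
    """Ratio of spectral energy below split_hz to the energy at or above it.

    Illustrative only: a whole-frame FFT, not the paper's zero-time liftering.
    """
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    low = spectrum[freqs < split_hz].sum()
    high = spectrum[freqs >= split_hz].sum()
    return low / high if high > 0 else float("inf")

# Hypothetical synthetic frames: a 150 Hz component plus a 1500 Hz component
# whose amplitude differs between the two frames.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
low_dominant = np.sin(2 * np.pi * 150 * t) + 0.1 * np.sin(2 * np.pi * 1500 * t)
balanced = np.sin(2 * np.pi * 150 * t) + 1.0 * np.sin(2 * np.pi * 1500 * t)

print(band_energy_ratio(low_dominant, sample_rate))
print(band_energy_ratio(balanced, sample_rate))
```

The two synthetic frames differ only in how much energy sits above 400 Hz, so the ratio separates them. Which direction the ratio moves for real soft, normal, loud, and shouted speech is a result of the paper and is not reproduced here.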
Consumer Communications and Networking Conference | 2013
Vinay Kumar Mittal; B. Yegnanarayana
Shouted speech or screaming signals have been studied mostly through spectral representations such as mel-cepstral coefficients. Intuitive evidence that the characteristics of the excitation source may vary in the case of shouted speech has so far drawn little attention. In this paper, we examine how the characteristics of both components of the speech production mechanism, especially the glottal excitation source, are modified during the production of shout signals. Shouted and normal speech signals are examined along with the corresponding electroglottograph (EGG) signals. Distinguishing features like the dominant frequency and the strength of excitation are explored, along with the instantaneous fundamental frequency. These features are computed using linear prediction analysis and zero-frequency filtering of the speech signal. The efficacy of these features in discriminating between shouted and normal speech is tested in five different vowel contexts.
Journal of the Acoustical Society of America | 2015
Vinay Kumar Mittal; B. Yegnanarayana
The feasibility of representing the excitation source characteristics in expressive voice signals by an aperiodic sequence of impulses in the time domain is examined in this paper. In particular, the aperiodic components of excitation of expressive voices, like the Noh voice, are examined in some detail. The aperiodic component is extracted from the speech signal using a modified zero-frequency filtering method, and it is represented using a sequence of impulses with amplitudes corresponding to the relative strength of excitation around each impulse. The spectral characteristics of the aperiodic sequence show subharmonics and harmonics of the fundamental frequency corresponding to pitch. The effects of aperiodicity are examined using spectrograms and saliency plots of synthetic amplitude and duration (i.e., frequency) modulation of sequences of impulses.
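The link between amplitude modulation of an impulse sequence and subharmonics can be seen in a small synthetic sketch. This is a toy illustration, not the paper's modified zero-frequency filtering method: the helper names `impulse_train` and `magnitude_at`, the 200 Hz pulse rate, and the alternating amplitudes 1.0 and 0.6 are all assumptions chosen for the example.

```python
import numpy as np

def impulse_train(num_impulses, period, amplitudes):
    """Impulse sequence with one impulse every `period` samples.

    The amplitudes list is cycled, so [1.0, 0.6] alternates strong and weak impulses.
    """
    signal = np.zeros(num_impulses * period)
    for k in range(num_impulses):
        signal[k * period] = amplitudes[k % len(amplitudes)]
    return signal

def magnitude_at(signal, freq_hz, sample_rate):
    """Magnitude of the FFT bin closest to freq_hz."""
    spectrum = np.abs(np.fft.rfft(signal))
    bin_index = int(round(freq_hz * len(signal) / sample_rate))
    return spectrum[bin_index]

sample_rate = 8000
period = 40  # 8000 / 40 = 200 impulses per second, i.e. F0 = 200 Hz

uniform = impulse_train(200, period, [1.0])         # strictly periodic
modulated = impulse_train(200, period, [1.0, 0.6])  # amplitude alternates

# The subharmonic at F0/2 = 100 Hz appears only in the modulated sequence.
print(magnitude_at(uniform, 100, sample_rate))
print(magnitude_at(modulated, 100, sample_rate))
```

Alternating the impulse amplitudes doubles the true repetition period, which places spectral energy at half the original fundamental frequency. Duration modulation, which the abstract also mentions, is not covered by this sketch.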
Computer Speech & Language | 2015
Vinay Kumar Mittal; B. Yegnanarayana
Highlights: Production characteristics of laughter are analysed using EGG and speech signals. Three cases are considered: normal speech, laughed-speech and nonspeech-laugh. A modified zero-frequency filtering method is proposed to extract source features. Parameters representing production features are derived to distinguish the 3 cases. These help in studying the discriminating characteristics of laughter from normal speech.
In this paper, the production characteristics of laughter are analysed at call and bout levels. Data of natural laughter is examined using electroglottograph (EGG) and acoustic signals. Nonspeech-laugh and laughed-speech are analysed in comparison with normal speech using features derived from the EGG and acoustic signals. Analysis of the EGG signal is used to derive the average closed phase quotient in glottal cycles and the average instantaneous fundamental frequency (F0). Excitation source characteristics are analysed from the acoustic signal using a modified zero-frequency filtering (mZFF) method. Excitation impulse density and the strength of impulse-like excitation are extracted from the mZFF signal. Changes in the vocal tract system characteristics are examined in terms of the first two dominant frequencies derived using linear prediction (LP) analysis. Additional excitation information present in the acoustic signal is examined using a measure of sharpness of peaks in the Hilbert envelope of the LP residual at the glottal closure instants. Parameters representing degree of change and temporal changes in the production features are also derived to study the discriminating characteristics of laughter from normal speech. Changes are larger for nonspeech-laugh than laughed-speech, with reference to normal speech.
Consumer Communications and Networking Conference | 2012
P. Gangamohan; Vinay Kumar Mittal; B. Yegnanarayana
This paper aims to understand the components of speech that contribute to emotion characteristics in speech. Four components of speech (vocal tract, excitation, duration and intonation) are considered in this study. A Flexible Analysis Synthesis Tool (FAST) is developed to modify the features of an utterance from neutral to emotion or from emotion to neutral. The key ideas used in this work are the dynamic time warping algorithm for alignment of two utterances and a flexible prosody manipulation for incorporating the desired features. The tool is used for conversion of neutral to emotion speech. Subjective evaluation is performed based on listening tests. The tool has potential to convert neutral to emotion speech and vice-versa, which can lead to understanding the significance of various components contributing to emotional content in speech.
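The dynamic time warping step named in the abstract can be sketched as follows. This is a generic textbook formulation on one-dimensional sequences, not the FAST tool's implementation: the function name `dtw_path`, the absolute-difference local cost, and the example sequences are assumptions for illustration. In practice the aligned items would be frame-level feature vectors of the two utterances.

```python
import numpy as np

def dtw_path(a, b):
    """Align sequences a and b; return (total cost, list of (i, j) index pairs)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i, j] = local + min(cost[i - 1, j],      # skip forward in a
                                     cost[i, j - 1],      # skip forward in b
                                     cost[i - 1, j - 1])  # advance both
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [(cost[i - 1, j - 1], i - 1, j - 1),
                      (cost[i - 1, j], i - 1, j),
                      (cost[i, j - 1], i, j - 1)]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return cost[n, m], path

# Hypothetical contours: the second repeats the middle value once.
total, path = dtw_path([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
print(total, path)
```

The returned path pairs each frame of one utterance with the frame of the other that it best matches, which is the kind of alignment needed before features such as duration or intonation can be transferred between a neutral and an emotional utterance.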
Journal of the Acoustical Society of America | 2014
Vinay Kumar Mittal; B. Yegnanarayana; Peri Bhaskararao
Characteristics of glottal vibration are affected by the obstruction to the flow of air through the vocal tract system. The obstruction to the airflow is determined by the nature, location, and extent of constriction in the vocal tract during production of voiced sounds. The effects of constriction on glottal vibration are examined for six different categories of speech sounds having varying degree of constriction. The effects are examined in terms of source and system features derived from the speech and electroglottograph signals. It is observed that a high degree of constriction causing obstruction to the flow of air results in large changes in these features, relative to the adjacent steady vowel regions, as in the case of apical trill and alveolar fricative sounds. These changes are insignificant when the obstruction to the airflow is less, as in the case of velar fricative and lateral approximant sounds. There are no changes in the excitation features when there is a free flow of air along the auxiliary tract, despite constriction in the vocal tract, as in the case of nasals. These studies show that effects of constriction can indeed be observed in the features of glottal vibration as well as vocal tract resonances.
Conference of the International Speech Communication Association | 2014
Vinay Kumar Mittal; B. Yegnanarayana
Automatic detection of shout in continuous speech is a challenging task. In our recent study, the characteristics of shout and normal speech signals are examined along with the electroglottograph (EGG) signals. The study highlights the changes in the characteristics of both the excitation source and the vocal tract system during production of shout, from those of normal speech. In this paper, we aim to develop an automatic system to detect regions of shout in continuous speech, based upon changes in the production characteristics of shouted speech. Discriminating production features like instantaneous fundamental frequency, strength of excitation, dominant frequency and spectral band energy ratio are extracted from the speech signal. Parameters are derived for the shout decision capturing average level and temporal changes in the features and their pairwise mutual relations. A speaker and language independent prototype automatic shout detection system is developed. Performance evaluation over four databases gave encouraging results.
International Conference on Control, Instrumentation, Communication and Computational Technologies | 2014
Kiran Kumar Lekkala; Vinay Kumar Mittal
Proportional integral differential (PID) feedback systems are known for their robustness, accuracy and stability. These systems are used in a wide variety of applications. In this paper, we explore the possibility of using a PID architecture in robotic 2D navigation systems. The prototype system developed can be implemented for robotic applications that require high precision movement to follow the control provided for an unmanned, autonomous driving system. A prototype 2D precision robot is developed, in which the PID algorithm is implemented. Experiments are conducted to ascertain the feasibility and effectiveness of PID controller in enabling high precision of robotic movements in two dimensions.
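A PID loop for two-dimensional positioning can be sketched as below. This is a minimal illustration under stated assumptions, not the controller built in the paper: the `PID` class, the gains, the time step, and the toy plant in which each axis position simply integrates the commanded velocity are all hypothetical.

```python
class PID:
    """Textbook discrete PID controller for a single axis."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def drive_to(target, steps=2000, dt=0.01):
    """Move a simulated robot from the origin to target = (x, y).

    Assumed toy plant: the controller output is a velocity command and the
    position is its integral. One independent PID loop is used per axis.
    """
    pid_x = PID(kp=4.0, ki=4.0, kd=0.1)  # assumed gains, not from the paper
    pid_y = PID(kp=4.0, ki=4.0, kd=0.1)
    x, y = 0.0, 0.0
    for _ in range(steps):
        vx = pid_x.update(target[0], x, dt)
        vy = pid_y.update(target[1], y, dt)
        x += vx * dt
        y += vy * dt
    return x, y

print(drive_to((1.0, -2.0)))
```

Running one loop per axis keeps the sketch simple. A real robot would add actuator limits, sensor noise, and coupling between axes, all of which affect how the gains should be tuned.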
International Conference on Signal Processing | 2015
Kiran Kumar Lekkala; Vinay Kumar Mittal
Proportional Integral Differential (PID) feedback systems are known for their robustness, accuracy and stability. These systems are used in a wide variety of applications. In this paper, we explore the possibility of using a PID architecture in robotic 3D navigation systems. The system developed can be implemented for robotic applications that require high precision of movements along the three dimensions. The precision of movements may be required with reference to the user controls provided, for example in unmanned or autonomous driving systems, which in turn require Artificial Intelligence methodologies to produce outputs that attain the required precision. An experimental 3D precision robot is developed, in which the PID algorithm is implemented. The results of the experiments conducted confirm the effectiveness of the PID controller in achieving high precision of robotic movements in three dimensions.
International Conference on Control, Instrumentation, Communication and Computational Technologies | 2014
Ch. Hasitha; N Sai Chinmayi; B. Sravya; Vinay Kumar Mittal
In the past decade, robotic applications in human life have made significant progress. However, the mobility of robots and the user convenience of their control are still challenges. The utility of robots for a physically challenged person, with practicality and ease of operation, is another issue. In this paper, a robotic solution is proposed for use by the physically challenged. A prototype robot is developed using RF signals, with voice control or remote control options. The voice- or remote-controlled robot has obstacle avoidance and edge avoidance features as well. The prototype is useful for applications in diverse fields.