Rajesh M. Hegde
Indian Institute of Technology Kanpur
Publication
Featured research published by Rajesh M. Hegde.
IEEE Transactions on Audio, Speech, and Language Processing | 2007
Rajesh M. Hegde; Hema A. Murthy; Venkata Ramana Rao Gadde
Spectral representation of speech is complete when both the Fourier transform magnitude and phase spectra are specified. In conventional speech recognition systems, features are generally derived from the short-time magnitude spectrum. Although the importance of Fourier transform phase in speech perception has been realized, few attempts have been made to extract features from it. This is primarily because the resonances of the speech signal, which manifest as transitions in the phase spectrum, are completely masked by the wrapping of the phase spectrum. Hence, an alternative to processing the Fourier transform phase for extracting speech features is to process the group delay function, which can be directly computed from the speech signal. The group delay function has been used in earlier efforts to extract pitch and formant information from the speech signal. In all these efforts, no attempt was made to extract features from the speech signal and use them for speech recognition applications. This is primarily because the group delay function fails to capture the short-time spectral structure of speech owing to zeros that are close to the unit circle in the z-plane and also due to pitch periodicity effects. In this paper, the group delay function is modified to overcome these effects. Cepstral features are extracted from the modified group delay function and are called the modified group delay feature (MODGDF). The MODGDF is used for three speech recognition tasks, namely speaker, language, and continuous-speech recognition. Based on the results of feature and performance evaluation, the significance of the MODGDF as a new feature for speech recognition is discussed.
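The abstract above notes that the group delay function can be computed directly from the signal, without unwrapping the phase. A minimal sketch of that standard identity follows, assuming NumPy; it computes the plain (unmodified) group delay only, not the MODGDF, and the function name `group_delay` and its parameters are illustrative rather than taken from the paper. With X the transform of x[n] and Y the transform of n·x[n], the group delay is (X_R·Y_R + X_I·Y_I) / |X|².

```python
import numpy as np

def group_delay(x, n_fft=512, eps=1e-12):
    """Group delay of x computed directly from the signal (no phase unwrapping).

    Illustrative helper, not the paper's code. Uses the identity
    tau(w) = (X_R*Y_R + X_I*Y_I) / |X|^2, where Y is the DFT of n*x[n].
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)        # spectrum of x[n]
    Y = np.fft.rfft(n * x, n_fft)    # spectrum of n*x[n]
    num = X.real * Y.real + X.imag * Y.imag
    den = X.real ** 2 + X.imag ** 2
    # eps guards against division by zero at spectral nulls; this is an
    # assumption of the sketch, not the modification proposed in the paper.
    return num / np.maximum(den, eps)

# Sanity check on a synthetic signal: an impulse delayed by 5 samples
# has a constant group delay of 5 samples at every frequency.
x = np.zeros(64)
x[5] = 1.0
tau = group_delay(x)
```

The denominator |X|² is what makes the plain function spiky when zeros lie close to the unit circle, which is the problem the abstract says the modified function addresses.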
International Conference on Acoustics, Speech, and Signal Processing | 2004
Rajesh M. Hegde; Hema A. Murthy; Gadde V. Ramana Rao
In this paper, we explore new methods by which speakers can be identified and discriminated, using features derived from the Fourier transform phase. The modified group delay feature (MODGDF), which is a parameterized form of the modified group delay function, is used as a front end feature in this study. A Gaussian mixture model (GMM) based speaker identification system is built with the MODGDF as the front end feature. The system is tested on both clean (TIMIT) and noisy telephone (NTIMIT) speech. The results obtained are compared with traditional Mel frequency cepstral coefficients (MFCC), which are derived from the Fourier transform magnitude. When both MFCC and MODGDF were combined, the performance improved by about 4%, indicating that phase and magnitude contain complementary information. In an earlier paper (Murthy et al. (2003)), it was shown that the MODGDF does possess phoneme specific characteristics. In this paper we show that the MODGDF has speaker specific properties. We also make an attempt to understand speaker discriminating characteristics of the MODGDF using the nonlinear mapping technique based on Sammon mapping (Sammon (1969)) and find that the MODGDF empirically demonstrates a certain level of linear separability among speakers.
EURASIP Journal on Audio, Speech, and Music Processing | 2007
Rajesh M. Hegde; Hema A. Murthy; Venkata Ramana Rao Gadde
This paper investigates the significance of combining cepstral features derived from the modified group delay function and from the short-time spectral magnitude like the MFCC. The conventional group delay function fails to capture the resonant structure and the dynamic range of the speech spectrum primarily due to pitch periodicity effects. The group delay function is modified to suppress these spikes and to restore the dynamic range of the speech spectrum. Cepstral features are derived from the modified group delay function, which are called the modified group delay feature (MODGDF). The complementarity and robustness of the MODGDF when compared to the MFCC are also analyzed using spectral reconstruction techniques. Combination of several spectral magnitude-based features and the MODGDF using feature fusion and likelihood combination is described. These features are then used for three speech processing tasks, namely, syllable, speaker, and language recognition. Results indicate that combining MODGDF with MFCC at the feature level gives significant improvements for speech recognition tasks in noise. Combining the MODGDF and the spectral magnitude-based features gives a significant increase in recognition performance of 11% at best, while combining any two features derived from the spectral magnitude does not give any significant improvement.
IEEE Transactions on Signal Processing | 2014
Lalan Kumar; Ardhendu Tripathy; Rajesh M. Hegde
Subspace-based source localization methods utilize the spectral magnitude of the MUltiple SIgnal Classification (MUSIC) method. However, in all these methods, a large number of sensors are required to resolve closely spaced sources. A novel method for high resolution source localization based on the group delay of MUSIC is described in this work. The method can resolve both the azimuth and elevation angles of closely spaced sources using a minimal number of sensors over a planar array. At the direction of arrival (DOA) of the desired source, a transition is observed in the phase spectrum of MUSIC. The negative differential of the phase spectrum, also called group delay, results in a peak at the DOA. The proposed MUSIC-Group delay spectrum, defined as the product of the MUSIC-Magnitude (MM) and group delay spectra, resolves spatially close sources even under reverberation owing to its spatial additive property. This is illustrated by performing spectral analysis of the MUSIC-Group delay function under reverberant environments. A mathematical proof for the spatial additive property of the group delay spectrum is also provided. Source localization error analysis, sensor perturbation analysis, and Cramér-Rao bound (CRB) analysis are then performed to verify the robustness of the MUSIC-Group delay method. Experiments on speech enhancement and distant speech recognition are also conducted on spatialized TIMIT and MONC databases. Experimental results obtained using objective performance measures and word error rates (WER) indicate reasonable robustness when compared to conventional source localization methods in the literature.
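For orientation, the conventional MUSIC magnitude spectrum that the abstract takes as its baseline can be sketched as below. This is a sketch under stated assumptions, not the proposed MUSIC-Group delay method: it assumes NumPy, a uniform linear array with half-wavelength spacing (the paper uses a planar array), a known number of sources, and an exact synthetic covariance matrix. The names `steering` and `music_spectrum` are hypothetical.

```python
import numpy as np

def steering(theta_deg, n_sensors, spacing=0.5):
    """Steering vector of a uniform linear array (spacing in wavelengths)."""
    m = np.arange(n_sensors)
    return np.exp(-2j * np.pi * spacing * m * np.sin(np.deg2rad(theta_deg)))

def music_spectrum(R, n_sources, grid_deg, spacing=0.5, eps=1e-12):
    """Conventional MUSIC magnitude pseudo-spectrum over a grid of angles."""
    n_sensors = R.shape[0]
    # eigh returns eigenvalues in ascending order, so the first
    # (n_sensors - n_sources) eigenvectors span the noise subspace.
    _, vecs = np.linalg.eigh(R)
    noise = vecs[:, : n_sensors - n_sources]
    p = np.empty(len(grid_deg))
    for i, theta in enumerate(grid_deg):
        proj = noise.conj().T @ steering(theta, n_sensors, spacing)
        # eps keeps the pseudo-spectrum finite at an exact DOA.
        p[i] = 1.0 / (np.real(np.vdot(proj, proj)) + eps)
    return p

# Synthetic example: two uncorrelated unit-power sources at 20 and 40 degrees,
# 8 sensors, white noise of variance 0.01, exact (not estimated) covariance.
true_doas = [20.0, 40.0]
A = np.column_stack([steering(t, 8) for t in true_doas])
R = A @ A.conj().T + 0.01 * np.eye(8)
grid = np.arange(-90.0, 91.0, 1.0)
p = music_spectrum(R, 2, grid)
est = np.sort(grid[np.argsort(p)[-2:]])  # angles of the two largest values
```

The sources here are well separated; the abstract's point is that this magnitude spectrum needs many sensors once the sources move close together, which is where the group delay of the MUSIC phase spectrum is proposed instead.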
Journal of Cardiothoracic and Vascular Anesthesia | 2014
Rajnish Garg; Shekhar Rao; Colin John; Chinnaswamy Reddy; Rajesh M. Hegde; Keshava Murthy; P.V.S. Prakash
OBJECTIVE: This prospective observational study was undertaken to determine the feasibility of extubation of children in the operating room after cardiac surgery.
DESIGN: A prospective observational study compared with historic controls.
SETTING: A single tertiary care referral hospital.
PARTICIPANTS: One thousand consecutive pediatric patients requiring cardiac surgery aged 1 day to 18 years. Patients with spinal deformity, neurologic problems, coagulopathy as diagnosed by high international normalized ratio (INR) more than 1.5, and patients preoperatively on mechanical ventilation were excluded from the study. Data were also reviewed for another 1,000 patients operated before the beginning of this study, which constituted historic controls.
INTERVENTIONS: All 1,000 patients were considered as potential candidates for extubation in the operating room after cardiac surgery and managed by a combination of general anesthesia and neuraxial analgesia with a mixture of caudal morphine and dexmedetomidine, and extubation in the operating room was attempted after completion of the surgical procedure. These comprised the study group (SG). Data also were reviewed for another 1,000 patients before the beginning of this study when extubation in the operating room was not attempted and compared with this group to study the impact of extubation in the operating room on intensive care unit (ICU) stay and resource utilization. This data comprised the before-study group (BSG).
MEASUREMENTS AND MAIN RESULTS: Eight hundred seventy-one (87.1%) patients were extubated in the operating room. This included 40% of neonates and 70%, 85%, and 91% of patients aged between 1 and 3 months, 3 months to 1 year, and more than 1 year, respectively. Forty-five patients (4.5%) required re-intubation within 24 hours, and 9 patients died among those extubated in the OR, but for reasons thought not to be related to extubation. The ICU stay was significantly less in the study group (2.56±1.84 v 5.4±2.32 days, p<0.0001) as compared to before-study group (BSG). The number of patients in the ICU (34.76±3.19 v 59.98±4.92, p<0.0001) and the number of patients on a ventilator (5.1±1.24 v 24.5±2.88, p<0.0001) on a daily basis were significantly less in the study group, reflecting positive impact on resource utilization.
CONCLUSION: Extubation in the operating room was successful in 87.1% of the patients without any increase in mortality and morbidity, but with a decrease in ICU length of stay and less use of hospital resources.
Archive | 2008
Naveen Ashish; Ronald T. Eguchi; Rajesh M. Hegde; Charles K. Huyck; Dmitri V. Kalashnikov; Sharad Mehrotra; Padhraic Smyth; Nalini Venkatasubramanian
Responding to natural or man-made disasters in a timely and effective manner can reduce deaths and injuries, contain or prevent secondary disasters, and reduce the resulting economic losses and social disruption. During a crisis, responding organizations confront grave uncertainties in making critical decisions. They need to gather situational information (e.g., state of the civil, transportation and information infrastructures), together with information about available resources (e.g., medical facilities, rescue and law enforcement units). There is a strong correlation between the accuracy, timeliness, and reliability of the information available to the decision-makers, and the quality of their decisions. Dramatic improvements in the speed and accuracy at which information about the crisis flows through the disaster response networks have the potential to revolutionize crisis response, saving human lives and property. This chapter highlights some of the key information technology challenges being addressed in Project RESCUE [1], with a particular focus
Ad Hoc Networks | 2016
Sudhir Kumar; Rajesh M. Hegde; Niki Trigoni
In this paper, Gaussian process regression (GPR) for fingerprinting based localization is presented. In contrast to general regression techniques, the GPR infers not only the posterior received signal strength (RSS) mean but also the variance at each fingerprint location. The GPR also takes into account the variance of the input, i.e., noisy RSS data. The hyper-parameters of the GPR are estimated using the trust-region-reflective algorithm. The Cramér-Rao bound is analysed to highlight the performance of the parameter estimator. The posterior mean and variance of the RSS data are utilized in fingerprinting based localization. Principal component analysis is employed to choose the k strongest Wi-Fi access points (APs). The performance of the proposed algorithm is validated using real field deployments. Accuracy improvements of 10% and 30% are observed in two sites compared to the Horus fingerprinting approach.
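The posterior RSS mean and variance mentioned above follow from the standard Gaussian process regression equations, sketched here with NumPy only. Everything specific in the sketch is an assumption rather than the paper's setup: the squared-exponential kernel, the fixed hyper-parameters (the paper estimates them with a trust-region-reflective algorithm, which is not shown), the zero-mean prior applied to mean-centred RSS, and the synthetic positions and RSS values. The names `rbf_kernel` and `gpr_posterior` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between the rows of A and B (assumed kernel)."""
    d2 = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale ** 2)

def gpr_posterior(X_train, y_train, X_test, noise_var=0.1, **kern):
    """Posterior mean and variance of a zero-mean GP at the test locations."""
    K = rbf_kernel(X_train, X_train, **kern) + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, **kern)
    K_ss = rbf_kernel(X_test, X_test, **kern)
    L = np.linalg.cholesky(K)                      # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)   # latent posterior variance
    return mean, var

# Synthetic fingerprints: 2-D positions (metres) and RSS (dBm) of one access point.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
rss = np.array([-40.0, -50.0, -52.0, -60.0])
offset = rss.mean()                                # centre RSS for the zero-mean prior
X_test = np.array([[0.5, 0.5], [10.0, 10.0]])
mean, var = gpr_posterior(X_train, rss - offset, X_test,
                          noise_var=0.1, length_scale=1.0, signal_var=25.0)
mean = mean + offset
```

In this toy setup the variance is small near the surveyed fingerprints and returns to the prior variance far from them, which is the extra information the abstract says the GPR provides over a plain regressor.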
International Conference on Acoustics, Speech, and Signal Processing | 2005
Rajesh M. Hegde; Hema A. Murthy; Gadde V. Ramana Rao
The paper discusses the significance of joint cepstral features derived from the modified group delay function and MFCC in speech processing. We start with a definition of cepstral features derived from the modified group delay function, called the modified group delay feature (MODGDF), which is derived from the Fourier transform phase. Robustness issues like similarities of the MODGDF to RASTA and cepstral mean subtraction are discussed. The efficiency with which formants can be reconstructed for noisy cellular speech using joint features derived from early fusion is illustrated. The joint features are used for four speech processing tasks: phoneme, syllable, speaker, and language recognition. Based on the results of analysis and performance evaluation, the significance of joint features derived from the MODGDF and MFCC is discussed.
International Conference on Acoustics, Speech, and Signal Processing | 2010
Mrityunjaya Shukla; Rajesh M. Hegde
Conventionally, the spectral magnitude of MUSIC is used for efficient beam forming and clean speech acquisition from distant microphones. The MUSIC method is unable to resolve closely spaced DOAs with a computationally plausible number of sensors. In this paper we propose the use of the group delay function computed from the MUSIC phase spectrum for efficient DOA estimation. The group delay function, which has hitherto been used for temporal frequency processing of speech signals, is computed on the phase spectrum of MUSIC and is found to resolve spatially contiguous speech sources. The additive property of the group delay function in the spatial domain is also discussed using root-MUSIC polynomial analysis. Experimental results on DOA estimation using a two channel microphone array show that the average error distribution of the MUSIC group delay spectrum is minimum when compared to the MUSIC magnitude spectrum. Filter-Sum beam formers are trained using estimated DOAs on speech acquired from distant microphones. The results of speech recognition experiments conducted on meeting room data are used to illustrate the significance of the MUSIC group delay spectrum in speech acquisition from distant microphones.
Global Communications Conference | 2013
Sudhir Kumar; Vatsal Sharan; Rajesh M. Hegde
In this paper, a single mobile beacon based method to localize nodes using the principle of maximum power reception is proposed. Optimal positioning of the mobile beacon for minimum energy consumption is also discussed. In contrast to existing methods, the node localization is done with prior location of only three nodes. There is no need for synchronization, as there is only one mobile anchor and each node communicates only with the anchor node. Also, this method is not constrained by a fixed sensor geometry. The localization is done in a distributed fashion, at each sensor node. Experiments on node-source localization are conducted by deploying sensors in an ad-hoc manner in both outdoor and indoor environments. Localization results obtained herein indicate a reasonable performance improvement when compared to conventional methods.