Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Kaustubh Kalgaonkar is active.

Publication


Featured research published by Kaustubh Kalgaonkar.


International Conference on Acoustics, Speech, and Signal Processing | 2009

One-handed gesture recognition using ultrasonic Doppler sonar

Kaustubh Kalgaonkar; Bhiksha Raj

This paper presents a new device based on ultrasonic sensors to recognize one-handed gestures. The device uses three ultrasonic receivers and a single transmitter. Gestures are characterized through the Doppler frequency shifts they generate in reflections of an ultrasonic tone emitted by the transmitter. We show that this setup can be used to classify simple one-handed gestures with high accuracy. The ultrasonic Doppler based device is very inexpensive, about 20 USD for the whole setup including the acquisition system, and computationally efficient compared to most traditional devices (e.g. video). These gestures could potentially be used to control and drive a device.
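
Since the abstract turns on Doppler physics, a short sketch may help: a reflector moving at velocity v shifts a carrier at frequency f0 by approximately 2·v·f0/c. Below is a minimal, hypothetical feature-extraction sketch in Python (NumPy/SciPy); the 40 kHz carrier, 96 kHz sample rate, and band width are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from scipy.signal import stft

C = 343.0      # speed of sound in air (m/s)
F0 = 40_000.0  # assumed ultrasonic carrier (Hz); illustrative only
FS = 96_000.0  # assumed sample rate, high enough to capture F0

def doppler_shift(velocity_mps: float) -> float:
    """Frequency shift of a tone reflected off a surface moving at v."""
    return 2.0 * velocity_mps * F0 / C

def doppler_features(received: np.ndarray, band_hz: float = 500.0) -> np.ndarray:
    """Short-time spectra in a narrow band around the carrier.

    Hand motion spreads energy into frequencies near F0; the pattern
    of that spread over time is what characterizes a gesture.
    """
    f, _, Z = stft(received, fs=FS, nperseg=1024)
    band = (f >= F0 - band_hz) & (f <= F0 + band_hz)
    return np.abs(Z[band, :])  # shape: (bins in band, time frames)

# A hand moving toward the sensor at 0.5 m/s shifts the tone by ~117 Hz:
print(f"{doppler_shift(0.5):.0f} Hz")
```

Each gesture then becomes a short sequence of band-limited spectra that can be fed to any standard classifier.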


Advanced Video and Signal Based Surveillance | 2007

Acoustic Doppler sonar for gait recognition

Kaustubh Kalgaonkar; Bhiksha Raj

A person's gait is a characteristic that might be employed to identify him or her automatically. Conventionally, automatic gait-based identification of subjects employs video and image processing to characterize gait. In this paper we present an Acoustic Doppler Sensor (ADS) based technique for the characterization of gait. The ADS is a very inexpensive sensor that can be built using off-the-shelf components for under 20 USD at today's prices. We show that remarkably good gait recognition is possible with the ADS sensor.


IEEE Signal Processing Letters | 2007

Ultrasonic Doppler Sensor for Voice Activity Detection

Kaustubh Kalgaonkar; Rongquiang Hu; Bhiksha Raj

This letter describes a robust voice activity detector using an ultrasonic Doppler sonar device. An ultrasonic beam is incident on the talker's face. Facial movements result in Doppler frequency shifts in the reflected signal that are sensed by an ultrasonic sensor. Speech-related facial movements result in identifiable patterns in the spectrum of the received signal that can be used to identify speech activity. These sensors are not affected by even high levels of ambient audio noise. Unlike most other non-acoustic sensors, the device need not be taped to the talker. A simple yet robust method of extracting the voice activity information from the ultrasonic Doppler signal is developed and presented in this letter. The algorithm is seen to be very effective and robust to noise, and it can be implemented in real time.
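
The "simple yet robust method" can be imagined as an energy-ratio test: speech-related facial motion pushes energy out of the carrier bin into Doppler sidebands. The sketch below is a hypothetical reading, not the letter's actual algorithm; the band edges and threshold are made-up values.

```python
import numpy as np

FS = 96_000.0  # assumed sample rate (Hz)
F0 = 40_000.0  # assumed ultrasonic carrier (Hz)

def is_speech_frame(frame: np.ndarray, thresh: float = 0.2) -> bool:
    """Flag a frame as speech when the Doppler sidebands carry enough
    energy relative to the carrier bin (static reflections).

    `frame` should be a few thousand samples long so the FFT can
    resolve the carrier from its sidebands.
    """
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / FS)
    offset = np.abs(freqs - F0)
    carrier = spec[offset < 50.0].sum()
    sidebands = spec[(offset >= 50.0) & (offset < 400.0)].sum()
    return sidebands > thresh * carrier
```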


International Conference on Acoustics, Speech, and Signal Processing | 2008

Ultrasonic Doppler sensor for speaker recognition

Kaustubh Kalgaonkar; Bhiksha Raj

In this paper we present a novel use of an acoustic Doppler sonar for multi-modal speaker identification. An ultrasonic emitter directs a 40 kHz tone toward the speaker. Reflections from the speaker's face are recorded as the speaker talks. The frequency of the tone is modified by the velocity of the facial structures it is reflected by. The received ultrasonic signal thus contains an entire spectrum of frequencies representing the set of all velocities of facial components. The pattern of frequencies in the reflected signal is observed to be typical of the speaker. The captured ultrasonic signal is synchronously analyzed with the corresponding voice signal to extract specific characteristics that can be used to identify the speaker. Experiments show that this information can result in significant improvements in speaker identification accuracy, both under clean conditions and in noise.
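
The abstract does not specify the classifier, so the following is only one plausible reading: per-speaker Gaussian mixture models trained on fused audio-plus-Doppler frame features, with identification by average log-likelihood (scikit-learn assumed; the function names are hypothetical).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_models(features_by_speaker: dict[str, np.ndarray],
                         n_components: int = 8) -> dict[str, GaussianMixture]:
    """Fit one GMM per speaker on fused (audio || Doppler) features,
    one row per synchronously analyzed frame."""
    return {spk: GaussianMixture(n_components=n_components).fit(feats)
            for spk, feats in features_by_speaker.items()}

def identify(models: dict[str, GaussianMixture], frames: np.ndarray) -> str:
    """Return the speaker whose model assigns the test frames the
    highest average log-likelihood."""
    return max(models, key=lambda spk: models[spk].score(frames))
```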


International Conference on Acoustics, Speech, and Signal Processing | 2010

Acoustic model adaptation via Linear Spline Interpolation for robust speech recognition

Michael L. Seltzer; Alex Acero; Kaustubh Kalgaonkar

We recently proposed a new algorithm to perform acoustic model adaptation to noisy environments called Linear Spline Interpolation (LSI). In this method, the nonlinear relationship between clean and noisy speech features is modeled using linear spline regression. Linear spline parameters that minimize the error between the predicted noisy features and the actual noisy features are learned from training data. A variance associated with each spline segment captures the uncertainty in the assumed model. In this work, we extend the LSI algorithm in two ways. First, the adaptation scheme is extended to compensate for the presence of linear channel distortion. Second, we show how the noise and channel parameters can be updated during decoding in an unsupervised manner within the LSI framework. Using LSI, we obtain an average relative improvement in word error rate of 10.8% over VTS adaptation on the Aurora 2 task, with improvements of 15–18% at SNRs between 10 and 15 dB.
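
For reference, the additive-noise-plus-linear-channel model this extension targets is commonly written, per log-mel channel, as y = x + h + log(1 + exp(n − x − h)), with clean speech x, channel h, and noise n all in the log domain; h = 0 recovers the noise-only case. A one-function sketch (my notation, not the paper's):

```python
import numpy as np

def noisy_logmel(x: np.ndarray, n: float, h: float = 0.0) -> np.ndarray:
    """Noisy log-mel feature under additive noise n and linear channel h
    (both in the log domain, phase term ignored); h=0 is noise-only."""
    return x + h + np.log1p(np.exp(n - x - h))
```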


Journal of the Acoustical Society of America | 2009

Talker-to-listener distance effects on speech production and perception

Harold A. Cheyne; Kaustubh Kalgaonkar; Mark A. Clements; Patrick M. Zurek

Simulating talker-to-listener distance (TLD) in virtual audio environments requires mimicking natural changes in vocal effort. Studies have identified several acoustic parameters manipulated by talkers when varying vocal effort. However, no systematic study has investigated vocal effort variations due to TLD, under natural conditions, and their perceptual consequences. This work examined the feasibility of varying the vocal effort cues for TLD in synthesized speech and real speech by (a) recording and analyzing single word tokens spoken at 1 m ≤ TLD ≤ 32 m, (b) creating synthetic and modified speech tokens that vary in one or more acoustic parameters associated with vocal effort, and (c) conducting perceptual tests on the reference, synthetic, and modified tokens to identify salient cues for TLD perception. Measured changes in fundamental frequency, intensity, and formant frequencies of the reference tokens across TLD were similar to other reports in the literature. Perceptual experiments that asked listeners to estimate TLD showed that TLD estimation is most accurate with real speech; however, large standard deviations in the responses suggest that reliable judgments can only be made for gross changes in TLD.


IEEE Automatic Speech Recognition and Understanding Workshop | 2009

Noise robust model adaptation using linear spline interpolation

Kaustubh Kalgaonkar; Michael L. Seltzer; Alex Acero

This paper presents a novel data-driven technique for performing acoustic model adaptation to noisy environments. In the presence of additive noise, the relationship between the log mel spectra of speech, noise, and noisy speech is nonlinear. Traditional methods linearize this relationship using the mode of the nonlinearity or use some other approximation. The approach presented in this paper models this nonlinear relationship using linear spline regression. In this method, the set of spline parameters that minimizes the error between the predicted and actual noisy speech features is learned from training data, and used at runtime to adapt clean acoustic model parameters to the current noise conditions. Experiments were performed to evaluate the performance of the system on the Aurora 2 task. Results show that the proposed adaptation algorithm (word accuracy 89.22%) outperforms VTS model adaptation (word accuracy 88.38%).
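
To make the nonlinearity concrete: for a fixed noise level, the clean-to-noisy mapping y = x + log(1 + exp(n − x)) is a smooth curve that a linear spline can track closely. The least-squares fit below uses a truncated-ramp basis and is purely illustrative; the paper additionally learns a variance per spline segment, which is not sketched here.

```python
import numpy as np

def noisy_logmel(x: np.ndarray, n: float) -> np.ndarray:
    """Noisy log-mel channel under additive noise (phase ignored)."""
    return x + np.log1p(np.exp(n - x))

# Sample the nonlinearity for a fixed noise level n = 5.
x = np.linspace(-10.0, 30.0, 400)   # clean log-mel values
y = noisy_logmel(x, n=5.0)

# Linear spline regression: least squares on an intercept, a slope,
# and one truncated ramp max(x - k, 0) per interior knot.
knots = np.linspace(-10.0, 30.0, 9)[1:-1]
basis = np.column_stack([np.ones_like(x), x]
                        + [np.maximum(x - k, 0.0) for k in knots])
coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
print(f"max abs spline error: {np.max(np.abs(basis @ coef - y)):.3f}")
```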


International Conference on Acoustics, Speech, and Signal Processing | 2010

Synthesizing speech from Doppler signals

Arthur R. Toth; Kaustubh Kalgaonkar; Bhiksha Raj; Tony Ezzat

It has long been considered a desirable goal to be able to construct an intelligible speech signal merely by observing the talker in the act of speaking. Past methods for performing this have been based on camera-based observations of the talker's face, combined with statistical methods that infer the speech signal from the facial motion captured by the camera. Other methods have included synthesis of speech from measurements taken by electromyographs and other devices that are tethered to the talker, an undesirable setup. In this paper we present a new device for synthesizing speech from characterizations of facial motion associated with speech: a Doppler sonar. Facial movement is characterized through Doppler frequency shifts in a tone that is incident on the talker's face. These frequency shifts are used to infer the underlying speech signal. The setup is far-field and untethered, with the sonar acting from the distance of a regular desktop microphone. Preliminary experimental evaluations show that the mechanism is very promising: we are able to synthesize reasonable speech signals, comparable to those obtained from tethered devices such as EMGs.
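
The abstract leaves the Doppler-to-speech mapping unspecified; as a toy stand-in, one could learn a frame-wise ridge regression from Doppler-band spectra to parallel speech log-mel frames (the vocoder step that turns predicted log-mel frames back into a waveform is not sketched).

```python
import numpy as np

def fit_doppler_to_mel(D: np.ndarray, M: np.ndarray,
                       lam: float = 1e-2) -> np.ndarray:
    """Ridge regression from Doppler-band spectra D (frames x d) to
    parallel speech log-mel frames M (frames x m). Purely illustrative."""
    d = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(d), D.T @ M)

# Predict log-mel frames for new Doppler frames:  M_hat = D_new @ W
```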


Computer Vision and Pattern Recognition | 2007

Sensor and Data Systems, Audio-Assisted Cameras and Acoustic Doppler Sensors

Kaustubh Kalgaonkar; Paris Smaragdis; Bhiksha Raj



International Conference on Acoustics, Speech, and Signal Processing | 2009

Sparse probabilistic state mapping and its application to speech bandwidth expansion

Kaustubh Kalgaonkar; Mark A. Clements


Collaboration


Dive into Kaustubh Kalgaonkar's collaborations.

Top Co-Authors

Bhiksha Raj
Carnegie Mellon University

Mark A. Clements
Georgia Institute of Technology

Chris Harrison
Carnegie Mellon University

Arthur R. Toth
Carnegie Mellon University

Tony Ezzat
Mitsubishi Electric Research Laboratories

Patrick M. Zurek
Massachusetts Institute of Technology