
Publications


Featured research published by Amarnag Subramanya.


Empirical Methods in Natural Language Processing | 2005

The Vocal Joystick: A Voice-Based Human-Computer Interface for Individuals with Motor Impairments

Jeff A. Bilmes; Xiao Li; Jonathan Malkin; Kelley Kilanski; Richard Wright; Katrin Kirchhoff; Amarnag Subramanya; Susumu Harada; James A. Landay; Patricia Dowden; Howard Jay Chizeck

We present a novel voice-based human-computer interface designed to enable individuals with motor impairments to use vocal parameters for continuous control tasks. Since discrete spoken commands are ill-suited to such tasks, our interface exploits a large set of continuous acoustic-phonetic parameters such as pitch, loudness, and vowel quality. Their selection is optimized with respect to automatic recognizability, communication bandwidth, learnability, suitability, and ease of use. Parameters are extracted in real time, transformed via adaptation and acceleration, and converted into continuous control signals. This paper describes the basic engine, prototype applications (in particular, voice-based web browsing and a controlled trajectory-following task), and initial user studies confirming the feasibility of this technology.
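
As a rough illustration of the parameter-to-control pipeline described above, the sketch below maps two vocal parameters (loudness and a crude autocorrelation pitch estimate) to a 2-D control signal. The frame size, gains, reference pitch, and the mapping itself are illustrative assumptions, not the paper's engine.

```python
# Illustrative sketch only: map two vocal parameters (loudness, pitch) to a
# 2-D control signal, in the spirit of the Vocal Joystick. The extractors
# and mapping below are simplified stand-ins, not the paper's engine.
import numpy as np

SR = 16000          # sample rate (assumed)
FRAME = 512         # ~32 ms analysis frame (assumed)

def loudness(frame):
    """Root-mean-square energy of one frame."""
    return float(np.sqrt(np.mean(frame ** 2)))

def pitch(frame, fmin=80.0, fmax=400.0):
    """Crude autocorrelation pitch estimate in Hz (assumes a voiced frame)."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(SR / fmax), int(SR / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return SR / lag

def control_signal(frame, ref_pitch=150.0, gain=200.0):
    """One of many possible mappings: loudness -> speed, pitch -> direction."""
    speed = gain * loudness(frame)
    dy = speed * np.tanh((pitch(frame) - ref_pitch) / 50.0)  # up/down from pitch
    dx = speed                                               # forward from energy
    return dx, dy

# Demo on a synthetic 200 Hz vowel-like frame.
t = np.arange(FRAME) / SR
frame = 0.3 * np.sin(2 * np.pi * 200.0 * t)
print(control_signal(frame))
```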


IEEE Transactions on Speech and Audio Processing | 2005

Microphone array position calibration by basis-point classical multidimensional scaling

Stanley T. Birchfield; Amarnag Subramanya

Classical multidimensional scaling (MDS) is a global, noniterative technique for finding coordinates of points given their interpoint distances. We describe the algorithm and show how it yields a simple, inexpensive method for calibrating an array of microphones with a tape measure (or similar measuring device). We present an extension to the basic algorithm, called basis-point classical MDS (BCMDS), which handles the case when many of the distances are unavailable, thus yielding a technique that is practical for microphone arrays with a large number of microphones. We also show that BCMDS, when combined with a calibration target consisting of four synchronized sound sources, can be used for automatic calibration via time-delay estimation. We evaluate the accuracy of both classical MDS and BCMDS, investigating their sensitivity to noise and to the design parameters in order to yield insight into the choice of those parameters. Our results validate the practical applicability of the algorithms, showing that errors on the order of 10-20 mm can be achieved in real scenarios.
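
The classical MDS computation the abstract refers to is compact enough to sketch directly: double-center the squared-distance matrix and take the top eigenvectors. BCMDS, the missing-distance extension, is not shown, and the array geometry below is synthetic.

```python
# Minimal classical MDS: recover 3-D microphone coordinates (up to rotation,
# translation, and reflection) from a full matrix of pairwise distances.
import numpy as np

def classical_mds(D, dim=3):
    """D: (n, n) matrix of pairwise distances. Returns (n, dim) coordinates."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                     # eigenvalues in ascending order
    w, V = w[::-1][:dim], V[:, ::-1][:, :dim]    # keep the top `dim`
    return V * np.sqrt(np.maximum(w, 0.0))

# Demo: 6 microphones at random positions; distances "measured" exactly.
rng = np.random.default_rng(0)
X = rng.uniform(0, 2.0, size=(6, 3))             # metres
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
Xhat = classical_mds(D)
# Verify: reconstructed interpoint distances match the measured ones.
print(np.allclose(np.linalg.norm(Xhat[:, None] - Xhat[None, :], axis=-1), D))
```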


International Conference on Acoustics, Speech, and Signal Processing | 2004

DBN based multi-stream models for audio-visual speech recognition

John N. Gowdy; Amarnag Subramanya; Chris D. Bartels; Jeff A. Bilmes

In this paper, we propose a model based on dynamic Bayesian networks (DBNs) to integrate information from multiple audio and visual streams. We compare the DBN-based system (implemented using the Graphical Models Toolkit (GMTK)) with a classical HMM (implemented in the Hidden Markov Model Toolkit (HTK)) for both the single- and two-stream integration problems. We also propose a new model (mixed integration) to integrate information from three or more streams derived from different modalities and compare the new model's performance with that of a synchronous integration scheme. A new technique to estimate stream confidence measures for the integration of three or more streams is also developed and implemented. Results from our implementation using the Clemson University Audio Visual Experiments (CUAVE) database indicate an absolute improvement of about 4% in word accuracy in the -4 to 10 dB average case when using two audio streams and one video stream with the mixed integration models over the synchronous models.
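
The fusion step at the heart of multi-stream models can be illustrated without a full DBN: per-stream class log-likelihoods are combined with confidence (exponent) weights. The actual models are DBNs built in GMTK; the vocabulary, scores, and weights below are made up.

```python
# Sketch of the generic multi-stream fusion idea: combine per-stream class
# log-likelihoods with per-stream confidence weights. Illustrative only.
import numpy as np

def fuse_streams(loglik_streams, weights):
    """loglik_streams: list of (n_classes,) log-likelihood vectors, one per
    stream (e.g. two audio, one video). weights: per-stream confidences
    summing to 1. Returns the fused log-likelihoods."""
    return sum(w * ll for w, ll in zip(weights, loglik_streams))

audio1 = np.log(np.array([0.7, 0.2, 0.1]))   # hypothetical 3-word vocabulary
audio2 = np.log(np.array([0.5, 0.3, 0.2]))
video  = np.log(np.array([0.4, 0.5, 0.1]))
fused = fuse_streams([audio1, audio2, video], weights=[0.4, 0.3, 0.3])
print("fused best word:", int(np.argmax(fused)))
```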


Empirical Methods in Natural Language Processing | 2008

Soft-Supervised Learning for Text Classification

Amarnag Subramanya; Jeff A. Bilmes

We propose a new graph-based semi-supervised learning (SSL) algorithm and demonstrate its application to document categorization. Each document is represented by a vertex within a weighted undirected graph and our proposed framework minimizes the weighted Kullback-Leibler divergence between distributions that encode the class membership probabilities of each vertex. The proposed objective is convex with guaranteed convergence using an alternating minimization procedure. Further, it generalizes in a straightforward manner to multi-class problems. We present results on two standard tasks, namely Reuters-21578 and WebKB, showing that the proposed algorithm significantly outperforms the state-of-the-art.
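
For intuition, here is a much-simplified propagation of per-vertex distributions over a graph in the same spirit; it is not the paper's exact objective or its alternating-minimization updates, and the weights, seed labels, and update rule are illustrative assumptions.

```python
# Simplified graph-based SSL sketch: each vertex holds a class distribution,
# iteratively replaced by a (log-domain) average of its neighbours'
# distributions, with labeled vertices clamped. Illustrative only.
import numpy as np

def propagate(W, labels, n_classes, n_iters=50):
    """W: (n, n) symmetric nonnegative weights. labels: dict vertex -> class
    for the labeled seed set. Returns (n, n_classes) distributions."""
    n = W.shape[0]
    P = np.full((n, n_classes), 1.0 / n_classes)
    for v, y in labels.items():
        P[v] = np.eye(n_classes)[y]
    deg = W.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        # geometric-mean neighbourhood average, then renormalize
        logP = (W @ np.log(P + 1e-12)) / np.maximum(deg, 1e-12)
        P = np.exp(logP - logP.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        for v, y in labels.items():          # clamp the labeled vertices
            P[v] = np.eye(n_classes)[y]
    return P

# Tiny demo: a 4-vertex chain with the two endpoints labeled.
W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
print(propagate(W, {0: 0, 3: 1}, n_classes=2).round(2))
```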


International Symposium on Experimental Robotics | 2008

Rao-Blackwellized Particle Filters for Recognizing Activities and Spatial Context from Wearable Sensors

Alvin Raj; Amarnag Subramanya; Dieter Fox; Jeff A. Bilmes

Recent advances in wearable sensing and computing devices and in fast probabilistic inference techniques make possible the fine-grained estimation of a person's activities over extended periods of time [6]. Such technologies enable applications ranging from context-aware computing, to support for cognitively impaired people, to monitoring of activities of daily living.


Speech Communication | 2008

Multisensory processing for speech enhancement and magnitude-normalized spectra for speech modeling

Amarnag Subramanya; Zhengyou Zhang; Zicheng Liu; Alex Acero

In this paper, we tackle the problem of speech enhancement from two fronts: speech modeling and multisensory input. We present a new speech model based on statistics of magnitude-normalized complex spectra of speech signals. By performing magnitude normalization, we are able to remove the large intra- and inter-speaker variation in speech energy and to build a better speech model with a smaller number of Gaussian components. To deal with real-world problems involving multiple noise sources, we propose to use multiple heterogeneous sensors; in particular, we have developed microphone headsets that combine a conventional air microphone and a bone sensor. The bone sensor makes direct contact with the speaker's temple (the area behind the ear) and captures the vibrations of the bones and skin during vocalization. The signals captured by the bone microphone, though distorted, contain useful audio information, especially in the low frequency range, and, more importantly, they are very robust to external noise sources (stationary or not). By fusing the bone-channel signals with the air-microphone signals, substantially improved speech signals are obtained.
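
One plausible reading of the normalization step is sketched below: divide each frame's complex spectrum by its overall magnitude so that energy variation is factored out before modeling. The per-frame Euclidean norm used here is an assumption; the paper's exact normalizer may differ.

```python
# Sketch of magnitude normalization: scale each frame's complex spectrum to
# unit norm, removing intra-/inter-speaker energy variation before modeling.
import numpy as np

def magnitude_normalized_spectra(x, frame=512, hop=256):
    """x: mono signal. Returns (n_frames, frame//2+1) complex spectra, each
    scaled to unit norm, plus the per-frame gains that were removed."""
    win = np.hanning(frame)
    n = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * win for i in range(n)])
    X = np.fft.rfft(frames, axis=1)
    gain = np.linalg.norm(X, axis=1, keepdims=True)        # frame magnitude
    return X / np.maximum(gain, 1e-12), gain

# Demo: a signal with strongly varying energy still yields unit-norm spectra.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000) * np.linspace(0.1, 2.0, 16000)
Xn, g = magnitude_normalized_spectra(x)
print(np.allclose(np.linalg.norm(Xn, axis=1), 1.0))        # energy removed
```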


IEEE Signal Processing Letters | 2007

Automatic Removal of Typed Keystrokes From Speech Signals

Amarnag Subramanya; Michael L. Seltzer; Alejandro Acero

Computers are increasingly being used to capture audio in applications such as video conferencing and meeting recording. In many of these applications, users may be typing on the keyboard at the same time, e.g., to take notes or search for information. As a result, the captured speech signals are significantly corrupted by the sounds of the user's keystrokes. In this paper we propose an algorithm to automatically detect and remove keystrokes from speech signals. The proposed method does not require any user training or enrollment and is computationally efficient. The keystroke removal algorithm generates significantly enhanced speech as measured by both user listening tests and speech recognition experiments.
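
A heavily simplified sketch of the detect-and-remove idea: flag frames whose energy spikes well above their neighbours and patch them from adjacent clean frames. The paper's detector and reconstruction are more sophisticated; the frame size and threshold here are arbitrary assumptions.

```python
# Illustrative transient removal: detect keystroke-like bursts as frames with
# anomalously high energy and patch them by interpolating clean neighbours.
import numpy as np

def remove_transients(x, frame=256, thresh=4.0):
    """Replace frames whose energy exceeds `thresh` times the median frame
    energy with a linear interpolation of the neighbouring frames."""
    n = len(x) // frame
    frames = x[:n * frame].reshape(n, frame).copy()
    energy = (frames ** 2).mean(axis=1)
    bad = energy > thresh * np.median(energy)
    for i in np.where(bad)[0]:
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        if not bad[lo] and not bad[hi]:
            frames[i] = 0.5 * (frames[lo] + frames[hi])   # simple patch
    return frames.reshape(-1)

# Demo: speech-like noise with one loud "keystroke" click inserted.
rng = np.random.default_rng(1)
x = 0.1 * rng.standard_normal(4096)
x[2000:2040] += 2.0 * rng.standard_normal(40)             # the click
y = remove_transients(x)
print(float(np.abs(x[1792:2048]).max()), float(np.abs(y[1792:2048]).max()))
```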


Multimedia Signal Processing | 2006

Hierarchical Models for Activity Recognition

Amarnag Subramanya; Alvin Raj; Jeff A. Bilmes; Dieter Fox

In this paper we propose a hierarchical dynamic Bayesian network to jointly recognize the activity and environment of a person. The hierarchical nature of the model allows us to implicitly learn data-driven decompositions of complex activities into simpler sub-activities. Our experiments show that the hierarchical structure better explains the observed data, leading to better performance. We also show that jointly estimating a person's activity and environment outperforms systems in which they are estimated separately. The proposed model yields about 10% absolute improvement in accuracy over existing systems.
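
A toy flat-HMM stand-in (not the paper's hierarchical DBN) for the joint-estimation idea: infer over the product (activity, environment) state space, with a compatibility term coupling the two chains. All probabilities below are invented for illustration.

```python
# Toy sketch of joint (activity, environment) estimation over a product
# state space, showing why joint inference can beat estimating each alone.
import numpy as np

acts, envs = ["walk", "drive"], ["indoors", "outdoors"]
Ta = np.array([[0.9, 0.1], [0.1, 0.9]])            # P(a'|a), assumption
Te = np.array([[0.95, 0.05], [0.05, 0.95]])        # P(e'|e), assumption
compat = np.array([[1.0, 1.0], [0.05, 1.0]])       # compat[a,e]: drive+indoors rare

# Joint transition over the 4 product states, coupled by the compatibility term.
T = (Ta[:, None, :, None] * Te[None, :, None, :] * compat[None, None, :, :])
T = T.reshape(4, 4)
T /= T.sum(axis=1, keepdims=True)

def forward(obs_lik):
    """obs_lik: (steps, 4) likelihoods over flattened (a, e) joint states.
    Returns the filtered posterior as a (2, 2) array indexed [a, e]."""
    alpha = np.full(4, 0.25)
    for lik in obs_lik:
        alpha = (alpha @ T) * lik
        alpha /= alpha.sum()
    return alpha.reshape(2, 2)

# Evidence weakly favours "drive", with an extra cue for drive+indoors
# (flat index 2); the compatibility term downweights that combination.
obs = np.tile([1.0, 1.0, 1.2, 1.2], (10, 1))
obs[:, 2] *= 2.0
print(forward(obs).round(3))
```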


International Conference on Acoustics, Speech, and Signal Processing | 2007

A Generative-Discriminative Framework using Ensemble Methods for Text-Dependent Speaker Verification

Amarnag Subramanya; Zhengyou Zhang; Arun C. Surendran; Patrick Nguyen; Mukund Narasimhan; Alex Acero

Speaker verification can be treated as a statistical hypothesis testing problem. The most commonly used approach is the likelihood ratio test (LRT), which can be shown to be optimal by the Neyman-Pearson lemma. However, in most practical situations the Neyman-Pearson lemma does not apply. In this paper, we present a more robust approach that makes use of a hybrid generative-discriminative framework for text-dependent speaker verification. Our algorithm uses generative models to learn the characteristics of a speaker and then discriminative models to discriminate between the speaker and an impostor. One of the advantages of the proposed algorithm is that it does not require us to retrain the generative model. The proposed model yields, on average, a 36.41% relative improvement in EER over an LRT.
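
The LRT baseline this paper improves on can be sketched with GMMs: score an utterance by the log-likelihood ratio between a speaker model and a background (impostor) model, then threshold. The discriminative ensemble stage of the paper is not shown, and the synthetic features below stand in for real acoustic features.

```python
# GMM-based likelihood ratio test for speaker verification (the baseline the
# paper compares against). Synthetic data stands in for MFCC-like features.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
speaker_feats = rng.normal(loc=1.0, size=(500, 13))    # stand-in frames
background    = rng.normal(loc=0.0, size=(2000, 13))

spk = GaussianMixture(n_components=4, random_state=0).fit(speaker_feats)
ubm = GaussianMixture(n_components=4, random_state=0).fit(background)

def llr(utterance):
    """Average per-frame log-likelihood ratio; accept if above a threshold."""
    return spk.score(utterance) - ubm.score(utterance)  # score() = mean loglik

genuine  = rng.normal(loc=1.0, size=(200, 13))
impostor = rng.normal(loc=0.0, size=(200, 13))
print(f"genuine LLR {llr(genuine):+.2f}, impostor LLR {llr(impostor):+.2f}")
```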


IEEE Automatic Speech Recognition and Understanding Workshop | 2007

Uncertainty in training large vocabulary speech recognizers

Amarnag Subramanya; Chris D. Bartels; Jeff A. Bilmes; Patrick Nguyen

We propose a technique for annotating data used to train a speech recognizer. The proposed scheme is based on labeling only a single frame for every word in the training set. We make use of the virtual evidence (VE) framework within a graphical model to take advantage of such data. We apply this approach to a large vocabulary speech recognition task, and show that our VE-based training scheme can improve over the performance of a system trained using sequence-labeled data by 2.8% and 2.1% on the dev01 and eval01 sets, respectively. Annotating data in the proposed scheme is not significantly slower than sequence labeling. We present timing results showing that training using the proposed approach is about 10 times faster than training using sequence-labeled data while using only about 75% of the memory.
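
A toy sketch of the virtual-evidence idea on a 2-state HMM: instead of labeling every frame, a single labeled frame contributes an extra potential that softly pins the hidden state there during inference. The models and the strength value below are illustrative assumptions, not the paper's recognizer.

```python
# Virtual evidence in a toy HMM: multiply in an extra potential at a labeled
# frame during the forward pass, rather than labeling the whole sequence.
import numpy as np

A = np.array([[0.8, 0.2], [0.2, 0.8]])     # transitions (assumption)
B = np.array([[0.6, 0.4], [0.3, 0.7]])     # P(obs | state), obs in {0, 1}

def forward(obs, ve=None):
    """ve: dict frame -> (state, strength); multiplies in a virtual-evidence
    potential favouring `state` at that frame. Returns log P(obs, VE)."""
    alpha = np.full(2, 0.5) * B[:, obs[0]]
    logz = 0.0
    for t, o in enumerate(obs):
        if t > 0:
            alpha = (alpha @ A) * B[:, o]
        if ve and t in ve:
            s, w = ve[t]
            pot = np.where(np.arange(2) == s, 1.0, 1.0 - w)   # soft pin
            alpha = alpha * pot
        logz += np.log(alpha.sum())
        alpha /= alpha.sum()
    return logz

obs = [0, 0, 1, 1, 1]
print(forward(obs))                        # likelihood with no labels
print(forward(obs, ve={2: (1, 0.9)}))      # frame 2 softly labeled as state 1
```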

Collaboration


Dive into Amarnag Subramanya's collaborations.

Top Co-Authors


Jeff A. Bilmes

University of Washington


Dieter Fox

University of Washington


Alvin Raj

University of Washington
