
Publication


Featured research published by Karthik Visweswariah.


International Conference on Acoustics, Speech, and Signal Processing | 2008

Boosted MMI for model and feature-space discriminative training

Daniel Povey; Dimitri Kanevsky; Brian Kingsbury; Bhuvana Ramabhadran; George Saon; Karthik Visweswariah

We present a modified form of the maximum mutual information (MMI) objective function which gives improved results for discriminative training. The modification consists of boosting the likelihoods of paths in the denominator lattice that have a higher phone error relative to the correct transcript, using the same phone accuracy function that is used in Minimum Phone Error (MPE) training. We combine this with another improvement to our implementation of the Extended Baum-Welch update equations for MMI, namely the canceling of any shared part of the numerator and denominator statistics on each frame (a procedure already done in MPE). This change affects the Gaussian-specific learning rate. We also investigate a further modification whereby we replace I-smoothing to the ML estimate with I-smoothing to the previous iteration's value. Boosted MMI gives better results than MPE in both model and feature-space discriminative training, although not consistently.
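In symbols, the boosted objective described above can be written as follows (a reconstruction from the abstract's description, with assumed notation: $X_r$ is the acoustic data for utterance $r$, $s_r$ its reference transcript, $A(s, s_r)$ the MPE-style phone accuracy of hypothesis $s$, $b$ the boosting factor, and $\kappa$ the acoustic scale):

```latex
\mathcal{F}_{\mathrm{bMMI}}(\lambda)
  = \sum_r \log
    \frac{p_\lambda(X_r \mid M_{s_r})^{\kappa}\, P(s_r)}
         {\sum_s p_\lambda(X_r \mid M_s)^{\kappa}\, P(s)\, e^{-b\,A(s,\,s_r)}}
```

The factor $e^{-b\,A(s,\,s_r)}$ inflates the denominator contribution of paths with many phone errors, which is exactly the boosting the abstract describes; setting $b = 0$ recovers standard MMI.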


Conference on Information and Knowledge Management | 2010

PROSPECT: a system for screening candidates for recruitment

Amit Singh; Catherine Rose; Karthik Visweswariah; Vijil Chenthamarakshan; Nandakishore Kambhatla

Companies often receive thousands of resumes for each job posting and employ dedicated screeners to short-list qualified applicants. In this paper, we present PROSPECT, a decision support tool to help these screeners short-list resumes efficiently. PROSPECT mines resumes to extract salient aspects of candidate profiles, such as skills, experience in each skill, education details, and past experience. Extracted information is presented in the form of facets to aid recruiters in the task of screening. We also employ information retrieval techniques to rank all applicants for a given job opening. In our experiments we show that the extracted information improves our ranking by 30%, thereby making the screening task simpler and more efficient.
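As an illustration of the ranking component, the sketch below scores resumes against a job posting with a simple TF-IDF cosine similarity. This is a minimal stand-in under assumed tokenization and weighting, not the actual PROSPECT ranker, which combines extracted facets with its retrieval model.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    # term frequency per document, idf computed over this small corpus
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # count each term once per document
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({t: tf[t] * math.log((1 + n) / (1 + df[t])) for t in tf})
    return vecs

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_resumes(job_posting, resumes):
    # returns resume indices ordered by similarity to the posting
    vecs = tf_idf_vectors([job_posting] + resumes)
    job_vec, resume_vecs = vecs[0], vecs[1:]
    scores = [(cosine(rv, job_vec), i) for i, rv in enumerate(resume_vecs)]
    return [i for _, i in sorted(scores, reverse=True)]
```

A resume sharing the posting's skill terms is ranked above an unrelated one; real systems would add facet filters (years of experience, education) on top of this score.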


Mobile Data Management | 2009

CAESAR: A Context-Aware, Social Recommender System for Low-End Mobile Devices

Lakshmish Ramaswamy; Deepak P; Ramana V. Polavarapu; Kutila Gunasekera; Dinesh Garg; Karthik Visweswariah; Shivkumar Kalyanaraman

Mobile-enabled social network applications are becoming increasingly popular. Most current social network applications have been designed for high-end mobile devices, and they rely upon features such as GPS, capabilities of the world wide web, and rich media support. However, a significant fraction of the mobile user base, especially in the developing world, owns low-end devices that are only capable of voice calls and short text messages (SMS). In this context, a natural question is whether one can design meaningful social network-based applications that work well with these simple devices, and if so, what the real challenges are. Towards answering these questions, this paper presents a social network-based recommender system that has been explicitly designed to work even with devices that support only phone calls and SMS. Our design incorporates three features that complement each other to derive highly targeted ads. First, we analyze information such as customers' address books to estimate the level of social affinity among various users. This social affinity information is used to identify the recommendations to be sent to an individual user. Second, we combine the social affinity information with the spatio-temporal context of users and their historical responses to further refine the set of recommendations and to decide when a recommendation should be sent. Third, social affinity computation and spatio-temporal contextual association are continuously tuned through user feedback. We outline the challenges in building such a system and describe approaches to deal with them.
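The first feature, affinity derived from address books, can be sketched as below. The Jaccard overlap of contact sets and the affinity-weighted vote are illustrative assumptions of ours; the abstract does not specify CAESAR's actual affinity model.

```python
def social_affinity(contacts_a, contacts_b):
    # Jaccard overlap of two users' address books as a crude affinity proxy
    a, b = set(contacts_a), set(contacts_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def recommend(user, address_books, liked_items, top_k=2):
    # score each candidate item by the affinity-weighted likes of other users
    scores = {}
    for other, items in liked_items.items():
        if other == user:
            continue
        w = social_affinity(address_books[user], address_books[other])
        for item in items:
            scores[item] = scores.get(item, 0.0) + w
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Items liked by users who share many contacts with the target user rank first; the spatio-temporal refinement and feedback tuning described above would adjust these weights further.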


IEEE Transactions on Speech and Audio Processing | 2005

Subspace constrained Gaussian mixture models for speech recognition

Scott Axelrod; Vaibhava Goel; Ramesh A. Gopinath; Peder A. Olsen; Karthik Visweswariah

A standard approach to automatic speech recognition uses hidden Markov models whose state-dependent distributions are Gaussian mixture models. Each Gaussian can be viewed as an exponential model whose features are linear and quadratic monomials in the acoustic vector. We consider here models in which the weight vectors of these exponential models are constrained to lie in an affine subspace shared by all the Gaussians. This class of models includes Gaussian models with linear constraints placed on the precision (inverse covariance) matrices (such as diagonal covariance, maximum likelihood linear transformation, or extended maximum likelihood linear transformation), as well as the LDA/HLDA models used for feature selection, which tie the part of the Gaussians in the directions not used for discrimination. In this paper, we present algorithms for training these models under a maximum likelihood criterion. We present experiments on both small vocabulary, resource-constrained, grammar-based tasks and large vocabulary, unconstrained-resource tasks to explore the rather large parameter space of models that fit within our framework. In particular, we demonstrate that significant improvements can be obtained in both word error rate and computational complexity.
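Concretely, writing each Gaussian as an exponential model in the linear and quadratic features $f(x) = \bigl(x,\ \mathrm{vec}(x x^{\top})\bigr)$, the constraint described above takes the form (notation assumed):

```latex
\log \mathcal{N}(x;\mu_g,\Sigma_g) = \psi_g^{\top} f(x) - \log Z(\psi_g),
\qquad
\psi_g = \psi_0 + \sum_{k=1}^{K} \lambda_{gk}\, \psi_k ,
```

where the directions $\psi_k$ are shared by all Gaussians and only the coefficients $\lambda_{gk}$ are Gaussian-specific. Restricting the quadratic parts of the $\psi_k$ to diagonal matrices recovers diagonal-covariance models, rank-one choices correspond to extended maximum likelihood linear transformation, and so on.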


International Conference on Acoustics, Speech, and Signal Processing | 2003

Maximum likelihood training of subspaces for inverse covariance modeling

Karthik Visweswariah; Peder A. Olsen; Ramesh A. Gopinath; Scott Axelrod

Speech recognition systems typically use mixtures of diagonal Gaussians to model the acoustics. Using Gaussians with a more general covariance structure can give improved performance; EMLLT and SPAM models give improvements by restricting the inverse covariance to a linear/affine subspace spanned by rank-one and full-rank matrices, respectively. We consider training these subspaces to maximize likelihood. For EMLLT, ML training of the subspace results in significant gains over the scheme proposed by Olsen and Gopinath (Proceedings of ICASSP, 2002). For SPAM, ML training of the subspace slightly improves performance over the method reported by Axelrod, Gopinath and Olsen (Proceedings of ICSLP, 2002). For the same subspace size, an EMLLT model is computationally more efficient than a SPAM model, while the SPAM model is more accurate. This paper proposes a hybrid method of structuring the inverse covariances that both has good accuracy and is computationally efficient.
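Up to constants, the likelihood being maximized can be written in terms of the per-Gaussian sample covariances $\hat{\Sigma}_g$ and frame counts $n_g$ (a standard Gaussian likelihood identity, stated here under assumed notation):

```latex
\mathcal{L}\bigl(\lambda, \{a_k\}\bigr)
  = \tfrac{1}{2}\sum_g n_g \Bigl(\log\det P_g - \operatorname{tr}\bigl(P_g \hat{\Sigma}_g\bigr)\Bigr),
\qquad
P_g = \sum_{k=1}^{K} \lambda_{gk}\, a_k a_k^{\top}\ \text{(EMLLT)}
\quad\text{or}\quad
P_g = \sum_{k=1}^{K} \lambda_{gk}\, S_k\ \text{(SPAM)},
```

so ML training of the subspace means optimizing the shared basis ($a_k$ or $S_k$) jointly with the per-Gaussian coefficients $\lambda_{gk}$, subject to each precision $P_g$ remaining positive definite.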


Multimodal Technologies for Perception of Humans | 2008

The IBM RT07 Evaluation Systems for Speaker Diarization on Lecture Meetings

Jing Huang; Etienne Marcheret; Karthik Visweswariah; Gerasimos Potamianos

We present the IBM systems for the Rich Transcription 2007 (RT07) speaker diarization evaluation task on lecture meeting data. We first overview our baseline system that was developed last year, as part of our speech-to-text system for the RT06s evaluation. We then present a number of simple schemes considered this year in our effort to improve speaker diarization performance, namely: (i) A better speech activity detection (SAD) system, a necessary pre-processing step to speaker diarization; (ii) Use of word information from a speaker-independent speech recognizer; (iii) Modifications to speaker cluster merging criteria and the underlying segment model; and (iv) Use of speaker models based on Gaussian mixture models, and their iterative refinement by frame-level re-labeling and smoothing of decision likelihoods. We report development experiments on the RT06s evaluation test set that demonstrate that these methods are effective, resulting in dramatic performance improvements over our baseline diarization system. For example, changes in the cluster segment models and cluster merging methodology result in a 24.2% relative reduction in speaker error rate, whereas use of the iterative model refinement process and word-level alignment produce 36.0% and 9.2% relative speaker error reductions, respectively. The importance of the SAD subsystem is also shown, with SAD error reduction from 12.3% to 4.3% translating to a 20.3% relative reduction in speaker error rate. Unfortunately, however, the developed diarization system depends heavily on appropriately tuned thresholds in the speaker cluster merging process. Possibly as a result of over-tuning such thresholds, performance on the RT07 evaluation test set degrades significantly compared to that observed on development data. Nevertheless, our experiments show that the introduced techniques of cluster merging, speaker model refinement and alignment remain valuable in the RT07 evaluation.
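Cluster-merging criteria of the kind mentioned above are typically ΔBIC-style comparisons: a merged segment model is scored against two separate ones, with a complexity penalty for the extra parameters. The 1-D single-Gaussian sketch below is illustrative only; the evaluated system's segment models and tuned thresholds differ.

```python
import math

def gauss_loglik(xs):
    # log-likelihood of samples under their own ML 1-D Gaussian
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    var = max(var, 1e-6)  # variance floor to avoid log(0)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def delta_bic(seg_a, seg_b, lam=1.0):
    # positive => merged model preferred after the complexity penalty: merge
    merged = gauss_loglik(seg_a + seg_b)
    separate = gauss_loglik(seg_a) + gauss_loglik(seg_b)
    # merging removes one Gaussian, i.e. 2 parameters (mean, variance) in 1-D
    penalty = lam * 0.5 * 2 * math.log(len(seg_a) + len(seg_b))
    return merged - separate + penalty
```

Agglomerative diarization repeatedly merges the pair with the largest positive ΔBIC and stops when none remains; the paper's observation about threshold over-tuning corresponds to the sensitivity of this stopping rule (here, the choice of `lam`).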


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Gaussian mixture models with covariances or precisions in shared multiple subspaces

Satya Dharanipragada; Karthik Visweswariah

We introduce a class of Gaussian mixture models (GMMs) in which the covariances or the precisions (inverse covariances) are restricted to lie in subspaces spanned by rank-one symmetric matrices. The rank-one basis matrices are shared among the Gaussians according to a sharing structure. We describe an algorithm for estimating the parameters of the GMM in a maximum likelihood framework given a sharing structure. We employ these models for modeling the observations in the hidden states of a hidden Markov model based speech recognition system. We show that this class of models provides improvements in accuracy and computational efficiency over well-known covariance modeling techniques such as classical factor analysis, shared factor analysis, and maximum likelihood linear transformation based models, which are special instances of this class. We also investigate different sharing mechanisms. We show that, for the same number of parameters, modeling precisions leads to better performance than modeling covariances. Modeling precisions also gives a distinct advantage in computational and memory requirements.
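The computational advantage of modeling precisions in a shared rank-one subspace can be seen directly: the projections $a_k^\top x$ are computed once per frame and reused by every Gaussian's quadratic term. A small numpy sketch (dimensions, basis, and coefficients are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, G = 4, 6, 3  # feature dim, basis size, number of Gaussians

# shared rank-one directions a_k and per-Gaussian coefficients lam[g, k]
A = rng.standard_normal((K, D))
lam = rng.uniform(0.1, 1.0, size=(G, K))

def quad_terms_naive(x):
    # x^T P_g x with each P_g = sum_k lam[g, k] a_k a_k^T built explicitly
    out = []
    for g in range(G):
        P = sum(lam[g, k] * np.outer(A[k], A[k]) for k in range(K))
        out.append(x @ P @ x)
    return np.array(out)

def quad_terms_shared(x):
    # projections a_k^T x computed once, reused by every Gaussian
    proj_sq = (A @ x) ** 2   # K shared values
    return lam @ proj_sq     # G quadratic terms, one matrix-vector product

x = rng.standard_normal(D)
assert np.allclose(quad_terms_naive(x), quad_terms_shared(x))
```

The shared path costs O(KD) per frame plus O(K) per Gaussian, instead of O(D^2) per Gaussian, which is the memory and compute advantage the abstract claims for precision modeling.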


International Symposium on Information Theory | 2000

Output distribution of the Burrows-Wheeler transform

Karthik Visweswariah; Sanjeev R. Kulkarni; Sergio Verdú

The Burrows-Wheeler transform is a block-sorting algorithm which has been shown empirically to be useful in compressing text data. In this paper we study the output distribution of the transform for i.i.d. sources, tree sources, and stationary ergodic sources. We also give analytic bounds on the performance of some universal compression schemes that use the Burrows-Wheeler transform.
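For reference, the transform itself is short enough to state exactly; the usual sentinel-based formulation sorts all rotations of the input and emits the last column:

```python
def bwt(s, sentinel="$"):
    # Burrows-Wheeler transform: last column of the sorted rotations of s + sentinel
    s = s + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def inverse_bwt(last, sentinel="$"):
    # reconstruct the original string by repeatedly prepending and re-sorting
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]
```

The transform groups symbols with similar contexts, e.g. `bwt("banana")` gives `"annb$aa"`; the paper's question is how close this output is, distributionally, to a piecewise i.i.d. sequence for various source classes.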


International Conference on Acoustics, Speech, and Signal Processing | 2003

Dimensional reduction, covariance modeling, and computational complexity in ASR systems

Scott Axelrod; Ramesh A. Gopinath; Peder A. Olsen; Karthik Visweswariah

We study acoustic modeling for speech recognition using mixtures of exponential models with linear and quadratic features tied across all context dependent states. These models are one version of the SPAM models introduced by Axelrod, Gopinath and Olsen (see Proc. ICSLP, 2002). They generalize diagonal covariance, MLLT, EMLLT, and full covariance models. Reduction of the dimension of the acoustic vectors using LDA/HDA projections corresponds to a special case of reducing the exponential model feature space. We see, in one speech recognition task, that SPAM models on an LDA projected space of varying dimensions achieve a significant fraction of the WER improvement in going from MLLT to full covariance modeling, while maintaining the low computational cost of the MLLT models. Further, the feature precomputation cost can be minimized using the hybrid feature technique of Visweswariah, Olsen, Gopinath and Axelrod (see ICASSP 2003), and the number of Gaussians one needs to compute can be greatly reduced using hierarchical clustering of the Gaussians (with a fixed feature space). Finally, we show that reducing the quadratic and linear feature spaces separately produces models with better accuracy than, but computational complexity comparable to, LDA/HDA-based models.
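The hierarchical-clustering speedup mentioned above amounts to scoring cluster centers first and fully evaluating only the Gaussians in the closest clusters. A deterministic toy sketch (the means, grouping, and distance criterion are made up for illustration; real systems cluster thousands of Gaussians over several levels):

```python
import numpy as np

# toy acoustic model: 6 Gaussian means in two well-separated groups
means = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
assign = np.array([0, 0, 0, 1, 1, 1])  # cluster membership of each Gaussian
centers = np.stack([means[assign == c].mean(0) for c in (0, 1)])

def shortlist(x, top=1):
    # score the cheap cluster centers first, then return only the Gaussians
    # inside the `top` closest clusters for full likelihood evaluation
    d = ((centers - x) ** 2).sum(-1)
    best = np.argsort(d)[:top]
    return np.flatnonzero(np.isin(assign, best))

# a frame near the first group only needs the first three Gaussians
print(shortlist(np.array([0.05, 0.05])))  # prints [0 1 2]
```

Only the short-listed Gaussians incur the full quadratic-feature computation, which is where the reduction in the "number of Gaussians one needs to compute" comes from.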


IEEE Transactions on Information Theory | 2000

Separation of random number generation and resolvability

Karthik Visweswariah; Sanjeev R. Kulkarni; Sergio Verdú

We consider the problem of determining when a given source can be used to approximate the output due to any input to a given channel. We provide achievability and converse results for a general source and channel. For the special case of a full-rank discrete memoryless channel we give a stronger converse result than we can give for a general channel.
