Publication


Featured research published by Sathiya Keerthi Selvaraj.


conference on information and knowledge management | 2011

A pairwise ranking based approach to learning with positive and unlabeled examples

Sundararajan Sellamanickam; Priyanka Garg; Sathiya Keerthi Selvaraj

A large fraction of binary classification problems arising in web applications are of the type where the positive class is well defined and compact while the negative class comprises everything else in the distribution for which the classifier is developed; it is hard to represent and sample from such a broad negative class. Classifiers based only on positive and unlabeled examples reduce human annotation effort significantly by removing the burden of choosing a representative set of negative examples. Various methods have been proposed in the literature for building such classifiers. Of these, the state-of-the-art methods are Biased SVM and Elkan and Noto's method. While these methods often work well in practice, they are computationally expensive since hyperparameter tuning is very important, particularly when the set of labeled positive examples is small and class imbalance is high. In this paper we propose a pairwise ranking based approach to learning from positive and unlabeled examples (LPU) and we give a theoretical justification for it. We present a pairwise RankSVM (RSVM) based method for our approach. The method is simple and efficient, and its hyperparameters are easy to tune. A detailed experimental study using several benchmark datasets shows that the proposed method gives competitive classification performance compared to these state-of-the-art methods, while training 3-10 times faster. We also propose an efficient AUC based feature selection technique in the LPU setting and demonstrate its usefulness on the datasets. To get an idea of the goodness of the LPU methods we compare them against supervised learning (SL) methods that also make use of negative examples in training. SL methods give slightly better performance than LPU methods when there is a rich set of negative examples; however, they are inferior when the number of negative training examples is not large enough.
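The pairwise ranking idea in this abstract can be illustrated with a small sketch. The code below is not the authors' RSVM solver; it is a plain full-batch subgradient method on a regularized pairwise hinge loss, written under the assumption that a linear scorer should rank each labeled positive above each unlabeled example by a margin. The function names, the toy data, and the hyperparameter values (`epochs`, `lr`, `reg`) are all hypothetical choices made for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def train_pairwise_ranker(positives, unlabeled, epochs=50, lr=0.1, reg=0.01):
    """Fit a linear scorer w so that w.p exceeds w.u by a margin of 1
    for (labeled positive p, unlabeled u) pairs.

    Minimizes  reg/2 * ||w||^2 + mean over pairs of max(0, 1 - w.(p - u))
    by full-batch subgradient descent (an illustrative stand-in for a
    RankSVM solver, not the method from the paper).
    """
    dim = len(positives[0])
    w = [0.0] * dim
    n_pairs = len(positives) * len(unlabeled)
    for _ in range(epochs):
        grad = [reg * wi for wi in w]          # gradient of the L2 penalty
        for p in positives:
            for u in unlabeled:
                diff = [a - b for a, b in zip(p, u)]
                if dot(w, diff) < 1.0:         # pair violates the margin
                    for i in range(dim):
                        grad[i] -= diff[i] / n_pairs
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def auc(w, positives, negatives):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            sp, sn = dot(w, p), dot(w, n)
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data (hypothetical): the unlabeled set mixes one hidden positive
    # with three negatives, as in the LPU setting.
    labeled_pos = [[2.0, 2.1], [1.8, 2.3], [2.2, 1.9]]
    unlabeled = [[2.1, 2.0], [-2.0, -1.9], [-2.2, -2.1], [-1.8, -2.0]]
    w = train_pairwise_ranker(labeled_pos, unlabeled)
    for x in unlabeled:
        print(x, round(dot(w, x), 3))
```

Because the unlabeled set contains hidden positives, some pairs can never be separated cleanly; the hinge loss tolerates them while the dominant pairs still push true negatives below the positives. The pairwise AUC helper also hints at how a ranking measure of this kind could drive a feature scoring step, although the abstract does not spell out the paper's feature selection procedure.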


conference on information and knowledge management | 2011

Semi-supervised multi-task learning of structured prediction models for web information extraction

Paramveer S. Dhillon; Sundararajan Sellamanickam; Sathiya Keerthi Selvaraj

Extracting information from web pages is an important problem; it has several applications such as providing improved search results and constructing databases to serve user queries. In this paper we propose a novel structured prediction method to address two important aspects of the extraction problem: (1) labeled data is available only for a small number of sites and (2) a machine learned global model does not generalize adequately across many websites. For this purpose, we propose a weight space based graph regularization method. This method has several advantages. First, it can use unlabeled data to address the limited labeled data problem and falls in the class of graph regularization based semi-supervised learning approaches. Second, to address the generalization inadequacy of a global model, this method builds a local model for each website. Viewing the problem of building a local model for each website as a task, we learn the models for a collection of sites jointly; thus our method can also be seen as a graph regularization based multi-task learning approach. Learning the models jointly with the proposed method is very useful in two ways: (1) learning a local model for a website can be effectively influenced by labeled and unlabeled data from other websites; and (2) even for a website with only unlabeled examples it is possible to learn a decent local model. We demonstrate the efficacy of our method on several real-life datasets; experimental results show that significant performance improvement can be obtained by combining semi-supervised and multi-task learning in a single framework.


conference on information and knowledge management | 2011

Semi-supervised SVMs for classification with unknown class proportions and a small labeled dataset

Sathiya Keerthi Selvaraj; Bigyan Bhar; Sundararajan Sellamanickam; Shirish Krishnaj Shevade

In the design of practical web page classification systems one often encounters a situation in which the labeled training set is created by choosing some examples from each class, but the class proportions in this set are not the same as those in the test distribution to which the classifier will actually be applied. The problem is made worse when the amount of training data is also small. In this paper we explore and adapt binary SVM methods that make use of unlabeled data from the test distribution, viz., Transductive SVMs (TSVMs) and expectation regularization/constraint (ER/EC) methods, to deal with this situation. We empirically show that when the labeled training data is small, a TSVM designed using the class ratio tuned by minimizing the loss on the labeled set yields the best performance; its performance is good even when the deviation between the class ratios of the labeled training set and the test set is quite large. When the labeled training data is sufficiently large, an unsupervised Gaussian mixture model can be used to get a very good estimate of the class ratio in the test set; also, when this estimate is used, both TSVM and ER/EC give their best possible performance, with TSVM coming out superior. The ideas in the paper can be easily extended to multi-class SVMs and MaxEnt models.


Archive | 2006

Large scale semi-supervised linear support vector machines

Vikas Sindhwani; Sathiya Keerthi Selvaraj


Archive | 2009

Apparatus and methods for concept-centric information extraction

Daniel Kifer; Srujana Merugu; Ankur Jain; Sathiya Keerthi Selvaraj; Alok S. Kirpal; Philip Bohannon; Raghu Ramakrishnan


Archive | 2007

System and method for sparse gaussian process regression using predictive measures

Sundararajan Sellamanickam; Sathiya Keerthi Selvaraj


Archive | 2010

Large scale entity-specific resource classification

Sathiya Keerthi Selvaraj; Philip Bohannon; Mridul Muralidharan; Cong Yu; Ashwin Machanavajjhala; Arun Shankar Iyer; Sundararajan Sellamanickam


Archive | 2009

Efficient algorithm for pairwise preference learning

Olivier Chapelle; Sathiya Keerthi Selvaraj


Archive | 2009

Transductive approach to category-specific record attribute extraction

Rahul Gupta; Sathiya Keerthi Selvaraj; Daniel Kifer; Srujana Merugu


Archive | 2010

Extracting rich temporal context for business entities and events

Srujana Merugu; Sathiya Keerthi Selvaraj; Vipul Agarwal; Arup Kumar Choudhury
