Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Shyam Sundar Rajaram is active.

Publication


Featured research published by Shyam Sundar Rajaram.


Knowledge Discovery and Data Mining | 2008

Locality sensitive hash functions based on concomitant rank order statistics

Kave Eshghi; Shyam Sundar Rajaram

Locality Sensitive Hash functions are invaluable tools for approximate near-neighbor problems in high-dimensional spaces. In this work, we focus on LSH schemes where the similarity metric is the cosine measure. The contribution of this work is a new class of locality sensitive hash functions for the cosine similarity measure based on the theory of concomitants, which arises in order statistics. Consider <i>n</i> i.i.d. sample pairs, {(<i>X</i><sub>1</sub>, <i>Y</i><sub>1</sub>), (<i>X</i><sub>2</sub>, <i>Y</i><sub>2</sub>), …, (<i>X</i><sub><i>n</i></sub>, <i>Y</i><sub><i>n</i></sub>)}, obtained from a bivariate distribution <i>f</i>(<i>X</i>, <i>Y</i>). Concomitant theory captures the relation between the order statistics of <i>X</i> and <i>Y</i> in the form of a rank distribution given by Prob(Rank(<i>Y</i><sub><i>i</i></sub>) = <i>j</i> | Rank(<i>X</i><sub><i>i</i></sub>) = <i>k</i>). We exploit properties of the rank distribution to develop a locality sensitive hash family with excellent collision rate properties for the cosine measure. The computational cost of the basic algorithm is high for large hash lengths. We introduce several approximations, based on the properties of concomitant order statistics and discrete transforms, that perform almost as well with significantly reduced computational cost. We demonstrate the practical applicability of our algorithms by using them to find similar images in an image repository.
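The concomitant-based family described above targets the same setting as the classical random-hyperplane LSH scheme for cosine similarity, in which each hash bit is the sign of a projection onto a random Gaussian vector and two vectors collide on a bit with probability 1 − θ/π, where θ is the angle between them. A minimal sketch of that baseline family (an illustration only, not the paper's concomitant construction):

```python
import numpy as np

def cosine_lsh_signature(x, planes):
    """Sign-bit signature of x: one bit per random hyperplane."""
    return (planes @ x >= 0).astype(np.uint8)

rng = np.random.default_rng(0)
dim, bits = 64, 256
planes = rng.standard_normal((bits, dim))   # one Gaussian hyperplane per bit

a = rng.standard_normal(dim)
b = a + 0.1 * rng.standard_normal(dim)      # near-duplicate of a
c = rng.standard_normal(dim)                # unrelated vector

# The fraction of matching bits estimates 1 - angle(x, y) / pi,
# so near-duplicates agree on far more bits than unrelated vectors.
sim_ab = np.mean(cosine_lsh_signature(a, planes) == cosine_lsh_signature(b, planes))
sim_ac = np.mean(cosine_lsh_signature(a, planes) == cosine_lsh_signature(c, planes))
```

With 256 bits, the near-duplicate pair typically agrees on well over 90% of bits, while two random Gaussian vectors agree on roughly half.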


Knowledge Discovery and Data Mining | 2009

Feature shaping for linear SVM classifiers

George Forman; Martin B. Scholz; Shyam Sundar Rajaram

Linear classifiers have been shown to be effective for many discrimination tasks. Irrespective of the learning algorithm itself, the final classifier has a weight to multiply by each feature. This suggests that ideally each input feature should be linearly correlated (or anti-correlated) with the target variable, whereas raw features may be highly non-linear. In this paper, we attempt to re-shape each input feature so that it is appropriate to use with a linear weight, and to scale the different features in proportion to their predictive value. We demonstrate that this pre-processing is beneficial for linear SVM classifiers on a large benchmark of text classification tasks as well as on UCI datasets.
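One common way to realize this kind of per-feature shaping (a hedged sketch in the spirit of the abstract, not the paper's exact transform; the function name and binning choices are illustrative) is to bin each raw feature and replace its value with the smoothed log-odds of the positive class in that bin, so that a single linear weight can fit an arbitrary monotone or even U-shaped relation to the target:

```python
import numpy as np

def shape_feature(values, labels, n_bins=10, smoothing=1.0):
    """Map raw feature values to smoothed per-bin log-odds of the positive
    class, so a linear weight fits a non-linear feature-target relation."""
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, values, side="right") - 1, 0, n_bins - 1)
    shaped = np.empty_like(values, dtype=float)
    for b in range(n_bins):
        mask = bins == b
        pos = labels[mask].sum() + smoothing
        neg = (~labels[mask].astype(bool)).sum() + smoothing
        shaped[mask] = np.log(pos / neg)
    return shaped

# Demo: a U-shaped relation a raw linear weight cannot capture.
rng = np.random.default_rng(0)
values = rng.standard_normal(1000)
labels = (values ** 2 > 1).astype(int)   # positive iff |value| > 1
shaped = shape_feature(values, labels)
```

The raw feature is nearly uncorrelated with the label by symmetry, while the shaped version correlates strongly, which is exactly the property a linear SVM can exploit.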


Knowledge Discovery and Data Mining | 2008

Scaling up text classification for large file systems

George Forman; Shyam Sundar Rajaram

We combine the speed and scalability of information retrieval with the generally superior classification accuracy offered by machine learning, yielding a two-phase text classifier that can scale to very large document corpora. We investigate the effect of different methods of formulating the query from the training set, as well as of varying the query size. In empirical tests on the Reuters RCV1 corpus of 806,000 documents, we find that runtime was easily reduced by a factor of 27, with a somewhat surprising gain in F-measure compared with traditional text classification.
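The two-phase structure can be sketched in a few lines (names and the toy corpus are hypothetical; the paper's retrieval phase and classifier are far more sophisticated): phase one uses a cheap keyword query built from indicative training terms to retrieve a candidate set, and phase two runs the costlier classifier only on those candidates, so most of the corpus is never scored.

```python
def two_phase_classify(corpus, query_terms, classifier, query_size=5):
    """Phase 1: cheap IR filter by query-term overlap.
    Phase 2: run the (expensive) classifier only on the candidates."""
    query = set(query_terms[:query_size])
    candidates = [doc for doc in corpus if query & set(doc.split())]
    return [doc for doc in candidates if classifier(doc)]

corpus = ["grain exports rise", "oil prices fall",
          "wheat grain harvest", "tech stocks rally"]
# Toy classifier standing in for a trained model.
hits = two_phase_classify(corpus, ["grain", "wheat"],
                          classifier=lambda d: "grain" in d)
```

Only two of the four documents ever reach the classifier here; on an 806,000-document corpus the same filtering is what yields the reported speedup.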


European Conference on Machine Learning | 2008

Client-Friendly Classification over Random Hyperplane Hashes

Shyam Sundar Rajaram; Martin B. Scholz

In this work, we introduce a powerful and general feature representation based on a locality sensitive hash scheme called random hyperplane hashing. We address the problem of centrally learning (linear) classification models from data that is distributed across a number of clients, and of subsequently deploying these models on the same clients. Our main goal is to balance the accuracy of individual classifiers against different kinds of deployment costs, including communication costs and computational complexity. We therefore systematically study how well schemes for sparse high-dimensional data adapt to the much denser representations produced by random hyperplane hashing, how much data has to be transmitted to preserve enough of the semantics of each document, and how the representations affect the overall computational complexity. This paper provides theoretical results in the form of error bounds and margin-based bounds to analyze the performance of classifiers learned over the hash-based representation. We also present empirical evidence illustrating the attractive properties of random hyperplane hashing over the conventional baseline representation of bag of words, with and without feature selection.
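The communication-cost argument can be made concrete with a small sketch (dimensions and names are illustrative, not the paper's): a sparse bag-of-words vector over a large vocabulary is mapped to a fixed number of sign bits, so each document costs a constant, tunable number of bytes to transmit to the server, regardless of vocabulary size.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab, bits = 10_000, 512
planes = rng.standard_normal((bits, vocab))   # shared random hyperplanes

def rhh_features(bow):
    """Map a bag-of-words vector to a dense ±1 vector of `bits` dims;
    packed as bits, it costs bits/8 bytes on the wire."""
    signs = np.sign(planes @ bow).astype(np.int8)
    packed = np.packbits(signs > 0)
    return signs, packed

# A document touching only 3 of 10,000 vocabulary terms.
doc = np.zeros(vocab)
doc[[3, 17, 256]] = 1.0
signs, packed = rhh_features(doc)
```

Here every document is transmitted in 64 bytes, and the client-side classifier operates on a dense 512-dimensional ±1 vector rather than the sparse 10,000-dimensional original.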


Computer Vision and Pattern Recognition | 2007

Diverse Active Ranking for Multimedia Search

Shyam Sundar Rajaram; Charlie K. Dagli; Nemanja Petrovic; Thomas S. Huang

Interactively learning from a small sample of unlabeled examples is an enormously challenging task, one that often arises in vision applications. Relevance feedback and, more recently, active learning are two standard techniques that have received much attention for solving this interactive learning problem. How best to utilize the user's labeling effort, however, remains unanswered. It has been shown in the past that labeling a diverse set of points is helpful; however, the notion of diversity has either been dependent on the learner used or been computationally expensive. In this paper, we address these issues in the bipartite ranking setting. First, we introduce a scheme for picking the query set to be labeled by an oracle so that it aids us in learning the ranker in as few active learning rounds as possible. Second, we propose a fundamentally motivated, information-theoretic view of diversity and its use in a fast, non-degenerate active learning-based relevance feedback setting. Finally, we report comparative testing and results in a real-time image retrieval setting.
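The query-set selection idea can be illustrated with a simple greedy sketch (this uses a geometric distance proxy for diversity, not the paper's information-theoretic measure, and all names are hypothetical): pick the point the current ranker is most uncertain about, then repeatedly add points that trade off distance from already-chosen points against uncertainty.

```python
import numpy as np

def pick_query_set(X, scores, k):
    """Greedily pick k points to label, favoring uncertainty
    (|score| near 0) and diversity (distance from picked points)."""
    chosen = [int(np.argmin(np.abs(scores)))]        # most uncertain first
    for _ in range(k - 1):
        # Distance of every point to its nearest already-chosen point.
        d = np.min(np.linalg.norm(X[:, None] - X[chosen][None], axis=2), axis=1)
        util = d / (1.0 + np.abs(scores))            # diversity vs. uncertainty
        util[chosen] = -np.inf                       # never re-pick a point
        chosen.append(int(np.argmax(util)))
    return chosen

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.05]])
scores = np.array([0.0, 0.9, 0.1, 0.8])   # ranker scores; near 0 = uncertain
chosen = pick_query_set(X, scores, k=2)
```

The second pick skips the two points crowded next to the first one and selects the distant, still-uncertain point instead, which is the behavior a diversity term is meant to induce.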


Archive | 2009

Identifying related objects in a computer database

Kave Eshghi; Shyam Sundar Rajaram; Charlie Dagli; Ira Cohen


Archive | 2009

METHOD AND SYSTEM FOR PROCESSING WEB ACTIVITY DATA

George Forman; Evan R. Kirshenbaum; Shyam Sundar Rajaram


Archive | 2012

CONTEXTUAL APPLICATION RECOMMENDATIONS

Craig Peter Sayers; Shyam Sundar Rajaram; Mathieu Thouvenin; Shama Pagarkar


Archive | 2009

Applying non-linear transformation of feature values for training a classifier

George Forman; Martin B. Scholz; Shyam Sundar Rajaram


Archive | 2009

Method and system for developing a classification tool

Evan R. Kirshenbaum; George Forman; Shyam Sundar Rajaram

Collaboration


Dive into Shyam Sundar Rajaram's collaborations.

Top Co-Authors
