Sreangsu Acharyya
University of Texas at Austin
Publications
Featured research published by Sreangsu Acharyya.
international conference on multiple classifier systems | 2011
Ayan Acharya; Eduardo R. Hruschka; Joydeep Ghosh; Sreangsu Acharyya
The combination of multiple classifiers to generate a single classifier has been shown to be very useful in practice. Similarly, several efforts have shown that cluster ensembles can improve the quality of results as compared to a single clustering solution. These observations suggest that ensembles containing both classifiers and clusterers are potentially useful as well. Specifically, clusterers provide supplementary constraints that can improve the generalization capability of the resulting classifier. This paper introduces a new algorithm named C3E that combines ensembles of classifiers and clusterers. Our experimental evaluation of C3E shows that it provides good classification accuracies in eleven tasks derived from three real-world applications. In addition, C3E produces better results than the recently introduced Bipartite Graph-based Consensus Maximization (BGCM) Algorithm, which combines multiple supervised and unsupervised models and is the algorithm most closely related to C3E.
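The core idea behind combining classifiers with clusterers can be illustrated with a minimal sketch (not the authors' C3E implementation): classifier posteriors on the target data are smoothed toward the label distributions of points that a cluster ensemble considers similar, so cluster structure acts as a supplementary constraint. The function name, the blending parameter `alpha`, and the fixed-point iteration are all illustrative assumptions.

```python
import numpy as np

def refine_with_clusters(class_probs, similarity, alpha=0.5, iters=20):
    """Blend classifier posteriors with cluster-ensemble similarity.

    class_probs: (n, k) class-probability estimates from a classifier ensemble.
    similarity:  (n, n) co-association matrix from a cluster ensemble on the
                 target data (entry [i, j] ~ fraction of clusterings that put
                 i and j in the same cluster).
    alpha:       weight on the cluster-based smoothing term (assumed knob).
    """
    # Row-normalize the similarity matrix so each row is a distribution
    # over "cluster neighbours".
    W = similarity / similarity.sum(axis=1, keepdims=True)
    y = class_probs.copy()
    for _ in range(iters):
        # Pull each point's label distribution toward the weighted average
        # of its neighbours' distributions, anchored at the original posteriors.
        y = (1 - alpha) * class_probs + alpha * (W @ y)
    return y.argmax(axis=1)

# Six target points: the first three and the last three form tight clusters.
P = np.array([[0.9, 0.1], [0.8, 0.2], [0.4, 0.6],   # classifier wavers on point 2
              [0.1, 0.9], [0.2, 0.8], [0.2, 0.8]])
S = np.block([[np.ones((3, 3)), np.zeros((3, 3))],
              [np.zeros((3, 3)), np.ones((3, 3))]])
labels = refine_with_clusters(P, S)
```

Here point 2, which the classifiers alone would mislabel, is pulled to class 0 by its cluster neighbours.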
ACM Transactions on Knowledge Discovery From Data | 2014
Ayan Acharya; Eduardo R. Hruschka; Joydeep Ghosh; Sreangsu Acharyya
Unsupervised models can provide supplementary soft constraints to help classify new “target” data because similar instances in the target set are more likely to share the same class label. Such models can also help detect possible differences between training and target distributions, which is useful in applications where concept drift may take place, as in transfer learning settings. This article describes a general optimization framework that takes as input class membership estimates from existing classifiers learned on previously encountered “source” (or training) data, as well as a similarity matrix from a cluster ensemble operating solely on the target (or test) data to be classified, and yields a consensus labeling of the target data. More precisely, the application settings considered are nontransductive semisupervised and transfer learning scenarios where the training data are used only to build an ensemble of classifiers and are subsequently discarded before classifying the target data. The framework admits a wide range of loss functions and classification/clustering methods. It exploits properties of Bregman divergences in conjunction with Legendre duality to yield a principled and scalable approach. A variety of experiments show that the proposed framework can yield results substantially superior to those obtained by naïvely applying classifiers learned on the original task to the target data. In addition, we show that the proposed approach, even though it is not transductive, can outperform some popular transductive learning techniques.
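The Bregman divergences the framework builds on have a simple generic form: each convex function φ induces a divergence D_φ(x, y) = φ(x) − φ(y) − ⟨∇φ(y), x − y⟩, and familiar losses fall out as special cases. A short sketch (standard textbook material, not code from the article) illustrating squared Euclidean distance and KL divergence as instances:

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Bregman divergence: D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# phi(x) = ||x||^2 induces the squared Euclidean distance ||x - y||^2.
sq = lambda x: np.dot(x, x)
sq_grad = lambda x: 2 * x

# phi(p) = sum p log p (negative entropy) induces KL divergence
# D(p, q) = sum p log(p / q) for probability vectors p, q.
negent = lambda p: np.sum(p * np.log(p))
negent_grad = lambda p: np.log(p) + 1

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
kl = bregman(negent, negent_grad, p, q)
```

That a single template covers both losses is what lets the framework admit "a wide range of loss functions" while keeping one optimization scheme.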
International Journal on Document Analysis and Recognition | 2009
Sreangsu Acharyya; Sumit Negi; L. Venkata Subramaniam; Shourya Roy
Noise in textual data, such as that introduced by multilinguality, misspellings, abbreviations, deletions, phonetic spellings, and non-standard transliteration, poses considerable problems for text mining. Such corruptions are very common in instant messenger and short message service data, and they adversely affect off-the-shelf text mining methods. Most techniques address this problem with supervised methods that make use of hand-labeled corrections. These require human-generated labels and corrections that are very expensive and time consuming to obtain because of multilinguality and the complexity of the corruptions. While we do not champion unsupervised methods over supervised ones when quality of results is the sole concern, we demonstrate that unsupervised methods can provide cost-effective results without the expensive human intervention needed to generate a parallel labeled corpus. A generative-model-based unsupervised technique is presented that maps non-standard words to their corresponding conventional frequent forms. A hidden Markov model (HMM) over a “subsequencized” representation of words is used, where a word is represented as a bag of weighted subsequences. The approximate maximum likelihood inference algorithm used is such that the training phase involves clustering over vectors rather than the customary and expensive dynamic programming (the Baum–Welch algorithm) over sequences that HMMs normally require. A principled transformation of the maximum-likelihood-based “central clustering” cost function of Baum–Welch into a “pairwise similarity” based clustering is proposed. This transformation makes it possible to apply “subsequence kernel” based methods that model deletion and insertion corruptions well.
The novelty of this approach lies in the fact that the expensive Baum–Welch iterations required for HMMs can be avoided through an approximation of the log-likelihood function and by establishing a connection between the log-likelihood and a pairwise distance. Anecdotal evidence of efficacy is provided on public and proprietary data.
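The “bag of weighted subsequences” representation can be sketched as follows (an illustrative construction, not the paper's exact featurization): every length-k subsequence of a word is weighted by a decay factor raised to the span it covers, so contiguous subsequences count more than gappy ones, which makes the similarity tolerant of insertions and deletions. The function names and the `decay` parameter are assumptions for the sketch.

```python
from itertools import combinations
from collections import Counter

def bag_of_subsequences(word, k=2, decay=0.5):
    """Represent a word as a bag of weighted length-k subsequences.

    Each subsequence is weighted by decay ** (span - k), so contiguous
    subsequences outweigh gappy ones; gappy matches are what let the
    representation absorb insertion/deletion corruptions.
    """
    bag = Counter()
    for idx in combinations(range(len(word)), k):
        span = idx[-1] - idx[0] + 1
        sub = "".join(word[i] for i in idx)
        bag[sub] += decay ** (span - k)
    return bag

def subseq_similarity(w1, w2, k=2, decay=0.5):
    """Unnormalized subsequence-kernel style similarity between two words."""
    b1 = bag_of_subsequences(w1, k, decay)
    b2 = bag_of_subsequences(w2, k, decay)
    return sum(b1[s] * b2[s] for s in b1)
```

For example, a word and its deletion-corrupted form ("hello" vs. "helo") remain far more similar under this representation than two unrelated words.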
conference on information and knowledge management | 2008
Sreangsu Acharyya; Joydeep Ghosh
A parameterized family of non-linear, link-analytic ranking functions is proposed that includes PageRank as a special case and exploits the convexity of these functions to be more resistant to link spam attacks. A contribution of the paper is the construction of such a scheme with provable uniqueness and convergence guarantees. The paper also demonstrates that even in an unlabelled scenario this family can have spam resistance comparable to TrustRank [3], which uses labels of spam or not-spam on a training set. The proposed method can use labels, if available, to improve its performance and provide state-of-the-art link spam protection.
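The "PageRank as a special case" structure can be sketched with a generalized power iteration in which scores pass through a nonlinearity before being propagated; with the identity map this reduces to standard PageRank. This is a rough illustrative analogue of parameterizing the ranking function, not the paper's actual scheme, and the choice of `np.sqrt` as an example nonlinearity is an assumption.

```python
import numpy as np

def nonlinear_rank(adj, f=lambda x: x, damping=0.85, iters=100):
    """Generalized power iteration over a link graph.

    adj: (n, n) adjacency matrix, adj[i, j] = 1 if page i links to page j.
    f:   elementwise transform applied to scores before propagation;
         f = identity recovers standard PageRank.
    """
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    out_deg[out_deg == 0] = 1.0      # guard against dangling nodes
    M = adj / out_deg                # row-stochastic transition matrix
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        t = f(r)
        t = t / t.sum()              # renormalize after the nonlinearity
        r = (1 - damping) / n + damping * (M.T @ t)
        r = r / r.sum()
    return r

# Tiny graph: pages 0 and 1 both link to page 2; page 2 links back to 0.
A = np.array([[0, 0, 1],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
r_linear = nonlinear_rank(A)                 # identity f: plain PageRank
r_concave = nonlinear_rank(A, f=np.sqrt)     # a concave transform
```

With the identity transform, the page with the most in-links (page 2) receives the highest score, as PageRank would assign.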
Nature | 1999
Sreangsu Acharyya; Pulak K. Chakraborty; Sajal Lahiri; B. C. Raymahashay; Sujoy K. Guha; Asit Bhowmik; Tarit Roy Chowdhury; Gautam Basu; Badal K. Mandal; Bhajan Kumar Biswas; G. Samanta; Uttam Kumar Chowdhury; Chitta Ranjan Chanda; Dilip Lodh; S. Lal Roy; Khitish Chandra Saha; S Roy; S. Kabir; Quazi Quamruzzaman; Dipankar Chakraborti; J. M. Mcarthur
uncertainty in artificial intelligence | 2012
Sreangsu Acharyya; Oluwasanmi Koyejo; Joydeep Ghosh
international conference on machine learning | 2012
Ayan Acharya; Eduardo R. Hruschka; Joydeep Ghosh; Sreangsu Acharyya
analytics for noisy unstructured text data | 2008
Sreangsu Acharyya; Sumit Negi; L. V. Subramaniam; Shourya Roy
conference on recommender systems | 2013
Oluwasanmi Koyejo; Sreangsu Acharyya; Joydeep Ghosh
siam international conference on data mining | 2013
Sreangsu Acharyya; Arindam Banerjee; Daniel Boley