Ayan Acharya
University of Texas at Austin
Publications
Featured research published by Ayan Acharya.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery | 2011
Joydeep Ghosh; Ayan Acharya
Cluster ensembles combine multiple clusterings of a set of objects into a single consolidated clustering, often referred to as the consensus solution. Consensus clustering can be used to generate more robust and stable clustering results compared to a single clustering approach, perform distributed computing under privacy or sharing constraints, or reuse existing knowledge. This paper describes a variety of algorithms that have been proposed to address the cluster ensemble problem, organizing them in conceptual categories that bring out the common threads and lessons learnt while simultaneously highlighting unique features of individual approaches.
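One widely used family of methods covered by such surveys is evidence accumulation: count how often each pair of objects is co-clustered across the ensemble, then cluster the resulting co-association matrix. A minimal sketch in Python (the base clusterings, library calls, and final consensus step are illustrative choices, not prescribed by the survey):

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

# Build an ensemble of base clusterings with varying k and seeds.
labelings = [KMeans(n_clusters=k, n_init=5, random_state=s).fit_predict(X)
             for s, k in enumerate([2, 3, 3, 4, 5])]

# Co-association matrix: fraction of clusterings placing i and j together.
n = X.shape[0]
coassoc = np.zeros((n, n))
for lab in labelings:
    coassoc += (lab[:, None] == lab[None, :])
coassoc /= len(labelings)

# Consensus step: treat (1 - co-association) as a distance and re-cluster.
# (Older scikit-learn versions spell the keyword `affinity=` instead.)
consensus = AgglomerativeClustering(
    n_clusters=3, metric="precomputed", linkage="average"
).fit_predict(1.0 - coassoc)
```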
Integrated Computer-aided Engineering | 2015
Luiz F. S. Coletta; Eduardo R. Hruschka; Ayan Acharya; Joydeep Ghosh
We investigate how to make an existing algorithm, C³E (Consensus between Classification and Clustering Ensembles), simpler and more user-friendly by automatically tuning its main parameters with metaheuristics. In particular, C³E based on a squared loss function, C³E-SL, assumes an optimization procedure that takes as input class membership estimates from existing classifiers, as well as a similarity matrix from a cluster ensemble operating solely on the new target data, to provide a consolidated classification of the target data. To do so, two parameters have to be defined a priori, namely: the relative importance of the classifier and cluster ensembles, and the number of iterations of the algorithm. In some practical applications, these parameters can be optimized via time-consuming grid search approaches based on cross-validation procedures. This paper shows that seven metaheuristics for parameter optimization yield classifiers as accurate as those obtained from grid search, but in half the running time. More precisely, and assuming a trade-off between user-friendliness and accuracy, experiments performed on twenty real-world datasets suggest that CMA-ES, DE, and SaDE are the best alternatives for optimizing the C³E-SL parameters.
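A hedged sketch of the tuning idea: treat the two C³E-SL parameters as a two-dimensional search space and hand it to an off-the-shelf DE optimizer. The objective below is a toy surrogate standing in for cross-validated C³E-SL error; the real objective, bounds, and budget would come from the application.

```python
import numpy as np
from scipy.optimize import differential_evolution

def c3e_sl_cv_error(params):
    """Stand-in objective: cross-validated error of C3E-SL for a given
    (alpha, n_iterations) pair. Replace with a real evaluation loop."""
    alpha, n_iter = params[0], int(round(params[1]))
    # ... run C3E-SL with these parameters and return 1 - accuracy ...
    return (alpha - 0.3) ** 2 + 0.01 * abs(n_iter - 20)  # toy surrogate

# Search alpha in (0, 1] and iterations in [1, 100], the same box a grid
# search would cover, but letting DE allocate evaluations adaptively.
result = differential_evolution(c3e_sl_cv_error,
                                bounds=[(1e-3, 1.0), (1, 100)],
                                maxiter=30, seed=0)
alpha_opt, n_iter_opt = result.x[0], int(round(result.x[1]))
```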
International Conference on Multiple Classifier Systems | 2011
Ayan Acharya; Eduardo R. Hruschka; Joydeep Ghosh; Sreangsu Acharyya
The combination of multiple classifiers to generate a single classifier has been shown to be very useful in practice. Similarly, several efforts have shown that cluster ensembles can improve the quality of results as compared to a single clustering solution. These observations suggest that ensembles containing both classifiers and clusterers are potentially useful as well. Specifically, clusterers provide supplementary constraints that can improve the generalization capability of the resulting classifier. This paper introduces a new algorithm named C3E that combines ensembles of classifiers and clusterers. Our experimental evaluation of C3E shows that it provides good classification accuracies in eleven tasks derived from three real-world applications. In addition, C3E produces better results than the recently introduced Bipartite Graph-based Consensus Maximization (BGCM) Algorithm, which combines multiple supervised and unsupervised models and is the algorithm most closely related to C3E.
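To make the combination concrete, a rough sketch of a squared-loss-style consolidation step is given below. The update rule is an assumption for illustration (cluster-similarity-weighted smoothing of classifier outputs), not the exact C3E update from the paper:

```python
import numpy as np

def c3e_sl_sketch(pi, S, alpha=0.5, n_iter=20):
    """Illustrative squared-loss-style consolidation (assumed update rule).
    pi : (n, c) class-probability estimates from the classifier ensemble
    S  : (n, n) pairwise similarity from the cluster ensemble
    Returns consolidated (n, c) class probabilities for the target data."""
    y = pi.copy()
    for _ in range(n_iter):
        neighbor_avg = S @ y  # pull each label toward its cluster neighbors
        y = (pi + alpha * neighbor_avg) / (1.0 + alpha * S.sum(axis=1, keepdims=True))
    return y / y.sum(axis=1, keepdims=True)  # renormalize rows
```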
ACM Transactions on Knowledge Discovery from Data | 2014
Ayan Acharya; Eduardo R. Hruschka; Joydeep Ghosh; Sreangsu Acharyya
Unsupervised models can provide supplementary soft constraints to help classify new “target” data because similar instances in the target set are more likely to share the same class label. Such models can also help detect possible differences between training and target distributions, which is useful in applications where concept drift may take place, as in transfer learning settings. This article describes a general optimization framework that takes as input class membership estimates from existing classifiers learned on previously encountered “source” (or training) data, as well as a similarity matrix from a cluster ensemble operating solely on the target (or test) data to be classified, and yields a consensus labeling of the target data. More precisely, the application settings considered are nontransductive semisupervised and transfer learning scenarios where the training data are used only to build an ensemble of classifiers and are subsequently discarded before classifying the target data. The framework admits a wide range of loss functions and classification/clustering methods. It exploits properties of Bregman divergences in conjunction with Legendre duality to yield a principled and scalable approach. A variety of experiments show that the proposed framework can yield results substantially superior to those provided by naïvely applying classifiers learned on the original task to the target data. In addition, we show that the proposed approach, although not conceptually transductive, can provide better results than some popular transductive learning techniques.
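The framework's reliance on Bregman divergences can be illustrated independently of C3E. The snippet below shows the generic form d_φ(x, y) = φ(x) − φ(y) − ⟨∇φ(y), x − y⟩ and how two standard losses arise as special cases; it is a self-contained illustration, not code from the paper:

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Bregman divergence d_phi(x, y) generated by a convex function phi."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# phi(x) = ||x||^2 recovers the squared Euclidean distance.
sq = lambda x: np.dot(x, x)
sq_grad = lambda x: 2 * x

# phi(p) = sum p log p (negative entropy) recovers KL divergence
# when x and y are probability distributions.
negent = lambda p: np.sum(p * np.log(p))
negent_grad = lambda p: np.log(p) + 1

x, y = np.array([0.2, 0.8]), np.array([0.5, 0.5])
print(bregman(sq, sq_grad, x, y))          # equals ||x - y||^2
print(bregman(negent, negent_grad, x, y))  # equals KL(x || y)
```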
European Conference on Machine Learning | 2013
Suriya Gunasekar; Ayan Acharya; Neeraj Gaur; Joydeep Ghosh
The task of matrix completion involves estimating the entries of a matrix M ∈ ℝ^{m×n} when only a subset Ω ⊂ {(i,j) : 1 ≤ i ≤ m, 1 ≤ j ≤ n} of the entries is observed. A popular set of low-rank models for this task approximates the matrix as a product of two low-rank matrices, M = UVᵀ, where U ∈ ℝ^{m×k}, V ∈ ℝ^{n×k}, and k ≪ min{m, n}. A common algorithm for recovering M from the partially observed matrix under the low-rank assumption is alternating least squares (ALS) minimization, which optimizes over U and V in an alternating manner, minimizing the squared error over the observed entries while keeping the other factor fixed. Despite being widely used in practice, only recently were theoretical guarantees established bounding the error between the matrix estimated by ALS and the original matrix M. In this work we extend results from the noiseless setting and provide the first guarantees for recovery under noise for alternating minimization. We specifically show that for well-conditioned matrices corrupted by random noise of bounded Frobenius norm, if the number of observed entries is O(k⁷ n log n), then the ALS algorithm recovers the original matrix within an error bound that depends on the norm of the noise matrix. The sample complexity is the same as derived in [7] for noise-free matrix completion using ALS.
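For reference, a minimal ALS sketch is given below. It adds ridge regularization (λ) for numerical stability, whereas the guarantees above are stated for the plain alternating minimization; the rank, regularization, and iteration count are illustrative:

```python
import numpy as np

def als_complete(M_obs, mask, k=5, lam=0.1, n_iter=50):
    """Alternating least squares for low-rank matrix completion.
    M_obs: (m, n) matrix with observed entries filled in.
    mask : boolean (m, n), True where the entry is observed.
    Each half-step solves a ridge regression in closed form."""
    m, n = M_obs.shape
    rng = np.random.default_rng(0)
    U, V = rng.standard_normal((m, k)), rng.standard_normal((n, k))
    for _ in range(n_iter):
        for i in range(m):                       # update row factors
            idx = mask[i]
            A = V[idx].T @ V[idx] + lam * np.eye(k)
            U[i] = np.linalg.solve(A, V[idx].T @ M_obs[i, idx])
        for j in range(n):                       # update column factors
            idx = mask[:, j]
            A = U[idx].T @ U[idx] + lam * np.eye(k)
            V[j] = np.linalg.solve(A, U[idx].T @ M_obs[idx, j])
    return U @ V.T
```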
International Journal of Bio-inspired Computation | 2015
Luiz F. S. Coletta; Eduardo R. Hruschka; Ayan Acharya; Joydeep Ghosh
Unsupervised models can provide supplementary soft constraints to help classify new data, since similar instances are more likely to share the same class label. In this context, this paper reports on a study of how to make an existing algorithm, C³E (from consensus between classification and clustering ensembles), more convenient by automatically tuning its main parameters. The C³E algorithm is based on a general optimisation framework that takes as input class membership estimates from existing classifiers, and a similarity matrix from a cluster ensemble operating solely on the new target data to be classified, in order to yield a consensus labelling of the new data. To do so, two parameters have to be defined a priori by the user: the relative importance of the classifier and cluster ensembles, and the number of iterations of the algorithm. We propose a differential evolution (DE) algorithm, named dynamic DE (D²E), which is a computationally efficient alternative for optimising such parameters. D²E provides better results than DE by dynamically updating its control parameters. Moreover, competitive results were achieved when comparing D²E with three state-of-the-art algorithms.
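The flavor of dynamically updated control parameters can be sketched with a DE/rand/1/bin loop whose mutation weight F and crossover rate CR decay over generations. The linear schedules below are hypothetical illustrations, not the published D²E update rules:

```python
import numpy as np

def dynamic_de(f, bounds, pop_size=20, n_gen=100, seed=0):
    """DE/rand/1/bin with linearly decaying F and CR, sketching the idea
    of dynamic control parameters (not the published D2E schedule)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    pop = rng.uniform(lo, hi, size=(pop_size, len(bounds)))
    fit = np.array([f(x) for x in pop])
    for g in range(n_gen):
        F = 0.9 - 0.5 * g / n_gen        # mutation weight decays over time
        CR = 0.9 - 0.4 * g / n_gen       # crossover rate decays over time
        for i in range(pop_size):
            a, b, c = pop[rng.choice(pop_size, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            cross = rng.random(len(bounds)) < CR
            trial = np.where(cross, mutant, pop[i])
            ft = f(trial)
            if ft < fit[i]:              # greedy replacement
                pop[i], fit[i] = trial, ft
    return pop[fit.argmin()], fit.min()
```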
European Conference on Machine Learning | 2015
Ayan Acharya; Dean Teffer; Jette Henderson; Marcus Tyler; Mingyuan Zhou; Joydeep Ghosh
Developing models to discover, analyze, and predict clusters within networked entities is an area of active and diverse research. However, many existing approaches do not take pertinent auxiliary information into consideration. This paper introduces Joint Gamma Process Poisson Factorization (J-GPPF) to jointly model the network and its side information. J-GPPF naturally fits sparse networks, accommodates separately-clustered side information in a principled way, and effectively addresses the computational challenges of analyzing large networks. Evaluated with held-out link prediction performance on sparse networks (both synthetic and real-world) with side information, J-GPPF is shown to clearly outperform algorithms that model only the network adjacency matrix.
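As a point of reference for the Poisson-factorization core (without the gamma-process priors or the side information that distinguish J-GPPF), KL-divergence NMF is the maximum-likelihood analogue of modeling A_ij ~ Poisson((WH)_ij). A minimal sketch on a toy adjacency matrix:

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy sparse "adjacency" matrix of interaction counts for a small network.
rng = np.random.default_rng(0)
A = rng.poisson(0.1, size=(60, 60))

# KL-divergence NMF maximizes the Poisson likelihood of A given rates W @ H.
# J-GPPF instead places gamma (and gamma-process) priors on the factors and
# infers them jointly with side information; this shows only the core.
model = NMF(n_components=5, beta_loss="kullback-leibler", solver="mu",
            max_iter=500, init="random", random_state=0)
W = model.fit_transform(A)
H = model.components_
rate = W @ H   # reconstructed Poisson rates, usable for link prediction
```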
SIAM International Conference on Data Mining | 2014
Ayan Acharya; Raymond J. Mooney; Joydeep Ghosh
Multitask learning (MTL) via a shared representation has been adopted to alleviate problems with sparsity of labeled data across different learning tasks. Active learning, on the other hand, reduces the cost of labeling examples by making informative queries over an unlabeled pool of data. Therefore, a unification of both of these approaches can potentially be useful in settings where labeled information is expensive to obtain but the learning tasks or domains have some common characteristics. This paper introduces two such models, Active Doubly Supervised Latent Dirichlet Allocation (Act-DSLDA) and its non-parametric variation (Act-NPDSLDA), which integrate MTL and active learning in the same framework. These models make use of both latent and supervised shared topics to accomplish multitask learning. Experimental results on both document and image classification show that integrating MTL and active learning along with shared latent and supervised topics is superior to other methods that do not employ all of these components.
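The active learning ingredient can be illustrated in isolation with plain pool-based uncertainty sampling; the classifier, data, and query budget below are illustrative stand-ins, not the topic-model-based acquisition used in Act-DSLDA:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
labeled = list(range(10))                  # small labeled seed set
pool = [i for i in range(len(y)) if i not in labeled]

clf = LogisticRegression(max_iter=1000)
for _ in range(20):                        # query budget
    clf.fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])
    margins = np.abs(proba[:, 1] - proba[:, 0])
    query = pool[int(margins.argmin())]    # most uncertain pool instance
    labeled.append(query)                  # oracle supplies its label
    pool.remove(query)
```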
European Conference on Machine Learning | 2013
Ayan Acharya; Aditya Rawal; Raymond J. Mooney; Eduardo R. Hruschka
This paper introduces two new frameworks, Doubly Supervised Latent Dirichlet Allocation (DSLDA) and its non-parametric variation (NP-DSLDA), that integrate two different types of supervision: topic labels and category labels. This approach is particularly useful for multitask learning, in which both latent and supervised topics are shared between multiple categories. Experimental results on both document and image classification show that both types of supervision improve the performance of both DSLDA and NP-DSLDA and that sharing both latent and supervised topics allows for better multitask learning.
International Conference on Connected Vehicles and Expo | 2012
Ayan Acharya; Jangwon Lee; An Chen
This paper presents a real-time car detection and tracking system that can be used on a mobile device equipped with a camera (e.g., a handset or tablet). The underlying detector is implemented using a variant of AdaBoost [5], and the tracker is built on the Lucas-Kanade optical flow algorithm. The uniqueness of the work lies in the simplicity and portability of the framework, which can assist drivers in real time with little computational effort.
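A minimal sketch of the detect-then-track pattern using OpenCV: a boosted cascade for detection and pyramidal Lucas-Kanade for tracking. The cascade file and video path are hypothetical placeholders, and the exact detector and tracker configuration in the paper may differ:

```python
import cv2

detector = cv2.CascadeClassifier("cars.xml")   # placeholder trained cascade
cap = cv2.VideoCapture("dashcam.mp4")          # placeholder input video

ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                              qualityLevel=0.01, minDistance=7)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # (Re)detect with the boosted cascade; cheap enough to run periodically.
    cars = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
    # Track features frame-to-frame with pyramidal Lucas-Kanade.
    pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    pts = pts[status.flatten() == 1].reshape(-1, 1, 2)
    if len(pts) < 20:                      # re-seed when tracking thins out
        pts = cv2.goodFeaturesToTrack(gray, 200, 0.01, 7)
    prev_gray = gray
```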