Network


Latest external collaborations at the country level.

Hotspot


Research topics in which Hanhuai Shan is active.

Publication


Featured research published by Hanhuai Shan.


International Conference on Data Mining | 2008

Bayesian Co-clustering

Hanhuai Shan; Arindam Banerjee

In recent years, co-clustering has emerged as a powerful data mining tool for analyzing dyadic data connecting two entities. However, almost all existing co-clustering techniques are partitional and allow individual rows and columns of a data matrix to belong to only one cluster. Several current applications, such as recommendation systems and market basket analysis, can benefit substantially from a mixed membership of rows and columns. In this paper, we present Bayesian co-clustering (BCC) models that allow mixed membership in row and column clusters. BCC maintains separate Dirichlet priors over the mixed memberships for rows and columns, and assumes each observation is generated by an exponential-family distribution corresponding to its row and column clusters. We propose a fast variational algorithm for inference and parameter estimation. The model naturally handles sparse matrices, since inference uses only the non-missing entries. In addition to finding a co-cluster structure in the observations, the model outputs a low-dimensional co-embedding and accurately predicts missing values in the original matrix. We demonstrate the efficacy of the model through experiments on both simulated and real data.
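The generative process described in the abstract can be sketched as a toy sampler (an illustrative sketch, not the paper's code; the matrix sizes, cluster counts, and the Gaussian observation model are assumptions, with a Gaussian standing in for the generic exponential-family distribution):

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols = 4, 5   # data matrix dimensions (assumed)
k1, k2 = 2, 3           # number of row and column clusters (assumed)

# Separate Dirichlet priors yield one mixed-membership vector per row and per column
pi_rows = rng.dirichlet(np.ones(k1), size=n_rows)
pi_cols = rng.dirichlet(np.ones(k2), size=n_cols)

# Each (row-cluster, column-cluster) pair gets its own Gaussian mean
means = rng.normal(size=(k1, k2))

X = np.empty((n_rows, n_cols))
for i in range(n_rows):
    for j in range(n_cols):
        z1 = rng.choice(k1, p=pi_rows[i])   # row cluster drawn per entry
        z2 = rng.choice(k2, p=pi_cols[j])   # column cluster drawn per entry
        X[i, j] = rng.normal(means[z1, z2], 0.1)
```

Because each entry draws its own cluster pair from the row and column membership vectors, a single row can participate in several row clusters — the mixed-membership property the abstract contrasts with partitional co-clustering.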


International Conference on Data Mining | 2010

Generalized Probabilistic Matrix Factorizations for Collaborative Filtering

Hanhuai Shan; Arindam Banerjee

Probabilistic matrix factorization (PMF) methods have shown great promise in collaborative filtering. In this paper, we consider several variants and generalizations of the PMF framework inspired by three broad questions: Are the prior distributions used in existing PMF models suitable, or can one get better predictive performance with different priors? Are there suitable extensions to leverage side information? Are there benefits to taking row and column biases into account? We develop new families of PMF models to address these questions, along with efficient approximate inference algorithms for learning and prediction. Through extensive experiments on movie recommendation datasets, we illustrate that simpler models directly capturing correlations among latent factors can outperform existing PMF models, that side information can improve prediction accuracy, and that accounting for row/column biases leads to improvements in predictive performance.
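The basic PMF setup underlying these generalizations can be sketched as MAP estimation with zero-mean Gaussian priors on the latent factors, which reduces to regularized squared error on the observed entries (a minimal sketch on toy data; the sizes, learning rate, and regularization strength are assumptions, and this is plain gradient ascent rather than the paper's approximate inference algorithms):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 6, 5, 2

# Toy ratings matrix with ~30% missing entries (NaN = unobserved)
R = rng.normal(size=(n_users, n_items))
R[rng.random(R.shape) < 0.3] = np.nan
mask = ~np.isnan(R)
R0 = np.nan_to_num(R)   # NaNs zeroed; the mask keeps them out of the loss

# Zero-mean Gaussian priors on U and V correspond to the L2 penalty lam
U = 0.1 * rng.normal(size=(n_users, k))
V = 0.1 * rng.normal(size=(n_items, k))
lam, lr = 0.1, 0.05

def masked_rmse(U, V):
    E = (R0 - U @ V.T)[mask]
    return np.sqrt((E ** 2).mean())

rmse_before = masked_rmse(U, V)
for _ in range(200):
    E = np.where(mask, R0 - U @ V.T, 0.0)   # residuals on observed entries only
    U += lr * (E @ V - lam * U)             # gradient ascent on the log-posterior
    V += lr * (E.T @ U - lam * V)
rmse_after = masked_rmse(U, V)
```

The paper's questions then amount to varying this template: replacing the independent Gaussian priors, adding terms for side information, and adding per-row and per-column bias parameters to the prediction U @ V.T.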


Statistical Analysis and Data Mining | 2011

Bayesian cluster ensembles

Hongjun Wang; Hanhuai Shan; Arindam Banerjee

Cluster ensembles provide a framework for combining multiple base clusterings of a dataset to generate a stable and robust consensus clustering. There are important variants of the basic cluster ensemble problem, notably cluster ensembles with missing values and row- or column-distributed cluster ensembles. Existing cluster ensemble algorithms are applicable to only a small subset of these variants. In this paper, we propose the Bayesian cluster ensemble (BCE), a mixed-membership model for learning cluster ensembles that is applicable to all the primary variants of the problem. We propose a variational-approximation-based algorithm for learning Bayesian cluster ensembles. BCE is further generalized to the case where the features of the original data points are available, referred to as generalized BCE (GBCE). We compare BCE extensively with several other cluster ensemble algorithms and demonstrate that BCE is not only versatile in its applicability but also outperforms the other algorithms in stability and accuracy. Moreover, GBCE can achieve higher accuracy than BCE, especially when only a small number of base clusterings is available.
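To make the problem setup concrete, here is the kind of input a cluster ensemble method consumes and a simple co-association consensus over it — a classic baseline, plainly not BCE itself, shown only to illustrate what "combining base clusterings" means (the data and the majority threshold are assumptions):

```python
import numpy as np

# Three base clusterings of six points; label values are arbitrary within
# each clustering, which is exactly what makes combining them non-trivial.
base = np.array([
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 2, 2],
    [0, 0, 1, 1, 1, 1],
])
n = base.shape[1]

# Co-association matrix: fraction of base clusterings in which each pair
# of points shares a cluster.
co = (base[:, :, None] == base[:, None, :]).mean(axis=0)

# Greedy consensus: merge points that co-occur in a majority of clusterings.
labels = np.full(n, -1)
cur = 0
for i in range(n):
    if labels[i] == -1:
        labels[(co[i] > 0.5) & (labels == -1)] = cur
        cur += 1
```

BCE replaces this ad hoc merge with a generative mixed-membership model over the base-clustering labels, which is what lets it also handle missing labels and row- or column-distributed inputs.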


International Conference on Data Mining | 2007

Latent Dirichlet Conditional Naive-Bayes Models

Arindam Banerjee; Hanhuai Shan

In spite of the popularity of probabilistic mixture models for latent structure discovery from data, mixture models do not have a natural mechanism for handling sparsity, where each data point has only a few non-zero observations. In this paper, we introduce conditional naive-Bayes (CNB) models, which generalize naive-Bayes mixture models to handle sparsity naturally by conditioning the model on the observed features. Further, we present latent Dirichlet conditional naive-Bayes (LD-CNB) models, which constitute a family of powerful hierarchical Bayesian models for latent structure discovery from sparse data. The proposed family of models is quite general and can work with arbitrary regular exponential-family conditional distributions. We present a variational-inference-based EM algorithm for learning, along with special-case analyses for Gaussian and discrete distributions. The efficacy of the proposed models is demonstrated by extensive experiments on a wide variety of datasets.
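The key idea — conditioning the likelihood on the observed features only — can be sketched as an E-step-style responsibility computation for a two-component Gaussian naive-Bayes mixture (an illustrative sketch, not the paper's algorithm; the data, component parameters, and shared variance are assumptions):

```python
import numpy as np

# Sparse data: each point observes only a few features (NaN = unobserved)
X = np.array([
    [0.1, np.nan, np.nan],
    [np.nan, 2.0, 1.9],
    [0.2, np.nan, 2.1],
])

# Per-component Gaussian means for a 2-component naive-Bayes mixture (toy values)
means = np.array([[0.0, 0.0, 0.0],
                  [0.0, 2.0, 2.0]])
weights = np.array([0.5, 0.5])

def log_gauss(x, mu, sigma=0.5):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

# Responsibilities computed from the non-missing entries only: unobserved
# features contribute zero log-likelihood rather than a default value.
obs = ~np.isnan(X)
resp = np.zeros((len(X), len(weights)))
for k in range(len(weights)):
    ll = np.where(obs, log_gauss(np.nan_to_num(X), means[k]), 0.0).sum(axis=1)
    resp[:, k] = np.log(weights[k]) + ll
resp = np.exp(resp - resp.max(axis=1, keepdims=True))
resp /= resp.sum(axis=1, keepdims=True)
```

Because only observed features enter the likelihood, a point with a single observed feature (the first row) is judged on that feature alone, which is the natural handling of sparsity the abstract refers to.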


International Conference on Data Mining | 2009

Discriminative Mixed-Membership Models

Hanhuai Shan; Arindam Banerjee; Nikunj C. Oza

Although mixed-membership models have achieved great success in unsupervised learning, they have not been widely applied to classification problems. In this paper, we propose a family of discriminative mixed-membership models for classification that combine unsupervised mixed-membership models with multi-class logistic regression. In particular, we propose two variants: one for text classification, based on latent Dirichlet allocation, and one for standard feature-vector classification, based on mixed-membership naive Bayes models. The proposed models allow the number of components in the mixed membership to differ from the number of classes. We propose two variational-inference-based algorithms for learning the models, including a fast variational inference that is substantially more efficient than mean-field variational approximation. Through extensive experiments on UCI and text-classification benchmark datasets, we show that the models are competitive with the state of the art and can discover components not explicitly captured by the class labels.
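The combination can be sketched as multi-class logistic regression applied to mixed-membership vectors, with the component count deliberately different from the class count (a toy forward pass with assumed, hand-picked weights, not the paper's jointly learned model):

```python
import numpy as np

# Mixed-membership representations (e.g., topic proportions) for 4 documents
# over 3 latent components; the classifier maps them to 2 classes, so the
# number of components need not equal the number of classes.
theta = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
])

# Toy multi-class logistic-regression weights (assumed, not learned here)
W = np.array([[ 2.0, -2.0],
              [ 0.0,  0.0],
              [-2.0,  2.0]])

# Softmax over the class logits gives per-document class probabilities
logits = theta @ W
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
pred = probs.argmax(axis=1)
```

In the discriminative models themselves, the membership vectors and the regression weights are learned jointly via variational inference rather than fixed as above.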


SIAM International Conference on Data Mining | 2012

Kernelized Probabilistic Matrix Factorization: Exploiting Graphs and Side Information

Tinghui Zhou; Hanhuai Shan; Arindam Banerjee; Guillermo Sapiro


Data Mining and Knowledge Discovery | 2011

Mixed-membership naive Bayes models

Hanhuai Shan; Arindam Banerjee


International Conference on Machine Learning | 2012

Gap Filling in the Plant Kingdom---Trait Prediction Using Hierarchical Probabilistic Matrix Factorization

Hanhuai Shan; Jens Kattge; Peter B. Reich; Arindam Banerjee; Franziska Schrodt; Markus Reichstein


Global Ecology and Biogeography | 2015

BHPMF – a hierarchical Bayesian approach to gap-filling and trait prediction for macroecology and functional biogeography

Franziska Schrodt; Jens Kattge; Hanhuai Shan; Farideh Fazayeli; Julia Joswig; Arindam Banerjee; Markus Reichstein; Gerhard Bönisch; Sandra Díaz; John B. Dickie; Andy Gillison; Sandra Lavorel; Paul W. Leadley; Christian Wirth; Ian J. Wright; S. Joseph Wright; Peter B. Reich


SIAM International Conference on Data Mining | 2010

Residual Bayesian co-clustering for matrix approximation

Hanhuai Shan; Arindam Banerjee

Collaboration


Hanhuai Shan's collaborators.

Top Co-Authors


Tinghui Zhou

University of California
