Bo Long | Researchain

Publication

Featured research published by Bo Long.

international world wide web conferences | 2011

Like like alike: joint friendship and interest propagation in social networks

Shuang-Hong Yang; Bo Long; Alexander J. Smola; Narayanan Sadagopan; Zhaohui Zheng; Hongyuan Zha

Targeting interest to match a user with services (e.g. news, products, games, advertisements) and predicting friendship to build connections among users are two fundamental tasks for social network systems. In this paper, we show that the information contained in interest networks (i.e. user-service interactions) and friendship networks (i.e. user-user connections) is highly correlated and mutually helpful. We propose a framework that exploits homophily to establish an integrated network linking a user to interested services and connecting different users with common interests, upon which both friendship and interests could be efficiently propagated. The proposed friendship-interest propagation (FIP) framework devises a factor-based random walk model to explain friendship connections, and simultaneously it uses a coupled latent factor model to uncover interest interactions. We discuss the flexibility of the framework in the choices of loss objectives and regularization penalties and benchmark different variants on the Yahoo! Pulse social networking system. Experiments demonstrate that by coupling friendship with interest, FIP achieves much higher performance on both interest targeting and friendship prediction than systems using only one source of information.

international conference on machine learning | 2006

Spectral clustering for multi-type relational data

Bo Long; Zhongfei Zhang; Xiaoyun Wu; Philip S. Yu

Clustering on multi-type relational data has attracted more and more attention in recent years due to its high impact on various important applications, such as Web mining, e-commerce and bioinformatics. However, the research on general multi-type relational data clustering is still limited and preliminary. The contribution of the paper is three-fold. First, we propose a general model, the collective factorization on related matrices, for multi-type relational data clustering. The model is applicable to relational data with various structures. Second, under this model, we derive a novel algorithm, the spectral relational clustering, to cluster multi-type interrelated data objects simultaneously. The algorithm iteratively embeds each type of data objects into low dimensional spaces and benefits from the interactions among the hidden structures of different types of data objects. Extensive experiments demonstrate the promise and effectiveness of the proposed algorithm. Third, we show that the existing spectral clustering algorithms can be considered as the special cases of the proposed model and algorithm. This demonstrates the good theoretic generality of the proposed model and algorithm.

knowledge discovery and data mining | 2005

Co-clustering by block value decomposition

Bo Long; Zhongfei Zhang; Philip S. Yu

Dyadic data matrices, such as co-occurrence matrices, rating matrices, and proximity matrices, arise frequently in various important applications. A fundamental problem in dyadic data analysis is to find the hidden block structure of the data matrix. In this paper, we present a new co-clustering framework, block value decomposition (BVD), for dyadic data, which factorizes the dyadic data matrix into three components: the row-coefficient matrix R, the block value matrix B, and the column-coefficient matrix C. Under this framework, we focus on a special yet very popular case, non-negative dyadic data, and propose a specific novel co-clustering algorithm that iteratively computes the three decomposition matrices based on multiplicative updating rules. Extensive experimental evaluations demonstrate the effectiveness and potential of this framework, as well as of the specific algorithms, for co-clustering and, in particular, for discovering the hidden block structure in dyadic data.
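As a rough illustration of the factorization Z ≈ R B C described in this abstract, the sketch below applies the standard multiplicative updates for a squared-error objective to a small non-negative matrix. The abstract does not state the update rules, so the specific formulas, the function names (`bvd`, `mult_update`, `sq_error`), the random initialization, and the toy matrix are all assumptions made for illustration rather than the paper's exact algorithm.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def sq_error(z, r, b, c):
    """Squared Frobenius error between Z and the product R B C."""
    approx = matmul(matmul(r, b), c)
    return sum((zv - av) ** 2
               for z_row, a_row in zip(z, approx)
               for zv, av in zip(z_row, a_row))


def mult_update(m, numer, denom, eps=1e-9):
    """Element-wise m * numer / denom; eps guards against division by zero."""
    return [[mv * nv / (dv + eps) for mv, nv, dv in zip(m_row, n_row, d_row)]
            for m_row, n_row, d_row in zip(m, numer, denom)]


def bvd(z, k, l, iters=200, seed=0):
    """Factorize non-negative Z (n x m) into R (n x k), B (k x l), C (l x m).

    Assumed update rules (squared-error loss, not taken from the abstract):
      R <- R * (Z (BC)^T) / (R (BC) (BC)^T)
      B <- B * (R^T Z C^T) / (R^T R B C C^T)
      C <- C * ((RB)^T Z) / ((RB)^T (RB) C)
    """
    rng = random.Random(seed)
    n, m = len(z), len(z[0])
    r = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    b = [[rng.random() + 0.1 for _ in range(l)] for _ in range(k)]
    c = [[rng.random() + 0.1 for _ in range(m)] for _ in range(l)]
    for _ in range(iters):
        bc = matmul(b, c)
        bct = transpose(bc)
        r = mult_update(r, matmul(z, bct), matmul(matmul(r, bc), bct))
        rt, ct = transpose(r), transpose(c)
        b = mult_update(b,
                        matmul(matmul(rt, z), ct),
                        matmul(matmul(matmul(rt, r), b), matmul(c, ct)))
        rb = matmul(r, b)
        rbt = transpose(rb)
        c = mult_update(c, matmul(rbt, z), matmul(matmul(rbt, rb), c))
    return r, b, c


# Toy matrix with two row blocks and two column blocks (made up for the demo).
Z = [[5, 5, 0, 0],
     [5, 5, 0, 0],
     [0, 0, 3, 3],
     [0, 0, 3, 3]]
R, B, C = bvd(Z, k=2, l=2)
print("reconstruction error:", sq_error(Z, R, B, C))
```

Under these assumptions, a row can be assigned to the cluster with the largest entry in its row of R, and a column to the cluster with the largest entry in its column of C; B then summarizes the value of each row-cluster and column-cluster block.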

knowledge discovery and data mining | 2006

Unsupervised learning on k-partite graphs

Bo Long; Xiaoyun Wu; Zhongfei Zhang; Philip S. Yu

Various data mining applications involve data objects of multiple types that are related to each other, which can be naturally formulated as a k-partite graph. However, the research on mining the hidden structures from a k-partite graph is still limited and preliminary. In this paper, we propose a general model, the relation summary network, to find the hidden structures (the local cluster structures and the global community structures) from a k-partite graph. The model provides a principled framework for unsupervised learning on k-partite graphs of various structures. Under this model, we derive a novel algorithm to identify the hidden structures of a k-partite graph by constructing a relation summary network to approximate the original k-partite graph under a broad range of distortion measures. Experiments on both synthetic and real datasets demonstrate the promise and effectiveness of the proposed model and algorithm. We also establish the connections between existing clustering approaches and the proposed model to provide a unified view of the clustering approaches.

knowledge discovery and data mining | 2007

A probabilistic framework for relational clustering

Bo Long; Zhongfei Mark Zhang; Philip S. Yu

Relational clustering has attracted more and more attention due to its phenomenal impact in various important applications which involve multi-type interrelated data objects, such as Web mining, search marketing, bioinformatics, citation analysis, and epidemiology. In this paper, we propose a probabilistic model for relational clustering, which also provides a principled framework to unify various important clustering tasks including traditional attribute-based clustering, semi-supervised clustering, co-clustering and graph clustering. The proposed model seeks to identify cluster structures for each type of data objects and interaction patterns between different types of objects. Under this model, we propose parametric hard and soft relational clustering algorithms under a large number of exponential family distributions. The algorithms are applicable to relational data of various structures and at the same time unify a number of state-of-the-art clustering algorithms: co-clustering algorithms, k-partite graph clustering, Bregman k-means, and semi-supervised clustering based on hidden Markov random fields.

international acm sigir conference on research and development in information retrieval | 2010

Active learning for ranking through expected loss optimization

Bo Long; Olivier Chapelle; Ya Zhang; Yi Chang; Zhaohui Zheng; Belle L. Tseng

Learning to rank arises in many data mining applications, ranging from web search engines and online advertising to recommendation systems. In learning to rank, the performance of a ranking model is strongly affected by the number of labeled examples in the training set; on the other hand, obtaining labeled examples for training data is very expensive and time-consuming. This presents a great need for active learning approaches that select the most informative examples for learning to rank; however, there is still very limited work in the literature on active learning for ranking. In this paper, we propose a general active learning framework, expected loss optimization (ELO), for ranking. The ELO framework is applicable to a wide range of ranking functions. Under this framework, we derive a novel algorithm, expected discounted cumulative gain (DCG) loss optimization (ELO-DCG), to select the most informative examples. Then, we investigate both query-level and document-level active learning for ranking and propose a two-stage ELO-DCG algorithm which incorporates both query and document selection into active learning. Furthermore, we show that the algorithm can flexibly handle the skewed grade distribution problem by modifying the loss function. Extensive experiments on real-world web search data sets have demonstrated the great potential and effectiveness of the proposed framework and algorithms.
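To make the DCG-based loss mentioned in this abstract more concrete, the following sketch computes DCG with the commonly used gain 2^g - 1 and discount 1 / log2(1 + rank), and then a simplified expected DCG loss over a handful of equally likely grade vectors. The abstract does not give formulas, so the gain and discount forms, the function names, and the idea of representing uncertainty as a list of sampled grade vectors are assumptions; this is one simplified reading and not the ELO-DCG algorithm itself.

```python
import math


def dcg(grades):
    """DCG of relevance grades listed in ranked order (top result first)."""
    return sum((2 ** g - 1) / math.log2(rank + 2) for rank, g in enumerate(grades))


def dcg_loss(ranked_grades):
    """Gap between the best achievable DCG and the DCG of the given order."""
    ideal = sorted(ranked_grades, reverse=True)
    return dcg(ideal) - dcg(ranked_grades)


def expected_dcg_loss(grade_samples):
    """Average DCG loss of the single best ranking under uncertain grades.

    grade_samples: list of equally likely grade vectors for the same documents
    (hypothetical stand-in for predictions from several models).
    Documents are ranked by mean gain, which maximizes expected DCG because
    DCG is linear in the gains and the discount decreases with rank.
    """
    n_docs = len(grade_samples[0])
    mean_gain = [sum(2 ** s[d] - 1 for s in grade_samples) / len(grade_samples)
                 for d in range(n_docs)]
    order = sorted(range(n_docs), key=lambda d: mean_gain[d], reverse=True)
    losses = [dcg_loss([s[d] for d in order]) for s in grade_samples]
    return sum(losses) / len(losses)


# Samples that agree give zero expected loss; samples that disagree do not.
print(expected_dcg_loss([[2, 0], [2, 0]]))
print(expected_dcg_loss([[2, 0], [0, 2]]))
```

In an active learning loop of this kind, queries or documents with a larger expected loss would be the ones whose labels are most worth requesting, since the current predictions disagree most about how to rank them.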

international conference on data engineering | 2009

A Latent Topic Model for Complete Entity Resolution

Liangcai Shu; Bo Long; Weiyi Meng

In bibliographies like DBLP and Citeseer, there are three kinds of entity-name problems that need to be solved. First, multiple entities share one name, which is called the name sharing problem. Second, one entity has different names, which is called the name variant problem. Third, multiple entities share multiple names, which is called the name mixing problem. We aim to solve these problems based on one model in this paper. We call this task complete entity resolution. Unlike previous work, our work uses global information based on data with two types of information, words and author names. We propose a generative latent topic model that involves both author names and words, the LDA-dual model, by extending the LDA (Latent Dirichlet Allocation) model. We also propose a method to obtain the model parameters, which serve as global information. Based on the obtained model parameters, we propose two algorithms to solve the three problems mentioned above. Experimental results demonstrate the effectiveness and great potential of the proposed model and algorithms.

international conference on data mining | 2008

Evolutionary Clustering by Hierarchical Dirichlet Process with Hidden Markov State

Tianbing Xu; Zhongfei Zhang; Philip S. Yu; Bo Long

This paper studies evolutionary clustering, which has recently become a hot topic with many important applications, notably in social network analysis. In this paper, based on the recent literature on Hierarchical Dirichlet Process (HDP) and Hidden Markov Model (HMM), we have developed a statistical model HDP-HTM that combines HDP with a Hierarchical Transition Matrix (HTM) based on the proposed Infinite Hierarchical Hidden Markov State model (iH2MS) as an effective solution to this problem. The HDP-HTM model substantially advances the literature on evolutionary clustering in the sense that it not only performs better than existing methods but, more importantly, is capable of automatically learning the cluster numbers and structures while explicitly addressing the correspondence issue during the evolution. Extensive evaluations have demonstrated the effectiveness and promise of this solution against state-of-the-art methods.

knowledge discovery and data mining | 2011

Localized factor models for multi-context recommendation

Deepak Agarwal; Bee-Chung Chen; Bo Long

Combining correlated information from multiple contexts can significantly improve predictive accuracy in recommender problems. Such information from multiple contexts is often available in the form of several incomplete matrices spanning a set of entities like users, items, features, and so on. Existing methods simultaneously factorize these matrices by sharing a single set of factors for entities across all contexts. We show that such a strategy may introduce significant bias in estimates and propose a new model that ameliorates this issue by positing local, context-specific factors for entities. To avoid over-fitting in contexts with sparse data, the local factors are connected through a shared global model. This sharing of parameters allows information to flow across contexts through multivariate regressions among local factors, instead of enforcing exactly the same factors for an entity everywhere. Model fitting is done in an EM framework; we show that the E-step can be fitted through a fast multi-resolution Kalman filter algorithm that ensures scalability. Experiments on benchmark and real-world Yahoo! datasets clearly illustrate the usefulness of our approach. Our model significantly improves predictive accuracy, especially in cold-start scenarios.

international conference on data mining | 2008

Dirichlet Process Based Evolutionary Clustering

Tianbing Xu; Zhongfei Zhang; Philip S. Yu; Bo Long

Evolutionary clustering has emerged as an important research topic in the recent data mining literature, and solutions to this problem have found a wide spectrum of applications, particularly in social network analysis. In this paper, based on the recent literature on Dirichlet processes, we have developed two different and specific models as solutions to this problem: DPChain and HDP-EVO. Both models substantially advance the literature on evolutionary clustering in the sense that they not only perform better than existing methods but, more importantly, are capable of automatically learning the cluster numbers and structures during the evolution. Extensive evaluations have demonstrated the effectiveness and promise of these models against state-of-the-art methods.
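The reason Dirichlet process models can learn the number of clusters, rather than take it as an input, can be illustrated with the Chinese restaurant process, a standard sequential view of a Dirichlet process prior. The sketch below is a generic illustration only: it is neither DPChain nor HDP-EVO, and the function name `crp_assignments` and the concentration parameter `alpha` are chosen here for the example.

```python
import random


def crp_assignments(n, alpha, seed=0):
    """Sample cluster labels for n items from a Chinese restaurant process.

    Item i joins an existing cluster with probability proportional to that
    cluster's current size, or opens a new cluster with probability
    proportional to alpha (which must be positive).
    """
    rng = random.Random(seed)
    counts = []   # counts[k] = number of items currently in cluster k
    labels = []
    for i in range(n):
        # Total weight is i (items seated so far) plus alpha (new cluster).
        u = rng.random() * (i + alpha)
        acc = 0.0
        chosen = len(counts)          # default: open a new cluster
        for k, size in enumerate(counts):
            acc += size
            if u < acc:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(1)
        else:
            counts[chosen] += 1
        labels.append(chosen)
    return labels, counts


labels, counts = crp_assignments(n=50, alpha=1.0)
print("number of clusters:", len(counts))
print("cluster sizes:", counts)
```

The number of clusters is an outcome of the sampling rather than a parameter, and larger values of `alpha` tend to produce more clusters.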