Is this you? Create Your Porfile

Xiangnan Kong

Worcester Polytechnic Institute

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Xiangnan Kong is active.

Explore More

Publication

Featured researches published by Xiangnan Kong.

conference on information and knowledge management | 2013

Inferring anchor links across multiple heterogeneous social networks

Xiangnan Kong; Jiawei Zhang; Philip S. Yu

Online social networks can often be represented as heterogeneous information networks containing abundant information about: who, where, when and what. Nowadays, people are usually involved in multiple social networks simultaneously. The multiple accounts of the same user in different networks are mostly isolated from each other without any connection between them. Discovering the correspondence of these accounts across multiple social networks is a crucial prerequisite for many interesting inter-network applications, such as link recommendation and community analysis using information from multiple networks. In this paper, we study the problem of anchor link prediction across multiple heterogeneous social networks, i.e., discovering the correspondence among different accounts of the same user. Unlike most prior work on link prediction and network alignment, we assume that the anchor links are one-to-one relationships (i.e., no two edges share a common endpoint) between the accounts in two social networks, and a small number of anchor links are known beforehand. We propose to extract heterogeneous features from multiple heterogeneous networks for anchor link prediction, including users social, spatial, temporal and text information. Then we formulate the inference problem for anchor links as a stable matching problem between the two sets of user accounts in two different networks. An effective solution, MNA (Multi-Network Anchoring), is derived to infer anchor links w.r.t. the one-to-one constraint. Extensive experiments on two real-world heterogeneous social networks show that our MNA model consistently outperform other commonly-used baselines on anchor link prediction.

IEEE Transactions on Knowledge and Data Engineering | 2014

HeteSim: A General Framework for Relevance Measure in Heterogeneous Networks

Chuan Shi; Xiangnan Kong; Yue Huang; Philip S. Yu; Bin Wu

Similarity search is an important function in many applications, which usually focuses on measuring the similarity between objects with the same type. However, in many scenarios, we need to measure the relatedness between objects with different types. With the surge of study on heterogeneous networks, the relevance measure on objects with different types becomes increasingly important. In this paper, we study the relevance search problem in heterogeneous networks, where the task is to measure the relatedness of heterogeneous objects (including objects with the same type or different types). A novel measure HeteSim is proposed, which has the following attributes: (1) a uniform measure: it can measure the relatedness of objects with the same or different types in a uniform framework; (2) a path-constrained measure: the relatedness of object pairs are defined based on the search path that connects two objects through following a sequence of node types; (3) a semi-metric measure: HeteSim has some good properties (e.g., selfmaximum and symmetric), which are crucial to many data mining tasks. Moreover, we analyze the computation characteristics of HeteSim and propose the corresponding quick computation strategies. Empirical studies show that HeteSim can effectively and efficiently evaluate the relatedness of heterogeneous objects.

knowledge discovery and data mining | 2010

Semi-supervised feature selection for graph classification

Xiangnan Kong; Philip S. Yu

The problem of graph classification has attracted great interest in the last decade. Current research on graph classification assumes the existence of large amounts of labeled training graphs. However, in many applications, the labels of graph data are very expensive or difficult to obtain, while there are often copious amounts of unlabeled graph data available. In this paper, we study the problem of semi-supervised feature selection for graph classification and propose a novel solution, called gSSC, to efficiently search for optimal subgraph features with labeled and unlabeled graphs. Different from existing feature selection methods in vector spaces which assume the feature set is given, we perform semi-supervised feature selection for graph data in a progressive way together with the subgraph feature mining process. We derive a feature evaluation criterion, named gSemi, to estimate the usefulness of subgraph features based upon both labeled and unlabeled graphs. Then we propose a branch-and-bound algorithm to efficiently search for optimal subgraph features by judiciously pruning the subgraph search space. Empirical studies on several real-world tasks demonstrate that our semi-supervised feature selection approach can effectively boost graph classification performances with semi-supervised feature selection and is very efficient by pruning the subgraph search space using both labeled and unlabeled graphs.

international world wide web conferences | 2012

Community detection in incomplete information networks

Wangqun Lin; Xiangnan Kong; Philip S. Yu; Quanyuan Wu; Yan Jia; Chuan Li

With the recent advances in information networks, the problem of community detection has attracted much attention in the last decade. While network community detection has been ubiquitous, the task of collecting complete network data remains challenging in many real-world applications. Usually the collected network is incomplete with most of the edges missing. Commonly, in such networks, all nodes with attributes are available while only the edges within a few local regions of the network can be observed. In this paper, we study the problem of detecting communities in incomplete information networks with missing edges. We first learn a distance metric to reproduce the link-based distance between nodes from the observed edges in the local information regions. We then use the learned distance metric to estimate the distance between any pair of nodes in the network. A hierarchical clustering approach is proposed to detect communities within the incomplete information networks. Empirical studies on real-world information networks demonstrate that our proposed method can effectively detect community structures within incomplete information networks.

web search and data mining | 2014

Inferring the impacts of social media on crowdfunding

Chun Ta Lu; Sihong Xie; Xiangnan Kong; Philip S. Yu

Crowdfunding -- in which people can raise funds through collaborative contributions of general public (i.e., crowd) -- has emerged as a billion dollars business for supporting more than one million ventures. However, very few research works have examined the process of crowdfunding. In particular, none has studied how social networks help crowdfunding projects to succeed. To gain insights into the effects of social networks in crowdfunding, we analyze the hidden connections between the fundraising results of projects on crowdfunding websites and the corresponding promotion campaigns in social media. Our analysis considers the dynamics of crowdfunding from two aspects: how fundraising activities and promotional activities on social media simultaneously evolve over time, and how the promotion campaigns influence the final outcomes. From our investigation, we identify a number of important principles that provide a useful guide for devising effective campaigns. For example, we observe temporal distribution of customer interest, strong correlations between a crowdfunding projects early promotional activities and the final outcomes, and the importance of concurrent promotion from multiple sources. We then show that these discoveries can help predict several important quantities, including overall popularity and the success rate of the project. Finally, we show how to use these discoveries to help design crowdfunding sites.

conference on information and knowledge management | 2012

Meta path-based collective classification in heterogeneous information networks

Xiangnan Kong; Philip S. Yu; Ying Ding; David J. Wild

Collective classification approaches exploit the dependencies of a group of linked objects whose class labels are correlated and need to be predicted simultaneously. In this paper, we focus on studying the collective classification problem in heterogeneous networks, which involves multiple types of data objects interconnected by multiple types of links. Intuitively, two objects are correlated if they are linked by many paths in the network. By considering different linkage paths in the network, one can capture the subtlety of different types of dependencies among objects. We introduce the concept of meta-path based dependencies among objects, where a meta path is a path consisting a certain sequence of linke types. We show that the quality of collective classification results strongly depends upon the meta paths used. To accommodate the large network size, a novel solution, called HCC (meta-path based Heterogenous Collective Classification), is developed to effectively assign labels to a group of instances that are interconnected through different meta-paths. The proposed HCC model can capture different types of dependencies among objects with respect to different meta paths. Empirical studies on real-world networks demonstrate that effectiveness of the proposed meta path-based collective classification approach.

Knowledge and Information Systems | 2012

gMLC: a multi-label feature selection framework for graph classification

Xiangnan Kong; Philip S. Yu

Graph classification has been showing critical importance in a wide variety of applications, e.g. drug activity predictions and toxicology analysis. Current research on graph classification focuses on single-label settings. However, in many applications, each graph data can be assigned with a set of multiple labels simultaneously. Extracting good features using multiple labels of the graphs becomes an important step before graph classification. In this paper, we study the problem of multi-label feature selection for graph classification and propose a novel solution, called gMLC, to efficiently search for optimal subgraph features for graph objects with multiple labels. Different from existing feature selection methods in vector spaces that assume the feature set is given, we perform multi-label feature selection for graph data in a progressive way together with the subgraph feature mining process. We derive an evaluation criterion to estimate the dependence between subgraph features and multiple labels of graphs. Then, a branch-and-bound algorithm is proposed to efficiently search for optimal subgraph features by judiciously pruning the subgraph search space using multiple labels. Empirical studies demonstrate that our feature selection approach can effectively boost multi-label graph classification performances and is more efficient by pruning the subgraph search space using multiple labels.

knowledge discovery and data mining | 2011

Dual active feature and sample selection for graph classification

Xiangnan Kong; Wei Fan; Philip S. Yu

Graph classification has become an important and active research topic in the last decade. Current research on graph classification focuses on mining discriminative subgraph features under supervised settings. The basic assumption is that a large number of labeled graphs are available. However, labeling graph data is quite expensive and time consuming for many real-world applications. In order to reduce the labeling cost for graph data, we address the problem of how to select the most important graph to query for the label. This problem is challenging and different from conventional active learning problems because there is no predefined feature vector. Moreover, the subgraph enumeration problem is NP-hard. The active sample selection problem and the feature selection problem are correlated for graph data. Before we can solve the active sample selection problem, we need to find a set of optimal subgraph features. To address this challenge, we demonstrate how one can simultaneously estimate the usefulness of a query graph and a set of subgraph features. The idea is to maximize the dependency between subgraph features and graph labels using an active learning framework. We propose a branch-and-bound algorithm to search for the optimal query graph and optimal features simultaneously. Empirical studies on nine real-world tasks demonstrate that the proposed method can obtain better accuracy on graph data than alternative approaches.

siam international conference on data mining | 2014

DuSK: A dual structure-preserving kernel for supervised tensor learning with applications to neuroimages

Lifang He; Xiangnan Kong; Philip S. Yu; Ann B. Ragin; Zhifeng Hao; Xiaowei Yang

With advances in data collection technologies, tensor data is assuming increasing prominence in many applications and the problem of supervised tensor learning has emerged as a topic of critical significance in the data mining and machine learning community. Conventional methods for supervised tensor learning mainly focus on learning kernels by flattening the tensor into vectors or matrices, however structural information within the tensors will be lost. In this paper, we introduce a new scheme to design structure-preserving kernels for supervised tensor learning. Specifically, we demonstrate how to leverage the naturally available structure within the tensorial representation to encode prior knowledge in the kernel. We proposed a tensor kernel that can preserve tensor structures based upon dual-tensorial mapping. The dual-tensorial mapping function can map each tensor instance in the input space to another tensor in the feature space while preserving the tensorial structure. Theoretically, our approach is an extension of the conventional kernels in the vector space to tensor space. We applied our novel kernel in conjunction with SVM to real-world tensor classification problems including brain fMRI classification for three different diseases (i.e., Alzheimers disease, ADHD and brain damage by HIV). Extensive empirical studies demonstrate that our proposed approach can effectively boost tensor classification performances, particularly with small sample sizes.

international conference on data mining | 2014

Collective Prediction of Multiple Types of Links in Heterogeneous Information Networks

Bokai Cao; Xiangnan Kong; Philip S. Yu

Link prediction has become an important and active research topic in recent years, which is prevalent in many real-world applications. Current research on link prediction focuses on predicting one single type of links, such as friendship links in social networks, or predicting multiple types of links independently. However, many real-world networks involve more than one type of links, and different types of links are not independent, but related with complex dependencies among them. In such networks, the prediction tasks for different types of links are also correlated and the links of different types should be predicted collectively. In this paper, we study the problem of collective prediction of multiple types of links in heterogeneous information networks. To address this problem, we introduce the linkage homophily principle and design a relatedness measure, called RM, between different types of objects to compute the existence probability of a link. We also extend conventional proximity measures to heterogeneous links. Furthermore, we propose an iterative framework for heterogeneous collective link prediction, called HCLP, to predict multiple types of links collectively by exploiting diverse and complex linkage information in heterogeneous information networks. Empirical studies on real-world tasks demonstrate that the proposed collective link prediction approach can effectively boost link prediction performances in heterogeneous information networks.

Explore More