Xiaokai Wei
University of Illinois at Chicago
Publication
Featured research published by Xiaokai Wei.
web search and data mining | 2017
Linchuan Xu; Xiaokai Wei; Jiannong Cao; Philip S. Yu
Network embedding is increasingly employed to assist network analysis as it is effective at learning latent features that encode linkage information. Various network embedding methods have been proposed, but they are only designed for a single network scenario. In the era of big data, different types of related information can be fused together to form a coupled heterogeneous network, which consists of two different but related sub-networks connected by inter-network edges. In this scenario, the inter-network edges can act as complementary information in the presence of intra-network ones. This complementary information is important because it can make latent features more comprehensive and accurate. It is even more important when the intra-network edges are absent, a situation referred to as the cold-start problem. In this paper, we thus propose a method named embedding of embedding (EOE) for coupled heterogeneous networks. In the EOE, latent features encode not only intra-network edges, but also inter-network ones. To tackle the challenge of heterogeneities of two networks, the EOE incorporates a harmonious embedding matrix to further embed the embeddings that only encode intra-network edges. Empirical experiments on a variety of real-world datasets demonstrate that the EOE consistently outperforms single network embedding methods in applications including visualization, link prediction, multi-class classification, and multi-label classification.
international conference on data mining | 2016
Weixiang Shao; Lifang He; Chun Ta Lu; Xiaokai Wei; Philip S. Yu
In this paper, we propose an Online unsupervised Multi-View Feature Selection method, OMVFS, which deals with large-scale/streaming multi-view data in an online fashion. OMVFS embeds unsupervised feature selection into a clustering algorithm via nonnegative matrix factorization with sparse learning. It further incorporates graph regularization to preserve the local structure information and help select discriminative features. Instead of storing all the historical data, OMVFS processes the multi-view data chunk by chunk and aggregates all the necessary information into several small matrices. By using the buffering technique, the proposed OMVFS can reduce the computational and storage cost while taking advantage of the structure information. Furthermore, OMVFS can capture concept drift in the data streams. Extensive experiments on four real-world datasets show the effectiveness and efficiency of the proposed OMVFS method. More importantly, OMVFS is about 100 times faster than the off-line methods.
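The chunk-by-chunk aggregation idea mentioned in the abstract above can be illustrated with a generic sketch. This is not the OMVFS update rule, which the abstract does not spell out; it only shows, under that assumption, how a small fixed-size matrix (here a d x d Gram matrix) can summarize a stream so that past chunks need not be stored. The function names `update_gram` and `stream_gram` are hypothetical.

```python
def update_gram(gram, chunk):
    """Add chunk^T * chunk to the running d x d matrix, in place."""
    for row in chunk:
        for i, xi in enumerate(row):
            for j, xj in enumerate(row):
                gram[i][j] += xi * xj
    return gram


def stream_gram(chunks, d):
    """Consume an iterable of chunks (lists of d-dimensional rows).

    Only the d x d accumulator is kept in memory; each chunk can be
    discarded once it has been folded into the accumulator.
    """
    gram = [[0.0] * d for _ in range(d)]
    for chunk in chunks:
        update_gram(gram, chunk)
    return gram


# Hypothetical toy data: four instances with two features each.
rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
batch = stream_gram([rows], d=2)                   # everything at once
online = stream_gram([rows[:2], rows[2:]], d=2)    # two chunks
```

Processing the data in one batch or in two chunks yields the same accumulator, which is the property that lets an online method avoid keeping historical data.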
international world wide web conferences | 2017
Xiaokai Wei; Linchuan Xu; Bokai Cao; Philip S. Yu
Link Prediction has been an important task for social and information networks. Existing approaches usually assume the completeness of network structure. However, in many real-world networks, the links and node attributes can usually be partially observable. In this paper, we study the problem of Cross View Link Prediction (CVLP) on partially observable networks, where the focus is to recommend nodes with only links to nodes with only attributes (or vice versa). We aim to bridge the information gap by learning a robust consensus for link-based and attribute-based representations so that nodes become comparable in the latent space. Also, the link-based and attribute-based representations can lend strength to each other via this consensus learning. Moreover, attribute selection is performed jointly with the representation learning to alleviate the effect of noisy high-dimensional attributes. We present two instantiations of this framework with different loss functions and develop an alternating optimization framework to solve the problem. Experimental results on four real-world datasets show the proposed algorithm outperforms the baseline methods significantly for cross-view link prediction.
international symposium on neural networks | 2017
Xiaokai Wei; Bokai Cao; Philip S. Yu
Multi-view high-dimensional data have become increasingly common in the big data era. Feature selection is a useful technique for alleviating the curse of dimensionality in multi-view learning. In this paper, we study unsupervised feature selection for multi-view data, as class labels are usually expensive to obtain. Traditional feature selection methods are mostly designed for single-view data and cannot fully exploit the rich information from multi-view data. Existing multi-view feature selection methods are usually based on noisy cluster labels which might not preserve sufficient information from multi-view data. To better utilize multi-view information, we propose a method, CDMA-FS, to select features for each view by performing alignment on a cross diffused matrix. We formulate it as a constrained optimization problem and solve it using a Quasi-Newton based method. Experimental results on four real-world datasets show that the proposed method is more effective than the state-of-the-art methods in the multi-view setting.
international conference on big data | 2016
Xiaokai Wei; Bokai Cao; Weixiang Shao; Chun Ta Lu; Philip S. Yu
Community detection has been an important task for social and information networks. Existing approaches usually assume the completeness of linkage and content information. However, the links and node attributes are often only partially observable in many real-world networks. For example, users can specify their privacy settings to prevent non-friends from viewing their posts or connections. Such incompleteness poses additional challenges to community detection algorithms. In this paper, we aim to detect communities with partially observable link structure and node attributes. To fuse such incomplete information, we learn link-based and attribute-based representations via kernel alignment, and a co-regularization approach is proposed to combine the information from both sources (i.e., links and attributes). The link-based and attribute-based representations can lend strength to each other via the partial consensus learning. We present two instantiations of this framework by enforcing hard and soft consensus constraints, respectively. Experimental results on real-world datasets show the superiority of the proposed approaches over the baseline methods and their robustness under different observable levels.
siam international conference on data mining | 2016
Xiaokai Wei; Bokai Cao; Philip S. Yu
In the era of big data, one is often confronted with the problem of high dimensional data for many machine learning or data mining tasks. Feature selection, as a dimension reduction technique, is useful for alleviating the curse of dimensionality while preserving interpretability. In this paper, we focus on unsupervised feature selection, as class labels are usually expensive to obtain. Unsupervised feature selection is typically more challenging than its supervised counterpart due to the lack of guidance from class labels. Recently, regression-based methods with L2,1 norms have gained much popularity as they are able to evaluate features jointly; however, they consider only linear correlations between features and pseudo-labels. In this paper, we propose a novel nonlinear joint unsupervised feature selection method based on kernel alignment. The aim is to find a succinct set of features that best aligns with the original features in the kernel space. It can evaluate features jointly in a nonlinear manner and provides a good ‘0/1’ approximation for the selection indicator vector. We formulate it as a constrained optimization problem and develop a Spectral Projected Gradient (SPG) method to solve the optimization problem. Experimental results on several real-world datasets demonstrate that our proposed method outperforms the state-of-the-art approaches significantly.
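As a rough illustration of the kernel alignment criterion named in the abstract above, the sketch below scores a candidate feature subset by the standard (uncentered) alignment between the kernel built on that subset and the kernel built on all features. Several things here are assumptions rather than the paper's formulation: a linear kernel is used for brevity (the paper targets nonlinear kernels), subsets are scored by direct comparison instead of the SPG optimization, and the helper names and toy data are hypothetical.

```python
import math


def linear_kernel(X, features):
    """Gram matrix of a linear kernel restricted to the given feature indices."""
    return [[sum(xi[f] * xj[f] for f in features) for xj in X] for xi in X]


def frobenius_inner(A, B):
    """Frobenius inner product <A, B> of two equally sized matrices."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


def kernel_alignment(K1, K2):
    """Alignment <K1, K2> / (||K1|| * ||K2||), with Frobenius norms."""
    denom = math.sqrt(frobenius_inner(K1, K1) * frobenius_inner(K2, K2))
    return frobenius_inner(K1, K2) / denom if denom > 0 else 0.0


# Hypothetical toy data: 4 instances, 3 features; the third feature is small noise.
X = [
    [1.0, 0.0, 0.10],
    [0.9, 0.1, -0.20],
    [0.0, 1.0, 0.05],
    [0.1, 0.9, 0.00],
]
K_full = linear_kernel(X, [0, 1, 2])

# Score each two-feature subset by how well its kernel aligns with the full kernel.
scores = {
    subset: kernel_alignment(linear_kernel(X, subset), K_full)
    for subset in [(0, 1), (0, 2), (1, 2)]
}
best_subset = max(scores, key=scores.get)
```

Exhaustive scoring like this only works for tiny feature sets; the relaxation to a continuous selection indicator vector described in the abstract is what makes the problem tractable at scale.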
international world wide web conferences | 2017
Linchuan Xu; Xiaokai Wei; Jiannong Cao; Philip S. Yu
Network embedding fills the gap of applying tuple-based data mining models to networked datasets through learning latent representations or embeddings. However, it may not be possible to associate latent embeddings with physical meanings, just as the name "latent embedding" literally suggests. Hence, models built on embeddings may not be interpretable. In this paper, we thus propose to learn identity embeddings and interest embeddings, where user identity includes demographic and affiliation information, and interest is demonstrated by activities or topics users are interested in. With identity and interest information, we can make data mining models not only more interpretable, but also more accurate, which is demonstrated on three real-world social networks in link prediction and multi-task classification.
international world wide web conferences | 2018
Linchuan Xu; Xiaokai Wei; Jiannong Cao; Philip S. Yu
There is increasing interest in learning low-dimensional and dense node representations from the network structure, which is usually high-dimensional and sparse. However, most existing methods fail to consider semantic meanings of links. Different links may have different semantic meanings because the similarities between two nodes can be different, e.g., two nodes share common neighbors, or two nodes share similar interests which are demonstrated in node-generated content. In this paper, the former type of links are referred to as structure-close links while the latter type are referred to as content-close links. These two types of links naturally indicate there are two types of characteristics that nodes expose in a social network. Hence, we propose to learn two representations for each node, and render each representation responsible for encoding the corresponding type of node characteristics, which is achieved by jointly embedding the network structure and inferring the type of each link. In the experiments, the proposed method is demonstrated to be more effective than five recent methods on four social networks through applications including visualization, link prediction and multi-label classification.
conference on information and knowledge management | 2017
Xiaokai Wei; Bokai Cao; Philip S. Yu
Compared to supervised feature selection, unsupervised feature selection tends to be more challenging due to the lack of guidance from class labels. Along with the increasing variety of data sources, many datasets are also equipped with certain side information of heterogeneous structure. Such side information can be critical for feature selection when class labels are unavailable. In this paper, we propose a new feature selection method, SideFS, to exploit such rich side information. We model the complex side information as a heterogeneous network and derive instance correlations to guide subsequent feature selection. Representations are learned from the side information network and the feature selection is performed in a unified framework. Experimental results show that the proposed method can effectively enhance the quality of selected features by incorporating heterogeneous side information.
Sensors | 2017
Junxing Zhu; Jiawei Zhang; Quanyuan Wu; Yan Jia; Bin Zhou; Xiaokai Wei; Philip S. Yu
Nowadays, people are usually involved in multiple heterogeneous social networks simultaneously. Discovering the anchor links between the accounts owned by the same users across different social networks is crucial for many important inter-network applications, e.g., cross-network link transfer and cross-network recommendation. Many different supervised models have been proposed to predict anchor links so far, but they are effective only when the labeled anchor links are abundant. However, in real scenarios, such a requirement can hardly be met and most anchor links are unlabeled, since manually labeling the inter-network anchor links is quite costly and tedious. To overcome such a problem and utilize the numerous unlabeled anchor links in model building, in this paper, we introduce the active learning based anchor link prediction problem. Different from the traditional active learning problems, due to the one-to-one constraint on anchor links, if an unlabeled anchor link a=(u,v) is identified as positive (i.e., existing), all the other unlabeled anchor links incident to account u or account v will be negative (i.e., non-existing) automatically. Viewed from this perspective, asking for the labels of potential positive anchor links in the unlabeled set will be rewarding in the active anchor link prediction problem. Various novel anchor link information gain measures are defined in this paper, based on which several constrained active anchor link prediction methods are introduced. Extensive experiments have been done on real-world social network datasets to compare the performance of these methods with state-of-the-art anchor link prediction methods. The experimental results show that the proposed Mean-entropy-based Constrained Active Learning (MC) method can outperform other methods with significant advantages.
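The one-to-one constraint described in the abstract above can be sketched in a few lines. This is only an illustration of the label propagation step, assuming candidate anchor links are represented as (account in network 1, account in network 2) pairs; the function name `propagate_one_to_one` and the account names are hypothetical, and the information gain measures from the paper are not modeled.

```python
def propagate_one_to_one(candidates, positive):
    """Infer labels implied by one confirmed positive anchor link.

    candidates: iterable of (u, v) pairs of unlabeled anchor links.
    positive:   the (u, v) pair just confirmed as an existing anchor link.
    Returns a dict mapping each affected candidate to 1 (positive) or
    0 (negative); candidates sharing no account with `positive` are
    left out because their labels remain unknown.
    """
    u, v = positive
    inferred = {}
    for a, b in candidates:
        if (a, b) == (u, v):
            inferred[(a, b)] = 1
        elif a == u or b == v:
            # The same account cannot be anchored to a second account.
            inferred[(a, b)] = 0
    return inferred


# Hypothetical candidate links between two networks.
candidates = [("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v2")]
labels = propagate_one_to_one(candidates, ("u1", "v1"))
```

A single positive answer from the oracle therefore labels several candidates at once, which is why querying likely positive links is described as rewarding.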