Weixiang Shao
University of Illinois at Chicago
Publication
Featured research published by Weixiang Shao.
european conference on machine learning | 2015
Weixiang Shao; Lifang He; Philip S. Yu
With the advance of technology, data often have multiple modalities or come from multiple sources. Multi-view clustering provides a natural way of generating clusters from such data. Although multi-view clustering has been successfully applied in many applications, most previous methods assume that each view is complete (i.e., each instance appears in all views). However, in real-world applications, it is often the case that a number of views are available for learning but none of them is complete. The incompleteness of all the views and the number of available views make it difficult to integrate the incomplete views and obtain a better clustering solution. In this paper, we propose MIC (Multi-Incomplete-view Clustering), an algorithm based on weighted nonnegative matrix factorization with L2,1 regularization. MIC works by learning the latent feature matrices for all the views and generating a consensus matrix so that the difference between each view and the consensus is minimized. MIC has several advantages compared with existing methods. First, MIC incorporates weighted nonnegative matrix factorization, which handles the missing instances in each incomplete view. Second, MIC uses a co-regularized approach, which pushes the learned latent feature matrices of all the views towards a common consensus. By regularizing the disagreement between the latent feature matrices and the consensus, MIC can be easily extended to more than two incomplete views. Third, MIC incorporates L2,1 regularization into the weighted nonnegative matrix factorization, which makes it robust to noise and outliers. Fourth, MIC uses an iterative optimization framework, which is scalable and proved to converge. Experiments on real datasets demonstrate the advantages of MIC.
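The two ingredients named in this abstract, an instance-weighted factorization error and the L2,1 norm, can be illustrated with a small sketch. It is not the MIC objective or its optimizer: the function names, the row-per-instance layout (X approximated by U times V transposed), and the use of weight 0 for an instance missing from a view are assumptions made here for illustration.

```python
import math

def l21_norm(matrix):
    """L2,1 norm: the sum of the Euclidean norms of the rows."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)

def weighted_reconstruction_error(X, U, V, weights):
    """Sum over instances i of weights[i] * ||X[i] - U[i] V^T||^2.

    X is n x d (one row per instance), U is n x k, V is d x k.
    Assumption: weights[i] is 0 for an instance missing from this view,
    so its (filled-in) row contributes nothing to the error.
    """
    total = 0.0
    for x_row, u_row, w in zip(X, U, weights):
        for j, x in enumerate(x_row):
            approx = sum(u * v for u, v in zip(u_row, V[j]))
            total += w * (x - approx) ** 2
    return total

# The second instance is missing from this view; its row is a placeholder.
X = [[1.0, 2.0], [9.0, 9.0]]
U = [[1.0], [0.0]]
V = [[1.0], [2.0]]
print(weighted_reconstruction_error(X, U, V, [1.0, 0.0]))
print(l21_norm([[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]))
```

Because the L2,1 norm sums row norms rather than squared entries, a single outlying row is penalized linearly instead of quadratically, which is the usual intuition behind its robustness to outliers.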
information reuse and integration | 2015
Jiawei Zhang; Weixiang Shao; Senzhang Wang; Xiangnan Kong; Philip S. Yu
To enjoy more social network services, users nowadays are usually involved in multiple online social networks simultaneously. The users shared between different networks are called anchor users, while the remaining unshared users are called non-anchor users. Connections between the accounts of anchor users in different networks are defined as anchor links, and networks partially aligned by anchor links can be represented as partially aligned networks. In this paper, we want to predict anchor links between partially aligned social networks, which is formally defined as the partial network alignment problem. The partial network alignment problem is very difficult to solve because of two challenges: (1) the lack of general features for anchor links, and (2) the “one-to-one≤” (one to at most one) constraint on anchor links. To address these two challenges, a new method, PNA (Partial Network Aligner), is proposed in this paper. PNA (1) extracts various adjacency scores among users across networks based on a set of internetwork anchor meta paths, and (2) utilizes generic stable matching to identify the non-anchor users and prune the redundant anchor links attached to them. Extensive experiments conducted on two real-world partially aligned social networks demonstrate that PNA solves the partial network alignment problem very well and outperforms all the other comparison methods with significant advantages.
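The "one to at most one" constraint can be illustrated with a Gale-Shapley style stable matching in which low-scoring candidates are never proposed to, so some users remain unmatched. This is a sketch under assumptions, not the PNA algorithm itself: the `scores` dictionary, the `threshold` parameter, and the function name are hypothetical.

```python
def stable_match(scores, threshold=0.5):
    """Match users of network A to users of network B, one to at most one.

    scores[a][b] is a hypothetical similarity between user a (network A)
    and user b (network B). Pairs scoring below `threshold` are never
    proposed, so users without an acceptable candidate stay unmatched
    and are treated here as non-anchor users.
    """
    # Preference lists: acceptable candidates only, best first.
    prefs = {
        a: sorted((b for b, s in row.items() if s >= threshold),
                  key=lambda b: -row[b])
        for a, row in scores.items()
    }
    next_choice = {a: 0 for a in scores}
    partner_of_b = {}
    free = list(scores)
    while free:
        a = free.pop()
        if next_choice[a] >= len(prefs[a]):
            continue  # no acceptable candidate left: a stays unmatched
        b = prefs[a][next_choice[a]]
        next_choice[a] += 1
        current = partner_of_b.get(b)
        if current is None:
            partner_of_b[b] = a
        elif scores[a][b] > scores[current][b]:
            partner_of_b[b] = a      # b prefers the new proposer
            free.append(current)     # the old partner proposes again
        else:
            free.append(a)           # rejected: a tries its next choice
    return {a: b for b, a in partner_of_b.items()}

scores = {
    "a1": {"b1": 0.9, "b2": 0.6},
    "a2": {"b1": 0.8, "b2": 0.2},
    "a3": {"b1": 0.1, "b2": 0.3},
}
print(stable_match(scores))
```

In this toy input, `a1` and `a2` both prefer `b1`; `b1` keeps the higher-scoring `a1`, and `a2` and `a3` end up with no link because none of their remaining candidates reach the threshold.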
pacific-asia conference on knowledge discovery and data mining | 2015
Weixiang Shao; Lifang He; Philip S. Yu
With advances in data collection technologies, multiple data sources are assuming increasing prominence in many applications. Clustering from multiple data sources has emerged as a topic of critical significance in the data mining and machine learning community. Different data sources provide different levels of necessarily detailed knowledge. Thus, combining multiple data sources is pivotal to facilitating the clustering process. However, in reality, the data usually exhibit heterogeneity and incompleteness. The key challenge is how to effectively integrate information from multiple heterogeneous sources in the presence of missing data. Conventional methods mainly focus on clustering heterogeneous data with full information in all sources, or with at least one source without missing values. In this paper, we propose a more general framework, T-MIC (Tensor based Multi-source Incomplete data Clustering), to integrate multiple incomplete data sources. Specifically, we first use the kernel matrices to form an initial tensor across all the sources. Then we formulate a joint tensor factorization process with a sparsity constraint and use it to iteratively push the initial tensor towards a quality-driven exploration of the latent factors by taking into account missing data uncertainty. Finally, these factors serve as features for clustering. Extensive experiments on both synthetic and real datasets demonstrate that our proposed approach can effectively boost clustering performance, even with large amounts of missing data.
international conference on data mining | 2016
Weixiang Shao; Lifang He; Chun Ta Lu; Xiaokai Wei; Philip S. Yu
In this paper, we propose an Online unsupervised Multi-View Feature Selection method, OMVFS, which deals with large-scale/streaming multi-view data in an online fashion. OMVFS embeds unsupervised feature selection into a clustering algorithm via nonnegative matrix factorization with sparse learning. It further incorporates the graph regularization to preserve the local structure information and help select discriminative features. Instead of storing all the historical data, OMVFS processes the multi-view data chunk by chunk and aggregates all the necessary information into several small matrices. By using the buffering technique, the proposed OMVFS can reduce the computational and storage cost while taking advantage of the structure information. Furthermore, OMVFS can capture the concept drifts in the data streams. Extensive experiments on four real-world datasets show the effectiveness and efficiency of the proposed OMVFS method. More importantly, OMVFS is about 100 times faster than the off-line methods.
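The idea of processing data chunk by chunk while keeping only small aggregate matrices can be sketched with a generic example. The matrices OMVFS actually maintains are not given in this abstract; the running d x d Gram matrix below is an assumed stand-in, and the function names are hypothetical.

```python
def gram_update(gram, chunk):
    """Add chunk^T chunk to the running d x d matrix, in place."""
    for row in chunk:
        for i, xi in enumerate(row):
            for j, xj in enumerate(row):
                gram[i][j] += xi * xj
    return gram

def streaming_gram(chunks, d):
    """Consume an iterable of chunks (lists of d-dimensional rows).

    Only the d x d aggregate is stored; each chunk can be discarded
    as soon as it has been folded in.
    """
    gram = [[0.0] * d for _ in range(d)]
    for chunk in chunks:
        gram_update(gram, chunk)
    return gram

chunks = [[[1.0, 2.0]], [[3.0, 4.0], [5.0, 6.0]]]
print(streaming_gram(chunks, 2))
```

The storage cost depends on the number of features d, not on how many instances have streamed past, and the result matches what a single pass over the full data would give.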
web search and data mining | 2017
Chun Ta Lu; Lifang He; Weixiang Shao; Bokai Cao; Philip S. Yu
Many real-world problems, such as web image analysis, document categorization and product recommendation, often exhibit dual heterogeneity: heterogeneous features are obtained in multiple views, and multiple tasks might be related to each other through one or more shared views. To address these Multi-Task Multi-View (MTMV) problems, we propose a tensor-based framework for learning the predictive multilinear structure from the full-order feature interactions within the heterogeneous data. The tensor structure is used to strengthen and capture the complex relationships between multiple tasks with multiple views. We further develop efficient multilinear factorization machines (MFMs) that can learn the task-specific feature map and the task-view shared multilinear structures, without physically building the tensor. In the proposed method, a joint factorization is applied to the full-order interactions such that the consensus representation can be learned. In this manner, it can deal with partially incomplete data without difficulty, as the learning procedure does not rely on any particular view. Furthermore, the complexity of MFMs is linear in the number of parameters, which makes MFMs suitable for large-scale real-world problems. Extensive experiments on four real-world datasets demonstrate that the proposed method significantly outperforms several state-of-the-art methods in a wide variety of MTMV problems.
international conference on big data | 2016
Xiaokai Wei; Bokai Cao; Weixiang Shao; Chun Ta Lu; Philip S. Yu
Community detection has been an important task for social and information networks. Existing approaches usually assume the completeness of linkage and content information. However, the links and node attributes are often only partially observable in many real-world networks. For example, users can specify their privacy settings to prevent non-friends from viewing their posts or connections. Such incompleteness poses additional challenges to community detection algorithms. In this paper, we aim to detect communities with partially observable link structure and node attributes. To fuse such incomplete information, we learn link-based and attribute-based representations via kernel alignment, and a co-regularization approach is proposed to combine the information from both sources (i.e., links and attributes). The link-based and attribute-based representations can lend strength to each other via partial consensus learning. We present two instantiations of this framework by enforcing hard and soft consensus constraints, respectively. Experimental results on real-world datasets show the superiority of the proposed approaches over the baseline methods and their robustness under different observable levels.
Methods | 2015
Weixiang Shao; Clive E Adams; Aaron M. Cohen; John M. Davis; Marian McDonagh; Sujata Thakurta; Philip S. Yu; Neil R. Smalheiser
OBJECTIVE: It is important to identify separate publications that report outcomes from the same underlying clinical trial, in order to avoid over-counting these as independent pieces of evidence. METHODS: We created positive and negative training sets (comprised of pairs of articles reporting on the same condition and intervention) that were, or were not, linked to the same clinicaltrials.gov trial registry number. Features were extracted from MEDLINE and PubMed metadata; pairwise similarity scores were modeled using logistic regression. RESULTS: Article pairs from the same trial were identified with high accuracy (F1 score=0.843). We also created a clustering tool, Aggregator, that takes as input a PubMed user query for RCTs on a given topic, and returns article clusters predicted to arise from the same clinical trial. DISCUSSION: Although painstaking examination of full-text may be needed to be conclusive, metadata are surprisingly accurate in predicting when two articles derive from the same underlying clinical trial.
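The pipeline outlined in this abstract (pairwise similarity features, a logistic model, then clusters of articles) can be sketched as follows. Everything specific here is an assumption made for illustration: the two Jaccard features, the hand-set `weights` and `bias` (a real model would be fitted to labeled pairs), the `cutoff`, and the use of connected components to form clusters. None of it is taken from the Aggregator tool.

```python
import math
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def same_trial_probability(art1, art2, weights, bias):
    """Logistic score from two hypothetical metadata features."""
    features = [
        jaccard(art1["authors"], art2["authors"]),
        jaccard(art1["title"].lower().split(), art2["title"].lower().split()),
    ]
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def cluster_articles(articles, weights, bias, cutoff=0.5):
    """Group articles by linking every pair scored at or above `cutoff`."""
    parent = {a["id"]: a["id"] for a in articles}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(articles, 2):
        if same_trial_probability(a, b, weights, bias) >= cutoff:
            parent[find(a["id"])] = find(b["id"])
    clusters = {}
    for a in articles:
        clusters.setdefault(find(a["id"]), []).append(a["id"])
    return sorted(sorted(c) for c in clusters.values())

articles = [
    {"id": 1, "authors": ["Smith J", "Lee K"],
     "title": "Drug X in schizophrenia: a randomized trial"},
    {"id": 2, "authors": ["Smith J", "Lee K", "Wu P"],
     "title": "Drug X in schizophrenia: one year follow-up"},
    {"id": 3, "authors": ["Garcia M"],
     "title": "Exercise therapy for depression"},
]
print(cluster_articles(articles, weights=[4.0, 4.0], bias=-3.0))
```

With these made-up records, articles 1 and 2 share most authors and title words and fall into one cluster, while article 3 stays on its own.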
conference on information and knowledge management | 2017
Guixiang Ma; Lifang He; Chun Ta Lu; Weixiang Shao; Philip S. Yu; Alex D. Leow; Ann B. Ragin
Multi-view clustering has become a widely studied problem in the area of unsupervised learning. It aims to integrate multiple views by taking advantage of the consensus and complementary information from multiple views. Most existing works in multi-view clustering use a vector-based representation for the features in each view. However, in many real-world applications, instances are represented by graphs, and vector-based models cannot fully capture the structure of the graphs from each view. To solve this problem, in this paper we propose a Multi-view Clustering framework on graph instances with Graph Embedding (MCGE). Specifically, we model the multi-view graph data as tensors and apply tensor factorization to learn the multi-view graph embeddings, thereby capturing the local structure of graphs. We build an iterative framework by incorporating multi-view graph embedding into the multi-view clustering task on graph instances, performing multi-view clustering and multi-view graph embedding jointly. The multi-view clustering results are used for refining the multi-view graph embedding, and the updated multi-view graph embedding results further improve the multi-view clustering. Extensive experiments on two real brain network datasets (i.e., HIV and Bipolar) demonstrate the superior performance of the proposed MCGE approach in multi-view connectome analysis for clinical investigation and application.
international conference on big data | 2016
Weixiang Shao; Lifang He; Chun Ta Lu; Philip S. Yu
In this paper, we propose an online multi-view clustering algorithm, OMVC, which deals with large-scale incomplete views. We model the multi-view clustering problem as a joint weighted NMF problem and process the multi-view data chunk by chunk to reduce the memory requirement. OMVC learns the latent feature matrices for all the views and pushes them towards a consensus. We further increase the robustness of the learned latent feature matrices in OMVC via lasso regularization. To minimize the influence of incompleteness, dynamic weight setting is introduced to give lower weights to the incoming missing instances in different views. More importantly, to reduce the computational time, we incorporate a faster projected gradient descent by utilizing the Hessian matrices in OMVC. Extensive experiments conducted on four real datasets demonstrate the effectiveness of OMVC.
international symposium on neural networks | 2016
Weixiang Shao; Jiawei Zhang; Lifang He; Philip S. Yu
With the advance of technology, entities can be observed in multiple views. Multiple views containing different types of features can be used for clustering. Although multi-view clustering has been successfully applied in many applications, the previous methods usually assume the complete instance mapping between different views. In many real-world applications, information can be gathered from multiple sources, while each source can contain multiple views, which are more cohesive for learning. The views under the same source are usually fully mapped, but they can be very heterogeneous. Moreover, the mappings between different sources are usually incomplete and partially observed, which makes it more difficult to integrate all the views across different sources. In this paper, we propose MMC (Multi-source Multi-view Clustering), which is a framework based on collective spectral clustering with a discrepancy penalty across sources, to tackle these challenges. MMC has several advantages compared with other existing methods. First, MMC can deal with incomplete mapping between sources. Second, it considers the disagreements across sources while treating views in the same source as a cohesive set. Third, MMC also tries to infer the instance similarities across sources to enhance the clustering performance. Extensive experiments conducted on real-world data demonstrate the effectiveness of the proposed approach.