Zhong-Yuan Zhang
Central University of Finance and Economics
Publication
Featured research published by Zhong-Yuan Zhang.
International Conference on Data Mining | 2007
Zhong-Yuan Zhang; Chris H. Q. Ding; Tao Li; Xiang-Sun Zhang
An interesting problem in nonnegative matrix factorization (NMF) is to factorize a matrix X that belongs to a specific class, for example, binary matrices. In this paper, we extend the standard NMF to binary matrix factorization (BMF for short): given a binary matrix X, we want to factorize X into two binary matrices W and H (thus conserving the most important integer property of the objective matrix X) satisfying X ≈ WH. Two algorithms are studied and compared. These methods rely on a fundamental boundedness property of NMF which we propose and prove. This new property also provides a natural normalization scheme that eliminates the bias of factor matrices. Experiments on both synthetic and real-world datasets are conducted to show the competency and effectiveness of BMF.
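To illustrate why a normalization step matters before binarizing, here is a minimal sketch in plain Python. It assumes a diagonal rescaling that equalizes the maximum of each column of W with the maximum of the matching row of H, followed by a fixed 0.5 threshold. The function names `balance` and `binarize`, the threshold, and the toy matrices are illustrative assumptions, not the two algorithms studied in the paper.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def balance(w, h):
    """Rescale column k of W and row k of H so both share the same maximum.

    The product W @ H is unchanged because column k is multiplied by s
    and row k is divided by the same s (assumed scheme, for illustration).
    """
    w2 = [row[:] for row in w]
    h2 = [row[:] for row in h]
    for k in range(len(h)):
        max_w = max(row[k] for row in w)
        max_h = max(h[k])
        if max_w == 0 or max_h == 0:
            continue  # nothing to balance for an empty factor
        s = math.sqrt(max_h / max_w)
        for row in w2:
            row[k] *= s
        h2[k] = [v / s for v in h2[k]]
    return w2, h2


def binarize(m, threshold=0.5):
    """Map entries at or above the (assumed) threshold to 1, others to 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in m]


# Toy factors whose product is binary but whose scales are badly biased.
W = [[4.0, 0.0], [4.0, 0.0], [0.0, 0.5]]
H = [[0.25, 0.25, 0.0], [0.0, 0.0, 2.0]]
X = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]

Wb, Hb = balance(W, H)
W_bin, H_bin = binarize(Wb), binarize(Hb)
```

Without the balancing step, thresholding H directly at 0.5 would zero out its first row even though W H reproduces X exactly; after balancing, both binary factors multiply back to X in this toy case.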
Data Mining and Knowledge Discovery | 2010
Zhong-Yuan Zhang; Tao Li; Chris H. Q. Ding; Xianwen Ren; Xiang-Sun Zhang
The advent of microarray technology enables us to monitor an entire genome in a single chip using a systematic approach. Clustering, as a widely used data mining approach, has been used to discover phenotypes from the raw expression data. However, traditional clustering algorithms have limitations since they cannot identify the substructures of samples and features hidden behind the data. Different from clustering, biclustering is a new methodology for discovering genes that are highly related to a subset of samples. Several biclustering models/methods have been presented and used for tumor clinical diagnosis and pathological research. In this paper, we present a new biclustering model using Binary Matrix Factorization (BMF). BMF is a new variant rooted in non-negative matrix factorization (NMF). We begin by proving a new boundedness property of NMF. Two different algorithms to implement the model and their comparison are then presented. We show that the microarray data biclustering problem can be formulated as a BMF problem and can be solved effectively using our proposed algorithms. Unlike the greedy strategy-based algorithms, our proposed algorithms for BMF are more likely to find the global optima. Experimental results on synthetic and real datasets demonstrate the advantages of BMF over existing biclustering methods. Besides the attractive clustering performance, BMF can generate sparse results (i.e., the number of genes/features involved in each biclustering structure is very small relative to the total number of genes/features) that are in accordance with the common practice in molecular biology.
Physical Review E | 2013
Zhong-Yuan Zhang; Yong Wang; Yong-Yeol Ahn
Discovering overlapping community structures is a crucial step to understanding the structure and dynamics of many networks. In this paper we develop a symmetric binary matrix factorization model to identify overlapping communities. Our model allows us not only to assign community memberships explicitly to nodes, but also to distinguish outliers from overlapping nodes. In addition, we propose a modified partition density to evaluate the quality of community structures. We use this to determine the most appropriate number of communities. We evaluate our methods using both synthetic benchmarks and real-world networks, demonstrating the effectiveness of our approach.
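As a rough illustration of how a binary membership matrix separates outliers from overlapping nodes, the sketch below assumes a matrix U with one row per node and one column per community, so that the symmetric model is X ≈ U Uᵀ. The helper names `classify_nodes` and `shared_communities` and the toy matrix are hypothetical and are not taken from the paper.

```python
def classify_nodes(membership):
    """Label each node by how many communities its binary row selects.

    membership[i][c] is 1 if node i belongs to community c, else 0.
    A row of all zeros is read as an outlier, a row with more than one 1
    as an overlapping node (assumed interpretation, for illustration).
    """
    labels = []
    for row in membership:
        count = sum(row)
        if count == 0:
            labels.append("outlier")
        elif count == 1:
            labels.append("single")
        else:
            labels.append("overlapping")
    return labels


def shared_communities(membership):
    """Compute U U^T: entry (i, j) counts communities shared by i and j."""
    return [[sum(a * b for a, b in zip(row_i, row_j))
             for row_j in membership] for row_i in membership]


# Five nodes, two communities; node 2 overlaps, node 4 belongs to none.
U = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 0]]
labels = classify_nodes(U)
S = shared_communities(U)
```

In this toy case the outlier's row and column of U Uᵀ are entirely zero, which is what lets a binary factor state "no community" explicitly instead of forcing a weak assignment.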
EPL | 2013
Zhong-Yuan Zhang
Constrained clustering has been well studied in the unsupervised learning community. However, how to encode constraints into community structure detection within complex networks remains a challenging problem. In this paper, we propose a semi-supervised learning framework for community structure detection. This framework implicitly encodes the must-link and cannot-link constraints by modifying the adjacency matrix of the network, which can also be regarded as de-noising the consensus matrix of community structures. Our proposed method gives consideration to both the topology and the functions (background information) of the complex network, which enhances the interpretability of the results. The comparisons performed on both the synthetic benchmarks and the real-world networks show that the proposed framework can significantly improve the community detection performance with few constraints, which makes it an attractive methodology in the analysis of complex networks.
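One simple way to picture encoding constraints by modifying the adjacency matrix is sketched below. It assumes an undirected network stored as a symmetric list of lists, sets must-link pairs to a chosen weight, and sets cannot-link pairs to zero. The function name `apply_constraints`, the `must_weight` parameter, and the specific values are assumptions for illustration; the abstract does not state the exact weights the paper uses.

```python
def apply_constraints(adj, must_link, cannot_link, must_weight=1.0):
    """Return a copy of the adjacency matrix with pairwise constraints applied.

    must_link and cannot_link are iterables of (i, j) node index pairs.
    Must-link pairs get an edge of weight must_weight (hypothetical choice);
    cannot-link pairs have their edge removed. Symmetry is preserved.
    """
    a = [row[:] for row in adj]
    for i, j in must_link:
        a[i][j] = a[j][i] = must_weight
    for i, j in cannot_link:
        a[i][j] = a[j][i] = 0.0
    return a


# Four nodes in a path 0-1-2-3; background knowledge says 0 and 1 belong
# together with 2 apart from them, so the noisy edge 1-2 is removed.
A = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
A_constrained = apply_constraints(A, must_link=[(2, 3)], cannot_link=[(1, 2)])
```

The modified matrix can then be passed to any community detection method unchanged, which is the sense in which the constraints are encoded implicitly rather than added to the objective.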
Knowledge and Information Systems | 2013
Zhong-Yuan Zhang; Tao Li; Chris H. Q. Ding
Non-negative matrix factorization (NMF) mainly focuses on the hidden pattern discovery behind a series of vectors for two-way data. Here, we propose a tensor decomposition model Tri-ONTD to analyze three-way data. The model aims to discover the common characteristics of a series of matrices and at the same time identify the peculiarity of each matrix, thus enabling the discovery of the cluster structure in the data. In particular, the Tri-ONTD model performs adaptive dimension reduction for tensors as it integrates the subspace identification (i.e., the low-dimensional representation with a common basis for a set of matrices) and the clustering process into a single process. The Tri-ONTD model can also be regarded as an extension of the Tri-factor NMF model. We present the detailed optimization algorithm and also provide the convergence proof. Experimental results on real-world datasets demonstrate the effectiveness of our proposed method in author clustering, image clustering, and image reconstruction. In addition, the results of our proposed model have sparse and localized structures.
Science in China Series F: Information Sciences | 2013
Zhong-Yuan Zhang
Discovering community structures is a fundamental problem in understanding the topology and the functions of complex networks. In this paper, we show how to apply dictionary learning to community structure detection. We present a new dictionary learning algorithm and systematically compare it with other state-of-the-art models/algorithms. The results show that the proposed algorithm is highly effective at finding the community structures in both synthetic datasets, including three types of data structures, and real-world networks coming from different areas.
Archive | 2012
Zhong-Yuan Zhang
In recent years, Nonnegative Matrix Factorization (NMF) has become a popular model in the data mining community. NMF aims to extract hidden patterns from a series of high-dimensional vectors automatically, and has been applied successfully to dimensionality reduction, unsupervised learning (clustering, semi-supervised clustering, co-clustering, etc.) and prediction. This chapter surveys NMF in terms of the model formulation and its variations and extensions, algorithms and applications, as well as its relations with K-means and Probabilistic Latent Semantic Indexing (PLSI). In summary, we draw the following conclusions: 1) NMF has good interpretability due to its nonnegativity constraints; 2) NMF is very flexible regarding the choices of its objective functions and the algorithms employed to solve it; 3) NMF has a variety of applications; 4) NMF has a solid theoretical foundation and a close relationship with the existing state-of-the-art unsupervised learning models. However, as a new and developing technology, it still has many interesting open issues that remain unsolved and await research from theoretical and algorithmic perspectives.
International Journal of Modern Physics C | 2015
Zhong-Yuan Zhang; Yong-Yeol Ahn
In this paper we propose a weighted symmetric binary matrix factorization (wSBMF) framework to detect overlapping communities in bipartite networks, which describe relationships between two types of nodes. Our method improves performance by recognizing the distinction between two types of missing edges: those among the nodes within each node type and those between the two node types. Our method can also explicitly assign community membership, distinguish outliers from overlapping nodes, and incorporate existing knowledge on the network. We propose a generalized partition density for bipartite networks as a quality function, which identifies the most appropriate number of communities. The experimental results on both synthetic and real-world networks demonstrate the effectiveness of our method.
Communications in Statistics - Theory and Methods | 2014
Zhong-Yuan Zhang; Tao Li; Chris H. Q. Ding; Jie Tang
In document clustering, a document may be assigned to multiple clusters and the probabilities of a document belonging to different clusters are directly normalized. We propose a new Posterior Probabilistic Clustering (PPC) model that has this normalization property. The clustering model is based on Nonnegative Matrix Factorization (NMF) and is flexible: if we use class conditional probability normalization, the model reduces to Probabilistic Latent Semantic Indexing (PLSI). Systematic comparison and evaluation indicate that PPC is competitive with other state-of-the-art clustering methods. Furthermore, the results of PPC are more sparse and orthogonal, both of which are highly desirable.
Fuzzy Systems and Knowledge Discovery | 2012
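The difference between the two normalizations mentioned above can be made concrete with a small sketch. It assumes a nonnegative document-by-cluster score matrix and reads posterior normalization as making each document's row sum to one, and class conditional normalization as making each cluster's column sum to one. The function names and the toy scores are hypothetical; this shows only the normalization step, not the PPC optimization itself.

```python
def normalize_rows(scores):
    """Posterior-style normalization: each document's cluster weights sum to 1."""
    out = []
    for row in scores:
        total = sum(row)
        out.append([v / total for v in row] if total > 0 else row[:])
    return out


def normalize_columns(scores):
    """Class-conditional-style normalization: each cluster's column sums to 1."""
    totals = [sum(col) for col in zip(*scores)]
    return [[v / t if t > 0 else v for v, t in zip(row, totals)]
            for row in scores]


# Three documents, two clusters (toy nonnegative scores).
scores = [[3.0, 1.0], [1.0, 1.0], [0.0, 2.0]]
posterior = normalize_rows(scores)
class_conditional = normalize_columns(scores)
```

With row normalization, each row can be read directly as a document's soft assignment over clusters, which is the property the abstract attributes to PPC.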
Jia-Yi Li; Zhong-Yuan Zhang; Ruo-Yang Zhang; Xiao-Wei Mo; Shuang Wang
Social networking sites have recently become popular with researchers due to their wide coverage and heavy usage. Much work has been done to investigate the interactions between web-based and non-web-based social networks, but little attention has been paid to the behavior unconformity between them. By applying a nonnegative matrix factorization algorithm and a visualization method to data collected from online and real-life social networks, we discover the link patterns of web-based and non-web-based social networks among a certain group of students. By comparing the networking patterns, we prove the existence of behavior unconformity and show how behavior unconformity might strengthen the ties between individuals.