Zhaonian Zou | Researchain

Publication

Featured research published by Zhaonian Zou.

IEEE Transactions on Knowledge and Data Engineering | 2010

Mining Frequent Subgraph Patterns from Uncertain Graph Data

Zhaonian Zou; Hong Gao; Shuo Zhang

In many real applications, graph data is subject to uncertainties due to incompleteness and imprecision of data. Mining such uncertain graph data is semantically different from and computationally more challenging than mining conventional exact graph data. This paper investigates the problem of mining uncertain graph data and especially focuses on mining frequent subgraph patterns on an uncertain graph database. A novel model of uncertain graphs is presented, and the frequent subgraph pattern mining problem is formalized by introducing a new measure, called expected support. This problem is proved to be NP-hard. An approximate mining algorithm is proposed to find a set of approximately frequent subgraph patterns by allowing an error tolerance on expected supports of discovered subgraph patterns. The algorithm uses efficient methods to determine whether a subgraph pattern can be output or not and a new pruning method to reduce the complexity of examining subgraph patterns. Analytical and experimental results show that the algorithm is very efficient, accurate, and scalable for large uncertain graph databases. To the best of our knowledge, this paper is the first one to investigate the problem of mining frequent subgraph patterns from uncertain graph data.
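As a rough illustration of the expected support measure, the sketch below enumerates the possible worlds of tiny uncertain graphs by brute force. It assumes that edges exist independently with the given probabilities, that graphs are unlabeled, and that expected support is the average, over the database, of the probability that a graph contains the pattern. The function names are hypothetical, and this exhaustive enumeration is not the approximate algorithm proposed in the paper.

```python
from itertools import permutations, product

def possible_worlds(edges):
    """Yield (edge_set, probability) for each possible world.
    edges maps frozenset({u, v}) to an independent existence probability."""
    items = list(edges.items())
    for mask in product((False, True), repeat=len(items)):
        world, prob = set(), 1.0
        for keep, (edge, p) in zip(mask, items):
            if keep:
                world.add(edge)
                prob *= p
            else:
                prob *= 1.0 - p
        yield world, prob

def contains_pattern(world, vertices, pattern):
    """Brute-force (non-induced) subgraph test for an unlabeled pattern."""
    pattern_vertices = sorted({v for edge in pattern for v in edge})
    for image in permutations(vertices, len(pattern_vertices)):
        mapping = dict(zip(pattern_vertices, image))
        if all(frozenset((mapping[a], mapping[b])) in world for a, b in pattern):
            return True
    return False

def containment_probability(edges, pattern):
    """Total probability of the possible worlds that contain the pattern."""
    vertices = sorted({v for edge in edges for v in edge})
    return sum(prob for world, prob in possible_worlds(edges)
               if contains_pattern(world, vertices, pattern))

def expected_support(database, pattern):
    """Average containment probability over all uncertain graphs (assumed definition)."""
    return sum(containment_probability(g, pattern) for g in database) / len(database)

# Toy database: an uncertain triangle and an uncertain single edge.
triangle = {frozenset("ab"): 0.5, frozenset("bc"): 0.5, frozenset("ac"): 0.5}
single_edge = {frozenset("xy"): 0.9}
path2 = [("u", "v"), ("v", "w")]  # pattern: a path with two edges
print(expected_support([triangle, single_edge], path2))
```

The enumeration visits 2^m possible worlds for a graph with m uncertain edges, which is only feasible for toy inputs.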

knowledge discovery and data mining | 2010

Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics

Zhaonian Zou; Hong Gao

Frequent subgraph mining has been extensively studied on certain graph data. However, uncertainty inherently accompanies graph data in practice, and there is very little work on mining uncertain graph data. This paper investigates frequent subgraph mining on uncertain graphs under probabilistic semantics. Specifically, a measure called φ-frequent probability is introduced to evaluate the degree of recurrence of subgraphs. Given a set of uncertain graphs and two numbers 0 < φ, τ < 1, the goal is to quickly find all subgraphs with φ-frequent probability at least τ. Due to the NP-hardness of the problem, an approximate mining algorithm is proposed. Let 0 < δ < 1 be a parameter. The algorithm guarantees to find any frequent subgraph S with probability at least (1 - δ/2)^s, where s is the number of edges of S. In addition, it is thoroughly discussed how to set δ to guarantee the overall approximation quality of the algorithm. Extensive experiments on real uncertain graph data verify that the algorithm is efficient and that the mining results have very high quality.
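One way to read the φ-frequent probability is as the probability that the support of a subgraph reaches φ in a randomly drawn possible world of the database. The sketch below works under that reading and under two further assumptions that are not stated in the abstract: the per-graph containment probabilities are already known, and the uncertain graphs are mutually independent. The function name is hypothetical, and this is not the paper's approximate algorithm.

```python
import math

def frequent_probability(containment_probs, phi):
    """Probability that at least ceil(phi * n) of the n uncertain graphs
    contain the subgraph, given independent containment probabilities."""
    n = len(containment_probs)
    need = math.ceil(phi * n)
    # dist[k] is the probability that exactly k graphs seen so far contain the subgraph.
    dist = [1.0] + [0.0] * n
    for p in containment_probs:
        # Update from high k to low k so dist[k - 1] still holds the old value.
        for k in range(n, 0, -1):
            dist[k] = dist[k] * (1.0 - p) + dist[k - 1] * p
        dist[0] *= 1.0 - p
    return sum(dist[need:])

# Two graphs, each containing the subgraph with probability 0.5.
print(frequent_probability([0.5, 0.5], phi=0.5))
print(frequent_probability([0.5, 0.5], phi=1.0))
```

A subgraph would then be reported when this value is at least τ. The hard part in practice is obtaining the containment probabilities themselves, which the sketch takes as given.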

international conference on data engineering | 2010

Finding top-k maximal cliques in an uncertain graph

Zhaonian Zou; Hong Gao; Shuo Zhang

Existing studies on graph mining focus on exact graphs that are precise and complete. However, graph data tends to be uncertain in practice due to noise, incompleteness and inaccuracy. This paper investigates the problem of finding top-k maximal cliques in an uncertain graph. A new model of uncertain graphs is presented, and an intuitive measure is introduced to evaluate the significance of vertex sets. An optimized branch-and-bound algorithm is developed to find top-k maximal cliques, which adopts efficient pruning rules, a new searching strategy and effective preprocessing methods. The extensive experimental results show that the proposed algorithm is very efficient on real uncertain graphs, and the top-k maximal cliques are very useful for real applications, e.g. protein complex prediction.

conference on information and knowledge management | 2009

Frequent subgraph pattern mining on uncertain graph data

Zhaonian Zou; Hong Gao; Shuo Zhang

Graph data are subject to uncertainties in many applications due to incompleteness and imprecision of data. Mining uncertain graph data is semantically different from and computationally more challenging than mining exact graph data. This paper investigates the problem of mining frequent subgraph patterns from uncertain graph data. The frequent subgraph pattern mining problem is formalized by designing a new measure called expected support. An approximate mining algorithm is proposed to find an approximate set of frequent subgraph patterns by allowing an error tolerance on the expected supports of the discovered subgraph patterns. The algorithm uses an efficient approximation algorithm to determine whether a subgraph pattern can be output or not. The analytical and experimental results show that the algorithm is very efficient, accurate and scalable for large uncertain graph databases.

extending database technology | 2009

A novel approach for efficient supergraph query processing on graph databases

Shuo Zhang; Hong Gao; Zhaonian Zou

In recent years, large amounts of data modeled by graphs, namely graph data, have been collected in various domains. Efficiently processing queries on graph databases has attracted a lot of research attention. The supergraph query is a new and important kind of query in practice. A supergraph query, q, on a graph database D retrieves all graphs in D such that q is a supergraph of them. Because the number of graphs in databases is large and subgraph isomorphism testing is NP-complete, efficiently processing such queries is a big challenge. This paper first proposes an optimal compact method for organizing graph databases. Common subgraphs of the graphs in a database are stored only once in the compact organization of the database, in order to reduce the overall cost of subgraph isomorphism tests from stored graphs to queries during query processing. Then, an exact algorithm and an approximate algorithm for generating a significant feature set with optimal order are proposed to construct indices on graph databases. The optimal order on the feature set reduces the number of subgraph isomorphism tests during query processing. Based on the compact organization of graph databases, a novel algorithm for testing subgraph isomorphisms from multiple graphs to one graph is presented. Finally, based on all these techniques, a query processing method is proposed. Analytical and experimental results show that the proposed algorithms outperform the existing similar algorithms by one to two orders of magnitude.
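To make the query semantics concrete, the sketch below answers a supergraph query naively: it tests every stored graph against the query with a brute-force subgraph isomorphism check. It assumes unlabeled, undirected graphs given as (vertices, edges) pairs, and all names are hypothetical. It illustrates only the definition of the query, not the compact organization or the indices proposed in the paper.

```python
from itertools import permutations

def is_subgraph(small, big):
    """True if `small` is (non-induced) subgraph isomorphic to `big`.
    Each graph is (vertices, edges), with edges a set of frozenset({u, v})."""
    small_vertices, small_edges = small
    big_vertices, big_edges = big
    ordered = sorted(small_vertices)
    # Try every injective mapping of small's vertices into big's vertices.
    for image in permutations(sorted(big_vertices), len(ordered)):
        mapping = dict(zip(ordered, image))
        if all(frozenset(mapping[x] for x in edge) in big_edges
               for edge in small_edges):
            return True
    return False

def supergraph_query(database, q):
    """Names of all graphs g in the database such that q is a supergraph of g."""
    return [name for name, g in database.items() if is_subgraph(g, q)]

def graph(vertex_count, pairs):
    """Helper: build a graph on vertices 1..vertex_count from edge pairs."""
    return set(range(1, vertex_count + 1)), {frozenset(p) for p in pairs}

# Query: a 4-cycle with one diagonal.
q = graph(4, [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)])
database = {
    "triangle": graph(3, [(1, 2), (2, 3), (1, 3)]),
    "path3": graph(4, [(1, 2), (2, 3), (3, 4)]),
    "k4": graph(4, [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]),
    "star3": graph(4, [(1, 2), (1, 3), (1, 4)]),
}
print(supergraph_query(database, q))
```

Each query here costs one isomorphism test per stored graph, which is exactly the cost that sharing common subgraphs across stored graphs aims to reduce.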

international conference on data engineering | 2016

SimRank computation on uncertain graphs

Rong Zhu; Zhaonian Zou

SimRank is a similarity measure between vertices in a graph, which has become a fundamental technique in graph analytics. Recently, many algorithms have been proposed for efficient evaluation of SimRank similarities. However, the existing SimRank computation algorithms either overlook uncertainty in graph structures or are based on an unreasonable assumption (Du et al.). In this paper, we study SimRank similarities on uncertain graphs based on the possible world model of uncertain graphs. Following the random-walk-based formulation of SimRank on deterministic graphs and the possible world model of uncertain graphs, we define random walks on uncertain graphs for the first time and show that our definition of random walks satisfies the Markov property. We formulate the SimRank measure based on random walks on uncertain graphs. We discover a critical difference between random walks on uncertain graphs and random walks on deterministic graphs, which makes all existing SimRank computation algorithms on deterministic graphs inapplicable to uncertain graphs. To efficiently compute SimRank similarities, we propose three algorithms, namely the baseline algorithm with high accuracy, the sampling algorithm with high efficiency, and the two-phase algorithm with efficiency comparable to the sampling algorithm and about an order of magnitude smaller relative error than the sampling algorithm. The extensive experiments and case studies verify the effectiveness of our SimRank measure and the efficiency of our SimRank computation algorithms.

international conference on data mining | 2015

Top-k Reliability Search on Uncertain Graphs

Rong Zhu; Zhaonian Zou

Uncertain graphs have been widely used to represent graph data with inherent uncertainty in structures. Reliability search is a fundamental problem in uncertain graph analytics. This paper studies a new problem, the top-k reliability search problem on uncertain graphs, that is, finding k vertices v with the highest reliabilities of connections from a source vertex s to v. Note that the existing algorithm for the threshold-based reliability search problem is inefficient for the top-k reliability search problem. We propose a new algorithm to efficiently solve the top-k reliability search problem. The algorithm adopts two important techniques, namely the BFS sharing technique and the offline sampling technique. The BFS sharing technique exploits overlaps among different sampled possible worlds of the input uncertain graph and performs a single BFS on all possible worlds simultaneously. The offline sampling technique samples possible worlds offline and stores them using a compact structure. The algorithm also takes advantage of bit vectors and bitwise operations to improve efficiency. Moreover, we generalize the top-k reliability search problem to the multi-source case and show that the multi-source case of the problem can be equivalently converted to the single-source case of the problem. Extensive experiments carried out on both real and synthetic datasets verify that the optimized algorithm outperforms the baselines by 1-2 orders of magnitude in execution time while achieving comparable accuracy. Meanwhile, the optimized algorithm exhibits linear scalability with respect to the size of the input uncertain graph.
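The idea of a single traversal over many sampled possible worlds can be sketched with Python integers used as bit vectors. In the sketch below, bit i of an edge's mask records whether that edge exists in sampled world i, and bit i of a vertex's mask records whether the vertex is reachable from the source in world i. It assumes directed edges with independent existence probabilities. All names are hypothetical, and this is a simplified illustration in the spirit of BFS sharing, not the paper's implementation or its compact offline structure.

```python
import random
from collections import deque

def sample_edge_masks(edges, num_worlds, rng):
    """For each edge, sample one bit per possible world (1 = edge present)."""
    masks = {}
    for edge, p in edges.items():
        bits = 0
        for i in range(num_worlds):
            if rng.random() < p:
                bits |= 1 << i
        masks[edge] = bits
    return masks

def shared_reachability(masks, source, num_worlds):
    """One worklist traversal that tracks reachability in all worlds at once."""
    out = {}
    for (u, v), bits in masks.items():
        out.setdefault(u, []).append((v, bits))
    reach = {source: (1 << num_worlds) - 1}  # the source is reachable everywhere
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, bits in out.get(u, []):
            # Worlds in which v becomes newly reachable through edge (u, v).
            new = reach[u] & bits & ~reach.get(v, 0)
            if new:
                reach[v] = reach.get(v, 0) | new
                queue.append(v)
    return reach

def top_k_reliability(edges, source, k, num_worlds=1000, seed=0):
    """Estimate reliabilities from sampled worlds and return the k best vertices."""
    rng = random.Random(seed)
    masks = sample_edge_masks(edges, num_worlds, rng)
    reach = shared_reachability(masks, source, num_worlds)
    scores = {v: bin(bits).count("1") / num_worlds
              for v, bits in reach.items() if v != source}
    return sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))[:k]

edges = {("s", "a"): 0.9, ("a", "b"): 0.5, ("s", "c"): 0.1, ("c", "b"): 0.5}
print(top_k_reliability(edges, "s", k=2))
```

A vertex is re-queued whenever it gains new worlds, so the loop stops once no mask changes. The estimates are Monte Carlo values and vary with the seed and the number of sampled worlds.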

database systems for advanced applications | 2016

Triangle-Based Representative Possible Worlds of Uncertain Graphs

Shaoying Song; Zhaonian Zou; Kang Liu

Uncertain graph data has been collected, processed and analyzed in a wide range of applications. Under the possible world model, an uncertain graph represents a probability distribution over all its possible worlds. Each possible world is a deterministic graph that the uncertain graph may turn out to be in practice. To deal with the hardness of computations on uncertain graphs, Parchas et al. first proposed the concept of a degree-based representative possible world. This approach is distinguished from the sampling approach in that it computes on one representative possible world instead of on a large number of possible worlds sampled at random. However, the degree-based representative possible world only tries to preserve vertex degrees. In this paper, we are motivated by the fact that motif structures such as triangles can affect the structural properties of graphs and propose the concept of a triangle-based representative possible world. We also develop an algorithm for finding the triangle-based representative possible world of an uncertain graph. We conduct extensive experimental evaluations and show that the triangle-based representative possible world outperforms the degree-based representative possible world in preserving the structural characteristics of an uncertain graph in expectation.
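The phrase "in expectation" can be made concrete with two quantities that a representative possible world might try to match: the expected degree of each vertex and the expected number of triangles. The sketch below computes both under the assumption that edges exist independently with the given probabilities. The function names are hypothetical, and the code only computes the target statistics; it does not construct a representative possible world.

```python
from itertools import combinations

def expected_degrees(edges):
    """Expected degree of each vertex: the sum of its incident edge probabilities.
    edges maps frozenset({u, v}) to an independent existence probability."""
    degrees = {}
    for edge, p in edges.items():
        for v in edge:
            degrees[v] = degrees.get(v, 0.0) + p
    return degrees

def expected_triangles(edges):
    """Expected triangle count: the sum, over vertex triples, of the product
    of the three edge probabilities (a missing edge has probability 0)."""
    vertices = sorted({v for edge in edges for v in edge})
    total = 0.0
    for a, b, c in combinations(vertices, 3):
        p = 1.0
        for pair in ((a, b), (a, c), (b, c)):
            p *= edges.get(frozenset(pair), 0.0)
        total += p
    return total

uncertain = {
    frozenset("ab"): 0.5,
    frozenset("ac"): 0.5,
    frozenset("bc"): 0.8,
    frozenset("cd"): 0.9,
}
print(expected_degrees(uncertain))
print(expected_triangles(uncertain))
```

Two possible worlds can agree closely with the expected degrees while differing in how many triangles they contain, which is the kind of gap a triangle-aware representative is meant to address.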

Knowledge and Information Systems | 2017

Towards efficient top-k reliability search on uncertain graphs

Rong Zhu; Zhaonian Zou

Uncertain graphs have been widely used to represent graph data with inherent uncertainty in structures. Reliability search is a fundamental problem in uncertain graph analytics. This paper investigates a new problem with broad real-world applications, the top-k reliability search problem on uncertain graphs, that is, finding the k vertices v with the highest reliabilities of connections from a source vertex s to v. Note that the existing algorithm for the threshold-based reliability search problem is inefficient for the top-k reliability search problem. We propose a new algorithm to efficiently solve the top-k reliability search problem. The algorithm adopts two important techniques, namely the BFS sharing technique and the offline sampling technique. The BFS sharing technique exploits overlaps among different sampled possible worlds of the input uncertain graph and performs a single BFS on all possible worlds simultaneously. The offline sampling technique samples possible worlds offline and stores them using a compact structure. The algorithm also takes advantage of bit vectors and bitwise operations to improve efficiency. In addition, we generalize the top-k reliability search problem from the single-source case to the multi-source case and show that the multi-source case of the problem can be equivalently converted to the single-source case of the problem. Moreover, we define two types of reverse top-k reliability search problems with different semantics on uncertain graphs. We propose appropriate solutions for both of them. Extensive experiments carried out on both real and synthetic datasets verify that the optimized algorithm outperforms the baselines by 1–2 orders of magnitude in execution time while achieving comparable accuracy. Meanwhile, the optimized algorithm exhibits linear scalability with respect to the size of the input uncertain graph.

Journal of Combinatorial Optimization | 2017

Minimized-cost cube query on heterogeneous information networks

Dan Yin; Hong Gao; Zhaonian Zou

The data cube is the foundation of on-line analytical processing (OLAP) and provides users with data views from different perspectives and granularities. Heterogeneous information networks consist of multiple types of nodes and edges which represent different semantic relations. With the rapid development of social networks and knowledge graphs, heterogeneous information networks have become increasingly popular. In heterogeneous information networks, a cube is the set of aggregate graphs, and cube queries are required to support OLAP. Existing research mainly studies aggregate graph queries on homogeneous networks and considers only the attributes of nodes. To overcome these limitations, this paper investigates the cube query problem on heterogeneous information networks. (1) A novel cube model for heterogeneous information networks is proposed, which captures both the attribute and structure semantics. (2) Because the total number of aggregate graphs is huge, computing and storing them costs plenty of time and storage. The problem of partial cube materialization on heterogeneous information networks is investigated. Given a fixed size of memory space, the task is to select a subset of aggregate graphs in the cube that minimizes the computing cost of the whole cube. This optimization problem is proved to be NP-complete and there is no
