Junzhou Zhao | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Junzhou Zhao is active.

Explore More

Publication

Featured researches published by Junzhou Zhao.

ACM Transactions on Knowledge Discovery From Data | 2014

Efficiently Estimating Motif Statistics of Large Networks

Pinghui Wang; John C. S. Lui; Bruno F. Ribeiro; Donald F. Towsley; Junzhou Zhao; Xiaohong Guan

Exploring statistics of locally connected subgraph patterns (also known as network motifs) has helped researchers better understand the structure and function of biological and Online Social Networks (OSNs). Nowadays, the massive size of some critical networks—often stored in already overloaded relational databases—effectively limits the rate at which nodes and edges can be explored, making it a challenge to accurately discover subgraph statistics. In this work, we propose sampling methods to accurately estimate subgraph statistics from as few queried nodes as possible. We present sampling algorithms that efficiently and accurately estimate subgraph properties of massive networks. Our algorithms require no precomputation or complete network topology information. At the same time, we provide theoretical guarantees of convergence. We perform experiments using widely known datasets and show that, for the same accuracy, our algorithms require an order of magnitude less queries (samples) than the current state-of-the-art algorithms.

international conference on data engineering | 2016

Minfer: A method of inferring motif statistics from sampled edges

Pinghui Wang; John C. S. Lui; Donald F. Towsley; Junzhou Zhao

Characterizing motif (i.e., locally connected sub-graph patterns) statistics is important for understanding complex networks such as online social networks and communication networks. Previous work made the strong assumption that the graph topology of interest is known in advance. In practice, sometimes researchers have to deal with the situation where the graph topology is unknown because it is expensive to collect and store all topological and meta information. Hence, typically what is available to researchers is only a snapshot of the graph, i.e., a subgraph of the graph. Crawling methods such as breadth first sampling can be used to generate the snapshot. However, these methods fail to sample a streaming graph represented as a high speed stream of edges. Therefore, graph mining applications such as network traffic monitoring use random edge sampling (i.e., sample each edge with a fixed probability) to collect edges and generate a sampled graph, which we called a “RESampled graph”. Clearly, a RESampled graphs motif statistics may be quite different from those of the underlying original graph. To resolve this, we propose a framework and implement a system called Minfer, which takes the given RESampled graph and accurately infers the underlying graphs motif statistics. We also apply Fisher information to bound the errors of our estimates. Experiments using large scale datasets show the accuracy and efficiency of our method.

international conference on data engineering | 2015

A tale of three graphs: Sampling design on hybrid social-affiliation networks

Junzhou Zhao; John C. S. Lui; Donald F. Towsley; Pinghui Wang; Xiaohong Guan

Random walk-based graph sampling methods have become increasingly popular and important for characterizing large-scale complex networks. While powerful, they are known to exhibit problems when the graph is loosely connected, which slows down the convergence of a random walk and can result in poor estimation accuracy. In this work, we observe that many graphs under study, called target graphs, usually do not exist in isolation. In many situations, a target graph is often related to an auxiliary graph and an affiliation graph, and the target graph becomes better connected when viewed from these three graphs as a whole, or what we called a hybrid social-affiliation network. This viewpoint brings extra benefits to the graph sampling framework, e.g., when directly sampling a target graph is difficult or inefficient, we can efficiently sample it with the assistance of auxiliary and affiliation graphs. We propose three sampling methods on such a hybrid social-affiliation network to estimate target graph characteristics, and conduct extensive experiments on both synthetic and real datasets, to demonstrate the effectiveness of these new sampling methods.

international world wide web conferences | 2014

Measuring and maximizing group closeness centrality over disk-resident graphs

Junzhou Zhao; John C. S. Lui; Donald F. Towsley; Xiaohong Guan

As an important metric in graphs, group closeness centrality measures how close a group of vertices is to all other vertices in a graph, and it is used in numerous graph applications such as measuring the dominance and influence of a node group over the graph. However, when a large-scale graph contains hundreds of millions of nodes/edges which cannot reside entirely in computers main memory, measuring and maximizing group closeness become challenging tasks. In this paper, we present a systematic solution for efficiently calculating and maximizing the group closeness for disk-resident graphs. Our solution first leverages a probabilistic counting method to efficiently estimate the group closeness with high accuracy, rather than exhaustively computing it in an exact fashion. In addition, we design an I/O-efficient greedy algorithm to find a node group that maximizes group closeness. Our proposed algorithm significantly reduces the number of random accesses to disk, thereby dramatically improving computational efficiency. Experiments on real-world big graphs demonstrate the efficacy of our approach.

IEEE Transactions on Knowledge and Data Engineering | 2018

MOSS-5: A Fast Method of Approximating Counts of 5-Node Graphlets in Large Graphs

Pinghui Wang; Junzhou Zhao; Xiangliang Zhang; Zhenguo Li; Jiefeng Cheng; John C. S. Lui; Donald F. Towsley; Jing Tao; Xiaohong Guan

Counting 3-, 4-, and 5-node graphlets in graphs is important for graph mining applications such as discovering abnormal/evolution patterns in social and biology networks. In addition, it is recently widely used for computing similarities between graphs and graph classification applications such as protein function prediction and malware detection. However, it is challenging to compute these graphlet counts for a large graph or a large set of graphs due to the combinatorial nature of the problem. Despite recent efforts in counting 3-node and 4-node graphlets, little attention has been paid to characterizing 5-node graphlets. In this paper, we develop a computationally efficient sampling method to estimate 5-node graphlet counts. We not only provide a fast sampling method and unbiased estimators of graphlet counts, but also derive simple yet exact formulas for the variances of the estimators which are of great value in practice—the variances can be used to bound the estimates’ errors and determine the smallest necessary sampling budget for a desired accuracy. We conduct experiments on a variety of real-world datasets, and the results show that our method is several orders of magnitude faster than the state-of-the-art methods with the same accuracy.

international conference on data engineering | 2013

Sampling node pairs over large graphs

Pinghui Wang; Junzhou Zhao; John C. S. Lui; Donald F. Towsley; Xiaohong Guan

Characterizing user pair relationships is important for applications such as friend recommendation and interest targeting in online social networks (OSNs). Due to the large scale nature of such networks, it is infeasible to enumerate all user pairs and so sampling is used. In this paper, we show that it is a great challenge even for OSN service providers to characterize user pair relationships even when they possess the complete graph topology. The reason is that when sampling techniques (i.e., uniform vertex sampling (UVS) and random walk (RW)) are naively applied, they can introduce large biases, in particular, for estimating similarity distribution of user pairs with constraints such as existence of mutual neighbors, which is important for applications such as identifying network homophily. Estimating statistics of user pairs is more challenging in the absence of the complete topology information, since an unbiased sampling technique such as UVS is usually not allowed, and exploring the OSN graph topology is expensive. To address these challenges, we present asymptotically unbiased sampling methods to characterize user pair properties based on UVS and RW techniques respectively. We carry out an evaluation of our methods to show their accuracy and efficiency. Finally, we apply our methods to two Chinese OSNs, Doudan and Xiami, and discover significant homophily is present in these two networks.

conference on online social networks | 2015

Tracking Triadic Cardinality Distributions for Burst Detection in Social Activity Streams

Junzhou Zhao; John C. S. Lui; Donald F. Towsley; Pinghui Wang; Xiaohong Guan

In everyday life, we often observe unusually frequent interactions among people before or during important events, i.e., people receive/send more greetings from/to their friends on Christmas Day than regular days. We also observe that some videos suddenly go viral through peoples sharing in online social networks (OSNs). Do these seemingly different phenomena share a common structure? All these phenomena are associated with sudden surges of user activities in networks, which we call bursts in this work. We uncover that the emergence of a burst is accompanied with the formation of triangles in networks. This finding motivates us to propose a new and robust method to detect bursts in OSNs. We first introduce a new measure, triadic cardinality distribution, corresponding to the fractions of nodes with different numbers of triangles, i.e., triadic cardinalities, within a network. We demonstrate that this distribution not only changes when a burst occurs, but it also has a robustness property that it is immunized against common spamming social-bot attacks. Hence, by tracking triadic cardinality distributions, we can reliably detect bursts in OSNs. To avoid handling massive activity data generated by OSN users during the triadic tracking, we design an efficient sample-estimate solution to provide maximum likelihood estimate on the triadic cardinality distribution from sampled data. Extensive experiments conducted on real data demonstrate the usefulness of this triadic cardinality distribution and effectiveness of our sample-estimate solution.

ACM Transactions on Knowledge Discovery From Data | 2015

Unbiased Characterization of Node Pairs over Large Graphs

Pinghui Wang; Junzhou Zhao; John C. S. Lui; Donald F. Towsley; Xiaohong Guan

Characterizing user pair relationships is important for applications such as friend recommendation and interest targeting in online social networks (OSNs). Due to the large-scale nature of such networks, it is infeasible to enumerate all user pairs and thus sampling is used. In this article, we show that it is a great challenge for OSN service providers to characterize user pair relationships, even when they possess the complete graph topology. The reason is that when sampling techniques (i.e., uniform vertex sampling (UVS) and random walk (RW)) are naively applied, they can introduce large biases, particularly for estimating similarity distribution of user pairs with constraints like existence of mutual neighbors, which is important for applications such as identifying network homophily. Estimating statistics of user pairs is more challenging in the absence of the complete topology information, as an unbiased sampling technique like UVS is usually not allowed and exploring the OSN graph topology is expensive. To address these challenges, we present unbiased sampling methods to characterize user pair properties based on UVS and RW techniques. We carry out an evaluation of our methods to show their accuracy and efficiency. Finally, we apply our methods to three OSNs—Foursquare, Douban, and Xiami—and discover that significant homophily is present in these networks.

Data Mining and Knowledge Discovery | 2018

Sampling online social networks by random walk with indirect jumps

Junzhou Zhao; Pinghui Wang; John C. S. Lui; Donald F. Towsley; Xiaohong Guan

Random walk-based sampling methods are gaining popularity and importance in characterizing large networks. While powerful, they suffer from the slow mixing problem when the graph is loosely connected, which results in poor estimation accuracy. Random walk with jumps (RWwJ) can address the slow mixing problem but it is inapplicable if the graph does not support uniform vertex sampling (UNI). In this work, we develop methods that can efficiently sample a graph without the necessity of UNI but still enjoy the similar benefits as RWwJ. We observe that many graphs under study, called target graphs, do not exist in isolation. In many situations, a target graph is related to an auxiliary graph and a bipartite graph, and they together form a better connected two-layered network structure. This new viewpoint brings extra benefits to graph sampling: if directly sampling a target graph is difficult, we can sample it indirectly with the assistance of the other two graphs. We propose a series of new graph sampling techniques by exploiting such a two-layered network structure to estimate target graph characteristics. Experiments conducted on both synthetic and real-world networks demonstrate the effectiveness and usefulness of these new techniques.

IEEE Transactions on Knowledge and Data Engineering | 2017

Inferring Higher-Order Structure Statistics of Large Networks From Sampled Edges

Pinghui Wang; Yiyan Qi; John C. S. Lui; Donald F. Towsley; Junzhou Zhao; Jing Tao

Recently exploring locally connected subgraphs (also known as motifs or graphlets) of complex networks attracts a lot of attention. Previous work made the strong assumption that the graph topology of interest is known in advance. In practice, sometimes researchers have to deal with the situation where the graph topology is unknown because it is expensive to collect and store all topological information. Hence, typically what is available to researchers is only a snapshot of the graph, i.e., a subgraph of the graph. Crawling methods such as breadth first sampling can be used to generate the snapshot. However, these methods fail to sample a streaming graph represented as a high speed stream of edges. Therefore, graph mining applications such as network traffic monitoring usually use random edge sampling (i.e., sample each edge with a fixed probability) to collect edges and generate a sampled graph, which we call a “ RESampled graph”. Clearly, a RESampled graphs motif statistics may be quite different from those of the original graph. To resolve this, we propose a framework Minfer, which takes the given RESampled graph and accurately infers the underlying graphs motif statistics. Experiments using large scale datasets show the accuracy and efficiency of our method.

Explore More