Pinghui Wang | Researchain

Publication

Featured research published by Pinghui Wang.

International Conference on Computer Communications | 2012

Sampling directed graphs with random walks

Bruno F. Ribeiro; Pinghui Wang; Fabricio Murai; Donald F. Towsley

Despite recent efforts to characterize complex networks such as citation graphs or online social networks (OSNs), little attention has been given to developing tools that can characterize directed graphs in the wild, where no pre-processed data is available. The presence of hidden incoming edges but observable outgoing edges poses a challenge to characterizing large directed graphs through crawling, as existing sampling methods cannot cope with hidden incoming links. The driving principle behind our random walk (RW) sampling method is to construct, in real time, an undirected graph from the directed graph such that the random walk on the directed graph is consistent with one on the undirected graph. We then use the RW on the undirected graph to estimate the outdegree distribution. Our algorithm accurately estimates the outdegree distributions of a variety of real-world graphs. We also study the hardness of indegree distribution estimation when indegrees are latent (i.e., incoming links are only observed as outgoing edges). We observe that, in the same scenarios, indegree distribution estimates are highly inaccurate unless the directed graph is highly symmetrical.
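A minimal Python sketch of the degree-reweighting idea behind random walk sampling, not the authors' algorithm. It assumes the full edge list is known up front and builds the undirected view (out-neighbors plus in-neighbors) in one pass, whereas the paper builds that view in real time while crawling. A walk on a connected undirected graph visits each node in proportion to its degree, so weighting every visit by 1/degree gives a consistent estimate of the outdegree distribution. The function name `rw_outdegree_distribution` is hypothetical.

```python
import random
from collections import Counter, defaultdict


def rw_outdegree_distribution(out_edges, steps, seed=0):
    """Estimate P(outdegree = k) with a reweighted random walk.

    out_edges maps each node to the nodes it points to. ASSUMPTION: every edge
    is known in advance, so the undirected view is built in one pass instead of
    incrementally while crawling. The undirected view must be connected.
    """
    undirected = defaultdict(set)
    for u, targets in out_edges.items():
        for v in targets:
            undirected[u].add(v)
            undirected[v].add(u)
    neighbors = {u: sorted(vs) for u, vs in undirected.items()}

    rng = random.Random(seed)
    node = min(neighbors)
    weight_by_outdegree = Counter()
    total_weight = 0.0
    for _ in range(steps):
        node = rng.choice(neighbors[node])
        w = 1.0 / len(neighbors[node])  # undo the walk's bias toward high degree
        weight_by_outdegree[len(out_edges.get(node, ()))] += w
        total_weight += w
    return {k: w / total_weight for k, w in weight_by_outdegree.items()}
```

For example, `rw_outdegree_distribution({0: [1, 2], 1: [2], 2: [0, 3], 3: [0]}, 200_000)` should return values close to 0.5 for both outdegree 1 and outdegree 2, since two of the four nodes have each.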

ACM Transactions on Knowledge Discovery From Data | 2014

Efficiently Estimating Motif Statistics of Large Networks

Pinghui Wang; John C. S. Lui; Bruno F. Ribeiro; Donald F. Towsley; Junzhou Zhao; Xiaohong Guan

Exploring statistics of locally connected subgraph patterns (also known as network motifs) has helped researchers better understand the structure and function of biological networks and Online Social Networks (OSNs). Today, the massive size of some critical networks, often stored in already overloaded relational databases, effectively limits the rate at which nodes and edges can be explored, making it a challenge to accurately discover subgraph statistics. In this work, we propose sampling methods that accurately estimate subgraph statistics from as few queried nodes as possible. We present sampling algorithms that efficiently and accurately estimate subgraph properties of massive networks. Our algorithms require no precomputation or complete network topology information. At the same time, we provide theoretical guarantees of convergence. We perform experiments using widely known datasets and show that, for the same accuracy, our algorithms require an order of magnitude fewer queries (samples) than the current state-of-the-art algorithms.
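As an illustration of estimating a motif statistic from samples rather than full enumeration, here is a sketch of wedge sampling, a standard technique swapped in for clarity and not the method of this paper. It draws wedges (paths of length two) uniformly at random and reports the fraction that are closed, plus the implied triangle count. Unlike the paper's algorithms, it assumes every node's degree is known in advance; `wedge_sampling` is a hypothetical name.

```python
import random
from itertools import accumulate


def wedge_sampling(adj, samples, seed=0):
    """Estimate (closed-wedge fraction, triangle count) of an undirected graph.

    adj maps each node to the set of its neighbors. ASSUMPTION: all degrees are
    known, so a wedge center can be drawn with probability proportional to the
    number of wedges it centers, C(d, 2).
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    wedges = [len(adj[v]) * (len(adj[v]) - 1) // 2 for v in nodes]
    total_wedges = sum(wedges)
    centers = rng.choices(nodes, cum_weights=list(accumulate(wedges)), k=samples)
    closed = 0
    for c in centers:
        a, b = rng.sample(sorted(adj[c]), 2)  # uniform pair of neighbors -> uniform wedge
        closed += b in adj[a]
    frac = closed / samples
    # Every triangle contains exactly three closed wedges.
    return frac, frac * total_wedges / 3
```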

IEEE Journal on Selected Areas in Communications | 2013

On Set Size Distribution Estimation and the Characterization of Large Networks via Sampling

Fabricio Murai; Bruno F. Ribeiro; Donald F. Towsley; Pinghui Wang

In this work we study the set size distribution estimation problem, where elements are randomly sampled from a collection of non-overlapping sets and we seek to recover the original set size distribution from the samples. This problem has applications to capacity planning and network theory. Examples of real-world applications include characterizing in-degree distributions in large graphs and uncovering TCP/IP flow size distributions on the Internet. We demonstrate that it is difficult to estimate the original set size distribution. The recoverability of original set size distributions presents a sharp threshold with respect to the fraction of elements that remain in the sets. If this fraction lies below the threshold, typically half of the elements in power-law and heavier-than-exponential-tailed distributions, then the original set size distribution is unrecoverable. We also discuss practical implications of our findings.
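The estimation problem can be made concrete with a small sketch, assuming each element is kept independently with probability `p` (binomial thinning) and that sets with no sampled elements are unobserved. The forward model maps an original size distribution `theta` to the observed one, and exact back-substitution inverts it. The names `thin` and `unthin` are hypothetical, and this illustrates the problem rather than the paper's estimator.

```python
from math import comb


def thin(theta, p):
    """Observed size distribution when each element is kept with probability p.

    theta[j] is the fraction of sets with original size j (index 0 unused).
    Returns phi, where phi[i] is the fraction of observed sets (those with at
    least one sampled element) that have exactly i sampled elements.
    """
    m = len(theta) - 1
    phi = [0.0] * (m + 1)
    for j in range(1, m + 1):
        for i in range(1, j + 1):
            phi[i] += theta[j] * comb(j, i) * p**i * (1 - p) ** (j - i)
    total = sum(phi)
    return [x / total for x in phi]


def unthin(phi, p):
    """Invert thin() by back-substitution from the largest size down."""
    m = len(phi) - 1
    theta = [0.0] * (m + 1)
    for i in range(m, 0, -1):
        # Mass that larger original sets contribute after thinning to size i.
        spill = sum(theta[j] * comb(j, i) * p**i * (1 - p) ** (j - i)
                    for j in range(i + 1, m + 1))
        theta[i] = (phi[i] - spill) / p**i
    total = sum(theta)
    return [x / total for x in theta]


original = [0.0, 0.5, 0.3, 0.2]
observed = thin(original, 0.6)
recovered = unthin(observed, 0.6)
```

With exact inputs the round trip recovers `original`. Because `unthin` divides by `p**i`, small errors in the observed distribution are amplified rapidly as `p` shrinks and set sizes grow, which gives some intuition for why recovery becomes hard; it does not derive the threshold reported in the abstract.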

Journal of Network and Computer Applications | 2011

Monitoring abnormal network traffic based on blind source separation approach

Tao Qin; Xiaohong Guan; Wei Li; Pinghui Wang; Qiuzhen Huang

The randomness in network behaviors poses serious challenges for discovering abnormal patterns in network traffic flows. This paper presents a systematic approach for monitoring abnormal network traffic. The DFlow model is proposed to reduce the number of flow records and to extract four features that capture the traffic patterns. The blind source separation method is applied to obtain the routine and abnormal behaviors from those features. A scale space filter is applied to filter out the randomness in the traffic flows without affecting the behavior patterns. A threshold is selected based on a systematic criterion to evaluate the degree of abnormality. The contributions of the different traffic features to abnormal behavior detection are analyzed, and connection degree is found to be the most important feature for traffic monitoring. A salient feature of this method is that it is effective for detecting abnormal behaviors that are not associated with significant changes in traffic volume. Another advantage of the new method is that no supervised learning process is needed. This is very important, since high-quality labeled samples, especially data traces associated with attacks, are very difficult to acquire in actual networks. Experimental results based on actual network data show that the method is effective for monitoring abnormal traffic flows in a gigabyte-scale traffic environment, with accuracy above 95%.
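The filter-then-threshold stages can be sketched in a few lines. This is a hypothetical simplification: single-scale Gaussian smoothing stands in for the scale space filter, a fixed z-score cutoff `k` stands in for the paper's systematic threshold criterion, and the blind source separation step is omitted entirely. The input is assumed to be one feature time series, such as a per-interval connection degree count.

```python
import math
import statistics


def gaussian_smooth(series, sigma):
    """Single-scale Gaussian smoothing (truncated at 3 sigma, renormalized at edges)."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    n = len(series)
    out = []
    for i in range(n):
        acc = wsum = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:
                acc += w * series[j]
                wsum += w
        out.append(acc / wsum)
    return out


def flag_anomalies(series, sigma=2.0, k=2.5):
    """Indices whose smoothed value lies more than k standard deviations above the mean."""
    smooth = gaussian_smooth(series, sigma)
    mu = statistics.fmean(smooth)
    sd = statistics.pstdev(smooth)
    return [i for i, x in enumerate(smooth) if sd > 0 and (x - mu) / sd > k]


# Alternating noise around 1.0 with a sustained burst at intervals 50..59.
series = [1 + 0.5 * (-1) ** i + (10 if 50 <= i < 60 else 0) for i in range(100)]
flags = flag_anomalies(series)
```

Smoothing cancels the alternating noise while keeping the sustained burst, so only intervals inside the burst exceed the cutoff.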

IEEE Transactions on Information Forensics and Security | 2011

A Data Streaming Method for Monitoring Host Connection Degrees of High-Speed Links

Pinghui Wang; Xiaohong Guan; Tao Qin; Qiuzhen Huang

Because of the massive volume of data in high-speed network traffic and limited processing capability, it is a great challenge to accurately measure and monitor network traffic over high-speed links online. This paper presents a new data structure, the reversible connection degree sketch, for locating hosts associated with large connection degrees or significant changes in connection degrees in order to monitor anomalous network traffic. The reversible connection degree sketch builds a compact summary of host connection degrees efficiently and accurately. For each arriving packet, it only needs to set several bits, selected in a bit array by a group of hash functions. These hash functions are designed based on the Chinese Remainder Theorem so that the in-degree or out-degree associated with a given host can be accurately estimated. With this new data structure, we develop a new reverse sketch method for locating abnormal hosts. Although the reversible connection degree sketch does not preserve any host address information, we can analytically reconstruct the host addresses associated with large connection degrees or significant changes in connection degrees by a simple calculation based purely on the characteristics of the hash functions. Furthermore, a reinforced reversible connection degree sketch, the double connection degree sketch, is developed to reduce the false positives commonly encountered in sketch-based methods. A traffic monitoring system based on the double connection degree sketch is developed to detect and classify abnormal hosts associated with large connection degrees or significant changes in connection degrees. Experiments conducted on actual network traffic show that our method is accurate and efficient.
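A simplified sketch of how remainder-based hashing can make a degree sketch reversible, written from the description above rather than from the paper's actual design. Assumptions: hosts are integers smaller than the product of the pairwise coprime moduli, each bucket is a 256-bit bitmap whose distinct-peer count is estimated by linear counting, and peers are mapped to bits with multiplicative hashing. The class name `ReversibleDegreeSketch` and its methods are hypothetical. Heavy hosts are recovered by combining heavy-bucket indices across rows with the Chinese Remainder Theorem.

```python
import math
from itertools import product

BITS = 256  # bits per bucket bitmap; bit_index() below assumes exactly 256


def bit_index(peer: int) -> int:
    """Map a peer address to one of 256 bits (top 8 bits of a 32-bit Fibonacci hash)."""
    return ((peer * 2654435761) & 0xFFFFFFFF) >> 24


def crt(residues, moduli):
    """Chinese Remainder Theorem: the unique x < prod(moduli) with x % m_i == r_i."""
    big_m = math.prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        x += r * partial * pow(partial, -1, m)
    return x % big_m


class ReversibleDegreeSketch:
    def __init__(self, moduli):
        self.moduli = moduli  # must be pairwise coprime
        self.rows = [[0] * m for m in moduli]  # one int bitmap per bucket

    def update(self, host: int, peer: int) -> None:
        # Per packet: set one bit in one bucket of every row.
        bit = 1 << bit_index(peer)
        for row, m in zip(self.rows, self.moduli):
            row[host % m] |= bit

    @staticmethod
    def _estimate(bitmap: int) -> float:
        # Linear counting: distinct peers ~ -B * ln(fraction of zero bits).
        zeros = BITS - bin(bitmap).count("1")
        return math.inf if zeros == 0 else -BITS * math.log(zeros / BITS)

    def degree(self, host: int) -> float:
        # Colliding hosts can only add bits, so the smallest row estimate is least inflated.
        return min(self._estimate(row[host % m]) for row, m in zip(self.rows, self.moduli))

    def heavy_hosts(self, threshold: float):
        heavy = [[i for i, bm in enumerate(row) if self._estimate(bm) >= threshold]
                 for row in self.rows]
        # Each combination of heavy buckets (one per row) names one candidate host.
        return sorted(crt(res, self.moduli) for res in product(*heavy))


sketch = ReversibleDegreeSketch([7, 11, 13])  # hosts must be < 7 * 11 * 13 = 1001
for src in range(1, 21):                      # background hosts, 2 peers each
    sketch.update(src, 1000 + src)
    sketch.update(src, 2000 + src)
for dst in range(5000, 5060):                 # host 123 contacts 60 peers
    sketch.update(123, dst)
for dst in range(6000, 6050):                 # host 456 contacts 50 peers
    sketch.update(456, dst)
found = sketch.heavy_hosts(40)
```

In this example each row has two heavy buckets, so `heavy_hosts(40)` returns all 2 × 2 × 2 = 8 residue combinations, of which only 123 and 456 are real senders. Spurious candidates of this kind are what the double connection degree sketch described above is meant to reduce.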

Global Communications Conference | 2009

A New Data Streaming Method for Locating Hosts with Large Connection Degree

Xiaohong Guan; Pinghui Wang; Tao Qin

Locating hosts with large connection degrees is very important for monitoring anomalous network traffic. The in-degree (out-degree) of a host is defined as the number of distinct sources (destinations) that the host is connected with (connects to) during a given time interval. Due to the massive amount of data in high-speed network traffic and limited processing capability, it is difficult to accurately locate hosts with large connection degrees over high-speed links online. In this paper we present a new data streaming method for locating hosts with large connection degrees based on the reversible connection degree sketch, in order to monitor anomalous network traffic. The required memory space is small and constant, and, more importantly, the update/query complexity does not depend on the amount of data. The hash functions for the data sketch are designed based on remainder properties from number theory so that the in-degree/out-degree associated with a given host can be accurately estimated. Although the connection degree sketch does not preserve any host address information, we can analytically reconstruct the host addresses associated with large in-degree/out-degree by a simple equation based purely on the characteristics of the hash functions. This procedure is highly efficient, since its computational time is constant and negligible. Furthermore, this reversible connection degree sketch based method can be easily implemented in distributed systems. Experimental and testing results based on actual network traffic show that the new method is accurate and efficient.
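For reference, the degree definitions above can be computed exactly with per-host sets of distinct peers. This is the baseline that a sketch approximates, not part of the paper's method; its memory grows with the number of distinct (source, destination) pairs in the interval, which is the cost a constant-size sketch avoids. `exact_degrees` is a hypothetical helper.

```python
from collections import defaultdict


def exact_degrees(packets):
    """Exact in/out connection degrees for one time interval.

    packets: iterable of (src, dst) pairs; repeated pairs count once.
    """
    out_peers = defaultdict(set)  # host -> distinct destinations it connects to
    in_peers = defaultdict(set)   # host -> distinct sources that connect to it
    for src, dst in packets:
        out_peers[src].add(dst)
        in_peers[dst].add(src)
    in_degree = {host: len(peers) for host, peers in in_peers.items()}
    out_degree = {host: len(peers) for host, peers in out_peers.items()}
    return in_degree, out_degree
```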

International Conference on Data Engineering | 2016

Minfer: A method of inferring motif statistics from sampled edges

Pinghui Wang; John C. S. Lui; Donald F. Towsley; Junzhou Zhao

Characterizing motif (i.e., locally connected subgraph pattern) statistics is important for understanding complex networks such as online social networks and communication networks. Previous work made the strong assumption that the graph topology of interest is known in advance. In practice, researchers sometimes have to deal with situations where the graph topology is unknown because it is expensive to collect and store all topological and meta information. Hence, typically all that is available to researchers is a snapshot of the graph, i.e., a subgraph of the graph. Crawling methods such as breadth-first sampling can be used to generate the snapshot. However, these methods fail to sample a streaming graph represented as a high-speed stream of edges. Therefore, graph mining applications such as network traffic monitoring use random edge sampling (i.e., sample each edge with a fixed probability) to collect edges and generate a sampled graph, which we call a “RESampled graph”. Clearly, a RESampled graph's motif statistics may be quite different from those of the underlying original graph. To resolve this, we propose a framework and implement a system called Minfer, which takes the given RESampled graph and accurately infers the underlying graph's motif statistics. We also apply Fisher information to bound the errors of our estimates. Experiments using large-scale datasets show the accuracy and efficiency of our method.
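For 3-node undirected motifs the correction for random edge sampling can be worked out by hand, which shows the kind of inversion an inference step like Minfer's must perform; this sketch is an illustration under stated assumptions, not the Minfer system. Assume each edge is kept independently with probability `p`. An induced open wedge survives as an open wedge with probability p², while a triangle survives as a triangle with probability p³ and appears as an open wedge with probability 3p²(1 − p) (exactly one of its three edges dropped). Inverting these expectations gives unbiased estimates of the original counts. Both helper names are hypothetical.

```python
from itertools import combinations


def count_3node_motifs(edges):
    """Return (open_wedges, triangles): induced 3-node motif counts of a simple undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    wedges = closed = 0
    for nbrs in adj.values():
        for a, b in combinations(nbrs, 2):  # each neighbor pair is a wedge centered here
            wedges += 1
            closed += b in adj[a]
    triangles = closed // 3  # a triangle is a closed wedge at each of its three corners
    return wedges - closed, triangles


def infer_original_counts(obs_open, obs_tri, p):
    """Unbiased estimates of original counts when each edge was kept with probability p."""
    tri = obs_tri / p**3
    open_wedges = (obs_open - 3 * p**2 * (1 - p) * tri) / p**2
    return open_wedges, tri
```

For instance, a graph with 10 open wedges and 5 triangles sampled at p = 0.5 has expected observed counts of 4.375 open wedges and 0.625 triangles, and `infer_original_counts(4.375, 0.625, 0.5)` maps them back to `(10.0, 5.0)`.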

International Conference on Data Engineering | 2014

An efficient sampling method for characterizing points of interests on maps

Pinghui Wang; Wenbo He; Xue Liu

Recently, map services (e.g., Google Maps) and location-based online social networks (e.g., Foursquare) have attracted considerable attention and business. With the increasing popularity of these location-based services, exploring and characterizing points of interest (PoIs) such as restaurants and hotels on maps provides valuable information for applications such as start-up marketing research. Because full, direct access to PoI databases is not available, it is infeasible to exhaustively search and collect all PoIs within a large area using public APIs, which usually impose a limit on the maximum query rate. In this paper, we propose an effective and efficient method to sample PoIs on maps, and give unbiased estimators to calculate PoI statistics such as sum and average aggregates. Experimental results based on real datasets show that our method is efficient and requires six times fewer queries than state-of-the-art methods to achieve the same accuracy.
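To see why uniform sampling with scaling yields an unbiased total, consider a simplified setting: a hypothetical grid of map cells whose PoI counts can each be fetched with one query, from which `n` of `N` cells are sampled uniformly without replacement. Scaling the sampled sum by N/n gives an unbiased estimate of the total number of PoIs, and the exact-expectation helper checks this by enumerating every possible sample. This is a textbook estimator, not the paper's sampling method, and all names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import combinations


def estimate_total(cell_counts, n, seed=0):
    """Sample n of N cells uniformly without replacement; scale the sum by N/n."""
    rng = random.Random(seed)
    big_n = len(cell_counts)
    sample = rng.sample(range(big_n), n)
    return big_n / n * sum(cell_counts[i] for i in sample)


def exact_expectation(cell_counts, n):
    """Average of the estimator over every possible sample (checks unbiasedness)."""
    big_n = len(cell_counts)
    samples = list(combinations(range(big_n), n))
    total = sum(Fraction(big_n, n) * sum(cell_counts[i] for i in s) for s in samples)
    return total / len(samples)
```

For example, with cell counts `[0, 3, 1, 7, 2]` and `n = 2`, averaging the estimate over all ten possible samples gives exactly the true total of 13, even though individual samples over- or undershoot.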

International Conference on Communications | 2008

Dynamic Features Measurement and Analysis for Large-Scale Networks

Tao Qin; Xiaohong Guan; Wei Li; Pinghui Wang

Detecting and measuring changes in temporal traffic patterns in large-scale networks is crucial for effective network management. This paper presents the concept of region flow to aggregate traffic packets. Regions are defined by IP prefix, and a region flow is a group of packets with the same source and destination region during a time interval. In this way, the number of flows can be reduced significantly and pivotal traffic metrics can be extracted more effectively. Three traffic features, namely source connection degree, destination connection degree, and packet distribution ratio, are proposed to capture the dynamic changes of the flow patterns between regions, and Rényi cross entropy is applied to measure and detect these changes. The experimental results show that the proposed method captures dynamic traffic features effectively for 10 Gbps backbone networks and can be used to detect abnormal network behaviors.
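The region flow aggregation can be sketched with Python's standard `ipaddress` module. Assumptions: IPv4 packets are given as (source, destination) address strings from one time interval, regions are /16 prefixes, and the source (destination) connection degree of a region is taken to be the number of distinct destination (source) regions it exchanges packets with. The paper's exact feature definitions may differ, and `region_flows` is a hypothetical name.

```python
import ipaddress
from collections import Counter, defaultdict


def region_flows(packets, prefix_len=16):
    """Aggregate (src, dst) IPv4 packets into region flows keyed by prefix."""
    def region(addr):
        return str(ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False))

    flows = Counter((region(s), region(d)) for s, d in packets)  # packets per region flow
    dst_regions = defaultdict(set)  # source region -> destination regions
    src_regions = defaultdict(set)  # destination region -> source regions
    for s, d in flows:
        dst_regions[s].add(d)
        src_regions[d].add(s)
    src_degree = {r: len(v) for r, v in dst_regions.items()}
    dst_degree = {r: len(v) for r, v in src_regions.items()}
    return flows, src_degree, dst_degree
```

Four packets spread across three /16 region pairs collapse into three region flows, which is the reduction in flow count the abstract describes.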

IEEE Transactions on Information Forensics and Security | 2010

Dynamic Feature Analysis and Measurement for Large-Scale Network Traffic Monitoring

Xiaohong Guan; Tao Qin; Wei Li; Pinghui Wang

Measuring and monitoring changes in network traffic patterns in large-scale networks is crucial for effective network management. In this paper, we present a framework and method for detecting and measuring the dynamic changes of pivotal traffic patterns. A bidirectional regional flow model is established to aggregate traffic packets and extract traffic metrics and profiles. The characteristics of the regional flows are analyzed and interesting findings are obtained. A directed graph model is applied to describe the flow metrics, and six flow features are extracted to capture the dynamic changes of the flow patterns. Measures based on Rényi entropy are developed to quantitatively monitor these changes. Experimental results based on actual network traffic traces show that the method presented in this paper can capture the dynamic changes of pivotal traffic patterns effectively.
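The Rényi entropy underlying such measures is straightforward to compute from a histogram of a flow feature. The sketch below uses the standard definition H_α(P) = log(Σᵢ pᵢ^α) / (1 − α) for α ≥ 0, α ≠ 1, with natural logarithms, falling back to Shannon entropy (the α → 1 limit) when α = 1. How the paper builds its histograms and compares intervals is not specified here, so `renyi_entropy` is a hypothetical helper and the comparison below is only illustrative.

```python
import math


def renyi_entropy(counts, alpha):
    """Rényi entropy (natural log) of the distribution given by histogram counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if alpha == 1:  # limiting case: Shannon entropy
        return -sum(p * math.log(p) for p in probs)
    return math.log(sum(p**alpha for p in probs)) / (1 - alpha)
```

A uniform histogram over four bins has entropy log 4 for every α, while a concentrated one such as `[7, 1, 1, 1]` scores lower, so a sudden entropy drop between consecutive intervals could signal that traffic has concentrated on a few regions.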
