Sibo Wang

Nanyang Technological University

Publication

Featured research published by Sibo Wang.

international conference on management of data | 2015

Crowd-Based Deduplication: An Adaptive Approach

Sibo Wang; Xiaokui Xiao; Chun-Hee Lee

Data deduplication stands as a building block for data integration and data cleaning. The state-of-the-art techniques focus on how to exploit crowdsourcing to improve the accuracy of deduplication. However, they either incur significant overheads on the crowd or offer inferior accuracy. This paper presents ACD, a new crowd-based algorithm for data deduplication. The basic idea of ACD is to adopt correlation clustering (which is a classic machine-based algorithm for data deduplication) under a crowd-based setting. We propose non-trivial techniques to reduce the time required in performing correlation clustering with the crowd, and devise methods to postprocess the results of correlation clustering for better accuracy of deduplication. With extensive experiments on Amazon Mechanical Turk, we demonstrate that ACD outperforms the state of the art by offering a high precision of deduplication while incurring moderate crowdsourcing overheads.
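As an illustration of correlation clustering driven by pairwise "same entity?" answers, here is a minimal Python sketch of the classic randomized pivot scheme. It is not ACD itself: the abstract does not say which clustering procedure ACD uses. The function `pivot_clustering` and the `is_duplicate` callback are hypothetical names, with the callback standing in for an aggregated crowd answer.

```python
import random
from typing import Callable, List, Optional, Sequence


def pivot_clustering(
    records: Sequence[str],
    is_duplicate: Callable[[str, str], bool],
    rng: Optional[random.Random] = None,
) -> List[List[str]]:
    """Randomized pivot correlation clustering (illustrative, not ACD).

    `is_duplicate(a, b)` is an assumed oracle; in a crowd-based setting
    each call would correspond to a question posted to workers.
    """
    rng = rng or random.Random(0)
    remaining = list(records)
    clusters: List[List[str]] = []
    while remaining:
        # Pick a random pivot and remove it from the pool.
        pivot = remaining.pop(rng.randrange(len(remaining)))
        cluster, rest = [pivot], []
        # One oracle call per remaining record: join the pivot or stay.
        for record in remaining:
            (cluster if is_duplicate(pivot, record) else rest).append(record)
        clusters.append(cluster)
        remaining = rest
    return clusters


def same_company(a: str, b: str) -> bool:
    # Toy stand-in for crowd answers: compare normalized names.
    def norm(s: str) -> str:
        return s.lower().replace(" inc", "").strip()
    return norm(a) == norm(b)


names = ["IBM", "ibm", "Apple", "apple inc", "Apple Inc", "Google"]
clusters = pivot_clustering(names, same_company)
```

Each pivot round costs one oracle call per remaining record, which hints at why reducing the number and latency of crowd questions matters in this setting.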

international conference on management of data | 2014

Reachability queries on large dynamic graphs: a total order approach

Andy Diwen Zhu; Wenqing Lin; Sibo Wang; Xiaokui Xiao

Reachability queries are a fundamental type of queries on graphs that find important applications in numerous domains. Although a plethora of techniques have been proposed for reachability queries, most of them require that the input graph is static, i.e., they are inapplicable to the dynamic graphs (e.g., social networks and the Semantic Web) commonly encountered in practice. There exist a few techniques that can handle dynamic graphs, but none of them can scale to sizable graphs without significant loss of efficiency. To address this deficiency, this paper presents a novel study on reachability indices for large dynamic graphs. We first introduce a general indexing framework that summarizes a family of reachability indices with the best performance among the existing techniques for static graphs. Then, we propose general and efficient algorithms for handling vertex insertions and deletions under the proposed framework. In addition, we show that our update algorithms can be used to improve the existing reachability techniques on static graphs, and we also propose a new approach for constructing a reachability index from scratch under our framework. We experimentally evaluate our solution on a large set of benchmark datasets, and we demonstrate that our solution not only supports efficient updates on dynamic graphs, but also provides even better query performance than the state-of-the-art techniques for static graphs.
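To make the problem setting concrete, the following Python sketch shows an index-free baseline: a directed graph that supports vertex insertions and deletions and answers reachability queries by breadth-first search. It is not the total order index from the paper, and the class and method names (`DynamicDigraph`, `add_vertex`, `remove_vertex`, `reachable`) are hypothetical. Updates are trivial here, but every query may traverse the whole graph, which is the cost that reachability indices aim to avoid.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set


class DynamicDigraph:
    """Index-free baseline: cheap updates, BFS per query (illustrative)."""

    def __init__(self) -> None:
        self.out: Dict[Hashable, Set[Hashable]] = {}
        self.inc: Dict[Hashable, Set[Hashable]] = {}

    def _add_edge(self, u: Hashable, v: Hashable) -> None:
        for x in (u, v):
            self.out.setdefault(x, set())
            self.inc.setdefault(x, set())
        self.out[u].add(v)
        self.inc[v].add(u)

    def add_vertex(self, v: Hashable,
                   in_neighbors: Iterable[Hashable] = (),
                   out_neighbors: Iterable[Hashable] = ()) -> None:
        # Insert v together with its incident edges.
        self.out.setdefault(v, set())
        self.inc.setdefault(v, set())
        for u in in_neighbors:
            self._add_edge(u, v)
        for w in out_neighbors:
            self._add_edge(v, w)

    def remove_vertex(self, v: Hashable) -> None:
        # Delete v and every edge that touches it.
        for w in self.out.pop(v, set()):
            self.inc[w].discard(v)
        for u in self.inc.pop(v, set()):
            self.out[u].discard(v)

    def reachable(self, s: Hashable, t: Hashable) -> bool:
        if s not in self.out or t not in self.out:
            return False
        if s == t:
            return True
        seen, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            for w in self.out[u]:
                if w == t:
                    return True
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        return False
```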

international conference on management of data | 2015

Efficient Route Planning on Public Transportation Networks: A Labelling Approach

Sibo Wang; Wenqing Lin; Yi Yang; Xiaokui Xiao; Shuigeng Zhou

A public transportation network can often be modeled as a timetable graph where (i) each node represents a station; and (ii) each directed edge (u,v) is associated with a timetable that records the departure (resp. arrival) time of each vehicle at station u (resp. v). Several techniques have been proposed for various types of route planning on timetable graphs, e.g., retrieving the route from a node to another with the shortest travel time. These techniques, however, either provide insufficient query efficiency or incur significant space overheads. This paper presents Timetable Labelling (TTL), an efficient indexing technique for route planning on timetable graphs. The basic idea of TTL is to associate each node u with a set of labels, each of which records the shortest travel time from u to some other node v given a certain departure time from u; such labels would then be used during query processing to improve efficiency. In addition, we propose query algorithms that enable TTL to support three popular types of route planning queries, and investigate how to reduce the space consumption of TTL with advanced preprocessing and label compression methods. By conducting an extensive set of experiments on real-world datasets, we demonstrate that TTL significantly outperforms the state of the art in terms of query efficiency, while incurring moderate preprocessing and space overheads.

knowledge discovery and data mining | 2013

Efficient single-source shortest path and distance queries on large graphs

Andy Diwen Zhu; Xiaokui Xiao; Sibo Wang; Wenqing Lin

This paper investigates two types of graph queries: single source distance (SSD) queries and single source shortest path (SSSP) queries. Given a node v in a graph G, an SSD query from v asks for the distance from v to any other node in G, while an SSSP query retrieves the shortest path from v to any other node. These two types of queries find important applications in graph analysis, especially in the computation of graph measures. Most of the existing solutions for SSD and SSSP queries, however, require that the input graph fits in the main memory, which renders them inapplicable for the massive disk-resident graphs commonly used in web and social applications. There are several techniques that are designed to be I/O efficient, but they all focus on undirected and/or unweighted graphs, and they only offer sub-optimal query efficiency. To address the deficiency of existing work, this paper presents Highways-on-Disk (HoD), a disk-based index that supports both SSD and SSSP queries on directed and weighted graphs. The key idea of HoD is to augment the input graph with a set of auxiliary edges, and exploit them during query processing to reduce I/O and computation costs. We experimentally evaluate HoD on both directed and undirected real-world graphs with up to billions of nodes and edges, and we demonstrate that HoD significantly outperforms alternative solutions in terms of query efficiency.

very large data bases | 2017

Revisiting the stop-and-stare algorithms for influence maximization

Keke Huang; Sibo Wang; Glenn S. Bevilacqua; Xiaokui Xiao; Laks V. S. Lakshmanan

Influence maximization is a combinatorial optimization problem that finds important applications in viral marketing, feed recommendation, etc. Recent research has led to a number of scalable approximation algorithms for influence maximization, such as TIM+ and IMM, and more recently, SSA and D-SSA. The goal of this paper is to conduct a rigorous theoretical and experimental analysis of SSA and D-SSA and compare them against the preceding algorithms. In doing so, we uncover inaccuracies in previously reported technical results on the accuracy and efficiency of SSA and D-SSA, which we set right. We also attempt to reproduce the original experiments on SSA and D-SSA, based on which we provide interesting empirical insights. Our evaluation confirms some results reported from the original experiments, but it also reveals anomalies in some other results and sheds light on the behavior of SSA and D-SSA in some important settings not considered previously. We also report on the performance of SSA-Fix, our modification to SSA in order to restore the approximation guarantee that was claimed for but not enjoyed by SSA. Overall, our study suggests that there exist opportunities for further scaling up influence maximization with approximation guarantees.

very large data bases | 2016

HubPPR: effective indexing for approximate personalized pagerank

Sibo Wang; Youze Tang; Xiaokui Xiao; Yin Yang; Zengxiang Li

knowledge discovery and data mining | 2017

FORA: Simple and Effective Approximate Single-Source Personalized PageRank

Sibo Wang; Renchi Yang; Xiaokui Xiao; Zhewei Wei; Yin Yang

very large data bases | 2018

Go slow to go fast: minimal on-road time route scheduling with parking facilities using historical trajectory

Lei Li; Kai Zheng; Sibo Wang; Wen Hua; Xiaofang Zhou

ACM Transactions on Storage | 2018

Persisting RB-Tree into NVM in a Consistency Perspective

Chundong Wang; Qingsong Wei; Lingkun Wu; Sibo Wang; Cheng Chen; Xiaokui Xiao; Jun Yang; Mingdi Xue; Yechao Yang

very large data bases | 2016

Effective indexing for approximate constrained shortest path queries on large road networks

Sibo Wang; Xiaokui Xiao; Yin Yang; Wenqing Lin

In a constrained shortest path (CSP) query, each edge in the road network is associated with both a length and a cost. Given an origin s, a destination t, and a cost constraint θ, the goal is to find the shortest path from s to t whose total cost does not exceed θ. Because exact CSP is NP-hard, previous work mostly focuses on approximate solutions. Even so, existing methods are still prohibitively expensive for large road networks. Two main reasons are (i) that they fail to utilize the special properties of road networks and (ii) that most of them process queries without indices; the few existing indices consume large amounts of memory and yet have limited effectiveness in reducing query costs. Motivated by this, we propose COLA, the first practical solution for approximate CSP processing on large road networks. COLA exploits the facts that a road network can be effectively partitioned, and that there exists a relatively small set of landmark vertices that commonly appear in CSP results. Accordingly, COLA indexes the vertices lying on partition boundaries, and applies an on-the-fly algorithm called α-Dijk for path computation within a partition, which effectively prunes paths based on landmarks. Extensive experiments demonstrate that on continent-sized road networks, COLA answers an approximate CSP query in sub-second time, whereas existing methods take hours. Interestingly, even without an index, the α-Dijk algorithm in COLA still outperforms previous solutions by more than an order of magnitude.
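The CSP query defined in the abstract above can be illustrated with a small exact solver. The Python sketch below is a plain label-setting search with dominance pruning. It is neither COLA nor α-Dijk, it can take exponential time on large inputs, and it assumes non-negative lengths and costs, a non-negative θ, and mutually comparable node identifiers such as strings. The function name `constrained_shortest_path` and the adjacency format are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Assumed adjacency format: node -> list of (neighbor, length, cost).
Graph = Dict[str, List[Tuple[str, float, float]]]


def constrained_shortest_path(
    graph: Graph, s: str, t: str, theta: float
) -> Optional[Tuple[float, float, List[str]]]:
    """Exact CSP on a small graph; returns (length, cost, path) or None."""
    heap = [(0, 0, s, (s,))]  # labels ordered by length, then cost
    settled: Dict[str, List[Tuple[float, float]]] = {}
    while heap:
        length, cost, u, path = heapq.heappop(heap)
        if u == t:
            # Labels pop in increasing length and all respect theta,
            # so the first label that reaches t is optimal.
            return length, cost, list(path)
        # Skip labels dominated by an already settled label at u.
        if any(l <= length and c <= cost for l, c in settled.get(u, [])):
            continue
        settled.setdefault(u, []).append((length, cost))
        for v, edge_len, edge_cost in graph.get(u, []):
            new_cost = cost + edge_cost
            if new_cost <= theta:  # enforce the cost constraint
                heapq.heappush(
                    heap, (length + edge_len, new_cost, v, path + (v,))
                )
    return None


roads: Graph = {
    "s": [("a", 1, 10), ("b", 2, 1), ("t", 10, 0)],
    "a": [("t", 1, 10)],
    "b": [("t", 3, 1)],
}
loose = constrained_shortest_path(roads, "s", "t", theta=20)
tight = constrained_shortest_path(roads, "s", "t", theta=5)
```

With a loose budget the short but expensive route through `a` is returned, while a tighter budget forces the longer, cheaper route through `b`, which shows how θ changes the answer.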