Pili Hu
The Chinese University of Hong Kong
Publication
Featured research published by Pili Hu.
international conference on mobile systems, applications, and services | 2014
Chak Man Li; Pili Hu; Wing Cheong Lau
All printed documents and credentials are potentially subject to counterfeiting and forgery. Conventional anti-counterfeiting solutions such as watermarking or printing on special-quality paper are not cost-effective. Certification via authorized chops/stamps is low-cost but only provides a false sense of security/authenticity. While embedding a serial number in the document for online verification is low-cost and secure, it is not applicable without an Internet connection. We demonstrate AuthPaper (Authenticated Paper) to solve these problems by: 1) digitally signing the document to be protected; 2) putting the original content, the digital signature and optionally the signer's certificate in a self-describing encapsulation; 3) generating a 2D barcode (e.g. a QR code) to carry the encapsulation and embedding it as an integral part of the paper document. Note that an Authenticated QR Code carries 40 to 50 times more information than a typical one (≈50 bytes). The biggest technical challenge is to scan and decode such densely packed codes in a robust manner. We have developed an Android application to address this challenge. In short, AuthPaper provides a secure, low-cost and offline method for document authentication.
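A minimal sketch of the three-step pipeline described above, using the Python `cryptography` and `qrcode` packages. The length-prefixed JSON envelope layout here is an illustrative assumption; the paper's actual self-describing encapsulation format is not specified in the abstract.

```python
import base64
import json

import qrcode
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

# 1) Digitally sign the document content.
private_key = ec.generate_private_key(ec.SECP256R1())
content = b"Transcript of Records: ..."
signature = private_key.sign(content, ec.ECDSA(hashes.SHA256()))

# 2) Put content and signature into a self-describing encapsulation
#    (hypothetical JSON layout; a real deployment would also embed the
#    signer's certificate).
envelope = json.dumps({
    "content": base64.b64encode(content).decode(),
    "sig": base64.b64encode(signature).decode(),
})

# 3) Generate a QR code carrying the encapsulation, to be printed as an
#    integral part of the paper document.
img = qrcode.make(envelope)
img.save("authpaper_qr.png")
```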
conference on online social networks | 2014
Pili Hu; Ronghai Yang; Yue Li; Wing Cheong Lau
The OAuth 2.0 protocol has enjoyed wide adoption by Online Social Network (OSN) providers since its inception. Although the security guidelines of OAuth 2.0 are well discussed in RFC6749 and RFC6819, many real-world attacks due to the implementation specifics of OAuth 2.0 in various OSNs have been discovered. To our knowledge, previously discovered loopholes are all based on the misuse of OAuth, and many of them rely on provider-side or application-side vulnerabilities/faults beyond the scope of the OAuth protocol. It was generally believed that correct use of OAuth 2.0 is secure. In this paper, we show that OAuth 2.0 is intrinsically vulnerable to App impersonation attacks due to its provision of multiple authorization flows and token types. We start by reviewing and analyzing the OAuth 2.0 protocol and some common API design problems found in many first-tier OSNs. We then propose the App impersonation attack and investigate its impact on 12 major OSN providers. We demonstrate that App impersonation via OAuth 2.0, when combined with additional API design features/deficiencies, makes large-scale exploits and privacy leaks possible. For example, it becomes possible for an attacker to completely crawl a 200-million-user OSN within just one week and harvest data objects like the status list and friend list which its users expect to be private among friends. We also propose fixes that can be readily deployed to tackle the OAuth 2.0-based App impersonation problem.
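A hedged sketch of why App impersonation is possible: a bearer access token issued via, e.g., the implicit flow is not cryptographically bound to the app that requested it, so an API that accepts any valid token cannot tell an impersonator from the legitimate app. The endpoint URL and parameter names below are hypothetical, for illustration only.

```python
import requests

# A token originally issued to some other (legitimate) app, e.g. harvested
# by a malicious client during the implicit flow.
HARVESTED_TOKEN = "token_issued_to_some_other_app"

# The attacker simply replays the token; nothing in the request proves
# which app is calling, so the provider cannot distinguish the impersonator.
resp = requests.get(
    "https://osn.example.com/api/v1/friends",  # hypothetical OSN API
    params={"access_token": HARVESTED_TOKEN},
)
print(resp.status_code, resp.json())
```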
computer and communications security | 2016
Ronghai Yang; Guanchen Li; Wing Cheong Lau; Kehuan Zhang; Pili Hu
Motivated by the prevalence of OAuth-related vulnerabilities in the wild, large-scale security testing of real-world OAuth 2.0 implementations has received increasing attention lately [31,37,42]. However, existing works either rely on manual discovery of new vulnerabilities in OAuth 2.0 implementations or perform automated testing for specific, previously-known vulnerabilities across a large number of OAuth implementations. In this work, we propose an adaptive model-based testing framework to perform automated, large-scale security assessments of OAuth 2.0 implementations in practice. Key advantages of our approach include (1) its ability to identify existing vulnerabilities and discover new ones in an automated manner; (2) improved testing coverage, as all possible execution paths within the scope of the model are checked; and (3) its ability to cater for the implementation differences of practical OAuth systems/applications, which enables the analyst to offload the manual effort of large-scale testing of OAuth implementations. We have designed and implemented OAuthTester to realize our proposed framework. Using OAuthTester, we examine the implementations of 4 major Identity Providers as well as 500 top-ranked US and Chinese websites which use the OAuth-based Single-Sign-On service provided by the former. Our empirical findings demonstrate the efficacy of adaptive model-based testing on OAuth 2.0 deployments at scale. More importantly, OAuthTester not only rediscovers various existing vulnerabilities but also identifies several previously unknown security flaws and new exploits in a large number of real-world applications implementing OAuth 2.0.
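A minimal sketch of the model-based testing idea: represent the OAuth handshake as a labelled state machine and enumerate all execution paths up to a bounded depth, so each path can be replayed against a real implementation and its response checked against the model's expectation. The states and transitions below are a simplified, hypothetical model, not OAuthTester's actual one.

```python
from collections import deque

MODEL = {
    "start":        [("request_auth_code", "code_issued")],
    "code_issued":  [("exchange_code", "token_issued"),
                     ("replay_code", "error_expected")],
    "token_issued": [("call_api", "done")],
}
TERMINAL = {"done", "error_expected"}

def enumerate_paths(model, start="start", max_depth=6):
    """Breadth-first enumeration of all transition sequences in the model."""
    paths, queue = [], deque([(start, [])])
    while queue:
        state, path = queue.popleft()
        if state in TERMINAL or len(path) >= max_depth:
            paths.append(path)
            continue
        for action, nxt in model.get(state, []):
            queue.append((nxt, path + [action]))
    return paths

for p in enumerate_paths(MODEL):
    print(" -> ".join(p))  # each path becomes one concrete test case
```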
international conference on communications | 2015
Chak Man Li; Pili Hu; Wing Cheong Lau
All printed documents/credentials are potentially subject to counterfeiting and forgery. Conventional anti-counterfeiting solutions such as watermarking or printing on special-quality paper are not cost-effective. Certification via authorized chops/stamps is low-cost but only provides a false sense of security/authenticity. To tackle this problem, we propose AuthPaper: a cost-effective, secure solution for authenticating paper-based documents/credentials using off-the-shelf handheld devices such as smartphones and tablets. The key idea is to extend existing 2D barcodes, e.g. the QR code, to carry a large amount of self-describing and, most importantly, authenticated data of all types, including text, images and other binary data. By embedding the Authenticated 2D barcode as an integral part of a paper-based document, the authenticity of the document can be readily verified by comparing its content with the corresponding digitally-signed content contained in the Authenticated 2D barcode. No online network access or real-time communication with the document issuer is required during the document verification process. We have built a prototype using Android smartphones to prove that the proposed system is feasible. As shown by our measurements, the prototype can accurately create and decode 2D barcodes carrying different sets of document data, including images, while the increases in processing time and memory usage are negligible.
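A complementary sketch of the offline verification step: the scanner decodes the envelope from the barcode, verifies the signature with the issuer's public key, and only then trusts the printed content. The JSON envelope layout matches the hypothetical one used in the signing sketch above; no network access is needed, matching the offline property described in the abstract.

```python
import base64
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

def verify_envelope(envelope_json: str, issuer_public_key) -> bytes:
    """Return the authenticated content, or raise if the signature is bad."""
    envelope = json.loads(envelope_json)
    content = base64.b64decode(envelope["content"])
    signature = base64.b64decode(envelope["sig"])
    try:
        issuer_public_key.verify(signature, content, ec.ECDSA(hashes.SHA256()))
    except InvalidSignature:
        raise ValueError("document content does not match its signature")
    return content
```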
international conference on communications | 2016
Ruohan Gao; Huanle Xu; Pili Hu; Wing Cheong Lau
The seminal works by Karger [13], [14] have shown that one can use Uniform Random Edge (URE) sampling to generate a graph skeleton which accurately approximates all cut values in the original graph with high probability under some specific assumptions. As such, the random subgraphs resulting from URE sampling can often be used as substitutes for the original graphs in cut/flow-related graph-optimization problems [14]. In this paper, we extend the results of Karger to show that, besides the value (weight) of the cut-set, the weights of four additional types of edge-set, namely Volume, Association, Complement Volume and Complement Association, are all well preserved under URE sampling. More importantly, we show that these well-preserved edge-set metrics have a dominant impact on the outcome of common graph-mining tasks including PageRank computation and Community Detection. As a result, URE sampling can be used to accelerate the corresponding graph-mining algorithms with small approximation errors. Via extensive experiments with large-scale graphs in practice, we demonstrate that URE sampling can achieve over 90% accuracy for PageRank computation and Modularity-based Community Detection while sampling only 20% of the edges of the original graph.
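A short sketch of URE sampling and its use for approximate PageRank, using networkx. The retention probability p = 0.2 mirrors the "20% of the edges" setting reported above; the synthetic graph and top-k overlap metric are illustrative stand-ins for the paper's large-scale experiments.

```python
import random

import networkx as nx

def ure_sample(G: nx.Graph, p: float, seed: int = 0) -> nx.Graph:
    """Keep each edge independently with probability p (all nodes are kept)."""
    rng = random.Random(seed)
    H = nx.Graph()
    H.add_nodes_from(G.nodes())
    H.add_edges_from(e for e in G.edges() if rng.random() < p)
    return H

G = nx.barabasi_albert_graph(10_000, 5, seed=42)  # stand-in for a real graph
H = ure_sample(G, p=0.2)

pr_full = nx.pagerank(G)
pr_samp = nx.pagerank(H)

# Compare the top-ranked nodes as a rough accuracy proxy.
def top(pr, k=100):
    return {n for n, _ in sorted(pr.items(), key=lambda kv: -kv[1])[:k]}

print("top-100 overlap:", len(top(pr_full) & top(pr_samp)) / 100)
```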
international conference on communications | 2014
Xiang Zhong; Pili Hu; Wing Cheong Lau
Online Social Networks (OSNs) rely heavily on community detection algorithms to support many of their core services. Common functions such as friend recommendation and timeline personalization all require the fast discovery of communities over some massive graph(s). For such applications, scalability, flexibility and speed are much more important than marginal improvements in the theoretical quality of the results. While the community detection problem has been studied intensively in the past, existing work tends to emphasize theoretical optimality over the aforementioned practical needs. In this paper, we present a two-stage framework called Proximity-Based Cut and Merge (PBCM) for scalable and robust community detection. In the first stage, edges between low-proximity nodes are eliminated in one pass. In the second stage, high-proximity nodes are merged iteratively to produce results conforming to the intuitive notion of community. We explore the design space via extensive numerical evaluation to instantiate an effective community detection algorithm under the framework, and compare the performance of PBCM-based designs against state-of-the-art baselines. Our results show that the proposed PBCM framework is effective, scalable and robust. We also demonstrate the flexibility of PBCM by extending it to handle both overlapping and non-overlapping communities.
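A hedged sketch of the two-stage cut-and-merge idea using networkx. The proximity measure (Jaccard similarity of neighbourhoods), the cut threshold, and the simplified merge rule are illustrative assumptions; the paper explores this design space in detail rather than fixing one instantiation.

```python
import networkx as nx

def pbcm(G: nx.Graph, cut_threshold: float = 0.1):
    def jaccard(u, v):
        nu, nv = set(G[u]), set(G[v])
        return len(nu & nv) / len(nu | nv)

    # Stage 1: cut edges between low-proximity endpoints in one pass.
    H = G.copy()
    H.remove_edges_from(
        [(u, v) for u, v in G.edges() if jaccard(u, v) < cut_threshold]
    )

    # Stage 2: iterative merging, simplified here to absorbing each isolated
    # node into the neighbouring component of highest proximity.
    comps = [set(c) for c in nx.connected_components(H)]
    label = {n: i for i, c in enumerate(comps) for n in c}
    for n in list(G.nodes()):
        if len(comps[label[n]]) == 1 and G.degree(n) > 0:
            best = max(G[n], key=lambda v: jaccard(n, v))
            comps[label[best]].add(n)
            comps[label[n]].discard(n)
            label[n] = label[best]
    return [c for c in comps if c]
```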
international conference on network protocols | 2013
Pili Hu; Wing Cheong Lau
Decentralized Social Networks (DSNs) have attracted a lot of research and development interest in recent years. They are believed to be the solution to many problems of centralized services. Due to the data limitation imposed by common decentralized architectures, centralized algorithms that support social networking functions need to be re-designed. In this work, we tackle the problem of community detection for a given user under the constraint of limited local topology information. This naturally yields a classification formulation of community detection. As an initial study, we focus on a specific type of classifier: classification by thresholding against a proximity measure between nodes. We investigate four proximity measures: Common Neighbours (CN), Adamic/Adar score (AA), PageRank (PR) and Personalized PageRank (PPR). Using data collected from a large-scale Social Networking Service (SNS) in practice, we show that PPR can outperform the others given a few pre-known labels (37.5% to 64.97% relative improvement in terms of Area Under the ROC Curve). We further carry out an extensive numerical evaluation of PPR, showing that more pre-known labels linearly increase the capability of the single-feature classifier based on PPR. Users can thus seek a trade-off between labeling cost and classification accuracy.
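A minimal sketch of the thresholding classifier built on Personalized PageRank, using networkx. The seed set stands in for the few pre-known community labels; the threshold value and the small example graph are illustrative assumptions.

```python
import networkx as nx

def ppr_classify(G: nx.Graph, seeds: set, threshold: float) -> set:
    """Label a node as in-community when its PPR score exceeds the threshold."""
    personalization = {n: (1.0 if n in seeds else 0.0) for n in G.nodes()}
    ppr = nx.pagerank(G, alpha=0.85, personalization=personalization)
    return {n for n, score in ppr.items() if score >= threshold}

G = nx.karate_club_graph()  # small stand-in for real SNS data
community = ppr_classify(G, seeds={0, 1, 3}, threshold=0.02)
print(sorted(community))
```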
international conference on computer communications | 2015
Huanle Xu; Pili Hu; Wing Cheong Lau; Qiming Zhang; Yang Wu
Social Networking Services have become an essential part of our life today. However, many privacy concerns have recently been raised due to the centralized nature of such services. The Decentralized Social Network (DSN) is believed to be a viable solution to these problems. In this paper, we design a protocol to coordinate the pulling operation of DSN nodes. The protocol is the result of forward engineering via utility maximization that takes the communication-layer congestion level as well as social-network-layer centrality into consideration. We solve the pulling rate control problem using the primal-dual approach and prove that the protocol converges quickly when executed in a decentralized manner. Furthermore, we develop a novel "drumbeats" algorithm to estimate node centrality purely from passively observed information. Simulation results show that our protocol reduces the average message propagation delay by 15% compared to the baseline Fixed Equal Gap Pull protocol. In addition, the estimated node centrality matches well with the ground truth derived from the actual topology of the social network.
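A hedged sketch of the primal-dual flavour of rate control described above: each node adjusts its pulling rate along the gradient of its utility minus a congestion price, and the price rises when the shared link is overloaded. The log-utility, step sizes, single-link topology, and centrality weights are illustrative assumptions, not the paper's exact formulation.

```python
def primal_dual(weights, capacity, steps=5000, gamma=0.01):
    """weights[i] ~ social-layer centrality of node i; one shared link."""
    x = [0.1] * len(weights)  # pulling rates (primal variables)
    lam = 0.0                 # congestion price (dual variable)
    for _ in range(steps):
        # Primal step: d/dx (w_i * log x_i) - lambda = w_i / x_i - lambda.
        x = [max(1e-6, xi + gamma * (wi / xi - lam))
             for xi, wi in zip(x, weights)]
        # Dual step: the price grows when aggregate rate exceeds capacity.
        lam = max(0.0, lam + gamma * (sum(x) - capacity))
    return x, lam

# At equilibrium x_i = w_i / lam, so higher-centrality nodes pull faster.
rates, price = primal_dual(weights=[3.0, 2.0, 1.0], capacity=6.0)
print([round(r, 2) for r in rates], round(price, 3))
```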
global communications conference | 2014
Ruohan Gao; Pili Hu; Wing Cheong Lau
With the explosive growth of social network graphs, it becomes increasingly impractical to study the original large graph directly. By deriving a representative sample of the original graph, graph sampling provides an efficient solution for social network analysis. We expect such a sample to preserve some important graph properties and represent the original graph well. If an algorithm relies on the preserved properties, we can expect it to give similar output on the original graph and the sampled graph. This leads to a systematic way to accelerate a class of graph algorithms. Our work is based on the idea of stratified sampling [14], a widely used technique in statistics. We propose a heuristic approach to achieve efficient graph sampling based on the community structure of social networks. With the aid of the ground-truth communities available in social networks, we find that sampling from communities preserves community-related graph properties very well. The experimental results show that our framework improves the performance of traditional graph sampling algorithms and is therefore an effective method of graph sampling.
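A short sketch of the stratified, community-aware sampling heuristic: treat each ground-truth community as a stratum and sample the same fraction of nodes from every stratum, so community structure is represented proportionally in the sample. Using the induced subgraph to materialize the sampled graph is one simple choice assumed here for illustration.

```python
import random

import networkx as nx

def stratified_node_sample(G: nx.Graph, communities, fraction=0.2, seed=0):
    """communities: iterable of node sets (the strata)."""
    rng = random.Random(seed)
    kept = []
    for comm in communities:
        comm = list(comm)
        k = max(1, int(fraction * len(comm)))  # proportional allocation
        kept.extend(rng.sample(comm, k))
    return G.subgraph(kept).copy()
```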
global communications conference | 2014
Pili Hu; Wing Cheong Lau
Decentralized Social Networks (DSNs) have attracted a lot of research and development interest in recent years. They are believed to be the solution to many problems of centralized services. Due to the data limitation imposed by common decentralized architectures, centralized algorithms that support social networking functions need to be re-designed. In this work, we tackle the problem of community detection for a given user under the constraint of limited local topology information. This naturally yields a classification formulation of community detection. As an initial study, we focus on a specific type of classifier: classification by thresholding against a proximity measure between nodes. We investigate four proximity measures: Common Neighbours (CN), Adamic/Adar score (AA), PageRank (PR) and Personalized PageRank (PPR). Using data collected from a large-scale Online Social Network (OSN) in practice, we show that PPR can outperform the others given a few pre-known labels (37.5% to 64.97% relative improvement in terms of Area Under the ROC Curve). We further carry out an extensive numerical evaluation of PPR, showing that more pre-known labels linearly increase the capability of the single-feature classifier based on PPR. Users can thus seek a trade-off between labeling cost and classification accuracy.