
Huayu Wu

Agency for Science, Technology and Research


Publication

Featured research published by Huayu Wu.

very large data bases | 2016

A general and parallel platform for mining co-movement patterns over large-scale trajectories

Qi Fan; Dongxiang Zhang; Huayu Wu; Kian-Lee Tan

Discovering co-movement patterns from large-scale trajectory databases is an important mining task and has a wide spectrum of applications. Previous studies have identified several types of interesting co-movement patterns and showcased their usefulness. In this paper, we make two key contributions to this research field. First, we propose a more general co-movement pattern to unify those defined in past literature. Second, we propose two types of parallel and scalable frameworks and deploy them on Apache Spark. To the best of our knowledge, this is the first work to mine co-movement patterns in real-life trajectory databases with hundreds of millions of points. Experiments on three real-life large-scale trajectory datasets have verified the efficiency and scalability of our proposed solutions.

knowledge discovery and data mining | 2014

Identifying tourists from public transport commuters

Mingqiang Xue; Huayu Wu; Wei Chen; Wee Siong Ng; Gin Howe Goh

The tourism industry has become a key economic driver for Singapore. Understanding the behaviors of tourists is very important for the government and private sectors, e.g., restaurants, hotels and advertising companies, to improve their existing services or create new business opportunities. In this joint work with Singapore's Land Transport Authority (LTA), we apply machine learning techniques to identify the tourists among public commuters using the public transportation data provided by LTA. On successful identification, the travelling patterns of tourists are revealed, allowing further analyses to be carried out, such as on their favorite destinations, region of stay, etc. Technically, we model tourist identification as a classification problem, and design an iterative learning algorithm to perform inference with limited prior knowledge and labeled data. We show the superiority of our algorithm with performance evaluation and comparison with other state-of-the-art learning algorithms. Further, we build an interactive web-based system for answering queries regarding the moving patterns of the tourists, which can be used by stakeholders to gain insight into tourists' travelling behaviors in Singapore.

pacific-asia conference on knowledge discovery and data mining | 2015

Locating Self-Collection Points for Last-Mile Logistics Using Public Transport Data

Huayu Wu; Dongxu Shao; Wee Siong Ng

Delivery failure and re-scheduling delay services and increase operating costs for logistics companies. Setting up self-collection points is an effective solution that is attracting attention from many companies. One challenge for this model is how to choose the locations for self-collection points. In this work, we design a methodology for locating self-collection points. We consider both the distribution of a company’s potential customers and people’s gathering patterns in the city. We leverage citizens’ public transport riding records to simulate how crowds emerge at particular hours. We assume that, for self-collection of parcels, a place near a crowd is more convenient for customers than a place far away. Based on this, we propose a kernel transformation method to re-evaluate the pairwise positions of customers, and then cluster them.

mobile data management | 2014

HipStream: A Privacy-Preserving System for Managing Mobility Data Streams

Huayu Wu; Shili Xiang; Wee Siong Ng; Wei Wu; Mingqiang Xue

Personal mobile data are being extensively collected by various service providers, in the form of data streams. Most service providers promise their customers, through paper-based agreements, not to misuse their data. However, the customers have no way to know whether the agreements are strictly followed, unless a scandal of private data misuse is revealed. To guarantee the correct use of customers' personal data and assure them of the service safety, system-level data privacy control between the data owners (i.e., customers) and the data users (i.e., service providers) is urgently needed. Inspired by the concept of Hippocratic data management, we design and implement a system, HipStream, to systematically enforce different Hippocratic principles to preserve data providers' privacy when they send their data streams for services. In this paper, we describe the architecture of the HipStream system and demonstrate how it meets those privacy principles.

database systems for advanced applications | 2014

Conditioning Probabilistic Relational Data with Referential Constraints

Ruiming Tang; Dongxu Shao; M. Lamine Ba; Huayu Wu

A probabilistic relational database is a compact form of a set of deterministic relational databases (namely, possible worlds), each of which has a probability. In our framework, the existence of tuples is determined by associated Boolean formulae based on elementary events. An estimation, within such a setting, of the probabilities of possible worlds uses a prior probability distribution specified over the elementary events. Direct observations and general knowledge, in the form of constraints, help refine these probabilities, possibly ruling out some possible worlds. More precisely, new constraints can translate the observation of the existence or non-existence of a tuple, or the knowledge of a well-defined rule, such as a primary key constraint, foreign key constraint, referential constraint, etc. Informally, the process of enforcing knowledge on a probabilistic database, which consists of computing a new subset of valid possible worlds together with their new (conditional) probabilities, is called conditioning. In this paper, we are interested in finding a new probabilistic relational database after conditioning with referential constraints involved. In the most general case, conditioning is intractable. As a result, we restrict our study to probabilistic relational databases in which formulae of tuples are independent events in order to achieve some tractability results. We devise and present polynomial algorithms for conditioning probabilistic relational databases with referential constraints.
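The notion of conditioning described in this abstract can be illustrated with a small sketch. It is not the paper's polynomial algorithm: it enumerates every possible world by brute force, which is only feasible for toy inputs. The tables, tuple names and probabilities below are hypothetical, and each tuple is assumed to exist independently, matching the restricted setting the abstract mentions.

```python
from itertools import product

# Hypothetical toy data: every tuple exists independently with some probability.
customers = {"c1": 0.9, "c2": 0.5}                # customer id -> P(tuple exists)
orders = {"o1": ("c1", 0.8), "o2": ("c2", 0.6)}   # order id -> (customer id, P(exists))


def possible_worlds():
    """Yield (world, prior probability) for every subset of tuples."""
    names = list(customers) + list(orders)
    probs = [customers[c] for c in customers] + [orders[o][1] for o in orders]
    for bits in product([False, True], repeat=len(names)):
        p = 1.0
        for present, q in zip(bits, probs):
            p *= q if present else 1.0 - q
        yield {n for n, present in zip(names, bits) if present}, p


def satisfies_fk(world):
    """Referential constraint: each present order refers to a present customer."""
    return all(orders[t][0] in world for t in world if t in orders)


def condition():
    """Drop worlds that violate the constraint and renormalise the rest."""
    valid = [(w, p) for w, p in possible_worlds() if satisfies_fk(w)]
    z = sum(p for _, p in valid)  # prior probability that the constraint holds
    return [(w, p / z) for w, p in valid], z


if __name__ == "__main__":
    worlds, z = condition()
    for world, p in sorted(worlds, key=lambda wp: -wp[1]):
        print(sorted(world), round(p, 4))
```

With these toy numbers, 9 of the 16 possible worlds satisfy the constraint, and the prior probability of the constraint holding is 0.92 × 0.7 = 0.644. The surviving worlds' probabilities are divided by that value so that they sum to 1.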

international conference on parallel and distributed systems | 2012

Privacy Preservation in Streaming Data Collection

Wee Siong Ng; Huayu Wu; Wei Wu; Shili Xiang; Kian-Lee Tan

Big data management and analysis have become a hot topic in academic and industrial research. In fact, a large portion of the big data in use today originates as streaming data. To preserve the privacy of such data that are collected from data streams, the most efficient way is to control the process of data collection according to corresponding privacy policies. In this paper, we design a framework to support data stream management with privacy-preserving capabilities. In particular, we focus on two key principles of data privacy, limited disclosure and limited collection. With these two principles guaranteed, the archived data need not be checked for privacy protection before analysis and other operations are carried out.

database systems for advanced applications | 2014

A*DAX: A Platform for Cross-Domain Data Linking, Sharing and Analytics

Narayanan Amudha; Gim Guan Chua; Eric Siew Khuan Foo; Shen Tat Goh; Shuqiao Guo; Paul Min Chim Lim; Mun-Thye Mak; Muhammad Cassim Munshi; See-Kiong Ng; Wee Siong Ng; Huayu Wu

We introduce the A*STAR Data Analytics and Exchange Platform (“A*DAX”), which is the backbone data platform for different programs and projects under the Urban Systems Initiative launched by the Agency for Science, Technology and Research in Singapore. A*DAX aims to provide a centralized system for public and private sectors to manage and share data; it also provides basic data analytics and visualization functions for authorized parties to consume data. A*DAX is also a channel for developers to develop innovative applications based on real data to improve urban services.

international conference on data engineering | 2017

From Raw Footprints to Personal Interests: Bridging the Semantic Gap via Trip Intention Aggregation

Long Guo; Dongxiang Zhang; Huayu Wu; Bin Cui; Kian-Lee Tan

User-generated trajectories (UGT), such as GPS footprints from wearable devices or travel records from bus companies, capture rich information about human mobility and urban dynamics in the offline world. In this paper, our objective is to enrich these raw footprints and discover the users' personal interests by utilizing the semantic information contained in the spatial- and temporal-aware user-generated contents (STUGC) published in the online world. We design a novel probabilistic framework named CO2 to connect the offline world with the online world in order to discover the users' interests directly from their raw footprints in UGT. In particular, we first propose a latent probabilistic generative model named STLDA to infer the intention attached to each trip, and then aggregate the extracted trip intentions to discover the users' personal interests. To tackle the inherent sparsity and noisiness problems of the tags in STUGC, STLDA considers the inner correlation between tags (i.e., semantic, spatial and temporal correlation) at the topic level. To evaluate the effectiveness of CO2, we utilize a dataset containing three months of data with 5.3 billion bus records and a Twitter dataset with 1.5 million tweets published in 6 months in Singapore as a case study. Experimental results on these two real-world datasets show that CO2 is effective in discovering user interests and improves the precision of the state-of-the-art method by 280%. In addition, we also conduct a questionnaire survey in Singapore to evaluate the effectiveness of CO2. The results further validate the superiority of CO2.

international conference on data engineering | 2016

Fuzzy trajectory linking

Huayu Wu; Mingqiang Xue; Jianneng Cao; Panagiotis Karras; Wee Siong Ng; Kee Kiat Koo

Today, people can access various services with smart carry-on devices, e.g., surf the web with smart phones, make payments with credit cards, or ride a bus with commuting cards. In addition to the offered convenience, the access of such services can reveal their traveled trajectory to service providers. Very often, a user who has signed up for multiple services may expose her trajectory to more than one service provider. This state of affairs raises a privacy concern, but also an opportunity. On the one hand, several colluding service providers, or a government agency that collects information from such service providers, may identify and reconstruct users' trajectories to an extent that can be threatening to personal privacy. On the other hand, the processing of such rich data may allow for the development of better services for the common good. In this paper, we take a neutral standpoint and investigate the potential for trajectories accumulated from different sources to be linked so as to reconstruct a larger trajectory of a single person. We develop a methodology, called fuzzy trajectory linking (FTL), that achieves this goal, and two instantiations thereof, one based on hypothesis testing and one on Naïve-Bayes. We provide a theoretical analysis of the factors that affect FTL and use two real datasets to demonstrate that our algorithms effectively achieve their goals.

OTM Confederated International Conferences "On the Move to Meaningful Internet Systems" | 2016

Detecting Communities of Commuters: Graph Based Techniques Versus Generative Models

Ashish Dandekar; Stéphane Bressan; Talel Abdessalem; Huayu Wu; Wee Siong Ng

The main stage for a new generation of cooperative information systems is smart communities such as smart cities and smart nations. In the smart city context in which we position our work, urban planning, development and management authorities and stakeholders need to understand and take into account the mobility patterns of urban dwellers in order to manage the sociological, economic and environmental issues created by the continuing growth of cities and urban population. In this paper, we address the issue of the detection of communities of commuters, which is one of the crucial aspects of smart community analysis.
