Wendy Hui Wang | Researchain

Publication

Featured research published by Wendy Hui Wang.

extending database technology | 2009

Anonymizing moving objects: how to hide a MOB in a crowd?

Roman Yarovoy; Francesco Bonchi; Laks V. S. Lakshmanan; Wendy Hui Wang

Moving object databases (MOD) have gained much interest in recent years due to the advances in mobile communications and positioning technologies. Study of MOD can reveal useful information (e.g., traffic patterns and congestion trends) that can be used in applications for the common benefit. In order to mine and/or analyze the data, MOD must be published, which can pose a threat to the location privacy of a user. Indeed, based on prior knowledge of a user's location at several time points, an attacker can potentially associate that user with a specific moving object (MOB) in the published database and learn her position information at other time points. In this paper, we study the problem of privacy-preserving publishing of moving object databases. Unlike in microdata, we argue that in MOD, there does not exist a fixed set of quasi-identifier (QID) attributes for all the MOBs. Consequently, the anonymization groups of MOBs (i.e., the sets of other MOBs within which to hide) may not be disjoint. Thus, there may exist MOBs that can be identified explicitly by combining different anonymization groups. We illustrate the pitfalls of simple adaptations of classical k-anonymity and develop a notion which we prove is robust against privacy attacks. We propose two approaches, namely extreme-union and symmetric anonymization, to build anonymization groups that provably satisfy our proposed k-anonymity requirement, as well as yield low information loss. We ran an extensive set of experiments on large real-world and synthetic datasets of vehicular traffic. Our results demonstrate the effectiveness of our approach.

conference on information and knowledge management | 2009

Answering XML queries using materialized views revisited

Xiaoying Wu; Dimitri Theodoratos; Wendy Hui Wang

Answering queries using views is a well-established technique in databases. In this context, two outstanding problems can be formulated. The first one consists in deciding whether a query can be answered exclusively using one or multiple materialized views. Given the many alternative ways to compute the query from the materialized views, the second problem consists in finding the best way to compute the query from the materialized views. In the realm of XML, there are few contributions in the direction of these problems due to the many limitations associated with the use of materialized views in traditional XML query evaluation models. In this paper, we adopt a recent evaluation model, called the inverted lists model, and holistic algorithms, which together have been established as the prominent technique for evaluating queries on large persistent XML data, and we address the previous two problems. This new context revises these problems since it requires new conditions for view usability and new techniques for computing queries from materialized views. We suggest an original approach for materializing views which stores for every view node only the list of XML nodes necessary for computing the answer of the view. We specify necessary and sufficient conditions for answering a tree-pattern query using one or multiple materialized views in terms of homomorphisms from the views to the query. In order to efficiently answer queries using materialized views, we design a stack-based algorithm which compactly encodes in polynomial time and space all the homomorphisms from a view to a query. We further propose space and time optimizations by using bitmaps to encode view materializations and by employing bitwise operations to minimize the evaluation cost of the queries. Finally, we conducted extensive experiments which demonstrate that our approach yields impressive query hit rates in the view pool, achieves significant time and space savings and shows smooth scalability.

Information Systems | 2013

Optimizing XML queries: Bitmapped materialized views vs. indexes

Xiaoying Wu; Dimitri Theodoratos; Wendy Hui Wang; Timos K. Sellis

Optimizing queries using materialized views has not been addressed adequately in the context of XML due to the many limitations associated with the definition and usability of materialized views in traditional XML query evaluation models. In this paper, we address the XML query optimization problem using materialized views in the framework of the inverted lists evaluation model which has been established as the most prominent one for evaluating queries on large persistent XML data. Under this framework, we propose a novel approach which instead of materializing the answer of a view materializes exactly the sublists of the inverted lists that are necessary for computing the answer of the view. A further originality of our approach is that the view materializations are stored as compressed bitmaps. This technique not only minimizes the materialization space but also reduces CPU and I/O costs by translating view materialization processing into bitwise operations. Our approach departs from the traditional approach which identifies a compensating expression that rewrites the query using the materialized views. Instead, it computes the query answer by executing holistic stack-based algorithms on the view materializations. We experimentally compared our approach with recent outstanding structural summary and B-tree based approaches. In order to make the comparison more competitive we also proposed an extension of a structural index approach to resolve combinatorial explosion problems. Our experimental results show that our compressed bitmapped materialized views approach is the most efficient, robust, and stable one for optimizing XML queries. It obtains significant performance savings at a very small space overhead and has negligible optimization time even for a large number of materialized views in the view pool.
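As a rough illustration of the bitwise idea described in the abstract above (not the authors' algorithm), the sketch below represents two view materializations as bitmaps over the positions of a single inverted list and intersects them with a bitwise AND. The toy inverted list, the helper names `to_bitmap` and `from_bitmap`, and the use of plain Python integers as uncompressed bitmaps are all assumptions made for illustration.

```python
# Hypothetical inverted list for one XML element label: node ids in document order.
inverted_list = [3, 8, 15, 21, 34, 55]

def to_bitmap(selected, full_list):
    """Encode a sublist of full_list as an int: bit i is set if full_list[i] is selected."""
    chosen = set(selected)
    bitmap = 0
    for i, node in enumerate(full_list):
        if node in chosen:
            bitmap |= 1 << i
    return bitmap

def from_bitmap(bitmap, full_list):
    """Decode a bitmap back into the sublist of full_list that it marks."""
    return [node for i, node in enumerate(full_list) if (bitmap >> i) & 1]

# Two hypothetical view materializations over the same inverted list.
view_a = to_bitmap([3, 15, 21, 55], inverted_list)
view_b = to_bitmap([8, 15, 55], inverted_list)

# Nodes present in both materializations, found with one bitwise AND.
common = from_bitmap(view_a & view_b, inverted_list)
print(common)
```

The point of the sketch is only that set operations on sublists of an inverted list reduce to word-level bit operations once positions are fixed; the compression of the bitmaps mentioned in the abstract is not modeled here.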

geographic information science | 2013

Privacy-Preserving Distributed Movement Data Aggregation

Anna Monreale; Wendy Hui Wang; Francesca Pratesi; Salvatore Rinzivillo; Dino Pedreschi; Gennady L. Andrienko; Natalia V. Andrienko

We propose a novel approach to privacy-preserving analytical processing within a distributed setting, and tackle the problem of obtaining aggregated information about vehicle traffic in a city from movement data collected by individual vehicles and shipped to a central server. Movement data are sensitive because people’s whereabouts have the potential to reveal intimate personal traits, such as religious or sexual preferences, and may allow re-identification of individuals in a database. We provide a privacy-preserving framework for movement data aggregation based on trajectory generalization in a distributed environment. The proposed solution, based on the differential privacy model and on sketching techniques for efficient data compression, provides a formal data protection safeguard. Using real-life data, we demonstrate the effectiveness of our approach also in terms of data utility preserved by the data transformation.

conference on information and knowledge management | 2014

PraDa: Privacy-preserving Data-Deduplication-as-a-Service

Boxiang Dong; Ruilin Liu; Wendy Hui Wang

The data-cleaning-as-a-service (DCaS) paradigm enables users to outsource their data and data cleaning needs to computationally powerful third-party service providers. It raises several security issues, one of which is how the client can protect the private information in the outsourced data. In this paper, we focus on data deduplication as the main data cleaning task, and design two efficient privacy-preserving data-deduplication methods for the DCaS paradigm. We analyze the robustness of our two methods against the attacks that exploit the auxiliary frequency distribution and the knowledge of the encoding algorithms. Our empirical study demonstrates the efficiency and effectiveness of our privacy-preserving approaches.

international conference on data mining | 2013

Integrity Verification of Outsourced Frequent Itemset Mining with Deterministic Guarantee

Boxiang Dong; Ruilin Liu; Wendy Hui Wang

In this paper, we focus on the problem of result integrity verification for outsourcing of frequent itemset mining. We design efficient cryptographic approaches that verify whether the returned frequent itemset mining results are correct and complete with a deterministic guarantee. The key to our solution is that the service provider constructs cryptographic proofs of the mining results. Both correctness and completeness of the mining results are measured against the proofs. We optimize the verification by minimizing the number of proofs. Our empirical study demonstrates the efficiency and effectiveness of the verification approaches.

information reuse and integration | 2016

ARM: Authenticated Approximate Record Matching for Outsourced Databases

Boxiang Dong; Wendy Hui Wang

In this paper, we consider the outsourcing model in which a third-party server provides data integration as a service. Identifying approximately duplicate records in databases is an essential step for the information integration processes. Most existing approaches rely on estimating the similarity of potential duplicates. The service provider returns all records from the outsourced dataset that are similar according to specific distance metrics. A major security concern of this outsourcing paradigm is whether the service provider returns sound and complete near-duplicates. In this paper, we design ARM, an authentication system for outsourced record matching. The key idea of ARM is that besides the similar record pairs, the server returns the verification object (VO) of these similar pairs to prove their correctness. First, we design an authenticated data structure named MB-tree for VO construction. Second, we design a lightweight authentication method that can catch the service provider's various cheating behaviors by utilizing VOs. We perform an extensive set of experiments on real-world datasets to demonstrate that ARM can verify the record matching results at low cost.

international conference on data engineering | 2017

Frequency-Hiding Dependency-Preserving Encryption for Outsourced Databases

Boxiang Dong; Wendy Hui Wang

The cloud paradigm enables users to outsource their data to computationally powerful third-party service providers for data management. Many data management tasks rely on the data dependency in the outsourced data. This raises an important issue of how the data owner can protect the sensitive information in the outsourced data while preserving the data dependency. In this paper, we consider functional dependency (FD), an important type of data dependency. Although simple deterministic encryption schemes can preserve FDs, they may be vulnerable to the frequency analysis attack. We design a frequency-hiding, FD-preserving probabilistic encryption scheme, named F2, that enables the service provider to discover the FDs from the encrypted dataset. We consider two attacks, namely the frequency analysis (FA) attack and the FD-preserving chosen plaintext attack (FCPA), and show that the F2 encryption scheme can defend against both attacks with a formal, provable guarantee. Our empirical study demonstrates the efficiency and effectiveness of F2, as well as its security against both FA and FCPA attacks.
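The trade-off stated in the abstract above, that a deterministic scheme preserves FDs but leaks value frequencies, can be seen in a small sketch. This is not the F2 scheme. It uses a keyed HMAC as a stand-in for deterministic encryption (it maps equal plaintexts to equal tokens but is not decryptable), and the key, the toy rows, and the helper names `det_token` and `fd_holds` are all hypothetical.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"demo-key"  # hypothetical key, for illustration only

def det_token(value):
    """Deterministic stand-in for encryption: equal inputs give equal tokens."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()[:12]

def fd_holds(rows):
    """Return True if the first column functionally determines the second."""
    seen = {}
    for left, right in rows:
        if seen.setdefault(left, right) != right:
            return False
    return True

# Toy two-column table in which column 1 determines column 2.
rows = [("z1", "c1"), ("z1", "c1"), ("z2", "c2"), ("z1", "c1")]
enc_rows = [(det_token(z), det_token(c)) for z, c in rows]

# The FD survives tokenization because equal values map to equal tokens.
print(fd_holds(rows), fd_holds(enc_rows))

# The same property leaks frequencies: the count profile is unchanged.
plain_freq = sorted(Counter(z for z, _ in rows).values())
enc_freq = sorted(Counter(z for z, _ in enc_rows).values())
print(plain_freq, enc_freq)
```

An attacker who knows the plaintext frequency distribution can match it against the identical token distribution, which is the frequency analysis attack that a probabilistic scheme aims to prevent.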

IFIP Annual Conference on Data and Applications Security and Privacy | 2017

Budget-Constrained Result Integrity Verification of Outsourced Data Mining Computations

Bo Zhang; Boxiang Dong; Wendy Hui Wang

When outsourcing data mining needs to an untrusted service provider in the Data-Mining-as-a-Service (DMaS) paradigm, it is important to verify whether the service provider (server) returns correct mining results (in the format of data mining objects). We consider the setting in which each data mining object is associated with a weight for its importance. Given a client who has a limited verification budget, the server selects a subset of mining results whose total verification cost does not exceed the given budget, while the total weight of the selected results is maximized. This maps to the well-known budgeted maximum coverage (BMC) problem, which is NP-hard. Therefore, the server may execute a heuristic algorithm to select a subset of mining results for verification. The server has financial incentives to cheat on the heuristic output, so that the client has to pay more for verification of the mining results that are less important. Our aim is to verify that the mining results selected by the server indeed satisfy the budgeted maximization requirement. It is challenging to verify the result integrity of the heuristic algorithms as the results are non-deterministic. We design a probabilistic verification method by including negative candidates (NCs) that are guaranteed to be excluded from the budgeted maximization result of the ratio-based BMC solutions. We perform extensive experiments on real-world datasets, and show that the NC-based verification approach can achieve a high guarantee with small overhead.
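The budgeted selection step described in the abstract above can be sketched with a simple ratio-based greedy heuristic. This is a simplified illustration, not the paper's method: it treats each mining object as an independent item with a positive cost (closer to a knapsack heuristic than to full budgeted maximum coverage), and the function name `ratio_greedy` and the toy weights and costs are assumptions.

```python
def ratio_greedy(objects, budget):
    """Pick objects in descending weight/cost order while the budget allows.

    objects: list of (name, weight, cost) tuples, with cost > 0 (assumed).
    Returns the chosen names and the total verification cost spent.
    """
    chosen, spent = [], 0
    ranked = sorted(objects, key=lambda o: o[1] / o[2], reverse=True)
    for name, weight, cost in ranked:
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen, spent

# Hypothetical mining results: (name, importance weight, verification cost).
objects = [("r1", 10, 5), ("r2", 6, 2), ("r3", 4, 4), ("r4", 12, 3)]
chosen, spent = ratio_greedy(objects, budget=8)
print(chosen, spent)
```

Under such a heuristic, an object with a very low weight-to-cost ratio is ranked last, which gives some intuition for why a planted low-ratio candidate appearing in the server's output would signal that the selection rule was not followed.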

computer software and applications conference | 2016

Privacy-Preserving Outsourcing of Data Mining

Anna Monreale; Wendy Hui Wang

Data mining is gaining momentum in society due to the ever-increasing availability of large amounts of data, easily gathered by a variety of collection technologies and stored via computer systems. Due to the limited computational resources of data owners and the developments in cloud computing, there has been considerable recent interest in the paradigm of data mining-as-a-service (DMaaS). In this paradigm, a company (data owner) lacking in expertise or computational resources outsources its mining needs to a third-party service provider (server). Because the server may not be fully trusted, one of the main concerns of the DMaaS paradigm is the protection of data privacy. In this paper, we provide an overview of a variety of techniques and approaches that address the privacy issues of the DMaaS paradigm.
