
Publication


Featured research published by Jianneng Cao.


Very Large Data Bases | 2012

PrivBasis: frequent itemset mining with differential privacy

Ninghui Li; Wahbeh H. Qardaji; Dong Su; Jianneng Cao

The discovery of frequent itemsets can serve valuable economic and research purposes. Releasing discovered frequent itemsets, however, presents privacy challenges. In this paper, we study the problem of how to perform frequent itemset mining on transaction databases while satisfying differential privacy. We propose an approach, called PrivBasis, which leverages a novel notion called basis sets. A θ-basis set has the property that any itemset with frequency higher than θ is a subset of some basis. We introduce algorithms for privately constructing a basis set and then using it to find the most frequent itemsets. Experiments show that our approach greatly outperforms the current state of the art.
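
As a minimal illustration of the Laplace-mechanism primitive that an approach like PrivBasis builds on (this is not the PrivBasis algorithm itself; the function, data, and parameters below are hypothetical), one can release noisy support counts for a fixed list of itemsets, calibrating the noise to how many of those counts a single transaction can affect:

```python
import numpy as np

def noisy_supports(transactions, itemsets, epsilon):
    """Release differentially private support counts for `itemsets`.

    One transaction can change each of the m counts by at most 1, so the
    sensitivity of the count vector is m and each count receives
    Laplace(m / epsilon) noise. PrivBasis's contribution is precisely to
    keep m small, via a theta-basis set, before this step.
    """
    scale = len(itemsets) / epsilon
    noisy = {}
    for itemset in itemsets:
        true_count = sum(1 for t in transactions if set(itemset) <= set(t))
        noisy[itemset] = true_count + np.random.laplace(0.0, scale)
    return noisy

# Hypothetical toy database of transactions over items a-d.
db = [("a", "b"), ("a", "c"), ("a", "b", "d"), ("b", "c")]
print(noisy_supports(db, itemsets=[("a",), ("a", "b")], epsilon=1.0))
```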


IEEE Transactions on Dependable and Secure Computing | 2011

CASTLE: Continuously Anonymizing Data Streams

Jianneng Cao; Barbara Carminati; Elena Ferrari; Kian-Lee Tan

Most of the existing privacy-preserving techniques, such as k-anonymity methods, are designed for static data sets. As such, they cannot be applied to streaming data which are continuous, transient, and usually unbounded. Moreover, in streaming applications, there is a need to offer strong guarantees on the maximum allowed delay between incoming data and the corresponding anonymized output. To cope with these requirements, in this paper, we present Continuously Anonymizing STreaming data via adaptive cLustEring (CASTLE), a cluster-based scheme that anonymizes data streams on-the-fly and, at the same time, ensures the freshness of the anonymized data by satisfying specified delay constraints. We further show how CASTLE can be easily extended to handle ℓ-diversity. Our extensive performance study shows that CASTLE is efficient and effective w.r.t. the quality of the output data.
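
The core buffer-then-generalize loop can be sketched in a few lines (a toy under strong simplifications: real CASTLE maintains adaptive clusters, an information-loss metric, and reuse of published generalizations rather than the FIFO grouping shown here; all names are illustrative):

```python
from collections import deque

def castle_like(stream, k, delay):
    """Toy delay-constrained k-anonymizer for (quasi-identifier, sensitive)
    tuples. When the oldest buffered tuple has waited `delay` arrivals and
    at least k tuples are buffered, flush k of them with their QI values
    generalized to a common interval. Real CASTLE forms clusters adaptively
    to minimize information loss, and handles clusters that cannot reach
    size k before the deadline; this sketch does neither.
    """
    buffer = deque()
    for i, (qi, sa) in enumerate(stream):
        buffer.append((i, qi, sa))
        while buffer and i - buffer[0][0] >= delay and len(buffer) >= k:
            group = [buffer.popleft() for _ in range(k)]
            lo = min(q for _, q, _ in group)
            hi = max(q for _, q, _ in group)
            for _, _, s in group:
                yield (lo, hi), s  # generalized QI interval + sensitive value

# Hypothetical stream of (age, diagnosis) readings.
readings = [(34, "flu"), (37, "cold"), (29, "flu"), (41, "cold"), (35, "flu")]
for record in castle_like(readings, k=2, delay=2):
    print(record)
```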


Very Large Data Bases | 2010

ρ-uncertainty: inference-proof transaction anonymization

Jianneng Cao; Panagiotis Karras; Chedy Raïssi; Kian-Lee Tan

The publication of transaction data, such as market basket data, medical records, and query logs, serves the public benefit. Mining such data allows for the derivation of association rules that connect certain items to others with measurable confidence. Still, this type of data analysis poses a privacy threat; an adversary having partial information on a person's behavior may confidently associate that person to an item deemed to be sensitive. Ideally, an anonymization of such data should lead to an inference-proof version that prevents the association of individuals to sensitive items, while otherwise allowing for truthful associations to be derived. Original approaches to this problem were based on value perturbation, damaging data integrity. Recently, value generalization has been proposed as an alternative; still, approaches based on it have assumed either that all items are equally sensitive, or that some are sensitive and can be known to an adversary only by association, while others are non-sensitive and can be known directly. Yet in reality there is a distinction between sensitive and non-sensitive items, but an adversary may possess information on any of them. Most critically, no antecedent method aims at a clear inference-proof privacy guarantee. In this paper, we propose ρ-uncertainty, the first, to our knowledge, privacy concept that inherently safeguards against sensitive associations without constraining the nature of an adversary's knowledge and without falsifying data. The problem of achieving ρ-uncertainty with low information loss is challenging; a trivial solution is to suppress all sensitive items, but we develop more sophisticated schemes. In a broad experimental study, we show that the problem is solved non-trivially by a technique that combines generalization and suppression, which also achieves favorable results compared to a baseline perturbation-based scheme.
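
Concretely, a data set satisfies ρ-uncertainty when no sensitive association rule X → s can be inferred with confidence above ρ. A naive violation checker over a toy data set (illustrative only, and exponential in general; the paper's algorithms enforce the property via generalization and suppression rather than checking it post hoc) could look like:

```python
from itertools import combinations

def rho_uncertainty_violation(transactions, sensitive, rho, max_antecedent=2):
    """Return a rule X -> s (s sensitive) whose confidence exceeds rho, else
    None. conf(X -> s) = support(X ∪ {s}) / support(X). Antecedents may
    contain other sensitive items, since the adversary's knowledge is not
    constrained; only small antecedents are enumerated in this toy.
    """
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = set().union(*transactions)
    for s in sensitive:
        for size in range(1, max_antecedent + 1):
            for xs in combinations(sorted(items - {s}), size):
                x = frozenset(xs)
                sup_x = support(x)
                if sup_x and support(x | {s}) / sup_x > rho:
                    return set(x), s
    return None

# Hypothetical transactions; "hiv" is the only sensitive item.
db = [frozenset(t) for t in ({"bread", "hiv"}, {"bread", "milk"}, {"bread", "hiv"})]
print(rho_uncertainty_violation(db, sensitive={"hiv"}, rho=0.5))
```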


Very Large Data Bases | 2012

Publishing microdata with a robust privacy guarantee

Jianneng Cao; Panagiotis Karras

Today, the publication of microdata poses a privacy threat. Vast research has striven to define the privacy condition that microdata should satisfy before it is released, and to devise algorithms to anonymize the data so as to achieve this condition. Yet, no method proposed to date explicitly bounds the percentage of information an adversary gains after seeing the published data for each sensitive value therein. This paper introduces β-likeness, an appropriately robust privacy model for microdata anonymization, along with two anonymization schemes designed for it: one based on generalization, the other on perturbation. Our model postulates that an adversary's confidence in the likelihood of a certain sensitive-attribute (SA) value should not increase, in relative difference terms, by more than a predefined threshold. Our techniques aim to satisfy a given β threshold with little information loss. We experimentally demonstrate that (i) our model provides an effective privacy guarantee in a way that predecessor models cannot, (ii) our generalization scheme is more effective and efficient in its task than methods adapting algorithms for the k-anonymity model, and (iii) our perturbation method outperforms a baseline approach. Moreover, we discuss in detail the resistance of our model and methods to attacks proposed in previous research.
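
Under the model's basic reading, an equivalence class is acceptable when, for every sensitive value with global frequency p_i and in-class frequency q_i, the relative increase (q_i − p_i) / p_i stays within β. A small checker for that condition (a sketch of the guarantee only, not of the paper's anonymization schemes; all names are hypothetical):

```python
from collections import Counter

def satisfies_beta_likeness(sa_values, ecs, beta):
    """Check basic beta-likeness: within every equivalence class, no
    sensitive value's frequency may exceed its global frequency by a
    relative difference above beta, i.e. (q_i - p_i) / p_i <= beta.

    `sa_values` is the table's sensitive column; `ecs` is a list of
    index lists, one per equivalence class.
    """
    n = len(sa_values)
    global_freq = {v: c / n for v, c in Counter(sa_values).items()}
    for ec in ecs:
        local = Counter(sa_values[i] for i in ec)
        for v, c in local.items():
            q, p = c / len(ec), global_freq[v]
            if (q - p) / p > beta:
                return False
    return True

# Hypothetical: six records' sensitive values split into two ECs.
sa = ["flu", "flu", "hiv", "flu", "hiv", "flu"]
print(satisfies_beta_likeness(sa, ecs=[[0, 1, 2], [3, 4, 5]], beta=0.5))
```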


Very Large Data Bases | 2011

SABRE: a Sensitive Attribute Bucketization and REdistribution framework for t-closeness

Jianneng Cao; Panagiotis Karras; Panos Kalnis; Kian-Lee Tan

Today, the publication of microdata poses a privacy threat: anonymous personal records can be re-identified using third-party data sources. Past research has tried to develop a concept of privacy guarantee that an anonymized data set should satisfy before publication, culminating in the notion of t-closeness. To satisfy t-closeness, the records in a data set need to be grouped into Equivalence Classes (ECs), such that each EC contains records of indistinguishable quasi-identifier values, and its local distribution of sensitive attribute (SA) values conforms to the global table distribution of SA values. However, despite this progress, previous research has not offered an anonymization algorithm tailored for t-closeness. In this paper, we cover this gap with SABRE, a SA Bucketization and REdistribution framework for t-closeness. SABRE first greedily partitions a table into buckets of similar SA values and then redistributes the tuples of each bucket into dynamically determined ECs. This approach is facilitated by a property of the Earth Mover's Distance (EMD) that we employ as a measure of distribution closeness: if the tuples in an EC are picked proportionally to the sizes of the buckets they hail from, then the EMD of that EC is tightly upper-bounded using localized upper bounds derived for each bucket. We prove that if the t-closeness constraint is properly obeyed during partitioning, then it is obeyed by the derived ECs too. We develop two instantiations of SABRE and extend it to a streaming environment. Our extensive experimental evaluation demonstrates that SABRE achieves information quality superior to schemes that merely applied algorithms tailored for other models to t-closeness, and can be much faster as well.
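
For intuition, when the sensitive attribute is ordered with m equally spaced values, the EMD between an EC's distribution and the global one reduces to a closed form over cumulative differences. A sketch of that measure (the quantity SABRE bounds, not SABRE itself; the example data are hypothetical):

```python
def emd_ordered(p, q):
    """Earth Mover's Distance between two distributions over an ordered
    sensitive attribute with m equally spaced values, as used for
    t-closeness: (1 / (m - 1)) * sum over i of |cumulative (p_j - q_j)|.
    Assumes m >= 2 and that p and q each sum to 1.
    """
    m = len(p)
    cum, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        cum += pi - qi
        total += abs(cum)
    return total / (m - 1)

# Hypothetical: one EC's local salary distribution vs. the global one.
ec_dist     = [0.50, 0.25, 0.25, 0.00]
global_dist = [0.25, 0.25, 0.25, 0.25]
print(emd_ordered(ec_dist, global_dist))  # must be <= t for t-closeness
```

SABRE's proportional-picking property keeps each EC's value of this quantity under localized per-bucket bounds, and hence under t.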


ACM Transactions on Information and System Security | 2010

A framework to enforce access control over data streams

Barbara Carminati; Elena Ferrari; Jianneng Cao; Kian-Lee Tan

Although access control is currently a key component of any computational system, mechanisms to guard against unauthorized access to streaming data have started to be investigated only recently. To cope with this lack, in this article we propose a general framework to protect streaming data that is, as much as possible, independent of the target stream engine. Unlike in the RDBMS world, a standard query language for data streams has not yet emerged, which makes the development of a general solution to access control enforcement more difficult. The framework we propose in this article is based on an expressive role-based access control model we previously proposed. It exploits a query rewriting mechanism, which rewrites user queries in such a way that they do not return tuples/attributes that should not be accessed according to the specified access control policies. Furthermore, the framework contains a deployment module able to translate the rewritten query so that it can be executed by different stream engines, thereby overcoming the lack of standardization. In the article, besides presenting all the components of our framework, we prove the correctness and completeness of the query rewriting algorithm, and we present experiments that show the feasibility of the developed techniques.
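
The flavor of the rewriting step can be shown with a toy one-table grammar (hypothetical, and far simpler than the framework's actual model and secure operators): deny unauthorized attributes outright and nest the query over a policy-filtered sub-select.

```python
def rewrite_query(user_query, allowed_attrs, row_filter):
    """Rewrite `SELECT <attrs> FROM <table> [WHERE <pred>]` so that it can
    only see attributes and tuples the policy allows, by nesting a secure
    sub-select. Translating the result for a specific stream engine would
    be the job of a separate deployment module.
    """
    select, rest = user_query.split(" FROM ", 1)
    attrs = [a.strip() for a in select[len("SELECT "):].split(",")]
    if attrs == ["*"]:
        attrs = list(allowed_attrs)
    denied = [a for a in attrs if a not in allowed_attrs]
    if denied:
        raise PermissionError(f"attributes not permitted: {denied}")
    if " WHERE " in rest:
        table, user_pred = rest.split(" WHERE ", 1)
        pred = f"({user_pred}) AND ({row_filter})"
    else:
        table, pred = rest, row_filter
    view = f"(SELECT {', '.join(allowed_attrs)} FROM {table} WHERE {pred})"
    return f"SELECT {', '.join(attrs)} FROM {view} AS secured"

print(rewrite_query("SELECT patient, heartrate FROM vitals WHERE heartrate > 120",
                    allowed_attrs=["patient", "heartrate"],
                    row_filter="ward = 'A'"))
```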


International Conference on Data Engineering | 2008

CASTLE: A delay-constrained scheme for kₛ-anonymizing data streams

Jianneng Cao; Barbara Carminati; Elena Ferrari; Kian-Lee Tan

Most existing privacy-preserving techniques, such as anonymity methods, are designed for static data sets. As such, they cannot be applied to streaming data, which are continuous, transient, and usually unbounded. Moreover, in streaming applications there is a need to offer strong guarantees on the maximum allowed delay between incoming data and the corresponding anonymized output. To cope with these requirements, in this paper we present CASTLE (Continuously Anonymizing STreaming data via adaptive cLustEring), a cluster-based scheme that anonymizes data streams on-the-fly and, at the same time, ensures the freshness of the anonymized data by satisfying specified delay constraints. We further show how CASTLE can be easily extended to handle ℓ-diversity. Our extensive performance study shows that CASTLE is efficient and effective.


Conference on Data and Application Security and Privacy | 2016

Differentially Private K-Means Clustering

Dong Su; Jianneng Cao; Ninghui Li; Elisa Bertino; Hongxia Jin

There are two broad approaches to differentially private data analysis. The interactive approach aims at developing customized differentially private algorithms for various data mining tasks. The non-interactive approach aims at developing differentially private algorithms that output a synopsis of the input dataset, which can then be used to support various data mining tasks. In this paper we study the effectiveness of the two approaches on differentially private k-means clustering. We develop techniques to analyze the empirical error behaviors of the existing interactive and non-interactive approaches. Based on the analysis, we propose an improvement of DPLloyd, a differentially private version of the Lloyd algorithm. We also propose a non-interactive approach, EUGkM, which publishes a differentially private synopsis for k-means clustering. Results from extensive and systematic experiments support our analysis and demonstrate the effectiveness of our improvement to DPLloyd and of the proposed EUGkM algorithm.
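
The interactive baseline is easy to sketch: each Lloyd iteration publishes a Laplace-noised count and coordinate sum per cluster, and recomputes centroids from those ratios. Below is a one-dimensional toy under that reading (budget split evenly across iterations; the paper's DPLloyd improvement and the EUGkM synopsis are beyond this sketch, and all names and data are illustrative):

```python
import numpy as np

def dp_lloyd_1d(data, k, iters, epsilon, lo=0.0, hi=1.0):
    """One-dimensional DPLloyd-style k-means sketch.

    Points lie in [lo, hi], so per iteration the count has sensitivity 1
    and the sum has sensitivity hi - lo; the budget is split evenly across
    iterations and the two statistics. Centroids are the noisy ratios.
    """
    eps_step = epsilon / (2 * iters)
    rng = np.random.default_rng(0)
    centroids = rng.uniform(lo, hi, size=k)
    for _ in range(iters):
        assign = np.argmin(np.abs(data[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            pts = data[assign == j]
            noisy_count = len(pts) + rng.laplace(0, 1.0 / eps_step)
            noisy_sum = pts.sum() + rng.laplace(0, (hi - lo) / eps_step)
            if noisy_count >= 1:  # skip update when the noisy count is tiny
                centroids[j] = np.clip(noisy_sum / noisy_count, lo, hi)
    return centroids

# Hypothetical data: two well-separated groups in [0, 1].
data = np.r_[np.random.uniform(0.0, 0.3, 200), np.random.uniform(0.7, 1.0, 200)]
print(dp_lloyd_1d(data, k=2, iters=5, epsilon=1.0))
```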


International Conference on Data Engineering | 2009

ACStream: Enforcing Access Control over Data Streams

Jianneng Cao; Barbara Carminati; Elena Ferrari; Kian-Lee Tan

In this demo proposal, we illustrate ACStream, a system built on top of StreamBase [1], to specify and enforce access control policies over data streams. ACStream supports a very flexible role-based access control model specifically designed to protect against unauthorized access to streaming data. The core component of ACStream is a query rewriting mechanism that, by exploiting the set of secure operators we proposed in [2], rewrites a user query in such a way that it does not violate the specified access control policies during its execution. The demo will show how policies modelling a variety of access control requirements can be easily specified and enforced using ACStream.


Extending Database Technology | 2013

Efficient and accurate strategies for differentially-private sliding window queries

Jianneng Cao; Qian Xiao; Gabriel Ghinita; Ninghui Li; Elisa Bertino; Kian-Lee Tan

Regularly releasing aggregate statistics about data streams in a privacy-preserving way serves valuable commercial and social purposes while protecting the privacy of individuals. This problem has already been studied under differential privacy, but only for the case of a single continuous query that covers the entire time span, e.g., counting the number of tuples seen so far in the stream. However, most real-world applications are window-based: they are interested in statistical information about the streaming data within a window, rather than over the whole unbounded stream. Furthermore, a Data Stream Management System (DSMS) may need to answer numerous correlated aggregate queries simultaneously, rather than a single one. To cope with these requirements, we study how to release differentially private answers for a set of sliding window aggregate queries. We propose two solutions, each consisting of query sampling and composition. We first selectively sample a subset of representative sliding window queries from the set of all submitted ones. The representative queries are answered by adding Laplace noise in a way that satisfies differential privacy. For each non-representative query, we compose its answer from the results of the representatives. The experimental evaluation shows that our solutions are efficient and effective.
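
A stripped-down rendering of the sample-then-compose idea (a hypothetical scheme for intuition only; the paper's sampling strategy, composition, and error analysis are more involved): answer disjoint representative blocks of the stream with Laplace noise, then assemble any queried window from the overlapping blocks.

```python
import numpy as np

def answer_window_queries(stream, block, queries, epsilon):
    """Sketch of sampling + composition for sliding-window counts.

    Representatives: disjoint blocks of `block` consecutive counts, each
    answered once with Laplace(1 / epsilon) noise (every element falls in
    exactly one block, so the sensitivity is 1). A queried window [s, e)
    is then composed from the noisy blocks overlapping it, scaled by the
    overlap fraction -- an approximation, not an exact rewrite.
    """
    rng = np.random.default_rng(1)
    blocks = {}
    for start in range(0, len(stream), block):
        true = sum(stream[start:start + block])
        blocks[start] = true + rng.laplace(0, 1.0 / epsilon)
    answers = []
    for s, e in queries:
        est = 0.0
        for start, noisy in blocks.items():
            overlap = max(0, min(e, start + block) - max(s, start))
            est += noisy * overlap / block
        answers.append(est)
    return answers

# Hypothetical 0/1 event stream; windows given as [start, end) offsets.
stream = [1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1]
print(answer_window_queries(stream, block=4, queries=[(0, 4), (2, 10)], epsilon=1.0))
```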

Collaboration


Dive into Jianneng Cao's collaborations.

Top Co-Authors

Kian-Lee Tan

National University of Singapore

Murat Kantarcioglu

University of Texas at Dallas

Mehmet Kuzu

University of Texas at Dallas
