Yonghui Xiao | Researchain

Publication

Featured research published by Yonghui Xiao.

very large data bases | 2010

Differentially private data release through multidimensional partitioning

Yonghui Xiao; Li Xiong; Chun Yuan

Differential privacy is a strong notion for protecting individual privacy in privacy preserving data analysis or publishing. In this paper, we study the problem of differentially private histogram release based on an interactive differential privacy interface. We propose two multidimensional partitioning strategies including a baseline cell-based partitioning and an innovative kd-tree based partitioning. In addition to providing formal proofs for differential privacy and usefulness guarantees for linear distributive queries, we also present a set of experimental results and demonstrate the feasibility and performance of our method.
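The baseline cell-based strategy mentioned in this abstract can be illustrated with a minimal sketch: count the records in every cell of the multidimensional domain and add Laplace noise to each count. The function names, the `domains` argument, and the choice of sensitivity 1 (neighboring datasets differ by adding or removing one record) are assumptions made for illustration; the kd-tree based partitioning is not reproduced here.

```python
import random
from collections import Counter
from itertools import product


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_cell_histogram(records, domains, epsilon, seed=None):
    """Return {cell: noisy count} over the full cross product of the domains.

    records: iterable of tuples, one value per dimension.
    domains: list of value lists, one per dimension (hypothetical input format).
    epsilon: privacy budget; assumes add/remove-one-record neighbors,
             so the histogram has sensitivity 1 and the noise scale is 1/epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    counts = Counter(records)
    scale = 1.0 / epsilon
    # Every cell gets noise, including empty ones; otherwise the set of
    # released cells would itself leak which cells are occupied.
    return {
        cell: counts.get(cell, 0) + laplace_noise(scale, rng)
        for cell in product(*domains)
    }
```

A call such as `noisy_cell_histogram([(0, 1), (1, 0)], [[0, 1], [0, 1]], epsilon=0.5)` returns four noisy counts, one per cell. Because every cell is perturbed independently, the total noise grows with the number of cells, which is one reason to merge sparse regions with a coarser partitioning.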

Journal of the American Medical Informatics Association | 2013

SHARE: system design and case studies for statistical health information release

James J. Gardner; Li Xiong; Yonghui Xiao; Jingjing Gao; Andrew R. Post; Xiaoqian Jiang; Lucila Ohno-Machado

OBJECTIVES: We present SHARE, a new system for statistical health information release with differential privacy. We present two case studies that evaluate the software on real medical datasets and demonstrate the feasibility and utility of applying the differential privacy framework on biomedical data.

MATERIALS AND METHODS: SHARE releases statistical information in electronic health records with differential privacy, a strong privacy framework for statistical data release. It includes a number of state-of-the-art methods for releasing multidimensional histograms and longitudinal patterns. We performed a variety of experiments on two real datasets, the Surveillance, Epidemiology, and End Results (SEER) breast cancer dataset and the Emory electronic medical record (EeMR) dataset, to demonstrate the feasibility and utility of SHARE.

RESULTS: Experimental results indicate that SHARE can deal with heterogeneous data present in medical data, and that the released statistics are useful. The Kullback-Leibler divergence between the released multidimensional histograms and the original data distribution is below 0.5 and 0.01 for seven-dimensional and three-dimensional data cubes generated from the SEER dataset, respectively. The relative error for longitudinal pattern queries on the EeMR dataset varies between 0 and 0.3. While the results are promising, they also suggest that challenges remain in applying statistical data release using the differential privacy framework for higher dimensional data.

CONCLUSIONS: SHARE is one of the first systems to provide a mechanism for custodians to release differentially private aggregate statistics for a variety of use cases in the medical domain. This proof-of-concept system is intended to be applied to large-scale medical data warehouses.
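The Kullback-Leibler divergence reported in the results can be computed from two count histograms as in the sketch below. Several details are assumptions, since the abstract does not state them: the direction (original relative to released), the natural logarithm, and the handling of negative or zero noisy counts by clamping to zero and adding a small floor.

```python
import math


def normalize(counts, floor=1e-12):
    """Turn counts into a probability vector.

    Negative noisy counts are clamped to zero and a small floor is added
    so that no cell has probability exactly zero (an assumed convention).
    """
    clamped = [max(c, 0.0) + floor for c in counts]
    total = sum(clamped)
    return [c / total for c in clamped]


def kl_divergence(original_counts, released_counts):
    """D_KL(original || released) in nats over aligned histogram cells."""
    if len(original_counts) != len(released_counts):
        raise ValueError("histograms must cover the same cells")
    p = normalize(original_counts)
    q = normalize(released_counts)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The value is zero when the released histogram has the same shape as the original and grows as probability mass is moved between cells.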

international conference on data engineering | 2012

DPCube: Releasing Differentially Private Data Cubes for Health Information

Yonghui Xiao; James J. Gardner; Li Xiong

We demonstrate DPCube, a component in our Health Information DE-identification (HIDE) framework, for releasing differentially private data cubes (or multi-dimensional histograms) for sensitive data. HIDE is a framework we developed for integrating heterogeneous structured and unstructured health information, and it provides methods for privacy-preserving data publishing. The DPCube component uses differentially private access mechanisms and an innovative 2-phase multidimensional partitioning strategy to publish a multi-dimensional data cube or histogram that achieves good utility while satisfying differential privacy. We demonstrate that the released data cubes can serve as a sanitized synopsis of the raw database and, together with an optional synthesized dataset based on the data cubes, can support various Online Analytical Processing (OLAP) queries and learning tasks.

international conference on data engineering | 2017

Quantifying Differential Privacy under Temporal Correlations

Yang Cao; Masatoshi Yoshikawa; Yonghui Xiao; Li Xiong

Differential Privacy (DP) has received increasing attention as a rigorous privacy framework. Many existing studies employ traditional DP mechanisms (e.g., the Laplace mechanism) as primitives, which assume that the data are independent, or that adversaries do not have knowledge of the data correlations. However, continuously generated data in the real world tend to be temporally correlated, and such correlations can be acquired by adversaries. In this paper, we investigate the potential privacy loss of a traditional DP mechanism under temporal correlations in the context of continuous data release. First, we model the temporal correlations using a Markov model and analyze the privacy leakage of a DP mechanism when adversaries have knowledge of such temporal correlations. Our analysis reveals that the privacy loss of a DP mechanism may accumulate and increase over time. We call it temporal privacy leakage. Second, to measure such privacy loss, we design an efficient algorithm for calculating it in polynomial time. Although the temporal privacy leakage may increase over time, we also show that its supremum may exist in some cases. Third, to bound the privacy loss, we propose mechanisms that convert any existing DP mechanism into one against temporal privacy leakage. Experiments with synthetic data confirm that our approach is efficient and effective.

very large data bases | 2017

LocLok: location cloaking with differential privacy via hidden markov model

Yonghui Xiao; Li Xiong; Si Zhang; Yang Cao

We demonstrate LocLok, a LOCation-cLOaKing system to protect the locations of a user with differential privacy. LocLok has two features: (a) it protects locations under temporal correlations described through a hidden Markov model; (b) it releases the optimal noisy location with the planar isotropic mechanism (PIM), the first mechanism that achieves the lower bound of differential privacy. We show the detailed computation of LocLok with the following components: (a) how to generate the possible locations with the Markov model, (b) how to perturb the location with PIM, and (c) how to make inference about the true location in the Markov model. An online system with a real-world dataset will be presented with the computation details.
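Components (a) and (c) correspond to the standard predict and update steps of hidden Markov model filtering, sketched below over a small discrete set of locations. This is a generic illustration, not LocLok's implementation: the function names are hypothetical, PIM itself (component (b)) is not reproduced, and the `likelihood` vector, which would come from the perturbation mechanism's output density, is simply supplied by the caller.

```python
def predict(prior, transition):
    """Propagate a location distribution one time step through the Markov model.

    prior: list of probabilities, one per location.
    transition: row-stochastic matrix, transition[i][j] = P(next = j | now = i).
    """
    n = len(prior)
    return [sum(prior[i] * transition[i][j] for i in range(n)) for j in range(n)]


def update(predicted, likelihood):
    """Bayesian update of the predicted distribution given an observation.

    likelihood[j] = P(released noisy location | true location = j),
    assumed to be provided by whatever perturbation mechanism is in use.
    """
    unnormalized = [p * l for p, l in zip(predicted, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero likelihood under the model")
    return [u / total for u in unnormalized]
```

Locations with non-negligible probability after `predict` are the plausible candidates at the current time step; after a noisy location is released, `update` gives the posterior that becomes the prior for the next step.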

very large data bases | 2018

ConTPL: controlling temporal privacy leakage in differentially private continuous data release

Yang Cao; Li Xiong; Masatoshi Yoshikawa; Yonghui Xiao; Si Zhang

In many real-world systems, such as the Internet of Things, sensitive data streams are collected and analyzed continually. To protect privacy, a number of mechanisms are designed to achieve ϵ-differential privacy for processing sensitive streaming data, whose privacy loss is rigorously controlled within a given parameter ϵ. However, most of the existing studies do not consider the effect of temporal correlations among the continuously generated data on the privacy loss. Our recent work reveals that the privacy loss of a traditional DP mechanism (e.g., the Laplace mechanism) may not be bounded by ϵ due to temporal correlations. We call such unexpected privacy loss Temporal Privacy Leakage (TPL). In this demonstration, we design a system, ConTPL, which is able to automatically convert an existing differentially private streaming data release mechanism into one bounding TPL within a specified level. ConTPL also provides an interactive interface and real-time visualization to help data curators understand and explore the effect of different parameters on TPL.

advances in geographic information systems | 2016

A Markov chain based pruning method for predictive range queries

Xiaofeng Xu; Li Xiong; Vaidy S. Sunderam; Yonghui Xiao

Predictive range queries retrieve objects in a certain spatial region at a (future) prediction time. Processing predictive range queries on large moving object databases is expensive. Thus effective pruning is important, especially for long-term predictive queries since accurately predicting long-term future behaviors of moving objects is challenging and expensive. In this work, we propose a pruning method that effectively reduces the candidate set for predictive range queries based on (high-order) Markov chain models learned from historical trajectories. The key to our method is to devise compressed representations for sparse multi-dimensional matrices, and leverage efficient algorithms for matrix computations. Experimental evaluations show that our approach significantly outperforms other pruning methods in terms of efficiency and precision.
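The idea of pruning with a Markov chain can be sketched as follows: propagate each object's current cell through a sparse transition model for the requested number of steps, then discard objects whose probability of being inside the query region does not exceed a threshold. This is a simplified first-order illustration with hypothetical names; the paper's high-order models and compressed matrix representations are not reproduced, and the threshold rule is an assumption.

```python
def step(dist, transitions):
    """Advance a sparse distribution {cell: prob} by one Markov step.

    transitions: sparse dict of dicts, transitions[a][b] = P(next = b | now = a).
    Only non-zero entries are stored, so work is proportional to the
    number of reachable cells rather than the full grid size.
    """
    nxt = {}
    for cell, p in dist.items():
        for succ, q in transitions.get(cell, {}).items():
            nxt[succ] = nxt.get(succ, 0.0) + p * q
    return nxt


def prune_candidates(current_cells, transitions, region, steps, threshold=0.0):
    """Return ids of objects that may be in `region` after `steps` steps.

    current_cells: {object_id: current cell}.
    region: set of cells covered by the query.
    threshold: objects with probability <= threshold are pruned
               (0.0 keeps every object that can reach the region at all).
    """
    candidates = []
    for obj, cell in current_cells.items():
        dist = {cell: 1.0}
        for _ in range(steps):
            dist = step(dist, transitions)
        prob_in_region = sum(p for c, p in dist.items() if c in region)
        if prob_in_region > threshold:
            candidates.append(obj)
    return candidates
```

Only the surviving candidates need to be passed to a more expensive prediction model, which is where the savings for long-term queries would come from.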

computer and communications security | 2015