Yonghui Xiao
Emory University
Publication
Featured research published by Yonghui Xiao.
very large data bases | 2010
Yonghui Xiao; Li Xiong; Chun Yuan
Differential privacy is a strong notion for protecting individual privacy in privacy preserving data analysis or publishing. In this paper, we study the problem of differentially private histogram release based on an interactive differential privacy interface. We propose two multidimensional partitioning strategies including a baseline cell-based partitioning and an innovative kd-tree based partitioning. In addition to providing formal proofs for differential privacy and usefulness guarantees for linear distributive queries, we also present a set of experimental results and demonstrate the feasibility and performance of our method.
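The baseline cell-based strategy mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard Laplace mechanism with sensitivity 1 (one record changes one cell count by at most 1); the abstract does not give the paper's exact procedure, and the kd-tree partitioning is not shown. The names `noisy_cell_histogram` and `range_count` are hypothetical.

```python
import random
from collections import Counter
from itertools import product


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_cell_histogram(records, domains, epsilon, seed=None):
    """Release one noisy count per cell of the full multidimensional grid.

    records: iterable of tuples, one value per dimension.
    domains: list of lists, the possible values of each dimension.
    epsilon: privacy budget; noise scale is 1/epsilon (assumed sensitivity 1).
    """
    rng = random.Random(seed)
    counts = Counter(records)
    scale = 1.0 / epsilon
    # Every cell gets noise, including empty ones, so that the set of
    # released cells does not depend on the data.
    return {
        cell: counts.get(cell, 0) + laplace_noise(scale, rng)
        for cell in product(*domains)
    }


def range_count(histogram, predicate):
    # Answer a count query by summing the noisy cells that match.
    return sum(value for cell, value in histogram.items() if predicate(cell))


if __name__ == "__main__":
    data = [("F", "30-39"), ("F", "30-39"), ("M", "40-49")]
    doms = [["F", "M"], ["30-39", "40-49"]]
    hist = noisy_cell_histogram(data, doms, epsilon=1.0, seed=7)
    print(range_count(hist, lambda cell: cell[0] == "F"))
```

Because every cell receives independent noise, the error of a range query grows with the number of cells it covers, which is the usual motivation for coarser partitions such as the kd-tree strategy.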
Journal of the American Medical Informatics Association | 2013
James J. Gardner; Li Xiong; Yonghui Xiao; Jingjing Gao; Andrew R. Post; Xiaoqian Jiang; Lucila Ohno-Machado
OBJECTIVES: We present SHARE, a new system for statistical health information release with differential privacy. We present two case studies that evaluate the software on real medical datasets and demonstrate the feasibility and utility of applying the differential privacy framework on biomedical data. MATERIALS AND METHODS: SHARE releases statistical information in electronic health records with differential privacy, a strong privacy framework for statistical data release. It includes a number of state-of-the-art methods for releasing multidimensional histograms and longitudinal patterns. We performed a variety of experiments on two real datasets, the Surveillance, Epidemiology, and End Results (SEER) breast cancer dataset and the Emory electronic medical record (EeMR) dataset, to demonstrate the feasibility and utility of SHARE. RESULTS: Experimental results indicate that SHARE can deal with heterogeneous data present in medical data, and that the released statistics are useful. The Kullback-Leibler divergence between the released multidimensional histograms and the original data distribution is below 0.5 and 0.01 for seven-dimensional and three-dimensional data cubes generated from the SEER dataset, respectively. The relative error for longitudinal pattern queries on the EeMR dataset varies between 0 and 0.3. While the results are promising, they also suggest that challenges remain in applying statistical data release using the differential privacy framework for higher dimensional data. CONCLUSIONS: SHARE is one of the first systems to provide a mechanism for custodians to release differentially private aggregate statistics for a variety of use cases in the medical domain. This proof-of-concept system is intended to be applied to large-scale medical data warehouses.
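The two utility measures reported in this abstract, Kullback-Leibler divergence and relative error, can be computed along the following lines. This is a sketch under stated assumptions: the abstract does not specify the logarithm base, how negative noisy counts are handled, or how zero counts are smoothed, so the clamping, the `smoothing` constant, the natural logarithm, and the `floor` parameter are all illustrative choices.

```python
import math


def kl_divergence(original_counts, released_counts, smoothing=1e-9):
    """KL(P || Q), with P from the original histogram and Q from the release.

    Negative noisy counts are clamped to zero and a small constant is added
    to every cell so the logarithm is always defined (both are assumptions).
    """
    if len(original_counts) != len(released_counts):
        raise ValueError("histograms must have the same number of cells")
    p = [max(c, 0.0) + smoothing for c in original_counts]
    q = [max(c, 0.0) + smoothing for c in released_counts]
    p_total, q_total = sum(p), sum(q)
    return sum(
        (pi / p_total) * math.log((pi / p_total) / (qi / q_total))
        for pi, qi in zip(p, q)
    )


def relative_error(true_answer, noisy_answer, floor=1.0):
    # `floor` avoids division by zero for empty query results (assumption).
    return abs(noisy_answer - true_answer) / max(abs(true_answer), floor)


if __name__ == "__main__":
    original = [40, 25, 0, 35]
    released = [38.7, 26.1, -0.4, 36.2]
    print(kl_divergence(original, released))
    print(relative_error(100, 110))
```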
international conference on data engineering | 2012
Yonghui Xiao; James J. Gardner; Li Xiong
We demonstrate DPCube, a component in our Health Information DE-identification (HIDE) framework, for releasing differentially private data cubes (or multi-dimensional histograms) for sensitive data. HIDE is a framework we developed for integrating heterogeneous structured and unstructured health information and provides methods for privacy preserving data publishing. The DPCube component uses differentially private access mechanisms and an innovative 2-phase multidimensional partitioning strategy to publish a multi-dimensional data cube or histogram that achieves good utility while satisfying differential privacy. We demonstrate that the released data cubes can serve as a sanitized synopsis of the raw database and, together with an optional synthesized dataset based on the data cubes, can support various Online Analytical Processing (OLAP) queries and learning tasks.
international conference on data engineering | 2017
Yang Cao; Masatoshi Yoshikawa; Yonghui Xiao; Li Xiong
Differential Privacy (DP) has received increasing attention as a rigorous privacy framework. Many existing studies employ traditional DP mechanisms (e.g., the Laplace mechanism) as primitives, which assume that the data are independent, or that adversaries do not have knowledge of the data correlations. However, continuously generated data in the real world tend to be temporally correlated, and such correlations can be acquired by adversaries. In this paper, we investigate the potential privacy loss of a traditional DP mechanism under temporal correlations in the context of continuous data release. First, we model the temporal correlations using a Markov model and analyze the privacy leakage of a DP mechanism when adversaries have knowledge of such temporal correlations. Our analysis reveals that the privacy loss of a DP mechanism may accumulate and increase over time. We call this temporal privacy leakage. Second, to measure such privacy loss, we design an efficient algorithm for calculating it in polynomial time. Although the temporal privacy leakage may increase over time, we also show that its supremum may exist in some cases. Third, to bound the privacy loss, we propose mechanisms that convert any existing DP mechanism into one against temporal privacy leakage. Experiments with synthetic data confirm that our approach is efficient and effective.
very large data bases | 2017
Yonghui Xiao; Li Xiong; Si Zhang; Yang Cao
We demonstrate LocLok, a LOCation-cLOaKing system to protect the locations of a user with differential privacy. LocLok has two features: (a) it protects locations under temporal correlations described through a hidden Markov model; (b) it releases the optimal noisy location with the planar isotropic mechanism (PIM), the first mechanism that achieves the lower bound of differential privacy. We show the detailed computation of LocLok with the following components: (a) how to generate the possible locations with a Markov model, (b) how to perturb the location with PIM, and (c) how to make inference about the true location in the Markov model. An online system with a real-world dataset will be presented with the computation details.
very large data bases | 2018
Yang Cao; Li Xiong; Masatoshi Yoshikawa; Yonghui Xiao; Si Zhang
In many real-world systems, such as the Internet of Things, sensitive data streams are collected and analyzed continually. To protect privacy, a number of mechanisms are designed to achieve ϵ-differential privacy for processing sensitive streaming data, whose privacy loss is rigorously controlled within a given parameter ϵ. However, most of the existing studies do not consider the effect of temporal correlations among the continuously generated data on the privacy loss. Our recent work reveals that the privacy loss of a traditional DP mechanism (e.g., the Laplace mechanism) may not be bounded by ϵ due to temporal correlations. We call such unexpected privacy loss Temporal Privacy Leakage (TPL). In this demonstration, we design a system, ConTPL, which is able to automatically convert an existing differentially private streaming data release mechanism into one bounding TPL within a specified level. ConTPL also provides an interactive interface and real-time visualization to help data curators understand and explore the effect of different parameters on TPL.
advances in geographic information systems | 2016
Xiaofeng Xu; Li Xiong; Vaidy S. Sunderam; Yonghui Xiao
Predictive range queries retrieve objects in a certain spatial region at a (future) prediction time. Processing predictive range queries on large moving object databases is expensive. Thus effective pruning is important, especially for long-term predictive queries since accurately predicting long-term future behaviors of moving objects is challenging and expensive. In this work, we propose a pruning method that effectively reduces the candidate set for predictive range queries based on (high-order) Markov chain models learned from historical trajectories. The key to our method is to devise compressed representations for sparse multi-dimensional matrices, and leverage efficient algorithms for matrix computations. Experimental evaluations show that our approach significantly outperforms other pruning methods in terms of efficiency and precision.
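The pruning idea in this abstract can be sketched as follows, under simplifying assumptions: a first-order Markov chain over grid cells (the paper also considers high-order models), and a dict-of-dicts as a stand-in for the compressed sparse matrix representation, which the abstract does not describe. All function names are hypothetical.

```python
def step(distribution, transitions):
    """Advance a sparse probability distribution by one time step.

    distribution: {cell: probability}
    transitions:  {cell: {next_cell: probability}}, storing only nonzero entries.
    """
    nxt = {}
    for cell, prob in distribution.items():
        for neighbor, weight in transitions.get(cell, {}).items():
            nxt[neighbor] = nxt.get(neighbor, 0.0) + prob * weight
    return nxt


def region_probability(start_cell, transitions, steps, region):
    # Probability that an object starting in `start_cell` lies inside
    # `region` after `steps` transitions.
    distribution = {start_cell: 1.0}
    for _ in range(steps):
        distribution = step(distribution, transitions)
    return sum(p for cell, p in distribution.items() if cell in region)


def prune_candidates(current_cells, transitions, steps, region, threshold=0.0):
    # Keep only objects whose predicted probability of being in the query
    # region exceeds the threshold; the rest are pruned from the candidate set.
    return [
        obj
        for obj, cell in current_cells.items()
        if region_probability(cell, transitions, steps, region) > threshold
    ]


if __name__ == "__main__":
    transitions = {
        "A": {"A": 0.5, "B": 0.5},
        "B": {"C": 1.0},
        "C": {"C": 1.0},
        "D": {"D": 1.0},
    }
    objects = {"o1": "A", "o2": "D"}
    print(prune_candidates(objects, transitions, steps=2, region={"C"}))
```

With `threshold=0.0`, only objects that cannot reach the region under the learned model are discarded; a higher threshold prunes more aggressively at the risk of dropping true results.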
computer and communications security | 2015
Yonghui Xiao; Li Xiong
Transactions on Data Privacy | 2014
Yonghui Xiao; Li Xiong; Liyue Fan; Slawomir Goryczka; Haoran Li
arXiv: Databases | 2018
Yang Cao; Yonghui Xiao; Li Xiong; Liquan Bai