Luca Bonomi | Researchain

Publication

Featured research published by Luca Bonomi.

Conference on Information and Knowledge Management | 2013

A two-phase algorithm for mining sequential patterns with differential privacy

Luca Bonomi; Li Xiong

Frequent sequential pattern mining is a central task in many fields such as biology and finance. However, release of these patterns is raising increasing concerns on individual privacy. In this paper, we study the sequential pattern mining problem under the differential privacy framework which provides formal and provable guarantees of privacy. Due to the nature of the differential privacy mechanism which perturbs the frequency results with noise, and the high dimensionality of the pattern space, this mining problem is particularly challenging. In this work, we propose a novel two-phase algorithm for mining both prefixes and substring patterns. In the first phase, our approach takes advantage of the statistical properties of the data to construct a model-based prefix tree which is used to mine prefixes and a candidate set of substring patterns. The frequency of the substring patterns is further refined in the successive phase where we employ a novel transformation of the original data to reduce the perturbation noise. Extensive experiment results using real datasets showed that our approach is effective for mining both substring and prefix patterns in comparison to the state-of-the-art solutions.
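
As a rough illustration of the noise perturbation the abstract refers to, the following sketch applies the standard Laplace mechanism to counts of fixed-length prefixes. It is not the two-phase algorithm from the paper: the function names, the `alphabet` and `length` parameters, and the choice to enumerate every candidate prefix are assumptions made for this example.

```python
import itertools
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_prefix_counts(sequences, alphabet, length, epsilon, rng):
    """Return Laplace-perturbed counts for every candidate prefix of `length`.

    Hypothetical helper: each sequence contributes to at most one prefix
    of a given length, so adding or removing one sequence changes at most
    one count by 1 (sensitivity 1), and the noise scale is 1 / epsilon.
    """
    true_counts = Counter(s[:length] for s in sequences if len(s) >= length)
    scale = 1.0 / epsilon
    # Enumerate the whole candidate domain, not only observed prefixes,
    # so the set of released keys does not depend on the data.
    candidates = ("".join(p) for p in itertools.product(alphabet, repeat=length))
    return {c: true_counts.get(c, 0) + laplace_noise(scale, rng) for c in candidates}
```

The candidate domain grows exponentially with `length`, which is one face of the high-dimensionality problem the abstract mentions.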

Conference on Information and Knowledge Management | 2012

Frequent grams based embedding for privacy preserving record linkage

Luca Bonomi; Li Xiong; Rui Chen; Benjamin C. M. Fung

In this paper, we study the problem of privacy preserving record linkage which aims to perform record linkage without revealing anything about the non-linked records. We propose a new secure embedding strategy based on frequent variable length grams which allows record linkage on the embedded space. The frequent grams used for constructing the embedding base are mined from the original database under the framework of differential privacy. Compared with the state-of-the-art secure matching schema [15], our approach provides formal, provable privacy guarantees and achieves better scalability while providing comparable utility.
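
The idea of linking records in an embedded space can be sketched as follows. This is a simplified, non-secure illustration: it assumes the base of grams has already been mined, uses a plain presence/absence vector, and compares vectors by Hamming distance. The names `embed` and `hamming`, and the example base, are hypothetical and do not reproduce the paper's embedding or its differentially private mining step.

```python
def embed(record, base):
    # One coordinate per gram in the base: 1 if the gram occurs in the record.
    return [1 if gram in record else 0 for gram in base]


def hamming(u, v):
    # Number of coordinates on which the two embedded vectors disagree.
    return sum(a != b for a, b in zip(u, v))


# Hypothetical base of variable-length grams.
base = ["jo", "hn", "smi", "th", "an"]
a = embed("john smith", base)
b = embed("jon smith", base)
c = embed("mary ann", base)
```

Under this sketch, similar strings such as "john smith" and "jon smith" land close together in the embedded space, while unrelated strings land far apart, so a distance threshold can decide which pairs to link.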

International World Wide Web Conferences | 2014

Monitoring web browsing behavior with differential privacy

Liyue Fan; Luca Bonomi; Li Xiong; Vaidy S. Sunderam

Monitoring web browsing behavior has benefited many data mining applications, such as top-K discovery and anomaly detection. However, releasing private user data to the greater public would concern web users about their privacy, especially after the incident of AOL search log release where anonymization was not correctly done. In this paper, we adopt differential privacy, a strong, provable privacy definition, and show that differentially private aggregates of web browsing activities can be released in real-time while preserving the utility of shared data. Our proposed algorithms utilize the rich correlation of the time series of aggregated data and adopt a state-space approach to estimate the underlying, true aggregates from the perturbed values by the differential privacy mechanism. We evaluate our algorithms with real-world web browsing data. Utility evaluations with three metrics demonstrate that the quality of the private, released data by our solutions closely resembles that of the original, unperturbed aggregates.
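
One common state-space estimator is the scalar Kalman filter, sketched below for a series of perturbed aggregates. The random-walk process model, the treatment of Laplace noise as if it were Gaussian with the same variance, and the names `laplace_variance` and `kalman_filter` are assumptions for this illustration and are not taken from the paper.

```python
def laplace_variance(scale):
    # Variance of a Laplace distribution with the given scale parameter.
    return 2.0 * scale * scale


def kalman_filter(noisy, process_var, measurement_var):
    """Estimate the underlying series from noisy observations.

    Assumed model: x_k = x_{k-1} + w_k (random walk with variance
    `process_var`) and z_k = x_k + v_k (noise variance `measurement_var`).
    """
    estimate = noisy[0]
    error_var = measurement_var
    estimates = [estimate]
    for z in noisy[1:]:
        # Predict: the state carries over, the uncertainty grows.
        prior_var = error_var + process_var
        # Correct: blend the prediction with the new observation.
        gain = prior_var / (prior_var + measurement_var)
        estimate = estimate + gain * (z - estimate)
        error_var = (1.0 - gain) * prior_var
        estimates.append(estimate)
    return estimates
```

For counts perturbed with Laplace noise of scale `1 / epsilon`, `measurement_var` could be set to `laplace_variance(1 / epsilon)`. A smaller `process_var` trusts the temporal correlation more and smooths harder.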

Very Large Data Bases | 2013

Mining frequent patterns with differential privacy

Luca Bonomi; Li Xiong

The mining of frequent patterns is a fundamental component in many data mining tasks. A considerable amount of research on this problem has led to a wide series of efficient and scalable algorithms for mining frequent patterns. However, releasing these patterns raises concerns about the privacy of the users represented in the data. Indeed, the information from the patterns can be linked with a large amount of data available from other sources, creating opportunities for adversaries to break the individual privacy of the users and disclose sensitive information. In this proposal, we study the mining of frequent patterns in a privacy preserving setting. We first investigate the difference between sequential and itemset patterns, and second we extend the definition of patterns by considering the absence and presence of noise in the data. This leads us to distinguish between exact and noisy patterns. For exact patterns, we describe two novel mining techniques that we previously developed. The first approach has been applied in a privacy preserving record linkage setting, where our solution is used to mine frequent patterns which are employed in a secure transformation procedure to link records that are similar. The second approach improves the mining utility results using a two-phase strategy which allows frequent substring and prefix patterns to be mined effectively. For noisy patterns, first we formally define the patterns according to the type of noise and second we provide a set of potential applications that require the mining of these patterns. We conclude the paper by stating the challenges in this new setting and possible future research directions.

Annals of the New York Academy of Sciences | 2017

Genome Privacy: Challenges, Technical Approaches to Mitigate Risk, and Ethical Considerations in the United States

Shuang Wang; Xiaoqian Jiang; Siddharth Singh; Rebecca A. Marmor; Luca Bonomi; Dov Fox; Michelle Dow; Lucila Ohno-Machado

Accessing and integrating human genomic data with phenotypes are important for biomedical research. Making genomic data accessible for research purposes, however, must be handled carefully to avoid leakage of sensitive individual information to unauthorized parties and improper use of data. In this article, we focus on data sharing within the scope of data accessibility for research. Current common practices to gain biomedical data access are strictly rule based, without a clear and quantitative measurement of the risk of privacy breaches. In addition, several types of studies require privacy‐preserving linkage of genotype and phenotype information across different locations (e.g., genotypes stored in a sequencing facility and phenotypes stored in an electronic health record) to accelerate discoveries. The computer science community has developed a spectrum of techniques for data privacy and confidentiality protection, many of which have yet to be tested on real‐world problems. In this article, we discuss clinical, technical, and ethical aspects of genome data privacy and confidentiality in the United States, as well as potential solutions for privacy‐preserving genotype–phenotype linkage in biomedical research.

International Conference on Management of Data | 2013

LinkIT: privacy preserving record linkage and integration via transformations

Luca Bonomi; Li Xiong; James J. Lu

We propose to demonstrate an open-source tool, LinkIT, for privacy preserving record Linkage and Integration via data Transformations. LinkIT implements novel algorithms that support data transformations for linking sensitive attributes, and is designed to work with our previously developed tool, FRIL (Fine-grained Record Integration and Linkage), to provide a complete record linkage solution. LinkIT can also be used as a stand-alone secure transformation tool to link string records. The system uses a novel embedding technique based on frequent variable length grams mined from original records with differential privacy, and utilizes a personalized threshold for performing linkage in the embedded space. Compared to the state-of-the-art secure transformation method [16], LinkIT guarantees stronger privacy with better scalability while achieving comparable utility results.

IEEE Transactions on Big Data | 2016

Big Data Privacy in Biomedical Research

Shuang Wang; Luca Bonomi; Wenrui Dai; Feng Chen; Cynthia Cheung; Cinnamon S. Bloss; Samuel Cheng; Xiaoqian Jiang

Biomedical research often involves studying patient data that contain personal information. Inappropriate use of these data might lead to leakage of sensitive information, which can put patient privacy at risk. The problem of preserving patient privacy has received increasing attention in the era of big data. Many privacy methods have been developed to protect against various attack models. This paper reviews relevant topics in the context of biomedical research. We discuss privacy preserving technologies related to (1) record linkage, (2) synthetic data generation, and (3) genomic data privacy. We also discuss the ethical implications of big data privacy in biomedicine and present challenges in future research directions for improving data privacy in biomedical research.

Symposium on Large Spatial Databases | 2017

Multi-user Itinerary Planning for Optimal Group Preference

Liyue Fan; Luca Bonomi; Cyrus Shahabi; Li Xiong

The increasing popularity of location-based applications creates new opportunities for users to travel together. In this paper, we study a novel spatio-social optimization problem, i.e., Optimal Group Route, for multi-user itinerary planning. With our problem formulation, users can individually specify sources and destinations, preferences on the point-of-interest (POI) categories, as well as distance constraints. The goal is to find an itinerary that can be traversed by all the users while maximizing the group’s preference of POI categories in the itinerary. Our work advances existing group trip planning studies by maximizing the group’s social experience. To this end, individual preferences of POI categories are aggregated by considering the agreement and disagreement among group members. Furthermore, planning a multi-user itinerary on large real-world networks is computationally challenging. We propose one approximate solution with bounded approximation ratio and one exact solution which computes the optimal itinerary by exploring a limited number of paths in the road network. In addition, an effective compression algorithm is developed to reduce the size of the network, providing a significant acceleration in our exact solution. We conduct extensive empirical evaluations on the road network and POI datasets of Los Angeles and our results confirm the effectiveness and efficiency of our solutions.

Journal of Biomedical Informatics | 2018

Patient ranking with temporally annotated data

Luca Bonomi; Xiaoqian Jiang

Modern medical information systems enable the collection of massive temporal health data. Although these data have great potential for advancing medical research, the data exploration and extraction of useful knowledge present significant challenges. In this work, we develop a new pattern matching technique which aims to facilitate the discovery of clinically useful knowledge from large temporal datasets. Our approach receives as input a set of temporal patterns modeling specific events of interest (e.g., doctors' knowledge, symptoms of diseases) and returns data instances matching these patterns (e.g., patients exhibiting the specified symptoms). The resulting instances are ranked according to a significance score based on the p-value. Our experimental evaluations on a real-world dataset demonstrate the efficiency and effectiveness of our approach.

Statistical Methods in Medical Research | 2018

Linking temporal medical records using non-protected health information data

Luca Bonomi; Xiaoqian Jiang

Modern medical research relies on multi-institutional collaborations which enhance the knowledge discovery and data reuse. While these collaborations allow researchers to perform analytics otherwise impossible on individual datasets, they often pose significant challenges in the data integration process. Due to the lack of a unique identifier, data integration solutions often have to rely on patient’s protected health information (PHI). In many situations, such information cannot leave the institutions or must be strictly protected. Furthermore, the presence of noisy values for these attributes may result in poor overall utility. While much research has been done to address these challenges, most of the current solutions are designed for a static setting without considering the temporal information of the data (e.g. EHR). In this work, we propose a novel approach that uses non-PHI for linking patient longitudinal data. Specifically, our technique captures the diagnosis dependencies using patterns which are shown to provide important indications for linking patient records. Our solution can be used as a standalone technique to perform temporal record linkage using non-protected health information data or it can be combined with Privacy Preserving Record Linkage solutions (PPRL) when protected health information is available. In this case, our approach can solve ambiguities in results. Experimental evaluations on real datasets demonstrate the effectiveness of our technique.
