Ali Inan

University of Texas at Dallas

Publication

Featured research published by Ali Inan.

extending database technology | 2010

Private record matching using differential privacy

Ali Inan; Murat Kantarcioglu; Gabriel Ghinita; Elisa Bertino

Private matching between datasets owned by distinct parties is a challenging problem with several applications. Private matching allows two parties to identify the records that are close to each other according to some distance functions, such that no additional information other than the join result is disclosed to any party. Private matching can be solved securely and accurately using secure multi-party computation (SMC) techniques, but such an approach is prohibitively expensive in practice. Previous work proposed the release of sanitized versions of the sensitive datasets which allows blocking, i.e., filtering out sub-sets of records that cannot be part of the join result. This way, SMC is applied only to a small fraction of record pairs, reducing the matching cost to acceptable levels. The blocking step is essential for the privacy, accuracy and efficiency of matching. However, the state-of-the-art focuses on sanitization based on k-anonymity, which does not provide sufficient privacy. We propose an alternative design centered on differential privacy, a novel paradigm that provides strong privacy guarantees. The realization of the new model presents difficult challenges, such as the evaluation of distance-based matching conditions with the help of only a statistical queries interface. Specialized versions of data indexing structures (e.g., kd-trees) also need to be devised, in order to comply with differential privacy. Experiments conducted on the real-world Census-income dataset show that, although our methods provide strong privacy, their effectiveness in reducing matching cost is not far from that of k-anonymity based counterparts.
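As a rough illustration of the statistical-query interface mentioned in the abstract above, the sketch below answers a count query with Laplace noise, a standard mechanism for differentially private counts. The function names, the `epsilon` value, and the toy ages are illustrative assumptions and are not taken from the paper.

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(records, predicate, epsilon, rng=random):
    """Answer a count query with epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1,
    so the sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical data: ages held by one party.
ages = [23, 35, 41, 29, 52, 38, 61, 33]
rng = random.Random(0)
print(noisy_count(ages, lambda a: 30 <= a < 40, epsilon=1.0, rng=rng))
```

A smaller `epsilon` gives stronger privacy but noisier counts, which in a blocking setting would make it harder to rule out regions that cannot contain matches.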

data and knowledge engineering | 2007

Privacy preserving clustering on horizontally partitioned data

Ali Inan; Selim Volkan Kaya; Yücel Saygin; Erkay Savas; Ayça Azgin Hintoglu; Albert Levi

Data mining has been a popular research area for more than a decade due to its vast spectrum of applications. However, the popularity and wide availability of data mining tools also raised concerns about the privacy of individuals. The aim of privacy preserving data mining researchers is to develop data mining techniques that could be applied on databases without violating the privacy of individuals. Privacy preserving techniques for various data mining models have been proposed, initially for classification on centralized data then for association rules in distributed environments. In this work, we propose methods for constructing the dissimilarity matrix of objects from different sites in a privacy preserving manner which can be used for privacy preserving clustering as well as database joins, record linkage and other operations that require pair-wise comparison of individual private data objects horizontally distributed to multiple sites. We evaluate the communication and computation complexity of our protocol through experiments on synthetically generated and real datasets. Each experiment is also run with a baseline protocol that has no privacy protection, so that comparing the two protocols shows the overhead introduced by security and privacy.
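To make the object being computed concrete, here is a minimal sketch of a dissimilarity matrix built over two horizontal partitions with no privacy protection at all, in the spirit of the baseline mentioned in the abstract above. It is not the authors' protocol; the function name, the Euclidean distance, and the sample points are assumptions for illustration.

```python
import math

def dissimilarity_matrix(site_a, site_b):
    """Pairwise Euclidean distances over the union of two horizontal partitions.

    Both sites hold records with the same attributes; rows and columns
    are indexed by site_a's objects followed by site_b's objects.
    """
    objects = list(site_a) + list(site_b)
    n = len(objects)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(objects[i], objects[j])
            matrix[i][j] = matrix[j][i] = d  # the matrix is symmetric
    return matrix

# Hypothetical numeric records held by two sites.
site_a = [(0.0, 0.0), (3.0, 4.0)]
site_b = [(0.0, 8.0)]
for row in dissimilarity_matrix(site_a, site_b):
    print(row)
```

In this plain version every raw value is visible to whoever computes the matrix; the entries that compare an object from one site with an object from the other are the ones a privacy preserving protocol would need to protect.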

international conference on data engineering | 2008

A Hybrid Approach to Private Record Linkage

Ali Inan; Murat Kantarcioglu; Elisa Bertino; Monica Scannapieco

Real-world entities are not always represented by the same set of features in different data sets. Therefore matching and linking records corresponding to the same real-world entity distributed across these data sets is a challenging task. If the data sets contain private information, the problem becomes even harder due to privacy concerns. Existing solutions of this problem mostly follow two approaches: sanitization techniques and cryptographic techniques. The former achieves privacy by perturbing sensitive data at the expense of degrading matching accuracy. The latter, on the other hand, attains both privacy and high accuracy under heavy communication and computation costs. In this paper, we propose a method that combines these two approaches and enables users to trade off between privacy, accuracy and cost. Experiments conducted on real data sets show that our method has significantly lower costs than cryptographic techniques and yields much more accurate matching results compared to sanitization techniques, even when the data sets are perturbed extensively.
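One way to picture how sanitized data and cryptographic comparison could be combined is sketched below: values released as intervals settle the clear-cut pairs, and only the undecided pairs would go on to an expensive exact comparison. This is an assumed simplification for a single numeric attribute, not the paper's actual method; `classify_pair`, the interval representation, and the threshold are hypothetical.

```python
def classify_pair(interval_a, interval_b, threshold):
    """Label a pair of generalized numeric values.

    Each value is released as an interval (low, high) that contains
    the true value; two records match if |a - b| <= threshold.
    """
    a_lo, a_hi = interval_a
    b_lo, b_hi = interval_b
    # Smallest and largest possible |a - b| given only the intervals.
    min_dist = max(0, a_lo - b_hi, b_lo - a_hi)
    max_dist = max(a_hi - b_lo, b_hi - a_lo)
    if max_dist <= threshold:
        return "match"
    if min_dist > threshold:
        return "non-match"
    return "undecided"  # would be handed to the costly cryptographic step

print(classify_pair((20, 25), (22, 24), threshold=5))
print(classify_pair((20, 25), (40, 45), threshold=5))
print(classify_pair((20, 30), (28, 38), threshold=5))
```

Wider intervals reveal less about each record but leave more pairs undecided, which is one way to read the trade-off between privacy and cost that the abstract describes.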

international conference on data engineering | 2009

Using Anonymized Data for Classification

Ali Inan; Murat Kantarcioglu; Elisa Bertino

In recent years, anonymization methods have emerged as an important tool to preserve individual privacy when releasing privacy sensitive data sets. This interest in anonymization techniques has resulted in a plethora of methods for anonymizing data under different privacy and utility assumptions. At the same time, there has been little research addressing how to effectively use the anonymized data for data mining in general and for distributed data mining in particular. In this paper, we propose a new approach for building classifiers using anonymized data by modeling anonymized data as uncertain data. In our method, we do not assume any probability distribution over the data. Instead, we propose collecting all necessary statistics during anonymization and releasing these together with the anonymized data. We show that releasing such statistics does not violate anonymity. Experiments spanning various alternatives both in local and distributed data mining settings reveal that our method performs better than heuristic approaches for handling anonymized data.

international conference on data engineering | 2006

Privacy Preserving Clustering on Horizontally Partitioned Data

Ali Inan; Yücel Saygin; Erkay Savas; Ayça Azgin Hintoglu; Albert Levi

Data mining has been a popular research area for more than a decade due to its vast spectrum of applications. The power of data mining tools to extract hidden information that cannot be otherwise seen by simple querying proved to be useful. However, the popularity and wide availability of data mining tools also raised concerns about the privacy of individuals. The aim of privacy preserving data mining researchers is to develop data mining techniques that could be applied on databases without violating the privacy of individuals. Privacy preserving techniques for various data mining models have been proposed, initially for classification on centralized data then for association rules in distributed environments. In this work, we propose methods for constructing the dissimilarity matrix of objects from different sites in a privacy preserving manner which can be used for privacy preserving clustering as well as database joins, record linkage and other operations that require pair-wise comparison of individual private data objects horizontally distributed to multiple sites.

very large data bases | 2009

Query Optimization in Encrypted Relational Databases by Vertical Schema Partitioning

Mustafa Canim; Murat Kantarcioglu; Ali Inan

Security and privacy concerns, as well as legal considerations, force many companies to encrypt the sensitive data in their databases. However, storing the data in encrypted format entails significant performance penalties during query processing. In this paper, we address several design issues related to querying encrypted relational databases. The experiments we conducted on benchmark datasets show that excessive decryption costs during query processing result in a CPU bottleneck. As a solution, we propose a new method based on schema decomposition that partitions sensitive and non-sensitive attributes of a relation into two separate relations. Our method improves the system performance dramatically by overlapping disk I/O latency with CPU-intensive operations (i.e., encryption/decryption).
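The schema decomposition described in the abstract above can be pictured with a small sketch that splits one relation into a sensitive and a non-sensitive relation sharing a key. Encryption and the I/O overlap are deliberately left out; the function names, the `id` key, and the sample attributes are assumptions for illustration only.

```python
def partition_relation(rows, key, sensitive_attrs):
    """Split each row into a sensitive part and a non-sensitive part.

    Both parts keep the key so the original row can be rebuilt by a join.
    """
    sensitive, public = [], []
    for row in rows:
        sensitive.append({k: v for k, v in row.items()
                          if k == key or k in sensitive_attrs})
        public.append({k: v for k, v in row.items()
                       if k not in sensitive_attrs})
    return sensitive, public

def rejoin(sensitive, public, key):
    """Rebuild the original rows by joining the two relations on the key."""
    by_key = {r[key]: r for r in sensitive}
    return [{**p, **by_key[p[key]]} for p in public]

# Hypothetical relation with one sensitive attribute.
customers = [
    {"id": 1, "city": "Dallas", "ssn": "000-00-0001"},
    {"id": 2, "city": "Austin", "ssn": "000-00-0002"},
]
sensitive, public = partition_relation(customers, "id", {"ssn"})
print(public)
print(rejoin(sensitive, public, "id") == customers)
```

With this layout only the sensitive relation would need to be stored encrypted, so a query that touches only `city` would not pay any decryption cost.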

international conference on data mining | 2005

Suppressing data sets to prevent discovery of association rules

Ayça Azgin Hintoglu; Ali Inan; Yücel Saygin; Mehmet Keskinoz

Enterprises have been collecting data for many reasons, including better customer relationship management and high-level decision making. Public safety was another motivation for large-scale data collection efforts initiated by government agencies. However, such widespread data collection efforts coupled with powerful data analysis tools raised concerns about privacy, because the collected data may contain confidential information. One method to ensure privacy is to selectively hide confidential information from the data sets to be disclosed. In this paper, we focus on hiding confidential correlations. We introduce a heuristic to reduce the information loss and propose a blocking method that prevents discovery of confidential correlations while preserving the usefulness of the data set.

international conference on data mining | 2016

Explode: An Extensible Platform for Differentially Private Data Analysis

Emir Esmerdag; Mehmet Emre Gursoy; Ali Inan; Yücel Saygin

Differential privacy (DP) has emerged as a popular standard for privacy protection and received great attention from the research community. However, practitioners often find DP cumbersome to implement, since it requires additional protocols (e.g., for randomized response, noise addition) and changes to existing database systems. To avoid these issues we introduce Explode, a platform for differentially private data analysis. The power of Explode comes from its ease of deployment and use: The data owner can install Explode on top of an SQL server, without modifying any existing components. Explode then hosts a web application that allows users to conveniently perform many popular data analysis tasks through a graphical user interface, e.g., issuing statistical queries, classification, correlation analysis. Explode automatically converts these tasks to collections of SQL queries, and uses the techniques in [3] to determine the right amount of noise that should be added to satisfy DP while producing high utility outputs. This paper describes the current implementation of Explode, together with potential improvements and extensions.

Ubiquitous knowledge discovery | 2010