Publication


Featured research published by Mustafa Canim.


Very Large Data Bases | 2012

Secure multidimensional range queries over outsourced data

Bijit Hore; Sharad Mehrotra; Mustafa Canim; Murat Kantarcioglu

In this paper, we study the problem of supporting multidimensional range queries on encrypted data. The problem is motivated by secure data outsourcing applications in which a client stores his/her data on a remote server in encrypted form and wants to execute queries using the server's computational capabilities. The solution approach is to compute a secure indexing tag of the data by applying bucketization (a generic form of data partitioning), which prevents the server from learning exact values but still allows it to check whether a record satisfies the query predicate. Queries are evaluated in an approximate manner, and the returned set of records may contain false positives. These records then need to be weeded out by the client, which constitutes the computational overhead of our scheme. We develop a bucketization procedure for answering multidimensional range queries. For a given bucketization scheme, we derive cost and disclosure-risk metrics that estimate the client's computational overhead and the disclosure risk, respectively. Given a multidimensional dataset, its bucketization is posed as an optimization problem whose goal is to minimize the risk of disclosure while keeping the query cost (the client's computational overhead) below a user-specified threshold. We provide a tunable data bucketization algorithm that allows the data owner to control the trade-off between disclosure risk and cost. We also study the trade-off characteristics through an extensive set of experiments on real and synthetic data.
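
A minimal sketch of the bucketization idea, assuming equi-width buckets over a one-dimensional attribute: the server sees only bucket tags, returns every record whose bucket overlaps the query range, and the client filters out the false positives. The record layout and the helper names are illustrative, not the paper's optimization algorithm.

import bisect

def make_buckets(domain_min, domain_max, num_buckets):
    """Sorted bucket boundaries partitioning [domain_min, domain_max)."""
    width = (domain_max - domain_min) / num_buckets
    return [domain_min + i * width for i in range(num_buckets + 1)]

def bucket_id(value, boundaries):
    """The tag stored with the record: only the bucket index, not the value."""
    return bisect.bisect_right(boundaries, value) - 1

def server_candidates(records, boundaries, lo, hi):
    """Server side: return every record whose bucket overlaps [lo, hi]."""
    lo_b, hi_b = bucket_id(lo, boundaries), bucket_id(hi, boundaries)
    return [r for r in records if lo_b <= r["tag"] <= hi_b]

def client_filter(candidates, lo, hi):
    """Client side: discard false positives (this is the scheme's overhead)."""
    return [r for r in candidates if lo <= r["value"] <= hi]

boundaries = make_buckets(0, 100, 10)
# "value" stands in for the encrypted payload; the server inspects only "tag".
data = [{"value": v, "tag": bucket_id(v, boundaries)} for v in (3, 17, 42, 58, 91)]
print(client_filter(server_candidates(data, boundaries, 15, 45), 15, 45))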


Very Large Data Bases | 2010

SSD bufferpool extensions for database systems

Mustafa Canim; George A. Mihaila; Bishwaranjan Bhattacharjee; Kenneth A. Ross; Christian A. Lang

High-end solid state disks (SSDs) provide much faster access to data compared to conventional hard disk drives. We present a technique for using solid-state storage as a caching layer between RAM and hard disks in database management systems. By caching data that is accessed frequently, disk I/O is reduced. For random I/O, the potential performance gains are particularly significant. Our system continuously monitors the disk access patterns to identify hot regions of the disk. Temperature statistics are maintained at the granularity of an extent (i.e., 32 pages) and are kept current through an aging mechanism. Unlike prior caching methods, once the SSD is populated with pages from warm regions, cold pages are not admitted into the cache, leading to low levels of cache pollution. Simulations based on DB2 I/O traces and a prototype implementation within DB2 both show substantial performance improvements.
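
A minimal sketch of the extent-level temperature bookkeeping described above, assuming a decay-based aging step and a fixed admission threshold (both illustrative): pages from extents that have grown warm are admitted to the SSD cache, while cold pages never are.

from collections import defaultdict

PAGES_PER_EXTENT = 32          # granularity of the temperature statistics
AGING_FACTOR = 0.5             # illustrative decay applied periodically
ADMIT_THRESHOLD = 8.0          # illustrative "warm enough" cutoff

temperature = defaultdict(float)   # extent id -> decayed access count
ssd_cache = set()                  # page ids currently cached on the SSD

def record_access(page_id):
    """Called on every disk page access; bumps the owning extent's temperature."""
    extent = page_id // PAGES_PER_EXTENT
    temperature[extent] += 1.0
    if temperature[extent] >= ADMIT_THRESHOLD:
        ssd_cache.add(page_id)     # page comes from a warm region: admit it
    # pages from cold extents are simply never admitted, limiting cache pollution

def age_temperatures():
    """Run periodically so stale hot spots cool down over time."""
    for extent in temperature:
        temperature[extent] *= AGING_FACTOR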


International Conference of the IEEE Engineering in Medicine and Biology Society | 2012

Secure Management of Biomedical Data With Cryptographic Hardware

Mustafa Canim; Murat Kantarcioglu; Bradley Malin

The biomedical community is increasingly migrating toward research endeavors that depend on large quantities of genomic and clinical data. At the same time, various regulations require that such data be shared beyond the initial collecting organization (e.g., an academic medical center). It is of critical importance to ensure that when such data are shared and managed, this is done in a manner that upholds the privacy of the corresponding individuals and the overall security of the system. In general, organizations have attempted to achieve these goals through deidentification methods that remove explicitly and potentially identifying features (e.g., names, dates, and geocodes). However, a growing number of studies demonstrate that deidentified data can be reidentified to named individuals using simple automated methods. As an alternative, it was shown that biomedical data could be shared, managed, and analyzed through practical cryptographic protocols without revealing the contents of any particular record. Yet, such protocols required the inclusion of multiple third parties, which may not always be feasible under trust or bandwidth constraints. Thus, in this paper, we introduce a framework that removes the need for multiple third parties by collocating the services that store and process sensitive biomedical data through the integration of cryptographic hardware. Within this framework, we define a secure protocol to process genomic data and perform a series of experiments to demonstrate that such an approach can be run efficiently for typical biomedical investigations.
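
A minimal conceptual sketch of the collocated-hardware idea, using Python's cryptography package as a stand-in for hardware-backed encryption: the host stores only ciphertext, and a routine standing in for the secure coprocessor decrypts inside its boundary and releases only an aggregate answer. The variant-counting query and the symmetric key handling are illustrative assumptions, not the paper's protocol.

from cryptography.fernet import Fernet

key = Fernet.generate_key()      # in the framework this key would live inside the hardware
hw_cipher = Fernet(key)

def client_encrypt(genotypes):
    """Data contributors encrypt records before they reach the untrusted host.
    (Using the same symmetric key here is a simplification of the real key setup.)"""
    return [hw_cipher.encrypt(g.encode()) for g in genotypes]

def secure_count(ciphertexts, variant):
    """Stands in for code running inside the coprocessor: plaintext never leaves here."""
    return sum(1 for c in ciphertexts if hw_cipher.decrypt(c).decode() == variant)

stored = client_encrypt(["AA", "AG", "GG", "AG"])
print(secure_count(stored, "AG"))    # the host learns only the count: 2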


Very Large Data Bases | 2010

Building disclosure risk aware query optimizers for relational databases

Mustafa Canim; Murat Kantarcioglu; Bijit Hore; Sharad Mehrotra

Many DBMS products on the market provide built-in encryption support to address the security concerns of organizations. This solution is quite effective in preventing data leakage from compromised or stolen storage devices. However, recent studies show that a significant portion of leaked records were obtained using specialized malware that can access the main memory of systems. Such malware can easily capture sensitive information that is decrypted in memory, including the cryptographic keys used to decrypt it. This can further compromise the security of data residing on disk that is encrypted with the same keys. In this paper we quantify the disclosure risk of encrypted data in a relational DBMS under main-memory-based attacks and propose modifications to the standard query processing mechanism to minimize such risks. Specifically, we propose query optimization techniques and disclosure models to design a data-sensitivity-aware query optimizer. We implemented a prototype DBMS by modifying both the storage engine and the optimizer of the MySQL-InnoDB server. The experimental results show that the disclosure risk of such attacks can be reduced dramatically while incurring a small performance overhead in most cases.
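
A minimal sketch of a disclosure-risk-aware plan choice, assuming each candidate plan is annotated with an estimated execution cost and an estimate of how much sensitive data it would decrypt in memory; the plan records, field names, and fallback rule are illustrative assumptions, not MySQL-InnoDB internals.

def choose_plan(plans, cost_budget):
    """Among plans within the cost budget, pick the one decrypting the least sensitive data."""
    feasible = [p for p in plans if p["io_cost"] <= cost_budget]
    if not feasible:                      # budget too tight: fall back to the cheapest plan
        return min(plans, key=lambda p: p["io_cost"])
    return min(feasible, key=lambda p: p["sensitive_pages_decrypted"])

plans = [
    {"name": "index_scan_on_ssn", "io_cost": 120, "sensitive_pages_decrypted": 900},
    {"name": "index_scan_on_zip", "io_cost": 150, "sensitive_pages_decrypted": 40},
    {"name": "full_table_scan",   "io_cost": 400, "sensitive_pages_decrypted": 2000},
]
print(choose_plan(plans, cost_budget=200)["name"])   # -> index_scan_on_zip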


Very Large Data Bases | 2013

Making updates disk-I/O friendly using SSDs

Mohammad Sadoghi; Kenneth A. Ross; Mustafa Canim; Bishwaranjan Bhattacharjee

Multiversion databases store both current and historical data. Rows are typically annotated with timestamps representing the period when the row is/was valid. We develop novel techniques for reducing index maintenance in multiversion databases, so that indexes can be used effectively for analytical queries over current data without being a heavy burden on transaction throughput. To this end, we re-design persistent index data structures in the storage hierarchy to employ an extra level of indirection. The indirection level is stored on solid state disks, which support very fast random I/O, so traversing the extra level of indirection incurs a relatively small overhead. The extra level of indirection dramatically reduces the number of magnetic disk I/Os needed for index updates and localizes maintenance to indexes on updated attributes. Further, we batch insertions within the indirection layer in order to reduce physical disk I/Os for indexing new records. By reducing the index maintenance overhead on transactions, we enable operational data stores to create more indexes to support queries. We have developed a prototype of our indirection proposal by extending the widely used Generalized Search Tree (GiST) open-source project, which is also employed in PostgreSQL. Our working implementation demonstrates that we can significantly reduce index maintenance and/or query processing cost by a factor of 3. For insertions of new records, our novel batching technique can save up to 90% of the insertion time.
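
A minimal sketch of the extra level of indirection, assuming index leaves store stable logical identifiers (LIDs) and an SSD-resident table maps each LID to the record's current physical identifier (RID); the in-memory dictionaries standing in for the SSD table and the index are illustrative assumptions.

lid_to_rid = {}          # stands in for the SSD-resident indirection table
index_on_name = {}       # index leaves store LIDs, so they are untouched by version changes

def insert(lid, rid, name):
    lid_to_rid[lid] = rid
    index_on_name.setdefault(name, set()).add(lid)

def update_row(lid, new_rid):
    """A new row version touches only the indirection entry (one fast SSD write),
    not every index on the table."""
    lid_to_rid[lid] = new_rid

def lookup(name):
    """Traverse the index, then take one extra hop through the indirection table."""
    return [lid_to_rid[lid] for lid in index_on_name.get(name, ())]

insert(lid=1, rid="page7:slot3", name="canim")
update_row(lid=1, new_rid="page9:slot1")       # indexes on unchanged attributes stay as-is
print(lookup("canim"))                         # -> ['page9:slot1']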


Very Large Data Bases | 2014

Reducing database locking contention through multi-version concurrency

Mohammad Sadoghi; Mustafa Canim; Bishwaranjan Bhattacharjee; Fabian Nagel; Kenneth A. Ross

In multi-version databases, updates and deletions of records by transactions require appending a new record to tables rather than performing in-place updates. This mechanism incurs non-negligible performance overhead in the presence of multiple indexes on a table, where changes need to be propagated to all indexes. Additionally, an uncommitted record update will block other active transactions from using the index to fetch the most recently committed values for the updated record. In general, in order to support snapshot isolation and/or multi-version concurrency, either each active transaction is forced to search a database temporary area (e.g., roll-back segments) to fetch old values of desired records, or each transaction is forced to scan the entire table to find the older versions of a record (in the absence of specialized temporal indexes). In this work, we describe a novel kV-Indirection structure that enables efficient (parallelizable) optimistic and pessimistic multi-version concurrency control by utilizing the old versions of records (at most two versions of each record) to provide direct access to the recent changes of records without the need for temporal indexes. As a result, our technique yields a higher degree of concurrency by reducing the clashes between readers and writers of data and avoiding extended lock delays. We implemented a working prototype of our concurrency model and kV-Indirection structure in a commercial database and conducted an extensive evaluation to demonstrate the benefits of our multi-version concurrency control, obtaining orders-of-magnitude speedup over single-version concurrency control.
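
A minimal sketch of keeping at most two versions of a record behind an indirection entry, so readers find the latest committed value directly instead of searching a roll-back area; the snapshot-visibility rule and class layout are simplified, illustrative assumptions rather than the kV-Indirection implementation.

class VersionEntry:
    def __init__(self, value, commit_ts):
        self.committed = (value, commit_ts)   # latest committed version
        self.pending = None                   # uncommitted version, if any

    def write(self, value, txn_id):
        self.pending = (value, txn_id)        # writer installs the new version; readers are not blocked

    def commit(self, txn_id, commit_ts):
        if self.pending and self.pending[1] == txn_id:
            self.committed = (self.pending[0], commit_ts)
            self.pending = None

    def read(self, snapshot_ts):
        value, ts = self.committed
        return value if ts <= snapshot_ts else None   # see only versions committed before the snapshot

entry = VersionEntry("v1", commit_ts=10)
entry.write("v2", txn_id=42)          # uncommitted update
print(entry.read(snapshot_ts=20))     # -> 'v1' (the reader is not blocked by the writer)
entry.commit(txn_id=42, commit_ts=25)
print(entry.read(snapshot_ts=30))     # -> 'v2'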


Proceedings of the 21st Annual IFIP WG 11.3 Working Conference on Data and Applications Security | 2007

Design and analysis of querying encrypted data in relational databases

Mustafa Canim; Murat Kantarcioglu

Security and privacy concerns, as well as legal considerations, force many companies to encrypt the sensitive data in their databases. However, storing the data in an encrypted format entails non-negligible performance penalties while processing queries. In this paper, we address several design issues related to querying encrypted data in relational databases. Based on our experiments, we propose new and efficient techniques to reduce the cost of cryptographic operations while processing different types of queries. Our techniques enable us not only to overlap the cryptographic operations with I/O latencies but also to reduce the number of block cipher operations with the help of selective decryption capabilities.
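
A minimal sketch of the two ideas named above, overlapping decryption with I/O and decrypting selectively, assuming a single prefetch thread and stub read/decrypt functions (all illustrative): the next encrypted page is fetched in the background while the current page's needed columns are decrypted.

from concurrent.futures import ThreadPoolExecutor

def read_page(page_id):                 # stands in for a blocking disk read
    return {"id": page_id,
            "cols": {"name": b"enc-Alice", "salary": b"enc-" + str(40000 + page_id).encode()}}

def decrypt(ciphertext):                # stands in for a block-cipher decryption
    return ciphertext.decode().replace("enc-", "")

def scan(page_ids, needed_cols):
    with ThreadPoolExecutor(max_workers=1) as io:
        next_page = io.submit(read_page, page_ids[0])
        for i in range(len(page_ids)):
            page = next_page.result()
            if i + 1 < len(page_ids):                       # issue the next read before decrypting
                next_page = io.submit(read_page, page_ids[i + 1])
            # selective decryption: skip columns the query does not reference
            yield {c: decrypt(page["cols"][c]) for c in needed_cols}

print(list(scan([1, 2, 3], needed_cols=["salary"])))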


Very Large Data Bases | 2009

Query Optimization in Encrypted Relational Databases by Vertical Schema Partitioning

Mustafa Canim; Murat Kantarcioglu; Ali Inan

Security and privacy concerns, as well as legal considerations, force many companies to encrypt the sensitive data in their databases. However, storing the data in encrypted format entails significant performance penalties during query processing. In this paper, we address several design issues related to querying encrypted relational databases. The experiments we conducted on benchmark datasets show that excessive decryption costs during query processing result in a CPU bottleneck. As a solution, we propose a new method based on schema decomposition that partitions the sensitive and non-sensitive attributes of a relation into two separate relations. Our method improves system performance dramatically by overlapping disk I/O latency with CPU-intensive operations (i.e., encryption/decryption).
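
A minimal sketch of the vertical partitioning idea, assuming a toy cipher and a hypothetical SENSITIVE column set: the relation is split into a plaintext partition and an encrypted partition joined on the row key, and a query decrypts only the sensitive columns it actually returns.

SENSITIVE = {"ssn", "salary"}            # illustrative choice of sensitive attributes

def encrypt(v): return f"enc({v})"       # stand-in for a block cipher
def decrypt(v): return v[4:-1]

def partition(rows):
    """Split each row into a non-sensitive plaintext part and an encrypted sensitive part."""
    plain, secret = {}, {}
    for key, row in rows.items():
        plain[key] = {c: v for c, v in row.items() if c not in SENSITIVE}
        secret[key] = {c: encrypt(v) for c, v in row.items() if c in SENSITIVE}
    return plain, secret

def select(plain, secret, cols, pred):
    for key, row in plain.items():
        if pred(row):                    # predicate on non-sensitive columns: no decryption needed
            merged = dict(row, **{c: decrypt(secret[key][c]) for c in cols if c in SENSITIVE})
            yield {c: merged[c] for c in cols}

rows = {1: {"name": "a", "dept": "db", "ssn": "123"}, 2: {"name": "b", "dept": "ml", "ssn": "456"}}
plain, secret = partition(rows)
print(list(select(plain, secret, ["name", "ssn"], lambda r: r["dept"] == "db")))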


IEEE International Conference on Cloud Engineering | 2013

System G Data Store: Big, Rich Graph Data Analytics in the Cloud

Mustafa Canim; Yuan-Chi Chang

Big, rich graph data is increasingly captured through the interactions among people (email, messaging, social media), objects (location/map, server/network, product/catalog), and their relations. Graph data analytics, however, poses several intrinsic challenges that are ill suited to the popular MapReduce programming model. This paper presents System G, a graph data management system that supports rich graph data, accepts online updates, is compatible with Hadoop, and runs efficiently by minimizing redundant data shuffling. These capabilities are built on top of Apache HBase for scalability, updatability, and compatibility. The paper introduces several exemplary targeted graph queries and global feature algorithms implemented using the newly available HBase Coprocessors. These graph algorithmic coprocessors execute on the server side directly on locally stored graph data and communicate with remote servers only to exchange dynamic algorithmic state, which is typically a small fraction of the raw data. Performance evaluation on real-world rich graph datasets demonstrates significant improvement over a traditional Hadoop implementation, consistent with what prior work observed for solutions that avoid shuffling graph data. Our work stands out by achieving the same or better performance without introducing incompatibility or scalability limitations.
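
A minimal conceptual sketch of the coprocessor-style execution model in plain Python (not the HBase Coprocessor API): each region computes on the graph edges it stores locally and ships only a small summary, never raw edges. The partitioning and the global-degree query are illustrative assumptions.

from collections import Counter

regions = [                                   # adjacency lists split across servers
    {"a": ["b", "c"], "b": ["c"]},
    {"c": ["a"], "d": ["a", "b"]},
]

def region_side_degrees(local_adj):
    """Runs next to the data, like a coprocessor: returns only a small summary."""
    out = Counter()
    for src, dsts in local_adj.items():
        out[src] += len(dsts)                 # local out-degree contribution
    return out

def global_degrees(all_regions):
    """The client merges small per-region summaries instead of shuffling edges."""
    total = Counter()
    for r in all_regions:
        total += region_side_degrees(r)
    return total

print(global_degrees(regions))   # -> Counter({'a': 2, 'd': 2, 'b': 1, 'c': 1})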


International Congress on Big Data | 2013

Multi-resolution Social Network Community Identification and Maintenance on Big Data Platform

Hidayet Aksu; Mustafa Canim; Yuan-Chi Chang; Ibrahim Korpeoglu; Özgür Ulusoy

Community identification in social networks is of great interest, and with dynamic changes to a network's graph representation and content, the incremental maintenance of communities poses significant computational challenges. Moreover, the intensity of community engagement can be distinguished at multiple levels, resulting in a multi-resolution community representation that has to be maintained over time. In this paper, we first formalize this problem using the k-core metric projected at multiple k values, so that multiple community resolutions are represented with multiple k-core graphs. We then present distributed algorithms to construct and maintain a multi-k-core graph, implemented on the scalable big-data platform Apache HBase. Our experimental evaluation results demonstrate orders-of-magnitude speedup from maintaining the multi-k-core incrementally rather than reconstructing it from scratch. Our algorithms thus enable practitioners to create and maintain communities at multiple resolutions on different topics in rich social network content simultaneously.
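
A minimal sketch of projecting a graph onto multiple community resolutions by computing its k-core for several k values with the classic peeling algorithm; this is the sequential baseline, not the paper's distributed, incrementally maintained HBase implementation.

def k_core(adj, k):
    """Return the vertex set of the k-core of an undirected graph given as adjacency sets."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if len(adj[v]) < k:                   # peel vertices that cannot be in the k-core
                for u in adj[v]:
                    adj[u].discard(v)
                del adj[v]
                changed = True
    return set(adj)

graph = {
    "a": {"b", "c", "d"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "e"}, "e": {"d"},
}
for k in (1, 2, 3):                               # multiple resolutions of the same community
    print(k, sorted(k_core(graph, k)))
# 1 ['a', 'b', 'c', 'd', 'e']   2 ['a', 'b', 'c', 'd']   3 ['a', 'b', 'c', 'd']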

