Nedyalko Borisov
Duke University
Publication
Featured research published by Nedyalko Borisov.
international conference on management of data | 2011
Herodotos Herodotou; Nedyalko Borisov; Shivnath Babu
Table partitioning splits a table into smaller parts that can be accessed, stored, and maintained independently of one another. From their traditional use in improving query performance, partitioning strategies have evolved into a powerful mechanism to improve the overall manageability of database systems. Table partitioning simplifies administrative tasks like data loading, removal, backup, statistics maintenance, and storage provisioning. Query language extensions now enable applications and user queries to specify how their results should be partitioned for further use. However, query optimization techniques have not kept pace with the rapid advances in usage and user control of table partitioning. We address this gap by developing new techniques to generate efficient plans for SQL queries involving multiway joins over partitioned tables. Our techniques are designed for easy incorporation into bottom-up query optimizers that are in wide use today. We have prototyped these techniques in the PostgreSQL optimizer. An extensive evaluation shows that our partition-aware optimization techniques, with low optimization overhead, generate plans that can be an order of magnitude better than plans produced by current optimizers.
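The core idea behind partition-aware join planning can be illustrated with a minimal sketch. The code below is a hypothetical illustration, not the paper's actual algorithm: given two tables range-partitioned on a common join key, it enumerates only the partition pairs whose key ranges overlap, which are the pairs a partition-wise join plan would actually need to join.

```python
# Hypothetical sketch of partition-wise join matching; function and
# partition names are illustrative, not from the paper.

def partition_wise_join_pairs(parts_a, parts_b):
    """Given range partitions {name: (lo, hi)} of two tables joined on
    the partitioning key, return only the partition pairs whose key
    ranges overlap -- the joins a partition-aware plan would run."""
    pairs = []
    for a, (alo, ahi) in parts_a.items():
        for b, (blo, bhi) in parts_b.items():
            if alo <= bhi and blo <= ahi:  # key ranges overlap
                pairs.append((a, b))
    return pairs

# Two tables range-partitioned on the same join key:
r = {"r1": (0, 99), "r2": (100, 199)}
s = {"s1": (0, 99), "s2": (100, 199), "s3": (200, 299)}
print(partition_wise_join_pairs(r, s))  # [('r1', 's1'), ('r2', 's2')]
```

Instead of one large join of R with S, only two small per-partition joins remain; the other four pairs are pruned because their key ranges cannot produce matches.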
PLOS ONE | 2011
Faheem Mitha; Herodotos Herodotou; Nedyalko Borisov; Chen Jiang; Josh Yoder; Kouros Owzar
Background: We describe SNPpy, a hybrid script database system using the Python SQLAlchemy library coupled with the PostgreSQL database to manage genotype data from Genome-Wide Association Studies (GWAS). This system makes it possible to merge study data with HapMap data and merge across studies for meta-analyses, including data filtering based on the values of phenotype and Single-Nucleotide Polymorphism (SNP) data. SNPpy and its dependencies are open source software.
Results: The current version of SNPpy offers utility functions to import genotype and annotation data from two commercial platforms. We use these to import data from two GWAS studies and the HapMap Project. We then export these individual datasets to standard data format files that can be imported into statistical software for downstream analyses.
Conclusions: By leveraging the power of relational databases, SNPpy offers integrated management and manipulation of genotype and phenotype data from GWAS studies. The analysis of these studies requires merging across GWAS datasets as well as patient and marker selection. To this end, SNPpy enables the user to filter the data and output the results as standardized GWAS file formats. It performs low-level and flexible data validation, including validation of patient data. SNPpy is a practical and extensible solution for investigators who seek to deploy central management of their GWAS data.
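The kind of phenotype-based filtering the abstract describes becomes a single declarative query once genotype and patient data live in a relational database. The sketch below uses Python's standard-library sqlite3 with an invented toy schema; it is not SNPpy's actual schema or code, only an illustration of the relational approach.

```python
# Illustrative sketch of relational genotype filtering in the spirit of
# SNPpy; the schema, table, and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (id INTEGER PRIMARY KEY, phenotype TEXT);
CREATE TABLE genotype (patient_id INTEGER, snp TEXT, allele TEXT);
INSERT INTO patient VALUES (1, 'case'), (2, 'control'), (3, 'case');
INSERT INTO genotype VALUES
    (1, 'rs123', 'AA'), (2, 'rs123', 'AG'), (3, 'rs123', 'GG');
""")

# Select genotypes for case patients only: phenotype-based filtering
# expressed as one join instead of ad hoc file munging.
rows = conn.execute("""
    SELECT p.id, g.snp, g.allele
    FROM patient p JOIN genotype g ON g.patient_id = p.id
    WHERE p.phenotype = 'case'
    ORDER BY p.id
""").fetchall()
print(rows)  # [(1, 'rs123', 'AA'), (3, 'rs123', 'GG')]
```

The filtered rows could then be exported to a standard GWAS file format for downstream statistical analysis, which is the workflow the abstract outlines.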
international conference on management of data | 2011
Nedyalko Borisov; Shivnath Babu; Nagapramod Mandagere; Sandeep M. Uttamchandani
Occasional corruption of stored data is an unfortunate byproduct of the complexity of modern systems. Hardware errors, software bugs, and mistakes by human administrators can corrupt important sources of data. The dominant practice to deal with data corruption today involves administrators writing ad hoc scripts that run data-integrity tests at the application, database, file-system, and storage levels. This manual approach is tedious, error-prone, and provides no understanding of the potential system unavailability and data loss if a corruption were to occur. We introduce the Amulet system that addresses the problem of verifying the correctness of stored data proactively and continuously. To our knowledge, Amulet is the first system that: (i) gives administrators a declarative language to specify their objectives regarding the detection and repair of data corruption; (ii) contains optimization and execution algorithms to ensure that the administrators' objectives are met robustly and with least cost, e.g., using pay-as-you-go cloud resources; and (iii) provides timely notification when corruption is detected, allowing proactive repair of corruption before it impacts users and applications. We describe the implementation and a comprehensive evaluation of Amulet for a database software stack deployed on an infrastructure-as-a-service cloud provider.
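A basic building block of proactive integrity checking is recording checksums of stored data and periodically re-verifying them. The sketch below is an illustration of that idea only; it does not show Amulet's declarative language, optimizer, or repair machinery, and all names in it are invented.

```python
# Illustrative checksum-based corruption detection (not Amulet's
# implementation); file names and contents are hypothetical.
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Baseline pass: record a checksum for each stored object.
store = {"orders.db": b"row1,row2,row3"}
baseline = {name: checksum(data) for name, data in store.items()}

# Simulate a silent corruption of the stored bytes.
store["orders.db"] = b"row1,rowX,row3"

# Verification pass: flag objects whose checksum no longer matches.
corrupted = [name for name, data in store.items()
             if checksum(data) != baseline[name]]
print(corrupted)  # ['orders.db']
```

A system like Amulet goes well beyond this, scheduling such checks across the application, database, file-system, and storage levels against administrator-specified cost and timeliness objectives; the sketch only shows why periodic verification catches silent corruption before it reaches users.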
international conference on data engineering | 2011
Nedyalko Borisov; Shivnath Babu; Nagapramod Mandagere; Sandeep M. Uttamchandani
The danger of production or backup data becoming corrupted is a problem that database administrators dread. This position paper aims to bring this problem to the attention of the database research community, which, surprisingly, has by and large overlooked this problem. We begin by pointing out the causes and consequences of data corruption. We then describe the Proactive Checking Framework (PCF), a new framework that enables a database system to deal with data corruption automatically and proactively. We use a prototype implementation of PCF to give deeper insights into the overall problem and to outline a challenging research agenda to address it.
extending database technology | 2013
Nedyalko Borisov; Shivnath Babu
The need to perform testing and tuning of database instances with production-like workloads (W), configurations (C), data (D), and resources (R) arises routinely. The further W, C, D, and R used in testing and tuning deviate from what is observed on the production database instance, the lower is the trustworthiness of the testing and tuning tasks done. For example, it is common to hear about performance degradation observed after the production database is upgraded from one software version to another. A typical cause of this problem is that the W, C, D, or R used during upgrade testing differed in some way from that on the production database. Performing testing and tuning tasks in principled and automated ways is very important, especially since---spurred by innovations in cloud computing---the number of database instances that a database administrator (DBA) has to manage is growing rapidly. We present Flex, a platform for trustworthy testing and tuning of production database instances. Flex gives DBAs a high-level language, called Slang, to specify definitions and objectives regarding running experiments for testing and tuning. Flex's orchestrator schedules and runs these experiments in an automated manner that meets the DBA-specified objectives. Flex has been fully prototyped. We present results from a comprehensive empirical evaluation that reveals the effectiveness of Flex on diverse problems such as upgrade testing, near-real-time testing to detect corruption of data, and server configuration tuning. We also report on our experiences taking some of the testing and tuning software described in the literature and porting them to run on the Flex platform.
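The abstract's notion of an orchestrator that runs experiments subject to DBA-specified objectives can be sketched in a few lines. The code below is purely hypothetical: it does not show Slang's syntax or Flex's actual scheduling algorithm, only the general shape of picking experiments under a cost objective.

```python
# Hypothetical sketch of objective-driven experiment scheduling in the
# spirit of Flex; experiment names, costs, and the greedy policy are
# all invented for illustration.

def schedule(experiments, budget):
    """Greedily pick (name, est_cost) experiments in priority order
    until the specified cost budget would be exceeded."""
    chosen, spent = [], 0
    for name, cost in experiments:
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen, spent

exps = [("upgrade-test", 40), ("corruption-scan", 25), ("config-tune", 50)]
print(schedule(exps, budget=70))  # (['upgrade-test', 'corruption-scan'], 65)
```

In Flex the objectives are richer than a single cost cap (e.g., timeliness of near-real-time corruption testing) and experiments run against production-like W, C, D, and R, but the division of labor is the same: the DBA declares objectives, and the orchestrator decides what to run and when.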
very large data bases | 2009
Nedyalko Borisov; Shivnath Babu; Sandeep M. Uttamchandani; Ramani R. Routray; Aameek Singh
Many enterprise environments have databases running on network-attached storage infrastructure (referred to as Storage Area Networks or SANs). Both the database and the SAN are complex subsystems that are managed by separate teams of administrators. As often as not, database administrators have limited understanding of SAN configuration and behavior, and limited visibility into the SAN's run-time performance; and vice versa for the SAN administrators. Diagnosing the cause of performance problems is a challenging exercise in these environments. We propose to remedy the situation through a novel tool, called Diads, for database and SAN problem diagnosis. This demonstration proposal summarizes the technical innovations in Diads: (i) a powerful abstraction called Annotated Plan Graphs (APGs) that ties together the execution path of queries in the database and the SAN using low-overhead monitoring data, and (ii) a diagnosis workflow that combines domain-specific knowledge with machine-learning techniques. The scenarios presented in the demonstration are also described.
conference on innovative data systems research | 2011
Herodotos Herodotou; Harold Lim; Gang Luo; Nedyalko Borisov; Liang Dong; Fatma Bilgen Cetin; Shivnath Babu
workshop on hot topics in operating systems | 2009
Shivnath Babu; Nedyalko Borisov; Songyun Duan; Herodotos Herodotou; Vamsidhar Thummala
conference on innovative data systems research | 2009
Nedyalko Borisov; Shivnath Babu; Sandeep M. Uttamchandani; Ramani R. Routray; Aameek Singh
file and storage technologies | 2009
Shivnath Babu; Nedyalko Borisov; Sandeep M. Uttamchandani; Ramani R. Routray; Aameek Singh