Aaron J. Elmore | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Aaron J. Elmore is active.

Explore More

Publication

Featured researches published by Aaron J. Elmore.

international conference on management of data | 2015

The BigDAWG Polystore System

Jennie Duggan; Aaron J. Elmore; Michael Stonebraker; Magdalena Balazinska; Bill Howe; Jeremy Kepner; Samuel Madden; David Maier; Timothy G. Mattson; Stan Zdonik

This paper presents a new view of federated databases to address the growing need for managing information that spans multiple data models. This trend is fueled by the proliferation of storage engines and query languages based on the observation that â no one size fits allâ . To address this shift, we propose a polystore architecture; it is designed to unify querying over multiple data models. We consider the challenges and opportunities associated with polystores. Open questions in this space revolve around query optimization and the assignment of objects to storage engines. We introduce our approach to these topics and discuss our prototype in the context of the Intel Science and Technology Center for Big Data

very large data bases | 2012

Serializability, not serial: concurrency control and availability in multi-datacenter datastores

Stacy Patterson; Aaron J. Elmore; Faisal Nawab; Divyakant Agrawal; Amr El Abbadi

We present a framework for concurrency control and availability in multi-datacenter datastores. While we consider Googles Megastore as our motivating example, we define general abstractions for key components, making our solution extensible to any system that satisfies the abstraction properties. We first develop and analyze a transaction management and replication protocol based on a straightforward implementation of the Paxos algorithm. Our investigation reveals that this protocol acts as a concurrency prevention mechanism rather than a concurrency control mechanism. We then propose an enhanced protocol called Paxos with Combination and Promotion (Paxos-CP) that provides true transaction concurrency while requiring the same per instance message complexity as the basic Paxos protocol. Finally, we compare the performance of Paxos and Paxos-CP in a multi-datacenter experimental study, and we demonstrate that Paxos-CP results in significantly fewer aborted transactions than basic Paxos.

ieee high performance extreme computing conference | 2016

The BigDAWG polystore system and architecture

Vijay Gadepally; Peinan Chen; Jennie Duggan; Aaron J. Elmore; Brandon Haynes; Jeremy Kepner; Samuel Madden; Tim Mattson; Michael Stonebraker

Organizations are often faced with the challenge of providing data management solutions for large, heterogenous datasets that may have different underlying data and programming models. For example, a medical dataset may have unstructured text, relational data, time series waveforms and imagery. Trying to fit such datasets in a single data management system can have adverse performance and efficiency effects. As a part of the Intel Science and Technology Center on Big Data, we are developing a polystore system designed for such problems. BigDAWG (short for the Big Data Analytics Working Group) is a polystore system designed to work on complex problems that naturally span across different processing or storage engines. BigDAWG provides an architecture that supports diverse database systems working with different data models, support for the competing notions of location transparency and semantic completeness via islands and a middleware that provides a uniform multi-island interface. Initial results from a prototype of the BigDAWG system applied to a medical dataset validate polystore concepts. In this article, we will describe polystore databases, the current BigDAWG architecture and its application on the MIMIC II medical dataset, initial performance results and our future development plans.

very large data bases | 2015

Collaborative data analytics with DataHub

Anant P. Bhardwaj; Amol Deshpande; Aaron J. Elmore; David R. Karger; Samuel Madden; Aditya G. Parameswaran; Harihar Subramanyam; Eugene Wu; Rebecca Zhang

While there have been many solutions proposed for storing and analyzing large volumes of data, all of these solutions have limited support for collaborative data analytics, especially given the many individuals and teams are simultaneously analyzing, modifying and exchanging datasets, employing a number of heterogeneous tools or languages for data analysis, and writing scripts to clean, preprocess, or query data. We demonstrate DataHub, a unified platform with the ability to load, store, query, collaboratively analyze, interactively visualize, interface with external applications, and share datasets. We will demonstrate the following aspects of the DataHub platform: (a) flexible data storage, sharing, and native versioning capabilities: multiple conference attendees can concurrently update the database and browse the different versions and inspect conflicts; (b) an app ecosystem that hosts apps for various data-processing activities: conference attendees will be able to effortlessly ingest, query, and visualize data using our existing apps; (c) thrift-based data serialization permits data analysis in any combination of 20+ languages, with DataHub as the common data store: conference attendees will be able to analyze datasets in R, Python, and Matlab, while the inputs and the results are still stored in DataHub. In particular, conference attendees will be able to use the DataHub notebook - an IPython-based notebook for analyzing data and storing the results of data analysis.

international conference on management of data | 2015

Squall: Fine-Grained Live Reconfiguration for Partitioned Main Memory Databases

Aaron J. Elmore; Vaibhav Arora; Rebecca Taft; Andrew Pavlo; Divyakant Agrawal; Amr El Abbadi

For data-intensive applications with many concurrent users, modern distributed main memory database management systems (DBMS) provide the necessary scale-out support beyond what is possible with single-node systems. These DBMSs are optimized for the short-lived transactions that are common in on-line transaction processing (OLTP) workloads. One way that they achieve this is to partition the database into disjoint subsets and use a single-threaded transaction manager per partition that executes transactions one-at-a-time in serial order. This minimizes the overhead of concurrency control mechanisms, but requires careful partitioning to limit distributed transactions that span multiple partitions. Previous methods used off-line analysis to determine how to partition data, but the dynamic nature of these applications means that they are prone to hotspots. In these situations, the DBMS needs to reconfigure how data is partitioned in real-time to maintain performance objectives. Bringing the system off-line to reorganize the database is unacceptable for on-line applications. To overcome this problem, we introduce the Squall technique for supporting live reconfiguration in partitioned, main memory DBMSs. Squall supports fine-grained repartitioning of databases in the presence of distributed transactions, high throughput client workloads, and replicated data. An evaluation of our approach on a distributed DBMS shows that Squall can reconfigure a database with no downtime and minimal overhead on transaction latency.

very large data bases | 2016

Decibel: the relational dataset branching system

Michael Maddox; David Goehring; Aaron J. Elmore; Samuel Madden; Aditya G. Parameswaran; Amol Deshpande

As scientific endeavors and data analysis become increasingly collaborative, there is a need for data management systems that natively support the versioning or branching of datasets to enable concurrent analysis, cleaning, integration, manipulation, or curation of data across teams of individuals. Common practice for sharing and collaborating on datasets involves creating or storing multiple copies of the dataset, one for each stage of analysis, with no provenance information tracking the relationships between these datasets. This results not only in wasted storage, but also makes it challenging to track and integrate modifications made by different users to the same dataset. In this paper, we introduce the Relational Dataset Branching System, Decibel, a new relational storage system with built-in version control designed to address these shortcomings. We present our initial design for Decibel and provide a thorough evaluation of three versioned storage engine designs that focus on efficient query processing with minimal storage overhead. We also develop an exhaustive benchmark to enable the rigorous testing of these and future versioned storage engine designs.

computational science and engineering | 2012

The evolving landscape of data management in the cloud

Divyakant Agrawal; Amr El Abbadi; Beng Chin Ooi; Sudipto Das; Aaron J. Elmore

Scalable database management systems (DBMSs) are a critical part of the cloud infrastructure and play an important role in ensuring the smooth transition of applications from the classical enterprise infrastructures to next generation cloud infrastructures. Though scalable data management on distributed platforms has been a vision for more than three decades and much research has focused on large scale data management in traditional enterprise setting, cloud computing brings its own set of novel challenges that must be addressed to ensure the success of data management solutions in the cloud environment that is inherently distributed. This article presents an organised picture of the challenges faced by application developers and DBMS designers in developing and deploying internet scale applications. Our background study encompasses systems for supporting update heavy applications and focuses on providing an in-depth analysis of such systems. We crystallise the design choices made by some successful large scale database management systems, analyse the application demands and access patterns and how the scalable database management systems have evolved to meet such requirements.

ieee high performance extreme computing conference | 2016

Integrating real-time and batch processing in a polystore

John Meehan; Stan Zdonik; Shaobo Tian; Yulong Tian; Nesime Tatbul; Adam Dziedzic; Aaron J. Elmore

This paper describes a stream processing engine called S-Store and its role in the BigDAWG polystore. Fundamentally, S-Store acts as a frontend processor that accepts input from multiple sources, and massages it into a form that has eliminated errors (data cleaning) and translates that input into a form that can be efficiently ingested into BigDAWG. S-Store also acts as an intelligent router that sends input tuples to the appropriate components of BigDAWG. All updates to S-Stores shared memory are done in a transactionally consistent (ACID) way, thereby eliminating new errors caused by non-synchronized reads and writes. The ability to migrate data from component to component of BigDAWG is crucial. We have described a migrator from S-Store to Postgres that we have implemented as a first proof of concept. We report some interesting results using this migrator that impact the evaluation of query plans.

very large data bases | 2013

Towards database virtualization for database as a service

Aaron J. Elmore; Carlo Curino; Divyakant Agrawal; Amr El Abbadi

Advances in operating system and storage-level virtualization technologies have enabled the effective consolidation of heterogeneous applications in a shared cloud infrastructure. Novel research challenges arising from this new shared environment include load balancing, workload estimation, resource isolation, machine replication, live migration, and an emergent need of automation to handle large scale operations with minimal manual intervention. Given that databases are at the core of most applications that are deployed in the cloud, database management systems (DBMSs) represent a very important technology component that needs to be virtualized in order to realize the benefits of virtualization from autonomic management of data-intensive applications in large scale data-centers. The goal of this tutorial is to survey the techniques used in providing elasticity in virtual machine systems, shared storage systems, and survey database research on multitenant architectures and elasticity primitives. This foundation of core Database as a Service advances, together with a primer of important related topics in OS and storage-level virtualization, are central for anyone that wants to operate in this area of research.

international conference on management of data | 2017

OrpheusDB: A Lightweight Approach to Relational Dataset Versioning

Liqi Xu; Silu Huang; SiLi Hui; Aaron J. Elmore; Aditya G. Parameswaran

We demonstrate OrpheusDB, a lightweight approach to versioning of relational datasets. OrpheusDB is built as a thin layer on top of standard relational databases, and therefore inherits much of their benefits while also compactly storing, tracking, and recreating dataset versions on demand. OrpheusDB also supports a range of querying modalities spanning both SQL and git-style version commands. Conference attendees will be able to interact with OrpheusDB via an interactive version browser interface. The demo will highlight underlying design decisions of OrpheusDB, and provide an understanding of how OrpheusDB translates versioning commands into commands understood by a database system that is unaware of the presence of versions. OrpheusDB has been developed as open-source software; code is available at http://orpheus-db.github.io.

Explore More