Jennie Duggan
Northwestern University
Publications
Featured research published by Jennie Duggan.
international conference on management of data | 2011
Jennie Duggan; Ugur Çetintemel; Olga Papaemmanouil; Eli Upfal
Current trends in data management systems, such as cloud and multi-tenant databases, are leading to data processing environments that concurrently execute heterogeneous query workloads. At the same time, these systems need to satisfy diverse performance expectations. In these newly-emerging settings, avoiding potential Quality-of-Service (QoS) violations heavily relies on performance predictability, i.e., the ability to estimate the impact of concurrent query execution on the performance of individual queries in a continuously evolving workload. This paper presents a modeling approach to estimate the impact of concurrency on query performance for analytical workloads. Our solution relies on the analysis of query behavior in isolation, pairwise query interactions and sampling techniques to predict resource contention under various query mixes and concurrency levels. We introduce a simple yet powerful metric that accurately captures the joint effects of disk and memory contention on query performance in a single value. We also discuss predicting the execution behavior of a time-varying query workload through query-interaction timelines, i.e., a fine-grained estimation of the time segments during which discrete mixes will be executed concurrently. Our experimental evaluation on top of PostgreSQL/TPC-H demonstrates that our models can provide query latency predictions within approximately 20% of the actual values in the average case.
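The query-interaction timeline idea above can be illustrated with a minimal Python sketch. It assumes the start and end time of every query is already known, whereas the approach described above must predict them. The sketch only shows how a schedule splits into time segments, each with a fixed concurrent mix. The function name `interaction_timeline` and the query names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Segment = Tuple[float, float, FrozenSet[str]]


def interaction_timeline(intervals: Dict[str, Tuple[float, float]]) -> List[Segment]:
    """Split a schedule into segments during which a fixed mix of queries runs.

    intervals maps a (hypothetical) query name to its (start, end) time.
    Returns (segment_start, segment_end, running_queries) for each non-empty segment.
    """
    # Every start or end time is a point where the concurrent mix can change.
    boundaries = sorted({t for start, end in intervals.values() for t in (start, end)})
    segments: List[Segment] = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        # No boundary lies strictly inside (lo, hi), so a query either runs
        # for the whole segment or not at all.
        mix = frozenset(q for q, (s, e) in intervals.items() if s <= lo and e >= hi)
        if mix:
            segments.append((lo, hi, mix))
    return segments


for seg in interaction_timeline({"q1": (0, 10), "q2": (4, 12)}):
    print(seg)
```

For the two overlapping queries in the usage lines, the schedule splits into three discrete mixes: `q1` alone, `q1` with `q2`, and `q2` alone. A latency model for each mix could then be applied segment by segment.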
international conference on management of data | 2015
Jennie Duggan; Aaron J. Elmore; Michael Stonebraker; Magdalena Balazinska; Bill Howe; Jeremy Kepner; Samuel Madden; David Maier; Timothy G. Mattson; Stan Zdonik
This paper presents a new view of federated databases to address the growing need for managing information that spans multiple data models. This trend is fueled by the proliferation of storage engines and query languages based on the observation that “no one size fits all.” To address this shift, we propose a polystore architecture; it is designed to unify querying over multiple data models. We consider the challenges and opportunities associated with polystores. Open questions in this space revolve around query optimization and the assignment of objects to storage engines. We introduce our approach to these topics and discuss our prototype in the context of the Intel Science and Technology Center for Big Data.
ieee high performance extreme computing conference | 2016
Vijay Gadepally; Peinan Chen; Jennie Duggan; Aaron J. Elmore; Brandon Haynes; Jeremy Kepner; Samuel Madden; Tim Mattson; Michael Stonebraker
Organizations are often faced with the challenge of providing data management solutions for large, heterogeneous datasets that may have different underlying data and programming models. For example, a medical dataset may have unstructured text, relational data, time series waveforms and imagery. Trying to fit such datasets in a single data management system can have adverse performance and efficiency effects. As a part of the Intel Science and Technology Center on Big Data, we are developing a polystore system designed for such problems. BigDAWG (short for the Big Data Analytics Working Group) is a polystore system designed to work on complex problems that naturally span different processing or storage engines. BigDAWG provides an architecture that supports diverse database systems working with different data models, supports the competing notions of location transparency and semantic completeness via islands, and provides a middleware layer that offers a uniform multi-island interface. Initial results from a prototype of the BigDAWG system applied to a medical dataset validate polystore concepts. In this article, we describe polystore databases, the current BigDAWG architecture and its application on the MIMIC II medical dataset, initial performance results, and our future development plans.
extending database technology | 2014
Jennie Duggan; Olga Papaemmanouil; Ugur Çetintemel; Eli Upfal
Predicting query performance under concurrency is a difficult task that has many applications in capacity planning, cloud computing, and batch scheduling. We introduce Contender, a new resource-modeling approach for predicting the concurrent query performance of analytical workloads. Contender’s unique feature is that it can generate effective predictions for both static and ad hoc or dynamic workloads with low training requirements. These characteristics make Contender a practical solution for real-world deployment. Contender relies on models of hardware resource contention to predict concurrent query performance. It introduces two key metrics, Concurrent Query Intensity (CQI) and Query Sensitivity (QS), to characterize the impact of resource contention on query interactions. CQI models how aggressively concurrent queries will use the shared resources. QS defines how a query’s performance changes as a function of the scarcity of resources. Contender integrates these two metrics to effectively estimate a query’s concurrent execution latency using only linear-time sampling of the query mixes. Contender learns from sample query executions (based on known query templates) and uses query plan characteristics to generate latency estimates for previously unseen templates. Our experimental results, obtained from PostgreSQL/TPC-DS, show that Contender’s predictions have an error of 19% for known templates and 25% for new templates, which is competitive with the state of the art while requiring considerably less training time.
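As a rough illustration of how an intensity metric and a sensitivity model could be combined, the Python sketch below assumes, purely for illustration, that a template's sensitivity is a straight line relating latency to the intensity of its concurrent mix, and that intensity is the mean I/O fraction of the partner queries. Neither definition is taken from the paper; `cqi`, `fit_sensitivity`, and `predict_latency` are hypothetical names.

```python
from statistics import mean
from typing import Sequence, Tuple

Model = Tuple[float, float]  # (intercept, slope) of the assumed linear sensitivity


def cqi(partner_io_fractions: Sequence[float]) -> float:
    """Hypothetical intensity: mean fraction of time the concurrent queries spend on I/O."""
    return mean(partner_io_fractions)


def fit_sensitivity(samples: Sequence[Tuple[float, float]]) -> Model:
    """Least-squares fit of latency = intercept + slope * CQI for one query template.

    samples holds (cqi, observed_latency) pairs gathered from sampled query mixes.
    """
    xs = [c for c, _ in samples]
    ys = [lat for _, lat in samples]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx  # Requires at least two distinct CQI values.
    return y_bar - slope * x_bar, slope


def predict_latency(model: Model, partner_io_fractions: Sequence[float]) -> float:
    """Estimate a query's latency when it runs alongside the given partner queries."""
    intercept, slope = model
    return intercept + slope * cqi(partner_io_fractions)


model = fit_sensitivity([(0.0, 10.0), (0.5, 15.0), (1.0, 20.0)])
print(model, predict_latency(model, [0.2, 0.6]))
```

Fitting one small model per template keeps training cost low, since only a handful of sampled mixes per template are needed to estimate two parameters.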
international conference on data engineering | 2013
Jennie Duggan; Yun Chi; Hakan Hacigümüs; Shenghuo Zhu; Ugur Çetintemel
We introduce a new learning-based solution for portable database workload performance prediction. The current state of the art addresses performance prediction for individual, static hardware configurations and thus cannot generalize to new platforms without additional training. In this work, we focus on analytical databases that might be deployed on different hardware configurations, possibly offered by various Infrastructure-as-a-Service (IaaS) providers in the cloud. Enabling workload performance predictions that can be ported across hardware configurations and IaaS offerings could significantly help cloud users with their service-purchase decisions and cloud providers with their provisioning decisions. Our solution is based on collaborative filtering modeling and prediction. We applied it to lightweight workload fingerprints that model the characteristics and behavior of concurrent query workloads for carefully selected, abstract hardware configurations. Our preliminary results are derived from experiments with TPC-H and TPC-DS benchmarks on the Amazon and Rackspace clouds. They demonstrate that our techniques can predict analytical workload throughput values for diverse hardware platforms with low training overhead and within approximately 30% of the correct figure.
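To make the collaborative-filtering idea concrete, here is a generic user-based collaborative filter in Python. It is not the authors' model. Workloads play the role of users and hardware configurations the role of items. A missing throughput is then estimated from similar workloads that were measured on the target configuration. The data layout, the cosine similarity, and the mean-centering step are all assumptions made for this sketch.

```python
import math
from typing import Dict, Optional

# Hypothetical layout: workload name -> {hardware configuration -> observed throughput}.
Ratings = Dict[str, Dict[str, float]]


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity over the configurations both workloads were measured on."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    norm_a = math.sqrt(sum(a[k] ** 2 for k in common))
    norm_b = math.sqrt(sum(b[k] ** 2 for k in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_throughput(ratings: Ratings, workload: str, config: str) -> Optional[float]:
    """Mean-centered, similarity-weighted estimate of a workload's throughput on config."""
    target = ratings[workload]
    target_mean = sum(target.values()) / len(target)
    weighted, total_weight = 0.0, 0.0
    for other, observed in ratings.items():
        if other == workload or config not in observed:
            continue  # Only neighbors measured on the target configuration help.
        sim = cosine_similarity(target, observed)
        other_mean = sum(observed.values()) / len(observed)
        weighted += sim * (observed[config] - other_mean)
        total_weight += abs(sim)
    if total_weight == 0.0:
        return None  # No usable neighbors for this configuration.
    return target_mean + weighted / total_weight


ratings = {"w1": {"small": 100.0, "large": 300.0}, "w2": {"small": 50.0}}
print(predict_throughput(ratings, "w2", "large"))  # 150.0
```

Because each new workload only needs a few measurements (a lightweight fingerprint) before the filter can fill in the rest of its row, this family of models keeps per-platform training small.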
international conference on management of data | 2014
Jennie Duggan; Michael Stonebraker
Relational databases benefit significantly from elasticity, whereby they execute on a set of changing hardware resources provisioned to match their storage and processing requirements. Such flexibility is especially attractive for scientific databases because their users often have a no-overwrite storage model, in which they delete data only when their available space is exhausted. This results in a database that is regularly growing and expanding its hardware proportionally. Also, scientific databases frequently store their data as multidimensional arrays optimized for spatial querying. This brings about several novel challenges in clustered, skew-aware data placement on an elastic shared-nothing database. In this work, we design and implement elasticity for an array database. We address this challenge on two fronts: determining when to expand a database cluster and how to partition the data within it. In both steps we propose incremental approaches, affecting a minimum set of data and nodes, while maintaining high performance. We introduce an algorithm for gradually augmenting an array database's hardware using a closed-loop control system. After the cluster adds nodes, we optimize data placement for n-dimensional arrays. Many of our elastic partitioners incrementally reorganize an array, redistributing data only to new nodes. By combining these two tools, the scientific database efficiently and seamlessly manages its monotonically increasing hardware resources.
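Under simple assumptions, a closed-loop expansion policy of this kind could look like the proportional controller below. It measures storage utilization and, when that exceeds a target, provisions enough nodes to bring utilization back toward the target. The target utilization, the gain, and the name `nodes_to_add` are hypothetical, and the paper's controller may use different signals.

```python
import math


def nodes_to_add(used_gb: float, nodes: int, capacity_per_node_gb: float,
                 target_utilization: float = 0.8, gain: float = 1.0) -> int:
    """Proportional controller: how many nodes to provision in this control period.

    The error signal is measured utilization minus the target. The shortfall
    in nodes, nodes * (utilization - target) / target, is proportional to that
    error; the controller adds gain times the shortfall, rounded up.
    """
    utilization = used_gb / (nodes * capacity_per_node_gb)
    if utilization <= target_utilization:
        return 0  # At or below target: a no-overwrite store never shrinks.
    needed = used_gb / (target_utilization * capacity_per_node_gb) - nodes
    return math.ceil(gain * needed)


# 900 GB on 10 nodes of 100 GB is 90% full; reaching 75% needs 12 nodes.
print(nodes_to_add(900.0, 10, 100.0, target_utilization=0.75))  # 2
```

Because the store only grows, the controller never removes nodes. A gain below 1.0 adds capacity more gradually, at the cost of triggering expansions more often.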
ieee high performance extreme computing conference | 2016
Zuohao She; Surabhi Ravishankar; Jennie Duggan
A polystore system evaluates queries that span multiple disparate data models; this characteristic introduces a unique query optimization challenge. Specialized database engines such as array and graph databases support partially overlapping sets of query processing operations. Even for operations with common or similar semantics, different systems can have completely different performance profiles for the same query, so their relative usefulness varies from query to query. We hypothesize that a polystore system could exploit this context-dependent disparity in performance by choosing between executing a sub-query locally and migrating its inputs for remote execution. In this work, as part of the larger ISTC BigDAWG project, we examine the challenges of polystore query optimization through the lens of equivalent semantics among back-end databases.
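The choice between local execution and migration can be sketched in Python as a simple cost comparison. The sketch assumes per-engine execution costs and input-migration costs have already been estimated; producing those estimates is the hard part the abstract describes. The function name `choose_engine`, the engine names, and the costs are illustrative only.

```python
from typing import Dict, Optional, Tuple


def choose_engine(exec_cost: Dict[str, float], input_engine: str,
                  migration_cost: Dict[str, float]) -> Tuple[Optional[str], float]:
    """Return (engine, total_cost) minimizing migration cost plus execution cost.

    exec_cost: estimated cost of running the sub-query on each candidate engine.
    input_engine: engine that currently holds the sub-query's inputs.
    migration_cost: estimated cost of moving those inputs to each other engine.
    """
    best_engine, best_total = None, float("inf")
    for engine, run_cost in exec_cost.items():
        # Running where the data already lives requires no migration.
        move = 0.0 if engine == input_engine else migration_cost[engine]
        total = move + run_cost
        if total < best_total:
            best_engine, best_total = engine, total
    return best_engine, best_total


costs = {"relational": 40.0, "array": 5.0}
print(choose_engine(costs, "relational", {"array": 20.0}))  # ('array', 25.0)
print(choose_engine(costs, "relational", {"array": 50.0}))  # ('relational', 40.0)
```

The two calls show the context dependence described above: the same sub-query moves to the array engine when migration is cheap and stays local when it is not.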
very large data bases | 2015
Aaron J. Elmore; Jennie Duggan; Michael Stonebraker; Magdalena Balazinska; Ugur Çetintemel; Vijay Gadepally; Jeffrey Heer; Bill Howe; Jeremy Kepner; Tim Kraska; Samuel Madden; David Maier; Timothy G. Mattson; Stavros Papadopoulos; Jeff Parkhurst; Nesime Tatbul; Manasi Vartak; Stan Zdonik
very large data bases | 2014
Rebecca Taft; Essam Mansour; Marco Serafini; Jennie Duggan; Aaron J. Elmore; Ashraf Aboulnaga; Andrew Pavlo; Michael Stonebraker
international conference on management of data | 2015
Jennie Duggan; Olga Papaemmanouil; Leilani Battle; Michael Stonebraker