Olga Papaemmanouil | Researchain

Publication

Featured research published by Olga Papaemmanouil.

international conference on management of data | 2011

Performance prediction for concurrent database workloads

Jennie Duggan; Ugur Çetintemel; Olga Papaemmanouil; Eli Upfal

Current trends in data management systems, such as cloud and multi-tenant databases, are leading to data processing environments that concurrently execute heterogeneous query workloads. At the same time, these systems need to satisfy diverse performance expectations. In these newly-emerging settings, avoiding potential Quality-of-Service (QoS) violations heavily relies on performance predictability, i.e., the ability to estimate the impact of concurrent query execution on the performance of individual queries in a continuously evolving workload. This paper presents a modeling approach to estimate the impact of concurrency on query performance for analytical workloads. Our solution relies on the analysis of query behavior in isolation, pairwise query interactions and sampling techniques to predict resource contention under various query mixes and concurrency levels. We introduce a simple yet powerful metric that accurately captures the joint effects of disk and memory contention on query performance in a single value. We also discuss predicting the execution behavior of a time-varying query workload through query-interaction timelines, i.e., a fine-grained estimation of the time segments during which discrete mixes will be executed concurrently. Our experimental evaluation on top of PostgreSQL/TPC-H demonstrates that our models can provide query latency predictions within approximately 20% of the actual values in the average case.
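The "within approximately 20% of the actual values in the average case" figure reads most naturally as a mean relative error. The sketch below shows how such a number can be computed. It assumes that interpretation; the abstract does not give the paper's exact error definition. The function name and the latency values are hypothetical and invented for illustration only.

```python
def mean_relative_error(predicted, actual):
    """Average of |predicted - actual| / actual over paired latency measurements."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical query latencies in seconds (not taken from the paper).
predicted_latency = [12.0, 45.0, 8.0, 30.0]
actual_latency = [10.0, 50.0, 10.0, 30.0]

error = mean_relative_error(predicted_latency, actual_latency)
print(f"mean relative error: {error:.1%}")
```

Under this definition, a model meets the reported accuracy on a workload when the returned value is at most about 0.20.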

international conference on management of data | 2015

Overview of Data Exploration Techniques

Stratos Idreos; Olga Papaemmanouil; Surajit Chaudhuri

Data exploration is about efficiently extracting knowledge from data even if we do not know exactly what we are looking for. In this tutorial, we survey recent developments in the emerging area of database systems tailored for data exploration. We discuss new ideas on how to store and access data as well as new ideas on how to interact with a data system to enable users and applications to quickly figure out which data parts are of interest. In addition, we discuss how to exploit lessons learned from past research, the new challenges data exploration creates, emerging applications and future research directions.

international conference on management of data | 2014

Explore-by-example: an automatic query steering framework for interactive data exploration

Kyriaki Dimitriadou; Olga Papaemmanouil; Yanlei Diao

Interactive Data Exploration (IDE) is a key ingredient of a diverse set of discovery-oriented applications, including ones from scientific computing and evidence-based medicine. In these applications, data discovery is a highly ad hoc interactive process where users execute numerous exploration queries using varying predicates aiming to balance the trade-off between collecting all relevant information and reducing the size of returned data. Therefore, there is a strong need to support these human-in-the-loop applications by assisting their navigation in the data to find interesting objects. In this paper, we introduce AIDE, an Automatic Interactive Data Exploration framework, that iteratively steers the user towards interesting data areas and predicts a query that retrieves his objects of interest. Our approach leverages relevance feedback on database samples to model user interests and strategically collects more samples to refine the model while minimizing the user effort. AIDE integrates machine learning and data management techniques to provide effective data exploration results (matching the user's interests with high accuracy) as well as high interactive performance. It delivers highly accurate query predictions for very common conjunctive queries with very small user effort while, given a reasonable number of samples, it can predict with high accuracy complex conjunctive queries. Furthermore, it provides interactive performance by limiting the user wait time per iteration to less than a few seconds on average. Our user study indicates that AIDE is a practical exploration framework as it significantly reduces the user effort and the total exploration time compared with the current state-of-the-art approach of manual exploration.
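To make the idea of turning relevance feedback into a predicted conjunctive query concrete, here is a deliberately simplified sketch. It is not AIDE's algorithm: the abstract describes a learned model refined through strategic sampling, whereas this example only takes the bounding box of the samples a user marked relevant. The table name, attribute names, sample values and both function names are hypothetical.

```python
def fit_range_query(samples, labels):
    """Return {attribute: (low, high)} bounding every sample labeled relevant."""
    relevant = [s for s, is_relevant in zip(samples, labels) if is_relevant]
    if not relevant:
        return None  # no positive feedback yet, so nothing to predict
    return {
        attr: (min(s[attr] for s in relevant), max(s[attr] for s in relevant))
        for attr in relevant[0]
    }


def to_sql(table, ranges):
    """Render the learned ranges as a conjunctive range query."""
    predicates = [
        f"{attr} BETWEEN {low} AND {high}"
        for attr, (low, high) in sorted(ranges.items())
    ]
    return f"SELECT * FROM {table} WHERE " + " AND ".join(predicates)


# Hypothetical samples shown to the user, with the user's relevance feedback.
samples = [
    {"age": 25, "dosage": 10},
    {"age": 40, "dosage": 15},
    {"age": 70, "dosage": 5},
    {"age": 33, "dosage": 12},
]
labels = [True, True, False, True]

ranges = fit_range_query(samples, labels)
query = to_sql("trials", ranges)
print(query)
```

In an iterative setting, each new round of labeled samples would be appended to `samples` and `labels` and the query refit. Choosing which samples to show next is the part a real system has to do strategically, and this sketch leaves it out.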

international conference on data engineering | 2010

A generic auto-provisioning framework for cloud databases

Jennie Rogers; Olga Papaemmanouil; Ugur Çetintemel

We discuss the problem of resource provisioning for database management systems operating on top of an Infrastructure-As-A-Service (IaaS) cloud. To solve this problem, we describe an extensible framework that, given a target query workload, continually optimizes the system's operational cost, estimated based on the IaaS provider's pricing model, while satisfying QoS expectations. Specifically, we describe two different approaches, a “white-box” approach that uses a fine-grained estimation of the expected resource consumption for a workload, and a “black-box” approach that relies on coarse-grained profiling to characterize the workload's end-to-end performance across various cloud resources. We formalize both approaches as a constraint programming problem and use a generic constraint solver to efficiently tackle them. We present preliminary experimental numbers, obtained by running TPC-H queries with PostgreSQL on Amazon's EC2, that provide evidence of the feasibility and utility of our approaches. We also briefly discuss the pertinent challenges and directions of on-going research.

extending database technology | 2014

Contender: A Resource Modeling Approach for Concurrent Query Performance Prediction

Jennie Duggan; Olga Papaemmanouil; Ugur Çetintemel; Eli Upfal

Predicting query performance under concurrency is a difficult task that has many applications in capacity planning, cloud computing, and batch scheduling. We introduce Contender, a new resource modeling approach for predicting the concurrent query performance of analytical workloads. Contender’s unique feature is that it can generate effective predictions for both static as well as ad hoc or dynamic workloads with low training requirements. These characteristics make Contender a practical solution for real-world deployment. Contender relies on models of hardware resource contention to predict concurrent query performance. It introduces two key metrics, Concurrent Query Intensity (CQI) and Query Sensitivity (QS), to characterize the impact of resource contention on query interactions. CQI models how aggressively concurrent queries will use the shared resources. QS defines how a query’s performance changes as a function of the scarcity of resources. Contender integrates these two metrics to effectively estimate a query’s concurrent execution latency using only linear time sampling of the query mixes. Contender learns from sample query executions (based on known query templates) and uses query plan characteristics to generate latency estimates for previously unseen templates. Our experimental results, obtained from PostgreSQL/TPC-DS, show that Contender’s predictions have an error of 19% for known templates and 25% for new templates, which is competitive with the state-of-the-art while requiring considerably less training time.

very large data bases | 2016

WiSeDB: a learning-based workload management advisor for cloud databases

Ryan Marcus; Olga Papaemmanouil

Workload management for cloud databases must deal with the tasks of resource provisioning, query placement and query scheduling in a manner that meets the application's performance goals while minimizing the cost of using cloud resources. Existing solutions have approached these three challenges in isolation, and with only a particular type of performance goal in mind. In this paper, we introduce WiSeDB, a learning-based framework for generating holistic workload management solutions customized to application-defined performance metrics and workload characteristics. Our approach relies on supervised learning to train cost-effective decision tree models for guiding query placement, scheduling, and resource provisioning decisions. Applications can use these models for both batch and online scheduling of incoming workloads. A unique feature of our system is that it can adapt its offline model to stricter/looser performance goals with minimal re-training. This allows us to present alternative workload management strategies that address the typical performance vs. cost trade-off of cloud services. Experimental results show that our approach has very low training overhead while offering low cost strategies for a variety of performance goals and workload characteristics.

IEEE Transactions on Knowledge and Data Engineering | 2016

AIDE: An Active Learning-Based Approach for Interactive Data Exploration

Kyriaki Dimitriadou; Olga Papaemmanouil; Yanlei Diao

In this paper, we argue that database systems be augmented with an automated data exploration service that methodically steers users through the data in a meaningful way. Such an automated system is crucial for deriving insights from complex datasets found in many big data applications such as scientific and healthcare applications as well as for reducing the human effort of data exploration. Towards this end, we present AIDE, an Automatic Interactive Data Exploration framework that assists users in discovering new interesting data patterns and eliminating expensive ad-hoc exploratory queries. AIDE relies on a seamless integration of classification algorithms and data management optimization techniques that collectively strive to accurately learn the user interests based on his relevance feedback on strategically collected samples. We present a number of exploration techniques as well as optimizations that minimize the number of samples presented to the user while offering interactive performance. AIDE can deliver highly accurate query predictions for very common conjunctive queries with small user effort while, given a reasonable number of samples, it can predict with high accuracy complex disjunctive queries. It provides interactive performance as it limits the user wait time per iteration of exploration to less than a few seconds.

international conference on data engineering | 2009

Supporting Generic Cost Models for Wide-Area Stream Processing

Olga Papaemmanouil; Ugur Çetintemel; John Jannotti

Existing stream processing systems are optimized for a specific metric, which may limit their applicability to diverse applications and environments. This paper presents XFlow, a generic data stream collection, processing, and dissemination system that addresses this limitation efficiently. XFlow can express and optimize a variety of optimization metrics and constraints by distributing stream processing queries across a wide-area network. It uses metric-independent decentralized algorithms that work on localized, aggregated statistics, while avoiding local optima. To facilitate light-weight dynamic changes on the query deployment, XFlow relies on a loosely-coupled, flexible architecture consisting of multiple publish-subscribe overlay trees that can gracefully scale and adapt to changes to network and workload conditions. Based on the desired performance goals, the system progressively refines the query deployment, the structure of the overlay trees, as well as the statistics collection process. We provide an overview of XFlow's architecture and discuss its decentralized optimization model. We demonstrate its flexibility and effectiveness using real-world streams and experimental results obtained from XFlow's deployment on PlanetLab. The experiments reveal that XFlow can effectively optimize various performance metrics in the presence of varying network and workload conditions.

international conference on data engineering | 2012

Supporting Extensible Performance SLAs for Cloud Databases

Olga Papaemmanouil

Despite the fast growth and increased adoption of cloud databases, the lack of application-specific Service-Level-Agreements (SLAs) hinders the adoption of cloud data services by large-scale enterprises. Defining application-specific QoS objectives and constraints, monitoring the performance factors to ensure acceptable QoS levels and isolating the source of QoS degradation are some of the critical tasks that are still addressed through custom, ad-hoc tools at the application level, which drastically increases the application development and maintenance overhead. In this work-in-progress paper, we argue that performance management of data management applications should itself be offered as a service. Towards this goal, we present XCloud, a suite of SLA management services for cloud databases that enables the definition and monitoring of application-specific performance criteria and customizable performance SLAs.

very large data bases | 2015

AIDE: an automatic user navigation system for interactive data exploration

Yanlei Diao; Kyriaki Dimitriadou; Zhan Li; Wenzhao Liu; Olga Papaemmanouil; Kemi Peng; Liping Peng

Data analysts often engage in data exploration tasks to discover interesting data patterns, without knowing exactly what they are looking for. Such exploration tasks can be very labor-intensive because they often require the user to review many results of ad-hoc queries and adjust the predicates of subsequent queries to balance the tradeoff between collecting all interesting information and reducing the size of returned data. In this demonstration we introduce AIDE, a system that automates these exploration tasks. AIDE steers the user towards interesting data areas based on her relevance feedback on database samples, aiming to achieve the goal of identifying all database objects that match the user interest with high efficiency. In our demonstration, conference attendees will see AIDE in action for a variety of exploration tasks on real-world datasets.
