Songyun Duan | Researchain

Publication

Featured research published by Songyun Duan.

very large data bases | 2009

Tuning database configuration parameters with iTuned

Songyun Duan; Vamsidhar Thummala; Shivnath Babu

Database systems have a large number of configuration parameters that control memory distribution, I/O optimization, costing of query plans, parallelism, many aspects of logging, recovery, and other behavior. Regular users and even expert database administrators struggle to tune these parameters for good performance. The wave of research on improving database manageability has largely overlooked this problem, which turns out to be hard to solve. We describe iTuned, a tool that automates the task of identifying good settings for database configuration parameters. iTuned has three novel features: (i) a technique called Adaptive Sampling that proactively brings in appropriate data through planned experiments to find high-impact parameters and high-performance parameter settings; (ii) an executor that supports online experiments in production database environments through a cycle-stealing paradigm that places near-zero overhead on the production workload; and (iii) portability across different database systems. We show the effectiveness of iTuned through an extensive evaluation based on different types of workloads, database systems, and usage scenarios.

extending database technology | 2011

Predicting completion times of batch query workloads using interaction-aware models and simulation

Mumtaz Ahmad; Songyun Duan; Ashraf Aboulnaga; Shivnath Babu

A question that database administrators (DBAs) routinely need to answer is how long a batch query workload will take to complete. This question arises, for example, while planning the execution of different report-generation workloads to fit within available time windows. To answer this question accurately, we need to take into account that the typical workload in a database system consists of mixes of concurrent queries. Interactions among different queries in these mixes need to be modeled, rather than the conventional approach of considering each query separately. This paper presents a new approach for estimating workload completion times that takes the significant impact of query interactions into account. This approach builds performance models using an experiment-driven technique, by sampling the space of possible query mixes and fitting statistical models to the observed performance at these samples. No prior assumptions are made about the internal workings of the database system or the cause of query interactions, making the models robust and portable. We show that a careful choice of sampling and statistical modeling strategies can result in accurate models, and we present a novel interaction-aware workload simulator that uses these models to estimate workload completion times. An experimental evaluation with complex TPC-H queries on IBM DB2 shows that this approach consistently predicts workload completion times with less than 20% error.
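The simulation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' simulator: the function names, the `mpl` (multiprogramming level) parameter, and the hand-written `toy_model` are all assumptions made for illustration. In the paper the model would be fitted statistically from sampled query mixes; here a fixed formula stands in for it.

```python
from collections import Counter

# Hypothetical stand-alone run times (seconds) per query type.
BASE_SECONDS = {"A": 10.0, "B": 20.0}


def toy_model(query_type, mix):
    """Stand-in for a fitted model: time for one query of `query_type`
    to run to completion if `mix` (a Counter of running types) stayed fixed."""
    concurrent = sum(mix.values())
    return BASE_SECONDS[query_type] * (1 + 0.5 * (concurrent - 1))


def simulate_workload(queue, mpl, model):
    """Estimate the completion time of a batch of queries run `mpl` at a time."""
    if mpl < 1:
        raise ValueError("mpl must be at least 1")
    pending = list(queue)
    running = []  # entries are [query_type, fraction_of_work_remaining]
    clock = 0.0
    while pending or running:
        # Admit queries until the multiprogramming level is reached.
        while pending and len(running) < mpl:
            running.append([pending.pop(0), 1.0])
        mix = Counter(q for q, _ in running)
        full = [model(q, mix) for q, _ in running]
        # Advance time until the first running query finishes in this mix.
        step = min(rem * t for (_, rem), t in zip(running, full))
        clock += step
        for entry, t in zip(running, full):
            entry[1] -= step / t
        running = [e for e in running if e[1] > 1e-9]
    return clock
```

With this toy model, `simulate_workload(["A", "B"], 2, toy_model)` differs from the serial estimate obtained with `mpl=1`, which is the kind of interaction effect that a per-query model would miss.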

international conference on autonomic computing | 2008

Guided Problem Diagnosis through Active Learning

Songyun Duan; Shivnath Babu

There is widespread interest today in developing tools that can diagnose the cause of a system failure accurately and efficiently based on monitoring data collected from the system. Over time, the system monitoring data will contain two types of failure data: (i) annotated failure data L, which is monitoring data collected from failure states of the system, where the cause of failure has been diagnosed and attached as annotations with the data; and (ii) unannotated failure data U. Previous work on wholly- or partially-automated diagnosis focused on L or U in isolation. In this paper, we argue that it is important to consider both L and U together to improve the overall accuracy of diagnosis; and in particular, to proactively move instances from U to L. However, such movement requires manual diagnosis effort from system administrators. Since manual diagnosis is expensive and time-consuming, we propose an algorithm to make the best use of manual effort while maximizing the benefit gained from newly diagnosed instances. We report an experimental evaluation of our algorithm using data from a variety of failures - both single failures and multiple correlated failures - injected in a testbed, as well as with synthetic data.
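The abstract does not spell out the selection algorithm, so the sketch below substitutes generic uncertainty sampling, a standard active-learning heuristic, to show what choosing an instance to move from U to L can look like. The nearest-centroid classifier, the function names, and the feature vectors are all assumptions for illustration, not the paper's method.

```python
import math


def centroids(labeled):
    """Average the feature vectors of each diagnosed cause in L."""
    sums, counts = {}, {}
    for vec, cause in labeled:
        acc = sums.setdefault(cause, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[cause] = counts.get(cause, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def pick_for_manual_diagnosis(labeled, unlabeled):
    """Return the index of the instance in U whose diagnosis is most ambiguous,
    measured by the gap between its two nearest cause centroids."""
    cents = centroids(labeled)

    def margin(vec):
        d = sorted(math.dist(vec, c) for c in cents.values())
        return d[1] - d[0] if len(d) > 1 else 0.0

    return min(range(len(unlabeled)), key=lambda i: margin(unlabeled[i]))
```

For example, with `L = [([0, 0], "disk"), ([10, 10], "net")]` and `U = [[1, 1], [5, 5.5], [9, 9]]`, the function selects the middle instance, which lies almost equidistant from both known causes and is therefore the one where an administrator's diagnosis would add the most information under this heuristic.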

international conference on data engineering | 2009

Fa: A System for Automating Failure Diagnosis

Songyun Duan; Shivnath Babu; Kamesh Munagala

Failures of Internet services and enterprise systems lead to user dissatisfaction and considerable loss of revenue. Since manual diagnosis is often laborious and slow, there is considerable interest in tools that can diagnose the cause of failures quickly and automatically from system-monitoring data. This paper identifies two key data-mining problems arising in a platform for automated diagnosis called Fa. Fa uses monitoring data to construct a database of failure signatures against which data from undiagnosed failures can be matched. Two novel challenges we address are to make signatures robust to the noisy monitoring data in production systems, and to generate reliable confidence estimates for matches. Fa uses a new technique called anomaly-based clustering when the signature database has no high-confidence match for an undiagnosed failure. This technique clusters monitoring data based on how it differs from the failure data, and pinpoints attributes linked to the failure. We show the effectiveness of Fa through a comprehensive experimental evaluation based on failures from a production setting, a variety of failures injected in a testbed, and synthetic data.

international conference on data engineering | 2007

Toward Self-Healing Multitier Services

Brian Cook; Shivnath Babu; George Candea; Songyun Duan

Are self-healing database-centric multitier services Utopia or just a hard puzzle? We argue for the latter and aim to identify the missing pieces of this puzzle. We advocate robust and scalable learning-based approaches to self-healing that we expect to work well for a large class of multitier services. We identify performance-availability problems (PAPs) as the most relevant target for self-healing, and argue that PAPs are best addressed macroscopically, outside the realm of individual tiers. Finally, we lay out a research agenda for learning-based approaches to self-healing, to enable wider deployment of self-healing multitier services.

international conference on data engineering | 2010

Interaction-aware prediction of business intelligence workload completion times

Mumtaz Ahmad; Songyun Duan; Ashraf Aboulnaga; Shivnath Babu

While planning the execution of report-generation workloads, database administrators often need to know how long different query workloads will take to run. Database systems run mixes of multiple queries of different types concurrently. Hence, estimating the completion time of a query workload requires reasoning about query mixes and inter-query interactions in the mixes, rather than considering queries or query types in isolation. This paper presents a novel approach for estimating workload completion time based on experiment-driven modeling and simulation of the impact of inter-query interactions. A preliminary evaluation of this approach with TPC-H queries on IBM DB2 shows how our approach can consistently predict workload completion times with good accuracy.

international conference on management of data | 2006

Proactive identification of performance problems

Songyun Duan; Shivnath Babu

We propose to demonstrate Fa, an automated tool for timely and accurate prediction of Service-Level-Agreement (SLA) violations caused by performance problems in database systems. Fa periodically collects performance data at three levels: applications, database server, and operating system. This data is used to construct probabilistic models for predicting SLA violations. Fa currently uses graphical Bayesian network models because of their ability to support a wide range of inferences, including prediction and diagnosis, as well as their support for interactive visualization and presentation of complex system behavior in intuitive ways.

international conference on data engineering | 2009

Automated Diagnosis of System Failures with Fa

Songyun Duan; Shivnath Babu

Failures of Internet services and enterprise systems lead to user dissatisfaction and considerable loss of revenue. Since manual diagnosis is often laborious and slow, there is considerable interest in tools that can diagnose the cause of failures quickly and automatically from system-monitoring data. Fa uses monitoring data to construct a database of failure signatures against which data from undiagnosed failures can be matched. Two novel challenges we address are to make signatures robust to the noisy monitoring data in production systems, and to generate reliable confidence estimates for matches. Fa uses a new technique called anomaly-based clustering when the signature database has no high-confidence match for an undiagnosed failure. This technique clusters monitoring data based on how it differs from the failure data, and pinpoints attributes linked to the failure. We show the effectiveness of Fa through a comprehensive experimental evaluation based on failures from a production setting, a variety of failures injected in a testbed, and synthetic data.

international conference on data engineering | 2008

Processing Diagnosis Queries: A Principled and Scalable Approach

Shivnath Babu; Songyun Duan; Kamesh Munagala

Many popular Web sites suffer occasional user-visible problems such as slow responses, blank pages or error messages being displayed, items not being added to shopping carts, database slowdowns, and others. Such deviations of systems from desired behavior, or failures, can cause user dissatisfaction and considerable loss of revenue. The scale, complexity, and dynamics of modern systems make it hard to track down the cause of failures manually. We address this problem through a new class of declarative queries, called diagnosis queries, that a system administrator or user can pose to pinpoint the cause of a failure. We describe how diagnosis queries are specified over system-monitoring data, and the challenges faced by current techniques to process these queries. We develop and evaluate a new algorithm, based on a combination of clustering and classification, to process diagnosis queries automatically, efficiently, and with good accuracy.

very large data bases | 2007