Willis Lang | Researchain

Publication

Featured research published by Willis Lang.

international conference on data engineering | 2012

Towards Multi-tenant Performance SLOs

Willis Lang; Srinath Shankar; Jignesh M. Patel; Ajay Kalhan

As traditional and mission-critical relational database workloads migrate to the cloud in the form of Database-as-a-Service (DaaS), there is an increasing motivation to provide performance goals in Service Level Objectives (SLOs). Providing such performance goals is challenging for DaaS providers as they must balance the performance that they can deliver to tenants and the data center’s operating costs. In general, aggressively aggregating tenants on each server reduces the operating costs but degrades performance for the tenants, and vice versa. In this paper, we present a framework that takes as input the tenant workloads, their performance SLOs, and the server hardware that is available to the DaaS provider, and outputs a cost-effective recipe that specifies how much hardware to provision and how to schedule the tenants on each hardware resource. We evaluate our method and show that it produces effective solutions that can reduce the costs for the DaaS provider while meeting performance goals.
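The trade-off described in this abstract, packing more tenants per server to cut cost without exceeding what each server can deliver, can be illustrated with a generic bin-packing heuristic. The sketch below is not the framework from the paper. It assumes, purely for illustration, that each tenant's SLO has already been reduced to a single integer load figure and that every server has the same capacity; the function name and the sample numbers are hypothetical.

```python
def first_fit_decreasing(tenant_loads, server_capacity):
    """Place tenants on as few servers as this heuristic can manage.

    tenant_loads: integer load per tenant (assumed to encode its SLO).
    server_capacity: maximum total load one server may carry (assumed uniform).
    Returns a list of servers, each a list of the loads placed on it.
    """
    servers = []
    # Placing the largest tenants first tends to leave less wasted capacity.
    for load in sorted(tenant_loads, reverse=True):
        if load > server_capacity:
            raise ValueError(f"tenant load {load} exceeds server capacity")
        for server in servers:
            if sum(server) + load <= server_capacity:
                server.append(load)  # fits on an already provisioned server
                break
        else:
            servers.append([load])  # no server has room: provision a new one
    return servers


# Hypothetical tenant loads, expressed as a percentage of one server.
placement = first_fit_decreasing([50, 70, 30, 20, 40, 10], 100)
```

The length of `placement` is the number of servers to provision, and each inner list is one server's tenant assignment. A real provisioning method would also have to handle heterogeneous hardware and measured performance interference between co-located tenants, which this sketch ignores.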

data management on new hardware | 2010

Wimpy node clusters: what about non-wimpy workloads?

Willis Lang; Jignesh M. Patel; Srinath Shankar

The high cost associated with powering servers has introduced new challenges in improving the energy efficiency of clusters running data processing jobs. Traditional high-performance servers are largely energy inefficient due to various factors such as the over-provisioning of resources. The increasing trend to replace traditional high-performance server nodes with low-power low-end nodes in clusters has recently been touted as a solution to the cluster energy problem. However, the key tacit assumption that drives such a solution is that the proportional scale-out of such low-power cluster nodes results in constant scaleup in performance. This paper studies the validity of such an assumption using measured price and performance results from a low-power Atom-based node and a traditional Xeon-based server and a number of published parallel scaleup results. Our results show that in most cases, computationally complex queries exhibit disproportionate scaleup characteristics which potentially makes scale-out with low-end nodes an expensive and lower performance solution.

IEEE Transactions on Knowledge and Data Engineering | 2010

Dictionary-Based Compression for Long Time-Series Similarity

Willis Lang; Michael Morse; Jignesh M. Patel

Long time-series data sets are common in many domains, especially scientific domains. Applications in these fields often require comparing trajectories using similarity measures. Existing methods perform well for short time series but their evaluation cost degrades rapidly for longer time series. In this work, we develop a new time-series similarity measure called the Dictionary Compression Score (DCS) for determining time-series similarity. We also show that this method allows us to accurately and quickly calculate similarity for both short and long time series. We use the well-known Kolmogorov Complexity in information theory and the Lempel-Ziv compression framework as a basis to calculate similarity scores. We show that off-the-shelf compressors do not fare well for computing time-series similarity. To address this problem, we developed a novel dictionary-based compression technique to compute time-series similarity. We also develop heuristics to automatically identify suitable parameters for our method, thus removing the task of parameter tuning found in other existing methods. We have extensively compared DCS with existing similarity methods for classification. Our experimental evaluation shows that for long time-series data sets, DCS is accurate, and it is also significantly faster than existing methods.
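For readers unfamiliar with compression-based similarity, the general idea can be sketched with the normalized compression distance (NCD), a standard approximation of Kolmogorov-complexity-based distance. The example below is not DCS. It uses zlib, an off-the-shelf Lempel-Ziv-style compressor of the kind the abstract says performs poorly for this task, and serves only as a baseline illustration. The discretization scheme, function names, and synthetic series are assumptions made for the example.

```python
import math
import random
import zlib


def discretize(series, levels=8):
    """Map each value to one of `levels` symbols, one byte per point."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant series
    return bytes(min(levels - 1, int((v - lo) / span * levels)) for v in series)


def compressed_size(data):
    """Length in bytes of the zlib-compressed input (highest compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x, y):
    """Normalized compression distance: smaller means more shared structure."""
    cx, cy = compressed_size(x), compressed_size(y)
    return (compressed_size(x + y) - min(cx, cy)) / max(cx, cy)


# Synthetic series (illustrative only): a sine wave, a phase-shifted copy,
# and uniform random noise generated from a fixed seed.
rng = random.Random(0)
smooth = discretize([math.sin(2 * math.pi * t / 50) for t in range(2000)])
shifted = discretize([math.sin(2 * math.pi * (t + 10) / 50) for t in range(2000)])
noise = discretize([rng.random() for _ in range(2000)])
```

Comparing `ncd(smooth, shifted)` with `ncd(smooth, noise)` shows the intended behavior: the two sine waves share repeated patterns that the compressor can reuse, so their distance is lower than the distance between a sine wave and noise.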

IEEE Transactions on Knowledge and Data Engineering | 2014

Towards Multi-Tenant Performance SLOs

Willis Lang; Srinath Shankar; Jignesh M. Patel; Ajay Kalhan

As traditional and mission-critical relational database workloads migrate to the cloud in the form of Database-as-a-Service (DaaS), there is an increasing motivation to provide performance goals in Service Level Objectives (SLOs). Providing such performance goals is challenging for DaaS providers as they must balance the performance that they can deliver to tenants and the data center’s operating costs. In general, aggressively aggregating tenants on each server reduces the operating costs but degrades performance for the tenants, and vice versa. In this paper, we present a framework that takes as input the tenant workloads, their performance SLOs, and the server hardware that is available to the DaaS provider, and outputs a cost-effective recipe that specifies how much hardware to provision and how to schedule the tenants on each hardware resource. We evaluate our method and show that it produces effective solutions that can reduce the costs for the DaaS provider while meeting performance goals.

international conference on data engineering | 2008

Scalable Rule-Based Gene Expression Data Classification

Mark A. Iwen; Willis Lang; Jignesh M. Patel

Current state-of-the-art association rule-based classifiers for gene expression data operate in two phases: (i) Association rule mining from training data followed by (ii) Classification of query data using the mined rules. In the worst case, these methods require an exponential search over the subset space of the training data set's samples and/or genes during at least one of these two phases. Hence, existing association rule-based techniques are prohibitively computationally expensive on large gene expression datasets. Our main result is the development of a heuristic rule-based gene expression data classifier called Boolean Structure Table Classification (BSTC). BSTC is explicitly related to association rule-based methods, but is guaranteed to be polynomial space/time. Extensive cross validation studies on several real gene expression datasets demonstrate that BSTC retains the classification accuracy of current association rule-based methods while being orders of magnitude faster than the leading classifier RCBT on large datasets. As a result, BSTC is able to finish table generation and classification on large datasets for which current association rule-based methods become computationally infeasible. BSTC also enjoys two other advantages over association rule-based classifiers: (i) BSTC is easy to use (requires no parameter tuning), and (ii) BSTC can easily handle datasets with any number of class types. Furthermore, in the process of developing BSTC we introduce a novel class of Boolean association rules which have potential applications to other data mining problems.

symposium on cloud computing | 2015

Microsoft Azure SQL Database Telemetry

Willis Lang; Frank Bertsch; David J. DeWitt; Nigel R. Ellis

Microsoft operates the Azure SQL Database (ASD) cloud service, one of the dominant relational cloud database services in the market today. To aid the academic community in their research on designing and efficiently operating cloud database services, Microsoft is introducing the release of production-level telemetry traces from the ASD service. This telemetry data set provides, over a wide set of important hardware resources and counters, the consumption level of each customer database replica. The first release will be a multi-month time-series data set that includes the full cluster traces from two different ASD global regions.

international conference on data engineering | 2017

Predictive Provisioning: Efficiently Anticipating Usage in Azure SQL Database

Lalitha Viswanathan; Bikash Chandra; Willis Lang; Karthik Ramachandra; Jignesh M. Patel; Ajay Kalhan; David J. DeWitt; Alan Halverson

Over-booking cloud resources is an effective way to increase the cost efficiency of a cluster, and is being studied within Microsoft for the Azure SQL Database service. A key challenge is to strike the right balance between the potentially conflicting goals of optimizing for resource allocation efficiency and positive user experience. Understanding when cloud database customers use their database instances and when they are idle can allow one to successfully balance these two metrics. In our work, we formulate and evaluate production-feasible methods to develop idleness profiles for customer databases. Using one of the largest data center telemetry datasets, namely Azure SQL Database telemetry across multiple data centers, we show that our schemes are effective in predicting future patterns of database usage. Our methods are practical and improve the efficiency of clusters while managing customer expectations.

international conference on management of data | 2018

Survivability of Cloud Databases - Factors and Prediction

Jose Picado; Willis Lang; Edward C. Thayer

Public cloud database providers observe all sorts of different usage patterns and behaviors while operating their services. Service providers such as Microsoft try to understand and characterize these behaviors in order to improve the quality of their service, provide new features for customers, and/or increase the efficiency of the operations. While there are many types of patterns of behavior that are of interest to providers, such as query types, workload intensity, and temporal activity, in this paper, we focus on the lowest level of behavior -- how long do public cloud databases survive before being dropped? Given the large and diverse relational database population that Azure SQL DB has, we present a large-scale survivability study of our service and identify some factors that can demonstrably help predict the lifespan of cloud databases. The results of this study are being used to influence how Azure SQL DB operates in order to increase efficiency as well as improve customer experience.

very large data bases | 2010