Publications

Featured research published by Jaimie Kelley.


International Conference on Autonomic Computing | 2012

Adaptive green hosting

Nan Deng; Christopher Stewart; Daniel Gmach; Martin F. Arlitt; Jaimie Kelley

The growing carbon footprint of Web hosting centers contributes to climate change and could harm the public's perception of Web hosts and Internet services. A pioneering cadre of Web hosts, called green hosts, lower their footprints by cutting into their profit margins to buy carbon offsets. This paper argues that an adaptive approach to buying carbon offsets can increase a green host's total profit by exploiting daily, bursty patterns in Internet service workloads. We make the case in three steps. First, we present a realistic, geographically distributed service that meets strict SLAs while using green hosts to lower its carbon footprint. We show that the service routes requests between competing hosts differently depending on its request arrival rate and on how many carbon offsets each host provides. Second, we use empirical traces of request arrivals to compute how many carbon offsets a host should provide to maximize its profit. We find that diurnal fluctuations and bursty surges interrupted long contiguous periods where the best carbon offset policy held steady, leading us to propose a reactive approach. For certain hosts, our approach can triple the profit compared to a fixed approach used in practice. Third, we simulate 9 services with diverse carbon footprint goals that distribute their workloads across 11 Web hosts worldwide. We use real data on the location of Web hosts and their provided carbon offset policies to show that adaptive green hosting can increase profit by 152% for one of today's larger green hosts.
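
A minimal Python sketch of the reactive offset policy described above; the profit model, the parameter names, and the routed_share function are illustrative assumptions, not the paper's actual formulation.

```python
def best_offset_level(arrival_rate, offset_levels, price_per_offset,
                      revenue_per_request, routed_share):
    """Pick the carbon offset level that maximizes instantaneous profit.

    routed_share(rate, level): fraction of service traffic the host
    attracts at this offset level (services favor greener hosts).
    """
    best_level, best_profit = offset_levels[0], float("-inf")
    for level in offset_levels:
        requests = arrival_rate * routed_share(arrival_rate, level)
        profit = requests * revenue_per_request - level * price_per_offset
        if profit > best_profit:
            best_level, best_profit = level, profit
    return best_level

# Reactive adaptation: re-evaluate whenever the observed arrival rate
# shifts, rather than committing to one fixed offset policy.
```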


International Conference on Autonomic Computing | 2015

Measuring and Managing Answer Quality for Online Data-Intensive Services

Jaimie Kelley; Christopher Stewart; Nathaniel Morris; Devesh Tiwari; Yuxiong He; Sameh Elnikety

Online data-intensive services parallelize query execution across distributed software components. Interactive response time is a priority, so online query executions return answers without waiting for slow-running components to finish. However, data from these slow components could lead to better answers. We propose Ubora, an approach to measure the effect of slow-running components on the quality of answers. Ubora randomly samples online queries and executes them twice. The first execution elides data from slow components and provides fast online answers; the second execution waits for all components to complete. Ubora uses memoization to speed up mature executions by replaying network messages exchanged between components. Our systems-level implementation works for a wide range of platforms, including Hadoop/Yarn, Apache Lucene, the EasyRec Recommendation Engine, and the OpenEphyra question answering system. Ubora computes answer quality much faster than competing approaches that do not use memoization. With Ubora, we show that answer quality can and should be used to guide online admission control. Our adaptive controller processed 37% more queries than a competing controller guided by the rate of timeouts.
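
The sampling scheme at the heart of Ubora can be sketched as follows; run_online(), run_mature(), and similarity() are hypothetical stand-ins, and the real system speeds up the mature execution by replaying memoized network messages rather than recomputing components.

```python
import random

SAMPLE_RATE = 0.01   # fraction of online queries measured (assumed value)
quality_log = []     # recent (query, quality) measurements

def answer_with_quality_sampling(query, run_online, run_mature, similarity):
    online_answer = run_online(query)        # elides slow components
    if random.random() < SAMPLE_RATE:
        mature_answer = run_mature(query)    # waits for every component
        quality_log.append((query, similarity(online_answer, mature_answer)))
    return online_answer                     # online path always answers fast
```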


ACM Transactions on Modeling and Performance Evaluation of Computing Systems | 2017

Obtaining and Managing Answer Quality for Online Data-Intensive Services

Jaimie Kelley; Christopher Stewart; Nathaniel Morris; Devesh Tiwari; Yuxiong He; Sameh Elnikety

Online data-intensive (OLDI) services use anytime algorithms to compute over large amounts of data and respond quickly. Interactive response times are a priority, so OLDI services parallelize query execution across distributed software components and return best effort answers based on the data so far processed. Omitted data from slow components could lead to better answers, but tracing online how much better the answers could be is difficult. We propose Ubora, a design approach to measure the effect of slow-running components on the quality of answers. Ubora randomly samples online queries and executes them a second time. The first online execution omits data from slow components and provides interactive answers. The second execution uses mature results from intermediate components completed after the online execution finishes. Ubora uses memoization to speed up mature executions by replaying network messages exchanged between components. Our systems-level implementation works for a wide range of services, including Hadoop/Yarn, Apache Lucene, the EasyRec Recommendation Engine, and the OpenEphyra question-answering system. Ubora computes answer quality with more mature executions per second than competing approaches that do not use memoization. With Ubora, we show that answer quality is effective at guiding online admission control. While achieving the same answer quality on high-priority queries, our adaptive controller had 55% higher peak throughput on low-priority queries than a competing controller guided by the rate of timeouts.
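
The admission-control result suggests a controller shaped roughly like the sketch below: admit low-priority queries only while sampled answer quality stays above a target. This is an assumption-laden illustration, not the paper's actual controller.

```python
from collections import deque

class QualityAdmissionController:
    def __init__(self, target_quality=0.95, window=100):
        self.target = target_quality
        self.samples = deque(maxlen=window)  # recent quality measurements

    def observe(self, quality):
        self.samples.append(quality)

    def admit_low_priority(self):
        if not self.samples:
            return True
        average = sum(self.samples) / len(self.samples)
        return average >= self.target        # shed load only when quality dips
```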


IEEE International Conference on Cloud Engineering | 2014

Managing Tiny Tasks for Data-Parallel, Subsampling Workloads

Sundeep Kambhampati; Jaimie Kelley; Christopher Stewart; William C. L. Stewart; Rajiv Ramnath

Subsampling workloads compute statistics from a set of observed samples using a random subset of sample data (i.e., a subsample). Data-parallel platforms group these samples into tasks; each task subsamples its data in parallel. In this paper, we study subsampling workloads that benefit from tiny tasks, i.e., tasks comprising few samples. Tiny tasks reduce processor cache misses caused by random subsampling, which speeds up per-task running time. However, they can also cause significant scheduling overheads that negate the time reduction from reduced cache misses. For example, vanilla Hadoop takes longer to start tiny tasks than to run them. We compared the task scheduling overheads of vanilla Hadoop, lightweight Hadoop setups, and BashReduce. BashReduce, the best platform, outperformed the worst by 3.6X, but scheduling overhead was still 12% of a task's running time. We improved BashReduce's scheduler by allowing it to size tasks according to knee points on the miss rate curve. We tested these changes on high-throughput genotype data and on data obtained from Netflix. Our improved BashReduce outperformed vanilla Hadoop by almost 3X and completed short, interactive jobs almost as efficiently as long jobs. These results held at scale and across diverse, heterogeneous hardware.
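
Sizing tasks at a knee point of the miss rate curve might look like the sketch below; the curve format and the slope threshold are illustrative assumptions.

```python
def knee_point(miss_rate_curve, max_slope=0.01):
    """Return the largest task size before the miss rate climbs sharply.

    miss_rate_curve: list of (samples_per_task, miss_rate) pairs,
    sorted by samples_per_task.
    """
    best_size = miss_rate_curve[0][0]
    for (size_a, miss_a), (size_b, miss_b) in zip(miss_rate_curve,
                                                  miss_rate_curve[1:]):
        slope = (miss_b - miss_a) / (size_b - size_a)
        if slope > max_slope:   # miss rate takes off; stop growing tasks
            break
        best_size = size_b
    return best_size
```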


International Conference on Distributed Computing Systems Workshops | 2013

Balanced and Predictable Networked Storage

Jaimie Kelley; Christopher Stewart

Networking bandwidth and latency have improved in recent years, prompting a wide range of workloads to move back to key-value stores, databases, and other types of networked storage. However, networked storage has a well-known drawback: outlier access times create a heavy-tailed distribution. Outlier accesses can take much longer than normal access times. This paper studies the effects of outliers on data processing workloads. These workloads strive for balance, i.e., all nodes are kept busy at all times. Outlier accesses can cause bubbles in the pipeline, slowing down the whole workload. For this paper, we modeled the effect of outliers in balanced MapReduce systems. We found that outliers can cause a 70% slowdown. We also modeled a solution: use 5% of system resources on replication for predictability, an old but seldom-used approach to mask outliers. We found that this approach could return more than 5% in speedup.
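
A toy simulation of the argument, under assumed Pareto access times: in a balanced, wave-synchronized workload the slowest access gates each wave, so hedging a small fraction of accesses with a duplicate request can return more time than it costs. This is an illustration of the idea, not the paper's model.

```python
import random

def access_time():
    return random.paretovariate(1.2)       # heavy-tailed networked storage

def wave_time(workers, hedge_fraction):
    times = []
    for _ in range(workers):
        t = access_time()
        if random.random() < hedge_fraction:
            t = min(t, access_time())      # duplicate request; first wins
        times.append(t)
    return max(times)                      # barrier: slowest worker gates

random.seed(1)
plain = sum(wave_time(100, 0.00) for _ in range(1000))
hedged = sum(wave_time(100, 0.05) for _ in range(1000))
print(f"speedup from 5% hedging: {plain / hedged:.2f}x")
```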


International Conference on Autonomic Computing | 2016

Adaptive Power Profiling for Many-Core HPC Architectures

Jaimie Kelley; Christopher Stewart; Devesh Tiwari; Saurabh Gupta

State-of-the-art schedulers use workload profiles to help determine which resources to allocate. Traditionally, threads execute on every available core, but increasingly, too much power is consumed by using every core. Because peak power can occur at any point in time during the workload, workloads are commonly profiled to completion multiple times offline. In practice, this process is too time-consuming for online profiling, and alternate approaches are used, such as profiling for k% of the workload or predicting peak power from similar workloads. We studied the effectiveness of these methods for core scaling. Core scaling is a technique that executes threads on a subset of available cores, allowing unused cores to enter low-power operating modes. Schedulers can use core scaling to reduce peak power, but must have an accurate profile across potential settings for the number of active cores in order to know when to make this decision. We devised an accurate, fast, and adaptive approach to profile peak power under core scaling. Our approach uses short profiling runs to collect instantaneous power traces for a workload under each core scaling setting. The duration of profiling varies for each power trace and depends on the desired accuracy. Compared to k% profiling of peak power, our approach reduced the profiling duration by up to 93% while keeping accuracy within 3%.
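
The adaptive profiling loop might be sketched as follows; read_power() and the convergence rule are assumptions standing in for the paper's instantaneous power traces and accuracy criterion.

```python
def profile_peak_power(active_cores, read_power, accuracy=0.03,
                       min_samples=50, stable_window=200):
    """Estimate peak power with `active_cores` cores enabled, stopping
    once the running peak has held stable within `accuracy`."""
    peak, unchanged, samples = 0.0, 0, 0
    while True:
        power = read_power(active_cores)   # one instantaneous reading
        samples += 1
        if power > peak * (1 + accuracy):  # peak grew meaningfully
            peak, unchanged = power, 0
        else:
            unchanged += 1
        if samples >= min_samples and unchanged >= stable_window:
            return peak                    # short run, bounded error

# Profile each core-scaling setting separately, then hand the table of
# (active cores, peak power) estimates to the scheduler.
```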


European Conference on Computer Systems | 2018

Model-driven computational sprinting

Nathaniel Morris; Christopher Stewart; Lydia Y. Chen; Robert Birke; Jaimie Kelley

Computational sprinting speeds up query execution by increasing power usage for short bursts. A sprinting policy decides when and for how long to sprint; poor policies inflate response time significantly. We propose a model-driven approach that chooses between sprinting policies based on their expected response time. However, sprinting alters query executions at runtime, creating a complex dependency between queuing and processing time. Our performance modeling approach employs offline profiling, machine learning, and first-principles simulation. Collectively, these modeling techniques capture the effects of sprinting on response time. We validated our modeling approach with 3 sprinting mechanisms across 9 workloads. Our performance modeling approach predicted response time with median error below 4% in most tests and median error of 11% in the worst case. We demonstrated model-driven sprinting for cloud providers seeking to colocate multiple workloads on AWS Burstable Instances while meeting service-level objectives. Model-driven sprinting uncovered policies that achieved response time goals, allowing more workloads to colocate on a node. Compared to AWS Burstable policies, our approach increased revenue per node by 1.6X.
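
The top-level policy choice reduces to comparing predicted response times, as in this sketch; predict_response_time() stands in for the paper's combination of offline profiling, machine learning, and first-principles simulation.

```python
def choose_policy(policies, workload, predict_response_time):
    """Return the sprinting policy with the lowest expected response time."""
    return min(policies,
               key=lambda policy: predict_response_time(policy, workload))

# Illustrative candidates: sprint on every query, sprint only when the
# queue exceeds a threshold, or never sprint.
```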


Symposium on Cloud Computing | 2017

Early work on modeling computational sprinting

Nathaniel Morris; Christopher Stewart; Robert Birke; Lydia Y. Chen; Jaimie Kelley

Ever-tightening power caps constrain the sustained processing speed of modern processors. With computational sprinting, processors reserve a small power budget that can be used to increase processing speed for short bursts. Computational sprinting speeds up query executions that would otherwise yield slow response times. Common mechanisms used for sprinting include DVFS, core scaling, CPU throttling, and application-specific accelerators.
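
The reserved power budget behaves much like a token bucket, sketched below; the capacity, refill rate, and units are assumptions for illustration, not the paper's model.

```python
class SprintBudget:
    def __init__(self, capacity=100.0, refill_rate=1.0):
        self.capacity = capacity        # maximum stored budget (e.g., joules)
        self.tokens = capacity
        self.refill_rate = refill_rate  # budget regained per time unit

    def tick(self, dt):
        self.tokens = min(self.capacity, self.tokens + self.refill_rate * dt)

    def try_sprint(self, cost):
        """Spend budget on one burst; return True if the sprint is allowed."""
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```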


Proceedings of the Posters and Demo Track | 2012

Graduated locality-aware scheduling for search engines

Jaimie Kelley; Christopher Stewart

Search engines parse diverse, natural language datasets in search of answers to a user query. Not only are they expected to find good answers, they must find them quickly. For public search engines, like Bing and Google, answers that are returned slowly cost more and produce less revenue than answers returned within a second [3]. For private search engines like IBM's Watson [2], slow answers bound the number of queries that can be processed over the lifetime of the hardware. To meet these response time demands, search engines scale out, i.e., they divide datasets across large server clusters and execute user queries in parallel across the cluster. An open research challenge is to reduce the number of scale-out servers needed to ensure fast response times. Recent research reduces scale-out by partially executing queries, checking intermediate results, and completing the query as soon as the results exceed a quality threshold [1]. Such partial query execution exploits redundancy within datasets; often good answers can be found without parsing the entire dataset. IBM's Watson used partial query execution: it buzzed in during Jeopardy games only when intermediate answers exceeded a quality threshold [2]. As another example, Bing used fewer servers by executing queries only until intermediate results reached diminishing returns [1]. Along with measuring the quality of intermediate results, partial query execution requires a mechanism to explore the search dataset iteratively. This poster paper describes a new approach to iteratively explore search datasets.
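
Partial query execution as described above follows a simple loop, sketched here; score(), merge(), and the partitions' search() method are hypothetical names for illustration.

```python
def partial_execute(query, partitions, score, merge, threshold=0.9):
    """Search partition by partition, stopping once the intermediate
    answer clears the quality threshold."""
    answer = None
    for partition in partitions:        # iterative dataset exploration
        answer = merge(answer, partition.search(query))
        if score(answer) >= threshold:  # good enough; return early
            return answer
    return answer
```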


Archive | 2016

Ed Watson: Teaching Big Data to K-12 Students

Stephanie Muhammad; Jaimie Kelley; Christopher Stewart

Collaboration


An overview of Jaimie Kelley's collaborations.

Top Co-Authors

Nan Deng

Ohio State University
