Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Christina Delimitrou is active.

Publication


Featured research published by Christina Delimitrou.


Architectural Support for Programming Languages and Operating Systems | 2013

Paragon: QoS-aware scheduling for heterogeneous datacenters

Christina Delimitrou; Christos Kozyrakis

Large-scale datacenters (DCs) host tens of thousands of diverse applications each day. However, interference between colocated workloads and the difficulty of matching applications to one of the many hardware platforms available can degrade performance, violating the quality of service (QoS) guarantees that many cloud workloads require. While previous work has identified the impact of heterogeneity and interference, existing solutions are computationally intensive, cannot be applied online, and do not scale beyond a few applications. We present Paragon, an online and scalable DC scheduler that is heterogeneity- and interference-aware. Paragon is derived from robust analytical methods, and instead of profiling each application in detail, it leverages information the system already has about applications it has previously seen. It uses collaborative filtering techniques to quickly and accurately classify an unknown incoming workload with respect to heterogeneity and interference in multiple shared resources, by identifying similarities to previously scheduled applications. The classification allows Paragon to greedily schedule applications in a manner that minimizes interference and maximizes server utilization. Paragon scales to tens of thousands of servers with marginal scheduling overheads in terms of time or state. We evaluate Paragon with a wide range of workload scenarios, on both small- and large-scale systems, including 1,000 servers on EC2. For a 2,500-workload scenario, Paragon enforces performance guarantees for 91% of applications, while significantly improving utilization. In comparison, heterogeneity-oblivious, interference-oblivious, and least-loaded schedulers only provide similar guarantees for 14%, 11%, and 3% of workloads. The differences are more striking in oversubscribed scenarios where resource efficiency is more critical.
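
The abstract describes the collaborative-filtering step only at a high level. As a loose illustration of the idea, here is a minimal SVD-based sketch with invented data and names (not Paragon's implementation): known workloads form a workload-by-platform score matrix, a low-rank factorization is fit, and a new workload profiled on only a couple of platforms is projected into the latent space to predict its scores everywhere else. Paragon applies the same idea to interference scores across shared resources, not just platform performance.

```python
import numpy as np

# Hypothetical scores of previously seen workloads (rows) on each
# hardware platform (columns), e.g. normalized performance.
history = np.array([
    [0.9, 0.4, 0.7, 0.2],
    [0.8, 0.5, 0.6, 0.3],
    [0.2, 0.9, 0.3, 0.8],
    [0.1, 0.8, 0.2, 0.9],
])

# Low-rank model of workload/platform behavior via truncated SVD.
k = 2
U, s, Vt = np.linalg.svd(history, full_matrices=False)
V = Vt[:k].T * s[:k]             # platform factors, shape (platforms, k)

# A new workload profiled on only two platforms (NaN = not measured).
incoming = np.array([0.85, np.nan, np.nan, 0.25])
seen = ~np.isnan(incoming)

# Fit the workload's latent factors to its observed scores, then
# predict its scores on the platforms it was never profiled on.
w, *_ = np.linalg.lstsq(V[seen], incoming[seen], rcond=None)
predicted = V @ w
print(np.round(predicted, 2))    # estimates for all four platforms
```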


Architectural Support for Programming Languages and Operating Systems | 2014

Quasar: resource-efficient and QoS-aware cluster management

Christina Delimitrou; Christos Kozyrakis

Cloud computing promises flexibility and high performance for users and high cost-efficiency for operators. Nevertheless, most cloud facilities operate at very low utilization, hurting both cost effectiveness and future scalability. We present Quasar, a cluster management system that increases resource utilization while providing consistently high application performance. Quasar employs three techniques. First, it does not rely on resource reservations, which lead to underutilization as users do not necessarily understand workload dynamics and physical resource requirements of complex codebases. Instead, users express performance constraints for each workload, letting Quasar determine the right amount of resources to meet these constraints at any point. Second, Quasar uses classification techniques to quickly and accurately determine the impact of the amount of resources (scale-out and scale-up), type of resources, and interference on performance for each workload and dataset. Third, it uses the classification results to jointly perform resource allocation and assignment, quickly exploring the large space of options for an efficient way to pack workloads on available resources. Quasar monitors workload performance and adjusts resource allocation and assignment when needed. We evaluate Quasar over a wide range of workload scenarios, including combinations of distributed analytics frameworks and low-latency, stateful services, both on a local cluster and a cluster of dedicated EC2 servers. At steady state, Quasar improves resource utilization by 47% in the 200-server EC2 cluster, while meeting performance constraints for workloads of all types.
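
To make the "jointly perform resource allocation and assignment" step concrete, here is a deliberately simplified greedy packing sketch. All numbers and names are invented; Quasar's actual exploration of the allocation space, driven by its classification output, is far richer.

```python
# Hypothetical predicted throughput per server of each type; in Quasar
# these estimates would come from the classification step.
capacity = {"typeA": 100, "typeB": 60}      # per-server throughput units
free = {"typeA": 8, "typeB": 10}            # servers currently available

def allocate(target, free, capacity):
    """Return (server_type, count) meeting `target` with fewest servers."""
    best = None
    for stype, per_server in capacity.items():
        need = -(-target // per_server)      # ceiling division
        if need <= free[stype] and (best is None or need < best[1]):
            best = (stype, need)
    return best

for workload, target in [("web-fe", 350), ("analytics", 240)]:
    choice = allocate(target, free, capacity)
    if choice:
        stype, n = choice
        free[stype] -= n                     # commit the greedy choice
        print(f"{workload}: {n}x {stype}")
    else:
        print(f"{workload}: deferred (no fit)")
```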


ACM Transactions on Computer Systems | 2013

QoS-Aware scheduling in heterogeneous datacenters with Paragon

Christina Delimitrou; Christos Kozyrakis

Large-scale datacenters (DCs) host tens of thousands of diverse applications each day. However, interference between colocated workloads and the difficulty of matching applications to one of the many hardware platforms available can degrade performance, violating the quality of service (QoS) guarantees that many cloud workloads require. While previous work has identified the impact of heterogeneity and interference, existing solutions are computationally intensive, cannot be applied online, and do not scale beyond a few applications. We present Paragon, an online and scalable DC scheduler that is heterogeneity- and interference-aware. Paragon is derived from robust analytical methods, and instead of profiling each application in detail, it leverages information the system already has about applications it has previously seen. It uses collaborative filtering techniques to quickly and accurately classify an unknown incoming workload with respect to heterogeneity and interference in multiple shared resources. It does so by identifying similarities to previously scheduled applications. The classification allows Paragon to greedily schedule applications in a manner that minimizes interference and maximizes server utilization. After the initial application placement, Paragon monitors application behavior and adjusts the scheduling decisions at runtime to avoid performance degradations. Additionally, we design ARQ, a multiclass admission control protocol that constrains application waiting time. ARQ queues applications in separate classes based on the type of resources they need and avoids long queueing delays for easy-to-satisfy workloads in highly-loaded scenarios. Paragon scales to tens of thousands of servers and applications with marginal scheduling overheads in terms of time or state. We evaluate Paragon with a wide range of workload scenarios, on both small and large-scale systems, including 1,000 servers on EC2. For a 2,500-workload scenario, Paragon enforces performance guarantees for 91% of applications, while significantly improving utilization. In comparison, heterogeneity-oblivious, interference-oblivious, and least-loaded schedulers only provide similar guarantees for 14%, 11%, and 3% of workloads. The differences are more striking in oversubscribed scenarios where resource efficiency is more critical.
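
The ARQ protocol is only outlined above. A toy sketch of the multiclass queueing idea, with invented class names and a trivially simple service policy (not the paper's implementation), might look like this: applications wait in a per-resource-class queue, so a backlog of demanding jobs does not delay easy-to-satisfy ones.

```python
from collections import deque

# One queue per resource class; classes here are illustrative.
queues = {"cpu-bound": deque(), "memory-bound": deque(), "io-bound": deque()}

def admit(app, resource_class):
    queues[resource_class].append(app)

def schedule_round(capacity_per_class):
    # Serve each class independently, up to its available capacity,
    # so short queues drain regardless of pressure on other classes.
    placed = []
    for cls, q in queues.items():
        for _ in range(min(capacity_per_class.get(cls, 0), len(q))):
            placed.append((cls, q.popleft()))
    return placed

admit("spark-job", "memory-bound")
admit("webserver", "cpu-bound")
admit("backup", "io-bound")
print(schedule_round({"cpu-bound": 1, "memory-bound": 1, "io-bound": 0}))
```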


IEEE International Symposium on Workload Characterization | 2013

iBench: Quantifying interference for datacenter applications

Christina Delimitrou; Christos Kozyrakis

Interference between co-scheduled applications is one of the major reasons that modern datacenters (DCs) operate at low utilization. DC operators traditionally sidestep interference either by disallowing colocation altogether and providing isolated server instances, or by requiring users to express resource reservations, which are often exaggerated to counterbalance the unpredictability in the quality of allocated resources. Understanding, reducing, and managing interference can significantly impact the manner in which these large-scale systems operate. We present iBench, a novel workload suite that helps quantify the pressure different applications put on various shared resources, and similarly the pressure they can tolerate in these resources. iBench consists of a set of carefully crafted benchmarks that induce interference of increasing intensity in resources that span the CPU, cache hierarchy, memory, storage, and networking subsystems. We first validate the effect that iBench workloads have on performance against a wide spectrum of DC applications. Then, we use iBench to demonstrate the importance of considering interference in a set of challenging problems that range from DC scheduling and server provisioning to resource-efficient application development and scheduling for heterogeneous CMPs. In all cases, quantifying interference with iBench results in significant performance and/or efficiency improvements. We plan to release iBench under a free software license.
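
As a flavor of what an intensity-tunable interference source looks like, here is a crude sketch of a CPU pressure generator whose duty cycle is the intensity knob. This is not one of the actual iBench benchmarks, which target many more resources than the CPU.

```python
import time

def cpu_pressure(intensity, duration_s=2.0, period_s=0.01):
    """Busy-spin for `intensity` fraction of each period (0.0..1.0) and
    sleep for the rest: a simple, tunable CPU-interference source in the
    spirit of iBench's increasing-intensity benchmarks."""
    end = time.time() + duration_s
    while time.time() < end:
        busy_until = time.time() + intensity * period_s
        while time.time() < busy_until:
            pass                        # burn cycles
        time.sleep((1.0 - intensity) * period_s)

# Sweep the intensity knob; co-running a target application and measuring
# its slowdown at each level yields its CPU sensitivity curve.
for level in (0.25, 0.5, 0.75):
    cpu_pressure(level, duration_s=0.1)
```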


Symposium on Cloud Computing | 2015

Tarcil: reconciling scheduling speed and quality in large shared clusters

Christina Delimitrou; Daniel Sanchez; Christos Kozyrakis

Scheduling diverse applications in large, shared clusters is particularly challenging. Recent research on cluster scheduling focuses either on scheduling speed, using sampling to quickly assign resources to tasks, or on scheduling quality, using centralized algorithms that search for the resources that improve both task performance and cluster utilization. We present Tarcil, a distributed scheduler that targets both scheduling speed and quality. Tarcil uses an analytically derived sampling framework that adjusts the sample size based on load, and provides statistical guarantees on the quality of allocated resources. It also implements admission control when sampling is unlikely to find suitable resources. This makes it appropriate for large, shared clusters hosting short- and long-running jobs. We evaluate Tarcil on clusters with hundreds of servers on EC2. For highly loaded clusters running short jobs, Tarcil improves task execution time by 41% over a distributed, sampling-based scheduler. For more general scenarios, Tarcil achieves near-optimal performance for 4× and 2× more jobs than sampling-based and centralized schedulers, respectively.
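
The "analytically derived sampling framework" can be illustrated with a standard order-statistics argument (Tarcil's exact derivation and guarantees may differ): if a fraction q of cluster slots currently offer acceptable quality, then d independent probes miss all of them with probability (1-q)^d, so the sample size needed for a target confidence follows directly, and it grows as load rises.

```python
import math

def sample_size(good_fraction, confidence):
    """Smallest number of random probes d such that at least one probe
    hits a 'good' slot with probability >= confidence, assuming a
    fraction `good_fraction` of slots currently qualify. Standard
    order-statistics bound, not Tarcil's exact formula."""
    return math.ceil(math.log(1.0 - confidence) /
                     math.log(1.0 - good_fraction))

# As load rises, fewer slots qualify, so the required sample size
# grows -- the knob a load-adaptive sampler would turn.
for load in (0.5, 0.8, 0.95):
    q = 1.0 - load                  # idle fraction as a crude proxy
    print(f"load={load:.2f}: d={sample_size(q, 0.99)}")
```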


Architectural Support for Programming Languages and Operating Systems | 2016

HCloud: Resource-Efficient Provisioning in Shared Cloud Systems

Christina Delimitrou; Christos Kozyrakis

Cloud computing promises flexibility and high performance for users and cost efficiency for operators. To achieve this, cloud providers offer instances of different sizes, both as long-term reservations and as short-term, on-demand allocations. Unfortunately, determining the best provisioning strategy is a complex, multi-dimensional problem that depends on the load fluctuation and duration of incoming jobs, and on the performance unpredictability and cost of resources. We first compare the two main provisioning strategies (reserved and on-demand resources) on Google Compute Engine (GCE) using three representative workload scenarios with batch and latency-critical applications. We show that either approach alone is suboptimal for performance or cost. We then present HCloud, a hybrid provisioning system that uses both reserved and on-demand resources. HCloud determines which jobs should be mapped to reserved versus on-demand resources based on overall load and resource unpredictability. It also determines the optimal instance size an application needs to satisfy its Quality of Service (QoS) constraints. We demonstrate that hybrid configurations improve performance by 2.1x compared to fully on-demand provisioning, and reduce cost by 46% compared to fully reserved systems. We also show that hybrid strategies are robust to variation in system and job parameters, such as cost and system load.
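
A cartoon of the hybrid placement decision follows; the thresholds and job attributes are invented purely for illustration, while HCloud's actual policy is driven by measured load and resource unpredictability. The intuition from the abstract is that long-running, latency-critical jobs are worth reserved capacity, while short or tolerant jobs can absorb the variability of on-demand instances.

```python
def place(job, reserved_load, load_cap=0.8):
    """Toy reserved-vs-on-demand rule (illustrative, not HCloud's)."""
    if job["latency_critical"] and job["duration_hint_s"] > 300:
        if reserved_load < load_cap:
            return "reserved"
    if job["latency_critical"]:
        # Reserved pool full or job is short: prefer larger on-demand
        # instances to mask performance unpredictability.
        return "on-demand (large instance)"
    return "on-demand (small instance)"

jobs = [
    {"name": "memcached", "latency_critical": True,  "duration_hint_s": 3600},
    {"name": "batch-etl", "latency_critical": False, "duration_hint_s": 120},
]
for j in jobs:
    print(j["name"], "->", place(j, reserved_load=0.6))
```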


IEEE International Symposium on Workload Characterization | 2011

Decoupling datacenter studies from access to large-scale applications: A modeling approach for storage workloads

Christina Delimitrou; Sriram Sankar; Kushagra Vaid; Christos Kozyrakis

The cost and power impact of suboptimal storage configurations is significant in datacenters (DCs), as inefficiencies are aggregated over several thousand servers and represent considerable losses in capital and operating costs. Designing performance-, power-, and cost-optimized systems requires a deep understanding of target workloads, and mechanisms to effectively model different storage design choices. Traditional benchmarking is invalid in cloud data-stores, representative storage profiles are hard to obtain, and replaying the entire application in all storage configurations is impractical from both a cost and a time perspective. Moreover, current workload generators cannot accurately reproduce key aspects of real application patterns, such as spatial and temporal locality, or tune the intensity of the workload to emulate different storage system configurations. To address these limitations, we propose a modeling and characterization framework for large-scale storage applications. As part of this framework, we use a state-diagram-based storage model, extend it to a hierarchical representation, and implement a tool that consistently recreates the I/O loads of DC applications. We present the principal features of the framework that allow accurate modeling and generation of storage workloads, and the validation process performed against ten original DC application traces. Furthermore, using our framework, we perform an in-depth, per-thread characterization of these applications and provide insights into their behavior. Finally, we explore two practical applications of this methodology: SSD caching and defragmentation benefits on enterprise storage. In both cases we observe significant speedups for most of the examined applications. Since knowledge of the workloads' spatial and temporal locality is necessary to model these use cases, our framework was instrumental in quantifying their performance benefits. The proposed methodology provides a detailed understanding of the storage activity of large-scale applications and enables a wide spectrum of storage studies without requiring access to real applications or full application deployment.
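
A state-diagram storage model of the kind described can be sketched as a simple Markov chain over access-pattern states. The states and probabilities below are invented; the paper's model is hierarchical and its parameters are derived from real traces.

```python
import random

# Each state is an access pattern; transition probabilities capture the
# trace's temporal structure (rows sum to 1.0).
STATES = ["seq_read", "rand_read", "seq_write", "rand_write"]
TRANSITIONS = {
    "seq_read":   [0.70, 0.15, 0.10, 0.05],
    "rand_read":  [0.20, 0.60, 0.05, 0.15],
    "seq_write":  [0.15, 0.05, 0.70, 0.10],
    "rand_write": [0.05, 0.20, 0.15, 0.60],
}

def generate(n_ops, seed=0):
    """Walk the chain to synthesize a stream of I/O operation types."""
    rng = random.Random(seed)
    state = "seq_read"
    ops = []
    for _ in range(n_ops):
        ops.append(state)
        state = rng.choices(STATES, weights=TRANSITIONS[state])[0]
    return ops

trace = generate(10)
print(trace)   # replay these ops against a device to emulate the workload
```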


IEEE International Symposium on Workload Characterization | 2012

ECHO: Recreating network traffic maps for datacenters with tens of thousands of servers

Christina Delimitrou; Sriram Sankar; Aman Kansal; Christos Kozyrakis

Large-scale datacenters now host a large part of the world's data and computation, which makes their design a crucial architectural challenge. Datacenter (DC) applications, unlike traditional workloads, are dominated by user patterns that only emerge at large scale. This creates the need for concise, accurate, and scalable analytical models that capture both their temporal and spatial features and can be used to create representative activity patterns. Unfortunately, previous work lacks the ability to track the complex patterns that are present in these applications, or scales poorly with the size of the system. In this work, we focus on the network aspect of datacenter workloads. We present ECHO, a scalable and accurate modeling scheme that uses hierarchical Markov chains to capture the network activity of large-scale applications in time and space. ECHO can also use these models to recreate representative network traffic patterns. We validate the model against real DC-scale applications, such as Websearch, and show marginal deviations between original and generated workloads. We verify that ECHO captures all the critical features of DC workloads, such as the locality of communication and burstiness, and we evaluate the granularity necessary to do so. Finally, we perform a detailed characterization of the network traffic of workloads in DCs of tens of thousands of servers over significant time frames.
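
The hierarchical Markov chain idea can be illustrated in miniature (all numbers invented, not ECHO's model): an outer chain walks between traffic phases while per-phase inner chains pick message destinations, capturing burstiness in time and locality in space.

```python
import random

rng = random.Random(1)

PHASES = ["quiet", "bursty"]
PHASE_T = {"quiet": [0.9, 0.1], "bursty": [0.3, 0.7]}       # outer chain
DEST_T = {                                                   # inner chains
    "quiet":  {"same_rack": 0.8, "same_cluster": 0.15, "remote": 0.05},
    "bursty": {"same_rack": 0.5, "same_cluster": 0.3,  "remote": 0.2},
}

phase = "quiet"
for tick in range(8):
    msgs = 1 if phase == "quiet" else 10                     # burst size
    dests = rng.choices(list(DEST_T[phase]),
                        weights=list(DEST_T[phase].values()), k=msgs)
    print(tick, phase, dests)
    phase = rng.choices(PHASES, weights=PHASE_T[phase])[0]
```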


International Symposium on Computer Architecture | 2016

Automatic generation of efficient accelerators for reconfigurable hardware

David Koeplinger; Christina Delimitrou; Raghu Prabhakar; Christos Kozyrakis; Yaqi Zhang; Kunle Olukotun

Acceleration in the form of customized datapaths offers large performance and energy improvements over general-purpose processors. Reconfigurable fabrics such as FPGAs are gaining popularity for implementing application-specific accelerators, thereby increasing the importance of good high-level FPGA design tools. However, current tools for targeting FPGAs offer inadequate support for high-level programming, resource estimation, and rapid, automatic design space exploration. We describe a design framework that addresses these challenges. We introduce a new representation of hardware using parameterized templates that captures locality and parallelism information at multiple levels of nesting. This representation is designed to be automatically generated from high-level languages based on parallel patterns. We describe a hybrid area estimation technique that uses template-level models and design-level artificial neural networks to account for effects from hardware place-and-route tools, including routing overheads, register and block RAM duplication, and LUT packing. Our runtime estimation accounts for off-chip memory accesses. We use our estimation capabilities to rapidly explore a large space of designs across tile sizes, parallelization factors, and optional coarse-grained pipelining, all at multiple loop levels. We show that estimates average 4.8% error for logic resources and 6.1% error for runtimes, and are 279 to 6533 times faster than a commercial high-level synthesis tool. We compare the best-performing designs to optimized CPU code running on a server-grade six-core processor and show speedups of up to 16.7×.
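
The hybrid area estimation technique combines analytical template models with learned corrections. Below is a toy sketch of that structure only; the template names, per-template costs, and network weights are all invented and untrained, purely to show the shape of such an estimator.

```python
import numpy as np

# Template-level model: each template contributes a fixed resource count.
TEMPLATE_LUTS = {"mem_tile": 1200, "mac_unit": 350, "counter": 40}

def template_estimate(design):
    """Raw LUT count from per-template costs (pre-place-and-route)."""
    return sum(TEMPLATE_LUTS[t] * n for t, n in design.items())

# Toy one-hidden-layer net mapping (raw_estimate, n_templates) to a
# multiplicative post-route correction factor (LUT packing, register
# duplication, routing overhead). Real weights would be trained on
# placed-and-routed designs.
W1 = np.array([[0.0004, 0.02], [-0.0001, 0.05]]); b1 = np.array([0.1, -0.2])
W2 = np.array([0.3, 0.2]);                        b2 = 1.0

def corrected(design):
    raw = template_estimate(design)
    x = np.array([raw, sum(design.values())], dtype=float)
    h = np.tanh(W1 @ x + b1)
    return raw * float(W2 @ h + b2)

design = {"mem_tile": 4, "mac_unit": 16, "counter": 8}
print(template_estimate(design), round(corrected(design)))
```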


IEEE Computer Architecture Letters | 2013

The Netflix Challenge: Datacenter Edition

Christina Delimitrou; Christos Kozyrakis

The hundreds of thousands of servers in modern warehouse-scale systems make performance and efficiency optimizations pressing design challenges. These systems are traditionally considered homogeneous. However, that is typically not the case: multiple server generations compose a heterogeneous environment whose performance opportunities have not been fully explored, since techniques that account for platform heterogeneity typically do not scale to the tens of thousands of applications hosted by large-scale cloud providers. We present ADSM, a scalable, efficient, and QoS-aware recommendation system for application-to-server mapping in large-scale datacenters (DCs). ADSM overcomes the drawbacks of previous techniques by leveraging robust and computationally efficient analytical methods to scale to tens of thousands of applications with minimal overheads. It maps applications to platforms while enforcing strict QoS guarantees. ADSM is derived from validated analytical models, has low and bounded prediction errors, is simple to implement, and scales to thousands of applications without significant changes to the system. Across 390 real DC workloads, ADSM improves performance by 16% on average (and up to 2.5x) and efficiency by 22% in a DC with 10 different server configurations.
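
Once per-platform performance is predicted, the QoS-aware mapping step itself can be sketched as below. The inputs are invented; in ADSM the predictions would come from the recommendation model. The design intuition shown is to place each application on the least-capable platform that still meets its QoS target, keeping newer servers free for demanding workloads.

```python
# Predicted performance of each application on each server generation
# (hypothetical values standing in for the recommendation model output).
predicted_perf = {
    "search": {"gen1": 0.6, "gen2": 0.8, "gen3": 1.0},
    "batch":  {"gen1": 0.9, "gen2": 1.1, "gen3": 1.3},
}
qos_target = {"search": 0.75, "batch": 0.8}
POWER_RANK = ["gen1", "gen2", "gen3"]       # oldest/cheapest first

for app, perfs in predicted_perf.items():
    # First generation (in increasing capability) that meets the target.
    choice = next((g for g in POWER_RANK
                   if perfs[g] >= qos_target[app]), None)
    print(app, "->", choice or "no platform meets QoS")
```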

Collaboration


Dive into Christina Delimitrou's collaborations.

Top Co-Authors

Daniel Sanchez

Massachusetts Institute of Technology
