
Publication


Featured research published by Elie Krevat.


ACM Special Interest Group on Data Communication | 2009

Safe and effective fine-grained TCP retransmissions for datacenter communication

Vijay Vasudevan; Amar Phanishayee; Hiral Shah; Elie Krevat; David G. Andersen; Gregory R. Ganger; Garth A. Gibson; Brian Mueller

This paper presents a practical solution to a problem facing high-fan-in, high-bandwidth synchronized TCP workloads in datacenter Ethernets---the TCP incast problem. In these networks, receivers can experience a drastic reduction in application throughput when simultaneously requesting data from many servers using TCP. Inbound data overfills small switch buffers, leading to TCP timeouts lasting hundreds of milliseconds. For many datacenter workloads that have a barrier synchronization requirement (e.g., filesystem reads and parallel data-intensive queries), throughput is reduced by up to 90%. For latency-sensitive applications, TCP timeouts in the datacenter impose delays of hundreds of milliseconds in networks with round-trip-times in microseconds. Our practical solution uses high-resolution timers to enable microsecond-granularity TCP timeouts. We demonstrate that this technique is effective in avoiding TCP incast collapse in simulation and in real-world experiments. We show that eliminating the minimum retransmission timeout bound is safe for all environments, including the wide-area.
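
The paper's core change is replacing TCP's coarse (hundreds-of-milliseconds) minimum retransmission timeout with a microsecond-granularity RTO. Below is a minimal sketch of why that matters, using the standard SRTT + 4*RTTVAR estimator and assumed datacenter RTT values; it is an illustration, not the authors' kernel implementation.

```python
# Hypothetical illustration (not the paper's code): compare a legacy 200 ms
# minimum RTO with a microsecond-granularity RTO in a network whose
# round-trip times are measured in microseconds.

def rto_us(srtt_us, rttvar_us, rto_min_us):
    """Standard-style RTO = SRTT + 4*RTTVAR, clamped to a minimum bound."""
    return max(srtt_us + 4 * rttvar_us, rto_min_us)

srtt_us, rttvar_us = 100.0, 20.0  # assumed datacenter RTT statistics (microseconds)

coarse = rto_us(srtt_us, rttvar_us, rto_min_us=200_000)  # legacy 200 ms floor
fine = rto_us(srtt_us, rttvar_us, rto_min_us=0)          # no artificial floor

print(f"RTO with 200 ms floor: {coarse:,.0f} us")       # ~200,000 us: one loss stalls the barrier
print(f"Microsecond-granularity RTO: {fine:,.0f} us")    # ~180 us: retransmit within a few RTTs
```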


Workshop on Automated Control for Datacenters and Clouds | 2009

Tashi: location-aware cluster management

Michael Kozuch; Michael P. Ryan; Richard Gass; Steven W. Schlosser; David R. O'Hallaron; James Cipar; Elie Krevat; Julio Lopez; Michael Stroucken; Gregory R. Ganger

Big Data applications, those that require large data corpora either for correctness or for fidelity, are becoming increasingly prevalent. Tashi is a cluster management system designed particularly for enabling cloud computing applications to operate on repositories of Big Data. These applications are extremely scalable but also have very high resource demands. A key technique for making such applications perform well is Location-Awareness. This paper demonstrates that location-aware applications can outperform those that are not location aware by factors of 3-11 and describes two general services developed for Tashi to provide location-awareness independently of the storage system.
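
As a rough illustration of location-awareness (the function names and data layout below are assumptions for this sketch, not Tashi's actual interfaces), a placement service can score candidate hosts by how much of a task's input data each already stores locally:

```python
# Hypothetical sketch of location-aware placement: prefer the host that already
# holds replicas of the data blocks a computation will read.

def pick_host(task_blocks, block_locations, candidate_hosts):
    """Return the candidate host storing the most bytes of the task's input."""
    def local_bytes(host):
        return sum(size for block, size in task_blocks.items()
                   if host in block_locations.get(block, ()))
    return max(candidate_hosts, key=local_bytes)

block_locations = {"b1": {"hostA", "hostB"}, "b2": {"hostB"}, "b3": {"hostC"}}  # block -> replica hosts
task_blocks = {"b1": 64, "b2": 64, "b3": 64}  # block -> size in MB (assumed)

print(pick_host(task_blocks, block_locations, ["hostA", "hostB", "hostC"]))  # hostB holds 128 MB locally
```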


Petascale Data Storage Workshop | 2007

On application-level approaches to avoiding TCP throughput collapse in cluster-based storage systems

Elie Krevat; Vijay Vasudevan; Amar Phanishayee; David G. Andersen; Gregory R. Ganger; Garth A. Gibson; Srinivasan Seshan

TCP Incast plagues scalable cluster-based storage built atop standard TCP/IP-over-Ethernet, often resulting in much lower client read bandwidth than can be provided by the available network links. This paper reviews the Incast problem and discusses potential application-level approaches to avoiding it.
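
One commonly discussed application-level mitigation is to cap the number of simultaneously outstanding server requests so that the responses arriving at once fit within the bottleneck switch buffer. The sketch below illustrates that idea with assumed buffer and block sizes; it is not code from the paper.

```python
# Hypothetical sketch: issue striped block requests in batches small enough
# that the simultaneous responses fit in the switch's per-port buffer.

SWITCH_BUFFER_BYTES = 64 * 1024  # assumed per-port buffer size
BLOCK_BYTES = 32 * 1024          # assumed per-server response size

def fetch_striped(servers, fetch_one):
    """Fetch one block from each server, at most max_outstanding at a time."""
    max_outstanding = max(1, SWITCH_BUFFER_BYTES // BLOCK_BYTES)
    results = []
    for i in range(0, len(servers), max_outstanding):
        batch = servers[i:i + max_outstanding]
        results.extend(fetch_one(s) for s in batch)  # next batch only after this one returns
    return results

servers = [f"server{i}" for i in range(8)]
print(fetch_striped(servers, fetch_one=lambda s: f"block from {s}"))
```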


Job Scheduling Strategies for Parallel Processing | 2002

Job Scheduling for the BlueGene/L System

Elie Krevat; José G. Castaños; José E. Moreira

BlueGene/L is a massively parallel cellular architecture system with a toroidal interconnect. Cellular architectures with a toroidal interconnect are effective at producing highly scalable computing systems, but typically require job partitions to be both rectangular and contiguous. These restrictions introduce fragmentation issues that affect the utilization of the system and the wait time and slowdown of queued jobs. We propose to solve these problems for the BlueGene/L system through scheduling algorithms that augment a baseline first come first serve (FCFS) scheduler. Restricting ourselves to space-sharing techniques, which constitute a simpler solution to the requirements of cellular computing, we present simulation results for migration and backfilling techniques on BlueGene/L. These techniques are explored individually and jointly to determine their impact on the system. Our results demonstrate that migration can be effective for a pure FCFS scheduler but that backfilling produces even more benefits. We also show that migration can be combined with backfilling to produce more opportunities to better utilize a parallel machine.
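
To make the backfilling idea concrete, here is a minimal EASY-style backfilling step layered on FCFS over a flat node count. It ignores BlueGene/L's rectangular, contiguous torus partitions and migration, and all job fields and values are illustrative.

```python
# Hypothetical sketch of one scheduling step: start the head job if it fits,
# otherwise backfill later jobs that both fit in the idle nodes and finish
# before the head job's reserved start time.

from collections import namedtuple

Job = namedtuple("Job", "name nodes runtime")

def schedule_step(queue, free_nodes, reserved_start_in):
    started = []
    if queue and queue[0].nodes <= free_nodes:
        started.append(queue.pop(0))
        free_nodes -= started[-1].nodes
    else:
        for job in list(queue[1:]):  # consider jobs behind the blocked head job
            if job.nodes <= free_nodes and job.runtime <= reserved_start_in:
                queue.remove(job)
                started.append(job)
                free_nodes -= job.nodes
    return started, free_nodes

queue = [Job("A", 48, 10), Job("B", 8, 2), Job("C", 4, 1)]
print(schedule_step(queue, free_nodes=16, reserved_start_in=3))  # B and C backfill ahead of A
```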


ACM Transactions on Storage | 2014

Agility and Performance in Elastic Distributed Storage

Lianghong Xu; James Cipar; Elie Krevat; Alexey Tumanov; Nitin Gupta; Michael Kozuch; Gregory R. Ganger

Elastic storage systems can be expanded or contracted to meet current demand, allowing servers to be turned off or used for other tasks. However, the usefulness of an elastic distributed storage system is limited by its agility: how quickly it can increase or decrease its number of servers. Due to the large amount of data they must migrate during elastic resizing, state-of-the-art designs usually have to make painful trade-offs among performance, elasticity, and agility. This article describes the state of the art in elastic storage and a new system, called SpringFS, that can quickly change its number of active servers, while retaining elasticity and performance goals. SpringFS uses a novel technique, termed bounded write offloading, that restricts the set of servers where writes to overloaded servers are redirected. This technique, combined with the read offloading and passive migration policies used in SpringFS, minimizes the work needed before deactivation or activation of servers. Analysis of real-world traces from Hadoop deployments at Facebook and various Cloudera customers and experiments with the SpringFS prototype confirm SpringFS’s agility, show that it reduces the amount of data migrated for elastic resizing by up to two orders of magnitude, and show that it cuts the percentage of active servers required by 67--82%, outdoing state-of-the-art designs by 6--120%.
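
The bounded write offloading idea described above can be sketched as a simple routing decision: writes whose primary server is overloaded (or slated for deactivation) are redirected, but only into a small, fixed offload set, so little data must be cleaned up before servers can be turned off. The names, hash-based choice, and layout below are assumptions for illustration, not SpringFS's implementation.

```python
# Hypothetical illustration of bounded write offloading.
import zlib

def route_write(block_id, primary_for, overloaded, offload_set):
    """Return the server that should absorb this write."""
    primary = primary_for(block_id)
    if primary not in overloaded:
        return primary                     # normal path: write in place
    targets = sorted(offload_set)          # bounded set of offload servers
    return targets[zlib.crc32(block_id.encode()) % len(targets)]

primary_for = lambda b: f"s{zlib.crc32(b.encode()) % 8}"  # assumed 8-server layout
print(route_write("blk-42", primary_for, overloaded={"s3", "s6"}, offload_set={"s0", "s1"}))
```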


Measurement and Modeling of Computer Systems | 2011

Applying idealized lower-bound runtime models to understand inefficiencies in data-intensive computing

Elie Krevat; Tomer Shiran; Eric Anderson; Joseph Tucek; Jay J. Wylie; Gregory R. Ganger

“Data-intensive scalable computing” (DISC) refers to a rapidly growing style of computing characterized by its reliance on large and expanding datasets [3]. Driven by the desire and capability to extract insight from such datasets, DISC is quickly emerging as a major activity of many organizations. Map-reduce style programming frameworks such as MapReduce [4] and Hadoop [1] support DISC activities by providing abstractions and frameworks to more easily scale data-parallel computations over commodity machines.

In the pursuit of scale, popular map-reduce frameworks neglect efficiency as an important metric. Anecdotal experiences indicate that they neither achieve balance nor full goodput of hardware resources, effectively wasting a large fraction of the computers over which jobs are scaled. If these inefficiencies are real, the same work could be completed at much lower costs. An ideal run would provide maximum scalability for a given computation without wasting resources. Given the widespread use and scale of DISC systems, it is important that we move closer to frameworks that are “hardware-efficient,” where the framework provides sufficient parallelism to keep the bottleneck resource fully utilized and makes good use of all I/O components. An important first step is to understand the degree, characteristics, and causes of inefficiency.

We have a simple model that predicts the idealized lower-bound runtime of a map-reduce workload by assuming an even data distribution, that data is perfectly pipelined through sequential operations, and that the underlying I/O resources are utilized at their full bandwidths whenever applicable. The model’s input parameters describe basic characteristics of the job (e.g., amount of input data), of the hardware (e.g., per-node disk and network throughputs), and of the framework configuration (e.g., replication factor). The output is the idealized runtime. The goal of the model is not to accurately predict the runtime of a job on any given system, but to indicate what the runtime theoretically should be. To focus the evaluation on the efficiency of the programming framework, and not the entire software stack, mea-
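
The summary above lists the model's inputs (job size, per-node disk and network throughputs, replication factor) and its output (an idealized runtime). A back-of-the-envelope version of such a lower bound is sketched below; the phase breakdown and default values are assumptions, not the paper's exact model.

```python
# Hypothetical idealized lower-bound runtime: with data spread evenly over n
# nodes and perfect pipelining, the job can finish no faster than its
# bottleneck I/O resource allows.

def ideal_runtime_s(input_gb, inter_gb, output_gb, nodes,
                    disk_mb_s=100, net_mb_s=100, replication=3):
    per_node_mb = lambda gb: gb * 1024 / nodes                  # data handled per node
    read_s = per_node_mb(input_gb) / disk_mb_s                  # map input off local disk
    shuffle_s = per_node_mb(inter_gb) / net_mb_s                # intermediate data over the network
    write_s = per_node_mb(output_gb) * replication / disk_mb_s  # replicated output back to disk
    return max(read_s, shuffle_s, write_s)                      # bottleneck sets the floor

print(f"{ideal_runtime_s(input_gb=1000, inter_gb=1000, output_gb=1000, nodes=50):.0f} s")  # 614 s
```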


Networked Systems Design and Implementation | 2011

Diagnosing performance changes by comparing request flows

Raja R. Sambasivan; Alice X. Zheng; Michael De Rosa; Elie Krevat; Spencer Whitman; Michael Stroucken; William Yang Wang; Lianghong Xu; Gregory R. Ganger


Hot Topics in Operating Systems | 2011

Disks are like snowflakes: no two are alike

Elie Krevat; Joseph Tucek; Gregory R. Ganger


File and Storage Technologies | 2014

SpringFS: bridging agility and performance in elastic distributed storage

Lianghong Xu; James Cipar; Elie Krevat; Alexey Tumanov; Nitin Gupta; Michael Kozuch; Gregory R. Ganger


Archive | 2010

Applying Performance Models to Understand Data-Intensive Computing Efficiency

Elie Krevat; Tomer Shiran; Eric Anderson; Joseph Tucek; Jay J. Wylie; Gregory R. Ganger

Collaboration


Dive into Elie Krevat's collaborations.

Top Co-Authors

Gregory R. Ganger, Carnegie Mellon University
Garth A. Gibson, Carnegie Mellon University
Vijay Vasudevan, Carnegie Mellon University
David G. Andersen, Carnegie Mellon University
Hiral Shah, Carnegie Mellon University
Lianghong Xu, Carnegie Mellon University
Alice X. Zheng, Carnegie Mellon University