Publication


Featured research published by Mor Harchol-Balter.


Measurement and Modeling of Computer Systems | 2009

Optimal power allocation in server farms

Anshul Gandhi; Mor Harchol-Balter; Rajarshi Das; Charles R. Lefurgy

Server farms today consume more than 1.5% of the total electricity in the U.S. at a cost of nearly $4.5 billion. Given the rising cost of energy, many industries are now seeking solutions for how to best make use of their available power. An important question which arises in this context is how to distribute available power among servers in a server farm so as to get maximum performance. By giving more power to a server, one can get higher server frequency (speed). Hence it is commonly believed that, for a given power budget, performance can be maximized by operating servers at their highest power levels. However, it is also conceivable that one might prefer to run servers at their lowest power levels, which allows more servers to be turned on for a given power budget. To fully understand the effect of power allocation on performance in a server farm with a fixed power budget, we introduce a queueing theoretic model, which allows us to predict the optimal power allocation in a variety of scenarios. Results are verified via extensive experiments on an IBM BladeCenter. We find that the optimal power allocation varies for different scenarios. In particular, it is not always optimal to run servers at their maximum power levels. There are scenarios where it might be optimal to run servers at their lowest power levels or at some intermediate power levels. Our analysis shows that the optimal power allocation is non-obvious and depends on many factors such as the power-to-frequency relationship in the processors, the arrival rate of jobs, the maximum server frequency, the lowest attainable server frequency and the server farm configuration. Furthermore, our theoretical model allows us to explore more general settings than we can implement, including arbitrarily large server farms and different power-to-frequency curves. Importantly, we show that the optimal power allocation can significantly improve server farm performance, by a factor of typically 1.4 and as much as a factor of 5 in some cases.
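
To make the tradeoff concrete, here is a minimal sketch that is not the paper's model: it assumes a made-up linear power-to-frequency curve and treats the farm as k independent M/M/1 queues with arrivals split evenly. The function names, the curve, and all numbers are illustrative assumptions.

```python
# Toy comparison of power allocations under a fixed budget.
# Assumptions (illustrative, not the paper's model): a linear power-to-frequency
# curve and a farm of k independent M/M/1 queues with arrivals split evenly.

def frequency(power_per_server, p_idle=100.0, p_max=250.0, f_max=3.0):
    """Hypothetical linear power-to-frequency curve (GHz), capped at f_max."""
    if power_per_server <= p_idle:
        return 0.0
    return min(f_max, f_max * (power_per_server - p_idle) / (p_max - p_idle))

def mean_response_time(num_servers, total_power, arrival_rate, jobs_per_ghz=1.0):
    """Mean response time when total_power is split evenly over num_servers."""
    mu = jobs_per_ghz * frequency(total_power / num_servers)  # per-server service rate
    lam = arrival_rate / num_servers                          # per-server arrival rate
    return float("inf") if mu <= lam else 1.0 / (mu - lam)    # M/M/1 response time

budget, lam = 1000.0, 2.0
for k in range(1, 9):          # a few fast servers vs. many slow ones
    print(k, round(mean_response_time(k, budget, lam), 3))
```

Under these made-up parameters the minimum lands at an intermediate number of servers, neither the fewest-fastest nor the most-slowest configuration, which is the kind of non-obvious optimum the abstract describes.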


High-Performance Computer Architecture | 2010

ATLAS: A scalable and high-performance scheduling algorithm for multiple memory controllers

Yoongu Kim; Dongsu Han; Onur Mutlu; Mor Harchol-Balter

Modern chip multiprocessor (CMP) systems employ multiple memory controllers to control access to main memory. The scheduling algorithm employed by these memory controllers has a significant effect on system throughput, so choosing an efficient scheduling algorithm is important. The scheduling algorithm also needs to be scalable: as the number of cores increases, the number of memory controllers shared by the cores should also increase to provide sufficient bandwidth to feed the cores. Unfortunately, previous memory scheduling algorithms are inefficient with respect to system throughput and/or are designed for a single memory controller and do not scale well to multiple memory controllers, requiring significant fine-grained coordination among controllers. This paper proposes ATLAS (Adaptive per-Thread Least-Attained-Service memory scheduling), a fundamentally new memory scheduling technique that improves system throughput without requiring significant coordination among memory controllers. The key idea is to periodically order threads based on the service they have attained from the memory controllers so far, and prioritize those threads that have attained the least service over others in each period. The idea of favoring threads with least-attained-service is borrowed from the queueing theory literature, where, in the context of a single-server queue, it is known that least-attained-service optimally schedules jobs, assuming a Pareto (or any decreasing hazard rate) workload distribution. After verifying that our workloads have this characteristic, we show that our implementation of least-attained-service thread prioritization reduces the time the cores spend stalling and significantly improves system throughput. Furthermore, since the periods over which we accumulate the attained service are long, the controllers coordinate very infrequently to form the ordering of threads, thereby making ATLAS scalable to many controllers. We evaluate ATLAS on a wide variety of multiprogrammed SPEC 2006 workloads and systems with 4–32 cores and 1–16 memory controllers, and compare its performance to five previously proposed scheduling algorithms. Averaged over 32 workloads on a 24-core system with 4 controllers, ATLAS improves instruction throughput by 10.8%, and system throughput by 8.4%, compared to PAR-BS, the best previous CMP memory scheduling algorithm. ATLAS's performance benefit increases as the number of cores increases.
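
A toy illustration of the periodic least-attained-service ranking at the heart of ATLAS, under the simplifying assumption that attained service is just a per-thread counter; the thread names and service values are made up.

```python
# Toy version of ATLAS's least-attained-service (LAS) ranking: at the end of each long
# quantum, threads are ordered by total memory service attained so far, and every
# controller prioritizes the least-served threads in the next quantum.

def rank_threads(attained_service):
    """Order thread ids from least to most attained service."""
    return sorted(attained_service, key=attained_service.get)

attained = {"t0": 5_000, "t1": 120, "t2": 40_000}   # e.g. cycles of memory service so far
print(rank_threads(attained))                       # ['t1', 't0', 't2'] -> t1 goes first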


ACM Transactions on Computer Systems | 2003

Size-based scheduling to improve web performance

Mor Harchol-Balter; Bianca Schroeder; Nikhil Bansal; Mukesh Agrawal

Is it possible to reduce the expected response time of every request at a web server, simply by changing the order in which we schedule the requests? That is the question we ask in this paper. This paper proposes a method for improving the performance of web servers servicing static HTTP requests. The idea is to give preference to requests for small files or requests with short remaining file size, in accordance with the SRPT (Shortest Remaining Processing Time) scheduling policy. The implementation is at the kernel level and involves controlling the order in which socket buffers are drained into the network. Experiments are executed both in a LAN and a WAN environment. We use the Linux operating system and the Apache and Flash web servers. Results indicate that SRPT-based scheduling of connections yields significant reductions in delay at the web server. These result in a substantial reduction in mean response time and mean slowdown for both the LAN and WAN environments. Significantly, and counter to intuition, the requests for large files are only negligibly penalized, or not penalized at all, as a result of SRPT-based scheduling.
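
A user-level sketch of the SRPT idea described above. The paper's implementation lives in the kernel's socket-buffer draining path, so the class below is only an illustration; the chunk size and class name are assumptions.

```python
# User-level sketch of SRPT-based draining of connections: always send the next chunk
# on the connection with the fewest bytes remaining.
import heapq

CHUNK = 1448  # bytes per send; an assumed MSS-like size

class SRPTScheduler:
    def __init__(self):
        self._heap = []                      # (remaining_bytes, connection_id)

    def add(self, conn_id, remaining_bytes):
        heapq.heappush(self._heap, (remaining_bytes, conn_id))

    def next_send(self):
        """Pick the connection with the shortest remaining size and send one chunk."""
        if not self._heap:
            return None
        remaining, conn = heapq.heappop(self._heap)
        sent = min(CHUNK, remaining)
        if remaining > sent:                 # data left: re-queue with its new priority
            heapq.heappush(self._heap, (remaining - sent, conn))
        return conn, sent

sched = SRPTScheduler()
sched.add("large", 500_000)
sched.add("small", 3_000)
print(sched.next_send())                     # ('small', 1448): the short request goes first
```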


Journal of Parallel and Distributed Computing | 1999

On Choosing a Task Assignment Policy for a Distributed Server System

Mor Harchol-Balter; Mark Crovella; Cristina D. Murta

We consider a distributed server system in which each host processes tasks in First-Come-First-Served order and each task's service demand is known immediately upon task arrival. We consider four task assignment policies commonly proposed for such distributed server systems: Round Robin; Random; Size-Based, in which all tasks within a given size range are assigned to a particular host; and Dynamic-Least-Work-Remaining, in which a task is assigned to the host with the least outstanding work. Using analysis and simulation, we explore the influence of task size variability on which task assignment policy is best. Surprisingly, we find that none of the above task assignment policies is best in all cases. In particular, we find that when the task sizes are not highly variable, the Dynamic policy is preferable. However, when task sizes show the degree of variability more characteristic of empirically measured workloads, the Size-Based policy is the best choice. We use the resulting observations to argue in favor of a specific size-based policy, SITA-E, that shows very good performance for realistic task size distributions.
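
A sketch of size-interval (SITA-style) assignment, assuming task sizes are known on arrival as in the paper's model. The cutoffs below are arbitrary; SITA-E would choose them so that each host receives an equal share of the expected load.

```python
# Sketch of size-interval task assignment: hosts own contiguous size ranges, and a task
# goes to the host whose range contains its known size. Cutoffs here are illustrative.
import bisect

def make_dispatcher(cutoffs):
    """cutoffs: sorted size boundaries; len(cutoffs) + 1 hosts are assumed."""
    def assign(task_size):
        return bisect.bisect_left(cutoffs, task_size)    # index of the target host
    return assign

assign = make_dispatcher([10.0, 100.0, 10_000.0])        # four hosts
print(assign(3.5), assign(50.0), assign(1e6))            # 0 1 3
```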


Principles of Distributed Computing | 1999

Resource discovery in distributed networks

Mor Harchol-Balter; Tom Leighton; Daniel M. Lewin

In large distributed networks of computers, it is often the case that a subset of machines wants to cooperate to perform a task. Before they can do so, these machines need to learn of the existence of each other. In this paper we are interested in distributed algorithms whereby machines in a network learn of other machines in the network by making queries to machines they already know. The algorithms should be efficient both in terms of the time required and in terms of the total network communication required until all machines have discovered all other machines. We propose a very simple algorithm called Name-Dropper whereby all machines learn about each other within O(log² n) rounds with high probability, where n is the number of machines in the network. The total number of connections required is O(n log² n) and the total number of pointers which must be communicated is O(n² log² n), with high probability. Each of the preceding bounds is optimal to within polylogarithmic factors.
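
A small simulation sketch of Name-Dropper as described in the abstract: in each round every machine picks one machine it already knows, uniformly at random, and sends it everything it knows. The initial ring topology, the stopping test, and the round counter are illustrative assumptions, not details from the paper.

```python
# Simulation sketch of a Name-Dropper-style resource-discovery round.
import random

def name_dropper_round(knows):
    """knows maps each machine to the set of machines it knows (including itself)."""
    incoming = {m: set() for m in knows}
    for m, neighbors in knows.items():
        targets = [x for x in neighbors if x != m]
        if targets:
            incoming[random.choice(targets)] |= neighbors   # "drop names" on one neighbor
    for m in knows:
        knows[m] |= incoming[m]

n = 64
knows = {i: {i, (i + 1) % n} for i in range(n)}   # start from a directed ring
rounds = 0
while any(len(s) < n for s in knows.values()):
    name_dropper_round(knows)
    rounds += 1
print(rounds)   # typically on the order of log^2 n rounds for this toy setup
```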


ACM Transactions on Computer Systems | 2012

AutoScale: Dynamic, Robust Capacity Management for Multi-Tier Data Centers

Anshul Gandhi; Mor Harchol-Balter; Ram Raghunathan; Michael Kozuch

Energy costs for data centers continue to rise, already exceeding $15 billion yearly. Sadly, much of this power is wasted. Servers are only busy 10-30% of the time on average, but they are often left on, while idle, utilizing 60% or more of peak power when in the idle state. We introduce a dynamic capacity management policy, AutoScale, that greatly reduces the number of servers needed in data centers driven by unpredictable, time-varying load, while meeting response time SLAs. AutoScale scales the data center capacity, adding or removing servers as needed. AutoScale has two key features: (i) it autonomically maintains just the right amount of spare capacity to handle bursts in the request rate; and (ii) it is robust not just to changes in the request rate of real-world traces, but also to changes in request size and server efficiency. We evaluate our dynamic capacity management approach via implementation on a 38-server multi-tier data center, serving a web site of the type seen in Facebook or Amazon, with a key-value store workload. We demonstrate that AutoScale vastly improves upon existing dynamic capacity management policies with respect to meeting SLAs and robustness.
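
A minimal sketch in the spirit of the capacity management described above: scale up promptly on a burst, but wait out a timeout before releasing surplus servers so that short dips in load do not cause churn. The thresholds, per-server capacity, and timeout below are illustrative assumptions, not the paper's policy or parameters.

```python
# Toy capacity controller: fast scale-up, delayed scale-down.

class CapacityController:
    def __init__(self, jobs_per_server=100.0, spare_fraction=0.2, idle_timeout=120.0):
        self.jobs_per_server = jobs_per_server
        self.spare_fraction = spare_fraction
        self.idle_timeout = idle_timeout
        self.active = 1
        self.idle_since = None   # time at which surplus capacity first appeared

    def step(self, now, request_rate):
        needed = max(1, int(request_rate * (1 + self.spare_fraction) / self.jobs_per_server) + 1)
        if needed > self.active:                 # scale up immediately on a burst
            self.active = needed
            self.idle_since = None
        elif needed < self.active:               # scale down only after the timeout expires
            if self.idle_since is None:
                self.idle_since = now
            elif now - self.idle_since >= self.idle_timeout:
                self.active = needed
                self.idle_since = None
        else:
            self.idle_since = None
        return self.active

ctrl = CapacityController()
for t, rate in [(0, 900), (60, 300), (120, 300), (300, 300)]:
    print(t, ctrl.step(t, rate))   # 11 servers during the burst, 4 only after the timeout
```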


Measurement and Modeling of Computer Systems | 2003

Classifying scheduling policies with respect to unfairness in an M/GI/1

Adam Wierman; Mor Harchol-Balter

It is common to evaluate scheduling policies based on their mean response times. Another important, but sometimes opposing, performance metric is a scheduling policy's fairness. For example, a policy that biases towards small job sizes so as to minimize mean response time may end up being unfair to large job sizes. In this paper we define three types of unfairness and demonstrate large classes of scheduling policies that fall into each type. We end with a discussion on which jobs are the ones being treated unfairly.
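
A hedged sketch of the slowdown-based view of fairness used in this line of work: in an M/GI/1 with load rho, every job size sees expected slowdown 1/(1 - rho) under Processor-Sharing, so a job size is commonly viewed as treated unfairly when its expected slowdown exceeds that benchmark. The FCFS numbers below are illustrative, not results from the paper.

```python
# Compare per-size expected slowdown E[T(x)]/x under Processor-Sharing and FCFS.

def ps_slowdown(x, rho):
    return 1.0 / (1.0 - rho)        # constant in x: PS treats every size alike

def fcfs_slowdown(x, mean_wait):
    return (x + mean_wait) / x      # small jobs pay the full size-independent wait

rho, mean_wait = 0.8, 4.0           # illustrative load and mean queueing delay
for x in (0.1, 1.0, 10.0):
    print(x, ps_slowdown(x, rho), round(fcfs_slowdown(x, mean_wait), 2))
```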


Measurement and Modeling of Computer Systems | 1996

Exploiting process lifetime distributions for dynamic load balancing

Mor Harchol-Balter; Allen B. Downey

We measure the distribution of lifetimes for UNIX processes and propose a functional form that fits this distribution well. We use this functional form to derive a policy for preemptive migration, and then use a trace-driven simulator to compare our proposed policy with other preemptive migration policies, and with a non-preemptive load balancing strategy. We find that, contrary to previous reports, the performance benefits of preemptive migration are significantly greater than those of non-preemptive migration, even when the memory-transfer cost is high. Using a model of migration costs representative of current systems, we find that preemptive migration reduces the mean delay (queueing and migration) by 35-50%, compared to non-preemptive migration.
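
A hedged sketch of the reasoning a heavy-tailed lifetime distribution enables: if Pr[lifetime > t] falls off roughly like 1/t, a process of age T is about as likely as not to run for at least another T seconds, so migrating an old process pays off when the migration cost is small relative to its age. The 1x threshold and function name below are illustrative choices, not the paper's exact policy.

```python
# Toy migration decision under a ~1/t lifetime tail.

def worth_migrating(age_seconds, migration_cost_seconds, threshold=1.0):
    """Migrate old processes: their likely remaining lifetime scales with their age."""
    likely_remaining = age_seconds   # median remaining lifetime ~ current age for a 1/t tail
    return likely_remaining >= threshold * migration_cost_seconds

print(worth_migrating(age_seconds=0.5, migration_cost_seconds=2.0))   # False: too young
print(worth_migrating(age_seconds=30.0, migration_cost_seconds=2.0))  # True
```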


Performance Evaluation | 2006

Closed form solutions for mapping general distributions to quasi-minimal PH distributions

Takayuki Osogami; Mor Harchol-Balter



International Conference on Data Engineering | 2006

How to Determine a Good Multi-Programming Level for External Scheduling

Bianca Schroeder; Mor Harchol-Balter; Arun Iyengar; Erich M. Nahum; Adam Wierman


Collaboration


Dive into Mor Harchol-Balter's collaborations.

Top Co-Authors

Adam Wierman
California Institute of Technology

Varun Gupta
Carnegie Mellon University

Sherwin Doroudi
Carnegie Mellon University

Kristen Gardner
Carnegie Mellon University