
Publications


Featured research published by Larry Rudolph.


Job Scheduling Strategies for Parallel Processing | 1997

Theory and Practice in Parallel Job Scheduling

Dror G. Feitelson; Larry Rudolph; Uwe Schwiegelshohn; Kenneth C. Sevcik; Parkson Wong

The scheduling of jobs on parallel supercomputers is becoming the subject of much research. However, there is concern about the divergence of theory and practice. We review theoretical research in this area and make recommendations based on recent results. This is contrasted with a proposal for standard interfaces among the components of a scheduling system that has grown from requirements in the field.


The Journal of Supercomputing | 2004

Dynamic Partitioning of Shared Cache Memory

G.E. Suh; Larry Rudolph; Srinivas Devadas

This paper proposes dynamic cache partitioning amongst simultaneously executing processes/threads. We present a general partitioning scheme that can be applied to set-associative caches. Since the memory reference characteristics of processes/threads can change over time, our method collects the cache miss characteristics of processes/threads at run-time. The workload is likewise determined at run-time by the operating system scheduler. Our scheme combines this information and partitions the cache amongst the executing processes/threads. Partition sizes are varied dynamically to reduce the total number of misses. The partitioning scheme has been evaluated using a processor simulator modeling a two-processor CMP system. The results show that the scheme can improve the total IPC significantly over the standard least recently used (LRU) replacement policy. In one case, partitioning doubles the total IPC over standard LRU. Our results show that smart cache management and scheduling are essential to achieve high performance with shared cache memory.
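
To make the mechanism concrete, here is a minimal sketch of greedy, marginal-gain-driven way partitioning, assuming the run-time monitor supplies, per thread, the extra hits gained from each additional cache way. The function name, data layout, and counter values are ours for illustration, not the paper's.

```python
def partition_ways(marginal_gains: dict[str, list[int]], total_ways: int) -> dict[str, int]:
    """Hand out cache ways one at a time, always to the thread whose
    measured marginal gain (extra hits from one more way) is largest."""
    alloc = {p: 0 for p in marginal_gains}
    for _ in range(total_ways):
        best = max(alloc, key=lambda p: marginal_gains[p][alloc[p]]
                   if alloc[p] < len(marginal_gains[p]) else 0)
        alloc[best] += 1
    return alloc

# Hypothetical run-time counters for two threads sharing a 4-way cache.
gains = {"A": [900, 400, 100, 20], "B": [600, 550, 500, 450]}
print(partition_ways(gains, total_ways=4))  # {'A': 1, 'B': 3}
```

Re-running the allocation as the counters change over time is what makes the partition dynamic.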


High-Performance Computer Architecture | 2002

A new memory monitoring scheme for memory-aware scheduling and partitioning

G.E. Suh; Srinivas Devadas; Larry Rudolph

We propose a low-overhead, online memory monitoring scheme utilizing a set of novel hardware counters. The counters indicate the marginal gain in cache hits as the size of the cache is increased, which gives the cache miss-rate as a function of cache size. Using the counters, we describe a scheme that enables an accurate estimate of the isolated miss-rates of each process as a function of cache size under the standard LRU replacement policy. This information can be used to schedule jobs or to partition the cache to minimize the overall miss-rate. The data collected by the monitors can also be used by an analytical model of cache and memory behavior to produce a more accurate overall miss-rate for the collection of processes sharing a cache in both time and space. This overall miss-rate can be used to improve scheduling and partitioning schemes.
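
Since the counters report marginal gains, the miss-rate curve falls out of a running sum. A small sketch with invented numbers, using a software stand-in for the hardware counters:

```python
def miss_rate_curve(marginal_hits: list[int], total_refs: int) -> list[float]:
    """marginal_hits[k] = extra hits observed when the cache grows from
    size k to k+1 (e.g. one more way under LRU). The miss rate at size s
    is 1 minus the cumulative hit fraction up to s."""
    rates, hits = [], 0
    for extra in marginal_hits:
        hits += extra
        rates.append(1.0 - hits / total_refs)
    return rates

print(miss_rate_curve([500, 200, 100, 50], total_refs=1000))
# [0.5, 0.3, 0.2, 0.15]
```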


Job Scheduling Strategies for Parallel Processing | 2004

Parallel job scheduling — a status report

Dror G. Feitelson; Larry Rudolph; Uwe Schwiegelshohn

The popularity of research on the scheduling of parallel jobs demands a periodic review of the status of the field. Indeed, several surveys have been written on this topic in the context of parallel supercomputers [17, 20]. The purpose of the present paper is to update that material, and to extend it to include work concerning clusters and the grid.


Journal of Parallel and Distributed Computing | 1992

Gang scheduling performance benefits for fine-grain synchronization

Dror G. Feitelson; Larry Rudolph

Multiprogrammed multiprocessors executing fine-grain parallel programs appear to require new scheduling policies. A promising new idea is gang scheduling, where a set of threads are scheduled to execute simultaneously on a set of processors. This has the intuitive appeal of supplying the threads with an environment that is very similar to a dedicated machine. It allows the threads to interact efficiently by using busy waiting, without the risk of waiting for a thread that currently is not running. Without gang scheduling, threads have to block in order to synchronize, thus suffering the overhead of a context switch. While this is tolerable in coarse-grain computations, and might even lead to performance benefits if the threads are highly unbalanced, it causes severe performance degradation in the fine-grain case. We have developed a model to evaluate the performance of different combinations of synchronization mechanisms and scheduling policies, and validated it by an implementation on the Makbilan multiprocessor. The model leads to the conclusion that gang scheduling is required for efficient fine-grain synchronization on multiprogrammed multiprocessors.
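
A toy scheduler makes the idea tangible: every time slot runs all threads of one job simultaneously, so busy-waiting peers are guaranteed to be executing at the same time. This is an illustrative sketch with made-up job sizes, not the Makbilan implementation.

```python
from itertools import cycle

def gang_schedule(jobs: dict[str, int], processors: int, slots: int) -> None:
    """jobs maps job name -> thread count. Rotate whole gangs through
    time slots; a gang larger than the machine is simply skipped here."""
    gangs = cycle([name for name, n in jobs.items() if n <= processors])
    for slot in range(slots):
        job = next(gangs)
        print(f"slot {slot}: {[f'{job}.t{i}' for i in range(jobs[job])]}")

gang_schedule({"fft": 4, "solver": 3}, processors=4, slots=4)
# slot 0: all four fft threads; slot 1: all three solver threads; ...
```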


International Colloquium on Automata, Languages and Programming | 1990

A complexity theory of efficient parallel algorithms

Clyde P. Kruskal; Larry Rudolph; Marc Snir

Theoretical research on parallel algorithms has focused on NC theory. This motivates the development of parallel algorithms that are extremely fast, but possibly wasteful in their use of processors. Such algorithms seem of limited interest for real applications currently run on parallel computers. This paper explores an alternative approach that emphasizes the efficiency of parallel algorithms. We define a complexity class PE of problems that can be solved by parallel algorithms that are efficient (the speedup is proportional to the number of processors used) and polynomially faster than sequential algorithms. Other complexity classes are also defined, in terms of time and efficiency: A class that has a slightly weaker efficiency requirement than PE, and a class that is a natural generalization of NC. We investigate the relationship between various models of parallel computation, using a newly defined concept of efficient simulation. This includes new models that reflect asynchrony and high communication latency in parallel computers. We prove that the class PE is invariant across the shared memory models (PRAMs) and fully connected message passing machines. These results show that our definitions are robust. Many open problems motivated by our approach are listed.
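
In the abstract's terms, membership in PE imposes two conditions at once. The following restatement uses our own notation, so take the symbols as a paraphrase rather than the authors' exact definitions.

```latex
% Let T(n) be the best sequential running time, and suppose a parallel
% algorithm solves the problem in time t(n) using p(n) processors.
% Efficient: the processor-time product stays within a constant factor
% of the sequential work, i.e. speedup proportional to processor count:
\[ p(n) \cdot t(n) = O\bigl(T(n)\bigr) \]
% Polynomially faster: the parallel time beats the sequential time by a
% polynomial factor, for some fixed \varepsilon > 0:
\[ t(n) = O\bigl(T(n)^{1-\varepsilon}\bigr) \]
% PE is the class of problems admitting an algorithm that meets both.
```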


ACM Symposium on Parallel Algorithms and Architectures | 1991

A simple load balancing scheme for task allocation in parallel machines

Larry Rudolph; Miriam Slivkin-Allalouf; Eli Upfal

A collection of local workpiles (task queues) and a simple load balancing scheme is well suited for scheduling tasks in shared memory parallel machines. Task scheduling on such machines has usually been done through a single, globally accessible workpile. The scheme introduced in this paper achieves a balance comparable to that of a global workpile while minimizing the overheads. In many parallel computer architectures, each processor has some memory that it can access more efficiently, and so it is desirable that tasks do not migrate frequently. The load balancing is simple and distributed: whenever a processor accesses its local workpile, it performs a balancing operation with probability inversely proportional to the size of its workpile. The balancing operation consists of examining the workpile of a random processor and exchanging tasks so as to equalize the size of the two workpiles. A probabilistic analysis of the scheme's performance proves that each task in the system receives its fair share of computation time. Specifically, the expected size of each local task queue is within a small constant factor of the average, i.e., the total number of tasks in the system divided by the number of processors.
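
The balancing rule itself is short enough to sketch directly. This is our rendering of the rule described above; the data layout and the handling of empty queues are simplifying assumptions.

```python
import random

def maybe_balance(queues: list[list], me: int) -> None:
    """Called whenever processor `me` touches its local workpile.
    With probability inversely proportional to the workpile's size
    (taken as 1 when it is empty), examine a random other processor's
    workpile and split the combined tasks evenly between the two."""
    own = queues[me]
    if own and random.random() >= 1.0 / len(own):
        return  # long queues balance only rarely
    other = queues[random.choice([i for i in range(len(queues)) if i != me])]
    merged = own + other
    own[:], other[:] = merged[: len(merged) // 2], merged[len(merged) // 2:]

queues = [[1, 2, 3, 4, 5, 6], [], [7]]
maybe_balance(queues, me=1)  # the idle processor always attempts to balance
```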


Job Scheduling Strategies for Parallel Processing | 1995

Parallel Job Scheduling: Issues and Approaches

Dror G. Feitelson; Larry Rudolph

Parallel job scheduling is beginning to gain recognition as an important topic that is distinct from the scheduling of tasks within a parallel job by the programmer or runtime system. The main issue is how to share the resources of the parallel machine among a number of competing jobs, giving each the required level of service. This level of scheduling is done by the operating system. The four most commonly used or advocated techniques are a global queue, variable partitioning, dynamic partitioning, and gang scheduling. These techniques are surveyed, and the benefits and shortcomings of each are identified. Then additional requirements that are not addressed by current systems are outlined, followed by considerations for evaluating various scheduling schemes.


IEEE Computer | 1990

Distributed hierarchical control for parallel processing

Dror G. Feitelson; Larry Rudolph

A description is given of a novel design, using a hierarchy of controllers, that effectively controls a multiuser, multiprogrammed parallel system. Such a structure allows dynamic repartitioning according to changing job requirements. The design goals are examined, and the principles of distributed hierarchical control are presented. Control over processors is discussed. Mapping and load balancing with distributed hierarchical control are considered. Support for gang scheduling as well as availability and fault tolerance is addressed. The use of distributed hierarchical control in memory management and I/O is discussed.
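
One way to picture the design is a tree of controllers in which each node manages the processors of its subtree, and a job is mapped onto the smallest subtree that can hold it, keeping repartitioning local. This toy sketch is our reading of the idea; the fan-out and mapping policy are assumptions.

```python
class Controller:
    """A node in the control hierarchy, responsible for a processor set."""
    def __init__(self, procs: list[int], children: tuple = ()):
        self.procs, self.children = procs, list(children)

    def map_job(self, size: int) -> list[int] | None:
        # Prefer a child controller that can handle the job alone.
        for child in self.children:
            placed = child.map_job(size)
            if placed is not None:
                return placed
        # Otherwise handle it at this level, if the subtree is big enough.
        return self.procs[:size] if size <= len(self.procs) else None

root = Controller([0, 1, 2, 3], (Controller([0, 1]), Controller([2, 3])))
print(root.map_job(2))  # [0, 1]    -- stays within one leaf controller
print(root.map_job(3))  # [0, 1, 2] -- escalated to the root controller
```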


Job Scheduling Strategies for Parallel Processing | 1996

Towards Convergence in Job Schedulers for Parallel Supercomputers

Dror G. Feitelson; Larry Rudolph

The space of job schedulers for parallel supercomputers is rather fragmented, because different researchers tend to make different assumptions about the goals of the scheduler, the information that is available about the workload, and the operations that the scheduler may perform. We argue that by identifying these assumptions explicitly, it is possible to reach a level of convergence. For example, it is possible to unite most of the different assumptions into a common framework by associating a suitable cost function with the execution of each job. The cost function reflects knowledge about the job and the degree to which it fits the goals of the system. Given such cost functions, scheduling is done to maximize the system's profit.
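
To make the framework concrete, here is a toy greedy scheduler that ranks runnable jobs by profit per processor-second. The value numbers and the density heuristic are our invention, not the paper's proposal.

```python
def pick_next(jobs: list[dict], free_procs: int) -> dict | None:
    """Among jobs that fit in the free processors, start the one whose
    cost function yields the highest profit density for the system."""
    fitting = [j for j in jobs if j["procs"] <= free_procs]
    if not fitting:
        return None
    return max(fitting, key=lambda j: j["value"] / (j["procs"] * j["runtime"]))

jobs = [
    {"name": "sim",    "procs": 8, "runtime": 100, "value": 4000},  # 5.0 per proc-sec
    {"name": "render", "procs": 2, "runtime": 50,  "value": 1500},  # 15.0 per proc-sec
]
print(pick_next(jobs, free_procs=8)["name"])  # render
```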

Collaboration


Dive into Larry Rudolph's collaborations.

Top Co-Authors

Dror G. Feitelson (Hebrew University of Jerusalem)
Srinivas Devadas (Massachusetts Institute of Technology)
Albert S. Huang (Massachusetts Institute of Technology)
Uwe Schwiegelshohn (Technical University of Dortmund)
Arvind (Massachusetts Institute of Technology)
Derek Chiou (University of Texas at Austin)
Iaakov Exman (Hebrew University of Jerusalem)
Dror Zernik (Hebrew University of Jerusalem)