
Publication


Featured research published by Thomas J. LeBlanc.


Symposium on Operating Systems Principles | 1991

First-class user-level threads

Brian D. Marsh; Michael L. Scott; Thomas J. LeBlanc; Evangelos P. Markatos

It is often desirable, for reasons of clarity, portability, and efficiency, to write parallel programs in which the number of processes is independent of the number of available processors. Several modern operating systems support more than one process in an address space, but the overhead of creating and synchronizing kernel processes can be high. Many runtime environments implement lightweight processes (threads) in user space, but this approach usually results in second-class status for threads, making it difficult or impossible to perform scheduling operations at appropriate times (e.g., when the current thread blocks in the kernel). In addition, a lack of common assumptions may also make it difficult for parallel programs or library routines that use dissimilar thread packages to communicate with each other, or to synchronize access to shared data.

We describe a set of kernel mechanisms and conventions designed to accord first-class status to user-level threads, allowing them to be used in any reasonable way that traditional kernel-provided processes can be used, while leaving the details of their implementation to user-level code. The key features of our approach are (1) shared memory for asynchronous communication between the kernel and the user, (2) software interrupts for events that might require action on the part of a user-level scheduler, and (3) a scheduler interface convention that facilitates interactions in user space between dissimilar kinds of threads. We have incorporated these mechanisms in the Psyche parallel operating system, and have used them to implement several different kinds of user-level threads. We argue for our approach in terms of both flexibility and performance.
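
The interplay of shared memory and software interrupts is easiest to picture in miniature. The C sketch below is not the Psyche interface: a POSIX signal stands in for a software interrupt, a plain struct stands in for the kernel/user shared page, and all names are invented for illustration.

```c
/*
 * Hypothetical miniature of the mechanism, NOT the Psyche interface:
 * a POSIX signal stands in for a software interrupt, and a plain C
 * struct stands in for the page shared between kernel and user.
 */
#include <signal.h>
#include <stdio.h>

/* "Shared page": the kernel posts events here, and the user-level
 * scheduler reads them without making a system call. */
struct shared_page {
    volatile sig_atomic_t thread_blocked;  /* current thread blocked in kernel */
    volatile sig_atomic_t quantum_expired; /* scheduling quantum ran out       */
};

static struct shared_page page;

/* Software-interrupt handler: the user-level scheduler runs here and can
 * dispatch another ready thread instead of letting the processor idle. */
static void scheduler_interrupt(int sig)
{
    (void)sig;
    if (page.thread_blocked)
        printf("scheduler: current thread blocked; switching to another thread\n");
    if (page.quantum_expired)
        printf("scheduler: quantum expired; preempting current thread\n");
}

int main(void)
{
    signal(SIGUSR1, scheduler_interrupt);

    /* The "kernel" records an event in shared memory, then delivers the
     * software interrupt so the user-level scheduler can react. */
    page.thread_blocked = 1;
    raise(SIGUSR1);

    page.thread_blocked = 0;
    page.quantum_expired = 1;
    raise(SIGUSR1);
    return 0;
}
```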


Architectural Support for Programming Languages and Operating Systems | 1989

A software instruction counter

John M. Mellor-Crummey; Thomas J. LeBlanc

Although several recent papers have proposed architectural support for program debugging and profiling, most processors do not yet provide even basic facilities, such as an instruction counter. As a result, system developers have been forced to invent software solutions. This paper describes our implementation of a software instruction counter for program debugging. We show that an instruction counter can be reasonably implemented in software, often with less than 10% execution overhead. Our experience suggests that a hardware instruction counter is not necessary for a practical implementation of watchpoints and reverse execution; however, it would make program instrumentation much easier for the system developer.
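
As a rough illustration of the counting idea (hand-instrumented here, where a real system would instrument code automatically): a counter incremented at every loop back-edge, combined with the program counter, uniquely identifies a point in an execution, so a hypothetical watchpoint can fire when the counter reaches a target value.

```c
/*
 * Illustrative sketch, not the paper's implementation: between two
 * backward branches, execution is determined by the program counter,
 * so a counter bumped at each back-edge pinpoints an execution state.
 * The instrumentation below is written out by hand for clarity.
 */
#include <stdio.h>

static unsigned long sic;          /* software instruction counter        */
static unsigned long watch_at = 7; /* hypothetical watchpoint trigger     */

static void sic_tick(const char *where)
{
    if (++sic == watch_at)
        printf("watchpoint: counter %lu reached at %s\n", sic, where);
}

int main(void)
{
    int sum = 0;
    for (int i = 0; i < 5; i++) {
        sum += i;
        sic_tick("loop-1 back-edge");   /* instrumented back-edge */
    }
    for (int i = 0; i < 5; i++) {
        sum *= 2;
        sic_tick("loop-2 back-edge");   /* instrumented back-edge */
    }
    printf("sum = %d, back-edges executed = %lu\n", sum, sic);
    return 0;
}
```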


Journal of Parallel and Distributed Computing | 1990

Analyzing parallel program executions using multiple views

Thomas J. LeBlanc; John M. Mellor-Crummey; Robert J. Fowler

To understand a parallel program's execution we must be able to analyze a large amount of information describing complex relationships among many processes. Various techniques have been used, from program replay to program animation, but each has limited applicability, and the lack of a common foundation precludes an integrated solution. Our approach to parallel program analysis is based on a multiplicity of views of an execution. We use a synchronization trace captured during execution to construct a graph representation of the program's behavior. A user manipulates this representation to create and fine-tune visualizations using an integrated, programmable toolkit. Additional execution details can be recovered as needed using program replay to reconstruct an execution from an existing synchronization trace. We present a framework for describing views of a parallel program's execution, and an analysis methodology that relates a sequence of views to the program development cycle. We then describe our toolkit implementation and explain how users construct visualizations using the toolkit. Finally, we present an extended example to illustrate both our methodology and the power of our programmable toolkit.
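
A minimal sketch of the trace-to-graph step, using an invented trace format rather than the toolkit's own: nodes are trace events, and edges record program order within a process plus send/receive synchronization across processes.

```c
/*
 * Conceptual sketch with an invented trace format: build graph edges
 * from a synchronization trace. kind: 0 = send, 1 = receive.
 */
#include <stdio.h>

struct event { int proc, kind, partner, matched; };

int main(void)
{
    /* A tiny two-process synchronization trace (invented data). */
    struct event trace[] = {
        { 0, 0, 1, 0 },   /* e0: P0 sends to P1 */
        { 1, 1, 0, 0 },   /* e1: P1 receives e0 */
        { 0, 0, 1, 0 },   /* e2: P0 sends again */
        { 1, 1, 0, 0 },   /* e3: P1 receives e2 */
    };
    int n = sizeof trace / sizeof trace[0];
    int last[2] = { -1, -1 };   /* previous event index per process */

    for (int i = 0; i < n; i++) {
        if (last[trace[i].proc] >= 0)   /* program-order edge */
            printf("edge e%d -> e%d (program order on P%d)\n",
                   last[trace[i].proc], i, trace[i].proc);
        last[trace[i].proc] = i;
        if (trace[i].kind == 1)  /* match receive to earliest unmatched send */
            for (int j = 0; j < i; j++)
                if (!trace[j].matched && trace[j].kind == 0 &&
                    trace[j].proc == trace[i].partner &&
                    trace[j].partner == trace[i].proc) {
                    trace[j].matched = 1;
                    printf("edge e%d -> e%d (synchronization)\n", j, i);
                    break;
                }
    }
    return 0;
}
```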


Conference on High Performance Computing (Supercomputing) | 1992

Using processor affinity in loop scheduling on shared-memory multiprocessors

Evangelos P. Markatos; Thomas J. LeBlanc

The authors consider a new dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to nonlocal data. It is shown that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors. The authors propose a loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and colocate loop iterations with the necessary data. They compare the performance of this algorithm to that of other known algorithms using four representative applications on a Silicon Graphics multiprocessor workstation, a BBN Butterfly, and a Sequent Symmetry, and they show that the algorithm offers substantial performance improvements, up to a factor of 3 in some cases. They conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data, particularly in light of the increasing disparity between processor and memory speeds.
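
A pthreads sketch of the affinity idea, with a simplified steal policy and invented parameter names (the paper's algorithm and measurements are more refined): each processor repeatedly takes a fraction of the iterations in its own partition, so iterations tend to run where their data is cached, and turns to other partitions only when its own is exhausted.

```c
/*
 * Sketch in the spirit of affinity scheduling, not the authors' code:
 * iterations are partitioned per processor; local work is taken in
 * shrinking chunks, and stealing happens only after the local
 * partition empties.
 */
#include <pthread.h>
#include <stdio.h>

#define NPROC 4
#define NITER 1000

static struct { int next, last; pthread_mutex_t lock; } part[NPROC];

static void work(int i) { (void)i; /* touch data affine to this partition */ }

/* Take up to 1/NPROC of the remaining iterations in partition p. */
static int grab(int p, int *lo, int *hi)
{
    pthread_mutex_lock(&part[p].lock);
    int remaining = part[p].last - part[p].next;
    int chunk = remaining / NPROC > 0 ? remaining / NPROC : remaining;
    *lo = part[p].next;
    part[p].next += chunk;
    *hi = part[p].next;
    pthread_mutex_unlock(&part[p].lock);
    return chunk > 0;
}

static void *worker(void *arg)
{
    int me = (int)(long)arg, lo, hi, done = 0;

    while (grab(me, &lo, &hi))          /* local iterations first */
        for (int i = lo; i < hi; i++) { work(i); done++; }
    for (int p = 0; p < NPROC; p++)     /* then steal from others */
        while (grab(p, &lo, &hi))
            for (int i = lo; i < hi; i++) { work(i); done++; }
    printf("processor %d executed %d iterations\n", me, done);
    return NULL;
}

int main(void)
{
    pthread_t tid[NPROC];
    for (int p = 0; p < NPROC; p++) {
        part[p].next = p * (NITER / NPROC);
        part[p].last = (p + 1) * (NITER / NPROC);
        pthread_mutex_init(&part[p].lock, NULL);
    }
    for (int p = 0; p < NPROC; p++)
        pthread_create(&tid[p], NULL, worker, (void *)(long)p);
    for (int p = 0; p < NPROC; p++)
        pthread_join(tid[p], NULL);
    return 0;
}
```

Taking a fraction of the remaining local iterations at a time, rather than fixed chunks, is what lets such a scheme balance load while still preserving affinity.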


International Symposium on Computer Architecture | 1992

Adjustable block size coherent caches

Czarek Dubnicki; Thomas J. LeBlanc

Several studies have shown that the performance of coherent caches depends on the relationship between the granularity of sharing and locality exhibited by the program and the cache block size. Large cache blocks exploit processor and spatial locality, but may cause unnecessary cache invalidations due to false sharing. Small cache blocks can reduce the number of cache invalidations, but increase the number of bus or network transactions required to load data into the cache. In this paper we describe a cache organization that dynamically adjusts the cache block size according to recently observed reference behavior. Cache blocks are split across cache lines when false sharing occurs, and merged back into a single cache line to exploit spatial locality. To evaluate this cache organization, we simulate a scalable multiprocessor with coherent caches, using a suite of memory reference traces to model program behavior. We show that for every fixed block size, some program suffers a 33% increase in the average waiting time per reference, and a factor of 2 increase in the average number of words transferred per reference, when compared against the performance of an adjustable block size cache. In the few cases where adjusting the block size does not provide superior performance, it comes within 7% of the best fixed block size alternative. We conclude that an adjustable block size cache offers significantly better performance than every fixed block size cache, especially when there is variability in the granularity of sharing exhibited by applications.
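
The split/merge policy can be caricatured in a few lines. The sketch below tracks only the last reader and writer of each half of a two-sub-block line; the paper's hardware keeps richer state, so treat the structure and triggers here as invented.

```c
/*
 * Toy illustration of an adjustable-block policy (invented state and
 * thresholds): split a line when different processors write disjoint
 * halves (false sharing); merge when one processor touches both
 * halves (spatial locality).
 */
#include <stdio.h>

struct line {
    int split;           /* 0 = one big block, 1 = two small blocks */
    int last_writer[2];  /* last processor to write each half       */
    int last_reader[2];  /* last processor to read each half        */
};

static void on_write(struct line *l, int proc, int half)
{
    int other = 1 - half;
    /* Different processors writing different halves is false sharing;
     * splitting stops writes to one half invalidating the other. */
    if (!l->split && l->last_writer[other] >= 0 && l->last_writer[other] != proc) {
        l->split = 1;
        printf("P%d writes half %d while P%d owns half %d: split\n",
               proc, half, l->last_writer[other], other);
    }
    l->last_writer[half] = proc;
}

static void on_read(struct line *l, int proc, int half)
{
    int other = 1 - half;
    /* One processor touching both halves shows spatial locality;
     * merging lets a single miss fetch the whole line again. */
    if (l->split && l->last_reader[other] == proc) {
        l->split = 0;
        printf("P%d reads both halves: merge\n", proc);
    }
    l->last_reader[half] = proc;
}

int main(void)
{
    struct line l = { 0, { -1, -1 }, { -1, -1 } };
    on_write(&l, 0, 0);   /* P0 writes half 0                       */
    on_write(&l, 1, 1);   /* P1 writes half 1: false sharing, split */
    on_read(&l, 2, 0);    /* P2 reads half 0                        */
    on_read(&l, 2, 1);    /* P2 reads half 1 too: merge             */
    return 0;
}
```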


International Parallel and Distributed Processing Symposium | 1991

Multiprogramming on multiprocessors

Mark Crovella; Prakash Das; Cezary Dubnicki; Thomas J. LeBlanc; Evangelos P. Markatos

Many solutions have been proposed to the problem of multiprogramming a multiprocessor. However, each has limited applicability or fails to address an important source of overhead. In addition, there has been little experimental comparison of the various solutions in the presence of applications with varying degrees of parallelism and synchronization. The authors explore the tradeoffs between three different approaches to multiprogramming a multiprocessor: time-slicing, coscheduling, and dynamic hardware partitions. They implemented applications that vary in the degree of parallelism, and the frequency and type of synchronization. They show that in most cases coscheduling is preferable to time-slicing. They also show that although there are cases where coscheduling is beneficial, dynamic hardware partitions do no worse, and will often do better. They conclude that under most circumstances, hardware partitioning is the best strategy for multiprogramming a multiprocessor, no matter how much parallelism applications employ or how frequently synchronization occurs.
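
For concreteness, a toy version of dynamic hardware partitioning (the policy details here are invented, not taken from the paper): processors are divided evenly among the runnable jobs, and the division is recomputed on every job arrival or departure, instead of time-slicing processors among jobs.

```c
/*
 * Toy dynamic partitioner (invented policy): divide NPROC processors
 * evenly among the runnable jobs and repartition whenever the job
 * count changes.
 */
#include <stdio.h>

#define NPROC 12

static void repartition(int jobs)
{
    if (jobs == 0) { printf("machine idle\n"); return; }
    int base = NPROC / jobs, extra = NPROC % jobs;
    for (int j = 0; j < jobs; j++)
        printf("job %d gets %d processors\n", j, base + (j < extra));
    printf("--\n");
}

int main(void)
{
    repartition(1);  /* one job owns the whole machine     */
    repartition(3);  /* two more arrive: shrink partitions */
    repartition(2);  /* one departs: grow the survivors    */
    return 0;
}
```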


IEEE Transactions on Computers | 1985

HPC: A model of structure and change in distributed systems

Thomas J. LeBlanc; Stuart A. Friedberg

Distributed systems must provide certain fundamental facilities, including communication, protection, resource management, reliability and process (computation) abstraction. The authors describe the design of HPC, an object-oriented model of interprocess relationships for distributed systems which addresses all of these fundamental services. The major novelties of HPC lie in the extension of the process abstraction to collections of processes and the provision of a rich set of structuring mechanisms for distributed computations. An important aspect of the model is that it results in the ability to maintain and exploit execution context for managing processes in a distributed computation.
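
One way to picture the extension of the process abstraction to collections, using invented types rather than HPC's actual mechanisms: a computation forms a tree whose interior nodes are collections, so a single management operation can act on an entire subcomputation.

```c
/*
 * Hypothetical data-structure sketch, not HPC itself: processes and
 * collections form a tree, and operations apply to whole subtrees.
 */
#include <stdio.h>

struct node {
    const char *name;
    struct node *children[4];   /* NULL-terminated list of members */
};

/* Apply a management operation to a collection and everything in it. */
static void suspend(struct node *n, int depth)
{
    printf("%*ssuspend %s\n", depth * 2, "", n->name);
    for (int i = 0; i < 4 && n->children[i]; i++)
        suspend(n->children[i], depth + 1);
}

int main(void)
{
    struct node p1   = { "worker-1",    { NULL } };
    struct node p2   = { "worker-2",    { NULL } };
    struct node pool = { "worker-pool", { &p1, &p2 } };
    struct node app  = { "application", { &pool } };
    suspend(&app, 0);   /* one operation manages the whole computation */
    return 0;
}
```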


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 1990

Multi-model parallel programming in Psyche

Michael L. Scott; Thomas J. LeBlanc; Brian D. Marsh

Many different parallel programming models, including lightweight processes that communicate with shared memory and heavyweight processes that communicate with messages, have been used to implement parallel applications. Unfortunately, operating systems and languages designed for parallel programming typically support only one model. Multi-model parallel programming is the simultaneous use of several different models, both across programs and within a single program. This paper describes multi-model parallel programming in the Psyche multiprocessor operating system. We explain why multi-model programming is desirable and present an operating system interface designed to support it. Through a series of three examples, we illustrate how the Psyche operating system supports different models of parallelism and how the different models are able to interact.
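
A sketch of what a common scheduler interface convention might look like in C, with hypothetical names rather than the Psyche system interface: every thread package exports the same small operation table, so dissimilar threads can block and wake one another without knowing each other's implementation.

```c
/*
 * Hypothetical interface convention, not the Psyche calls: each thread
 * model fills in the same operation table, enabling cross-model
 * synchronization.
 */
#include <stdio.h>

struct sched_ops {
    const char *model;
    void (*block_current)(void);
    void (*unblock)(int thread_id);
};

static void lwp_block(void)    { printf("lightweight thread: yield to user scheduler\n"); }
static void lwp_unblock(int t) { printf("lightweight thread %d made runnable\n", t); }
static void hwp_block(void)    { printf("heavyweight process: block in kernel\n"); }
static void hwp_unblock(int t) { printf("heavyweight process %d signalled\n", t); }

static const struct sched_ops lwp_sched = { "user-level threads", lwp_block, lwp_unblock };
static const struct sched_ops hwp_sched = { "kernel processes",   hwp_block, hwp_unblock };

/* A lock shared across models needs only the common interface. */
static void wait_then_wake(const struct sched_ops *waiter,
                           const struct sched_ops *peer, int peer_thread)
{
    printf("[%s] ", waiter->model);
    waiter->block_current();
    printf("[%s] ", peer->model);
    peer->unblock(peer_thread);
}

int main(void)
{
    wait_then_wake(&lwp_sched, &hwp_sched, 3);
    wait_then_wake(&hwp_sched, &lwp_sched, 7);
    return 0;
}
```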


Measurement and Modeling of Computer Systems | 1996

Waiting time analysis and performance visualization in Carnival

Wagner Meira; Thomas J. LeBlanc; Alexandros Poulos

Waiting time (where one processor is blocked while waiting for another) arises from a variety of sources in parallel programs, including communication, synchronization, load imbalance, and resource contention. Many tools can identify portions of the source code where waiting time arises and measure it during execution, but the programmer must infer the underlying cause of waiting time from other measurements. Carnival is a performance measurement and analysis tool that automates this inference process. Using traces of program executions, the tool identifies the differences in execution paths leading up to a synchronization point, and explains waiting time to the user in terms of those differences. It also supports several different types of performance profiles, which can be used to isolate and quantify important sources of waiting time. We present algorithms for characterizing waiting time in terms of execution paths, and describe implementations on the IBM SP2 and the SGI Challenge. We also present the Carnival user interface, and illustrate the functionality of the interface and the usefulness of waiting time analysis by identifying and explaining the sources of overhead in example applications.
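
A simplified sketch of the attribution step, with invented data (Carnival's traces and algorithms are richer): the paths two processors took to reach a barrier are compared section by section, and the early arriver's waiting time is explained by the sections where the late arriver spent more time.

```c
/*
 * Simplified waiting-time attribution with invented data: explain why
 * P0 waited at a barrier by diffing the per-section times of the two
 * paths that led there.
 */
#include <stdio.h>

#define NSEG 3

struct segment { const char *name; double p0_time, p1_time; };

int main(void)
{
    /* Time each processor spent in each section before the barrier. */
    struct segment path[NSEG] = {
        { "compute loop",  10.0, 10.0 },
        { "communication",  2.0,  6.0 },
        { "local update",   1.0,  3.0 },
    };
    double waiting = 0.0;

    for (int i = 0; i < NSEG; i++) {
        double diff = path[i].p1_time - path[i].p0_time;
        if (diff > 0.0) {
            printf("%.1f units of P0's wait caused by '%s'\n", diff, path[i].name);
            waiting += diff;
        }
    }
    printf("P0 waited %.1f units at the barrier\n", waiting);
    return 0;
}
```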


International Parallel and Distributed Processing Symposium | 1992

Shared memory vs. message passing in shared-memory multiprocessors

Thomas J. LeBlanc; Evangelos P. Markatos

It is argued that the choice between the shared-memory and message-passing models depends on two factors: the relative cost of communication and computation as implemented by the hardware, and the degree of load imbalance inherent in the application. Two representative applications are used to illustrate the performance advantages of each programming model on several different shared-memory machines, including the BBN Butterfly, Sequent Symmetry, Encore Multimax and Silicon Graphics Iris multiprocessors. It is shown that applications implemented in the shared-memory model perform better on the previous generation of multiprocessors, while applications implemented in the message-passing model perform better on modern multiprocessors. It is argued that both models have performance advantages, and that the factors that influence the choice of model may not be known at compile-time. As a compromise solution, the authors propose an alternative programming model, which has the load balancing properties of the shared-memory model and the locality properties of the message-passing model, and show that this new model performs better than the other two alternatives.
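
A back-of-the-envelope cost model makes the tradeoff concrete; the parameters below are invented, not measurements from the paper. Fetching n words one at a time through shared memory costs roughly n*r (r = remote-access latency per word), while shipping them in one message costs m + n*w (m = fixed message overhead, w = per-word cost), so the better model depends on the machine's communication costs.

```c
/*
 * Invented cost model, for intuition only: compare n*r (word-at-a-time
 * shared-memory access) against m + n*w (one bulk message) as the
 * amount of communicated data grows.
 */
#include <stdio.h>

int main(void)
{
    double r = 5.0;    /* cycles per remote word access (shared memory) */
    double m = 200.0;  /* fixed per-message overhead (message passing)  */
    double w = 1.0;    /* cycles per word inside a message              */

    for (int n = 10; n <= 100; n += 30) {
        double shm = n * r, msg = m + n * w;
        printf("n=%3d  shared-memory=%6.0f  message-passing=%6.0f  -> %s\n",
               n, shm, msg, shm < msg ? "shared memory wins" : "message passing wins");
    }
    return 0;
}
```

With these illustrative numbers the crossover falls between 40 and 70 words, the kind of hardware-dependent break-even point the abstract describes.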

Collaboration


Dive into Thomas J. LeBlanc's collaborations.

Top Co-Authors

Prakash Das

University of Rochester

Robert P. Cook

University of Wisconsin-Madison