Publication


Featured research published by Chris J. Scheiman.


ACM Symposium on Parallel Algorithms and Architectures | 1995

LogGP: incorporating long messages into the LogP model—one step closer towards a realistic model for parallel computation

Albert Alexandrov; Mihai F. Ionescu; Klaus E. Schauser; Chris J. Scheiman

We present a new model of parallel computation---the LogGP model---and use it to analyze a number of algorithms, most notably, the single node scatter (one-to-all personalized broadcast). The LogGP model is an extension of the LogP model for parallel computation which abstracts the communication of fixed-sized short messages through the use of four parameters: the communication latency (L), overhead (o), bandwidth (g), and the number of processors (P). As evidenced by experimental data, the LogP model can accurately predict communication performance when only short messages are sent (as on the CM-5). However, many existing parallel machines have special support for long messages and achieve a much higher bandwidth for long messages compared to short messages (e.g., IBM SP-2, Paragon, Meiko CS-2, Ncube/2). We extend the basic LogP model with a linear model for long messages. This combination, which we call the LogGP model of parallel computation, has one additional parameter, G, which captures the bandwidth obtained for long messages. Experimental data collected on the Meiko CS-2 shows that this simple extension of the LogP model can quite accurately predict communication performance for both short and long messages. This paper discusses algorithm design and analysis under the new model, examining the all-to-all remap, FFT, and radix sort. We also examine, in more detail, the single node scatter problem. We derive solutions for this problem and prove their optimality under the LogGP model. These solutions are qualitatively different from those obtained under the simpler LogP model, reflecting the importance of capturing long messages in a model.
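
Under LogGP, the end-to-end time for a k-byte message is commonly written as o + (k - 1)G + L + o: sender overhead, per-byte gap for the remaining bytes, network latency, and receiver overhead. The sketch below compares that estimate with a LogP-style estimate that sends the same data as fixed-size packets; every parameter value is made up for illustration and is not a figure from the paper.

    # Illustrative LogP vs. LogGP cost estimates; all parameter values are made up.
    # L = latency, o = per-message overhead, g = gap between short messages,
    # G = per-byte gap for long messages (the parameter LogGP adds), w = packet size.

    def logp_time(k_bytes, w, L, o, g):
        """Deliver k_bytes as ceil(k/w) fixed-size packets under LogP
        (sender-limited by the gap, one common reading of the model)."""
        n = -(-k_bytes // w)                  # number of packets, rounded up
        return o + (n - 1) * max(g, o) + L + o

    def loggp_time(k_bytes, L, o, G):
        """Deliver one k-byte message under LogGP: o + (k - 1)G + L + o."""
        return o + (k_bytes - 1) * G + L + o

    L, o, g, G, w = 6.0, 2.0, 4.0, 0.03, 16   # hypothetical microseconds / bytes
    for k in (16, 1024, 65536):
        print(f"{k:>6} B  LogP {logp_time(k, w, L, o, g):9.1f} us"
              f"  LogGP {loggp_time(k, L, o, G):9.1f} us")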


Concurrency and Computation: Practice and Experience | 1997

SuperWeb: research issues in Java-based global computing

Albert Alexandrov; Maximilian Ibel; Klaus E. Schauser; Chris J. Scheiman

The Internet, in particular the World Wide Web, continues to expand at an amazing pace. We propose a new infrastructure, SuperWeb, to harness global resources, such as CPU cycles or disk storage, and make them available to every user on the Internet. SuperWeb has the potential for solving parallel supercomputing applications involving thousands of co-operating components on the Internet. However, we anticipate that initial implementations will be used inside large organizations with large heterogeneous intranets. Our approach is based on recent advances in Internet connectivity and the implementation of safe distributed computing realized by languages such as Java. Our SuperWeb prototype consists of brokers, clients and hosts. Hosts register a fraction of their computing resources (CPU time, memory, bandwidth, disk space) with resource brokers. Clients submit tasks that need to be executed. The broker maps client computations onto the registered hosts. We examine an economic model for trading computing resources, and discuss several technical challenges associated with such a global computing environment.
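
A toy sketch of the broker/client/host structure described above: hosts register a fraction of their resources, clients submit tasks with requirements, and the broker maps each task onto a suitable host. The class and field names are invented for illustration; the actual SuperWeb prototype is Java-based and considerably richer.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Host:                # a machine offering part of its resources
        name: str
        cpu_share: float       # fraction of CPU time offered
        memory_mb: int
        disk_mb: int
        busy: bool = False

    @dataclass
    class Task:                # a client computation and its requirements
        name: str
        memory_mb: int
        disk_mb: int

    class Broker:
        """Toy resource broker: keeps a registry of hosts and maps tasks onto them."""
        def __init__(self):
            self.hosts = []

        def register(self, host: Host) -> None:
            self.hosts.append(host)

        def map_task(self, task: Task) -> Optional[Host]:
            # First-fit placement; a real broker would also weigh CPU share,
            # bandwidth, load, trust, and price.
            for h in self.hosts:
                if not h.busy and h.memory_mb >= task.memory_mb and h.disk_mb >= task.disk_mb:
                    h.busy = True
                    return h
            return None

    broker = Broker()
    broker.register(Host("alpha", cpu_share=0.5, memory_mb=256, disk_mb=1024))
    broker.register(Host("beta", cpu_share=0.9, memory_mb=64, disk_mb=512))
    print(broker.map_task(Task("render", memory_mb=128, disk_mb=200)))  # placed on "alpha"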


ACM Transactions on Computer Systems | 1998

UFO: a personal global file system based on user-level extensions to the operating system

Albert Alexandrov; Maximilian Ibel; Klaus E. Schauser; Chris J. Scheiman

In this article we show how to extend a wide range of functionality of standard operating systems completely at the user level. Our approach works by intercepting selected system calls at the user level, using tracing facilities such as the /proc file system provided by many Unix operating systems. The behavior of some intercepted system calls is then modified to implement new functionality. This approach does not require any relinking or recompilation of existing applications. In fact, the extensions can even be dynamically “installed” into already running processes. The extensions work completely at the user level and install without system administrator assistance. Individual users can choose what extensions to run, in effect creating a personalized operating system view for themselves. We used this approach to implement a global file system, called Ufo, which allows users to treat remote files exactly as if they were local. Currently, Ufo supports file access through the FTP and HTTP protocols and allows new protocols to be plugged in. While several other projects have implemented global file system abstractions, they all require either changes to the operating system or modifications to standard libraries. The article gives a detailed performance analysis of our approach to extending the OS and establishes that Ufo introduces acceptable overhead for common applications even though intercepting individual system calls incurs a high cost.
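
The sketch below illustrates only the naming idea behind such a global file system: a remote name is resolved to a locally cached copy on first access. It deliberately does not model Ufo's actual mechanism (intercepting system calls through /proc), and the "/http/<host>/<path>" naming scheme and cache layout are invented for illustration.

    # Conceptual sketch only: Ufo itself intercepts open()/stat() through /proc tracing,
    # with no changes to applications. This shows just the path-to-protocol mapping a
    # handler might perform; the naming scheme and cache layout are invented.
    import os
    import tempfile
    import urllib.request

    CACHE = tempfile.mkdtemp(prefix="ufo-cache-")

    def resolve(path):
        """Map a hypothetical '/http/<host>/<path>' name to a locally cached copy;
        return ordinary local paths unchanged."""
        if not path.startswith("/http/"):
            return path
        remote = path[len("/http/"):]
        local = os.path.join(CACHE, remote.replace("/", "_"))
        if not os.path.exists(local):
            urllib.request.urlretrieve("http://" + remote, local)
        return local

    # An unmodified application would simply call open(); here the mapping is explicit.
    with open(resolve("/http/example.com/")) as f:
        print(f.read()[:60])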


International Parallel Processing Symposium | 1997

SuperWeb: towards a global Web-based parallel computing infrastructure

Albert Alexandrov; Maximilian Ibel; Klaus E. Schauser; Chris J. Scheiman

The Internet, best known by most users as the World-Wide-Web, continues to expand at an amazing pace. We propose a new infrastructure to harness the combined resources, such as CPU cycles or disk storage, and make them available to everyone interested. This infrastructure has the potential for solving parallel supercomputing applications involving thousands of cooperating components. Our approach is based on recent advances in Internet connectivity and the implementation of safe distributed computing embodied in languages such as Java. We developed a prototype of a global computing infrastructure, called SuperWeb, that consists of hosts, brokers and clients. Hosts register a fraction of their computing resources (CPU time, memory, bandwidth, disk space) with resource brokers. Client computations are then mapped by the broker onto the registered resources. We examine an economic model for trading computing resources, and discuss several technical challenges associated with such a global computing environment.
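
The abstract only mentions the economic model, so the toy matcher below is purely hypothetical: hosts advertise a price per CPU-hour, clients state a budget, and the broker picks the cheapest affordable host. It complements the registration sketch shown earlier for the journal version of this work.

    # Hypothetical price-based matching; not the SuperWeb economic model itself.
    hosts = [
        {"name": "lab-42", "price_per_cpu_hour": 0.8, "free": True},
        {"name": "desk-07", "price_per_cpu_hour": 0.3, "free": True},
        {"name": "farm-11", "price_per_cpu_hour": 0.5, "free": True},
    ]

    def cheapest_host(budget_per_cpu_hour):
        """Pick the cheapest available host within the client's budget, if any."""
        affordable = [h for h in hosts
                      if h["free"] and h["price_per_cpu_hour"] <= budget_per_cpu_hour]
        if not affordable:
            return None
        best = min(affordable, key=lambda h: h["price_per_cpu_hour"])
        best["free"] = False
        return best

    print(cheapest_host(0.6))  # desk-07 wins at 0.3 per CPU-hour
    print(cheapest_host(0.2))  # None: no host is cheap enough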


International Parallel Processing Symposium | 1995

Experience with active messages on the Meiko CS-2

Klaus E. Schauser; Chris J. Scheiman

Active messages provide a low-latency communication architecture which on modern parallel machines achieves more than an order of magnitude performance improvement over more traditional communication libraries. This paper discusses the experience we gained while implementing active messages on the Meiko CS-2, and considers implementations for similar architectures. During our work we identified that architectures which only support efficient remote write operations (or DMA transfers as in the case of the CS-2) make it difficult to transfer both data and control as required by active messages. Traditional network interfaces avoid this problem because they have a single point of entry which essentially acts as a queue. To efficiently support active messages on modern network communication co-processors, hardware primitives are required which support this queue behavior. We overcame this problem by producing specialized code which runs on the communications co-processor and supports the active messages protocol. Our implementation of active messages results in a one-way latency of 12.3 μs and achieves up to 39 MB/s for bulk transfers. Both numbers are close to optimal for the current Meiko hardware and are competitive with the performance of active messages on other hardware platforms.
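
An active message carries the identity of a user-level handler that the receiver runs on arrival, so data and control arrive together; the difficulty noted above is that remote writes or DMA move data without transferring control. The queue-and-dispatch loop below is a minimal sketch of that behavior with invented names; on the CS-2 the corresponding logic runs as specialized code on the communication co-processor.

    from collections import deque

    # Toy active-message endpoint: each message names its handler, so the receive
    # loop transfers control as well as data.
    class Endpoint:
        def __init__(self):
            self.queue = deque()              # single point of entry, like an NI queue

        def send(self, dest, handler, *args):
            dest.queue.append((handler, args))

        def poll(self):
            while self.queue:
                handler, args = self.queue.popleft()
                handler(*args)                # run the user-level handler on arrival

    counter = 0
    def add_handler(amount):
        global counter
        counter += amount

    a, b = Endpoint(), Endpoint()
    a.send(b, add_handler, 5)
    b.poll()
    print(counter)                            # 5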


IEEE Transactions on Parallel and Distributed Systems | 1992

A processor-time-minimal systolic array for transitive closure

Chris J. Scheiman; Peter R. Cappello

Using a directed acyclic graph (DAG) model of algorithms, the authors focus on processor-time-minimal multiprocessor schedules: time-minimal multiprocessor schedules that use as few processors as possible. The Kung, Lo, and Lewis (KLL) algorithm for computing the transitive closure of a relation over a set of n elements requires at least 5n-4 parallel steps. As originally reported, their systolic array comprises n² processing elements. It is shown that any time-minimal multiprocessor schedule of the KLL algorithm's dag needs at least n²/3 processing elements. Then a processor-time-minimal systolic array realizing the KLL dag is constructed. Its processing elements are organized as a cylindrically connected 2-D mesh when n ≡ 0 (mod 3). When n ≢ 0 (mod 3), the 2-D mesh is connected as a torus.
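
For context, the computation being scheduled is transitive closure over n elements. Warshall's classic O(n³) loop nest below computes the same closure; the KLL algorithm organizes essentially this work for systolic execution, and the processor counts above refer to schedules of that dag, not to this sequential code.

    def transitive_closure(adj):
        """Warshall's algorithm: adj[i][j] is True iff there is an edge i -> j.
        Returns the reachability (transitive closure) matrix."""
        n = len(adj)
        reach = [row[:] for row in adj]
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
        return reach

    # 0 -> 1 -> 2, so 0 reaches 2 in the closure.
    adj = [[False, True, False],
           [False, False, True],
           [False, False, False]]
    print(transitive_closure(adj)[0][2])  # True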


Architectural Support for Programming Languages and Operating Systems | 1996

Evaluation of architectural support for global address-based communication in large-scale parallel machines

Arvind Krishnamurthy; Klaus E. Schauser; Chris J. Scheiman; Randolph Y. Wang; David E. Culler; Katherine A. Yelick

Large-scale parallel machines are incorporating increasingly sophisticated architectural support for user-level messaging and global memory access. We provide a systematic evaluation of a broad spectrum of current design alternatives based on our implementations of a global address language on the Thinking Machines CM-5, Intel Paragon, Meiko CS-2, Cray T3D, and Berkeley NOW. This evaluation includes a range of compilation strategies that make varying use of the network processor; each is optimized for the target architecture and the particular strategy. We analyze a family of interacting issues that determine the performance trade-offs in each implementation, quantify the resulting latency, overhead, and bandwidth of the global access operations, and demonstrate the effects on application performance.
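
A back-of-the-envelope model of why overhead and overlap matter for global access operations: a blocking remote read pays the round-trip latency on every access, while a split-phase read issues many requests and synchronizes once. The numbers and function names below are illustrative only, not measurements or APIs from the paper, and the model ignores bandwidth limits and reply handling.

    # Back-of-the-envelope model of blocking vs. split-phase remote reads.
    # o = processor overhead per request, L = round-trip latency; values are
    # hypothetical, and bandwidth limits and reply handling are ignored.

    def blocking_reads(n, o, L):
        """Each read issues and then waits for its reply before the next one."""
        return n * (o + L)

    def split_phase_reads(n, o, L):
        """Issue all n reads (paying only overhead), then sync once for the last
        outstanding reply: the overlap a global-address layer can expose."""
        return n * o + L

    n, o, L = 100, 2.0, 20.0            # hypothetical microseconds
    print(blocking_reads(n, o, L))      # 2200.0
    print(split_phase_reads(n, o, L))   # 220.0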


IEEE Transactions on Parallel and Distributed Systems | 1994

A period-processor-time-minimal schedule for cubical mesh algorithms

Chris J. Scheiman; Peter R. Cappello

Using a directed acyclic graph (dag) model of algorithms, we investigate precedence-constrained multiprocessor schedules for the n × n × n directed mesh. This cubical mesh is fundamental, representing the standard algorithm for square matrix product, as well as many other algorithms. Its completion requires at least 3n-2 multiprocessor steps. Time-minimal multiprocessor schedules that use as few processors as possible are called processor-time-minimal. For the cubical mesh, such a schedule requires at least ⌈3n²/4⌉ processors. Among such schedules, one with the minimum period (i.e., maximum throughput) is referred to as a period-processor-time-minimal schedule. The period of any processor-time-minimal schedule for the cubical mesh is at least 3n/2 steps. This lower bound is shown to be exact by constructing, for n a multiple of 6, a period-processor-time-minimal multiprocessor schedule that can be realized on a systolic array whose topology is a toroidally connected n/2 × n/2 × 3 mesh.
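
In the cubical mesh dag every node (i, j, k) lies on a longest path, so a time-minimal schedule has no slack: all nodes with the same i + j + k must run in the same step, and the widest such level dictates the processor count. The small check below confirms that this width equals 3n²/4 for a few even n; it is a sanity check on the stated bound, not the paper's proof, and the same idea extends to the rectilinear meshes of the next entry.

    from itertools import product

    def max_level_width(n):
        """Widest level of the n x n x n mesh dag, where node (i, j, k) must
        execute at step i + j + k in any time-minimal schedule."""
        width = [0] * (3 * (n - 1) + 1)
        for i, j, k in product(range(n), repeat=3):
            width[i + j + k] += 1
        return max(width)

    for n in (2, 4, 6, 8):
        print(n, max_level_width(n), 3 * n * n // 4)  # the two counts agree for even n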


International Conference on Application-Specific Array Processors | 1995

A processor-time-minimal schedule for 3D rectilinear mesh algorithms

Chris J. Scheiman; Peter R. Cappello

The paper, using a directed acyclic graph (dag) model of algorithms, investigates precedence-constrained multiprocessor schedules for the n_x × n_y × n_z directed rectilinear mesh. Its completion requires at least n_x + n_y + n_z - 2 multiprocessor steps. Time-minimal multiprocessor schedules that use as few processors as possible are called processor-time-minimal. Lower bounds are shown for the n_x × n_y × n_z directed mesh, and these bounds are shown to be exact by constructing a processor-time-minimal multiprocessor schedule that can be realized on a systolic array whose topology is either a two-dimensional mesh or a skewed cylinder. The contribution of this paper is two-fold: it generalizes the previous work on cubical mesh algorithms, and it presents a more elegant mathematical method for deriving processor-time lower bounds for such problems.


Parallel Algorithms and Applications | 2000

Processor-time-optimal systolic arrays

Peter R. Cappello; Ömer Eğecioğlu; Chris J. Scheiman

Minimizing the amount of time and number of processors needed to perform an application reduces the application's fabrication cost and operation costs. A directed acyclic graph (dag) model of algorithms is used to define a time-minimal schedule and a processor-time-minimal schedule. We present a technique for finding a lower bound on the number of processors needed to achieve a given schedule of an algorithm. The application of this technique is illustrated with a tensor product computation. We then apply the technique to the free schedule of algorithms for matrix product, Gaussian elimination, and transitive closure. For each, we provide a time-minimal processor schedule that meets these processor lower bounds, including the one for tensor product.

Collaboration


Dive into Chris J. Scheiman's collaborations.

Top Co-Authors

M. Weis
University of California

Katherine A. Yelick
Lawrence Berkeley National Laboratory