Derek Lieber | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Derek Lieber is active.

Explore More

Publication

Featured researches published by Derek Lieber.

conference on object-oriented programming systems, languages, and applications | 1999

Implementing jalapeño in Java

Bowen Alpern; Clement Richard Attanasio; Anthony Cocchi; Derek Lieber; Stephen Edwin Smith; Ton Ngo; John J. Barton; Susan Flynn Hummel; Janice C. Sheperd; Mark F. Mergen

Jalapeño is a virtual machine for Java#8482; servers written in Java. A running Java program involves four layers of functionality: the user code, the virtual-machine, the operating system, and the hardware. By drawing the Java / non-Java boundary below the virtual machine rather than above it, Jalapeño reduces the boundary-crossing overhead and opens up more opportunities for optimization. To get Jalapeño started, a boot image of a working Jalapeño virtual machine is concocted and written to a file. Later, this file can be loaded into memory and executed. Because the boot image consists entirely of Java objects, it can be concocted by a Java program that runs in any JVM. This program uses reflection to convert the boot image into Jalapeños object format. A special MAGIC class allows unsafe casts and direct access to the hardware. Methods of this class are recognized by Jalapeños three compilers, which ignore their bytecodes and emit special-purpose machine code. User code will not be allowed to call MAGIC methods so Javas integrity is preserved. A small non-Java program is used to start up a boot image and as an interface to the operating system. Javas programming features — object orientation, type safety, automatic memory management — greatly facilitated development of Jalapeño. However, we also discovered some of the languages limitations.

ACM Transactions on Graphics | 1991

Efficient Delaunay triangulation using rational arithmetic

Michael Karasick; Derek Lieber; Lee R. Nackman

Many fundamental tests performed by geometric algorithms can be formulated in terms of finding the sign of a determinant. When these tests are implemented using fixed precision arithmetic such as floating point, they can produce incorrect answers; when they are implemented using arbitrary-precision arithmetic, they are expensive to compute. We present adaptive-precision algorithms for finding the signs of determinants of matrices with integer and rational elements. These algorithms were developed and tested by integrating them into the Guibas-Stolfi Delaunay triangulation algorithm. Through a combination of algorithm design and careful engineering of the implementation, the resulting program can triangulate a set of random rational points in the unit circle only four to five times slower than can a floating-point implementation of the algorithm. The algorithms, engineering process, and software tools developed are described.

high-performance computer architecture | 2002

Evaluation of a multithreaded architecture for cellular computing

Călin Caşcaval; José G. Castaños; Luis Ceze; Monty M. Denneau; Manish Gupta; Derek Lieber; José E. Moreira; Karin Strauss; Henry S. Warren

Cyclops is a new architecture for high-performance parallel computers that is being developed at the IBM T. J. Watson Research Center. The basic cell of this architecture is a single-chip SMP (symmetric multiprocessor) system with multiple threads of execution, embedded memory and integrated communications hardware. Massive intra-chip parallelism is used to tolerate memory and functional unit latencies. Large systems with thousands of chips can be built by replicating this basic cell in a regular pattern. In this paper, we describe the Cyclops architecture and evaluate two of its new hardware features: a memory hierarchy with a flexible cache organization and fast barrier hardware. Our experiments with the STREAM benchmark show that a particular design can achieve a sustainable memory bandwidth of 40 GB/s, equal to the peak hardware bandwidth and similar to the performance of a 128-processor SGI Origin 3800. For small vectors, we have observed in-cache bandwidth above 80 GB/s. We also show that the fast barrier hardware can improve the performance of the Splash-2 FFT kernel by up to 10%. Our results demonstrate that the Cyclops approach of integrating a large number of simple processing elements and multiple memory banks in the same chip is an effective alternative for designing high-performance systems.

ACM Sigarch Computer Architecture News | 2003

Dissecting Cyclops: a detailed analysis of a multithreaded architecture

George S. Almasi; Cǎlin Caşcaval; José G. Castaños; Monty M. Denneau; Derek Lieber; José E. Moreira; Henry S. Warren

Multiprocessor systems-on-a-chip offer a structured approach to managing complexity in chip design. Cyclops is a new family of multithreaded architectures which integrates processing logic, main memory and communications hardware on a single chip. Its simple, hierarchical design allows the hardware architect to manage a large number of components to meet the design constraints in terms of performance, power or application domain.This paper evaluates several alternative Cyclops designs with different relative costs and trade-offs. We compare the performance of several scientific kernels running on different configurations of this architecture. We show that by increasing the number of threads sharing a floating point unit we can hide fairly high cache and memory latencies. We prove that we can reach the theoretical peak performance of the chip and we identify the optimal balance of components for each application. We demonstrate that the design is well adapted to solve problems that are difficult to optimize. For example, we show that sparse matrix vector multiplication obtains 16 GFlops out of 32 GFlops of peak performance.

conference on high performance computing (supercomputing) | 2006

Designing a highly-scalable operating system: the Blue Gene/L story

José E. Moreira; Michael Brutman; José G. Castaños; Thomas Eugene Engelsiepen; Mark E. Giampapa; Tom Gooding; Roger L. Haskin; Todd Inglett; Derek Lieber; Patrick McCarthy; Michael Mundy; Jeffrey J. Parker; Brian Paul Wallenfelt

Blue Gene/L, is currently the worlds fastest and most scalable supercomputer. It has demonstrated essentially linear scaling all the way to 131,072 processors in several benchmarks and real applications. The operating systems for the compute and I/O nodes of Blue Gene/L are among the components responsible for that scalability. Compute nodes are dedicated to running application processes, whereas I/O nodes are dedicated to performing system functions. The operating systems adopted for each of these nodes reflect this separation of junction. Compute nodes run a lightweight operating system called the compute node kernel. I/O nodes run a port of the Linux operating system. This paper discusses the architecture and design of this solution for Blue Gene/L in the context of the hardware characteristics that led to the design decisions. It also explains and demonstrates how those decisions are instrumental in achieving the performance and scalability for which Blue Gene/L is famous

Ibm Journal of Research and Development | 2005

Blue Gene/L programming and operating environment

José E. Moreira; George S. Almasi; Charles J. Archer; Ralph Bellofatto; Peter Bergner; José R. Brunheroto; Michael Brutman; José G. Castaños; Paul G. Crumley; Manish Gupta; Todd Inglett; Derek Lieber; David Limpert; Patrick McCarthy; Mark Megerian; Mark P. Mendell; Michael Mundy; Don Reed; Ramendra K. Sahoo; Alda Sanomiya; Richard Shok; Brian E. Smith; Greg Stewart

With up to 65,536 compute nodes and a peak performance of more than 360 teraflops, the Blue Gene®/L (BG/L) supercomputer represents a new level of massively parallel systems. The system software stack for BG/L creates a programming and operating environment that harnesses the raw power of this architecture with great effectiveness. The design and implementation of this environment followed three major principles: simplicity, performance, and familiarity. By specializing the services provided by each component of the system architecture, we were able to keep each one simple and leverage the BG/L hardware features to deliver high performance to applications. We also implemented standard programming interfaces and programming languages that greatly simplified the job of porting applications to BG/L. The effectiveness of our approach has been demonstrated by the operational success of several prototype and production machines, which have already been scaled to 16,384 nodes.

International Journal of Parallel Programming | 2002

Demonstrating the Scalability of a Molecular Dynamics Application on a Petaflops Computer

George S. Almasi; Calin Cascaval; José G. Castaños; Monty M. Denneau; Wilm E. Donath; Maria Eleftheriou; Mark E. Giampapa; C. T. Howard Ho; Derek Lieber; José E. Moreira; Dennis M. Newns; Marc Snir; Henry S. Warren

The IBM Blue Gene/C parallel computer aims to demonstrate the feasibility of a cellular architecture computer with millions of concurrent threads of execution. One of the major challenges in this project is showing that applications can successfully scale to this massive amount of parallelism. In this paper we demonstrate that the simulation of protein folding using classical molecular dynamics falls in this category. Starting from the sequential version of a well known molecular dynamics code, we developed a new parallel implementation that exploited the multiple levels of parallelism present in the Blue Gene/C cellular architecture. We performed both analytical and simulation studies of the behavior of this application when executed on a very large number of threads. As a result, we demonstrate that this class of applications can execute efficiently on a large cellular machine.

International Journal of Parallel Programming | 2007

The blue gene/L supercomputer: a hardware and software story

José E. Moreira; Valentina Salapura; George S. Almasi; Charles J. Archer; Ralph Bellofatto; Peter Edward Bergner; Randy Bickford; Matthias A. Blumrich; José R. Brunheroto; Arthur A. Bright; Michael Brian Brutman; José G. Castaños; Dong Chen; Paul W. Coteus; Paul G. Crumley; Sam Ellis; Thomas Eugene Engelsiepen; Alan Gara; Mark E. Giampapa; Tom Gooding; Shawn A. Hall; Ruud A. Haring; Roger L. Haskin; Philip Heidelberger; Dirk Hoenicke; Todd A. Inglett; Gerard V. Kopcsay; Derek Lieber; David Roy Limpert; Patrick Joseph McCarthy

The Blue Gene/L system at the Department of Energy Lawrence Livermore National Laboratory in Livermore, California is the world’s most powerful supercomputer. It has achieved groundbreaking performance in both standard benchmarks as well as real scientific applications. In that process, it has enabled new science that simply could not be done before. Blue Gene/L was developed by a relatively small team of dedicated scientists and engineers. This article is both a description of the Blue Gene/L supercomputer as well as an account of how that system was designed, developed, and delivered. It reports on the technical characteristics of the system that made it possible to build such a powerful supercomputer. It also reports on how teams across the world worked around the clock to accomplish this milestone of high-performance computing.

international conference on supercomputing | 2001

Demonstrating the scalability of a molecular dynamics application on a Petaflop computer

The IBM Blue Gene project has endeavored into the development of a cellular architecture computer with millions of concurrent threads of execution. One of the major challenges of this project is demonstrating that applications can successfully exploit this massive amount of parallelism. Starting from the sequential version of a well known molecular dynamics code, we developed a new application that exploits the multiple levels of parallelism in the Blue Gene cellular architecture. We perform both analytical and simulation studies of the behavior of this application when executed on a very large number of threads. As a result, we demonstrate that this class of applications can execute efficiently on a large cellular machine.

european conference on parallel processing | 2003

An Overview of the Blue Gene/L System Software Organization

George S. Almasi; Ralph Bellofatto; José R. Brunheroto; Calin Cascaval; José G. Castaños; Luis Ceze; Paul G. Crumley; C. Christopher Erway; Joseph Gagliano; Derek Lieber; Xavier Martorell; José E. Moreira; Alda Sanomiya; Karin Strauss

The Blue Gene/L supercomputer will use system-on-a-chip integration and a highly scalable cellular architecture. With 65,536 compute nodes, Blue Gene/L represents a new level of complexity for parallel system software, with specific challenges in the areas of scalability, maintenance and usability. In this paper we present our vision of a software architecture that faces up to these challenges, and the simulation framework that we have used for our experiments.

Explore More