Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Rutger F. H. Hofman is active.

Publication


Featured research published by Rutger F. H. Hofman.


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 1999

MagPIe: MPI's collective communication operations for clustered wide area systems

Thilo Kielmann; Rutger F. H. Hofman; Henri E. Bal; Aske Plaat; Raoul Bhoedjang

Writing parallel applications for computational grids is a challenging task. To achieve good performance, algorithms designed for local area networks must be adapted to the differences in link speeds. An important class of algorithms is collective operations, such as broadcast and reduce. We have developed MagPIe, a library of collective communication operations optimized for wide area systems. MagPIe's algorithms send the minimal amount of data over the slow wide area links, and only incur a single wide area latency. Using our system, existing MPI applications can be run unmodified on geographically distributed systems. On moderate cluster sizes, using a wide area latency of 10 milliseconds and a bandwidth of 1 MByte/s, MagPIe executes operations up to 10 times faster than MPICH, a widely used MPI implementation; application kernels improve by up to a factor of 4. Due to the structure of our algorithms, MagPIe's advantage increases for higher wide area latencies.
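The key idea, wide-area-aware collectives that cross each slow link only once, can be pictured with a small sketch. The Java code below is not the MagPIe library (an MPI library); the Node class, cluster layout, and send() are made-up stand-ins that only show the communication pattern of a hierarchical broadcast.

```java
import java.util.*;

/**
 * Sketch of a wide-area-aware broadcast in the spirit of the MagPIe approach
 * described above: the message crosses each slow wide-area link exactly once
 * (root -> one coordinator per remote cluster) and is then fanned out over the
 * fast local-area links inside each cluster. All names here are illustrative.
 */
public class WideAreaBroadcastSketch {

    static class Node {
        final int rank;
        final int cluster;
        Node(int rank, int cluster) { this.rank = rank; this.cluster = cluster; }
    }

    static void send(Node from, Node to, String msg, boolean wideArea) {
        System.out.printf("%s  %d -> %d : %s%n",
                wideArea ? "WAN" : "LAN", from.rank, to.rank, msg);
    }

    /** Broadcast from root: one WAN message per remote cluster, then local fan-out. */
    static void broadcast(Node root, List<Node> nodes, String msg) {
        Map<Integer, List<Node>> clusters = new TreeMap<>();
        for (Node n : nodes) {
            clusters.computeIfAbsent(n.cluster, c -> new ArrayList<>()).add(n);
        }
        for (List<Node> cluster : clusters.values()) {
            // The root coordinates its own cluster; otherwise pick one local coordinator.
            Node coordinator = (cluster.get(0).cluster == root.cluster) ? root : cluster.get(0);
            if (coordinator != root) send(root, coordinator, msg, true);   // single slow WAN hop
            for (Node n : cluster) {
                if (n != coordinator) send(coordinator, n, msg, false);    // fast LAN hops
            }
        }
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>();
        for (int c = 0; c < 3; c++)
            for (int r = 0; r < 4; r++)
                nodes.add(new Node(c * 4 + r, c));
        broadcast(nodes.get(0), nodes, "payload");
    }
}
```

Because each broadcast payload pays the wide-area latency only once per remote cluster, the cost grows with the number of clusters rather than the number of processes, which is why the advantage increases with wide-area latency.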


Concurrency and Computation: Practice and Experience | 2005

Ibis: a Flexible and Efficient Java-based Grid Programming Environment

Rob V. van Nieuwpoort; Jason Maassen; Gosia Wrzesińska; Rutger F. H. Hofman; Ceriel J. H. Jacobs; Thilo Kielmann; Henri E. Bal

In computational Grids, performance-hungry applications need to simultaneously tap the computational power of multiple, dynamically available sites. The crux of designing Grid programming environments stems exactly from the dynamic availability of compute cycles: Grid programming environments (a) need to be portable to run on as many sites as possible, (b) they need to be flexible to cope with different network protocols and dynamically changing groups of compute nodes, while (c) they need to provide efficient (local) communication that enables high-performance computing in the first place. Existing programming environments are either portable (Java), or flexible (Jini, Java Remote Method Invocation (RMI)), or they are highly efficient (Message Passing Interface). No system combines all three properties that are necessary for Grid computing. In this paper, we present Ibis, a new programming environment that combines Java's 'run everywhere' portability both with flexible treatment of dynamically available networks and processor pools, and with highly efficient, object-based communication. Ibis can transfer Java objects very efficiently by combining streaming object serialization with a zero-copy protocol. Using RMI as a simple test case, we show that Ibis outperforms existing RMI implementations, achieving up to nine times higher throughputs with trees of objects.
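As a rough illustration of the baseline Ibis improves on, the sketch below serializes a tree of objects with the standard java.io.ObjectOutputStream, the generic, reflection-driven path whose copying and type inspection Ibis avoids with compiler-generated serializers and a zero-copy protocol. This is not Ibis code; the Tree class and sizes are invented for the example.

```java
import java.io.*;

/**
 * Baseline for the "trees of objects" throughput claim above: serializing an
 * object tree with the standard java.io machinery. Ibis replaces this generic
 * path with generated serializers plus zero-copy transfer.
 */
public class TreeSerializationBaseline {

    static class Tree implements Serializable {
        int payload;
        Tree left, right;
        Tree(int depth) {
            payload = depth;
            if (depth > 0) { left = new Tree(depth - 1); right = new Tree(depth - 1); }
        }
    }

    public static void main(String[] args) throws IOException {
        Tree tree = new Tree(15);                        // roughly 65k nodes
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();

        long start = System.nanoTime();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(tree);                       // generic serialization: reflection + copies
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.printf("Serialized %d bytes in %d ms%n", bytes.size(), elapsedMs);
    }
}
```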


ACM Transactions on Computer Systems | 1998

Performance evaluation of the Orca shared-object system

Henri E. Bal; Raoul Bhoedjang; Rutger F. H. Hofman; Ceriel J. H. Jacobs; Koen Langendoen; Tim Rühl; M. Frans Kaashoek

Orca is a portable, object-based distributed shared memory (DSM) system. This article studies and evaluates the design choices made in the Orca system and compares Orca with other DSMs. The article gives a quantitative analysis of Orca's coherence protocol (based on write-updates with function shipping), the totally ordered group communication protocol, the strategy for object placement, and the all-software, user-space architecture. Performance measurements for 10 parallel applications illustrate the trade-offs made in the design of Orca and show that essentially the right design decisions have been made. A write-update protocol with function shipping is effective for Orca, especially since it is used in combination with techniques that avoid replicating objects that have a low read/write ratio. The overhead of totally ordered group communication on application performance is low. The Orca system is able to make near-optimal decisions for object placement and replication. In addition, the article compares the performance of Orca with that of a page-based DSM (TreadMarks) and another object-based DSM (CRL). It also analyzes the communication overhead of the DSMs for several applications. All performance measurements are done on a 32-node Pentium Pro cluster with Myrinet and Fast Ethernet networks. The results show that Orca programs send fewer messages and less data than the TreadMarks and CRL programs and obtain better speedups.
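The write-update, function-shipping protocol can be sketched as follows. Orca is its own language, so the Java below is only a conceptual illustration under the assumption that write operations are broadcast to all replicas (in totally ordered fashion in the real system) and reads are served locally; all class and method names are invented.

```java
import java.util.*;
import java.util.function.Consumer;

/**
 * Conceptual sketch of write-update with function shipping: a write operation
 * is shipped to every replica and applied there, instead of shipping the
 * object's data. Total ordering is modelled by simply visiting replicas in order.
 */
public class FunctionShippingSketch {

    static class SharedCounter { int value; }

    /** All replicas of one shared object, kept consistent by applying the same operations in order. */
    static class ReplicatedObject {
        private final List<SharedCounter> replicas = new ArrayList<>();
        ReplicatedObject(int copies) { for (int i = 0; i < copies; i++) replicas.add(new SharedCounter()); }

        /** Write: ship the operation (not the data) to every replica. */
        void write(Consumer<SharedCounter> operation) {
            for (SharedCounter replica : replicas) operation.accept(replica);
        }

        /** Read: served locally from any replica. */
        int read() { return replicas.get(0).value; }
    }

    public static void main(String[] args) {
        ReplicatedObject counter = new ReplicatedObject(4);
        counter.write(c -> c.value += 10);                  // the operation travels, not the object
        System.out.println("value = " + counter.read());    // 10 on every replica
    }
}
```

This also shows why replication pays off only for objects with a high read/write ratio: every write touches all replicas, while reads stay local.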


Operating Systems Review | 2000

The distributed ASCI Supercomputer project

Henri E. Bal; Raoul Bhoedjang; Rutger F. H. Hofman; Ceriel J. H. Jacobs; Thilo Kielmann; Jason Maassen; Rob V. van Nieuwpoort; John W. Romein; Luc Renambot; Tim Rühl; Ronald Veldema; Kees Verstoep; Aline Baggio; G.C. Ballintijn; Ihor Kuz; Guillaume Pierre; Maarten van Steen; Andrew S. Tanenbaum; G. Doornbos; Desmond Germans; Hans J. W. Spoelder; Evert Jan Baerends; Stan J. A. van Gisbergen; Hamideh Afsermanesh; Dick Van Albada; Adam Belloum; David Dubbeldam; Z.W. Hendrikse; Bob Hertzberger; Alfons G. Hoekstra

The Distributed ASCI Supercomputer (DAS) is a homogeneous wide-area distributed system consisting of four cluster computers at different locations. DAS has been used for research on communication software, parallel languages and programming systems, schedulers, parallel applications, and distributed applications. The paper gives a preview of the most interesting research results obtained so far in the DAS project.


ACM Transactions on Programming Languages and Systems | 2001

Efficient Java RMI for Parallel Programming

Jason Maassen; Rob V. van Nieuwpoort; Ronald Veldema; Henri E. Bal; Thilo Kielmann; Ceriel J. H. Jacobs; Rutger F. H. Hofman

Java offers interesting opportunities for parallel computing. In particular, Java Remote Method Invocation (RMI) provides a flexible kind of remote procedure call (RPC) that supports polymorphism. Sun's RMI implementation achieves this kind of flexibility at the cost of a major runtime overhead. The goal of this article is to show that RMI can be implemented efficiently, while still supporting polymorphism and allowing interoperability with Java Virtual Machines (JVMs). We study a new approach for implementing RMI, using a compiler-based Java system called Manta. Manta uses a native (static) compiler instead of a just-in-time compiler. To implement RMI efficiently, Manta exploits compile-time type information for generating specialized serializers. Also, it uses an efficient RMI protocol and fast low-level communication protocols. A difficult problem with this approach is how to support polymorphism and interoperability. One of the consequences of polymorphism is that an RMI implementation must be able to download remote classes into an application at runtime. Manta solves this problem by using a dynamic bytecode compiler, which is capable of compiling and linking bytecode into a running application. To allow interoperability with JVMs, Manta also implements the Sun RMI protocol (i.e., the standard RMI protocol), in addition to its own protocol. We evaluate the performance of Manta using benchmarks and applications that run on a 32-node Myrinet cluster. The time for a null-RMI (without parameters or a return value) of Manta is 35 times lower than for the Sun JDK 1.2, and only slightly higher than for a C-based RPC protocol. This high performance is accomplished by pushing almost all of the runtime overhead of RMI to compile time. We study the performance differences between the Manta and the Sun RMI protocols in detail. The poor performance of the Sun RMI protocol is in part due to an inefficient implementation of the protocol. To allow a fair comparison, we compiled the applications and the Sun RMI protocol with the native Manta compiler. The results show that Manta's null-RMI latency is still eight times lower than for the compiled Sun RMI protocol and that Manta's efficient RMI protocol results in 1.8 to 3.4 times higher speedups for four out of six applications.
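The "null-RMI" figures quoted above refer to a remote call with no parameters and no return value. The sketch below shows such a microbenchmark written against the standard java.rmi API (the kind of interface Manta compiles statically instead); the registry port, binding name, and iteration count are arbitrary example choices, not the paper's setup.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

/** A null-RMI latency microbenchmark: an RMI with no arguments and no result. */
public class NullRmiBenchmark {

    public interface Ping extends Remote {
        void ping() throws RemoteException;    // the "null" RMI
    }

    static class PingImpl implements Ping {
        public void ping() {}                  // empty body: we only measure call overhead
    }

    public static void main(String[] args) throws Exception {
        Registry registry = LocateRegistry.createRegistry(1099);
        PingImpl server = new PingImpl();
        registry.rebind("ping", UnicastRemoteObject.exportObject(server, 0));

        Ping stub = (Ping) registry.lookup("ping");

        int iterations = 10_000;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            stub.ping();                       // one round trip through the RMI runtime
        }
        double microsPerCall = (System.nanoTime() - start) / 1_000.0 / iterations;
        System.out.printf("null-RMI round trip: %.1f us%n", microsPerCall);

        UnicastRemoteObject.unexportObject(server, true);   // allow the JVM to exit
        UnicastRemoteObject.unexportObject(registry, true);
    }
}
```

Manta's speedup comes from replacing the reflection-based serialization and generic dispatch behind a call like stub.ping() with code generated at compile time.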


Proceedings of the 2002 Joint ACM-ISCOPE Conference on Java Grande | 2002

Ibis: an efficient Java-based grid programming environment

Rob V. van Nieuwpoort; Jason Maassen; Rutger F. H. Hofman; Thilo Kielmann; Henri E. Bal

In computational grids, performance-hungry applications need to simultaneously tap the computational power of multiple, dynamically available sites. The crux of designing grid programming environments stems exactly from the dynamic availability of compute cycles: grid programming environments (a) need to be portable to run on as many sites as possible, (b) they need to be flexible to cope with different network protocols and dynamically changing groups of compute nodes, while (c) they need to provide efficient (local) communication that enables high-performance computing in the first place.


Proceedings of the 2001 Joint ACM-ISCOPE Conference on Java Grande | 2001

Runtime optimizations for a Java DSM implementation

Ronald Veldema; Rutger F. H. Hofman; Raoul Bhoedjang; Henri E. Bal

Jackal is a fine-grained distributed shared memory implementation of the Java programming language. Jackal implements Java's memory model and allows multithreaded Java programs to run unmodified on distributed-memory systems. This paper focuses on Jackal's runtime system, which implements a multiple-writer, home-based consistency protocol. Protocol actions are triggered by software access checks that Jackal's compiler inserts before object and array references. We describe optimizations for Jackal's runtime system, which mainly consist of discovering opportunities to dispense with flushing of cached data. We give performance results for different runtime optimizations, and compare their impact with the impact of one compiler optimization. We find that our runtime optimizations are necessary for good Jackal performance, but only in conjunction with the Jackal compiler optimizations described in [24]. As a yardstick, we compare the performance of Java applications run on Jackal with the performance of equivalent applications that use a fast implementation of Java's Remote Method Invocation (RMI) instead of shared memory.
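To make the compiler-inserted software access checks concrete, the sketch below writes such a check out by hand before one field access. The Region class and fetchFromHome() are illustrative stand-ins, not Jackal's actual runtime interface.

```java
/**
 * Hand-written version of the access checks that Jackal's compiler inserts
 * before object and array references: a cheap local test that, on a miss,
 * triggers a protocol action (fetching the data from its home node).
 */
public class AccessCheckSketch {

    /** Per-object state a fine-grained DSM runtime might keep (illustrative only). */
    static class Region {
        boolean cachedLocally;   // is a valid copy present on this node?
        int data;

        void readCheck() {
            if (!cachedLocally) fetchFromHome();   // protocol action triggered by the check
        }

        void fetchFromHome() {
            // In the real system: request the region from its home node over the network.
            cachedLocally = true;
        }
    }

    static int sum(Region[] regions) {
        int total = 0;
        for (Region r : regions) {
            r.readCheck();        // inserted by the compiler before the access
            total += r.data;      // the original program's access
        }
        return total;
    }

    public static void main(String[] args) {
        Region[] regions = new Region[4];
        for (int i = 0; i < regions.length; i++) {
            regions[i] = new Region();
            regions[i].data = i;
        }
        System.out.println("sum = " + sum(regions));
    }
}
```

The runtime optimizations the abstract mentions are largely about avoiding the flush of such cached regions when the protocol can prove it is unnecessary.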


Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing | 1998

Optimizing parallel applications for wide-area clusters

Henri E. Bal; Aske Plaat; Mirjam G. Bakker; Peter Dozy; Rutger F. H. Hofman

Recent developments in networking technology cause a growing interest in connecting local area clusters of workstations over wide area links, creating multilevel clusters, or metacomputers. Often, latency and bandwidth of local area and wide area networks differ by two orders of magnitude or more. One would expect only very coarse grain applications to achieve good performance. To test this intuition, we analyze the behavior of several existing medium-grain applications on a wide-area multicluster. We find that high performance can be obtained if the programs are optimized to take the multilevel network structure into account. The optimizations reduce intercluster traffic and hide intercluster latency, and substantially improve performance on wide area multiclusters. As a result, the range of metacomputing applications is larger than previously assumed.


High Performance Distributed Computing | 2004

Wide-area communication for grids: an integrated solution to connectivity, performance and security problems

Alexandre Denis; Olivier Aumage; Rutger F. H. Hofman; Kees Verstoep; Thilo Kielmann; Henri E. Bal

Grid computing applications are challenged by current wide-area networks: firewalls, private IP addresses and network address translation (NAT) hamper connectivity, the TCP protocol can hardly exploit the available bandwidth, and security features like authentication and encryption are usually difficult to integrate. Existing systems (like GridFTP, JXTA, SOCKS) each address only one of these issues. However, applications need to cope with all of them at the same time. Unfortunately, existing solutions are often not easy to combine, and a particular solution for one subproblem may reduce the applicability or performance of another. We identify the building blocks that are needed for connection establishment and efficient link utilization. We present an integrated solution, implemented within the Java-based Ibis runtime system. Our NetIbis implementation lets applications span multiple sites of a grid, and copes with firewalls, private IP addresses, secure communication, and TCP bandwidth problems.
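A minimal sketch of the connection-establishment building block: try a direct TCP connection first and fall back when a firewall or NAT blocks it. This is plain java.net code, not the NetIbis API; the host, port, and timeout are placeholders, and the fallback (a reverse connection or a relay) is only indicated in a comment.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Direct TCP connection attempt with a timeout, as the first connectivity building block. */
public class ConnectWithFallback {

    static Socket connect(String host, int port, int timeoutMillis) throws IOException {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return socket;                       // direct connection worked
        } catch (IOException blocked) {
            socket.close();
            // Firewall or NAT in the way: an integrated system would now try a
            // reverse connection (ask the peer to connect back) or route the
            // traffic through a relay node that both sides can reach.
            throw blocked;
        }
    }

    public static void main(String[] args) {
        try (Socket s = connect("example.org", 80, 3_000)) {
            System.out.println("direct connection: " + s.getRemoteSocketAddress());
        } catch (IOException e) {
            System.out.println("direct connection failed, would fall back to a relay: " + e);
        }
    }
}
```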


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2001

Source-level global optimizations for fine-grain distributed shared memory systems

Ronald Veldema; Rutger F. H. Hofman; Raoul Bhoedjang; Ceriel J. H. Jacobs; Henri E. Bal

This paper describes and evaluates the use of aggressive static analysis in Jackal, a fine-grain Distributed Shared Memory (DSM) system for Java. Jackal uses an optimizing, source-level compiler rather than the binary rewriting techniques employed by most other fine-grain DSM systems. Source-level analysis makes existing access-check optimizations (e.g., access-check batching) more effective and enables two novel fine-grain DSM optimizations: object-graph aggregation and automatic computation migration. The compiler detects situations where an access to a root object is followed by accesses to subobjects. Jackal attempts to aggregate all access checks on objects in such object graphs into a single check on the graph's root object. If this check fails, the entire graph is fetched. Object-graph aggregation can reduce the number of network roundtrips and, since it is an advanced form of access-check batching, improves sequential performance. Computation migration (or function shipping) is used to optimize critical sections in which a single processor owns both the shared data that is accessed and the lock that protects the data. It is usually more efficient to execute such critical sections on the processor that holds the lock and the data than to incur multiple roundtrips for acquiring the lock, fetching the data, writing the data back, and releasing the lock. Jackal's compiler detects such critical sections and optimizes them by generating single-roundtrip computation-migration code rather than standard data-shipping code. Jackal's optimizations improve both sequential and parallel application performance. On average, sequential execution times of instrumented, optimized programs are within 10% of those of uninstrumented programs. Application speedups usually improve significantly and several Jackal applications perform as well as hand-optimized message-passing programs.
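Object-graph aggregation can be pictured with a small sketch: one access check on the root of a linked structure fetches the whole graph, instead of one check (and potentially one round trip) per node. The Graph and Node classes below are illustrative, not Jackal's generated code.

```java
/**
 * Sketch of object-graph aggregation: a single aggregated access check on the
 * root replaces per-node checks during a traversal of a linked structure.
 */
public class GraphAggregationSketch {

    static class Node {
        int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    static class Graph {
        boolean cachedLocally;   // is a valid copy of the whole graph present on this node?
        Node root;

        /** Aggregated access check: one test, one fetch of the entire graph on a miss. */
        Node checkedRoot() {
            if (!cachedLocally) fetchGraphFromHome();
            return root;
        }

        void fetchGraphFromHome() {
            // In the real system: a single round trip that returns the root object
            // together with all reachable sub-objects. Here the data is already
            // local, so we only flip the flag.
            cachedLocally = true;
        }
    }

    static int sum(Graph g) {
        int total = 0;
        for (Node n = g.checkedRoot(); n != null; n = n.next) {
            total += n.value;    // no per-node access checks after aggregation
        }
        return total;
    }

    public static void main(String[] args) {
        Graph g = new Graph();
        g.root = new Node(1);
        g.root.next = new Node(2);
        System.out.println("sum = " + sum(g));   // triggers one aggregated fetch, prints 3
    }
}
```

Computation migration addresses the complementary case: instead of fetching the lock and the data, the critical section itself is shipped to the processor that owns both, turning several round trips into one.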

Collaboration


Dive into Rutger F. H. Hofman's collaborations.

Top Co-Authors

Henri E. Bal

VU University Amsterdam

Ronald Veldema

University of Erlangen-Nuremberg

Koen Langendoen

Delft University of Technology

Tim Rühl

VU University Amsterdam
