
Publication


Featured research published by Ronald Veldema.


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 1999

An efficient implementation of Java's remote method invocation

Jason Maassen; Rob V. van Nieuwpoort; Ronald Veldema; Henri E. Bal; Aske Plaat

Java offers interesting opportunities for parallel computing. In particular, Java Remote Method Invocation provides an unusually flexible kind of Remote Procedure Call. Unlike RPC, RMI supports polymorphism, which requires the system to be able to download remote classes into a running application. Sun's RMI implementation achieves this kind of flexibility by passing around object type information and processing it at run time, which causes a major run time overhead. Using Sun's JDK 1.1.4 on a Pentium Pro/Myrinet cluster, for example, the latency for a null RMI (without parameters or a return value) is 1228 μsec, which is about a factor of 40 higher than that of a user-level RPC. In this paper, we study an alternative approach for implementing RMI, based on native compilation. This approach allows for better optimization, eliminates the need for processing of type information at run time, and makes a lightweight communication protocol possible. We have built a Java system based on a native compiler, which supports both compile time and run time generation of marshallers. We find that almost all of the run time overhead of RMI can be pushed to compile time. With this approach, the latency of a null RMI is reduced to 34 μsec, while still supporting polymorphic RMIs (and allowing interoperability with other JVMs).
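For context, the sketch below shows the standard java.rmi programming model whose cost the paper measures: a remote interface with a parameterless method (a "null RMI") and a serializable argument type whose subclasses can be passed polymorphically. The host name, binding name, and interface are illustrative assumptions, not taken from the paper.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

// A minimal remote interface; a "null RMI" in the paper's sense is a call
// like ping(): no parameters, no return value.
interface Compute extends Remote {
    void ping() throws RemoteException;
    double evaluate(Task t) throws RemoteException; // any Task subclass may be passed (polymorphism)
}

// Arguments must be serializable so they can be marshalled; a subclass the
// server has never seen can be downloaded into it at run time.
abstract class Task implements java.io.Serializable {
    abstract double run();
}

class Client {
    public static void main(String[] args) throws Exception {
        Registry registry = LocateRegistry.getRegistry("server-host"); // hypothetical host
        Compute compute = (Compute) registry.lookup("compute");        // hypothetical binding name
        compute.ping();  // the round trip whose latency is quoted above
    }
}
```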


Operating Systems Review | 2000

The distributed ASCI Supercomputer project

Henri E. Bal; Raoul Bhoedjang; Rutger F. H. Hofman; Ceriel J. H. Jacobs; Thilo Kielmann; Jason Maassen; Rob V. van Nieuwpoort; John W. Romein; Luc Renambot; Tim Rühl; Ronald Veldema; Kees Verstoep; Aline Baggio; G.C. Ballintijn; Ihor Kuz; Guillaume Pierre; Maarten van Steen; Andrew S. Tanenbaum; G. Doornbos; Desmond Germans; Hans J. W. Spoelder; Evert Jan Baerends; Stan J. A. van Gisbergen; Hamideh Afsermanesh; Dick Van Albada; Adam Belloum; David Dubbeldam; Z.W. Hendrikse; Bob Hertzberger; Alfons G. Hoekstra

The Distributed ASCI Supercomputer (DAS) is a homogeneous wide-area distributed system consisting of four cluster computers at different locations. DAS has been used for research on communication software, parallel languages and programming systems, schedulers, parallel applications, and distributed applications. The paper gives a preview of the most interesting research results obtained so far in the DAS project.


ACM Transactions on Programming Languages and Systems | 2001

Efficient Java RMI for Parallel Programming

Jason Maassen; Rob V. van Nieuwpoort; Ronald Veldema; Henri E. Bal; Thilo Kielmann; Ceriel J. H. Jacobs; Rutger F. H. Hofman

Java offers interesting opportunities for parallel computing. In particular, Java Remote Method Invocation (RMI) provides a flexible kind of remote procedure call (RPC) that supports polymorphism. Sun's RMI implementation achieves this kind of flexibility at the cost of a major runtime overhead. The goal of this article is to show that RMI can be implemented efficiently, while still supporting polymorphism and allowing interoperability with Java Virtual Machines (JVMs). We study a new approach for implementing RMI, using a compiler-based Java system called Manta. Manta uses a native (static) compiler instead of a just-in-time compiler. To implement RMI efficiently, Manta exploits compile-time type information for generating specialized serializers. Also, it uses an efficient RMI protocol and fast low-level communication protocols. A difficult problem with this approach is how to support polymorphism and interoperability. One of the consequences of polymorphism is that an RMI implementation must be able to download remote classes into an application during runtime. Manta solves this problem by using a dynamic bytecode compiler, which is capable of compiling and linking bytecode into a running application. To allow interoperability with JVMs, Manta also implements the Sun RMI protocol (i.e., the standard RMI protocol), in addition to its own protocol. We evaluate the performance of Manta using benchmarks and applications that run on a 32-node Myrinet cluster. The time for a null-RMI (without parameters or a return value) of Manta is 35 times lower than for the Sun JDK 1.2, and only slightly higher than for a C-based RPC protocol. This high performance is accomplished by pushing almost all of the runtime overhead of RMI to compile time. We study the performance differences between the Manta and the Sun RMI protocols in detail. The poor performance of the Sun RMI protocol is in part due to an inefficient implementation of the protocol. To allow a fair comparison, we compiled the applications and the Sun RMI protocol with the native Manta compiler. The results show that Manta's null-RMI latency is still eight times lower than for the compiled Sun RMI protocol and that Manta's efficient RMI protocol results in 1.8 to 3.4 times higher speedups for four out of six applications.
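As an illustration of the compile-time serialization idea described above, the fragment below shows what a specialized marshaller for a class with statically known fields could look like. It is a hand-written sketch, not Manta's generated code; the Point class and the marshaller methods are assumptions made for illustration.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Example user class whose field layout is known at compile time.
class Point implements java.io.Serializable {
    int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

// Hand-written stand-in for a compiler-generated serializer: because the
// compiler knows Point's fields, it can read and write them directly
// instead of inspecting type information reflectively at run time.
final class PointMarshaller {
    static void write(Point p, DataOutputStream out) throws IOException {
        out.writeInt(p.x);
        out.writeInt(p.y);
    }

    static Point read(DataInputStream in) throws IOException {
        return new Point(in.readInt(), in.readInt());
    }
}
```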


Proceedings of the 2001 joint ACM-ISCOPE conference on Java Grande | 2001

Runtime optimizations for a Java DSM implementation

Ronald Veldema; Rutger F. H. Hofman; Raoul Bhoedjang; Henri E. Bal

Jackal is a fine-grained distributed shared memory implementation of the Java programming language. Jackal implements Java's memory model and allows multithreaded Java programs to run unmodified on distributed-memory systems. This paper focuses on Jackal's runtime system, which implements a multiple-writer, home-based consistency protocol. Protocol actions are triggered by software access checks that Jackal's compiler inserts before object and array references. We describe optimizations for Jackal's runtime system, which mainly consist of discovering opportunities to dispense with flushing of cached data. We give performance results for different runtime optimizations, and compare their impact with the impact of one compiler optimization. We find that our runtime optimizations are necessary for good Jackal performance, but only in conjunction with the Jackal compiler optimizations described in [24]. As a yardstick, we compare the performance of Java applications run on Jackal with the performance of equivalent applications that use a fast implementation of Java's Remote Method Invocation (RMI) instead of shared memory.
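To make the access-check mechanism concrete, the sketch below shows the kind of check a fine-grained DSM compiler could insert before an object read. The DsmRuntime helper names (isPresent, fetchFromHome) are hypothetical placeholders, not Jackal's actual runtime interface.

```java
// Conceptual sketch of a compiler-inserted access check before an object
// read. DsmRuntime is a hypothetical stand-in for the runtime primitives of
// a fine-grained DSM such as Jackal.
class Account {
    int balance;
}

final class AccessCheckSketch {
    static int readBalance(Account a) {
        // Inserted check: if the memory region holding 'a' is not cached
        // locally, fetch it from its home node before dereferencing.
        if (!DsmRuntime.isPresent(a)) {
            DsmRuntime.fetchFromHome(a);
        }
        return a.balance;   // the original source-level access
    }
}

final class DsmRuntime {
    static boolean isPresent(Object o) { return true; }  // stub for illustration
    static void fetchFromHome(Object o) { /* stub */ }
}
```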


Proceedings of the ACM 1999 conference on Java Grande | 1999

Wide-area parallel computing in Java

Rob V. van Nieuwpoort; Jason Maassen; Henri E. Bal; Thilo Kielmann; Ronald Veldema

Java's support for parallel and distributed processing makes the language attractive for metacomputing applications, such as parallel applications that run on geographically distributed (wide-area) systems. To obtain actual experience with a Java-centric approach to metacomputing, we have built and used a high-performance wide-area Java system, called Manta. Manta implements the Java RMI model using different communication protocols (active messages and TCP/IP) for different networks. The paper shows how wide-area parallel applications can be expressed and optimized using Java RMI. Also, it presents performance results of several applications on a wide-area system consisting of four Myrinet-based clusters connected by ATM WANs.


Concurrency and Computation: Practice and Experience | 2007

JaMP: an implementation of OpenMP for a Java DSM

Michael Klemm; Matthias Bezold; Ronald Veldema; Michael Philippsen

Although OpenMP is a widely agreed‐upon standard for the C/C++ and Fortran programming languages for the semi‐automatic parallelization of programs for shared memory machines, not much has been done on the binding of OpenMP to Java that targets clusters with distributed memory. This paper presents three major contributions: (1) JaMP is an adaptation of the OpenMP standard to Java that implements a large subset of the OpenMP specification with an expressiveness comparable to that of OpenMP; (2) we suggest a set of extensions that allow a better integration of OpenMP into the Java language; (3) we present our prototype implementation of JaMP in the research compiler Jackal, a software‐based distributed shared memory implementation for Java. We evaluate the performance of JaMP with a set of micro‐benchmarks and with OpenMP versions of the parallel Java Grande Forum (JGF) benchmarks. The micro‐benchmarks show that OpenMP for Java can be implemented without much overhead. The JGF benchmarks achieve a good speed‐up of 5–8 on eight nodes.
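To give an impression of the programming model, the loop below shows the shape of an OpenMP-style parallel for in Java. The comment-based directive spelling is an assumption made for illustration; JaMP's concrete directive syntax is defined by the paper and the OpenMP specification it adapts.

```java
// Shape of an OpenMP-style parallel loop in Java. The directive spelling in
// the comment is illustrative only, not necessarily JaMP's exact syntax.
class Saxpy {
    static void saxpy(float a, float[] x, float[] y) {
        //#omp parallel for
        for (int i = 0; i < x.length; i++) {
            y[i] = a * x[i] + y[i];   // iterations distributed across threads (and nodes on the DSM)
        }
    }
}
```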


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2001

Source-level global optimizations for fine-grain distributed shared memory systems

Ronald Veldema; Rutger F. H. Hofman; Raoul Bhoedjang; Ceriel J. H. Jacobs; Henri E. Bal

This paper describes and evaluates the use of aggressive static analysis in Jackal, a fine-grain Distributed Shared Memory (DSM) system for Java. Jackal uses an optimizing, source-level compiler rather than the binary rewriting techniques employed by most other fine-grain DSM systems. Source-level analysis makes existing access-check optimizations (e.g., access-check batching) more effective and enables two novel fine-grain DSM optimizations: object-graph aggregation and automatic computation migration. The compiler detects situations where an access to a root object is followed by accesses to subobjects. Jackal attempts to aggregate all access checks on objects in such object graphs into a single check on the graph's root object. If this check fails, the entire graph is fetched. Object-graph aggregation can reduce the number of network roundtrips and, since it is an advanced form of access-check batching, improves sequential performance. Computation migration (or function shipping) is used to optimize critical sections in which a single processor owns both the shared data that is accessed and the lock that protects the data. It is usually more efficient to execute such critical sections on the processor that holds the lock and the data than to incur multiple roundtrips for acquiring the lock, fetching the data, writing the data back, and releasing the lock. Jackal's compiler detects such critical sections and optimizes them by generating single-roundtrip computation-migration code rather than standard data-shipping code. Jackal's optimizations improve both sequential and parallel application performance. On average, sequential execution times of instrumented, optimized programs are within 10% of those of uninstrumented programs. Application speedups usually improve significantly and several Jackal applications perform as well as hand-optimized message-passing programs.
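The fragment below illustrates the kind of critical section that the computation-migration optimization targets: the lock and the data it protects belong to the same shared object, so the whole synchronized block can run at that object's home node in a single round trip. The Counter class is an invented example, not code from the paper.

```java
// A critical section of the kind targeted by computation migration: the
// lock (the Counter's monitor) and the protected data (value) live in the
// same shared object, so the block can be executed at that object's home
// node instead of fetching the data, updating it locally, and writing it back.
class Counter {
    private int value;

    void add(int delta) {
        synchronized (this) {
            value += delta;
        }
    }

    int get() {
        synchronized (this) {
            return value;
        }
    }
}
```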


Proceedings of the 3rd International Workshop on Multicore Software Engineering | 2010

JCudaMP: OpenMP/Java on CUDA

Georg Dotzler; Ronald Veldema; Michael Klemm

We present an OpenMP framework for Java that can exploit an available graphics card as an application accelerator. Dynamic languages (Java, C#, etc.) pose a challenge here because of their write-once-run-everywhere approach. This renders it impossible to make compile-time assumptions on whether and which type of accelerator or graphics card might be available in the system at run-time. We present an execution model that dynamically analyzes the running environment to find out what hardware is attached. Based on the results, it dynamically rewrites the bytecode and generates the necessary GPGPU code on-the-fly. Furthermore, we solve two extra problems caused by the combination of Java and CUDA. First, CUDA-capable hardware usually has little memory (compared to main memory). However, as Java is a pointer-free language, array data can be stored in main memory and buffered in GPU memory. Second, CUDA requires one to copy data to and from the graphics card's memory explicitly. As modern languages use many small objects, this would involve many copy operations when done naively. This is exacerbated because Java uses arrays-of-arrays to implement multi-dimensional arrays. A clever copying technique and two new array packages allow for more efficient use of CUDA.
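The class below illustrates the array-layout issue mentioned above: a Java float[][] is an array of separately allocated row objects, so a naive transfer to the GPU needs one copy per row, whereas a flat float[] with explicit index arithmetic can be moved in a single transfer. This is an illustration of the idea, not JCudaMP's actual array package API.

```java
// Flat 2D array layout: one contiguous float[] instead of an array of row
// objects, so the whole matrix can be copied to or from the GPU in a single
// transfer. Illustrative only; JCudaMP's array packages define their own API.
class FlatMatrix {
    final int rows, cols;
    final float[] data;

    FlatMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new float[rows * cols];
    }

    float get(int r, int c)          { return data[r * cols + c]; }
    void  set(int r, int c, float v) { data[r * cols + c] = v; }
}
```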


Concurrency and Computation: Practice and Experience | 2000

Wide-area parallel programming using the remote method invocation model

Rob V. van Nieuwpoort; Jason Maassen; Henri E. Bal; Thilo Kielmann; Ronald Veldema

Java's support for parallel and distributed processing makes the language attractive for metacomputing applications, such as parallel applications that run on geographically distributed (wide-area) systems. To obtain actual experience with a Java-centric approach to metacomputing, we have built and used a high-performance wide-area Java system, called Manta. Manta implements the Java Remote Method Invocation (RMI) model using different communication protocols (active messages and TCP/IP) for different networks. The paper shows how wide-area parallel applications can be expressed and optimized using Java RMI. Also, it presents performance results of several applications on a wide-area system consisting of four Myrinet-based clusters connected by ATM WANs. We finally discuss alternative programming models, namely object replication, JavaSpaces, and MPI for Java.
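One typical wide-area optimization in this line of work is to make communication cluster-aware, for example by sending a value across the slow WAN link once per cluster and redistributing it locally over the fast network. The sketch below expresses that idea with a hypothetical per-cluster coordinator interface; the names and API are illustrative assumptions, not the paper's code.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.List;

// Hypothetical per-cluster coordinator: data crosses the wide-area link once
// per cluster and is then redistributed over the fast local network.
interface ClusterCoordinator extends Remote {
    void deliver(double[] data) throws RemoteException;  // local redistribution happens inside
}

class WideAreaBroadcast {
    // One RMI per remote cluster instead of one RMI per remote node.
    static void broadcast(double[] data, List<ClusterCoordinator> coordinators)
            throws RemoteException {
        for (ClusterCoordinator c : coordinators) {
            c.deliver(data);
        }
    }
}
```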


The Journal of Logic and Algebraic Programming | 2007

Model checking a cache coherence protocol of a Java DSM implementation

Jun Pang; Wan Fokkink; Rutger F. H. Hofman; Ronald Veldema

Jackal is a fine-grained distributed shared memory implementation of the Java programming language. It aims to implement Java’s memory model and allows multithreaded Java programs to run unmodified on a distributed memory system. It employs a multiple-writer cache coherence protocol. In this paper, we report on our analysis of this protocol. We present its formal specification in μCRL, and discuss the abstractions that were made to avoid state explosion. Requirements were formulated and model checked with respect to several configurations. Our analysis revealed two errors in the implementation.

Collaboration


Dive into Ronald Veldema's collaborations.

Top Co-Authors

Michael Philippsen, University of Erlangen-Nuremberg

Henri E. Bal, VU University Amsterdam

Michael Klemm, University of Erlangen-Nuremberg

Matthias Bezold, University of Erlangen-Nuremberg