Henri E. Bal | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Henri E. Bal is active.

Explore More

Publication

Featured researches published by Henri E. Bal.

ACM Computing Surveys | 1989

Programming languages for distributed computing systems

Henri E. Bal; Jennifer G. Steiner; Andrew S. Tanenbaum

When distributed systems first appeared, they were programmed in traditional sequential languages, usually with the addition of a few library procedures for sending and receiving messages. As distributed applications became more commonplace and more sophisticated, this ad hoc approach became less satisfactory. Researchers all over the world began designing new programming languages specifically for implementing distributed applications. These languages and their history, their underlying principles, their design, and their use are the subject of this paper. We begin by giving our view of what a distributed system is, illustrating with examples to avoid confusion on this important and controversial point. We then describe the three main characteristics that distinguish distributed programming languages from traditional sequential languages, namely, how they deal with parallelism, communication, and partial failures. Finally, we discuss 15 representative distributed languages to give the flavor of each. These examples include languages based on message passing, rendezvous, remote procedure call, objects, and atomic transactions, as well as functional languages, logic languages, and distributed data structure languages. The paper concludes with a comprehensive bibliography listing over 200 papers on nearly 100 distributed programming languages.

IEEE Transactions on Software Engineering | 1992

Orca: a language for parallel programming of distributed systems

Henri E. Bal; M.F. Kaashoek; Andrew S. Tanenbaum

A detailed description is given of the Orca language design and the design choices are discussed. Orca is intended for applications programmers rather than systems programmers. This is reflected in its design goals to provide a simple, easy-to-use language that is type-secure and provides clean semantics. Three example parallel applications in Orca, one of which is described in detail, are discussed. One of the existing implementations, which is based on reliable broadcasting, is described. Performance measurements of this system are given for three parallel applications. The measurements show that significant speedups can be obtained for all three applications. The authors compare Orca with several related languages and systems. >

acm sigplan symposium on principles and practice of parallel programming | 1999

MagPIe: MPI's collective communication operations for clustered wide area systems

Thilo Kielmann; Rutger F. H. Hofman; Henri E. Bal; Aske Plaat; Raoul Bhoedjang

Writing parallel applications for computational grids is a challenging task. To achieve good performance, algorithms designed for local area networks must be adapted to the differences in link speeds. An important class of algorithms are collective operations, such as broadcast and reduce. We have developed MAGPIE, a library of collective communication operations optimized for wide area systems. MAGPIEs algorithms send the minimal amount of data over the slow wide area links, and only incur a single wide area latency. Using our system, existing MPI applications can be run unmodified on geographically distributed systems. On moderate cluster sizes, using a wide area latency of 10 milliseconds and a bandwidth of 1 MByte/s, MAGPIE executes operations up to 10 times faster than MPICH, a widely used MPI implementation; application kernels improve by up to a factor of 4. Due to the structure of our algorithms, MAGPIEs advantage increases for higher wide area latencies.

mobile computing, applications, and services | 2010

Cuckoo: A Computation Offloading Framework for Smartphones

Roelof Kemp; Nicholas Palmer; Thilo Kielmann; Henri E. Bal

Offloading computation from smartphones to remote cloud resources has recently been rediscovered as a technique to enhance the performance of smartphone applications, while reducing the energy usage.

IEEE Computer | 1998

User-level network interface protocols

Raoul Bhoedjang; Tim Rühl; Henri E. Bal

Modern high speed local area networks offer great potential for communication intensive applications, but their performance is limited by the use of traditional communication protocols, such as TCP/IP. In most cases, these protocols require that all network access be through the operating system, which adds significant overhead to both the transmission path (typically a system call and data copy) and the receive path (typically an interrupt, a system call, and a data copy). To address this performance problem, several user level communication architectures have been developed that remove the operating system from the critical communication path. The article describes six important issues to consider in designing communication protocols for user level architectures. The issues discussed focus on the performance and semantics of a communication system. These issues include data transfer, address translation, protection, and control transfer mechanisms, as well as the issues of reliability and multicast. To provide a basis for analyzing these issues, the authors present a simple network interface protocol for Myricoms Myrinet network, which has a programmable network interface. Researchers can thus explore many protocol design options, and several groups have designed communication systems for Myrinet. The authors refer to 11 such systems, all of which differ significantly in how they resolve these design issues but all of which aim for high performance and provide a lean, low level, and more or less generic communication facility.

Concurrency and Computation: Practice and Experience | 2005

Ibis: a Flexible and Efficient Java based Grid Programming Environment

Rob V. van Nieuwpoort; Jason Maassen; Gosia Wrzesińska; Rutger F. H. Hofman; Ceriel J. H. Jacobs; Thilo Kielmann; Henri E. Bal

In computational Grids, performance‐hungry applications need to simultaneously tap the computational power of multiple, dynamically available sites. The crux of designing Grid programming environments stems exactly from the dynamic availability of compute cycles: Grid programming environments (a) need to be portable to run on as many sites as possible, (b) they need to be flexible to cope with different network protocols and dynamically changing groups of compute nodes, while (c) they need to provide efficient (local) communication that enables high‐performance computing in the first place. Existing programming environments are either portable (Java), or flexible (Jini, Java Remote Method Invocation or (RMI)), or they are highly efficient (Message Passing Interface). No system combines all three properties that are necessary for Grid computing. In this paper, we present Ibis, a new programming environment that combines Javas ‘run everywhere’ portability both with flexible treatment of dynamically available networks and processor pools, and with highly efficient, object‐based communication. Ibis can transfer Java objects very efficiently by combining streaming object serialization with a zero‐copy protocol. Using RMI as a simple test case, we show that Ibis outperforms existing RMI implementations, achieving up to nine times higher throughputs with trees of objects. Copyright

international semantic web conference | 2010

OWL reasoning with WebPIE: calculating the closure of 100 billion triples

Jacopo Urbani; Spyros Kotoulas; Jason Maassen; Frank van Harmelen; Henri E. Bal

In previous work we have shown that the MapReduce framework for distributed computation can be deployed for highly scalable inference over RDF graphs under the RDF Schema semantics. Unfortunately, several key optimizations that enabled the scalable RDFS inference do not generalize to the richer OWL semantics. In this paper we analyze these problems, and we propose solutions to overcome them. Our solutions allow distributed computation of the closure of an RDF graph under the OWL Horst semantics. We demonstrate the WebPIE inference engine, built on top of the Hadoop platform and deployed on a compute cluster of 64 machines. We have evaluated our approach using some real-world datasets (UniProt and LDSR, about 0.9-1.5 billion triples) and a synthetic benchmark (LUBM, up to 100 billion triples). Results show that our implementation is scalable and vastly outperforms current systems when comparing supported language expressivity, maximum data size and inference speed.

international parallel and distributed processing symposium | 2000

Fast Measurement of LogP Parameters for Message Passing Platforms

Thilo Kielmann; Henri E. Bal; Kees Verstoep

Performance modeling is important for implementing efficient parallel applications and runtime systems. The LogP model captures the relevant aspects of message passing in distributed-memory architectures. In this paper we describe an efficient method that measures LogP parameters for a given message passing platform. Measurements are performed for messages of different sizes, as covered by the parameterized LogP model, a slight extension of LogP and LogGP. To minimize both intrusiveness and completion time of the measurement, we propose a procedure that sends as few messages as possible. An implementation of this procedure, called the MPI LogP benchmark, is available from our WWW site.

acm sigplan symposium on principles and practice of parallel programming | 2001

Efficient load balancing for wide-area divide-and-conquer applications

Rob V. van Nieuwpoort; Thilo Kielmann; Henri E. Bal

Divide-and-conquer programs are easily parallelized by letting the programmer annotate potential parallelism in the form of spawn and sync constructs. To achieve efficient program execution, the generated work load has to be balanced evenly among the available CPUs. For single cluster systems, Random Stealing (RS) is known to achieve optimal load balancing. However, RS is inefficient when applied to hierarchical wide-area systems where multiple clusters are connected via wide-area networks (WANs) with high latency and low bandwidth. In this paper, we experimentally compare RS with existing load-balancing strategies that are believed to be efficient for multi-cluster systems, Random Pushing and two variants of Hierarchical Stealing. We demonstrate that, in practice, they obtain less than optimal results. We introduce a novel load-balancing algorithm, Cluster-aware Random Stealing (CRS) which is highly efficient and easy to implement. CRS adapts itself to network conditions and job granularities, and does not require manually-tuned parameters. Although CRS sends more data across the WANs, it is faster than its competitors for 11 out of 12 test applications with various WAN configurations. It has at most 4% overhead in run time compared to RS on a single, large cluster, even with high wide-area latencies and low wide-area bandwidths. These strong results suggest that divide-and-conquer parallelism is a useful model for writing distributed supercomputing applications on hierarchical wide-area systems.

ACM Transactions on Computer Systems | 1998

Performance evaluation of the Orca shared-object system

Henri E. Bal; Raoul Bhoedjang; Rutger F. H. Hofman; Ceriel J. H. Jacobs; Koen Langendoen; Tim Rühl; M. Frans Kaashoek

Orca is a portable, object-based distributed shared memory (DSM) system. This article studies and evaluates the design choices made in the Orca system and compares Orca with other DSMs. The article gives a quantitative analysis of Orcas coherence protocol (based on write-updates with function shipping), the totally ordered group communication protocol, the strategy for object placement, and the all-software, user-space architecture. Performance measurements for 10 parallel applications illustrate the trade-offs made in the design of Orca and show that essentially the right design decisions have been made. A write-update protocol with function shipping is effective for Orca, especially since it is used in combination with techniques that avoid replicating objects that have a low read/write ratio. The overhead of totally ordered group communication on application performance is low. The Orca system is able to make near-optimal decisions for object placement and replication. In addition, the article compares the performance of Orca with that of a page-based DSM (TreadMarks) and another object-based DSM (CRL). It also analyzes the communication overhead of the DSMs for several applications. All performance measurements are done on a 32-node Pentium Pro cluster with Myrinet and Fast Ethernet networks. The results show that Orca programs send fewer messages and less data than the TreadMarks and CRL programs and obtain better speedups.

Explore More