Gosia Wrzesińska | Researchain

Publication

Featured research published by Gosia Wrzesińska.

Concurrency and Computation: Practice and Experience | 2005

Ibis: a Flexible and Efficient Java based Grid Programming Environment

Rob V. van Nieuwpoort; Jason Maassen; Gosia Wrzesińska; Rutger F. H. Hofman; Ceriel J. H. Jacobs; Thilo Kielmann; Henri E. Bal

In computational Grids, performance‐hungry applications need to simultaneously tap the computational power of multiple, dynamically available sites. The crux of designing Grid programming environments stems exactly from the dynamic availability of compute cycles: Grid programming environments (a) need to be portable to run on as many sites as possible, (b) need to be flexible to cope with different network protocols and dynamically changing groups of compute nodes, and (c) need to provide efficient (local) communication that enables high‐performance computing in the first place. Existing programming environments are either portable (Java), flexible (Jini, Java Remote Method Invocation (RMI)), or highly efficient (Message Passing Interface). No system combines all three properties that are necessary for Grid computing. In this paper, we present Ibis, a new programming environment that combines Java's ‘run everywhere’ portability with both flexible treatment of dynamically available networks and processor pools and highly efficient, object‐based communication. Ibis can transfer Java objects very efficiently by combining streaming object serialization with a zero‐copy protocol. Using RMI as a simple test case, we show that Ibis outperforms existing RMI implementations, achieving up to nine times higher throughput with trees of objects.
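
Ibis's efficiency is attributed here to streaming object serialization combined with a zero-copy protocol. The Python sketch below is not Ibis code. It illustrates only the streaming idea: an object tree is written field by field into one preallocated buffer, and a `memoryview` over that buffer is what a hypothetical transport would send. No per-object byte strings are built and then concatenated. The `Node` class and the wire layout (a presence byte followed by an 8-byte double) are assumptions made for illustration.

```python
import struct

# Hypothetical tree node; Ibis itself serializes arbitrary Java objects.
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

_HEADER = struct.Struct("<?")   # 1 byte: is there a node at this position?
_VALUE = struct.Struct("<d")    # 8-byte little-endian double payload

def serialized_size(node):
    if node is None:
        return _HEADER.size
    return (_HEADER.size + _VALUE.size
            + serialized_size(node.left) + serialized_size(node.right))

def serialize(root):
    """Write the tree in pre-order directly into a single preallocated buffer."""
    buf = bytearray(serialized_size(root))

    def write(node, offset):
        if node is None:
            _HEADER.pack_into(buf, offset, False)
            return offset + _HEADER.size
        _HEADER.pack_into(buf, offset, True)
        offset += _HEADER.size
        _VALUE.pack_into(buf, offset, node.value)
        offset += _VALUE.size
        offset = write(node.left, offset)
        return write(node.right, offset)

    write(root, 0)
    return memoryview(buf)  # a view over the buffer, handed on without copying

def deserialize(view):
    """Rebuild the tree by reading fields straight out of the buffer."""
    def read(offset):
        (present,) = _HEADER.unpack_from(view, offset)
        offset += _HEADER.size
        if not present:
            return None, offset
        (value,) = _VALUE.unpack_from(view, offset)
        offset += _VALUE.size
        left, offset = read(offset)
        right, offset = read(offset)
        return Node(value, left, right), offset

    node, _ = read(0)
    return node
```

Sizing the buffer in a first pass keeps the write loop free of reallocation. A real implementation would more likely stream into fixed-size buffers and flush them as they fill.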

ACM Transactions on Programming Languages and Systems | 2010

Satin: A high-level and efficient grid programming model

Rob V. van Nieuwpoort; Gosia Wrzesińska; Ceriel J. H. Jacobs; Henri E. Bal

Computational grids have an enormous potential to provide compute power. However, this power remains largely unexploited today for most applications, except trivially parallel programs. Developing parallel grid applications is simply too difficult. Grids introduce several problems not encountered before, mainly due to the highly heterogeneous and dynamic computing and networking environment. Furthermore, failures occur frequently, and resources may be claimed by higher-priority jobs at any time. In this article, we solve these problems for an important class of applications: divide-and-conquer. We introduce a system called Satin that simplifies the development of parallel grid applications by providing a rich high-level programming model that completely hides communication. All grid issues are transparently handled in the runtime system, not by the programmer. Satin's programming model is based on Java, features spawn-sync primitives and shared objects, and uses asynchronous exceptions and an abort mechanism to support speculative parallelism. To allow an efficient implementation, Satin consistently exploits the idea that grids are hierarchically structured. Dynamic load-balancing is done with a novel cluster-aware scheduling algorithm that hides the long wide-area latencies by overlapping them with useful local work. Satin's shared object model lets the application define the consistency model it needs. If an application needs only loose consistency, it does not have to pay high performance penalties for wide-area communication and synchronization. We demonstrate how grid problems such as resource changes and failures can be handled transparently and efficiently. Finally, we show that adaptivity is important in grids. Satin can increase performance considerably by adding and removing compute resources automatically, based on the application's requirements and the utilization of the machines and networks in the grid.
Using an extensive evaluation on real grids with up to 960 cores, we demonstrate that it is possible to provide a simple high-level programming model for divide-and-conquer applications, while achieving excellent performance on grids. At the same time, we show that the divide-and-conquer model scales better on large systems than the master-worker approach, since it has no single central bottleneck.
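
The spawn-sync primitives can be illustrated with a minimal single-process sketch. It assumes a per-worker double-ended job queue, which is the usual way work-stealing runtimes are built; the design is not taken from Satin's source. `spawn` pushes a job at the head of the queue. `sync` runs queued jobs from the head until the awaited jobs are done. A thief takes the oldest job from the tail. `Worker`, `Job` and the `fib` example are hypothetical names.

```python
from collections import deque


class Job:
    """A spawned invocation: a function, its arguments and (later) its result."""

    def __init__(self, fn, args):
        self.fn = fn
        self.args = args
        self.done = False
        self.result = None

    def run(self, worker):
        self.result = self.fn(worker, *self.args)
        self.done = True


class Worker:
    """One processor with its own double-ended job queue (hypothetical sketch)."""

    def __init__(self):
        self.jobs = deque()

    def spawn(self, fn, *args):
        job = Job(fn, args)
        self.jobs.append(job)          # newest job goes to the head (right end)
        return job

    def sync(self, *awaited):
        # Run queued jobs, newest first, until every awaited job has finished.
        # In this single-worker sketch nothing is stolen concurrently, so the
        # awaited jobs are always still queued; a real runtime would instead
        # wait for stolen jobs to return their results.
        while not all(job.done for job in awaited):
            self.jobs.pop().run(self)

    def steal(self):
        # A thief takes the oldest job from the tail: in divide-and-conquer
        # programs that is usually the largest piece of remaining work.
        return self.jobs.popleft() if self.jobs else None


def fib(worker, n):
    if n < 2:
        return n
    a = worker.spawn(fib, n - 1)
    b = worker.spawn(fib, n - 2)
    worker.sync(a, b)
    return a.result + b.result


print(fib(Worker(), 10))  # 55
```

Taking work from opposite ends keeps the owner on small, cache-friendly jobs while thieves receive large ones. Large stolen jobs matter most when a steal has to cross a high-latency wide-area link.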

ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2007

Self-adaptive applications on the grid

Gosia Wrzesińska; Jason Maassen; Henri E. Bal

Grids are inherently heterogeneous and dynamic. One important problem in grid computing is resource selection, that is, finding an appropriate resource set for the application. Another problem is adaptation to the changing characteristics of the grid environment. Existing solutions to these two problems require that a performance model for the application is known. However, constructing such models is a complex task. In this paper, we investigate an approach that does not require performance models. We start an application on any set of resources. During the application run, we periodically collect statistics about the run and deduce the application's requirements from these statistics. Then, we adjust the resource set to better fit the application's needs. This approach allows us to avoid performance bottlenecks, such as overloaded WAN links or very slow processors, and can therefore yield significant performance improvements. We evaluate our approach in a number of scenarios typical for the Grid.
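
The abstract describes a feedback loop: collect runtime statistics periodically, then adjust the resource set. A minimal sketch of one decision step follows. It assumes each node reports how long it was busy during the last period and its speed relative to the fastest node. The speed-weighted efficiency metric, the two thresholds and the add/remove/keep policy are illustrative assumptions, not the paper's exact rules.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Statistics one node reports for the last monitoring period (hypothetical)."""
    name: str
    busy_seconds: float     # time spent computing application work
    period_seconds: float   # length of the monitoring period
    relative_speed: float   # speed relative to the fastest node, in (0, 1]

    @property
    def efficiency(self) -> float:
        # Weighting by speed makes a slow but busy processor count as inefficient;
        # a node stalled behind an overloaded WAN link shows low busy time instead.
        return (self.busy_seconds / self.period_seconds) * self.relative_speed


# Hypothetical thresholds, chosen only for illustration.
LOW_EFFICIENCY = 0.3
HIGH_EFFICIENCY = 0.5


def adapt(stats):
    """Decide how to change the resource set: add, remove the worst node, or keep."""
    average = sum(s.efficiency for s in stats) / len(stats)
    if average > HIGH_EFFICIENCY:
        return ("add", None)           # nodes are well utilized: ask for more
    if average < LOW_EFFICIENCY:
        worst = min(stats, key=lambda s: s.efficiency)
        return ("remove", worst.name)  # drop the node that contributes least
    return ("keep", None)
```

Calling `adapt` once per monitoring period and acting on its answer gives the loop described above. The gap between the two thresholds keeps the resource set from oscillating when efficiency hovers near a single cut-off.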

IEEE International Conference on High Performance Computing, Data, and Analytics | 2006

Fault-Tolerant Scheduling of Fine-Grained Tasks in Grid Environments

Gosia Wrzesińska; Rob V. van Nieuwpoort; Jason Maassen; Thilo Kielmann; Henri E. Bal

Divide-and-conquer is a well-suited programming paradigm for parallel Grid applications. Our Satin system efficiently schedules the fine-grained tasks of a divide-and-conquer application across multiple clusters in a grid. To accommodate long-running applications, we present a fault-tolerance mechanism for Satin that has negligible overhead during normal execution, while minimizing the amount of redundant work done after a crash of one or more nodes. We study the impact of our fault-tolerance mechanism on application efficiency, both on the Dutch DAS-2 system and on the European testbed of the EC-funded project GridLab.
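
The abstract does not spell out the mechanism. One precondition for any such scheme is that the node owning a job remembers which jobs were stolen and by whom, so that work held by a crashed thief can be redone. The sketch below shows only that bookkeeping. `JobOwner`, its methods and the string job ids are hypothetical, and the sketch does not reuse any partial results.

```python
from collections import defaultdict, deque


class JobOwner:
    """Hypothetical bookkeeping on the node whose jobs can be stolen."""

    def __init__(self):
        self.queue = deque()                 # jobs not yet handed out
        self.stolen_by = defaultdict(list)   # thief id -> jobs it currently holds

    def spawn(self, job_id):
        self.queue.append(job_id)

    def steal(self, thief):
        if not self.queue:
            return None
        job_id = self.queue.popleft()        # oldest job goes to the thief
        self.stolen_by[thief].append(job_id)
        return job_id

    def result_returned(self, thief, job_id):
        # The thief finished the job, so it no longer needs to be tracked.
        self.stolen_by[thief].remove(job_id)

    def node_crashed(self, thief):
        # Put every job the crashed thief still held back into the local queue,
        # so it is recomputed (or stolen again) instead of being lost.
        lost = self.stolen_by.pop(thief, [])
        self.queue.extend(lost)
        return lost
```

Recomputing lost jobs from scratch is correct but wasteful. Minimizing that redundant work requires more than this bookkeeping.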

IEEE Computer | 2010

Real-World Distributed Computing with Ibis

Henri E. Bal; Jason Maassen; Rob V. van Nieuwpoort; Niels Drost; Roelof Kemp; Timo van Kessel; Nick Palmer; Gosia Wrzesińska; Thilo Kielmann; Kees van Reeuwijk; Frank J. Seinstra; Ceriel J. H. Jacobs; Kees Verstoep

The use of parallel and distributed computing systems is essential to meet the ever-increasing computational demands of many scientific and industrial applications. Ibis allows easy programming and deployment of compute-intensive distributed applications, even for dynamic, faulty, and heterogeneous environments.

International Conference on e-Science | 2006

Satin++: Divide-and-Share on the Grid

Gosia Wrzesińska; Jason Maassen; Kees Verstoep; Henri E. Bal

Divide-and-conquer is a popular and effective paradigm for writing grid-enabled applications. It has been shown to perform well in environments with high network latencies and dynamically changing numbers of processors. However, an important disadvantage of the divide-and-conquer paradigm is its limited applicability due to the lack of a shared data abstraction. We propose a divide-and-share model: the divide-and-conquer model extended with shared objects. Shared objects implement a relaxed consistency model called guard consistency. We have implemented Satin++, a framework for writing divide-and-share applications. With Satin++ we implemented a number of applications, including VLSI routing, N-body simulation and a SAT solver. We evaluate the performance of our model on a cluster supercomputer and on the heterogeneous, wide-area Grid'5000 testbed, and demonstrate that our applications can achieve high efficiencies on the Grid.
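
The abstract names guard consistency without defining it. The following sketch assumes that a guard is a predicate attached to a job, stating how up to date the local replica of a shared object must be for that job. Only when the guard fails does the runtime pay for fetching fresh state. `SharedObject`, `run_guarded`, the version counter and the single authoritative copy are hypothetical simplifications, not the Satin++ API.

```python
import copy


class SharedObject:
    """Hypothetical replicated object whose updates reach replicas lazily."""

    def __init__(self, state):
        self.master = dict(state, version=0)  # authoritative, always up to date
        self.replicas = {}                    # node -> possibly stale local copy

    def replica(self, node):
        if node not in self.replicas:
            self.replicas[node] = copy.deepcopy(self.master)
        return self.replicas[node]

    def write(self, key, value):
        # Writes update the authoritative copy; replicas are not touched here.
        self.master[key] = value
        self.master["version"] += 1

    def refresh(self, node):
        self.replicas[node] = copy.deepcopy(self.master)
        return self.replicas[node]


def run_guarded(shared, node, guard, job):
    """Run job on the node's replica, refreshing it first only if the guard fails."""
    local = shared.replica(node)
    if not guard(local):
        local = shared.refresh(node)
    return job(local)
```

A job that tolerates stale data passes a guard that always holds and never triggers communication. A stricter job can demand a minimum version, for example `lambda st: st["version"] >= 1`, and gets fresh state only when its replica falls short.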

Cluster Computing and the Grid | 2004

A simple and efficient fault tolerance mechanism for divide-and-conquer systems

Gosia Wrzesińska; R.V. van Nieuwpoort; Jason Maassen; Henri E. Bal

We study whether fault tolerance can be made simpler and more efficient by exploiting the structure of the application. More specifically, we study divide-and-conquer parallelism, which is a popular and effective paradigm for writing parallel Grid applications. We have designed a novel fault tolerance mechanism for divide-and-conquer applications that reduces the amount of redundant computation by storing the results of discarded jobs in a global (replicated) table. These results can later be reused, thereby minimizing the amount of work lost as a result of a crash. The execution time overhead of our mechanism is close to zero. Our mechanism can handle crashes of multiple processors or entire clusters at the same time. It can also handle crashes of the root node that initially started the parallel computation. We have incorporated our fault tolerance mechanism in Satin, a Java-based divide-and-conquer system. Satin is implemented on top of the Ibis communication library. The core of Ibis is implemented in pure Java, without using any native libraries. The Satin runtime system and our fault tolerance extension are also written entirely in Java. The resulting system is therefore highly portable, allowing the software to run unmodified on a heterogeneous Grid. We evaluated the performance of our fault tolerance scheme on a cluster of the Distributed ASCI Supercomputer 2 (DAS-2). In the first part of our tests, we show that the execution time overhead of our mechanism is close to zero. The second part of our tests shows that our algorithm salvages most of the work done by surviving processors. Finally, we carried out tests on the European GridLab testbed. We ran one of our applications on a set of six heterogeneous parallel machines (four different operating systems, four different architectures) located in four different European countries. After we manually killed one of the sites, the program recovered and finished normally.
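
Reusing results from a global table only works if a job that is spawned again after a crash can be matched with the work done earlier. The sketch below assumes jobs are identified by their path in the spawn tree, so a re-spawned job gets the same identifier it had before the crash. A plain dict stands in for the replicated table. Both choices are assumptions made for illustration. The `recomputed` list records which jobs actually had to be executed again.

```python
def fib(n, job_id, saved, recomputed):
    """Divide-and-conquer Fibonacci that reuses results of saved orphan jobs.

    job_id is the job's path from the root of the spawn tree (a tuple of
    child indices); saved maps such paths to results computed before the crash.
    """
    if job_id in saved:               # this subtree already finished elsewhere
        return saved[job_id]
    recomputed.append(job_id)
    if n < 2:
        return n
    return (fib(n - 1, job_id + (0,), saved, recomputed)
            + fib(n - 2, job_id + (1,), saved, recomputed))


# Hypothetical scenario: the node running the root crashed, but the orphan job
# at path (1,), i.e. fib(8), finished on a surviving node and its result was saved.
saved = {(1,): 21}
work = []
print(fib(10, (), saved, work), len(work))   # 55 110
```

Without the saved entry the same call executes 177 jobs. The lookup at path `(1,)` prunes the whole fib(8) subtree from the recomputation.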

European Conference on Parallel Processing | 2007

Persistent fault-tolerance for divide-and-conquer applications on the grid

Gosia Wrzesińska; Ana-Maria Oprescu; Thilo Kielmann; Henri E. Bal

Grid applications need to be fault tolerant, malleable, and migratable. In previous work, we presented orphan saving, an efficient mechanism addressing these issues for divide-and-conquer applications. In this paper, we present a mechanism for writing partial results to checkpoint files, which additionally tolerates the total loss of all processors and allows an application to be suspended and later resumed. Both mechanisms have negligible overhead in the absence of faults, even with checkpointing intervals as short as one minute. In the case of faults, the new checkpointing mechanism outperforms orphan saving by 10% to 15%. Also, suspending and resuming an application incurs little overhead, making our approach very attractive for writing grid applications.
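
Writing partial results to checkpoint files can be sketched as periodically persisting a map from finished-job identifiers to their results. On restart, the map is loaded into a table that the resumed run consults before recomputing a job. The JSON format, the encoding of job paths as `"0/1"` strings and the function names are hypothetical choices. The temporary-file-plus-`os.replace` pattern makes each write atomic, so a crash during checkpointing leaves the previous checkpoint intact.

```python
import json
import os
import tempfile


def _encode(job_id):
    # (0, 1) -> "0/1"; the root job () -> ""
    return "/".join(str(part) for part in job_id)


def _decode(key):
    return tuple(int(part) for part in key.split("/")) if key else ()


def write_checkpoint(path, results):
    """Write finished-job results (job id -> value) to path, atomically."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({_encode(job_id): value for job_id, value in results.items()}, f)
    os.replace(tmp_path, path)   # readers see the old or the new file, never half of one


def read_checkpoint(path):
    """Load a checkpoint; a missing file means the application starts from scratch."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return {_decode(key): value for key, value in json.load(f).items()}
```

Suspending an application then amounts to writing a final checkpoint and stopping. Resuming means calling `read_checkpoint` and skipping every job whose identifier is already present.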

Archive | 2007

Redesigning the SEGL Problem Solving Environment: A Case Study of Using Mediator Components

Thilo Kielmann; Gosia Wrzesińska; Natalia Currle-Linde; Michael M. Resch

The Science Experimental Grid Laboratory (SEGL) problem solving environment allows users to describe and execute complex parameter-study workflows in Grid environments. Its current implementation provides much high-level functionality for executing such workflows. Alternatively, a toolkit of mediator components that integrate system-component capabilities into application code would make it possible to build a system like SEGL from existing, more generally applicable components, simplifying its implementation and maintenance. In this paper, we present the existing design of the SEGL PSE, analyze the functionality it provides, and identify a set of mediator components that can generalize the functionality required by this challenging application category.

The Journal of Supercomputing | 2006