Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Dick H. J. Epema is active.

Publication


Featured research published by Dick H. J. Epema.


International Workshop on Peer-to-Peer Systems | 2005

The BitTorrent P2P file-sharing system: measurements and analysis

Johan A. Pouwelse; Pawel Garbacki; Dick H. J. Epema; Henk J. Sips

Of the many P2P file-sharing prototypes in existence, BitTorrent is one of the few that has managed to attract millions of users. BitTorrent relies on other (global) components for file search, employs a moderator system to ensure the integrity of file data, and uses a bartering technique for downloading in order to prevent users from freeriding. In this paper we present a measurement study of BitTorrent in which we focus on four issues, viz. availability, integrity, flashcrowd handling, and download performance. The purpose of this paper is to aid in the understanding of a real P2P system that apparently has the right mechanisms to attract a large user community, to provide measurement data that may be useful in modeling P2P systems, and to identify design issues in such systems.
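
To make the bartering mechanism concrete: BitTorrent's download bartering is commonly described as tit-for-tat unchoking, in which each peer preferentially serves the peers that upload to it fastest, plus one randomly chosen peer. The Python sketch below is a minimal, hypothetical illustration of that idea, not code from the measured client; the peer-rate map and slot count are invented for the example.

```python
import random

def select_unchoked(upload_rates, regular_slots=3):
    """Choose which peers to unchoke in one tit-for-tat round.

    `upload_rates` maps peer id -> observed upload rate to us (bytes/s).
    The fastest uploaders win the regular slots; one extra peer is
    unchoked at random (the "optimistic unchoke") so that newcomers
    with nothing to barter yet can still bootstrap.
    """
    by_rate = sorted(upload_rates, key=upload_rates.get, reverse=True)
    unchoked = set(by_rate[:regular_slots])
    others = [p for p in upload_rates if p not in unchoked]
    if others:
        unchoked.add(random.choice(others))  # optimistic unchoke
    return unchoked

# Peer "d" uploads nothing, so it can only be served via the optimistic slot.
rates = {"a": 50_000, "b": 120_000, "c": 80_000, "d": 0, "e": 30_000}
print(select_unchoked(rates))
```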


Future Generation Computer Systems | 1996

A worldwide flock of Condors: load sharing among workstation clusters

Dick H. J. Epema; Miron Livny; R. van Dantzig; X. Evers; Jim Pruyne

Condor is a distributed batch system for sharing the workload of compute-intensive jobs in a pool of UNIX workstations connected by a network. In such a Condor pool, idle machines are spotted by Condor and allocated to queued jobs, thus putting otherwise unutilized capacity to efficient use. When institutions owning Condor pools cooperate, they may wish to exploit the joint capacity of their pools in a similar way, so the need arises to extend the Condor load-sharing and protection mechanisms beyond the boundaries of Condor pools, or in other words, to create a flock of Condors. Such a flock may include Condor pools connected by local-area networks as well as by wide-area networks. In this paper we describe the design and implementation of a distributed, layered Condor flocking mechanism. The main concept in this design is the Gateway Machine, which represents in each pool the idle machines from other pools in the flock and allows job transfers across pool boundaries. Our flocking design is transparent to the workstation owners, to the users, and to Condor itself. We also discuss our experiences with an intercontinental Condor flock.
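
A rough sketch of the flocking idea in Python, assuming a toy model in which each pool is just a job queue plus a count of idle machines; the Pool class and the two-step dispatch are hypothetical simplifications, not the real Gateway Machine protocol.

```python
from collections import deque

class Pool:
    """Toy Condor-style pool: a job queue plus a count of idle machines."""
    def __init__(self, name, idle_machines):
        self.name = name
        self.idle = idle_machines
        self.queue = deque()

def schedule_round(pools):
    """One dispatch round with gateway-style flocking.

    Jobs first claim idle machines in their own pool; jobs still queued
    afterwards are forwarded (here directly; in Condor, via a Gateway
    Machine) to any pool in the flock with spare capacity.
    """
    placements = []
    for pool in pools:                       # local scheduling
        while pool.queue and pool.idle > 0:
            placements.append((pool.queue.popleft(), pool.name))
            pool.idle -= 1
    for pool in pools:                       # flocking step
        for other in pools:
            if other is pool:
                continue
            while pool.queue and other.idle > 0:
                placements.append((pool.queue.popleft(), other.name))
                other.idle -= 1
    return placements

delft = Pool("delft", idle_machines=1)
delft.queue.extend(["job-1", "job-2"])
madison = Pool("madison", idle_machines=2)
print(schedule_round([delft, madison]))  # job-1 runs locally, job-2 flocks
```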


Future Generation Computer Systems | 2013

Deadline-constrained workflow scheduling algorithms for Infrastructure as a Service Clouds

Saeid Abrishami; Mahmoud Naghibzadeh; Dick H. J. Epema

The advent of Cloud computing as a new model of service provisioning in distributed systems encourages researchers to investigate its benefits and drawbacks for executing scientific applications such as workflows. One of the most challenging problems in Clouds is workflow scheduling, i.e., the problem of satisfying the QoS requirements of the user as well as minimizing the cost of workflow execution. We have previously designed and analyzed a two-phase scheduling algorithm for utility Grids, called Partial Critical Paths (PCP), which aims to minimize the cost of workflow execution while meeting a user-defined deadline. However, we believe Clouds are different from utility Grids in three ways: on-demand resource provisioning, homogeneous networks, and the pay-as-you-go pricing model. In this paper, we adapt the PCP algorithm for the Cloud environment and propose two workflow scheduling algorithms: a one-phase algorithm called IaaS Cloud Partial Critical Paths (IC-PCP), and a two-phase algorithm called IaaS Cloud Partial Critical Paths with Deadline Distribution (IC-PCPD2). Both algorithms have polynomial time complexity, which makes them suitable for scheduling large workflows. The simulation results show that both algorithms have promising performance, with IC-PCP performing better than IC-PCPD2 in most cases.

Highlights:
- We propose two workflow scheduling algorithms for IaaS Clouds.
- The algorithms aim to minimize the workflow execution cost while meeting a deadline.
- The interval-based pricing model of Clouds is taken into account.
- The algorithms are compared with a list heuristic through simulation.
- The experiments show the promising performance of both algorithms.
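
One detail worth making concrete is the interval-based pricing model the highlights mention: IaaS providers classically charge for every started interval (e.g., a full hour), which is part of what makes cost-aware scheduling non-trivial. A minimal sketch, with an invented price and interval length:

```python
import math

def execution_cost(runtime_s, price_per_interval=0.10, interval_s=3600):
    """Cost of one task on an IaaS instance under interval-based pricing.

    The provider charges for every *started* interval, so a 61-minute
    run on an hourly-billed instance costs two full intervals.
    """
    return math.ceil(runtime_s / interval_s) * price_per_interval

print(execution_cost(3660))  # 61 min -> 2 intervals -> 0.20
```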


IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2011

On the Performance Variability of Production Cloud Services

Alexandru Iosup; Nezih Yigitbasi; Dick H. J. Epema

Cloud computing is an emerging infrastructure paradigm that promises to eliminate the need for companies to maintain expensive computing hardware. Through the use of virtualization and resource time-sharing, clouds serve a large user base with diverse needs from a single set of physical resources. Thus, clouds have the potential to provide their owners the benefits of an economy of scale and, at the same time, to become an alternative for both industry and the scientific community to self-owned clusters, grids, and parallel production environments. For this potential to become reality, the first generation of commercial clouds needs to be proven dependable. In this work we analyze the dependability of cloud services. Towards this end, we analyze long-term performance traces from Amazon Web Services and Google App Engine, currently two of the largest commercial clouds in production. We find that the performance of about half of the cloud services we investigate exhibits yearly and daily patterns, but also that most services have periods of especially stable performance. Finally, through trace-based simulation we assess the impact of the observed variability on three large-scale applications: job execution in scientific computing, virtual-goods trading in social networks, and state management in social gaming. We show that the impact of performance variability depends on the application, and we give evidence that performance variability can be an important factor in cloud provider selection.
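
As a hint of what detecting such daily patterns involves, here is a small, hypothetical Python sketch that buckets a performance trace by hour of day and reports the per-hour mean and coefficient of variation; the paper's actual analysis toolchain is more elaborate.

```python
from collections import defaultdict
from statistics import mean, pstdev

def daily_pattern(samples):
    """Summarize a performance trace by hour of day.

    `samples` is a list of (hour_of_day, value) pairs, e.g. hourly
    response-time measurements of one cloud service. Returns, per hour,
    the mean and the coefficient of variation (CoV), a simple way to
    expose daily patterns and variability.
    """
    buckets = defaultdict(list)
    for hour, value in samples:
        buckets[hour].append(value)
    return {hour: (mean(vals), pstdev(vals) / mean(vals))
            for hour, vals in sorted(buckets.items())}

trace = [(9, 120.0), (9, 180.0), (14, 95.0), (14, 100.0)]
print(daily_pattern(trace))
```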


Grid Computing | 2010

The Failure Trace Archive: Enabling Comparative Analysis of Failures in Diverse Distributed Systems

Derrick Kondo; Bahman Javadi; Alexandru Iosup; Dick H. J. Epema

With the increasing functionality and complexity of distributed systems, resource failures are inevitable. While numerous models and algorithms for dealing with failures exist, the lack of public trace data sets and tools has prevented meaningful comparisons. To facilitate the design, validation, and comparison of fault-tolerant models and algorithms, we have created the Failure Trace Archive (FTA) as an online public repository of availability traces taken from diverse parallel and distributed systems. Our main contributions in this study are the following. First, we describe the design of the archive, in particular the rationale of the standard FTA format, and the design of a toolbox that facilitates automated analysis of trace data sets. Second, applying the toolbox, we present a uniform comparative analysis with statistics and models of failures in nine distributed systems. Third, we show how different interpretations of these data sets can result in different conclusions. This emphasizes the critical need for the public availability of trace data and methods for their analysis.
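
To illustrate the kind of analysis such traces enable, the sketch below computes mean availability and unavailability durations (rough MTTF/MTTR proxies) from a simplified trace of availability intervals; the real FTA format and toolbox are richer than this.

```python
def failure_stats(availability_intervals):
    """Mean up/down durations from a simplified availability trace.

    `availability_intervals` is a non-empty, chronological list of
    (start, end) times in seconds during which a resource was available;
    the gaps between consecutive intervals are treated as failures.
    """
    ups = [end - start for start, end in availability_intervals]
    downs = [nxt - end
             for (_, end), (nxt, _) in zip(availability_intervals,
                                           availability_intervals[1:])]
    mean_up = sum(ups) / len(ups)
    mean_down = sum(downs) / len(downs) if downs else 0.0
    return mean_up, mean_down

# Two failures: a 400 s outage and a 100 s outage.
print(failure_stats([(0, 3600), (4000, 7200), (7300, 10000)]))
```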


Grid Computing | 2006

How are Real Grids Used? The Analysis of Four Grid Traces and Its Implications

Alexandru Iosup; Catalin L. Dumitrescu; Dick H. J. Epema; Hui Li; Lex Wolters

The grid computing vision promises to provide the needed platform for a new and more demanding range of applications. For this promise to come true, a number of hurdles, including the design and deployment of adequate resource management and information services, need to be overcome. In this context, understanding the characteristics of real grid workloads is a crucial step in improving the quality of existing grid services and in guiding the design of new solutions. Towards this goal, in this work we present the characteristics of traces of four real grid environments, namely LCG, Grid3, and TeraGrid, which are among the largest production grids currently deployed, and the DAS, which is a research grid. We focus our analysis on virtual organizations, on users, and on the characteristics of individual jobs. We further attempt to quantify the evolution and the performance of the grid systems from which our traces originate. Finally, given the scarcity of the information available for analysis purposes, we discuss the requirements of a new format for grid traces, and we propose the establishment of a virtual center for workload-based grid benchmarking data: the Grid Workloads Archive.
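
For flavor, here is a tiny hypothetical sketch of the per-VO and per-user breakdown such an analysis starts from, assuming trace records reduced to dicts with vo, user, and runtime fields; real grid traces carry far more detail.

```python
from collections import Counter

def workload_summary(jobs):
    """Per-VO and per-user job counts plus total consumed CPU time.

    `jobs` is a list of dicts with 'vo', 'user', and 'runtime' (seconds)
    fields, a simplified stand-in for records in a real grid trace.
    """
    by_vo = Counter(job["vo"] for job in jobs)
    by_user = Counter(job["user"] for job in jobs)
    total_cpu_s = sum(job["runtime"] for job in jobs)
    return by_vo, by_user, total_cpu_s

jobs = [{"vo": "atlas", "user": "u1", "runtime": 1200},
        {"vo": "atlas", "user": "u2", "runtime": 600},
        {"vo": "cms", "user": "u1", "runtime": 300}]
print(workload_summary(jobs))
```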


Cluster Computing and the Grid | 2009

C-Meter: A Framework for Performance Analysis of Computing Clouds

Nezih Yigitbasi; Alexandru Iosup; Dick H. J. Epema; Simon Ostermann

Cloud computing has emerged as a new technology that provides large amounts of computing and data storage capacity to its users with a promise of increased scalability, high availability, and reduced administration and maintenance costs. As the use of cloud computing environments increases, it becomes crucial to understand their performance. It is therefore of great importance to assess the performance of computing clouds in terms of various metrics, such as the overhead of acquiring and releasing virtual computing resources and other virtualization and network communication overheads. To address these issues, we have designed and implemented C-Meter, a portable, extensible, and easy-to-use framework for generating and submitting test workloads to computing clouds. In this paper, we first state the requirements for frameworks that assess the performance of computing clouds. Then, we present the architecture of the C-Meter framework and discuss several cloud resource management alternatives. Finally, we present our early experiences with C-Meter in Amazon EC2. We show how C-Meter can be used for assessing the overhead of acquiring and releasing virtual computing resources, for comparing different configurations, and for evaluating different scheduling algorithms.
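
The core of measuring acquisition and release overheads is a simple timing harness around the provider's provisioning calls. A minimal sketch, where acquire and release are caller-supplied stand-ins for real cloud API calls; this is not C-Meter's actual code.

```python
import time

def measure_provisioning(acquire, release, repetitions=5):
    """Time VM acquisition and release over several repetitions.

    `acquire` requests a virtual machine and returns a handle; `release`
    returns it. Both are hypothetical stand-ins for real cloud API calls.
    Returns a list of (acquire_seconds, release_seconds) pairs.
    """
    timings = []
    for _ in range(repetitions):
        t0 = time.monotonic()
        vm = acquire()
        t1 = time.monotonic()
        release(vm)
        t2 = time.monotonic()
        timings.append((t1 - t0, t2 - t1))
    return timings

# Fake provider for demonstration: acquiring takes ~0.1 s, releasing ~0.05 s.
print(measure_provisioning(lambda: time.sleep(0.1) or "vm-0",
                           lambda vm: time.sleep(0.05)))
```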


High Performance Distributed Computing | 2008

The performance of bags-of-tasks in large-scale distributed systems

Alexandru Iosup; Omer Ozan Sonmez; Shanny Anoep; Dick H. J. Epema

Ever more scientists are employing large-scale distributed systems such as grids for their computational work, instead of tightly coupled high-performance computing systems. However, while these distributed systems are more cost-effective, their heterogeneity in terms of hardware, software, and systems administration, together with the lack of accurate resource information, leads to inefficient scheduling. In addition, and in contrast to the workloads of tightly coupled high-performance computing systems, a large part of the workloads submitted to these distributed systems consists of large sets (bags) of sequential tasks. Therefore, a realistic performance analysis of scheduling bags-of-tasks in large-scale distributed systems is important. Towards this end, we introduce in this paper a realistic workload model for bags-of-tasks, and we explore through trace-based simulations the design space of scheduling bags-of-tasks. Finally, we identify three new scheduling policies that use only inaccurate information when scheduling, and we compare them against known classes of previously proposed scheduling policies.
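
To make the notion of scheduling with inaccurate information concrete, here is a minimal hypothetical sketch of a greedy dispatch policy that sends each task of a bag to the site that looks least loaded according to possibly stale estimates; the policies studied in the paper are more refined.

```python
def schedule_bag(task_runtimes, site_load_estimates):
    """Greedily dispatch a bag-of-tasks using possibly stale load info.

    `site_load_estimates` maps site -> estimated queued work (seconds);
    the estimates may be inaccurate. Each task goes to the site that
    currently looks least loaded, and the local estimate is updated.
    """
    load = dict(site_load_estimates)
    assignment = []
    for runtime in task_runtimes:
        site = min(load, key=load.get)
        assignment.append((runtime, site))
        load[site] += runtime
    return assignment

# With these estimates the bag is split across the two sites.
print(schedule_bag([60, 30, 90], {"siteA": 0, "siteB": 50}))
```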


IEEE Transactions on Parallel and Distributed Systems | 2012

Cost-Driven Scheduling of Grid Workflows Using Partial Critical Paths

Saeid Abrishami; Mahmoud Naghibzadeh; Dick H. J. Epema

Recently, utility Grids have emerged as a new model of service provisioning in heterogeneous distributed systems. In this model, users negotiate with service providers on their required Quality of Service and on the corresponding price to reach a Service Level Agreement. One of the most challenging problems in utility Grids is workflow scheduling, i.e., the problem of satisfying the QoS requirements of the users as well as minimizing the cost of workflow execution. In this paper, we propose a new QoS-based workflow scheduling algorithm based on a novel concept called Partial Critical Paths (PCP), which tries to minimize the cost of workflow execution while meeting a user-defined deadline. The PCP algorithm has two phases: in the deadline distribution phase it recursively assigns subdeadlines to the tasks on the partial critical paths ending at previously assigned tasks, and in the planning phase it assigns the cheapest service to each task while meeting its subdeadline. The simulation results show that the performance of the PCP algorithm is very promising.
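
A worked miniature of the deadline distribution phase helps: in the spirit of PCP, the sketch below spreads the overall deadline over the tasks of one (partial) critical path in proportion to their estimated runtimes, so the last task's subdeadline equals the deadline. This is a simplified stand-in, not the paper's recursive assignment.

```python
def distribute_deadline(path_runtimes, deadline):
    """Assign subdeadlines along one (partial) critical path.

    Each task on the path receives a subdeadline proportional to the
    cumulative estimated runtime up to and including that task.
    """
    total = sum(path_runtimes)
    subdeadlines, elapsed = [], 0.0
    for runtime in path_runtimes:
        elapsed += runtime
        subdeadlines.append(deadline * elapsed / total)
    return subdeadlines

print(distribute_deadline([10, 30, 20], deadline=120.0))  # [20.0, 80.0, 120.0]
```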


Operating Systems Review | 2000

The distributed ASCI Supercomputer project

Henri E. Bal; Raoul Bhoedjang; Rutger F. H. Hofman; Ceriel J. H. Jacobs; Thilo Kielmann; Jason Maassen; Rob V. van Nieuwpoort; John W. Romein; Luc Renambot; Tim Rühl; Ronald Veldema; Kees Verstoep; Aline Baggio; G.C. Ballintijn; Ihor Kuz; Guillaume Pierre; Maarten van Steen; Andrew S. Tanenbaum; G. Doornbos; Desmond Germans; Hans J. W. Spoelder; Evert Jan Baerends; Stan J. A. van Gisbergen; Hamideh Afsarmanesh; Dick Van Albada; Adam Belloum; David Dubbeldam; Z.W. Hendrikse; Bob Hertzberger; Alfons G. Hoekstra

The Distributed ASCI Supercomputer (DAS) is a homogeneous wide-area distributed system consisting of four cluster computers at different locations. DAS has been used for research on communication software, parallel languages and programming systems, schedulers, parallel applications, and distributed applications. The paper gives a preview of the most interesting research results obtained so far in the DAS project.

Collaboration


Dive into Dick H. J. Epema's collaborations.

Top Co-Authors

Alexandru Iosup (Delft University of Technology)

Johan A. Pouwelse (Delft University of Technology)

Henk J. Sips (Delft University of Technology)

Nezih Yigitbasi (Delft University of Technology)

Bogdan Ghit (Delft University of Technology)

Hashim H. Mohamed (Delft University of Technology)

Omer Ozan Sonmez (Delft University of Technology)

Michel Meulpolder (Delft University of Technology)

Adele Lu Jia (Delft University of Technology)