Maria Chtepen | Researchain

Publication

Featured research published by Maria Chtepen.

IEEE Transactions on Parallel and Distributed Systems | 2009

Adaptive Task Checkpointing and Replication: Toward Efficient Fault-Tolerant Grids

Maria Chtepen; Filip Claeys; Bart Dhoedt; Filip De Turck; Piet Demeester; Peter Vanrolleghem

A grid is a distributed computational and storage environment often composed of heterogeneous, autonomously managed subsystems. As a result, varying resource availability becomes commonplace, often resulting in loss and delay of executing jobs. To ensure good grid performance, fault tolerance should be taken into account. Commonly utilized techniques for providing fault tolerance in distributed systems are periodic job checkpointing and replication. While very robust, both techniques can delay job execution if inappropriate checkpointing intervals and replica numbers are chosen. This paper introduces several heuristics that dynamically adapt the above-mentioned parameters based on information on grid status to provide high job throughput in the presence of failure while reducing the system overhead. Furthermore, a novel fault-tolerant algorithm combining checkpointing and replication is presented. The proposed methods are evaluated in a newly developed grid simulation environment, Dynamic Scheduling in Distributed Environments (DSiDE), which allows for easy modeling of dynamic system and job behavior. Simulations are run employing workload and system parameters derived from logs that were collected from several large-scale parallel production systems. Experiments have shown that adaptive approaches can considerably improve system performance, while the preference for one of the solutions depends on particular system characteristics, such as load, job submission patterns, and failure frequency.
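
As a rough illustration of what adapting a checkpointing interval to observed failures can look like, the sketch below recomputes the interval from a running estimate of the mean time between failures (MTBF). It uses Young's first-order approximation, interval = sqrt(2 × checkpoint cost × MTBF), a textbook rule swapped in here for illustration; it is not one of the heuristics from the paper. The function names, arguments, and the fallback MTBF are all hypothetical.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order approximation of a checkpointing interval.

    checkpoint_cost: time needed to write one checkpoint (assumed known).
    mtbf: current estimate of the mean time between failures.
    Both values must use the same time unit.
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def adapt_interval(checkpoint_cost: float,
                   failure_gaps: list[float],
                   default_mtbf: float) -> float:
    """Re-estimate the MTBF from observed gaps between failures and
    return the interval to use until the next re-estimation.

    failure_gaps is a hypothetical monitoring input; when it is empty,
    the caller-supplied default_mtbf is used instead.
    """
    if failure_gaps:
        mtbf = sum(failure_gaps) / len(failure_gaps)
    else:
        mtbf = default_mtbf
    return young_interval(checkpoint_cost, mtbf)
```

Under this rule, a resource that fails more often gets a shorter interval, so less work is lost per failure at the price of more checkpointing overhead. That is the trade-off the abstract describes.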

International Conference on Conceptual Structures | 2007

Providing Fault-Tolerance in Unreliable Grid Systems Through Adaptive Checkpointing and Replication

Maria Chtepen; Filip Claeys; Bart Dhoedt; Filip De Turck; Peter Vanrolleghem; Piet Demeester

As grids typically consist of autonomously managed subsystems with strongly varying resources, fault-tolerance forms an important aspect of the scheduling process of applications. Two well-known techniques for providing fault-tolerance in grids are periodic task checkpointing and replication. Both techniques mitigate the amount of work lost due to changing system availability but can introduce significant run-time overhead. This overhead largely depends on the length of the checkpointing interval and the chosen number of replicas, respectively. This paper presents a dynamic scheduling algorithm that switches between periodic checkpointing and replication to exploit the advantages of both techniques and to reduce the overhead. Furthermore, several novel heuristics are discussed that perform on-line adaptive tuning of the checkpointing period based on historical information on resource behavior. A simulation-based comparison of the proposed combined algorithm with traditional strategies based on checkpointing only or replication only suggests a significant reduction of average task makespan for systems with varying load.

Information Technology Interfaces | 2009

Adaptive checkpointing in dynamic grids for uncertain job durations

Maria Chtepen; Bart Dhoedt; Filip De Turck; Piet Demeester; Filip Claeys; Peter Vanrolleghem

Adaptive checkpointing is a relatively new approach that is particularly suitable for providing fault-tolerance in dynamic and unstable grid environments. The approach allows for periodic modification of checkpointing intervals at run-time, when additional information becomes available. In this paper an adaptive algorithm, named MeanFailureCP+, is introduced that deals with checkpointing of grid applications with execution times that are unknown a priori. The algorithm modifies its parameters, based on dynamically collected feedback on its performance. Simulation results show that the new algorithm performs even better than adaptive approaches that make use of exact information on job execution times.

The Journal of Supercomputing | 2012

Online execution time prediction for computationally intensive applications with periodic progress updates

Maria Chtepen; Filip Claeys; Bart Dhoedt; Filip De Turck; Jan Fostier; Piet Demeester; Peter Vanrolleghem

The effectiveness of distributed execution of computationally intensive applications (jobs) largely depends on the quality of the applied scheduling approach. However, most of the existing non-trivial scheduling algorithms rely on prior knowledge or on prediction of application parameters, such as execution time, size of input and output, dependencies, etc., to assign applications to the available computational resources. A major issue is that these parameters are hard to determine in advance, especially if the end user does not possess an extensive history of previous application runs. In this work we propose an online method for execution time prediction of applications, for which execution progress can be collected at run-time. Using dynamic progress information, the total job execution time can be predicted using extrapolation. However, the predictions achieved by extrapolation are far from precise and often vary over time as a result of changing application dynamics and varying resource load. Therefore, to compute the actual job execution time we match a number of predefined prediction evolution models against the consecutive extrapolations, by adopting nonlinear curve-fitting. The “best-fit” coefficients allow for more accurate execution time prediction. The predictions made are used to enhance a dynamic scheduling algorithm for workflows introduced in our earlier work. The scheduling algorithm is run with and without curve-fitting, showing a performance improvement of up to 15% in the former case.
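
The extrapolation step mentioned in this abstract can be sketched as follows, assuming progress is reported as a completed fraction between 0 and 1 together with the elapsed time. This covers only plain linear extrapolation; the paper's prediction evolution models and nonlinear curve-fitting are not reproduced. The function names and the update format are hypothetical.

```python
def extrapolate_total_time(elapsed: float, progress: float) -> float:
    """Linearly extrapolate the total execution time of a job.

    elapsed: time the job has been running so far.
    progress: completed fraction of the job, with 0 < progress <= 1.
    """
    if elapsed < 0:
        raise ValueError("elapsed must be non-negative")
    if not 0.0 < progress <= 1.0:
        raise ValueError("progress must be in the interval (0, 1]")
    return elapsed / progress


def prediction_series(updates: list[tuple[float, float]]) -> list[float]:
    """Turn a sequence of (elapsed, progress) updates into the sequence
    of consecutive extrapolated total execution times."""
    return [extrapolate_total_time(t, p) for t, p in updates]
```

For a job that does not progress at a constant rate, consecutive values from `prediction_series` drift over time. This matches the abstract's remark that extrapolated predictions vary, and such a series is what a curve-fitting stage would take as input.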

International Conference on Computational Science | 2005

Computational complexity and distributed execution in water quality management

Maria Chtepen; Filip Claeys; Bart Dhoedt; Peter Vanrolleghem; Piet Demeester

Tourist beaches on the southern coast of Turkey are surveyed to support a standardised fuzzy approach for litter prediction and to assess the aesthetic state of the coastal environment for monitoring programs. During these surveys the litter items on beaches were counted and recorded in different categories. The main source of litter on beaches was determined to be “beach users”. A fuzzy system was developed to predict the classification of the beaches, since uncertainty was generally inherent in beach work due to the high variability of beach characteristics and the sources of litter categories. This resulted in effective utilization of “the judgment and knowledge of beach users” in the evaluation of beach gradings.

Water Science and Technology | 2006

Distributed virtual experiments in water quality management

Filip Claeys; Maria Chtepen; Lorenzo Benedetti; Bart Dhoedt; Peter Vanrolleghem

IASTED International Conference on Parallel and Distributed Computing and Systems | 2006

Evaluation of replication and rescheduling heuristics for grid systems with varying resource availability

Maria Chtepen; Bart Dhoedt; Filip De Turck; Piet Demeester; Filip Claeys; Peter Vanrolleghem

Proceedings of IMM2006, the International Mediterranean Modelling Multiconference | 2006

Dynamic scheduling of computationally intensive applications on unreliable infrastructures

Maria Chtepen; Bart Dhoedt; Filip De Turck; Piet Demeester; Filip Claeys; Peter Vanrolleghem

Congress on Modelling and Simulation | 2009

Performance evaluation and optimization of an adaptive scheduling approach for dependent grid jobs with unknown execution time

Maria Chtepen; Filip Claeys; Bart Dhoedt; Filip De Turck; Peter Vanrolleghem; Piet Demeester

Proceedings of EVGM2008, the 4th IEEE/IFIP International Workshop on End-to-End Virtualization and Grid Management, in conjunction with the 5th International Workshop on Next Generation Networking Middleware (NGNM2008) | 2008