Dalibor Klusáček
CESNET
Publications
Featured research published by Dalibor Klusáček.
simulation tools and techniques for communications, networks and system | 2010
Dalibor Klusáček; Hana Rudová
This work describes the Grid and cluster scheduling simulator Alea 2, designed for the study, testing and evaluation of various job scheduling techniques. This event-based simulator is able to deal with common problems related to job scheduling, such as the heterogeneity of jobs and resources and dynamic runtime changes such as the arrival of new jobs or resource failures and restarts. Alea 2 is based on the popular GridSim toolkit [31] and represents a major extension of the Alea simulator, developed in 2007 [16]. The extension covers an improved design, extended functionality, improved scalability and higher simulation speed. Finally, a new visualization interface was introduced into the simulator. The main part of the simulator is a complex scheduler which incorporates several common scheduling algorithms working on either the queue-based or the schedule-based (plan-based) principle. Additional data structures are used to maintain information about the resource status and the objective functions, and to collect and visualize the simulation results. Many typical objectives such as the machine usage, the average slowdown or the average response time are included. The paper concludes with an example of an Alea 2 execution using a real-life workload, and also discusses the scalability of the simulator.
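The objectives named in this abstract can be illustrated with a minimal Python sketch. It uses the common textbook definitions (response time as completion minus arrival, slowdown as response time divided by runtime, machine usage as consumed CPU time over available CPU time); these definitions, the Job record and the function names are assumptions for illustration and are not taken from Alea 2.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job record; field names are illustrative only.
    arrival: float  # time the job was submitted
    start: float    # time the job began executing
    runtime: float  # execution time (must be > 0)
    cpus: int = 1   # number of CPUs the job occupies


def response_time(job):
    # Time from submission to completion.
    return job.start + job.runtime - job.arrival


def slowdown(job):
    # Response time normalized by the job's own runtime (1.0 means no waiting).
    return response_time(job) / job.runtime


def average(values):
    values = list(values)
    return sum(values) / len(values)


def machine_usage(jobs, total_cpus):
    # Fraction of available CPU time that was consumed between the first
    # arrival and the last completion.
    first_arrival = min(j.arrival for j in jobs)
    last_completion = max(j.start + j.runtime for j in jobs)
    used = sum(j.runtime * j.cpus for j in jobs)
    return used / (total_cpus * (last_completion - first_arrival))
```

For two jobs that arrive together and run back to back on one CPU, the second job's waiting time raises its slowdown while the machine usage stays at its maximum, which is why such objectives are usually reported together.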
parallel processing and applied mathematics | 2007
Dalibor Klusáček; Luděk Matyska; Hana Rudová
This work concentrates on the design of a system intended for the study of advanced scheduling techniques for planning various types of jobs in a Grid environment. The solution is able to deal with common problems of job scheduling in Grids, such as the heterogeneity of jobs and resources and dynamic runtime changes such as the arrival of new jobs. Our new simulator, called Alea, is based on the GridSim simulation toolkit, which we extended to provide a simulation environment that supports the simulation of varying Grid scheduling problems. To demonstrate the features of the GridSim environment, we implemented an experimental centralised Grid scheduler which uses advanced scheduling techniques for schedule generation. So far, local-search-based algorithms and some dispatching rules have been tested. The scheduler is capable of handling both static and dynamic situations. In the static case, all jobs are known in advance, while in the dynamic case jobs appear in the system during the simulation. In the dynamic case, the generated schedule changes over time as some jobs finish while new ones arrive. A comparison of FCFS, local search and dispatching rules is presented for both cases, and we demonstrate that the new local-search-based algorithm provides the best schedule while keeping the running time acceptable.
CoreGRID Integration Workshop | 2008
Dalibor Klusáček; Hana Rudová; Ranieri Baraglia; Marco Pasquali; Gabriele Capannini
We propose a novel schedule-based approach for scheduling a continuous stream of batch jobs on the machines of a computational Grid. Our new solutions, represented by the dispatching rule Earliest Gap-Earliest Deadline First (EG-EDF) and Tabu search, are based on the idea of filling gaps in the existing schedule. The EG-EDF rule is able to build the schedule for all jobs incrementally by applying a technique which fills the earliest existing gaps in the schedule with newly arriving jobs. If no gap is available for an incoming job, the EG-EDF rule uses the Earliest Deadline First (EDF) strategy to include the new job in the existing schedule. The resulting schedule is then optimized using the Tabu search algorithm, which again moves jobs into the earliest gaps. Scheduling choices are made to meet the Quality of Service (QoS) requested by the submitted jobs and to optimize the usage of hardware resources. The proposed solution is compared with FCFS, EASY backfilling, and Flexible backfilling. Experiments show that the EG-EDF rule is able to compute good assignments, often with a shorter algorithm runtime than the other, queue-based algorithms. Further Tabu search optimization results in higher QoS and machine usage.
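The gap-filling idea described in this abstract can be sketched in a few lines of Python. This is a heavily simplified illustration, not the authors' algorithm: it assumes a single machine that runs one job at a time, a schedule kept as a list of non-overlapping slots sorted by start time, and an EDF fallback that only delays later jobs. The Slot class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    # Hypothetical schedule entry for a single machine.
    name: str
    start: float
    length: float
    deadline: float

    @property
    def end(self):
        return self.start + self.length


def find_earliest_gap(schedule, now, length):
    # Return (insert index, start time) of the earliest idle gap between
    # planned slots that can hold `length`, or None if no such gap exists.
    cursor = now
    for i, slot in enumerate(schedule):
        if slot.start - cursor >= length:
            return i, cursor
        cursor = max(cursor, slot.end)
    return None


def eg_edf_insert(schedule, name, length, deadline, now=0.0):
    # Step 1 (Earliest Gap): use the earliest gap that fits the new job.
    gap = find_earliest_gap(schedule, now, length)
    if gap is not None:
        index, start = gap
        schedule.insert(index, Slot(name, start, length, deadline))
        return schedule
    # Step 2 (EDF fallback, simplified): place the job before the first
    # slot with a later deadline, then push the following slots back.
    index = next(
        (i for i, s in enumerate(schedule) if s.deadline > deadline),
        len(schedule),
    )
    prev_end = schedule[index - 1].end if index > 0 else now
    schedule.insert(index, Slot(name, max(now, prev_end), length, deadline))
    for i in range(index + 1, len(schedule)):
        schedule[i].start = max(schedule[i].start, schedule[i - 1].end)
    return schedule
```

A real Grid scheduler would also have to handle multiple machines, multi-CPU jobs and jobs that are already running; the sketch only shows how an incremental insert can avoid rebuilding the whole schedule.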
job scheduling strategies for parallel processing | 2012
Dalibor Klusáček; Hana Rudová
In this work we analyze the performance of scheduling algorithms with respect to fairness. Existing works frequently consider fairness as a job-related issue. In our work we analyze fairness with respect to different users of the system, as this is a very important real-life problem. First, we discuss how fair selected popular scheduling algorithms are with respect to different users of the system. Next, we present an extension to the well-known Conservative backfilling algorithm. Instead of “ad hoc” decisions, the schedule is now created subject to evaluation and optimization. Notably, fairness is considered an important metric, which accompanies standard performance-related metrics such as slowdown or wait time. To achieve that, the inclusion of fairness as an optimization criterion is proposed. The new extension improves the performance and fairness of Conservative backfilling with respect to other classical techniques such as FCFS, EASY backfilling or aggressive backfilling without reservations.
european conference on parallel processing | 2014
Dalibor Klusáček; Šimon Tóth
Many studies in the past two decades have focused on the problem of efficient job scheduling in HPC and Grid-like systems. While many new scheduling algorithms have been proposed for systems with specific requirements, mainstream resource management systems and schedulers still use only a limited set of scheduling policies. Production systems need to balance various policies that are set in place to satisfy both the resource providers and users (or virtual organizations) in the system. While many works address these separate policies, e.g., fairshare for fair resource allocation, only a few works try to address the interactions between these separate solutions. In this paper we describe how to approach these interactions when developing site-specific policies. Notably, we describe how (priority) queues interact with scheduling algorithms, fairshare and anti-starvation mechanisms. Moreover, we present a case study describing how an advanced simulation tool was used to find a new configuration for an actual resource manager deployed in the Czech National Grid, significantly increasing its performance.
computational intelligence | 2011
Dalibor Klusáček; Hana Rudová
Although Grid users demand good performance for their jobs, this requirement is often not satisfied by the widely used queue-based scheduling approaches. This article concentrates on the application of schedule-based methods that improve on both the service delivered to the user and the traditional objective of machine usage. Importantly, the interaction between the incremental application of these methods and the dynamic character of the problem allows reasonable runtimes to be achieved. Two new schedule-based methods that are designed to schedule dynamically arriving jobs on machines in a computational Grid are formally described in the article. The Earliest Gap-Earliest Deadline First (EG-EDF) policy fills the earliest gap in the known schedule with newly arriving jobs, incrementally building a new schedule. If no gap is suitable for an incoming job, the EDF policy is used to modify the existing schedule. A Tabu search algorithm is used to further optimize the schedule by moving selected jobs into the earliest suitable gaps. The proposed incremental schedule-based methods are compared with some of the most common queue-based scheduling algorithms, such as FCFS (First Come First Served), EASY backfilling (Extensible Argonne Scheduler sYstem) and Flexible backfilling, as well as with the nonincremental version of the EG-EDF schedule-based policy.
job scheduling strategies for parallel processing | 2014
Dalibor Klusáček; Hana Rudová
Current production resource management and scheduling systems often use some mechanism to guarantee fair sharing of computational resources among different users of the system. For example, a user who has so far consumed a small amount of CPU time gets a higher priority, and vice versa. However, different users may have highly heterogeneous demands concerning system resources, including CPUs, RAM, HDD storage capacity or, e.g., GPU cores. Therefore, it may not be fair to prioritize them only with respect to the consumed CPU time. Still, the applied mechanisms often do not reflect other consumed resources, or they use rather simplified and “ad hoc” solutions to approach these issues. We show that such solutions may be (highly) unfair and unsuitable for heterogeneous systems. We provide a survey of existing works that try to deal with this situation, analyzing and evaluating their characteristics. Next, we present a new enhanced approach that supports a multi-resource-aware user prioritization mechanism. Importantly, this approach is capable of dealing with the heterogeneity of both jobs and resources. A working implementation of this new prioritization scheme is currently applied in the Czech National Grid Infrastructure MetaCentrum.
job scheduling strategies for parallel processing | 2010
Dalibor Klusáček; Hana Rudová
This paper has been inspired by the study of the complex data set from the Czech National Grid MetaCentrum. Unlike other widely used workloads from the Parallel Workloads Archive or the Grid Workloads Archive, this data set includes additional information concerning machine failures, job requirements and machine parameters, which allows more realistic simulations to be performed. We show that large differences in the performance of various scheduling algorithms appear when this additional information is used. Moreover, we studied other publicly available workloads and partially reconstructed information concerning their machine failures and job requirements using statistical and analytical models, to demonstrate that similar behavior can also be expected for other workloads. We suggest that additional information about both machines and jobs should be incorporated into the workload archives to allow proper and more realistic simulations.
job scheduling strategies for parallel processing | 2013
Dalibor Klusáček; Hana Rudová; Michal Jaroš
Current production resource management and scheduling systems often use some mechanism to guarantee fair sharing of computational resources among different users of the system. For example, a user who has so far consumed a small amount of CPU time gets a higher priority, and vice versa. The problem with such a solution is that it does not reflect other consumed resources such as RAM, HDD storage capacity or GPU cores. Clearly, different users may have highly heterogeneous demands concerning the aforementioned resources, yet they are all prioritized only with respect to consumed CPU time. In this paper we show that such a single-resource-based approach is unfair and is no longer suitable for today's systems. We provide a survey of existing works that try to deal with this situation, and we closely analyze and evaluate their characteristics. Next, we propose new enhanced approaches that would allow the development of usable multi-resource-aware user prioritization mechanisms. We demonstrate that different consumed resources can be weighted and combined within a single formula which can be used to establish users’ priorities. Moreover, we show that when it comes to multiple resources, it is not always possible to find a suitable solution that would fulfill all fairness-related requirements.
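The idea of weighting and combining several consumed resources within a single formula can be illustrated with a small Python sketch. The abstract does not give the formula, so everything below is an assumption for illustration: consumption is normalized by a per-resource capacity, combined as a weighted sum, and users with a lower combined usage are ranked first. The resource names, weights and function names are hypothetical.

```python
def combined_usage(consumed, capacity, weights):
    # Weighted sum of each resource's consumption, normalized by the
    # (assumed) capacity of that resource so that units are comparable.
    return sum(
        weights[r] * consumed.get(r, 0.0) / capacity[r] for r in weights
    )


def rank_users(usage_by_user, capacity, weights):
    # Users with lower combined usage come first (higher priority).
    # Ties are broken by user name to keep the ordering deterministic.
    scores = {
        user: combined_usage(consumed, capacity, weights)
        for user, consumed in usage_by_user.items()
    }
    return sorted(scores, key=lambda user: (scores[user], user))


# Illustrative data: alice uses little CPU but a lot of RAM.
capacity = {"cpu_hours": 1000.0, "ram_gb_hours": 4000.0}
usage = {
    "alice": {"cpu_hours": 100.0, "ram_gb_hours": 3000.0},
    "bob": {"cpu_hours": 300.0, "ram_gb_hours": 400.0},
}
cpu_only = {"cpu_hours": 1.0, "ram_gb_hours": 0.0}
balanced = {"cpu_hours": 0.5, "ram_gb_hours": 0.5}
```

With the CPU-only weights alice is ranked ahead of bob, while the balanced weights reverse the order because her RAM consumption is now counted. This is the kind of difference between single-resource and multi-resource prioritization that the abstract refers to.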
Computer Science | 2012
Vaclav Chlumsky; Dalibor Klusáček; Miroslaw Ruda
In this work we present a major extension of the open source TORQUE Resource Manager system. We have replaced the naive scheduler provided in the TORQUE distribution with a complex scheduling system that allows job execution to be planned ahead and the behavior of the system to be predicted. It is based on the application of a job schedule, which represents the jobs’ execution plan. Such functionality is very useful, as the plan can be used by the users to see when and where their jobs will be executed. Moreover, the created plans can be easily evaluated in order to identify possible inefficiencies. Repair actions can then be taken immediately to fix the inefficiencies, producing better schedules with respect to the considered criteria.