Ivan Gankevich
Saint Petersburg State University
Publications
Featured research published by Ivan Gankevich.
international conference on computational science and its applications | 2014
Ivan Gankevich; Vladimir Korkhov; Serob Balyan; Vladimir Gaiduchok; Dmitry Gushchanskiy; Yuri Tipikin; Alexander B. Degtyarev; Alexander V. Bogdanov
One of the most efficient ways to conduct experiments on HPC platforms is to create custom virtual computing environments tailored to the requirements of users and their applications. In this paper we investigate the virtual private supercomputer, an approach based on virtualization, data consolidation, and cloud technologies. Virtualization is used to abstract applications from the underlying hardware and operating system, while data consolidation is applied to store data in a distributed storage system. Both the virtualization and data consolidation layers offer APIs for distributed computations and data processing. Combined, these APIs shift the focus from supercomputing technologies to the problems being solved. Based on these concepts, we propose an approach to constructing virtual clusters with the help of cloud computing technologies to be used as on-demand private supercomputers, and evaluate the performance of this solution.
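To make the two-layer idea concrete, here is a minimal Python sketch of an on-demand virtual cluster whose lifetime is tied to a single experiment. The VirtualCluster class, the image name, and the /dfs mount point are hypothetical placeholders for illustration, not the paper's actual API:

    class VirtualCluster:
        """Toy model: a cluster of VMs plus a shared distributed store."""

        def __init__(self, n_nodes, image="hpc-base", store="/dfs"):
            # virtualization layer: abstract the application from hardware
            self.nodes = [f"vm-{i:03d}" for i in range(n_nodes)]
            self.image, self.store = image, store

        def __enter__(self):
            print(f"provisioning {len(self.nodes)} VMs from image {self.image}")
            # data consolidation layer: same store visible on every VM
            print(f"mounting distributed store at {self.store} on every VM")
            return self

        def run(self, command):
            # in a real system this would go through the cluster scheduler
            for node in self.nodes:
                print(f"{node}: {command}")

        def __exit__(self, *exc):
            print("releasing VMs back to the cloud")

    # the cluster exists only for the lifetime of the experiment
    with VirtualCluster(n_nodes=4) as vc:
        vc.run("mpiexec ./simulation --input /dfs/dataset")

The design point is that VM provisioning and the shared store are the only two interfaces the user sees; everything below them is hidden by the cloud.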
international conference on high performance computing and simulation | 2015
Ivan Gankevich; Yuri Tipikin; Vladimir Gaiduchok
Nowadays, many cluster management systems rely on distributed consensus algorithms to elect a leader that orchestrates subordinate nodes. In contrast to these approaches, we propose a consensus-free algorithm that arranges cluster nodes into multiple levels of subordination. The algorithm structures the IP address range of the cluster network so that each node has a ranked list of candidates, from which it chooses a leader. The results show that this approach easily scales to a large number of nodes due to its asynchronous nature, and enables fast recovery from node failures because each failure affects only one level of the hierarchy. Multiple levels of subordination are useful for efficiently collecting monitoring and accounting data from a large number of nodes, and for scheduling general-purpose tasks on a cluster.
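A minimal sketch of how such a candidate list might be computed, assuming nodes are numbered by their offset inside the cluster subnet and arranged into a complete n-ary tree; the exact tree mapping in the paper may differ:

    import ipaddress

    def candidates(addr, network, fanout=2):
        """Ranked list of potential principals for the node at `addr`.

        Every node derives its superiors from its own IP alone, so no
        consensus round is needed. Illustrative mapping, not the paper's
        exact formula."""
        net = ipaddress.ip_network(network)
        base = int(net.network_address) + 1           # first usable host
        pos = int(ipaddress.ip_address(addr)) - base  # node's rank in subnet
        ranked = []
        while pos > 0:
            pos = (pos - 1) // fanout                 # parent in n-ary tree
            ranked.append(str(ipaddress.ip_address(base + pos)))
        return ranked                                 # nearest superior first

    print(candidates("10.0.0.7", "10.0.0.0/24"))
    # ['10.0.0.3', '10.0.0.1'] -> the leader is chosen from this list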
international conference on high performance computing and simulation | 2016
Ivan Gankevich; Yuri Tipikin; Vladimir Korkhov; Vladimir Gaiduchok
Nowadays, many job schedulers rely on checkpoint mechanisms to make long-running batch jobs resilient to node failures. At large scale, stopping a job and creating its image consumes a considerable amount of time. The aim of this study is to propose a method that eliminates this overhead. For this purpose we decompose the problem being solved into computational microkernels which have strict hierarchical dependencies on each other. When a kernel abruptly stops its execution due to a node failure, it is the responsibility of its principal to restart the computation on a healthy node. In the course of experiments we successfully applied this method to make a hydrodynamics HPC application run on a constantly changing number of nodes. We believe that this technique can be generalised to other types of scientific applications as well.
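The restart rule can be sketched as follows. This is a toy in-process simulation, not the authors' actual framework: a principal kernel re-sends a subordinate kernel to a healthy node instead of restoring the whole job from a checkpoint.

    class Kernel:
        """A unit of computation that knows its principal (parent)."""
        def __init__(self, payload, principal=None):
            self.payload, self.principal = payload, principal

    def run(kernel, node, healthy_nodes):
        if node not in healthy_nodes:              # node failed mid-run
            node = healthy_nodes[0]                # principal picks a new node
            print(f"{kernel.payload}: restarted by its principal on {node}")
        print(f"{kernel.payload}: completed on {node}")

    root = Kernel("solver")                        # principal kernel
    healthy = ["node-1", "node-3"]                 # node-2 has failed
    for i, node in enumerate(["node-1", "node-2", "node-3"]):
        run(Kernel(f"subtask-{i}", principal=root), node, healthy)

No state is written to disk at any point; the only cost of a failure is re-running the affected kernel.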
international conference on computational science and its applications | 2015
Ivan Gankevich; Yuri Tipikin; Alexander B. Degtyarev; Vladimir Korkhov
Efficient management of a distributed system is a common problem for university and commercial computer centres, and handling node failures is a major aspect of it. Failures that are rare in a small commodity cluster become common at large scale, and there should be a way to overcome them without restarting all parallel processes of an application. The efficiency of existing methods can be improved by forming a hierarchy of distributed processes. That way only the lower levels of the hierarchy need to be restarted in case of a leaf node failure, and only the root node needs special treatment. The process hierarchy changes in real time, and the workload is dynamically rebalanced across online nodes. This approach makes it possible to implement efficient partial restart of a parallel application, and transactional behaviour for computer centre service tasks.
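The benefit of partial restart is easy to see on a small example. In the sketch below (illustrative names and tree layout, not the paper's data structures), the failure of one process re-runs only its subtree, while the sibling subtree keeps its results:

    tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"],
            "a1": [], "a2": [], "b1": []}

    def restart(node):
        """Processes that must be re-run after `node` fails."""
        todo = [node]
        for child in tree[node]:
            todo += restart(child)
        return todo

    print(restart("a"))     # ['a', 'a1', 'a2'] -- the 'b' subtree is untouched
    print(restart("root"))  # the root is the only node needing special treatment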
international conference on computational science and its applications | 2016
Ivan Gankevich; Yuri Tipikin; Vladimir Korkhov; Vladimir Gaiduchok; Alexander B. Degtyarev; Alexander V. Bogdanov
Master node fault tolerance is a topic that is often glossed over in discussions of big data processing technologies. Although the failure of a master node can take down the whole data processing pipeline, it is considered either improbable or too difficult to handle. The aim of the studies reported here is to propose a rather simple technique to deal with master node failures. This technique is based on temporary delegation of the master role to one of the slave nodes and transfer of the updated state back to the master when one step of the computation is complete. That way the state is duplicated, and the computation can proceed to the next step regardless of a failure of the delegate or the master (but not both). We run benchmarks to show that a failure of the master is almost “invisible” to other nodes, and a failure of the delegate results in recomputation of only one step of the data processing pipeline. We believe that the technique can be used not only in big data processing but also in other types of applications.
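A minimal sketch of the delegation protocol, with in-process functions standing in for nodes (the names and step structure are illustrative, not the paper's implementation):

    def run_pipeline(steps, state, slaves):
        for i, step in enumerate(steps):
            delegate = slaves[i % len(slaves)]  # temporary master for this step
            snapshot = dict(state)              # state now lives in two places
            # if the master dies here, the delegate still holds the state;
            # if the delegate dies, the master re-runs just this one step
            state = delegate(step, snapshot)    # delegate computes, returns state
        return state

    def slave(step, state):                     # a healthy slave node
        state["done"] = state.get("done", 0) + 1
        return state

    print(run_pipeline(["s1", "s2", "s3"], {}, [slave, slave]))  # {'done': 3}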
international conference on high performance computing and simulation | 2017
Ivan Gankevich; Yuri Tipikin; Vladimir Korkhov
In this paper we describe a new framework for creating distributed programmes which are resilient to cluster node failures. Our main goal is to create a simple and reliable model that ensures continuous execution of parallel programmes without checkpoints, memory dumps, and other I/O-intensive activities. To achieve this we introduce a multi-layered system architecture, each layer of which consists of unified entities organised into hierarchies, and then show how the system handles different node failure scenarios. We benchmark our system on a real-world HPC application on both physical and virtual clusters. The results of the experiments show that our approach has low overhead and scales to a large number of cluster nodes.
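One way checkpoint-free recovery can work in such a layered system is that every entity sent from one node to another is also kept by the sender, so losing a node loses no work items, only time. A toy sketch of that invariant (illustrative only, not the framework's API):

    class Node:
        def __init__(self, name):
            self.name, self.inbox, self.outbox = name, [], []

        def send(self, kernel, target):
            self.outbox.append((kernel, target))  # sender keeps a copy
            target.inbox.append(kernel)

    def recover(sender, failed):
        """Re-route every kernel that was on the failed node."""
        return [k for k, tgt in sender.outbox if tgt is failed]

    a, b = Node("a"), Node("b")
    a.send("compute-part-1", b)
    a.send("compute-part-2", b)
    print(recover(a, b))   # both kernels survive node b's failure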
trans. computational science | 2016
Alexander V. Bogdanov; V. Yu. Gaiduchok; Ivan Gankevich; Yu. A. Tipikin; N. V. Yuzhanin
This paper discusses an unusual use of a service desk system: tracking computational tasks on supercomputers and collecting statistical information on the calculations performed. Particular attention is paid to how such statistics can be used both by supercomputer users and by data centre staff. An analysis of the requirements for tracking computational tasks, and of the capabilities of service desk and job scheduler systems, led to the design and implementation of a way to integrate these systems to improve computational task management.
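The natural integration point is a scheduler epilog that files each finished job into the service desk. The sketch below shows the shape of such a hook; the endpoint URL, JSON fields, and environment variable names are hypothetical placeholders, not the systems' real interfaces:

    import json, os, urllib.request

    record = {
        "job_id":   os.environ.get("JOB_ID", "unknown"),   # set by the scheduler
        "user":     os.environ.get("JOB_USER", "unknown"),
        "walltime": os.environ.get("JOB_WALLTIME", "0"),
    }

    req = urllib.request.Request(
        "https://servicedesk.example.org/api/tickets",     # placeholder API
        data=json.dumps(record).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # one ticket per completed computational task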
trans. computational science | 2016
Alexander B. Degtyarev; Ivan Gankevich
There are many causes of imbalanced load on a multiprocessor system, such as heterogeneity of processors, parallel execution of tasks of varying complexity, and difficulties in estimating the complexity of a particular task. However, if one treats the computer as an event-driven processing system and tasks as events running through it, the load balancing problem can be reduced to a well-posed mathematical problem, which further simplifies to solving a single equation. The load balancer measures both the complexity of the task being solved and the performance of the computer running this particular task, so that the load distribution can be adjusted accordingly. Such a load balancer is implemented as a computer program and has been shown to balance the load on heterogeneous processors in a number of scenarios.
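The paper's exact equation is not reproduced here, but the standard proportional form conveys the idea: measure how fast each processor finishes the tasks it was given, then set each processor's share of new tasks proportional to its measured rate.

    def shares(completed, elapsed):
        """completed[i] tasks done by processor i in elapsed[i] seconds."""
        rates = [c / t for c, t in zip(completed, elapsed)]  # measured speed
        total = sum(rates)
        return [r / total for r in rates]                    # share of new load

    # a fast, a medium and a slow processor:
    print(shares(completed=[90, 60, 30], elapsed=[10, 10, 10]))
    # [0.5, 0.333..., 0.166...] -> the distribution adapts to performance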
Archive | 2018
Ivan Gankevich; Alexander B. Degtyarev
Simulation of sea waves is a problem that arises in the development of software-based ship motion modelling applications. These applications generally use linear wave theory to generate small-amplitude waves programmatically and to determine the impact of external excitations on a ship hull. Linear wave theory is adequate for ocean waves, but it is not accurate for shallow-water and storm waves. To cope with these shortcomings we introduce the autoregressive moving-average (ARMA) model, which is widely known in oceanography but rarely used for sea wave modelling. The new model makes it possible to generate waves of arbitrary amplitude, is accurate for both shallow and deep water, and its software implementation achieves superior performance by relying on the fast Fourier transform family of algorithms. Integral characteristics of the wavy surface produced by the ARMA model are verified against those of a real sea surface. Despite all its advantages, the ARMA model requires a new method for determining wave pressures, one instance of which is included in the chapter.
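A one-dimensional sketch of an ARMA wave record, with scipy's lfilter playing the ARMA recursion driven by white noise. The coefficients below are illustrative and stable, but not fitted to a real sea spectrum as the chapter's model is:

    import numpy as np
    from scipy.signal import lfilter

    # zeta_t = 1.20*zeta_{t-1} - 0.50*zeta_{t-2} + eps_t + 0.40*eps_{t-1}
    phi   = [1.0, -1.20, 0.50]   # AR polynomial (lfilter's `a`)
    theta = [1.0, 0.40]          # MA polynomial (lfilter's `b`)

    rng = np.random.default_rng(0)
    eps = rng.standard_normal(10_000)   # white-noise input
    zeta = lfilter(theta, phi, eps)     # simulated surface elevation

    # integral characteristics like these are what gets verified
    print(zeta.var(), zeta.mean())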
international conference on high performance computing and simulation | 2017
Svetlana Sveshnikova; Ivan Gankevich
Research reproducibility is an emerging topic in computer science. One of its problems is the absence of tools to reproduce a specified operating system with specific versions of the software installed. In the proposal reported here we investigate how a tool based on lightweight virtualisation technologies can reproduce such environments. The experiments show that creating a reproducible environment adds significant overhead only on the first run of the application, and we propose a number of ways to improve the tool.
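The "pay only on the first run" behaviour amounts to caching the built environment. A sketch of that control flow, where the cache path and the build step are hypothetical stand-ins for the lightweight virtualisation backend:

    import os, subprocess, time

    CACHE = os.path.expanduser("~/.cache/repro-env/rootfs")  # placeholder layout

    def run_reproducible(command):
        if not os.path.isdir(CACHE):            # first run: build the image
            t = time.time()
            os.makedirs(CACHE)
            # stand-in for: fetch OS image, install pinned software versions
            print(f"built environment in {time.time() - t:.1f}s (one-time cost)")
        # stand-in for: enter the environment via namespaces and exec
        subprocess.run(command, shell=True, check=True)

    run_reproducible("echo running inside the reproducible environment")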