Publication


Featured research published by Krzysztof Rojek.


Scientific Programming | 2015

Adaptation of MPDATA heterogeneous stencil computation to Intel Xeon Phi coprocessor

Lukasz Szustak; Krzysztof Rojek; Tomasz Olas; Lukasz Kuczynski; Kamil Halbiniak; Pawel Gepner

The multidimensional positive definite advection transport algorithm (MPDATA) belongs to the group of nonoscillatory forward-in-time algorithms and performs a sequence of stencil computations. MPDATA is one of the major parts of the dynamic core of the EULAG geophysical model. In this work, we outline an approach to adapting the 3D MPDATA algorithm to the Intel MIC architecture. To utilize the available computing resources, we propose a (3 + 1)D decomposition of the MPDATA heterogeneous stencil computations. This approach is based on a combination of loop tiling and loop fusion techniques. It allows us to ease memory and communication bounds and to better exploit the theoretical floating-point efficiency of the target computing platforms. An important method for improving the efficiency of the (3 + 1)D decomposition is partitioning the available cores/threads into work teams, which reduces inter-cache communication overheads. This method also increases opportunities for efficiently distributing MPDATA computations across the available resources of both the Intel MIC architecture and Intel CPUs. We discuss preliminary performance results obtained on two hybrid platforms, each containing two CPUs and an Intel Xeon Phi coprocessor. The top-of-the-line Intel Xeon Phi 7120P gives the best performance, executing MPDATA almost twice as fast as two Intel Xeon E5-2697v2 CPUs.
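The idea of combining loop tiling with loop fusion can be illustrated with a minimal one-dimensional sketch. It is not the authors' (3 + 1)D decomposition: the `smooth` stencil, the two-stage pipeline, and the `tile` parameter are hypothetical stand-ins, chosen only to show how running both stages inside one tile keeps the intermediate result small, at the cost of recomputing a few halo points at tile borders.

```python
import numpy as np

def smooth(x):
    """Hypothetical 3-point stencil on the interior: y[i] = (x[i] + x[i+1] + x[i+2]) / 3.
    The output is 2 elements shorter than the input."""
    return (x[:-2] + x[1:-1] + x[2:]) / 3.0

def two_stage_unfused(x):
    # Stage 1 over the whole domain, then stage 2: the full intermediate
    # array is written to memory and read back.
    return smooth(smooth(x))

def two_stage_tiled_fused(x, tile=64):
    # Both stages are applied to one tile before moving to the next, so the
    # intermediate only holds tile + 2 points and can stay in cache.
    n_out = len(x) - 4                 # each stage shrinks the domain by 2
    out = np.empty(n_out)
    for start in range(0, n_out, tile):
        stop = min(start + tile, n_out)
        # out[j] depends on x[j .. j+4], so load the tile plus a 4-point halo;
        # intermediate points near tile edges are recomputed by both neighbors.
        chunk = x[start:stop + 4]
        out[start:stop] = smooth(smooth(chunk))
    return out
```

Both versions produce identical results; the fused variant trades a small amount of redundant arithmetic at tile edges for far less memory traffic on the intermediate array, which is the same trade-off the abstract describes for easing memory bounds.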


Concurrency and Computation: Practice and Experience | 2015

Adaptation of fluid model EULAG to graphics processing unit architecture

Krzysztof Rojek; Milosz Ciznicki; Bogdan Rosa; Michal Kulczewski; Krzysztof Kurowski; Zbigniew P. Piotrowski; Lukasz Szustak; Damian Karol Wójcik; Roman Wyrzykowski

The goal of this study is to adapt the multiscale fluid solver EULerian or LAGrangian framework (EULAG) to future graphics processing unit (GPU) platforms. The EULAG model has a proven record of successful applications, as well as excellent efficiency and scalability on conventional supercomputer architectures. Currently, the model is being implemented as the new dynamical core of the COSMO weather prediction framework. Within this study, two main modules of EULAG are analyzed and optimized: the multidimensional positive definite advection transport algorithm (MPDATA) and the variational elliptic pressure solver Generalized Conjugate Residual (GCR). We propose a method for a comprehensive analysis of resource consumption, including registers, shared memory, and global memory. This method allows us to identify bottlenecks of the algorithm, including data transfers between host and global memory and between global and shared memory, as well as GPU occupancy. We put emphasis on providing a fixed memory access pattern, on padding, and on organizing computations in the MPDATA algorithm. The new GPU implementation has been tested and validated by modeling decaying turbulence of a homogeneous incompressible fluid in a triply periodic cube. Simulations performed with the standard version of EULAG and with its new GPU implementation give similar solutions. Preliminary results show a promising increase in computational efficiency.
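For readers unfamiliar with the Generalized Conjugate Residual method named above, a minimal dense-matrix sketch follows. It is the textbook, unpreconditioned, non-restarted GCR iteration, not EULAG's preconditioned variational solver; the function name `gcr` and its parameters are hypothetical.

```python
import numpy as np

def gcr(A, b, x0=None, tol=1e-10, max_iter=200):
    """Generalized Conjugate Residual method (full, no restarts) for A x = b.
    Converges for matrices whose symmetric part is positive definite."""
    x = np.zeros_like(b, dtype=float) if x0 is None else np.asarray(x0, dtype=float).copy()
    r = b - A @ x
    ps, aps = [], []                   # search directions p_i and their images A p_i
    p, ap = r.copy(), A @ r
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        alpha = (r @ ap) / (ap @ ap)   # minimizes ||r - alpha * A p|| over alpha
        x += alpha * p
        r -= alpha * ap
        ps.append(p)
        aps.append(ap)
        ar = A @ r
        # New direction: start from r and make A p orthogonal to every previous A p_i.
        p, ap = r.copy(), ar.copy()
        for p_i, ap_i in zip(ps, aps):
            beta = -(ar @ ap_i) / (ap_i @ ap_i)
            p += beta * p_i
            ap += beta * ap_i
    return x
```

Each step minimizes the residual norm over all directions built so far, which is why GCR handles the nonsymmetric operators that arise in atmospheric models; production solvers add preconditioning and truncate or restart the direction history to bound memory.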


International Conference on Parallel Processing | 2013

Using Intel Xeon Phi Coprocessor to Accelerate Computations in MPDATA Algorithm

Lukasz Szustak; Krzysztof Rojek; Pawel Gepner

The multidimensional positive definite advection transport algorithm (MPDATA) belongs to the group of nonoscillatory forward-in-time algorithms and performs a sequence of stencil computations. MPDATA is one of the major parts of the dynamic core of the EULAG geophysical model.
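To make "a sequence of stencil computations" concrete, here is a one-dimensional sketch of the two basic MPDATA passes on a periodic grid: a donor-cell (upwind) step, then a second donor-cell step driven by the antidiffusive pseudo-velocity of Smolarkiewicz's original 1D formulation. It omits the 3D cross terms, nonperiodic boundaries, and the nonoscillatory option used in EULAG; the function names and the `eps` guard are illustrative assumptions.

```python
import numpy as np

def donor_cell(psi, c):
    """One upwind step on a periodic grid. c[i] is the Courant number at the
    face between cell i and cell i+1 (i.e. at i+1/2)."""
    psi_r = np.roll(psi, -1)                                      # psi[i+1]
    flux = np.maximum(c, 0.0) * psi + np.minimum(c, 0.0) * psi_r  # flux through face i+1/2
    return psi - (flux - np.roll(flux, 1))                        # F[i+1/2] - F[i-1/2]

def mpdata_1d(psi, c, eps=1e-15):
    """Basic two-pass MPDATA step for a nonnegative field psi."""
    psi1 = donor_cell(psi, c)                                     # first-order pass
    psi1_r = np.roll(psi1, -1)
    # Antidiffusive Courant number that compensates the leading-order
    # numerical diffusion of the first pass.
    c_anti = (np.abs(c) - c**2) * (psi1_r - psi1) / (psi1_r + psi1 + eps)
    return donor_cell(psi1, c_anti)                               # corrective pass
```

With |C| ≤ 1, both passes conserve mass and keep a nonnegative field nonnegative (the "positive definite" property); the basic scheme is not monotone, however, so small overshoots near sharp fronts remain unless the nonoscillatory option is enabled.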


Parallel Computing | 2014

Parallelization of 2D MPDATA EULAG algorithm on hybrid architectures with GPU accelerators

Roman Wyrzykowski; Lukasz Szustak; Krzysztof Rojek

EULAG (Eulerian/semi-Lagrangian fluid solver) is an established computational model developed for simulating thermo-fluid flows across a wide range of scales and physical scenarios. The dynamic core of EULAG includes the multidimensional positive definite advection transport algorithm (MPDATA) and an elliptic solver. In this work, we investigate aspects of an optimal parallel version of the 2D MPDATA algorithm on modern hybrid architectures with GPU accelerators, where computations are distributed across both GPU and CPU components. The hybrid OpenMP–OpenCL model of parallel programming makes it possible to harness the power of CPU–GPU platforms in a portable way. To better utilize the features of such computing platforms, comprehensive adaptations of MPDATA computations to hybrid architectures are proposed. These adaptations are based on efficient strategies for memory and computing resource management, which allow us to ease memory and communication bounds and better exploit the theoretical floating-point efficiency of CPU–GPU platforms. The main contributions of the paper are:
• a method for decomposing the 2D MPDATA algorithm as a tool to adapt MPDATA computations to hybrid architectures with GPU accelerators, by minimizing communication and synchronization between CPU and GPU components at the cost of additional computations;
• a method for adapting 2D MPDATA computations to multicore CPU platforms, based on space and temporal blocking techniques;
• a method for adapting the 2D MPDATA algorithm to GPU architectures, based on a hierarchical decomposition strategy across data and computation domains, supported by the developed GPU task scheduler, which allows for flexible management of the available resources;
• an approach to the parametric optimization of 2D MPDATA computations on GPUs using the autotuning technique, which provides a portable implementation methodology across a variety of GPUs.
The hybrid platforms tested in this study contain different numbers of CPUs and GPUs, from a single CPU with a single GPU up to the most elaborate configuration with two CPUs and two GPUs. Processors from different vendors are used in these systems: both Intel and AMD CPUs, as well as GPUs from NVIDIA and AMD. For all grid sizes and all tested platforms, the hybrid version with computations spread across CPU and GPU components achieves the highest performance. In particular, for the largest MPDATA grids used in our experiments, the speedups of the hybrid versions over the GPU and CPU versions range from 1.30 to 1.69 and from 1.95 to 2.25, respectively.
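As a rough illustration of spreading one grid across CPU and GPU components, a minimal static partition gives each device a contiguous block of rows proportional to its measured throughput. This is a hypothetical scheme (the name `split_rows` and the throughput values are assumptions), not the paper's decomposition, which additionally trades extra computation for fewer CPU–GPU synchronizations.

```python
def split_rows(n_rows, throughputs):
    """Split n_rows into contiguous [start, stop) ranges, one per device,
    with sizes proportional to each device's throughput (e.g. cells/s)."""
    total = sum(throughputs)
    bounds = [0]
    acc = 0.0
    for t in throughputs:
        acc += t                                   # cumulative throughput share
        bounds.append(round(n_rows * acc / total))  # last bound is exactly n_rows
    return [(bounds[i], bounds[i + 1]) for i in range(len(throughputs))]
```

For example, `split_rows(1000, [3.0, 1.0])` gives a GPU measured at three times the CPU throughput rows 0 to 750 and the CPU rows 750 to 1000. A static split like this only pays off when the boundary exchange between the two parts is cheap relative to the work inside each part.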


Parallel Processing and Applied Mathematics | 2011

Parallelization of EULAG model on multicore architectures with GPU accelerators

Krzysztof Rojek; Lukasz Szustak

EULAG (Eulerian/semi-Lagrangian fluid solver) is an established computational model developed by the group headed by Piotr K. Smolarkiewicz for simulating thermo-fluid flows across a wide range of scales and physical scenarios. In this paper, we focus on the most time-consuming calculations of the EULAG model, namely the multidimensional positive definite advection transport algorithm (MPDATA). Our work consists of two parts. The first part concerns GPU parallelization using the ATI Radeon HD 5870 GPU, the NVIDIA Tesla C1060 GPU, and the Fermi-based NVIDIA Tesla M2070-Q, while the second concerns multicore CPU parallelization using the AMD Phenom II X6 CPU and the Intel Xeon E3-1200 CPU with the Sandy Bridge architecture. In our work, we use standards for multicore and GPGPU programming, namely OpenCL and OpenMP. The GPU parallelization is based on decomposing the algorithm into several smaller tasks called kernels. They are executed in a FIFO order corresponding to the dependency tree that expresses data dependencies between kernels. To optimize the performance of the resulting implementation, we apply extensive vectorization of each kernel, as well as overlapping of data transfers with computations. For CPU parallelization, we focus on multicore processing, vectorization, and cache reuse. To achieve high computational efficiency, SIMD processing is applied using the standard SSE and the new AVX extensions. In this paper, we provide a performance analysis based on the Roofline model, which shows the inherent hardware limitations for MPDATA, as well as the potential benefit and priority of optimizations. To alleviate the memory bottleneck and improve cache reuse, we propose using the loop tiling technique.
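The Roofline model bounds attainable performance by the smaller of the peak compute rate and the product of arithmetic intensity (flops per byte moved) and memory bandwidth. A minimal sketch with hypothetical function names; the hardware figures in the example are placeholders, not the specifications of the GPUs or CPUs listed above.

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, arithmetic_intensity):
    """Attainable performance under the Roofline model:
    min(peak compute, arithmetic intensity [flop/byte] * memory bandwidth [GB/s])."""
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)

def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (flop/byte) at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs
```

For example, a kernel with an intensity of 2 flop/byte on a device with a 500 Gflop/s peak and 50 GB/s bandwidth is capped at 100 Gflop/s, because its intensity lies below the ridge point of 10 flop/byte. Raising the effective intensity, for instance through loop tiling that improves cache reuse, moves such a kernel up the bandwidth slope.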


International Conference on Large-Scale Scientific Computing | 2011

Using Blue Gene/P and GPUs to accelerate computations in the EULAG model

Roman Wyrzykowski; Krzysztof Rojek; Łukasz Szustak

EULAG (Eulerian/semi-Lagrangian fluid solver) is an established computational model developed by the group headed by Piotr K. Smolarkiewicz for simulating thermo-fluid flows across a wide range of scales and physical scenarios. This paper presents perspectives on EULAG parallelization based on the MPI, OpenMP, and OpenCL standards. We focus on the computational kernels of the EULAG model, which comprise its most time-consuming calculations: the Laplacian algorithm (laplc) and the multidimensional positive definite advection transport algorithm (MPDATA). The first challenge of our work was parallelizing the laplc subroutine using MPI across nodes and OpenMP within nodes on the Blue Gene/P supercomputer located at the Bulgarian Supercomputing Center. The second challenge was accelerating computations of the EULAG model using modern GPUs. We discuss scalability issues for the OpenCL implementation of the linear part of MPDATA on an ATI Radeon HD 5870 GPU and an NVIDIA Tesla C1060 GPU, each paired with an AMD Phenom II X4 CPU.
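A minimal sketch of the kind of stencil a Laplacian kernel evaluates: the standard second-order 5-point operator on a uniform 2D grid. EULAG's `laplc` subroutine works on 3D grids and is not reproduced here; `laplacian_5pt` and its arguments are hypothetical.

```python
import numpy as np

def laplacian_5pt(u, h):
    """Second-order 5-point Laplacian on the interior points of a 2D grid with
    uniform spacing h. Returns an array 2 points smaller in each dimension.
    In an MPI slab decomposition, each rank would call this on its local block
    after refreshing a one-point halo from its neighbors."""
    return (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
            - 4.0 * u[1:-1, 1:-1]) / h**2
```

The operator is exact for quadratics, so applying it to x² + y² on any uniform grid returns 4 at every interior point, which makes a convenient correctness check for a parallelized version.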


International Conference on Large-Scale Scientific Computing | 2013

Towards Efficient Decomposition and Parallelization of MPDATA on Hybrid CPU-GPU Cluster

Roman Wyrzykowski; Lukasz Szustak; Krzysztof Rojek; Adam Tomas

EULAG (Eulerian/semi-Lagrangian fluid solver) is an established computational model for simulating thermo-fluid flows across a wide range of scales and physical scenarios. The multidimensional positive definite advection transport algorithm (MPDATA) is among the most time-consuming components of EULAG.


International Conference on Parallel Processing | 2013

Performance Analysis for Stencil-Based 3D MPDATA Algorithm on GPU Architecture

Krzysztof Rojek; Lukasz Szustak; Roman Wyrzykowski

EULAG (Eulerian/semi-Lagrangian fluid solver) is an established computational model for simulating thermo-fluid flows across a wide range of scales and physical scenarios. The multidimensional positive definite advection transport algorithm (MPDATA) is among the most time-consuming components of EULAG.


Concurrency and Computation: Practice and Experience | 2017

Systematic adaptation of stencil‐based 3D MPDATA to GPU architectures

Krzysztof Rojek; Roman Wyrzykowski; Lukasz Kuczynski

In this work, we focus on a systematic adaptation of the stencil‐based multidimensional positive definite advection transport algorithm (MPDATA) to different graphics processing unit (GPU)‐based computing platforms. Another objective of this work is to compare the performance of MPDATA on several platforms, including a multi‐GPU system with two NVIDIA Tesla K80 cards, and single‐card platforms with Tesla K20X, GeForce GTX TITAN, and GeForce GTX 980. The following optimization methods are proposed to improve the overall performance: (i) reducing the number of operations through subexpression elimination when implementing 2.5D blocking; (ii) reorganizing boundary conditions to reduce branch instructions; (iii) advanced memory management to increase coalesced memory access; and (iv) rearranging warps to optimize data access to GPU global memory. The presented methods of adapting MPDATA to GPU architectures allow us to efficiently use many graphics processors within a single node by applying peer‐to‐peer data transfers between GPU global memories. We propose an auto‐tuning procedure to compensate for architectural differences between the considered platforms; this procedure takes into account algorithm/GPU‐specific parameters. The proposed approach to adapting MPDATA to GPU architectures achieves up to 482.5 Gflop/s on the platform equipped with two NVIDIA K80 GPUs.
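2.5D blocking, mentioned in item (i), tiles the two horizontal dimensions and marches through the third, so only a few planes of the input are live at any time. The sketch below assumes a generic 7-point stencil rather than the MPDATA kernels, and contrasts a whole-array reference with a plane-marching loop; both function names are hypothetical. In NumPy the planes are array views, so the code demonstrates only the traversal order and the sliding window; the reuse benefit appears when those planes are held in GPU registers or shared memory.

```python
import numpy as np

def stencil7_full(u):
    """Reference: 7-point stencil on the interior points of a 3D array (i, j, k)."""
    c = u[1:-1, 1:-1, 1:-1]
    return (u[2:, 1:-1, 1:-1] + u[:-2, 1:-1, 1:-1]
            + u[1:-1, 2:, 1:-1] + u[1:-1, :-2, 1:-1]
            + u[1:-1, 1:-1, 2:] + u[1:-1, 1:-1, :-2] - 6.0 * c)

def stencil7_25d(u):
    """Same stencil with 2.5D blocking: march along k, keeping only three
    k-planes (below, current, above) live at any time."""
    nk = u.shape[2]
    out = np.empty((u.shape[0] - 2, u.shape[1] - 2, nk - 2))
    below, cur = u[:, :, 0], u[:, :, 1]
    for k in range(1, nk - 1):
        above = u[:, :, k + 1]         # the only new plane loaded per step
        out[:, :, k - 1] = (cur[2:, 1:-1] + cur[:-2, 1:-1]
                            + cur[1:-1, 2:] + cur[1:-1, :-2]
                            + above[1:-1, 1:-1] + below[1:-1, 1:-1]
                            - 6.0 * cur[1:-1, 1:-1])
        below, cur = cur, above        # slide the window up by one plane
    return out
```

On a GPU, each thread block would own one (i, j) tile and run the k loop, so every input plane is fetched from global memory once and then reused for three output planes.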


Parallel Computing Technologies | 2015

Parallelization of 3D MPDATA Algorithm Using Many Graphics Processors

Krzysztof Rojek; Roman Wyrzykowski

EULAG (Eulerian/semi-Lagrangian fluid solver) is an established numerical model for simulating thermo-fluid flows across a wide range of scales and physical scenarios. The multidimensional positive definite advection transport algorithm (MPDATA) is among the most time-consuming components of EULAG. In this study, we focus on adapting the 3D MPDATA computations to clusters with graphics processors. Our approach is based on a hierarchical decomposition that includes the cluster level, as well as an optimized distribution of computations between GPU resources within each node. To implement the resulting computing scheme, the MPI standard is used across nodes, while CUDA is applied inside nodes. We present performance results for the 3D MPDATA code running on the NVIDIA GeForce GTX TITAN graphics card, as well as on the Piz Daint cluster equipped with NVIDIA Tesla K20X GPUs. In particular, a sustained performance of 138 Gflop/s is achieved for a single GPU, which scales up to more than 11 Tflop/s for 256 GPUs.
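A two-level decomposition like the one described (cluster nodes first, then GPUs within each node) can be sketched as a nested block split of one grid dimension. The 1D slab layout and the names `block_ranges` and `hierarchical_decomposition` are assumptions for illustration; a real implementation would also exchange halos, via MPI between nodes and via device-to-device copies between GPUs in a node.

```python
def block_ranges(n, parts):
    """Split range(n) into `parts` contiguous [start, stop) blocks whose sizes differ by at most 1."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

def hierarchical_decomposition(nx, n_nodes, gpus_per_node):
    """Two-level split of the x-extent: first across cluster nodes (one MPI rank
    each), then each node's slab across its GPUs. Returns {(node, gpu): (start, stop)}."""
    layout = {}
    for node, (n0, n1) in enumerate(block_ranges(nx, n_nodes)):
        for gpu, (g0, g1) in enumerate(block_ranges(n1 - n0, gpus_per_node)):
            layout[(node, gpu)] = (n0 + g0, n0 + g1)   # shift to global indices
    return layout
```

For instance, `hierarchical_decomposition(100, 3, 2)` assigns node 0 the global slab [0, 34) and splits it into [0, 17) and [17, 34) for its two GPUs. Keeping the inner split local to each node means only the outermost slab boundaries require inter-node communication.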

Collaboration


Top Co-Authors

Roman Wyrzykowski
Częstochowa University of Technology

Lukasz Szustak
Częstochowa University of Technology

Lukasz Kuczynski
Częstochowa University of Technology

Łukasz Szustak
Częstochowa University of Technology

Bogdan Rosa
University of Delaware

Adam Tomas
Częstochowa University of Technology

Kamil Halbiniak
Częstochowa University of Technology

Milosz Ciznicki
Poznań University of Technology