Publication


Featured research published by Roman Wyrzykowski.


Concurrency and Computation: Practice and Experience | 2015

Adaptation of fluid model EULAG to graphics processing unit architecture

Krzysztof Rojek; Milosz Ciznicki; Bogdan Rosa; Michal Kulczewski; Krzysztof Kurowski; Zbigniew P. Piotrowski; Lukasz Szustak; Damian Karol Wójcik; Roman Wyrzykowski

The goal of this study is to adapt the multiscale fluid solver EULerian or LAGrangian framework (EULAG) to future graphics processing unit (GPU) platforms. The EULAG model has a proven record of successful applications, as well as excellent efficiency and scalability on conventional supercomputer architectures. The model is currently being implemented as the new dynamical core of the COSMO weather prediction framework. In this study, two main modules of EULAG are analyzed and optimized: the multidimensional positive definite advection transport algorithm (MPDATA) and the variational elliptic pressure solver based on the generalized conjugate residual (GCR) method. We propose a method for a comprehensive analysis of resource consumption, including registers, shared memory, and global memory. This method allows us to identify the bottlenecks of the algorithm, including data transfers between host and global memory and between global and shared memory, as well as GPU occupancy. We emphasize providing a fixed memory access pattern, padding, and the organization of computation in the MPDATA algorithm. The new GPU implementation was tested and validated by modeling decaying turbulence of a homogeneous incompressible fluid in a triply periodic cube. Simulations performed with the standard version of EULAG and with its new GPU implementation give similar solutions. Preliminary results show a promising increase in computational efficiency.
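The abstract names MPDATA as one of the two EULAG modules that were analyzed and optimized. As rough orientation only, here is a minimal one-dimensional Python sketch of basic MPDATA: a donor-cell upwind pass followed by one antidiffusive corrective pass. It assumes a periodic domain, a uniform grid, a constant Courant number with |c| ≤ 1, and a non-negative field. The function names are hypothetical, and the sketch says nothing about the paper's 3D GPU kernels or their memory layout.

```python
"""Minimal 1D MPDATA sketch: donor-cell pass + one antidiffusive corrective pass.

Assumptions (not from the paper): periodic domain, uniform grid, constant
Courant number c = u*dt/dx with |c| <= 1, and a non-negative field psi.
"""

EPS = 1e-15  # guards against division by zero where psi vanishes


def donor_flux(psi_l, psi_r, c):
    """Upwind (donor-cell) flux through a cell face with Courant number c."""
    return max(c, 0.0) * psi_l + min(c, 0.0) * psi_r


def upwind_step(psi, c_faces):
    """One donor-cell update; c_faces[i] is the Courant number at face i+1/2."""
    n = len(psi)
    flux = [donor_flux(psi[i], psi[(i + 1) % n], c_faces[i]) for i in range(n)]
    # Signed flux difference: inflow via face i-1/2 (flux[i-1], wraps at i=0)
    # minus outflow via face i+1/2 (flux[i]).
    return [psi[i] - (flux[i] - flux[i - 1]) for i in range(n)]


def mpdata_step(psi, c):
    """Advance psi by one time step of basic (two-pass) MPDATA."""
    n = len(psi)
    # Pass 1: first-order upwind with the physical Courant number.
    psi1 = upwind_step(psi, [c] * n)
    # Antidiffusive pseudo-Courant number at each face i+1/2 from the pass-1 field.
    c_anti = []
    for i in range(n):
        a, b = psi1[i], psi1[(i + 1) % n]
        c_anti.append((abs(c) - c * c) * (b - a) / (b + a + EPS))
    # Pass 2: upwind again, now with the antidiffusive velocities.
    return upwind_step(psi1, c_anti)
```

Both passes are written in flux form on a periodic domain, so the total of `psi` is conserved up to rounding. With |c| ≤ 1 each donor-cell pass keeps a non-negative field non-negative, which is the "positive definite" property in the algorithm's name.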


Parallel Computing | 2014

Parallelization of 2D MPDATA EULAG algorithm on hybrid architectures with GPU accelerators

Roman Wyrzykowski; Lukasz Szustak; Krzysztof Rojek

EULAG (Eulerian/semi-Lagrangian fluid solver) is an established computational model developed for simulating thermo-fluid flows across a wide range of scales and physical scenarios. The dynamical core of EULAG includes the multidimensional positive definite advection transport algorithm (MPDATA) and an elliptic solver. In this work we investigate aspects of an optimal parallel version of the 2D MPDATA algorithm on modern hybrid architectures with GPU accelerators, where computations are distributed across both GPU and CPU components. Using the hybrid OpenMP–OpenCL model of parallel programming opens the way to harnessing the power of CPU–GPU platforms in a portable way. To better utilize the features of such computing platforms, comprehensive adaptations of MPDATA computations to hybrid architectures are proposed. These adaptations are based on efficient strategies for managing memory and computing resources, which allow us to ease memory and communication bounds and to better exploit the theoretical floating-point efficiency of CPU–GPU platforms. The main contributions of the paper are:
• a method for the decomposition of the 2D MPDATA algorithm, used as a tool to adapt MPDATA computations to hybrid architectures with GPU accelerators by minimizing communication and synchronization between CPU and GPU components at the cost of additional computations;
• a method for the adaptation of 2D MPDATA computations to multicore CPU platforms, based on spatial and temporal blocking techniques;
• a method for the adaptation of the 2D MPDATA algorithm to GPU architectures, based on a hierarchical decomposition strategy across data and computation domains, supported by the developed GPU task scheduler, which allows for flexible management of available resources;
• an approach to the parametric optimization of 2D MPDATA computations on GPUs using the autotuning technique, which allows us to provide a portable implementation methodology across a variety of GPUs.
The hybrid platforms tested in this study contain different numbers of CPUs and GPUs, from a single CPU and a single GPU up to the most elaborate configuration with two CPUs and two GPUs. Processors from different vendors are used in these systems: both Intel and AMD CPUs, as well as GPUs from NVIDIA and AMD. For all grid sizes and all tested platforms, the hybrid version, with computations spread across CPU and GPU components, achieves the highest performance. In particular, for the largest MPDATA grids used in our experiments, the speedups of the hybrid versions over the GPU and CPU versions range from 1.30 to 1.69 and from 1.95 to 2.25, respectively.
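The first listed contribution trades extra computation for less CPU–GPU communication. One simple way to picture that trade is sketched below in Python, under assumptions that are not from the paper: the rows of a 2D grid are split in proportion to hypothetical relative throughputs `cpu_speed` and `gpu_speed`, and each device also computes `halo` redundant rows across the internal boundary so that it can advance several dependent stencil stages without exchanging boundary data. The function `split_rows` and its parameters are illustrative only.

```python
def split_rows(ny, cpu_speed, gpu_speed, halo):
    """Split grid rows [0, ny) between a GPU part and a CPU part.

    Hypothetical scheme: rows are divided in proportion to relative speed.
    For each device, returns (owned, computed) as half-open (start, stop)
    row ranges. 'owned' rows are the results the device keeps; 'computed'
    also includes up to `halo` redundant rows across the internal boundary,
    so neither device needs the other's boundary rows during a step.
    """
    if ny < 2:
        raise ValueError("need at least two rows")
    share = gpu_speed / (cpu_speed + gpu_speed)
    # The GPU takes the first block of rows; each device keeps at least one row.
    cut = min(max(round(ny * share), 1), ny - 1)
    gpu_owned, cpu_owned = (0, cut), (cut, ny)
    # Redundant (duplicated) work: extend each computed range past the cut.
    gpu_computed = (0, min(cut + halo, ny))
    cpu_computed = (max(cut - halo, 0), ny)
    return {"gpu": (gpu_owned, gpu_computed), "cpu": (cpu_owned, cpu_computed)}
```

For example, `split_rows(1000, 1.0, 3.0, 4)` gives the GPU rows 0 to 749 (computing up to row 753) and the CPU rows 750 to 999 (computing from row 746), so 8 rows per step are computed twice in exchange for not synchronizing at the internal boundary.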


Parallel Processing and Applied Mathematics | 2007

Parallel implementation of Cholesky LLᵀ-algorithm in FPGA-based processor

Oleg Maslennikow; Volodymyr Lepekha; Anatoli Sergiyenko; Adam Tomas; Roman Wyrzykowski

A fixed-size processor array architecture intended for the realization of matrix LLᵀ decomposition based on the Cholesky algorithm is proposed. To implement this architecture in modern FPGA devices, an arithmetic unit (AU) operating in rational fraction arithmetic is designed. The AU is intended for configuration in Xilinx Virtex-4 FPGAs, and its hardware complexity is much lower than that of similar AUs operating on floating-point numbers.
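For reference, the computation realized by the processor array is the Cholesky factorization A = LLᵀ of a symmetric positive definite matrix. The Python sketch below is a plain sequential version in row-oriented (Cholesky-Banachiewicz) order using floating-point arithmetic; it does not model the fixed-size array, its scheduling, or the AU's rational fraction arithmetic.

```python
import math


def cholesky(a):
    """Return lower-triangular L with A = L L^T for symmetric positive definite A.

    Row-oriented (Cholesky-Banachiewicz) order; a is a list of lists.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Dot product of the already computed parts of rows i and j.
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][i] = math.sqrt(d)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l
```

For instance, the matrix `[[4, 12, -16], [12, 37, -43], [-16, -43, 98]]` factors into `L = [[2, 0, 0], [6, 1, 0], [-8, 5, 3]]`.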


International Conference on Parallel Processing | 2001

FEM Computations on Clusters Using Different Models of Parallel Programming

Tomasz Olas; Konrad Karczewski; Adam Tomas; Roman Wyrzykowski

ParallelNuscaS is an object-oriented package for parallel finite element modeling, developed at the Technical University of Czestochowa. This paper investigates the performance of the package on the ACCORD cluster, which was built this year at the Institute of Mathematics and Computer Science of this university. At present, ACCORD contains 18 Pentium III 750 MHz processors (9 SMP nodes), connected by both the fast Myrinet network and standard Fast Ethernet, as well as 8 SMP nodes with 16 AMD Athlon MP 1.2 GHz processors. We discuss the implementation and performance of parallel FEM computations not only for the message-passing model of parallel programming, but also for the hybrid model, which combines multithreading inside SMP nodes with message passing between them.


IEEE Transactions on Parallel and Distributed Systems | 2017

Model-Based Optimization of EULAG Kernel on Intel Xeon Phi Through Load Imbalancing

Alexey L. Lastovetsky; Lukasz Szustak; Roman Wyrzykowski

Load balancing is a widely accepted technique for optimizing the performance of scientific applications on parallel architectures. Indeed, balanced applications do not waste processor cycles waiting at points of synchronization and data exchange, thereby maximizing processor utilization. In this paper, we challenge the universality of the load-balancing approach to optimizing the performance of parallel applications. First, we formulate conditions that the performance profile of an application should satisfy for the application to achieve its best performance via load balancing. We then use a real-life scientific application, the EULAG MPDATA kernel, to demonstrate that its performance profile on a modern parallel architecture, Intel Xeon Phi, deviates significantly from these conditions. Based on this observation, we propose a method of optimizing the performance of scientific applications through load imbalancing. For data-parallel applications, the method uses functional performance models of the application to find a partitioning that minimizes its computation time but does not necessarily balance the load of the processors. We apply this method to the optimization of MPDATA on Intel Xeon Phi. Experimental results demonstrate that the performance of this carefully optimized load-balanced application can be improved by a further 15 percent using the proposed load-imbalancing technique.
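The core idea can be illustrated with a small Python sketch, assuming two processors whose functional performance models are given as hypothetical functions `time_a(d)` and `time_b(d)` (predicted time to process `d` work units). The sketch searches all splits exhaustively and returns the one with the smallest parallel time, that is, the maximum of the two predicted times, without requiring the times to be equal. The exhaustive search and the model functions in the usage note are illustrative assumptions, not the partitioning algorithm or the measurements from the paper.

```python
def best_partition(n, time_a, time_b):
    """Split n work units between two processors to minimize parallel time.

    time_a(d) and time_b(d) are functional performance models: the predicted
    time for each processor to handle d units. The parallel time of a split
    is the larger of the two times. The split with the smallest parallel time
    is returned, whether or not the two times are equal (balanced).
    """
    best = None
    for d in range(n + 1):  # d units to processor a, n - d to processor b
        t = max(time_a(d), time_b(n - d))
        if best is None or t < best[1]:
            best = (d, t)
    return best  # (units for processor a, predicted parallel time)
```

With `time_a(d) = d` and a hypothetical `time_b` that costs 0.5 per unit up to 6 units and 1.5 per unit beyond (a crude stand-in for a performance drop at larger sizes), no split of 10 units equalizes the two times. An even 5/5 split gives a parallel time of 5.0, whereas the search returns 4 units for processor a with a parallel time of 4.0 (4.0 on processor a versus 3.0 on processor b).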


International Conference on Large-Scale Scientific Computing | 2011

Using Blue Gene/P and GPUs to accelerate computations in the EULAG model

Roman Wyrzykowski; Krzysztof Rojek; Łukasz Szustak

EULAG (Eulerian/semi-Lagrangian fluid solver) is an established computational model developed by the group headed by Piotr K. Smolarkiewicz for simulating thermo-fluid flows across a wide range of scales and physical scenarios. This paper presents perspectives on the parallelization of EULAG based on the MPI, OpenMP, and OpenCL standards. We focus on the development of the computational kernels of the EULAG model, which comprise its most time-consuming calculations: the Laplacian algorithm (laplc) and the multidimensional positive definite advection transport algorithm (MPDATA). The first challenge of our work was to parallelize the laplc subroutine using MPI across nodes and OpenMP within nodes on the Blue Gene/P supercomputer located at the Bulgarian Supercomputing Center. The second challenge was to accelerate computations of the EULAG model using modern GPUs. We discuss scalability issues for the OpenCL implementation of the linear part of MPDATA on an ATI Radeon HD 5870 GPU with an AMD Phenom II X4 CPU, and on an NVIDIA Tesla C1060 GPU with an AMD Phenom II X4 CPU.
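To make the kind of stencil kernel involved more concrete, the following is a plain second-order 5-point Laplacian on a uniform, periodic 2D grid in Python. This is a deliberate simplification: EULAG's laplc routine is not reproduced here, and the grid, the periodic boundaries, and the name `laplacian_5pt` are illustrative assumptions.

```python
def laplacian_5pt(f, h):
    """Second-order 5-point Laplacian of a 2D field on a uniform periodic grid.

    f is a list of rows (f[j][i]); h is the grid spacing in both directions.
    """
    ny, nx = len(f), len(f[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            # Negative indices wrap around, giving periodic west/south neighbors.
            out[j][i] = (
                f[j][(i + 1) % nx] + f[j][i - 1]
                + f[(j + 1) % ny][i] + f[j - 1][i]
                - 4.0 * f[j][i]
            ) / (h * h)
    return out
```

Because each point reads only its four nearest neighbors, a row-block decomposition across MPI processes needs only one halo row from each neighboring process per application of the operator.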


International Conference on Large-Scale Scientific Computing | 2013

Towards Efficient Decomposition and Parallelization of MPDATA on Hybrid CPU-GPU Cluster

Roman Wyrzykowski; Lukasz Szustak; Krzysztof Rojek; Adam Tomas

EULAG (Eulerian/semi-Lagrangian fluid solver) is an established computational model for simulating thermo-fluid flows across a wide range of scales and physical scenarios. The multidimensional positive definite advection transport algorithm (MPDATA) is among the most time-consuming components of EULAG.


International Conference on Parallel Processing | 2013

Performance Analysis for Stencil-Based 3D MPDATA Algorithm on GPU Architecture

Krzysztof Rojek; Lukasz Szustak; Roman Wyrzykowski

EULAG (Eulerian/semi-Lagrangian fluid solver) is an established computational model for simulating thermo-fluid flows across a wide range of scales and physical scenarios. The multidimensional positive definite advection transport algorithm (MPDATA) is among the most time-consuming components of EULAG.


Parallel Processing and Applied Mathematics | 2009

Parallel implementation of conjugate gradient method on graphics processors

Marcin Wozniak; Tomasz Olas; Roman Wyrzykowski

GPUs have become extremely promising multi/manycore architectures for a wide range of demanding applications. Basic features of these architectures include the use of a large number of relatively simple processing units operating in SIMD fashion, as well as hardware-supported advanced multithreading. However, the use of GPUs in everyday practice is still limited, mainly because implemented algorithms must be deeply adapted to the target architecture. In this work, we propose how to perform such an adaptation to achieve an efficient parallel implementation of the conjugate gradient (CG) algorithm, which is widely used for solving large sparse linear systems of equations arising, e.g., in FEM problems. To implement the main operation of the CG algorithm, sparse matrix-vector multiplication (SpMV), efficiently, we propose and study different techniques for optimizing access to the hierarchical memory of GPUs. The proposed CUDA-based implementation of the CG algorithm is investigated experimentally on two GPU architectures: GeForce 8800 and Tesla C1060. We show that optimizing access to GPU memory considerably reduces the execution time of the SpMV operation and, consequently, yields a significant speedup over CPUs for the whole CG algorithm.
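The structure described here, a CG loop whose dominant cost is one SpMV per iteration, can be sketched in Python as follows. The sketch assumes the compressed sparse row (CSR) storage format, a common choice that the abstract does not specify, and runs on the CPU; the GPU memory-access optimizations that are the paper's subject are not shown. All function names are hypothetical.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """y = A x for A stored in compressed sparse row (CSR) format."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A in CSR format, from x0 = 0."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, with x = 0
    p = list(r)  # initial search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 <= tol:
            break
        ap = spmv_csr(values, col_idx, row_ptr, p)  # the one SpMV per iteration
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x
```

Everything in the loop except `spmv_csr` is a vector operation with regular, contiguous memory access; the SpMV's indirect reads through `col_idx` are the irregular accesses that GPU memory optimizations target.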


Parallel Computing | 1999

Parallel Finite Element Modeling of Solidification Processes

Roman Wyrzykowski; N. Sczygiol; Tomasz Olas; Juri Kanevski

In this paper, the parallelization of finite element modeling of solidification is considered. The core of this modeling is the solution of large sparse linear systems. The Aztec library is used to implement the model problem on massively parallel computers, and the complete parallel code is now available. Performance results of numerical experiments carried out on the IBM SP2 parallel computer are presented.

Collaboration


Top Co-Authors

Tomasz Olas

Częstochowa University of Technology

Konrad Karczewski

Częstochowa University of Technology

Krzysztof Rojek

Częstochowa University of Technology

Lukasz Szustak

Częstochowa University of Technology

Jerzy Wasniewski

Technical University of Denmark

Lukasz Kuczynski

Częstochowa University of Technology

Adam Tomas

Częstochowa University of Technology

Marcin Paprzycki

Polish Academy of Sciences

Jack Dongarra

Oak Ridge National Laboratory
