Lukasz Kuczynski
Częstochowa University of Technology
Publication
Featured research published by Lukasz Kuczynski.
Scientific Programming | 2015
Lukasz Szustak; Krzysztof Rojek; Tomasz Olas; Lukasz Kuczynski; Kamil Halbiniak; Pawel Gepner
The multidimensional positive definite advection transport algorithm (MPDATA) belongs to the group of nonoscillatory forward-in-time algorithms and performs a sequence of stencil computations. MPDATA is one of the major parts of the dynamic core of the EULAG geophysical model. In this work, we outline an approach to adapting the 3D MPDATA algorithm to the Intel MIC architecture. In order to utilize available computing resources, we propose the (3 + 1)D decomposition of MPDATA heterogeneous stencil computations. This approach is based on a combination of the loop tiling and fusion techniques. It allows us to ease memory/communication bounds and better exploit the theoretical floating point efficiency of target computing platforms. An important method of improving the efficiency of the (3 + 1)D decomposition is partitioning of available cores/threads into work teams, which reduces inter-cache communication overheads. This method also increases opportunities for the efficient distribution of MPDATA computation onto available resources of the Intel MIC architecture, as well as Intel CPUs. We discuss preliminary performance results obtained on two hybrid platforms, containing two CPUs and Intel Xeon Phi. The top-of-the-line Intel Xeon Phi 7120P gives the best performance results, and executes MPDATA almost 2 times faster than two Intel Xeon E5-2697v2 CPUs.
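As a rough illustration of what combining loop tiling with loop fusion means for a multi-stage stencil, the sketch below applies two three-point averaging stages to a 1D periodic array. It is not the MPDATA kernel or the (3 + 1)D decomposition itself; the stencil, the function names, and the tile size are assumptions made for illustration. The fused version computes the first stage only for the current tile plus a one-cell halo, so the intermediate values stay in a small local buffer instead of a full-size array.

```python
def smooth(src, i):
    """Three-point average with periodic boundaries (a stand-in stencil, not MPDATA)."""
    n = len(src)
    return (src[(i - 1) % n] + src[i] + src[(i + 1) % n]) / 3.0


def two_stage_reference(a):
    """Unfused version: each stage sweeps the whole array and stores a full intermediate."""
    n = len(a)
    b = [smooth(a, i) for i in range(n)]
    return [smooth(b, i) for i in range(n)]


def two_stage_tiled_fused(a, tile=4):
    """Tiled and fused version: both stages are executed tile by tile."""
    n = len(a)
    c = [0.0] * n
    for lo in range(0, n, tile):
        hi = min(lo + tile, n)
        # Stage 1 on the tile plus a one-cell halo on each side.
        # Halo cells are recomputed by neighbouring tiles (redundant work
        # traded for locality).
        local_b = {i % n: smooth(a, i % n) for i in range(lo - 1, hi + 1)}
        # Stage 2 reads only the small local buffer.
        for i in range(lo, hi):
            c[i] = (local_b[(i - 1) % n] + local_b[i] + local_b[(i + 1) % n]) / 3.0
    return c
```

Both functions return the same values; the fused one trades a few redundant halo computations for a much smaller working set per tile.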
Concurrency and Computation: Practice and Experience | 2017
Krzysztof Rojek; Roman Wyrzykowski; Lukasz Kuczynski
In this work, we focus on a systematic adaptation of the stencil‐based multidimensional positive definite advection transport algorithm (MPDATA) to different graphics processing unit (GPU)‐based computing platforms. Another objective of this work is to compare the performance of MPDATA on several platforms, including a multi‐GPU system with two NVIDIA Tesla K80 cards, and single‐card platforms with Tesla K20X, GeForce GTX TITAN, and GeForce GTX 980. The following optimization methods are proposed to improve the overall performance: (i) reducing the number of operations through subexpression elimination when implementing 2.5D blocking; (ii) reorganization of boundary conditions to reduce branch instructions; (iii) advanced memory management to increase coalesced memory access; and (iv) warp rearrangement for optimizing data access to GPU global memory. The presented methods of the MPDATA adaptation to GPU architectures allow us to efficiently use many graphics processors within a single node by applying peer‐to‐peer data transfers between GPU global memories. We propose an auto‐tuning procedure to compensate for architectural differences between the considered platforms. This procedure takes into account algorithm/GPU‐specific parameters. The proposed approach to adaptation of MPDATA to GPU architectures allows us to achieve up to 482.5 Gflop/s for the platform equipped with two NVIDIA K80 GPUs.
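The abstract does not describe how the auto-tuning procedure works internally. As a generic sketch of the idea, assuming an exhaustive search over a small parameter space, the code below tries every combination of candidate parameters and keeps the cheapest one. The parameter names (`block_x`, `block_y`) and the `measure` callback are hypothetical; in practice `measure` would run and time the kernel on the target GPU.

```python
import itertools


def autotune(candidates, measure):
    """Exhaustive search over a parameter grid.

    candidates: dict mapping a parameter name to a list of values to try.
    measure:    callable taking a config dict and returning a cost
                (for example, a measured run time); lower is better.
    """
    names = list(candidates)
    best_config, best_cost = None, float("inf")
    for values in itertools.product(*(candidates[name] for name in names)):
        config = dict(zip(names, values))
        cost = measure(config)
        if cost < best_cost:
            best_config, best_cost = config, cost
    return best_config, best_cost
```

Passing the cost function in as an argument keeps the search logic independent of the platform, so the same loop can be reused with a different set of candidates on each GPU.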
Parallel Computing | 2010
Roman Wyrzykowski; Lukasz Kuczynski; Marcin Wozniak
Erasure codes can improve the availability of distributed storage in comparison with replication systems. In this paper, we focus on investigating how to systematically map the Reed-Solomon and Cauchy Reed-Solomon erasure codes onto the Cell/B.E. and GPU multicore architectures. A method for the systematic mapping of computation kernels of encoding/decoding algorithms onto the Cell/B.E. architecture is proposed. This method takes into account properties of the architecture on all three levels of its parallel processing hierarchy. The performance results are shown to be very promising. The possibility of using GPUs is studied as well, based on the Cauchy version of Reed-Solomon codes.
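To make the contrast with replication concrete, here is a minimal sketch of the simplest possible erasure code: k data blocks plus one XOR parity block. This is deliberately not Reed-Solomon (which works over a finite field and tolerates several lost blocks); it only shows the encode and recover steps in miniature, and the function names are illustrative.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stored, lost_index):
    """Rebuild the single block at lost_index from all surviving blocks."""
    survivors = [block for i, block in enumerate(stored) if i != lost_index]
    return xor_blocks(survivors)
```

With three data blocks, this scheme stores 4/3 of the original volume and survives the loss of any one block, whereas plain replication needs twice the volume for the same guarantee.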
Parallel Computing | 2006
Roman Wyrzykowski; Norbert Meyer; Tomasz Olas; Lukasz Kuczynski; Bogdan Ludwiczak; Cezary Czaplewski; Stanisław Ołdziej
In the first part, we present the concept and implementation of the National Cluster of Linux Systems (CLUSTERIX) - a truly distributed national computing infrastructure with 12 sites (64-bit Linux PC-clusters) located across Poland. The second part presents our experience in adapting selected scientific applications to cross-site execution as meta-applications, using the MPICH-G2 environment. The performance results of experiments confirm that CLUSTERIX can be an efficient platform for running meta-applications. However, harnessing its computing power requires taking into account the hierarchical architecture of the infrastructure.
International Journal of High Performance Computing Applications | 2018
Lukasz Szustak; Kamil Halbiniak; Lukasz Kuczynski; Joanna Wróbel; Adam Kulawik
Modern heterogeneous computing platforms have become powerful HPC solutions, which can be applied to a wide range of real-life applications. In particular, the hybrid platforms equipped with Intel Xeon Phi coprocessors offer the advantages of massively parallel computing, while supporting practically the same parallel programming model as conventional homogeneous solutions. However, there is still an open issue as to how scientific applications can efficiently utilize hybrid platforms with Intel MIC coprocessors. In this article, we propose an approach for porting a real-life scientific application to such hybrid platforms, assuming no significant modifications of the application code. It allows us to take advantage of all the computing components, including two CPUs and two coprocessors, for the parallel execution of computational workloads. In this study, we focus on the parallel implementation of a numerical model of the dendritic solidification process in isothermal conditions. We develop a sequence of steps that are necessary for the porting and optimization of the solidification application to hybrid platforms with Intel coprocessors. The main challenges include not only overlapping data movements with computations, but also ensuring adequate utilization of cores/threads and vector units of processors, as well as coprocessors. To reach this aim, we propose an efficient and flexible method for the workload distribution between heterogeneous computing components. To realize the potential benefits of the proposed approach, we choose a heterogeneous programming model based on a combination of the offload mode for Intel MIC and the OpenMP programming standard. The developed approach allows us to execute the whole application up to 9.33× faster than the original parallel version that uses two CPUs. Furthermore, the CPU–MIC hybrid platforms achieve a speedup of about 1.9× over the CPU platform with 24 cores based on the Ivy Bridge architecture, and about 1.5× over the Haswell-based CPU platform with 36 cores.
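The abstract does not give the details of the workload distribution method. One common way to split a grid between devices of different speed, shown here purely as an assumed illustration, is to assign each device a contiguous range of rows proportional to its relative throughput. The function name and the throughput figures are hypothetical.

```python
def partition_rows(n_rows, throughputs):
    """Split rows 0..n_rows into contiguous ranges proportional to throughput.

    throughputs: relative speeds of the devices (any positive numbers).
    Returns a list of (start, end) half-open ranges, one per device.
    """
    total = sum(throughputs)
    bounds = [0]
    accumulated = 0.0
    for t in throughputs:
        accumulated += t
        bounds.append(int(n_rows * accumulated / total))
    bounds[-1] = n_rows  # guard against floating-point rounding at the end
    return [(bounds[i], bounds[i + 1]) for i in range(len(throughputs))]
```

For example, `partition_rows(10, [1, 1])` gives each of two equally fast devices five rows. The relative throughputs could come from a short calibration run on each device.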
International Conference on Parallel Processing | 2013
Roman Wyrzykowski; Marcin Woźniak; Lukasz Kuczynski
Erasure codes such as Reed-Solomon codes can improve the availability of distributed storage in comparison with replication systems. In previous studies, we investigated the implementation of these codes on multi/many-core architectures, such as Cell/B.E. and GPUs. In particular, it was shown that the bandwidth of the PCIe bus is a bottleneck for the implementation on GPUs.
Parallel Processing and Applied Mathematics | 2007
Roman Wyrzykowski; Lukasz Kuczynski
Nowadays, grid applications process large volumes of data. This creates the need for effective distributed data-management solutions. For the ClusteriX grid project, the CDMS (ClusteriX Data Management System) has been developed. An analysis of user requirements and of existing implementations of data management systems in grids formed the foundation for its creation. Special attention has been paid to making the system user-friendly and efficient. In this paper, we propose to use the innovative Cell Broadband Engine to implement a set of measures which are necessary to fulfill the security and availability requirements of grid data management systems. We also discuss how this goal can be achieved in the CDMS2, an improved version of the CDMS.
Knowledge and Data Management in GRIDs | 2007
Konrad Karczewski; Lukasz Kuczynski
Nowadays, grid applications process large volumes of data. This creates the need for effective data-management solutions. For the ClusteriX project, the CDMS (ClusteriX Data Management System) is being developed. An analysis of user requirements and of existing implementations of data management systems formed the foundation for its creation. Special attention has been paid to making the system user-friendly and efficient.
Lecture Notes in Computer Science | 2006
Lukasz Kuczynski; Konrad Karczewski; Roman Wyrzykowski
IEEE International Conference on High Performance Computing Data and Analytics | 2014
Roman Wyrzykowski; Marcin Wozniak; Lukasz Kuczynski; Emmanuel Jeannot; Julius Žilinskas