Tomasz Olas
Częstochowa University of Technology
Publications
Featured research published by Tomasz Olas.
Scientific Programming | 2015
Lukasz Szustak; Krzysztof Rojek; Tomasz Olas; Lukasz Kuczynski; Kamil Halbiniak; Pawel Gepner
The multidimensional positive definite advection transport algorithm (MPDATA) belongs to the group of nonoscillatory forward-in-time algorithms and performs a sequence of stencil computations. MPDATA is one of the major parts of the dynamic core of the EULAG geophysical model. In this work, we outline an approach to adapting the 3D MPDATA algorithm to the Intel MIC architecture. In order to utilize the available computing resources, we propose a (3 + 1)D decomposition of MPDATA's heterogeneous stencil computations. This approach is based on a combination of loop tiling and loop fusion techniques. It allows us to ease memory/communication bounds and better exploit the theoretical floating-point efficiency of target computing platforms. An important method of improving the efficiency of the (3 + 1)D decomposition is the partitioning of available cores/threads into work teams, which reduces inter-cache communication overheads. This method also increases opportunities for the efficient distribution of MPDATA computations onto the available resources of the Intel MIC architecture, as well as Intel CPUs. We discuss preliminary performance results obtained on two hybrid platforms, each containing two CPUs and an Intel Xeon Phi. The top-of-the-line Intel Xeon Phi 7120P gives the best performance results, executing MPDATA almost 2 times faster than two Intel Xeon E5-2697v2 CPUs.
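A minimal C++/OpenMP sketch, not taken from the paper, of the loop tiling and fusion idea described above: two dependent stencil stages are processed block by block along the k dimension so that the intermediate slab stays in cache. The grid sizes, block size, and stencil coefficients are illustrative placeholders rather than MPDATA's real kernels.

    #include <algorithm>
    #include <vector>

    // Illustrative grid and block sizes (placeholders, not MPDATA's real parameters).
    constexpr int NX = 128, NY = 128, NZ = 128, BZ = 8;

    inline int idx(int i, int j, int k) { return (i * NY + j) * NZ + k; }

    // Two fused stencil stages swept over k-blocks: stage 1 writes a small slab of
    // intermediate values, stage 2 consumes it while it is still in cache.
    void fused_sweep(const std::vector<double>& in, std::vector<double>& tmp,
                     std::vector<double>& out) {
        #pragma omp parallel for collapse(2) schedule(static)
        for (int i = 1; i < NX - 1; ++i)
            for (int j = 1; j < NY - 1; ++j)
                for (int kb = 1; kb < NZ - 1; kb += BZ) {
                    const int kend = std::min(kb + BZ, NZ - 1);
                    // Stage 1: placeholder 7-point average.
                    for (int k = kb; k < kend; ++k)
                        tmp[idx(i, j, k)] =
                            (in[idx(i - 1, j, k)] + in[idx(i + 1, j, k)] +
                             in[idx(i, j - 1, k)] + in[idx(i, j + 1, k)] +
                             in[idx(i, j, k - 1)] + in[idx(i, j, k + 1)]) / 6.0;
                    // Stage 2: fused update that reuses the freshly computed block.
                    for (int k = kb; k < kend; ++k)
                        out[idx(i, j, k)] = in[idx(i, j, k)] + 0.5 * tmp[idx(i, j, k)];
                }
    }

    int main() {
        std::vector<double> in(NX * NY * NZ, 1.0), tmp(in.size()), out(in.size());
        fused_sweep(in, tmp, out);
        return 0;
    }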
International Conference on Parallel Processing | 2001
Tomasz Olas; Konrad Karczewski; Adam Tomas; Roman Wyrzykowski
ParallelNuscaS is an object-oriented package for parallel finite element modeling, developed at the Technical University of Czestochowa. This paper is devoted to investigating the package's performance on the ACCORD cluster, which was built this year in the Institute of Mathematics and Computer Science of this University. At present, ACCORD contains 18 Pentium III 750 MHz processors, i.e. 9 SMP nodes, connected both by the fast MYRINET network and by standard Fast Ethernet, as well as 8 SMP nodes with 16 AMD Athlon MP 1.2 GHz processors. We discuss the implementation and performance of parallel FEM computations not only for the message-passing model of parallel programming, but also for the hybrid model, which is a mixture of multithreading inside SMP nodes and message passing between them.
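A minimal sketch of the hybrid programming model mentioned above, combining OpenMP multithreading inside an SMP node with MPI message passing between nodes; the local data and the computation (a simple sum of squares) are placeholders, not the package's FEM kernels.

    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv) {
        // Request thread support so OpenMP threads and MPI calls can coexist.
        int provided = 0;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        // Placeholder subdomain data owned by this MPI process (one process per SMP node).
        std::vector<double> local(100000, 1.0);
        double local_sum = 0.0;

        // Multithreading inside the node: OpenMP threads share the local work.
        #pragma omp parallel for reduction(+ : local_sum)
        for (std::size_t i = 0; i < local.size(); ++i)
            local_sum += local[i] * local[i];

        // Message passing between nodes: combine per-node partial results.
        double global_sum = 0.0;
        MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0) std::printf("global sum = %f\n", global_sum);
        MPI_Finalize();
        return 0;
    }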
Parallel Processing and Applied Mathematics | 2009
Marcin Wozniak; Tomasz Olas; Roman Wyrzykowski
Nowadays GPUs are becoming extremely promising multi-/manycore architectures for a wide range of demanding applications. Basic features of these architectures include the use of a large number of relatively simple processing units which operate in SIMD fashion, as well as hardware-supported, advanced multithreading. However, the use of GPUs in everyday practice is still limited, mainly because of the necessity of deeply adapting the implemented algorithms to the target architecture. In this work, we propose how to perform such an adaptation to achieve an efficient parallel implementation of the conjugate gradient (CG) algorithm, which is widely used for solving large sparse linear systems of equations arising, e.g., in FEM problems. Aiming at an efficient implementation of the main operation of the CG algorithm, which is sparse matrix-vector multiplication (SpMV), different techniques for optimizing access to the hierarchical memory of GPUs are proposed and studied. The experimental investigation of the proposed CUDA-based implementation of the CG algorithm is carried out on two GPU architectures: GeForce 8800 and Tesla C1060. It is shown that optimizing access to GPU memory allows us to reduce considerably the execution time of the SpMV operation, and consequently to achieve a significant speedup over CPUs when implementing the whole CG algorithm.
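A reference C++ sketch, under the assumption of standard CSR storage, of the SpMV operation the paper optimizes; on a GPU, the outer loop over rows is typically parallelized by assigning one thread or warp per row, and the memory-access optimizations studied in the paper concern how the arrays below are laid out and traversed.

    #include <vector>

    // Compressed Sparse Row (CSR) storage -- a common format for the sparse matrices
    // arising in FEM problems; the paper's exact storage scheme may differ.
    struct CsrMatrix {
        int n;                      // number of rows
        std::vector<int> row_ptr;   // size n + 1
        std::vector<int> col_idx;   // column index of each nonzero
        std::vector<double> val;    // nonzero values
    };

    // Reference SpMV: y = A * x. On a GPU the row loop is distributed over threads,
    // and performance depends on coalesced access to val, col_idx, and x.
    void spmv(const CsrMatrix& A, const std::vector<double>& x, std::vector<double>& y) {
        for (int row = 0; row < A.n; ++row) {
            double sum = 0.0;
            for (int k = A.row_ptr[row]; k < A.row_ptr[row + 1]; ++k)
                sum += A.val[k] * x[A.col_idx[k]];
            y[row] = sum;
        }
    }

    int main() {
        CsrMatrix A{2, {0, 1, 2}, {0, 1}, {1.0, 1.0}};   // 2x2 identity as a toy example
        std::vector<double> x{3.0, 4.0}, y(2);
        spmv(A, x, y);                                   // y == {3.0, 4.0}
        return 0;
    }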
Parallel Computing | 1999
Roman Wyrzykowski; N. Sczygiol; Tomasz Olas; Juri Kanevski
In this paper, the parallelization of finite element modeling of solidification is considered. The core of this modeling is solving large sparse linear systems. The Aztec library is used to implement the model problem on massively parallel computers. The complete parallel code is now available. Performance results of numerical experiments carried out on the IBM SP2 parallel computer are presented.
Parallel Processing and Applied Mathematics | 2009
Tomasz Olas; Robert Leśniak; Roman Wyrzykowski; Pawel Gepner
Numerical modeling of 3D thermomechanical problems is a complex and time-consuming task. Adaptive techniques are powerful tools for performing such modeling efficiently with FEM analysis. During adaptation, computational workloads change unpredictably at runtime, so dynamic load balancing is required. This paper presents new developments in the parallel FEM package NuscaS, which extend its functionality and increase its performance. In particular, by including dynamic load balancing capabilities, the package allows us to solve adaptive FEM problems with 3D unstructured meshes efficiently on distributed-memory parallel computers such as PC clusters. For solving sparse systems of equations, NuscaS uses the message-passing paradigm to implement the PCG iterative method with geometric multigrid as a preconditioner. The implementation of load balancing is based on the proposed performance model.
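A compact sketch of the PCG iteration mentioned above; apply_A and apply_M stand in for the package's sparse matrix-vector product and geometric multigrid V-cycle and are illustrative assumptions, not NuscaS code.

    #include <cmath>
    #include <functional>
    #include <vector>

    using Vec = std::vector<double>;
    using Op  = std::function<void(const Vec&, Vec&)>;   // y = Op(x)

    double dot(const Vec& a, const Vec& b) {
        double s = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
        return s;
    }

    // PCG for A x = b; apply_A is the sparse matrix-vector product and apply_M the
    // preconditioner (e.g. one geometric multigrid V-cycle).
    void pcg(const Op& apply_A, const Op& apply_M, const Vec& b, Vec& x,
             int max_iter = 1000, double tol = 1e-8) {
        const std::size_t n = b.size();
        Vec r(n), z(n), p(n), Ap(n);
        apply_A(x, Ap);
        for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ap[i];
        apply_M(r, z);
        p = z;
        double rz = dot(r, z);
        for (int it = 0; it < max_iter && std::sqrt(dot(r, r)) > tol; ++it) {
            apply_A(p, Ap);
            const double alpha = rz / dot(p, Ap);
            for (std::size_t i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
            apply_M(r, z);
            const double rz_new = dot(r, z);
            const double beta = rz_new / rz;
            rz = rz_new;
            for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
        }
    }

    int main() {
        // Toy system: A = 2*I, preconditioner M = I.
        Op A = [](const Vec& v, Vec& y) { for (std::size_t i = 0; i < v.size(); ++i) y[i] = 2.0 * v[i]; };
        Op M = [](const Vec& v, Vec& y) { y = v; };
        Vec b{2.0, 4.0}, x(2, 0.0);
        pcg(A, M, b, x);   // x converges to {1.0, 2.0}
        return 0;
    }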
Parallel Computing | 2006
Roman Wyrzykowski; Norbert Meyer; Tomasz Olas; Lukasz Kuczynski; Bogdan Ludwiczak; Cezary Czaplewski; Stanisław Ołdziej
In the first part, we present the concept and implementation of the National Cluster of Linux Systems (CLUSTERIX) - a truly distributed national computing infrastructure with 12 sites (64-bit Linux PC-clusters) located across Poland. The second part presents our experience in adapting selected scientific applications to cross-site execution as meta-applications, using the MPICH-G2 environment. The performance results of experiments confirm that CLUSTERIX can be an efficient platform for running meta-applications. However, harnessing its computing power requires taking into account the hierarchical architecture of the infrastructure.
International Conference on Parallel Processing | 2003
Tomasz Olas; Roman Wyrzykowski; Adam Tomas; Konrad Karczewski
ParallelNuscaS is an object-oriented package for parallel finite element modeling, developed at Czestochowa University of Technology. This paper is devoted to modeling the performance of the package on PC-clusters. Such modeling allows for analyzing and predicting the performance of this complex scientific application on different computing platforms. Because both uniprocessor and SMP nodes are considered, we investigate not only the message-passing model of parallel programming, but also the hybrid model, which is a mixture of multithreading inside SMP nodes and message passing between them.
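A toy illustration, not the paper's actual model, of the kind of performance model described above: time per iteration is estimated as a computation term plus a latency-bandwidth communication term, with all constants treated as placeholders to be calibrated on a given cluster.

    #include <cstdio>

    // Toy model: T = T_comp + T_comm, with a Hockney-style communication term
    // T_comm = messages * latency + bytes / bandwidth. All values are placeholders.
    struct ClusterParams {
        double flop_rate;   // sustained flop/s per process
        double latency;     // seconds per message
        double bandwidth;   // bytes per second
    };

    double predict_iteration_time(double flops_per_proc, double bytes_exchanged,
                                  int messages, const ClusterParams& c) {
        const double t_comp = flops_per_proc / c.flop_rate;
        const double t_comm = messages * c.latency + bytes_exchanged / c.bandwidth;
        return t_comp + t_comm;
    }

    int main() {
        const ClusterParams fast_ethernet{5.0e8, 70e-6, 12.5e6};   // illustrative values only
        const double t = predict_iteration_time(2.0e8, 4.0e6, 8, fast_ethernet);
        std::printf("predicted time per iteration: %.3f s\n", t);
        return 0;
    }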
Parallel Computing in Electrical Engineering | 2002
Tomasz Olas; Lukasz Lacinski; Konrad Karczewski; Adam Tomas; Roman Wyrzykowski
ParallelNuscaS is an object-oriented package for FEM modeling on clusters, developed at the Technical University of Czestochowa. This paper is devoted to investigating the influence of the communication mechanisms used in the ACCORD cluster on the performance of FEM computations. Last year this cluster was built in the Institute of Mathematics and Computer Science of this University. At present, ACCORD contains 18 Pentium III 750 MHz processors, i.e. 9 SMP nodes, connected both by the fast MYRINET network and by standard Fast Ethernet, as well as 8 SMP nodes with 16 AMD Athlon MP 1.2 GHz processors, connected only by Fast Ethernet.
International Conference on Artificial Intelligence and Soft Computing | 2015
Tomasz Olas; Robert Nowicki; Roman Wyrzykowski; Adam Krzyzak
In this paper, a parallel realization of the Restricted Boltzmann Machine (RBM) is proposed. The implementation is intended to use the multicore architectures of modern CPUs and the Intel Xeon Phi coprocessor. The learning procedure is based on a matrix description of the RBM, where the learning samples are grouped into packages and represented as matrices. The influence of the package size on the convergence of learning, as well as on the performance of computation, is studied for various numbers of threads, using conventional CPU and Intel Xeon Phi architectures. Our research confirms the potential usefulness of the MIC parallel architecture for implementing RBMs and similar algorithms.
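A simplified sketch of the matrix-based, package-oriented RBM learning described above: rows of a matrix hold the samples of one package, and the weight update contrasts the positive and negative phases. The Gibbs sampling step, BLAS-style kernels, and Xeon Phi offload are omitted, and all sizes and the learning rate are illustrative.

    #include <cmath>
    #include <vector>

    // Row-major dense matrix; each row holds one sample of the current package.
    struct Matrix {
        int rows, cols;
        std::vector<double> a;
        Matrix(int r, int c) : rows(r), cols(c), a(static_cast<std::size_t>(r) * c, 0.0) {}
        double& at(int i, int j) { return a[static_cast<std::size_t>(i) * cols + j]; }
        double at(int i, int j) const { return a[static_cast<std::size_t>(i) * cols + j]; }
    };

    double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

    // Hidden activations for a whole package: H = sigmoid(V * W + b_h).
    Matrix hidden_probs(const Matrix& V, const Matrix& W, const std::vector<double>& bh) {
        Matrix H(V.rows, W.cols);
        #pragma omp parallel for
        for (int s = 0; s < V.rows; ++s)
            for (int j = 0; j < W.cols; ++j) {
                double z = bh[j];
                for (int i = 0; i < V.cols; ++i) z += V.at(s, i) * W.at(i, j);
                H.at(s, j) = sigmoid(z);
            }
        return H;
    }

    // Weight update for one package: W += lr * (V0^T*H0 - V1^T*H1) / package_size,
    // where (V0, H0) is the positive phase and (V1, H1) the negative phase.
    void package_update(Matrix& W, const Matrix& V0, const Matrix& H0,
                        const Matrix& V1, const Matrix& H1, double lr) {
        #pragma omp parallel for
        for (int i = 0; i < W.rows; ++i)
            for (int j = 0; j < W.cols; ++j) {
                double g = 0.0;
                for (int s = 0; s < V0.rows; ++s)
                    g += V0.at(s, i) * H0.at(s, j) - V1.at(s, i) * H1.at(s, j);
                W.at(i, j) += lr * g / V0.rows;
            }
    }

    int main() {
        Matrix V0(4, 6), W(6, 3);                 // 4 samples, 6 visible units, 3 hidden units
        std::vector<double> bh(3, 0.0);
        Matrix H0 = hidden_probs(V0, W, bh);
        package_update(W, V0, H0, V0, H0, 0.1);   // zero update when both phases coincide
        return 0;
    }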
International Conference on Large Scale Scientific Computing | 2011
Tomasz Olas; Roman Wyrzykowski
Our parallel FEM package NuscaS allows us to solve adaptive FEM problems with 3D unstructured meshes on distributed-memory parallel computers such as PC-clusters. For solving sparse systems of equations, NuscaS uses the message-passing paradigm to implement the PCG method with geometric multigrid as a preconditioner. For mesh adaptation, the 8-tetrahedra longest-edge partition is used as the mesh refinement algorithm. In this paper, a new method for parallelizing this algorithm is presented. It was developed for the message-passing model and implemented using the MPI standard. The new solution is based on a decentralized approach, so it is more scalable than previous implementations, where a centralized synchronizing node (coordinator processor or gateway node) is required. Both the sequential and parallel versions of the mesh adaptation are carefully optimized to maximize performance. One of the key solutions is the use of suitable data structures, such as hash tables, which provide high performance while preserving modest memory requirements.
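A small sketch of the hash-table idea mentioned above: during refinement, a table keyed by an edge (an ordered pair of node IDs) guarantees that the midpoint node of an edge shared by neighbouring tetrahedra is created only once. The key encoding and node numbering are assumptions for illustration, not NuscaS's actual data structures.

    #include <cstdint>
    #include <unordered_map>
    #include <utility>

    // Encode an undirected edge (pair of node IDs) as one 64-bit key, ordering the
    // endpoints so that (a, b) and (b, a) refer to the same edge.
    std::uint64_t edge_key(std::uint32_t a, std::uint32_t b) {
        if (a > b) std::swap(a, b);
        return (static_cast<std::uint64_t>(a) << 32) | b;
    }

    class MidpointTable {
    public:
        explicit MidpointTable(std::uint32_t first_new_id) : next_node_id_(first_new_id) {}

        // Return the midpoint node of edge (a, b), creating it only on the first request,
        // so tetrahedra sharing the edge reuse the same new node.
        std::uint32_t midpoint(std::uint32_t a, std::uint32_t b) {
            const std::uint64_t key = edge_key(a, b);
            auto it = table_.find(key);
            if (it != table_.end()) return it->second;
            const std::uint32_t id = next_node_id_++;
            table_.emplace(key, id);
            return id;
        }

    private:
        std::unordered_map<std::uint64_t, std::uint32_t> table_;
        std::uint32_t next_node_id_;
    };

    int main() {
        MidpointTable mids(1000);                // new node IDs start after the existing ones
        const auto m1 = mids.midpoint(17, 42);
        const auto m2 = mids.midpoint(42, 17);   // same edge, so the same midpoint node
        return m1 == m2 ? 0 : 1;
    }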