Publication


Featured research published by Michal Kierzynka.


BMC Bioinformatics | 2011

Protein alignment algorithms with an efficient backtracking routine on multiple GPUs

Jacek Blazewicz; Wojciech Frohmberg; Michal Kierzynka; Erwin Pesch; Paweł T. Wojciechowski

Background: Pairwise sequence alignment methods are widely used in biological research. The growing number of sequences is perceived as one of the upcoming challenges for sequence alignment methods in the near future. To overcome this challenge, several GPU (Graphics Processing Unit) computing approaches have been proposed lately. These solutions show the great potential of the GPU platform, but in most cases they address the problem of sequence database scanning and compute only the alignment score, whereas the alignment itself is omitted. Thus, the need arose to implement the global and semiglobal Needleman-Wunsch, and Smith-Waterman algorithms with a backtracking procedure, which is needed to construct the alignment.

Results: In this paper we present a solution that performs the alignment of every given sequence pair, which is a required step for progressive multiple sequence alignment methods, as well as for DNA recognition at the DNA assembly stage. The tests performed show that the implementation, with performance up to 6.3 GCUPS on a single GPU for affine gap penalties, is very efficient in comparison to other CPU- and GPU-based solutions. Moreover, multi-GPU support with load balancing makes the application very scalable.

Conclusions: The article shows that the backtracking procedure of the sequence alignment algorithms may be designed to fit the GPU architecture. Therefore, our algorithm, apart from scores, is able to compute pairwise alignments. This opens a wide range of new possibilities, allowing other methods from the area of molecular biology to take advantage of the new computational architecture. The tests performed show that the efficiency of the implementation is excellent. Moreover, the speed of our GPU-based algorithms can be increased almost linearly when using more than one graphics card.
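The abstract above can be illustrated with a minimal CPU sketch of global (Needleman-Wunsch) alignment with backtracking. This is not the paper's GPU implementation: the paper parallelizes the matrix fill and backtracking across GPUs and supports affine gap penalties, while this sketch uses a linear gap penalty and illustrative scoring parameters purely to show the algorithmic idea.

```python
def needleman_wunsch(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment with backtracking; linear gap penalty for brevity."""
    n, m = len(a), len(b)
    # Fill the dynamic-programming score matrix, including gap row/column.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        H[i][0] = i * gap
    for j in range(1, m + 1):
        H[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            H[i][j] = max(diag, H[i-1][j] + gap, H[i][j-1] + gap)
    # Backtracking: walk from (n, m) to (0, 0), reconstructing the alignment.
    ax, bx, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and \
           H[i][j] == H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch):
            ax.append(a[i-1]); bx.append(b[j-1]); i -= 1; j -= 1
        elif i > 0 and H[i][j] == H[i-1][j] + gap:
            ax.append(a[i-1]); bx.append('-'); i -= 1
        else:
            ax.append('-'); bx.append(b[j-1]); j -= 1
    return H[n][m], ''.join(reversed(ax)), ''.join(reversed(bx))
```

The backtracking loop is the part the paper redesigns for the GPU: instead of one sequential walk, each sequence pair's traceback runs in its own GPU thread.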


Journal of Parallel and Distributed Computing | 2013

G-MSA - A GPU-based, fast and accurate algorithm for multiple sequence alignment

Jacek Blazewicz; Wojciech Frohmberg; Michal Kierzynka; Paweł T. Wojciechowski

Multiple sequence alignment (MSA) methods are essential in biological analysis. Several MSA algorithms have been proposed in recent years. The quality of the results produced by those methods is reasonable, but there is no single method that consistently outperforms the others. Additionally, the growing number of sequences in biological databases is perceived as one of the upcoming challenges for alignment methods in the near future. The lack of performance concerns not only alignment problems, but may be observed in many areas of biologically related research. To overcome this problem in the field of pairwise alignment, several GPU (Graphics Processing Unit) computing approaches have been proposed lately. These solutions show the great potential of the GPU platform. Therefore, our main idea was to design and implement an MSA method that can take advantage of modern graphics cards. Our solution is based on T-Coffee, an MSA algorithm well known for its high accuracy. Its computational time, however, is often unacceptable. The tests performed show that our method, named G-MSA, is highly efficient, achieving up to 193-fold speedup on a single GPU, while the quality of its results remains very good. Due to effective memory usage, the method can perform alignment for huge sets of sequences that previously could only be aligned on computer clusters. Moreover, multi-GPU support with load balancing makes the application very scalable.


Scientific Programming | 2013

From physics model to results: An optimizing framework for cross-architecture code generation

Marek Blazewicz; Ian Hinder; David M. Koppelman; Steven R. Brandt; Milosz Ciznicki; Michal Kierzynka; Frank Löffler; Jian Tao

Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.
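The discretization step that Chemora automates can be sketched with a standard higher-order finite difference. The stencil below is the textbook 4th-order central difference for a first derivative; the test function and grid spacing are illustrative and not taken from the paper.

```python
import math

def d_dx_4th(f, x, h):
    """4th-order central finite difference for f'(x); error is O(h^4)."""
    return (-f(x + 2*h) + 8*f(x + h) - 8*f(x - h) + f(x - 2*h)) / (12 * h)

# Approximate the derivative of sin at 0, which is cos(0) = 1.
approx = d_dx_4th(math.sin, 0.0, 0.1)
```

A framework like Chemora emits stencils of this shape, for hundreds of variables at once, as generated GPU kernels rather than hand-written code.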


Scientific Programming | 2011

CaKernel - A parallel application programming framework for heterogeneous computing architectures

Marek Blazewicz; Steven R. Brandt; Michal Kierzynka; Krzysztof Kurowski; Bogdan Ludwiczak; Jian Tao; Jan Węglarz

With the recent advent of new heterogeneous computing architectures, there is still a lack of parallel problem solving environments that can help scientists use hybrid supercomputers easily and efficiently. Many scientific simulations that use structured grids to solve partial differential equations in fact rely on stencil computations. Stencil computations have become crucial in solving many challenging problems in various domains, e.g., engineering or physics. Although many parallel stencil computing approaches have been proposed, in most cases they solve only particular problems. As a result, scientists struggle when it comes to implementing a new stencil-based simulation, especially on high performance hybrid supercomputers. In response to this need, we extend our previous work on a parallel programming framework for CUDA, CaCUDA, which now supports OpenCL. We present CaKernel, a tool that simplifies the development of parallel scientific applications on hybrid systems. CaKernel is built on the highly scalable and portable Cactus framework. In the CaKernel framework, Cactus manages the inter-process communication via MPI, while CaKernel manages the code running on Graphics Processing Units (GPUs) and the interactions between them. As a non-trivial test case, we have developed a 3D CFD code to demonstrate the performance and scalability of the automatically generated code.
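The structured-grid stencil computations mentioned above have a simple core that frameworks like CaKernel turn into GPU kernels. Below is a minimal CPU sketch of one Jacobi iteration of a 2D 5-point heat-diffusion stencil; the grid size and the diffusion factor `alpha` are illustrative, not values from the paper.

```python
def jacobi_step(grid, alpha=0.1):
    """One Jacobi sweep of a 2D 5-point diffusion stencil (boundaries fixed)."""
    n, m = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # double buffering: read old, write new
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            # 5-point stencil: each interior cell is updated from its
            # four von Neumann neighbours.
            new[i][j] = grid[i][j] + alpha * (
                grid[i-1][j] + grid[i+1][j] +
                grid[i][j-1] + grid[i][j+1] - 4 * grid[i][j]
            )
    return new
```

The per-cell update is independent of all other updates in the same sweep, which is exactly why such kernels map well onto GPU thread grids.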


Parallel Processing and Applied Mathematics | 2011

Efficient isosurface extraction using marching tetrahedra and histogram pyramids on multiple GPUs

Milosz Ciznicki; Michal Kierzynka; Krzysztof Kurowski; Bogdan Ludwiczak; Krystyna Napierala; Jarosław Palczyński

Algorithms for isosurface extraction have become crucial in the petroleum industry, medicine and many other fields in recent years. Nowadays, market demands engender a need for methods that not only construct accurate 3D models but also deal with the problem efficiently. Recently, a few highly optimized approaches taking advantage of modern graphics processing units (GPUs) have been published in the literature. However, despite their satisfactory speed, they may all be unsuitable in real-life applications due to limits on the maximum domain size they can process. In this paper we present a novel approach to surface extraction that combines the Marching Tetrahedra algorithm with the idea of Histogram Pyramids. Our GPU-based application can process CT and MRI scan data. Thanks to domain decomposition, the only limiting factor for the size of an input instance is the amount of memory needed to store the resulting model. The solution is also immensely fast, achieving up to 107-fold speedup compared to a serial CPU code. Moreover, multi-GPU support makes it very scalable. The provided tool enables the user to visualize the generated model and to modify it in an interactive manner.
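The Histogram Pyramid (HistoPyramid) idea mentioned above can be sketched in a few lines. The base level holds, per input cell, how many output elements (here: triangles) it produces; each higher level sums pairs; and a top-down traversal maps a dense output index back to its source cell. This is an illustrative serial sketch that assumes the number of cells is a power of two; a real implementation builds the levels with parallel reductions on the GPU.

```python
def build_pyramid(counts):
    """Build reduction levels over per-cell output counts."""
    levels = [list(counts)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels  # levels[-1][0] is the total number of outputs

def locate(levels, k):
    """Find the input cell producing output element k (0-based)."""
    idx = 0
    for level in reversed(levels[:-1]):
        idx *= 2
        if k >= level[idx]:   # skip everything produced by the left child
            k -= level[idx]
            idx += 1
    return idx, k  # source cell, and offset within that cell's outputs
```

Because every output element can be located independently, one GPU thread per output element can write its triangle to a dense array with no atomics, which is the property that makes the approach fast.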


Embedded and Ubiquitous Computing | 2014

Parallel Architecture Benchmarking: From Embedded Computing to HPC, a FiPS Project Perspective

Yves Lhuillier; Jean Marc Philippe; Alexandre Guerre; Michal Kierzynka; Ariel Oleksiak

With the growing numbers of both parallel architectures and related programming models, benchmarking becomes very tricky, since parallel programming requires architecture-dependent compilers and languages as well as high programming expertise. Beyond comparing architectures with synthetic benchmarks, benchmarking is also increasingly used to design specialized systems composed of heterogeneous computing resources in order to optimize the performance or performance-per-watt ratio (e.g., embedded systems designers build Systems-on-Chip (SoC) out of dedicated, well-chosen components). In the High-Performance Computing (HPC) domain, systems are designed with symmetric and scalable computing nodes built to deliver the highest performance on a wide variety of applications. However, HPC is now facing cost and power consumption issues, which motivate the design of heterogeneous systems. This is one of the rationales of the European FiPS project, which proposes to develop a hardware architecture and a software methodology easing the design of such systems. Thus, a fair comparison between architectures while considering an application is of growing importance. Unfortunately, porting an application to all available architectures using the related programming models is impossible. To tackle this challenge, we introduced a novel methodology to evaluate and compare parallel architectures in order to ease the work of the programmer. Based on the use of micro-benchmarks, code profiling and characterization tools, this methodology introduces a semi-automatic prediction of the performance of sequential applications on a set of parallel architectures. In addition, the performance estimation is correlated with the cost of other criteria such as power or porting effort. Introduced for targeting vision-based embedded applications, our methodology is currently being extended to target more complex applications from the HPC world. This paper extends our work with new experiments and early results on a real HPC application of DNA sequencing.


Journal of Computational Science | 2014

Benchmarking JPEG 2000 implementations on modern CPU and GPU architectures

Milosz Ciznicki; Michal Kierzynka; Krzysztof Kurowski; Pawel Gepner

The use of graphics hardware for non-graphics applications has become popular among many scientific programmers and researchers, as we have observed a higher rate of theoretical performance increase than for CPUs in recent years. However, performance gains may easily be lost in the context of a specific parallel application due to various hardware and software factors. JPEG 2000 is a complex standard for data compression and coding that provides many advanced capabilities demanded by more specialized applications. There are several JPEG 2000 implementations that utilize emerging parallel architectures with built-in support for parallelism at different levels. Unfortunately, many available implementations are optimized only for a certain parallel architecture, or they do not take advantage of recent capabilities provided by modern hardware and low-level APIs. Thus, the main aim of this paper is to present a comprehensive real performance analysis of JPEG 2000. It consists of a chain of data- and compute-intensive tasks that can be treated as good examples of software benchmarks for modern parallel hardware architectures. In this paper we compare the performance results achieved by various JPEG 2000 implementations executed on selected architectures for different data sets to identify possible bottlenecks. We also discuss best practices and advice for parallel software development, to help users evaluate in advance and then select appropriate solutions to accelerate the execution of their applications.


International Conference on Conceptual Structures | 2012

Benchmarking Data and Compute Intensive Applications on Modern CPU and GPU Architectures

Milosz Ciznicki; Michal Kierzynka; Krzysztof Kurowski; Pawel Gepner

The use of graphics hardware for non-graphics applications has become popular among many scientific programmers and researchers, as we have observed a higher rate of theoretical performance increase than for CPUs in recent years. However, performance gains may easily be lost in the context of a specific parallel application due to various hardware and software factors. Consequently, software benchmarks and performance testing are still the best techniques to compare the efficiency of emerging parallel architectures with built-in support for parallelism at different levels. Unfortunately, many available benchmarks are relatively simple application kernels, have been optimized only for a certain parallel architecture, or do not take advantage of recent capabilities provided by modern hardware and low-level APIs. Thus, the main aim of this paper is to present a comprehensive real performance analysis of selected applications following the complex standard for data compression and coding, JPEG 2000. It consists of a chain of data- and compute-intensive tasks that can be treated as good examples of software benchmarks for modern parallel hardware architectures. In this paper we compare the performance results achieved by our standard-based benchmarks executed on selected architectures for different data sets to identify possible bottlenecks. We also discuss best practices and advice for parallel software development, to help users evaluate in advance and then select appropriate solutions to accelerate the execution of their applications.


Foundations of Computing and Decision Sciences | 2013

DNA Sequence Assembly Involving an Acyclic Graph Model

Jacek Blazewicz; Wojciech Frohmberg; Piotr Gawron; Marta Kasprzak; Michal Kierzynka; Aleksandra Swiercz; Paweł T. Wojciechowski

The problem of DNA sequence assembly is well known for its high complexity. Experimental errors of different kinds present in the data and the huge sizes of problem instances make this problem very hard to solve. In order to deal with such data, advanced, efficient heuristics must be constructed. Here, we propose a new approach to the sequence assembly problem, modeled as the problem of searching for paths in an acyclic digraph. Since the graph representing an assembly instance is not acyclic in general, it is heuristically transformed into acyclic form. This approach reduces the computation time significantly and allows us to maintain the high quality of the produced solutions.
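The payoff of the acyclic transformation described above is that path search in a DAG is cheap: a heaviest path can be found in linear time with a single topological-order sweep, whereas the same problem on general digraphs is NP-hard. The sketch below shows that sweep; the graph, weights and function names are illustrative, and the paper's assembly heuristics are considerably more involved.

```python
from collections import deque

def longest_path(n, edges):
    """Heaviest-path weight in a DAG with nodes 0..n-1.

    edges: list of (u, v, w) arcs with weight w.
    Uses Kahn's topological ordering, relaxing arcs as nodes are popped.
    """
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v, w in edges:
        adj[u].append((v, w))
        indeg[v] += 1
    order = deque(i for i in range(n) if indeg[i] == 0)
    best = [0] * n  # best path weight ending at each node
    result = 0
    while order:
        u = order.popleft()
        result = max(result, best[u])
        for v, w in adj[u]:
            best[v] = max(best[v], best[u] + w)
            indeg[v] -= 1
            if indeg[v] == 0:
                order.append(v)
    return result
```

In an assembly setting the arc weights would encode overlap quality between reads, and the recovered path would correspond to a contig.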


Future Generation Computer Systems | 2017

Energy efficiency of sequence alignment tools—Software and hardware perspectives

Michal Kierzynka; Lars Kosmann; Micha vor dem Berge; Stefan Krupop; Jens Hagemeyer; René Griessl; Meysam Peykanu; Ariel Oleksiak

Pairwise sequence alignment is ubiquitous in modern bioinformatics. It may be performed either explicitly, e.g. to find the most similar sequences in a database, or implicitly, as a hidden building block of more complex methods, e.g. for read mapping. Alignment algorithms have been widely investigated over the last few years, mainly with respect to their speed. However, no attention has been given to their energy efficiency, which is becoming critical in high performance computing and cloud environments. We compare the energy efficiency of the most established software tools performing exact pairwise sequence alignment on various computational architectures: CPU, GPU and Intel Xeon Phi. The results show that energy consumption may differ by a factor of nearly 5. Substantial differences are reported even for different implementations running on the same hardware. Moreover, we present an FPGA implementation of one of the tested tools, G-DNA, and show how it outperforms all the others in terms of energy efficiency. Finally, some details regarding the special RECS® | Box servers used in our study are outlined. This hardware is designed and manufactured within the FiPS project by Bielefeld University and christmann informationstechnik + medien, with the special purpose of delivering a highly heterogeneous computational environment supporting energy efficiency and green ICT.
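A common way to express the kind of comparison the abstract describes is performance per watt, here GCUPS/W (billions of cell updates per second per watt), combining the GCUPS throughput metric used for alignment tools with measured average power. The numbers below are made up for illustration and are not measurements from the paper.

```python
def gcups(cell_updates, seconds):
    """Throughput in GCUPS: billions of DP-matrix cell updates per second."""
    return cell_updates / seconds / 1e9

def gcups_per_watt(cell_updates, seconds, avg_power_w):
    """Energy efficiency: throughput divided by average power draw."""
    return gcups(cell_updates, seconds) / avg_power_w

# Hypothetical run: 4.5e12 cell updates in 30 s at 150 W average power.
perf = gcups(4.5e12, 30.0)                   # 150.0 GCUPS
eff = gcups_per_watt(4.5e12, 30.0, 150.0)    # 1.0 GCUPS/W
```

Dividing two such efficiency figures gives the kind of "factor of nearly 5" gap between tools that the paper reports.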

Collaboration


Top co-authors of Michal Kierzynka:

Jacek Blazewicz, Poznań University of Technology
Wojciech Frohmberg, Poznań University of Technology
Ariel Oleksiak, Poznań University of Technology
Milosz Ciznicki, Poznań University of Technology
Paweł T. Wojciechowski, Poznań University of Technology