Tomáš Brzobohatý
Technical University of Ostrava
Publications
Featured research published by Tomáš Brzobohatý.
Journal of Computational and Applied Mathematics | 2010
Zdenek Dostál; Tomáš Kozubek; Petr Horyl; Tomáš Brzobohatý; Alexandros Markopoulos
A Total FETI (TFETI) based domain decomposition algorithm with preconditioning by a natural coarse grid of rigid body motions is adapted to the solution of two-dimensional multibody contact problems of elasticity with Coulomb friction and proved to be scalable for Tresca friction. The algorithm finds an approximate solution at a cost asymptotically proportional to the number of variables, provided the ratio of the decomposition parameter to the discretization parameter is bounded. The analysis is based on the classical results by Farhat, Mandel, and Roux on the scalability of FETI with a natural coarse grid for linear problems and on our development of optimal quadratic programming algorithms for bound- and equality-constrained problems. The algorithm preserves the parallel scalability of the classical FETI method. Both theoretical results and numerical experiments indicate the high efficiency of our algorithm. In addition, its performance is illustrated on the analysis of a yielding clamp connection with Coulomb friction.
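The optimality claim can be stated compactly (a sketch using the common H/h notation for the decomposition and discretization parameters; this is standard FETI notation, not quoted from the paper):

```latex
\text{cost} = \mathcal{O}(n)
\quad\text{provided}\quad
\frac{H}{h} \le C ,
```

where $n$ is the number of variables, $H$ the decomposition (subdomain size) parameter, $h$ the discretization (mesh size) parameter, and $C$ a constant independent of the problem size.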
Proceedings of the Platform for Advanced Scientific Computing Conference on | 2016
Lubomír Říha; Tomáš Brzobohatý; Alexandros Markopoulos; Ondřej Meca; Tomáš Kozubek
This paper describes the Hybrid Total FETI (HTFETI) method and its parallel implementation in the ESPRESO library. HTFETI is a variant of the FETI type domain decomposition method in which a small number of neighboring subdomains is aggregated into clusters. This can also be viewed as a multilevel decomposition approach, which results in a smaller coarse problem - the main scalability bottleneck of the FETI and FETI-DP methods. The efficiency of our implementation, which employs hybrid parallelization in the form of MPI and Cilk++, is evaluated using both weak and strong scalability tests. The weak scalability of the solver is shown on a three-dimensional linear elasticity problem of size up to 30 billion degrees of freedom (DOF) executed on 4096 compute nodes. The strong scalability is evaluated on a problem of size 2.6 billion DOF scaled from 1000 to 4913 compute nodes. The results show super-linear scaling of the single-iteration time and linear scalability of the solver runtime. The latter combines both numerical and parallel scalability and shows the overall HTFETI solver performance. The large-scale tests use our own parallel synthetic benchmark generator, which is also described in the paper. The last set of results shows that HTFETI is very efficient for problems of size up to 1.7 billion DOF and provides a better time to solution compared to the TFETI method.
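The coarse-problem reduction from clustering can be illustrated with a back-of-the-envelope calculation (a sketch: the 6 rigid-body modes per floating subdomain in 3D elasticity are standard, but the subdomain count and cluster size below are illustrative assumptions, not figures from the paper):

```python
# Sketch: coarse-problem size in TFETI vs. HTFETI (illustrative parameters).
# In 3D elasticity each floating subdomain contributes 6 rigid-body modes to
# the coarse problem; HTFETI aggregates subdomains into clusters so only the
# clusters contribute to the *global* coarse problem.

def coarse_size(n_units, modes_per_unit=6):
    """Number of coarse-problem unknowns for n_units floating units."""
    return n_units * modes_per_unit

n_subdomains = 100_000
cluster_size = 100                       # subdomains per cluster (assumption)
n_clusters = n_subdomains // cluster_size

tfeti = coarse_size(n_subdomains)        # TFETI: every subdomain contributes
htfeti = coarse_size(n_clusters)         # HTFETI: only clusters contribute

print(tfeti, htfeti)  # 600000 6000
```

The global coarse problem shrinks by the cluster size, which is exactly why the coarse-problem bottleneck of FETI/FETI-DP is relaxed.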
parallel computing | 2016
Lubomír Říha; Tomáš Brzobohatý; Alexandros Markopoulos; Marta Jarošová; Tomáš Kozubek; David Horák; Václav Hapla
Highlights:
- Implementation, performance, and scalability results of the communication layer for the Total FETI and Hybrid Total FETI solvers.
- In HTFETI, several neighboring subdomains are aggregated into clusters. This reduces the size of the coarse problem and improves scalability.
- Optimization of the nearest-neighbor communication with the global gluing matrix.
- Implementation of communication-hiding and communication-avoiding techniques inside the communication layer.
- Benchmarks: an elastic 3D cube with up to 1.6 billion DOF and a realistic car engine benchmark.
- Large tests executed with Total FETI to show the real potential of the communication layer on smaller clusters.
This paper describes the implementation, performance, and scalability of our communication layer developed for the Total FETI (TFETI) and Hybrid Total FETI (HTFETI) solvers. HTFETI is based on our variant of the Finite Element Tearing and Interconnecting (FETI) type domain decomposition method. In this approach a small number of neighboring subdomains is aggregated into clusters, which results in a smaller coarse problem. To solve the original problem, the TFETI method is applied twice: to the clusters and then to the subdomains in each cluster. The current implementation of the solver focuses on the performance optimization of the main CG iteration loop, including: implementation of communication-hiding and communication-avoiding techniques for global communications; optimization of the nearest-neighbor communication (multiplication with a global gluing matrix); and optimization of the parallel CG algorithm to iterate over local Lagrange multipliers only. The performance is demonstrated on a linear elasticity 3D cube and real-world benchmarks.
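The reason the gluing-matrix multiplication needs only nearest-neighbor communication can be sketched in a few lines: each Lagrange multiplier couples one interface DOF of exactly two subdomains, so applying B reduces to point-to-point differences rather than a global operation (a serial toy model with illustrative names and data; the actual solver distributes this over MPI ranks):

```python
import numpy as np

# Sketch: action of a FETI gluing matrix B as nearest-neighbour differences.
# Each row of B (one Lagrange multiplier) couples one interface DOF of two
# subdomains: (B u)_k = u_i[p] - u_j[q].  All names and values illustrative.

def apply_gluing(u, pairs):
    """u: dict subdomain -> local solution vector.
    pairs: list of ((sub_i, idx_i), (sub_j, idx_j)) interface couplings."""
    return np.array([u[i][p] - u[j][q] for (i, p), (j, q) in pairs])

u = {0: np.array([1.0, 2.0]), 1: np.array([2.0, 5.0])}
pairs = [((0, 1), (1, 0))]          # subdomains 0 and 1 share one DOF
print(apply_gluing(u, pairs))       # [0.] -> interface values match
```

Because each coupling involves only two subdomains, a distributed implementation exchanges data only between neighboring ranks, which is the nearest-neighbor optimization the highlights refer to.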
Numerical Linear Algebra With Applications | 2015
Zdenek Dostál; Tomáš Kozubek; Oldrich Vlach; Tomáš Brzobohatý
A cheap symmetric stiffness-based preconditioning of the Hessian of the dual problem arising from the application of the finite element tearing and interconnecting domain decomposition to the solution of variational inequalities with varying coefficients is proposed. The preconditioning preserves the structure of the inequality constraints and affects both the linear and nonlinear steps, so that it can improve the rate of convergence of algorithms that exploit conjugate gradient steps or gradient projection steps. Bounds on the regular condition number of the Hessian of the preconditioned problem, which are independent of the coefficients, are given. The related stiffness scaling is also considered and analysed. The improvement is demonstrated by numerical experiments, including the solution of a contact problem with a variationally consistent discretization of the non-penetration conditions. The results are relevant also for linear problems.
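The setting can be summarized in standard FETI notation (a sketch, not the paper's exact formulation): the dual problem is a bound-constrained QP,

```latex
\min_{\lambda}\ \tfrac12\,\lambda^{\top} F \lambda - \lambda^{\top} d
\quad\text{s.t.}\quad \lambda_{\mathcal{I}} \ge 0 ,
\qquad F = B K^{+} B^{\top} ,
```

where $K^{+}$ is a generalized inverse of the stiffness matrix, $B$ the gluing matrix, and $\mathcal{I}$ the indices of the inequality (non-penetration) multipliers. Substituting $\lambda = T\mu$ with a diagonal $T \succ 0$ built from the stiffness yields the preconditioned Hessian $T F T$ while the constraints $\mu_{\mathcal{I}} \ge 0$ keep their bound form, which is one way to read "preserves the structure of the inequality constraints."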
Advances in Engineering Software | 2017
Lubomír Říha; Tomáš Brzobohatý; Alexandros Markopoulos
Highlights:
- Hybrid parallelization of the Finite Element Tearing and Interconnecting (FETI) method.
- Performance comparison of the hybrid parallelization to MPI-only parallelization.
- TFETI implementation for better utilization of multi-core computer clusters.
This paper describes our new hybrid parallelization of the Finite Element Tearing and Interconnecting (FETI) method for multi-socket and multi-core computer clusters. This is an essential step in our development of the Hybrid FETI solver, in which a small number of neighboring subdomains is aggregated into clusters and each cluster is processed by a single compute node. In our previous work we implemented a FETI solver using MPI parallelization in our ESPRESO solver. The proposed hybrid implementation provides better utilization of the resources of modern HPC machines using advanced shared-memory runtime systems such as the Cilk++ runtime, an alternative to OpenMP that is used by ESPRESO for shared-memory parallelization. We have compared the performance of the hybrid parallelization to MPI-only parallelization. The results show that we have reduced both solver runtime and memory utilization. This allows the solver to use a larger number of smaller subdomains in order to solve larger problems using a limited number of compute nodes, a feature that is essential for users with smaller computer clusters. In addition, we have evaluated this approach with large-scale benchmarks of size up to 1.3 billion unknowns to show that the hybrid parallelization also reduces the runtime of the FETI solver for these types of problems.
ieee international conference on high performance computing data and analytics | 2015
Lubomír Říha; Tomáš Brzobohatý; Alexandros Markopoulos; Tomáš Kozubek; Ondřej Meca; Olaf Schenk; Wim Vanroose
This paper presents a new approach developed for the acceleration of FETI solvers by graphics processing units (GPUs) using the Schur complement (SC) technique. By using SCs, FETI solvers can avoid working with the sparse Cholesky decomposition of the stiffness matrices. Instead, a dense structure in the form of the SC is computed and used by the conjugate gradient (CG) solver. In every iteration of the CG solver, the forward and backward substitutions, which are sequential, are replaced by the highly parallel general matrix-vector multiplication (GEMV) routine. This results in a 4.1 times speedup when a Tesla K20X GPU accelerator is used and its performance is compared to a single 16-core AMD Opteron 6274 (Interlagos) CPU.
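The Schur complement trick can be sketched with dense linear algebra (a toy model with illustrative sizes and random SPD data; the actual solver works with sparse stiffness matrices and GPU GEMV):

```python
import numpy as np

# Sketch of the Schur-complement (SC) technique.  Partition a local SPD
# stiffness matrix K into interior (i) and boundary (b) blocks.  The dense
# SC  S = K_bb - K_bi K_ii^{-1} K_ib  condenses the system onto boundary
# unknowns, so each CG iteration can use a dense mat-vec (GEMV, highly
# parallel on a GPU) instead of sequential forward/backward substitution
# with a sparse Cholesky factor.

rng = np.random.default_rng(0)
n_i, n_b = 6, 3
A = rng.standard_normal((n_i + n_b, n_i + n_b))
K = A @ A.T + (n_i + n_b) * np.eye(n_i + n_b)   # SPD test matrix
K_ii, K_ib = K[:n_i, :n_i], K[:n_i, n_i:]
K_bi, K_bb = K[n_i:, :n_i], K[n_i:, n_i:]

S = K_bb - K_bi @ np.linalg.solve(K_ii, K_ib)   # dense Schur complement

# Sanity check: for SPD K, the boundary block of K^{-1} equals S^{-1},
# so applying S^{-1} is equivalent to solving with K restricted to the
# boundary unknowns.
assert np.allclose(np.linalg.inv(K)[n_i:, n_i:], np.linalg.inv(S))
```

The trade-off is extra memory and setup cost for the dense S in exchange for replacing the sequential triangular solves in each CG iteration with one GEMV.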
Computers & Mathematics With Applications | 2014
Zdeněk Dostál; Tomáš Brzobohatý; David Horák; Tomáš Kozubek; Petr Vodstrčil
New convergence results for a variant of the inexact augmented Lagrangian algorithm SMALBE [Z. Dostal, An optimal algorithm for bound and equality constrained quadratic programming problems with bounded spectrum, Computing 78 (2006) 311-328] for the solution of strictly convex bound and equality constrained quadratic programming problems are presented. The algorithm SMALBE-M presented here uses a fixed regularization parameter and controls the precision of the solution of the auxiliary bound constrained problems by a multiple of the norm of the violation of the equality constraints and a constant which is updated in order to enforce the increase of the Lagrangian function. A nice feature of SMALBE-M is its capability to find an approximate solution of important classes of problems in a number of iterations that is independent of the conditioning of the equality constraints. Here we prove the R-linear rate of convergence of the outer loop of SMALBE-M for any positive regularization parameter after the strongly active constraints of the solution are identified. The theoretical results are illustrated by solving two benchmarks, including a contact problem of elasticity discretized by two million nodal variables. The numerical experiments indicate that the inexact solution of the auxiliary problems in the inner loop results in a very small increase in the number of outer iterations compared with the exact algorithm. The results do not assume independent equality constraints and remain valid when the solution is dual degenerate.
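The outer loop of an augmented Lagrangian method of this type can be sketched on a tiny equality-constrained QP (a simplification for illustration: the bound constraints, the adaptive inner-precision control, and the constant update of the actual SMALBE-M algorithm are omitted, and the inner problem is solved exactly; the data are made up):

```python
import numpy as np

# Sketch: augmented-Lagrangian outer loop for  min 1/2 x'Ax - b'x
# s.t. Cx = 0, with a *fixed* regularization (penalty) parameter rho,
# as in SMALBE-M.  Inner minimization is done exactly by a linear solve.

def aug_lagrangian_qp(A, b, C, rho=10.0, iters=50, tol=1e-10):
    mu = np.zeros(C.shape[0])          # Lagrange multiplier estimate
    H = A + rho * C.T @ C              # Hessian of the augmented Lagrangian
    for _ in range(iters):
        x = np.linalg.solve(H, b - C.T @ mu)   # inner (unconstrained) solve
        if np.linalg.norm(C @ x) < tol:        # equality-violation test
            break
        mu = mu + rho * (C @ x)        # first-order multiplier update
    return x

A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([1.0, 1.0])
C = np.array([[1.0, -1.0]])            # constraint: x0 = x1
x = aug_lagrangian_qp(A, b, C)
print(x)                               # approx [1/3, 1/3]
```

Keeping rho fixed (rather than driving it to infinity) and controlling the inner precision by the constraint violation is what makes the outer iteration count insensitive to the conditioning of the equality constraints.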
ieee international conference on high performance computing data and analytics | 2017
Ondřej Meca; Lubomír Říha; Alexandros Markopoulos; Tomáš Brzobohatý; Tomáš Kozubek
ESPRESO is a FEM package that includes a Hybrid Total FETI (HTFETI) linear solver targeted at solving large-scale engineering problems. The scalability of the solver was tested on several of the world's largest supercomputers. To make our scalable implementation of the HTFETI algorithms available to all potential users, a simple C API was developed and is presented. The paper describes the API methods and the compilation and linking process.
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON NUMERICAL ANALYSIS AND APPLIED MATHEMATICS 2014 (ICNAAM-2014) | 2015
Lubomír Říha; Tomáš Brzobohatý; Alexandros Markopoulos; Marta Jarošová; Tomáš Kozubek
We describe the implementation, performance, and scalability of a hybrid FETI (Finite Element Tearing and Interconnecting) solver based on our variant of the FETI type domain decomposition method called Total FETI. In our approach a small number of neighboring subdomains is aggregated into clusters, which results in a smaller coarse problem. To solve the original problem, the Total FETI method is applied twice: to the clusters and then to the subdomains in each cluster. The current implementation of the solver focuses on the performance optimization of the main CG iteration loop, including: implementation of communication-hiding and communication-avoiding techniques for global communications; optimization of the nearest-neighbor communication (multiplication with the global gluing matrix); and optimization of the parallel CG algorithm to iterate over local Lagrange multipliers only. The performance is demonstrated on a synthetic 3D linear elasticity cube and real-world benchmarks.
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON NUMERICAL ANALYSIS AND APPLIED MATHEMATICS 2014 (ICNAAM-2014) | 2015
Tomáš Brzobohatý; Lubomír Říha; Tomas Karasek; Tomáš Kozubek
This article presents the application of the Open Source Field Operation and Manipulation (OpenFOAM) C++ libraries to solving engineering problems on many-core architectures. The objective is to present the scalability of OpenFOAM on parallel platforms when solving real engineering problems in fluid dynamics. Scalability tests of OpenFOAM are performed using various hardware and different implementations of the standard PCG and PBiCG Krylov iterative methods. Speedups of various implementations of the linear solvers using GPU and MIC accelerators are presented. Numerical experiments on 3D lid-driven cavity flow for several cases with various numbers of cells are presented.
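For reference, the PCG method the benchmarks are built around can be sketched in a few lines (a minimal serial sketch with a diagonal Jacobi preconditioner and made-up data; this is not OpenFOAM's implementation, which offers other preconditioners such as DIC):

```python
import numpy as np

# Minimal preconditioned conjugate gradient (PCG) for SPD systems Ax = b,
# with a diagonal (Jacobi) preconditioner M = diag(A).

def pcg(A, b, tol=1e-10, maxiter=1000):
    M_inv = 1.0 / np.diag(A)           # Jacobi preconditioner, applied as M^-1 r
    x = np.zeros_like(b)
    r = b - A @ x                      # initial residual
    z = M_inv * r                      # preconditioned residual
    p = z.copy()                       # initial search direction
    rz = r @ z
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rz / (p @ Ap)          # step length
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:    # converged
            break
        z = M_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p      # update search direction
        rz = rz_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = pcg(A, b)
print(x)                               # approx solution of Ax = b
```

The per-iteration work is one sparse mat-vec, a few vector updates, and the preconditioner application, which is why mat-vec throughput on GPU/MIC accelerators dominates the speedups reported above.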