Marcin Krotkiewski | Researchain

Publication

Featured research published by Marcin Krotkiewski.

Geochemistry Geophysics Geosystems | 2008

MILAMIN: MATLAB-based finite element method solver for large problems

Marcin Dabrowski; Marcin Krotkiewski; Daniel W. Schmid

The finite element method (FEM) combined with unstructured meshes forms an elegant and versatile approach capable of dealing with the complexities of problems in Earth science. Practical applications often require high-resolution models that necessitate advanced computational strategies. We therefore developed “Million a Minute” (MILAMIN), an efficient MATLAB implementation of FEM that is capable of setting up, solving, and postprocessing two-dimensional problems with one million unknowns in one minute on a modern desktop computer. MILAMIN allows the user to achieve numerical resolutions that are necessary to resolve the heterogeneous nature of geological materials. In this paper we provide the technical knowledge required to develop such models without the need to buy a commercial FEM package, program in a compiled language, or hire a computer specialist. It has been our special aim that all the components of MILAMIN perform efficiently, individually and as a package. While some of the components rely on readily available routines, we develop others from scratch and make sure that all of them work together efficiently. One of the main technical focuses of this paper is the optimization of the global matrix computations. The performance bottlenecks of the standard FEM algorithm are analyzed. An alternative approach is developed that sustains high performance for any system size. Applied optimizations eliminate Basic Linear Algebra Subprograms (BLAS) drawbacks when multiplying small matrices, reduce operation count and memory requirements when dealing with symmetric matrices, and increase data transfer efficiency by maximizing cache reuse. Applying loop interchange allows us to use BLAS on large matrices. In order to avoid unnecessary data transfers between RAM and CPU cache we introduce loop blocking. The optimization techniques are useful in many areas as demonstrated with our MILAMIN applications for thermal and incompressible flow (Stokes) problems. We use these to provide performance comparisons to other open source as well as commercial packages and find that MILAMIN is among the best performing solutions, in terms of both speed and memory usage. The corresponding MATLAB source code for the entire MILAMIN, including input generation, FEM solver, and postprocessing, is available from the authors (http://www.milamin.org) and can be downloaded as auxiliary material.
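The loop structure named in the abstract above (loop interchange, loop blocking, symmetric storage) can be sketched on a toy problem. The following is a minimal illustration only, not MILAMIN's MATLAB code: it assumes 1D two-node conduction elements, and the function name `element_matrices_blocked` and its parameters are hypothetical. In pure Python the blocking brings no speed benefit; the sketch only shows the ordering of the loops.

```python
def element_matrices_blocked(conductivity, length, block_size=4):
    """Upper-triangle element matrices for 1D two-node conduction elements.

    Hypothetical helper for illustration; not part of MILAMIN.
    """
    nel = len(conductivity)
    # The 2x2 element matrix is symmetric, so only the entries
    # (1,1), (1,2) and (2,2) are stored.
    k11 = [0.0] * nel
    k12 = [0.0] * nel
    k22 = [0.0] * nel
    # One-point Gauss rule on the reference element [-1, 1]: weight 2.
    gauss_weights = [2.0]
    # Loop blocking: elements are processed in fixed-size blocks so that
    # the data of one block can stay in cache while it is being used.
    for start in range(0, nel, block_size):
        stop = min(start + block_size, nel)
        # Loop interchange: the integration-point loop is the outer loop,
        # the loop over all elements of the block is the inner loop.
        for w in gauss_weights:
            for e in range(start, stop):
                h = length[e]
                jac = h / 2.0                  # Jacobian of the mapping
                dn1, dn2 = -1.0 / h, 1.0 / h   # shape function derivatives
                scale = w * jac * conductivity[e]
                k11[e] += scale * dn1 * dn1
                k12[e] += scale * dn1 * dn2
                k22[e] += scale * dn2 * dn2
    return k11, k12, k22


k11, k12, k22 = element_matrices_blocked([1.0, 2.0, 4.0], [0.5, 0.5, 2.0],
                                         block_size=2)
print(k11, k12, k22)
```

Each element matrix reduces to k/h times [[1, -1], [-1, 1]], and the result does not depend on the chosen block size.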

Parallel Computing | 2010

Parallel symmetric sparse matrix-vector product on scalar multi-core CPUs

Marcin Krotkiewski; Marcin Dabrowski

We present a massively parallel implementation of symmetric sparse matrix-vector product for modern clusters with scalar multi-core CPUs. Matrices with highly variable structure and density arising from unstructured three-dimensional FEM discretizations of mechanical and diffusion problems are studied. A metric of the effective memory bandwidth is introduced to analyze the impact on performance of a set of simple, well-known optimizations: matrix reordering, manual prefetching, and blocking. A modification to the CRS storage improving the performance on multi-core Opterons is shown. The performance of an entire SMP blade rather than the per-core performance is optimized. Even for the simplest 4-node mechanical element our code utilizes close to 100% of the per-blade available memory bandwidth. We show that reducing the storage requirements for symmetric matrices results in a roughly twofold speedup. Blocking brings further storage savings and a proportional performance increase. Our results are compared to existing state-of-the-art implementations of SpMV, and to the dense BLAS2 performance. Parallel efficiency on 5400 Opteron cores of the Cray XT4 cluster is around 80-90% for problems with approximately 25^3 mesh nodes per core. For a problem with 820 million degrees of freedom the code runs with a sustained performance of 5.2 TeraFLOPs, over 20% of the theoretical peak.
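The storage saving for symmetric matrices mentioned above can be illustrated with a serial sketch. It assumes plain CRS (compressed row storage) holding only the diagonal and upper triangle; the function name `symmetric_spmv` is hypothetical, and the sketch does not include the authors' modified CRS layout, prefetching, blocking, or parallelization.

```python
def symmetric_spmv(n, row_ptr, col_idx, values, x):
    """Compute y = A x for a symmetric A stored as upper-triangle CRS.

    Hypothetical helper for illustration; serial and unoptimized.
    """
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for p in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[p]
            a = values[p]
            acc += a * x[j]        # stored entry A[i][j]
            if j != i:
                y[j] += a * x[i]   # mirrored entry A[j][i], never stored
        y[i] += acc
    return y


# A = [[4, 1, 0],
#      [1, 3, 2],
#      [0, 2, 5]]  stored as diagonal plus upper triangle only.
row_ptr = [0, 2, 4, 5]
col_idx = [0, 1, 1, 2, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
print(symmetric_spmv(3, row_ptr, col_idx, values, [1.0, 2.0, 3.0]))
# prints [6.0, 13.0, 19.0]
```

Each off-diagonal value is read from memory once and used twice, which is why halving the stored matrix data can translate into a speedup when the kernel is limited by memory bandwidth.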

Parallel Computing | 2013

Efficient 3D stencil computations using CUDA

Marcin Krotkiewski; Marcin Dabrowski

We present an efficient implementation of 7-point and 27-point stencils on high-end Nvidia GPUs. A new method of reading the data from the global memory to the shared memory of thread blocks is developed. The method avoids conditional statements and requires only two coalesced instructions to load the tile data with the halo (ghost zone). Additional optimizations include storing only one XY tile of data at a time in the shared memory to lower shared memory requirements, common subexpression elimination to reduce the number of instructions, and software prefetching to overlap arithmetic and memory instructions, and enhance latency hiding. The efficiency of our implementation is analyzed using a simple stencil memory footprint model that takes into account the actual halo overhead due to the minimum memory transaction size on the GPUs. Through experiments we demonstrate that in our implementation the memory overhead due to the halos is largely eliminated by good reuse of the halo data in the memory caches, and that our method of reading the data is close to optimal in terms of memory bandwidth usage. Detailed performance analysis for single precision stencil computations, and performance results for single and double precision arithmetic on two Tesla cards are presented. Our stencil implementations are more efficient than any other implementation described in the literature to date. On Tesla C2050 with single and double precision arithmetic our 7-point stencil achieves an average throughput of 12.3 and 6.5 Gpts/s, respectively (98 GFLOP/s and 52 GFLOP/s, respectively). The symmetric 27-point stencil sustains a throughput of 10.9 and 5.8 Gpts/s, respectively.
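For readers unfamiliar with the operation itself, a 7-point stencil can be written as a plain serial reference. This is a sketch only, with none of the GPU techniques described above (shared-memory tiles, halo loading, prefetching); the function name `stencil7` and the two-coefficient form (one weight for the centre, one shared by the six neighbours) are assumptions. That form costs 8 floating-point operations per grid point, which is consistent with the quoted figures (12.3 Gpts/s at 98 GFLOP/s).

```python
FLOPS_PER_POINT = 8  # 5 adds for the neighbour sum, 2 multiplies, 1 add


def stencil7(u, nx, ny, nz, c0, c1):
    """Apply a 7-point stencil to the interior of a flat nx*ny*nz grid.

    Hypothetical serial reference; boundary points are copied unchanged.
    """
    def idx(i, j, k):
        # x is the fastest-varying index, z the slowest.
        return (k * ny + j) * nx + i

    out = list(u)
    for k in range(1, nz - 1):
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                neighbours = (u[idx(i - 1, j, k)] + u[idx(i + 1, j, k)]
                              + u[idx(i, j - 1, k)] + u[idx(i, j + 1, k)]
                              + u[idx(i, j, k - 1)] + u[idx(i, j, k + 1)])
                out[idx(i, j, k)] = c0 * u[idx(i, j, k)] + c1 * neighbours
    return out


# With c0 = -6 and c1 = 1 the stencil is a discrete Laplacian,
# so a constant field maps to zero at every interior point.
u = [1.0] * 27
out = stencil7(u, 3, 3, 3, -6.0, 1.0)
print(out[13])  # centre of the 3x3x3 grid
```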

12th European Conference on the Mathematics of Oil Recovery | 2010

On the Stokes-Brinkman Equations for Modeling Flow in Carbonate Reservoirs

I. Ligaarden; Marcin Krotkiewski; K.A. Lie; M. Pal; Daniel W. Schmid

Cavities and fractures can significantly affect the flow paths of carbonate reservoirs and should be accurately accounted for during flow simulation. Herein, our goal is to compute the effective permeability of rock samples based on high-resolution 3D CT-scans containing millions of voxels. Hence, we need a flow model that properly accounts for the effects of Darcy flow in the porous material and Stokes flow in the void volumes on relevant scales. The presence of different length scales and large contrasts in the petrophysical parameters lead to highly ill-conditioned linear systems that make such a flow model very difficult to solve, even on large-scale parallel computers. To identify simplifications that render the problem computationally tractable, we analyze the relative importance of the Stokes and Darcy terms for a wide variety of parameter ranges on an idealized 2D model. We find that a system with a through-going free flow region surrounded by a low permeable matrix can be accurately modeled by ignoring the Darcy matrix and simulating only the Stokes flow. Using this insight, we are able to compute the effective permeability of a specific model from a CT-scan that contains more than eight million voxels.

Physical Review E | 2016

Reynolds-number dependence of the longitudinal dispersion in turbulent pipe flow.

Christopher Hawkins; Luiza Angheluta; Marcin Krotkiewski; Bjørn Jamtveit

In Taylor's theory, the longitudinal dispersion in turbulent pipe flows approaches, on long time scales, a diffusive behavior with a constant diffusivity K_{L}, which depends empirically on the Reynolds number Re. We show that the dependence on Re can be determined from the turbulent energy spectrum. By using the intimate connection between the friction factor and the longitudinal dispersion in wall-bounded turbulence, we predict different asymptotic scaling laws of K_{L}(Re) depending on the different turbulent cascades in two-dimensional turbulence. We also explore numerically the K_{L}(Re) dependence in turbulent channel flows with smooth and rough walls using a lattice Boltzmann method.

73rd EAGE Conference and Exhibition - Workshops 2011 | 2011

3D Fold Pattern Formation

Daniel W. Schmid; Marcin Dabrowski; Marcin Krotkiewski

Folds on all scales from millimeters to kilometers can be the result of the mechanical instability that arises when a mechanically stratified system is subjected to layer-parallel compression. While the resulting fold patterns are three-dimensional, their geometries are often simplified by assuming that there is no shape variation in the third dimension. This facilitates the analysis and has resulted in a large number of studies that investigate the folding instability for a variety of rheologies: viscous, power-law, and anisotropic. Studies of three-dimensional folding have mostly focused on analog models and geometric models. The latter led to the development of fold shape classification tools, in particular fold interference pattern classification (e.g., Grasemann et al. 2004, O'Driscoll 1962, Ramsay 1967, Thiessen and Means 1980) and geometrical analysis based on differential geometry (e.g., Lisle and Toimil 2007, Mynatt et al. 2007). The theoretical aspects of the mechanics of folding in three-dimensional geological systems are analyzed in only a few papers: Fletcher (1991, 1995), Ghosh (1970), Kaus and Schmalholz (2006), Muhlhaus (1998), and Schmid et al. (2008). The aim of this paper is to study the evolution of fold patterns that emerge out of randomly perturbed layers for different loading conditions. In order to be able to statistically analyze such systems where many folds interact, a large number of folds is required and consequently large numerical resolutions (on the order of 100,000,000 unknowns). We describe the developed numerical model and analyze the fold patterns using differential geometry. The obtained results indicate that the (Gaussian curvature based) aspect ratio of folds in map view may be used to infer the relative strength of the two principal in-plane loads.

Marine and Petroleum Geology | 2009