Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Ivan Šimeček is active.

Publication


Featured research published by Ivan Šimeček.


Symbolic and Numeric Algorithms for Scientific Computing | 2009

Sparse Matrix Computations Using the Quadtree Storage Format

Ivan Šimeček

Computations with sparse matrices are widespread in scientific projects. The data storage format strongly affects performance. Efficient formats for storing sparse matrices are still under development: computation with widely used formats (such as XY or CSR) is slow, and specialized formats (such as SPARSITY or CARB) have a large transformation overhead. In this paper, we present some improvements to the quadtree storage format. We also compare the performance of some basic linear algebra routines executed with widely used formats and with the quadtree storage format.
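The quadtree idea can be sketched in a few lines. The following is a toy illustration only, with hypothetical helper names, dense leaf blocks, and a power-of-two matrix dimension assumed; the paper's actual format differs in detail:

```python
def build_quadtree(mat, r0, c0, size, leaf=2):
    """Recursively subdivide mat[r0:r0+size, c0:c0+size] into quadrants.
    Empty quadrants are stored as None; small ones as dense leaves."""
    block = [row[c0:c0 + size] for row in mat[r0:r0 + size]]
    if all(v == 0 for row in block for v in row):
        return None                      # empty subtree: no storage needed
    if size <= leaf:
        return block                     # dense leaf block
    h = size // 2
    return [build_quadtree(mat, r0,     c0,     h, leaf),   # NW
            build_quadtree(mat, r0,     c0 + h, h, leaf),   # NE
            build_quadtree(mat, r0 + h, c0,     h, leaf),   # SW
            build_quadtree(mat, r0 + h, c0 + h, h, leaf)]   # SE

def spmv_quadtree(node, x, y, r0, c0, size, leaf=2):
    """Accumulate y += A_node * x restricted to the node's submatrix."""
    if node is None:
        return                           # empty quadrant contributes nothing
    if size <= leaf:                     # dense leaf: plain multiply
        for i, row in enumerate(node):
            for j, v in enumerate(row):
                y[r0 + i] += v * x[c0 + j]
        return
    h = size // 2
    spmv_quadtree(node[0], x, y, r0,     c0,     h, leaf)
    spmv_quadtree(node[1], x, y, r0,     c0 + h, h, leaf)
    spmv_quadtree(node[2], x, y, r0 + h, c0,     h, leaf)
    spmv_quadtree(node[3], x, y, r0 + h, c0 + h, h, leaf)
```

Empty quadrants cost nothing to store or traverse, which is the source of the format's appeal for very sparse matrices.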


Parallel Processing and Applied Mathematics | 2007

Sparse matrix-vector multiplication - final solution?

Ivan Šimeček; Pavel Tvrdík

Algorithms for sparse matrix-vector multiplication (SpM×V for short) are important building blocks in solvers of sparse systems of linear equations. Due to matrix sparsity, the memory access patterns are irregular and cache utilization suffers from low spatial and temporal locality. To reduce this effect, register blocking formats were designed. This paper introduces a new combined format for storing sparse matrices that extends the possibilities of the variable-sized register blocking format.
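For reference, the widely used CSR baseline that blocking formats compete with can be sketched as follows (a minimal, unoptimized version; the array names are illustrative):

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """y = A*x for A stored in CSR. The indirect x[col_idx[k]] loads
    are exactly the irregular accesses that hurt cache locality."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y[i] = s
    return y
```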


International Conference on Parallel Processing | 2003

Analytical Modeling of Optimized Sparse Linear Code

Pavel Tvrdík; Ivan Šimeček

In this paper, we describe source code transformations based on software pipelining, loop unrolling, and loop fusion for sparse matrix-vector multiplication and for the Conjugate Gradient algorithm that enable data prefetching, overlap load and FPU arithmetic instructions, and improve temporal cache locality. We develop a probabilistic model for estimating the number of cache misses for three types of data caches: direct mapped and s-way set associative with random and with LRU replacement strategies. Using HW cache monitoring tools, we compare the predicted numbers of cache misses with real numbers on the Intel x86 architecture with L1 and L2 caches. The accuracy of our analytical model is around 97%; the estimation errors are due to minor simplifying assumptions in our model.
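The kind of miss counts such a model is validated against can be mimicked with a toy direct-mapped cache simulator. This is only an illustrative stand-in for hardware cache monitoring, not the paper's analytical model:

```python
def direct_mapped_misses(addresses, num_sets, line_size):
    """Count misses of a direct-mapped cache for a stream of byte
    addresses: each set holds one line tag, replaced on conflict."""
    tags = [None] * num_sets
    misses = 0
    for a in addresses:
        line = a // line_size        # which cache line the address maps to
        s = line % num_sets          # which set (direct mapped: one slot)
        if tags[s] != line:
            misses += 1
            tags[s] = line
    return misses
```

A sequential scan shows one compulsory miss per line, while two addresses mapping to the same set thrash and miss on every access.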


Symbolic and Numeric Algorithms for Scientific Computing | 2006

A New Approach for Accelerating the Sparse Matrix-Vector Multiplication

Pavel Tvrdík; Ivan Šimeček

Sparse matrix-vector multiplication (SpM×V for short) is one of the most common subroutines in numerical linear algebra. The problem is that the memory access patterns during SpM×V are irregular and cache utilization can suffer from low spatial or temporal locality. This paper introduces a new approach for accelerating SpM×V, consisting of three steps. The first step divides the whole matrix into smaller parts (regions) that can fit in the cache. The second step improves locality during the multiplication through better utilization of distant references. The last step maximizes the machine computation performance of the partial multiplication for each region. In this paper, we describe these three steps in more detail, including fast and inexpensive algorithms for all of them. Our measurements show that our approach gives a significant speedup for almost all matrices arising from various technical areas.
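The first step, splitting the matrix into regions whose slice of x fits in cache, can be sketched as a column-strip partition. COO input and the strip shape are assumptions of this sketch; the paper's actual region shapes differ:

```python
def split_into_regions(coo, n_cols, region_cols):
    """Partition COO entries (i, j, v) into column strips so that the
    slice of x touched by each region stays cache-resident."""
    nregions = (n_cols + region_cols - 1) // region_cols
    regions = [[] for _ in range(nregions)]
    for (i, j, v) in coo:
        regions[j // region_cols].append((i, j, v))
    return regions

def spmv_by_regions(regions, x, n_rows):
    """y = A*x processed region by region, reusing a small x-slice."""
    y = [0.0] * n_rows
    for reg in regions:
        for (i, j, v) in reg:
            y[i] += v * x[j]
    return y
```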


International Conference on Parallel Processing | 2004

Analytical Model for Analysis of Cache Behavior during Cholesky Factorization and Its Variants

Ivan Šimeček; Pavel Tvrdík

In this paper, we apply several transformations to Cholesky factorization and describe a new transformation called dynamic loop reversal, which can increase temporal and spatial locality. We also describe a probabilistic analytical model of cache behavior during the standard and recursive Cholesky factorizations and use it to study the effects of these transformations. Automatic methods for predicting cache behavior have been described in the literature, but they are inaccurate in the case of recursive calls, since they do not take into account the interactions between subroutines. Our model is more accurate, since it takes most of these interactions, namely those on the last level of recursion, into account. We have evaluated the accuracy of the model by measurements with a cache monitor. Comparisons of the measured numbers of cache misses with the numbers estimated by the model indicate that the accuracy of the model is within a few percent.
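The baseline the transformations start from is the standard unblocked factorization A = L·Lᵀ of a symmetric positive definite matrix, which can be written as follows (a textbook sketch, not the paper's tuned variant):

```python
def cholesky(A):
    """Unblocked Cholesky factorization: returns lower-triangular L
    with A = L * L^T, computed column by column."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # diagonal entry: subtract squares of the already-computed row
        s = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = s ** 0.5
        # entries below the diagonal in column j
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k]
                                     for k in range(j))) / L[j][j]
    return L
```

The triangular loop nest and its data reuse pattern are what the loop transformations and the cache model above reason about.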


Acta Polytechnica | 2006

Performance Aspects of Sparse Matrix-Vector Multiplication

Ivan Šimeček

Sparse matrix-vector multiplication (SpM×V for short) is an important building block in algorithms for solving sparse systems of linear equations, e.g., in FEM. Due to matrix sparsity, the memory access patterns are irregular and cache utilization can suffer from low spatial or temporal locality. Approaches to improving the performance of SpM×V are based on matrix reordering and register blocking [1, 2], sometimes combined with software pipelining [3]. Due to its overhead, register blocking achieves good speedups only for a large number of executions of SpM×V with the same matrix A. We have investigated the impact of two simple SW transformation techniques (software pipelining and loop unrolling) on the performance of SpM×V, and have compared it with several implementation modifications aimed at reducing computational and memory complexity and improving spatial locality. We investigate the performance gains of these modifications on four CPU platforms.
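One of the two transformations, loop unrolling, applied to a plain CSR kernel looks roughly like this. The unroll factor 2 and two independent accumulators are illustrative choices; measured variants typically use larger factors and combine this with software pipelining:

```python
def csr_spmv_unrolled2(values, col_idx, row_ptr, x):
    """CSR SpM×V with the inner loop unrolled by 2. Two accumulators
    break the dependence chain so the FPU can overlap operations."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        k, end = row_ptr[i], row_ptr[i + 1]
        s0 = s1 = 0.0
        while k + 1 < end:               # main unrolled-by-2 loop
            s0 += values[k] * x[col_idx[k]]
            s1 += values[k + 1] * x[col_idx[k + 1]]
            k += 2
        if k < end:                      # leftover element for odd rows
            s0 += values[k] * x[col_idx[k]]
        y[i] = s0 + s1
    return y
```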


Federated Conference on Computer Science and Information Systems | 2016

Block iterators for sparse matrices

Daniel Langr; Ivan Šimeček; T. Dytrych

Finding an optimal block size for a given sparse matrix is an important problem for storage formats that partition matrices into uniformly sized blocks. Finding a solution can take a significant amount of time, which may effectively negate the benefits that such a format brings to sparse-matrix computations. A key to an efficient solution is the ability to quickly iterate, for a particular block size, over the matrix's nonzero blocks. This work proposes an efficient parallel algorithm for this task and evaluates it experimentally on modern multi-core and many-core high performance computing (HPC) architectures.
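The core query, which blocks of a given size contain at least one nonzero, can be answered in one pass over the entries. This is a sequential sketch under an assumed COO input, not the paper's parallel algorithm:

```python
def nonzero_blocks(coo, block_rows, block_cols):
    """Return the set of (block-row, block-col) coordinates of the
    nonzero blocks for a given block size."""
    return {(i // block_rows, j // block_cols) for (i, j, _) in coo}
```

Running this for several candidate block sizes and comparing the resulting block counts is the kind of search an optimal-block-size heuristic has to perform quickly.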


International Conference on Parallel Processing | 2013

Scalable Parallel Generation of Very Large Sparse Benchmark Matrices

Daniel Langr; Ivan Šimeček; Pavel Tvrdík; T. Dytrych

We present a method and an accompanying algorithm for the scalable parallel generation of sparse matrices intended primarily for benchmarking, namely for evaluating the performance and scalability of generic massively parallel algorithms that involve sparse matrices. The proposed method is based on the enlargement of small input matrices, which are assumed to come from public sparse matrix collections containing numerous matrices arising in different application domains and thus having different structural and numerical properties. The resulting matrices are distributed among the processors of a parallel computer system. The enlargement process is designed so that its users may easily control the structural and numerical properties of the resulting matrices as well as the distribution of their nonzero elements to particular processors.
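One simple enlargement rule, replicating the seed matrix's nonzero pattern along the block diagonal, can be sketched as follows. This is a single hypothetical scheme for illustration; the paper's method supports user-controlled enlargement rules and parallel distribution:

```python
def enlarge(coo, n, k):
    """Enlarge an n x n seed matrix (COO entries) to a kn x kn matrix
    by placing k shifted copies of its pattern on the block diagonal."""
    out = []
    for b in range(k):
        for (i, j, v) in coo:
            out.append((b * n + i, b * n + j, v))
    return out
```

Because each copy's coordinates depend only on the block index b, the copies can be generated independently, which is what makes such schemes scale on a distributed machine.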


Soft Computing | 2013

A New Approach for Indexing Powder Diffraction Data Suitable for GPGPU Execution

Ivan Šimeček

Powder diffraction (typically based on X-rays) is a well-established method for the complete analysis and structure determination of crystalline materials. One of the key parts of experimental data processing is indexation, i.e., the determination of lattice parameters. The lattice parameters are essential information required for phase identification as well as for eventual phase structure solution.


International Journal of Parallel Programming | 2015

Downsampling Algorithms for Large Sparse Matrices

Daniel Langr; Pavel Tvrdík; Ivan Šimeček; T. Dytrych

The mapping of sparse matrices to the processors of a parallel system may have a significant impact on the development of sparse-matrix algorithms and, in effect, on their efficiency. We present and empirically compare two downsampling algorithms for sparse matrices. The first algorithm is independent of the particular matrix-to-processors mapping, while the second is adapted for cases where matrices are partitioned among processors into contiguous chunks of rows/columns. We show that the price for the versatility of the first algorithm is collective communication performed by all processors. The second algorithm uses a more efficient communication strategy, which stems from knowledge of the mapping of matrices to processors, and outperforms the first algorithm in terms of running time.
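The sequential core of such downsampling, mapping the nonzero pattern onto a small s x s grid of counts, can be sketched as follows. This single-process version only illustrates the computation; the paper's contribution is how to do it with efficient communication across processors:

```python
def downsample(coo, n_rows, n_cols, s):
    """Map the nonzero pattern of an n_rows x n_cols sparse matrix
    (COO entries) onto an s x s grid of nonzero counts."""
    grid = [[0] * s for _ in range(s)]
    for (i, j, _) in coo:
        grid[i * s // n_rows][j * s // n_cols] += 1
    return grid
```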

Collaboration


Dive into Ivan Šimeček's collaborations.

Top Co-Authors

Daniel Langr
Czech Technical University in Prague

Pavel Tvrdík
Czech Technical University in Prague

T. Dytrych
Louisiana State University

Jan Rohlíček
Institute of Chemical Technology in Prague

Ivan Kotenkov
Czech Technical University in Prague

Tomáš Zahradnický
Czech Technical University in Prague

J. P. Draayer
Louisiana State University