Olaf Schenk | Researchain

Publication

Featured research published by Olaf Schenk.

Future Generation Computer Systems | 2004

Solving unsymmetric sparse systems of linear equations with PARDISO

Olaf Schenk; Klaus Gärtner

Supernode partitioning for unsymmetric matrices together with complete block diagonal supernode pivoting and asynchronous computation can achieve high gigaflop rates for parallel sparse LU factorization on shared memory parallel computers. The progress in weighted graph matching algorithms helps to extend these concepts further and unsymmetric prepermutation of rows is used to place large matrix entries on the diagonal. Complete block diagonal supernode pivoting allows dynamical interchanges of columns and rows during the factorization process. The level-3 BLAS efficiency is retained and an advanced two-level left-right looking scheduling scheme results in good speedup on SMP machines. These algorithms have been integrated into the recent unsymmetric version of the PARDISO solver. Experiments demonstrate that a wide set of unsymmetric linear systems can be solved and high performance is consistently achieved for large sparse unsymmetric matrices from real world applications.
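The row prepermutation mentioned in this abstract can be illustrated on a tiny dense matrix. The sketch below is only an illustration of the goal (placing large entries on the diagonal), not the weighted graph matching algorithm used in PARDISO: it searches all row orders by brute force, which is feasible only for very small matrices. The function names `max_product_row_permutation` and `permute_rows` are hypothetical.

```python
from itertools import permutations
from math import prod


def max_product_row_permutation(a):
    """Return a row order `perm` (row perm[i] of `a` becomes row i) that
    maximizes the product of absolute diagonal entries.

    Brute force over all n! orders; illustration only.
    """
    n = len(a)
    best_perm, best_score = None, -1.0
    for perm in permutations(range(n)):
        score = prod(abs(a[perm[i]][i]) for i in range(n))
        if score > best_score:
            best_perm, best_score = perm, score
    return list(best_perm)


def permute_rows(a, perm):
    """Apply the row order `perm` to matrix `a` (list of lists)."""
    return [list(a[r]) for r in perm]


# Small unsymmetric example with zeros on the original diagonal.
a = [
    [0.0, 2.0, 0.1],
    [5.0, 0.0, 0.0],
    [0.2, 0.0, 4.0],
]
perm = max_product_row_permutation(a)
b = permute_rows(a, perm)
diagonal = [b[i][i] for i in range(len(b))]
```

After the permutation, the diagonal of `b` holds the large entries of `a`, which is the property a factorization with restricted pivoting benefits from.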

International Parallel and Distributed Processing Symposium | 2011

PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures

Matthias Christen; Olaf Schenk; Helmar Burkhart

Stencil calculations comprise an important class of kernels in many scientific computing applications, ranging from simple PDE solvers to constituent kernels in multigrid methods as well as image processing applications. In such types of solvers, stencil kernels are often the dominant part of the computation, and an efficient parallel implementation of the kernel is therefore crucial in order to reduce the time to solution. However, in the current complex hardware microarchitectures, meticulous architecture-specific tuning is required to elicit the machine's full compute power. We present PATUS, a code generation and auto-tuning framework for stencil computations targeted at multi- and manycore processors, such as multicore CPUs and graphics processing units. It makes it possible to generate compute kernels from a specification of the stencil operation and a parallelization and optimization strategy, and it leverages the auto-tuning methodology to optimize strategy-dependent parameters for the given hardware architecture.
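For readers unfamiliar with the term, a stencil kernel updates each grid point from a fixed pattern of neighbors. The sketch below is a plain, untuned 5-point Jacobi sweep on a 2D grid; it is not PATUS input or output, and the name `jacobi_sweep` is hypothetical. It shows only the kind of loop nest that such a framework would generate and tune.

```python
def jacobi_sweep(u):
    """One 5-point Jacobi sweep: every interior point becomes the average
    of its four neighbors. Boundary values are copied unchanged."""
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (
                u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
            )
    return new


# 4x4 grid, top boundary held at 1.0, everything else 0.0.
grid = [[0.0] * 4 for _ in range(4)]
grid[0] = [1.0] * 4
after = jacobi_sweep(grid)
```

Because every interior update reads only the old grid, the inner loops are independent and can be blocked, vectorized, or distributed across threads, which is where architecture-specific tuning comes in.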

Computational Optimization and Applications | 2007

Matching-based preprocessing algorithms to the solution of saddle-point problems in large-scale nonconvex interior-point optimization

Olaf Schenk; Andreas Wächter; Michael Hagemann

Interior-point methods are among the most efficient approaches for solving large-scale nonlinear programming problems. At the core of these methods, highly ill-conditioned symmetric saddle-point problems have to be solved. We present combinatorial methods to preprocess these matrices in order to establish more favorable numerical properties for the subsequent factorization. Our approach is based on symmetric weighted matchings and is used in a sparse direct LDL^T factorization method where the pivoting is restricted to static supernode data structures. In addition, we will dynamically expand the supernode data structure in cases where additional fill-in helps to select better numerical pivot elements. This technique can be seen as an alternative to the more traditional threshold pivoting techniques. We demonstrate the competitiveness of this approach within an interior-point method on a large set of test problems from the CUTE and COPS sets, as well as large optimal control problems based on partial differential equations. The largest nonlinear optimization problem solved has more than 12 million variables and 6 million constraints.

BIT Numerical Mathematics | 2000

Efficient Sparse LU Factorization with Left-Right Looking Strategy on Shared Memory Multiprocessors

Olaf Schenk; K. Gärtner; Wolfgang Fichtner

An efficient sparse LU factorization algorithm on popular shared memory multi-processors is presented. Pipelining parallelism is essential to achieve higher parallel efficiency and it is exploited with a left-right looking algorithm. No global barrier is used and a completely asynchronous scheduling scheme is one central point of the implementation. The algorithm has been successfully tested on SUN Enterprise, DEC AlphaServer, SGI Origin 2000 and Cray T90 and J90 parallel computers, delivering up to 2.3 GFlop/s on an eight processor DEC AlphaServer for medium-size semiconductor device simulations and structural engineering problems.

SIAM Review | 2008

On Large-Scale Diagonalization Techniques for the Anderson Model of Localization

Olaf Schenk; Matthias Bollhöfer; Rudolf A. Römer

We propose efficient preconditioning algorithms for an eigenvalue problem arising in quantum physics, namely, the computation of a few interior eigenvalues and their associated eigenvectors for large-scale sparse real and symmetric indefinite matrices of the Anderson model of localization. We compare the Lanczos algorithm in the 1987 implementation by Cullum and Willoughby with the shift-and-invert techniques in the implicitly restarted Lanczos method and in the Jacobi-Davidson method. Our preconditioning approaches for the shift-and-invert symmetric indefinite linear system are based on maximum weighted matchings and algebraic multilevel incomplete LDL^T factorizations. These techniques can be seen as a complement to the alternative idea of using more complete pivoting techniques for the highly ill-conditioned symmetric indefinite Anderson matrices. We demonstrate the effectiveness and the numerical accuracy of these algorithms. Our numerical examples reveal that recent algebraic multilevel preconditioning solvers can accelerate the computation of a large-scale eigenvalue problem corresponding to the Anderson model of localization by several orders of magnitude.

International Conference on Parallel Processing | 2013

Fast methods for computing selected elements of the Green's function in massively parallel nanoelectronic device simulations

Andrey Kuzmin; Mathieu Luisier; Olaf Schenk

The central computation in atomistic, quantum transport simulation consists in solving the Schrödinger equation several thousand times with non-equilibrium Green's function (NEGF) equations. In the NEGF formalism, a numerical linear algebra problem is identified related to the computation of a sparse inverse subset of general sparse unsymmetric matrices. The computational challenge consists in computing all the diagonal entries of the Green's functions, which represent the inverse of the electron Hamiltonian matrix. Parallel upward and downward traversals of the elimination tree are used to perform these computations very efficiently and reduce the overall simulation time for realistic nanoelectronic devices. Extensive large-scale numerical experiments on the CRAY-XE6 Monte Rosa at the Swiss National Supercomputing Center and on the BG/Q at the Argonne Leadership Computing Facility are presented.

SIAM Journal on Scientific Computing | 2009

Algebraic Multilevel Preconditioner for the Helmholtz Equation in Heterogeneous Media

Matthias Bollhöfer; Marcus J. Grote; Olaf Schenk

Parallel Computing | 2002

Two-level dynamic scheduling in PARDISO: improved scalability on shared memory multiprocessing systems

Olaf Schenk; Klaus Gärtner

International Conference on Computational Science | 2002