Olaf Schenk
University of Lugano
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Olaf Schenk.
Future Generation Computer Systems | 2004
Olaf Schenk; Klaus Gärtner
Supernode partitioning for unsymmetric matrices together with complete block diagonal supernode pivoting and asynchronous computation can achieve high gigaflop rates for parallel sparse LU factorization on shared memory parallel computers. The progress in weighted graph matching algorithms helps to extend these concepts further and unsymmetric prepermutation of rows is used to place large matrix entries on the diagonal. Complete block diagonal supernode pivoting allows dynamical interchanges of columns and rows during the factorization process. The level-3 BLAS efficiency is retained and an advanced two-level left-right looking scheduling scheme results in good speedup on SMP machines. These algorithms have been integrated into the recent unsymmetric version of the PARDISO solver. Experiments demonstrate that a wide set of unsymmetric linear systems can be solved and high performance is consistently achieved for large sparse unsymmetric matrices from real world applications.
international parallel and distributed processing symposium | 2011
Matthias Christen; Olaf Schenk; Helmar Burkhart
Stencil calculations comprise an important class of kernels in many scientific computing applications ranging from simple PDE solvers to constituent kernels in multigrid methods as well as image processing applications. In such types of solvers, stencil kernels are often the dominant part of the computation, and an efficient parallel implementation of the kernel is therefore crucial in order to reduce the time to solution. However, in the current complex hardware micro architectures, meticulous architecture-specific tuning is required to elicit the machines full compute power. We present a code generation and auto-tuning framework \textsc{Patus} for stencil computations targeted at multi- and many core processors, such as multicore CPUs and graphics processing units, which makes it possible to generate compute kernels from a specification of the stencil operation and a parallelization and optimization strategy, and leverages the auto tuning methodology to optimize strategy-dependent parameters for the given hardware architecture.
Computational Optimization and Applications | 2007
Olaf Schenk; Andreas Wächter; Michael Hagemann
Abstract Interior-point methods are among the most efficient approaches for solving large-scale nonlinear programming problems. At the core of these methods, highly ill-conditioned symmetric saddle-point problems have to be solved. We present combinatorial methods to preprocess these matrices in order to establish more favorable numerical properties for the subsequent factorization. Our approach is based on symmetric weighted matchings and is used in a sparse direct LDLT factorization method where the pivoting is restricted to static supernode data structures. In addition, we will dynamically expand the supernode data structure in cases where additional fill-in helps to select better numerical pivot elements. This technique can be seen as an alternative to the more traditional threshold pivoting techniques. We demonstrate the competitiveness of this approach within an interior-point method on a large set of test problems from the CUTE and COPS sets, as well as large optimal control problems based on partial differential equations. The largest nonlinear optimization problem solved has more than 12 million variables and 6 million constraints.
Bit Numerical Mathematics | 2000
Olaf Schenk; K. Gärtner; Wolfgang Fichtner
An efficient sparse LU factorization algorithm on popular shared memory multi-processors is presented. Pipelining parallelism is essential to achieve higher parallel efficiency and it is exploited with a left-right looking algorithm. No global barrier is used and a completely asynchronous scheduling scheme is one central point of the implementation. The algorithm has been successfully tested on SUN Enterprise, DEC AlphaServer, SGI Origin 2000 and Cray T90 and J90 parallel computers, delivering up to 2.3 GFlop/s on an eight processor DEC AlphaServer for medium-size semiconductor device simulations and structural engineering problems.
Siam Review | 2008
Olaf Schenk; Matthias Bollhöfer; Rudolf A. Römer
We propose efficient preconditioning algorithms for an eigenvalue problem arising in quantum physics, namely, the computation of a few interior eigenvalues and their associated eigenvectors for large-scale sparse real and symmetric indefinite matrices of the Anderson model of localization. We compare the Lanczos algorithm in the 1987 implementation by Cullum and Willoughby with the shift-and-invert techniques in the implicitly restarted Lanczos method and in the Jacobi-Davidson method. Our preconditioning approaches for the shift-and-invert symmetric indefinite linear system are based on maximum weighted matchings and algebraic multilevel incomplete
international conference on parallel processing | 2013
Andrey Kuzmin; Mathieu Luisier; Olaf Schenk
LDL^T
SIAM Journal on Scientific Computing | 2009
Matthias Bollhöfer; Marcus J. Grote; Olaf Schenk
factorizations. These techniques can be seen as a complement to the alternative idea of using more complete pivoting techniques for the highly ill-conditioned symmetric indefinite Anderson matrices. We demonstrate the effectiveness and the numerical accuracy of these algorithms. Our numerical examples reveal that recent algebraic multilevel preconditioning solvers can accelerate the computation of a large-scale eigenvalue problem corresponding to the Anderson model of localization by several orders of magnitude.
parallel computing | 2002
Olaf Schenk; Klaus Gärtner
The central computation in atomistic, quantum transport simulation consists in solving the Schrodinger equation several thousand times with non-equilibrium Greens function (NEGF) equations. In the NEGF formalism, a numerical linear algebra problem is identified related to the computation of a sparse inverse subset of general sparse unsymmetric matrices. The computational challenge consists in computing all the diagonal entries of the Greens functions, which represent the inverse of the electron Hamiltonian matrix. Parallel upward and downward traversals of the elimination tree are used to perform these computations very efficiently and reduce the overall simulation time for realistic nanoelectronic devices. Extensive large-scale numerical experiments on the CRAY-XE6 Monte Rosa at the Swiss National Supercomputing Center and on the BG/Q at the Argonne Leadership Computing Facility are presented.
international conference on computational science | 2002
Olaf Schenk; Klaus Gärtner
An algebraic multilevel (ML) preconditioner is presented for the Helmholtz equation in heterogeneous media. It is based on a multilevel incomplete
Journal of Parallel and Distributed Computing | 2008
Olaf Schenk; Matthias Christen; Helmar Burkhart
LDL^T