Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Christof Vömel is active.

Publication


Featured research published by Christof Vömel.


SIAM Journal on Scientific Computing | 2008

Performance and Accuracy of LAPACK's Symmetric Tridiagonal Eigensolvers

James Demmel; Osni Marques; Beresford N. Parlett; Christof Vömel

Abstract. We compare four algorithms from the latest LAPACK 3.1 release for computing eigenpairs of a symmetric tridiagonal matrix. These include QR iteration, bisection and inverse iteration (BI), the Divide-and-Conquer method (DC), and the method of Multiple Relatively Robust Representations (MR). Our evaluation considers speed and accuracy when computing all eigenpairs, and additionally subset computations. Using a variety of carefully selected test problems, our study spans a variety of today's computer architectures. Our conclusions can be summarized as follows. (1) DC and MR are generally much faster than QR and BI on large matrices. (2) MR almost always does the fewest floating point operations, but at a lower MFlop rate than all the other algorithms. (3) The exact performance of MR and DC strongly depends on the matrix at hand. (4) DC and QR are the most accurate algorithms, with observed accuracy O(√n·ε). The accuracy of BI and MR is generally O(n·ε). (5) MR is preferable to BI for subset computations.

Key words: LAPACK, symmetric eigenvalue problem, inverse iteration, Divide & Conquer, QR algorithm, MRRR algorithm, accuracy, performance, benchmark. AMS subject classifications: 15A18, 15A23.

1. Introduction. One goal of the latest 3.1 release [25] of LAPACK [1] is to produce the fastest possible symmetric eigensolvers subject to the constraint of delivering small residuals and orthogonal eigenvectors. For an input matrix A that may be dense or banded, one standard approach is conversion to tridiagonal form T; the eigenvalues and eigenvectors of T are then found, and finally the eigenvectors of T are transformed into eigenvectors of A. Depending on the situation, all the eigenpairs or just some of them may be desired. LAPACK, for some algorithms, allows selection by eigenvalue indices ('find λᵢ, λᵢ₊₁, ..., λⱼ, where λ₁ ≤ λ₂ ≤ ··· ≤ λₙ are all the eigenvalues in increasing order, and their eigenvectors') or by an interval ('find all the eigenvalues in [a, b] and their eigenvectors'). This paper analyzes the performance and accuracy of four algorithms: (1) QR iteration, in LAPACK's driver STEV (QR for short); (2) bisection and inverse iteration, in STEVX (BI for short); (3) Divide and Conquer, in STEVD (DC for short); (4) Multiple Relatively Robust Representations, in STEVR (MR for short). Section 2 gives a brief description of these algorithms with references. For a representative picture of each algorithm's capabilities, we developed an extensive set of test matrices [7], broken into two classes: (1) 'practical matrices', based on reducing matrices from a variety of practical applications to tridiagonal form and generating some other tridiagonals with similar spectra, and (2) synthetic 'testing matrices', chosen to have extreme eigenvalue distributions or other properties designed to exercise one or more of the algorithms; see Section 3.1 for details.
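Several of these tridiagonal drivers are reachable from SciPy's `eigh_tridiagonal` wrapper, under SciPy's own driver names rather than the STEV* names above. As a hedged sketch (the driver names `stev`, `stemr`, `stebz` and their availability are SciPy specifics, not part of the paper), the snippet below compares a QR-based, an MRRR-based, and a bisection/inverse-iteration driver on a small 1-2-1 Toeplitz test matrix, checking residuals and orthogonality in the spirit of the study:

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

# 1-2-1 Toeplitz tridiagonal; its eigenvalues are known: 2 - 2*cos(k*pi/(n+1)).
n = 200
d = 2.0 * np.ones(n)       # main diagonal
e = -1.0 * np.ones(n - 1)  # off-diagonal
T = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)

results = {}
for driver in ("stev", "stemr", "stebz"):  # QR-based, MRRR, bisection+inverse iteration
    w, v = eigh_tridiagonal(d, e, lapack_driver=driver)
    res = np.linalg.norm(T @ v - v * w)         # residual ||T V - V diag(w)||_F
    orth = np.linalg.norm(v.T @ v - np.eye(n))  # departure from orthogonality
    results[driver] = (res, orth)
    print(f"{driver}: residual={res:.2e}, orthogonality={orth:.2e}")
```

On a well-behaved matrix like this one, all three drivers deliver residuals and orthogonality near machine precision; the paper's interesting distinctions appear on large and pathological matrices.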


ACM Transactions on Mathematical Software | 2010

ScaLAPACK's MRRR algorithm

Christof Vömel

The (sequential) algorithm of Multiple Relatively Robust Representations, MRRR, is a more efficient variant of inverse iteration that does not require reorthogonalization. It solves the eigenproblem of an unreduced symmetric tridiagonal matrix T ∈ R^{n×n} at O(n²) cost. The computed normalized eigenvectors are numerically orthogonal in the sense that the dot product between different vectors is O(nε), where ε denotes the relative machine precision. This article describes the design of ScaLAPACK's parallel MRRR algorithm. One emphasis is on the critical role of the representation tree in achieving both adequate accuracy and parallel scalability. A second point concerns the favorable properties of this code: subset computation, the use of static memory, and scalability. Unlike ScaLAPACK's Divide & Conquer and QR, MRRR can compute subsets of eigenpairs at reduced cost. And in contrast to inverse iteration, which can fail, it is guaranteed to produce a satisfactory answer while maintaining memory scalability. ParEig, the parallel MRRR algorithm for PLAPACK, uses dynamic memory allocation. Our code avoids this at marginal additional cost. We also use a different representation-tree criterion that allows for more accurate computation of the eigenvectors but can make parallelization more difficult.
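The subset capability described above can be tried out via the sequential MRRR routine as wrapped by SciPy. A hedged sketch (`stemr` is SciPy's name for LAPACK's MRRR-based driver, and the random matrix here is just a stand-in):

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

# Random symmetric tridiagonal stand-in, T in R^{n x n}.
n = 300
rng = np.random.default_rng(0)
d = rng.standard_normal(n)
e = rng.standard_normal(n - 1)

# MRRR can compute a subset of eigenpairs at reduced cost: here the
# eigenpairs with indices 10..19 (0-based, in ascending eigenvalue order).
w, v = eigh_tridiagonal(d, e, select='i', select_range=(10, 19),
                        lapack_driver='stemr')

# Numerical orthogonality: off-diagonal dot products are O(n*eps).
gram = v.T @ v
orth = np.max(np.abs(gram - np.eye(w.size)))
print(w.size, orth)
```

Divide & Conquer and QR, by contrast, always compute the full spectrum first; the subset interface is where MRRR's reduced cost shows.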


Journal of Computational Physics | 2008

State-of-the-art eigensolvers for electronic structure calculations of large scale nano-systems

Christof Vömel; Stanimire Tomov; Osni Marques; Andrew Canning; Lin-Wang Wang; Jack J. Dongarra

The band edge states determine the optical and electronic properties of semiconductor nano-structures and can be computed from an interior eigenproblem. We study the reliability and performance of state-of-the-art iterative eigensolvers on large quantum dots and wires, focusing on variants of preconditioned CG, Lanczos, and Davidson methods. One Davidson variant, the GD+k (Olsen) method, is found to be as reliable as the commonly used preconditioned CG while consistently being between two and three times faster.
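As a minimal stand-in for this setting (not the paper's solvers or Hamiltonians), interior eigenpairs near a target energy can be computed with shift-invert Lanczos via SciPy's `eigsh`; the 1D model Hamiltonian below is purely illustrative:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

# Toy 1D "Hamiltonian": discrete Laplacian plus a random on-site potential.
# (Real band-edge problems involve huge 3D nano-structure Hamiltonians.)
n = 1000
rng = np.random.default_rng(1)
potential = 2.0 + rng.uniform(0.0, 0.5, n)
H = diags([-np.ones(n - 1), potential, -np.ones(n - 1)], [-1, 0, 1], format='csc')

# Interior eigenpairs nearest a target energy sigma via shift-invert Lanczos.
sigma = 2.0
w, v = eigsh(H, k=4, sigma=sigma, which='LM')
res = np.linalg.norm(H @ v - v * w)
print(w, res)
```

Shift-invert requires factorizing H - sigma*I, which is why the methods compared in the paper (preconditioned CG, Lanczos, Davidson variants) matter at scale: they reach interior states without a sparse factorization.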


SIAM Journal on Scientific Computing | 2012

Divide and Conquer on Hybrid GPU-Accelerated Multicore Systems

Christof Vömel; Stanimire Tomov; Jack J. Dongarra

With the raw computing power of graphics processing units (GPUs) being more widely available in commodity multicore systems, there is an imminent need to harness their power for important numerical libraries such as LAPACK. In this paper, we consider the solution of dense symmetric and Hermitian eigenproblems by the LAPACK divide and conquer algorithm on such modern heterogeneous systems. We focus on how to make the best use of the individual strengths of the massively parallel manycore GPUs and multicore CPUs. The resulting algorithm overcomes performance bottlenecks faced by current implementations that are optimized for a homogeneous multicore. On a dual socket quad-core Intel Xeon 2.33 GHz with an NVIDIA GTX 280 GPU, we typically obtain up to about a tenfold improvement in performance for the complete dense problem. The techniques described here thus represent an example of how to develop numerical software to efficiently use heterogeneous architectures. As heterogeneity becomes more common in the architecture design, the significance of and need for this work are expected to grow.
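For reference, the sequential starting point of this work is LAPACK's dense divide-and-conquer driver (dsyevd/zheevd), which is what NumPy's `eigh` dispatches to; a minimal CPU-only sketch of the problem being accelerated:

```python
import numpy as np

# Dense symmetric eigenproblem solved by LAPACK divide and conquer
# (NumPy's eigh is backed by dsyevd/zheevd).
rng = np.random.default_rng(2)
A = rng.standard_normal((300, 300))
A = (A + A.T) / 2.0  # symmetrize
w, V = np.linalg.eigh(A)

# Verify the decomposition A = V diag(w) V^T.
err = np.linalg.norm(A - (V * w) @ V.T)
print(err)
```

The paper's contribution is splitting exactly this computation between a manycore GPU and multicore CPUs, rather than changing the algorithm's numerics.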


SIAM Journal on Matrix Analysis and Applications | 2005

Task Scheduling in an Asynchronous Distributed Memory Multifrontal Solver

Patrick R. Amestoy; Iain S. Duff; Christof Vömel

We describe the improvements to the task scheduling for MUMPS, an asynchronous distributed memory direct solver for sparse linear systems. In the new approach, we determine, during the analysis of the matrix, candidate processes for the tasks that will be dynamically scheduled during the subsequent factorization. This approach significantly improves the scalability of the solver in terms of execution time and storage. By comparison with the previous version of MUMPS, we demonstrate the efficiency and the scalability of the new algorithm on up to 512 processors. Our test cases include matrices from regular three-dimensional grids and irregular grids from real-life applications.


parallel computing | 2003

Adapting a parallel sparse direct solver to architectures with clusters of SMPs

Patrick R. Amestoy; Iain S. Duff; Stéphane Pralet; Christof Vömel

We consider the direct solution of general sparse linear systems based on a multifrontal method. The approach combines partial static scheduling of the task dependency graph during the symbolic factorization with distributed dynamic scheduling during the numerical factorization to balance the work among the processes of a distributed memory computer. We show that to address clusters of Symmetric Multi-Processor (SMP) architectures, and more generally non-uniform memory access multiprocessors, our algorithms for both the static and the dynamic scheduling need to be revisited to take account of the non-uniform cost of communication. The performance analysis on an IBM SP3 with 16 processors per SMP node and up to 128 processors shows that we can significantly reduce both the amount of inter-node communication and the solution time.


ACM Transactions on Mathematical Software | 2008

Algorithm 880: A testing infrastructure for symmetric tridiagonal eigensolvers

Osni Marques; Christof Vömel; James Demmel; Beresford N. Parlett

LAPACK is often mentioned as a positive example of a software library that encapsulates complex, robust, and widely used numerical algorithms for a wide range of applications. At installation time, the user has the option of running a (limited) number of test cases to verify the integrity of the installation process. On the algorithm developers' side, however, more exhaustive tests are usually performed to study algorithm behavior on a variety of problem settings and computer architectures. In this process, difficult test cases need to be found that reflect particular challenges of an application or push algorithms to extreme behavior. These tests are then assembled into a comprehensive collection, thereby making it possible for any new or competing algorithm to be stressed in a similar way. This article describes an infrastructure for exhaustively testing the symmetric tridiagonal eigensolvers implemented in LAPACK. It consists of two parts: a selection of carefully chosen test matrices with particular idiosyncrasies, and a portable testing framework that allows for easy testing and data processing. The tester facilitates experiments with algorithmic choices, parameter and threshold studies, and performance comparisons on different architectures.
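In the same spirit (though not using the paper's actual collection), one can manufacture tridiagonal test matrices with prescribed extreme spectra and feed them to an eigensolver; a hedged sketch, with all names and parameters illustrative:

```python
import numpy as np
from scipy.linalg import hessenberg, eigh_tridiagonal
from scipy.stats import ortho_group

def tridiag_with_spectrum(eigs, seed=0):
    """Build a symmetric tridiagonal matrix with the given spectrum:
    form Q diag(eigs) Q^T for a random orthogonal Q, then reduce it back
    (the Hessenberg form of a symmetric matrix is tridiagonal)."""
    Q = ortho_group.rvs(dim=len(eigs), random_state=seed)
    T = hessenberg((Q * eigs) @ Q.T)
    return np.diag(T).copy(), np.diag(T, 1).copy()

# Two extreme eigenvalue distributions in the spirit of such a collection.
spectra = {
    "geometric": np.logspace(-12, 0, 100),                     # huge dynamic range
    "clustered": np.concatenate(([1e-12], np.full(99, 1.0))),  # tight cluster at 1
}
errors = {}
for name, eigs in spectra.items():
    d, e = tridiag_with_spectrum(eigs)
    w = eigh_tridiagonal(d, e, eigvals_only=True)
    errors[name] = np.max(np.abs(w - np.sort(eigs)))
    print(name, errors[name])
```

A real tester would additionally record timings, eigenvector residuals, and orthogonality across drivers and architectures, which is what the infrastructure described here automates.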


SIAM Journal on Scientific Computing | 2005

Glued Matrices and the MRRR Algorithm

Inderjit S. Dhillon; Beresford N. Parlett; Christof Vömel

During the last ten years, Dhillon and Parlett devised a new algorithm (multiple relatively robust representations (MRRR)) for computing numerically orthogonal eigenvectors of a symmetric tridiagonal matrix.
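A glued matrix in this sense can be sketched by chaining copies of a base tridiagonal with a tiny coupling element between blocks; the classic example glues Wilkinson matrices, whose spectra already contain very close pairs. The construction below is illustrative (names and parameters are not from the paper):

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

def wilkinson(m):
    """Wilkinson-type matrix of order 2m+1: diagonal |i - m|, off-diagonals 1."""
    d = np.abs(np.arange(2 * m + 1) - m).astype(float)
    return d, np.ones(2 * m)

def glue(d, e, copies, delta):
    """Chain `copies` copies of the tridiagonal (d, e), joined by the small
    glue value `delta` on the off-diagonal between consecutive blocks."""
    D = np.tile(d, copies)
    E = np.concatenate([np.r_[e, delta]] * (copies - 1) + [e])
    return D, E

d, e = wilkinson(10)        # order 21, with pairs of very close eigenvalues
D, E = glue(d, e, 3, 1e-8)  # gluing creates tight eigenvalue clusters

w, v = eigh_tridiagonal(D, E)
T = np.diag(D) + np.diag(E, 1) + np.diag(E, -1)
res = np.linalg.norm(T @ v - v * w)
orth = np.linalg.norm(v.T @ v - np.eye(D.size))
print(res, orth)
```

Such tight clusters are exactly what stresses MRRR's representation tree, which is why glued matrices serve as a probe of the algorithm.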


parallel computing | 2006

Prospectus for the next LAPACK and ScaLAPACK libraries

James Demmel; Jack J. Dongarra; Beresford N. Parlett; William Kahan; Ming Gu; David Bindel; Yozo Hida; Xiaoye S. Li; Osni Marques; E. Jason Riedy; Christof Vömel; Julien Langou; Piotr Luszczek; Jakub Kurzak; Alfredo Buttari; Julie Langou; Stanimire Tomov



SIAM Journal on Scientific Computing | 2011

Detecting Localization in an Invariant Subspace

Christof Vömel; Beresford N. Parlett


Collaboration


Dive into Christof Vömel's collaborations.

Top Co-Authors

Osni Marques, Lawrence Berkeley National Laboratory

Iain S. Duff, Rutherford Appleton Laboratory

James Demmel, University of California

Lin-Wang Wang, Lawrence Berkeley National Laboratory

Julien Langou, University of Colorado Denver