Bruno Lang
University of Wuppertal
Publications
Featured research published by Bruno Lang.
Parallel Computing | 2011
Thomas Auckenthaler; Volker Blum; Hans-Joachim Bungartz; Thomas Huckle; Rainer Johanni; Lukas Krämer; Bruno Lang; Hermann Lederer; Paul R. Willems
The computation of selected eigenvalues and eigenvectors of a symmetric (Hermitian) matrix is an important subtask in many contexts, for example in electronic structure calculations. If a significant portion of the eigensystem is required then typically direct eigensolvers are used. The central three steps are: reduce the matrix to tridiagonal form, compute the eigenpairs of the tridiagonal matrix, and transform the eigenvectors back. To better utilize memory hierarchies, the reduction may be effected in two stages: full to banded, and banded to tridiagonal. Then the back transformation of the eigenvectors also involves two stages. For large problems, the eigensystem calculations can be the computational bottleneck, in particular with large numbers of processors. In this paper we discuss variants of the tridiagonal-to-banded back transformation, improving the parallel efficiency for large numbers of processors as well as the per-processor utilization. We also modify the divide-and-conquer algorithm for symmetric tridiagonal matrices such that it can compute a subset of the eigenpairs at reduced cost. The effectiveness of our modifications is demonstrated with numerical experiments.
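For illustration, below is a minimal single-node sketch of the reduce / solve / back-transform pipeline described above, using the classical one-stage reduction (full to tridiagonal) available in SciPy. The paper's two-stage variant (full to banded to tridiagonal) and its parallel back transformation are not part of SciPy; this sketch only shows the structure of the computation and the subset solve.

import numpy as np
from scipy.linalg import hessenberg, eigh_tridiagonal

rng = np.random.default_rng(0)
n, k = 500, 50                        # matrix size, number of wanted eigenpairs
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                     # symmetric test matrix

# Step 1: reduce to tridiagonal form, A = Q T Q^T
# (for a symmetric matrix the Hessenberg form is tridiagonal).
T, Q = hessenberg(A, calc_q=True)
d = np.diag(T)                        # main diagonal of T
e = np.diag(T, k=-1)                  # sub-diagonal of T

# Step 2: eigenpairs of the tridiagonal matrix, here only the k smallest
# (the subset-capable divide-and-conquer solver in the paper plays this role).
w, V = eigh_tridiagonal(d, e, select='i', select_range=(0, k - 1))

# Step 3: back-transform the eigenvectors to the original basis.
X = Q @ V

# Check: residuals ||A x - lambda x|| should be at machine-precision level.
print(np.max(np.abs(A @ X - X * w)))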
ACM Transactions on Mathematical Software | 2000
Christian H. Bischof; Bruno Lang; Xiaobai Sun
We present a software toolbox for symmetric band reduction via orthogonal transformations, together with a testing and timing program. The toolbox contains drivers and computational routines for the reduction of full symmetric matrices to banded form and the reduction of banded matrices to narrower banded or tridiagonal form, with optional accumulation of the orthogonal transformations, as well as repacking routines for storage rearrangement. The functionality and the calling sequences of the routines are described, with a detailed discussion of the “control” parameters that allow adaptation of the codes to particular machine and matrix characteristics. We also briefly describe the testing and timing program included in the toolbox.
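As a small illustration of the packed band storage that such routines operate on, the sketch below builds a symmetric band matrix, repacks it into LAPACK-style lower band storage, and solves it with SciPy's banded eigensolver. This is not the toolbox itself, only the storage convention and a reference solve.

import numpy as np
from scipy.linalg import eig_banded

rng = np.random.default_rng(1)
n, b = 8, 2                              # matrix size, semi-bandwidth

# Build a random symmetric band matrix in full storage.
A = np.zeros((n, n))
for k in range(b + 1):
    v = rng.standard_normal(n - k)
    A += np.diag(v, k) + (np.diag(v, -k) if k else 0)

# Repack into LAPACK lower band storage: band[i - j, j] = A[i, j] for i >= j.
band = np.zeros((b + 1, n))
for j in range(n):
    for i in range(j, min(n, j + b + 1)):
        band[i - j, j] = A[i, j]

w_band, V = eig_banded(band, lower=True)
w_full = np.linalg.eigvalsh(A)
print(np.max(np.abs(w_band - w_full)))   # agreement to machine precision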
ACM Transactions on Mathematical Software | 2000
Christian H. Bischof; Bruno Lang; Xiaobai Sun
We develop an algorithmic framework for reducing the bandwidth of symmetric matrices via orthogonal similarity transformations. This framework includes the reduction of full matrices to banded or tridiagonal form and the reduction of banded matrices to narrower banded or tridiagonal form, possibly in multiple steps. Our framework leads to algorithms that require fewer floating-point operations than do standard algorithms, if only the eigenvalues are required. In addition, it allows for space-time tradeoffs and enables or increases the use of blocked transformations.
Source Code Analysis and Manipulation | 2002
Christian H. Bischof; H. M. Bücker; Bruno Lang; Arno Rasch; Andre Vehreschild
Derivatives of mathematical functions play a key role in various areas of numerical and technical computing. Many of these computations are done in MATLAB, a popular environment for technical computing providing engineers and scientists with capabilities for mathematical computing, analysis, visualization, and algorithmic development. For functions written in the MATLAB language, a novel software tool is proposed to automatically transform a given MATLAB program into another MATLAB program capable of computing not only the original function but also user-specified derivatives of that function. That is, a program transformation known as automatic differentiation is performed to change the semantics of the program in a fashion based on the chain rule of differential calculus. The crucial ingredient of the tool is a combination of source-to-source transformation and operator overloading. The overall design of the tool is described and numerical experiments are reported demonstrating the efficiency of the resulting code for a sample problem.
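To make the operator-overloading half of this idea concrete, here is a language-neutral sketch of forward-mode automatic differentiation with dual numbers in Python. It is not the tool described above (which additionally uses source-to-source transformation and targets MATLAB); it only illustrates how overloading propagates derivatives by the chain rule.

import math

class Dual:
    """Value together with its derivative with respect to one input."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def sin(x):
    if isinstance(x, Dual):
        return Dual(math.sin(x.val), math.cos(x.val) * x.der)   # chain rule
    return math.sin(x)

# f(x) = x * sin(x) + 3x ; evaluate f and f' at x = 2 in one pass.
x = Dual(2.0, 1.0)                 # seed derivative dx/dx = 1
y = x * sin(x) + 3 * x
print(y.val, y.der)                # f(2) and f'(2) = sin(2) + 2*cos(2) + 3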
Journal of Physics: Condensed Matter | 2014
Andreas Marek; Volker Blum; Rainer Johanni; Ville Havu; Bruno Lang; Thomas Auckenthaler; Alexander Heinecke; Hans-Joachim Bungartz; Hermann Lederer
Obtaining the eigenvalues and eigenvectors of large matrices is a key problem in electronic structure theory and many other areas of computational science. The computational effort formally scales as O(N^3) with the size of the investigated problem, N (e.g. the electron count in electronic structure theory), and thus often defines the system size limit that practical calculations cannot overcome. In many cases, more than just a small fraction of the possible eigenvalue/eigenvector pairs is needed, so that iterative solution strategies that focus only on a few eigenvalues become ineffective. Likewise, it is not always desirable or practical to circumvent the eigenvalue solution entirely. We here review some current developments regarding dense eigenvalue solvers and then focus on the Eigenvalue soLvers for Petascale Applications (ELPA) library, which facilitates the efficient algebraic solution of symmetric and Hermitian eigenvalue problems for dense matrices that have real-valued and complex-valued matrix entries, respectively, on parallel computer platforms. ELPA addresses standard as well as generalized eigenvalue problems, relying on the well-documented matrix layout of the Scalable Linear Algebra PACKage (ScaLAPACK) library but replacing all actual parallel solution steps with subroutines of its own. For these steps, ELPA significantly outperforms the corresponding ScaLAPACK routines and proprietary libraries that implement the ScaLAPACK interface (e.g. Intel's MKL). The most time-critical step is the reduction of the matrix to tridiagonal form and the corresponding back transformation of the eigenvectors. ELPA offers both a one-step tridiagonalization (successive Householder transformations) and a two-step transformation that is more efficient especially for larger matrices and larger numbers of CPU cores. ELPA is based on the MPI standard, with an early hybrid MPI-OpenMP implementation available as well. Scalability beyond 10,000 CPU cores for problem sizes arising in the field of electronic structure theory is demonstrated for current high-performance computer architectures such as Cray or Intel/Infiniband. For a matrix of dimension 260,000, scalability up to 295,000 CPU cores has been shown on BlueGene/P.
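The generalized problems mentioned above are commonly reduced to standard form via a Cholesky factor of the overlap matrix. The sketch below shows that textbook transformation with SciPy on a single node; it illustrates the mathematics only, not ELPA's distributed implementation of these steps.

import numpy as np
from scipy.linalg import cholesky, solve_triangular, eigh

rng = np.random.default_rng(2)
n = 200
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
B = rng.standard_normal((n, n)); B = B @ B.T + n * np.eye(n)   # s.p.d. overlap matrix

# B = L L^T ;  A x = lambda B x  <=>  (L^-1 A L^-T) y = lambda y,  with x = L^-T y
L = cholesky(B, lower=True)
C = solve_triangular(L, solve_triangular(L, A, lower=True).T, lower=True)
C = (C + C.T) / 2                     # re-symmetrize against rounding

w, Y = eigh(C)                        # standard symmetric eigenproblem
X = solve_triangular(L, Y, lower=True, trans='T')   # back-transform: x = L^-T y

print(np.max(np.abs(A @ X - (B @ X) * w)))          # residual check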
SIAM Journal on Scientific Computing | 1993
Bruno Lang
An algorithm is presented for reducing symmetric banded matrices to tridiagonal form via Householder transformations. The algorithm is numerically stable and is well suited to parallel execution on distributed memory multiple instruction multiple data (MIMD) computers. Numerical experiments on the iPSC/860 hypercube show that the new method yields nearly full speedup when run on multiple processors. In addition, even on a single processor the new method is usually several times faster than the corresponding EISPACK and LAPACK routines.
Parallel Computing | 1999
Bruno Lang
We describe two techniques for speeding up eigenvalue and singular value computations on shared memory parallel computers. Depending on the information that is required, different steps in the overall process can be made more efficient. If only the eigenvalues or singular values are sought, then the reduction to condensed form may be done in two or more steps to make best use of optimized level-3 BLAS. If eigenvectors and/or singular vectors are also required, then their accumulation can be sped up by another blocking technique. The efficiency of the blocked algorithms depends heavily on the values of certain control parameters. We also present a very simple performance model that allows selecting these parameters automatically.
SIAM Journal on Numerical Analysis | 2005
Andreas Frommer; Bruno Lang
We show how interval arithmetic can be used in connection with Borsuk's theorem to computationally prove the existence of a solution of a system of nonlinear equations. It turns out that this new test, which can be checked computationally in several different ways, is more general than an existing test based on Miranda's theorem in the sense that it is successful for a larger set of situations. A numerical example is included.
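For orientation, the sketch below implements the simpler Miranda-type check that the Borsuk-based test generalizes: each component f_i is evaluated with interval arithmetic over the two opposite faces x_i = a_i and x_i = b_i of a box, and the resulting enclosures must have opposite (weak) signs. The Interval class is a toy stand-in for a real interval library (no directed rounding), the test system is made up for illustration, and only one sign orientation is checked.

class Interval:
    def __init__(self, lo, hi=None):
        self.lo, self.hi = lo, (lo if hi is None else hi)
    def __add__(self, o):
        o = o if isinstance(o, Interval) else Interval(o)
        return Interval(self.lo + o.lo, self.hi + o.hi)
    __radd__ = __add__
    def __neg__(self):
        return Interval(-self.hi, -self.lo)
    def __sub__(self, o):
        return self + (-(o if isinstance(o, Interval) else Interval(o)))
    def __mul__(self, o):
        o = o if isinstance(o, Interval) else Interval(o)
        p = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
        return Interval(min(p), max(p))
    __rmul__ = __mul__

def f(x, y):                      # simple test system with a zero at the origin
    return (x - 0.25 * y * y, y - 0.25 * x * x)

def miranda_test(box):
    """box = [(a1, b1), (a2, b2)]; True if the sign conditions hold on all faces."""
    for i, (a, b) in enumerate(box):
        lo_face = [Interval(a) if j == i else Interval(*box[j]) for j in range(len(box))]
        hi_face = [Interval(b) if j == i else Interval(*box[j]) for j in range(len(box))]
        if not (f(*lo_face)[i].hi <= 0 <= f(*hi_face)[i].lo):
            return False
    return True                   # a zero of f is guaranteed to exist in the box

print(miranda_test([(-0.5, 0.5), (-0.5, 0.5)]))   # True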
SIAM Review | 2007
Martin Mönnigmann; Wolfgang Marquardt; Christian H. Bischof; Thomas Beelitz; Bruno Lang; Paul R. Willems
We propose a novel approach for the parametrically robust design of dynamic systems. The approach can be applied to system models with parameters that are uncertain in the sense that values for these parameters are not known precisely, but only within certain bounds. The novel approach is guaranteed to find an optimal steady state that is stable for each parameter combination within these bounds. Our approach combines the use of a standard solver for constrained optimization problems with the rigorous solution of nonlinear systems. The constraints for the optimization problems are based on the concept of parameter space normal vectors that measure the distance of a tentative optimum to the nearest known critical point, i.e., a point where stability may be lost. Such normal vectors are derived using methods from nonlinear dynamics. After the optimization, the rigorous solver is used to provide a guarantee that no critical points exist in the vicinity of the optimum, or to detect such points. In the latter case, the optimization is resumed, taking the newly found critical points into account. This optimize-and-verify procedure is repeated until the rigorous nonlinear solver can guarantee that the vicinity of the optimum is free from critical points and therefore the optimum is parametrically robust. In contrast to existing design methodologies, our approach can be automated and does not rely on the experience of the designing engineer. A simple model of a fermenter is used to illustrate the concepts and the order of activities arising in a typical design process.
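The control flow of the optimize-and-verify procedure can be summarized in a short skeleton. The helper functions solve_constrained_nlp, find_critical_points_nearby, and normal_vector_constraint below are hypothetical placeholders for the optimizer, the rigorous nonlinear solver, and the normal-vector construction referred to above; only the loop structure is illustrated.

def robust_design(model, parameter_box, known_critical_points=()):
    critical_points = list(known_critical_points)
    while True:
        # 1) Optimize subject to a minimum distance (measured along parameter
        #    space normal vectors) from every critical point found so far.
        constraints = [normal_vector_constraint(model, c, parameter_box)
                       for c in critical_points]
        candidate = solve_constrained_nlp(model, constraints)

        # 2) Rigorously check whether further critical points exist within
        #    the uncertainty region around the candidate optimum.
        new_points = find_critical_points_nearby(model, candidate, parameter_box)
        if not new_points:
            return candidate                  # candidate is parametrically robust
        critical_points.extend(new_points)    # otherwise refine and re-optimize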
SIAM Journal on Scientific Computing | 1998
Bruno Lang
This paper presents a technique that allows using level-3 BLAS in a number of rotation-based algorithms. In particular, the update of an orthogonal transformation matrix, which often involves the vast majority of operations, can be done with a matrix-matrix product. As a case study, the technique is applied to the QR and QL algorithms for computing the eigensystem of a symmetric tridiagonal matrix. The modifications do not affect the convergence properties of the algorithms nor do they significantly increase the overall number of operations. Thus, the computations can be sped up by more than 50% on machines with a distinct memory hierarchy, like the Intel i860 or IBM RS/6000, provided the block size is set appropriately. We also present a simple theoretical analysis that allows selecting an almost-optimal block size.
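The blocking idea can be sketched as follows: a sweep of Givens rotations acting on consecutive column pairs of the accumulated orthogonal matrix Q is grouped, each group is multiplied into a small dense orthogonal factor, and that factor is applied to the corresponding column block of Q with a single matrix-matrix product instead of many 2-column updates. The rotation angles below are random; in the QR/QL algorithm they would come from one implicit shift step.

import numpy as np

def apply_sweep_blocked(Q, c, s, nb):
    """Apply rotations G_0..G_{n-2} (G_i mixes columns i, i+1) to Q in blocks of nb."""
    n = Q.shape[1]
    i = 0
    while i < n - 1:
        m = min(nb, n - 1 - i)               # rotations i .. i+m-1 touch columns i .. i+m
        W = np.eye(m + 1)
        for k in range(m):                   # accumulate the small orthogonal factor
            c_k, s_k = c[i + k], s[i + k]
            W[:, k:k+2] = W[:, k:k+2] @ np.array([[c_k, -s_k], [s_k, c_k]])
        Q[:, i:i+m+1] = Q[:, i:i+m+1] @ W    # one matrix-matrix product per block
        i += m

rng = np.random.default_rng(3)
n, nb = 1000, 32
theta = rng.uniform(0, 2 * np.pi, n - 1)
c, s = np.cos(theta), np.sin(theta)

# Reference: apply the rotations one column pair at a time.
Q_ref = np.eye(n)
for i in range(n - 1):
    G = np.array([[c[i], -s[i]], [s[i], c[i]]])
    Q_ref[:, i:i+2] = Q_ref[:, i:i+2] @ G

Q_blk = np.eye(n)
apply_sweep_blocked(Q_blk, c, s, nb)
print(np.max(np.abs(Q_blk - Q_ref)))         # agreement to machine precision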