Obtaining the eigenvalues and eigenvectors of large matrices is a key problem in electronic structure theory and many other areas of computational science. The computational effort formally scales as O(N³) with the size N of the investigated problem and thus often defines the system-size limit that practical calculations cannot overcome. In many cases, more than just a small fraction of the possible eigenvalue/eigenvector pairs is needed, so that iterative solution strategies that focus on only a few eigenvalues become ineffective. Likewise, it is not always desirable or practical to circumvent the eigenvalue solution entirely. We here review some current developments regarding dense eigenvalue solvers and then focus on the ELPA library, which facilitates the efficient algebraic solution of symmetric and Hermitian eigenvalue problems for dense matrices with real-valued and complex-valued entries, respectively, on parallel computer platforms. ELPA addresses standard as well as generalized eigenvalue problems, relying on the well-documented matrix layout of the ScaLAPACK library but replacing all actual parallel solution steps with subroutines of its own. The most time-critical step is the reduction of the matrix to tridiagonal form and the corresponding back-transformation of the eigenvectors. ELPA offers both a one-step tridiagonalization (successive Householder transformations) and a two-step transformation that is more efficient, especially for larger matrices and larger numbers of CPU cores. ELPA is based on the MPI standard, with an early hybrid MPI-OpenMP implementation available as well. Scalability beyond 10,000 CPU cores for problem sizes arising in electronic structure theory is demonstrated for current high-performance computer architectures such as Cray or Intel/Infiniband. For a matrix of dimension 260,000, scalability up to 295,000 CPU cores has been shown on BlueGene/P.
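As an illustration of what such a library computes, the following sketch shows the textbook reduction of a generalized symmetric eigenproblem A x = λ B x to standard form via a Cholesky factor of B, followed by back-transformation of the eigenvectors. The SciPy routines merely stand in for the distributed-memory (ScaLAPACK-layout) steps that ELPA performs; this is a serial sketch, not ELPA's API.

```python
# Serial sketch: generalized symmetric eigenproblem A x = lambda B x reduced
# to a standard one via B = L L^T, then eigenvectors back-transformed.
import numpy as np
from scipy.linalg import cholesky, eigh, solve_triangular

def generalized_symmetric_eig(A, B):
    """Solve A x = lambda B x for symmetric A and s.p.d. B (illustration only)."""
    L = cholesky(B, lower=True)                 # B = L L^T
    Y = solve_triangular(L, A, lower=True)      # Y = L^{-1} A
    C = solve_triangular(L, Y.T, lower=True).T  # C = L^{-1} A L^{-T}
    C = 0.5 * (C + C.T)                         # re-symmetrize against round-off
    w, Z = eigh(C)                              # standard symmetric eigenproblem
    X = solve_triangular(L.T, Z, lower=False)   # back-transform: x = L^{-T} z
    return w, X

rng = np.random.default_rng(0)
M = rng.standard_normal((200, 200))
A = 0.5 * (M + M.T)
B = M @ M.T + 200 * np.eye(200)                 # well-conditioned s.p.d. B
w, X = generalized_symmetric_eig(A, B)
print(np.allclose(A @ X, (B @ X) * w, atol=1e-8))   # residual check: True
```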
In this paper, we generalize the ideas behind the RS algorithms and the MHL algorithm. We develop a band reduction algorithm that eliminates d subdiagonals of a symmetric banded matrix with semibandwidth b (d < b), in a fashion akin to the MHL tridiagonalization algorithm. Then, like the Rutishauser algorithm, the band reduction algorithm is repeatedly applied until the reduced matrix is tridiagonal. If d = b − 1, ...
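A minimal sketch of the repeated-sweep idea (scheduling only, no numerics); the per-sweep choice of d below is an arbitrary illustration, not the strategy of the paper:

```python
# Each sweep removes d subdiagonals (d < b), so the semibandwidth shrinks
# from b to b - d; sweeps are repeated until the matrix is tridiagonal (b = 1).
def bandwidth_schedule(b, choose_d):
    """Semibandwidths visited when repeating 'remove d subdiagonals' sweeps."""
    schedule = [b]
    while b > 1:
        d = min(choose_d(b), b - 1)   # cannot remove more than b - 1 subdiagonals
        b = b - d                     # effect of one band-reduction sweep
        schedule.append(b)
    return schedule

# Halving the bandwidth in every sweep (one possible choice, for illustration):
print(bandwidth_schedule(32, lambda b: max(b // 2, 1)))   # [32, 16, 8, 4, 2, 1]
```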
Most methods for calculating the SVD (singular value decomposition) require first bidiagonalizing the matrix. The blocked reduction of a general, dense matrix to bidiagonal form, as implemented in ScaLAPACK, performs about one half of the operations with BLAS-3. By subdividing the reduction into two stages, dense → banded and banded → bidiagonal, with cubic and quadratic arithmetic costs, respectively, we are able to carry out a much higher portion of the calculations in matrix-matrix multiplications. Thus, higher performance can be expected. This paper presents and compares three parallel techniques for reducing a full matrix to banded form. (The second reduction stage is described in another paper [B. Lang, Parallel Comput. 22 (1996) 1-18].) Numerical experiments on the Intel Paragon and IBM SP/1 distributed-memory parallel computers demonstrate that the two-stage reduction approach can be significantly superior if only the singular values are required. This work was partially funded by Deutsche Forschungsgemeinschaft, Geschäftszeichen Fr 755/6-1 and Fr 755/6-2.
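The two-stage reduction can be summarized by the following factorization chain; the symbols Q1, P1, C, Q2, P2, J are chosen here for illustration and are not taken from the paper.

```latex
% Stage 1 (cubic cost, rich in matrix-matrix products): dense -> banded.
% Stage 2 (quadratic cost): banded -> bidiagonal.  Singular values are
% preserved throughout, so if only singular values are needed, the
% orthogonal factors need not be accumulated.
\[
\begin{aligned}
A &= Q_1\, C\, P_1^{\mathsf T}, && C \ \text{banded with semibandwidth } b,\\
C &= Q_2\, J\, P_2^{\mathsf T}, && J \ \text{bidiagonal},\\
A &= (Q_1 Q_2)\, J\, (P_1 P_2)^{\mathsf T}, && \sigma_i(A) = \sigma_i(J).
\end{aligned}
\]
```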
... Institute for Scientific Computing, Aachen University of Technology, D-52056 Aachen, Germany, {buecker, lang, rasch, bischof}@sc.rwth-aachen.de, http://www.sc ... In particular, Matthias Meinke and Ehab Fares deserve special recognition for their help with the RAE 2822 airfoil. ...
Numerical simulation is a powerful tool in science and engineering, and it is also used for optimizing the design of products and experiments rather than only for reproducing the behavior of scientific and engineering systems. In order to reduce the number of simulation runs, the traditional "trial and error" approach for finding near-optimum design parameters is increasingly being replaced by efficient numerical optimization algorithms. Done by hand, the coupling of simulation and optimization software is tedious and error-prone. In this note we introduce a software environment called EFCOSS (Environment For Combining Optimization and Simulation Software) that facilitates and speeds up this task by doing much of the required work automatically. Our framework includes support for automatic differentiation, providing the derivatives required by many optimization algorithms. We describe the process of integrating the widely used computational fluid dynamics package FLUENT and a MINPACK-1 least-squares optimizer into EFCOSS, and we follow a sample session solving a data assimilation problem.
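A minimal sketch of the coupling pattern that EFCOSS automates, under the assumption that the simulator is wrapped as a residual function of the design parameters. The toy model below stands in for FLUENT, its hand-coded Jacobian plays the role of AD-generated derivative code, and scipy.optimize.leastsq (a wrapper around MINPACK's Levenberg-Marquardt routines) stands in for the MINPACK-1 optimizer.

```python
# Data-assimilation flavoured example: fit model parameters p so that the
# simulated output matches synthetic "measurements".
import numpy as np
from scipy.optimize import leastsq

t_obs = np.linspace(0.0, 3.0, 40)
p_true = np.array([2.0, 0.7])
y_obs = p_true[0] * np.exp(-p_true[1] * t_obs)       # synthetic observations

def simulate(p):                                      # toy simulator stand-in
    return p[0] * np.exp(-p[1] * t_obs)

def residual(p):                                      # objective: ||residual||^2
    return simulate(p) - y_obs

def jacobian(p):                                      # role of AD-generated code
    e = np.exp(-p[1] * t_obs)
    return np.column_stack((e, -p[0] * t_obs * e))

p_fit, _ = leastsq(residual, x0=np.array([1.0, 1.0]), Dfun=jacobian)
print(p_fit)                                          # recovers approx. [2.0, 0.7]
```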
Proceedings of the 2003 ACM symposium on Applied computing - SAC '03, 2003
For functions given in the form of a computer program, automatic differentiation is an efficient technique to accurately evaluate the derivatives of that function. Starting from a given computer program, automatic differentiation generates another program for the ...
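To illustrate the underlying idea (not the specific tool of the paper), a minimal forward-mode example using operator overloading: value/derivative pairs are propagated through the same operations the original program performs, so the derivative is exact up to round-off rather than approximated.

```python
import math

class Dual:
    """Value together with its derivative with respect to one chosen input."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def dsin(x):                       # elemental function with its derivative rule
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

def f(x):                          # the "given computer program"
    return 3 * x * x + dsin(x)

x = Dual(1.5, 1.0)                 # seed dx/dx = 1
y = f(x)
print(y.val, y.dot)                # f(1.5) and f'(1.5) = 6*1.5 + cos(1.5)
```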
This work deals with aspects of support vector learning for large-scale data mining tasks. Based on a decomposition algorithm that can be run in serial and parallel mode, we introduce a data transformation that allows for the usage of an expensive generalized kernel without additional costs. In order to speed up the decomposition algorithm, we analyze the problem of working set selection for large data sets and study the influence of the working set sizes on the scalability of the parallel decomposition scheme. Our modifications and settings lead to improved support vector learning performance and thus allow using extensive parameter search methods to optimize classification accuracy.
Derivatives are ubiquitous in various areas of computational science, including sensitivity analysis and parameter optimization of computer models. Among the various methods for obtaining derivatives, automatic differentiation (AD) combines freedom from approximation errors, high performance, and the ability to handle arbitrarily complex codes arising from large-scale scientific investigations. In this note, we show how AD technology can aid in the ...
Australasian Conference on Knowledge Discovery and Data Mining, 2006
The support vector machine (SVM) is a well-established and accurate supervised learning method for the classification of data in various application fields. The statistical learning task, the so-called training, can be formulated as a quadratic optimization problem. During the last years the decomposition algorithm for solving this optimization problem became the most frequently used ...
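For reference, a standard form of the quadratic optimization problem mentioned above (the soft-margin SVM dual, in the usual notation; not quoted from the paper) is:

```latex
\[
\max_{\alpha \in \mathbb{R}^n} \;\;
  \sum_{i=1}^{n} \alpha_i
  \;-\; \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
        \alpha_i \alpha_j\, y_i y_j\, k(x_i, x_j)
\quad \text{subject to} \quad
  0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0,
\]
% where (x_i, y_i) are the training examples with labels y_i in {-1, +1},
% k(.,.) is the kernel, and C is the regularization parameter.  A decomposition
% algorithm optimizes over a small "working set" of the alpha_i at a time,
% keeping the remaining variables fixed.
```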
Proceedings. Second IEEE International Workshop on Source Code Analysis and Manipulation, 2002
... by x = A\b. Furthermore, all operators exist in a "pointwise" version; for instance, the pointwise vector multiplication u = v.*w produces a vector u whose components are equal to the product of the corresponding components in v and w. Note that the MATLAB grammar is context ...
Derivatives of almost arbitrary functions can be evaluated efficiently by automatic differentiation whenever the functions are given in the form of computer programs in a high-level programming language such as Fortran, C, or C++. In contrast to numerical differentiation, where derivatives are only approximated, automatic differentiation generates derivatives that are accurate up to machine precision. Sophisticated software tools implementing the technology of automatic differentiation are capable of automatically generating code for the product of the Jacobian matrix and a so-called seed matrix. It is shown how these tools can benefit from concepts of shared memory programming to parallelize, in a completely mechanical fashion, the gradient operations associated with each statement of the given code. The feasibility of our approach is demonstrated by numerical experiments. They were performed with a code that was generated automatically by the Adifor system and augmented with OpenMP ...
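A sketch of the statement-level view taken here: each assignment of the original code is followed by a "gradient operation" that updates a vector of p directional derivatives, one per column of the seed matrix, and these length-p vector operations are what can be distributed across threads. NumPy stands in for the Fortran/OpenMP code generated by Adifor; the variable names are illustrative only.

```python
import numpy as np

p = 4                                   # number of seed directions (columns of S)
x1, x2 = 1.5, -0.5
g_x1 = np.array([1.0, 0.0, 0.0, 0.0])   # row of the seed matrix S belonging to x1
g_x2 = np.array([0.0, 1.0, 0.0, 0.0])   # row of the seed matrix S belonging to x2

# original statement:   t = x1 * x2
# gradient operation:   g_t = x2 * g_x1 + x1 * g_x2   (length-p vector op)
t   = x1 * x2
g_t = x2 * g_x1 + x1 * g_x2

# original statement:   y = sin(t)
# gradient operation:   g_y = cos(t) * g_t
y   = np.sin(t)
g_y = np.cos(t) * g_t

print(y, g_y)    # derivatives of y w.r.t. x1 and x2 sit in the first two slots
```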
Existence or fixed point theorems, combined with interval analytic methods, provide a means to computationally prove the existence of a zero of a nonlinear system in a given interval vector. One such test is based on Borsuk's existence theorem. We discuss preconditioning techniques that are aimed at improving the effectiveness of this test.
The relatively robust representations (RRR) algorithm computes the eigendecomposition of a symmetric tridiagonal matrix T with O(n²) complexity. This article discusses how this method can be extended to the bidiagonal SVD B = UΣVᵀ. It turns out that using the RRR algorithm as a black box to compute BᵀB = VΣ²Vᵀ and BBᵀ = UΣ²Uᵀ separately may give poor results for ∥BV − UΣ∥. The use of the standard Jordan-Wielandt representation can fail as well if clusters of tiny singular values are present. A solution is to work on BᵀB and to keep factorizations of BBᵀ implicitly. We introduce a set of coupling transformations which allow us to replace the representation by a more stable one involving a diagonal matrix. Numerical results of our implementation are compared with the LAPACK routines DSTEGR, DBDSQR and DBDSDC.
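For reference, the Jordan-Wielandt (Golub-Kahan) embedding referred to above couples the two factors of the SVD; the formulation below is the standard textbook one and is not quoted from the paper.

```latex
% For the n x n bidiagonal B with SVD B = U \Sigma V^T, the symmetric matrix
\[
T_{\mathrm{GK}} \;=\;
\begin{pmatrix} 0 & B \\ B^{\mathsf T} & 0 \end{pmatrix},
\qquad
T_{\mathrm{GK}}
\begin{pmatrix} u_i \\ \pm v_i \end{pmatrix}
\;=\; \pm\,\sigma_i
\begin{pmatrix} u_i \\ \pm v_i \end{pmatrix},
\]
% so every singular value sigma_i of B yields the eigenvalue pair +/- sigma_i
% of T_GK.  Clusters of tiny singular values therefore become tight eigenvalue
% clusters around zero, which is the failure mode mentioned in the abstract.
```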
Many different heuristics have been proposed for selecting the subdivision direction in branch-and-bound result-verifying nonlinear solvers. We investigate the impact of the box-splitting techniques on the overall performance of the solver and propose a new approach combining some of the simple heuristics in a hybrid way. Numerical experiments with medium-sized example problems indicate that our approach is successful.
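Two of the classical direction-selection heuristics can be sketched as follows (maximum width and maximum smear); the hybrid strategy proposed in the paper is not reproduced here, and the derivative bounds are assumed to come from interval evaluation of the Jacobian.

```python
import numpy as np

def split_direction_max_width(box):
    """box[i] = [lo_i, hi_i]; split the currently widest coordinate interval."""
    width = box[:, 1] - box[:, 0]
    return int(np.argmax(width))

def split_direction_max_smear(box, dfdx_bound):
    """Split where (derivative bound) * (interval width) is largest."""
    width = box[:, 1] - box[:, 0]
    smear = (dfdx_bound * width).max(axis=0)   # worst component function per variable
    return int(np.argmax(smear))

box = np.array([[-1.0, 1.0], [0.0, 0.5], [2.0, 5.0]])
dfdx_bound = np.array([[0.1, 8.0, 0.2],            # bounds on |df_1/dx_i| over the box
                       [0.3, 1.0, 0.4]])           # bounds on |df_2/dx_i| over the box
print(split_direction_max_width(box))              # 2: widest interval
print(split_direction_max_smear(box, dfdx_bound))  # 1: large derivative * width
```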
Derivative information is required in numerous applications, including sensitivity analysis and numerical optimization. For simple functions, symbolic differentiation, done either manually or with a computer algebra system, can provide the derivatives, whereas divided differences (DD) have traditionally been used for functions defined by (potentially very complex) computer programs, even if only approximate values can be obtained this way. An alternative approach for such functions is automatic differentiation (AD), yielding exact derivatives at often lower cost than DD, and without restrictions on the program complexity. In this paper we compare the functionality and describe the use of ADMIT/ADMAT and ADiMat. These two AD tools provide derivatives for programs written in the MATLAB language, which is widely used for prototype and production software in scientific and engineering applications. While ADMIT/ADMAT implements a pure operator overloading approach to AD, ADiMat also employs source transformation techniques.
Proceedings of IEEE Scalable High Performance Computing Conference, 1994
We present a two-step variant of the "successive band reduction" paradigm for the tridiagonalization of symmetric matrices. Here we reduce a full matrix first to narrow-banded form and then to tridiagonal form. The first step allows easy exploitation of block orthogonal transformations. In the second step, we employ a new blocked version of a banded matrix tridiagonalization algorithm by Lang. In particular, we are able to express the update of the orthogonal transformation matrix in terms of block transformations. This expression leads to an algorithm that is almost entirely based on BLAS-3 kernels and has greatly improved data movement and communication characteristics. We also present some performance results on the Intel Touchstone DELTA and the IBM SP1.
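The "block orthogonal transformations" mentioned above are typically realized with the compact WY representation; a generic statement of that standard device (not quoted from the paper) is:

```latex
% A product of k Householder reflectors can be aggregated as
\[
Q \;=\; H_1 H_2 \cdots H_k \;=\; I \;-\; V\, T\, V^{\mathsf T},
\]
% with V in R^{n x k} holding the Householder vectors and T in R^{k x k}
% upper triangular.  Applying the block transformation to a matrix A,
\[
Q^{\mathsf T} A \;=\; A \;-\; V\, T^{\mathsf T}\, (V^{\mathsf T} A),
\]
% costs a few matrix-matrix (BLAS-3) products instead of k separate rank-1
% updates, which is the source of the improved data movement.
```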