Jan Hückelheim
Imperial College London
Publications
Featured research published by Jan Hückelheim.
ACM Transactions on Mathematical Software | 2017
Jan Hückelheim; Laurent Hascoët; Jens-Dominik Müller
Algorithmic differentiation (AD) by source transformation is an established method for computing derivatives of computational algorithms. Static dataflow analysis is commonly used by AD tools to determine the set of active variables, that is, variables that are influenced by the program input in a differentiable way and have a differentiable influence on the program output. In this work, a context-sensitive static analysis combined with procedure cloning is used to generate specialised versions of differentiated procedures for each call site. This enables better detection and elimination of unused computations and memory storage, resulting in performance improvements of the generated code, in both forward- and reverse-mode AD. The implications of this multi-activity AD approach for the static analysis of an AD tool are shown using dataflow equations. The worst-case cost of multi-activity AD on the differentiation process is analysed and practical remedies to avoid running into this worst case are presented. The method was implemented in the AD tool Tapenade, and we present its application to a 3D unstructured compressible flow solver, for which we generate an adjoint solver that performs significantly faster when multi-activity AD is used.
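As a rough illustration of the idea, and not Tapenade's implementation, the sketch below models a procedure by a hypothetical dependency map from each output to the inputs it depends on differentiably. For every call site it computes the activity pattern from the inputs that are varied and the outputs that are useful in that context, and it creates one specialised clone per distinct pattern. A context-insensitive analysis would instead use a single version whose active sets are the union over all call sites. The procedure `f`, the call-site names, and the clone naming scheme are all assumptions.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical procedure f(a, b) -> (x, y): each output maps to the inputs
# it depends on in a differentiable way.
DEPS: Dict[str, Set[str]] = {"x": {"a"}, "y": {"a", "b"}}

Pattern = Tuple[FrozenSet[str], FrozenSet[str]]


def activity(deps: Dict[str, Set[str]], varied: Set[str], useful: Set[str]) -> Pattern:
    """Active inputs and outputs for one calling context.

    An input is active if it is varied at the call site and influences a
    useful output; an output is active if it is useful and depends on a
    varied input.
    """
    active_out = frozenset(o for o in useful if deps[o] & varied)
    active_in = frozenset(i for i in varied if any(i in deps[o] for o in useful))
    return active_in, active_out


def specialise(deps, call_sites):
    """Map each call site to a clone, creating one clone per distinct pattern."""
    clones: Dict[Pattern, str] = {}
    assignment: Dict[str, str] = {}
    for site, (varied, useful) in call_sites.items():
        pattern = activity(deps, set(varied), set(useful))
        if pattern not in clones:
            clones[pattern] = f"f_d{len(clones)}"  # hypothetical clone name
        assignment[site] = clones[pattern]
    return clones, assignment


# Hypothetical call sites: (varied inputs, useful outputs) in each context.
CALL_SITES = {
    "site1": ({"a"}, {"x"}),
    "site2": ({"a", "b"}, {"y"}),
    "site3": ({"a"}, {"x"}),
}

if __name__ == "__main__":
    clones, assignment = specialise(DEPS, CALL_SITES)
    for pattern, name in clones.items():
        print(name, sorted(pattern[0]), "->", sorted(pattern[1]))
    print(assignment)
```

Here `site1` and `site3` share a clone in which only `a` and `x` are active, whereas a single merged version would have to treat `a`, `b`, `x` and `y` as active at every call site.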
Proceedings of the First International Workshop on Software Correctness for HPC Applications | 2017
Markus Schordan; Jan Hückelheim; Pei-Hung Lin; Harshitha Menon
The semantics of floating-point computations are known to be difficult to verify. Software verification tools often provide little or no support for floating-point semantics, making it difficult to prove the correctness of an optimized variant of a program involving floating-point computations. In this paper we present an approach for verifying the equivalence of two program variants involving non-trivial floating-point operations. The test case selected for our approach consists of two variants of a differentiated code: one automatically generated, the other manually written. Because the verification technique operates at the source level, we also investigate the generated assembly code variants and reason about a set of selected compiler options and architectures to guarantee that the correctness proof also holds for the generated binaries.
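A minimal example, not taken from the paper, shows why such equivalence cannot simply be assumed: two variants that are equal in real arithmetic, here differing only in the order of additions, need not produce bit-identical floating-point results. A proof must therefore either pin down the evaluation order or reason about rounding explicitly. The functions `variant_a` and `variant_b` are hypothetical.

```python
def variant_a(x: float, y: float, z: float) -> float:
    # Left-to-right summation: (x + y) + z
    return (x + y) + z


def variant_b(x: float, y: float, z: float) -> float:
    # Reassociated summation: x + (y + z), equal to variant_a over the reals
    return x + (y + z)


if __name__ == "__main__":
    a = variant_a(0.1, 0.2, 0.3)
    b = variant_b(0.1, 0.2, 0.3)
    print(a, b, a == b)  # IEEE 754 double precision: 0.6000000000000001 0.6 False
```

This is also why compiler options matter for such a proof: options that permit reassociation (for example GCC's `-ffast-math`) can change which of these two forms the binary actually computes.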
Proceedings of the First International Workshop on Software Correctness for HPC Applications | 2017
Jan Hückelheim; Ziqing Luo; Fabio Luporini; Navjot Kukreja; Gerard J. Gorman; Stephen F. Siegel; Matthew B. Dwyer; Paul D. Hovland
Code generation from domain-specific languages is becoming increasingly popular as a method to obtain optimised low-level code that performs well on a given platform and for a given problem instance. Ensuring the correctness of generated code is crucial. At the same time, testing or manual inspection of the code is problematic, as the generated code can be complex and hard to read. Moreover, the generated code may change depending on the problem type, domain size, or target platform, making conventional code review or testing methods impractical. As a solution, we propose the integration of formal verification tools into the code generation process. We present a case study in which the CIVL verification tool is combined with the Devito finite difference framework, which generates optimised stencil code for PDE solvers from symbolic equations. We show a selection of properties of the generated code that can be automatically specified and verified during the code generation process. Our approach allowed us to detect a previously unknown bug in the Devito code generation tool.
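The sketch below illustrates the general idea of checking generated code as part of generation, with two substitutions to keep in mind: plain bounded testing over small domain sizes stands in for CIVL's symbolic verification, and a toy generator that emits a tiled 1D three-point stencil stands in for Devito. `generate_tiled_stencil`, its `bug` flag, and the test inputs are hypothetical.

```python
from typing import Callable, List

Stencil = Callable[[List[float]], List[float]]


def reference_stencil(u: List[float]) -> List[float]:
    """Untiled reference: out[i] = u[i-1] - 2 u[i] + u[i+1] on interior points."""
    n = len(u)
    out = [0.0] * n
    for i in range(1, n - 1):
        out[i] = u[i - 1] - 2.0 * u[i] + u[i + 1]
    return out


def generate_tiled_stencil(tile: int, bug: bool = False) -> Stencil:
    """Toy code generator: emits and compiles a tiled version of the stencil.

    With bug=True it emits an off-by-one loop bound, as a generator defect might.
    """
    upper = "n - 2" if bug else "n - 1"
    src = f"""
def tiled(u):
    n = len(u)
    out = [0.0] * n
    for start in range(1, n - 1, {tile}):
        for i in range(start, min(start + {tile}, {upper})):
            out[i] = u[i - 1] - 2.0 * u[i] + u[i + 1]
    return out
"""
    namespace: dict = {}
    exec(src, namespace)  # compile the generated source
    return namespace["tiled"]


def check_generated(tile: int, bug: bool = False, max_n: int = 12) -> bool:
    """Bounded check: compare generated and reference code for every n <= max_n."""
    tiled = generate_tiled_stencil(tile, bug)
    for n in range(max_n + 1):
        u = [float((7 * i) % 5) for i in range(n)]
        if tiled(u) != reference_stencil(u):
            return False
    return True


if __name__ == "__main__":
    print(check_generated(tile=4))            # True
    print(check_generated(tile=4, bug=True))  # False: the bound drops a point
```

Running the check inside the generator, for each tile size it is asked to emit, is what catches the defective loop bound before the code is ever used; a symbolic tool such as CIVL can establish the same kind of property without enumerating inputs.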
International Journal of High Performance Computing Applications | 2017
Jan Hückelheim; Paul D. Hovland; Michelle Mills Strout; Jens-Dominik Müller
Reverse-mode algorithmic differentiation (AD) is an established method for obtaining adjoint derivatives of computer simulation applications. In computational fluid dynamics (CFD), adjoint derivatives of a cost function output such as drag or lift with respect to design parameters such as surface coordinates or geometry control points are a key ingredient for shape optimization, uncertainty quantification and flow control. The computational cost of CFD applications and their derivatives makes it essential to use high-performance computing hardware efficiently, including multi- and many-core architectures. However, most AD tools do not support OpenMP, and previously presented methods achieve poor scalability of the derivative code. We present the AD of an OpenMP-parallelized finite volume compressible flow solver for unstructured meshes. Our approach enables us to reuse the parallelization of the original code in the computation of adjoint derivatives. The method works by identifying code segments that can be differentiated in reverse mode without changing their memory access pattern. The OpenMP parallelization is integrated into the derivative code during the build process in a way that is robust to modifications of the original code and independent of the OpenMP support of the differentiation tool. We show the scalability of our adjoint CFD solver on test cases ranging from thousands to millions of finite volume mesh cells on CPUs with up to 16 threads as well as on an Intel Xeon Phi card with 236 threads. We demonstrate that our approach is more practical to implement for production-sized CFD codes and produces more efficient adjoint derivative code than previously shown AD methods.
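A minimal sketch of the underlying observation, assuming a purely pointwise loop in which iteration `i` reads only `u[i]` and writes only `r[i]`: its reverse-mode derivative reads `u[i]` and `rb[i]` and writes only `ub[i]`, so it can be split across threads with exactly the same disjoint index partition as the primal loop. The thread pool is only a stand-in for an OpenMP `parallel for` (CPython threads give no speedup for this arithmetic), and `parallel_for`, `residual` and `residual_adjoint` are hypothetical names. Loops whose adjoint would scatter into neighbouring entries do not have this property.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def parallel_for(n: int, body: Callable[[int], None], chunks: int = 4) -> None:
    """Run body(i) for i in range(n), split into disjoint contiguous chunks."""
    bounds = [(k * n // chunks, (k + 1) * n // chunks) for k in range(chunks)]

    def run(lo_hi):
        lo, hi = lo_hi
        for i in range(lo, hi):
            body(i)

    with ThreadPoolExecutor(max_workers=chunks) as pool:
        list(pool.map(run, bounds))  # list() re-raises any worker exception


C = 0.5  # hypothetical coefficient


def residual(u: List[float]) -> List[float]:
    """Primal pointwise loop: r[i] = C * u[i]**3 (reads u[i], writes r[i])."""
    r = [0.0] * len(u)

    def body(i: int) -> None:
        r[i] = C * u[i] ** 3

    parallel_for(len(u), body)
    return r


def residual_adjoint(u: List[float], rb: List[float]) -> List[float]:
    """Reverse mode of the loop above: ub[i] += dr[i]/du[i] * rb[i].

    Each iteration still touches only index i, so the primal partition is reused.
    """
    ub = [0.0] * len(u)

    def body(i: int) -> None:
        ub[i] += 3.0 * C * u[i] ** 2 * rb[i]

    parallel_for(len(u), body)
    return ub
```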
Archive | 2019
Jan Hückelheim; Jens-Dominik Müller
Gradient-based optimisation using adjoints is an increasingly common approach for industrial flow applications. For cases where the flow is largely unsteady, however, the adjoint method is still not widely used, in particular because of its prohibitive computational cost and memory footprint. Several methods have been proposed to reduce the peak memory usage, such as checkpointing schemes or checkpoint compression, at the price of increasing the computational cost even further. We investigate incomplete checkpointing as an alternative, which reduces memory usage at almost no extra computational cost, but instead offers a trade-off between memory footprint and the fidelity of the model. The method works by storing only selected physical time steps and using interpolation to reconstruct time steps that have not been stored. We show that this is enough to compute sufficiently accurate adjoint sensitivities for many relevant cases, and does not add significantly to the computational cost. The method works for general cases and does not require periodic cycles in the flow to be identified.
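The core of incomplete checkpointing can be sketched as follows, under assumptions that are ours rather than the paper's: the state at each physical time step is a short list of floats, every `stride`-th step (plus the final one) is stored, and missing steps are reconstructed by linear interpolation between the nearest stored neighbours. `store_incomplete` and `reconstruct` are hypothetical names, and the smooth test trajectory is synthetic.

```python
import bisect
import math
from typing import Dict, List

State = List[float]


def store_incomplete(states: List[State], stride: int) -> Dict[int, State]:
    """Keep every stride-th time step, plus the last one, during the forward sweep."""
    kept = {t: states[t] for t in range(0, len(states), stride)}
    kept[len(states) - 1] = states[-1]
    return kept


def reconstruct(kept: Dict[int, State], t: int) -> State:
    """Return the stored state at t, or linearly interpolate between neighbours."""
    if t in kept:
        return kept[t]
    times = sorted(kept)
    k = bisect.bisect_left(times, t)  # first stored time after t
    t0, t1 = times[k - 1], times[k]
    w = (t - t0) / (t1 - t0)
    return [(1.0 - w) * a + w * b for a, b in zip(kept[t0], kept[t1])]


if __name__ == "__main__":
    # Synthetic smooth "flow history": 101 time steps of a two-component state.
    states = [[math.sin(0.05 * t), math.cos(0.05 * t)] for t in range(101)]
    kept = store_incomplete(states, stride=5)
    # The adjoint sweep visits time steps in reverse order.
    err = max(
        abs(x - y)
        for t in reversed(range(len(states)))
        for x, y in zip(reconstruct(kept, t), states[t])
    )
    print(len(kept), "of", len(states), "steps stored; max error", err)
```

The `stride` parameter is the memory/fidelity knob: storing 21 of 101 steps costs nothing extra in recomputation, and the interpolation error grows with the gap between stored steps and with how fast the flow changes.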
Static Analysis Symposium | 2018
Jan Hückelheim; Ziqing Luo; Sri Hari Krishna Narayanan; Stephen F. Siegel; Paul D. Hovland
There is growing demand for formal verification methods in the scientific and high-performance computing communities. For scientific applications, it is not only necessary to verify the absence of violations such as out-of-bounds accesses or race conditions, but also to ensure that the results satisfy certain mathematical properties. In this work, we explore the limits of automated bounded verification for such programs by applying the symbolic execution tool CIVL to numerical algorithms that are frequently used in scientific programs, namely a conjugate gradient solver, a finite difference stencil, and a mesh quality metric. These algorithms implement differentiable functions, allowing us to use the automatic differentiation tools Tapenade and ADIC to create their specifications.
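To give a flavour of checking a mathematical property over a bounded input space, the sketch below verifies, with exact rational arithmetic, that a second-order central difference stencil reproduces the second derivative of every quadratic with small integer coefficients on every grid of up to `max_n` points. This is exhaustive bounded testing rather than CIVL's symbolic execution, and the reference derivative `2a` is written by hand here, whereas the work above uses Tapenade and ADIC to help build specifications. The function names and bounds are hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import List


def second_difference(f: List[Fraction], h: Fraction) -> List[Fraction]:
    """Central difference (f[i-1] - 2 f[i] + f[i+1]) / h^2 at interior points."""
    return [(f[i - 1] - 2 * f[i] + f[i + 1]) / (h * h) for i in range(1, len(f) - 1)]


def check_bounded(max_n: int = 6, coeffs=range(-2, 3), h: Fraction = Fraction(1, 2)) -> bool:
    """Check the stencil against the exact second derivative for all small cases."""
    for n in range(3, max_n + 1):
        for a, b, c in product(coeffs, repeat=3):
            xs = [i * h for i in range(n)]
            f = [a * x * x + b * x + c for x in xs]
            spec = Fraction(2 * a)  # d^2/dx^2 (a x^2 + b x + c), derived by hand
            if any(d != spec for d in second_difference(f, h)):
                return False
    return True


if __name__ == "__main__":
    print(check_bounded())  # True: the stencil is exact for quadratics
```

Exact `Fraction` arithmetic keeps the check about the stencil itself; with floating-point values the comparison would also have to account for rounding.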
International Conference on Parallel Processing | 2018
Jan Hückelheim; Paul D. Hovland; Sri Hari Krishna Narayanan; Paulius Velesko
Ensemble computations are used to evaluate a function for multiple inputs, for example in uncertainty quantification. Embedded ensemble computations perform several evaluations within the same program, often enabling a reduced overall runtime by exploiting vectorisation and parallelisation opportunities that are not present in individual ensemble members. This is challenging if members take different control flow paths. We present a source-to-source transformation that turns a given C program into an embedded ensemble program that computes members in a single-instruction-multiple-data fashion using OpenMP SIMD pragmas. We use techniques from whole-function vectorisation, achieving effective vectorisation for moderate amounts of branch divergence, particularly on processors with masked instructions such as recent Xeon Phi or Skylake processors with AVX-512.
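The following sketch illustrates the masked-execution idea behind such embedded ensembles; it is not the paper's C-to-C transformation. A hypothetical scalar `member` function with a data-dependent branch and loop is rewritten as `ensemble`, which evaluates both sides of the branch for all members and blends them with a mask, then keeps iterating the loop while any lane is still active, updating only the active lanes. In the C setting, the per-lane loops are where an OpenMP `simd` pragma would go; Python lists here only model the lanes.

```python
from typing import List, Tuple


def member(x: float) -> Tuple[float, int]:
    """Scalar ensemble member with a branch and a data-dependent loop."""
    y = -x if x < 0 else 2.0 * x
    steps = 0
    while y >= 1.0:
        y /= 2.0
        steps += 1
    return y, steps


def ensemble(xs: List[float]) -> List[Tuple[float, int]]:
    """All members in lock-step, using masks instead of divergent control flow."""
    n = len(xs)
    # If-conversion: evaluate both branches for every lane, then blend by mask.
    neg = [x < 0 for x in xs]
    then_vals = [-x for x in xs]
    else_vals = [2.0 * x for x in xs]
    y = [t if m else e for m, t, e in zip(neg, then_vals, else_vals)]
    steps = [0] * n
    # Loop with divergent trip counts: iterate while any lane is active.
    active = [v >= 1.0 for v in y]
    while any(active):
        for lane in range(n):  # vectorisable lane loop; inactive lanes keep their values
            y[lane] = y[lane] / 2.0 if active[lane] else y[lane]
            steps[lane] += int(active[lane])
        active = [v >= 1.0 for v in y]
    return list(zip(y, steps))


if __name__ == "__main__":
    xs = [-3.0, 0.25, 5.0, 0.7]
    print(ensemble(xs))
    print([member(x) for x in xs])
```

The cost of this form is visible in the loop: every lane executes every iteration until the slowest member finishes, which is why the approach pays off for moderate rather than extreme branch divergence.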
Optimization Methods & Software | 2018
Jan Hückelheim; Paul D. Hovland; Michelle Mills Strout; Jens-Dominik Müller
Algorithmic differentiation (AD) is a tool for generating discrete adjoint solvers, which efficiently compute gradients of functions with many inputs, for example for use in gradient-based optimization. AD is often applied to large computations such as stencil operators, which are an important part of most structured-mesh PDE solvers. Stencil computations are often parallelized, for example by using OpenMP, and optimized by using techniques such as cache-blocking and tiling to fully utilize multicore CPUs and many-core accelerators and GPUs. Differentiating these codes with conventional reverse-mode AD results in adjoint codes that cannot be expressed as stencil operations and may not be easily parallelizable. They thus leave most of the compute power of modern architectures unused. We present a method that combines forward-mode AD and loop transformation to generate adjoint solvers that use the same memory access pattern as the original computation that they are derived from and can benefit from the same optimization techniques. The effectiveness of this method is demonstrated by generating a scalable adjoint CFD solver for multicore CPUs and Xeon Phi accelerators.
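For a linear three-point stencil with constant coefficients, the contrast described above fits in a few lines. Conventional reverse mode turns the gather `r[i] = A*u[i-1] + B*u[i] + C*u[i+1]` into a scatter that writes three entries of `ub` per iteration, which causes write conflicts when iterations run in parallel; the same adjoint can instead be written as a gather in which each iteration writes only `ub[j]`, like the primal. This sketch only shows the end result for this simple case, assuming interior-only updates; it is not the paper's combination of forward-mode AD and loop transformation, and the coefficient values are arbitrary.

```python
from typing import List

A, B, C = 0.25, -0.5, 0.75  # arbitrary stencil coefficients


def primal(u: List[float]) -> List[float]:
    """Gather stencil on interior points: r[i] = A u[i-1] + B u[i] + C u[i+1]."""
    n = len(u)
    r = [0.0] * n
    for i in range(1, n - 1):
        r[i] = A * u[i - 1] + B * u[i] + C * u[i + 1]
    return r


def adjoint_scatter(rb: List[float]) -> List[float]:
    """Conventional reverse mode: each iteration writes three entries of ub."""
    n = len(rb)
    ub = [0.0] * n
    for i in range(1, n - 1):
        ub[i - 1] += A * rb[i]
        ub[i] += B * rb[i]
        ub[i + 1] += C * rb[i]
    return ub


def adjoint_gather(rb: List[float]) -> List[float]:
    """Same adjoint rewritten as a gather: iteration j writes only ub[j]."""
    n = len(rb)

    def rb_int(i: int) -> float:
        # Only interior outputs r[i] exist in the primal, so only they contribute.
        return rb[i] if 1 <= i <= n - 2 else 0.0

    return [A * rb_int(j + 1) + B * rb_int(j) + C * rb_int(j - 1) for j in range(n)]
```

Because each `ub[j]` is written by exactly one iteration and depends only on `rb`, the gather version can be parallelised, blocked or tiled in the same way as `primal`.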
53rd AIAA Aerospace Sciences Meeting | 2015
Shenren Xu; Jan Hückelheim; Mateusz Gugala; Jens-Dominik Müller
This paper compares two methods to compute the linearized solution of the nonlinear Navier-Stokes equations when the baseline flow exhibits physical or numerical unsteadiness. The first is to develop an implicit nonlinear flow solver that allows the unstable steady-state solution to be reached by running the steady RANS solver with a large CFL number. The second is to treat the flow as an unsteady problem and linearize the whole unsteady problem around a baseline unsteady flow. The comparison is performed for a viscous 2D aerofoil case with a truncated trailing edge. Both the computational cost and the accuracy of the two approaches are compared.
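A scalar toy problem, unrelated to the actual RANS solver, shows why the first approach can reach a steady state that time marching cannot. For du/dt = R(u) with R(u) = u - u^3, the state u = 0 is an unstable steady state. Explicit time stepping from a small perturbation drifts to the stable state u = 1, while implicit Euler with a very large time step (a stand-in for a very large CFL number) approaches Newton's method on R(u) = 0 and converges to u = 0. The residual, step sizes and iteration counts are all assumptions.

```python
def R(u: float) -> float:
    """Toy residual: steady states at u = 0 (unstable) and u = +/-1 (stable)."""
    return u - u ** 3


def dR(u: float) -> float:
    """Derivative of the toy residual."""
    return 1.0 - 3.0 * u ** 2


def explicit_march(u: float, dt: float, steps: int) -> float:
    """Explicit Euler pseudo-time stepping: u <- u + dt R(u)."""
    for _ in range(steps):
        u = u + dt * R(u)
    return u


def implicit_march(u: float, dt: float, steps: int, newton_iters: int = 30) -> float:
    """Implicit Euler: solve v - u - dt R(v) = 0 for v by Newton at each step."""
    for _ in range(steps):
        v = u
        for _ in range(newton_iters):
            g = v - u - dt * R(v)
            v -= g / (1.0 - dt * dR(v))
        u = v
    return u


if __name__ == "__main__":
    u0 = 0.1
    print(explicit_march(u0, dt=0.1, steps=500))  # drifts to the stable state near 1
    print(implicit_march(u0, dt=1e6, steps=5))    # converges to the unstable state 0
```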
Proceedings of the 16th Python in Science Conference | 2017
Navjot Kukreja; Fabio Luporini; Mathias Louboutin; Charles Yount; Jan Hückelheim; Gerard J. Gorman