Fabio Luporini
Imperial College London
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Fabio Luporini.
ACM Transactions on Mathematical Software | 2017
Florian Rathgeber; David A. Ham; Lawrence Mitchell; Fabio Luporini; Andrew T. T. McRae; Gheorghe-Teodor Bercea; Graham Markall; Paul H. J. Kelly
Firedrake is a new tool for automating the numerical solution of partial differential equations. Firedrake adopts the domain-specific language for the finite element method of the FEniCS project, but with a pure Python runtime-only implementation centred on the composition of several existing and new abstractions for particular aspects of scientific computing. The result is a more complete separation of concerns which eases the incorporation of separate contributions from computer scientists, numerical analysts and application specialists. These contributions may add functionality, or improve performance. Firedrake benefits from automatically applying new optimisations. This includes factorising mixed function spaces, transforming and vectorising inner loops, and intrinsically supporting block matrix operations. Importantly, Firedrake presents a simple public API for escaping the UFL abstraction. This allows users to implement common operations that fall outside pure variational formulations, such as flux-limiters.
ACM Transactions on Architecture and Code Optimization | 2015
Fabio Luporini; Ana Lucia Varbanescu; Florian Rathgeber; Gheorghe-Teodor Bercea; J. Ramanujam; David A. Ham; Paul H. J. Kelly
The numerical solution of partial differential equations using the finite element method is one of the key applications of high performance computing. Local assembly is its characteristic operation. This entails the execution of a problem-specific kernel to numerically evaluate an integral for each element in the discretized problem domain. Since the domain size can be huge, executing efficient kernels is fundamental. Their op- timization is, however, a challenging issue. Even though affine loop nests are generally present, the short trip counts and the complexity of mathematical expressions make it hard to determine a single or unique sequence of successful transformations. Therefore, we present the design and systematic evaluation of COF- FEE, a domain-specific compiler for local assembly kernels. COFFEE manipulates abstract syntax trees generated from a high-level domain-specific language for PDEs by introducing domain-aware composable optimizations aimed at improving instruction-level parallelism, especially SIMD vectorization, and register locality. It then generates C code including vector intrinsics. Experiments using a range of finite-element forms of increasing complexity show that significant performance improvement is achieved.
ACM Transactions on Mathematical Software | 2017
Fabio Luporini; David A. Ham; Paul H. J. Kelly
We present an algorithm for the optimization of a class of finite-element integration loop nests. This algorithm, which exploits fundamental mathematical properties of finite-element operators, is proven to achieve a locally optimal operation count. In specified circumstances the optimum achieved is global. Extensive numerical experiments demonstrate significant performance improvements over the state of the art in finite-element code generation in almost all cases. This validates the effectiveness of the algorithm presented here and illustrates its limitations.
ieee international conference on high performance computing data and analytics | 2016
Navjot Kukreja; Mathias Louboutin; Felippe Vieira; Fabio Luporini; Gerard J. Gorman
Domain specific languages have successfully been used in a variety of fields to cleanly express scientific problems as well as to simplify implementation and performance optimization on different computer architectures. Although a large number of stencil languages are available, finite difference domain specific languages have proved challenging to design because most practical use cases require additional features that fall outside the finite difference abstraction. Inspired by the complexity of real-world seismic imaging problems, we introduce Devito, a domain specific language in which high level equations are expressed using symbolic expressions from the SymPy package. Complex equations are automatically manipulated, optimized, and translated into highly optimized C code that aims to perform comparably or better than hand-tuned code. All this is transparent to users, who only see concise symbolic mathematical expressions.
SIAM Journal on Scientific Computing | 2018
Miklós Homolya; Lawrence Mitchell; Fabio Luporini; David A. Ham
A form compiler takes a high-level description of the weak form of partial differential equations and produces low-level code that carries out the finite element assembly. In this paper we present the Two-Stage Form Compiler (TSFC), a new form compiler with the main motivation being to maintain the structure of the input expression as long as possible. This facilitates the application of optimizations at the highest possible level of abstraction. TSFC features a novel, structure-preserving method for separating the contributions of a form to the subblocks of the local tensor in discontinuous Galerkin problems. This enables us to preserve the tensor structure of expressions longer through the compilation process than is possible with other form compilers. This is also achieved in part by a two-stage approach that cleanly separates the lowering of finite element constructs to tensor algebra in the first stage, from the scheduling of those tensor operations in the second stage. TSFC also efficiently traverse...
international parallel and distributed processing symposium | 2014
Michelle Mills Strout; Fabio Luporini; Christopher D. Krieger; Carlo Bertolli; Gheorghe-Teodor Bercea; Catherine Olschanowsky; J. Ramanujam; Paul H. J. Kelly
Many scientific applications are organized in a data parallel way: as sequences of parallel and/or reduction loops. This exposes parallelism well, but does not convert data reuse between loops into data locality. This paper focuses on this issue in parallel loops whose loop-to-loop dependence structure is data-dependent due to indirect references such as A[B[i]]. Such references are a common occurrence in sparse matrix computations, molecular dynamics simulations, and unstructured-mesh computational fluid dynamics (CFD). Previously, sparse tiling approaches were developed for individual benchmarks to group iterations across such loops to improve data locality. These approaches were shown to benefit applications such as moldyn, Gauss-Seidel, and the sparse matrix powers kernel, however the run-time routines for performing sparse tiling were hand coded per application. In this paper, we present a generalized full sparse tiling algorithm that uses the newly developed loop chain abstraction as input, improves inter-loop data locality, and creates a task graph to expose shared-memory parallelism at runtime. We evaluate the overhead and performance impact of the generalized full sparse tiling algorithm on two codes: a sparse Jacobi iterative solver and the Airfoil CFD benchmark.
Proceedings of the First International Workshop on Software Correctness for HPC Applications | 2017
Jan Hückelheim; Ziqing Luo; Fabio Luporini; Navjot Kukreja; Gerard J. Gorman; Stephen F. Siegel; Matthew B. Dwyer; Paul D. Hovland
Code generation from domain-specific languages is becoming increasingly popular as a method to obtain optimised low-level code that performs well on a given platform and for a given problem instance. Ensuring the correctness of generated codes is crucial. At the same time, testing or manual inspection of the code is problematic, as the generated code can be complex and hard to read. Moreover, the generated code may change depending on the problem type, domain size, or target platform, making conventional code review or testing methods impractical. As a solution, we propose the integration of formal verification tools into the code generation process. We present a case study in which the CIVL verification tool is combined with the Devito finite difference framework that generates optimised stencil code for PDE solvers from symbolic equations. We show a selection of properties of the generated code that can be automatically specified and verified during the code generation process. Our approach allowed us to detect a previously unknown bug in the Devito code generation tool.
Geoscientific Model Development Discussions | 2018
Mathias Louboutin; Michael Lange; Fabio Luporini; Navjot Kukreja; Philipp A. Witte; Felix J. Herrmann; Paulius Velesko; Gerard J. Gorman
We introduce Devito, a new domain-specific language for implementing high-performance finite difference partial differential equation solvers. The motivating application is exploration seismology where methods such as Full-Waveform Inversion and Reverse-Time Migration are used to invert terabytes of seismic data to create images of the earths subsurface. Even using modern supercomputers, it can take weeks to process a single seismic survey and create a useful subsurface image. The computational cost is dominated by the numerical solution of wave equations and their corresponding adjoints. Therefore, a great deal of effort is invested in aggressively optimizing the performance of these wave-equation propagators for different computer architectures. Additionally, the actual set of partial differential equations being solved and their numerical discretization is under constant innovation as increasingly realistic representations of the physics are developed, further ratcheting up the cost of practical solvers. By embedding a domain-specific language within Python and making heavy use of SymPy, a symbolic mathematics library, we make it possible to develop finite difference simulators quickly using a syntax that strongly resembles the mathematics. The Devito compiler reads this code and applies a wide range of analysis to generate highly optimized and parallel code. This approach can reduce the development time of a verified and optimized solver from months to days.
ieee international conference on high performance computing data and analytics | 2017
Navjot Kukreja; Mathias Louboutin; Fabio Luporini; Gerard J. Gorman
Summary In this talk, I will discuss our approach to the formulation and the performance optimization of finite difference methods for PDEs arising in FWI. Our framework consists of a stack of domain specific languages and optimizing compilers. The mathematical specification of a finite difference method is translated by a compiler, Devito, into C code, applying a sophisticated sequence of transformations. These include standard loop transformations, such as blocking and vectorization, as well as symbolic manipulations to reduce the unusually high arithmetic intensity of the stencils arising in forward and adjoint operators. These include common subexpressions elimination, factorization, code motion and approximation of transient functions. I will show the impact of these transformations on standard Intel Xeon architectures as well as on Intel Knights Landing. Compelling evidence points in the direction that our stencil kernels are significantly bound by the L1 cache. I will conclude discussing future challenges and goals of our work.
Archive | 2016
Florian Rathgeber; Hector Dearman; gbts; Gheorghe-Teodor Bercea; Kaho Sato; Simon W. Funke; Lawrence Mitchell; Miklós Homolya; Francis Russell; Christian T. Jacobs; David A. Ham; Andrew T. T. McRae; Graham Markall; Fabio Luporini