Luis Cebamanos
University of Edinburgh
Publication
Featured research published by Luis Cebamanos.
Proceedings of the First Workshop on PGAS Applications | 2016
Anton Shterenlikht; Lee Margetts; José David Arregui-Mena; Luis Cebamanos
Fortran coarrays have been used as an extension to the standard for over 20 years, mostly on Cray systems. Their appeal to users increased substantially when they were standardised in 2010. In this work we show that coarrays offer simple and intuitive data structures for 3D cellular automata (CA) modelling of material microstructures. We show how coarrays can be used together with an MPI finite element (FE) library to create a two-way concurrent, hierarchical and scalable multi-scale CAFE deformation and fracture framework. The design of CGPACK, a coarray cellular automata microstructure evolution library, is described. A highly portable MPI FE library, ParaFEM, was used in this work. We show that, independently, CGPACK and ParaFEM programs can scale well into tens of thousands of cores. Strong scaling of a hybrid ParaFEM/CGPACK MPI/coarray multi-scale framework was measured on a practically important solid mechanics example: the fracture of a steel round bar under tension. That program did not scale beyond 7000 cores. Excessive synchronisation might be one factor contributing to the relatively poor scaling. We therefore conclude with a comparative analysis of synchronisation requirements in MPI and coarray programs. Specific challenges of synchronising a coarray library are discussed.
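To make the data-structure claim concrete, here is a minimal sketch of a coarray layout for a 3D CA with a one-cell halo. The bounds, the 2x2x* image grid and the halo step are invented for illustration; this is not the CGPACK API.

```fortran
program ca_sketch
  implicit none
  integer, parameter :: n = 32            ! local CA cells per dimension (assumed)
  ! 3D CA space with a one-cell halo on each face, one chunk per image;
  ! run with a multiple of 4 images to fill the [2,2,*] image grid
  integer, allocatable :: space(:,:,:)[:,:,:]
  integer :: ci(3)                        ! this image's cosubscripts

  allocate( space(0:n+1, 0:n+1, 0:n+1) [2, 2, *] )
  ci = this_image( space )
  space = 0                               ! initialise the local microstructure

  sync all                                ! all images ready before halo reads

  ! One-sided halo read from the left neighbour along dimension 1
  if ( ci(1) > 1 ) &
    space(0, 1:n, 1:n) = space(n, 1:n, 1:n)[ ci(1)-1, ci(2), ci(3) ]

  sync all                                ! halos in place; CA update can proceed
end program ca_sketch
```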
International Conference on Exascale Applications and Software | 2014
Luis Cebamanos; David Henty; Harvey Richardson; Alistair Hart
Accelerators and, in particular, Graphics Processing Units (GPUs) have emerged as promising computing technologies that may be suitable for future Exascale systems. However, the complexity of their architectures and the impenetrable structure of some large applications make hand-tuning algorithms challenging and unproductive. In contrast, auto-tuning has emerged as a solution to these problems, since it can address the inherent complexity of the latest and future computer architectures. With auto-tuning, an application can be optimised for a target platform by making automated optimal choices. To exploit this technology on modern GPUs, we have created an auto-tuned version of Nek5000 based on OpenACC directives, which has been shown to obtain better results than a hand-tuned version of the same computation kernels. This paper focuses on auto-tuning Nek5000 with OpenACC directives to exploit a massively parallel GPU-accelerated system, adapting the Nek5000 code for Exascale computation.
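The auto-tuning idea can be illustrated with a single OpenACC knob: the same loop nest is compiled with different values of a tuning parameter and the fastest variant is kept. The routine name, array shapes and the vlen values below are hypothetical, not taken from Nek5000.

```fortran
! Sketch only: the loop body stands in for a real Nek5000 kernel.
subroutine ax_sketch(n, u, w)
  implicit none
  integer, intent(in)  :: n
  real(8), intent(in)  :: u(n,n,n)
  real(8), intent(out) :: w(n,n,n)
  integer :: i, j, k
  ! vlen is the knob an auto-tuner would sweep (e.g. 32, 64, 128, 256),
  ! timing each compiled variant and selecting the fastest
  integer, parameter :: vlen = 128

  !$acc parallel loop collapse(3) vector_length(vlen) copyin(u) copyout(w)
  do k = 1, n
    do j = 1, n
      do i = 1, n
        w(i,j,k) = 2.0d0 * u(i,j,k)
      end do
    end do
  end do
end subroutine ax_sketch
```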
Proceedings of the 25th European MPI Users' Group Meeting on | 2018
Anton Shterenlikht; Luis Cebamanos
Fortran coarrays are an attractive alternative to MPI due to familiar Fortran syntax, single-sided communications and implementation in the compiler. In this work the scaling of coarrays is compared to that of MPI, using cellular automata (CA) 3D Ising magnetisation miniapps built with the CASUP CA library, https://cgpack.sourceforge.io, developed by the authors. Ising energy and magnetisation were calculated with MPI_ALLREDUCE and the Fortran 2018 co_sum collectives. The work was done on ARCHER (Cray XC30) up to the full machine capacity: 109,056 cores. Ping-pong latency and bandwidth results are very similar with MPI and with coarrays for message sizes from 1 B to several MB. MPI halo exchange (HX) scaled better than coarray HX, which is surprising because both algorithms use pair-wise communications: MPI IRECV/ISEND/WAITALL vs Fortran sync images. Adding OpenMP to MPI or to coarrays resulted in a worse L2 cache hit ratio and lower performance in all cases, even though NUMA effects were ruled out. This is likely because the CA algorithm is memory and network bound. Sampling and tracing analysis shows good load balancing in compute in all miniapps, but imbalance in communication, indicating that the difference in performance between MPI and coarrays is likely due to the parallel libraries (MPICH2 vs libpgas) and the Cray hardware-specific libraries (uGNI vs DMAPP). Overall, the results look promising for coarray use beyond 100k cores. However, further coarray optimisation is needed to narrow the performance gap between coarrays and MPI.
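The two reduction routes compared above differ only in a few lines. A minimal sketch, assuming a runtime that supports MPI and coarrays in the same executable (as on the Cray systems used here); variable names are illustrative, not the CASUP code.

```fortran
program reduce_sketch
  use mpi_f08
  implicit none
  double precision :: energy_mpi, energy_caf, total

  call MPI_Init()
  energy_mpi = 1.0d0         ! stand-in for this image's partial Ising energy
  energy_caf = energy_mpi

  ! MPI route: explicit collective over MPI_COMM_WORLD
  call MPI_Allreduce(energy_mpi, total, 1, MPI_DOUBLE_PRECISION, &
                     MPI_SUM, MPI_COMM_WORLD)

  ! Coarray route: Fortran 2018 intrinsic collective, summed in place
  ! across all images
  call co_sum(energy_caf)

  call MPI_Finalize()
end program reduce_sketch
```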
Advances in Engineering Software | 2018
Anton Shterenlikht; Lee Margetts; Luis Cebamanos
A 3D multi-scale cellular automata finite element (CAFE) framework for modelling fracture in heterogeneous materials is described. The framework is implemented in a hybrid MPI/Fortran coarray code for efficient parallel execution on HPC platforms. Two open source BSD-licensed libraries developed by the authors in modern Fortran were used: CGPACK, implementing cellular automata (CA) using Fortran coarrays, and ParaFEM, implementing finite elements (FE) using MPI. The framework implements a two-way concurrent hierarchical information exchange between the structural level (FE) and the microstructure (CA). The MPI-to-coarray interface and data structures are described. The CAFE framework is used to predict transgranular cleavage propagation in a polycrystalline iron round bar under tension. Novel results enabled by this CAFE framework include simulation of progressive cleavage propagation through individual grains and across grain boundaries, and emergence of a macro-crack from the merging of cracks on preferentially oriented cleavage planes in individual crystals. Nearly ideal strong scaling up to at least tens of thousands of cores was demonstrated by CGPACK and by ParaFEM in isolation in prior work on a Cray XE6. Cray XC30 and XC40 platforms and CrayPAT profiling were used in this work. Initially, the strong scaling limit of the hybrid CGPACK/ParaFEM CAFE model was 2000 cores. After replacing all-to-all communication patterns with nearest-neighbour algorithms, the strong scaling limit on Cray XC30 was increased to 7000 cores. TAU profiling on non-Cray systems identified deficiencies in Intel Fortran 16 optimisation of remote coarray operations. Finally, coarray synchronisation challenges and opportunities for thread parallelisation in CA are discussed.
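The synchronisation change behind the 2000-to-7000-core improvement can be sketched as follows; the neighbour-list construction is omitted and the names are hypothetical, not CGPACK's.

```fortran
! Sketch: replace a global barrier with pairwise synchronisation.
subroutine halo_sync(nbr)
  implicit none
  integer, intent(in) :: nbr(:)  ! image indices of actual halo neighbours

  ! Before: every image waits for every other image
  !   sync all
  ! After: each image waits only for the images it exchanges halos with
  sync images ( nbr )
end subroutine halo_sync
```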
International Conference on Parallel Processing | 2017
Dana Akhmetova; Luis Cebamanos; Roman Iakymchuk; Tiberiu Rotaru; Mirko Rahn; Stefano Markidis; Erwin Laure; Valeria Bartsch; Christian Simmendinger
One of the main hurdles to a broad adoption of PGAS approaches is the prevalence of MPI, which as a de-facto standard appears in the code base of many applications. To take advantage of PGAS APIs like GASPI without a major change to the code base, interoperability between MPI and PGAS approaches needs to be ensured. In this article, we address this challenge by presenting our study and preliminary performance results on interoperating GASPI and MPI in the performance-critical parts of the Ludwig and iPIC3D applications. In addition, we outline a strategy for better coupling of the two APIs.
Archive | 2017
Anton Shterenlikht; L. Margetts; Luis Cebamanos
The Institute of Electrical and Electronics Engineers | 2017
Anton Shterenlikht; L. Margetts; José David Arregui-Mena; Luis Cebamanos
Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015) | 2015
Jing Gong; Stefano Markidis; Michael Schliephake; Erwin Laure; Luis Cebamanos; Alistair Hart; Misun Min; Paul F. Fischer
Archive | 2016
Luis Cebamanos; Anton Shterenlikht; José David Arregui-Mena; L. Margetts