A. Cristiano I. Malossi

Publication

Featured research published by A. Cristiano I. Malossi.

IEEE International Conference on High Performance Computing, Data, and Analytics | 2015

An extreme-scale implicit solver for complex PDEs: highly heterogeneous flow in Earth's mantle

Johann Rudi; A. Cristiano I. Malossi; Tobin Isaac; Georg Stadler; Michael Gurnis; Peter W. J. Staar; Yves Ineichen; Costas Bekas; Alessandro Curioni; Omar Ghattas

Mantle convection is the fundamental physical process within Earth's interior responsible for the thermal and geological evolution of the planet, including plate tectonics. The mantle is modeled as a viscous, incompressible, non-Newtonian fluid. The wide range of spatial scales, extreme variability and anisotropy in material properties, and severely nonlinear rheology have made global mantle convection modeling with realistic parameters prohibitive. Here we present a new implicit solver that exhibits optimal algorithmic performance and is capable of extreme scaling for hard PDE problems, such as mantle convection. To maximize accuracy and minimize runtime, the solver incorporates a number of advances, including aggressive multi-octree adaptivity, mixed continuous-discontinuous discretization, arbitrarily-high-order accuracy, hybrid spectral/geometric/algebraic multigrid, and novel Schur-complement preconditioning. These features present enormous challenges for extreme scalability. We demonstrate that, contrary to conventional wisdom, algorithmically optimal implicit solvers can be designed that scale out to 1.5 million cores for severely nonlinear, ill-conditioned, heterogeneous, and anisotropic PDEs.

Philosophical Transactions of the Royal Society A | 2014

Changing computing paradigms towards power efficiency

Pavel Klavík; A. Cristiano I. Malossi; Constantine Bekas; Alessandro Curioni

Power awareness is fast becoming immensely important in computing, ranging from traditional high-performance computing applications to the new generation of data-centric workloads. In this work, we describe our efforts towards a power-efficient computing paradigm that combines low- and high-precision arithmetic. We showcase our ideas on the widely used kernel of solving systems of linear equations, which finds numerous applications in scientific and engineering disciplines as well as in large-scale data analytics, statistics and machine learning. Towards this goal, we developed tools for the seamless power profiling of applications at a fine-grained level. In addition, we verify here previous work on post-FLOPS/W metrics and show that these can shed much more light on the power/energy profile of important applications.
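One well-known way to combine low- and high-precision arithmetic when solving a linear system is iterative refinement: solve cheaply in low precision, compute the residual in high precision, and correct. The abstract does not say which scheme the authors use, so the sketch below is only an illustration of the general idea. The function names, the emulation of single precision through `struct`, and the 3x3 test matrix are all assumptions made for this example.

```python
import struct

def to_f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def solve_low_precision(A, b):
    """Gaussian elimination with partial pivoting; every result is rounded to float32."""
    n = len(b)
    M = [[to_f32(v) for v in row] + [to_f32(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = to_f32(M[r][col] / M[col][col])
            for c in range(col, n + 1):
                M[r][c] = to_f32(M[r][c] - to_f32(factor * M[col][c]))
    x = [0.0] * n
    for r in reversed(range(n)):
        s = M[r][n]
        for c in range(r + 1, n):
            s = to_f32(s - to_f32(M[r][c] * x[c]))
        x[r] = to_f32(s / M[r][r])
    return x

def residual(A, x, b):
    """Residual b - A x, accumulated in double precision."""
    return [bi - sum(a * xi for a, xi in zip(row, x)) for row, bi in zip(A, b)]

def refine(A, b, sweeps=5):
    """Low-precision solves corrected by high-precision residuals."""
    x = solve_low_precision(A, b)
    for _ in range(sweeps):
        d = solve_low_precision(A, residual(A, x, b))
        x = [xi + di for xi, di in zip(x, d)]
    return x

# Hypothetical well-conditioned test system with a known solution.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
x_true = [0.1, 0.2, 0.3]
b = [sum(a * xi for a, xi in zip(row, x_true)) for row in A]

err_low = max(abs(u - v) for u, v in zip(solve_low_precision(A, b), x_true))
err_refined = max(abs(u - v) for u, v in zip(refine(A, b), x_true))
```

For a well-conditioned matrix such as this one, the refined solution should reach close to double-precision accuracy even though every solve runs in emulated single precision. Whether that trade saves energy on real hardware is exactly what the power profiling described in the abstract would have to measure.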

International Parallel and Distributed Processing Symposium | 2016

Stochastic Matrix-Function Estimators: Scalable Big-Data Kernels with High Performance

Peter W. J. Staar; Panagiotis Kl. Barkoutsos; Roxana Istrate; A. Cristiano I. Malossi; Ivano Tavernelli; Nikolaj Moll; Heiner Giefers; Christoph Hagleitner; Costas Bekas; Alessandro Curioni

In this era of Big Data, large graphs appear in many scientific domains. To extract the hidden knowledge and correlations in these graphs, novel methods need to be developed to analyse them quickly. In this paper, we present a unified framework of stochastic matrix-function estimators, which allows one to compute a subset of elements of the matrix f(A), where f is an arbitrary function and A is the adjacency matrix of the graph. The new framework has a computational cost proportional to the size of the subset: for example, to obtain the diagonal of f(A) for a matrix of size N, the computational cost is proportional to N, in contrast to the traditional N^3 of diagonalization. Furthermore, we show that the new framework allows us to write implementations of the algorithm that scale naturally with the number of compute nodes and are easily ported to accelerators, where the kernels perform very well.
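The idea of estimating the diagonal of f(A) from matrix-vector products alone can be illustrated with a classical probe-vector estimator: diag(f(A)) is approximated by averaging v ⊙ (f(A) v) over probe vectors v with entries ±1. This is a generic sketch, not the framework from the paper. The choice f = exp, the truncated Taylor series used to apply it, the helper names and the 4-node path graph are all assumptions made for illustration.

```python
import random

def matvec(A, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def apply_exp(A, v, terms=30):
    """Approximate exp(A) v with a truncated Taylor series, using only matvecs."""
    result = list(v)
    term = list(v)
    for k in range(1, terms):
        term = [t / k for t in matvec(A, term)]
        result = [r + t for r, t in zip(result, term)]
    return result

def estimate_diagonal(apply_f, probes):
    """Estimate diag(f(A)) as sum(v * f(A)v) / sum(v * v), elementwise over probes."""
    n = len(probes[0])
    num = [0.0] * n
    den = [0.0] * n
    for v in probes:
        fv = apply_f(v)
        for i in range(n):
            num[i] += v[i] * fv[i]
            den[i] += v[i] * v[i]
    return [a / b for a, b in zip(num, den)]

def rademacher_probes(n, count, seed=0):
    """Random probe vectors with independent entries of +1 or -1."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(count)]

# Adjacency matrix of a hypothetical 4-node path graph.
A = [[0.0, 1.0, 0.0, 0.0],
     [1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0]]

# Stochastic estimate: the error shrinks as more probes are used.
diag_random = estimate_diagonal(lambda v: apply_exp(A, v), rademacher_probes(4, 200))

# Mutually orthogonal (Hadamard) probes recover the diagonal up to rounding.
hadamard = [[1.0, 1.0, 1.0, 1.0], [1.0, -1.0, 1.0, -1.0],
            [1.0, 1.0, -1.0, -1.0], [1.0, -1.0, -1.0, 1.0]]
diag_hadamard = estimate_diagonal(lambda v: apply_exp(A, v), hadamard)

# Reference diagonal from unit vectors, for comparison on this tiny example only.
diag_reference = [apply_exp(A, [1.0 if j == i else 0.0 for j in range(4)])[i]
                  for i in range(4)]
```

Each probe costs a fixed number of matrix-vector products and the probes are independent of one another, which is why estimators of this kind distribute naturally across compute nodes.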

IEEE International Conference on High Performance Computing, Data, and Analytics | 2016

First Experiences with ab initio Molecular Dynamics on OpenPOWER: The Case of CPMD

Valéry Weber; A. Cristiano I. Malossi; Ivano Tavernelli; Teodoro Laino; Costas Bekas; Manish Modani; Nina Wilner; Tom Heller; Alessandro Curioni

In this article, we present the algorithmic adaptation and code re-engineering required for porting highly successful and popular planewave codes to next-generation heterogeneous OpenPOWER architectures that foster acceleration and high-bandwidth links to GPUs. Here we focus on CPMD as the most representative software for ab initio molecular dynamics simulations. We have ported the construction of the electronic density, the application of the potential to the wavefunctions and the orthogonalization procedure to the GPU. The different GPU kernels consist mainly of fast Fourier transforms (FFT) and basic linear algebra operations (BLAS). The performance of the new implementation obtained on Firestone (POWER8/Tesla) is discussed. We show that the communication between the host and the GPU contributes a large fraction of the total run time. We expect a strong attenuation of the communication bottleneck once the NVLink high-speed interconnect becomes available.

European Conference on Parallel Processing | 2016

The Impact of Voltage-Frequency Scaling for the Matrix-Vector Product on the IBM POWER8

Sandra Catalán; A. Cristiano I. Malossi; Costas Bekas; Enrique S. Quintana-Ortí

The physical limitations of CMOS miniaturization have made understanding the interplay between performance and energy a primary challenge. In this paper we contribute towards this goal by assessing the effect of voltage and frequency scaling (VFS) on the energy consumption of the dense and sparse matrix-vector products. The optimization of the sparse kernel, from the perspective of both performance and energy efficiency, is especially difficult due to its irregular memory access pattern, but the potential benefits are remarkable because of its varied applications.

Our experiments with a small synthetic training set show that it is possible to build a general classification of sparse matrices that governs the optimal VFS level from the point of view of energy efficiency. More importantly, this characterization can be leveraged to tune VFS for a major portion of the University of Florida Matrix Collection, when executed on the IBM POWER8, yielding significant gains with respect to a power-hungry configuration that simply favours performance.

Parallel Computing | 2018

A scalable iterative dense linear system solver for multiple right-hand sides in data analytics

Vassilis Kalantzis; A. Cristiano I. Malossi; Costas Bekas; Alessandro Curioni; Efstratios Gallopoulos; Yousef Saad

We describe Parallel-Projection Block Conjugate Gradient (pp-bcg), a distributed iterative solver for dense, symmetric positive definite linear systems with multiple right-hand sides. In particular, we focus on linear systems appearing in the context of stochastic estimation of the diagonal of the matrix inverse in Uncertainty Quantification. pp-bcg is based on the block Conjugate Gradient algorithm combined with Galerkin projections to accelerate the convergence of the solution process. Numerical experiments on massively parallel architectures illustrate the performance of the proposed scheme in terms of efficiency and convergence rate, as well as its effectiveness relative to the (block) Conjugate Gradient and the Cholesky-based ScaLAPACK solver. In particular, on a 4-rack BG/Q with up to 65,536 processor cores, using dense matrices of order as high as 524,288 and 800 right-hand sides, pp-bcg can be 2x to 3x faster than the aforementioned techniques.

Design, Automation, and Test in Europe | 2018

The transprecision computing paradigm: Concept, design, and applications

A. Cristiano I. Malossi; Michael Schaffner; Anca Mariana Molnos; L. Gammaitoni; Giuseppe Tagliavini; Andrew Emerson; Andres Tomas; Dimitrios S. Nikolopoulos; Eric Flamand; Norbert Wehn

Guaranteed numerical precision of each elementary step in a complex computation has been the mainstay of traditional computing systems for many years. This era, fueled by Moore's law and the constant exponential improvement in computing efficiency, is at its twilight: from tiny nodes of the Internet-of-Things to large HPC computing centers, sub-picoJoule/operation energy efficiency is essential for practical realizations. To overcome the power wall, a shift from traditional computing paradigms is now mandatory. In this paper we present the driving motivations, roadmap, and expected impact of the European project OPRECOMP. OPRECOMP aims to (i) develop the first complete transprecision computing framework, (ii) apply it to a wide range of hardware platforms, from the sub-milliWatt up to the MegaWatt range, and (iii) demonstrate impact in a wide range of computational domains, spanning IoT, Big Data Analytics, Deep Learning, and HPC simulations. By combining transprecision advances in devices, circuits, software tools, and algorithms into a seamless design, we expect to achieve major energy efficiency improvements, even when there is no freedom to relax end-to-end application quality of results. Indeed, OPRECOMP aims at demolishing the ultra-conservative “precise” computing abstraction, replacing it with a more flexible and efficient one, namely transprecision computing.

Sustainable Computing: Informatics and Systems | 2015