
Publication


Featured research published by Alessandro Lonardo.


IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2013

GPU Peer-to-Peer Techniques Applied to a Cluster Interconnect

Roberto Ammendola; Massimo Bernaschi; Andrea Biagioni; Mauro Bisson; Massimiliano Fatica; Ottorino Frezza; Francesca Lo Cicero; Alessandro Lonardo; Enrico Mastrostefano; Pier Stanislao Paolucci; Davide Rossetti; Francesco Simula; Laura Tosoratto; P. Vicini

Modern GPUs support special protocols to exchange data directly across the PCI Express bus. While these protocols could be used to reduce GPU data transmission times, basically by avoiding staging to host memory, they require specific hardware features which are not available on current-generation network adapters. In this paper we describe the architectural modifications required to implement peer-to-peer access to NVIDIA Fermi- and Kepler-class GPUs on an FPGA-based cluster interconnect. We also discuss the current software implementation, which integrates this feature by minimally extending the RDMA programming model, along with some issues raised when employing it in a higher-level API such as MPI. Finally, we study the current limits of the technique by analyzing the performance improvements on low-level benchmarks and on two GPU-accelerated applications, showing when and how they seem to benefit from the GPU peer-to-peer method.
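The abstract above explains that peer-to-peer transfers avoid staging GPU data through host memory. As a rough illustration of why removing the staging copy helps, the sketch below models each PCIe copy as a fixed startup cost plus serialization time; the startup and bandwidth numbers are hypothetical placeholders, not figures from the paper.

```python
# Illustrative latency model comparing a staged GPU->host->NIC transfer
# (two PCIe copies) against a direct peer-to-peer GPU->NIC path (one copy).
# STARTUP_US and BANDWIDTH_BPS are assumed values, for illustration only.

STARTUP_US = 5.0        # assumed per-copy startup overhead, microseconds
BANDWIDTH_BPS = 4e9     # assumed effective PCIe bandwidth, bytes/second

def transfer_time_us(size_bytes, copies):
    """Per-copy startup latency plus serialization time, summed over copies."""
    return copies * (STARTUP_US + size_bytes / BANDWIDTH_BPS * 1e6)

def staged_us(size_bytes):
    # GPU -> host staging buffer, then host -> NIC: two traversals
    return transfer_time_us(size_bytes, copies=2)

def peer_to_peer_us(size_bytes):
    # GPU -> NIC directly over the PCIe bus: a single traversal
    return transfer_time_us(size_bytes, copies=1)

for size in (4 * 1024, 64 * 1024, 1024 * 1024):
    print(size, round(staged_us(size), 2), round(peer_to_peer_us(size), 2))
```

Under this simple model the staged path always costs twice the direct path; on real hardware the gap depends on message size and adapter details, which is what the paper's benchmarks measure.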


arXiv: Computational Physics | 2011

APEnet+: high bandwidth 3D torus direct network for petaflops scale commodity clusters

Roberto Ammendola; Andrea Biagioni; Ottorino Frezza; Francesca Lo Cicero; Alessandro Lonardo; Pier Stanislao Paolucci; Davide Rossetti; A. Salamon; G. Salina; Francesco Simula; Laura Tosoratto; P. Vicini

We describe herein the APElink+ board, a PCIe interconnect adapter featuring the latest advances in wire speed and interface technology, plus hardware support for an RDMA programming model and experimental acceleration of GPU networking. This design allows us to build a low-latency, high-bandwidth PC cluster, the APEnet+ network, the new generation of our cost-effective cluster network architecture, scalable to tens of thousands of nodes. Some test results and a characterization of data transmission on a complete testbench, based on a commercial development card mounting an Altera FPGA, are provided.


IEEE International Conference on High Performance Computing, Data and Analytics | 2011

QUonG: A GPU-based HPC System Dedicated to LQCD Computing

Roberto Ammendola; Andrea Biagioni; Ottorino Frezza; Francesca Lo Cicero; Alessandro Lonardo; Pier Stanislao Paolucci; Davide Rossetti; Francesco Simula; Laura Tosoratto; P. Vicini

QUonG is an INFN (Istituto Nazionale di Fisica Nucleare) initiative targeted at developing a high performance computing system dedicated to Lattice QCD computations. QUonG is a massively parallel computing platform that leverages commodity multi-core processors coupled with latest-generation GPUs. Its network mesh exploits the characteristics of the LQCD algorithm in the design of a point-to-point, high performance, low latency 3D torus network to interconnect the computing nodes. The network is built upon the APEnet+ project: it consists of an FPGA-based PCI Express board exposing six fully bidirectional off-board links running at 34 Gbps each, implementing the RDMA protocol and an experimental direct network-to-GPU interface that enables a significant access latency reduction for inter-node data transfers. The final shape of a complete QUonG deployment is an assembly of standard 42U racks, each one capable of 60 TFlops/rack of peak performance, at a cost of 5 k€/TFlops and with an estimated power consumption of 25 kW/rack. A first QUonG system prototype is expected to be delivered at the end of the year 2011.
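The rack-level figures quoted in the abstract can be cross-checked with simple arithmetic; this back-of-the-envelope calculation is our own and uses only the numbers stated above.

```python
# Cross-check of the QUonG rack figures quoted in the abstract:
# 60 TFlops/rack peak, 5 k€/TFlops, 25 kW/rack.
PEAK_TFLOPS_PER_RACK = 60.0
COST_KEUR_PER_TFLOPS = 5.0
POWER_KW_PER_RACK = 25.0

# Implied cost of one fully populated rack, in k€
cost_per_rack_keur = PEAK_TFLOPS_PER_RACK * COST_KEUR_PER_TFLOPS

# Implied peak power efficiency: GFlops delivered per watt drawn
gflops_per_watt = (PEAK_TFLOPS_PER_RACK * 1000.0) / (POWER_KW_PER_RACK * 1000.0)

print(cost_per_rack_keur, gflops_per_watt)   # 300 k€ per rack, 2.4 GFlops/W
```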


IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2013

legaSCi: Legacy SystemC Model Integration into Parallel SystemC Simulators

Christoph Schumacher; Jan Henrik Weinstock; Rainer Leupers; Gerd Ascheid; Laura Tosoratto; Alessandro Lonardo; Dietmar Petras; Thorsten H. Grötker

Virtual prototyping of parallel and embedded systems increases insight into existing computer systems. It further allows exploring properties of new systems already during their specification phase. Virtual prototypes of such systems benefit from parallel simulation techniques due to the increased simulation speed. One common problem full-system simulator implementers face is the revision and integration of legacy models coded without thread-safety and deterministic behavior in mind. To lessen this burden, this paper presents a methodology to integrate unmodified SystemC legacy models into parallel SystemC simulators. Using the proposed technique, the embedded platform simulator of the EU FP7 project EURETILE, which inherited a number of legacy models from its predecessor project SHAPES, has been transformed into a parallel simulation platform, demonstrating speed-ups of up to 3.36 on four simulation host cores.
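One rough way to read the reported speed-up of 3.36 on four host cores is to invert Amdahl's law and infer what parallelized fraction of the simulation would be consistent with it. This back-calculation is our own illustration, not an analysis from the paper.

```python
# Amdahl's-law reading of the reported 3.36x speed-up on four cores.
# The inferred parallel fraction is our own back-calculation.

def amdahl_speedup(parallel_fraction, cores):
    """Speed-up predicted by Amdahl's law for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

def inferred_parallel_fraction(speedup, cores):
    """Invert Amdahl's law: the parallel fraction consistent with a speed-up."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)

p = inferred_parallel_fraction(3.36, 4)
efficiency = 3.36 / 4
print(round(p, 3), efficiency)   # ~0.94 parallel fraction, 0.84 efficiency
```

Under this (admittedly crude) model, roughly 94% of the simulation work would run in parallel, corresponding to 84% parallel efficiency on four cores.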


arXiv: High Energy Physics - Lattice | 2003

Status of the apeNEXT project

R. Ammendola; François Bodin; Ph. Boucaud; N. Cabibbo; F. Di Carlo; R. De Pietri; F. Di Renzo; W. Errico; A. Fucci; Marco Guagnelli; H. Kaldass; Alessandro Lonardo; S. de Lucad; J. Micheli; V. Morenas; O. Pène; R. Petronzio; F. Palombi; Dirk Pleiter; N. Paschedag; F. Rapuano; P. De Riso; A. Salamon; G. Salina; L. Sartori; F. Schifano; H. Simma; R. Tripiccione; P. Vicini

We present the current status of the apeNEXT project. The aim of this project is the development of the next generation of APE machines, which will provide multi-teraflop computing power. Like previous machines, apeNEXT is based on a custom-designed processor, which is specifically optimized for simulating QCD. We discuss the machine design, report on benchmarks, and give an overview of the status of the software development.


Journal of Systems Architecture | 2016

Dynamic many-process applications on many-tile embedded systems and HPC clusters

Pier Stanislao Paolucci; Andrea Biagioni; Luis Gabriel Murillo; Frédéric Rousseau; Lars Schor; Laura Tosoratto; Iuliana Bacivarov; Robert Lajos Buecs; Clément Deschamps; Ashraf El-Antably; Roberto Ammendola; Nicolas Fournel; Ottorino Frezza; Rainer Leupers; Francesca Lo Cicero; Alessandro Lonardo; Michele Martinelli; Elena Pastorelli; Devendra Rai; Davide Rossetti; Francesco Simula; Lothar Thiele; P. Vicini; Jan Henrik Weinstock

In the next decade, a growing number of scientific and industrial applications will require power-efficient systems providing unprecedented computation, memory, and communication resources. A promising paradigm foresees the use of heterogeneous many-tile architectures. The resulting computing systems are complex: they must be protected against several sources of faults and critical events, and application programmers must be provided with programming paradigms, software environments and debugging tools adequate to manage such complexity. The EURETILE (European Reference Tiled Architecture Experiment) consortium conceived, designed, and implemented: (1) an innovative many-tile, many-process dynamic fault-tolerant programming paradigm and software environment, grounded on a lightweight operating system generated by an automated software synthesis mechanism that takes into account the architecture and application specificities; (2) a many-tile heterogeneous hardware system, equipped with a high-bandwidth, low-latency, point-to-point 3D-toroidal interconnect, whose inter-tile interconnect processor includes an experimental mechanism for systemic fault awareness; (3) a full-system simulation environment, supported by innovative parallel technologies and equipped with debugging facilities. We also designed and coded a set of application benchmarks representative of the requirements of future HPC and embedded systems, including (4) a set of dynamic multimedia applications and (5) a large-scale simulator of neural activity and synaptic plasticity. The application benchmarks, compiled through the EURETILE software tool-chain, have been efficiently executed on both the many-tile hardware platform and the software simulator, up to a complexity of a few hundred software processes and hardware cores.


arXiv: High Energy Physics - Lattice | 2002

Status of APEmille

A. Bartoloni; Ph. Boucaud; N. Cabibbo; F. Calvayrac; M. Della Morte; R. De Pietri; P. De Riso; F. Di Carlo; F. Di Renzo; W. Errico; Roberto Frezzotti; T. Giorgino; Jochen Heitger; Alessandro Lonardo; M. Loukianov; G. Magazzú; J. Micheli; V. Morenas; N. Paschedag; O. Pène; R. Petronzio; Dirk Pleiter; F. Rapuano; Juri Rolf; Davide Rossetti; L. Sartori; H. Simma; F. Schifano; M. Torelli; R. Tripiccione

This paper presents the status of the APEmille project, which is essentially completed as far as machine development and construction are concerned. Several large installations of APEmille are in use for physics production runs, leading to many new results presented at this conference. This paper briefly summarizes the APEmille architecture, reviews the status of the installations and presents some performance figures for physics codes.


Future Generation Computer Systems | 2015

A hierarchical watchdog mechanism for systemic fault awareness on distributed systems

Roberto Ammendola; Andrea Biagioni; Ottorino Frezza; Francesca Lo Cicero; Alessandro Lonardo; Pier Stanislao Paolucci; Davide Rossetti; Francesco Simula; Laura Tosoratto; P. Vicini

Systemic fault tolerance is usually pursued with a number of strategies, like redundancy and checkpoint/restart; all of them need to be triggered by safe and fast fault detection. We devised a hardware/software approach to fault detection that enables system-level fault awareness by implementing a hierarchical mutual watchdog. It relies on an improved high performance Network Interface Card (NIC) implementing an n-dimensional mesh topology and a Service Network. The hierarchical watchdog mechanism is able to quickly detect faults on each node, as the host and the high performance NIC guard each other while every node monitors its own first neighbours in the mesh. Duplicated and distributed Supervisor Nodes receive diagnostic messages routed through either the Service Network or the N-dimensional network, then assemble a global picture of the system status. In this way our approach achieves fault awareness with no single point of failure. We describe an implementation of this hardware/software co-design for our high performance 3D torus NIC, with a focus on how routed diagnostic messages do not affect system performance.

Highlights:
- We approach fault tolerance for distributed systems from fault detection and awareness.
- We propose a HW/SW mechanism based on a mutual watchdog between host and NIC.
- A double diagnostic message path leads to resilient systemic fault awareness.
- Our tool can interface with fault reaction/recovery systems to trigger them automatically.
- Our mechanism has no impact on system performance.
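The neighbour-monitoring idea described above can be sketched in a few lines of heartbeat-timeout logic. This is our own minimal illustration, not the paper's hardware/software implementation; the node names and timeout value are hypothetical, and timestamps are passed explicitly to keep the example deterministic.

```python
# Minimal sketch of heartbeat-timeout fault detection between neighbours:
# a node flags any neighbour whose last diagnostic message is too old.

class Watchdog:
    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}   # neighbour -> time of last diagnostic message

    def heartbeat(self, node, now):
        """Record a diagnostic message received from a neighbour."""
        self.last_seen[node] = now

    def suspected_faults(self, now):
        """Neighbours whose last heartbeat is older than the timeout."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout_s)

wd = Watchdog(timeout_s=0.5)
wd.heartbeat("node+x", now=0.0)
wd.heartbeat("node-x", now=0.0)
wd.heartbeat("node+x", now=0.9)          # node+x keeps reporting
print(wd.suspected_faults(now=1.0))      # node-x has gone silent
```

In the scheme the paper describes, such per-node suspicions would travel to the Supervisor Nodes over two independent paths, which is what removes the single point of failure.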


arXiv: Instrumentation and Detectors | 2014

NaNet: a low-latency NIC enabling GPU-based, real-time low level trigger systems

Roberto Ammendola; Andrea Biagioni; R. Fantechi; Ottorino Frezza; G. Lamanna; Francesca Lo Cicero; Alessandro Lonardo; Pier Stanislao Paolucci; F. Pantaleo; R. Piandani; L. Pontisso; Davide Rossetti; Francesco Simula; Marco S. Sozzi; Laura Tosoratto; P. Vicini

We implemented the NaNet FPGA-based PCIe Gen2 GbE/APElink NIC, featuring GPUDirect RDMA capabilities and UDP protocol management offloading. NaNet is able to receive a UDP input data stream from its GbE interface and redirect it, without any intermediate buffering or CPU intervention, to the memory of a Fermi/Kepler GPU hosted on the same PCIe bus, provided that the two devices share the same upstream root complex. Synthetic benchmarks for latency and bandwidth are presented. We describe how NaNet can be employed in the prototype of the GPU-based RICH low-level trigger processor of the NA62 CERN experiment, to implement the data link between the TEL62 readout boards and the low level trigger processor. Results for the throughput and latency of the integrated system are presented and discussed.
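Since NaNet ingests a UDP stream from a GbE interface, a useful sanity check on any bandwidth figure is the maximum UDP goodput Gigabit Ethernet allows once per-frame overheads are counted. The arithmetic below is our own illustration using standard Ethernet/IP/UDP header sizes, not a benchmark from the paper.

```python
# Maximum UDP goodput on Gigabit Ethernet, counting per-frame overhead.
LINE_RATE_BPS = 1e9
ETH_OVERHEAD_BYTES = 38    # preamble 8 + MAC header 14 + FCS 4 + inter-frame gap 12
IP_UDP_HEADER_BYTES = 28   # IPv4 header 20 + UDP header 8

def udp_goodput_mbps(payload_bytes):
    """Payload bits delivered per second at line rate, in Mbit/s."""
    wire_bytes = payload_bytes + IP_UDP_HEADER_BYTES + ETH_OVERHEAD_BYTES
    return LINE_RATE_BPS * payload_bytes / wire_bytes / 1e6

print(round(udp_goodput_mbps(1472), 1))   # full-size frames approach line rate
print(round(udp_goodput_mbps(64), 1))     # small datagrams pay heavy overhead
```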


Scientific Programming | 2017

Power-Efficient Computing: Experiences from the COSA Project

Daniele Cesini; Elena Corni; Antonio Falabella; Andrea Ferraro; Lucia Morganti; Enrico Calore; Sebastiano Fabio Schifano; Michele Michelotto; Roberto Alfieri; Roberto De Pietri; Tommaso Boccali; Andrea Biagioni; Francesca Lo Cicero; Alessandro Lonardo; Michele Martinelli; Pier Stanislao Paolucci; Elena Pastorelli; P. Vicini

Energy consumption is today one of the most relevant issues in operating HPC systems for scientific applications. The use of unconventional computing systems is therefore of great interest for several scientific communities looking for a better tradeoff between time-to-solution and energy-to-solution. In this context, assessing the performance of processors with a high ratio of performance per watt is necessary to understand how to build energy-efficient computing systems for scientific applications using this class of processors. Computing On SOC Architecture (COSA) is a three-year project (2015–2017) funded by the Scientific Commission V of the Italian Institute for Nuclear Physics (INFN), which aims to investigate the performance and total cost of ownership offered by computing systems based on commodity low-power Systems on Chip (SoCs) and highly energy-efficient systems based on GP-GPUs. In this work, we present the results of the project, analyzing the performance of several scientific applications on several GPU- and SoC-based systems. We also describe the methodology we used to measure energy performance and the tools we implemented to monitor the power drawn by applications while running.
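The time-to-solution versus energy-to-solution tradeoff mentioned above reduces to a simple product: energy-to-solution is average power drawn times time-to-solution. The numbers below are hypothetical, chosen only to illustrate how a slower but lower-power system can still win on energy.

```python
# Energy-to-solution = average power * time-to-solution.
# Hypothetical numbers for a low-power SoC node vs. a conventional HPC node.

def energy_to_solution_j(avg_power_w, time_to_solution_s):
    """Joules consumed to complete the workload."""
    return avg_power_w * time_to_solution_s

soc_energy = energy_to_solution_j(avg_power_w=15.0, time_to_solution_s=400.0)
hpc_energy = energy_to_solution_j(avg_power_w=300.0, time_to_solution_s=60.0)
print(soc_energy, hpc_energy)   # the SoC is ~6.7x slower yet uses 3x less energy
```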

Collaboration


Dive into Alessandro Lonardo's collaboration.

Top Co-Authors

P. Vicini
Sapienza University of Rome

Andrea Biagioni
Sapienza University of Rome

Francesco Simula
Sapienza University of Rome

Roberto Ammendola
Istituto Nazionale di Fisica Nucleare

Ottorino Frezza
Sapienza University of Rome

Laura Tosoratto
Sapienza University of Rome

Elena Pastorelli
Sapienza University of Rome