Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where A. Lonardo is active.

Publications


Featured research published by A. Lonardo.


Computing in Science and Engineering | 2006

Computing for LQCD: apeNEXT

F. Belletti; Sebastiano Fabio Schifano; R. Tripiccione; François Bodin; Ph. Boucaud; J. Micheli; O. Pene; N. Cabibbo; S. de Luca; A. Lonardo; D Rossetti; P. Vicini; M. Lukyanov; L. Morin; N. Paschedag; H. Simma; V. Morenas; Dirk Pleiter; F. Rapuano

apeNEXT is the latest in the APE collaboration's series of parallel computers for computationally intensive calculations such as quantum chromodynamics on the lattice. The authors describe the machine's architectural choices, which have been shaped by almost two decades of the collaboration's activity.


Journal of Physics: Conference Series | 2012

APEnet+: a 3D Torus network optimized for GPU-based HPC Systems

Roberto Ammendola; Andrea Biagioni; Ottorino Frezza; F Lo Cicero; A. Lonardo; P.S. Paolucci; D Rossetti; Francesco Simula; Laura Tosoratto; P. Vicini

In the supercomputing arena, the strong rise of GPU-accelerated clusters is a matter of fact. Within INFN, we proposed an initiative, the QUonG project, whose aim is to deploy a high-performance computing system dedicated to scientific computations, leveraging commodity multi-core processors coupled with latest-generation GPUs. The inter-node interconnection system is based on a point-to-point, high-performance, low-latency 3D torus network built in the framework of the APEnet+ project. It takes the form of an FPGA-based PCIe network card exposing six fully bidirectional links running at 34 Gbps each and implements the RDMA protocol. To enable a significant reduction in access latency for inter-node data transfers, a direct network-to-GPU interface was built. The specialized hardware blocks, integrated in the APEnet+ board, provide support for GPU-initiated communications using so-called PCIe peer-to-peer (P2P) transactions. This development is carried out in close collaboration with the GPU vendor NVIDIA. The final shape of a complete QUonG deployment is an assembly of standard 42U racks, each one delivering 80 TFLOPS of peak performance at a cost of 5 k€/TFLOPS and an estimated power consumption of 25 kW/rack. In this paper we report on the status of the final rack deployment and on the R&D activities for 2012, which will focus on performance enhancement of the APEnet+ hardware through the adoption of new-generation 28 nm FPGAs, allowing the implementation of a PCIe Gen3 host interface and the addition of new fault-tolerance-oriented capabilities.
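
As a quick sanity check of the deployment figures quoted above (our arithmetic, derived from the abstract's numbers, not stated in the paper):

$$
80~\text{TFLOPS/rack} \times 5~\text{k€/TFLOPS} = 400~\text{k€/rack},
\qquad
\frac{80~\text{TFLOPS}}{25~\text{kW}} = 3.2~\text{TFLOPS/kW}.
$$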


Journal of Instrumentation | 2013

APEnet+ 34 Gbps data transmission system and custom transmission logic

Roberto Ammendola; Andrea Biagioni; Ottorino Frezza; A. Lonardo; F Lo Cicero; Pier Stanislao Paolucci; D Rossetti; Francesco Simula; Laura Tosoratto; P. Vicini

APEnet+ is a point-to-point, low-latency, 3D-torus network controller integrated in a PCIe Gen2 board based on the Altera Stratix IV FPGA. We characterize the transmission system (embedded transceivers driving external QSFP+ modules), analyzing signal integrity, throughput, latency, BER and jitter at data rates up to 34 Gbps. We estimate the efficiency of the custom transmission logic, which sustains 2.6 GB/s per link with an FPGA on-chip memory footprint of 40 KB while providing deadlock-free routing and systemic awareness of faults. Finally, we show preliminary results obtained with the embedded transceivers of a next-generation FPGA and outline some ideas for increasing performance within the same FPGA memory footprint.
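
A rough reading of the 2.6 GB/s figure, assuming the standard 8b/10b line encoding commonly used by Stratix IV transceivers (our assumption, not stated in the abstract):

$$
34~\text{Gbps} \times \tfrac{8}{10} = 27.2~\text{Gbps} = 3.4~\text{GB/s (payload ceiling)},
\qquad
\frac{2.6}{3.4} \approx 76\%~\text{link efficiency}.
$$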


Journal of Instrumentation | 2014

NaNet: a flexible and configurable low-latency NIC for real-time trigger systems based on GPUs

Roberto Ammendola; Andrea Biagioni; Ottorino Frezza; G. Lamanna; A. Lonardo; F Lo Cicero; Pier Stanislao Paolucci; F. Pantaleo; D Rossetti; Francesco Simula; Marco S. Sozzi; Laura Tosoratto; P. Vicini

NaNet is an FPGA-based PCIe x8 Gen2 NIC supporting 1/10 GbE links and the custom 34 Gbps APElink channel. The design has GPUDirect RDMA capabilities and features a network-stack protocol-offloading module, making it suitable for building low-latency, real-time GPU-based computing systems. We provide a detailed description of the NaNet hardware's modular architecture. Benchmarks for latency and bandwidth over the GbE and APElink channels are presented, followed by a performance analysis of a case study, the GPU-based low-level trigger for the RICH detector in the NA62 CERN experiment, using either the NaNet GbE or APElink channel. Finally, we give an outline of the project's future activities.
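
To make the data path concrete, here is a minimal sketch of how a host-side trigger loop might consume events that such a NIC has already written into GPU memory via GPUDirect RDMA. Everything here (the rx_ring_t layout, the polling scheme, launch_trigger_kernel) is hypothetical and for illustration only; it is not the actual NaNet API.

```c
/* Hypothetical sketch of a receive loop over a ring buffer that the
 * NIC fills directly in GPU memory (GPUDirect RDMA); illustrative
 * names, not the real NaNet interface. */
#include <stdint.h>
#include <stddef.h>

extern void launch_trigger_kernel(void *event, size_t len); /* hypothetical */

typedef struct {
    volatile uint64_t write_idx; /* advanced by the NIC after each DMA  */
    uint64_t read_idx;           /* advanced by the host after dispatch */
    size_t   slot_size;          /* bytes per event slot                */
    void    *gpu_slots;          /* device memory mapped for NIC DMA    */
} rx_ring_t;

/* Dispatch every event delivered since the last call; the GPU kernel
 * reads each payload in place, so no host-side copy is ever made.    */
static void poll_and_dispatch(rx_ring_t *ring, unsigned nslots)
{
    while (ring->read_idx != ring->write_idx) {
        size_t slot = (size_t)(ring->read_idx % nslots);
        void *event = (char *)ring->gpu_slots + slot * ring->slot_size;
        launch_trigger_kernel(event, ring->slot_size);
        ring->read_idx++;
    }
}
```

The point of the offload engine in such a design is that the while-loop above never touches packet headers: protocol processing happens in the NIC, and the host only sees ready-to-use event payloads.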


Field-Programmable Technology | 2013

Virtual-to-Physical address translation for an FPGA-based interconnect with host and GPU remote DMA capabilities

Roberto Ammendola; Andrea Biagioni; Ottorino Frezza; Francesca Lo Cicero; A. Lonardo; Pier Stanislao Paolucci; D Rossetti; Francesco Simula; Laura Tosoratto; P. Vicini

We developed a custom FPGA-based Network Interface Controller named APEnet+, aimed at GPU-accelerated clusters for High Performance Computing. The card exploits the peer-to-peer capabilities (GPUDirect RDMA) of the latest NVIDIA GPGPU devices and the RDMA paradigm to perform fast, direct communication between computing nodes, offloading network task execution from the host CPU. In this work we focus on the implementation of a virtual-to-physical address translation mechanism using the FPGA's embedded soft-processor. Address management is the most demanding task on the NIC receiving side - we estimated up to 70% of the μC load - making it the main culprit for the data bottleneck. To improve the performance of this task, and hence data transfer over the network, we added a specialized hardware logic block acting as a Translation Lookaside Buffer. This block makes use of a Content Addressable Memory implementation designed for scalability and speed. We present detailed measurements demonstrating the benefits of this custom logic: a substantial reduction in address translation latency (from a measured value of 1.9 μs to 124 ns) and a performance enhancement of both host-bound and GPU-bound data transfers (up to ~60% bandwidth increase) in given message size ranges.
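
As a software analogue of the TLB idea (illustration only; the real block is FPGA logic built around a content-addressable memory, not C): a small direct-mapped cache of recent page translations lets the common case bypass the slow soft-processor walk, which is where the measured drop from 1.9 μs to 124 ns, roughly a 15x reduction, comes from. Page size and table depth below are assumptions.

```c
/* Software analogue of a direct-mapped TLB for 4 KiB pages.
 * Illustration of the caching principle only. */
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT  12   /* 4 KiB pages (assumption)        */
#define TLB_ENTRIES 256  /* power of two for cheap indexing */

typedef struct {
    uint64_t vpn;   /* virtual page number (tag) */
    uint64_t pfn;   /* physical frame number     */
    bool     valid;
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Fast path: a hit returns the translation immediately. A miss falls
 * back to the slow lookup (the soft-processor walk, in the APEnet+
 * case) and installs the result for next time. */
uint64_t translate(uint64_t vaddr, uint64_t (*slow_lookup)(uint64_t))
{
    uint64_t vpn = vaddr >> PAGE_SHIFT;
    tlb_entry_t *e = &tlb[vpn & (TLB_ENTRIES - 1)];

    if (!e->valid || e->vpn != vpn) {   /* miss: take the slow path */
        e->vpn   = vpn;
        e->pfn   = slow_lookup(vpn);
        e->valid = true;
    }
    return (e->pfn << PAGE_SHIFT) | (vaddr & ((1u << PAGE_SHIFT) - 1));
}
```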


Digital Systems Design | 2016

The ExaNeSt Project: Interconnects, Storage, and Packaging for Exascale Systems

Manolis Katevenis; Nikolaos Chrysos; Manolis Marazakis; I. Mavroidis; F. Chaix; N. Kallimanis; Javier Navaridas; John Goodacre; P. Vicini; Andrea Biagioni; Pier Stanislao Paolucci; A. Lonardo; Elena Pastorelli; F. Lo Cicero; Roberto Ammendola; P. Hopton; P. Coates; Giuliano Taffoni; Stefano Cozzini; Martin L. Kersten; Ying Zhang; Julio Sahuquillo; S. Lechago; Christian Pinto; B. Lietzow; D. Everett; G. Perna

ExaNeSt is one of three European projects supporting a ground-breaking computing architecture for exascale-class systems built upon power-efficient 64-bit ARM processors. This group of projects shares an everything-close and share-anything paradigm, which trims down power consumption - by shortening the distance of signals for most data transfers - as well as the cost and footprint of the installation - by reducing the number of devices needed to meet performance targets. In ExaNeSt, we will design and implement: (i) a physical rack prototype and its liquid-cooling subsystem, providing ultra-dense compute packaging; (ii) a storage architecture with distributed (in-node) non-volatile memory (NVM) devices; (iii) a unified, low-latency interconnect designed to efficiently uphold the desired quality-of-service guarantees for a mix of storage and inter-processor flows; and (iv) efficient rack-level memory sharing, where each page is cacheable at only a single node. Our target is to test alternative storage and interconnect options on actual hardware, using real-world HPC applications. The ExaNeSt consortium brings together technology, skills, and knowledge across the entire value chain, from computing IP, packaging, and system deployment all the way up to operating systems, storage, HPC, big-data frameworks, and cutting-edge applications.


Journal of Instrumentation | 2010

High-speed data transfer with FPGAs and QSFP+ modules

Roberto Ammendola; Andrea Biagioni; Giacomo Chiodi; Ottorino Frezza; F Lo Cicero; A. Lonardo; R Lunadei; Pier Stanislao Paolucci; D Rossetti; A. Salamon; G. Salina; Francesco Simula; Laura Tosoratto; P. Vicini

We present test results and characterization of a data transmission system based on a latest-generation FPGA and a commercial QSFP+ (Quad Small Form-factor Pluggable Plus) module. The QSFP+ standard defines a hot-pluggable transceiver available in copper or optical cable assemblies with an aggregated bandwidth of up to 40 Gbps. We implemented a complete testbench based on a commercial development card mounting an Altera Stratix IV FPGA with 24 serial transceivers at 8.5 Gbps, together with a custom mezzanine hosting three QSFP+ modules. We present test results and signal integrity measurements up to an aggregated bandwidth of 12 Gbps.


Journal of Physics: Conference Series | 2015

Hardware and Software Design of FPGA-based PCIe Gen3 interface for APEnet+ network interconnect system

Roberto Ammendola; Andrea Biagioni; Ottorino Frezza; F. Lo Cicero; A. Lonardo; Michele Martinelli; P.S. Paolucci; Elena Pastorelli; Davide Rossetti; Francesco Simula; Laura Tosoratto; P. Vicini

In the attempt to develop an interconnection architecture optimized for hybrid HPC systems dedicated to scientific computing, we designed APEnet+, a point-to-point, low-latency and high-performance network controller supporting 6 fully bidirectional off-board links over a 3D torus topology. The first release of APEnet+ (named V4) was a board based on a 40 nm Altera FPGA, integrating 6 channels at 34 Gbps of raw bandwidth per direction and a PCIe Gen2 x8 host interface. It was the first device of its kind to implement an RDMA protocol to directly read/write data from/to Fermi and Kepler NVIDIA GPUs using NVIDIA peer-to-peer and GPUDirect RDMA protocols, obtaining real zero-copy GPU-to-GPU transfers over the network. The latest generation of APEnet+ systems (now named V5) implements a PCIe Gen3 x8 host interface on a 28 nm Altera Stratix V FPGA, with multi-standard fast transceivers (up to 14.4 Gbps) and an increased amount of configurable internal resources and hardware IP cores to support the main interconnection standard protocols. Herein we present the APEnet+ V5 architecture and the status of its hardware and system software design. Both its Linux device driver and the low-level libraries have been redeveloped to support the PCIe Gen3 protocol, introducing optimizations and solutions based on hardware/software co-design.
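
The raw host-interface gain from the Gen2-to-Gen3 migration follows from the standard PCIe line rates and encodings (well-known figures, not taken from the paper):

$$
\text{Gen2 x8:}\;\; 8 \times 5~\text{GT/s} \times \tfrac{8}{10} = 32~\text{Gbps} = 4~\text{GB/s},
\qquad
\text{Gen3 x8:}\;\; 8 \times 8~\text{GT/s} \times \tfrac{128}{130} \approx 63~\text{Gbps} \approx 7.9~\text{GB/s}.
$$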


Nuclear Science Symposium and Medical Imaging Conference | 2010

High-speed data transfer with FPGAs and QSFP+ modules

Roberto Ammendola; Andrea Biagioni; Giacomo Chiodi; Ottorino Frezza; Francesca Lo Cicero; A. Lonardo; Riccardo Lunadei; Pier Stanislao Paolucci; D Rossetti; A. Salamon; G. Salina; Francesco Simula; Laura Tosoratto; P. Vicini

We present test results and characterization of a data transmission system based on a latest-generation FPGA and a commercial QSFP+ module. The QSFP+ standard defines a hot-pluggable transceiver available in copper or optical cable assemblies with an aggregated bandwidth of up to 40 Gbps. We implemented a complete testbench based on a commercial development card mounting an Altera Stratix IV FPGA with 24 serial transceivers at 8.5 Gbps, together with a custom mezzanine hosting three QSFP+ modules. We present test results and signal integrity measurements up to an aggregated bandwidth of 12 Gbps.


Nuclear Science Symposium and Medical Imaging Conference | 2012

A 34 Gbps data transmission system with FPGAs embedded transceivers and QSFP+ modules

Roberto Ammendola; Andrea Biagioni; Ottorino Frezza; Francesca Lo Cicero; A. Lonardo; Pier Stanislao Paolucci; D Rossetti; A. Salamon; Francesco Simula; Laura Tosoratto; P. Vicini

APEnet+ is our custom-developed PCIe Gen2 board based on an Altera Stratix IV FPGA. We demonstrate reliable usage of Altera's embedded transceivers coupled with QSFP+ (Quad Small Form-factor Pluggable Plus) technology. The QSFP+ standard defines a hot-pluggable transceiver available in copper or optical cable assemblies with an aggregated bandwidth of up to 40 Gbps. We use embedded transceivers in a 4-lane configuration, each lane capable of 8.5 Gbps, for an aggregate bandwidth of 34 Gbps per link. On a Stratix IV 290 we can place up to 6 bidirectional links, together with a PCIe Gen2 x8 hard IP. We describe the design and implementation of this data transmission system.
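
Putting the stated numbers together (our arithmetic, from the figures in the abstract):

$$
4~\text{lanes} \times 8.5~\text{Gbps} = 34~\text{Gbps per link},
\qquad
6~\text{links} \times 34~\text{Gbps} = 204~\text{Gbps of raw bandwidth per board, per direction}.
$$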

Collaboration


Dive into A. Lonardo's collaborations.

Top Co-Authors

P. Vicini (Sapienza University of Rome)
Andrea Biagioni (Sapienza University of Rome)
Francesco Simula (Sapienza University of Rome)
Roberto Ammendola (Istituto Nazionale di Fisica Nucleare)
Ottorino Frezza (Sapienza University of Rome)
D Rossetti (Sapienza University of Rome)
Laura Tosoratto (Sapienza University of Rome)
P.S. Paolucci (Sapienza University of Rome)
Elena Pastorelli (Sapienza University of Rome)
Michele Martinelli (Sapienza University of Rome)