Publication


Featured research published by Francesca Lo Cicero.


ieee international symposium on parallel & distributed processing, workshops and phd forum | 2013

GPU Peer-to-Peer Techniques Applied to a Cluster Interconnect

Roberto Ammendola; Massimo Bernaschi; Andrea Biagioni; Mauro Bisson; Massimiliano Fatica; Ottorino Frezza; Francesca Lo Cicero; Alessandro Lonardo; Enrico Mastrostefano; Pier Stanislao Paolucci; Davide Rossetti; Francesco Simula; Laura Tosoratto; P. Vicini

Modern GPUs support special protocols to exchange data directly across the PCI Express bus. While these protocols could be used to reduce GPU data transmission times, chiefly by avoiding staging through host memory, they require specific hardware features that are not available on current-generation network adapters. In this paper we describe the architectural modifications required to implement peer-to-peer access to NVIDIA Fermi- and Kepler-class GPUs on an FPGA-based cluster interconnect. We also discuss the current software implementation, which integrates this feature by minimally extending the RDMA programming model, along with some issues raised while employing it in a higher-level API such as MPI. Finally, we study the current limits of the technique by analyzing the performance improvements on low-level benchmarks and on two GPU-accelerated applications, showing when and how they benefit from the GPU peer-to-peer method.
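The "minimal extension of the RDMA programming model" mentioned in the abstract can be pictured as tagging each transfer descriptor with the kind of memory it targets; when peer-to-peer is available, the GPU path skips the host bounce buffer. A hedged sketch follows - the type and function names (mem_kind_t, rdma_desc_t, rdma_fill_desc) are illustrative, not the actual APEnet+ API:

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative descriptor: a transfer can target host RAM or GPU memory. */
typedef enum { MEM_HOST = 0, MEM_GPU = 1 } mem_kind_t;

typedef struct {
    uint64_t   remote_vaddr;  /* virtual address on the destination node */
    size_t     len;           /* payload length in bytes                 */
    mem_kind_t target;        /* host RAM or GPU device memory           */
    int        staged;        /* 1 if a host bounce buffer is needed     */
} rdma_desc_t;

/* With peer-to-peer support, a GPU-bound transfer needs no host staging. */
void rdma_fill_desc(rdma_desc_t *d, uint64_t vaddr, size_t len,
                    mem_kind_t target, int p2p_supported)
{
    d->remote_vaddr = vaddr;
    d->len          = len;
    d->target       = target;
    d->staged       = (target == MEM_GPU && !p2p_supported);
}
```

The point the sketch makes is the one the paper measures: the only behavioral change visible to the programmer is that GPU buffers become first-class RDMA targets, while the staging decision moves into the NIC.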


arXiv: Computational Physics | 2011

APEnet+: high bandwidth 3D torus direct network for petaflops scale commodity clusters

Roberto Ammendola; Andrea Biagioni; Ottorino Frezza; Francesca Lo Cicero; Alessandro Lonardo; Pier Stanislao Paolucci; Davide Rossetti; A. Salamon; G. Salina; Francesco Simula; Laura Tosoratto; P. Vicini

We describe herein the APElink+ board, a PCIe interconnect adapter featuring the latest advances in wire speed and interface technology, plus hardware support for an RDMA programming model and experimental acceleration of GPU networking. This design allows us to build a low-latency, high-bandwidth PC cluster, the APEnet+ network: the new generation of our cost-effective cluster network architecture, scalable to tens of thousands of nodes. We also provide test results and a characterization of data transmission from a complete testbench based on a commercial development card mounting an Altera® FPGA.
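In a 3D torus network like APEnet+, each node has six first neighbours, with coordinates wrapping around at the lattice edges. A minimal sketch of the neighbour computation (the names and dimensions are illustrative, not the APEnet+ routing logic):

```c
/* A node's position in the 3D torus lattice. */
typedef struct { int x, y, z; } coord_t;

/* Wrap a coordinate modulo the torus size (handles negative steps too). */
static int wrap(int v, int dim) { return ((v % dim) + dim) % dim; }

/* Neighbour of c one step along each axis (dx, dy, dz in {-1, 0, +1}),
 * with wraparound given the torus dimensions. */
coord_t torus_neighbor(coord_t c, int dx, int dy, int dz, coord_t dims)
{
    coord_t n;
    n.x = wrap(c.x + dx, dims.x);
    n.y = wrap(c.y + dy, dims.y);
    n.z = wrap(c.z + dz, dims.z);
    return n;
}
```

The wraparound is what gives the torus its uniform node degree: every node, including those at the lattice boundary, sees exactly six point-to-point links.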


ieee international conference on high performance computing data and analytics | 2011

QUonG: A GPU-based HPC System Dedicated to LQCD Computing

Roberto Ammendola; Andrea Biagioni; Ottorino Frezza; Francesca Lo Cicero; Alessandro Lonardo; Pier Stanislao Paolucci; Davide Rossetti; Francesco Simula; Laura Tosoratto; P. Vicini

QUonG is an INFN (Istituto Nazionale di Fisica Nucleare) initiative targeted at developing a high performance computing system dedicated to Lattice QCD computations. QUonG is a massively parallel computing platform that leverages commodity multi-core processors coupled with latest-generation GPUs. Its network mesh exploits the characteristics of the LQCD algorithm in the design of a point-to-point, high-performance, low-latency 3D torus network interconnecting the computing nodes. The network is built upon the APEnet+ project: it consists of an FPGA-based PCI Express board exposing six fully bidirectional off-board links running at 34 Gbps each, implementing the RDMA protocol and an experimental direct network-to-GPU interface that enables significant access latency reduction for inter-node data transfers. The final shape of a complete QUonG deployment is an assembly of standard 42U racks, each one capable of 60 TFlops/rack of peak performance, at a cost of 5 k€/TFlops and an estimated power consumption of 25 kW/rack. A first QUonG system prototype is expected to be delivered at the end of 2011.
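The rack-level figures quoted above combine as simple ratios; a quick sanity check (function names are ours, used only to make the arithmetic explicit):

```c
/* Total rack cost from peak performance and price per TFlops. */
double rack_cost_keur(double tflops_per_rack, double keur_per_tflop)
{
    return tflops_per_rack * keur_per_tflop;   /* k€ per rack */
}

/* Power efficiency: 1 TFlops = 1000 GFlops and 1 kW = 1000 W,
 * so the conversion factors cancel. */
double rack_gflops_per_watt(double tflops_per_rack, double kw_per_rack)
{
    return (tflops_per_rack * 1000.0) / (kw_per_rack * 1000.0);
}
```

With the paper's numbers, 60 TFlops at 5 k€/TFlops gives 300 k€ per rack, and 60 TFlops over 25 kW gives 2.4 GFlops/W peak.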


field-programmable technology | 2013

Virtual-to-Physical address translation for an FPGA-based interconnect with host and GPU remote DMA capabilities

Roberto Ammendola; Andrea Biagioni; Ottorino Frezza; Francesca Lo Cicero; A. Lonardo; Pier Stanislao Paolucci; D. Rossetti; Francesco Simula; Laura Tosoratto; P. Vicini

We developed a custom FPGA-based Network Interface Controller named APEnet+, aimed at GPU-accelerated clusters for High Performance Computing. The card exploits the peer-to-peer capabilities (GPUDirect RDMA) of the latest NVIDIA GPGPU devices and the RDMA paradigm to perform fast direct communication between computing nodes, offloading the host CPU from network task execution. In this work we focus on the implementation of a Virtual-to-Physical address translation mechanism using the FPGA embedded soft-processor. Address management is the most demanding task on the NIC receiving side - we estimated up to 70% of the μC load - and is the main culprit of the data bottleneck. To improve the performance of this task, and hence data transfer over the network, we added a specialized hardware logic block acting as a Translation Lookaside Buffer. This block makes use of a peculiar Content Addressable Memory implementation designed for scalability and speed. We present detailed measurements demonstrating the benefits of introducing such custom logic: a substantial address translation latency reduction (from a measured value of 1.9 μs to 124 ns) and a performance enhancement of both host-bound and GPU-bound data transfers (up to ~60% bandwidth increase) in given message size ranges.
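The role of the CAM-based hardware block the paper adds can be illustrated with a tiny fully-associative software TLB: a lookup either hits and returns the translated address immediately, or misses and falls back to the slow path (in the paper, the soft-processor walk). This is an illustrative model, not the actual APEnet+ RTL:

```c
#include <stdint.h>

#define TLB_ENTRIES 8
#define PAGE_SHIFT  12          /* 4 KiB pages */

typedef struct {
    uint64_t vpn;               /* virtual page number   */
    uint64_t pfn;               /* physical frame number */
    int      valid;
} tlb_entry_t;

typedef struct {
    tlb_entry_t e[TLB_ENTRIES];
    unsigned    next;           /* simple round-robin replacement */
} tlb_t;

void tlb_insert(tlb_t *t, uint64_t vaddr, uint64_t paddr)
{
    tlb_entry_t *slot = &t->e[t->next];
    slot->vpn   = vaddr >> PAGE_SHIFT;
    slot->pfn   = paddr >> PAGE_SHIFT;
    slot->valid = 1;
    t->next = (t->next + 1) % TLB_ENTRIES;
}

/* 1 on a hit (translated address written to *paddr); 0 means the slow
 * path must resolve the mapping and refill the TLB. */
int tlb_lookup(const tlb_t *t, uint64_t vaddr, uint64_t *paddr)
{
    uint64_t vpn = vaddr >> PAGE_SHIFT;
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (t->e[i].valid && t->e[i].vpn == vpn) {
            uint64_t off = vaddr & ((1u << PAGE_SHIFT) - 1);
            *paddr = (t->e[i].pfn << PAGE_SHIFT) | off;
            return 1;
        }
    }
    return 0;
}
```

In hardware the loop disappears: a CAM compares the incoming virtual page number against all entries in parallel, which is what takes the measured translation latency from 1.9 μs down to 124 ns.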


Journal of Systems Architecture | 2016

Dynamic many-process applications on many-tile embedded systems and HPC clusters

Pier Stanislao Paolucci; Andrea Biagioni; Luis Gabriel Murillo; Frédéric Rousseau; Lars Schor; Laura Tosoratto; Iuliana Bacivarov; Robert Lajos Buecs; Clément Deschamps; Ashraf El-Antably; Roberto Ammendola; Nicolas Fournel; Ottorino Frezza; Rainer Leupers; Francesca Lo Cicero; Alessandro Lonardo; Michele Martinelli; Elena Pastorelli; Devendra Rai; Davide Rossetti; Francesco Simula; Lothar Thiele; P. Vicini; Jan Henrik Weinstock

In the next decade, a growing number of scientific and industrial applications will require power-efficient systems providing unprecedented computation, memory, and communication resources. A promising paradigm foresees the use of heterogeneous many-tile architectures. The resulting computing systems are complex: they must be protected against several sources of faults and critical events, and application programmers must be provided with programming paradigms, software environments and debugging tools adequate to manage such complexity. The EURETILE (European Reference Tiled Architecture Experiment) consortium conceived, designed, and implemented: (1) an innovative many-tile, many-process dynamic fault-tolerant programming paradigm and software environment, grounded on a lightweight operating system generated by an automated software synthesis mechanism that takes into account the architecture and application specificities; (2) a many-tile heterogeneous hardware system, equipped with a high-bandwidth, low-latency, point-to-point 3D-toroidal interconnect, whose inter-tile interconnect processor features an experimental mechanism for systemic fault-awareness; (3) a full-system simulation environment, supported by innovative parallel technologies and equipped with debugging facilities. We also designed and coded a set of application benchmarks representative of the requirements of future HPC and Embedded Systems, including: (4) a set of dynamic multimedia applications and (5) a large-scale simulator of neural activity and synaptic plasticity. The application benchmarks, compiled through the EURETILE software tool-chain, have been efficiently executed on both the many-tile hardware platform and the software simulator, up to a complexity of a few hundred software processes and hardware cores.


Future Generation Computer Systems | 2015

A hierarchical watchdog mechanism for systemic fault awareness on distributed systems

Roberto Ammendola; Andrea Biagioni; Ottorino Frezza; Francesca Lo Cicero; Alessandro Lonardo; Pier Stanislao Paolucci; Davide Rossetti; Francesco Simula; Laura Tosoratto; P. Vicini

Systemic fault tolerance is usually pursued with a number of strategies, like redundancy and checkpoint/restart; all of them need to be triggered by safe and fast fault detection. We devised a hardware/software approach to fault detection that enables system-level Fault Awareness by implementing a hierarchical Mutual Watchdog. It relies on an improved high performance Network Interface Card (NIC) implementing an n-dimensional mesh topology and a Service Network. The hierarchical watchdog mechanism is able to quickly detect faults on each node, as the Host and the high performance NIC guard each other while every node monitors its own first neighbours in the mesh. Duplicated and distributed Supervisor Nodes receive diagnostic messages routed through either the Service Network or the n-dimensional Network, then assemble a global picture of the system status. In this way our approach achieves Fault Awareness with no single point of failure. We describe an implementation of this hardware/software co-design for our high performance 3D torus NIC, with a focus on how routed diagnostic messages do not affect system performance.

Highlights:
- We approach fault tolerance for distributed systems from fault detection and awareness.
- We propose a HW/SW mechanism based on a mutual watchdog between Host and NIC.
- A double diagnostic message path leads to resilient systemic fault awareness.
- Our tool can interface with fault reaction/recovery systems to trigger them automatically.
- Our mechanism has no impact on system performance.
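The detection idea underlying the mutual watchdog can be sketched with heartbeat counters: each guardian records the last heartbeat value a peer published and when it last changed; a counter that stays silent beyond a timeout window marks that peer as suspected faulty. The names below are ours, not the LO|FA|MO implementation:

```c
#include <stdint.h>

#define MAX_PEERS 6   /* first neighbours in a 3D torus */

typedef struct {
    uint64_t last_beat[MAX_PEERS];  /* last heartbeat value seen        */
    uint64_t last_seen[MAX_PEERS];  /* local time of that observation   */
    uint64_t timeout;               /* silence window before suspecting */
} watchdog_t;

/* Record a heartbeat observation from a peer at local time `now`. */
void wd_heartbeat(watchdog_t *w, int peer, uint64_t beat, uint64_t now)
{
    if (beat != w->last_beat[peer]) {
        w->last_beat[peer] = beat;
        w->last_seen[peer] = now;
    }
}

/* 1 = peer suspected faulty: its counter has been silent past the timeout. */
int wd_check(const watchdog_t *w, int peer, uint64_t now)
{
    return (now - w->last_seen[peer]) > w->timeout;
}
```

The hierarchy in the paper stacks this primitive: Host and NIC guard each other locally, neighbours guard each other across the mesh, and the duplicated Supervisor Nodes aggregate the resulting suspicions into the global picture.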


arXiv: Instrumentation and Detectors | 2014

NaNet: a low-latency NIC enabling GPU-based, real-time low level trigger systems

Roberto Ammendola; Andrea Biagioni; R. Fantechi; Ottorino Frezza; G. Lamanna; Francesca Lo Cicero; Alessandro Lonardo; Pier Stanislao Paolucci; F. Pantaleo; R. Piandani; L. Pontisso; Davide Rossetti; Francesco Simula; Marco S. Sozzi; Laura Tosoratto; P. Vicini

We implemented the NaNet FPGA-based PCIe Gen2 GbE/APElink NIC, featuring GPUDirect RDMA capabilities and UDP protocol management offloading. NaNet is able to receive a UDP input data stream from its GbE interface and redirect it, without any intermediate buffering or CPU intervention, to the memory of a Fermi/Kepler GPU hosted on the same PCIe bus, provided that the two devices share the same upstream root complex. Synthetic benchmarks for latency and bandwidth are presented. We describe how NaNet can be employed in the prototype of the GPU-based RICH low-level trigger processor of the NA62 CERN experiment, to implement the data link between the TEL62 readout boards and the low level trigger processor. Results for the throughput and latency of the integrated system are presented and discussed.
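The UDP offload described above means the NIC itself must decode the fixed 8-byte UDP header before steering payloads toward GPU memory without CPU intervention. A software model of that decode step (illustrative only, not NaNet's actual logic; UDP fields are big-endian on the wire):

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint16_t src_port;
    uint16_t dst_port;
    uint16_t length;    /* header + payload, in bytes */
    uint16_t checksum;
} udp_hdr_t;

/* Returns 0 on success, -1 if the buffer cannot hold a UDP header. */
int udp_parse(const uint8_t *buf, size_t len, udp_hdr_t *h)
{
    if (len < 8) return -1;
    h->src_port = (uint16_t)((buf[0] << 8) | buf[1]);
    h->dst_port = (uint16_t)((buf[2] << 8) | buf[3]);
    h->length   = (uint16_t)((buf[4] << 8) | buf[5]);
    h->checksum = (uint16_t)((buf[6] << 8) | buf[7]);
    return 0;
}
```

In NaNet this decoding happens in FPGA logic on the receive path, which is what removes both the intermediate buffering and the CPU from the GbE-to-GPU data flow.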


Scientific Programming | 2017

Power-Efficient Computing: Experiences from the COSA Project

Daniele Cesini; Elena Corni; Antonio Falabella; Andrea Ferraro; Lucia Morganti; Enrico Calore; Sebastiano Fabio Schifano; Michele Michelotto; Roberto Alfieri; Roberto De Pietri; Tommaso Boccali; Andrea Biagioni; Francesca Lo Cicero; Alessandro Lonardo; Michele Martinelli; Pier Stanislao Paolucci; Elena Pastorelli; P. Vicini

Energy consumption is today one of the most relevant issues in operating HPC systems for scientific applications. The use of unconventional computing systems is therefore of great interest for several scientific communities looking for a better tradeoff between time-to-solution and energy-to-solution. In this context, the performance assessment of processors with a high ratio of performance per watt is necessary to understand how to realize energy-efficient computing systems for scientific applications, using this class of processors. Computing On SOC Architecture (COSA) is a three-year project (2015–2017) funded by the Scientific Commission V of the Italian Institute for Nuclear Physics (INFN), which aims to investigate the performance and the total cost of ownership offered by computing systems based on commodity low-power Systems on Chip (SoCs) and high energy-efficient systems based on GP-GPUs. In this work, we present the results of the project analyzing the performance of several scientific applications on several GPU- and SoC-based systems. We also describe the methodology we have used to measure energy performance and the tools we have implemented to monitor the power drained by applications while running.
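The time-to-solution versus energy-to-solution tradeoff studied by COSA can be made concrete: for a fixed workload, energy-to-solution is average power draw times time-to-solution, so a slower low-power SoC can still win on energy if its draw is low enough. A minimal sketch (the figures in the check below are invented for illustration, not COSA measurements):

```c
/* Energy-to-solution in joules for a run with a given average power draw. */
double energy_to_solution(double avg_power_w, double time_s)
{
    return avg_power_w * time_s;
}
```

This is the quantity the project's monitoring tools estimate by sampling the power drained by applications while running.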


nuclear science symposium and medical imaging conference | 2010

High speed data transfer with FPGAs and QSFP+ modules

Roberto Ammendola; Andrea Biagioni; Giacomo Chiodi; Ottorino Frezza; Francesca Lo Cicero; A. Lonardo; Riccardo Lunadei; Pier Stanislao Paolucci; D. Rossetti; A. Salamon; G. Salina; Francesco Simula; Laura Tosoratto; P. Vicini

We present test results and a characterization of a data transmission system based on a latest-generation FPGA and a commercial QSFP+ module. The QSFP+ standard defines a hot-pluggable transceiver, available in copper or optical cable assemblies, for an aggregated bandwidth of up to 40 Gbps. We implemented a complete testbench based on a commercial development card mounting an Altera Stratix IV FPGA with 24 serial transceivers at 8.5 Gbps, together with a custom mezzanine hosting three QSFP+ modules. We present test results and signal integrity measurements up to an aggregated bandwidth of 12 Gbps.
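A QSFP+ module aggregates four serial lanes, and with 8b/10b line coding (commonly used at lane rates like the 8.5 Gbps quoted above; we assume it here for illustration) only 80% of the raw line rate carries payload. A small sketch of that bookkeeping:

```c
/* Effective payload bandwidth of a QSFP+ link under 8b/10b line coding:
 * each 8-bit data byte is transmitted as a 10-bit symbol. */
double qsfp_payload_gbps(double lane_rate_gbps, int lanes)
{
    return lane_rate_gbps * lanes * 8.0 / 10.0;
}
```

For example, four lanes at 8.5 Gbps give 34 Gbps raw, or 27.2 Gbps of payload under this assumption - which matches why per-link raw rates and usable bandwidth figures differ in this line of work.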


reconfigurable computing and fpgas | 2013

Design and implementation of a modular, low latency, fault-aware, FPGA-based network interface

Roberto Ammendola; Andrea Biagioni; Ottorino Frezza; Francesca Lo Cicero; Alessandro Lonardo; Pier Stanislao Paolucci; Davide Rossetti; Francesco Simula; Laura Tosoratto; P. Vicini

We describe our hands-on experience in developing a network-centric IP core supporting the RDMA protocol, which is the engine of an FPGA-based PCIe NIC targeted at GPU-accelerated HPC clusters with a 3D toroidal network topology. We report on different development areas related to our IP: the optimizations required to evolve the NIC to its current performance level (highlights of this work include the development of an RDMA engine with a dedicated translation lookaside buffer and a first-of-its-kind IP module that exploits the peer-to-peer protocol of NVIDIA GPUs); the addition of a component called LO|FA|MO that provides systemic fault-awareness to the network; and the modifications to the core IP to turn it into a low-latency interface, called NaNet, between a read-out board and a GPU farm in the data acquisition system of the low-level trigger of a particle-physics experiment. Taking into account the forecast evolution of the FPGA platform (28 nm, PCIe Gen3, etc.), we conclude with the future directions we envision for our IP.

Collaboration


Dive into Francesca Lo Cicero's collaborations.

Top Co-Authors

P. Vicini
Sapienza University of Rome

Andrea Biagioni
Sapienza University of Rome

Alessandro Lonardo
Istituto Nazionale di Fisica Nucleare

Francesco Simula
Sapienza University of Rome

Roberto Ammendola
Istituto Nazionale di Fisica Nucleare

Ottorino Frezza
Sapienza University of Rome

Laura Tosoratto
Sapienza University of Rome

Elena Pastorelli
Sapienza University of Rome