Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Elena Pastorelli is active.

Publication


Featured research published by Elena Pastorelli.


Journal of Systems Architecture | 2016

Dynamic many-process applications on many-tile embedded systems and HPC clusters

Pier Stanislao Paolucci; Andrea Biagioni; Luis Gabriel Murillo; Frédéric Rousseau; Lars Schor; Laura Tosoratto; Iuliana Bacivarov; Robert Lajos Buecs; Clément Deschamps; Ashraf El-Antably; Roberto Ammendola; Nicolas Fournel; Ottorino Frezza; Rainer Leupers; Francesca Lo Cicero; Alessandro Lonardo; Michele Martinelli; Elena Pastorelli; Devendra Rai; Davide Rossetti; Francesco Simula; Lothar Thiele; P. Vicini; Jan Henrik Weinstock

In the next decade, a growing number of scientific and industrial applications will require power-efficient systems providing unprecedented computation, memory, and communication resources. A promising paradigm foresees the use of heterogeneous many-tile architectures. The resulting computing systems are complex: they must be protected against several sources of faults and critical events, and application programmers must be provided with programming paradigms, software environments and debugging tools adequate to manage such complexity. The EURETILE (European Reference Tiled Architecture Experiment) consortium conceived, designed, and implemented: 1- an innovative many-tile, many-process dynamic fault-tolerant programming paradigm and software environment, grounded on a lightweight operating system generated by an automated software synthesis mechanism that takes into account the architecture and application specificities; 2- a many-tile heterogeneous hardware system, equipped with a high-bandwidth, low-latency, point-to-point 3D-toroidal interconnect. The inter-tile interconnect processor is equipped with an experimental mechanism for systemic fault-awareness; 3- a full-system simulation environment, supported by innovative parallel technologies and equipped with debugging facilities. We also designed and coded a set of application benchmarks representative of the requirements of future HPC and Embedded Systems, including: 4- a set of dynamic multimedia applications and 5- a large scale simulator of neural activity and synaptic plasticity. The application benchmarks, compiled through the EURETILE software tool-chain, have been efficiently executed on both the many-tile hardware platform and on the software simulator, up to a complexity of a few hundred software processes and hardware cores.


Journal of Instrumentation | 2016

NaNet-10: A 10GbE network interface card for the GPU-based low-level trigger of the NA62 RICH detector

Roberto Ammendola; Andrea Biagioni; M. Fiorini; Ottorino Frezza; A. Lonardo; G. Lamanna; F. Lo Cicero; Michele Martinelli; Ilaria Neri; P.S. Paolucci; Elena Pastorelli; R. Piandani; L. Pontisso; Davide Rossetti; Francesco Simula; M. Sozzi; Laura Tosoratto; P. Vicini

A GPU-based low level (L0) trigger is currently integrated in the experimental setup of the RICH detector of the NA62 experiment to assess the feasibility of building more refined physics-related trigger primitives and thus improve the trigger discriminating power. To ensure the real-time operation of the system, a dedicated data transport mechanism has been implemented: an FPGA-based Network Interface Card (NaNet-10) receives data from detectors and forwards them with low, predictable latency to the memory of the GPU performing the trigger algorithms. Results of the ring-shaped hit patterns reconstruction will be reported and discussed.


Scientific Programming | 2017

Power-Efficient Computing: Experiences from the COSA Project

Daniele Cesini; Elena Corni; Antonio Falabella; Andrea Ferraro; Lucia Morganti; Enrico Calore; Sebastiano Fabio Schifano; Michele Michelotto; Roberto Alfieri; Roberto De Pietri; Tommaso Boccali; Andrea Biagioni; Francesca Lo Cicero; Alessandro Lonardo; Michele Martinelli; Pier Stanislao Paolucci; Elena Pastorelli; P. Vicini

Energy consumption is today one of the most relevant issues in operating HPC systems for scientific applications. The use of unconventional computing systems is therefore of great interest for several scientific communities looking for a better tradeoff between time-to-solution and energy-to-solution. In this context, the performance assessment of processors with a high ratio of performance per watt is necessary to understand how to realize energy-efficient computing systems for scientific applications, using this class of processors. Computing On SOC Architecture (COSA) is a three-year project (2015–2017) funded by the Scientific Commission V of the Italian Institute for Nuclear Physics (INFN), which aims to investigate the performance and the total cost of ownership offered by computing systems based on commodity low-power Systems on Chip (SoCs) and high energy-efficient systems based on GP-GPUs. In this work, we present the results of the project analyzing the performance of several scientific applications on several GPU- and SoC-based systems. We also describe the methodology we have used to measure energy performance and the tools we have implemented to monitor the power drained by applications while running.
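
The tradeoff the COSA project investigates can be made concrete with the energy-to-solution metric: the average power drained during a run multiplied by the wall-clock time-to-solution. A minimal sketch, with purely hypothetical numbers (not results from the project):

```python
def energy_to_solution(avg_power_w: float, time_to_solution_s: float) -> float:
    """Energy-to-solution in joules: average power drained during the
    run multiplied by the wall-clock time-to-solution."""
    return avg_power_w * time_to_solution_s

# Hypothetical example: a low-power SoC that runs 10x longer than a
# server node can still win on energy if it draws 20x less power.
soc_energy = energy_to_solution(10.0, 600.0)     # 10 W for 600 s
server_energy = energy_to_solution(200.0, 60.0)  # 200 W for 60 s
```

Under these invented figures the SoC is slower (600 s vs. 60 s) yet consumes half the energy, which is exactly the kind of tradeoff between time-to-solution and energy-to-solution the abstract refers to.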


Journal of Physics: Conference Series | 2015

Hardware and Software Design of FPGA-based PCIe Gen3 interface for APEnet+ network interconnect system

Roberto Ammendola; Andrea Biagioni; Ottorino Frezza; F. Lo Cicero; A. Lonardo; Michele Martinelli; P.S. Paolucci; Elena Pastorelli; Davide Rossetti; Francesco Simula; Laura Tosoratto; P. Vicini

In the attempt to develop an interconnection architecture optimized for hybrid HPC systems dedicated to scientific computing, we designed APEnet+, a point-to-point, low-latency and high-performance network controller supporting 6 fully bidirectional off-board links over a 3D torus topology. The first release of APEnet+ (named V4) was a board based on a 40 nm Altera FPGA, integrating 6 channels at 34 Gbps of raw bandwidth per direction and a PCIe Gen2 x8 host interface. It has been the first-of-its-kind device to implement an RDMA protocol to directly read/write data from/to Fermi and Kepler NVIDIA GPUs using NVIDIA peer-to-peer and GPUDirect RDMA protocols, obtaining real zero-copy GPU-to-GPU transfers over the network. The latest generation of APEnet+ systems (now named V5) implements a PCIe Gen3 x8 host interface on a 28 nm Altera Stratix V FPGA, with multi-standard fast transceivers (up to 14.4 Gbps) and an increased amount of configurable internal resources and hardware IP cores to support the main standard interconnection protocols. Herein we present the APEnet+ V5 architecture and the status of its hardware and system software design. Both its Linux Device Driver and the low-level libraries have been redeveloped to support the PCIe Gen3 protocol, introducing optimizations and solutions based on hardware/software co-design.
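
The zero-copy benefit described above can be pictured with a toy latency model: each extra staging copy on the path from GPU memory to the network adds serialization time plus a fixed overhead per hop. The function and numbers below are illustrative assumptions, not measurements from the paper:

```python
def transfer_latency_us(size_kib: float, link_gbps: float, hops: int,
                        per_hop_overhead_us: float = 1.0) -> float:
    """Toy latency model: per hop, serialization time of the payload
    over the link plus a fixed copy/setup overhead."""
    serialization_us = size_kib * 1024 * 8 / (link_gbps * 1e3)
    return hops * (serialization_us + per_hop_overhead_us)

# Staged path (GPU -> host bounce buffer -> NIC) pays two hops;
# peer-to-peer (the NIC reads GPU memory directly) pays one.
staged_us = transfer_latency_us(64, 34.0, hops=2)
p2p_us = transfer_latency_us(64, 34.0, hops=1)
```

In this simplified model the peer-to-peer path halves the latency of the staged path; the real gain of GPUDirect RDMA depends on PCIe topology and buffer sizes, which this sketch deliberately ignores.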


Digital Systems Design | 2017

The Next Generation of Exascale-Class Systems: The ExaNeSt Project

Roberto Ammendola; Andrea Biagioni; Paolo Cretaro; Ottorino Frezza; Francesca Lo Cicero; Alessandro Lonardo; Michele Martinelli; Pier Stanislao Paolucci; Elena Pastorelli; Francesco Simula; P. Vicini; Giuliano Taffoni; Jose Antonio Pascual; Javier Navaridas; Mikel Luján; John Goodacree; Nikolaos Chrysos; Manolis Katevenis

The ExaNeSt project started in December 2015 and is funded by the EU H2020 research framework (call H2020-FETHPC-2014, n. 671553) to study the adoption of clusters of low-cost, Linux-based, power-efficient 64-bit ARM processors for Exascale-class systems. The ExaNeSt consortium pools partners with industrial and academic research expertise in storage, interconnects and applications that share a vision of a European Exascale-class supercomputer. Their goal is to design and implement a physical rack prototype together with its cooling system, the storage non-volatile memory (NVM) architecture and a low-latency interconnect able to test different options for interconnection and storage. Furthermore, the consortium will provide real HPC applications to validate the system. Herein we provide a status report of the project's initial developments.


Journal of Physics: Conference Series | 2015

A multi-port 10GbE PCIe NIC featuring UDP offload and GPUDirect capabilities.

Roberto Ammendola; Andrea Biagioni; Ottorino Frezza; G. Lamanna; Francesca Lo Cicero; Alessandro Lonardo; Michele Martinelli; Pier Stanislao Paolucci; Elena Pastorelli; L. Pontisso; Davide Rossetti; Francesco Simula; Marco S. Sozzi; Laura Tosoratto; P. Vicini

NaNet-10 is a four-port 10GbE PCIe Network Interface Card designed for low-latency real-time operations with GPU systems. To this purpose the design includes a UDP offload module, for fast and clock-cycle deterministic handling of the transport layer protocol, plus a GPUDirect P2P/RDMA engine for low-latency communication with NVIDIA Tesla GPU devices. A dedicated module (Multi-Stream) can optionally process input UDP streams before data is delivered through PCIe DMA to their destination devices, re-organizing data from different streams to optimize the subsequent computation. NaNet-10 is going to be integrated in the NA62 CERN experiment in order to assess the suitability of GPGPU systems as real-time triggers; results and lessons learned while performing this activity will be reported herein.
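
As a point of reference for what the UDP offload module handles in FPGA logic, here is a host-side software sketch of the same transport-layer parsing, following the RFC 768 header layout. The function name and returned dict are illustrative only, not part of NaNet:

```python
import struct

def parse_udp_header(datagram: bytes) -> dict:
    """Parse the 8-byte UDP header (RFC 768): source port, destination
    port, total length and checksum, all big-endian 16-bit fields."""
    src, dst, length, checksum = struct.unpack("!HHHH", datagram[:8])
    return {
        "src_port": src,
        "dst_port": dst,
        "length": length,       # header + payload, in bytes
        "checksum": checksum,
        "payload": datagram[8:length],
    }
```

Doing this per packet in hardware, rather than in the host network stack, is what makes the handling "clock-cycle deterministic" in the abstract's sense: no operating-system scheduling is involved.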


Journal of Instrumentation | 2015

Architectural improvements and technological enhancements for the APEnet+ interconnect system

Roberto Ammendola; Andrea Biagioni; Ottorino Frezza; A. Lonardo; F Lo Cicero; Michele Martinelli; Pier Stanislao Paolucci; Elena Pastorelli; Davide Rossetti; Francesco Simula; Laura Tosoratto; P. Vicini

The APEnet+ board is a point-to-point, low-latency, 3D torus network interface card. In this paper we describe the latest generation of the APEnet NIC, APEnet v5, integrated in a PCIe Gen3 board based on a state-of-the-art, 28 nm Altera Stratix V FPGA. The NIC features a network architecture designed following the Remote DMA paradigm and tailored to tightly bind the computing power of modern GPUs to the communication fabric. For the APEnet v5 board we show characterizing figures such as achieved bandwidth and BER, obtained by exploiting new high-performance Altera transceivers and PCIe Gen3 compliance.


Journal of Instrumentation | 2017

Development of Network Interface Cards for TRIDAQ systems with the NaNet framework

Roberto Ammendola; Andrea Biagioni; Paolo Cretaro; S. Di Lorenzo; M. Fiorini; Ottorino Frezza; G. Lamanna; F. Lo Cicero; A. Lonardo; Michele Martinelli; Ilaria Neri; P.S. Paolucci; Elena Pastorelli; R. Piandani; L. Pontisso; Davide Rossetti; Francesco Simula; M. Sozzi; P. Valente; P. Vicini

NaNet is a framework for the development of FPGA-based PCI Express (PCIe) Network Interface Cards (NICs) with real-time data transport architecture that can be effectively employed in TRIDAQ systems. Key features of the architecture are the flexibility in the configuration of the number and kind of the I/O channels, the hardware offloading of the network protocol stack, the stream processing capability, and the zero-copy CPU and GPU Remote Direct Memory Access (RDMA). Three NIC designs have been developed with the NaNet framework: NaNet-1 and NaNet-10 for the CERN NA62 low level trigger and NaNet3 for the KM3NeT-IT underwater neutrino telescope DAQ system. We will focus our description on the NaNet-10 design, as it is the most complete of the three in terms of capabilities and integrated IPs of the framework.


Nuclear Science Symposium and Medical Imaging Conference | 2015

NaNet: Design of FPGA-based network interface cards for real-time trigger and data acquisition systems in HEP experiments

Roberto Ammendola; Andrea Biagioni; Ottorino Frezza; G. Lamanna; F. Lo Cicero; A. Lonardo; Michele Martinelli; Pier Stanislao Paolucci; Elena Pastorelli; L. Pontisso; Davide Rossetti; Francesco Simula; M. Sozzi; Laura Tosoratto; P. Vicini

NaNet is a modular design of a family of FPGA-based PCIe Network Interface Cards specialized for low-latency real-time operations. NaNet features a Network Interface module that implements RDMA-style communications both with the host (CPU) and the GPU accelerators' memories (GPUDirect P2P/RDMA) relying on the services of a high performance PCIe Gen3 x8 core. The NaNet I/O Interface is highly flexible and is designed for low and predictable communication latency: a dedicated stage manages the network stack protocol in the FPGA logic, offloading the host operating system from this task and thus eliminating the associated process jitter effects. Between the two aforementioned modules stand the data processing and switch modules: the first implements application-dependent processing on streams (e.g. performing compression algorithms) while the second routes data streams between the I/O channels and the Network Interface module. This general architecture has been specialized up to now into three configurations, namely NaNet-1, NaNet3 and NaNet-10, in order to meet the requirements of different experimental setups: NaNet-1 features a GbE channel plus three custom 34 Gbps serial channels and is implemented on the Altera Stratix IV FPGA Development Kit; NaNet3 is implemented on the Terasic DE5-NET Stratix V FPGA development board and supports four custom 2.5 Gbps deterministic latency optical channels; NaNet-10 features four 10GbE SFP+ ports and is also implemented on the Terasic DE5-NET board. We will provide performance results for the three NaNet implementations and describe their usage in the CERN NA62 and KM3NeT-IT underwater neutrino telescope experiments, showing that the architecture is very flexible and yet capable of matching the requirements of low-latency real-time applications with intensive I/O tasks involving the CPU and/or the GPU accelerators.


Journal of Physics: Conference Series | 2018

Real-time heterogeneous stream processing with NaNet in the NA62 experiment

Roberto Ammendola; M Barbanera; Andrea Biagioni; Paolo Cretaro; Ottorino Frezza; G. Lamanna; F Lo Cicero; A. Lonardo; Michele Martinelli; Elena Pastorelli; P.S. Paolucci; R. Piandani; L. Pontisso; D Rossetti; Francesco Simula; M. Sozzi; P. Valente; P. Vicini

The use of GPUs to implement general purpose computational tasks, known as GPGPU, has reached maturity over the last fifteen years. Applications take advantage of the parallel architectures of these devices in many different domains. Over the last few years several works have demonstrated the effectiveness of integrating GPU-based systems in the high level trigger of various HEP experiments. On the other hand, the use of GPUs in the DAQ and low level trigger systems, characterized by stringent real-time constraints, poses several challenges. In order to achieve such a goal we devised NaNet, an FPGA-based PCI-Express Network Interface Card design capable of direct (zero-copy) data transfer with the CPU and GPU (GPUDirect) while processing incoming and outgoing data streams online. The board provides support for multiple link technologies as well (1/10/40GbE and custom ones). The validity of our approach has been tested in the context of the NA62 CERN experiment, harvesting the computing power of last-generation NVIDIA Pascal GPUs and of the FPGA hosted by NaNet to build in real time refined physics-related primitives for the RICH detector (i.e. the Cerenkov ring parameters) that enable more stringent conditions for data selection in the low level trigger.
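
The ring parameters mentioned above (center and radius of the Cerenkov hit pattern) can be recovered with a least-squares circle fit. The sketch below uses the classic algebraic (Kasa) fit as an illustration; the actual algorithms run on NaNet and the NA62 GPUs are not specified here:

```python
import math

def fit_ring(points):
    """Algebraic (Kasa) least-squares circle fit.

    Minimizes sum_i ((x_i - a)^2 + (y_i - b)^2 - r^2)^2 via the
    substitution c = r^2 - a^2 - b^2, which makes the model linear:
    2*a*x + 2*b*y + c = x^2 + y^2.
    """
    # Accumulate the 3x3 normal equations M @ (a, b, c) = v.
    m = [[0.0] * 3 for _ in range(3)]
    v = [0.0] * 3
    for x, y in points:
        row = (2.0 * x, 2.0 * y, 1.0)
        rhs = x * x + y * y
        for i in range(3):
            v[i] += row[i] * rhs
            for j in range(3):
                m[i][j] += row[i] * row[j]
    a, b, c = _solve3(m, v)
    return a, b, math.sqrt(c + a * a + b * b)

def _solve3(m, v):
    """Gaussian elimination with partial pivoting for a 3x3 system."""
    aug = [m[i][:] + [v[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            f = aug[r][col] / aug[col][col]
            for k in range(col, 4):
                aug[r][k] -= f * aug[col][k]
    out = [0.0] * 3
    for r in (2, 1, 0):
        s = aug[r][3] - sum(aug[r][k] * out[k] for k in range(r + 1, 3))
        out[r] = s / aug[r][r]
    return out
```

Because the fit reduces to a small linear solve, it maps well onto per-event parallel execution, which is one reason ring reconstruction is a good match for GPU-based low level triggers.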

Collaboration


Dive into Elena Pastorelli's collaborations.

Top Co-Authors

Andrea Biagioni (Sapienza University of Rome)
P. Vicini (Sapienza University of Rome)
Roberto Ammendola (Istituto Nazionale di Fisica Nucleare)
Francesco Simula (Sapienza University of Rome)
Michele Martinelli (Sapienza University of Rome)
Ottorino Frezza (Sapienza University of Rome)
Alessandro Lonardo (Istituto Nazionale di Fisica Nucleare)
A. Lonardo (Sapienza University of Rome)