Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where F. Pantaleo is active.

Publication


Featured research published by F. Pantaleo.


Journal of Instrumentation | 2014

NaNet: a flexible and configurable low-latency NIC for real-time trigger systems based on GPUs

Roberto Ammendola; Andrea Biagioni; Ottorino Frezza; G. Lamanna; A. Lonardo; F. Lo Cicero; Pier Stanislao Paolucci; F. Pantaleo; D. Rossetti; Francesco Simula; Marco S. Sozzi; Laura Tosoratto; P. Vicini

NaNet is an FPGA-based PCIe x8 Gen2 NIC supporting 1/10 GbE links and the custom 34 Gbps APElink channel. The design has GPUDirect RDMA capabilities and features a network stack protocol offloading module, making it suitable for building low-latency, real-time GPU-based computing systems. We provide a detailed description of the NaNet hardware modular architecture. Benchmarks for latency and bandwidth of the GbE and APElink channels are presented, followed by a performance analysis on the case study of the GPU-based low-level trigger for the RICH detector in the NA62 CERN experiment, using either the NaNet GbE or APElink channel. Finally, we give an outline of future project activities.
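The key capability named above, GPUDirect RDMA, lets the NIC write received data straight into GPU memory without staging it through host RAM. The sketch below only illustrates the general shape of such a setup in CUDA: the CUDA calls are real, while the nanet_* driver calls are hypothetical placeholders, since NaNet's software API is not part of this text.

```cuda
// Minimal sketch of a GPUDirect RDMA receive-buffer setup. Only the CUDA
// runtime calls are real; the nanet_* calls are hypothetical and shown as
// comments to keep this sketch honest.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err = (call);                                  \
        if (err != cudaSuccess) {                                  \
            fprintf(stderr, "CUDA error: %s\n",                    \
                    cudaGetErrorString(err));                      \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

int main() {
    const size_t kBufBytes = 1 << 20;   // 1 MiB receive buffer (arbitrary size)

    // Allocate the receive buffer directly in GPU global memory.
    void* gpu_buf = nullptr;
    CHECK(cudaMalloc(&gpu_buf, kBufBytes));

    // A real GPUDirect RDMA path would now hand the GPU virtual address to
    // the NIC driver, which pins it and resolves the PCIe bus address
    // (hypothetical calls):
    //
    //   int nic = nanet_open("/dev/nanet0");
    //   nanet_register_gpu_buffer(nic, gpu_buf, kBufBytes);
    //   nanet_post_recv(nic, gpu_buf, kBufBytes);
    //
    // From then on, incoming payloads land in gpu_buf with no CPU copy,
    // and a CUDA kernel can consume them in place.

    CHECK(cudaFree(gpu_buf));
    return 0;
}
```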


arXiv: Instrumentation and Detectors | 2014

NaNet: a low-latency NIC enabling GPU-based, real-time low level trigger systems

Roberto Ammendola; Andrea Biagioni; R. Fantechi; Ottorino Frezza; G. Lamanna; Francesca Lo Cicero; Alessandro Lonardo; Pier Stanislao Paolucci; F. Pantaleo; R. Piandani; L. Pontisso; Davide Rossetti; Francesco Simula; Marco S. Sozzi; Laura Tosoratto; P. Vicini

We implemented the NaNet FPGA-based PCIe Gen2 GbE/APElink NIC, featuring GPUDirect RDMA capabilities and UDP protocol management offloading. NaNet is able to receive a UDP input data stream from its GbE interface and redirect it, without any intermediate buffering or CPU intervention, to the memory of a Fermi/Kepler GPU hosted on the same PCIe bus, provided that the two devices share the same upstream root complex. Synthetic benchmarks for latency and bandwidth are presented. We describe how NaNet can be employed in the prototype of the GPU-based RICH low-level trigger processor of the NA62 CERN experiment, to implement the data link between the TEL62 readout boards and the low level trigger processor. Results for the throughput and latency of the integrated system are presented and discussed.
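To give a feel for the zero-copy receive path described above, here is a minimal host-side sketch of how an application might consume buffers that the NIC has already written into GPU memory. The completion signalling and the ring of receive slots are simplified assumptions; in the real system the NIC raises completions through its driver, which this text does not specify, and the processing kernel would implement the actual trigger algorithm rather than a toy checksum.

```cuda
// Sketch of consuming NIC-filled GPU buffers: the NIC (not modelled here)
// fills ring[i] with UDP payloads; the host only launches kernels on them.
#include <cuda_runtime.h>
#include <cstdint>

// Toy per-byte processing kernel standing in for the real trigger code.
__global__ void process_payload(const uint8_t* payload, size_t n,
                                unsigned int* out) {
    unsigned int local = 0;
    for (size_t i = threadIdx.x + blockIdx.x * (size_t)blockDim.x;
         i < n; i += (size_t)blockDim.x * gridDim.x)
        local += payload[i];
    atomicAdd(out, local);   // accumulate a trivial checksum of the payload
}

int main() {
    const int    kSlots     = 4;          // ring of receive slots in GPU memory
    const size_t kSlotBytes = 64 * 1024;  // assumed maximum datagram burst

    uint8_t*      ring[kSlots];
    unsigned int* result;
    for (int i = 0; i < kSlots; ++i) cudaMalloc(&ring[i], kSlotBytes);
    cudaMalloc(&result, sizeof(unsigned int));

    // In the integrated system the NIC fills slot i and signals completion;
    // here we simply treat each slot as ready and process it in turn.
    for (int i = 0; i < kSlots; ++i) {
        cudaMemset(result, 0, sizeof(unsigned int));
        process_payload<<<32, 256>>>(ring[i], kSlotBytes, result);
    }
    cudaDeviceSynchronize();

    for (int i = 0; i < kSlots; ++i) cudaFree(ring[i]);
    cudaFree(result);
    return 0;
}
```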


Journal of Physics: Conference Series | 2011

Parallelization of maximum likelihood fits with OpenMP and CUDA

Sverre Jarp; A. Lazzaro; Julien Leduc; Andrzej Nowak; F. Pantaleo

Data analyses based on maximum likelihood fits are commonly used in the high energy physics community for fitting statistical models to data samples. This technique requires the numerical minimization of the negative log-likelihood function. MINUIT is the package most commonly used for this purpose in the high energy physics community. The main algorithm in this package, MIGRAD, searches for the minimum using gradient information. The procedure requires several evaluations of the function, depending on the number of free parameters and their initial values. The whole procedure can be very CPU-time consuming for complex functions with several free parameters, many independent variables, and large data samples. It therefore becomes particularly important to speed up the evaluation of the negative log-likelihood function. In this paper we present an algorithm and its implementation, which benefit from data vectorization and parallelization (based on OpenMP) and which were also ported to Graphics Processing Units using CUDA.
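As a concrete, simplified illustration of the per-event parallelism described above, the sketch below evaluates the negative log-likelihood of a single Gaussian model over a data sample in CUDA, with one thread per event and a block-level reduction. The model, the data, and the host-side final sum are toy assumptions; the actual models and the MINUIT/MIGRAD interface used in the paper are considerably richer.

```cuda
// Hedged sketch: per-event Gaussian negative log-likelihood, reduced on the GPU.
#include <cuda_runtime.h>
#include <cmath>
#include <cstdio>
#include <vector>

__global__ void nll_kernel(const double* x, int n, double mu, double sigma,
                           double* block_sums) {
    extern __shared__ double sdata[];
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;

    // -log of the Gaussian pdf for one event (0 contribution if out of range).
    double v = 0.0;
    if (i < n) {
        double z = (x[i] - mu) / sigma;
        v = 0.5 * z * z + log(sigma) + 0.9189385332046727;  // 0.5*log(2*pi)
    }
    sdata[tid] = v;
    __syncthreads();

    // Standard shared-memory tree reduction within the block.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) block_sums[blockIdx.x] = sdata[0];
}

double nll(const std::vector<double>& data, double mu, double sigma) {
    int n = (int)data.size();
    int threads = 256, blocks = (n + threads - 1) / threads;

    double *d_x, *d_partial;
    cudaMalloc(&d_x, n * sizeof(double));
    cudaMalloc(&d_partial, blocks * sizeof(double));
    cudaMemcpy(d_x, data.data(), n * sizeof(double), cudaMemcpyHostToDevice);

    nll_kernel<<<blocks, threads, threads * sizeof(double)>>>(d_x, n, mu, sigma,
                                                              d_partial);

    std::vector<double> partial(blocks);
    cudaMemcpy(partial.data(), d_partial, blocks * sizeof(double),
               cudaMemcpyDeviceToHost);
    double total = 0.0;
    for (double p : partial) total += p;

    cudaFree(d_x);
    cudaFree(d_partial);
    return total;   // the value a MIGRAD-like minimizer would request repeatedly
}

int main() {
    std::vector<double> data(100000, 1.0);  // toy data sample
    printf("NLL = %f\n", nll(data, 0.0, 1.0));
    return 0;
}
```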


13th International Workshop on Cellular Nanoscale Networks and their Applications | 2012

Real-time use of GPUs in NA62 experiment

Gianmaria Collazuol; V. Innocente; Gianluca Lamanna; F. Pantaleo; M. Sozzi

We describe a pilot project for the use of GPUs in a real-time triggering application in the early trigger stages of the CERN NA62 experiment, and the results of the first field tests together with a prototype data acquisition (DAQ) system. This pilot project within NA62 aims at integrating GPUs into the central L0 trigger processor, and also at using them as fast online processors for computing trigger primitives. Several TDC-equipped sub-detectors with sub-nanosecond time resolution will participate in the first-level NA62 trigger (L0), fully integrated with the data acquisition system, to reduce the readout rate of all sub-detectors to 1 MHz, using multiplicity information asynchronously computed over time frames of a few ns, both for positive sub-detectors and for vetoes. The online use of GPUs would allow more complex trigger primitives to be computed already at this first trigger level. We describe the architectures of the proposed systems, focusing on measuring the performance (both throughput and latency) of the various approaches meant to solve these high energy physics problems. The challenges and prospects of this promising idea are discussed.
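The multiplicity primitives mentioned above amount to counting hits per sub-detector within short time frames. The CUDA sketch below shows that counting step in its simplest form; the hit encoding, frame width, and launch configuration are illustrative assumptions, not the experiment's actual data format or trigger code.

```cuda
// Sketch: histogram TDC hit times into fixed-width time frames with atomics.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

__global__ void multiplicity_kernel(const float* hit_time_ns, int n_hits,
                                    float frame_ns, int n_frames,
                                    unsigned int* counts) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_hits) return;
    int frame = (int)(hit_time_ns[i] / frame_ns);   // time frame this hit falls in
    if (frame >= 0 && frame < n_frames)
        atomicAdd(&counts[frame], 1u);              // per-frame multiplicity
}

int main() {
    const float frame_ns = 6.25f;          // assumed frame width of a few ns
    const int   n_frames = 1024;

    // Toy hit times; a real system would receive these from the TDC readout.
    std::vector<float> hits = {1.0f, 2.0f, 7.0f, 8.5f, 20.0f};

    float* d_hits;  unsigned int* d_counts;
    cudaMalloc(&d_hits, hits.size() * sizeof(float));
    cudaMalloc(&d_counts, n_frames * sizeof(unsigned int));
    cudaMemcpy(d_hits, hits.data(), hits.size() * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemset(d_counts, 0, n_frames * sizeof(unsigned int));

    multiplicity_kernel<<<1, 256>>>(d_hits, (int)hits.size(), frame_ns,
                                    n_frames, d_counts);

    std::vector<unsigned int> counts(n_frames);
    cudaMemcpy(counts.data(), d_counts, n_frames * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    printf("frame 0 multiplicity: %u\n", counts[0]);

    cudaFree(d_hits);
    cudaFree(d_counts);
    return 0;
}
```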


IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2011

Evaluation of Likelihood Functions for Data Analysis on Graphics Processing Units

Sverre Jarp; A. Lazzaro; Julien Leduc; Andrzej Nowak; F. Pantaleo

Data analysis techniques based on likelihood function calculation play a crucial role in many High Energy Physics measurements. Depending on the complexity of the models used in the analyses, with several free parameters, many independent variables, large data samples, and complex functions, the calculation of the likelihood functions can require a long CPU execution time. In the past, the continuous gain in performance of each single CPU core kept pace with the increase in the complexity of the analyses, keeping the execution time of sequential software applications reasonable. Nowadays, single-core performance is no longer increasing as it did in the past, while the complexity of the analyses has grown significantly in the Large Hadron Collider era. In this context, a breakthrough is represented by the increase in the number of computational cores per computational node, which makes it possible to speed up the execution of applications by redesigning them with parallelization paradigms. The likelihood function evaluation can be parallelized using data and task parallelism, which are suitable for CPUs and GPUs (Graphics Processing Units), respectively. In this paper we show how the likelihood function evaluation has been parallelized on GPUs. We describe the implemented algorithm and give some performance results obtained when running typical models used in High Energy Physics measurements. In our implementation we achieve good scaling with respect to the number of events in the data samples.
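A compact way to see the per-event data parallelism exploited here is a GPU sum of -log pdf over events. The sketch below uses Thrust's transform_reduce with a toy exponential model; the model, parameter value, and data are assumptions standing in for the richer models the paper actually benchmarks.

```cuda
// Sketch: data-parallel NLL evaluation with Thrust transform_reduce,
// one -log pdf term per event (toy exponential model).
#include <thrust/device_vector.h>
#include <thrust/transform_reduce.h>
#include <thrust/functional.h>
#include <cmath>
#include <cstdio>

struct neg_log_expo {
    double tau;                               // decay constant (free parameter)
    __host__ __device__ double operator()(double x) const {
        // pdf(x) = (1/tau) * exp(-x/tau)  =>  -log pdf = x/tau + log(tau)
        return x / tau + log(tau);
    }
};

int main() {
    // Toy data sample; real analyses would load events from the experiment.
    thrust::device_vector<double> events(1 << 20, 2.5);

    double tau = 3.0;                         // candidate parameter value
    double nll = thrust::transform_reduce(events.begin(), events.end(),
                                          neg_log_expo{tau}, 0.0,
                                          thrust::plus<double>());
    printf("NLL(tau=%.1f) = %f\n", tau, nll);
    return 0;
}
```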

Collaboration


Dive into F. Pantaleo's collaborations.

Top Co-Authors

Andrea Biagioni (Sapienza University of Rome)
Francesco Simula (Sapienza University of Rome)
Laura Tosoratto (Sapienza University of Rome)
Ottorino Frezza (Sapienza University of Rome)
P. Vicini (Sapienza University of Rome)
Roberto Ammendola (Istituto Nazionale di Fisica Nucleare)