Network


Latest external collaborations at the country level.

Hotspot


Research topics where Marcello Pivanti is active.

Publications


Featured research published by Marcello Pivanti.


International Conference on Computational Science | 2013

Early Experience on Porting and Running a Lattice Boltzmann Code on the Xeon-Phi Co-Processor

G. Crimi; F. Mantovani; Marcello Pivanti; Sebastiano Fabio Schifano; R. Tripiccione

In this paper we report on our early experience porting, optimizing and benchmarking a Lattice Boltzmann (LB) code on the Xeon-Phi co-processor, the first generally available version of Intel's new Many Integrated Core (MIC) architecture. We consider as a test-bed a state-of-the-art LB model that accurately reproduces the thermo-hydrodynamics of a 2D fluid obeying the equations of state of a perfect gas. The regular structure of LB algorithms makes it relatively easy to identify a large degree of available parallelism. However, mapping a large fraction of this parallelism onto this new class of processors is not straightforward. The D2Q37 LB algorithm considered in this paper is an appropriate test-bed for this architecture, since its critical computing kernels demand both high memory bandwidth on sparse memory access patterns and high number-crunching capability. We describe our implementation of the code, which builds on previous experience with other (simpler) many-core processors and GPUs, present benchmark results and performance measurements, and finally compare with results obtained by previous implementations on state-of-the-art classic multi-core CPUs and GP-GPUs.
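The "regular structure" the abstract refers to can be illustrated in miniature. The sketch below is a hypothetical D2Q9-style streaming kernel in plain Python, not the authors' D2Q37 code: each population hops along a fixed lattice velocity, giving the sparse but fully parallel access pattern that many-core ports exploit.

```python
# Toy sketch of the bandwidth-bound "propagate" kernel of a Lattice
# Boltzmann step. Hypothetical D2Q9 discretization for brevity; the
# paper's model has 37 populations (D2Q37), but the access pattern is
# analogous: population i at site (x, y) moves to (x+cx, y+cy).
LX, LY = 8, 8
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]   # D2Q9 lattice velocities
Q = len(C)

def propagate(f):
    """Stream each population along its velocity (periodic boundaries).
    Every destination cell is written exactly once, so the loop body is
    embarrassingly parallel."""
    g = [[[0.0] * Q for _ in range(LY)] for _ in range(LX)]
    for x in range(LX):
        for y in range(LY):
            for i, (cx, cy) in enumerate(C):
                g[(x + cx) % LX][(y + cy) % LY][i] = f[x][y][i]
    return g

# streaming only moves populations, so total mass is conserved
f0 = [[[1.0] * Q for _ in range(LY)] for _ in range(LX)]
f1 = propagate(f0)
```

The compute-bound companion kernel ("collide") evaluates the local equilibrium at each site independently, which is where the number-crunching capability mentioned above is needed.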


Physical Review B | 2013

Critical parameters of the three-dimensional Ising spin glass

Marco Baity-Jesi; Raquel A. Baños; A. Cruz; L. A. Fernandez; J. M. Gil-Narvion; A. Gordillo-Guerrero; D. Iñiguez; A. Maiorano; F. Mantovani; Enzo Marinari; V. Martin-Mayor; J. Monforte-Garcia; A. Muñoz Sudupe; D. Navarro; Giorgio Parisi; S. Perez-Gaviro; Marcello Pivanti; Federico Ricci-Tersenghi; J. J. Ruiz-Lorenzo; Sebastiano Fabio Schifano; B. Seoane; A. Tarancón; R. Tripiccione; D. Yllanes

We report a high-precision finite-size scaling study of the critical behavior of the three-dimensional Ising Edwards-Anderson model (the Ising spin glass). We have thermalized lattices up to L = 40 using the Janus dedicated computer. Our analysis takes into account leading-order corrections to scaling. We obtain Tc = 1.1019(29) for the critical temperature, ν = 2.562(42) for the thermal exponent, η = −0.3900(36) for the anomalous dimension, and ω = 1.12(10) for the exponent of the leading corrections to scaling. Standard (hyper)scaling relations yield α = −5.69(13), β = 0.782(10), and γ = 6.13(11). We also compute several universal quantities at Tc.
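The quoted exponents can be cross-checked against the standard (hyper)scaling relations in d = 3; a quick sketch, using only the central values from the abstract:

```python
# Cross-check of the quoted exponents via standard (hyper)scaling relations
# in d = 3: alpha = 2 - d*nu, beta = nu*(d - 2 + eta)/2, gamma = nu*(2 - eta).
d = 3
nu = 2.562      # thermal exponent (from the abstract)
eta = -0.3900   # anomalous dimension (from the abstract)

alpha = 2 - d * nu
beta = nu * (d - 2 + eta) / 2
gamma = nu * (2 - eta)

print(alpha, beta, gamma)  # close to the quoted -5.69, 0.782, 6.13
```

The small residual differences from the quoted -5.69(13), 0.782(10) and 6.13(11) are well within the stated uncertainties.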


European Physical Journal Special Topics | 2012

Reconfigurable computing for Monte Carlo simulations: results and prospects of the Janus project

Marco Baity-Jesi; Raquel A. Baños; A. Cruz; L. A. Fernandez; J. M. Gil-Narvion; A. Gordillo-Guerrero; M. Guidetti; D. Iñiguez; A. Maiorano; F. Mantovani; Enzo Marinari; V. Martin-Mayor; J. Monforte-Garcia; A. Muñoz Sudupe; D. Navarro; Giorgio Parisi; Marcello Pivanti; S. Perez-Gaviro; Federico Ricci-Tersenghi; J. J. Ruiz-Lorenzo; Sebastiano Fabio Schifano; B. Seoane; A. Tarancón; P. Tellez; R. Tripiccione; D. Yllanes

We describe Janus, a massively parallel FPGA-based computer optimized for the simulation of spin glasses, theoretical models for the behavior of glassy materials. FPGAs (as compared to GPUs or many-core processors) provide a complementary approach to massively parallel computing. In particular, our model problem is formulated in terms of binary variables, and floating-point operations can be (almost) completely avoided. The FPGA architecture allows us to run many independent threads with almost no latencies in memory access, thus updating up to 1024 spins per cycle. We describe Janus in detail and we summarize the physics results obtained in four years of operation of this machine; we discuss two types of physics applications: long simulations on very large systems (which try to mimic and provide understanding about the experimental non-equilibrium dynamics), and low-temperature equilibrium simulations using an artificial parallel tempering dynamics. The time scale of our non-equilibrium simulations spans eleven orders of magnitude (from picoseconds to a tenth of a second). On the other hand, our equilibrium simulations are unprecedented both because of the low temperatures reached and for the large systems that we have brought to equilibrium. A finite-time scaling ansatz emerges from the detailed comparison of the two sets of simulations. Janus has made it possible to perform spin-glass simulations that would take several decades on more conventional architectures. The paper ends with an assessment of the potential of possible future versions of the Janus architecture, based on state-of-the-art technology.
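The point about binary variables and (almost) no floating point can be sketched in miniature. The following is a hypothetical 1D ±J Ising chain, nothing like the real Janus firmware: spins and coupling signs are single bits, the local energy reduces to XOR logic, and the Metropolis test needs an exponential only for the lone positive energy change.

```python
# Miniature illustration (hypothetical 1D +/-J Ising chain, not the Janus
# firmware) of why spin-glass updates map well onto bit logic: spin s=+/-1
# is stored as a bit b (b=0 <-> s=+1), a +/-1 coupling as a bit j, and
# bond (x, x+1) is unsatisfied iff b[x] ^ b[x+1] ^ j[x] == 1.
import math
import random

random.seed(1)
L = 16
b = [random.getrandbits(1) for _ in range(L)]   # spins as bits
j = [random.getrandbits(1) for _ in range(L)]   # coupling signs as bits

def unsatisfied(x):
    """Number of unsatisfied bonds touching site x (0, 1 or 2)."""
    left = b[x - 1] ^ b[x] ^ j[x - 1]            # bond (x-1, x), periodic
    right = b[x] ^ b[(x + 1) % L] ^ j[x]         # bond (x, x+1)
    return left + right

def metropolis_step(x, beta):
    """Flipping spin x toggles the satisfaction of both adjacent bonds,
    so dE = 4 - 4*unsatisfied(x), one of {+4, 0, -4}: only the +4 case
    ever needs the exponential."""
    dE = 4 - 4 * unsatisfied(x)
    if dE <= 0 or random.random() < math.exp(-beta * dE):
        b[x] ^= 1

for _ in range(1000):
    metropolis_step(random.randrange(L), beta=2.0)
```

On an FPGA the acceptance test is done by comparing a random word against a small precomputed table, so the whole update pipeline stays in integer/bit logic, which is what allows many independent spins to be updated per clock cycle.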


arXiv: High Energy Physics - Lattice | 2010

QPACE - a QCD parallel computer based on Cell processors

H. Baier; Hans Boettiger; C. Gomez; Dirk Pleiter; Nils Meyer; A. Nobile; Zoltan Fodor; Joerg-Stephan Vogt; K.-H. Sulanke; Simon Heybrock; Frank Winter; U. Fischer; T. Maurer; Thomas Huth; Ibrahim A. Ouda; M. Drochner; Heiko Schick; F. Schifano; A. Schäfer; H. Simma; J. Lauritsen; Norbert Eicker; Marcello Pivanti; Matthias Husken; Thomas Streuer; Gottfried Goldrian; Tilo Wettig; Thomas Lippert; Dieter Hierl; Benjamin Krill

QPACE is a novel parallel computer which has been developed to be used primarily for lattice QCD simulations. The compute power is provided by the IBM PowerXCell 8i processor, an enhanced version of the Cell processor used in the PlayStation 3. The QPACE nodes are interconnected by a custom, application-optimized 3-dimensional torus network implemented on an FPGA. To achieve the very high packaging density of 26 TFlops per rack, a new water-cooling concept has been developed and successfully realized. In this paper we give an overview of the architecture and highlight some important technical details of the system. Furthermore, we provide initial performance results and report on the installation of 8 QPACE racks providing an aggregate peak performance of 200 TFlops.


Symposium on Computer Architecture and High Performance Computing | 2013

Benchmarking GPUs with a Parallel Lattice-Boltzmann Code

Jiri Kraus; Marcello Pivanti; Sebastiano Fabio Schifano; R. Tripiccione; Marco Zanella

Accelerators are an increasingly common option to boost performance of codes that require extensive number crunching. In this paper we report on our experience with NVIDIA accelerators to study fluid systems using the Lattice Boltzmann (LB) method. The regular structure of LB algorithms makes them suitable for processor architectures with a large degree of parallelism, such as recent multi- and many-core processors and GPUs; however, the challenge of exploiting a large fraction of the theoretically available performance of this new class of processors is not easily met. We consider a state-of-the-art two-dimensional LB model based on 37 populations (a D2Q37 model), that accurately reproduces the thermo-hydrodynamics of a 2D fluid obeying the equation of state of a perfect gas. The computational features of this model make it a significant benchmark to analyze the performance of new computational platforms, since critical kernels in this code require both high memory bandwidth on sparse memory addressing patterns and high floating-point throughput. In this paper we consider two recent classes of GPU boards based on the Fermi and Kepler architectures; we describe in detail all steps done to implement and optimize our LB code and analyze its performance first on single-GPU systems, and then on parallel multi-GPU systems based on one node as well as on a cluster of many nodes; in the latter case we use CUDA-aware MPI as an abstraction layer to assess the advantages of advanced GPU-to-GPU communication technologies like GPUDirect. In our implementation, aggregate sustained performance of the most compute-intensive part of the code breaks the 1 double-precision Tflops barrier on a single-host system with two GPUs.


Parallel Processing and Applied Mathematics | 2011

A Multi-GPU Implementation of a D2Q37 Lattice Boltzmann Code

Luca Biferale; F. Mantovani; Marcello Pivanti; Fabio Pozzati; Mauro Sbragaglia; Andrea Scagliarini; Sebastiano Fabio Schifano; Federico Toschi; R. Tripiccione

We describe a parallel implementation of a compressible Lattice Boltzmann code on a multi-GPU cluster based on NVIDIA Fermi processors. We analyze how to optimize the algorithm for GP-GPU architectures, describe the implementation choices that we have adopted and compare our performance results with an implementation optimized for latest-generation multi-core CPUs. Our program runs at ≈30% of the double-precision peak performance of one GPU and shows almost linear scaling when run on the multi-GPU cluster.


Computer Physics Communications | 2014

Janus II: A new generation application-driven computer for spin-system simulations

Marco Baity-Jesi; Raquel A. Baños; A. Cruz; L. A. Fernandez; J. M. Gil-Narvion; A. Gordillo-Guerrero; D. Iñiguez; A. Maiorano; F. Mantovani; Enzo Marinari; V. Martin-Mayor; J. Monforte-Garcia; A. Muñoz Sudupe; D. Navarro; Giorgio Parisi; S. Perez-Gaviro; Marcello Pivanti; Federico Ricci-Tersenghi; J. J. Ruiz-Lorenzo; Sebastiano Fabio Schifano; B. Seoane; A. Tarancón; R. Tripiccione; D. Yllanes

This paper describes the architecture, the development and the implementation of Janus II, a new-generation application-driven number cruncher optimized for Monte Carlo simulations of spin systems (mainly spin glasses). This domain of computational physics is a recognized grand challenge of high-performance computing: the resources necessary to study in detail theoretical models that can make contact with experimental data are far beyond those available using commodity computer systems. On the other hand, several specific features of the associated algorithms suggest that unconventional computer architectures, which can be implemented with available electronics technologies, may lead to order-of-magnitude increases in performance, reducing to humanly acceptable values the time needed to carry out simulation campaigns that would take centuries on commercially available machines. Janus II is one such machine, recently developed and commissioned, that builds upon and improves on the successful Janus machine, which has been used for physics since 2008 and is still in operation today. This paper describes in detail the motivations behind the project, the computational requirements, the architecture and the implementation of this new machine, and compares its expected performance with that of currently available commercial systems.


Physical Review E | 2014

Dynamical transition in the D=3 Edwards-Anderson spin glass in an external magnetic field

Marco Baity-Jesi; Raquel A. Baños; A. Cruz; L. A. Fernandez; J. M. Gil-Narvion; A. Gordillo-Guerrero; D. Iñiguez; A. Maiorano; F. Mantovani; Enzo Marinari; V. Martin-Mayor; J. Monforte-Garcia; A. Muñoz Sudupe; D. Navarro; Giorgio Parisi; S. Perez-Gaviro; Marcello Pivanti; Federico Ricci-Tersenghi; J. J. Ruiz-Lorenzo; Sebastiano Fabio Schifano; B. Seoane; A. Tarancón; R. Tripiccione; D. Yllanes

We study the off-equilibrium dynamics of the three-dimensional Ising spin glass in the presence of an external magnetic field. We have performed simulations both at fixed temperature and with an annealing protocol. Thanks to the Janus special-purpose computer, based on field-programmable gate arrays (FPGAs), we have been able to reach times equivalent to 0.01 s in experiments. We have studied the system relaxation both at high and at low temperatures, clearly identifying a dynamical transition point. This dynamical temperature is strictly positive and depends on the external applied magnetic field. We discuss different possibilities for the underlying physics, which include a thermodynamic spin-glass transition, a mode-coupling crossover, or an interpretation reminiscent of the random first-order picture of structural glasses.


International Conference on Parallel Processing | 2013

An Optimized Lattice Boltzmann Code for BlueGene/Q

Marcello Pivanti; F. Mantovani; Sebastiano Fabio Schifano; R. Tripiccione; Luca Zenesini

In this paper we describe an optimized implementation of a Lattice Boltzmann (LB) code on the BlueGene/Q system, the latest-generation massively parallel system of the BlueGene family. We consider a state-of-the-art LB code that accurately reproduces the thermo-hydrodynamics of a 2D fluid obeying the equations of state of a perfect gas. The regular structure of LB algorithms offers several levels of algorithmic parallelism that can be matched by a massively parallel computer architecture. However, the complex memory access patterns associated with our LB model make it non-trivial to efficiently exploit all available parallelism. We describe our implementation strategies, based on previous experience with clusters of many-core processors and GPUs, present results, and analyze and compare performance.


Programmable Devices and Embedded Systems | 2013

The Janus project: boosting spin-glass simulations using FPGAs

Marco Baity-Jesi; Raquel A. Baños; A. Cruz; L. A. Fernandez; J. M. Gil-Narvion; A. Gordillo-Guerrero; D. Iñiguez; A. Maiorano; F. Mantovani; E. Marinari; V. Martin-Mayor; J. Monforte-Garcia; A. Muñoz Sudupe; D. Navarro; Giorgio Parisi; S. Perez-Gaviro; Marcello Pivanti; Federico Ricci-Tersenghi; J. J. Ruiz-Lorenzo; Sebastiano Fabio Schifano; B. Seoane; A. Tarancón; R. Tripiccione; D. Yllanes

Spin glasses have become one of the most computationally demanding problems in Statistical Physics of the last 50 years. These extremely slow systems are a clear example of an easy-to-describe but hard-to-simulate numerical problem. We have developed an FPGA-based architecture, called Janus, able to exploit the simplicity of the problem through extensive parallelization of the computing units. In this work we motivate the problem and describe the architecture. We give performance figures compared with more conventional architectures, showing a clear advantage in computing power that has produced several top results in the field. In addition, we describe the current development of the next generation of the infrastructure: Janus II.

Collaboration


Marcello Pivanti's collaborations.

Top Co-Authors

A. Maiorano, Sapienza University of Rome
A. Cruz, University of Zaragoza
D. Iñiguez, University of Zaragoza
D. Navarro, University of Zaragoza