Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Davide Del Vento is active.

Publications


Featured research published by Davide Del Vento.


Scientific Programming | 2014

Collective mind: Towards practical and collaborative auto-tuning

Grigori Fursin; Renato Miceli; Anton Lokhmotov; Michael Gerndt; Marc Baboulin; Allen D. Malony; Zbigniew Chamski; Diego Novillo; Davide Del Vento

Empirical auto-tuning and machine learning techniques have shown high potential to improve execution time, power consumption, code size, reliability, and other important metrics of various applications for more than two decades. However, they are still far from widespread production use due to the lack of native support for auto-tuning in an ever-changing and complex software and hardware stack, large and multi-dimensional optimization spaces, excessively long exploration times, and the lack of unified mechanisms for preserving and sharing optimization knowledge and research material. We present a possible collaborative approach to solving the above problems using the Collective Mind knowledge management system. In contrast with the previous cTuning framework, this modular infrastructure makes it possible to preserve and share over the Internet whole auto-tuning setups with all related artifacts and their software and hardware dependencies, rather than just performance data. It also allows users to gradually structure, systematize, and describe all available research material, including tools, benchmarks, data sets, search strategies, and machine learning models. Researchers can take advantage of shared components and data with extensible meta-description to quickly and collaboratively validate and improve existing auto-tuning and benchmarking techniques or prototype new ones. The community can now gradually learn and improve the complex behavior of all existing computer systems while exposing behavior anomalies or model mispredictions to an interdisciplinary community in a reproducible way for further analysis. We present several practical, collaborative, and model-driven auto-tuning scenarios. We also release all material at c-mind.org/repo to set an example of collaborative and reproducible research, as well as of our new publication model in computer engineering, where experimental results are continuously shared and validated by the community.
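The repository idea at the core of this approach can be illustrated with a small sketch: each experiment is stored together with its full setup as searchable meta-data, so shared results can be queried later. This is a minimal illustration in Python, not the actual Collective Mind API; the function names, record layout, and sample flag sets are invented for the example.

```python
import time

def record_experiment(repo, setup, metrics):
    """Append an auto-tuning result, with its full setup
    (compiler, flags, benchmark, hardware) kept as searchable meta-data."""
    entry = {"setup": setup, "metrics": metrics, "timestamp": time.time()}
    repo.append(entry)
    return entry

def find_best(repo, benchmark, metric="runtime_s"):
    """Search shared results for the best-performing setup on a benchmark."""
    runs = [e for e in repo if e["setup"]["benchmark"] == benchmark]
    return min(runs, key=lambda e: e["metrics"][metric]) if runs else None

repo = []  # stands in for a shared, Internet-accessible repository
record_experiment(repo, {"benchmark": "susan", "compiler": "gcc", "flags": "-O2"},
                  {"runtime_s": 1.8})
record_experiment(repo, {"benchmark": "susan", "compiler": "gcc", "flags": "-O3 -funroll-loops"},
                  {"runtime_s": 1.3})
best = find_best(repo, "susan")
print(best["setup"]["flags"])  # → -O3 -funroll-loops
```

Because the whole setup travels with each result, another researcher can reproduce or extend a shared experiment rather than seeing only its performance number.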


IEEE International Conference on High Performance Computing, Data, and Analytics | 2011

System-level monitoring of floating-point performance to improve effective system utilization

Davide Del Vento; David L. Hart; Thomas Engel; Rory Kelly; Richard A. Valent; Siddhartha S. Ghosh; Si Liu

NCAR's Bluefire supercomputer is instrumented with a set of low-overhead processes that continually monitor the floating-point counters of its 3,840 batch-compute cores. We extract performance numbers for each batch job by correlating the data from the corresponding nodes. Using this data, together with experience and heuristics for good performance, we identify poorly performing jobs and then work with the users to improve their jobs' efficiency. Often, the solution involves simple steps such as spawning an adequate number of processes or threads, binding the processes or threads to cores, using large memory pages, or using adequate compiler optimization. These efforts typically result in performance improvements and a wall-clock runtime reduction of 10% to 20%. With more involved changes to codes and scripts, some users have obtained performance improvements of 40% to 90%. We discuss our instrumentation, some successful cases, and its general applicability to other systems.
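The per-job metric described above can be sketched as follows: sum the counter deltas across a job's nodes, divide by elapsed time, and flag jobs whose per-core rate falls below a heuristic threshold. A minimal Python sketch with invented sample numbers and an assumed sampling interval; the paper's actual instrumentation and thresholds are not reproduced here.

```python
def job_gflops(counter_samples, interval_s):
    """counter_samples: one list per node of cumulative FLOP-counter
    readings taken every interval_s seconds.
    Returns the job's aggregate GFLOP/s over the sampled window."""
    total_flops = sum(samples[-1] - samples[0] for samples in counter_samples)
    elapsed = (len(counter_samples[0]) - 1) * interval_s
    return total_flops / elapsed / 1e9

def flag_if_poor(gflops, cores, per_core_threshold=0.5):
    """Heuristic: flag jobs averaging below a GFLOP/s-per-core threshold
    (the threshold here is an invented placeholder)."""
    return gflops / cores < per_core_threshold

# Two nodes, three readings each, taken 60 s apart (made-up values).
samples = [[0, 4e9, 8e9], [0, 3e9, 6e9]]
rate = job_gflops(samples, interval_s=60)
print(round(rate, 3), flag_if_poor(rate, cores=64))  # → 0.117 True
```

A flagged job is a starting point for a conversation with the user, not an automatic verdict: low FLOP rates can be legitimate for memory- or I/O-bound codes.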


Proceedings of the 2015 XSEDE Conference on Scientific Advancements Enabled by Enhanced Cyberinfrastructure | 2015

Advanced user environment design and implementation on integrated multi-architecture supercomputers

Rory Kelly; Si Liu; Siddhartha S. Ghosh; Davide Del Vento; David L. Hart; Dan Nagle; B. J. Smith; Richard A. Valent

Scientists and engineers using supercomputer clusters should be able to focus on their scientific and technical work instead of worrying about operating their user environment. However, creating a convenient and effective user environment on modern supercomputers becomes more and more challenging due to the complexity of these large-scale systems. In this report, we discuss important design issues and goals for a user environment that must support multiple compiler suites, various applications, and diverse libraries on heterogeneous computing architectures. We present our implementation on the latest high-performance computing system, Yellowstone, a powerful dedicated resource for Earth system science deployed by the National Center for Atmospheric Research. Our newly designed user environment is built upon a hierarchical module structure, customized wrapper scripts, pre-defined system modules, an Lmod-based module implementation, and several purpose-built tools. The resulting implementation provides many useful features, including streamlined control, versioning, user customization, and automated documentation, and accommodates both novice and experienced users. The design and implementation also minimize the effort required of the administrators and support team in managing users' environments. The smooth rollout and positive feedback from our users show that the design and implementation on the Yellowstone system have been well received and have supported thousands of users all over the world.
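The hierarchical module structure mentioned above can be modeled simply: the core level exposes compilers, and loading a compiler reveals only the libraries built with that compiler. On the real system this is realized with Lmod and its hierarchical MODULEPATH; the toy Python model below, with invented module names, only illustrates the hierarchy.

```python
# Hypothetical module tree: the "Core" level exposes compilers;
# each compiler-specific subtree exposes libraries built with it.
TREE = {
    "Core": ["intel/12.1", "gnu/4.7"],
    "intel/12.1": ["mpi/impi", "netcdf/4.2-intel"],
    "gnu/4.7": ["mpi/mvapich2", "netcdf/4.2-gnu"],
}

def available(loaded):
    """Modules visible given what is loaded: the core level plus the
    subtree of each loaded module, mimicking a hierarchical MODULEPATH."""
    mods = list(TREE["Core"])
    for m in loaded:
        mods += TREE.get(m, [])
    return mods

print(available([]))           # only compilers are visible
print(available(["gnu/4.7"]))  # compilers plus gnu-built libraries
```

The benefit of the hierarchy is that incompatible combinations (say, an Intel-built NetCDF with the GNU compiler) simply never appear as loadable options.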


Proceedings of the 2nd International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era | 2012

Performance optimization on a supercomputer with cTuning and the PGI compiler

Davide Del Vento

In this paper we show a machine-learning-based implementation of autotuning, built with the cTuning CC framework. We integrated the PGI compiler into the cTuning CC framework, plugged in a few additional benchmarks, and tested it on a Cray XT5m supercomputer. The main contribution of the present paper is the combination of existing autotuning techniques with the PGI production compiler. Although not yet ready for production workflows, our results are encouraging.
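The search loop at the heart of such autotuning can be sketched as follows: sample flag combinations, measure each, and keep the best. The sketch below substitutes a deterministic stand-in cost model for the real compile-and-run timing; the flag list and the per-flag savings are illustrative only, not measurements from the paper.

```python
import random

# Illustrative flag space (a few real PGI-style flags used as labels only).
FLAGS = ["-O2", "-O3", "-Mvect=simd", "-Munroll", "-Mipa=fast"]

def measure_runtime(flag_set):
    """Stand-in for compiling with flag_set and timing the binary;
    made deterministic so the sketch is reproducible."""
    base = 10.0
    savings = {"-O2": 1.0, "-O3": 2.0, "-Mvect=simd": 1.5,
               "-Munroll": 0.7, "-Mipa=fast": 0.9}
    return base - sum(savings[f] for f in flag_set)

def random_search(trials=20, seed=0):
    """Sample random flag subsets and keep the fastest one found."""
    rng = random.Random(seed)
    best = (float("inf"), ())
    for _ in range(trials):
        subset = tuple(f for f in FLAGS if rng.random() < 0.5)
        best = min(best, (measure_runtime(subset), subset))
    return best

runtime, flags = random_search()
print(runtime, flags)
```

A machine-learning-based tuner replaces the blind random sampling with a model that predicts which untried flag combinations are worth measuring, cutting down the exploration time the abstract mentions.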


IEEE International Conference on High Performance Computing, Data, and Analytics | 2011

The NWSC benchmark suite: using scientific throughput to measure supercomputer performance

Rory Kelly; Davide Del Vento; Siddhartha S. Ghosh; Richard A. Valent; Si Liu

The NCAR-Wyoming Supercomputing Center (NWSC) will begin operating in June 2012 and will house NCAR's next-generation HPC system. The NWSC will support a broad spectrum of Earth Science research drawn from a user community with diverse requirements for computing, storage, and data analysis resources. To ensure that the NWSC satisfies the needs of this community, the procurement benchmarking process was driven by science requirements from the start. We will discuss the science objectives for NWSC, translating scientific goals into technical requirements for a machine, and assembling a benchmark suite from community science models and synthetic tests to measure the technical capabilities of the proposed HPC systems. We will also discuss the benchmark analysis process, extending the benchmark suite as a testing tool over the life of the machine, and the applicability of the NWSC benchmark suite to other HPC centers.
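One common way to collapse a suite of benchmark results into a single procurement score is a (weighted) geometric mean of per-benchmark speedups over a reference system, which keeps any single code from dominating the result. This is a standard technique shown as a sketch, not necessarily the NWSC suite's actual scoring formula, and the numbers are invented.

```python
import math

def composite_score(speedups, weights=None):
    """Weighted geometric mean of per-benchmark speedups versus a
    reference system. Unlike an arithmetic mean, one outsized speedup
    cannot dominate the composite."""
    weights = weights or [1.0] * len(speedups)
    total_w = sum(weights)
    return math.exp(sum(w * math.log(s)
                        for w, s in zip(weights, speedups)) / total_w)

# Hypothetical suite: speedups of a proposed machine over the reference
# on two benchmarks (e.g. a community model and a synthetic kernel).
print(round(composite_score([2.0, 8.0]), 3))         # → 4.0
print(round(composite_score([2.0, 8.0], [3, 1]), 3))  # weighting the model 3:1
```

Weights let a center emphasize the community science models that dominate its real workload over synthetic tests, which matches the science-driven requirements described above.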


Parallel Computing | 2018

Notified Access in Coarray-based Hydrodynamics Applications on Many-Core Architectures: Design and Performance

Alessandro Fanfarillo; Davide Del Vento

With the increasing availability of Remote Direct Memory Access (RDMA) support in computer networks, the so-called Partitioned Global Address Space (PGAS) model has evolved in the last few years. Although there are several cases where a PGAS approach can easily solve difficult message passing situations, as in particle tracking and adaptive mesh refinement applications, the producer-consumer pattern, usually adopted in task-based parallelism, can only be implemented inefficiently because of the separation between data transfer and synchronization (which are usually unified in message passing programming models). In this paper, we provide two contributions: (1) we propose an extension for the Fortran language that provides the concept of Notified Access by associating regular coarray variables with event variables; (2) we demonstrate that the MPI extension proposed by foMPI for Notified Access can be used effectively to implement the same concept in a PGAS run-time library like OpenCoarrays. Moreover, for a hydrodynamics mini-application, we found that Fortran 2018 events always perform better than Fortran 2008 sync statements on many-core processors. We finally show how the proposed Notified Access can improve performance even further.
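The Notified Access idea, pairing data with its synchronization, can be modeled in shared memory: the producer's write and its notification form one operation, and the consumer waits on the notification before reading. Fortran 2018 expresses this across images with coarrays and event post/wait; the Python analogue below, with invented names, only models the pattern.

```python
import threading

class NotifiedVar:
    """Toy analogue of a coarray variable associated with an event:
    writing the data and posting the notification form one operation,
    instead of a separate transfer followed by separate synchronization."""
    def __init__(self):
        self._value = None
        self._event = threading.Event()

    def put_notify(self, value):
        """Producer side: store the data and notify in one call."""
        self._value = value
        self._event.set()

    def wait_get(self, timeout=None):
        """Consumer side: wait for the notification, then read."""
        if not self._event.wait(timeout):
            raise TimeoutError("no notification received")
        return self._value

box = NotifiedVar()
producer = threading.Thread(target=lambda: box.put_notify(42))
producer.start()
print(box.wait_get(timeout=5))  # → 42
producer.join()
```

The inefficiency the abstract describes comes from doing these two steps separately over a network: the consumer either polls or takes an extra synchronization round-trip after the data has already arrived.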


Parallel Computing | 2017

Optimizing Communication and Synchronization in CAF Applications

Alessandro Fanfarillo; Davide Del Vento; Patrick Nichols

Since the beginning of distributed computing, overlapping communication with computation has always been an attractive technique for masking high communication costs. Although easy for a human being to detect, communication/computation overlap requires knowledge of architectural and network details in order to be performed effectively. When low-level details influence performance and productivity, compilers and run-time libraries play the critical role of translating the high-level statements understandable by humans into efficient commands suitable for machines. With the advent of PGAS languages, parallelism becomes part of the programming language and communication can be expressed with simple variable assignments. As for serial programs, PGAS compilers should be able to optimize all aspects of the language, including communication; unfortunately, this is not yet the case. In this work we consider parallel scientific programs written in Coarray Fortran and focus on how to build a PGAS compiler capable of optimizing communication, in particular by automatically exploiting opportunities for communication/computation overlap. We also sketch an extension for the Fortran language that allows one to express the relation between data and synchronization events; we finally show how this relation can be used by the compiler to perform additional communication optimizations.
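Communication/computation overlap, as discussed above, means starting a transfer, doing independent local work while it is in flight, and waiting only when the remote data is actually needed. A Python sketch with a thread pool standing in for a nonblocking remote transfer; the timings and values are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def transfer(data):
    """Stand-in for a nonblocking remote put/get
    (e.g. a coarray assignment in Coarray Fortran)."""
    time.sleep(0.2)  # simulated network latency
    return sum(data)

def local_compute():
    """Independent work that does not depend on the transferred data."""
    time.sleep(0.2)  # simulated computation
    return 7

with ThreadPoolExecutor(max_workers=1) as pool:
    start = time.perf_counter()
    fut = pool.submit(transfer, [1, 2, 3])  # start the communication
    partial = local_compute()               # overlap: compute while it flies
    total = fut.result() + partial          # block only when data is needed
    elapsed = time.perf_counter() - start

print(total, elapsed)  # overlapped: roughly 0.2 s, not 0.4 s
```

The compiler's job, in the work described above, is to perform this transformation automatically: move the wait as far from the transfer as data dependences allow, so the overlap happens without the programmer restructuring the code.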


Proceedings of the 24th European MPI Users' Group Meeting | 2017

Notified access in Coarray Fortran

Alessandro Fanfarillo; Davide Del Vento

With the increasing availability of Remote Direct Memory Access (RDMA) support in computer networks, the so-called Partitioned Global Address Space (PGAS) model has evolved in the last few years. Although there are several cases where a PGAS approach can easily solve difficult message passing situations, as in particle tracking and adaptive mesh refinement applications, the producer-consumer pattern, usually adopted in task-based parallelism, can only be implemented inefficiently because of the separation between data transfer and synchronization (which are usually unified in message passing programming models). In this paper, we provide two contributions: (1) we propose an extension for the Fortran language that provides the concept of Notified Access by associating regular coarray variables with event variables; (2) we demonstrate that the MPI extension proposed by foMPI for Notified Access can be used effectively to implement the same concept in a PGAS run-time library like OpenCoarrays.


Proceedings of the Practice and Experience on Advanced Research Computing | 2018

GISandbox: A Science Gateway for Geospatial Computing

Eric Shook; Davide Del Vento; Andrea Zonca; Jun Wang


Archive | 2017

Performance analysis and optimization of the Weather Research and Forecasting Model (WRF) advection schemes [presentation]

Negin Sobhani; Davide Del Vento

Collaboration


Dive into Davide Del Vento's collaborations.

Top Co-Authors

- Negin Sobhani (University of Colorado Boulder)
- Rory Kelly (National Center for Atmospheric Research)
- Alessandro Fanfarillo (National Center for Atmospheric Research)
- Richard A. Valent (National Center for Atmospheric Research)
- Si Liu (University of Texas at Austin)
- Siddhartha S. Ghosh (National Center for Atmospheric Research)
- David L. Hart (National Center for Atmospheric Research)