Silvia Mocavero | Researchain

Publication

Featured research published by Silvia Mocavero.

Archive | 2011

NEMO-Med: Optimization and Improvement of Scalability

Italo Epicoco; Silvia Mocavero; Giovanni Aloisio

The NEMO oceanic model is widely used in the climate community. It is used in different configurations in more than 50 research projects for both long- and short-term simulations. The computational requirements of the model and its implementation limit its ability to exploit emerging computational infrastructure at the peta- and exascale. A thorough analysis and revision of the model and its implementation were therefore needed. The paper describes the performance evaluation of the MPI-parallel model (v3.2) on the MareNostrum platform at the Barcelona Supercomputing Centre. The analysis highlighted several bottlenecks due to communication overhead. The code has been optimized by reducing the communication cost within some frequently called functions, and the parallelization has been improved by introducing a second level of parallelism based on the OpenMP shared-memory paradigm.

CMCC Research Paper | 2010

A distributed infrastructure for ensemble experiments.

Italo Epicoco; Maria Mirto; Silvia Mocavero; Giovanni Aloisio

WP3/NA2 of the EU IS-ENES project aims to set up and deploy an e-infrastructure that gives climate scientists the virtual proximity they need to distributed data and distributed compute resources. The access point of this infrastructure is the v.E.R.C. portal, which will allow ESM scientists to run complex distributed workflows for running ESM experiments and accessing ESM data. This report focuses on the deployment of a grid prototype for running ensembles of multi-model experiments. Considering existing grid infrastructures and services, the design of this grid prototype has been led by the need to build a framework that leverages the external services offered within the European HPC ecosystem, e.g., today by DEISA2 and in the future by PRACE. The prototype exploits advanced grid services, namely the GRB services developed at the University of Salento, together with the basic grid services offered by the Globus Toolkit middleware, to submit and monitor ensemble runs. The prototype has been deployed on two sites, the CMCC and DKRZ nodes. A third node, at the BSC, has been considered, but its deployment is still ongoing. A case study based on HRT159, a global coupled ocean-atmosphere general circulation model (AOGCM) developed by CMCC-INGV, has been considered, and preliminary tests carried out on the CMCC and DKRZ sites are reported.

Technical Report - Centro Euro-Mediterraneo per i Cambiamenti Climatici | 2008

ORCA025: Performance Analysis on Scalar Architecture

Italo Epicoco; Silvia Mocavero; Enrico Scoccimarro; Giovanni Aloisio

This technical report describes the porting and performance evaluation of the ORCA025 code, which implements the global ocean general circulation model (OGCM) OPA. The code, currently available and optimized for vector architectures, has been ported to the HP XC6000 Itanium2 scalar cluster provided by the associate partner SPACI. The activity mainly evaluates how a scalar architecture based on the Itanium2 processor behaves with an oceanographic model that traditionally runs on vector clusters. Performance analysis of the parallel code showed good scalability.

CMCC Research Paper | 2013

Optimal Task Mapping for NEMO Model

Italo Epicoco; Francesca Macchia; Silvia Mocavero; Giovanni Aloisio

Numerical climate models require a considerable amount of computing power. Modern parallel architectures provide the computing power needed to perform scientific simulations at acceptable resolutions. However, climate models often exploit parallel architectures inefficiently. Several factors influence parallel efficiency, such as the parallel overhead due to communication among concurrent tasks, memory contention among tasks on the same computing node, load balancing and task synchronization. The work described here addresses two of these factors: communication and memory contention. The approach is based on the optimal mapping of tasks onto the SMP nodes of a parallel cluster. The mapping can heavily influence the time spent on communication between tasks belonging to the same node or to different nodes. Moreover, since each parallel task allocates a different amount of memory, an optimal task mapping can balance the total amount of main memory allocated on each node and hence reduce the overall memory contention. The climate model considered is PELAGOS025, obtained by coupling the NEMO oceanic model with the BFM biogeochemical model. It has been used in a global configuration with a horizontal resolution of 0.25°. Three different mapping strategies have been implemented, analyzed and compared with the standard allocation performed by the local scheduler. The parallel architecture used for the evaluation is an IBM iDataPlex with Intel SandyBridge processors located at the CMCC's Supercomputing Center.
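The memory-balancing idea in this abstract can be illustrated with a minimal sketch. It is not one of the three strategies evaluated in the paper, which the abstract does not describe; it is a generic greedy heuristic, and the function name `map_tasks` and its parameters are hypothetical. Tasks are placed from the largest memory footprint to the smallest, each on the node that currently holds the least memory and still has a free core.

```python
def map_tasks(task_memory, n_nodes, cores_per_node):
    """Greedy mapping of tasks to nodes that balances memory per node.

    Illustrative assumption only: this is a generic heuristic, not a
    strategy taken from the paper. task_memory[t] is the memory that
    task t allocates (any consistent unit).
    """
    if len(task_memory) > n_nodes * cores_per_node:
        raise ValueError("more tasks than available cores")

    node_tasks = [[] for _ in range(n_nodes)]
    node_memory = [0.0] * n_nodes

    # Place the most memory-hungry tasks first.
    order = sorted(range(len(task_memory)),
                   key=lambda t: task_memory[t], reverse=True)
    for task in order:
        # Only nodes that still have a free core are candidates.
        candidates = [n for n in range(n_nodes)
                      if len(node_tasks[n]) < cores_per_node]
        # Pick the candidate with the least memory allocated so far.
        target = min(candidates, key=lambda n: node_memory[n])
        node_tasks[target].append(task)
        node_memory[target] += task_memory[task]
    return node_tasks, node_memory
```

A mapping of this kind only addresses memory contention; accounting for communication would also require knowing which tasks exchange data with which neighbours.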

International Conference on Computational Science and Its Applications | 2012

The performance model of an enhanced parallel algorithm for the SOR method

Italo Epicoco; Silvia Mocavero

Successive Over-Relaxation (SOR) is a variant of the iterative Gauss-Seidel method for solving a linear system of equations Ax=b. The SOR algorithm is used within the NEMO (Nucleus for European Modelling of the Ocean) ocean model to solve the elliptic equation for the barotropic stream function. The NEMO performance analysis shows that the SOR algorithm introduces a significant communication overhead. Its parallel implementation is based on the Red-Black method and requires a communication step at each iteration. An enhanced parallel version of the algorithm has been developed that enlarges the overlap region to reduce the frequency of communication. The overlap size must be carefully tuned to reduce the communication overhead without increasing the computing time. This work describes an analytical performance model of the SOR algorithm that can be used to establish the optimal size of the overlap region.
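For readers unfamiliar with the Red-Black method, the following is a minimal serial sketch. It is not NEMO code: it assumes a model problem (the 2D Poisson equation on a square grid with zero boundary values), and the function name `red_black_sor` and its parameters are hypothetical. The comment inside the loop marks where a distributed-memory implementation would exchange halo data.

```python
def red_black_sor(f, h, omega=1.5, tol=1e-8, max_iter=10_000):
    """Solve -laplace(u) = f on an n x n grid with u = 0 on the boundary.

    f is a list of n lists of n floats, h is the grid spacing and
    omega is the relaxation factor (0 < omega < 2).
    Returns the solution grid and the number of iterations performed.
    """
    n = len(f)
    u = [[0.0] * n for _ in range(n)]
    for iteration in range(1, max_iter + 1):
        max_change = 0.0
        for colour in (0, 1):  # 0 = "red" points, 1 = "black" points
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 != colour:
                        continue
                    # Gauss-Seidel value from the four neighbours,
                    # which all have the opposite colour.
                    gs = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                 + u[i][j - 1] + u[i][j + 1]
                                 + h * h * f[i][j])
                    change = omega * (gs - u[i][j])
                    u[i][j] += change
                    max_change = max(max_change, abs(change))
            # A parallel version would exchange halo values here,
            # since the next colour reads the points just updated.
        if max_change < tol:
            return u, iteration
    return u, max_iter
```

Because points of one colour depend only on points of the other colour, each half-sweep can be updated in any order, which is what makes the scheme suitable for domain decomposition.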

CMCC Research Paper | 2012

Common Pitfalls Coding a Parallel Model

Italo Epicoco; Silvia Mocavero; Alessandro D'Anca; Giovanni Aloisio

The development of a climate model often involves a wide community of developers. Code releases can be classified into two groups: improvements and updates related to modeling aspects (new parameterizations, new and more detailed equations, removal of model approximations, and so on), and improvements related to computational aspects (performance enhancement, porting to new computing architectures, fixing of known bugs, and so on). The development process involves programmers, scientific experts and, rarely, computer scientists. New developments focus mainly on the scientific aspects and only at a second stage on computing performance. Developments that improve the physical model often do not consider their impact on computational performance. This creates problems in the development process: after a new implementation, the code must be revised to address the resulting performance issues. In this work we analyze 5 different releases, starting from NEMO v3.2 (taken as our reference), and evaluate how new developments affect computational performance.

Archive | 2011

Nemo-Med: Extra-Halo Performance Model

Italo Epicoco; Silvia Mocavero; Giovanni Aloisio

The NEMO oceanic model used at CMCC, with a resolution of 1/16° and tailored to the Mediterranean Basin, has been analyzed to discover possible bottlenecks to parallel scalability. A detailed scalability analysis of all the routines called during a NEMO time step identified the SOR solver routine as the most expensive in terms of communication. The function implements the red-black successive over-relaxation method, an iterative search algorithm used for solving the elliptic equation for the barotropic stream function. The algorithm iterates until it reaches convergence; a limit on the maximum number of iterations is also set. The high frequency of data exchange within this routine implies a high communication overhead. The NEMO code includes an enhanced version of the routine that reduces the frequency of communication by adding an extra-halo region. Using this optimization requires selecting the optimal value of the extra-halo dimension to trade off computation against communication. A performance model that allows the optimal extra-halo value to be chosen for a predefined decomposition has been designed. The model has been tested on the MareNostrum cluster at the Barcelona Supercomputing Centre.
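The trade-off described in this abstract can be illustrated with a toy cost model. This is not the performance model from the paper, whose form the abstract does not give. It assumes that an extra halo of width `extra` enlarges the local domain by `extra` points on each side, widens the exchanged strips to `extra + 1` points, and allows one exchange every `extra + 1` iterations. All names and parameters below are hypothetical.

```python
def time_per_iteration(extra, nx, ny, t_point, latency, t_value):
    """Estimated average time of one solver iteration on one subdomain.

    Toy model, assumed for illustration only:
      extra    extra-halo width in grid points (0 = standard halo)
      nx, ny   interior size of the local subdomain
      t_point  time to update one grid point
      latency  fixed cost of one halo exchange
      t_value  time to transfer one grid value
    """
    # Computation covers the interior plus the extra halo on each side.
    compute = t_point * (nx + 2 * extra) * (ny + 2 * extra)
    # Strips of width (extra + 1) are exchanged along the four edges.
    halo_values = 2 * (extra + 1) * (nx + ny)
    exchange = latency + t_value * halo_values
    # One exchange is amortized over (extra + 1) iterations.
    return compute + exchange / (extra + 1)


def best_extra_halo(nx, ny, t_point, latency, t_value, max_extra):
    """Return the extra-halo width in 0..max_extra with the lowest cost."""
    return min(range(max_extra + 1),
               key=lambda e: time_per_iteration(e, nx, ny, t_point,
                                                latency, t_value))
```

In this sketch a high exchange latency favours a wider extra halo, while a negligible communication cost makes the standard halo (`extra = 0`) the best choice because the extra points only add computation.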

CMCC Research Paper | 2011

Nemo Benchmarking: V.3.3 vs V.3.3.1 Comparison

Italo Epicoco; Silvia Mocavero; Alessandro D'Anca; Giovanni Aloisio

The report describes the activities carried out as part of the NEMO Consortium commitment. They concern the performance evaluation and comparison of NEMO version 3.3 and NEMO version 3.3.1. The main differences between the two versions relate to memory management and the allocation of data structures. NEMO ver. 3.3.1 replaces static array allocation with dynamic allocation. This approach brings some relevant benefits, such as the run-time evaluation of the best domain decomposition. On the other hand, dynamic array allocation introduces some loss of computational performance. The aim of this work is to evaluate the difference between the two versions from a computational point of view.

CMCC Research Paper | 2010

A Performance Evaluation Method for Coupled Models

Italo Epicoco; Silvia Mocavero; Giovanni Aloisio

In the High-Performance Computing context, the performance of a parallel algorithm is mainly evaluated by measuring the elapsed time of the parallel application with different numbers of cores or different problem sizes (for scaled speed-up). Typically, parallel applications embed mechanisms for using the allocated resources efficiently, guaranteeing for example good load balancing and reducing the parallel overhead. Unfortunately, this assumption does not hold for coupled models. These models result from the coupling of stand-alone climate applications. The component models are developed independently of each other and follow different development roadmaps. Moreover, they are characterized by different levels of parallelization and different workload requirements, and each has its own scalability curve. Considering a coupled model as a single parallel application, we note the lack of a policy for balancing the computational load on the available resources. This work addresses the issues related to the performance evaluation of a coupled model and tries to answer the following questions: given a number of processors allocated to the whole coupled model, how should the run be configured in order to balance the workload? How many processors must be assigned to each of the component models? The methodology described here has been applied to evaluate the scalability of the CMCC-MED coupled model designed by INGV and the ANS Division of the CMCC. The evaluation has been carried out on two different computational architectures: a scalar cluster based on IBM Power6 processors and a vector cluster based on NEC-SX9 processors.
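The processor-assignment question raised in this abstract can be sketched for the simplest case of two components. The sketch is not the methodology of the paper. It assumes that the two components run concurrently and synchronize at each coupling step, so the slower one determines the elapsed time, and that a scalability curve is available for each component as a function from processor count to time. The name `best_split` is hypothetical.

```python
def best_split(total_procs, time_a, time_b):
    """Split total_procs between two concurrently running components.

    time_a(p) and time_b(p) return the time each component needs per
    coupling step on p processors (for example, measured scalability
    curves). Assumes both components wait for each other at every
    coupling step, so the slower one sets the pace.
    Returns (procs_a, procs_b, step_time) for the best split found.
    """
    if total_procs < 2:
        raise ValueError("need at least one processor per component")
    best = None
    for procs_a in range(1, total_procs):
        procs_b = total_procs - procs_a
        step_time = max(time_a(procs_a), time_b(procs_b))
        if best is None or step_time < best[2]:
            best = (procs_a, procs_b, step_time)
    return best
```

With ideal curves such as `lambda p: 100 / p` and `lambda p: 300 / p` on 8 processors, the exhaustive search assigns 2 and 6 processors so that both components finish a step at the same time. Real scalability curves flatten at high processor counts, which is why a measured curve per component is needed rather than a fixed ratio.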

CMCC Research Paper | 2010

Definition of an ESM Benchmark for Evaluating Parallel Architectures

Italo Epicoco; Silvia Mocavero; Giovanni Aloisio

Different approaches exist for evaluating the computational performance of a parallel system. These approaches apply benchmarking tools to evaluate either the whole system or some of its sub-components (e.g. I/O system, memory bandwidth, node interconnection, etc.). Different kinds of benchmarks can be considered: real program benchmarks are based on real applications; kernel benchmarks include some key codes normally abstracted from actual programs (e.g. linear algebra operations); component benchmarks focus on the evaluation of a computer's basic components; synthetic benchmarks are built by taking statistics of all types of operations from many application programs and writing a program that invokes such operations proportionally. This report describes the development of an ESM (Earth System Model) benchmark, based on real applications, for evaluating the performance of a parallel system and its suitability for running climate models. The development of the ESM benchmark started from the composition of an evaluation suite that includes some of the most significant ESM models adopted in the climate community. The ESM models were selected within the ENES community, involving all the main climate centers in Europe. Finally, we have defined a metric to serve as an index of the system's performance. The benchmark will be used both to compare different parallel architectures and to highlight the hotspots of the target one. The benchmark's results can provide useful hints for tuning and better configuring the analyzed system.