Massimo Sgaravatto
Istituto Nazionale di Fisica Nucleare
Publications
Featured research published by Massimo Sgaravatto.
Journal of Physics: Conference Series | 2008
Paolo Andreetto; Sergio Andreozzi; G Avellino; S Beco; A Cavallini; M Cecchi; V. Ciaschini; A Dorise; Francesco Giacomini; A. Gianelle; U Grandinetti; A Guarise; A Krop; R Lops; Alessandro Maraschini; V Martelli; Moreno Marzolla; M Mezzadri; E Molinari; Salvatore Monforte; F Pacini; M Pappalardo; A Parrini; G Patania; L. Petronzio; R Piro; M Porciani; F Prelz; D Rebatto; E Ronchieri
The gLite Workload Management System (WMS) is a collection of components that provide the service responsible for distributing and managing tasks across the computing and storage resources available on a Grid. The WMS receives job execution requests from a client, finds the appropriate resources, then dispatches and follows the jobs until completion, handling failures whenever possible. Besides single batch-like jobs, the compound job types handled by the WMS are Directed Acyclic Graphs (a set of jobs where the input/output/execution of one or more jobs may depend on one or more other jobs), Parametric Jobs (multiple jobs with one parametrized description), and Collections (multiple jobs with a common description). Jobs are described via a flexible, high-level Job Definition Language (JDL). New functionality was recently added to the system: use of Service Discovery to obtain new service endpoints to be contacted, automatic archival/compression and sharing of sandbox files, and support for bulk submission and bulk matchmaking. Intensive testing and troubleshooting made it possible to dramatically increase both the job submission rate and the service stability. Future development of the gLite WMS will focus on reducing external software dependencies and on improving portability, robustness and usability.
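For readers unfamiliar with the JDL mentioned in this abstract, the sketch below assembles two minimal JDL descriptions (a single job and a parametric job) as Python strings. The attribute names follow common gLite JDL usage but are included here only as an illustrative assumption, not as material from the paper.

```python
# Illustrative sketch only: minimal gLite-style JDL descriptions assembled in Python.
# Attribute names (Executable, InputSandbox, JobType, Parameters, ...) follow common
# gLite JDL usage and are assumptions for illustration, not taken from the paper.

simple_job_jdl = """
[
  JobType        = "Normal";
  Executable     = "analysis.sh";
  Arguments      = "run42.cfg";
  StdOutput      = "analysis.out";
  StdError       = "analysis.err";
  InputSandbox   = {"analysis.sh", "run42.cfg"};
  OutputSandbox  = {"analysis.out", "analysis.err"};
]
"""

# A parametric job: one description that the WMS expands into multiple jobs,
# one per value substituted for _PARAM_.
parametric_job_jdl = """
[
  JobType        = "Parametric";
  Executable     = "analysis.sh";
  Arguments      = "input_PARAM_.dat";
  Parameters     = 10;
  ParameterStart = 0;
  ParameterStep  = 1;
  StdOutput      = "out_PARAM_.txt";
  StdError       = "err_PARAM_.txt";
  OutputSandbox  = {"out_PARAM_.txt", "err_PARAM_.txt"};
]
"""

if __name__ == "__main__":
    print(simple_job_jdl)
    print(parametric_job_jdl)
```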
Archive | 2004
P. Andreetto; Daniel Kouřil; Valentina Borgia; Aleš Křenek; A. Dorigo; Luděk Matyska; A. Gianelle; Miloš Mulač; M. Mordacchini; Jan Pospíšil; Massimo Sgaravatto; Miroslav Ruda; L. Zangrando; Zdeněk Salvet; S. Andreozzi; Jiří Sitera; Vincenzo Ciaschini; Jiří Škrabal; C. Di Giusto; Michal Voců; Francesco Giacomini; V. Martelli; V. Medici; Massimo Mezzadri; Elisabetta Ronchieri; Francesco Prelz; V. Venturi; D. Rebatto; Giuseppe Avellino; Salvatore Monforte
Resource management and scheduling of distributed, data-driven applications in a Grid environment are challenging problems. Although significant results were achieved in the past few years, the development and proper deployment of generic, reliable, standard components still present issues that need to be fully solved. The domains involved include workload management, resource discovery, resource matchmaking and brokering, accounting, authorization policies, resource access, reliability and dependability. The evolution towards a service-oriented architecture, supported by emerging standards, is another activity that will demand attention. All these issues are being tackled within the EU-funded EGEE project (Enabling Grids for E-science in Europe), whose primary goals are the provision of robust middleware components and the creation of a reliable and dependable Grid infrastructure to support e-Science applications. In this paper we present the plans and preliminary activities aimed at providing adequate workload and resource management components, suitable for deployment in a production-quality Grid.
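As a rough illustration of the resource matchmaking and brokering mentioned above, the following sketch filters candidate resources against a job requirement and picks the best-ranked match. It is a deliberately simplified, hypothetical model, not the EGEE/gLite implementation.

```python
# Deliberately simplified sketch of requirements/rank matchmaking, for illustration only.
# Real Grid matchmaking evaluates ClassAd-style expressions against published resource
# information; here the requirements and the rank are plain Python callables.

from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    free_cpus: int
    max_wallclock_minutes: int

def match(resources, requirements, rank):
    """Return the matching resource with the highest rank, or None if nothing matches."""
    candidates = [r for r in resources if requirements(r)]
    return max(candidates, key=rank, default=None)

if __name__ == "__main__":
    pool = [
        Resource("ce01.example.org", free_cpus=12, max_wallclock_minutes=2880),
        Resource("ce02.example.org", free_cpus=0,  max_wallclock_minutes=4320),
        Resource("ce03.example.org", free_cpus=4,  max_wallclock_minutes=720),
    ]
    # The job needs at least one free CPU and 24 hours of wallclock time.
    requirements = lambda r: r.free_cpus > 0 and r.max_wallclock_minutes >= 1440
    rank = lambda r: r.free_cpus          # prefer the least loaded resource
    print(match(pool, requirements, rank))
```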
Future Generation Computer Systems | 2010
Cristina Aiftimiei; Paolo Andreetto; Sara Bertocco; Simone Dalla Fina; Alvise Dorigo; Eric Frizziero; A. Gianelle; Moreno Marzolla; Mirco Mazzucato; Massimo Sgaravatto; Sergio Traldi; Luigi Zangrando
Job execution and management is one of the most important functionalities provided by every modern Grid system. In this paper we describe how the problem of job management has been addressed in the gLite middleware by means of the CREAM and CEMonitor services. CREAM (Computing Resource Execution and Management) provides a job execution and management capability for Grids, while CEMonitor is a general-purpose asynchronous event notification framework. Both components expose a Web Service interface that allows conforming clients to submit computational jobs to a Local Resource Management System and to manage and monitor them.
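A minimal sketch of how a client might drive CREAM through the gLite command-line tools is shown below, using Python's subprocess module. The command names and options (glite-ce-job-submit -a -r, glite-ce-job-status), the JDL file name and the endpoint are assumptions made for illustration; consult the actual gLite/CREAM documentation for the real interface.

```python
# Sketch of a thin wrapper around the CREAM command-line clients, for illustration only.
# Command names and options reflect common gLite usage and are assumptions here; the
# JDL file and the CE endpoint below are placeholders.

import subprocess

def submit(jdl_file: str, ce_endpoint: str) -> str:
    """Submit a JDL file to a CREAM CE and return the job identifier printed by the client."""
    out = subprocess.run(
        ["glite-ce-job-submit", "-a", "-r", ce_endpoint, jdl_file],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip().splitlines()[-1]   # the job ID is typically the last line

def status(job_id: str) -> str:
    """Query the status of a previously submitted job."""
    out = subprocess.run(
        ["glite-ce-job-status", job_id],
        check=True, capture_output=True, text=True,
    )
    return out.stdout

if __name__ == "__main__":
    job_id = submit("analysis.jdl", "cream-ce.example.org:8443/cream-pbs-main")
    print(status(job_id))
```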
Journal of Physics: Conference Series | 2008
Cristina Aiftimiei; Paolo Andreetto; Sara Bertocco; Simone Dalla Fina; S D Ronco; Alvise Dorigo; A. Gianelle; Moreno Marzolla; Mirco Mazzucato; Massimo Sgaravatto; M Verlato; Luigi Zangrando; M Corvo; V Miccio; A Sciaba; D Cesini; D Dongiovanni; C Grandi
Modern Grid middleware is built around components providing basic functionality such as data storage, authentication, security, job management, resource monitoring and reservation. In this paper we describe the Computing Resource Execution and Management (CREAM) service. CREAM provides a Web service-based job execution and management capability for Grid systems; in particular, it is being used within the gLite middleware. CREAM exposes a Web service interface allowing conforming clients to submit computational jobs to a Local Resource Management System and to manage them. We developed a dedicated component, called ICE (Interface to CREAM Environment), to integrate CREAM in gLite. ICE transfers job submissions and cancellations from the Workload Management System, allowing users to manage CREAM jobs from the gLite User Interface. This paper describes recent studies aimed at assessing the performance and reliability of CREAM and ICE; these tests were performed as part of the acceptance tests for the integration of CREAM and ICE in gLite. We also discuss recent work towards enhancing CREAM with a BES- and JSDL-compliant interface.
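Since the abstract mentions work towards a BES- and JSDL-compliant interface, the sketch below shows what a minimal JSDL job description looks like, embedded in a Python string. The element and namespace names follow the published JSDL 1.0 specification but are reproduced here as an assumption and should be checked against the specification itself.

```python
# Illustrative sketch of a minimal JSDL (Job Submission Description Language) document,
# the standard mentioned above for the BES/JSDL-compliant interface. Element and
# namespace names are assumed from the JSDL 1.0 specification; verify before use.

MINIMAL_JSDL = """<?xml version="1.0" encoding="UTF-8"?>
<jsdl:JobDefinition
    xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
    xmlns:jsdl-posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
  <jsdl:JobDescription>
    <jsdl:Application>
      <jsdl-posix:POSIXApplication>
        <jsdl-posix:Executable>/bin/echo</jsdl-posix:Executable>
        <jsdl-posix:Argument>hello</jsdl-posix:Argument>
        <jsdl-posix:Output>stdout.txt</jsdl-posix:Output>
      </jsdl-posix:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
</jsdl:JobDefinition>
"""

if __name__ == "__main__":
    print(MINIMAL_JSDL)
```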
Archive | 2004
Daniel Kouřil; Aleš Křenek; Luděk Matyska; Miloš Mulač; Jan Pospíšil; Miroslav Ruda; Zdeněk Salvet; Jiří Sitera; Jiří Škrabal; Michal Voců; P. Andreetto; Valentina Borgia; A. Dorigo; A. Gianelle; M. Mordacchini; Massimo Sgaravatto; L. Zangrando; S. Andreozzi; Vincenzo Ciaschini; C. Di Giusto; Francesco Giacomini; V. Medici; Elisabetta Ronchieri; Giuseppe Avellino; Stefano Beco; Alessandro Maraschini; Fabrizio Pacini; Annalisa Terracina; Andrea Guarise; G. Patania
The Logging and Bookkeeping (LB) service tracks jobs passing through the Grid. It collects important events generated by both the Grid middleware components and the applications, and processes them at a chosen LB server to provide the job state. The events are transported through secure and reliable channels. Job tracking is fully distributed and does not depend on a single information source; robustness is achieved through speculative computation of the job state in case of reordered, delayed or lost events. The state computation is easily adaptable to a modified job control flow.
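The speculative job state computation described above can be illustrated with a toy model: each event implies a state, and the job state is the most advanced state observed so far, so delayed or reordered events never move the job backwards. The sketch below is only that toy model, with assumed state names; it is not the actual LB state machine.

```python
# Highly simplified illustration of event-driven job state computation that tolerates
# reordered or delayed events: each event carries the state it implies, and the job
# state is the most advanced state observed so far. The state names are assumptions;
# the real LB state machine is far richer.

STATE_ORDER = ["submitted", "waiting", "ready", "scheduled", "running", "done", "cleared"]

def job_state(events):
    """Compute the current job state from a (possibly reordered) list of events."""
    best = -1
    for event in events:
        implied = STATE_ORDER.index(event["implies"])
        best = max(best, implied)
    return STATE_ORDER[best] if best >= 0 else "unknown"

if __name__ == "__main__":
    # The "running" event arrives before the (delayed) "scheduled" event.
    events = [
        {"source": "WMS",  "implies": "waiting"},
        {"source": "LRMS", "implies": "running"},
        {"source": "WMS",  "implies": "scheduled"},
    ]
    print(job_state(events))   # -> "running"
```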
Journal of Physics: Conference Series | 2012
Paolo Andreetto; Sara Bertocco; Fabio Capannini; Marco Cecchi; Alvise Dorigo; Eric Frizziero; A. Gianelle; Massimo Mezzadri; Salvatore Monforte; Francesco Prelz; David Rebatto; Massimo Sgaravatto; Luigi Zangrando
The EU-funded EMI project aims at providing a unified, standardized, easy-to-install software stack for distributed computing infrastructures. CREAM is one of the products in the EMI middleware distribution: it implements a Grid job management service that allows the submission, management and monitoring of computational jobs on local resource management systems. In this paper we discuss some new features being implemented in the CREAM Computing Element. One of them is the implementation of the EMI Execution Service (EMI-ES) specification, an agreement within the EMI consortium on the interfaces and protocols needed to enable computational job submission and management across technologies. New developments also focus on the High Availability (HA) area, to improve performance, scalability, availability and fault tolerance.
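One simple reading of the High Availability goal mentioned above is that a client should fall back to another CREAM endpoint when one instance is unreachable. The sketch below illustrates such client-side failover with hypothetical endpoint URLs and a placeholder submit_to() function; it is not a description of how CREAM itself implements HA.

```python
# Illustration of client-side failover across equivalent service endpoints, one simple
# reading of the High Availability goal described above. Endpoint URLs and the
# submit_to() function are hypothetical placeholders, not part of CREAM.

ENDPOINTS = [
    "https://cream-01.example.org:8443/ce-cream/services",
    "https://cream-02.example.org:8443/ce-cream/services",
]

# Pretend the first instance is down, to exercise the failover path.
DOWN = {ENDPOINTS[0]}

class EndpointUnavailable(Exception):
    pass

def submit_to(endpoint: str, jdl: str) -> str:
    """Placeholder for a real submission call against a single endpoint."""
    if endpoint in DOWN:
        raise EndpointUnavailable(endpoint)
    return f"job-id-from-{endpoint}"

def submit_with_failover(jdl: str) -> str:
    last_error = None
    for endpoint in ENDPOINTS:                  # try the endpoints in turn
        try:
            return submit_to(endpoint, jdl)
        except EndpointUnavailable as err:
            last_error = err
    raise RuntimeError(f"all endpoints failed, last error: {last_error}")

if __name__ == "__main__":
    print(submit_with_failover('[ Executable = "/bin/true"; ]'))
```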
Journal of Physics: Conference Series | 2010
Cristina Aiftimiei; Paolo Andreetto; Sara Bertocco; S Dalla Fina; Alvise Dorigo; Eric Frizziero; A. Gianelle; Moreno Marzolla; Mirco Mazzucato; P. Mendez Lorenzo; V Miccio; Massimo Sgaravatto; Sergio Traldi; Luigi Zangrando
In this paper we describe the use of the CREAM and CEMonitor services for job submission and management within the gLite Grid middleware. Both CREAM and CEMonitor address one of the most fundamental operations of a Grid middleware, namely job submission and management. Specifically, CREAM is a job management service used for submitting, managing and monitoring computational jobs, while CEMonitor is an event notification framework which can be coupled with CREAM to provide users with asynchronous job status change notifications. Both components have been integrated into the gLite Workload Management System by means of ICE (Interface to CREAM Environment). These software components have been released for production in the EGEE Grid infrastructure and, in the case of the CEMonitor service, also in the OSG Grid. In this paper we report the current status of these services, the results achieved, and the issues that still have to be addressed.
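The asynchronous notifications that CEMonitor provides replace client-side polling with a subscription/callback pattern. The sketch below models that pattern in a few lines; class and method names are hypothetical and do not correspond to the CEMonitor Web Service interface.

```python
# Minimal publish/subscribe sketch of asynchronous job status notifications, the pattern
# that CEMonitor provides to consumers such as ICE. Class and method names here are
# hypothetical; the real CEMonitor exposes a Web Service interface.

from collections import defaultdict

class StatusNotifier:
    def __init__(self):
        self._subscribers = defaultdict(list)    # job_id -> list of callbacks

    def subscribe(self, job_id, callback):
        """Register a callback to be invoked when the given job changes status."""
        self._subscribers[job_id].append(callback)

    def publish(self, job_id, new_status):
        """Notify every subscriber of the job's new status."""
        for callback in self._subscribers[job_id]:
            callback(job_id, new_status)

if __name__ == "__main__":
    notifier = StatusNotifier()
    notifier.subscribe("job-42", lambda jid, st: print(f"{jid} is now {st}"))
    notifier.publish("job-42", "RUNNING")        # the consumer is notified, no polling needed
    notifier.publish("job-42", "DONE-OK")
```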
Proceedings of International Symposium on Grids and Clouds (ISGC) 2017 — PoS(ISGC2017) | 2017
Marco Verlato; Paolo Andreetto; Fabrizio Chiarello; Fulvia Costa; Alberto Crescente; Alvise Dorigo; Sergio Fantinel; Federica Fanzago; Ervin Konomi; Matteo Segatta; Massimo Sgaravatto; Sergio Traldi; Nicola Tritto; Lisa Zangrando
The Cloud Area Padovana is an OpenStack-based scientific cloud spread across two sites, the INFN Padova Unit and the INFN Legnaro National Labs, located 10 km apart but connected by a dedicated 10 Gbps optical link. In the last two years its hardware resources have been scaled horizontally by adding new ones: it currently provides about 1100 logical cores and 50 TB of storage. Special in-house developments were also integrated into the OpenStack dashboard, such as a tool for user and project registration with direct support for Single Sign-On via the INFN-AAI Identity Provider as a new option for user authentication. The collaboration with the EU-funded INDIGO-DataCloud project, started one year ago, made it possible to experiment with the integration of Docker-based containers and with fair-share scheduling, a new resource allocation mechanism analogous to the ones available in batch system schedulers for maximizing the usage of shared resources among concurrent users and projects. Both solutions are expected to be available in production soon. The entire computing facility now satisfies the computational and storage demands of more than 100 users belonging to about 30 research projects. In this paper we present the architecture of the Cloud infrastructure and the tools and procedures used to operate it while ensuring reliability and fault tolerance. We especially focus on the lessons learned in these two years, describing the challenges identified and the corrective actions subsequently applied. From the perspective of scientific applications, we show some concrete use cases of how this Cloud infrastructure is being used. In particular we focus on two large physics experiments which are intensively exploiting this computing facility: CMS and SPES. CMS deployed on the cloud a complex computational infrastructure, composed of several user interfaces for job submission to the Grid environment or to local batch queues, or for interactive processes; this is fully integrated with the local Tier-2 facility. To avoid a static allocation of the resources, an elastic cluster, initially based only on CernVM, has been configured: it automatically creates and deletes virtual machines according to user needs. SPES uses a client-server system called TraceWin to exploit INFN's virtual resources, performing a very large number of simulations on about a thousand elastically managed nodes.
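The elastic cluster described above ultimately relies on creating and deleting cloud instances as demand changes. The sketch below shows those calls with the openstacksdk Python client and a deliberately naive scaling policy; the cloud name, image, flavor and network identifiers are placeholders, and the policy is an assumption, not the one used on the Cloud Area Padovana.

```python
# Sketch of the create/delete operations an elastic cluster relies on, using the
# openstacksdk client. The cloud entry, image, flavor and network identifiers are
# placeholders, and the scaling policy (one worker per 10 pending jobs) is a
# deliberately naive stand-in for the real elastic cluster logic.

import openstack

IMAGE_ID = "REPLACE-WITH-IMAGE-UUID"
FLAVOR_ID = "REPLACE-WITH-FLAVOR-UUID"
NETWORK_ID = "REPLACE-WITH-NETWORK-UUID"

def scale(conn, pending_jobs: int, workers: list):
    """Naive policy: keep one worker node per 10 pending jobs."""
    wanted = pending_jobs // 10
    while len(workers) < wanted:                       # scale up
        server = conn.compute.create_server(
            name=f"elastic-wn-{len(workers):03d}",
            image_id=IMAGE_ID,
            flavor_id=FLAVOR_ID,
            networks=[{"uuid": NETWORK_ID}],
        )
        workers.append(conn.compute.wait_for_server(server))
    while len(workers) > wanted:                       # scale down
        conn.compute.delete_server(workers.pop())

if __name__ == "__main__":
    conn = openstack.connect(cloud="my-cloud")         # entry in clouds.yaml (placeholder)
    scale(conn, pending_jobs=35, workers=[])
```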
Proceedings of International Symposium on Grids and Clouds (ISGC) 2016 — PoS(ISGC 2016) | 2017
Giuseppe Codispoti; Riccardo Di Maria; Cristina Aiftimiei; D. Bonacorsi; Patrizia Calligola; Vincenzo Ciaschini; Alessandro Costantini; Stefano Dal Pra; Claudio Grandi; Diego Michelotto; Matteo Panella; Gianluca Peco; Vladimir Sapunenko; Massimo Sgaravatto; Sonia Taneja; Giovanni Zizzi; Donato De Girolamo
LHC experiments are now in Run-II data taking and approaching new challenges in the operation of their computing facilities in future Runs. Although they demonstrated the ability to sustain operations at scale during Run-I, it has become evident that the computing infrastructure for Run-II is dimensioned to cope at most with the average amount of data recorded, and not with peak usage. Peaks are frequent, may create large backlogs, and have a direct impact on data reconstruction completion times, and hence on data availability for physics analysis. Among others, the CMS experiment has been exploring, since the first Long Shutdown period after Run-I, the access and utilisation of Cloud resources provided by external partners or commercial providers. In this work we present proofs of concept of the elastic extension of a CMS Tier-3 site in Bologna (Italy) onto an external OpenStack infrastructure. We start by presenting the experience of a first exercise on the “Cloud Bursting” of a CMS Grid site, using a novel LSF configuration to dynamically register new worker nodes. We then move to a more recent “Cloud Site as-a-Service” prototype, based on a more direct access to and integration of OpenStack resources into the CMS workload management system. Results with real CMS workflows and future plans are also presented and discussed.
Proceedings of International Symposium on Grids and Clouds (ISGC) 2016 — PoS(ISGC 2016) | 2017
Lisa Zangrando; Marco Verlato; Federica Fanzago; Massimo Sgaravatto
In OpenStack, the current resource allocation model provides each user group with a fixed amount of resources. This model, based on fixed quotas, accurately reflects the pay-per-use economic model on which the Cloud paradigm is built. However, it is not well suited to the computational model of scientific computing, whose resource demands cannot be predetermined and vary greatly over time. Usually the size of the quota is agreed with the Cloud infrastructure manager when a new project is created, and it rarely changes afterwards. The main limitation of this static partitioning of resources shows up in a scenario of full quota utilization: a project cannot exceed its own quota even if the cloud infrastructure contains unused resources assigned to different groups. It follows that the overall efficiency of a data centre is often rather low. The European project INDIGO-DataCloud is addressing this issue with “Synergy”, a new service that provides OpenStack with an advanced provisioning model based on the scheduling algorithms known as “fair-share”. In addition to maximizing resource usage, fair-share scheduling ensures that resources are equitably distributed between users and groups. In this paper we discuss the solution offered by INDIGO with Synergy, describing its features, its architecture, and the limitations of the selected algorithm as confirmed by the preliminary results of tests performed in the Padua testbed integrated with the EGI Federated Cloud.
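Fair-share scheduling, as adopted by Synergy from batch system schedulers, prioritizes requests from groups that have consumed less than their agreed share of resources. The sketch below shows one classic way of turning shares and recent usage into priorities; it is an illustrative simplification, not Synergy's actual algorithm.

```python
# Illustrative fair-share priority calculation, in the spirit of the batch system
# schedulers Synergy takes inspiration from: groups that used less than their agreed
# share get a higher priority. This is a simplification, not Synergy's actual algorithm.

def fair_share_priorities(shares, usage):
    """shares: group -> agreed fraction of the resources (sums to 1.0).
    usage:  group -> fraction of recent historical usage (sums to 1.0).
    Returns group -> priority; higher means "served first"."""
    priorities = {}
    for group, share in shares.items():
        used = usage.get(group, 0.0)
        # Under-served groups (used < share) get priority > 1, over-served ones < 1.
        priorities[group] = share / used if used > 0 else float("inf")
    return priorities

if __name__ == "__main__":
    shares = {"cms": 0.5, "spes": 0.3, "theory": 0.2}
    usage  = {"cms": 0.7, "spes": 0.1, "theory": 0.2}
    for group, prio in sorted(fair_share_priorities(shares, usage).items(),
                              key=lambda kv: -kv[1]):
        print(f"{group:8s} priority = {prio:.2f}")
```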