Matteo Turilli | Researchain

Publication

Featured research published by Matteo Turilli.

International Parallel and Distributed Processing Symposium | 2016

Integrating Abstractions to Enhance the Execution of Distributed Applications

Matteo Turilli; Feng Liu; Zhao Zhang; Andre Merzky; Michael Wilde; Jon B. Weissman; Daniel S. Katz; Shantenu Jha

One of the factors that limits the scale, performance, and sophistication of distributed applications is the difficulty of concurrently executing them on multiple distributed computing resources. In part, this is due to a poor understanding of the general properties and performance of the coupling between applications and dynamic resources. This paper addresses this issue by integrating abstractions representing distributed applications, resources, and execution processes into a pilot-based middleware. The middleware provides a platform that can specify distributed applications, execute them on multiple resources and for different configurations, and is instrumented to support investigative analysis. We analyzed the execution of distributed applications using experiments that measure the benefits of using multiple resources, the late-binding of scheduling decisions, and the use of backfill scheduling.

International Conference on e-Science | 2017

Evaluating Distributed Execution of Workloads

Matteo Turilli; Yadu Nand Babuji; Andre Merzky; Ming Tai Ha; Michael Wilde; Daniel S. Katz; Shantenu Jha

Resource selection and task placement for distributed execution pose conceptual and implementation difficulties. Although resource selection and task placement are at the core of many tools and workflow systems, the methods are ad hoc rather than being based on models. Consequently, partial and non-interoperable implementations proliferate. We address both the conceptual and implementation difficulties by experimentally characterizing diverse modalities of resource selection and task placement. We compare the architectures and capabilities of two systems: the AIMES middleware and the Swift workflow scripting language and runtime. We integrate these systems to enable the distributed execution of Swift workflows on Pilot-Jobs managed by the AIMES middleware. Our experiments characterize and compare alternative execution strategies by measuring the time to completion of heterogeneous uncoupled workloads executed at diverse scale and on multiple resources. We measure the adverse effects of pilot fragmentation and early binding of tasks to resources and the benefits of backfill scheduling across pilots on multiple resources. We then use this insight to execute a multi-stage workflow across five production-grade resources. We discuss the importance and implications for other tools and workflow systems.

ACM Computing Surveys | 2018

A Comprehensive Perspective on Pilot-Job Systems

Matteo Turilli; Mark Santcroos; Shantenu Jha

Pilot-Job systems play an important role in supporting distributed scientific computing. They are used to execute millions of jobs on several cyberinfrastructures worldwide, consuming billions of CPU hours a year. With the increasing importance of task-level parallelism in high-performance computing, Pilot-Job systems are also witnessing an adoption beyond traditional domains. Notwithstanding the growing impact on scientific research, there is no agreement on a definition of Pilot-Job system and no clear understanding of the underlying abstraction and paradigm. Pilot-Job implementations have proliferated with no shared best practices or open interfaces and little interoperability. Ultimately, this is hindering the realization of the full impact of Pilot-Jobs by limiting their robustness, portability, and maintainability. This article offers a comprehensive analysis of Pilot-Job systems, critically assessing their motivations, evolution, properties, and implementation. The three main contributions of this article are as follows: (1) an analysis of the motivations and evolution of Pilot-Job systems; (2) an outline of the Pilot abstraction, its distinguishing logical components and functionalities, its terminology, and its architecture pattern; and (3) the description of core and auxiliary properties of Pilot-Job systems and the analysis of six exemplar Pilot-Job implementations. Together, these contributions illustrate the Pilot paradigm, its generality, and how it helps to address some challenges in distributed scientific computing.

International Conference on e-Science | 2017

High-Throughput Computing on High-Performance Platforms: A Case Study

Danila Oleynik; S. Panitkin; Matteo Turilli; Alessio Angius; Sarp H. Oral; K. De; Alexei Klimentov; J. C. Wells; Shantenu Jha

The computing systems used by LHC experiments have historically consisted of the federation of hundreds to thousands of distributed resources, ranging from small to mid-size resources. In spite of the impressive scale of the existing distributed computing solutions, the federation of small to mid-size resources will be insufficient to meet projected future demands. This paper is a case study of how the ATLAS experiment has embraced Titan, a DOE leadership facility, in conjunction with traditional distributed high-throughput computing to reach sustained production scales of approximately 52M core-hours a year. The three main contributions of this paper are: (i) a critical evaluation of design and operational considerations to support the sustained, scalable and production usage of Titan; (ii) a preliminary characterization of a next generation executor for PanDA to support new workloads and advanced execution modes; and (iii) early lessons for how current and future experimental and observational systems can be integrated with production supercomputers and other platforms in a general and extensible manner.

International Conference on e-Science | 2015

Federating Infrastructure as a Service Cloud Computing Systems to Create a Uniform E-infrastructure for Research

David Wallom; Matteo Turilli; Michel Drescher; Diego Scardaci; Steven Newhouse

This paper details the state of the art, the design, development and deployment of the EGI Federated Cloud platform, an e-infrastructure offering scalable and flexible models of utilization to the European research community. While continuing support for the traditional High Throughput Computing model, the EGI Cloud Platform extends its reach to other models of utilization such as long-lived services and on demand computation. Following a two-year period of development, the EGI Federated Cloud platform was officially launched in May 2014 offering resources provided by trusted academic and research organisations from within the user communities and consistently with their standard funding regime. Since then, the use cases supported have significantly increased both in total number and diversity of model of service required, validating both the choice of enforcing cloud technology agnosticism and of supporting service mobility and portability by means of open standards. These design choices have also allowed for the inclusion of commercial cloud providers into an infrastructure previously supported only by academic institutions. This contributes to a wider goal of funding agencies to create economic and social impact from supported research activities.

Journal of Computational Science | 2018

Synapse: Synthetic application profiler and emulator

Andre Merzky; Ming Tai Ha; Matteo Turilli; Shantenu Jha

Motivated by the need to emulate workload execution characteristics on high-performance and distributed heterogeneous resources, we introduce Synapse. Synapse is used as a proxy application (or “representative application”) for real workloads, with the advantage that it can be tuned in different ways and dimensions, and also at levels of granularity that are not possible with real applications. Synapse has a platform-independent application profiler, and has the ability to emulate profiled workloads on a variety of resources. Experiments show that the automated profiling performed using Synapse captures an application's characteristics with high fidelity. The emulation of an application using Synapse can reproduce the application's execution behavior in the original runtime environment, and can also reproduce that behavior in different runtime environments.

Archive | 2016

From Abstractions to MODELS: MOdels for Distributed and Extremely Large-scale Science

Shantenu Jha; Jon S Weissman; Matteo Turilli; Daniel S. Katz

Many important advances in science and engineering are due to large-scale distributed computing. Notwithstanding this reliance, we are still learning how to design and deploy large-scale production Distributed Computing Infrastructures (DCI). This is evidenced by missing design principles for DCI, and an absence of generally acceptable and usable distributed computing abstractions. These gaps underlie the following observations:

arXiv: Distributed, Parallel, and Cluster Computing | 2015