Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Fabrizio Marozzo is active.

Publication


Featured research published by Fabrizio Marozzo.


Journal of Computer and System Sciences | 2012

P2P-MapReduce: Parallel data processing in dynamic Cloud environments

Fabrizio Marozzo; Domenico Talia; Paolo Trunfio

MapReduce is a programming model for parallel data processing widely used in Cloud computing environments. Current MapReduce implementations are based on centralized master-slave architectures that do not cope well with dynamic Cloud infrastructures, like a Cloud of clouds, in which nodes may join and leave the network at high rates. We have designed an adaptive MapReduce framework, called P2P-MapReduce, which exploits a peer-to-peer model to manage node churn, master failures, and job recovery in a decentralized but effective way, so as to provide a more reliable MapReduce middleware that can be effectively exploited in dynamic Cloud infrastructures. This paper describes the P2P-MapReduce system providing a detailed description of its basic mechanisms, a prototype implementation, and an extensive performance evaluation in different network scenarios. The performance results confirm the good fault tolerance level provided by the P2P-MapReduce framework compared to a centralized implementation of MapReduce, as well as its limited impact in terms of network overhead.
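
The map/reduce model the paper builds on can be illustrated with a minimal, framework-independent word-count sketch in plain Python (this is the general programming model, not the P2P-MapReduce API):

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit (word, 1) pairs from each input document."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each key."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["map reduce", "map cloud"]
result = reduce_phase(map_phase(docs))
# result == {"map": 2, "reduce": 1, "cloud": 1}
```

In a real deployment the map and reduce calls run on distributed workers; the decentralized coordination of those workers is what P2P-MapReduce adds on top of this model.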


Grid Computing | 2014

ServiceSs: An Interoperable Programming Framework for the Cloud

Francesc Lordan; Enric Tejedor; Jorge Ejarque; Roger Rafanell; Javier Alvarez; Fabrizio Marozzo; Daniele Lezzi; Raül Sirvent; Domenico Talia; Rosa M. Badia

The rise of virtualized and distributed infrastructures has led to new challenges in making effective use of compute resources through the design and orchestration of distributed applications. As legacy, monolithic applications are replaced with service-oriented applications, questions arise about the steps to be taken to maximize the usefulness of the infrastructures and to provide users with tools for the development and execution of distributed applications. One of the issues to be solved is the existence of multiple cloud solutions that are not interoperable, which forces users either to be locked to a specific provider or to continuously adapt their applications. With the objective of simplifying programmers' challenges, ServiceSs provides a straightforward programming model and an execution framework that help abstract applications from the actual execution environment. This paper presents how ServiceSs transparently interoperates with multiple providers, implementing the appropriate interfaces to execute scientific applications on federated clouds.


Archive | 2010

A Peer-to-Peer Framework for Supporting MapReduce Applications in Dynamic Cloud Environments

Fabrizio Marozzo; Domenico Talia; Paolo Trunfio

MapReduce is a programming model widely used in Cloud computing environments for processing large data sets in a highly parallel way. MapReduce implementations are based on a master-slave model. The failure of a slave is managed by re-assigning its task to another slave, while master failures are not managed by current MapReduce implementations, as designers consider failures unlikely in reliable Cloud systems. On the contrary, node failures – including master failures – are likely to happen in dynamic Cloud scenarios, where computing nodes may join and leave the network at an unpredictable rate. Therefore, providing effective mechanisms to manage master failures is fundamental to exploit the MapReduce model in the implementation of data-intensive applications in those dynamic Cloud environments where current MapReduce implementations could be unreliable. The goal of our work is to extend the master-slave architecture of current MapReduce implementations to make it more suitable for dynamic Cloud scenarios. In particular, in this chapter, we present a Peer-to-Peer (P2P)-MapReduce framework that exploits a P2P model to manage participation of intermittent nodes, master failures, and MapReduce job recovery in a decentralized but effective way.
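
The master-failure handling motivated above can be pictured as a primary/backup failover loop: backups monitor the master's heartbeat and the first live backup takes over when it goes silent. The sketch below is a toy simulation of that idea; the node names, timeout, and election rule are illustrative, not the framework's actual protocol:

```python
import time

class Node:
    def __init__(self, name):
        self.name = name
        self.last_heartbeat = time.monotonic()
        self.alive = True

    def heartbeat(self):
        self.last_heartbeat = time.monotonic()

def elect_master(master, backups, timeout=1.0):
    """Keep the master if it is live and recent; otherwise promote a backup."""
    if master.alive and time.monotonic() - master.last_heartbeat <= timeout:
        return master
    for candidate in backups:
        if candidate.alive:
            return candidate  # backup takes over the job state
    raise RuntimeError("no live master candidate")

master = Node("m1")
backup = Node("m2")
master.alive = False          # simulate a master crash
new_master = elect_master(master, [backup])
# new_master.name == "m2"
```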


IEEE International Conference on Cloud Computing Technology and Science | 2011

A Cloud Framework for Parameter Sweeping Data Mining Applications

Fabrizio Marozzo; Domenico Talia; Paolo Trunfio

Data mining techniques are used in many application areas to extract useful knowledge from large datasets. Very often, parameter sweeping is used in data mining applications to explore the effects produced on the data analysis result by different values of the algorithm parameters. Parameter sweeping applications can be highly compute-intensive, since the number of single tasks to be executed increases with the number of swept parameters and the range of their values. Cloud technologies can be effectively exploited to provide end-users with the computing and storage resources and the execution mechanisms needed to efficiently run this class of applications. In this paper, we present a Data Mining Cloud App framework that supports the execution of parameter sweeping data mining applications on a Cloud. The framework has been implemented using the Windows Azure platform, and evaluated through a set of parameter sweeping clustering and classification applications. The experimental results demonstrate the effectiveness of the proposed framework, as well as the scalability that can be achieved through the parallel execution of parameter sweeping applications on a pool of virtual servers.
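
Parameter sweeping as described here amounts to enumerating the Cartesian product of the swept parameter values and running one independent mining task per combination. A minimal sketch, in which the parameters (a cluster count and a distance metric) and the task body are hypothetical stand-ins for a real mining job:

```python
from itertools import product
from concurrent.futures import ThreadPoolExecutor

def mining_task(k, metric):
    """Placeholder for one independent mining run (e.g. a clustering job)."""
    return {"k": k, "metric": metric}

ks = [2, 3, 5]                        # swept algorithm parameter
metrics = ["euclidean", "manhattan"]  # swept algorithm parameter

combos = list(product(ks, metrics))   # one task per combination
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda c: mining_task(*c), combos))
# 3 values of k x 2 metrics = 6 independent tasks
```

Because the tasks share no state, the pool here could just as well be a set of cloud virtual servers, which is what gives this class of applications its near-linear scalability.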


Concurrency and Computation: Practice and Experience | 2015

JS4Cloud: script-based workflow programming for scalable data analysis on cloud platforms

Fabrizio Marozzo; Domenico Talia; Paolo Trunfio

Workflows are an effective paradigm to model complex data analysis processes, such as knowledge discovery in databases applications, which can be efficiently executed on distributed computing systems such as a Cloud platform. Data analysis workflows can be designed through visual programming, which is a convenient design approach for high-level users. On the other hand, script-based workflows are a useful alternative to visual workflows, because they allow expert users to program complex applications more effectively. In order to provide Cloud users with an effective script-based data analysis workflow formalism, we designed the JS4Cloud language. The main benefits of JS4Cloud are as follows: (i) it extends the well-known JavaScript language while using only its basic functions (arrays, functions, and loops); (ii) it implements both a data-driven task parallelism that automatically spawns ready-to-run tasks to the Cloud resources and data parallelism through an array-based formalism; and (iii) these two types of parallelism are exploited implicitly so that workflows can be programmed in a fully sequential way, which frees users from duties like work partitioning, synchronization, and communication. We describe how JS4Cloud has been integrated within the data mining cloud framework (DMCF), a system supporting the scalable execution of data analysis workflows on Cloud platforms. In particular, we describe how data analysis workflows modeled as JS4Cloud scripts are processed by DMCF by exploiting parallelism to enable their scalable execution on Clouds. Finally, we present some data analysis workflows developed with JS4Cloud and the performance results obtained by executing such workflows on DMCF.
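
JS4Cloud itself is JavaScript-based, but the data-driven idea it describes (the script reads sequentially, while each task actually starts as soon as the data it consumes is available) can be mimicked in plain Python with futures. The step names below are illustrative, not JS4Cloud primitives:

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor()

def filter_data(raw):
    """Hypothetical first workflow step: keep positive values."""
    return [x for x in raw if x > 0]

def cluster(data):
    """Hypothetical second step, consuming the first step's output."""
    return {"clusters": len(data)}

# The script is written sequentially, but each step is submitted as a
# task that runs as soon as the future it reads resolves.
raw = [3, -1, 4, 1, -5]
filtered = pool.submit(filter_data, raw)
model = pool.submit(lambda f: cluster(f.result()), filtered)
print(model.result())  # {'clusters': 3}
pool.shutdown()
```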


International Conference on Parallel Processing | 2012

Enabling cloud interoperability with COMPSs

Fabrizio Marozzo; Francesc Lordan; Roger Rafanell; Daniele Lezzi; Domenico Talia; Rosa M. Badia

The advent of Cloud computing has given researchers the ability to access resources that satisfy their growing needs, which could not be satisfied by traditional computing resources such as PCs and locally managed clusters. On the other hand, this ability has opened new challenges for the execution of their computational work and the management of massive amounts of data on resources provided by different private and public infrastructures. COMP Superscalar (COMPSs) is a programming framework that provides a programming model and a runtime that ease the development of applications for distributed environments and their execution on a wide range of computational infrastructures. COMPSs has recently been extended to be interoperable with several cloud technologies like Amazon, OpenNebula, Emotive and other OCCI-compliant offerings. This paper presents the extensions of this interoperability layer to support the execution of COMPSs applications on the Windows Azure Platform. The framework has been evaluated by porting a data mining workflow to COMPSs and executing it on a hybrid testbed.


International Conference on Parallel Processing | 2012

Using clouds for scalable knowledge discovery applications

Fabrizio Marozzo; Domenico Talia; Paolo Trunfio

Cloud platforms provide scalable processing and data storage and access services that can be exploited for implementing high-performance knowledge discovery systems and applications. This paper discusses the use of Clouds for the development of scalable distributed knowledge discovery applications. Service-oriented knowledge discovery concepts are introduced, and a framework for supporting high-performance data mining applications on Clouds is presented. The system architecture, its implementation, and current work aimed at supporting the design and execution of knowledge discovery applications modeled as workflows are described.


International Conference on Spatial Data Mining and Geographical Knowledge Services | 2015

Following soccer fans from geotagged tweets at FIFA World Cup 2014

Eugenio Cesario; Chiara Congedo; Fabrizio Marozzo; Gianni Riotta; Alessandra Spada; Domenico Talia; Paolo Trunfio; Carlo Turri

The world-wide scale of social networks such as Facebook and Twitter is making it possible to analyse the real-time behaviour of large groups of people, such as those attending popular events. This paper presents work and results on the analysis of geotagged tweets carried out to understand the behaviour of people attending the 2014 FIFA World Cup. We monitored the Twitter users attending the World Cup matches to discover the most frequent movements of fans during the competition. The data source consists of all geotagged tweets collected during the 64 matches of the World Cup from June 12 to July 13, 2014. For each match we considered only the geotagged tweets whose coordinates fell within the stadium area during the match. Then, we carried out a trajectory pattern mining analysis on the considered set of tweets. Original results were obtained in terms of the number of matches attended by groups of fans, clusters of the most attended matches, and the most frequented stadiums.
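
The per-match selection step described above (keep only tweets whose coordinates fall inside a stadium's area during the match) can be sketched as a simple bounding-box and time-window filter. The coordinates, times, and field names below are made up for illustration:

```python
from datetime import datetime

def in_bbox(lat, lon, bbox):
    """bbox = (lat_min, lat_max, lon_min, lon_max)."""
    lat_min, lat_max, lon_min, lon_max = bbox
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max

def tweets_at_match(tweets, stadium_bbox, start, end):
    """Keep geotagged tweets inside the stadium area during the match."""
    return [t for t in tweets
            if in_bbox(t["lat"], t["lon"], stadium_bbox)
            and start <= t["time"] <= end]

# Illustrative data only (not real stadium coordinates or match times).
bbox = (-22.92, -22.91, -43.24, -43.22)
start = datetime(2014, 7, 13, 16, 0)
end = datetime(2014, 7, 13, 18, 0)
tweets = [
    {"user": "a", "lat": -22.915, "lon": -43.23,
     "time": datetime(2014, 7, 13, 17, 0)},
    {"user": "b", "lat": -22.95, "lon": -43.23,
     "time": datetime(2014, 7, 13, 17, 0)},
]
inside = tweets_at_match(tweets, bbox, start, end)
# only user "a" falls inside the bounding box during the match
```

The sequence of stadiums a user appears in across matches then forms the trajectory that the pattern mining step operates on.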


International Conference on Bioinformatics | 2013

Cloud4SNP: Distributed Analysis of SNP Microarray Data on the Cloud

Giuseppe Agapito; Mario Cannataro; Pietro Hiram Guzzi; Fabrizio Marozzo; Domenico Talia; Paolo Trunfio

Pharmacogenomics studies the impact of patients' genetic variation on drug responses and searches for correlations between gene expression or Single Nucleotide Polymorphisms (SNPs) in patients' genomes and the toxicity or efficacy of a drug. SNP data, produced by microarray platforms, need to be preprocessed and analyzed in order to find correlations between the presence/absence of SNPs and the toxicity or efficacy of a drug. Due to the large number of samples and the high resolution of the instruments, the data to be analyzed can be very large, requiring high-performance computing. This paper presents the design and experimentation of Cloud4SNP, a novel Cloud-based bioinformatics tool for the parallel preprocessing and statistical analysis of pharmacogenomics SNP microarray data. Experimental evaluation shows good speed-up and scalability. Moreover, the availability on the Cloud platform makes it possible to elastically meet the requirements of small as well as very large pharmacogenomics studies.
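
The core statistical step (correlating SNP presence/absence with drug response) reduces to building a 2x2 contingency table per SNP, which is then fed to a statistical test. A toy sketch of the counting step, with made-up SNP identifiers and sample records, not Cloud4SNP's actual pipeline:

```python
from collections import Counter

def contingency(samples, snp):
    """2x2 table keyed by (has SNP?, responded to drug?)."""
    table = Counter()
    for s in samples:
        table[(snp in s["snps"], s["responder"])] += 1
    return table

# Illustrative samples: the SNP names and outcomes are invented.
samples = [
    {"snps": {"rs1", "rs2"}, "responder": True},
    {"snps": {"rs1"},        "responder": True},
    {"snps": {"rs2"},        "responder": False},
    {"snps": set(),          "responder": False},
]
t = contingency(samples, "rs1")
# t[(True, True)] == 2 and t[(False, False)] == 2: in this toy data,
# carrying rs1 coincides perfectly with responding to the drug.
```

Since each SNP's table is computed independently, the per-SNP work partitions naturally across cloud workers, which is what makes the parallel analysis scale.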


Parallel, Distributed and Network-Based Processing | 2011

A Framework for Managing MapReduce Applications in Dynamic Distributed Environments

Fabrizio Marozzo; Domenico Talia; Paolo Trunfio

MapReduce is a programming model widely used in data centers for processing large data sets in a highly parallel way. Current MapReduce systems are based on master-slave architectures that do not cope well with dynamic node participation, since they are mostly designed for conventional parallel computing platforms. On the contrary, in Internet-based computing environments, node churn and failures - including master failures - are likely to happen since nodes join and leave the network at an unpredictable rate. The goal of this work is to enable the use of MapReduce in dynamic distributed environments so as to combine the effectiveness of a well-established programming model with the scalability of a large-scale computing infrastructure. This paper presents an adaptive MapReduce framework, called P2P-MapReduce, which exploits a peer-to-peer model to manage intermittent node participation, master failures and job recovery in a decentralized but effective way, so as to provide a more robust MapReduce middleware that can be effectively exploited in Internet-scale dynamic distributed environments.

Collaboration


Dive into Fabrizio Marozzo's collaboration.

Top Co-Authors

Eugenio Cesario

ICAR-CNR, Italy

Albino Altomare

ICAR-CNR, Italy

Carmela Comito

ICAR-CNR, Italy

Javier Garcia Blas

University Carlos III of Madrid

Daniele Lezzi

Barcelona Supercomputing Center
