Flavio Vella | Researchain

Publication

Featured researches published by Flavio Vella.

international conference on computational science and its applications | 2010

The AES Implantation Based on OpenCL for Multi/many Core Architecture

Osvaldo Gervasi; Diego Russo; Flavio Vella

In this article we present a study on an implementation, named clAES, of the symmetric key cryptography algorithm Advanced Encryption Standard (AES) using the Open Computing Language (OpenCL) emerging standard. We will show a comparison of the results obtained benchmarking clAES on various multi/many core architectures. We will also introduce the basic concepts of AES and OpenCL in order to describe the details of the clAES implementation. This study represents a first step in a broader project whose final goal is to develop a full OpenSSL library implementation on heterogeneous computing devices such as multi-core CPUs and GPUs.

computing frontiers | 2016

Scalable betweenness centrality on multi-GPU systems

Massimo Bernaschi; Giancarlo Carbone; Flavio Vella

Betweenness Centrality (BC) is steadily growing in popularity as a metric of the influence of a vertex in a graph. The BC score of a vertex is proportional to the number of all-pairs-shortest-paths passing through it. However, complete and exact BC computation for a large-scale graph is an extraordinary challenge that requires high performance computing techniques to provide results in a reasonable amount of time. Our approach combines bi-dimensional (2-D) decomposition of the graph and multi-level parallelism together with a suitable data-thread mapping that overcomes most of the difficulties caused by the irregularity of the computation on GPUs. In order to reduce the time and space requirements of BC computation, a heuristic based on the 1-degree reduction technique is developed as well. Experimental results on synthetic and real-world graphs show that the proposed techniques are well suited to compute BC scores in graphs which are too large to fit in the memory of a single computational node.
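
To make the definition of the BC score concrete, here is a minimal sequential sketch of Brandes' algorithm for unweighted, undirected graphs. It is not the multi-GPU method described in the abstract: the 2-D decomposition, the data-thread mapping, and the 1-degree reduction are all omitted, and the function name and the adjacency-dictionary input format are assumptions made for illustration.

```python
from collections import deque


def betweenness_centrality(adj):
    """Sequential Brandes sketch for an unweighted, undirected graph.

    adj maps every vertex to an iterable of its neighbours (hypothetical
    input format). Returns a dict vertex -> BC score.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Forward phase: BFS from s, counting shortest paths (sigma).
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s] = 1
        dist[s] = 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair (s, t) is visited from both ends.
    return {v: score / 2.0 for v, score in bc.items()}


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
path_scores = betweenness_centrality(path)
star_scores = betweenness_centrality(star)
```

Each source vertex triggers one full traversal, which is why exact BC on large graphs is costly and why distributing the traversals across several GPUs is attractive.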

Journal of Parallel and Distributed Computing | 2015

Solutions to the st-connectivity problem using a GPU-based distributed BFS

Massimo Bernaschi; Giancarlo Carbone; Enrico Mastrostefano; Flavio Vella

The st-connectivity problem (ST-CON) is a decision problem that asks, for vertices s and t in a graph, if t is reachable from s. Although originally defined for directed graphs, it can also be studied on undirected graphs and used as a building block for solving more complex tasks on large scale graphs. We present solutions to ST-CON based on a high performance Breadth First Search (BFS) executed on clusters of Graphics Processing Units (GPUs) using the Nvidia CUDA platform. To measure performance, we use the number of ST-CONs per second. We present the results for two different implementations that highlight the impact of atomic operations in CUDA.

A parallel distributed algorithm to solve the st-connectivity problem. Based on an enhanced distributed Breadth First Search for GPUs. Two different implementations are proposed and compared. Number of st-connectivity problems solved per second (NSTPS): a metric to evaluate performance.
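
The decision problem itself can be illustrated with a small single-threaded BFS. This is only a sketch of ST-CON on one machine, not the distributed CUDA implementation from the paper; the function name and the adjacency-dictionary format are assumptions.

```python
from collections import deque


def st_connected(adj, s, t):
    """Return True if t is reachable from s in the directed graph adj.

    adj maps a vertex to an iterable of its out-neighbours (hypothetical
    input format); vertices without outgoing edges may be omitted.
    """
    if s == t:
        return True
    visited = {s}
    frontier = deque([s])
    while frontier:
        v = frontier.popleft()
        for w in adj.get(v, ()):
            if w == t:  # stop as soon as the target is discovered
                return True
            if w not in visited:
                visited.add(w)
                frontier.append(w)
    return False


graph = {0: [1], 1: [2], 2: [], 3: [0]}
forward = st_connected(graph, 0, 2)
backward = st_connected(graph, 2, 0)
via_three = st_connected(graph, 3, 2)
```

Because the search stops at the first visit to t, the cost of one query depends on how far t is from s, which is one reason a throughput metric such as queries per second is a natural way to compare implementations.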

international conference on computational science and its applications | 2012

A simulation framework for scheduling performance evaluation on CPU-GPU heterogeneous system

Flavio Vella; I. Neri; Osvaldo Gervasi; Sergio Tasso

Modern PCs are equipped with multi/many-core capabilities which enhance their computational power and raise important issues related to the efficiency of the scheduling processes of modern operating systems on such hybrid architectures. The aim of our work is to implement a simulation framework devoted to the study of the scheduling process in hybrid systems in order to improve system performance. Through the simulator we are able to model events and to evaluate the scheduling policy for heterogeneous systems. As a use case, we implemented a simple scheduling discipline, a non-preemptive priority queue.
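
A non-preemptive priority queue discipline can be sketched as a tiny event-driven simulation. The code below is an illustrative assumption, not the authors' framework: it models a single processing unit, and the job tuple layout, the function name, and the convention that a lower number means higher priority are all hypothetical.

```python
import heapq


def simulate(jobs):
    """Simulate non-preemptive priority scheduling on one processing unit.

    jobs is a list of (name, arrival, priority, service) tuples; a lower
    priority value is served first (assumed convention). Returns a list of
    (name, start, finish) in execution order.
    """
    pending = sorted(jobs, key=lambda job: job[1])  # order by arrival time
    ready = []       # heap of (priority, arrival, seq, name, service)
    schedule = []
    clock = 0
    nxt = 0          # index of the next job that has not arrived yet
    seq = 0          # tie-breaker so the heap never compares names
    while nxt < len(pending) or ready:
        # Admit every job that has arrived by the current time.
        while nxt < len(pending) and pending[nxt][1] <= clock:
            name, arrival, priority, service = pending[nxt]
            heapq.heappush(ready, (priority, arrival, seq, name, service))
            seq += 1
            nxt += 1
        if not ready:
            clock = pending[nxt][1]  # idle until the next arrival
            continue
        _, _, _, name, service = heapq.heappop(ready)
        start = clock
        clock += service  # the job runs to completion: no preemption
        schedule.append((name, start, clock))
    return schedule


jobs = [("A", 0, 2, 5), ("B", 1, 0, 2), ("C", 2, 1, 1)]
schedule = simulate(jobs)
```

In this run job B has the highest priority but arrives while A is executing, so it waits until A finishes; that waiting is the defining property of a non-preemptive discipline.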

Future Generation Computer Systems | 2017

Strategies and Systems Towards Grids and Clouds Integration: A DBMS-Based Solution

Mirko Mariotti; Osvaldo Gervasi; Flavio Vella; Alfredo Cuzzocrea; Alessandro Costantini

Cloud and Grid computing share some essential driving ideas although the computing and economic models are very different. In this paper, we propose different strategies for the interoperability of the Batch-oriented and Service-oriented computing models. In particular, we describe an innovative approach to connect Computational Grids and IaaS providers. This is achieved by introducing a simple and powerful DBMS-based system for deploying VM images from a Cloud environment in order to fulfill particular requests of task execution coming from a Grid environment. From a user point of view, resource authorization and access are kept unchanged, thus preserving the user experience related to the Grid. From the accounting point of view, in order to inform the Grid sites that a certain resource is available on a given Cloud-enabled Grid site, the information is published on the Grid information system. In this scenario, we are able to use the powerful job distribution capability of the Grid in order to allocate resources not only belonging to Grid clusters, but also with different architectures like GPUs, FPGAs and other systems. The target DBMS-based system has been designed to orchestrate a set of computing systems able to provide physical and virtual resources, creating a unified system in which the various user-submitted computing tasks are managed and optimized. The effectiveness of the proposed system is demonstrated by a series of experiments highlighting the benefits of our approach.

practical aspects of declarative languages | 2016

A GPU implementation of the ASP computation

Agostino Dovier; Andrea Formisano; Enrico Pontelli; Flavio Vella

General Purpose Graphical Processing Units (GPUs) are affordable multi-core platforms, providing access to a large number of cores, but at the price of a complex architecture with non-trivial synchronization and communication costs. This paper presents the design and implementation of a conflict-driven ASP solver that is capable of exploiting the parallelism offered by GPUs. The proposed system builds on the notion of ASP computation, which avoids the generation of unfounded sets, enhanced by conflict analysis and learning. The proposed system uses the CPU exclusively for input and output, in order to reduce the negative impact of the expensive data transfers between the CPU and the GPU. All the solving components, i.e., the management of nogoods, the search strategy, backjumping, the search heuristics, conflict analysis and learning, and unit propagation, are performed on the GPU, by exploiting Single Instruction Multiple Threads (SIMT) parallelism. The preliminary experimental results confirm the feasibility and scalability of the approach, and the potential to enhance the performance of ASP solvers.

arXiv: Distributed, Parallel, and Cluster Computing | 2018

Dynamic Merging of Frontiers for Accelerating the Evaluation of Betweenness Centrality

Flavio Vella; Massimo Bernaschi; Giancarlo Carbone

Betweenness Centrality (BC) is a widely used metric of the relevance of a node in a network. The fastest-known algorithm for the evaluation of BC on unweighted graphs builds a tree representing information about the shortest paths for each vertex to calculate its contribution to the BC score. Actually, for specific vertices, the shortest-path trees of neighboring nodes could be leveraged to reduce the computational burden, but existing BC algorithms do not exploit that information and carry out redundant computations. We propose a new algorithm, called dynamic merging of frontiers, which makes use of such information to derive the BC score of degree-2 vertices by re-using the results of the sub-trees of the neighbors. We implemented our idea in parallel fashion exploiting Graphics Processing Units. Compared to state-of-the-art implementations, our approach achieves a linear improvement in the number of degree-2 vertices and an average improvement of × over a variety of real-world graphs.

Betweenness Centrality (BC) is steadily growing in popularity as a metric of the influence of a vertex in a graph. The BC score of a vertex is proportional to the number of all-pairs-shortest-paths passing through it. However, complete and exact BC computation for a large-scale graph is an extraordinary challenge that requires high performance computing techniques to provide results in a reasonable amount of time. Our approach combines bi-dimensional (2-D) decomposition of the graph and multi-level parallelism together with a suitable data-thread mapping that overcomes most of the difficulties caused by the irregularity of the computation on GPUs. Furthermore, we propose novel heuristics which exploit the topology information of the graph in order to reduce the time and space requirements of BC computation. Experimental results on synthetic and real-world graphs show that the proposed techniques allow the BC computation of graphs which are too large to fit in the memory of a single computational node along with a significant reduction of the computing time.

ieee international conference on high performance computing data and analytics | 2017

Scaling betweenness centrality using communication-efficient sparse matrix multiplication

Edgar Solomonik; Maciej Besta; Flavio Vella; Torsten Hoefler

Betweenness centrality (BC) is a crucial graph problem that measures the significance of a vertex by the number of shortest paths leading through it. We propose Maximal Frontier Betweenness Centrality (MFBC): a succinct BC algorithm based on novel sparse matrix multiplication routines that performs a factor of p^(1/3) less communication on p processors than the best known alternatives, for graphs with n vertices and average degree k = n/p^(2/3). We formulate, implement, and prove the correctness of MFBC for weighted graphs by leveraging monoids instead of semirings, which enables a surprisingly succinct formulation. MFBC scales well for both extremely sparse and relatively dense graphs. It automatically searches a space of distributed data decompositions and sparse matrix multiplication algorithms for the most advantageous configuration. The MFBC implementation outperforms the well-known CombBLAS library by up to 8x and shows more robust performance. Our design methodology is readily extensible to other graph problems.
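
The link between graph traversal and sparse matrix multiplication can be seen in a much simpler setting than MFBC: one BFS step is a sparse matrix-vector product over the Boolean (OR, AND) semiring. The sketch below only illustrates that general idea on an unweighted graph; it does not reproduce MFBC, its monoid formulation for weighted graphs, or its distributed decompositions, and the column-oriented dictionary format and function name are assumptions.

```python
def bfs_levels_spmv(columns, n, source):
    """BFS levels computed as repeated sparse matrix-vector products.

    columns[j] is the set of rows i with A[i][j] != 0, that is, the
    vertices reachable from j by one edge (hypothetical sparse format).
    Vertices are numbered 0..n-1. Unreachable vertices keep level -1.
    """
    level = [-1] * n
    level[source] = 0
    frontier = {source}  # sparse Boolean vector x
    depth = 0
    while frontier:
        depth += 1
        # y = A x over (OR, AND): union of the columns selected by x.
        reached = set()
        for j in frontier:
            reached |= columns.get(j, set())
        # Mask out vertices that already have a level assigned.
        frontier = {i for i in reached if level[i] < 0}
        for i in frontier:
            level[i] = depth
    return level


columns = {0: {1, 2}, 1: {3}, 2: {3}, 3: set()}
levels = bfs_levels_spmv(columns, 5, 0)
```

Processing many sources at once turns the vector into a sparse matrix of frontiers, so the traversal becomes a sparse matrix-matrix product; the communication cost of that product is what the abstract's p^(1/3) factor refers to.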

irregular applications: architectures and algorithms | 2015

Betweenness centrality on Multi-GPU systems

Massimo Bernaschi; Giancarlo Carbone; Flavio Vella

Betweenness Centrality (BC) is steadily growing in popularity as a metric of the influence of a vertex in a graph. The exact BC computation for a large scale graph is an extraordinary challenge and requires high performance computing techniques to provide results in a reasonable amount of time. Here, we present the techniques we developed to speed up the computation of the BC on Multi-GPU systems. Our approach combines the bi-dimensional (2-D) decomposition of the graph and multi-level parallelism. Experimental results show that the proposed techniques are well suited to compute BC scores in graphs which are too large to fit in single GPU memory. In particular, the computation time of a graph with 234 million edges is reduced to less than 2 hours.

irregular applications: architectures and algorithms | 2017

Accelerating Energy Games Solvers on Modern Architectures

Andrea Formisano; Raffaella Gentilini; Flavio Vella

Quantitative games, where quantitative objectives are defined on weighted game arenas, provide natural tools for designing faithful models of embedded controllers. Instances of these games are the so-called Energy Games. Starting from a sequential baseline implementation, we investigate the use of the massively parallel data computation capabilities supported by modern GPUs to solve the initial credit problem for Energy Games. We present different parallel implementations on multi-core CPU and GPU systems. Our solution outperforms the baseline implementation by up to 36x and obtains a faster convergence time on real-world graphs.
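
For readers unfamiliar with the initial credit problem, the following sequential sketch uses the standard value-iteration (progress-measure lifting) formulation of energy games. It is an assumption that this matches the baseline mentioned in the abstract, and it is certainly not the parallel CPU or GPU implementation. The arena encoding, the function name, and the requirement that every vertex has at least one outgoing edge are choices made for this example.

```python
import math

TOP = math.inf  # marks vertices from which no finite credit suffices


def minimal_initial_credit(owner, edges):
    """Least initial energy per vertex that lets player 0 stay non-negative.

    owner[v] is 0 (energy player) or 1 (opponent); edges[v] is a non-empty
    list of (successor, weight) pairs (hypothetical arena encoding).
    """
    # Upper bound on any finite credit: sum of the worst loss at each vertex.
    bound = sum(max(0, -min(w for _, w in edges[v])) for v in edges)

    def minus(credit, weight):
        # Credit needed before taking an edge, truncated at 0 and at TOP.
        if credit == TOP:
            return TOP
        needed = max(0, credit - weight)
        return needed if needed <= bound else TOP

    credit = {v: 0 for v in edges}
    changed = True
    while changed:  # iterate the lifting operator up to its least fixpoint
        changed = False
        for v in edges:
            options = [minus(credit[u], w) for u, w in edges[v]]
            # Player 0 picks the cheapest edge, the opponent the costliest.
            new = min(options) if owner[v] == 0 else max(options)
            if new > credit[v]:
                credit[v] = new
                changed = True
    return credit


losing = minimal_initial_credit({"a": 0}, {"a": [("a", -1)]})
cycle = minimal_initial_credit(
    {"x": 0, "y": 1}, {"x": [("y", -2)], "y": [("x", 2)]}
)
choice = minimal_initial_credit(
    {"v": 0, "z": 0}, {"v": [("v", -1), ("z", -3)], "z": [("z", 0)]}
)
```

Every vertex update reads only the current values of its successors, so the updates within one sweep are largely independent of each other, which is what makes this kind of computation a candidate for data-parallel hardware.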
