Vito Giovanni Castellana

Publication

Featured research published by Vito Giovanni Castellana.

adaptive hardware and systems | 2011

A runtime adaptive controller for supporting hardware components with variable latency

Christian Pilato; Vito Giovanni Castellana; Silvia Lovergine; Fabrizio Ferrandi

Nowadays, the design of hardware cores must deal with unpredictable components, due to process variation or to the interaction with external modules (e.g., memories, sensors, IP cores). Adaptive systems are thus one of the most important alternatives to traditional approaches based on design-time analysis, especially in critical environments. In this paper, we present an innovative lightweight controller architecture able to automatically adjust its behavior at run-time. It interacts with the surrounding environment by means of a simple token-based communication scheme. We examine the capabilities of the proposed architectural model to adapt its behavior during execution, compared to classical ones, such as the finite state machine.

IEEE Micro | 2014

Scaling Semantic Graph Databases in Size and Performance

Alessandro Morari; Vito Giovanni Castellana; Oreste Villa; Antonino Tumeo; Jesse Weaver; David J. Haglin; Sutanay Choudhury; John Feo

GEMS is a full software system that implements a large-scale, semantic graph database on commodity clusters. Its framework comprises a SPARQL-to-C++ compiler, a library of distributed data structures, and a custom multithreaded runtime library. The authors evaluated their software stack on the Berlin SPARQL benchmark with datasets of up to 10 billion graph edges, demonstrating scaling in dataset size and performance as they added cluster nodes.

IEEE Computer | 2015

In-Memory Graph Databases for Web-Scale Data

Vito Giovanni Castellana; Alessandro Morari; Jesse Weaver; Antonino Tumeo; David J. Haglin; Oreste Villa; John Feo

A software stack relies primarily on graph-based methods to implement scalable resource description framework databases on top of commodity clusters, providing an inexpensive way to extract meaning from volumes of heterogeneous data.

ieee symposium on large data analysis and visualization | 2015

A visual analytics paradigm enabling trillion-edge graph exploration

Pak Chung Wong; David J. Haglin; David S. Gillen; Daniel Chavarria; Vito Giovanni Castellana; Cliff Joslyn; Alan R. Chappell; Song Zhang

We present a visual analytics paradigm and a system prototype for exploring Web-scale graphs. A Web-scale graph is described as a graph with ~one trillion edges and ~50 billion vertices. While there is an aggressive R&D effort in processing and exploring Web-scale graphs among Internet vendors such as Facebook and Google, visualizing a graph of that scale remains an underexplored R&D area. The paper describes a nontraditional peek-and-filter strategy that facilitates the exploration of a graph database of unprecedented size for visualization and analytics. We demonstrate that our system prototype can (1) preprocess a graph with ~25 billion edges in less than two hours and (2) support database query and interactive visualization on the processed graph database afterward. Based on our computational performance results, we argue that we most likely will achieve the one trillion edge mark (a computational performance improvement of 40 times) for graph visual analytics in the near future.

design, automation, and test in europe | 2013

Scheduling independent liveness analysis for register binding in high level synthesis

Vito Giovanni Castellana; Fabrizio Ferrandi

Classical techniques for register allocation and binding require the definition of the program execution order, since a partial ordering relation between operations must be induced to perform liveness analysis through data-flow equations. In High Level Synthesis (HLS) flows this is commonly obtained through the scheduling task. However, for some HLS approaches, such a relation can be difficult to compute, or not statically computable at all, and adopting conventional register binding techniques, even when feasible, cannot guarantee maximum performance. To overcome these issues we introduce a novel scheduling-independent liveness analysis methodology, suitable for dynamic scheduling architectures. Such liveness analysis is exploited in register binding using standard graph coloring techniques, and unlike other approaches it avoids the insertion of structural dependencies, introduced to prevent run-time resource conflicts in dynamic scheduling environments. The absence of additional dependencies avoids performance degradation and makes parallelism exploitation independent of the register binding task, while on average not affecting area, as shown by the experimental results.
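
The graph coloring step mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's method: the conflict graph is assumed to be given (the abstract does not detail how the scheduling-independent liveness analysis derives it), and the function name `bind_registers`, the greedy ordering, and the sample values are all hypothetical.

```python
def bind_registers(conflicts):
    """Greedy graph-coloring register binding.

    `conflicts` maps each value name to the set of values that may be
    live at the same time (assumed symmetric). Returns a dict mapping
    each value to a register index such that conflicting values never
    share a register.
    """
    binding = {}
    # Heuristic (an assumption): color the most constrained values first;
    # ties are broken by name so the result is deterministic.
    order = sorted(conflicts, key=lambda v: (-len(conflicts[v]), v))
    for value in order:
        # Registers already taken by conflicting values.
        used = {binding[n] for n in conflicts[value] if n in binding}
        reg = 0
        while reg in used:
            reg += 1
        binding[value] = reg
    return binding


# Hypothetical conflict graph: "a" overlaps with "b" and "c"; "d" is free.
example_conflicts = {
    "a": {"b", "c"},
    "b": {"a"},
    "c": {"a"},
    "d": set(),
}
example_binding = bind_registers(example_conflicts)
```

In this example `b` and `c` can share a register because they never conflict with each other, so two registers suffice for four values.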

international conference on big data | 2013

Accelerating semantic graph databases on commodity clusters

Alessandro Morari; Vito Giovanni Castellana; David J. Haglin; John Feo; Jesse Weaver; Antonino Tumeo; Oreste Villa

We are developing a full software system for accelerating semantic graph databases on commodity clusters that scales to hundreds of nodes while maintaining constant query throughput. Our framework comprises a SPARQL-to-C++ compiler, a library of parallel graph methods, and a custom multithreaded runtime layer, which provides a Partitioned Global Address Space (PGAS) programming model with fork/join parallelism and automatic load balancing over commodity clusters. We present preliminary results for the compiler and for the runtime.

ieee international conference on high performance computing data and analytics | 2012

Abstract: Speeding-Up Memory Intensive Applications through Adaptive Hardware Accelerators

Vito Giovanni Castellana; Fabrizio Ferrandi

Heterogeneous architectures are becoming an increasingly relevant component for High-Performance Computing: they combine the computational power of multi-core processors with the flexibility of reconfigurable co-processor boards. Such boards are often composed of a set of standard Field Programmable Gate Arrays (FPGAs), coupled with a distributed memory architecture. This allows the concurrent execution of memory access operations. Nevertheless, since the execution latency of these operations may be unknown at compile-time, the synthesis of such parallelizing accelerators becomes a complex task. In fact, standard approaches require the construction of Finite State Machines (FSMs) whose complexity, in terms of number of states and transitions, increases exponentially with respect to the number of unbounded operations that may execute concurrently. We propose an adaptive architecture for such accelerators that overcomes this limitation, while exploiting the available parallelism. The proposed design methodology is compared with FSM-based approaches by means of a motivational example.
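
The exponential growth mentioned in the abstract can be illustrated with a deliberately simplified model, which is an assumption and not the paper's formulation: a centralized controller that must remember which of `num_ops` concurrent unbounded-latency operations have already completed needs one state per done/pending combination. The function names below are hypothetical.

```python
from itertools import product


def completion_states(num_ops):
    """List every done/pending combination for `num_ops` concurrent operations.

    Each tuple holds one boolean per operation (True = completed).
    Simplified model: a real FSM may need more or fewer states
    depending on the synthesis flow.
    """
    return list(product((False, True), repeat=num_ops))


def count_states(num_ops):
    """Number of combinations a centralized controller must distinguish."""
    return len(completion_states(num_ops))


# State counts for 1 to 10 concurrent operations.
growth = [count_states(n) for n in range(1, 11)]
```

Under this model the count doubles with every additional concurrent operation, which is the kind of growth a distributed, adaptive controller is meant to avoid.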

design, automation, and test in europe | 2014

An adaptive Memory Interface Controller for improving bandwidth utilization of hybrid and reconfigurable systems

Vito Giovanni Castellana; Antonino Tumeo; Fabrizio Ferrandi

Data mining, bioinformatics, knowledge discovery, and social network analysis are emerging irregular applications that exploit data structures based on pointers or linked lists, such as graphs, unbalanced trees or unstructured grids. These applications are characterized by unpredictable memory accesses and are generally memory bandwidth bound, but they also present large amounts of inherent dynamic parallelism because they can potentially spawn concurrent activities for each of the elements they are exploring. Hybrid architectures, which integrate general purpose processors with reconfigurable devices, appear to be promising target platforms for accelerating irregular applications. These systems often connect to distributed and multi-ported memories, potentially enabling parallel memory operations. However, these memory architectures introduce several challenges, such as the necessity to manage concurrency and synchronization to avoid structural conflicts on shared memory locations and to guarantee consistency. In this paper we present an adaptive Memory Interface Controller (MIC) that addresses these issues. The MIC is a general and customizable solution that can target several different memory structures, and is suitable for High Level Synthesis frameworks. It implements a dynamic arbitration scheme, which avoids conflicts on memory resources at runtime, and supports atomic memory operations, commonly exploited for synchronization directives in parallel programming paradigms. The MIC simultaneously maps multiple accesses to different memory ports, allowing fine grained parallelism exploitation and ensuring correctness even in the presence of irregular and statically unpredictable memory access patterns. We evaluated the effectiveness of our approach on a typical irregular kernel, graph Breadth First Search (BFS), exploring different design alternatives.
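
For readers unfamiliar with the kernel, a plain software BFS is sketched below. It is a sequential reference written for illustration only, not the hardware design evaluated in the paper; the function name `bfs_levels` and the sample graph are hypothetical. The neighbor lookups depend on the input data, which is what makes the memory access pattern irregular and statically unpredictable.

```python
from collections import deque


def bfs_levels(adjacency, source):
    """Return the BFS distance (in edges) from `source` to each reachable vertex.

    `adjacency` maps a vertex to an iterable of its neighbors.
    """
    levels = {source: 0}
    frontier = deque([source])
    while frontier:
        vertex = frontier.popleft()
        # Data-dependent access: which neighbor lists are read is only
        # known at run time. In a parallel design, each neighbor could
        # be explored by a concurrent activity.
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in levels:
                levels[neighbor] = levels[vertex] + 1
                frontier.append(neighbor)
    return levels


# Hypothetical sample graph; vertex 5 is unreachable from vertex 0.
sample_graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: [], 5: [0]}
sample_levels = bfs_levels(sample_graph, 0)
```

In a parallel version, the check-and-update on `levels` is the step that would need an atomic memory operation, since two activities may reach the same vertex at once.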

international parallel and distributed processing symposium | 2016

GraQL: A Query Language for High-Performance Attributed Graph Databases

Daniel G. Chavarría-Miranda; Vito Giovanni Castellana; Alessandro Morari; David J. Haglin; John Feo

Graph databases are becoming a critical tool for the analysis of graph-structured data in the context of multiple scientific and technical domains, including cybersecurity and computational biology. In particular, the storage, analysis and querying of attributed graphs are very important capabilities. Attributed graphs contain properties attached to the vertices and edges of the graph structure. Queries over attributed graphs include not only structural pattern matching, but also conditions over the values of the attributes. In this work, we present GraQL, a query language designed for high-performance attributed graph databases hosted on a high memory capacity cluster. GraQL is designed to be the front-end language for the attributed graph data model for the GEMS database system.

field-programmable technology | 2013

An automated flow for the High Level Synthesis of coarse grained parallel applications

Vito Giovanni Castellana; Fabrizio Ferrandi

High Level Synthesis (HLS) provides a way to significantly enhance the productivity of embedded system designers, by enabling the automatic or semiautomatic generation of hardware accelerators starting from high level descriptions with (usually software) programming languages. Typical HLS approaches build a centralized Finite State Machine (FSM) to control the generated datapath, performing the operations according to a pre-determined, static schedule. However, FSM-based approaches are only able to extract parallelism within a single execution flow. In the presence of coarse grained parallelism, in the form of concurrent function calls or parallel control structures, they either serialize all the operations, or build excessively complex controllers, aiming at executing as many operations as possible in a single control step (i.e., they try to extract as much instruction level parallelism as possible). The resulting controllers occupy an excessive amount of area or lead to very low operating frequencies. In this paper we propose a methodology for the HLS of accelerators supporting parallel execution and dynamic scheduling. The approach exploits an adaptive distributed controller, composed of a set of communicating elements associated with each operation. This controller design enables supporting multiple concurrent execution flows, thus increasing parallelism exploitation beyond instruction level parallelism. The approach also supports variable latency operations, such as memory accesses and speculative operations. We apply our methodology to a set of typical HLS benchmarks, and demonstrate valuable speedups with limited area overheads with respect to conventional FSM-based flows.
