Alessandro Marongiu
Sapienza University of Rome
Publication
Featured research published by Alessandro Marongiu.
IEEE Transactions on Software Engineering | 2000
Alessandro Marongiu; Paolo Palazzari
The automatic extraction of parallelism from algorithms, and the consequent parallel code generation, is a challenging problem. We present a procedure for automatic parallel code generation for algorithms described through a SARE (Set of Affine Recurrence Equations). Starting from the original SARE description in an N-dimensional iteration space, the algorithm is converted into parallel code for a (possibly virtual) m-dimensional distributed memory parallel machine (m<N). We prove several theorems which form the mathematical basis for the proposed parallel generation tool. The projection technique used in the tool is based on the polytope model. Affine transformations are introduced to project the polytope from the original iteration space onto another polytope in the time-processor (t,p) space, preserving the SARE semantics. Points in (t,p) are identified through the m-dimensional p coordinate and the n-dimensional t coordinate, with N=n+m. Along with the polytope transformation, a methodology to generate the code within processors is given. Finally, we introduce a cost function, derived from the actual implementation of the method on an MPP SIMD machine, which guides the heuristic search for the polytope transformation.
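The space-time mapping described above can be illustrated with a minimal sketch: each point of an N-dimensional iteration polytope is sent to time-processor coordinates (t,p) by a unimodular transformation. The 2D example and the matrix T below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy 2D iteration space for a SARE-like loop nest:
# for i in range(4): for j in range(4): body(i, j)
points = [(i, j) for i in range(4) for j in range(4)]

# Illustrative unimodular space-time transformation (not the paper's):
# t = i + j (wavefront schedule), p = j (processor coordinate).
T = np.array([[1, 1],   # first row  -> time t
              [0, 1]])  # second row -> processor p
assert round(abs(np.linalg.det(T))) == 1  # unimodular: bijective on Z^2

mapping = {pt: tuple(T @ np.array(pt)) for pt in points}

# Points sharing the same t are independent under the wavefront
# schedule and can run in parallel on different processors p.
by_time = {}
for (i, j), (t, p) in mapping.items():
    by_time.setdefault(t, []).append(((i, j), p))
```

Preserving the SARE semantics amounts to requiring that every affine dependency in the original space crosses time levels in the right order after the transformation.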
IEEE International Conference on High Performance Computing, Data, and Analytics | 2000
Alessandro Marongiu; Paolo Palazzari; Luigi Cinque; Ferdinando Mastronardo
In this work a High Level Software Synthesis (HLSS) methodology is presented. HLSS allows the automatic generation of a parallel program starting from a sequential C program. HLSS deals with a significant class of iterative algorithms, those expressible through nested loops with affine dependencies, and integrates several techniques to achieve the final parallel program. The computational model of the System of Affine Recurrence Equations (SARE) is used. As a first step in HLSS, the iterative C program is converted into SARE form; parallelism is then extracted from the SARE through allocation and scheduling functions, which are represented as unimodular matrices and determined by means of an optimization process. A clustering phase is applied to fit the parallel program onto a parallel machine with a fixed amount of resources (number of processors, main memory, communication channels). Finally, the parallel program to be executed on the target parallel system is generated.
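As a hedged illustration of the SARE computational model that the methodology starts from, a matrix-vector product can be written as a system of affine recurrences; the notation below is illustrative, not the paper's.

```python
import numpy as np

def matvec_sare(A, x):
    """Matrix-vector product written as a System of Affine Recurrence
    Equations: s[i, j] = s[i, j-1] + A[i, j] * x[j], y[i] = s[i, n-1].
    Every dependency is affine in the iteration indices
    (here: point (i, j) depends only on point (i, j-1))."""
    m, n = A.shape
    s = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            prev = s[i, j - 1] if j > 0 else 0.0  # affine dependency
            s[i, j] = prev + A[i, j] * x[j]
    return s[:, n - 1]

A = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([5.0, 6.0])
```

It is this single-assignment, affine-dependency form that makes the scheduling and allocation functions expressible as unimodular matrices over the iteration indices.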
Conference on High Performance Computing (Supercomputing) | 2001
Alessandro Marongiu; Paolo Palazzari; Vittorio Rosato
We describe a design methodology which allows fast design and prototyping of dedicated hardware devices to be used in heterogeneous computations. The platforms used in heterogeneous computations consist of a general-purpose COTS architecture which hosts a dedicated hardware device; parts of the computation are mapped onto the former and parts onto the latter, so as to improve the overall computational efficiency. We report the design and prototyping of an FPGA-based hardware board to be used in the search for low-autocorrelation binary sequences. The circuit has been designed using the recently developed Parallel Hardware Generator (PHG) package, which produces synthesizable VHDL code starting from the specific algorithm expressed as a System of Affine Recurrence Equations (SARE). The performance of the realized device has been compared with that obtained for the same numerical application on several computational platforms.
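The figure of merit in the low-autocorrelation binary sequence search can be sketched as follows. The energy E = Σ_k C_k² over the aperiodic autocorrelations C_k is the standard formulation of the problem, assumed here rather than taken from the paper's hardware description.

```python
def labs_energy(seq):
    """Energy of a +/-1 sequence: E = sum over lags k >= 1 of C_k^2,
    where C_k = sum_i seq[i] * seq[i+k] is the aperiodic
    autocorrelation. Low-autocorrelation sequences minimize E."""
    n = len(seq)
    energy = 0
    for k in range(1, n):
        c_k = sum(seq[i] * seq[i + k] for i in range(n - k))
        energy += c_k * c_k
    return energy

# Barker sequence of length 7: all off-peak |C_k| <= 1, so E = 3.
barker7 = [1, 1, 1, -1, -1, 1, -1]
```

Exhaustively evaluating this energy over the 2^n candidate sequences is what makes the problem a natural target for a dedicated FPGA coprocessor.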
International Parallel Processing Symposium | 1999
Alessandro Marongiu; Paolo Palazzari
In this work we present a procedure for automatic parallel code generation for algorithms described through a Set of Affine Recurrence Equations (SARE); starting from the original SARE description in an N-dimensional iteration space, the algorithm is converted into parallel code for an m-dimensional distributed memory parallel machine (m<N). The projection technique used is based on the polytope model. Affine transformations are introduced to project the polytope from the original iteration space onto another polytope in the processor-time (t,p) space, preserving the SARE semantics. Along with the polytope transformation, we give a methodology to generate the code within processors and a technique to avoid the memory waste typical of SARE implementations. Finally, a cost function, used to guide the heuristic search for the polytope transformation and derived from the actual implementation of the method on an MPP SIMD machine, is introduced.
International Parallel and Distributed Processing Symposium | 2005
Giovanni Lavorgna; Alessandro Marongiu; Simone Melchionna; Paolo Palazzari; Vittorio Rosato; Paolo Verrecchia
Discovering co-operative transcription factors (TFs) within the genome is a computationally challenging problem, tackled through Monte Carlo-like analysis by the Co-Bind code, developed at the Department of Genetics of Washington University in St. Louis. Due to its statistical nature, Co-Bind is characterized by very long execution times, on the order of days on current high-end workstations, and could benefit from parallelization and careful optimization performed at both the algorithmic and coding levels. This paper presents the results achieved by parallelizing Co-Bind and optimizing the parallel code, and shows that, on a 16-processor architecture, a speedup greater than two orders of magnitude is achieved with respect to the serial version released by the code's authors.
International Parallel and Distributed Processing Symposium | 2002
Alessandro Marongiu; Paolo Palazzari; Vittorio Rosato
FPGAs allow the implementation of very complex designs (~1 million gates); they are good candidates to host special-purpose systems designed to boost conventional computing architectures. Several computationally intensive algorithms are poorly supported by standard computing architectures, so the design of dedicated devices implementing the intensive parts of such algorithms can significantly speed up overall performance. (Re-)programmability, which allows the same chip to be reused for different applications and avoids the costly and cumbersome design of ASIC systems, is a key issue in the design of specialized computing architectures. Further crucial factors for the success of FPGA-based coprocessors are the possibility of achieving significantly higher performance than that attainable with conventional processors and the ability to produce a working prototype in a very short time. This work presents the results achieved in the HADES (HArdware DEsign in Scientific applications) project, aimed at automatically extracting parallelism from affine iterative algorithms and at generating the synthesizable VHDL which describes the parallelized version of the algorithm. In the paper, along with the global HADES design flow, we present two cases, from the signal processing and proteomics domains, in which FPGA-based designs significantly increased overall system performance. Thanks to the nearly complete automation of all the steps of the design flow, in both cases a working prototype was realized within one working week.
Formal Methods | 2001
Fabrizio Cleri; Alessandro Marongiu; Vittorio Rosato
The locality of the interactions in a Hamiltonian model gives rise to the linearization of the algorithms expressing the calculation of the interactions. This property, often exploited in condensed matter physics, has given rise to approximate models which, while preserving most of the physical insight of the parent exact models, display attractive computational properties that have led to their use in several scientific applications. We review the main issues at the basis of the linearization property in two different problems in condensed matter physics: the projection method to compute total energies in the Tight Binding approximation, and the calculation of the pair-correlation function of weakly interacting bosons in the Hypernetted-Chain expansion. We also remark that linearized numerical models can be mapped onto “Systems of Affine Recurrence Equations” (SARE). SARE structures have proven tractable with recently developed tools for hardware/software automatic synthesis. These tools could be used to design dedicated hardware devices which efficiently perform such numerical calculations.
International Symposium on Circuits and Systems | 2000
Alessandro Marongiu; Valerio Cimagalli
Since their introduction, Cellular Neural Networks (CNNs) have been constantly developed to cover a broad class of problems. Despite their theoretical success, CNN implementations still suffer from size limitations. In fact, while the biggest CNN chips, due to VLSI constraints, have no more than a few thousand cells distributed on a 2D array, real problems may be multi-dimensional and may require millions of cells. In this paper we introduce a theoretical result allowing the emulation of a large DTCNN on a smaller and/or lower-dimensional one. The smaller DTCNN is equipped with some additional memory with respect to a standard DTCNN. Due to the theoretical formulation of the problem, the emulating DTCNN has exactly the same behavior as the original one.
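A minimal sketch of the DT-CNN dynamics the result above builds on: one synchronous update step with a 3x3 neighborhood and hard-threshold output. The templates and array sizes below are illustrative placeholders; the paper's emulation theorem itself is not reproduced here.

```python
import numpy as np

def dtcnn_step(y, u, A, B, bias):
    """One synchronous DT-CNN update over a 2D cell grid:
    y(k+1)[i,j] = sign( sum(A * y_neighborhood)
                        + sum(B * u_neighborhood) + bias ),
    with a 3x3 neighborhood and zero padding at the borders."""
    m, n = y.shape
    yp = np.pad(y, 1)   # zero-padded output grid
    up = np.pad(u, 1)   # zero-padded input grid
    nxt = np.empty_like(y)
    for i in range(m):
        for j in range(n):
            fy = yp[i:i + 3, j:j + 3]   # 3x3 output neighborhood
            fu = up[i:i + 3, j:j + 3]   # 3x3 input neighborhood
            nxt[i, j] = np.sign(np.sum(A * fy) + np.sum(B * fu) + bias)
    return nxt

# Illustrative identity feedback template: each cell keeps its own sign.
A_id = np.zeros((3, 3)); A_id[1, 1] = 1.0
```

Emulating a larger or higher-dimensional DTCNN on this kind of grid amounts to storing extra neighborhood values in the added memory and sequencing the update over them, which is where the paper's theoretical result comes in.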
IEEE International Workshop on Cellular Neural Networks and Their Applications | 2000
Alessandro Marongiu; Valerio Cimagalli
The development of the cellular neural network (CNN) paradigm, and its wide use in many application fields, has shown that the CNN is a complementary, and in some cases alternative, approach to classical computing machines. Despite their theoretical success, CNN VLSI implementations still suffer from size and dimension limitations. In fact, while the biggest CNN chips, due to VLSI constraints and to planar technology, have no more than a few thousand cells arranged on a 2D array, real problems may require millions of cells and may be multidimensional. We focus on the implementation of an m-dimensional DT-CNN with a limited number of lower (m-i)-dimensional DT-CNN circuits. Since the target dimension is (m-i), we choose i=m-2 or i=m-1 in order to obtain an architecture using 2D or 1D DT-CNN circuits, which have been proven feasible.
Bioinformatics | 2003
Alessandro Marongiu; Paolo Palazzari; Vittorio Rosato