L. Di Nunzio | Researchain

Publication

Featured researches published by L. Di Nunzio.

workshop on intelligent solutions in embedded systems | 2010

Algorithm acceleration on LEON-2 processor using a reconfigurable bit manipulation unit

G.C. Cardarilli; L. Di Nunzio; Rocco Fazzolari; Marco Re

Advanced bit manipulation operations are not efficiently supported by standard microprocessors, since these are optimized for operations on fixed-size data. Several hardware solutions have been proposed in the literature to overcome this problem [1], [3], [4]. In this work we present the experimental results of a new architecture based on LEON-2 and a simplified version of ADAPTO [1] (Adder-based Dynamic Architecture for Processing Tailored Operators), acting as a co-processor. For our experiments we run a set of bit manipulation algorithms on the LEON-2 processor with and without the ADAPTO unit. This makes it possible to measure the speed-up factor obtained using the proposed reconfigurable co-processor.

asilomar conference on signals, systems and computers | 2010

Butterfly and Inverse Butterfly nets integration on Altera NIOS-II embedded processor

G.C. Cardarilli; L. Di Nunzio; Rocco Fazzolari; Marco Re; Ruby B. Lee

The Instruction Set Architecture (ISA) of microprocessors is usually word oriented, so it is not optimized for bit-level operations. A functional unit dedicated to bit manipulation could accelerate computation, improving microprocessor performance in terms of execution time. This work presents the experimental results of integrating the Bit Manipulation Unit (BMU) described in [1], [2] with the Altera NIOS-II processor [5]. The BMU, described in VHDL, has been integrated into the processor using the Custom Logic feature [6] and implemented on an Altera Stratix FPGA.
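
The abstract does not describe the internals of the BMU, so the following is only a minimal software sketch of the textbook butterfly and inverse butterfly network structure named in the title. It assumes a power-of-two number of inputs and one control bit per two-input switch; the function names and the control layout are hypothetical, not taken from the paper.

```python
def butterfly_stage(bits, distance, controls):
    """Apply one network stage: each control bit swaps (1) or passes (0)
    the pair of positions (i, i + distance)."""
    out = list(bits)
    k = 0  # index of the next control bit
    for i in range(len(bits)):
        if (i & distance) == 0:  # i is the lower element of a pair
            j = i + distance
            if controls[k]:
                out[i], out[j] = bits[j], bits[i]
            k += 1
    return out


def butterfly(bits, controls_per_stage):
    """Butterfly network: pair distances n/2, n/4, ..., 1."""
    distance = len(bits) // 2
    for controls in controls_per_stage:
        bits = butterfly_stage(bits, distance, controls)
        distance //= 2
    return bits


def inverse_butterfly(bits, controls_per_stage):
    """Inverse butterfly network: pair distances 1, 2, ..., n/2."""
    distance = 1
    for controls in controls_per_stage:
        bits = butterfly_stage(bits, distance, controls)
        distance *= 2
    return bits


# Hypothetical 8-input example: 3 stages with 4 control bits each.
data = [0, 1, 2, 3, 4, 5, 6, 7]
ctrl = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 0]]
permuted = butterfly(data, ctrl)
restored = inverse_butterfly(permuted, ctrl[::-1])
```

Because every stage is its own inverse, feeding the same control bits to the inverse butterfly in reversed stage order undoes the permutation. On a word-oriented ISA each stage costs several shift, mask and OR instructions, which is the kind of overhead a dedicated unit is meant to remove.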

international conference on electronics, circuits, and systems | 2008

A full-adder based reconfigurable architecture for fine grain applications: ADAPTO

G.C. Cardarilli; L. Di Nunzio; Marco Re

Microprocessors and DSPs are optimized to perform operations on data whose size matches the native wordlength. Their performance decreases when shorter data must be processed. In fact, operations on short data have the same complexity as operations on native-wordlength data, and data resources are not fully exploited. Recently, different solutions have been proposed to overcome this problem. Great attention has been paid to architectures based on a main processor supported by a reconfigurable unit (RU), typically based on LUTs, used as a coprocessor. In the work of Cardarilli et al. (2008), we presented ADAPTO (adder-based dynamic architecture for processing tailored operators), a new reconfigurable architecture that, by replacing LUTs with another computational element and using a simplified interconnect network, allows both hardware reconfiguration and instruction execution in one clock cycle of the main microprocessor. In this paper we present a modified version of ADAPTO that achieves more flexibility.

international symposium on circuits and systems | 2008

ADAPTO: full-adder based reconfigurable architecture for bit level operations

G.C. Cardarilli; L. Di Nunzio; Marco Re; Alberto Nannarelli

Low-cost microprocessors and DSPs are optimized to perform general arithmetic and logic operations on the native wordlength. Their efficiency decreases, however, when they process shorter data (more clock cycles per operation are required). Recently, different solutions have been proposed to overcome this problem. Among those, the most common is based on a main processor with a reconfigurable unit (RU) used as a coprocessor to speed up fine-grained operations. Typically those coprocessors, similar to FPGAs, are composed of look-up tables (LUTs) and pass-transistor interconnects. Because of the large number of reconfiguration bits, it is then impossible to obtain both run-time reconfiguration and an efficient implementation that avoids idle hardware resources. This paper proposes a new dynamic reconfigurable architecture that can be embedded in microprocessors or low-cost DSPs to accelerate the execution of the above-mentioned operations. The goal of ADAPTO (adder-based dynamic architecture for processing tailored operators) is to reduce the hardware complexity and the reconfiguration time with respect to typical LUT-based reconfigurable arrays. ADAPTO supports both hardware reconfiguration and instruction execution in the same processor clock cycle. This goal has been achieved by using a new reconfigurable unit based on full adders instead of LUTs, and by simplifying the interconnect network.

international symposium on signals, circuits and systems | 2011

Implementation of the AES algorithm using a Reconfigurable Functional Unit

G.C. Cardarilli; L. Di Nunzio; Rocco Fazzolari; Salvatore Pontarelli; Marco Re; Adelio Salsano

Nowadays programmable devices (microprocessors and DSPs) are based on complex architectures optimized for maximum speed, but their performance degrades when the implemented application is mostly based on operations on single bits or subsets of bits. This kind of data processing and bit manipulation can be accelerated by using a Reconfigurable Functional Unit (RFU). In this paper the benefits of using the ADAPTO RFU (Adder-Based Dynamic Architecture for Processing Tailored Operators) [1] [2] to speed up the Advanced Encryption Standard (AES) algorithm are investigated. The paper shows how the ADAPTO architecture helps accelerate the AES algorithm thanks to the efficient implementation of the algorithm's most complex operations. A comparison in terms of number of assembly instructions is given.

international conference on electronics, circuits, and systems | 2014

TDES cryptography algorithm acceleration using a reconfigurable functional unit

G.C. Cardarilli; L. Di Nunzio; Rocco Fazzolari; Marco Re

Many cryptographic algorithms contain a large number of bit manipulation operations. Unfortunately, the Instruction Set Architecture (ISA) of general-purpose microprocessors is usually word oriented. Consequently, the execution of this kind of algorithm is not optimized, and computations on data represented by single bits or sub-words can require several clock cycles. Reconfigurable hardware accelerators oriented to bit manipulation could accelerate these algorithms, improving microprocessor performance in terms of execution time. This work presents the experimental results of the speed-up factor obtained for the implementation of the TDES (Triple Data Encryption Standard) algorithm when the Reconfigurable Functional Unit ADAPTO [1] is integrated with a RISC microprocessor (the Altera NIOS-II soft processor [2]). The ADAPTO unit, described in VHDL (VHSIC Hardware Description Language), has been implemented on an Altera Stratix II FPGA and integrated with the Nios soft processor using the Custom Logic feature [4]. The objective is to measure the speed-up factor obtained by introducing the reconfigurable hardware accelerator.

asilomar conference on signals, systems and computers | 2012

Integration of butterfly and inverse butterfly nets in embedded processors: Effects on power saving

G.C. Cardarilli; L. Di Nunzio; Rocco Fazzolari; Marco Re; Ruby B. Lee

Many software functions are not efficiently executed by standard microprocessors. This happens when the operation granularity and data wordlength differ from those of the microprocessor architecture. Important improvements in speed and power can be obtained by integrating hardware accelerators in standard microprocessor architectures. This work, based on [1], shows that the integration of a Bit Manipulation Unit (BMU) [2] in an Altera NIOS-2 soft processor architecture [3] yields significant speed-up and power saving factors.

international symposium on signals, circuits and systems | 2011

FPGA implementation of a low-area/high-SFDR DDFS architecture

G.C. Cardarilli; M. D'Alessio; L. Di Nunzio; Rocco Fazzolari; D. Murgia; Marco Re

This paper describes the FPGA implementation of a low-area, high Spurious Free Dynamic Range (SFDR) Direct Digital Frequency Synthesizer (DDFS). The proposed architecture derives from the one proposed in [1] and fits well in modern FPGAs having DSP blocks and/or embedded multipliers. The DDFS model in [1] was modified to further reduce the ROM size by a factor of 2 without worsening the SFDR, and was implemented on a Xilinx Virtex 5 FPGA. In this work we show that, using the proposed hardware architecture, it is possible to reach very high SFDR (more than 157 dB) without impacting area occupancy. In fact, a traditional LUT-based DDFS has an exponential relationship between the ROM size and the number of phase bits.
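
To illustrate the exponential ROM growth mentioned at the end of the abstract, here is a minimal sketch of a traditional LUT-based DDFS (a phase accumulator addressing a full-period sine ROM). It is not the architecture proposed in the paper; the function names, bit widths and tuning word are assumptions chosen only for illustration.

```python
import math


def sine_rom(phase_bits, amp_bits):
    """Full-period sine table: one entry per phase value, so 2**phase_bits entries."""
    size = 1 << phase_bits
    scale = (1 << (amp_bits - 1)) - 1  # largest signed amplitude
    return [round(scale * math.sin(2 * math.pi * k / size)) for k in range(size)]


def ddfs(rom, phase_bits, acc_bits, tuning_word, n_samples):
    """Traditional DDFS: the top phase_bits of the accumulator address the ROM."""
    acc = 0
    mask = (1 << acc_bits) - 1
    samples = []
    for _ in range(n_samples):
        samples.append(rom[acc >> (acc_bits - phase_bits)])
        acc = (acc + tuning_word) & mask  # accumulator wraps modulo 2**acc_bits
    return samples


# Each additional phase bit doubles the number of ROM entries.
rom_sizes = {p: len(sine_rom(p, 12)) for p in (8, 9, 10)}

# Hypothetical setup: 16-bit accumulator, 4 phase bits, quarter-period steps.
samples = ddfs(sine_rom(4, 12), 4, 16, 1 << 14, 5)
```

In this plain scheme the table holds 2^P words for P phase bits, which is why reducing the ROM size without degrading the SFDR is the central design concern.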

International Conference on Applications in Electronics Pervading Industry, Environment and Society | 2016

Compressive Sensing Reconstruction for Complex System: A Hardware/Software Approach.

Simone Acciarito; G.C. Cardarilli; L. Di Nunzio; Rocco Fazzolari; Gaurav Mani Khanal; Marco Re

Today, a number of applications need to process large-bandwidth signals. These applications frequently require fast ADCs and very efficient DSP structures that are difficult to design. An interesting solution to these issues is the Compressive Sensing (CS) method which, assuming some properties of the signal are known, allows the sampling rate to be reduced well below the Nyquist rate. A drawback of CS is the need for an additional element that reconstructs the sampled signal. This reconstruction requires techniques that generally have a high computational cost, which is a critical issue for real-time implementations of CS systems. In this work we present the implementation of one of these reconstruction algorithms, Orthogonal Matching Pursuit (OMP). This algorithm involves a heavy computational cost (in particular for the matrix computation), which limits its use in strictly real-time applications such as radar systems. To overcome this limitation, the authors propose a mixed software/hardware implementation. The proposed architecture was implemented on the Xilinx ZYNQ FPGA. The experimental results show a significant speed-up of the algorithm.
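
For readers unfamiliar with OMP, the following is a minimal pure-Python sketch of the textbook algorithm, not the hardware/software partitioning described in the paper. The helper names are hypothetical, and the least-squares step is solved through the normal equations, which assumes the selected columns are linearly independent.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def omp(Phi, y, sparsity):
    """Textbook OMP. Phi is an m x n matrix given as a list of rows, y has length m."""
    m, n = len(Phi), len(Phi[0])
    cols = [[Phi[i][j] for i in range(m)] for j in range(n)]
    residual, support, coef = list(y), [], []
    for _ in range(sparsity):
        # 1. Select the unused column most correlated with the residual.
        candidates = [j for j in range(n) if j not in support]
        best = max(candidates, key=lambda j: abs(dot(cols[j], residual)))
        support.append(best)
        # 2. Least-squares fit of y on the selected columns (normal equations).
        gram = [[dot(cols[a], cols[b]) for b in support] for a in support]
        rhs = [dot(cols[a], y) for a in support]
        coef = solve(gram, rhs)
        # 3. Update the residual with the new fit.
        residual = [y[i] - sum(c * cols[s][i] for c, s in zip(coef, support))
                    for i in range(m)]
    x = [0.0] * n
    for c, s in zip(coef, support):
        x[s] = c
    return x


# Hypothetical 3 x 4 sensing matrix with unit-norm columns and a 2-sparse signal.
Phi = [[1.0, 0.0, 0.0, 0.6],
       [0.0, 1.0, 0.0, 0.8],
       [0.0, 0.0, 1.0, 0.0]]
y = [1.5, 0.0, -2.0]
x_hat = omp(Phi, y, 2)
```

The correlation in step 1 and the least-squares fit in step 2 are the matrix computations that dominate the cost of each iteration, which makes them the natural candidates for hardware acceleration.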

international symposium on circuits and systems | 2009

Speed-up of RISC processor computation using ADAPTO

G.C. Cardarilli; L. Di Nunzio; Marco Re

In previous works ([1], [2] and [3]) the authors presented ADAPTO (Adder-based Dynamic Architecture for Processing Tailored Operators), a Reconfigurable Functional Unit (RFU) that accelerates computations on data shorter than the native processor wordlength. ADAPTO is a reconfigurable array inserted directly in the data-path of the microprocessor in order to reduce the communication overhead between the reconfigurable unit and the microprocessor. An important feature of ADAPTO is its ability to reconfigure itself and execute operations in one clock cycle. Unlike other architectures presented in the literature ([6] [7]), ADAPTO is based on Full-Adders (FA) instead of LUTs. The FA can be configured to perform logical and arithmetic operations, with the advantage of requiring fewer transistors than a LUT-based approach. In this paper we show how ADAPTO increases the performance of a RISC processor when executing algorithms that process short data.
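
The claim that a full adder can be configured to perform logical operations follows from a standard property of the cell: holding the carry-in at a constant turns its two outputs into two-input logic gates. The sketch below shows that property only; it makes no assumption about how ADAPTO actually encodes its configuration, and the helper names are hypothetical.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


# With carry-in tied to 0: sum is XOR, carry-out is AND.
def fa_xor(a, b):
    return full_adder(a, b, 0)[0]


def fa_and(a, b):
    return full_adder(a, b, 0)[1]


# With carry-in tied to 1: sum is XNOR, carry-out is OR.
def fa_xnor(a, b):
    return full_adder(a, b, 1)[0]


def fa_or(a, b):
    return full_adder(a, b, 1)[1]


# Truth table as (a, b, xor, and, xnor, or) rows.
table = [(a, b, fa_xor(a, b), fa_and(a, b), fa_xnor(a, b), fa_or(a, b))
         for a in (0, 1) for b in (0, 1)]
```

Leaving the carry-in connected to the neighbouring cell instead gives ordinary ripple-carry addition, so the same cell serves both arithmetic and logic with a single extra control input.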
