Publication


Featured research published by Claudio Mucci.


international solid-state circuits conference | 2005

XiSystem: a XiRisc-based SoC with a reconfigurable IO module

Andrea Cappelli; Andrea Lodi; Massimo Bocchi; Claudio Mucci; Massimiliano Innocenti; C. De Bartolomeis; Luca Ciccarelli; Roberto Giansante; Antonio Deledda; Fabio Campi; Mario Toma; Roberto Guerrieri

In the nanometer era, the increase in nonrecurring engineering costs is a challenge for SoCs that can be faced through a standardization process. Hardware specialization of a standard platform to a given application can be achieved by exploiting reconfigurable technology. This paper presents a XiSystem SoC, which integrates two different field-programmable devices to provide application-specific computing blocks and IOs. A XiRisc reconfigurable processor is exploited to achieve more than one order of magnitude speed-up and energy consumption reduction vis-à-vis a DSP-like processor, while an eFPGA is integrated in the system in order to make it flexible enough to support various IO ports and protocols. The reconfigurable IO device is also utilized for pre/post data processing and implementation of some standard computational blocks.


design, automation, and test in europe | 2007

A dynamically adaptive DSP for heterogeneous reconfigurable platforms

Fabio Campi; Antonio Deledda; Matteo Pizzotti; Luca Ciccarelli; Pier Luigi Rolandi; Claudio Mucci; Andrea Lodi; Arseni Vitkovski; Luca Vanzolini

This paper describes a digital signal processor based on a multi-context, dynamically reconfigurable datapath, suitable for inclusion as an IP-block in complex SoC design projects. The IP was realized in 90 nm CMOS technology. The most relevant features offered by the proposed architecture with respect to state of the art are zero overhead for switching between successive configurations, relevant area and energy computational density on computational kernels (average of 2 GOPS/mm², 0.2 GOPS/mW) and relatively small area occupation (18 mm²), making it suitable for acceleration or upgrade of multi-core heterogeneous embedded platforms. The processor is delivered with a software tool chain providing the application developer algorithmic analysis and design space exploration based on ANSI C, with no utilization of hardware-related constructs or description languages.


international symposium on system-on-chip | 2003

A C-based algorithm development flow for a reconfigurable processor architecture

Claudio Mucci; Carlo Chiesa; Andrea Lodi; Mario Toma; Fabio Campi

Reconfigurable processors are an appealing option to achieve high performance and low energy consumption in digital signal processing, but their utilization often involves hardware issues not usual for algorithm developers proficient in high level languages. This paper presents a C-based algorithm development flow for XiRisc, a reconfigurable processor architecture targeted at embedded systems, that couples a VLIW RISC core with a custom designed programmable hardware unit optimized for being programmed starting from data flow graph (DFG) descriptions. Starting from C-language, the flow produces both executable codes for the processor core and configuration bits for the embedded programmable unit. The proposed flow was utilized for implementing a set of DSP algorithms on a prototypal 0.18 µm XiRisc test-chip obtaining performance speed-ups up to 10× and energy consumption reduction up to 75%.


field-programmable custom computing machines | 2004

A dataflow control unit for C-to-configurable pipelines compilation flow

Andrea Cappelli; Andrea Lodi; Claudio Mucci; Mario Toma; Fabio Campi

In the field of embedded systems, reconfigurable processors, composed of a standard processor core coupled with a reconfigurable device, are gaining more and more importance. Algorithm developers are facing the issue of mapping applications on configurable hardware, without a specific knowledge of the underlying architecture. In this paper, we present a modular data flow control unit for a reconfigurable datapath, which can be easily programmed starting from the C description of the required functionality.


IEEE Transactions on Parallel and Distributed Systems | 2015

Power-Aware Job Scheduling on Heterogeneous Multicore Architectures

Matteo Chiesi; Luca Vanzolini; Claudio Mucci; Eleonora Franchi Scarselli; Roberto Guerrieri

This paper presents a power-aware scheduling algorithm based on efficient distribution of the computing workload to the resources on heterogeneous CPU-GPU architectures. The scheduler manages the resources of several computing nodes with a view to reducing the peak power. The algorithm can be used in concert with adjustable power state software services in order to further reduce the computing cost during high demand periods. Although our study relies on GPU workloads, the approach can be extended to other heterogeneous computer architectures. The algorithm has been implemented in a real CPU-GPU heterogeneous system. Experiments prove that the approach presented reduces peak power by 10 percent compared to a system without any power-aware policy and by up to 24 percent with respect to the worst case scenario with an execution time increase in the range of 2 percent. This leads to a reduction in the system and service costs.


field-programmable logic and applications | 2006

A Multi-Context Pipelined Array for Embedded Systems

Andrea Lodi; Claudio Mucci; Massimo Bocchi; Andrea Cappelli; Mario De Dominicis; Luca Ciccarelli

The integration of a reconfigurable device into complex SoCs is a common request aimed at adding software programmable efficient computational blocks to a system. In such environment a traditional approach in FPGA design could not meet the need for an easy-to-use and easy-to-integrate device. This paper presents the PiCoGA-II reconfigurable datapath which has been designed as a multi-context array to provide fast dynamic reconfiguration. Architectural choices to reduce the area overhead of this approach are described. A reconfigurable dedicated control unit provides a clear interface for an easy integration of the device together with a hardware support for a programming flow starting from a sequential high-level language. The logic cells have been redesigned with respect to the previous version, to improve their computational efficiency and flexibility. The PiCoGA-II has been fabricated in 0.13 µm CMOS technology. The implementation of several MPEG-2 kernels shows that the multi-context array has a computational density which is 2× higher than an equivalent single-context one and is 2× higher than a Virtex-II FPGA when all the 4 contexts are utilized.


design, automation, and test in europe | 2008

Design of a HW/SW communication infrastructure for a heterogeneous reconfigurable processor

Antonio Deledda; Claudio Mucci; Arseni Vitkovski; M. Kuehnle; F. Ries; Michael Huebner; Jürgen Becker; Philippe Bonnot; A. Grasset; Philippe Millet; Marcello Coppola; Lorenzo Pieralisi; Riccardo Locatelli; Giuseppe Maruccia; Fabio Campi; T. DeMarco

Reconfigurable architectures and NoC (Network-on-Chip) have introduced new research directions for technology and flexibility issues, which have been largely investigated in the last decades. Exploiting run-time adaptivity opens a new area of research by considering dynamic reconfiguration. In this paper, we present the architecture and associated development tools of a heterogeneous reconfigurable SoC focusing on the chosen communication infrastructure. The SoC integrates units of various sizes of reconfiguration granularity. The included NoC approach demonstrates the mentioned benefits and scalability for actual and future SoC design. On a reference CMOS090 implementation the described interconnect system works at the system reference frequency of 200 MHz sustaining the required run-time bandwidth on a set of reference applications, at a price < 10% in area and power consumption with respect to the overall system.


custom integrated circuits conference | 2004

A XiRisc-based SoC for embedded DSP applications

Massimo Bocchi; C. De Bartolomeis; Claudio Mucci; Fabio Campi; Andrea Lodi; Mario Toma; Roberto Canegallo; Roberto Guerrieri

Reconfigurable computing can face many of the current embedded systems design issues, providing a high degree of flexibility and increasing energy efficiency of computation. This paper introduces the architecture of a system on chip for signal processing applications, including an XiRisc reconfigurable processor as the main computational core. This RISC processor features an extensible instruction set, obtained through dynamic reconfiguration of a programmable gate-array embedded as a processor datapath function unit. A prototype chip has been implemented in 0.13 µm CMOS technology. The SoC operates at 166 MHz clock speed and the test of several DSP algorithms showed speed-ups ranging from 5× to 80× with 65%-95% energy savings. As proof of the architectural improvement, energy and area computational efficiency has grown by a factor ranging from 3× to 35×.


international symposium on system-on-chip | 2007

Intelligent cameras and embedded reconfigurable computing: a case-study on motion detection

Claudio Mucci; Luca Vanzolini; Antonio Deledda; Fabio Campi; Gerard Gaillat

Image processing for intelligent cameras like those used in video surveillance applications implies computationally demanding algorithms activated as a function of unpredictable events, such as the content of the image or user requests. For such applications, hardwired acceleration must be restricted to a minimum subset of kernels, due to the increasing NREs when application updates become necessary. Embedded reconfigurable processors, coupling in the same computing engine a general-purpose embedded processor and field-programmable fabrics, provide an appealing trade-off point between pure software and dedicated hardware acceleration. As a case-study, this paper presents the implementation of a set of image processing operators utilized for motion detection on the DREAM adaptive DSP. With respect to pure software solutions, the proposed implementation achieves a performance improvement of 2-3 orders of magnitude, while retaining the same degree of programmability and the same economical perspectives from the end-user point of view of processor-based approaches.


design, automation, and test in europe | 2007

Implementation of AES/Rijndael on a dynamically reconfigurable architecture

Claudio Mucci; Luca Vanzolini; Andrea Lodi; Antonio Deledda; Roberto Guerrieri; Fabio Campi; Mario Toma

Reconfigurable architectures provide the user the capability to couple performance typical of hardware design with the flexibility of the software. This paper presents the design of AES/Rijndael on a dynamically reconfigurable architecture. A performance improvement of three orders of magnitude was shown compared to the reference code and up to 24× speed-up with respect to fast C implementations over a RISC processor. A maximum throughput of 546 Mbit/sec is achieved. Compared to prior art, a better energy efficiency with respect to the other programmable solutions was shown, obtaining up to 3 Mbit/sec/mW.

Collaboration


Dive into Claudio Mucci's collaboration.

Top Co-Authors

Andrea Lodi

École Polytechnique de Montréal
