Daniele Ludovici
University of Ferrara
Publication
Featured research published by Daniele Ludovici.
Design, Automation, and Test in Europe | 2011
Alessandro Strano; Crispín Gómez; Daniele Ludovici; Michele Favalli; María Engracia Gómez; Davide Bertozzi
This paper proposes a built-in self-test/self-diagnosis procedure at start-up of an on-chip network (NoC). Concurrent BIST operations are carried out after reset at each switch, so that test application time remains scalable with network size. The key principle consists of exploiting the inherent structural redundancy of the NoC architecture in a cooperative way, thus detecting faults in test pattern generators too. At-speed testing of stuck-at faults can be performed in less than 1200 cycles regardless of network size, with a hardware overhead of less than 11%.
Design, Automation, and Test in Europe | 2009
Daniele Ludovici; F. Gilabert; Simone Medardoni; Crispín Gómez; María Engracia Gómez; Pedro López; Georgi Gaydadjiev; Davide Bertozzi
Most past evaluations of fat-trees for on-chip interconnection networks rely on oversimplified or even unrealistic architecture and traffic pattern assumptions, and very few layout analyses are available to relieve practical feasibility concerns in nanoscale technologies. This work aims at providing an in-depth assessment of the physical synthesis efficiency of fat-trees and at extrapolating silicon-aware performance figures to back-annotate into the system-level performance analysis. A 2D mesh is used as a reference architecture for comparison, and a 65 nm technology is targeted by our study. Finally, in an attempt to mitigate the implementation cost of k-ary n-tree topologies, we also review an alternative unidirectional multi-stage interconnection network which is able to simplify the fat-tree architecture and to minimally impact performance.
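As a rough illustration of the implementation cost mentioned above, the following sketch computes the standard structural counts of a k-ary n-tree (terminals, switches, inter-switch links). The function names are hypothetical and not taken from the paper; the sketch says nothing about layout, wirelength, or the 65 nm results, which depend on physical synthesis.

```python
def kary_ntree_terminals(k: int, n: int) -> int:
    """Number of end nodes attached to a k-ary n-tree."""
    return k ** n


def kary_ntree_switches(k: int, n: int) -> int:
    """n stages of k^(n-1) switches each."""
    return n * k ** (n - 1)


def kary_ntree_switch_links(k: int, n: int) -> int:
    """Bidirectional links between adjacent switch stages.

    Each of the (n - 1) stage boundaries is crossed by k^n links
    (k up-links from each of the k^(n-1) lower-stage switches).
    """
    return (n - 1) * k ** n


if __name__ == "__main__":
    # Hypothetical example sizes, chosen only for illustration.
    for k, n in [(2, 4), (4, 2)]:
        print(
            f"{k}-ary {n}-tree:",
            kary_ntree_terminals(k, n), "terminals,",
            kary_ntree_switches(k, n), "switches,",
            kary_ntree_switch_links(k, n), "inter-switch links",
        )
```

Both example configurations serve 16 terminals, yet the 2-ary 4-tree needs four times as many switches as the 4-ary 2-tree (with smaller radix per switch), which hints at why the choice of k and n matters for physical feasibility.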
Complex, Intelligent and Software Intensive Systems | 2009
F. Gilabert; Daniele Ludovici; Simone Medardoni; Davide Bertozzi; L. Benini; Georgi Gaydadjiev
Regular multi-core processors are appearing in the embedded system market as high performance software programmable solutions. The use of regular interconnect fabrics for them allows fast design time, ease of routing, predictability of electrical parameters and good scalability. k-ary n-mesh topologies are candidate solutions for these systems, borrowed from the domain of off-chip interconnection networks. However, the on-chip integration has to deal with unique challenges at different levels of abstraction. From a technology viewpoint, interconnect reverse scaling causes critical paths to go across global links. Poor interconnect performance might also impact IP core speed depending on the synchronization mechanism at the interface. Finally, this might also conflict with the requirements that communication libraries employed in the MPSoC domain pose on the underlying interconnect fabric. This paper provides a comprehensive overview of these topics, by characterizing physical feasibility of representative k-ary n-mesh topologies and by providing silicon-aware system-level performance figures.
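To make the k-ary n-mesh notation concrete, here is a minimal sketch that counts nodes and bidirectional links of such a topology, both in closed form and by enumerating neighbors. The function names are hypothetical; the sketch covers only the graph structure and does not model wirelength, critical paths, or any of the silicon-aware figures from the paper.

```python
from itertools import product


def mesh_nodes(k: int, n: int) -> int:
    """A k-ary n-mesh has k nodes along each of its n dimensions."""
    return k ** n


def mesh_links(k: int, n: int) -> int:
    """Closed form: per dimension, k^(n-1) rows of (k - 1) links each."""
    return n * (k - 1) * k ** (n - 1)


def mesh_neighbors(coord: tuple, k: int) -> list:
    """Neighbors differ by +/-1 in exactly one coordinate (no wrap-around)."""
    result = []
    for dim, value in enumerate(coord):
        for step in (-1, 1):
            if 0 <= value + step < k:
                result.append(coord[:dim] + (value + step,) + coord[dim + 1:])
    return result


def mesh_links_by_enumeration(k: int, n: int) -> int:
    """Cross-check of the closed form: every link is seen from both ends."""
    degree_sum = sum(
        len(mesh_neighbors(c, k)) for c in product(range(k), repeat=n)
    )
    return degree_sum // 2


if __name__ == "__main__":
    # Two hypothetical 16-node configurations with different dimensionality.
    for k, n in [(4, 2), (2, 4)]:
        print(f"{k}-ary {n}-mesh: {mesh_nodes(k, n)} nodes, {mesh_links(k, n)} links")
```

For the same 16 nodes, the 4-ary 2-mesh has 24 links while the 2-ary 4-mesh has 32; higher-dimensional variants trade more (and, once laid out in 2D, typically longer) links for a smaller hop count.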
Networks on Chips | 2009
Daniele Ludovici; Alessandro Strano; Davide Bertozzi; Luca Benini; Georgi Gaydadjiev
With the advent of Networks-on-Chip (NoCs), interest in mesochronous synchronizers is again on the rise due to the intricacies of skew-controlled chip-wide clock tree distribution. Recently proposed schemes agree on a source-synchronous design style with some form of ping-pong buffering to counter timing and metastability concerns. However, the integration issues of such synchronizers in a NoC setting are still largely unexplored. Most schemes are in fact placed between communicating switches, thus neglecting the abrupt increase of buffering resources needed at switch input stages. This paper goes a step further and aims at deep integration of the synchronizer in the switch architecture, thus merging key tasks such as synchronization, buffering and flow control into a single architecture block. This paper compares the integrated and the loosely coupled solutions from a performance and area viewpoint, while devoting special attention to their robustness with respect to physical design parameters.
International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation | 2010
Alessandro Strano; Daniele Ludovici; Davide Bertozzi
Customization of IP blocks in a multi-processor system-on-chip (MPSoC) is the historical approach to the cost-effective implementation of such systems. A recent trend consists of structuring an MPSoC into loosely coupled voltage and frequency islands to meet tight power budgets. In this context, synchronization between islands of synchronicity becomes a major design issue. Dual-clock FIFOs compare favorably with respect to synchronizer-based designs and pausible clocking interfaces from a performance viewpoint, but incur a significant area, power and latency overhead. This paper proposes a library of dual-clock FIFOs for cost-effective MPSoC design, where each architecture variant in the library has been designed to match well-defined operating conditions at the minimum implementation cost. Each FIFO synchronizer is suitable for plug-and-play insertion into the NoC architecture, and selection depends on the performance requirements of the synchronization interface at hand. Above all, components of our synchronization library have not been conceived in isolation, but have been tightly co-designed with the switching fabric of the on-chip interconnection network, thus making a conscious use of power-hungry buffering resources and leading to affordable implementations in the resource-constrained MPSoC domain.
Proceedings of the Fifth International Workshop on Interconnection Network Architecture | 2011
Daniele Ludovici; Alessandro Strano; Georgi Gaydadjiev; Davide Bertozzi
MPSoCs are today frequently designed as the composition of multiple voltage/frequency islands, thus calling for a GALS clocking style. In this context, the on-chip interconnection network can be either inferred as a single and independent clock domain or it can be distributed among core domains. This paper targets the former scenario, since it results in the homogeneous speed of the NoC switching elements. From a physical design viewpoint, however, the main issues lie in the chip-wide extension of the network domain and in the growing uncertainties affecting nanoscale silicon technologies. This paper proves that, by partitioning the network into mesochronous domains and merging synchronizers with NoC building blocks, two main advantages can be achieved. First, it is possible to evolve synchronous networks to mesochronous ones with marginal performance and area overhead. Second, the mesochronous NoC exposes more degrees of freedom for power optimization.
ACM Transactions on Embedded Computing Systems | 2013
Alberto Ghiribaldi; Daniele Ludovici; Francisco Triviño; Alessandro Strano; Jose Flich; José L. Sánchez; Francisco José Alfaro; Michele Favalli; Davide Bertozzi
Networks-on-chip need to survive manufacturing faults in order to sustain yield. An effective testing and configuration strategy however implies two opposite requirements. On one hand, a fast and scalable built-in self-testing and self-diagnosis procedure has to be carried out concurrently at NoC switches. On the other hand, programming the NoC routing mechanism to go around faulty links and switches can be optimally performed by a centralized controller with global network visibility. To the best of our knowledge, this article proposes for the first time a global network testing and configuration strategy that meets the opposite requirements by means of a fault-tolerant dual network architecture and a fast configuration algorithm for the most common failure patterns. Experimental results report an area overhead as low as 12.5% with respect to the baseline switch architecture while achieving a high degree of fault tolerance. In fact, even when multiple stuck-at faults are considered, the capability of fault masking by the dual network is always over 80%, and the support for multiple link failures is more than 90% in the presence of two unusable links in the main network with minimum set-up times.
2011 IEEE/IFIP 19th International Conference on VLSI and System-on-Chip | 2011
Alberto Ghiribaldi; Daniele Ludovici; Michele Favalli; Davide Bertozzi
Networks-on-chip need to survive manufacturing faults in order to sustain yield. An effective testing and configuration strategy however implies two opposite requirements. On one hand, a fast and scalable built-in self-testing and self-diagnosis procedure has to be carried out concurrently at NoC switches. On the other hand, programming the NoC routing mechanism to go around faulty links and switches can be optimally performed by a centralized controller with global network visibility. This paper proposes a global hardware infrastructure that meets such requirements by means of a fault-tolerant dual network architecture and a configuration strategy for reprogramming the routing mechanism of each switch. This is the first complete infrastructure for testing and reconfiguring a NoC based on reprogrammable routing logic.
Great Lakes Symposium on VLSI | 2009
Daniele Ludovici; Georgi Gaydadjiev; Davide Bertozzi; Luca Benini
In the context of nanoscale networks-on-chip (NoCs), each link implementation solution is not just a specific synthesis optimization technique with local performance and power implications, but gives rise to a well-differentiated point in the architecture design space. This is an effect of the tight interaction existing between architecture and physical design layers in nanoscale technologies. This work assesses several NoC link inference techniques (buffering options, link pipelining) by means of commercial backend synthesis tools, taking the system-level perspective. In fact, performance speed-ups and power overhead are not evaluated for the links in isolation but for the network topology as a whole, thus showing their sensitivity to the link inference strategy. k-ary n-mesh topologies are considered for the sake of analysis, in that they provide a range of topologies with increasing total wirelength.
Network on Chip Architectures | 2011
Hervé Tatenguem; Daniele Ludovici; Alessandro Strano; Davide Bertozzi; Helmut Reinig
Fine-grained (per-core) multi-synchronous systems call for new clocking strategies and new architecture design techniques. This paper compares two fundamental multi-synchronous implementation variants based on the extensive use of dual-clock FIFOs versus mesochronous synchronizers, respectively. The architecture-homogeneous experimental setting, the cost-effective merging of synchronizers with NoC switch buffers, the sharing of as many physical synthesis steps as possible between the two architectures, and the requirements of a realistic full-HD video playback application are the key innovations of this study.