Alessandro Strano
University of Ferrara
Publications
Featured research published by Alessandro Strano.
Design, Automation, and Test in Europe | 2011
Alessandro Strano; Crispín Gómez; Daniele Ludovici; Michele Favalli; María Engracia Gómez; Davide Bertozzi
This paper proposes a built-in self-test/self-diagnosis procedure at start-up of an on-chip network (NoC). Concurrent BIST operations are carried out after reset at each switch, thus resulting in a test application time that scales with network size. The key principle consists of exploiting the inherent structural redundancy of the NoC architecture in a cooperative way, thus detecting faults in test pattern generators too. At-speed testing of stuck-at faults can be performed in fewer than 1200 cycles regardless of network size, with a hardware overhead of less than 11%.
International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation | 2012
Alessandro Strano; Davide Bertozzi; Francisco Triviño; José L. Sánchez; Francisco José Alfaro; Jose Flich
Current and future on-chip networks will feature an enhanced degree of reconfigurability. Power management and virtualization strategies, as well as the need to survive the progressive onset of wear-out faults, are the root causes. In all these cases, a non-intrusive and efficient reconfiguration method is needed to allow the network to function without interruption over the course of the reconfiguration process while remaining deadlock-free. This paper is inspired by the overlapped static reconfiguration (OSR) protocol developed for off-chip networks. However, in its native form its implementation in NoCs is out of reach. Therefore, we provide a careful engineering of the NoC switch architecture and of the system-level infrastructure to support a cost-effective, complete and transparent reconfiguration process. Performance during the reconfiguration process is not affected, and implementation costs (critical path and area overhead) are shown to be fully affordable for a constrained system. Fewer than 250 cycles are needed for the reconfiguration process of an 8×8 2D mesh, with marginal impact on system performance.
Networks on Chips | 2009
Daniele Ludovici; Alessandro Strano; Davide Bertozzi; Luca Benini; Georgi Gaydadjiev
With the advent of Networks-on-Chip (NoCs), interest in mesochronous synchronizers is again on the rise due to the intricacies of skew-controlled chip-wide clock tree distribution. Recently proposed schemes agree on a source-synchronous design style with some form of ping-pong buffering to counter timing and metastability concerns. However, the integration issues of such synchronizers in a NoC setting are still largely unaddressed. Most schemes are in fact placed between communicating switches, thus neglecting the abrupt increase of buffering resources needed at switch input stages. This paper goes a step further and aims at deep integration of the synchronizer in the switch architecture, thus merging key tasks such as synchronization, buffering and flow control into a single architecture block. This paper compares the integrated and the loosely coupled solutions from a performance and area viewpoint, while devoting special attention to their robustness with respect to physical design parameters.
International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation | 2010
Alessandro Strano; Daniele Ludovici; Davide Bertozzi
Customization of IP blocks in a multi-processor system-on-chip (MPSoC) is the historical approach to the cost-effective implementation of such systems. A recent trend consists of structuring an MPSoC into loosely coupled voltage and frequency islands to meet tight power budgets. In this context, synchronization between islands of synchronicity becomes a major design issue. Dual-clock FIFOs compare favorably with synchronizer-based designs and pausible clocking interfaces from a performance viewpoint, but incur a significant area, power and latency overhead. This paper proposes a library of dual-clock FIFOs for cost-effective MPSoC design, where each architecture variant in the library has been designed to match well-defined operating conditions at the minimum implementation cost. Each FIFO synchronizer is suitable for plug-and-play insertion into the NoC architecture, and selection depends on the performance requirements of the synchronization interface at hand. Above all, components of our synchronization library have not been conceived in isolation, but have been tightly co-designed with the switching fabric of the on-chip interconnection network, thus making conscious use of power-hungry buffering resources and leading to affordable implementations in the resource-constrained MPSoC domain.
Proceedings of the Fifth International Workshop on Interconnection Network Architecture | 2011
Daniele Ludovici; Alessandro Strano; Georgi Gaydadjiev; Davide Bertozzi
MPSoCs are today frequently designed as the composition of multiple voltage/frequency islands, thus calling for a GALS clocking style. In this context, the on-chip interconnection network can be either inferred as a single and independent clock domain or distributed among core domains. This paper targets the former scenario, since it results in a homogeneous speed of the NoC switching elements. From a physical design viewpoint, however, the main issues lie in the chip-wide extension of the network domain and in the growing uncertainties affecting nanoscale silicon technologies. This paper proves that by partitioning the network into mesochronous domains and merging synchronizers with NoC building blocks, two main advantages can be achieved. First, it is possible to evolve synchronous networks to mesochronous ones with marginal performance and area overhead. Second, the mesochronous NoC exposes more degrees of freedom for power optimization.
ACM Transactions on Embedded Computing Systems | 2013
Alberto Ghiribaldi; Daniele Ludovici; Francisco Triviño; Alessandro Strano; Jose Flich; José L. Sánchez; Francisco José Alfaro; Michele Favalli; Davide Bertozzi
Networks-on-chip need to survive manufacturing faults in order to sustain yield. An effective testing and configuration strategy, however, implies two opposite requirements. On the one hand, a fast and scalable built-in self-testing and self-diagnosis procedure has to be carried out concurrently at NoC switches. On the other hand, programming the NoC routing mechanism to go around faulty links and switches can be optimally performed by a centralized controller with global network visibility. To the best of our knowledge, this article proposes for the first time a global network testing and configuration strategy that meets the opposite requirements by means of a fault-tolerant dual network architecture and a fast configuration algorithm for the most common failure patterns. Experimental results report an area overhead as low as 12.5% with respect to the baseline switch architecture while achieving a high degree of fault tolerance. In fact, even when multiple stuck-at faults are considered, the capability of fault masking by the dual network is always over 80%, and the support for multiple link failures is more than 90% in the presence of two unusable links in the main network, with minimum set-up times.
Application Specific Systems Architectures and Processors | 2011
Alessandro Strano; Davide Bertozzi; Arnaud Grasset; Sami Yehia
Process scaling has given designers billions of transistors to work with. As feature sizes near the atomic scale, extensive variation and wear-out inevitably make margining uneconomical or impossible. In this context, new design approaches are required. The inherent regularity and redundancy of SIMD architectures make them suitable to address the challenges posed by new semiconductor technologies at the architecture level. This paper proposes a built-in self-test/self-diagnosis procedure for a SIMD processor. Concurrent BIST operations are carried out after reset at each PE, thus resulting in a test application time that scales with processor size. The key principle consists of exploiting the inherent structural redundancy of the SIMD architecture in a cooperative way, thus strongly reducing the testing framework latency and area overhead. Once the faults are detected, a reconfiguration technique is then proposed in order to preserve correct operation. Testing of single stuck-at faults is performed at-speed in 240 cycles regardless of the accelerator size, with a hardware overhead of less than 10%. Finally, the fault-tolerant tile integrating BIST, reconfiguration logic and a spare PE incurs a total area overhead of 25%.
Proceedings of the 2012 Interconnection Network Architecture on On-Chip, Multi-Chip Workshop | 2012
Simone Terenzi; Alessandro Strano; Davide Bertozzi
Most BIST architectures use pseudo-random test pattern generators. However, whenever this technique has been applied to on-chip interconnection networks, overly large testing latencies have been reported. On the other hand, alternative approaches suffer either from large area penalties (like scan-based testing or the use of deterministic test patterns) or from poor coverage of faults in the control path (functional testing). This paper presents the optimization of a built-in self-testing framework based on pseudo-random test patterns for the microarchitecture of network-on-chip switches. As a result, fault coverage and testing latency approach those achievable with deterministic test patterns while delivering significant area savings and enhanced flexibility.
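As a rough illustration of what a pseudo-random test pattern generator does, the sketch below models a Fibonacci linear-feedback shift register (LFSR) in software. The LFSR is a common choice for such generators in general; the abstract does not say which generator the paper uses, so the function name `lfsr_patterns`, the 4-bit width and the tap positions are all assumptions made for this example, not the authors' design.

```python
def lfsr_patterns(seed, taps, width, count):
    """Yield `count` patterns from a right-shifting Fibonacci LFSR.

    Hypothetical helper for illustration only.
    seed  : non-zero initial state that fits in `width` bits
    taps  : bit positions (0 = least significant) XORed to form the feedback
    width : register width in bits
    """
    if seed == 0 or seed >> width:
        raise ValueError("seed must be non-zero and fit in `width` bits")
    state = seed
    for _ in range(count):
        yield state
        # Feedback bit is the XOR of the tapped state bits.
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        # Shift right and insert the feedback bit at the top.
        state = (state >> 1) | (feedback << (width - 1))


# Assumed example: 4-bit register with taps at bits 0 and 1, which
# cycles through all 15 non-zero states before repeating.
patterns = list(lfsr_patterns(seed=1, taps=(0, 1), width=4, count=16))
print(patterns)
```

With these assumed taps the register visits every non-zero 4-bit value once per period, which is why a small amount of hardware can supply a long pattern sequence; the trade-off noted in the abstract is that many such patterns may be needed to reach a target fault coverage.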
2012 International Green Computing Conference (IGCC) | 2012
Alberto Ghiribaldi; Alessandro Strano; Michele Favalli; Davide Bertozzi
The increasingly parallel landscape of embedded computing platforms is bringing the reliability concern for the on-chip interconnection network (NoC) to the forefront. While very few works in the open literature bring their error recovery mechanisms down to microarchitectural and physical implementation, this paper documents the effort of optimizing a baseline NoC switch architecture for different fault-tolerant strategies against single-event upsets. As key contributions, we not only come up with a new efficient fault-tolerant flow control protocol, but we also contrast correction-oriented vs. retransmission-oriented switch microarchitectures, each implementing both data and control path protection, with physical implementation awareness. The accuracy of the analysis methodology enables us to report counterintuitive power-reliability trade-offs between the design points, serving as guidelines for implementing fault-tolerant communication in a power-constrained environment.
Network on Chip Architectures | 2011
Hervé Tatenguem; Daniele Ludovici; Alessandro Strano; Davide Bertozzi; Helmut Reinig
Fine-grained (per-core) multi-synchronous systems call for new clocking strategies and new architecture design techniques. This paper compares two fundamental multi-synchronous implementation variants based on the extensive use of dual-clock FIFOs vs. mesochronous synchronizers, respectively. The architecture-homogeneous experimental setting, the cost-effective merging of synchronizers with NoC switch buffers, the sharing of as many physical synthesis steps as possible between the two architectures and the requirements of a realistic full-HD video playback application are the key innovations of this study.