Simone Medardoni
University of Ferrara
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Simone Medardoni.
Iet Computers and Digital Techniques | 2009
Samuel Rodrigo; Simone Medardoni; Jose Flich; Davide Bertozzi; José Duato
The design of NoCs for multi-core chips introduces new design constraints like power consumption, area, and ultra low latencies. Although 2D meshes are preferred, heterogeneous blocks, fabrication faults, reliability issues, and chip virtualization may lead to the need of irregular topologies or regions. In this situation, efficient routing becomes a challenge. Although the use of routing tables at switches is flexible, it does not scale in terms of latency and area due to its memory requirements. LBDR (logic-based distributed routing) is proposed as a new routing method that removes the need of using routing tables at all. LBDR enables the implementation of many routing algorithms on most of the practical topologies we might find in the near future in a multi-core system. From an initial topology and routing algorithm, a set of three bits per switch/output port is computed. Evaluation results show that, by using a small logic, LBDR mimics the performance of routing algorithms when implemented with routing tables, both in regular and irregular topologies.
networks on chips | 2010
Samuel Rodrigo; Jose Flich; Antoni Roca; Simone Medardoni; Davide Bertozzi; Jesus Camacho; Federico Silla; José Duato
The high-performance computing domain is enriching with the inclusion of Networks-on-chip (NoCs) as a key component of many-core (CMPs or MPSoCs) architectures. NoCs face the communication scalability challenge while meeting tight power, area and latency constraints. Designers must address new challenges that were not present before. Defective components, the enhancement of application-level parallelism or power-aware techniques may break topology regularity, thus, efficient routing becomes a challenge.In this paper, uLBDR (Universal Logic-Based Distributed Routing) is proposed as an efficient logic-based mechanism that adapts to any irregular topology derived from 2D meshes, being an alternative to the use of routing tables (either at routers or at end-nodes). uLBDR requires a small set of configuration bits, thus being more practical than large routing tables implemented in memories. Several implementations of uLBDR are presented highlighting the trade-off between routing cost and coverage. The alternatives span from the previously proposed LBDR approach (with 30\% of coverage) to the uLBDR mechanism achieving full coverage. This comes with a small performance cost, thus exhibiting the trade-off between fault tolerance and performance.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2011
Samuel Rodrigo; Jose Flich; Antoni Roca; Simone Medardoni; Davide Bertozzi; Jesus Camacho; Federico Silla; José Duato
The high-performance computing domain is enriching with the inclusion of networks-on-chip (NoCs) as a key component of many-core (CMPs or MPSoCs) architectures. NoCs face the communication scalability challenge while meeting tight power, area, and latency constraints. Designers must address new challenges that were not present before. Defective components, the enhancement of application-level parallelism, or power-aware techniques may break topology regularity, thus, efficient routing becomes a challenge. This paper presents universal logic-based distributed routing (uLBDR), an efficient logic-based mechanism that adapts to any irregular topology derived from 2-D meshes, instead of using routing tables. uLBDR requires a small set of configuration bits, thus being more practical than large routing tables implemented in memories. Several implementations of uLBDR are presented highlighting the tradeoff between routing cost and coverage. The alternatives span from the previously proposed LBDR approach (with 30% of coverage) to the uLBDR mechanism achieving full coverage. This comes with a small performance cost, thus exhibiting the tradeoff between fault tolerance and performance. Power consumption, area, and delay estimates are also provided highlighting the efficiency of the mechanism. To do this, different router models (one for CMPs and one for MPSoCs) have been designed as a proof concept.
design, automation, and test in europe | 2009
Daniele Ludovici; F. Gilabert; Simone Medardoni; Crispín Gómez; María Engracia Gómez; Pedro López; Georgi Gaydadjiev; Davide Bertozzi
Most of past evaluations of fat-trees for on-chip interconnection networks rely on oversimplifying or even irrealistic architecture and traffic pattern assumptions, and very few layout analyses are available to relieve practical feasibility concerns in nanoscale technologies. This work aims at providing an in-depth assessment of physical synthesis efficiency of fat-trees and at extrapolating silicon-aware performance figures to back-annotate in the system-level performance analysis. A 2D mesh is used as a reference architecture for comparison, and a 65 nm technology is targeted by our study. Finally, in an attempt to mitigate the implementation cost of k-ary n-tree topologies, we also review an alternative unidirectional multi-stage interconnection network which is able to simplify the fat-tree architecture and to minimally impact performance.
networks on chips | 2010
F. Gilabert; María Engracia Gómez; Simone Medardoni; Davide Bertozzi
Virtual channels are an appealing flow control technique for on-chip interconnection networks (NoCs), in that they can potentially avoid deadlock and improve link utilization and network throughput. However, their use in the resource constrained multi-processor system-on-chip (MPSoC) domain is still controversial, due to their significant overhead in terms of area, power and cycle time degradation. This paper proposes a simple yet efficient approach to VC implementation, which results in more area- and power-saving solutions than conventional design techniques. While these latter replicate only buffering resources for each physical link, we replicate the entire switch and prove that our solution is counter intuitively more area/power efficient while potentially operating at higher speeds. This result builds on a well-known principle of logic synthesis for combinational circuits (the area-performance trade-off when inferring a logic function into a gate-level netlist), and proves that when a designer is aware of this, novel architecture design techniques can be conceived.
networks on chips | 2008
F. Gilabert; Simone Medardoni; Davide Bertozzi; Luca Benini; María Engracia Gómez; Pedro López; José Duato
Networks-on-chip (NoCs) address the challenge to provide scalable communication bandwidth to tiled architectures in a power-efficient fashion. The 2-D mesh is currently the most popular regular topology used for on-chip networks in tile-based architectures, because it perfectly matches the 2-D silicon surface and is easy to implement. However, a number of limitations have been proved in the open literature, especially for long distance traffic. Two relevant variants of 2-D meshes are explored in this paper: high-dimensional and concentrated topologies. The novelty of our exploration framework includes the use of fast and accurate transaction level simulation to provide constraints to the physical synthesis flow, which is integrated with standard industrial toolchains for accurate physical implementation. Interestingly, this work illustrates how effectively the compared topologies can handle synchronization-intensive traffic patterns and accounts for chip I/O interfaces.
international symposium on system-on-chip | 2009
Tor Skeie; Frank Olaf Sem-Jacobsen; Samuel Rodrigo; Jose Flich; Davide Bertozzi; Simone Medardoni
The expected increase in number of cores on a single chip leads to the necessity of high-performance on chip interconnects (NoC). Furthermore, in order to fully utilize the abundance of cores, the chip is expected to support a number of applications running on the chip simultaneously. It is therefore necessary to partition the chip to support numerous applications without any risk of interference between them. The success of this depends on the flexibility of the underlying routing algorithm. This paper presents a flexible routing algorithm based on dimension ordered routing, which supports a large variety of irregular (2-D and 3-D) mesh topologies. The algorithm provides high efficiency at very low additional complexity, as is confirmed by experimental results.
complex, intelligent and software intensive systems | 2009
F. Gilabert; Daniele Ludovici; Simone Medardoni; Davide Bertozzi; L. Benini; Georgi Gaydadjiev
Regular multi-core processors are appearing in the embedded system market as high performance software programmable solutions. The use of regular interconnect fabrics for them allows fast design time, ease of routing, predictability of electrical parameters and good scalability. k-ary n-mesh topologies are candidate solutions for these systems, borrowed from the domain of off-chip interconnection networks. However, the on-chip integration has to deal with unique challenges at different levels of abstraction. From a technology viewpoint, interconnect reverse scaling causes critical paths to go across global links. Poor interconnect performance might also impact IP core speed depending on the synchronization mechanism at the interface. Finally, this might also conflict with the requirements that communication libraries employed in the MPSoC domain pose on the underlying interconnect fabric. This paper provides a comprehensive overview of these topics, by characterizing physical feasibility of representative k-ary n-mesh topologies and by providing silicon-aware system-level performance figures.
digital systems design | 2008
Alberto Ferrante; Simone Medardoni; Davide Bertozzi
Although preliminary analysis frameworks point out the performance speed-ups achievable by on-chip networks with respect to state-of-the-art interconnects, the area concern remains one of the most daunting challenges to make this interconnect technology mainstream. A common approach to relieve the problem consists of sharing most of network interface resources among a number of processor cores. However, buffering resources need to be replicated and control logic reaches a complexity that limits maximum achievable frequency. This paper proposes full sharing of network interface resources, including buffers, thus trading performance for area. While area improvements are significant, a number of physical and system-level effects might mitigate performance degradation, making our technique a promising solution for area efficient network-on-chip realizations across a range of operating conditions.
international symposium on system-on-chip | 2009
Samuel Rodrigo; Carles Hernandez; Jose Flich; Federico Silla; José Duato; Simone Medardoni; Davide Bertozzi; Andres Mejia; D. Dai
Network-on-Chip technology is gaining wide popularity for the interconnection of an increasing number of processor cores on the same silicon die. However, growing process variations cause interconnect malfunction or prevent the network from working at the intended frequency, directly impacting yield and manufacturing cost. Topology agnostic routing algorithms have the potential to tolerate process variations without degrading performance. We propose a three step methodology for evaluating routing algorithms in their ability to deal with variability. Using yield enhancement and operation speed preservation as the criteria, we demonstrate how this methodology can be used to select the best design choice among several plausible combinations of routing algorithms and implementations. Also, we show how an efficient table-less routing implementation can be used to minimise the impact of variability on manufacturing and operating frequency.