
Publication


Featured research published by Manuel Comparetti.


International Journal of High Performance Systems Architecture | 2010

Way adaptable D-NUCA caches

Alessandro Bardine; Manuel Comparetti; Pierfrancesco Foglia; Giacomo Gabrielli; Cosimo Antonio Prete

Non-uniform cache architecture (NUCA) aims to limit the wire-delay problem typical of large on-chip last-level caches: by partitioning a large cache into several banks, with the latency of each bank depending on its physical location, and by employing a scalable on-chip network to interconnect the banks with the cache controller, the average access latency can be reduced with respect to a traditional cache. The addition of a migration mechanism that moves the most frequently accessed data towards the cache controller (D-NUCA) further improves the average access latency. In this work, we propose a last-level cache design based on the D-NUCA scheme, the way adaptable D-NUCA cache, which significantly limits static power consumption by dynamically adapting to the needs of the running application. This design leads to a fast and power-efficient memory hierarchy, with an average reduction of 31.2% in energy-delay product (EDP) with respect to a traditional D-NUCA. We propose and discuss a methodology for tuning the intrinsic parameters of our design, and we investigate the adoption of the way adaptable D-NUCA scheme as a shared L2 cache in a chip multiprocessor (CMP) system, achieving a 24% reduction in EDP.
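The adaptation loop at the heart of this design can be pictured with a short sketch. The following is a minimal Python model, not the authors' implementation: the interval length and the two thresholds (t_off, t_on) are illustrative assumptions, and the cache is reduced to per-way hit counters.

```python
# Minimal sketch of a way-adaptable D-NUCA controller (illustrative only).
# Assumption: promotion keeps hot lines in the ways nearest the controller,
# so hits in the farthest active way signal whether that way is still useful.

class WayAdaptableController:
    def __init__(self, num_ways, interval=100_000, t_off=0.02, t_on=0.05):
        self.num_ways = num_ways          # total ways in the cache
        self.active_ways = num_ways       # ways currently powered on
        self.interval = interval          # accesses between adaptation decisions
        self.hits_per_way = [0] * num_ways
        self.accesses = 0
        self.t_off = t_off                # hypothetical power-off threshold
        self.t_on = t_on                  # hypothetical power-on threshold

    def record_access(self, hit_way):
        """hit_way: index of the way that hit, or None on a miss."""
        self.accesses += 1
        if hit_way is not None:
            self.hits_per_way[hit_way] += 1
        if self.accesses % self.interval == 0:
            self._adapt()

    def _adapt(self):
        total_hits = sum(self.hits_per_way[:self.active_ways]) or 1
        boundary = self.active_ways - 1   # farthest powered-on way
        boundary_frac = self.hits_per_way[boundary] / total_hits
        if boundary_frac < self.t_off and self.active_ways > 1:
            self.active_ways -= 1         # few hits far away: power a way down
        elif boundary_frac > self.t_on and self.active_ways < self.num_ways:
            self.active_ways += 1         # pressure at the boundary: power up
        self.hits_per_way = [0] * self.num_ways
```

The design point carried over from the abstract is that promotion concentrates hot data in the ways nearest the controller, so the hit fraction observed in the farthest powered-on way is a cheap signal for whether that way is earning its leakage.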


Digital Systems Design | 2008

Leveraging Data Promotion for Low Power D-NUCA Caches

Alessandro Bardine; Manuel Comparetti; Pierfrancesco Foglia; Giacomo Gabrielli; Cosimo Antonio Prete; Per Stenström

D-NUCA caches are cache memories that, thanks to their banked organization, broadcast search, and promotion/demotion mechanism, are able to tolerate the increasing wire-delay effects introduced by technology scaling. As a consequence, they will outperform conventional caches (UCA, Uniform Cache Architectures) in future-generation cores. Due to the promotion/demotion mechanism, we observed that the distribution of hits across the ways of a D-NUCA cache varies across applications as well as across different execution phases within a single application. In this work, we show how such behavior can be leveraged to improve D-NUCA power efficiency and to decrease its access latency. In particular, we propose: 1) a new micro-architectural technique that reduces the static power consumption of a D-NUCA cache by dynamically adapting the number of active (i.e., powered-on) ways to the needs of the running application; our evaluation shows that a strong reduction in the average number of active ways (37.1%) is achievable without significantly affecting the IPC (-2.83%), leading to a 30.9% reduction in the Energy Delay Product (EDP); 2) a strategy to estimate the characteristic parameters of the proposed technique; 3) an evaluation of the effectiveness of the proposed technique in the multicore environment.
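Point 2 above, estimating the technique's characteristic parameters, can be illustrated as picking the smallest number of active ways that still captures most of the hits seen in a profiling window. The sketch below is an assumption for illustration, not the paper's actual estimation strategy; the 95% coverage target and the example distribution are invented.

```python
# Illustrative sketch: pick the number of active ways that covers a target
# fraction of the hits observed in a profiling window (target is invented).

def ways_for_coverage(hits_per_way, target=0.95):
    """hits_per_way[i] = hits observed in way i, with way 0 nearest the
    controller, where promotion concentrates hot data."""
    total = sum(hits_per_way)
    if total == 0:
        return len(hits_per_way)          # no information: keep all ways on
    cumulative = 0
    for k, hits in enumerate(hits_per_way, start=1):
        cumulative += hits
        if cumulative / total >= target:
            return k                      # k ways already capture the target
    return len(hits_per_way)

# Example: a promotion-skewed hit distribution over 8 ways.
print(ways_for_coverage([5200, 2100, 900, 400, 150, 80, 40, 10]))  # -> 4
```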


Design, Automation, and Test in Europe | 2009

A power-efficient migration mechanism for D-NUCA caches

Alessandro Bardine; Manuel Comparetti; Pierfrancesco Foglia; Giacomo Gabrielli; Cosimo Antonio Prete

D-NUCA L2 caches are able to tolerate the increasing wire-delay effects due to technology scaling thanks to their banked organization, broadcast line search, and data promotion/demotion mechanism. The data promotion mechanism aims at moving frequently accessed data near the core, but it causes additional accesses on cache banks, hence increasing dynamic energy consumption. We show how, in some cases, this migration mechanism is not successful in reducing data access latency and can be selectively and dynamically inhibited, thus reducing dynamic energy consumption without affecting performance.
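The selective, dynamic inhibition described here can be sketched as a feedback gate: if recently promoted lines are rarely hit again after moving, promotion is spending bank accesses without shortening later accesses, so it is switched off until the next probe. The window size and payback threshold below are invented example values, not the paper's tuned parameters.

```python
# Illustrative feedback gate for inhibiting D-NUCA promotion
# (window size and payback threshold are assumptions for the example).

class MigrationGate:
    def __init__(self, window=10_000, min_payback=0.25):
        self.window = window            # events per evaluation window
        self.min_payback = min_payback  # fraction of promotions that must pay back
        self.promotions = 0
        self.paybacks = 0               # promoted lines hit again after the move
        self.idle = 0                   # accesses seen while inhibited
        self.enabled = True

    def should_promote(self):
        return self.enabled

    def on_access(self):
        if not self.enabled:
            self.idle += 1
            if self.idle >= self.window:   # probe again: the phase may have changed
                self.enabled = True
                self.idle = 0

    def on_promotion(self):
        self.promotions += 1
        if self.promotions >= self.window:
            # Inhibit when promotions rarely shorten later accesses.
            self.enabled = self.paybacks / self.promotions >= self.min_payback
            self.promotions = self.paybacks = 0

    def on_promoted_line_rehit(self):
        self.paybacks += 1
```

The periodic re-enable is what makes the inhibition dynamic: a gate that only ever turns promotion off could never react to a later phase in which migration pays off again.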


IEEE Transactions on Very Large Scale Integration Systems | 2014

Evaluation of Leakage Reduction Alternatives for Deep Submicron Dynamic Nonuniform Cache Architecture Caches

Alessandro Bardine; Manuel Comparetti; Pierfrancesco Foglia; Cosimo Antonio Prete

Wire delays and leakage energy consumption are both growing problems in the design of large on-chip caches. Nonuniform cache architecture (NUCA) is a wire-delay-aware design paradigm based on the sub-banking of a cache, which allows the banks closer to the controller to be accessed with lower latencies than the other banks. Dynamic NUCA (D-NUCA) caches leverage this feature via a migration mechanism that speeds up access to frequently used data, further reducing the effect wire delays have on performance. Various micro-architectural techniques have been proposed to reduce the leakage power consumption of static random access memory caches. In this brief, we compare the benefits and limits of applying some of these techniques to a D-NUCA cache memory, and we propose a novel hybrid scheme based on the Drowsy and Way Adaptable techniques. This scheme allows further improvement in leakage reduction and limits the impact of process variation on the effectiveness of the Drowsy technique.
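One way to picture the hybrid scheme: way adaptation gates entire ways off, removing their leakage completely, while lines in the ways left active drift into a state-preserving drowsy mode after an idle window and pay a small wake-up penalty on the next access. The sketch below is a hedged illustration of that combination; the idle window and wake-up cost are made-up numbers, not values from the brief.

```python
# Hedged sketch of a Drowsy + Way Adaptable hybrid (numbers invented).
# Ways beyond the adapted count are power-gated: no state, no leakage.
# Lines in active ways fall to a drowsy (state-preserving, low-leakage)
# voltage after an idle window; waking one adds latency to the access.

DROWSY_WINDOW = 4000   # idle cycles before a line goes drowsy (assumption)
WAKEUP_CYCLES = 2      # penalty to wake a drowsy line (assumption)

class HybridLine:
    def __init__(self):
        self.last_access = 0

    def access(self, now):
        # A line left idle past the window has dropped to drowsy voltage
        # and must be woken before it can be read.
        penalty = WAKEUP_CYCLES if now - self.last_access > DROWSY_WINDOW else 0
        self.last_access = now
        return penalty

class HybridWay:
    """One cache way: either power-gated off (no lines, no leakage),
    or active and holding drowsy-capable lines."""
    def __init__(self, num_sets):
        self.powered = True            # way adaptation may gate this off
        self.lines = [HybridLine() for _ in range(num_sets)]

    def access(self, set_index, now):
        assert self.powered, "way adaptation never searches gated-off ways"
        return self.lines[set_index].access(now)
```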


Iet Computers and Digital Techniques | 2009

Impact of on-chip network parameters on NUCA cache performances

Alessandro Bardine; Manuel Comparetti; Pierfrancesco Foglia; Giacomo Gabrielli; Cosimo Antonio Prete

Non-uniform cache architectures (NUCAs) are a novel design paradigm for large last-level on-chip caches, introduced to deliver low access latencies in wire-delay-dominated environments. Their structure is partitioned into sub-banks, and the resulting access latency is a function of the physical position of the requested data. Typically, NUCA caches employ a switched network, made up of links and routers with buffered queues, to connect the different sub-banks and the cache controller, and the characteristics of the network elements may affect the performance of the entire system. This work analyses how different parameters of the network routers, namely cut-through latency and buffering capacity, affect the overall performance of NUCA-based systems in the single-processor case, assuming a reference NUCA organisation proposed in the literature. The entire analysis is performed utilising a cycle-accurate, execution-driven simulator of the whole system and real workloads. The results indicate that the sensitivity of the system to the cut-through latency is very high, limiting the effectiveness of the NUCA solution, and that a modest buffering capacity is sufficient to achieve a good performance level. As a consequence, we propose an alternative clustered NUCA organisation that limits the average number of hops experienced by cache accesses. This organisation performs better and scales better as the cut-through latency increases, thus simplifying the implementation of the routers, and it is also more effective than another latency-reduction solution proposed in the literature (hybrid network).
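The clustering argument can be made concrete with a back-of-the-envelope hop-count model. The sketch below assumes a 2D mesh with the cache controller at the middle of one edge and Manhattan-distance routing; the grid sizes and the four-banks-per-router cluster are illustrative choices, not the organisations evaluated in the paper.

```python
# Back-of-the-envelope average hop counts for a meshed NUCA
# (geometry and cluster shape are illustrative assumptions).

def avg_hops(rows, cols):
    """Mean Manhattan hops from a controller at the middle of the bottom
    edge to every node of a rows x cols mesh (+1 hop to enter the mesh)."""
    ctrl_col = cols // 2
    total = sum(r + abs(c - ctrl_col) + 1
                for r in range(rows) for c in range(cols))
    return total / (rows * cols)

# 128 banks: a flat 8x16 mesh with one bank per router, versus clusters
# of four banks sharing one router on a 4x8 mesh.
print(f"flat:      {avg_hops(8, 16):.2f} hops")   # -> 8.50
print(f"clustered: {avg_hops(4, 8):.2f} hops")    # -> 4.50
```

Halving each mesh dimension by grouping four banks behind one router roughly halves the average hop count, which is the effect the clustered organisation exploits.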


Symposium on Computer Architecture and High Performance Computing | 2008

Performance Sensitivity of NUCA Caches to On-Chip Network Parameters

Alessandro Bardine; Manuel Comparetti; Pierfrancesco Foglia; Giacomo Gabrielli; Cosimo Antonio Prete

Non-uniform cache architectures (NUCA) are a novel design paradigm for large last-level on-chip caches, introduced to deliver low access latencies in wire-delay-dominated environments. Their structure is partitioned into sub-banks, and the resulting access latency is a function of the physical position of the requested data. Typically, to connect the different sub-banks and the cache controller, NUCA caches employ a switched network made up of links and routers with buffered queues; the characteristics of this network may affect the performance of the entire system. This work analyzes how different router parameters, namely cut-through latency and buffering capacity, affect the overall performance of NUCA-based systems in the single-processor case, assuming a reference organization proposed in the literature. The results indicate that the sensitivity of the system to the cut-through latency is very high and that a limited buffering capacity is sufficient to achieve a good performance level. As a consequence, we propose an alternative NUCA organization that limits the average number of hops experienced by cache accesses. This organization performs better in most cases and scales better as the cut-through latency increases, thus simplifying the implementation of the routers.
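The strong sensitivity to cut-through latency follows from a first-order latency model: each access crosses the average number of routers twice (request and reply), so the cut-through latency is multiplied by the hop count rather than added once. All numbers below are assumptions chosen only to make that scaling visible, not measurements from the paper.

```python
# First-order NUCA access-latency model (all numbers are assumptions,
# chosen only to show how cut-through latency scales with hop count).

BANK_LATENCY = 4   # cycles to read a bank (assumed)
LINK_LATENCY = 1   # cycles per link traversal (assumed)
AVG_HOPS = 8.5     # e.g. the flat-mesh figure from the earlier sketch

for cut_through in (1, 2, 4, 8):
    # Request and reply each traverse AVG_HOPS routers and links.
    network = 2 * AVG_HOPS * (LINK_LATENCY + cut_through)
    print(f"cut-through={cut_through}: ~{BANK_LATENCY + network:.0f} cycles per access")
```

Doubling the per-router cut-through latency nearly doubles the total access latency in this model, which is why reducing hops (as in the clustered organization) also makes the design less sensitive to router implementation details.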


The Journal of Supercomputing | 2014

A workload independent energy reduction strategy for D-NUCA caches

Pierfrancesco Foglia; Manuel Comparetti


ACACES 2008 | 2008

NUCA caches: Analysis of Performance Sensitivity to NOC Parameters

Alessandro Bardine; Manuel Comparetti; Pierfrancesco Foglia; Giacomo Gabrielli; Cosimo Antonio Prete


11th EUROMICRO Conference on Digital System Design | 2008

On-Chip Networks: Impact on the Performances of NUCA Caches

Alessandro Bardine; Manuel Comparetti; Pierfrancesco Foglia; Giacomo Gabrielli; Cosimo Antonio Prete


ACACES 2008 | 2008

Dynamic Optimization Techniques for D-NUCA Caches

Alessandro Bardine; Manuel Comparetti; Pierfrancesco Foglia; Giacomo Gabrielli; Cosimo Antonio Prete

Collaboration


Dive into Manuel Comparetti's collaboration.

Top Co-Authors

Per Stenström

Chalmers University of Technology
