Alessandro Bardine | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Alessandro Bardine is active.

Explore More

Publication

Featured researches published by Alessandro Bardine.

memory performance dealing with applications systems and architecture | 2007

Analysis of static and dynamic energy consumption in NUCA caches: initial results

Alessandro Bardine; Pierfrancesco Foglia; Giacomo Gabrielli; Cosimo Antonio Prete

NUCA caches are large L2 on-chip cache memories characterized by multi-bank partitioning and designed to hide wire delay effects. They exhibit high hit rates while keeping access latency low. Proposed designs for such caches are Static NUCA, in which data are statically allocated to the cache banks, and Dynamic NUCA, in which data may reside in different banks, and a migration mechanism is introduced to better tolerate wire delay effects. The two architectures permit to achieve different performances by acting on architectural parameters and data management policies, at the cost of different balances between static and dynamic power consumption and energy dissipation. In this work, we propose preliminary results of the characterization of such balances, by presenting an evaluation of performance and energy consumption of conventional UCAs, and Static and Dynamic NUCA caches. All the considered caches architectures are equal sized and they are supposed to be used in an aggressive high frequency system running some applications from the SPEC CPU2000 and the NAS Parallel Benchmarks suites. The experimental results obtained indicate that, although the migration of data contributes to increase the dynamic energy consumption in Dynamic NUCA caches, the higher IPC achieved permits to save static energy, which dominates the power/energy balance in all the considered architectures. As a consequence, such results would designate NUCA caches as the most performing and energy saving architectures. Besides, according to the obtained results, future power improvements for NUCA caches should concentrate on static energy, while, for the dynamic energy, the on-chip network is the most critical element. Migration of data is acceptable, since it has a positive impact on performance, and the increased dynamic energy is overwhelmed by the static energy savings resulting from the shorter execution time. In order to give a general validity to such statements, we need to explore more design space points for each architecture (by varying the running clock rate and other design parameters) and to evaluate them considering a larger set of benchmarks.

Computer-aided Design | 2012

A real-time configurable NURBS interpolator with bounded acceleration, jerk and chord error

Massimiliano Annoni; Alessandro Bardine; Stefano Campanelli; Pierfrancesco Foglia; Cosimo Antonio Prete

Advances in manufacturing technologies and in machine tools allow for unprecedented quality and efficiency in production lines, but also ask for new and increasing requirements on the motion planning and control systems. The increase of CPU processing power has permitted, in traditional CNC systems, the introduction of NURBS interpolation capabilities, thus determining a further increase in machining quality and efficiency. This has posed new and still unsolved issues, such as the need to satisfy multiple opposite constraints like limiting chord error, acceleration and jerk and offering real-time guarantees. In addition, the ability of privileging the production throughput by relaxing one or more of the previous constraints in a simple way, has emerged as another requirement of modern manufacturing plants. Nevertheless, none of the existing NURBS interpolators have these characteristics. In this work, we propose a NURBS interpolator that is able to satisfy all the manufacturing technology requirements and is able to respect, thanks to its bounded computational complexity, the position control real-time constraints. Such an interpolator is easily reconfigurable, i.e., it can relax some of the constraints while maintaining performances better than previously proposed solutions, and can be adapted in order to include constraints that were not originally considered. Performances of the proposed algorithm have been evaluated both by simulations and by real milling experiments.

International Journal of High Performance Systems Architecture | 2010

Way adaptable D-NUCA caches

Alessandro Bardine; Manuel Comparetti; Pierfrancesco Foglia; Giacomo Gabrielli; Cosimo Antonio Prete

Non-uniform cache architecture (NUCA) aims to limit the wire-delay problem typical of large on-chip last level caches: by partitioning a large cache into several banks, with the latency of each one depending on its physical location and by employing a scalable on-chip network to interconnect the banks with the cache controller, the average access latency can be reduced with respect to a traditional cache. The addition of a migration mechanism to move the most frequently accessed data towards the cache controller (D-NUCA) further improves the average access latency. In this work we propose a last-level cache design, based on the D-NUCA scheme, which is able to significantly limit its static power consumption by dynamically adapting to the needs of the running application: the way adaptable D-NUCA cache. This design leads to a fast and power-efficient memory hierarchy with an average reduction by 31.2% in energy-delay product (EDP) with respect to a traditional D-NUCA. We propose and discuss a methodology for tuning the intrinsic parameters of our design and investigate the adoption of the way adaptable D-NUCA scheme as a shared L2 cache in a chip multiprocessor (CMP) system (24% reduction of EDP).

digital systems design | 2008

Leveraging Data Promotion for Low Power D-NUCA Caches

Alessandro Bardine; Manuel Comparetti; Pierfrancesco Foglia; Giacomo Gabrielli; Cosimo Antonio Prete; Per Stenström

D-NUCA caches are cache memories that, thanks to banked organization, broadcast search and promotion/demotion mechanism, are able to tolerate the increasing wire delay effects introduced by technology scaling. As a consequence, they will outperform conventional caches (UCA, Uniform CacheArchitectures) in future generation cores. Due to the promotion/ demotion mechanism, we observed that the distribution of hits across the ways of a D-NUCA cache varies across applications as well as across different execution phases within a single application. In this work, we show how such a behavior can be leveraged to improve the D-NUCA power efficiency as well as to decrease its access latency.In particular, we propose: 1) A new micro-architectural technique to reduce the static power consumption of a D-NUCA cache by dynamically adapting the number of active (i.e. powered-on) ways to the need of the running application; our evaluation shows that a strong reduction of the average number of active ways (37.1%) is achievable, without significantly affecting the IPC (-2.83%), leading to a resultant reduction of the Energy Delay Product (EDP) of 30.9%. 2) A strategy to estimate the characteristic parameters of the proposed technique. 3) An evaluation of the effectiveness of the proposed technique in the multicore environment.

design, automation, and test in europe | 2009

A power-efficient migration mechanism for D-NUCA caches

Alessandro Bardine; Manuel Comparetti; Pierfrancesco Foglia; Giacomo Gabrielli; Cosimo Antonio Prete

D-NUCA L2 caches are able to tolerate the increasing wire delay effects due to technology scaling thanks to their banked organization, broadcast line search and data promotion/demotion mechanism. Data promotion mechanism aims at moving frequently accessed data near the core, but causes additional accesses on cache banks, hence increasing dynamic energy consumption. We shown how, in some cases, this migration mechanism is not successful in reducing data access latency and can be selectively and dynamically inhibited, thus reducing dynamic energy consumption without affecting performances.

IEEE Transactions on Very Large Scale Integration Systems | 2014

Evaluation of Leakage Reduction Alternatives for Deep Submicron Dynamic Nonuniform Cache Architecture Caches

Alessandro Bardine; Manuel Comparetti; Pierfrancesco Foglia; Cosimo Antonio Prete

Wire delays and leakage energy consumption are both growing problems in designing large on-chip caches. Nonuniform cache architecture (NUCA) is a wire-delay aware design paradigm based on the sub-banking of a cache, which allows the banks closer to the controller to be accessed with reduced latencies with respect to the other banks. This feature is leveraged by dynamic NUCA (D-NUCA) caches via a migration mechanism which speeds up frequently used data access, further reducing the effect wire delays have on performance. To reduce leakage power consumption of static random access memory caches, various micro-architectural techniques have been proposed. In this brief, we compare the benefits and limits of the application of some of these techniques to a D-NUCA cache memory, and propose a novel hybrid scheme based on the Drowsy and Way Adaptable techniques. Such a scheme allows further improvement in leakage reduction and limits the impact of process variation on the effectiveness of the Drowsy technique.

ACM Sigarch Computer Architecture News | 2007

Improving power efficiency of D-NUCA caches

Alessandro Bardine; Pierfrancesco Foglia; Giacomo Gabrielli; Cosimo Antonio Prete; Per Stenström

D-NUCA caches are cache memories that, thanks to banked organization, broadcast search and promotion/demotion mechanism, are able to tolerate the increasing wire delay effects introduced by technology scaling. As a consequence, they will outperform conventional caches (UCA, Uniform Cache Architectures) in future generation cores. Due to the promotion/demotion mechanism, we have found that, in a D-NUCA cache, the distribution of hits on the ways varies across applications as well as across different execution phases within a single application. In this paper, we show how such a behavior can be utilized to improve D-NUCA power efficiency as well as to decrease its access latencies. In particular, we propose a new D-NUCA structure, called Way Adaptable D-NUCA cache, in which the number of active (i.e. powered-on) ways is dynamically adapted to the need of the running application. Our initial evaluation shows that a consistent reduction of both the average number of active ways (42% in average) and the number of bank access requests (29% in average) is achieved, without significantly affecting the IPC.

Iet Computers and Digital Techniques | 2009

Impact of on-chip network parameters on nuca cache performances

Alessandro Bardine; Manuel Comparetti; Pierfrancesco Foglia; Giacomo Gabrielli; Cosimo Antonio Prete

Non-uniform cache architectures (NUCAs) are a novel design paradigm for large last-level on-chip caches, which have been introduced to deliver low access latencies in wire-delay-dominated environments. Their structure is partitioned into sub-banks and the resulting access latency is a function of the physical position of the requested data. Typically, NUCA caches employ a switched network, made up of links and routers with buffered queues, to connect the different sub-banks and the cache controller, and the characteristics of the network elements may affect the performance of the entire system. This work analyses how different parameters for the network routers, namely cut-through latency and buffering capacity, affect the overall performance of NUCA-based systems for the single processor case, assuming a reference NUCA organisation proposed in literature. The entire analysis is performed utilising a cycle-accurate execution-driven simulator of the entire system and real workloads. The results indicate that the sensitivity of the system to the cut-through latency is very high, thus limiting the effectiveness of the NUCA solution, and that modest buffering capacity is sufficient to achieve a good performance level. As a consequence, in this work we propose an alternative clustered NUCA organisation that limits the average number of hops experienced by cache accesses. This organisation is better performing and scales better as the cut-through latency increases, thus simplifying the implementation of routers, and it is also more effective than another latency reduction solution proposed in literature (hybrid network).

international conference on ultra modern telecommunications | 2010

NURBS interpolator with confined chord error and tangential and centripetal acceleration control

Alessandro Bardine; Stefano Campanelli; Pierfrancesco Foglia; Cosimo Antonio Prete

NURBS interpolation is highly requested in CNC systems because it allows high speed and high accuracy machining. In this work, an algorithm for NURBS interpolation capable of limiting chord error, centripetal acceleration and tangential acceleration is proposed. The algorithm is composed of two stages that may be executed simultaneously. In the first stage, the algorithm breaks down the curve into segments and, for each segment, calculates the feedrate limit that allows to respect both chord error tolerance and maximum centripetal acceleration limit. The second stage is a speed-controlled interpolator with a tangential acceleration limited feedrate profile generated using information provided by first stage. Software simulations are performed to verify the fulfillment of constraints and performance is compared to those of speed-controlled interpolator and variable feedrate interpolator.

digital systems design | 2011

Energy Behaviour of NUCA Caches in CMPs

Alessandro Bardine; Pierfrancesco Foglia; Francesco Panicucci; Marco Solinas; Julio Sahuquillo

Advances in technology of semiconductor make nowadays possible to design Chip Multiprocessor Systems equipped with huge on-chip Last Level Caches. Due to the wire delay problem, the use of traditional cache memories with a uniform access time would result in unacceptable response latencies. NUCA (Non Uniform Cache Access) architecture has been proposed as a viable solution to hide the adverse impact of wires delay on performance. Many previous studies have focused on the effectiveness of NUCA architectures, but the study of the energy and power aspects of NUCA caches is still limited. In this work, we present an energy model specifically suited for NUCA-based CMP systems, together with a methodology to employ the model to evaluate the NUCA energy consumption. Moreover, we present a performance and energy dissipation analysis for two 8-core CMP systems with an S-NUCA and a D-NUCA, respectively. Experimental results show that, similarly to the monolithic processor, the static power also dominates the total power budget in the CMP system.

Explore More