Dan Nicolaescu
University of California, Irvine
Publications
Featured research published by Dan Nicolaescu.
Design, Automation, and Test in Europe | 2003
Dan Nicolaescu; Alexander V. Veidenbaum; Alexandru Nicolau
Modern embedded processors use data caches with increasingly high degrees of associativity in order to increase performance. A set-associative data cache consumes a significant fraction of the total power budget in such embedded processors. This paper describes a technique for reducing D-cache power consumption and shows its impact on the power and performance of an embedded processor. The technique uses cache line address locality to determine (rather than predict) the cache way prior to the cache access, allowing only the desired way to be accessed for both tags and data. The proposed mechanism is shown to reduce the average L1 data cache power consumption when running the MiBench embedded benchmark suite for 8-, 16- and 32-way set-associative caches by an average of 66%, 72% and 76%, respectively. The absolute power savings from this technique increase significantly with associativity. The design has no impact on performance and, because it incurs no misprediction penalties, it introduces no new non-deterministic behavior into program execution.
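The key idea above can be illustrated with a small simulation. The following is a hedged sketch (class and parameter names are ours, not the paper's): a set-associative cache model with a small table mapping recently seen line addresses to the exact way that holds them, so a repeated access probes one way instead of all ways. Counting ways probed stands in for dynamic tag/data-array energy.

```python
from collections import OrderedDict

class WayDeterminedCache:
    """Illustrative model: way *determination* via line address locality."""
    def __init__(self, sets=64, ways=8, line=32, table_size=16):
        self.sets, self.ways, self.line = sets, ways, line
        self.tags = [[None] * ways for _ in range(sets)]   # line addr per way
        self.lru = [list(range(ways)) for _ in range(sets)]  # front = victim
        self.way_table = OrderedDict()  # line address -> way (always correct)
        self.table_size = table_size
        self.ways_probed = 0            # proxy for dynamic energy

    def access(self, addr):
        line_addr = addr // self.line
        s = line_addr % self.sets
        if line_addr in self.way_table:
            w = self.way_table[line_addr]
            self.ways_probed += 1          # only the determined way is probed
            assert self.tags[s][w] == line_addr  # determined, not predicted
        else:
            self.ways_probed += self.ways  # conventional all-ways probe
            if line_addr in self.tags[s]:
                w = self.tags[s].index(line_addr)
            else:                          # miss: fill the LRU way
                w = self.lru[s][0]
                evicted = self.tags[s][w]
                if evicted in self.way_table:
                    del self.way_table[evicted]  # keep the table exact
                self.tags[s][w] = line_addr
            self.way_table[line_addr] = w
            if len(self.way_table) > self.table_size:
                self.way_table.popitem(last=False)
        self.lru[s].remove(w)              # mark w most recently used
        self.lru[s].append(w)
```

Because entries are removed when their line is evicted, a table hit is a guarantee, not a guess; the inner `assert` encodes the "determine rather than predict" property.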
International Conference on Computer Design | 2006
Dan Nicolaescu; Babak Salamat; Alexander V. Veidenbaum; Mateo Valero
L1 data caches in high-performance processors continue to grow in set associativity. Higher associativity can significantly increase cache energy consumption, and cache access latency can be affected as well, leading to an increase in overall energy consumption due to increased execution time. At the same time, the static energy consumption of the cache increases significantly with each new process generation. This paper proposes a new approach to reducing overall L1 cache energy consumption using a combination of way caching and fast, speculative address generation. A 16-entry way cache storing a 3-bit way number for recently accessed L1 data cache lines is shown to be sufficient to significantly reduce both static and dynamic energy consumption of the L1 cache. Fast speculative address generation helps to hide the way cache access latency and is highly accurate. The L1 cache energy-delay product is reduced by 10% compared to using the way cache alone and by 37% compared to the multiple-MRU technique.
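The "fast, speculative address generation" ingredient can be sketched as follows (a simplified model of ours, with invented widths): the bits that index the way cache are taken from a cheap narrow add of the low bits of base and offset, ignoring the carry-out, so the lookup can start before the full-width add completes; a later comparison with the full sum detects the rare misspeculation.

```python
def speculative_line_addr(base, offset, line_bits=5, low_bits=16):
    """Return (speculative line addr, true line addr, speculation correct?)."""
    mask = (1 << low_bits) - 1
    low = (base & mask) + (offset & mask)   # narrow, fast add
    spec = (base & ~mask) | (low & mask)    # drop the carry-out of the low bits
    full = base + offset                    # full-width add, available later
    return spec >> line_bits, full >> line_bits, spec == full
```

The scheme is accurate because load/store offsets are typically small, so a carry out of the low bits (which would change the upper address bits) is rare.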
International Conference on Computer Design | 2004
Alexander V. Veidenbaum; Dan Nicolaescu
Many embedded processors use highly associative data caches implemented with a CAM-based tag search. When high associativity is desirable, CAM designs can offer performance advantages due to their fast associative search; however, CAMs are not energy efficient. This paper describes a CAM-based cache design that uses prediction to reduce energy consumption. A last-used prediction is shown to achieve 86% prediction accuracy on average. A new design integrating such a predictor into the CAM tag store is described. A 30% average D-cache energy reduction is demonstrated for the MiBench programs with little additional hardware and little impact on processor performance. Even better results can be achieved with another predictor design that increases prediction accuracy. Significant static energy reduction is also possible by applying this approach to the RAM data store.
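A "last used" predictor of this kind is simple to model. The sketch below (our own simplified abstraction, fed a trace of (set, way) pairs rather than a full cache model) first probes only the way that hit last time in that set and falls back to a full CAM search on a mismatch; its accuracy on locality-heavy traces is what the paper's 86% figure measures.

```python
class LastUsedPredictor:
    """Per-set last-used way predictor; tracks its own accuracy."""
    def __init__(self, sets):
        self.last = [0] * sets   # last way that hit in each set
        self.hits = 0
        self.total = 0

    def predict(self, s):
        self.total += 1
        return self.last[s]      # probe only this CAM entry first

    def update(self, s, way, predicted):
        self.hits += (way == predicted)  # wrong guess -> full CAM search
        self.last[s] = way

    def accuracy(self):
        return self.hits / self.total
```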
IMS '00: Revised Papers from the Second International Workshop on Intelligent Memory Systems | 2000
Dan Nicolaescu; Xiaomei Ji; Alexander V. Veidenbaum; Alexandru Nicolau; Rajesh K. Gupta
The performance of a computer system depends heavily on the performance of the cache memory system. A traditional cache memory has an organization whose line size is fixed at design time. Miss rates for different applications can be improved if the line size can be adjusted dynamically at run time. We propose a system in which the compiler sets the cache line size for different portions of the program, and we show that the miss rate is greatly reduced as a result of this dynamic resizing.
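The effect of matching line size to access pattern is easy to demonstrate with a toy model (ours, with invented parameters; the real mechanism is a compiler-inserted hint, modeled here as an explicit resize call that conservatively flushes the cache):

```python
class SimpleCache:
    """Direct-mapped cache of fixed total size with a switchable line size."""
    def __init__(self, size=1024):
        self.size = size
        self.line = 32
        self.tags = {}           # set index -> resident line address
        self.misses = 0
        self.accesses = 0

    def set_line_size(self, line):
        # stand-in for the compiler-set line size for a program region
        self.line = line
        self.tags.clear()        # conservative: flush on resize

    def access(self, addr):
        self.accesses += 1
        la = addr // self.line
        idx = la % (self.size // self.line)
        if self.tags.get(idx) != la:
            self.misses += 1
            self.tags[idx] = la
```

For a sequential word stream, longer lines exploit spatial locality and cut cold misses; for a sparse or strided region the compiler could instead pick short lines to avoid fetching unused bytes.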
IEEE International Conference on High Performance Computing, Data, and Analytics | 2005
Dan Nicolaescu; Alexander V. Veidenbaum; Alexandru Nicolau
Modern high-performance out-of-order processors use L1 caches with increasing degrees of associativity to improve performance. Higher associativity is not always feasible for two reasons: it increases cache hit latency and energy consumption. One of the main sources of the increased latency is the multiplexor delay in selecting one of the lines in a set. The multiplexor is controlled by a hit signal, which means tag comparison must complete before the multiplexor can be enabled. This paper proposes a new mechanism, called the Way Cache, for setting the multiplexor ahead of time in order to reduce hit latency. The same mechanism allows only one tag store and one corresponding data store to be accessed per cache access, which reduces energy consumption. Unlike way prediction, the Way Cache always contains correct way information, but it can miss. The performance of the Way Cache is evaluated and compared with way prediction for data and instruction caches. The Way Cache is also evaluated in the presence of a Cached Load/Store Queue, an integrated L0 cache/Load-Store Queue that significantly reduces the number of accesses to the L1 cache.
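The contrast between the two policies can be made concrete with a small model (entirely ours; placement is fixed by arrival order and eviction is not modeled): a way predictor always answers and pays a replay penalty when wrong, while a way cache either returns the correct way or misses and falls back to a normal all-ways access with no penalty.

```python
def compare(trace, ways=8, sets=16):
    """Return (predictor mispredictions, way-cache misses) for a line trace."""
    pred = {}     # set -> last way used (simple MRU way predictor)
    wcache = {}   # line address -> correct way (way cache)
    mispredicts = wc_misses = 0
    placed = {}   # line address -> way; fixed placement for this toy model
    for line in trace:
        s = line % sets
        way = placed.setdefault(
            line, len([l for l in placed if l % sets == s]) % ways)
        if pred.get(s, 0) != way:
            mispredicts += 1          # wrong guess: replay/penalty
        pred[s] = way
        if line in wcache:
            assert wcache[line] == way  # a way-cache hit is always correct
        else:
            wc_misses += 1            # miss: normal parallel access, then record
            wcache[line] = way
    return mispredicts, wc_misses
```

On a trace that ping-pongs between two lines in one set, the MRU predictor is wrong almost every time (each wrong guess costing a replay), while the way cache misses only on the first touch of each line.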
Design, Automation, and Test in Europe | 2004
Juan L. Aragón; Dan Nicolaescu; Alexander V. Veidenbaum; Ana-Maria Badulescu
This paper proposes a low-energy solution for CAM-based, highly associative I-caches using a segmented word line and a predictor-based instruction fetch mechanism. Because of branches, not all instructions in a given I-cache fetch are used. The proposed predictor determines which instructions in a cache access will be used and fetches no others. Results show average I-cache energy savings of 44% over the baseline case and 6% over the segmented case, with no negative impact on performance.
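A minimal sketch of such a fetch predictor (our own simplification; the paper's hardware segments the word line, which we abstract as fetching only a prefix of the line's words): it remembers, per cache line, how many sequential instruction words were actually consumed last time, since a taken branch ends the useful region.

```python
class FetchRegionPredictor:
    """Predicts how many words of an I-cache line will actually be used."""
    def __init__(self, words_per_line=8):
        self.words = words_per_line
        self.cutoff = {}   # line address -> predicted count of useful words

    def fetch(self, line_addr):
        # no history yet: conservatively fetch (and spend energy on) the
        # whole line; otherwise fetch only the predicted useful prefix
        return self.cutoff.get(line_addr, self.words)

    def train(self, line_addr, used):
        # record how many words were consumed before a taken branch
        self.cutoff[line_addr] = used
```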
International Symposium on Low Power Electronics and Design | 2003
Dan Nicolaescu; Alexander V. Veidenbaum; Alexandru Nicolau
IEEE International Conference on High Performance Computing, Data, and Analytics | 2000
Xiaomei Ji; Dan Nicolaescu; Alexander V. Veidenbaum; Alexandru Nicolau; Rajesh K. Gupta
Modeling, Analysis, and Simulation of Computer and Telecommunication Systems | 2004
Dan Nicolaescu; Alexander V. Veidenbaum; Alexandru Nicolau
Archive | 2006
Alexander V. Veidenbaum; Dan Nicolaescu