Alejandro Valero
Polytechnic University of Valencia
Publication
Featured research published by Alejandro Valero.
international symposium on microarchitecture | 2009
Alejandro Valero; Julio Sahuquillo; Salvador Petit; Vicente Lorente; Ramon Canal; Pedro López; José Duato
SRAM and DRAM cells have been the predominant technologies used to implement memory cells in computer systems, each one having its advantages and shortcomings. SRAM cells are faster and require no refresh, since reads are not destructive. In contrast, DRAM cells provide higher density and minimal leakage energy, since there are no paths within the cell from Vdd to ground. Recently, DRAM cells have been embedded in logic-based technology, thus overcoming the speed limit of typical DRAM cells. In this paper we propose an n-bit macrocell that implements one static cell and n-1 dynamic cells. This cell is intended for use in an n-way set-associative first-level data cache. Our study shows that, compared to an SRAM-based four-way set-associative cache with the same capacity, a cache built with this macrocell reduces leakage by about 75% and area by more than half, with minimal impact on performance. Architectural mechanisms have also been devised to avoid refresh logic. Experimental results show that no performance is lost when the retention time is larger than 50 K processor cycles. In addition, the proposed delayed writeback policy that avoids refreshing performs a similar number of writebacks to a conventional cache with the same organization, so no extra power is wasted.
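The retention-time behavior described above can be illustrated with a toy single-set model; the class name, the per-set scope, and the fill policy below are hypothetical simplifications for illustration, not the paper's actual design:

```python
class MacrocellSet:
    """Toy model of the retention effect in a set built from one static
    (SRAM) way plus dynamic (eDRAM) ways: data in a dynamic way is only
    valid while the current cycle is within the retention time of its
    last access. Illustrative sketch only; names and the fill policy
    are assumptions."""

    def __init__(self, retention=50_000):
        self.retention = retention
        self.sram_tag = None       # way 0: static, never expires
        self.edram = {}            # tag -> cycle of last access

    def access(self, tag, cycle):
        """Return True on hit, False on miss (the block is then filled)."""
        if tag == self.sram_tag:
            return True
        if tag in self.edram:
            if cycle - self.edram[tag] <= self.retention:
                self.edram[tag] = cycle   # a hit restores the cell charge
                return True
            del self.edram[tag]           # retention expired: data lost
        self.edram[tag] = cycle           # fill into a dynamic way
        return False
```

With a long enough retention relative to the re-access interval, no expirations occur, matching the intuition behind the 50 K-cycle result.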
ieee antennas and propagation society international symposium | 2008
Esperanza Alfonso; Per-Simon Kildal; Alejandro Valero; Jose I. Herranz
An oversized rectangular waveguide with one hard wall was previously demonstrated to be suitable for feeding a planar slot array because it has higher-order-mode-killing properties, i.e., it supports propagation of a single quasi-TEM parallel-plate-type mode and no higher-order modes. In the present paper we instead detect many local quasi-TEM waves that follow the hard surface with identical propagation constants. These local quasi-TEM waves can be excited to produce a combined mode that is the single quasi-TEM parallel-plate-type mode. We propose to characterize this feed guide in terms of the performance of the local quasi-TEM waves.
IEEE Transactions on Computers | 2015
Alejandro Valero; Julio Sahuquillo; Salvador Petit; Pedro López; José Duato
In recent years, embedded dynamic random-access memory (eDRAM) technology has been implemented in last-level caches due to its low leakage energy consumption and high density. However, the fact that eDRAM presents a slower access time than static RAM (SRAM) technology has prevented its inclusion in higher levels of the cache hierarchy. This paper proposes to mingle SRAM and eDRAM banks within the data array of second-level (L2) caches. The main goal is to achieve the best trade-off among performance, energy, and area. To this end, two main directions have been followed. First, this paper explores the optimal percentage of banks for each technology. Second, the cache controller is redesigned to deal with performance and energy. Performance is addressed by keeping the most likely accessed blocks in fast SRAM banks. In addition, energy savings are further enhanced by avoiding unnecessary destructive reads of eDRAM blocks. Experimental results show that, compared to a conventional SRAM L2 cache, a hybrid approach requiring similar or even lower area speeds up performance on average by 5.9 percent, while total energy savings reach 32 percent. For a 45 nm technology node, the energy-delay-area product confirms that a hybrid cache is a better design than the conventional SRAM cache regardless of the number of eDRAM banks, and also better than a conventional eDRAM cache when the number of SRAM banks is an eighth of the total number of cache banks.
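The placement idea of keeping the most likely accessed blocks in fast SRAM banks can be sketched as a simple swap-on-hit policy; the class, the way counts, and the promotion trigger are illustrative assumptions rather than the controller actually proposed in the paper:

```python
class HybridSet:
    """Toy sketch of hybrid placement: new blocks fill slow eDRAM ways,
    and a block that hits in eDRAM is swapped with the least-recently-
    used block of the fast SRAM ways, so hot blocks migrate to SRAM.
    The swap trigger and sizes are illustrative assumptions."""

    def __init__(self, sram_ways=1, edram_ways=3):
        self.sram_ways = sram_ways
        self.edram_ways = edram_ways
        self.sram = []    # fast ways, index 0 = MRU
        self.edram = []   # slow ways, index 0 = MRU

    def access(self, tag):
        if tag in self.sram:
            self.sram.remove(tag)
            self.sram.insert(0, tag)
            return "sram-hit"
        if tag in self.edram:
            self.edram.remove(tag)
            if len(self.sram) == self.sram_ways:
                demoted = self.sram.pop()     # LRU SRAM block goes to eDRAM
                self.edram.insert(0, demoted)
            self.sram.insert(0, tag)          # hot block promoted to SRAM
            return "edram-hit"
        if len(self.edram) == self.edram_ways:
            self.edram.pop()                  # evict LRU eDRAM block
        self.edram.insert(0, tag)             # misses fill the slow ways
        return "miss"
```

Under this policy a block that is accessed repeatedly is served from the fast ways after its first re-use, which is the performance mechanism the abstract describes.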
IEEE Transactions on Computers | 2012
Alejandro Valero; Salvador Petit; Julio Sahuquillo; Pedro López; José Duato
SRAM and DRAM have been the predominant technologies used to implement memory cells in computer systems, each one having its advantages and shortcomings. SRAM cells are faster and require no refresh, since reads are not destructive. In contrast, DRAM cells provide higher density and minimal leakage energy, since there are no paths within the cell from Vdd to ground. Recently, DRAM cells have been embedded in logic-based technology (eDRAM), thus overcoming the speed limit of typical DRAM cells. In this paper, we propose a hybrid n-bit macrocell that implements one SRAM cell and n-1 eDRAM cells. This cell is intended for use in an n-way set-associative first-level data cache. Architectural mechanisms (e.g., special writeback policies) have been devised to completely avoid refresh logic. Performance, energy, and area have been analyzed in detail. Experimental results show that, using typical eDRAM capacitors, and compared to a conventional cache, a 4-way set-associative hybrid cache reduces energy consumption and area by up to 54 and 29 percent, respectively, while having a negligible impact on performance (less than 2 percent).
design, automation, and test in europe | 2013
Vicente Lorente; Alejandro Valero; Julio Sahuquillo; Salvador Petit; Ramon Canal; Pedro López; José Duato
Low-power modes in modern microprocessors rely on low frequencies and low voltages to reduce the energy budget. Nevertheless, manufacturing-induced parameter variations can make SRAM cells unreliable, producing hard errors at supply voltages below Vccmin.
ieee antennas and propagation society international symposium | 2005
C. Suárez-Fajardo; M. Ferrando-Bataller; Alejandro Valero; Vicent M. Rodrigo
The switched-beam antenna, a smart-antenna technology, consists of an array of antennas, a beamforming network, and a switching matrix. It may be used to generate n beams to improve the carrier-to-interference ratio (CIR) and frequency reuse in cellular systems, thereby increasing system capacity. This paper describes simplified hardware and software techniques to improve antenna parameters such as SLL, crossover, and beam orthogonality by means of a pair of Butler matrices.
ACM Transactions on Architecture and Code Optimization | 2012
Alejandro Valero; Julio Sahuquillo; Salvador Petit; Pedro López; José Duato
Memory latency has become an important performance bottleneck in current microprocessors. This problem is aggravated as the number of cores sharing the same memory controller increases. To palliate this problem, a common solution is to implement cache hierarchies with large Last-Level Cache (LLC) organizations. LLC memories are implemented with a high number of ways (e.g., 16) to reduce conflict misses. Caches have typically implemented the LRU algorithm to exploit temporal locality, but its performance diverges from the optimal as the number of ways increases. In addition, the implementation of a strict LRU algorithm is costly in terms of area and power. This article focuses on a family of low-cost replacement strategies whose implementation scales with the number of ways while maintaining performance. The proposed strategies track the access order of just a few blocks, which cannot be replaced; the victim is randomly selected among the remaining blocks, which exhibit poor locality. Although the random policy generally helps performance, in some applications the scheme falls behind the LRU policy, leading to performance degradation. This drawback can be overcome by adding a small victim cache to the large LLC. Experimental results show that, using the best version of the family without a victim cache, the MPKI reduction falls between 10% and 11% compared to a set of the most representative state-of-the-art algorithms, and grows to 22% with respect to LRU. The proposal with a victim cache achieves speedups, on average, of 4% compared to LRU. In addition, it reduces dynamic energy, on average, by up to 8%. Finally, compared to the studied algorithms, hardware complexity is largely reduced by the baseline algorithm of the family.
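The family's core mechanism (track the order of only a few protected blocks, pick the victim at random among the rest) can be sketched as follows; the class name and parameter values are illustrative assumptions, not the paper's exact design:

```python
import random

class LowCostSet:
    """Sketch of a low-cost replacement policy in the spirit described
    above: only the `tracked` most-recently-used blocks have their
    order recorded (and are protected from eviction); the victim is
    picked at random among the remaining blocks. Assumes tracked <
    ways. Names are illustrative, not the paper's."""

    def __init__(self, ways=16, tracked=4, seed=0):
        self.ways = ways
        self.tracked = tracked          # blocks whose order is tracked
        self.blocks = []                # index 0 = most recently used
        self.rng = random.Random(seed)

    def access(self, tag):
        """Return True on hit, False on miss (with replacement)."""
        if tag in self.blocks:
            self.blocks.remove(tag)
            self.blocks.insert(0, tag)
            return True
        if len(self.blocks) == self.ways:
            # Victim: random choice among the untracked, poor-locality blocks.
            victim = self.rng.choice(self.blocks[self.tracked:])
            self.blocks.remove(victim)
        self.blocks.insert(0, tag)
        return False
```

Because only `tracked` positions need ordering state, the bookkeeping cost stays constant as associativity grows, which is the scalability property the abstract highlights.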
international conference on computer design | 2012
Alejandro Valero; Julio Sahuquillo; Salvador Petit; Pedro López; José Duato
Cache memories have typically been implemented with Static Random-Access Memory (SRAM) technology. This technology presents a fast access time but high energy consumption and low density. In contrast, the recently introduced embedded Dynamic RAM (eDRAM) technology allows caches to be built with lower energy and area, although with a slower access time. The eDRAM technology provides important leakage and area savings, especially in huge Last-Level Caches (LLCs), which occupy almost half the silicon area in some recent microprocessors. This paper proposes a novel hybrid LLC that combines SRAM and eDRAM banks to address the trade-off among performance, energy, and area. To this end, we explore the percentage of SRAM and eDRAM banks that achieves the best target trade-off. Architectural mechanisms have been devised to keep the most likely accessed blocks in fast SRAM banks as well as to avoid unnecessary destructive reads. Experimental results show that, compared to a conventional SRAM LLC with the same storage capacity, performance degradation does not surpass 2.9% on average (even with only 12.5% of the banks built with SRAM technology), whereas area savings can be as high as 46% for a 1MB 16-way LLC. For a 45nm technology node, the energy-delay squared product confirms that a hybrid cache is a better design than the conventional SRAM cache regardless of the number of eDRAM banks, and also better than a conventional eDRAM cache when the number of SRAM banks is a quarter or an eighth of the cache banks.
IEEE Transactions on Very Large Scale Integration Systems | 2012
Alejandro Valero; Julio Sahuquillo; Vicente Lorente; Salvador Petit; Pedro López; José Duato
Cache memories dissipate an important fraction of the energy budget in current microprocessors, mainly because cache cells are typically implemented with six transistors. To tackle this design concern, recent research has focused on proposing new cache cells. An n-bit cache cell, namely the macrocell, was proposed in previous work. This cell combines SRAM and eDRAM technologies with the aim of reducing energy consumption while maintaining performance. The capacitance of the eDRAM cells impacts both energy consumption and performance, since these cells lose their state once the retention time expires. In that case, data must be fetched from a lower level of the memory hierarchy, negatively impacting performance and energy consumption. Conversely, if the capacitance is too high, energy is wasted without bringing performance benefits. This paper identifies the optimal capacitance for a given processor frequency. To this end, the trade-off between performance and energy consumption of a macrocell-based cache has been evaluated while varying the capacitance and frequency. Experimental results show that, compared to a conventional cache, performance losses are lower than 2% and energy savings are up to 55% for a cache with 10 fF capacitors and frequencies higher than 1 GHz. In addition, using trench capacitors, a 4-bit macrocell reduces the area of four conventional SRAM cells by 29%.
symposium on computer architecture and high performance computing | 2011
Alejandro Valero; Julio Sahuquillo; Salvador Petit; Pedro López; José Duato
Memory hierarchy design is a major concern in current microprocessors. Much research work focuses on the Last-Level Cache (LLC), which is designed to hide the long miss penalty of accessing main memory. To reduce both capacity and conflict misses, LLCs are implemented as large memory structures with high associativity. To exploit temporal locality, LRU is the replacement algorithm usually implemented in caches. However, for a highly associative cache, its implementation is costly in terms of area and power consumption. Indeed, LRU is not well suited for the LLC because, as this cache level does not see all memory accesses, it cannot fully capture temporal locality. In addition, blocks must descend to the LRU position of the stack before eviction, even when they are no longer useful. In this paper, we show that most blocks are not referenced again once they leave the MRU position, and that the probability of being referenced again does not depend on the location in the LRU stack. Based on these observations, we define the number of MRU-Tours (MRUTs) of a block as the number of times the block occupies the MRU position while it is stored in the cache, and propose the MRUT replacement algorithm, which selects the victim among the blocks that show only one MRUT. Variations of this algorithm have also been proposed to exploit both MRUT behavior and recency information. Experimental results show that, compared to LRU, the proposal reduces the MPKI by up to 22%, while IPC is improved by 48%.
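The MRU-Tours idea can be sketched minimally as follows, under the assumption that tie-breaking and bookkeeping are handled naively (the paper's recency-aware variations are not modeled, and all names here are illustrative):

```python
import random

class MRUTSet:
    """Minimal sketch of MRU-Tours replacement: each block counts how
    many times it has (re)entered the MRU position; the victim is
    chosen among blocks with a single MRU-Tour. Tie-breaking details
    are assumptions, not the paper's exact design."""

    def __init__(self, ways=4, seed=0):
        self.ways = ways
        self.order = []       # order[0] is the MRU block
        self.tours = {}       # tag -> number of MRU-Tours
        self.rng = random.Random(seed)

    def access(self, tag):
        hit = tag in self.order
        if hit:
            if self.order[0] != tag:
                self.tours[tag] += 1   # block re-enters the MRU position
            self.order.remove(tag)
        else:
            if len(self.order) == self.ways:
                # Prefer victims that completed only one MRU-Tour.
                candidates = [t for t in self.order if self.tours[t] == 1]
                victim = self.rng.choice(candidates or self.order)
                self.order.remove(victim)
                del self.tours[victim]
            self.tours[tag] = 1        # first MRU-Tour starts on insertion
        self.order.insert(0, tag)
        return hit
```

A block that is re-used accumulates tours and so is protected, while single-tour blocks (the common case per the paper's observation) become eviction candidates.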