Tanausu Ramirez
Intel
Publications
Featured research published by Tanausu Ramirez.
international symposium on microarchitecture | 2005
Adrian Cristal; Oliverio J. Santana; Francisco J. Cazorla; Marco Galluzzi; Tanausu Ramirez; Miquel Pericàs; Mateo Valero
Historically, advances in integrated circuit technology have driven improvements in processor microarchitecture and led to today's microprocessors with sophisticated pipelines operating at very high clock frequencies. However, performance improvements achievable by high-frequency microprocessors have become seriously limited by main-memory access latencies because main-memory speeds have improved at a much slower pace than microprocessor speeds. It is crucial to deal with this performance disparity, commonly known as the memory wall, to enable future high-frequency microprocessors to achieve their performance potential. To overcome the memory wall, we propose kilo-instruction processors: superscalar processors that can maintain a thousand or more simultaneous in-flight instructions. Doing so means designing key hardware structures so that the processor can satisfy the high resource requirements without significantly decreasing processor efficiency or increasing energy consumption.
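A back-of-the-envelope sketch in Python of the window-size arithmetic behind the kilo-instruction argument: to keep issuing at full width while a load waits on main memory, the window must hold roughly issue width times memory latency instructions. The latency and width figures below are illustrative assumptions, not numbers from the paper.

```python
# Little's-law style estimate of how many in-flight instructions a core needs
# to keep issuing at full width while one load waits on main memory.
# All figures below are assumed for illustration.

def in_flight_needed(issue_width: int, memory_latency_cycles: int) -> int:
    """Instructions the window must hold to hide one main-memory access."""
    return issue_width * memory_latency_cycles

for latency in (100, 250, 500):   # assumed main-memory latencies in cycles
    need = in_flight_needed(issue_width=4, memory_latency_cycles=latency)
    print(f"{latency:>3}-cycle latency -> ~{need} in-flight instructions")
```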
high-performance computer architecture | 2008
Tanausu Ramirez; Alex Pajuelo; Oliverio J. Santana; Mateo Valero
In this paper, we propose runahead threads (RaT) as a valuable solution for both reducing resource contention and exploiting memory-level parallelism in simultaneous multithreaded (SMT) processors. Our technique converts a resource-intensive memory-bound thread into a speculative lightweight thread while it is blocked on a long-latency memory operation. These speculative threads prefetch data and instructions with minimal resources, reducing critical resource conflicts between threads. We compare an SMT architecture using RaT to both state-of-the-art static fetch policies and dynamic resource control policies. In terms of throughput and fairness, our results show that RaT performs better than any other policy. The proposed mechanism improves average throughput by 37% relative to previous static fetch policies and by 28% relative to previous dynamic resource scheduling mechanisms. RaT also improves fairness by 36% and 30%, respectively. In addition, the proposed mechanism permits a register file size reduction of up to 60% in an SMT processor without performance degradation.
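A minimal Python sketch of the runahead-thread idea described above: when a thread blocks on a long-latency load, it checkpoints its state and keeps executing only to prefetch, then discards that work once the miss returns. All names, fields, and the latency threshold are assumptions for illustration, not the paper's implementation.

```python
# A thread that blocks on a long-latency load checkpoints its state and enters
# runahead mode, where later instructions execute only to warm the caches;
# the speculative work is discarded when the miss returns.

from dataclasses import dataclass
from typing import Optional

LONG_LATENCY_THRESHOLD = 30          # assumed cycle count marking a last-level cache miss

@dataclass
class ThreadContext:
    tid: int
    runahead: bool = False           # is the thread in speculative runahead mode?
    checkpoint_pc: Optional[int] = None

def on_load_miss(ctx: ThreadContext, pc: int, expected_latency: int) -> None:
    """Enter runahead mode instead of stalling and hoarding shared SMT resources."""
    if expected_latency >= LONG_LATENCY_THRESHOLD and not ctx.runahead:
        ctx.checkpoint_pc = pc       # checkpoint architectural state at the blocking load
        ctx.runahead = True          # subsequent instructions only prefetch

def on_miss_complete(ctx: ThreadContext) -> Optional[int]:
    """Leave runahead mode: discard the speculative work and resume at the checkpoint."""
    if ctx.runahead:
        ctx.runahead = False
        resume_pc, ctx.checkpoint_pc = ctx.checkpoint_pc, None
        return resume_pc             # the pipeline restarts fetch here
    return None
```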
high-performance computer architecture | 2011
Javier Carretero; Xavier Vera; Jaume Abella; Tanausu Ramirez; Matteo Monchiero; Antonio González
The increasing device count and design complexity are posing significant challenges to post-silicon validation. Bug diagnosis is the most difficult step during post-silicon validation. Limited reproducibility and low testing speeds are common limitations of current testing techniques. Moreover, low observability defies full-speed testing approaches. Modern solutions like on-chip trace buffers alleviate these issues, but are unable to store long activity traces. As a consequence, the cost of post-silicon validation now represents a large fraction of the total design cost. This work describes a hybrid post-silicon approach to validate a modern load-store queue. We use an effective error detection mechanism and an expandable logging mechanism to observe the microarchitectural activity for long periods of time at full processor speed. Validation is performed by analyzing the logged activity with a diagnosis algorithm that checks correct memory ordering to root-cause errors.
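A hedged sketch of the kind of offline diagnosis pass the abstract describes: replay a logged stream of memory operations and flag any load whose value disagrees with the youngest earlier store to the same address. The log format and field names are assumptions, not the paper's actual trace layout.

```python
# Replay a logged stream of memory operations and return the first load whose
# value disagrees with the youngest earlier store to the same address.
# The log layout and field names are assumed for illustration.

from typing import NamedTuple, Optional

class MemOp(NamedTuple):
    seq: int            # commit order captured in the activity log
    is_store: bool
    addr: int
    value: int

def check_memory_ordering(log) -> Optional[MemOp]:
    """Return the first load that violates store-to-load ordering, or None."""
    last_store = {}                          # addr -> value of the youngest earlier store
    for op in sorted(log, key=lambda o: o.seq):
        if op.is_store:
            last_store[op.addr] = op.value
        elif op.addr in last_store and op.value != last_store[op.addr]:
            return op                        # candidate load-store queue bug
    return None
```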
international on line testing symposium | 2011
Nivard Aymerich; Asen Asenov; Andrew R. Brown; Ramon Canal; Binjie Cheng; Joan Figueras; Antonio González; Enric Herrero; Stanislav Markov; Miguel Miranda; Peyman Pouyan; Tanausu Ramirez; Antonio Rubio; I. Vatajelu; Xavier Vera; Xingsheng Wang; Paul Zuber
The TRAMS (Terascale Reliable Adaptive Memory Systems) project addresses in an evolutionary way the ultimate CMOS scaling technologies and paves the way for the most promising, revolutionary beyond-CMOS technologies. In this abstract we show the significant variability levels of future 18nm and 13nm bulk-CMOS device technologies, their dramatic effect on the yield of memory cells, and the kind of circuit solutions that would be required to maintain current yield levels. We then discuss the impact of errors at the system level and the system-level approaches that adapt heterogeneous systems to user requirements.
design, automation, and test in europe | 2013
Javier Carretero; Enric Herrero; Matteo Monchiero; Tanausu Ramirez; Xavier Vera
Soft error rates are estimated based on a worst-case architectural vulnerability factor (AVF). Tracking accurate AVF in real time is therefore very attractive to computer designers: more accurate AVF numbers allow turning on more features at runtime while keeping the promised SDC and DUE rates. This paper presents a hardware mechanism based on linear regression to estimate the AVF (SDC and DUE) of the register file in out-of-order cores. Our results show that we achieve a high correlation factor at low cost.
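A small Python sketch of the general idea: approximate the register file's AVF as a linear combination of cheap runtime counters. The chosen counters and coefficients are made-up assumptions; the actual mechanism derives its model at design time and evaluates it in hardware.

```python
# Approximate the register file's AVF as a linear combination of cheap runtime
# counters, the way a linear-regression-based estimator would. The counter
# names and coefficients below are hypothetical.

def estimate_avf(counters: dict, weights: dict, intercept: float) -> float:
    """Linear AVF estimate, clamped to the valid [0, 1] range."""
    avf = intercept + sum(w * counters.get(name, 0.0) for name, w in weights.items())
    return min(max(avf, 0.0), 1.0)

# Hypothetical counters: register file occupancy and value lifetime tend to
# correlate with vulnerability, so they get positive weights.
weights = {"avg_regfile_occupancy": 0.004, "avg_value_lifetime": 0.002}
sample = {"avg_regfile_occupancy": 55.0, "avg_value_lifetime": 40.0}
print(f"estimated AVF ~ {estimate_avf(sample, weights, intercept=0.01):.3f}")
```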
international conference on parallel architectures and compilation techniques | 2010
Tanausu Ramirez; Alex Pajuelo; Oliverio J. Santana; Onur Mutlu; Mateo Valero
Runahead Threads (RaT) is a promising solution that enables a thread to speculatively run ahead and prefetch data instead of stalling for a long-latency load in a simultaneous multithreading processor. With this capability, RaT reduces resource monopolization by memory-intensive threads and exploits memory-level parallelism, improving both system performance and single-thread performance. Unfortunately, the benefits of RaT come at the expense of an increased number of executed instructions, which adversely affects its energy efficiency. In this paper, we propose Runahead Distance Prediction (RDP), a simple technique to improve the efficiency of Runahead Threads. The main idea of the RDP mechanism is to predict how far a thread should run ahead speculatively such that the speculative execution is useful. By limiting the runahead distance of a thread, we generate efficient runahead threads that avoid unnecessary speculative execution and enhance RaT energy efficiency. By reducing runahead-based speculation when it is predicted not to be useful, RDP also allows shared resources to be used efficiently by non-speculative threads. Our results show that RDP significantly reduces power consumption while maintaining the performance of RaT, providing a better performance-energy balance than previous proposals in the field.
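A minimal sketch, assuming a per-load-PC table, of how a runahead distance predictor in the spirit of RDP could cap speculative execution; the table organization and update rule are illustrative assumptions, not the paper's design.

```python
# Per-static-load table that remembers how far past runahead episodes had to
# go before they stopped producing useful prefetches, and caps future episodes
# at that distance.

class DistancePredictor:
    def __init__(self, default_distance: int = 256):
        self.table = {}                 # load PC -> predicted useful runahead distance
        self.default = default_distance

    def predict(self, load_pc: int) -> int:
        """Maximum instructions a runahead episode for this load may execute."""
        return self.table.get(load_pc, self.default)

    def update(self, load_pc: int, last_useful_distance: int) -> None:
        """After an episode, blend in the distance of its last useful prefetch."""
        old = self.table.get(load_pc, self.default)
        self.table[load_pc] = (old + last_useful_distance) // 2

# A predicted distance of 0 means: do not enter runahead for this load at all.
```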
international conference on parallel processing | 2009
Tanausu Ramirez; Alex Pajuelo; Oliverio J. Santana; Mateo Valero
Memory-intensive threads can hoard shared resources without making progress on a simultaneous multithreading (SMT) processor, thereby hindering the overall system performance. A recent promising solution to this important problem in SMT processors is Runahead Threads (RaT). RaT employs runahead execution to allow a thread to speculatively execute instructions and prefetch data instead of stalling for a long-latency load. The main advantage of this mechanism is that it exploits memory-level parallelism under long-latency loads without clogging up shared resources. As a result, RaT improves overall processor performance by reducing resource contention among threads. In this paper, we propose simple code-semantics-based techniques to increase RaT efficiency. Our proposals are based on analyzing the prefetch opportunities (usefulness) of loops and subroutines during runahead thread execution. We dynamically analyze these program structures to detect whether it is useful to continue the runahead thread execution. Based on this dynamic information, the proposed techniques decide either to skip or to stall loop or subroutine execution in runahead threads. Our experimental results show that our best proposal significantly reduces speculative instruction execution (by 33% on average) while maintaining, and in some cases even improving (by up to 3%), the performance of RaT.
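A short Python sketch of the loop-usefulness idea, under the assumption that the hardware can observe loop back-edges and count useful prefetches during runahead; the counters and threshold below are hypothetical, not the paper's exact heuristics.

```python
# While a thread runs ahead inside a loop, count how many useful prefetches
# each iteration issues; if the rate drops below a threshold, stop running
# ahead for that loop.

class LoopMonitor:
    def __init__(self, min_prefetches_per_iteration: float = 0.5):
        self.iterations = 0
        self.useful_prefetches = 0
        self.threshold = min_prefetches_per_iteration

    def on_backedge(self) -> None:
        self.iterations += 1

    def on_useful_prefetch(self) -> None:
        self.useful_prefetches += 1

    def keep_running_ahead(self) -> bool:
        """Continue runahead only while the loop still prefetches enough new data."""
        if self.iterations == 0:
            return True
        return (self.useful_prefetches / self.iterations) >= self.threshold
```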
international conference on parallel architectures and compilation techniques | 2007
Tanausu Ramirez; Alex Pajuelo; Oliverio J. Santana; Mateo Valero
In this work, we propose Runahead threads as a valuable solution for both exploiting memory-level parallelism and reducing resource contention in simultaneous multithreaded processors.
ACM SIGARCH Computer Architecture News | 2007
Tanausu Ramirez; Alex Pajuelo; Oliverio J. Santana; Mateo Valero
To alleviate the memory wall problem, current architectural trends suggest implementing large instruction windows able to maintain a high number of in-flight instructions. However, the benefits achieved by these recent proposals may be limited because more instructions are executed down the wrong path of a mispredicted branch. The larger number of misspeculated instructions increases the energy consumed compared to traditional designs with smaller instruction windows. Our analysis shows that, for some SPEC2000 integer benchmarks, up to 2.5X as many wrong-path load instructions are executed when the instruction window of a 4-way superscalar processor is increased from 256 to 1024 entries. This paper describes a simple speculative control technique to prevent wrong-path load instructions from being executed. Our technique extends the functionality of the load-store queue to block those load instructions that depend on a hard-to-predict conditional branch until the branch is resolved. If the branch is actually mispredicted, unnecessary cache misses can be avoided, saving energy down the wrong path. Furthermore, instructions that depend on a blocked load are not issued because their source values are not available, which also saves dynamic energy. Our results show that the proposed mechanism reduces misspeculated load instructions by up to 26% and wrong-path instructions by up to 18% on average, without any performance loss.
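A hedged Python sketch of the speculative control technique: a load that depends on a low-confidence conditional branch is held in the load-store queue until the branch resolves, so wrong-path cache misses are avoided. The data structures and field names are assumptions for illustration.

```python
# Loads that depend on a low-confidence conditional branch are held in the
# load-store queue until the branch resolves; if the branch was mispredicted,
# they are dropped, so their cache misses never occur.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class LSQEntry:
    addr: int
    guard_branch: Optional[int] = None   # id of the low-confidence branch, if any
    blocked: bool = False

@dataclass
class LoadStoreQueue:
    entries: List[LSQEntry] = field(default_factory=list)

    def insert_load(self, addr: int, low_conf_branch: Optional[int]) -> None:
        """Block the load only when a hard-to-predict branch guards it."""
        self.entries.append(
            LSQEntry(addr, low_conf_branch, blocked=low_conf_branch is not None))

    def on_branch_resolved(self, branch_id: int, mispredicted: bool) -> None:
        if mispredicted:
            # Wrong path: drop blocked loads, avoiding their cache misses entirely.
            self.entries = [e for e in self.entries if e.guard_branch != branch_id]
        else:
            for e in self.entries:
                if e.guard_branch == branch_id:
                    e.blocked = False     # safe to issue now
```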
Medea | 2006
Tanausu Ramirez; Alex Pajuelo; Oliverio J. Santana; Mateo Valero
To alleviate the memory wall problem, current architectural trends suggest implementing large instruction windows able to maintain a high number of in-flight instructions. However, the benefits achieved by these recent proposals may be limited because more instructions are executed down the wrong path of a mispredicted branch, likely polluting the processor caches. The larger number of misspeculated instructions increases the energy consumed compared to traditional designs with smaller instruction windows. Our analysis shows that, for some SPEC2000 integer benchmarks, up to 2.5X as many wrong-path load instructions are executed when the instruction window of a 4-way superscalar processor is increased from 256 to 1024 entries. This paper describes a simple speculative control technique to prevent wrong-path load instructions from being executed. Our technique extends the functionality of the load-store queue to block those load instructions that depend on a hard-to-predict conditional branch until the branch is resolved. If the branch is actually mispredicted, unnecessary cache misses can be avoided, saving energy down the wrong path. Furthermore, instructions that depend on a blocked load are not issued because their source values are not available, which also saves energy. Our results show that the proposed mechanism reduces misspeculated load instructions by up to 26% and wrong-path instructions by up to 18% on average, without any performance loss.