Xiaoyao Liang | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Xiaoyao Liang is active.

Explore More

Publication

Featured researches published by Xiaoyao Liang.

international symposium on microarchitecture | 2007

Process Variation Tolerant 3T1D-Based Cache Architectures

Xiaoyao Liang; Ramon Canal; Gu-Yeon Wei; David M. Brooks

Process variations will greatly impact the stability, leakage power consumption, and performance of future microprocessors. These variations are especially detrimental to 6T SRAM (6-transistor static memory) structures and will become critical with continued technology scaling. In this paper, we propose new on-chip memory architectures based on novel 3T1D DRAM (3-transistor, 1-diode dynamic memory) cells. We provide a detailed comparison between 6T and 3T1D designs in the context of a L1 data cache. The effects of physical device variation on a 3T1D cache can be lumped into variation of data retention times. This paper proposes a range of cache refresh and placement schemes that are sensitive to retention time, and we show that most of the retention time variations can be masked by the microarchitecture when using these schemes. We have performed detailed circuit and architectural simulations assuming different degrees of variability in advanced technology nodes, and we show that the resulting memory architecture can tolerate large process variations with little or even no impact on performance when compared to ideal 6T SRAM designs. Furthermore, these designs are robust to memory cell stability issues and can achieve large power savings. These advantages make the new memory architectures a promising choice for on-chip variation-tolerant cache structures required for next generation microprocessors.

international symposium on microarchitecture | 2006

Mitigating the Impact of Process Variations on Processor Register Files and Execution Units

Xiaoyao Liang; David M. Brooks

Design variability due to die-to-die and within-die process variations has the potential to significantly reduce the maximum operating frequency and the effective yield of high-performance microprocessors in future process technology generations. One serious manifestation of this increased variability is a reduction in the mean frequency of fabricated chips due to fluctuations in device characteristics causing reduced circuit performance. In this paper, we propose to mitigate the impact of variations through variable-latency register files and execution units which are key architectural components that may encounter variability problems. We also illustrate the importance of closing the gap in expected delay of these distinct structures. A post fabrication test and configuration strategy is proposed. We find that 23% mean frequency improvement with an average IPC loss of 3% (and never exceeding 5% for worst case chips) is possible for the 65nm technology node by properly adopting the proposed schemes

international symposium on computer architecture | 2013

An energy-efficient and scalable eDRAM-based register file architecture for GPGPU

Naifeng Jing; Yao Shen; Yao Lu; Shrikanth Ganapathy; Zhigang Mao; Minyi Guo; Ramon Canal; Xiaoyao Liang

The heavily-threaded data processing demands of streaming multiprocessors (SM) in a GPGPU require a large register file (RF). The fast increasing size of the RF makes the area cost and power consumption unaffordable for traditional SRAM designs in the future technologies. In this paper, we propose to use embedded-DRAM (eDRAM) as an alternative in future GPGPUs. Compared with SRAM, eDRAM provides higher density and lower leakage power. However, the limited data retention time in eDRAM poses new challenges. Periodic refresh operations are needed to maintain data integrity. This is exacerbated with the scaling of eDRAM density, process variations and temperature. Unlike conventional CPUs which make use of multi-ported RF, most of the RFs in modern GPGPU are heavily banked but not multi-ported to reduce the hardware cost. This provides a unique opportunity to hide the refresh overhead. We propose two different eDRAM implementations based on 3T1D and 1T1C memory cells. To mitigate the impact of periodic refresh, we propose two novel refresh solutions using bank bubble and bank walk-through. Plus, for the 1T1C RF, we design an interleaved bank organization together with an intelligent warp scheduling strategy to reduce the impact of the destructive reads. The analysis shows that our schemes present better energy efficiency, scalability and variation tolerance than traditional SRAM-based designs.

international symposium on microarchitecture | 2008

Replacing 6T SRAMs with 3T1D DRAMs in the L1 Data Cache to Combat Process Variability

Xiaoyao Liang; Ramon Canal; Gu-Yeon Wei; David M. Brooks

With continued technology scaling, process variations will be especially detrimental to six-transistor static memory structures (6T SRAMs). A memory architecture using three-transistor, one-diode DRAM (3T1D) cells in the L1 data cache tolerates wide process variations with little performance degradation, making it a promising choice for on-chip cache structures for next-generation microprocessors.

high-performance computer architecture | 2012

AgileRegulator: A hybrid voltage regulator scheme redeeming dark silicon for power efficiency in a multicore architecture

Guihai Yan; Yingmin Li; Yinhe Han; Xiaowei Li; Minyi Guo; Xiaoyao Liang

The widening gap between the fast-increasing transistor budget but slow-growing power delivery and system cooling capability calls for novel architectural solutions to boost energy efficiency. Leveraging the fact of surging “dark silicon” area, we propose a hybrid scheme to use both on-chip and off-chip voltage regulators, called “AgileRegulator”, for a multicore system to explore both coarse-grain and fine-grain power phases. We present two complementary algorithms: Sensitivity-Aware Application Scheduling (SAAS) and Responsiveness-Aware Application Scheduling (RAAS) to maximally achieve the energy saving potential of the hybrid regulator scheme. Experimental results show that the hybrid scheme achieves performance-energy efficiency close to per-core DVFS, without imposing much design cost. Meanwhile, the silicon overhead of this scheme is well contained into the “dark silicon”. Unlike other application specific schemes based on accelerators, the proposed scheme itself is a simple and universal solution for chip area and energy trade-offs.

international symposium on microarchitecture | 2009

Revival: A Variation-Tolerant Architecture Using Voltage Interpolation and Variable Latency

Xiaoyao Liang; Gu-Yeon Wei; David M. Brooks

Process variations will significantly degrade the performance benefits of future microprocessors as they move toward nanoscale technology. Device parameter fluctuations can introduce large variations in peak operation among chips, cores on a single chip, and microarchitectural blocks within one core. The revival technique combines the post-fabrication tuning techniques voltage interpolation (VI) and variable latency (VL) to reduce such frequency variations.

international conference on computer aided design | 2007

Architectural power models for SRAM and CAM structures based on hybrid analytical/empirical techniques

Xiaoyao Liang; Kerem Turgay; David M. Brooks

The need to perform power analysis in the early stages of the design process has become critical as power has become a major design constraint. Embedded and high-performance microprocessors incorporate large on-chip cache and similar SRAM-based or CAM-based structures, and these components can consume a significant fraction of the total chip power. Thus an accurate power modeling method for such structures is important in early architecture design studies. We present a unified architecture-level power modeling methodology for array structures which is highly-accurate, parameterizable, and technology scalable. We demonstrate the applicability of the model to different memory structures (SRAMs and CAMs) and include leakage-variability in advanced technologies. The power modeling approach is validated against HSPICE power simulation results, and we show power estimation accuracy within 5% of detailed circuit simulations.

international conference on computer aided design | 2006

Microarchitecture parameter selection to optimize system performance under process variation

Xiaoyao Liang; David M. Brooks

Design variability due to within-die and die-to-die process variations has the potential to significantly reduce the maximum operating frequency and the effective yield of high-performance microprocessors in future process technology generations. This variability manifests itself by increasing the number and criticality of long delay paths. To quantify this impact, we use an architectural process variation model that is appropriate for the analysis of system performance in the early-stages of the design process. We propose a method of selecting microarchitectural parameters to mitigate the frequency impact due to process variability for distinct structures, while minimizing IPC (instructions-per-cycle) loss. We propose an optimization procedure to be used for system-level design decisions, and we find that joint architecture and statistical timing analysis can be more advantageous than pure circuit level optimization. Overall, the technique can improve the 90% yield frequency by about 14% with 3% IPC loss for a baseline machine with a 20FO4 logic depth per pipestage. This approach is sensitive to the selection of processor pipeline depth, and we demonstrate that machines with aggressive pipelines will experience greater challenges in coping with process variability

international symposium on low power electronics and design | 2013

Compiler assisted dynamic register file in GPGPU

Naifeng Jing; Haopeng Liu; Yao Lu; Xiaoyao Liang

The large Register File (RF) in General Purpose Graphic Processing Units (GPGPUs) demands tremendous chip area and energy consumption. For a sustainable growth of the size of RF in future GPGPUs, emerging on-chip memory technologies such as embedded-DRAM (eDRAM) have been proposed to replace the conventional SRAM for higher density and lower leakage but with the possible penalty from the periodic refresh operations. This paper explicitly shows that the refresh penalty can be effectively mitigated by leveraging the uniqueness of GPGPU operations. A compiler assisted refresh rescheduling policy can greatly reduce the refresh overhead for maintaining the correctness of the RF operations. The proposed scheme adequately exploits the features in both architecture and compilation, and delivers comparable performance to the SRAM counterpart. At the same time, the energy savings via the removal of large SRAM leakage well compensate for the additional refresh energy. This study promotes the eDRAM-based RF as a promising alternative that enables larger capacity and better power efficiency for future GPGPUs.

international conference on computer design | 2009

Empirical performance models for 3T1D memories

Kristen Lovin; Benjamin C. Lee; Xiaoyao Liang; David M. Brooks; Gu-Yeon Wei

Process variation poses a threat to the performance and reliability of the 6T SRAM cell. Research has turned to new memory cell designs, such as the 3T1D DRAM cell, as potential replacement designs. If designers are to consider 3T1D memory architectures, performance models are needed to better understand memory cell behavior. We propose a decoupled approach for collecting Monte Carlo HSPICE data, reducing simulation times by simulating memory array components separately based on their contribution to the worst-case critical path. We use this Monte Carlo data to train regression models, which accurately predict retention and access times of a 3T1D memory array with a median error of 7.39%.

Explore More