Liangzhen Lai
University of California, Los Angeles
Publications
Featured research published by Liangzhen Lai.
Design, Automation, and Test in Europe | 2013
Liangzhen Lai; Vikas Chandra; Robert C. Aitken; Puneet Gupta
In situ monitoring is an accurate way to monitor circuit delay or timing slack, but usually incurs significant overhead. We observe that most existing slack monitoring methods exclusively focus on monitoring path ending registers, which is not cost efficient from power and area perspectives.
IEEE Transactions on Electron Devices | 2012
Greg Leung; Liangzhen Lai; Puneet Gupta; Chi On Chui
The variability impact of line edge roughness (LER) on sub-32-nm fin-shaped FET (FinFET) technologies is investigated from both device- and circuit-level perspectives using computer-aided design simulations. Resist-defined FinFETs exhibit sizeable device performance variation (up to 10% fluctuation in threshold voltage and 200% in leakage current) when subjected to fin roughness up to 1 nm root-mean-square amplitude. Spacer-defined FinFETs show negligible device performance variation and exhibit quadratic dependence with LER amplitude. For both patterning technologies, the resulting impact on large-scale digital-circuit performance variation is found to be minimal in terms of the overall delay mean and variation. This is attributed to self-averaging of uncorrelated LER effects between individual devices within the circuits, resulting in minimal delay impact for digital-circuit design. The impact of LER on leakage power variation is also found to be minimal for all technologies; however, the mean value increases by up to 40% for 15-nm resist FinFETs. On this basis, the impact of LER on sub-32-nm FinFET device-level variability is only significant for resist devices, whereas the resulting digital-circuit impact is important only in terms of mean leakage power increase.
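The self-averaging argument above can be illustrated with a small Monte Carlo sketch. This is not the simulation setup used in the paper: it assumes each stage delay is an independent Gaussian with a hypothetical 10% relative spread, and the function name `relative_path_sigma` is invented for illustration. Under that assumption, the relative spread of an N-stage path delay shrinks roughly as 1/sqrt(N).

```python
import random
import statistics


def relative_path_sigma(n_stages, sigma_stage=0.10, trials=2000, seed=0):
    """Relative std. dev. of the total delay of a path with n_stages gates.

    Assumption (not from the paper): every stage has nominal delay 1.0 and an
    independent Gaussian variation with relative sigma `sigma_stage`.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        # Uncorrelated per-stage variations add up along the path.
        totals.append(sum(rng.gauss(1.0, sigma_stage) for _ in range(n_stages)))
    return statistics.pstdev(totals) / statistics.mean(totals)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(relative_path_sigma(n), 4))
```

A single stage keeps the full per-device spread, while a long path sees only a fraction of it, which is consistent with the small circuit-level delay impact reported in the abstract.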
International Symposium on Quality Electronic Design | 2012
Tuck-Boon Chan; Puneet Gupta; Andrew B. Kahng; Liangzhen Lai
As CMOS technology scales, circuit performance becomes more sensitive to manufacturing and environmental variations. Hence, there is a need to measure or monitor circuit performance during manufacturing and at runtime. Since each circuit may have different sensitivities to process variations, previous works have focused on synthesis of circuit performance monitors that are specific to a given design. In this work, we study the potential benefit of having multiple design-dependent monitors. We develop a systematic approach to the synthesis of multiple design-dependent monitors, as well as a corresponding delay estimation method. Our approach synthesizes design-dependent ring oscillators (DDROs) using standard library gates. This has the advantage of quick design turnaround time and reduced schedule impact, because the DDRO implementation can leverage automation in conventional implementation flows. Our delay estimation method seeks to minimize the number of parameters as well as computing resources (i.e., to limit information storage and exchange) used in delay estimation based on monitoring results. Experiments show that our delay estimation method using multiple DDROs reduces overestimation (timing margin) by up to 25% compared to use of a single DDRO.
International Conference on Hardware/Software Codesign and System Synthesis | 2013
Lucas Francisco Wanner; Salma Elmalaki; Liangzhen Lai; Puneet Gupta; Mani B. Srivastava
Modern integrated circuits, fabricated in nanometer technologies, suffer from significant power/performance variation across-chip, chip-to-chip and over time due to aging and ambient fluctuations. Furthermore, several existing and emerging reliability loss mechanisms have caused increased transient, intermittent and permanent failure rates. While this variability has been typically addressed by process, device and circuit designers, there has been a recent push towards sensing and adapting to variability in the various layers of software. Current hardware platforms, however, typically lack variability sensing capabilities. Even if sensing capabilities were available, evaluating variability-aware software techniques across a significant number of hardware samples would prove exceedingly costly and time consuming. We introduce VarEMU, an extension to the QEMU virtual machine monitor that serves as a framework for the evaluation of variability-aware software techniques. VarEMU provides users with the means to emulate variations in power consumption and in fault characteristics and to sense and adapt to these variations in software. Through the use (and dynamic change) of parameters in a power model, users can create virtual machines that feature both static and dynamic variations in power consumption. Faults may be injected before or after, or completely replace the execution of any instruction. Power consumption and susceptibility to faults are also subject to dynamic change according to an aging model. A software stack for VarEMU features precise control over faults and provides virtual energy monitors to the operating system and processes. This allows users to precisely quantify and evaluate the effects of variations on individual applications. We show how VarEMU tracks energy consumption according to variation-aware power and aging models and give examples of how it may be used to quantify how faults in instruction execution affect applications.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2014
Liangzhen Lai; Vikas Chandra; Robert C. Aitken; Puneet Gupta
In situ monitoring is an accurate way to monitor circuit delay or timing slack, but usually incurs significant overhead. We observe that most existing slack monitoring methods focus exclusively on monitoring path endpoints, which is not cost efficient from power and area perspectives. In this paper, we first propose the SlackProbe methodology, which inserts timing slack monitors like probes at a selected set of nets, including intermediate nets along critical paths. SlackProbe can be used to detect impending delay failures due to various causes (process variations, ambient fluctuations, circuit aging, etc.) and can be combined with various preventive actions (e.g., voltage/frequency scaling, clock stretching/time borrowing, etc.). We then perform a thorough analysis of the potential benefits and caveats of SlackProbe over conventional approaches in terms of number of monitors required, monitoring efficiency and observability, delay margin, and design perturbation. Experimental results on commercial processors show that with 5% extra timing margin, SlackProbe can reduce the number of monitors by 12-16X as compared to the number of monitors inserted at path ending pins. SlackProbe can also improve the monitoring efficiency by up to 1.9X and improve the monitoring observability by up to 32%, as compared to endpoint monitoring.
IEEE Transactions on Very Large Scale Integration Systems | 2014
Tuck-Boon Chan; Puneet Gupta; Andrew B. Kahng; Liangzhen Lai
With CMOS technology scaling, circuit performance has become more sensitive to manufacturing and environmental variations. Hence, there is a need to measure or monitor circuit performance during manufacturing and at runtime. Since each circuit may have different sensitivities to process variations, previous works have focused on the synthesis of circuit performance monitors that are specific to a given design. We develop a systematic approach for the synthesis of multiple design-dependent monitors, as well as the corresponding calibration and delay estimation methods. Our approach synthesizes design-dependent ring oscillators (DDROs) using standard-cell library gates and conventional physical implementation flows. Our delay estimation method limits the memory usage overhead by clustering critical paths with similar delay sensitivities. Experimental results show that our delay estimation method using multiple DDROs reduces overestimation (timing margin) by up to 25% compared to using a single monitor. Furthermore, our silicon measurement results for monitoring an industrial microprocessor implemented in a 45-nm silicon-on-insulator process show that DDRO can reduce the mean delay estimation error by 35% compared to inverter-based ring oscillators.
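The idea of clustering critical paths with similar delay sensitivities can be sketched as follows. The abstract does not say which clustering algorithm is used, so a tiny k-means is swapped in here purely for illustration. The sensitivity vectors, the function names, and the "worst mismatch" metric (distance from a path to its cluster representative, used as a rough stand-in for the margin a monitor would need) are all assumptions.

```python
import math


def nearest(vec, centers):
    """Index of the center closest to vec (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(vec, centers[i]))


def kmeans(vectors, k, iters=20):
    """Plain k-means with a deterministic, evenly spaced initialization."""
    centers = [list(vectors[i * len(vectors) // k]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centers)].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster ends up empty
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def worst_mismatch(vectors, centers):
    """Largest distance between any path and its cluster representative."""
    return max(math.dist(v, centers[nearest(v, centers)]) for v in vectors)


# Hypothetical (gate sensitivity, wire sensitivity) pairs for six critical paths.
paths = [(1.0, 0.1), (0.9, 0.2), (1.1, 0.15), (0.2, 1.0), (0.1, 0.9), (0.15, 1.1)]

if __name__ == "__main__":
    for k in (1, 2):
        print(k, round(worst_mismatch(paths, kmeans(paths, k)), 3))
```

In this toy data set, a single representative sits far from both groups of paths, whereas one representative per cluster tracks each group closely. That is the intuition behind using multiple monitors instead of one.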
Asia and South Pacific Design Automation Conference | 2014
Liangzhen Lai; Puneet Gupta
Designing reliable integrated systems has become a major challenge with shrinking geometries, increasing fault rates and devices which age substantially in their usage life. The proposed research is motivated by the observation that many of the in-field failures are delay failures and several variability signatures are also delay-related. The origins of temporal delay fluctuations include manufacturing variability, voltage/temperature changes, negative or positive bias temperature instability-related Vth degradation, etc. Since the actual delay changes depend on process variations as well as workload, on-chip monitoring may be the best way of predicting them. There is a need to monitor circuit performance during manufacturing as well as at runtime to predict achievable performance and warn against impending failures. Adaptive mechanisms in hardware and/or software can optimize the trade-off between errors, energy and performance based on the feedback from runtime circuit performance monitors. This paper presents approaches for automated synthesis of design-dependent performance monitors. These monitors can be used to predict impending delay failures relatively inexpensively. For low-overhead monitoring, we propose multiple design-dependent ring oscillators (DDROs) as smart canary structures which can reliably predict achievable chip frequency but with margins for local variations. Early silicon results indicate that DDROs can reduce delay monitoring error by 35% compared to conventional ring oscillators. To further improve the prediction (albeit at a higher overhead), we propose in-situ slack monitors (SlackProbe) which can match local variations as well at overheads much smaller than monitoring all sequential elements. SlackProbe reduces the number of monitors required by over 15X with 5% additional delay margin in several commercial processor benchmarks. Finally, we show an example software testbed that demonstrates a variability-aware system that utilizes the hardware monitors and operates with both hardware and software adaptation.
Information Technology | 2015
Lucas Francisco Wanner; Liangzhen Lai; Abbas Rahimi; Mark Gottscho; Pietro Mercati; Chu-Hsiang Huang; Frederic Sala; Yuvraj Agarwal; Lara Dolecek; Nikil D. Dutt; Puneet Gupta; Rajesh K. Gupta; Ranjit Jhala; Rakesh Kumar; Sorin Lerner; Subhasish Mitra; Alexandru Nicolau; Tajana Simunic Rosing; Mani B. Srivastava; Steven Swanson; Dennis Sylvester; Yuanyuan Zhou
In this paper we summarize recent results and contributions from the NSF Expedition on Variability-Aware Software, a five-year, multi-university effort to tackle the problem of hardware variations and their implications and opportunities in software. The Expedition has made contributions in characterization and online monitoring of variations (particularly in microprocessors and flash memories), proposed new coding techniques for variability-tolerant storage, provided tools and platforms for the development of variability-aware software, and created new runtime support systems for variability-aware task-scheduling and execution.
IEEE Journal on Emerging and Selected Topics in Circuits and Systems | 2014
Liangzhen Lai; Vikas Chandra; Robert C. Aitken; Puneet Gupta
Negative and positive bias temperature instability (N/PBTI) has become one of the most important reliability issues in modern semiconductor technology. N/PBTI-induced degradation depends heavily on workload, which causes imbalanced degradation and additional clock skew for clock distribution networks with clock gating features. In this work, we first analyze the effects of N/PBTI on clock paths with different clock gating use cases. Then cross-layer solutions are proposed to reduce N/PBTI-induced clock skew. Two integrated clock gating (ICG) cell circuits are proposed to alternate clock idle state between logic high and logic low for each clock gating operation. A skew mitigation methodology is also proposed to select the appropriate ICG cells based on the architecture and microarchitecture context. An example of sleep scheduling is also described as a simple software-level technique that can be used in conjunction with BTI-Gater to avoid certain pathological aging scenarios. Our experiments show that BTI-Gater can balance the gated clock branches to close to 50% signal duty ratio, while guaranteeing a glitch-free clock signal with easy-to-verify timing constraints. Results on commercial processors show that BTI-Gater can effectively reduce N/PBTI-induced clock skew of up to 17 ps, which can be converted to up to 19.7% leakage power saving compared to pure design guardbanding.
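A toy behavioural model can show why alternating the idle state balances the duty ratio of a gated clock branch. This is only a sketch and does not model the proposed ICG circuits: it assumes a free-running clock is high half the time, that a conventional gate always idles low, and that the alternating scheme flips the idle level once per gating operation. The episode durations and the function name `duty_ratio` are hypothetical.

```python
def duty_ratio(episodes, alternate):
    """Fraction of time a gated clock net spends at logic high.

    episodes:  list of (active_time, gated_time) pairs, in arbitrary units.
    alternate: if False, the clock always idles low while gated;
               if True, the idle level flips on every gating operation.
    """
    high = 0.0
    total = 0.0
    idle_high = False
    for active, gated in episodes:
        high += 0.5 * active  # a running clock is high half of the time
        if alternate:
            idle_high = not idle_high  # flip idle level for this gating operation
        if idle_high:
            high += gated  # the whole gated interval is spent at logic high
        total += active + gated
    return high / total


# Hypothetical workload: the branch is gated 90% of the time.
episodes = [(1.0, 9.0)] * 1000

if __name__ == "__main__":
    print("always idle low :", duty_ratio(episodes, alternate=False))
    print("alternating idle:", duty_ratio(episodes, alternate=True))
```

With the idle level fixed low, the branch spends only a small fraction of the time at logic high, so its devices see very different stress from those on an ungated branch. With alternation, the ratio approaches 50%, which matches the balancing effect described in the abstract.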
Compilers, Architecture, and Synthesis for Embedded Systems | 2015
Liangzhen Lai; Vikas Chandra; Puneet Gupta
Hardware reliability has been a major concern for nano-scale computing systems. Different hardware design choices, application workloads and software management schemes can jointly affect the system's resilience. In this paper, we first develop a hardware evaluation platform based on an embedded/mobile development board and a standard Linux kernel. We demonstrate the use of our platform to evaluate the system's power and radiation-induced soft error rate in the presence of system power management schemes and with different application workloads and various hardware design configurations. We also propose system/cloud-based virtual sensing to capture varying ambient conditions for reliability evaluation. New reliability management policies are proposed and implemented in the Linux kernel to exploit the flexibility in different existing power management schemes. We demonstrate that our policies can achieve the system reliability target under varying application workloads and ambient conditions. Experiments show that our policies are efficient, with less than 3% additional power overhead compared to the optimal schemes characterized offline.