Guihai Yan | Researchain

Publication

Featured research published by Guihai Yan.

high-performance computer architecture | 2012

AgileRegulator: A hybrid voltage regulator scheme redeeming dark silicon for power efficiency in a multicore architecture

Guihai Yan; Yingmin Li; Yinhe Han; Xiaowei Li; Minyi Guo; Xiaoyao Liang

The widening gap between the fast-increasing transistor budget and the slow-growing power delivery and system cooling capability calls for novel architectural solutions to boost energy efficiency. Leveraging the surging “dark silicon” area, we propose “AgileRegulator”, a hybrid scheme that uses both on-chip and off-chip voltage regulators in a multicore system to exploit both coarse-grain and fine-grain power phases. We present two complementary algorithms, Sensitivity-Aware Application Scheduling (SAAS) and Responsiveness-Aware Application Scheduling (RAAS), to fully realize the energy-saving potential of the hybrid regulator scheme. Experimental results show that the hybrid scheme achieves performance-energy efficiency close to that of per-core DVFS without imposing much design cost. Meanwhile, the silicon overhead of the scheme is well contained within the “dark silicon”. Unlike application-specific schemes based on accelerators, the proposed scheme is a simple and universal solution for trading off chip area and energy.
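The abstract does not say how SAAS decides which applications receive the scarce on-chip regulators. The following is a minimal sketch of one plausible greedy policy. It assumes a hypothetical per-application `sensitivity` score, meaning the estimated energy saved by fine-grain per-core regulation, and a fixed number of on-chip regulator slots. It illustrates the idea only and is not the paper's SAAS or RAAS algorithm.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    # Hypothetical metric: energy (J) saved if this app's core gets a fast
    # on-chip regulator instead of sharing the off-chip one.
    sensitivity: float


def assign_regulators(apps: list[App], on_chip_slots: int) -> dict[str, str]:
    """Greedy sketch: give the most regulation-sensitive apps on-chip regulators."""
    ranked = sorted(apps, key=lambda a: a.sensitivity, reverse=True)
    return {
        app.name: "on-chip" if rank < on_chip_slots else "off-chip"
        for rank, app in enumerate(ranked)
    }


apps = [App("app_a", 0.8), App("app_b", 2.5), App("app_c", 1.1)]
print(assign_regulators(apps, on_chip_slots=1))
```

With a single on-chip slot, only `app_b` (the highest assumed sensitivity) is placed on an on-chip regulator. A greedy ranking is sufficient here only because each slot's benefit is assumed to be independent of the others.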

design, automation, and test in europe | 2009

A unified online fault detection scheme via checking of stability violation

Guihai Yan; Yinhe Han; Xiaowei Li

In ultra-deep submicron technology, two of the paramount reliability concerns are soft errors and device aging. Although intensive studies have been done to address the two challenges, most have so far treated them separately, thereby failing to reach better performance-cost tradeoffs. To support a more efficient design tradeoff, we present a new fault model, Stability Violation, derived from analysis of signal behavior. Furthermore, we propose a unified fault detection scheme, Stability Violation based Fault Detection (SVFD), by which soft errors (both Single Event Upset and Single Event Transient), aging delay, and delay faults can be handled uniformly. SVFD can greatly facilitate soft-error-resistant and aging-aware designs. SVFD is validated through a set of intensive HSPICE simulations targeting 65nm CMOS technology. Experimental results show that SVFD has more robust fault detection capability than previous schemes at comparable overhead in terms of area, power, and performance.

international symposium on computer architecture | 2010

Leveraging the core-level complementary effects of PVT variations to reduce timing emergencies in multi-core processors

Guihai Yan; Xiaoyao Liang; Yinhe Han; Xiaowei Li

Process, Voltage, and Temperature (PVT) variations can significantly degrade the performance benefits expected from next-generation nanoscale technology. The primary circuit-level implication of PVT variations is the resulting timing emergencies. In a multi-core processor running multiple programs, variations create spatial and temporal imbalance across the processing cores. Most prior schemes are dedicated to tolerating PVT variations individually for a single core. They ignore the opportunity to leverage the complementary effects between variations and the intrinsic variation imbalance among individual cores. We find that the notorious delay impacts of different variations are not necessarily aggregated. Cores with mild variations can share the demanding workload of cores suffering large variations. If managed correctly, variations on different cores can help mitigate each other, resulting in a variation-mild environment. In this paper, we propose Timing Emergency Aware Thread Migration (TEA-TM), a delay sensor-based scheme to reduce system timing emergencies under PVT variations. Fourier transform and frequency-domain analysis are conducted to provide insight into the potential of the PVT co-optimization scheme. Experimental results show that, on average, TEA-TM can save up to 24% of the throughput loss while improving system fairness by 85%.
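The abstract only summarizes TEA-TM's migration policy. The idea that mild cores absorb work from emergency-prone cores can be sketched as a greedy pairing, which is not the paper's exact policy. The sketch assumes each core exposes a hypothetical count of timing emergencies flagged by its delay sensor over the last interval. It pairs the worst core with the mildest one and proposes a thread swap whenever the gap is large enough.

```python
def plan_migrations(emergencies: dict[int, int], threshold: int) -> list[tuple[int, int]]:
    """Pair emergency-prone cores with mild ones and propose thread swaps.

    emergencies: core id -> timing emergencies flagged by that core's delay
    sensor in the last interval (hypothetical interface).
    threshold: minimum emergency gap that justifies a migration (hypothetical).
    Returns (hot_core, mild_core) pairs whose threads should be swapped.
    """
    order = sorted(emergencies, key=emergencies.get)  # mildest core first
    swaps = []
    lo, hi = 0, len(order) - 1
    while lo < hi:
        mild, hot = order[lo], order[hi]
        # Inner pairs can only have a smaller gap, so stop at the first miss.
        if emergencies[hot] - emergencies[mild] < threshold:
            break
        swaps.append((hot, mild))
        lo += 1
        hi -= 1
    return swaps


print(plan_migrations({0: 40, 1: 2, 2: 15, 3: 5}, threshold=10))  # [(0, 1), (2, 3)]
```

Pairing the two ends of the sorted list matches the abstract's intuition: the cores suffering the largest variations are paired with those experiencing the mildest ones.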

design, automation, and test in europe | 2013

SmartCap: user experience-oriented power adaptation for smartphone's application processor

Xueliang Li; Guihai Yan; Yinhe Han; Xiaowei Li

Power efficiency is increasingly critical to battery-powered smartphones. Given that user experience is what users value most, we propose that power optimization should directly respect user experience. We conduct a statistical sample survey and study the correlation among user experience, system runtime activities, and the minimal required frequency of the application processor. This study motivates an intelligent self-adaptive scheme, SmartCap, which automatically identifies the most power-efficient state of the application processor according to system activities. Compared to prior Linux power adaptation schemes, SmartCap saves 11% to 84% of power, depending on the application, with little decline in user experience.
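SmartCap's actual decision logic is derived from the authors' survey data, which is not reproduced here. The sketch below illustrates the general idea of mapping observed system activity to the lowest frequency that still meets a profiled requirement. It assumes a hypothetical profile table and a hypothetical list of available operating frequencies.

```python
import bisect

# Hypothetical DVFS operating points (MHz) of the application processor.
FREQS_MHZ = [300, 600, 900, 1200, 1500]


def pick_frequency(activity: float, profile: list[tuple[float, int]]) -> int:
    """Lowest available frequency meeting the profiled requirement for `activity`.

    profile: (activity_upper_bound, min_required_mhz) pairs sorted by bound,
    e.g. distilled offline from user-experience studies (hypothetical format).
    """
    required = FREQS_MHZ[-1]  # activity beyond the profile: run at full speed
    for bound, mhz in profile:
        if activity <= bound:
            required = mhz
            break
    # Round the requirement up to the next available operating point.
    i = bisect.bisect_left(FREQS_MHZ, required)
    return FREQS_MHZ[min(i, len(FREQS_MHZ) - 1)]


profile = [(0.2, 250), (0.5, 700), (0.8, 1100)]
print(pick_frequency(0.4, profile))  # 900
```

Rounding up, rather than down, keeps the sketch on the side of preserving user experience when no operating point matches the requirement exactly.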

IEEE Transactions on Computers | 2011

ReviveNet: A Self-Adaptive Architecture for Improving Lifetime Reliability via Localized Timing Adaptation

Guihai Yan; Yinhe Han; Xiaowei Li

Aggressive technology scaling poses serious challenges to lifetime reliability. A paramount challenge comes from a variety of aging mechanisms that can cause gradual performance degradation of circuits. Prior work shows that such progressive degradation can be reliably detected by dedicated aging sensors, which provides a good foundation for a new scheme to improve lifetime reliability. In this paper, we propose ReviveNet, a hardware-implemented, aging-aware, and self-adaptive architecture. Aging awareness is realized by deploying dedicated aging sensors, and self-adaptation is achieved by employing a group of synergistic agents. Each agent implements a localized timing adaptation mechanism to tolerate aging-induced delay on critical paths. For evaluation, a reliability model based on the widely used Weibull distribution is presented. Experimental results show that, without compromising nominal architectural performance, ReviveNet can improve the Mean-Time-To-Failure by up to 48.7 percent, at the expense of 9.5 percent area overhead and a small power increase.
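The Weibull model mentioned in this abstract has standard closed forms. For scale eta and shape k, reliability is R(t) = exp(-(t/eta)^k) and the mean time to failure is eta * Gamma(1 + 1/k). The snippet below only evaluates these textbook formulas. ReviveNet's actual parameters are not given here, so the example values are arbitrary.

```python
import math


def weibull_reliability(t: float, scale: float, shape: float) -> float:
    """R(t) = exp(-(t / scale) ** shape): probability of surviving beyond time t."""
    return math.exp(-((t / scale) ** shape))


def weibull_mttf(scale: float, shape: float) -> float:
    """Mean time to failure of a Weibull lifetime: scale * Gamma(1 + 1 / shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


# Arbitrary example parameters (years); not taken from the paper.
base = weibull_mttf(scale=10.0, shape=2.0)
improved = weibull_mttf(scale=10.0 * 1.487, shape=2.0)
print(f"MTTF {base:.2f} -> {improved:.2f} years ({improved / base - 1:.1%} longer)")
```

At a fixed shape k, the MTTF scales linearly with eta. An MTTF gain such as the reported 48.7 percent therefore corresponds to stretching eta by the same factor, provided k is unchanged. This is an assumption, since aging mitigation could also alter k.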

IEEE Transactions on Very Large Scale Integration Systems | 2011

SVFD: A Versatile Online Fault Detection Scheme via Checking of Stability Violation

Guihai Yan; Yinhe Han; Xiaowei Li

In ultra-deep submicrometer technology, soft errors and device aging are two of the paramount reliability concerns. Although many studies have been done to tackle the two challenges, most have so far treated them separately, thereby failing to reach better performance-cost tradeoffs. To support a more efficient design tradeoff, we propose a unified fault detection scheme, stability violation-based fault detection (SVFD), by which soft errors (both single event upset and single event transient), aging delay, and delay faults can be uniformly dealt with. SVFD is grounded in a new fault model, stability violation, derived from analysis of signal behavior. SVFD has been validated through a set of intensive HSPICE simulations targeting the next-generation 32-nm CMOS technology. An application of SVFD to a floating-point unit (FPU) is also evaluated. Experimental results show that SVFD has more versatile fault detection capability than several recently proposed schemes at comparable overhead in terms of area, power, and performance.
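The following is a simplified, behavioral illustration of checking for stability violations. It reflects an assumption about the general principle, not SVFD's circuit. Suppose a correctly behaving signal must stay stable for a checking window after each capturing clock edge. A late arrival caused by aging or a delay fault, or a transient glitch, would then appear as a transition inside that window.

```python
def stability_violations(transitions: list[float], clock_edges: list[float],
                         window: float) -> list[float]:
    """Return the clock edges whose checking window contains a signal transition.

    Simplified model (assumption): a correctly behaving signal stays stable for
    `window` time units after each capturing clock edge, so any transition in
    [edge, edge + window) is flagged as a stability violation.
    """
    return [
        edge for edge in clock_edges
        if any(edge <= t < edge + window for t in transitions)
    ]


print(stability_violations([1.2, 3.05, 4.6], [1.0, 2.0, 3.0, 4.0], window=0.3))  # [1.0, 3.0]
```

The example flags two edges: the transitions at 1.2 and 3.05 fall inside a window, while the one at 4.6 arrives after its window has closed.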

high-performance computer architecture | 2017

FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks

Wenyan Lu; Guihai Yan; Jiajun Li; Shijun Gong; Yinhe Han; Xiaowei Li

Convolutional Neural Networks (CNNs) are very computation-intensive. Recently, many CNN accelerators based on the intrinsic parallelism of CNNs have been proposed. However, we observe a big mismatch between the parallel types supported by the computing engine and the dominant parallel types of CNN workloads. This mismatch seriously degrades the resource utilization of existing accelerators. In this paper, we propose a flexible dataflow architecture (FlexFlow) that can leverage the complementary effects among feature map, neuron, and synapse parallelism to mitigate the mismatch. Evaluated with six typical practical workloads, FlexFlow achieves a 2-10x performance speedup and a 2.5-10x power efficiency improvement compared with three state-of-the-art accelerator architectures. Meanwhile, FlexFlow is highly scalable as the computing engine scale grows.
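A simple tiling model can estimate the utilization loss from a parallelism mismatch. The sketch below assumes, for illustration only, an engine that always unrolls `tm` output feature maps by `tn` input feature maps. It is not FlexFlow's dataflow. It simply shows why a fixed parallel type wastes resources on some layers.

```python
import math


def pe_utilization(m_out: int, n_in: int, tm: int, tn: int) -> float:
    """Fraction of a tm x tn PE array doing useful work on one conv layer.

    Simplified model (assumption): each pass unrolls tm output feature maps
    and tn input feature maps, so a layer with m_out outputs and n_in inputs
    needs ceil(m_out / tm) * ceil(n_in / tn) passes per output position.
    """
    useful = m_out * n_in
    provisioned = math.ceil(m_out / tm) * tm * math.ceil(n_in / tn) * tn
    return useful / provisioned


print(pe_utilization(96, 3, 16, 16))     # 0.1875
print(pe_utilization(256, 256, 16, 16))  # 1.0
```

Consider a layer with only 3 input feature maps, such as an RGB input layer, and 96 output feature maps. On a 16 x 16 array, only 18.75% of the processing elements do useful work. A 256 x 256 layer, by contrast, keeps the array fully busy.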

design automation conference | 2013

RISO: relaxed network-on-chip isolation for cloud processors

Hang Lu; Guihai Yan; Yinhe Han; Binzhang Fu; Xiaowei Li

Cloud service providers use workload consolidation in many-core cloud processors to optimize system utilization and augment performance for ever-expanding scale-out workloads. Performance isolation usually has to be enforced for consolidated workloads sharing the same many-core resources. The network-on-chip (NoC), a major shared resource, also needs to be isolated to avoid violating performance isolation. Prior work uses strict network isolation to fulfill performance isolation. However, strict network isolation results in either low consolidation density or complex routing mechanisms, which imply prohibitively high hardware cost and large latency. In view of this limitation, we propose a novel NoC isolation strategy for many-core cloud processors, called relaxed isolation (RISO). It permits underutilized links to be shared by multiple applications while keeping the aggregated traffic in check to enforce performance isolation. Experimental results show that consolidation density is improved by more than 12% compared with the previous strict isolation scheme, while network latency is reduced by 38.4% on average.
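RISO's key rule is to share underutilized links while keeping aggregated traffic in check, and it can be sketched as an admission test. The link encoding, the per-link load bookkeeping, and the 0.7 utilization threshold below are hypothetical choices, not values from the paper.

```python
Link = tuple[int, int]  # (source router, destination router); hypothetical encoding


def admit_flow(link_load: dict[Link, float], path: list[Link], demand: float,
               capacity: float = 1.0, threshold: float = 0.7) -> bool:
    """Relaxed-isolation admission sketch.

    A new application's flow may reuse links already carrying other
    applications' traffic, but only if the aggregated load on every link of
    its path stays at or below threshold * capacity. On success, the link
    loads are updated in place.
    """
    if any(link_load.get(link, 0.0) + demand > threshold * capacity for link in path):
        return False  # sharing would break performance isolation on some link
    for link in path:
        link_load[link] = link_load.get(link, 0.0) + demand
    return True


loads = {(0, 1): 0.3, (1, 2): 0.6}
print(admit_flow(loads, [(0, 1), (1, 2)], demand=0.2))  # False: (1, 2) would exceed 0.7
print(admit_flow(loads, [(0, 1)], demand=0.2))          # True: (0, 1) rises to about 0.5
```

Rejecting the whole path when any single link would exceed the threshold keeps the check conservative. A rejected flow would then fall back to strictly isolated links.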

IEEE Transactions on Computers | 2016

An Analytical Framework for Estimating Scale-Out and Scale-Up Power Efficiency of Heterogeneous Manycores

Jun Ma; Guihai Yan; Yinhe Han; Xiaowei Li

Heterogeneous manycore architectures have been shown to be highly promising for boosting power efficiency in two independent ways:
(1) enabling massive thread-level parallelism, called the “scale-out” approach;
(2) enabling thread migration between heterogeneous cores, called the “scale-up” approach.
Accurately modeling the power-efficiency profitability of the two ways, particularly in an analytical and computationally efficient manner, is essential to reaping the power efficiency of such architectures. We propose a comprehensive analytical model to predict the power efficiency from the two independent ways. Given that power efficiency is measured by performance per watt, this model is composed of a performance model and a power model. The performance model is built from two orthogonal functions, α and β. Function α describes the scale-out speedup from multithreading; function β represents the scale-up speedup from core heterogeneity. Thus, the performance model can clearly capture the overall speedup of any multithreading and thread-to-core mapping strategy. The power model predicts the power of the corresponding scale-out and scale-up configurations. It simultaneously captures the power variations caused by thread synchronization and thread migration between heterogeneous cores. We build both the performance and power models analytically while keeping computational complexity in mind. This leads to a suite of comprehensive, low-complexity models for runtime management. These models are validated on a large-scale heterogeneous manycore architecture with full-system simulations. For performance prediction, the average error is below 12 percent, lower than that of state-of-the-art methods. For power prediction, the average error is 7.74 percent. On top of the models, we introduce two heuristic scheduling algorithms, performance-oriented MAX-P and power efficiency-oriented MAX-E, to demonstrate the usage of these models.
The results show that MAX-P outperforms state-of-the-art methods by 18 percent in performance on average, and MAX-E outperforms the baseline by 70 percent in power efficiency on average.
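The toy version below makes the two-function decomposition (scale-out function α, scale-up function β) concrete. It rests on strong assumptions:

- Performance is the product α·β.
- α is Amdahl's law, used as a stand-in; the paper's α is its own model.
- Power is a fixed static term plus a per-core term.
- All threads run on a single core type.

The selection is an exhaustive search for maximum performance per watt. It is not the paper's MAX-E heuristic.

```python
from itertools import product


def alpha_amdahl(threads: int, parallel_fraction: float) -> float:
    """Stand-in for α: scale-out speedup from Amdahl's law (assumption)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads)


def best_config(parallel_fraction: float,
                core_types: dict[str, tuple[float, float]],
                max_threads: int, static_watts: float) -> tuple[str, int, float]:
    """Exhaustively pick the (core type, thread count) with the best perf/watt.

    core_types maps a core type to (beta, watts_per_core), where beta is its
    speedup over a baseline core. Performance is modeled as alpha * beta and
    power as static_watts + threads * watts_per_core (both assumptions).
    """
    best = ("", 0, 0.0)
    for (name, (beta, watts)), n in product(core_types.items(), range(1, max_threads + 1)):
        perf = alpha_amdahl(n, parallel_fraction) * beta
        efficiency = perf / (static_watts + n * watts)
        if efficiency > best[2]:
            best = (name, n, efficiency)
    return best


cores = {"big": (2.0, 4.0), "little": (1.0, 1.0)}  # hypothetical core types
print(best_config(0.9, cores, max_threads=8, static_watts=2.0))
```

The example uses a parallel fraction of 0.9, 2 W of static power, and the two hypothetical core types. Under these inputs, the search picks four little cores. Without the static term, a single thread would always win, because α(n)/n falls as n grows.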

design, automation, and test in europe | 2014

SuperRange: Wide operational range power delivery design for both STV and NTV computing

Xin He; Guihai Yan; Yinhe Han; Xiaowei Li

The load power range of modern processors is greatly enlarged because many advanced power management techniques are incorporated, such as dynamic voltage and frequency scaling, Turbo Boost, and Near Threshold Voltage (NTV) technologies. However, the power savings may be offset by power loss in power delivery. Moreover, since the efficiency of power delivery varies greatly with load conditions, conventional power delivery designs cannot maintain high efficiency over the entire voltage range. We propose SuperRange, a wide operational range power delivery scheme. SuperRange combines the complementary power delivery capabilities of the on-chip and off-chip voltage regulators. Experimental results show that SuperRange achieves an average power conversion efficiency of 70% over a wide operational range, outperforming conventional power delivery schemes. It also exhibits superior resilience in power-constrained systems.
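One way to read the combined use of on-chip and off-chip regulators is to route power through whichever path is more efficient at the present load. The sketch below does that using piecewise-linear efficiency curves. The curve points are made-up toy values, not measured data. The selection rule is an assumption, not SuperRange's documented control policy.

```python
import bisect


def interp(curve: list[tuple[float, float]], load_w: float) -> float:
    """Linearly interpolate efficiency from sorted (load_W, efficiency) points."""
    loads = [x for x, _ in curve]
    i = bisect.bisect_left(loads, load_w)
    if i == 0:
        return curve[0][1]   # at or below the first point: clamp
    if i == len(curve):
        return curve[-1][1]  # above the last point: clamp
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (load_w - x0) / (x1 - x0)


def pick_path(curves: dict[str, list[tuple[float, float]]], load_w: float) -> tuple[str, float]:
    """Return the delivery path with the highest conversion efficiency at this load."""
    return max(((name, interp(c, load_w)) for name, c in curves.items()),
               key=lambda item: item[1])


# Toy efficiency curves (made-up values, not measurements).
curves = {
    "on-chip": [(1.0, 0.80), (10.0, 0.85), (50.0, 0.70)],
    "off-chip": [(1.0, 0.50), (10.0, 0.80), (50.0, 0.90)],
}
for load in (2.0, 40.0):
    name, eff = pick_path(curves, load)
    print(f"{load:5.1f} W -> {name} ({eff:.1%})")
```

With these toy curves, the light 2 W load is served by the on-chip path and the 40 W load by the off-chip path. Real curves would come from regulator characterization.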
