Publication


Featured research published by Keng-Hao Yang.


International Solid-State Circuits Conference | 2016

7.3 A resistance-drift compensation scheme to reduce MLC PCM raw BER by over 100× for storage-class memory applications

Win-San Khwa; Meng-Fan Chang; Jau-Yi Wu; Ming-Hsiu Lee; Tzu-Hsiang Su; Keng-Hao Yang; Tien-Fu Chen; Tien-Yen Wang; Hsiang-Pang Li; M. BrightSky; SangBum Kim; Hsiang-Lam Lung; Chung H. Lam

The large performance gap between traditional storage and the rest of the memory hierarchy calls for a storage class memory (SCM) to fill the need. Phase change memory (PCM) is an emerging memory candidate for SCM with the advantages of scalability, bit-alterability, non-volatility, and high program speed. Previous publications demonstrated high-density single-level-cell (SLC) PCMs using circuits and architectural techniques for expanding memory capacity, increasing bandwidth, and enabling embedded applications [1-4]. For PCM to be a true contender, a multi-level-cell (MLC) topology with at least a moderate data retention time is required. However, the resistance-drift (R-drift) effect causes cell resistance (RCELL) to increase with time, exceeding the ECC correction ability within hours of being programmed. Conventional R-drift mitigation approaches using reference-cell-based resistance tracking (RCRT) [5] and DRAM-like refresh (DR) [6] are feasible, but at the cost of compromising distinguishing PCM traits: random write, low latency, and low power. This paper proposes a resistance drift compensation (RDC) scheme to mitigate R-drift without such compromises, while minimizing the speed and power consumption penalties. The MLC-PCM fixed-threshold retention (FTR) raw-bit-error-rate (RBER) has been suppressed by over two orders of magnitude, reducing it below practical ECC capability limits.
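The failure mode described in the abstract above can be illustrated with a small sketch. It is not the paper's RDC scheme. It assumes the commonly used empirical power-law drift model R(t) = R0 * (t / t0)^nu, which the abstract does not state, and every number in it (programmed resistances, read thresholds, drift exponents) is hypothetical. It only shows why a fixed-threshold read can misread an MLC cell after a few hours of drift.

```python
def drifted_resistance(r0_ohm, t_s, nu, t0_s=1.0):
    """Assumed power-law drift model: R(t) = R0 * (t / t0) ** nu, for t >= t0."""
    return r0_ohm * (t_s / t0_s) ** nu


def read_level(r_ohm, thresholds_ohm):
    """Fixed-threshold read: the level is the count of thresholds below r_ohm."""
    return sum(r_ohm > th for th in thresholds_ohm)


# Hypothetical 2-bit MLC cell: four programmed resistances, three fixed thresholds.
PROGRAMMED_OHM = [1e4, 1e5, 1e6, 1e7]
THRESHOLDS_OHM = [2e4, 2e5, 2e6]
# Hypothetical drift exponents, larger for the higher-resistance states.
NU = [0.005, 0.05, 0.08, 0.10]


def levels_after(t_s):
    """Levels read back with fixed thresholds after t_s seconds of drift."""
    return [
        read_level(drifted_resistance(r0, t_s, nu), THRESHOLDS_OHM)
        for r0, nu in zip(PROGRAMMED_OHM, NU)
    ]


if __name__ == "__main__":
    print(levels_after(1.0))   # right after programming
    print(levels_after(1e4))   # roughly 2.8 hours later
```

With these made-up parameters, the third state drifts across the top threshold after about 10^4 seconds and is read as the fourth state, which is the kind of raw bit error a fixed-threshold read accumulates over time.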


Design Automation Conference | 2015

Energy-efficient non-volatile TCAM search engine design using priority-decision in memory technology for DPI

Hsiang-Jen Tsai; Keng-Hao Yang; Yin-Chi Peng; Chien-Chen Lin; Ya-Han Tsao; Meng-Fan Chang; Tien-Fu Chen

TCAM-based search engines are widely used in regular expression matching across multiple packets. However, the use of a priority encoder increases the energy consumption of pattern updates and search operations. This work proposes a promising memory technology called Priority-Decision in Memory (PDM), which eliminates the need for priority encoders and removes restrictions on ordering, meaning that patterns can be stored in an arbitrary order without sorting by length. Moreover, we present a Sequential Input-State Search (SIS) scheme that disables the mass of redundant search operations in state segments, based on an analysis of the distribution of hex signatures in a virus database. Experimental results demonstrate that PDM-based technology can reduce the update energy consumption of nvTCAM search engines by 36% to 67%, because most of the energy in the latter is spent on reordering. By adopting the SIS-based method to avoid unnecessary search operations in a TCAM array, the search energy of nvTCAM search engines is reduced by around 64%.


IEEE Transactions on Very Large Scale Integration Systems | 2015

Soft-Error-Tolerant Design Methodology for Balancing Performance, Power, and Reliability

Hsuan-Ming Chou; Ming-Yi Hsiao; Yi-Chiao Chen; Keng-Hao Yang; Jean Tsao; Chiao-Ling Lung; Shih-Chieh Chang; Wen-Ben Jone; Tien-Fu Chen

Soft errors have become an important reliability issue in advanced technologies. To tolerate soft errors, solutions suggested in previous works incur significant performance and power penalties, especially when a design with fault-tolerant structures is overprotected. In this paper, we present a soft-error-tolerant design methodology to trade off performance, power, and reliability for different applications. First, four novel detection and correction flip-flop (FF) structures are proposed to provide different levels of tolerance capability against soft errors. Second, architecture-level vulnerability and logic-level susceptibility analyses are employed to identify weak FFs that can easily cause program execution errors. Third, an optimization framework is developed to synthesize the proposed four novel FF structures into weak and highly observable storage bits with the flexibility of trading off performance, power, and reliability. A five-stage pipeline RISC core (UniRISC) is adopted to demonstrate the usefulness of our methodology. Experimental results show that the proposed method can accomplish design goals by balancing performance, power, and reliability. For example, we can not only satisfy the reliability requirement that no more than five errors occur per one billion hours in a design but also reduce performance overhead by up to 87% and power overhead by up to 91% when compared with previous works.


IEEE Journal of Solid-State Circuits | 2017

A Resistance Drift Compensation Scheme to Reduce MLC PCM Raw BER by Over 100× for Storage Class Memory Applications

Win-San Khwa; Meng-Fan Chang; Jau-Yi Wu; Ming-Hsiu Lee; Tzu-Hsiang Su; Keng-Hao Yang; Tien-Fu Chen; Tien-Yen Wang; Hsiang-Pang Li; M. BrightSky; SangBum Kim; Hsiang-Lam Lung; Chung H. Lam

For multilevel cell (MLC) phase change memory (PCM), the resistance drift (R-drift) phenomenon causes cell resistance to increase with time, even at room temperature. As a result, the fixed-threshold-retention (FTR) raw-bit-error-rate (RBER) surpasses practical ECC correction ability within hours after being programmed. This study proposes a resistance drift compensation (RDC) scheme to mitigate the R-drift issue. The proposed RDC scheme realizes PCM drift compensation and features an RDC pulse to suppress ECC decoding failure. The proposed approach was validated using a 90-nm 128M-cell PCM chip and an FPGA-based memory controller verification system. The MLC PCM FTR RBER has been suppressed by over 100×, thereby bringing it within ECC capability. The effectiveness of the RDC scheme was verified up to 10^6 cycles.


IEEE Transactions on Very Large Scale Integration Systems | 2017

Energy-Efficient TCAM Search Engine Design Using Priority-Decision in Memory Technology

Hsiang-Jen Tsai; Keng-Hao Yang; Yin-Chi Peng; Chien-Chen Lin; Ya-Han Tsao; Meng-Fan Chang; Tien-Fu Chen

Ternary content-addressable memory (TCAM)-based search engines generally need a priority encoder (PE) to select the highest priority match entry for resolving the multiple match problem due to the don’t care (X) features of TCAM. In contemporary network security, TCAM-based search engines are widely used in regular expression matching across multiple packets to protect against attacks, such as by viruses and spam. However, the use of PE results in increased energy consumption for pattern updates and search operations. Instead of using PEs to determine the match, our solution is a three-phase search operation that utilizes the length information of the matched patterns to decide the longest pattern match data. This paper proposes a promising memory technology called priority-decision in memory (PDM), which eliminates the need for PEs and removes restrictions on ordering, implying that patterns can be stored in an arbitrary order without sorting by length. Moreover, we present a sequential input-state (SIS) scheme that disables the mass of redundant search operations in state segments on the basis of an analysis of the distribution of hex signatures in a virus database. Experimental results demonstrate that the PDM-based technology can reduce the update energy consumption of nonvolatile TCAM (nvTCAM) search engines by 36%–67%, because most of the energy in these search engines is spent on reordering. By adopting the SIS-based method to avoid unnecessary search operations in a TCAM array, the search energy of nvTCAM search engines is reduced by around 64%.
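The selection rule in the abstract above, choosing the longest matching pattern from length information rather than from storage order, can be sketched in software. This is only an illustration of the rule, not the three-phase in-memory operation or any circuit from the paper. The `Entry` type, its `length` field, and the sample patterns are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Entry(NamedTuple):
    pattern: str  # fixed-width ternary word; 'X' marks a don't-care symbol
    length: int   # hypothetical stored length: number of significant symbols


def matches(entry: Entry, key: str) -> bool:
    """Ternary match: every non-'X' symbol must equal the key symbol."""
    return len(entry.pattern) == len(key) and all(
        p in ("X", k) for p, k in zip(entry.pattern, key)
    )


def longest_match(entries: Sequence[Entry], key: str) -> Optional[Entry]:
    """Pick the matching entry with the greatest length, ignoring storage order."""
    hits = [e for e in entries if matches(e, key)]
    return max(hits, key=lambda e: e.length, default=None)


if __name__ == "__main__":
    # Entries deliberately stored out of length order.
    table = [Entry("4DXX", 2), Entry("4D5A", 4), Entry("4XXX", 1)]
    print(longest_match(table, "4D5A"))
    print(longest_match(table, "4D00"))
    print(longest_match(table, "5000"))
```

Because the winner is decided by the stored length, inserting a new pattern does not require reordering existing entries, which is the update cost a priority-encoder design has to pay.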


IEEE Journal of Solid-State Circuits | 2016

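A ReRAM-Based 4T2R Nonvolatile TCAM Using RC-Filtered Stress-Decoupled Scheme for Frequent-OFF Instant-ON Search Engines Used in IoT and Big-Data Processing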

Meng-Fan Chang; Lie-Yue Huang; Wen-Zhang Lin; Yen-Ning Chiang; Chia-Chen Kuo; Ching-Hao Chuang; Keng-Hao Yang; Hsiang-Jen Tsai; Tien-Fu Chen; Shyh-Shyuan Sheu

This paper outlines the RC-filtered stress-decoupled (RCSD) 4T2R nonvolatile TCAM (nvTCAM) with the following benefits: 1) reduced NVM-stress; 2) reduced ML parasitic load; and 3) suppression of match-line (ML) leakage current from match cells. The RCSD-4T2R cell achieves a 6× reduction in NVM-stress, a 2× increase in maximum wordlength, and a 2× reduction in search delay. In this paper, we also outline two search schemes, referred to as dynamic source-line pulse controlled (DSL-PC) search and dataline-pulse controlled (DL-PC) search, which were developed specifically for the RCSD-4T2R nvTCAM. We fabricated a 128 × 32 b RCSD-4T2R nvTCAM macro with HfO ReRAM using a 180 nm CMOS process. Using the DSL-PC and DL-PC schemes, the measured search delay of the RCSD-4T2R nvTCAM macro was 1.2 ns under typical VDD.


IEEE Transactions on Circuits and Systems | 2017

eTag: Tag-Comparison in Memory to Achieve Direct Data Access based on eDRAM to Improve Energy Efficiency of DRAM Cache

Keng-Hao Yang; Hsiang-Jen Tsai; Chia-Yin Li; Paul Jendra; Meng-Fan Chang; Tien-Fu Chen

As 2.5D/3D die stacking technology emerges, stacked dynamic random access memory (DRAM) has been proposed as a cache due to its large capacity in order to bridge the latency gap between off-chip memory and SRAM caches. The main problems in utilizing a DRAM cache are the high tag storage overhead and the high lookup latency. To address these, we propose tags-in-eDRAM (embedded DRAM) due to its higher density and lower latency. This paper presents an eTag DRAM cache architecture that is composed of a novel tag-comparison-in-memory scheme to achieve direct data access. It eliminates access latency and comparison power by pushing tag-comparison into the sense amplifier. Furthermore, we propose a Merged Tag to enhance the eTag DRAM cache by comparing last-level cache tags and DRAM cache tags in parallel. Simulation results show that the eTag DRAM cache improves energy efficiency by 15.4% and 33.9% in 4-core and 8-core workloads, respectively. Additionally, the Merged Tag achieves 32.1% and 48.7% energy efficiency improvements in 4-core and 8-core workloads, respectively.


ACM Transactions on Design Automation of Electronic Systems | 2017

Leak Stopper: An Actively Revitalized Snoop Filter Architecture with Effective Generation Control

Yin-Chi Peng; Chien-Chih Chen; Hsiang-Jen Tsai; Keng-Hao Yang; Pei-Zhe Huang; Shih-Chieh Chang; Wen-Ben Jone; Tien-Fu Chen

To alleviate the high energy dissipation of unnecessary snooping accesses, snoop filters have been designed to reduce snoop lookups. These filters have the problem of decreasing filtering efficiency, and thus usually rely on partial or whole filter reset by detecting block evictions. Unfortunately, the reset conditions occur infrequently or unevenly (called passive filter deletion). This work proposes the concept of revitalized snoop filter (RSF) design, which can actively renew the destination filter by employing a generation wrapping-around scheme for various reference behaviors. We further utilize a sampling mechanism for RSF to trigger precise filter revitalizations in a timely manner, so that unnecessary RSF flushing can be minimized. The proposed RSF can be integrated into various existing inclusive snoop filters with only a minor change to their designs. We evaluate our proposed design and demonstrate that RSF eliminates 58.6% of snoop energy compared to JETTY on average while inducing only 6.5% of revitalization energy overhead. In addition, RSF eliminates 45.5% of snoop energy compared to stream registers on average and only induces 2.5% of revitalization energy overhead. Overall, these RSFs reduce the total L2 cache energy consumption by 52.1% (58.6% - 6.5%) as compared to JETTY and by 43% (45.5% - 2.5%) as compared to stream registers. Furthermore, RSF improves the overall performance by 1% to 1.4% on average compared to JETTY and stream registers for various benchmark suites.


IEEE Transactions on Very Large Scale Integration Systems | 2016

eTag: Tag-Comparison in Memory to Achieve Direct Data Access based on eDRAM to Improve Energy Efficiency of DRAM Cache

Hsuan-Ming Chou; Yi-Chiao Chen; Keng-Hao Yang; Jean Tsao; Shih-Chieh Chang; Wen-Ben Jone; Tien-Fu Chen

In a modern system-on-chip design, hundreds of cores and intellectual properties can be integrated into a single chip. To be suitable for high-performance interconnects, designers increasingly adopt advanced interconnect protocols that support novel mechanisms of parallel accessing, including outstanding transactions and out-of-order completion of transactions. To implement those novel mechanisms, a master tags an ID to each transaction to decide in-order or out-of-order properties. However, these advanced protocols may lead to transaction deadlocks that do not occur in traditional protocols. To prevent the deadlock problem, current solutions stall suspicious transactions, and in certain cases many such stalls can incur a serious performance penalty. In this brief, we propose a novel ID assignment mechanism that guarantees that the issued transactions are deadlock-free and significantly reduces the number of transaction stalls issued by masters. Our experimental results show encouraging performance improvements compared with previous works, with little hardware and power overhead.


Design Automation Conference | 2014

Leak Stopper: An Actively Revitalized Snoop Filter Architecture with Effective Generation Control

Hsiang-Jen Tsai; Chien-Chih Chen; Keng-Hao Yang; Ting-Chin Yang; Li-Yue Huang; Ching-Hao Chung; Meng-Fan Chang; Tien-Fu Chen

Collaboration


Top Co-Authors

Tien-Fu Chen, National Chiao Tung University
Hsiang-Jen Tsai, National Chiao Tung University
Meng-Fan Chang, National Tsing Hua University
Shih-Chieh Chang, National Tsing Hua University
Tzu-Hsiang Su, National Chiao Tung University
Yin-Chi Peng, National Chiao Tung University
Chien-Chen Lin, National Tsing Hua University
Chien-Chih Chen, National Chiao Tung University
Hsuan-Ming Chou, National Tsing Hua University
Jean Tsao, National Tsing Hua University