Hsiang-Jen Tsai | Researchain

Publication

Featured research published by Hsiang-Jen Tsai.

international solid-state circuits conference | 2015

17.5 A 3T1R nonvolatile TCAM using MLC ReRAM with Sub-1ns search time

Meng-Fan Chang; Chien-Chen Lin; Albert Lee; Chia-Chen Kuo; Geng-Hau Yang; Hsiang-Jen Tsai; Tien-Fu Chen; Shyh-Shyuan Sheu; Pei-Ling Tseng; Heng-Yuan Lee; Tzu-Kun Ku

Many big-data (BD) processors reduce power consumption by employing ternary content-addressable-memory (TCAM) [1-2] with pre-stored signature patterns as filters to reduce the amount of data sent for processing in the following stage (i.e., wireless transmission). To further reduce standby power, BD-processors commonly use nonvolatile memory (NVM) to back up the signature patterns of SRAM-based TCAM (sTCAM) [3] during power interruptions or frequent-off operations. However, this 2-macro (sTCAM + NVM) scheme suffers from long delays and requires considerable energy for wake-up operations, due to the word-by-word serial transfer of data between NVM and TCAM macros. Most of the signature patterns are seldom updated (written); therefore, single-macro nonvolatile TCAM (nvTCAM) can be used for BD-processors to reduce area and facilitate fast/low-power wake-up operations, compared to the 2-macro approach. Previous nvTCAMs were designed using diode-connected 4T2R with STT-MTJ (D4T2R) [4], 2T2R with PCM [5], and 4T2R with ReRAM [2]; however, they suffer from the following issues: (1) large cell area (A) and high write energy (Ew) due to the use of two NVM (2R) devices; (2) limited word-length (WDL, k bits) caused by small current-ratio (I-ratio = IML-MIS/(k × IML-M)) between match-line (ML) mismatch current (IML-MIS) and ML leakage current of k matched cells (k × IML-M); (3) long search delays (TSD) and excessive search energy (Es) due to large ML parasitic load (CML) and small I-ratio. ReRAM is promising for nvTCAM due to its low Ew, high resistance-ratio (R-ratio), and multiple-level cell (MLC) capability. To overcome issues (1) to (3), this study develops an MLC-based 3T1R nvTCAM with bi-directional voltage-divider control (BVDC). A 2×64×64b 3T1R nvTCAM macro is fabricated using back-end-of-line (BEOL) ReRAM [6] and a 90nm CMOS process, achieving a 2.27× cell-size reduction compared with sTCAM in the same technology and a TSD of 0.96ns for WDL=64b.
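As a rough illustration of issue (2), the sketch below computes the ratio between the current of one mismatched cell and the combined leakage of k matched cells on the same match-line, and shows how that ratio shrinks as the word-length grows. The function name `i_ratio` and the current values are hypothetical placeholders chosen for illustration, not figures from the paper.

```python
def i_ratio(i_mismatch: float, i_match_leak: float, k: int) -> float:
    """Ratio of one mismatched cell's ML current to the leakage of k matched cells."""
    if k <= 0:
        raise ValueError("k must be positive")
    return i_mismatch / (k * i_match_leak)


# Placeholder currents in amperes (assumed values, not measurements).
I_MIS = 20e-6    # current drawn by a single mismatched cell
I_M = 0.05e-6    # leakage current of a single matched cell

# The ratio falls in proportion to the word-length k.
for k in (16, 32, 64, 128):
    print(f"WDL={k:3d}b  I-ratio={i_ratio(I_MIS, I_M, k):.2f}")
```

With a fixed per-cell leakage, doubling the word-length halves the ratio, which is why a small ratio limits how long a word can be sensed reliably.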

symposium on vlsi circuits | 2014

ReRAM-based 4T2R nonvolatile TCAM with 7x NVM-stress reduction, and 4x improvement in speed-wordlength-capacity for normally-off instant-on filter-based search engines used in big-data processing

Li-Yue Huang; Meng-Fan Chang; Ching-Hao Chuang; Chia-Chen Kuo; Chien-Fu Chen; Geng-Hau Yang; Hsiang-Jen Tsai; Tien-Fu Chen; Shyh-Shyuan Sheu; Keng-Li Su; Frederick T. Chen; Tzu-Kun Ku; Ming-Jinn Tsai; Ming-Jer Kao

This study proposes an RC-filtered stress-decoupled (RCSD) 4T2R nonvolatile TCAM (nvTCAM) to 1) suppress match-line (ML) leakage current from match cells (IML-M), 2) reduce ML parasitic load (CML), and 3) decouple NVM-stress from wordlength (WDL) and IML-MIS. RCSD reduces NVM-stress by 7+x, and achieves a 4+x improvement in speed-WDL-capacity-product. A 128×32b RCSD nvTCAM macro was fabricated using HfO ReRAM and a 180nm CMOS process. This paper presents the first ReRAM-based nvTCAM featuring the shortest (1.2ns) search delay (TSD) among nvTCAMs with WDL≥32b.

design automation conference | 2015

Energy-efficient non-volatile TCAM search engine design using priority-decision in memory technology for DPI

Hsiang-Jen Tsai; Keng-Hao Yang; Yin-Chi Peng; Chien-Chen Lin; Ya-Han Tsao; Meng-Fan Chang; Tien-Fu Chen

TCAM-based search engines are widely used in regular expression matching across multiple packets. However, the use of a priority encoder results in increased energy consumption for pattern updates and search operations. This work proposes a promising memory technology, called Priority-Decision in Memory (PDM), which eliminates the need for priority encoders and removes restrictions on ordering, meaning that patterns can be stored in an arbitrary order without sorting by length. Moreover, we present a Sequential Input-State Search (SIS) scheme to disable the mass of redundant search operations in state segments, based on an analysis of the distribution of hex signatures in a virus database. Experimental results demonstrate that PDM-based technology can improve the update energy consumption of nvTCAM search engines by 36%~67%, because most of the energy in these search engines is used for reordering. By adopting the SIS-based method to avoid unnecessary search operations in a TCAM array, the search energy of nvTCAM search engines is reduced by around 64%.

international solid-state circuits conference | 2016

7.4 A 256b-wordlength ReRAM-based TCAM with 1ns search-time and 14× improvement in wordlength-energyefficiency-density product using 2.5T1R cell

Chien-Chen Lin; Jui-Yu Hung; Wen-Zhang Lin; Chieh-Pu Lo; Yen-Ning Chiang; Hsiang-Jen Tsai; Geng-Hau Yang; Ya-Chin King; Chrong Jung Lin; Tien-Fu Chen; Meng-Fan Chang

Ternary content-addressable memory (TCAM) is used in search engines for network and big-data processing [1]-[6]. Nonvolatile TCAM (nvTCAM) was developed to reduce cell area (A), search energy (ES), and standby power beyond what can be achieved using SRAM-based TCAM (sTCAM) [1]-[2], particularly in applications with long idle times and frequent-search-few-write operations. nvTCAMs were previously designed using diode-4T2R (D4T2R) with STT-MTJ [3], 2T2R with phase-change memory [4], and 4T2R and 3T1R with ReRAM [5,6]. However, these NV devices suffer from the following issues: 1) High ES requirements due to cell-DC-current (IDC-CELL) as well as large match-line (ML) parasitic load (CML), particularly when word-length (WDL) is long; 2) Large A due to the use of two NVM (2R) devices [3]-[5] or in-cell control logic [6]; 3) Limited WDL caused by small ML current-ratio (IML-ratio ≅ IML-MIS/(N × IML-M)) between mismatch current (IML-MIS) and the leakage-current (IML-M) from cells on a ML, particularly when NVM resistance (R)-ratio (= RHRS/RLRS) between high-R (HRS, RHRS) and low-R (LRS, RLRS) states is small due to process variation; 4) Long search delays (TSD) due to large CML and small IML-ratio. This work proposes 1) a 2.5T1R cell to reduce A, CML, and ES as well as increase IML-ratio; and 2) a region-splitter (RS) sense amplifier (SA) to achieve robust sensing with a smaller ML-Voltage (VML) swing (VMLS) to reduce TSD and ES.

IEEE Transactions on Very Large Scale Integration Systems | 2017

Energy-Efficient TCAM Search Engine Design Using Priority-Decision in Memory Technology

Hsiang-Jen Tsai; Keng-Hao Yang; Yin-Chi Peng; Chien-Chen Lin; Ya-Han Tsao; Meng-Fan Chang; Tien-Fu Chen

Ternary content-addressable memory (TCAM)-based search engines generally need a priority encoder (PE) to select the highest-priority match entry and so resolve the multiple-match problem caused by the don’t care (X) feature of TCAM. In contemporary network security, TCAM-based search engines are widely used in regular expression matching across multiple packets to protect against attacks, such as by viruses and spam. However, the use of a PE results in increased energy consumption for pattern updates and search operations. Instead of using PEs to determine the match, our solution is a three-phase search operation that utilizes the length information of the matched patterns to decide the longest pattern match. This paper proposes a promising memory technology called priority-decision in memory (PDM), which eliminates the need for PEs and removes restrictions on ordering, implying that patterns can be stored in an arbitrary order without sorting by length. Moreover, we present a sequential input-state (SIS) scheme to disable the mass of redundant search operations in state segments on the basis of an analysis of the distribution of hex signatures in a virus database. Experimental results demonstrate that the PDM-based technology can improve the update energy consumption of nonvolatile TCAM (nvTCAM) search engines by 36%–67%, because most of the energy in these search engines is used for reordering. By adopting the SIS-based method to avoid unnecessary search operations in a TCAM array, the search energy of nvTCAM search engines is reduced by around 64%.
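The idea of choosing the longest match from length information, rather than from entry position, can be sketched in software as follows. This is only an analogy under stated assumptions, not the paper's three-phase in-memory operation: patterns are modelled as strings over `0`, `1`, and `X`, a pattern's "length" is assumed to be its count of non-`X` symbols, and the helper names (`ternary_match`, `specified_length`, `longest_match`) are hypothetical.

```python
from typing import List, Optional


def ternary_match(pattern: str, key: str) -> bool:
    """True if every non-'X' symbol of the pattern equals the key symbol."""
    return len(pattern) == len(key) and all(
        p == "X" or p == k for p, k in zip(pattern, key)
    )


def specified_length(pattern: str) -> int:
    """Number of non-'X' symbols; used here as the pattern length (assumption)."""
    return sum(1 for p in pattern if p != "X")


def longest_match(entries: List[str], key: str) -> Optional[int]:
    """Index of the matching entry with the most specified symbols, or None.

    The result depends on stored length, not on where the entry sits,
    so the table does not have to be kept sorted.
    """
    best: Optional[int] = None
    for idx, pattern in enumerate(entries):
        if not ternary_match(pattern, key):
            continue
        if best is None or specified_length(pattern) > specified_length(entries[best]):
            best = idx
    return best


# Entries are deliberately stored in no particular order.
table = ["10XX", "1011", "1XXX"]
print(longest_match(table, "1011"))
```

Because the selection uses length rather than position, adding a new pattern in this model is an append and never forces existing entries to be reordered.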

IEEE Journal of Solid-state Circuits | 2016

A ReRAM-Based 4T2R Nonvolatile TCAM Using RC-Filtered Stress-Decoupled Scheme for Frequent-OFF Instant-ON Search Engines Used in IoT and Big-Data Processing

Meng-Fan Chang; Li-Yue Huang; Wen-Zhang Lin; Yen-Ning Chiang; Chia-Chen Kuo; Ching-Hao Chuang; Keng-Hao Yang; Hsiang-Jen Tsai; Tien-Fu Chen; Shyh-Shyuan Sheu

This paper outlines the RC-filtered stress-decoupled (RCSD) 4T2R nonvolatile TCAM (nvTCAM) with the following benefits: 1) reduced NVM-stress; 2) reduced ML parasitic load; and 3) suppression of match-line (ML) leakage current from match cells. The RCSD-4T2R cell achieves a 6× reduction in NVM-stress, a 2× increase in maximum wordlength, and a 2× reduction in search delay. In this paper, we also outline two search schemes, referred to as dynamic source-line pulse controlled (DSL-PC) search and dataline-pulse controlled (DL-PC) search, which were developed specifically for the RCSD-4T2R nvTCAM. We fabricated a 128 × 32 b RCSD-4T2R nvTCAM macro with HfO ReRAM using a 180 nm CMOS process. Using the DSL-PC and DL-PC schemes, the measured search delay of the RCSD-4T2R nvTCAM macro was 1.2 ns under typical VDD.

IEEE Transactions on Very Large Scale Integration Systems | 2017

A Flexible Wildcard-Pattern Matching Accelerator via Simultaneous Discrete Finite Automata

Hsiang-Jen Tsai; Chien-Chih Chen; Yin-Chi Peng; Ya-Han Tsao; Yen-Ning Chiang; Wei-Cheng Zhao; Meng-Fan Chang; Tien-Fu Chen

Regular expression matching has become an indispensable element of Internet of Things network security. However, a traditional ternary content addressable memory (TCAM) search engine is unable to handle patterns with wildcards, as it precisely tracks only one active state with a single transition. This paper proposes a promising simultaneous pattern matching methodology for wildcard patterns that uses two separate engines to represent discrete finite automata. A key preprocessing step, which encodes each possible postfix pattern with a unique key, ensures that follow-up patterns can accurately traverse all possible matches with limited hardware resources. This approach is practical and scalable for achieving good performance and low space consumption in network security, and it is applicable to any regular expression, even those with multiwildcard patterns. The experimental results demonstrate that this scheme can efficiently and accurately recognize wildcard patterns by simultaneously tracking only two active states. By adopting SRAM TCAM in the proposed architecture, the energy consumption is reduced to around 39%, compared with the energy consumption using a computing system that contains a large memory lookup and comparison overhead.

IEEE Transactions on Circuits and Systems | 2017

eTag: Tag-Comparison in Memory to Achieve Direct Data Access based on eDRAM to Improve Energy Efficiency of DRAM Cache

Keng-Hao Yang; Hsiang-Jen Tsai; Chia-Yin Li; Paul Jendra; Meng-Fan Chang; Tien-Fu Chen

As 2.5D/3D die stacking technology emerges, stacked dynamic random access memory (DRAM) has been proposed as a cache due to its large capacity in order to bridge the latency gap between off-chip memory and SRAM caches. The main problems in utilizing a DRAM cache are the high tag storage overhead and the high lookup latency. To address these, we propose tags-in-eDRAM (embedded DRAM) due to its higher density and lower latency. This paper presents an eTag DRAM cache architecture that is composed of a novel tag-comparison-in-memory scheme to achieve direct data access. It eliminates access latency and comparison power by pushing tag-comparison into the sense amplifier. Furthermore, we propose a Merged Tag to enhance the eTag DRAM cache by comparing last-level cache tags and DRAM cache tags in parallel. Simulation results show that the eTag DRAM cache improves energy efficiency by 15.4% and 33.9% in 4-core and 8-core workloads, respectively. Additionally, the Merged Tag achieves 32.1% and 48.7% energy efficiency improvements in 4-core and 8-core workloads, respectively.

ACM Transactions on Design Automation of Electronic Systems | 2017

Leak Stopper: An Actively Revitalized Snoop Filter Architecture with Effective Generation Control

Yin-Chi Peng; Chien-Chih Chen; Hsiang-Jen Tsai; Keng-Hao Yang; Pei-Zhe Huang; Shih-Chieh Chang; Wen-Ben Jone; Tien-Fu Chen

To alleviate the high energy dissipation of unnecessary snooping accesses, snoop filters have been designed to reduce snoop lookups. These filters suffer from decreasing filtering efficiency, and thus usually rely on partial or whole filter resets triggered by detecting block evictions. Unfortunately, the reset conditions occur infrequently or unevenly (called passive filter deletion). This work proposes the concept of revitalized snoop filter (RSF) design, which can actively renew the destination filter by employing a generation wrapping-around scheme for various reference behaviors. We further utilize a sampling mechanism for RSF to trigger precise filter revitalizations in a timely manner, so that unnecessary RSF flushing can be minimized. The proposed RSF can be integrated into various existing inclusive snoop filters with only a minor change to their designs. We evaluate our proposed design and demonstrate that RSF eliminates 58.6% of snoop energy compared to JETTY on average while inducing only 6.5% of revitalization energy overhead. In addition, RSF eliminates 45.5% of snoop energy compared to stream registers on average and only induces 2.5% of revitalization energy overhead. Overall, these RSFs reduce the total L2 cache energy consumption by 52.1% (58.6% - 6.5%) as compared to JETTY and by 43% (45.5% - 2.5%) as compared to stream registers. Furthermore, RSF improves the overall performance by 1% to 1.4% on average compared to JETTY and stream registers for various benchmark suites.
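The net figures quoted in the abstract follow from subtracting the revitalization overhead from the snoop energy eliminated. The short check below reproduces that arithmetic; the function name `net_saving` is hypothetical, and the inputs are the percentages stated above.

```python
def net_saving(snoop_saving_pct: float, overhead_pct: float) -> float:
    """Net energy saving in percentage points: energy eliminated minus overhead."""
    return round(snoop_saving_pct - overhead_pct, 1)


print(net_saving(58.6, 6.5))  # RSF compared to JETTY
print(net_saving(45.5, 2.5))  # RSF compared to stream registers
```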

design automation conference | 2014