Wei-Yu Tsai | Researchain

Publication

Featured research published by Wei-Yu Tsai.

IEEE Transactions on Multi-Scale Computing Systems | 2016

Enabling New Computation Paradigms with HyperFET - An Emerging Device

Wei-Yu Tsai; Xueqing Li; Matthew Jerry; Baihua Xie; Nikhil Shukla; Huichu Liu; Nandhini Chandramoorthy; Matthew Cotter; Arijit Raychowdhury; Donald M. Chiarulli; Steven P. Levitan; Suman Datta; Jack Sampson; Nagarajan Ranganathan; Vijaykrishnan Narayanan

High power consumption has significantly increased the cooling cost in high-performance computation stations and limited the operation time in portable systems powered by batteries. Traditional power reduction mechanisms have limited traction in the post-Dennard Scaling landscape. Emerging research on new computation devices and associated architectures has shown three trends with the potential to greatly mitigate current power limitations. The first is to employ steep-slope transistors to enable fundamentally more efficient operation at reduced supply voltage in conventional Boolean logic, reducing dynamic power. The second is to employ brain-inspired computation paradigms, directly embodying computation mechanisms inspired by the brain, which have shown potential in extremely efficient, if approximate, processing with silicon-neuron networks. The third is “let physics do the computation”, which focuses on using the intrinsic operation mechanism of devices (such as coupled oscillators) to perform approximate computation, instead of building complex circuits to carry out the same function. This paper first describes these three trends, and then proposes the use of the hybrid-phase-transition-FET (Hyper-FET), a device that could be configured as a steep-slope transistor, a spiking neuron cell, or an oscillator, as the device of choice for carrying these three trends forward. We discuss how a single class of device can be configured for these multiple use cases, and provide in-depth examination and analysis for a case study of building coupled-oscillator systems using Hyper-FETs for image processing. Performance benchmarking highlights the potential of significantly higher energy efficiency than dedicated CMOS accelerators at the same technology node.

2014 22nd International Conference on Very Large Scale Integration (VLSI-SoC) | 2014

Low-power high-speed current mode logic using Tunnel-FETs

Wei-Yu Tsai; Huichu Liu; Xueqing Li; Vijaykrishnan Narayanan

Current mode logic (CML) circuits have been widely used in high-speed data transceivers. The lower voltage swing makes the switching speed of CML much higher than static logic can achieve, so CML circuits are worth adopting in high-speed applications despite their higher power consumption. In order to obtain better power efficiency (frequency/power) in CML, it is critical to reduce the power consumption while maintaining the high operating frequency. This paper proposes an alternative approach by building the CML circuits with tunneling field-effect transistors (Tunnel FETs or TFETs) to achieve a high-throughput, low-voltage interface circuit design. Thanks to its steep subthreshold slope (less than 60 mV/dec), a TFET achieves the same on/off current ratio over a much smaller input voltage swing than a MOSFET, which enables supply voltage scaling in CML circuits. For a design target data-rate (20 Gbps for multiplexer and 50 Gbps for buffer), our simulations show that the proposed TFET CML circuits are able to reduce the supply voltage from 0.6 V in conventional Si FinFET CML circuits to as low as 0.3 V while using the same constant tail current. As a result, a power consumption reduction of approximately 50% is achieved by the proposed TFET CML circuits, making the TFET CML approach a promising candidate for future low-power, high-performance applications.
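The roughly 50% figure follows from simple arithmetic if the static power of a CML stage is approximated as supply voltage times tail current. The sketch below is only that first-order approximation, not the simulation flow used in the paper; the function names and the 100 µA tail current are hypothetical values chosen for illustration.

```python
def cml_static_power(vdd_volts, tail_current_amps):
    """First-order static power of one CML stage: P = VDD * I_tail."""
    return vdd_volts * tail_current_amps


def power_reduction(vdd_old, vdd_new, tail_current_amps):
    """Fractional power saving when only the supply voltage changes."""
    p_old = cml_static_power(vdd_old, tail_current_amps)
    p_new = cml_static_power(vdd_new, tail_current_amps)
    return 1.0 - p_new / p_old


# Hypothetical tail current; the abstract only says it is held constant.
TAIL_CURRENT = 100e-6

# Supply voltages quoted in the abstract: 0.6 V (Si FinFET) vs 0.3 V (TFET).
reduction = power_reduction(0.6, 0.3, TAIL_CURRENT)
print(f"Estimated power reduction: {reduction:.0%}")
```

Because the tail current cancels in the ratio, the estimate depends only on the two supply voltages under this approximation.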

IEEE Transactions on Circuits and Systems | 2017

Advancing Nonvolatile Computing With Nonvolatile NCFET Latches and Flip-Flops

Xueqing Li; Sumitha George; Kaisheng Ma; Wei-Yu Tsai; Ahmedullah Aziz; Jack Sampson; Sumeet Kumar Gupta; Meng-Fan Chang; Yongpan Liu; Suman Datta; Vijaykrishnan Narayanan

Nonvolatile computing has been proven to be effective in dealing with power supply outages for on-chip check-pointing in emerging energy-harvesting Internet-of-Things applications. It also plays an important role in power-gating to cut off leakage power for higher energy efficiency. However, existing on-chip state backup solutions for D flip–flops (DFFs) suffer from significant energy and/or latency penalties, which limit the overall energy efficiency and computing progress. Moreover, these solutions rely on external control, which limits compatibility and increases system complexity. This paper proposes an approach to fundamentally advancing the nonvolatile computing paradigm with intrinsically nonvolatile, area-efficient latch and flip–flop designs using negative capacitance FETs. These designs consume fJ-level energy and ns-level intrinsic latency for a backup plus restore operation, e.g., 2.4 fJ in energy and 1.1 ns in time for one proposed nonvolatile DFF with a supply voltage of 0.80 V.

IEEE Computer Society Annual Symposium on VLSI | 2014

A Low-Voltage Low-Power LC Oscillator Using the Diode-Connected SymFET

Xueqing Li; Wei-Yu Tsai; Huichu Liu; Suman Datta; Vijaykrishnan Narayanan

In this paper, a low-voltage low-power LC-tank oscillator design using the symmetric graphene tunneling field-effect transistor (SymFET) diode is presented. The SymFET takes advantage of the resonant current tunneling through two graphene layers, with a large current peak exhibiting negative differential resistance (NDR) when the drain-to-source voltage aligns the Dirac point. A Verilog-A SymFET model is presented with noise performance for circuit design and evaluation. The NDR phenomenon of the diode-connected SymFET is further explored, and oscillator design considerations are discussed for performance optimization. Simulation results show that the proposed SymFET 3.05 GHz oscillator has a simulated phase noise of -117 dBc/Hz at 1.0 MHz offset, with a power consumption of only 0.23 mW from a 0.30 V supply.

International Symposium on Circuits and Systems | 2017

Path planning on the TrueNorth neurosynaptic system

Kate D. Fischl; Kaitlin L. Fair; Wei-Yu Tsai; Jack Sampson; Andreas G. Andreou

We report on the implementation of a path planning algorithm on the TrueNorth neurosynaptic system. Our implementation exploits processing in the temporal domain within the architectural constraints of the TrueNorth chip to deduce the optimal path. The optimal path is computed on the TrueNorth chip for grid maps with dimensions as large as 173 × 168 nodes, consuming 70.0 mW at an operating voltage of 0.8 V.
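The abstract does not spell out the algorithm. One common way to use time for path finding on a grid is wavefront expansion: each free node "fires" one tick after its earliest-firing neighbor, so a node's firing time equals its hop distance from the start. The sketch below is a plain software analogue under that assumption, not the TrueNorth implementation; the grid encoding ('#' for obstacles) and all function names are hypothetical.

```python
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connected grid


def wavefront_times(grid, start):
    """Return {(row, col): tick} giving when the wavefront reaches each node."""
    rows, cols = len(grid), len(grid[0])
    times = {start: 0}
    frontier = deque([start])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in times:
                times[(nr, nc)] = times[(r, c)] + 1  # fires one tick later
                frontier.append((nr, nc))
    return times


def shortest_path(grid, start, goal):
    """Trace back from goal to start along strictly decreasing firing times."""
    times = wavefront_times(grid, start)
    if goal not in times:
        return None  # the wavefront never reached the goal
    path = [goal]
    while path[-1] != start:
        r, c = path[-1]
        for dr, dc in NEIGHBORS:
            prev = (r + dr, c + dc)
            if times.get(prev) == times[(r, c)] - 1:
                path.append(prev)
                break
    return path[::-1]


GRID = ["..#.",
        "..#.",
        "...."]
print(shortest_path(GRID, (0, 0), (0, 3)))
```

Every node reached at tick t has at least one neighbor reached at tick t - 1, so the trace-back always terminates at the start.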

International Symposium on Neural Networks | 2016

LATTE: Low-power Audio Transform with TrueNorth Ecosystem

Wei-Yu Tsai; R Davis; Andrew S. Cassidy; Michael DeBole; Alexander Andreopoulos; Bryan L. Jackson; Myron Flickner; Dharmendra S. Modha; Jack Sampson; Vijaykrishnan Narayanan

With recent advances in silicon technology, previously intractable Deep Neural Network (DNN) solutions to complex visual, auditory, and other sensory perception problems are now practical for real-time, energy-constrained systems. One such advancement is IBM's TrueNorth neurosynaptic processor, containing 1 million neurons and 256 million synapses, consuming 65 mW of power, and capable of operating in real-time for a variety of applications. In this work, we explore how auditory features can be extracted on the TrueNorth processor using low numerical precision while maintaining algorithmic fidelity for DNN-based spoken digit recognition on isolated words from the TIDIGITS dataset. Further, we show that our Low-power Audio Transform with TrueNorth Ecosystem (LATTE) is capable of achieving a 24× reduction in energy for feature extraction over a baseline FPGA implementation using standard MFCC audio features, while only incurring a 3–6% accuracy penalty.

Design Automation Conference | 2017

Co-training of Feature Extraction and Classification using Partitioned Convolutional Neural Networks

Wei-Yu Tsai; Jinhang Choi; Tulika Parija; Priyanka Gomatam; Chita R. Das; Jack Sampson; Vijaykrishnan Narayanan

There are an increasing number of neuromorphic hardware platforms designed to efficiently support neural network inference tasks. However, many applications contain structured processing in addition to classification. Being able to map both neural network classification and structured computation onto the same platform is appealing from a system design perspective. In this paper, we perform a case study on mapping the feature extraction stage of pedestrian detection using Histogram of Oriented Gradients (HoG) onto a neuromorphic platform. We consider three implementations: one that approximates HoG using neuromorphic intrinsics, one that emulates HoG outputs using a trained network, and one that allows feature extraction to be absorbed into classification. The proposed feature extraction methods are implemented and evaluated on neuromorphic hardware (IBM Neurosynaptic System). Our study shows that both a designed approximation and a “parroted” emulation can achieve similar accuracy, and that the latter appears to better capitalize on limited training and resource budgets, compared to the absorbed approach, while also being more power efficient than the programmed approach by a factor of 6.5×–208×.

Symposium on VLSI Technology | 2016

Ultra low power coupled oscillator arrays for computer vision applications

Nikhil Shukla; Wei-Yu Tsai; Matthew Jerry; Michael Barth; Vijay Narayanan; Suman Datta

Coupled oscillators provide an efficient non-Boolean paradigm for solving a variety of computationally intensive problems in computer vision. This motivates the realization of large networks of low-power coupled oscillators. In this work, we experimentally demonstrate: (i) a relaxation oscillator based on the insulator-metal transition (IMT) in vanadium dioxide (VO2) with a record-low DC input (peak) power of ~23 μW; (ii) a network of coupled VO2 oscillators with a record number of elements (6 oscillators), which performs image processing functions in high-dimensional space, such as color detection, and morphological operations such as dilation and erosion. Calibrated simulations show a 10× reduction in power compared to a 32 nm CMOS accelerator at iso-throughput.

ACM Journal on Emerging Technologies in Computing Systems | 2017

An Accuracy Tunable Non-Boolean Co-Processor Using Coupled Nano-Oscillators

Neel Gala; Sarada Krithivasan; Wei-Yu Tsai; Xueqing Li; Vijaykrishnan Narayanan; V. Kamakoti

As Dennard scaling nears its end and further supply-voltage reduction becomes more challenging in conventional systems, building a system that performs large computations with minimal area and power overheads requires optimization along additional dimensions. A rigorous exploration of alternate computing techniques, which can mitigate the limitations of Complementary Metal-Oxide Semiconductor (CMOS) technology scaling and conventional Boolean systems, is imperative. Reflecting on these lines of thought, in this article we explore the potential of non-Boolean computing employing nano-oscillators for performing varied functions. We use a pair of coupled nano-oscillators as our basic computational model and propose an architecture for a non-Boolean coupled-oscillator-based co-processor capable of executing certain functions that are commonly used across a variety of approximate application domains. The proposed architecture includes an accuracy-tunable knob, which can be tuned by the programmer at runtime. The functionality of the proposed co-processor is verified using a soft coupled oscillator model based on Kuramoto oscillators. The article also demonstrates how real-world applications such as Vector Quantization, Digit Recognition, Structural Health Monitoring, and the like, can be deployed on the proposed model. The proposed co-processor architecture is generic in nature and can be implemented using any of the existing modern-day nano-oscillator technologies such as Resonant Body Transistors (RBTs), Spin-Torque Nano-Oscillators (STNOs), and Metal-Insulator Transition (MIT) devices. In this article, we perform a validation of the proposed architecture using Hyper-FET technology-based coupled oscillators, which provide improvements of up to 3.5× increase in clock speed and up to 10.75× and 14.12× reduction in area and power consumption, respectively, as compared to a conventional Boolean CMOS accelerator executing the same functions.
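To make the two-oscillator building block concrete, here is a minimal sketch of a pair of Kuramoto oscillators integrated with the forward Euler method. It illustrates the standard Kuramoto model only, not the article's soft model or co-processor; the function names, step size, and parameter values are assumptions chosen for illustration. For two oscillators with natural frequencies ω1 and ω2 and coupling K, the phase difference φ = θ1 − θ2 obeys dφ/dt = (ω1 − ω2) − 2K sin φ, so when |ω1 − ω2| ≤ 2K the pair locks at φ* = arcsin((ω1 − ω2) / (2K)).

```python
import math


def simulate_phase_difference(omega1, omega2, coupling, dt=1e-3, steps=20000):
    """Integrate two coupled Kuramoto oscillators; return theta1 - theta2."""
    theta1, theta2 = 0.0, 0.0
    for _ in range(steps):
        # Each oscillator is pulled toward the phase of the other.
        d_theta1 = omega1 + coupling * math.sin(theta2 - theta1)
        d_theta2 = omega2 + coupling * math.sin(theta1 - theta2)
        theta1 += dt * d_theta1
        theta2 += dt * d_theta2
    return theta1 - theta2


def locked_phase_difference(omega1, omega2, coupling):
    """Closed-form steady state; valid only when |omega1 - omega2| <= 2 * coupling."""
    return math.asin((omega1 - omega2) / (2.0 * coupling))


# Illustrative values (assumptions): frequency mismatch 0.5 rad/s, K = 1.
simulated = simulate_phase_difference(1.0, 0.5, 1.0)
predicted = locked_phase_difference(1.0, 0.5, 1.0)
print(f"simulated={simulated:.6f} predicted={predicted:.6f}")
```

In this model the locked phase difference grows monotonically with the frequency mismatch, which is one way a coupled pair could serve as an approximate difference measure between two inputs; how the article's architecture maps inputs onto oscillators is not stated here and is left open.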

Device Research Conference | 2016