Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Steven K. Hsu is active.

Publication


Featured researches published by Steven K. Hsu.


design automation conference | 2012

Near-threshold voltage (NTV) design: opportunities and challenges

Himanshu Kaul; Mark A. Anders; Steven K. Hsu; Amit Agarwal; Ram K. Krishnamurthy; Shekhar Borkar

Moores Law will continue providing abundance of transistors for integration, only to be limited by the energy consumption. Near threshold voltage (NTV) operation has potential to improve energy efficiency by an order of magnitude. We discuss design techniques necessary for reliable operation over a wide range of supply voltage - from nominal down to subthreshold region. The system designed for NTV can dynamically select modes of operation, from high performance, to high energy efficiency, to the lowest power.


symposium on computer arithmetic | 2005

An improved unified scalable radix-2 Montgomery multiplier

David Money Harris; Ram K. Krishnamurthy; Mark A. Anders; Sanu K. Mathew; Steven K. Hsu

This paper describes an improved version of the Tenca-Koc unified scalable radix-2 Montgomery multiplier with half the latency for small and moderate precision operands and half the queue memory requirement. Like the Tenca-Koc multiplier, this design is reconfigurable to accept any input precision in either GF(p) or GF(2/sup n/) up to the size of the on-chip memory. An FPGA implementation can perform 1024-bit modular exponentiation in 16 ms using 5598 4-input lookup tables, making it the fastest unified scalable design yet reported.


international solid-state circuits conference | 2008

A 320mV 56μW 411GOPS/Watt Ultra-Low Voltage Motion Estimation Accelerator in 65nm CMOS

Himanshu Kaul; Mark A. Anders; Sanu K. Mathew; Steven K. Hsu; Amit Agarwal; Ram K. Krishnamurthy; Shekhar Borkar

Motion estimation for compressing inter-frame redundancies is the most performance and power-critical operation in video encoding applications, where a wide range of throughput and power constraints are required to handle a variety of video resolution, frame rate and application specifications. A motion estimation engine targeted for special-purpose on-die acceleration of sum of absolute difference (SAD) computation in real-time video encoding workloads on power-constrained mobile microprocessors is fabricated in 65nm CMOS.


IEEE Transactions on Very Large Scale Integration Systems | 2006

A process variation compensating technique with an on-die leakage current sensor for nanometer scale dynamic circuits

Chris H. Kim; Kaushik Roy; Steven K. Hsu; Ram K. Krishnamurthy; Shekhar Borkar

This paper describes a process compensating dynamic (PCD) circuit technique for maintaining the performance benefit of dynamic circuits and reducing the variation in delay and robustness. A variable strength keeper that is optimally programmed based on the die leakage, enables 10% faster performance, 35% reduction in delay variation, and 5times reduction in the number of robustness failing dies, compared to conventional designs. A new leakage current sensor design is also presented that can detect leakage variation and generate the keeper control signals for the PCD technique. Results based on measured leakage data show 1.9-10.2times higher signal-to-noise ratio (SNR) and reduced sensitivity to supply and p-n skew variations compared to prior leakage sensor designs


international solid-state circuits conference | 2014

16.2 A 0.19pJ/b PVT-variation-tolerant hybrid physically unclonable function circuit for 100% stable secure key generation in 22nm CMOS

Sanu K. Mathew; Sudhir K. Satpathy; Mark A. Anders; Himanshu Kaul; Steven K. Hsu; Amit Agarwal; Gregory K. Chen; Rachael J. Parker; Ram K. Krishnamurthy; Vivek De

Physically unclonable function (PUF) circuits are low-cost cryptographic primitives used for generation of unique, stable and secure keys or chip IDs for device authentication and data security in high-performance microprocessors [1][2][3][7]. The volatile nature of PUFs provides a high level of security and tamper resistance against invasive probing attacks compared to conventional fuse-based key storage technologies [4]. A process-voltage-temperature (PVT) variation-tolerant all-digital PUF array targeted for on-die generation of 100% stable, device-specific, high-entropy keys is fabricated in 22nm tri-gate high-κ metal-gate CMOS technology [5], featuring: i) a hybrid delay/cross-coupled PUF circuit where interaction of 16 minimum-sized, variation-impacted transistors determines resolution dynamics, ii) a temporal majority voting (TMV) circuit to stabilize occasionally unstable bits, resulting in 53% reduction in instability, iii) burn-in hardening to reinforce manufacturing-time PUF bias, resulting in 22% reduction in bit-errors, iv) soft dark bits for run-time identification and sequestration of highly unstable bits during field operation, resulting in 78% lower bit-errors, v) 19× separation between inter- and intra-PUF Hamming distance, enabling die-specific keys, vi) autocorrelation factor≈0 and entropy=0.9997, while passing NIST randomness tests, vii) high tolerance to voltage and temperature variation with 82% reduction in average Hamming-distance using a 100-cycle dark bit window, viii) in-situ PUF hardening by leveraging directed NBTI aging to improve stability during field operation, and ix) ultra-low energy consumption of 0.19pJ/b with compact bitcell layout of 4.66μm2 (Fig. 16.2.7a).


symposium on vlsi circuits | 2003

A process variation compensating technique for sub-90 nm dynamic circuits

Chris H. Kim; Kaushik Roy; Steven K. Hsu; Atila Alvandpour; Ram K. Krishnamurthy; Shekhar Borkar

A process variation compensating technique for dynamic circuits is described for sub-90 nm technologies where leakage variation is severe. A keeper whose effective strength is optimally programmable based on die leakage enables 10% faster performance, 35% reduction in delay variation, and 5x reduction in robustness failing dies over conventional static keeper design in 90 nm dual-V/sub t/ CMOS.


IEEE Journal of Solid-state Circuits | 2011

53 Gbps Native

Sanu K. Mathew; Farhana Sheikh; Michael E. Kounavis; Shay Gueron; Amit Agarwal; Steven K. Hsu; Himanshu Kaul; Mark A. Anders; Ram K. Krishnamurthy

Abstract-This paper describes an on-die, reconfigurable AES encrypt/decrypt hardware accelerator fabricated in 45 nm CMOS, targeted for content-protection in high-performance microprocessors. 100% round computation in native GF(24)2 composite-field arithmetic, unified reconfigurable datapath for encrypt/decrypt, optimized ground & composite-field polynomials, integrated affine/bypass multiplexer circuits, fused Mix/InvMixColumn circuits and a folded ShiftRow datapath enable peak 2.2 Tbps/Watt AES-128 energy efficiency with a dense 2-round layout occupying 0.052 mm2, while achieving: (i) 53/44/38 Gbps AES-128/192/256 performance, 125 mW, measured at 1.1 V, 50 °C, (ii) scalable AES-128 performance up to 66 Gbps, measured at 1.35 V, 50 °C, (iii) wide operating supply voltage range with robust subthreshold voltage performance of 800 Mbps, 409 μW, measured at 320 mV, 50 °C (iv) 37% Sbox delay reduction and 25% area reduction with a compact Sbox layout occupying 759 μm2 (v) 67% reduction in worst-case interconnect length and 33% reduction in ShiftRow wiring tracks and (vi) 43 % reduction in Mix/InvMixColumn area with no performance penalty.


symposium on vlsi circuits | 2004

{\rm GF}(2 ^{4}) ^{2}

Chris H. Kim; Kaushik Roy; Steven K. Hsu; Rain K. Krishnamurthy; Shekhar Borkar

This paper describes an on-die leakage current sensor in 1.2V, 90nm dual-V/sub t/ CMOS technology for accurately measuring process variation. Results based on measured leakage data show higher signal-to-noise ratio and reduced sensitivity to supply and P/N skew variations compared to prior designs.


international solid-state circuits conference | 2009

Composite-Field AES-Encrypt/Decrypt Accelerator for Content-Protection in 45 nm High-Performance Microprocessors

Himanshu Kaul; Mark A. Anders; Sanu K. Mathew; Steven K. Hsu; Amit Agarwal; Ram K. Krishnamurthy; Shekhar Borkar

This paper describes a reconfigurable 4-way SIMD engine fabricated in 45 nm high-k/metal-gate CMOS, targeted for on-die acceleration of vector processing in power-constrained mobile microprocessors. The SIMD accelerator is reconfigured to perform 4-way 16b × 16b multiplies, 32b × 32b multiply, 4-way 16b additions, 2-way 32b additions or 72b addition with single-cycle throughput and wide supply voltage range of operation (1.3 V-230 mV). A reconfigurable 2 × 2 tile of signed 2s complement 16b multipliers, with conditional carry gating in the 72b sparse tree adder, dual-supplies for voltage hopping, and fine-grained power-gating enables peak energy efficiency of 494GOPS/W (measured at 300 mV, 50°C) with a dense layout occupying 0.081 mm2 while achieving: (i) scalable performance up to 2.8 GHz, 278 mW measured at 1.3 V; (ii) fast single-cycle switching between any operating/idle mode; (iii) configuration-dependent power reduction of up to 41% in total power and 6.5× in active leakage power; (iv) 10× standby leakage reduction during idle mode; (v) deep subthreshold operation measured at 230 mV, 8.8 MHz, 87 ¿W; and (vi) compensation for up to 3× performance variation in ultra-low voltage mode.


IEEE Journal of Solid-state Circuits | 2012

On-die CMOS leakage current sensor for measuring process variation in sub-90nm generations

Sanu K. Mathew; Suresh Srinivasan; Mark A. Anders; Himanshu Kaul; Steven K. Hsu; Farhana Sheikh; Amit Agarwal; Sudhir K. Satpathy; Ram K. Krishnamurthy

This paper describes an all-digital PVT-variation tolerant true-random number generator (TRNG), fabricated in 45 nm high-k/metal-gate CMOS, targeted for on-die entropy generation in high-performance microprocessors. The TRNG harvests differential thermal-noise at the diffusion nodes of a pre-charged cross-coupled inverter pair to resolve out of metastability, generating one random bit/cycle. A self-calibrating 2-step tuning mechanism using coarse-grained configurable inverters and fine-grained programmable clock delay generators, along with an entropy-tracking feedback loop provide tolerance to 20% PVT variation-induced device mismatches, enabling lowest-reported energy-consumption of 2.9 pJ/bit with a dense layout occupying 4004 μm2, while achieving: (i) 2.4 Gbps random bit throughput, 7 mW total power consumption with 0.7 mW leakage power component, measured at 1.1 V, 50°C, (ii) random bitstreams that passes all NIST RNG tests with raw entropy/bit measured up to 0.9999999993, (iii) good distribution of 1s with 4-bit entropy of 3.97996 and high-entropy pattern probability of 0.066 (iv) wide operating supply voltage range with robust sub-threshold voltage performance of 14 Mbps, 5.6 μW, measured at 280 mV, 50°C, (v) 12 fine-grained high-entropy settings for the TRNG to dither in during steady-state operation, (vi) <;3% error while using an analytical ergodic Markov chain model for predicting pattern probabilities and (vii) 200x higher throughput and 9x higher energy-efficiency than previously reported implementations. Design modifications for robust operation in 22 nm high-volume manufacturing in the presence of 3σ process variations demonstrate scalability of the all-digital design to future technologies.

Collaboration


Dive into the Steven K. Hsu's collaboration.

Researchain Logo
Decentralizing Knowledge