Publication


Featured research published by Tzu-Chien Hsueh.


IEEE Journal of Solid-State Circuits | 2013

A Scalable 0.128–1 Tb/s, 0.8–2.6 pJ/bit, 64-Lane Parallel I/O in 32-nm CMOS

Mozhgan Mansuri; James E. Jaussi; Joseph T. Kennedy; Tzu-Chien Hsueh; Sudip Shekhar; Ganesh Balamurugan; Frank O'Mahony; Clark Roberts; Randy Mooney; Bryan K. Casper

A scalable 64-lane chip-to-chip I/O with a per-lane data rate of 2-16 Gb/s is demonstrated in 32-nm low-power CMOS technology. At the maximum aggregate bandwidth of 1.024 Tb/s across a 50-cm channel, the link consumes 2.7 W from a 1.08-V supply, corresponding to 2.6 pJ/bit. As bandwidth demand decreases, scaling the per-lane data rate to 4 Gb/s and the power supply to 0.65 V provides 1/4 of the maximum bandwidth while consuming 0.2 W. Across a 1-m channel, the link operates at a maximum per-lane data rate of 16 Gb/s, thus providing up to 1.024 Tb/s of aggregate bandwidth with 3.2 pJ/bit power efficiency from a 1.15-V supply. A length-matched dense interconnect topology allows clocking to be shared across multiple lanes to reduce area and power. A reconfigurable current/voltage-mode transmitter driver and CMOS clocking enable a highly scalable, power-efficient link. Optional low-dropout regulators provide >22-dB supply noise rejection at the package resonance frequency of 200 MHz. System-level optimization of duty-cycle and quadrature error correctors across the clock hierarchy provides optimized clock phase placement and thus enhances link performance and power. A lane failover mechanism provides design robustness to mitigate channel or circuit defects. The active circuitry occupies 1.3 mm².
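The energy-efficiency figures quoted in this abstract follow from dividing link power by aggregate bandwidth. The short sketch below only re-derives them from the stated numbers (64 lanes, 2.7 W at 16 Gb/s per lane, 0.2 W at 4 Gb/s per lane); the function name is illustrative and not taken from the paper.

```python
def energy_per_bit_pj(power_w: float, lanes: int, lane_rate_gbps: float) -> float:
    """Return link energy efficiency in pJ/bit for a parallel link."""
    aggregate_bps = lanes * lane_rate_gbps * 1e9  # total bits per second
    return power_w / aggregate_bps * 1e12         # J/bit converted to pJ/bit


# Operating points quoted in the abstract for the 50-cm channel.
full_rate = energy_per_bit_pj(power_w=2.7, lanes=64, lane_rate_gbps=16.0)
quarter_rate = energy_per_bit_pj(power_w=0.2, lanes=64, lane_rate_gbps=4.0)

print(f"16 Gb/s per lane: {full_rate:.2f} pJ/bit")    # about 2.64, quoted as 2.6
print(f" 4 Gb/s per lane: {quarter_rate:.2f} pJ/bit")  # about 0.78
```

The quarter-bandwidth point works out to roughly 0.8 pJ/bit, which is consistent with the lower end of the 0.8-2.6 pJ/bit range in the paper title.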


IEEE Journal of Solid-State Circuits | 2014

A 4–32 Gb/s Bidirectional Link With 3-Tap FFE/6-Tap DFE and Collaborative CDR in 22 nm CMOS

Tawfiq Musah; James E. Jaussi; Ganesh Balamurugan; Sami Hyvonen; Tzu-Chien Hsueh; Gokce Keskin; Sudip Shekhar; Joseph T. Kennedy; Shreyas Sen; Rajesh Inti; Mozhgan Mansuri; Michael W. Leddige; Bryce D. Horine; Clark Roberts; Randy Mooney; Bryan K. Casper

This paper details the design of an 8-lane bidirectional link for both within-the-box and external communications in 22 nm CMOS technology. A low-profile connector with a high-density cable assembly ensures a data rate of up to 32 Gb/s per lane while maintaining channel loss below 25 dB. Channel equalization is performed by a combination of a 3-tap feed-forward equalizer (FFE), a single-stage continuous-time linear equalizer (CTLE) and a 6-tap decision-feedback equalizer (DFE). Collaborative timing recovery is used to enable lane characterization without degrading jitter performance. Phase error decimation, with a conditional phase detection scheme, is used to reduce the DFE complexity by 50%. Power consumption over a wide range of data rates from 4 to 32 Gb/s is reduced by using regulated CMOS clocking with lane bundling, a low-swing transmitter with a source-series terminated (SST) driver and a highly reconfigurable receiver with an active-inductor CTLE. At a lane data rate of 32 Gb/s, over a 0.5 m cable with 16 dB of loss, a transceiver lane consumes 205 mW from a 1.07 V supply. The power scales down to 26 mW from a 0.72 V supply at 8 Gb/s, when transmitting over a channel with 8 dB loss. The active silicon area per lane is 0.079 mm².
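The abstract gives per-lane power at two operating points but not the resulting energy per bit. As a rough sketch using only the quoted numbers (205 mW at 32 Gb/s and 26 mW at 8 Gb/s), the per-lane efficiency and the scaling ratios can be worked out as follows; the variable and function names are illustrative.

```python
def lane_energy_pj_per_bit(power_mw: float, rate_gbps: float) -> float:
    """mW divided by Gb/s is numerically equal to pJ/bit."""
    return power_mw / rate_gbps


high = lane_energy_pj_per_bit(power_mw=205.0, rate_gbps=32.0)
low = lane_energy_pj_per_bit(power_mw=26.0, rate_gbps=8.0)

# How much the rate and the power change between the two operating points.
rate_ratio = 32.0 / 8.0
power_ratio = 205.0 / 26.0

print(f"32 Gb/s: {high:.2f} pJ/bit, 8 Gb/s: {low:.2f} pJ/bit")
print(f"rate scales {rate_ratio:.1f}x, power scales {power_ratio:.1f}x")
```

Power falls faster than the data rate between the two points, so the energy per bit roughly halves at the lower rate. The two points also differ in supply voltage and channel loss, so this is not a like-for-like comparison.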


International Solid-State Circuits Conference | 2013

A scalable 0.128-to-1Tb/s 0.8-to-2.6pJ/b 64-lane parallel I/O in 32nm CMOS

Mozhgan Mansuri; James E. Jaussi; Joseph T. Kennedy; Tzu-Chien Hsueh; Sudip Shekhar; Ganesh Balamurugan; Frank O'Mahony; Clark Roberts; Randy Mooney; Bryan K. Casper

High-performance computing (HPC) systems demand aggressive scaling of memory and I/O to achieve multiple terabits/sec of bandwidth. Minimizing I/O cost, area and power is crucial to achieving a practically realizable system with such large bandwidth. To meet these needs, we developed a low-power dense 64-lane I/O system with per-port aggregate bandwidth up to 1Tb/s and 2.6pJ/bit power efficiency. We developed a high-density connector and cable, attached to the top side of the package, to enable this high interconnect density. A lane-failover mechanism provides design robustness for fault tolerance. To further optimize power efficiency, the lane data rate scales from 2 to 16Gb/s with non-linear power efficiency of 0.8 to 2.6pJ/bit, providing scalable aggregate bandwidth of 0.128 to 1Tb/s. Highly power-scalable circuits such as CMOS clocking and a reconfigurable current-mode (CM) or voltage-mode (VM) TX driver enable the 8× bandwidth and 3× power efficiency scalability with aggressive supply voltage scaling (0.6 to 1.08V).


International Solid-State Circuits Conference | 2014

26.2 A 205mW 32Gb/s 3-Tap FFE/6-tap DFE bidirectional serial link in 22nm CMOS

James E. Jaussi; Ganesh Balamurugan; Sami Hyvonen; Tzu-Chien Hsueh; Tawfiq Musah; Gokce Keskin; Sudip Shekhar; Joseph T. Kennedy; Shreyas Sen; Rajesh Inti; Mozhgan Mansuri; Michael W. Leddige; Bryce D. Horine; Clark Roberts; Randy Mooney; Bryan K. Casper

Peripheral I/O data-rates for PCs and mobile computing platforms continue to scale to meet high-bandwidth applications including high-resolution displays and large-capacity external storage. The bandwidth requirements will soon exceed the data-rates of current standards such as PCI Express and USB. A low-power low-cost serial link is needed for the next-generation peripheral interface that can scale to 32Gb/s per lane. Recent publications have demonstrated 28 to 32Gb/s rates [1-2]. However, the circuit power and channel characteristics are not suitable for mainstream PC and mobile markets. A low-profile connector and cable assembly prototype is developed for these markets, where the link architecture and design are optimized for the channel characteristics. This paper describes a data-rate-scalable 32Gb/s serial link that features a bidirectional transceiver, source-series terminated (SST) 3-tap FFE, a continuous-time linear equalizer (CTLE) with an active inductor, a 6-tap DFE, and clock calibration and adaptation circuitry.


International Solid-State Circuits Conference | 2014

26.4 A 25.6Gb/s differential and DDR4/GDDR5 dual-mode transmitter with digital clock calibration in 22nm CMOS

Tzu-Chien Hsueh; Ganesh Balamurugan; James E. Jaussi; Sami Hyvonen; Joseph T. Kennedy; Gokce Keskin; Tawfiq Musah; Sudip Shekhar; Rajesh Inti; Shreyas Sen; Mozhgan Mansuri; Clark Roberts; Bryan K. Casper

A wide range of memory configurations exists in today's high-speed digital systems to meet platform-specific bandwidth, power, capacity, and cost constraints. In the near term, DDR4 and GDDR5 are expected to meet the needs of server, client, graphics and mobile platforms [1]. Differential signaling with high-speed serial I/O enhancements will potentially continue I/O performance scaling for post-DDR4 and future buffered memory solutions. A unified memory interface that can meet the signaling requirements of all these memory standards offers several benefits: reduced cost and design time, greater platform design flexibility, and a smoother transition from DDR4/GDDR5 to a high-speed differential memory interface [2]. This paper presents a dual-mode TX that supports single-ended (SE) 1.2V-DDR4/1.5V-GDDR5 (hereafter referred to as DDR-mode) as well as high-speed differential signaling (hereafter referred to as HSD-mode), which is implemented using only thin-gate-oxide devices in 22nm CMOS. Other key design features include: (a) a DDR4/GDDR5 driver implemented using only active devices (no linearizing resistors), (b) enhanced voltage-mode driver supply regulation, (c) reconfigurable logic to support pre-emphasis in both TX modes, and (d) low-overhead digital clock-calibration techniques based on asynchronous digital sampling (ADS) to improve calibration coverage and accuracy.


Symposium on VLSI Circuits | 2015

A 0.5-to-0.75V, 3-to-8 Gbps/lane, 385-to-790 fJ/b, bi-directional, quad-lane forwarded-clock transceiver in 22nm CMOS

Rajesh Inti; Sudip Shekhar; Ganesh Balamurugan; James E. Jaussi; Clark Roberts; Tzu-Chien Hsueh; Bryan K. Casper

A highly digital, low-power, forwarded-clock transceiver is presented. It employs a source-shunt-terminated transmit driver and clock deskew based on an all-digital delay-line I/Q generator, suitable for fast-wakeup, low-voltage operation. A quad-lane test chip fabricated in a 22nm CMOS process operates at 3 to 8 Gbps over an FR4 channel with 12dB loss and achieves BER < 10⁻¹² while consuming 385 to 790fJ/b.


IEEE Journal of Solid-State Circuits | 2015

An On-Die All-Digital Power Supply Noise Analyzer With Enhanced Spectrum Measurements

Tzu-Chien Hsueh; Frank O'Mahony; Mozhgan Mansuri; Bryan K. Casper

This paper presents a scalable all-digital power supply noise analyzer with 20GHz sampling bandwidth and 1mV resolution implemented in 32nm CMOS. This averaging-based analyzer measures power supply noise in both the equivalent-time and frequency domain with low-resolution VCO-based samplers. For frequency-domain measurements, it uses digital random phase-noise accumulation to remove correlation between the power supply noise and sampling clocks. In addition, the equivalent-time current step response is measured on-die to characterize the frequency-domain impedance of the power delivery network.


European Solid-State Circuits Conference | 2014

An on-die all-digital power supply noise analyzer with enhanced spectrum measurements

Tzu-Chien Hsueh; Frank O'Mahony; Mozhgan Mansuri; Bryan K. Casper

A scalable all-digital power supply noise analyzer with 20 GHz sampling bandwidth and 1 mV resolution is demonstrated in 32 nm CMOS technology for enabling low-cost low-power in-situ power supply noise measurements without dedicated clean supplies and clock sources. This subsampled averaging-based analyzer measures power supply noise in both the equivalent-time and frequency domains with low-resolution VCO-based ADCs. For equivalent-time measurements, the accurate impedance characterization of power delivery networks is simply done by measuring a clock-synchronized current-step response. For frequency-domain measurements, the digital random phase-noise accumulation technique is analyzed and verified to overcome the clock-and-noise correlation issue in autocorrelation measurements. In general large scale integrated circuits and systems, the entire power supply noise analyzer consumes negligible active and leakage powers because of the MHz-range sampling clock frequency and fully digital implementation with only hundreds of logic gates.


Symposium on VLSI Circuits | 2015

A 1.2–5Gb/s 1.4–2pJ/b serial link in 22nm CMOS with a direct data-sequencing blind oversampling CDR

Sudip Shekhar; Rajesh Inti; James E. Jaussi; Tzu-Chien Hsueh; Bryan K. Casper

A scalable-rate serial link, comprising a bidirectional transmitter (TX)/receiver (RX) and two all-digital PLLs (ADPLLs), operates at 1.2-5Gb/s from a 0.55-0.7V DC supply with 1.4-2pJ/b total energy efficiency, respectively. Power efficiency is improved by avoiding analog circuitry and by using a low-swing voltage-mode transmitter and a direct data-sequencing blind oversampling (DDS-BOS) clock and data recovery (CDR). Using DDS in feed-forward BOS-CDR obviates area- and power-consuming FIFOs, improves jitter tolerance (JTOL), and permits up to 7500ppm frequency tolerance (FTOL) between the TX and RX clocks, making the link attractive for fast-locking continuous/burst operation.
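To give a feel for the numbers in this abstract, the sketch below converts the quoted rate and energy figures into total link power and expresses the 7500ppm frequency tolerance as a drift rate. Two assumptions are made here that the abstract does not spell out: that "respectively" pairs 1.2Gb/s with 1.4pJ/b and 5Gb/s with 2pJ/b, and that an uncorrected frequency offset of p ppm accumulates one unit interval (UI) of phase drift every 10^6/p bits. The function names are illustrative.

```python
def link_power_mw(rate_gbps: float, energy_pj_per_bit: float) -> float:
    """Gb/s times pJ/bit is numerically equal to mW."""
    return rate_gbps * energy_pj_per_bit


def bits_per_ui_drift(offset_ppm: float) -> float:
    """Bits received before a TX/RX frequency offset drifts by one full UI."""
    return 1e6 / offset_ppm


# Assumed pairing of the quoted ranges (lowest rate with lowest energy per bit).
low_power = link_power_mw(rate_gbps=1.2, energy_pj_per_bit=1.4)
high_power = link_power_mw(rate_gbps=5.0, energy_pj_per_bit=2.0)
drift_bits = bits_per_ui_drift(offset_ppm=7500.0)

print(f"link power: {low_power:.2f} mW to {high_power:.2f} mW")
print(f"one UI of drift roughly every {drift_bits:.0f} bits at 7500 ppm")
```

Under these assumptions, the sampling phase slips by a full bit period about every 133 bits at the maximum tolerated offset, which illustrates why the recovery scheme has to add or drop bits frequently when the clocks are this far apart.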


Archive | 2016

M-ary pulse amplitude modulation digital equalizer

Shiva Kiran; Tzu-Chien Hsueh; James E. Jaussi
