Publication


Featured research published by Joseph T. Kennedy.


IEEE Journal of Solid-State Circuits | 2008

A Scalable 5–15 Gbps, 14–75 mW Low-Power I/O Transceiver in 65 nm CMOS

Ganesh Balamurugan; Joseph T. Kennedy; Gaurab Banerjee; James E. Jaussi; Mozhgan Mansuri; Frank O'Mahony; Bryan K. Casper; Randy Mooney

We present a scalable low-power I/O transceiver in 65 nm CMOS, capable of 5-15 Gbps operation over single-board and backplane FR4 channels with power efficiencies between 2.8-6.5 mW/Gbps. Nonlinear power-performance tradeoff is achieved by the use of scalable transceiver circuit blocks and joint optimization of the supply voltage, bias currents and driver power with data rate. Low-power operation is enabled by passive equalization through inductive link termination, active continuous-time RX equalization, global TX/RX clock distribution with on-die transmission lines, and low-noise offset-calibrated receivers.


IEEE Transactions on Advanced Packaging | 2009

Modeling and Analysis of High-Speed I/O Links

Ganesh Balamurugan; Bryan K. Casper; James E. Jaussi; Mozhgan Mansuri; Frank O'Mahony; Joseph T. Kennedy

Improvements in signaling methods, circuits and process technology have allowed input/output (I/O) data rates to scale beyond 10 Gb/s over several legacy channels. In this regime, it is critical to accurately model and comprehend channel/circuit nonidealities in order to co-optimize the link architecture, circuits, and interconnect. Empirical and worst-case analysis methods used at lower rates are inadequate to account for several deterministic and random noise sources present in I/O links today. In this paper, we review models and methods for statistical signaling analysis of high-speed links, and also propose a new way to integrate behavioral modeling approaches with analytical methods. A computationally efficient segment-based analysis method is shown to accurately capture the effect of transmit jitter and its interaction with the channel. In addition, a new jitter interpretation approach is proposed to enable the analysis of arbitrary I/O clocking topologies. We also present some examples to illustrate the practical utility of these analysis methods in the realm of high-speed I/O design.


International Solid-State Circuits Conference | 2010

A 47 × 10 Gb/s 1.4 mW/Gb/s Parallel Interface in 45 nm CMOS

Frank O'Mahony; James E. Jaussi; Joseph T. Kennedy; Ganesh Balamurugan; Mozhgan Mansuri; Clark Roberts; Sudip Shekhar; Randy Mooney; Bryan K. Casper

A 47 × 10 Gb/s chip-to-chip interface consuming 660 mW is demonstrated in 45 nm CMOS. The circuitry and interconnect are co-designed to minimize power and area for a wide parallel interface. Power is reduced by amortizing clocking, minimizing the span of clock signals and pairing a low-swing transmitter driver with a sensitive receiver sampler. The active silicon area is compressed by 64% relative to the C4 bumps using on-chip transmission line routing. A dense, top-side package connector and bridge enable both high off-chip interconnect density and low overall power by reducing equalization and deskew requirements. The interface also demonstrates fast power management for the I/O circuits. The receiver power can be reduced by 93% during standby and an integrated wake-up timer indicates that all lanes return reliably to active mode in <5 ns. The interface operates at 470 Gb/s with an aggregate bit error ratio better than 2 × 10⁻¹⁸ while consuming 1.4 mW/Gb/s and occupies 3.2 mm² active silicon area.
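The headline figures in this abstract can be cross-checked with simple arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the inputs are the lane count, per-lane rate and total power quoted in the abstract.

```python
# Consistency check of the abstract's figures (illustrative helper names).

def aggregate_rate_gbps(lanes: int, lane_rate_gbps: float) -> float:
    """Aggregate bandwidth of a parallel link in Gb/s."""
    return lanes * lane_rate_gbps


def efficiency_mw_per_gbps(power_mw: float, rate_gbps: float) -> float:
    """Power efficiency in mW per Gb/s (numerically equal to pJ/bit)."""
    return power_mw / rate_gbps


# Values quoted in the abstract: 47 lanes, 10 Gb/s per lane, 660 mW total.
rate = aggregate_rate_gbps(47, 10.0)
eff = efficiency_mw_per_gbps(660.0, rate)

print(f"{rate:.0f} Gb/s aggregate, {eff:.2f} mW/Gb/s")
```

Under these inputs the aggregate rate comes out at 470 Gb/s and the efficiency at roughly 1.4 mW/Gb/s, which agrees with the numbers stated in the abstract.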


Custom Integrated Circuits Conference | 2007

Future Microprocessor Interfaces: Analysis, Design and Optimization

Bryan K. Casper; Ganesh Balamurugan; James E. Jaussi; Joseph T. Kennedy; Mozhgan Mansuri

High-aggregate bandwidth interfaces with minimized power, silicon area, cost and complexity will be essential to the viability of future microprocessor systems. Optimization of microprocessor interfaces at the system level is crucial to providing the most cost-effective and efficient solution. This paper details a comprehensive interconnect and system level analysis method that can be used to accurately evaluate platform-level tradeoffs and has been correlated to link measurements with 10% accuracy. System tradeoffs with respect to interconnect quality, equalization, modulation, and clock architecture are shown. Interconnect and circuit density improvements are identified as a promising research direction to maximize the bandwidth and power efficiency of future microprocessor platforms.


Custom Integrated Circuits Conference | 2009

Strong Injection Locking in Low-Q LC Oscillators: Modeling and Application in a Forwarded-Clock I/O Receiver

Sudip Shekhar; Ganesh Balamurugan; David J. Allstot; Mozhgan Mansuri; James E. Jaussi; Randy Mooney; Joseph T. Kennedy; Bryan K. Casper; Frank O'Mahony

A general model for injection-locked LC oscillators (LC-ILOs) is presented that is valid for any tank quality factor and injection strength. Important properties of an ILO such as lock-range, phase shift, bandwidth and response to input jitter are described. An LC-ILO together with a half-rate data sampler is implemented as a forwarded-clock I/O receiver in 45-nm CMOS. A strongly-injected low-Q LC oscillator enables clock deskew across 1UI and rejects high-frequency clock jitter. The complete 27 Gb/s ILO-based data receiver has an overall power efficiency of 1.6 mW/Gb/s.


International Solid-State Circuits Conference | 2008

A 27Gb/s Forwarded-Clock I/O Receiver Using an Injection-Locked LC-DCO in 45nm CMOS

Frank O'Mahony; Sudip Shekhar; Mozhgan Mansuri; Ganesh Balamurugan; James E. Jaussi; Joseph T. Kennedy; Bryan K. Casper; David J. Allstot; Randy Mooney

This paper describes a method for both filtering and deskewing a link clock using a differential injection-locked LC-DCO and demonstrates a forwarded-clock data receiver using this technique operating at 27 Gb/s.


International Symposium on VLSI Design, Automation and Test | 2009

The Future of Electrical I/O for Microprocessors

Frank O'Mahony; Ganesh Balamurugan; James E. Jaussi; Joseph T. Kennedy; Mozhgan Mansuri; Sudip Shekhar; Bryan K. Casper

High-speed CMOS microprocessor I/O has scaled aggressively over the past decade in terms of power and performance largely due to advances in equalization and clocking techniques. With future multi-core processors expected to require ≫1TB/s bandwidth and dramatically improved power efficiency, there has been some question as to whether electrical I/O will continue to satisfy chip-to-chip communication requirements over the next decade. In this paper, we show that electrical signaling has the power, performance, and density scaling potential to enable the next several generations of systems and applications. Circuit innovation is aggressively pushing link power efficiency toward 1–2mW/Gb/s while departures from legacy channels to include new topologies and materials can significantly improve the power/performance/density tradeoff. Statistical link-level design tools that allow designers to rapidly quantify high-level architecture tradeoffs will enable balanced link designs that co-optimize power, performance, and channel topology.


IEEE Journal of Solid-State Circuits | 2013

A Scalable 0.128–1 Tb/s, 0.8–2.6 pJ/bit, 64-Lane Parallel I/O in 32-nm CMOS

Mozhgan Mansuri; James E. Jaussi; Joseph T. Kennedy; Tzu-Chien Hsueh; Sudip Shekhar; Ganesh Balamurugan; Frank O'Mahony; Clark Roberts; Randy Mooney; Bryan K. Casper

A scalable 64-lane chip-to-chip I/O, with per-lane data rate of 2-16 Gb/s is demonstrated in 32-nm low-power CMOS technology. At maximum aggregate bandwidth of 1.024 Tb/s across 50-cm channel length, the link consumes 2.7 W from a 1.08-V supply, corresponding to 2.6 pJ/bit. As bandwidth demand decreases, scaling the per-lane data rate to 4 Gb/s and power supply to 0.65 V provides 1/4 of the maximum bandwidth while consuming 0.2 W. Across a 1-m channel, the link operates at a maximum per-lane data rate of 16 Gb/s; thus, providing up to 1.024 Tb/s of aggregate bandwidth with 3.2 pJ/bit power efficiency from a 1.15-V supply. A length-matched dense interconnect topology allows clocking to be shared across multiple lanes to reduce area and power. Reconfigurable current/voltage mode transmitter driver and CMOS clocking enable a highly scalable power-efficient link. Optional low-dropout regulators provide >22-dB supply noise rejection at the package resonance frequency of 200 MHz. System-level optimization of duty-cycle and quadrature error correctors across the clock hierarchy provides optimized clock phase placement and, thus, enhances link performance and power. A lane failover mechanism provides design robustness to mitigate channel or circuit defects. The active circuitry occupies 1.3 mm².
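The bandwidth and energy-per-bit figures quoted in this abstract follow from the lane count, per-lane rate and power. The sketch below reproduces that arithmetic; the helper name is hypothetical, and the low-rate energy figure is derived here from the stated 0.2 W and is not quoted by the abstract itself.

```python
# Energy-per-bit arithmetic for the 64-lane link (illustrative helper name).

LANES = 64  # lane count stated in the abstract


def energy_pj_per_bit(power_w: float, rate_gbps: float) -> float:
    """Energy per bit in pJ: (W) / (Gb/s) = nJ/bit, times 1e3 gives pJ/bit."""
    return power_w * 1e3 / rate_gbps


# Full rate: 16 Gb/s per lane, 2.7 W total (values from the abstract).
full_rate = LANES * 16.0            # aggregate rate in Gb/s
e_full = energy_pj_per_bit(2.7, full_rate)

# Scaled-down mode: 4 Gb/s per lane, 0.2 W total (values from the abstract).
low_rate = LANES * 4.0              # aggregate rate in Gb/s
e_low = energy_pj_per_bit(0.2, low_rate)

print(f"full: {full_rate / 1e3:.3f} Tb/s, {e_full:.2f} pJ/bit")
print(f"low:  {low_rate / full_rate:.2f} of max bandwidth, {e_low:.2f} pJ/bit")
```

With these inputs the full-rate case gives 1.024 Tb/s at about 2.6 pJ/bit, matching the abstract, and the 4 Gb/s case gives one quarter of the maximum bandwidth at roughly 0.78 pJ/bit.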


IEEE Journal of Solid-State Circuits | 2014

A 4–32 Gb/s Bidirectional Link With 3-Tap FFE/6-Tap DFE and Collaborative CDR in 22 nm CMOS

Tawfiq Musah; James E. Jaussi; Ganesh Balamurugan; Sami Hyvonen; Tzu-Chien Hsueh; Gokce Keskin; Sudip Shekhar; Joseph T. Kennedy; Shreyas Sen; Rajesh Inti; Mozhgan Mansuri; Michael W. Leddige; Bryce D. Horine; Clark Roberts; Randy Mooney; Bryan K. Casper

This paper details the design of an 8-lane bidirectional link for both within-the-box and external communications in 22 nm CMOS technology. A low-profile connector with a high-density cable assembly ensures a data rate of up to 32 Gb/s per lane while maintaining channel loss below 25 dB. Channel equalization is performed by a combination of a 3-tap feed-forward equalizer (FFE), single-stage continuous-time linear equalizer (CTLE) and a 6-tap decision-feedback equalizer (DFE). Collaborative timing recovery is used to enable lane characterization without degrading jitter performance. Phase error decimation, with a conditional phase detection scheme, is used to reduce the DFE complexity by 50%. Power consumption over a wide range of data rates from 4 to 32 Gb/s is reduced by using regulated CMOS clocking with lane bundling, low swing transmitter with a source-series terminated (SST) driver and a highly reconfigurable receiver with an active inductor CTLE. At a lane data rate of 32 Gb/s, over a 0.5 m cable with 16 dB of loss, a transceiver lane consumes 205 mW from a 1.07 V supply. The power scales down to 26 mW from a 0.72 V supply at 8 Gb/s, when transmitting over a channel with 8 dB loss. The active silicon area per lane is 0.079 mm².


International Solid-State Circuits Conference | 2013

A Scalable 0.128-to-1Tb/s 0.8-to-2.6pJ/b 64-Lane Parallel I/O in 32nm CMOS

Mozhgan Mansuri; James E. Jaussi; Joseph T. Kennedy; Tzu-Chien Hsueh; Sudip Shekhar; Ganesh Balamurugan; Frank O'Mahony; Clark Roberts; Randy Mooney; Bryan K. Casper

High-performance computing (HPC) systems demand aggressive scaling of memory and I/O to achieve multiple terabits/sec of bandwidth. Minimizing I/O cost, area and power are crucial to achieving a practically realizable system with such large bandwidth. To meet these needs, we developed a low-power dense 64-lane I/O system with per-port aggregate bandwidth up to 1Tb/s and 2.6pJ/bit power efficiency. We developed a high-density connector and cable, attached to the top side of the package that enables this high interconnect density. A lane-failover mechanism provides design robustness for fault-tolerance. To further optimize power efficiency, the lane data rate scales from 2 to 16Gb/s with non-linear power efficiency of 0.8 to 2.6pJ/bit, providing scalable aggregate bandwidth of 0.128 to 1Tb/s. Highly power scalable circuits such as CMOS clocking and reconfigurable current-mode (CM) or voltage-mode (VM) TX driver enable the 8× bandwidth and 3× power efficiency scalability with aggressive supply voltage scaling (0.6 to 1.08V).
