Jose A. Tierno
IBM
Publication
Featured research published by Jose A. Tierno.
IEEE Journal of Solid-state Circuits | 2008
Jose A. Tierno; Alexander V. Rylyakov; Daniel J. Friedman
An all-static CMOS ADPLL fabricated in 65 nm digital CMOS SOI technology has a fully programmable proportional-integral-differential (PID) loop filter and features a third-order delta-sigma modulator. The DCO is a three-stage, static-inverter-based ring oscillator programmable in 768 frequency steps. The ADPLL lock range is 500 MHz to 8 GHz at 1.3 V and 25 °C, and 90 MHz to 1.2 GHz at 0.5 V and 100 °C. The IC dissipates 8 mW/GHz at 1.2 V and 1.6 mW/GHz at 0.5 V. The synthesized 4 GHz clock has a period jitter of 0.7 ps rms and a long-term jitter of 6 ps rms. The phase noise under nominal operating conditions is -112 dBc/Hz measured at a 10 MHz offset from a 4 GHz center frequency. The total circuit area is 200 µm × 150 µm.
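As a rough illustration of what a programmable PID loop filter computes, here is a minimal discrete-time sketch. It is a generic textbook form, not the circuit from the paper: the class name, the gain names `kp`, `ki`, `kd`, and the values in the usage note are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PIDLoopFilter:
    """Generic discrete-time PID filter; gains are illustrative, not from the paper."""
    kp: float  # proportional gain (assumed name)
    ki: float  # integral gain (assumed name)
    kd: float  # differential gain (assumed name)
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, phase_error: float) -> float:
        # Integral path: accumulate the phase error over time.
        self.integral += phase_error
        # Differential path: change in error since the previous sample.
        derivative = phase_error - self.prev_error
        self.prev_error = phase_error
        # The sum of the three paths would drive the oscillator control word.
        return (self.kp * phase_error
                + self.ki * self.integral
                + self.kd * derivative)
```

With hypothetical gains `PIDLoopFilter(kp=2.0, ki=0.5, kd=1.0)`, two consecutive calls to `step(1.0)` show the differential term contributing only on the first sample, when the error changes.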
custom integrated circuits conference | 2011
Jae-sun Seo; Bernard Brezzo; Yong Liu; Benjamin D. Parker; Steven K. Esser; Robert K. Montoye; Bipin Rajendran; Jose A. Tierno; Leland Chang; Dharmendra S. Modha; Daniel J. Friedman
Efforts to achieve the long-standing dream of realizing scalable learning algorithms for networks of spiking neurons in silicon have been hampered by (a) the limited scalability of analog neuron circuits; (b) the enormous area overhead of learning circuits, which grows with the number of synapses; and (c) the need to implement all inter-neuron communication via off-chip address-events. In this work, a new architecture is proposed to overcome these challenges by combining innovations in computation, memory, and communication, respectively, to leverage (a) robust digital neuron circuits; (b) novel transposable SRAM arrays that share learning circuits, which grow only with the number of neurons; and (c) crossbar fan-out for efficient on-chip inter-neuron communication. Through tight integration of memory (synapses) and computation (neurons), a highly configurable chip comprising 256 neurons and 64K binary synapses with on-chip learning based on spike-timing dependent plasticity is demonstrated in 45nm SOI-CMOS. Near-threshold, event-driven operation at 0.53V is demonstrated to maximize power efficiency for real-time pattern classification, recognition, and associative memory tasks. Future scalable systems built from the foundation provided by this work will open up possibilities for ubiquitous ultra-dense, ultra-low power brain-like cognitive computers.
IEEE Journal of Solid-state Circuits | 2003
Hui Wu; Jose A. Tierno; Petar Pepeljugoski; Jeremy D. Schaub; Sudhir Gowda; Jeffrey A. Kash; Ali Hajimiri
Intersymbol interference (ISI) caused by intermodal dispersion in multimode fibers is the major limiting factor in the achievable data rate or transmission distance in high-speed multimode fiber-optic links for local area network applications. Compared with optical-domain and other electrical-domain dispersion compensation methods, equalization with transversal filters based on distributed circuit techniques presents a cost-effective and low-power solution. The design of integrated distributed transversal equalizers is described in detail with focus on delay lines and gain stages. A seven-tap distributed transversal equalizer prototype has been implemented in a commercial 0.18-µm SiGe BiCMOS process for 10-Gb/s multimode fiber-optic links. The seven-tap equalizer reduces the ISI of a 10-Gb/s signal after 800 m of 50-µm multimode fiber from 5 to 1.38 dB, and improves the bit-error rate from about 10⁻⁵ to less than 10⁻¹².
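The idea of a transversal equalizer can be sketched in discrete time as a tapped delay line whose weighted outputs are summed. The paper's circuit is an analog distributed design, so this is only a behavioral model; the channel response and the seven tap weights below are hypothetical and chosen so the arithmetic is easy to follow.

```python
def transversal_equalize(samples, taps):
    """Tapped-delay-line (FIR) filter: y[n] = sum over k of taps[k] * x[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, weight in enumerate(taps):
            if n - k >= 0:  # samples before the start are treated as zero
                acc += weight * samples[n - k]
        out.append(acc)
    return out


# Hypothetical channel: one symbol leaks half its amplitude into the next slot.
received = [1.0, 0.5, 0.0, 0.0]

# Hypothetical 7-tap weights: truncated inverse of the channel 1 + 0.5 z^-1.
taps = [(-0.5) ** k for k in range(7)]

equalized = transversal_equalize(received, taps)
```

In this toy case the equalizer cancels the leaked half-amplitude sample, leaving a single clean pulse in `equalized`.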
international symposium on microarchitecture | 2011
Charles R. Lefurgy; Alan J. Drake; Michael Stephen Floyd; Malcolm S. Allen-Ware; Bishop Brock; Jose A. Tierno; John B. Carter
Microprocessor voltage levels include substantial margin to deal with process variation, system power supply variation, workload induced thermal and voltage variation, aging, random uncertainty, and test inaccuracy. This margin allows the microprocessor to operate correctly during worst-case conditions, but during typical conditions it is larger than necessary and wastes energy. We present a mechanism that reduces excess voltage margin by (1) introducing a critical path monitor (CPM) circuit that measures available timing margin in real-time, (2) coupling the CPM output to the clock generation circuit to adjust clock frequency within cycles in response to excess or inadequate timing margin, and (3) adjusting the processor voltage level periodically in firmware to achieve a specified average clock frequency target. We implemented this mechanism in a prototype IBM POWER7 server. During better-than-worst case conditions our guardband management mechanism reduces the average voltage setting 137–152 mV below nominal, resulting in average processor power reduction of 24% with no performance loss while running industry-standard benchmarks.
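The firmware step (3) can be pictured as a slow outer loop around the fast CPM-driven clock adjustment. The sketch below is an assumption-laden illustration, not the POWER7 firmware: the function name, the fixed step size, and the voltage limits are all hypothetical.

```python
def next_voltage_mv(voltage_mv, avg_freq_mhz, target_freq_mhz,
                    step_mv=5, v_min_mv=900, v_max_mv=1200):
    """One iteration of a hypothetical voltage loop; all defaults are made up."""
    if avg_freq_mhz > target_freq_mhz:
        # The clock averaged faster than required, so timing margin is in
        # excess: lower the voltage to save power.
        voltage_mv -= step_mv
    elif avg_freq_mhz < target_freq_mhz:
        # The clock averaged slower than required: raise the voltage.
        voltage_mv += step_mv
    # Keep the setting inside the (assumed) safe operating range.
    return max(v_min_mv, min(v_max_mv, voltage_mv))
```

Called periodically with the measured average frequency, such a loop would settle near the lowest voltage that still meets the frequency target under current conditions.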
international symposium on microarchitecture | 2011
Michael Stephen Floyd; Malcolm S. Allen-Ware; Karthick Rajamani; Bishop Brock; Charles R. Lefurgy; Alan J. Drake; Lorena Pesantez; Tilman Gloekler; Jose A. Tierno; Pradip Bose; Alper Buyuktosunoglu
Power7 implements several new adaptive power management techniques which, in concert with the EnergyScale firmware, let it proactively exploit variations in workload, environmental conditions, and overall system use to meet customer-directed power and performance goals. These innovative features include per-core frequency scaling with available autonomic frequency control, per-chip automated voltage slewing, power consumption estimation, and hardware instrumentation assist.
international solid-state circuits conference | 2012
Ankur Agrawal; John F. Bulzacchelli; Timothy O. Dickson; Yong Liu; Jose A. Tierno; Daniel J. Friedman
This paper presents the design of a 19-Gb/s serial link receiver with both 4-tap feed-forward equalizer (FFE) and 5-tap decision-feedback equalizer (DFE), thereby making the equalization system self-contained in the receiver. This design extends existing power-efficient DFEs based on current-integrating summers and adds FFE functionality to the DFE circuit infrastructure for an efficient implementation. Key techniques for implementing receive-side FFE are: the use of multiphase quarter-rate sample-and-hold circuits for generating multiple time-shifted input data signals, time-based analog multiplication for FFE coefficient weighting, and a merged FFE/DFE summer. The receiver test chip, implemented in a 45-nm silicon-on-insulator (SOI) CMOS technology, occupies 0.07 mm² and has a power efficiency of 6.2 mW/Gb/s at 19 Gb/s. Step-response characterization of the receiver demonstrates accurate FFE computation. The receiver equalizes a 35-in PCB trace at 17 Gb/s with a channel loss of 30 dB at 8.5 GHz and a 20-in PCB trace at 19 Gb/s with a channel loss of 25 dB at 9.5 GHz.
field programmable gate arrays | 2012
Sameh W. Asaad; Ralph Bellofatto; Bernard Brezzo; Chuck Haymes; Mohit Kapur; Benjamin D. Parker; Thomas Roewer; Proshanta Saha; Todd E. Takken; Jose A. Tierno
Software-based simulation tools are not keeping up with the demands of increasing chip and system design complexity. In this paper, we describe a cycle-accurate and cycle-reproducible large-scale FPGA platform that is designed from the ground up to accelerate logic verification of the Bluegene/Q compute node ASIC, a multi-processor SOC implemented in IBM's 45 nm SOI CMOS technology. This paper discusses the challenges of constructing such large-scale FPGA platforms, including design partitioning, clocking and synchronization, and debugging support, as well as our approach for addressing these challenges without sacrificing cycle accuracy and cycle reproducibility. The resulting full-chip simulation of the Bluegene/Q compute node ASIC runs at a simulated processor clock speed of 4 MHz, over 100,000 times faster than the logic-level software simulation of the same design. The vast increase in simulation speed provides a new capability in the design cycle that proved to be instrumental in logic verification as well as early software development and performance validation for Bluegene/Q.
international solid-state circuits conference | 2009
Alexander V. Rylyakov; Jose A. Tierno; Herschel A. Ainspan; Jean-Olivier Plouchart; John F. Bulzacchelli; Z. Toprak Deniz; Daniel J. Friedman
Wireline communication applications typically require a low-phase-noise wide-tuning-range PLL. While these requirements can be met using traditional charge-pump PLL architectures, a high-performance digital PLL (DPLL)-based solution offers potential advantages in area, testability, and flexibility. Nearly all high-performance DPLL architectures reported in the literature to date (see, e.g., [1–3]) incorporate a time-to-digital converter (TDC) that acts as the loop's PFD. Subject to its quantization limits, a high-resolution TDC generates output signals proportional to the phase error at its input, effectively linearizing the PFD response. It should be noted, however, that reported high-performance TDC-based DPLLs have generally been fractional-N, i.e., not integer-N, synthesizers. In a fractional-N loop, the phase difference between the feedback clock and the reference clock at the PFD input varies significantly, frequently jumping by as much as a full output clock period from one phase comparison to the next. At 10 GHz output, this results in a 100 ps phase shift, thus making a TDC with resolution on the order of 10 to 20 ps adequate to generate multiple quantization levels. In an integer-N case, by contrast, a PLL with 500 fs rms jitter at the output and a typical feedback divider value in the range of 16 to 40 would have feedback phase jitter of only 2 to 3.2 ps rms. In this low-noise situation, a TDC with less than 3.2 ps of resolution would act essentially like a bang-bang PFD (BB-PFD). Existing wireline communication PLLs are predominantly integer-N designs with strict system-level requirements on the rms jitter. A DPLL designer targeting these applications, therefore, would have to face the challenging and ever-increasing requirements on TDC resolution, or find a way of using a BB-PFD.
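The numbers in this abstract can be checked with a short calculation. The abstract does not say how the feedback jitter range is derived; the sketch below assumes that uncorrelated output jitter accumulates over a divide-by-N feedback period as the square root of N, which reproduces the quoted figures. The function names are illustrative.

```python
import math


def output_period_ps(freq_ghz):
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


def feedback_jitter_ps(output_jitter_ps, divider):
    # Assumption (not stated in the abstract): uncorrelated jitter adds in
    # quadrature over the N output cycles in one feedback period.
    return output_jitter_ps * math.sqrt(divider)


period = output_period_ps(10.0)          # full-period phase jump at 10 GHz
low = feedback_jitter_ps(0.5, 16)        # 500 fs rms, divider of 16
high = feedback_jitter_ps(0.5, 40)       # 500 fs rms, divider of 40
```

Under that assumption, `period` is 100 ps, while `low` and `high` come out at about 2 ps and 3.2 ps rms, well below the 10 to 20 ps TDC resolution that suffices in the fractional-N case.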
IEEE Micro | 2013
Charles R. Lefurgy; Alan J. Drake; Michael Stephen Floyd; Malcolm S. Allen-Ware; Bishop Brock; Jose A. Tierno; John B. Carter; Robert W. Berry
Microprocessor voltage levels traditionally include substantial margin to ensure reliable operation despite variations in manufacturing, workload, and environmental parameters. This margin allows the microprocessor to function correctly during worst-case conditions, but during typical operation it is larger than necessary and wastes energy. The authors present a mechanism that reduces excess voltage margin by introducing a critical-path monitor (CPM) circuit that measures available timing margin in real time; coupling the CPM output to the clock generation circuit to rapidly adjust clock frequency in response to excess or inadequate timing margin; and adjusting the processor voltage level periodically in firmware to achieve a specified average clock frequency target. They first demonstrated this mechanism in an IBM Power7 server and proved its effectiveness in the Power7+ product. Power consumption on the VDD rail was reduced by 11 percent for SPEC CPU2006 workloads with negligible performance loss yet increased protection against noise events.
international solid-state circuits conference | 2005
Scott K. Reynolds; Petar Pepeljugoski; Jeremy D. Schaub; Jose A. Tierno; D. Beisser
A 130 mA, 2.5 V, 7-tap analog FIR equalizer for 10 Gb/s fiber-optic links is implemented in a 0.12 µm CMOS process. The filter precedes the receiver CDR and recovers data signals distorted by multi-mode fiber dispersion over a 600 m link to a BER < 10⁻¹². Tap delays are implemented by a combination of passive and buffered LC transmission lines.