Nasser A. Kurd
Intel
Publication
Featured research published by Nasser A. Kurd.
International Solid-State Circuits Conference | 2007
J. Tschanz; Nam Sung Kim; Saurabh Dighe; Jason Howard; Gregory Ruhl; S. Vanga; S. Narendra; Yatin Hoskote; Howard Wilson; C. Lam; M. Shuman; Dinesh Somasekhar; Stephen H. Tang; David Finan; Tanay Karnik; Nitin Borkar; Nasser A. Kurd; Vivek De
Temperature, voltage, and current sensors monitor the operation of a TCP/IP offload accelerator engine fabricated in 90nm CMOS, and a control unit dynamically changes frequency, voltage, and body bias for optimum performance and energy efficiency. Fast response to droops and temperature changes is enabled by a multi-PLL clocking unit and on-chip body bias. Adaptive techniques are also used to compensate for performance degradation due to device aging, reducing the aging guardband.
IEEE Journal of Solid-state Circuits | 2001
Nasser A. Kurd; J.S. Barkatullah; R.O. Dizon; Thomas D. Fletcher; P.D. Madland
Core and I/O clock design for the Pentium® 4 microprocessor is described. Two phase-locked loops generate core and I/O clocks supporting concurrent multiple frequencies. A clock distribution network with skew optimization and jitter reduction is designed to achieve low clock inaccuracies for processors at frequencies ≥2 GHz for the core and ≥4 GHz for the rapid execution engine. A global medium clock frequency is distributed. Local clock drivers generate pulsed or regular (nonpulsed) clocks at fast, medium, and slow frequencies. A 3.2-GB/s system bus is achieved using a dedicated I/O phase-locked loop with glitch protection and detection. Silicon speed path tools and clock debug features are designed to enable a short debug cycle.
Symposium on VLSI Circuits | 2008
Nasser A. Kurd; Jonathan P. Douglas; Praveen Mosalikanti; Rajesh Kumar
This paper describes the core and I/O clocking architecture of the next-generation Intel® micro-architecture (Nehalem) 45 nm IA processor. Among the highlights are: configurable clocking, fast-lock low-skew PLLs, high reference clock frequencies, analog supply tracking system, adaptive frequency clocking, low-jitter Intel® QuickPath interconnect and Intel® QuickPath memory controller clock generation, and jitter-attenuating DLLs.
IEEE Journal of Solid-state Circuits | 2009
Nasser A. Kurd; Praveen Mosalikanti; Mark Neidengard; Jonathan P. Douglas; Rajesh Kumar
This paper describes the core and I/O clocking architecture of the next-generation Intel® Core™ micro-architecture processor (Nehalem), designed on a 45 nm process technology. Local PLL placement provides modularity and power-efficient scalability by allowing independent frequency and voltage domains. Fast-locking, low-skew PLLs are used to achieve 56% lock time reduction and 30% long-term jitter improvement. Adaptive frequency, supply, and duty cycle mechanisms combine for up to 5% core frequency gain at iso-voltage. Jitter-attenuating DLLs with enhanced linearity and ±15% duty cycle correction drive a differential, low-swing I/O receiver clock distribution, reducing jitter by 25% and enabling 25.6 GB/s Intel® QuickPath Interconnect bandwidth and three-channel DDR3 traffic up to 32 GB/s.
International Solid-State Circuits Conference | 2014
Nasser A. Kurd; Muntaquim Chowdhury; Edward A. Burton; Thomas P. Thomas; Christopher P. Mozak; Brent R. Boswell; Manoj B. Lal; Anant Deval; Jonathan P. Douglas; Mahmoud Elassal; Ankireddy Nalamalpu; Timothy M. Wilson; Matthew C. Merten; Srinivas Chennupaty; Wilfred Gomes; Rajesh Kumar
The 4th Generation Intel® Core™ processor, codenamed Haswell, is a family of products implemented on Intel 22nm Tri-gate process technology [1]. The primary goals for the Haswell program are platform integration and low power to enable smaller form factors. Haswell incorporates several building blocks, including: platform controller hubs (PCHs), memory, CPU, graphics and media processing engines, thus creating a portfolio of product segments from fan-less Ultrabooks™ to high-performance desktop, as shown in Fig. 5.9.1. It also integrates a number of new technologies: a fully integrated voltage regulator (VR) consolidating 5 platform VRs down to 1, on-die eDRAM cache for improved graphics performance, lower-power states, optimized IO interfaces, an Intel AVX2 instruction set that supports floating-point multiply-add (FMA), and 256b SIMD integer achieving 2× the number of floating-point and integer operations over its predecessor. The 22nm process is optimized for Haswell and includes 11 metal layers (2 additional metal layers vs. Ivy Bridge [2]), high-density metal-insulator-metal (MIM) capacitors, and is tuned for different leakage/speed targets based on the market segment. For example, in some low-power products, the process is optimized to reduce leakage by 75% at Vmin, while paying only 12% intrinsic device degradation at the high-voltage corner.
International Solid-State Circuits Conference | 2010
Nasser A. Kurd; Subramani Bhamidipati; Christopher P. Mozak; Jeffrey L. Miller; Timothy M. Wilson; Mahadev Nemani; Muntaquim Chowdhury
The Westmere processor is implemented on a high-κ metal-gate 32nm process technology [1] as a compaction of the Nehalem processor family [2]. Figure 5.1.1 shows the 6-core dual-socket server processor and the 2-core single-socket processor for mainstream client. This paper focuses on innovations and circuit optimizations made to the 6-core processor. The 6-core design has 1.17B transistors including the 12MB shared L3 Cache and fits in approximately the same die area as its 45nm 4-core 8MB-L3-cache Nehalem counterpart. The core supports new instructions for accelerating encryption/decryption algorithms, speeds up performance under virtualized environments, and contains a host of other targeted performance features.
Symposium on VLSI Circuits | 2003
Charles E. Dike; Nasser A. Kurd; Priyadarsan Patra; Javed S. Barkatullah
Unintentional clock skews between clock domains represent an increasing and costly overhead in high-performance VLSI chips. We describe a novel yet easy-to-implement design that reduces skew between local clock domains dynamically or statically by sensing clock-delay differences and then tuning the clock of each domain relative to its neighbors. Lowering local clock skew is accomplished without compromising worst-case global skew.
IEEE Journal of Solid-state Circuits | 2015
Nasser A. Kurd; Muntaquim Chowdhury; Edward A. Burton; Thomas P. Thomas; Christopher P. Mozak; Brent R. Boswell; Praveen Mosalikanti; Mark Neidengard; Anant Deval; Ashish Khanna; Nasirul Chowdhury; Ravi Rajwar; Timothy M. Wilson; Rajesh Kumar
We describe the 4th Generation Intel® Core™ processor family (codenamed “Haswell”) implemented on Intel® 22 nm technology and intended to support form factors from desktops to fan-less Ultrabooks™. Performance enhancements include a 102 GB/sec L4 eDRAM cache, hardware support for transactional synchronization, and new FMA instructions that double FP operations per clock. Power improvements include Fully-Integrated Voltage Regulators (~50% battery life extension), new low-power states (95% standby power savings), optimized MCP I/O system (1.0-1.22 pJ/b), and improved DDR I/O circuits (40% active and 100x idle power savings). Other improvements include full-platform optimization via integrated display I/O interfaces.
IEEE Journal of Solid-state Circuits | 2011
Nasser A. Kurd; Subramani Bhamidipati; Christopher P. Mozak; Jeffrey L. Miller; Praveen Mosalikanti; Timothy M. Wilson; Ali M. El-Husseini; Mark Neidengard; Ramy E. Aly; Mahadev Nemani; Muntaquim Chowdhury; Rajesh Kumar
Westmere is the latest IA processor family for mobile, desktop and server market segments, implemented on Intel's second-generation high-k metal gate 32 nm process. Westmere not only increases core count, cache size, and frequency within the previous generation's power envelope, it also provides further improvements in power efficiency, feature set, and support for combo DDR3 and low voltage DDR3 despite using a thin gate technology.
International Conference on Computer Design | 2005
Muhammad M. Khellah; Maged Ghoneima; James W. Tschanz; Yibin Ye; Nasser A. Kurd; Javed Barkatullah; Srikanth Nimmagadda; Yehea I. Ismail; Vivek De
This paper proposes a bus architecture called skewed repeater bus (SRB) for reducing on-chip interconnect energy in microprocessors. By introducing relative delay between neighboring bus lines, SRB reduces both average and worst-case coupling capacitance between those lines. SRB is compared to previously published techniques like delayed data bus (DDB) and delayed clock bus (DCB). Simulation results in a 65-nm process show that a bus energy reduction of 18% is achieved when SRB is applied to a real microprocessor example, versus only 11% and 7% for DDB and DCB, respectively.