Publication


Featured research published by Christopher P. Mozak.


International Solid-State Circuits Conference | 2014

5.9 Haswell: A family of IA 22nm processors

Nasser A. Kurd; Muntaquim Chowdhury; Edward A. Burton; Thomas P. Thomas; Christopher P. Mozak; Brent R. Boswell; Manoj B. Lal; Anant Deval; Jonathan P. Douglas; Mahmoud Elassal; Ankireddy Nalamalpu; Timothy M. Wilson; Matthew C. Merten; Srinivas Chennupaty; Wilfred Gomes; Rajesh Kumar

The 4th Generation Intel® Core™ processor, codenamed Haswell, is a family of products implemented on Intel 22nm Tri-gate process technology [1]. The primary goals for the Haswell program are platform integration and low power to enable smaller form factors. Haswell incorporates several building blocks, including: platform controller hubs (PCHs), memory, CPU, graphics and media processing engines, thus creating a portfolio of product segments from fan-less Ultrabooks™ to high-performance desktop, as shown in Fig. 5.9.1. It also integrates a number of new technologies: a fully integrated voltage regulator (VR) consolidating 5 platform VRs down to 1, on-die eDRAM cache for improved graphics performance, lower-power states, optimized IO interfaces, an Intel AVX2 instruction set that supports floating-point multiply-add (FMA), and 256b SIMD integer achieving 2× the number of floating-point and integer operations over its predecessor. The 22nm process is optimized for Haswell and includes 11 metal layers (2 additional metal layers vs. Ivy Bridge [2]), high-density metal-insulator-metal (MIM) capacitors, and is tuned for different leakage/speed targets based on the market segment. For example, in some low-power products, the process is optimized to reduce leakage by 75% at Vmin, while paying only 12% intrinsic device degradation at the high-voltage corner.
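The 2× floating-point claim can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the 256-bit vector width comes from the abstract, while the execution-unit counts (one add unit plus one multiply unit per core in the predecessor, two FMA units per core in Haswell) are assumptions not stated in it.

```python
# Peak single-precision FLOPs per core per cycle: a rough sketch.
VECTOR_BITS = 256          # from the abstract (256b SIMD)
SP_BITS = 32               # width of a single-precision float
LANES = VECTOR_BITS // SP_BITS  # 8 single-precision lanes per vector


def flops_per_cycle(units: int, ops_per_lane: int) -> int:
    """Peak FLOPs per cycle for `units` vector units, each doing
    `ops_per_lane` floating-point operations on every lane."""
    return units * LANES * ops_per_lane


# Assumption: predecessor issues one vector add and one vector multiply per cycle.
predecessor = flops_per_cycle(units=2, ops_per_lane=1)

# Assumption: Haswell issues two FMAs per cycle; each FMA counts as
# a multiply plus an add, i.e. 2 operations per lane.
haswell = flops_per_cycle(units=2, ops_per_lane=2)

ratio = haswell / predecessor
print(predecessor, haswell, ratio)
```

Under these assumptions the ratio is exactly 2, matching the abstract's figure; different unit counts would change the absolute numbers but not the point that a fused multiply-add doubles the work per issued instruction.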


International Solid-State Circuits Conference | 2010

Westmere: A family of 32nm IA processors

Nasser A. Kurd; Subramani Bhamidipati; Christopher P. Mozak; Jeffrey L. Miller; Timothy M. Wilson; Mahadev Nemani; Muntaquim Chowdhury

The Westmere processor is implemented on a high-k metal-gate 32nm process technology [1] as a compaction of the Nehalem processor family [2]. Figure 5.1.1 shows the 6-core dual-socket server processor and the 2-core single-socket processor for mainstream client. This paper focuses on innovations and circuit optimizations made to the 6-core processor. The 6-core design has 1.17B transistors including the 12MB shared L3 Cache and fits in approximately the same die area as its 45nm 4-core 8MB-L3-cache Nehalem counterpart. The core supports new instructions for accelerating encryption/decryption algorithms, speeds up performance under virtualized environments, and contains a host of other targeted performance features.


IEEE Journal of Solid-state Circuits | 2015

Haswell: A Family of IA 22 nm Processors

Nasser A. Kurd; Muntaquim Chowdhury; Edward A. Burton; Thomas P. Thomas; Christopher P. Mozak; Brent R. Boswell; Praveen Mosalikanti; Mark Neidengard; Anant Deval; Ashish Khanna; Nasirul Chowdhury; Ravi Rajwar; Timothy M. Wilson; Rajesh Kumar

We describe the 4th Generation Intel® Core™ processor family (codenamed “Haswell”) implemented on Intel® 22 nm technology and intended to support form factors from desktops to fan-less Ultrabooks™. Performance enhancements include a 102 GB/sec L4 eDRAM cache, hardware support for transactional synchronization, and new FMA instructions that double FP operations per clock. Power improvements include Fully-Integrated Voltage Regulators (~50% battery life extension), new low-power states (95% standby power savings), optimized MCP I/O system (1.0-1.22 pJ/b), and improved DDR I/O circuits (40% active and 100x idle power savings). Other improvements include full-platform optimization via integrated display I/O interfaces.


IEEE Journal of Solid-state Circuits | 2011

A Family of 32 nm IA Processors

Nasser A. Kurd; Subramani Bhamidipati; Christopher P. Mozak; Jeffrey L. Miller; Praveen Mosalikanti; Timothy M. Wilson; Ali M. El-Husseini; Mark Neidengard; Ramy E. Aly; Mahadev Nemani; Muntaquim Chowdhury; Rajesh Kumar

Westmere is the latest IA processor family for mobile, desktop and server market segments, implemented on Intel's second-generation high-k metal gate 32 nm process. Westmere not only increases core count, cache size, and frequency within the previous generation's power envelope, but also provides further improvements in power efficiency, feature set, and support for combo DDR3 and low voltage DDR3 despite using a thin gate technology.


Symposium on VLSI Circuits | 2015

Broadwell: A family of IA 14nm processors

Ankireddy Nalamalpu; Nasser A. Kurd; Anant Deval; Christopher P. Mozak; Jonathan P. Douglas; Ashish Khanna; Fabrice Paillet; Gerhard Schrom; Boyd S. Phelps

The Intel Core™ M and 5th generation Core™ processors (code named Broadwell) are fabricated on an optimized 14 nm process technology node, resulting in a 49% reduction in feature-neutral die area. A new optimized 14 nm process flavor was created for Core™ M to improve energy efficiency for mobile devices. Techniques and optimizations were implemented to deliver a 2.5x TDP reduction coupled with up to 60% higher graphics performance. New process technology combined with various design techniques reduced the minimum voltage of operation by 50 mV. Broadwell introduces the second generation of the Fully Integrated Voltage Regulator, with better droop control and a parallel boot LVR, along with other power-reduction features, resulting in a 35% reduction in active and standby power over the first generation. 3DL inductor technology, introduced for the first time in Broadwell, enables a 30% reduction in package thickness and improved low-load efficiency. IO re-partitioning of the SOC and a major redesign of the DDR system resulted in a 30% reduction in I/O power. Shutting down various parts of the SOC die in various idle states (C* states) resulted in a 60% reduction in idle power. New software-controlled co-optimization methods, such as duty-cycle control and dynamic display support, were implemented to improve the energy efficiency of the graphics and display subsystems.


Custom Integrated Circuits Conference | 2015

Low power analog circuit techniques in the 5th generation Intel Core™ microprocessor (Broadwell)

Praveen Mosalikanti; Nasser A. Kurd; Christopher P. Mozak; Takao Oshita

Fabricated on a 14nm process technology node, the Intel Core™ M and the 5th generation Core™ processors (code named Broadwell) improve energy efficiency over the previous 22nm generation by up to 2.5x. Numerous optimizations were used in the analog circuits to achieve this power reduction. PLLs were designed to have low analog Vmin to enable operation without the use of a dedicated voltage rail. This enabled system level power optimization that yielded 28% lower power on that rail. Zero Distribution Latency to Full Distribution Latency (ZDL-to-FDL) mode was introduced in the PLLs, reducing clock distribution power and achieving ~150mV reduction in the clock distribution supply's Vmin. DDR power was reduced by 3x through the use of VTT termination, instead of the traditional Center Tapped Termination (CTT). A new package C-state (C7+) was introduced to reduce integrated voltage regulator losses under low load conditions. Duty cycling of the thermal sensor reduced average power 10x relative to the prior generation, while a fast wakeup technique reduced convergence time to ~10 µs.


International Symposium on VLSI Design, Automation and Test | 2012

Intel® Core™ i5/i7 QuickPath Interconnect receiver clocking circuits and training algorithm

Nasirul Chowdhury; Jeff Wight; Christopher P. Mozak; Nasser A. Kurd

This paper describes the forwarded clock amplifier (FCA), phase interpolator (PI) and training algorithm used in receiver clocking of the QuickPath Interconnect™ (QPI) in Intel® Core™ microprocessors, implemented in 45nm and 32nm process technologies. QPI is used for communication among processors/chipsets and delivers up to 25.6 GB/s of bandwidth per port at 6.4 GT/s. The FCA has a built-in duty cycle corrector (DCC). Two PIs were used for each receiver lane to generate clocks to capture odd and even data independently. The novel training and retraining algorithm trains each PI for its corresponding data eye, eliminating the need for any duty cycle correction of the PI output while maximizing the eye margin.
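The per-port bandwidth figure follows from the transfer rate by simple arithmetic. The sketch below reproduces it under two assumptions that the abstract does not state: each transfer carries 16 data bits (2 bytes) per direction, and the quoted 25.6 GB/s counts both directions of the link together.

```python
# Relating QPI transfer rate to per-port bandwidth: a rough sketch.
TRANSFER_RATE_GT_S = 6.4         # from the abstract (giga-transfers per second)
PAYLOAD_BYTES_PER_TRANSFER = 2   # assumption: 16 data bits per direction
DIRECTIONS = 2                   # assumption: figure sums both link directions


def port_bandwidth_gb_s(rate_gt_s: float, payload_bytes: int, directions: int) -> float:
    """Aggregate port bandwidth in GB/s."""
    return rate_gt_s * payload_bytes * directions


per_direction = port_bandwidth_gb_s(TRANSFER_RATE_GT_S, PAYLOAD_BYTES_PER_TRANSFER, 1)
bandwidth = port_bandwidth_gb_s(TRANSFER_RATE_GT_S, PAYLOAD_BYTES_PER_TRANSFER, DIRECTIONS)
print(f"{per_direction:.1f} GB/s per direction, {bandwidth:.1f} GB/s per port")
```

With these assumptions the result is 12.8 GB/s per direction and 25.6 GB/s per port, consistent with the figure in the abstract.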


Archive | 2013

Row hammer refresh command

Kuljit S. Bains; John B. Halbert; Christopher P. Mozak; Theodore Z. Schoenborn; Zvika Greenfield


Archive | 2014

Method and apparatus for dynamically adjusting voltage reference to optimize an i/o system

Christopher P. Mozak; Kevin B. Moore; John V. Lovelace; Theodore Z. Schoenborn; Bryan L. Spry; Christopher E. Yunker


Archive | 2007

Memory link training

Bryan L. Spry; Christopher P. Mozak; Stanley S. Kulick
