Publication


Featured research published by Doug Carmean.


international symposium on computer architecture | 2002

Increasing processor performance by implementing deeper pipelines

Eric Sprangle; Doug Carmean

One architectural method for increasing processor performance involves increasing the frequency by implementing deeper pipelines. This paper will explore the relationship between performance and pipeline depth using a Pentium® 4 processor-like architecture as a baseline and will show that deeper pipelines can continue to increase performance. This paper will show that the branch misprediction latency is the single largest contributor to performance degradation as pipelines are stretched, and therefore branch prediction and fast branch recovery will continue to increase in importance. We will also show that higher-performance cores, implemented with longer pipelines for example, will put more pressure on the memory system, and therefore require larger on-chip caches. Finally, we will show that in the same process technology, designing deeper pipelines can increase the processor frequency by 100%, which, when combined with larger on-chip caches, can yield performance improvements of 35% to 90% over a Pentium® 4-like processor.


international symposium on microarchitecture | 2009

Larrabee: A Many-Core x86 Architecture for Visual Computing

Larry Seiler; Doug Carmean; Eric Sprangle; Tom Forsyth; Pradeep Dubey; Stephen Junkins; Adam T. Lake; Robert D. Cavin; Roger Espasa; Ed Grochowski; Toni Juan; Michael Abrash; Jeremy Sugerman; Pat Hanrahan

The Larrabee many-core visual computing architecture uses multiple in-order x86 cores augmented by wide vector processor units, together with some fixed-function logic. This increases the architecture's programmability as compared to standard GPUs. The article describes the Larrabee architecture, a software renderer optimized for it, and other highly parallel applications. The article analyzes performance through scalability studies based on real-world workloads.


IEEE Transactions on Visualization and Computer Graphics | 2009

Mapping High-Fidelity Volume Rendering for Medical Imaging to CPU, GPU and Many-Core Architectures

Mikhail Smelyanskiy; David R. Holmes; Jatin Chhugani; Alan Larson; Doug Carmean; Dennis P. Hanson; Pradeep Dubey; Kurt E. Augustine; Daehyun Kim; Alan B. Kyker; Victor W. Lee; Anthony D. Nguyen; Larry Seiler; Richard A. Robb

Medical volumetric imaging requires high-fidelity, high-performance rendering algorithms. We motivate and analyze new volumetric rendering algorithms that are suited to modern parallel processing architectures. First, we describe the three major categories of volume rendering algorithms and confirm through an imaging scientist-guided evaluation that ray-casting is the most acceptable. We describe a thread- and data-parallel implementation of ray-casting that makes it amenable to key architectural trends of three modern commodity parallel architectures: multi-core, GPU, and an upcoming many-core Intel® architecture code-named Larrabee. We achieve more than an order of magnitude performance improvement on a number of large 3D medical datasets. We further describe a data compression scheme that significantly reduces data-transfer overhead. This allows our approach to scale well to large numbers of Larrabee cores.


automation, robotics and control systems | 2013

Virtual register renaming

Mageda Sharafeddine; Haitham Akkary; Doug Carmean

This paper presents a novel high-performance substrate for building energy-efficient out-of-order superscalar cores. The architecture does not require a reorder buffer or physical registers for register renaming and instruction retirement. Instead, it uses a large number of virtual register IDs for register renaming, a physical register file of the same size as the logical register file, and checkpoints to bulk retire instructions and to recover from exceptions and branch mispredictions. By eliminating physical register renaming and the reorder buffer, the architecture not only eliminates complex power-hungry hardware structures, but also reduces reorder buffer capacity stalls when execution encounters long delays from data cache misses, thus improving performance. The paper presents a performance and power evaluation of this new architecture using SPEC 2006 benchmarks. The performance data was collected using an x86 ASIM-based performance simulator from Intel Labs. The data shows that the new architecture improves performance of a 2-wide out-of-order x86 processor core by an average of 4.2%, while saving 43% of the energy consumption of the reorder buffer and retirement register file functional block.


international conference on computer aided design | 2012

Scaling the "memory wall"

Shih-Lien Lu; Tanay Karnik; Ganapati Srinivasa; Kai-Yuan Chao; Doug Carmean; Jim Held

DRAM has been the technology for computer main memory since Intel released the first commercial DRAM chip (i1103) in 1970. As technology scales and demand for memory performance grows, DRAM is facing several challenges. Many other memory technologies are anticipated to replace it, but none has emerged as a clear winner thus far. In this paper we pose the question: is it possible to re-examine the design of DRAM to continue its life for at least another decade?


international conference on computer graphics and interactive techniques | 2008

Larrabee: a many-core x86 architecture for visual computing

Larry Seiler; Doug Carmean; Eric Sprangle; Tom Forsyth; Michael Abrash; Pradeep Dubey; Stephen Junkins; Adam T. Lake; Jeremy Sugerman; Robert D. Cavin; Roger Espasa; Ed Grochowski; Toni Juan; Pat Hanrahan


Archive | 2014

Distribution of tasks among asymmetric processing elements

Eric Sprangle; Doug Carmean; Rajesh Kumar


european conference on computer systems | 2007

Enabling scalability and performance in a large scale CMP environment

Bratin Saha; Ali-Reza Adl-Tabatabai; Anwar M. Ghuloum; Mohan Rajagopalan; Richard L. Hudson; Leaf Petersen; Vijay Menon; Brian R. Murphy; Tatiana Shpeisman; Eric Sprangle; Anwar Rohillah; Doug Carmean; Jesse Fang


Archive | 2002

Method and apparatus for processing a load-lock instruction using a relaxed lock protocol

Herbert H. J. Hum; Doug Carmean


Archive | 2008

Migrating execution of thread between cores of different instruction set architecture in multi-core processor and transitioning each core to respective on/off power state

Herbert H. J. Hum; Eric Sprangle; Doug Carmean; Rajesh Kumar
