James S. Burns | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where James S. Burns is active.

Explore More

Publication

Featured researches published by James S. Burns.

IEEE Transactions on Parallel and Distributed Systems | 2002

SMT layout overhead and scalability

James S. Burns; Jean-Luc Gaudiot

Simultaneous Multi-Threading (SMT) is a hardware technique that increases processor throughput by issuing instructions simultaneously from multiple threads. However, while SMT can be added to an existing microarchitecture with relatively low overhead, this additional chip area could be used for other resources such as more functional units, larger caches, or better branch predictors. How large is the SMT overhead and at what point does SMT no longer pay off for maximum throughput compared to adding other architecture features? This paper evaluates the silicon overhead of SMT by performing a transistor/interconnect-level analysis of the layout. We discuss microarchitecture issues that impact SMT implementations and show how the Instruction Set Architecture (ISA) and microarchitecture can have a large effect on the SMT overhead and performance. Results show that SMT yields large performance gains with small to moderate area overhead.

international conference on parallel architectures and compilation techniques | 2001

Area and System Clock Effects on SMT/CMP Processors

James S. Burns; Jean-Luc Gaudiot

Two approaches to high throughput processors are chip multiprocessing (CMP) and simultaneous multi-threading (SMT). CMP increases layout efficiency, which allows more functional units and a faster clock rate. However, CMP suffers from hardware partitioning of functional resources. SMT increases functional unit utilization by issuing instructions simultaneously from multiple threads. However, a wide-issue SMT suffers from layout and technology implementation problems. We use silicon resources as our basis for comparison and find that area and system clock have a large effect on the optimal SMT/CCMP design trade. We show the area overhead of SMT on each processor and how it scales with the width of the processor pipeline and the number of SMT threads. The wide issue SMT delivers the highest single-thread performance with improved multi-thread throughput. However multiple smaller cores deliver the highest throughput.

high performance computer architecture | 2000

Quantifying the SMT layout overhead-does SMT pull its weight?

James S. Burns; Jean-Luc Gaudiot

Simultaneous Multi-Threading (SMT) is a hardware technique that increases processor throughput by issuing instructions simultaneously from multiple threads. However, while SMT can be added to an existing microarchitecture with relatively low overhead, this additional chip area could be used for other resources such as more functional units, larger caches or better branch predictors. How large is the SMT overhead, and at what point does SMT no longer pay off compared to adding other architecture features? This paper evaluates the silicon overhead of SMT by performing a transistor/interconnect level analysis of the layout. We discuss micro-architecture issues that impact SMT implementations, and show how the Instruction Set Architecture (ISA) and microarchitecture can have a large effect on the SMT overhead and performance. Results show that SMT yields large performance gains with small to moderate area overhead.

IEEE Transactions on Computers | 2005

Area and system clock effects on SMT/CMP throughput

James S. Burns; Jean-Luc Gaudiot

Two approaches to high throughput processors are chip multiprocessing (CMP) and simultaneous multithreading (SMT). CMP increases layout efficiency, which allows more functional units and a faster clock rate. However, CMP suffers from hardware partitioning of functional resources. SMT increases functional unit utilization by issuing instructions simultaneously from multiple threads. However, a wide-issue SMT suffers from layout and technology implementation problems. We use silicon resources as our basis for comparison and find that area and system clock have a large effect on the optimal SMT/CMP design trade. We show the area overhead of SMT on each processor and how it scales with the width of the processor pipeline and the number of SMT threads. The wide issue SMT delivers the highest single-thread performance with improved multithread throughput. However, multiple smaller cores deliver the highest throughput. Also, alternate processor configurations are explored that trade off SMT threads for other microarchitecture features. The result is a small increase to single-thread performance, but a fairly large reduction in throughput.

Archive | 2002