Publication


Featured research published by Michael D. Upton.


International Symposium on Computer Architecture | 2005

The Impact of Performance Asymmetry in Emerging Multicore Architectures

Saisanthosh Balakrishnan; Ravi Rajwar; Michael D. Upton; Konrad K. Lai

Performance asymmetry in multicore architectures arises when individual cores have different performance. Building such multicore processors is desirable because many simple cores together provide high parallel performance while a few complex cores ensure high serial performance. However, application developers typically assume computational cores provide equal performance, and performance asymmetry breaks this assumption. This paper is concerned with the behavior of commercial applications running on performance-asymmetric systems. We present the first study investigating the impact of performance asymmetry on a wide range of commercial applications using a hardware prototype. We quantify the impact of asymmetry on an application's performance variance when run multiple times, and the impact on the application's scalability. Performance asymmetry adversely affects the behavior of many workloads. We study ways to eliminate these effects. In addition to asymmetry-aware operating system kernels, the application itself often needs to be aware of performance asymmetry for stable and scalable performance.


Architectural Support for Programming Languages and Operating Systems | 2004

Continual flow pipelines

Srikanth T. Srinivasan; Ravi Rajwar; Haitham Akkary; Amit Gandhi; Michael D. Upton

Increased integration in the form of multiple processor cores on a single die, relatively constant die sizes, shrinking power envelopes, and emerging applications create a new challenge for processor architects: how to build a processor that provides high single-thread performance and enables multiple such processors to be placed on the same die for high throughput while dynamically adapting for future applications. Conventional approaches for high single-thread performance rely on large and complex cores to sustain a large instruction window for memory tolerance, making them unsuitable for multi-core chips. We present Continual Flow Pipelines (CFP) as a new non-blocking processor pipeline architecture that achieves the performance of a large instruction window without requiring cycle-critical structures such as the scheduler and register file to be large. We show that to achieve the benefits of a large instruction window, inefficiencies in management of both the scheduler and register file must be addressed, and we propose a unified solution. The non-blocking property of CFP keeps key processor structures affecting cycle time and power (scheduler, register file), and die size (second level cache) small. The memory latency-tolerant CFP core allows multiple cores on a single die while outperforming current processor cores for single-thread applications.


International Symposium on Microarchitecture | 2004

Continual flow pipelines: achieving resource-efficient latency tolerance

Srikanth T. Srinivasan; Ravi Rajwar; Haitham Akkary; Amit Gandhi; Michael D. Upton

With the natural trend toward integration, microprocessors are increasingly supporting multiple cores on a single chip. To keep design effort and costs down, designers of these multicore microprocessors frequently target an entire product range, from mobile laptops to high-end servers. This article discusses a continual flow pipeline (CFP) processor. Such a processor architecture can sustain a large number of in-flight instructions (commonly referred to as the instruction window and comprising all instructions renamed but not retired) without requiring the cycle-critical structures to scale up. By keeping these structures small and making the processor core tolerant of memory latencies, a CFP mechanism enables the new core to achieve high single-thread performance, and many of these new cores can be placed on a chip for high throughput. The resulting large instruction window reveals substantial instruction-level parallelism and achieves memory latency tolerance, while the small size of cycle-critical resources permits a high clock frequency.


Archive | 1999

Trace based instruction caching

Robert F. Krick; Glenn J. Hinton; Michael D. Upton; David J. Sager; Chan W. Lee


Archive | 1999

Method and apparatus for lock synchronization in a microprocessor system

Douglas M. Carmean; Harish Kumar; Brent E. Lince; Michael D. Upton; Zhongying Zhang


Archive | 2001

Processor having execution core sections operating at different clock rates

David J. Sager; Thomas D. Fletcher; Glenn J. Hinton; Michael D. Upton


Archive | 1998

Computer processor with a replay system having a plurality of checkers

Amit A. Merchant; David J. Sager; Darrell D. Boggs; Michael D. Upton


Archive | 2000

Processor having replay architecture with fast and slow replay paths

Michael D. Upton; David A. Sager; Darrell D. Boggs; Glenn J. Hinton


Archive | 2001

Determination of approaching instruction starvation of threads based on a plurality of conditions

David W. Burns; James D. Allen; Michael D. Upton; Darrell D. Boggs; Alan B. Kyker


Archive | 1997

Address translation system having first and second translation look aside buffers

Michael D. Upton; Gregory Mont Thornton; Bryon G. Conley
