Michael D. Upton | Researchain

Publication

Featured research published by Michael D. Upton.

international symposium on computer architecture | 2005

The Impact of Performance Asymmetry in Emerging Multicore Architectures

Saisanthosh Balakrishnan; Ravi Rajwar; Michael D. Upton; Konrad K. Lai

Performance asymmetry in multicore architectures arises when individual cores have different performance. Building such multicore processors is desirable because many simple cores together provide high parallel performance while a few complex cores ensure high serial performance. However, application developers typically assume computational cores provide equal performance, and performance asymmetry breaks this assumption. This paper is concerned with the behavior of commercial applications running on performance-asymmetric systems. We present the first study investigating the impact of performance asymmetry on a wide range of commercial applications using a hardware prototype. We quantify the impact of asymmetry on an application's performance variance when run multiple times, and the impact on the application's scalability. Performance asymmetry adversely affects the behavior of many workloads. We study ways to eliminate these effects. In addition to asymmetry-aware operating system kernels, the application itself often needs to be aware of performance asymmetry for stable and scalable performance.

architectural support for programming languages and operating systems | 2004

Continual flow pipelines

Srikanth T. Srinivasan; Ravi Rajwar; Haitham Akkary; Amit Gandhi; Michael D. Upton

Increased integration in the form of multiple processor cores on a single die, relatively constant die sizes, shrinking power envelopes, and emerging applications create a new challenge for processor architects: how to build a processor that provides high single-thread performance and allows several such cores to be placed on the same die for high throughput while dynamically adapting for future applications. Conventional approaches for high single-thread performance rely on large and complex cores to sustain a large instruction window for memory tolerance, making them unsuitable for multi-core chips. We present Continual Flow Pipelines (CFP) as a new non-blocking processor pipeline architecture that achieves the performance of a large instruction window without requiring cycle-critical structures such as the scheduler and register file to be large. We show that to achieve the benefits of a large instruction window, inefficiencies in management of both the scheduler and register file must be addressed, and we propose a unified solution. The non-blocking property of CFP keeps small the key processor structures affecting cycle time and power (scheduler, register file) and die size (second-level cache). The memory-latency-tolerant CFP core allows multiple cores on a single die while outperforming current processor cores for single-thread applications.

international symposium on microarchitecture | 2004

Continual flow pipelines: achieving resource-efficient latency tolerance

Srikanth T. Srinivasan; Ravi Rajwar; Haitham Akkary; Amit Gandhi; Michael D. Upton

With the natural trend toward integration, microprocessors are increasingly supporting multiple cores on a single chip. To keep design effort and costs down, designers of these multicore microprocessors frequently target an entire product range, from mobile laptops to high-end servers. This article discusses a continual flow pipeline (CFP) processor. Such a processor architecture can sustain a large number of in-flight instructions (commonly referred to as the instruction window and comprising all instructions renamed but not retired) without requiring the cycle-critical structures to scale up. By keeping these structures small and making the processor core tolerant of memory latencies, a CFP mechanism enables the new core to achieve high single-thread performance, and many of these new cores can be placed on a chip for high throughput. The resulting large instruction window reveals substantial instruction-level parallelism and achieves memory latency tolerance, while the small size of cycle-critical resources permits a high clock frequency.

Archive | 1999