
Publications


Featured research published by Philip G. Emma.


Design Automation Conference | 2007

Interconnects in the third dimension: design challenges for 3D ICs

Kerry Bernstein; Paul S. Andry; Jerome L. Cann; Philip G. Emma; David R. Greenberg; Wilfried Haensch; Mike Ignatowski; Steven J. Koester; John Harold Magerlein; Ruchir Puri; Albert M. Young

Despite generation upon generation of scaling, computer chips have until now remained essentially 2-dimensional. Improvements in on-chip wire delay and in the maximum number of I/O per chip have not been able to keep up with transistor performance growth; it has become steadily harder to hide the discrepancy. 3D chip technologies come in a number of flavors, but are expected to enable the extension of CMOS performance. Designing in three dimensions, however, forces the industry to look at formerly two-dimensional integration issues quite differently, and requires the re-fitting of multiple existing EDA capabilities.


International Symposium on Computer Architecture | 1991

Branch history table prediction of moving target branches due to subroutine returns

David R. Kaeli; Philip G. Emma

Ideally, a pipelined processor can run at a rate that is limited by its slowest stage. Branches in the instruction stream disrupt the pipeline, and reduce processor performance to well below ideal. Since workloads contain a high percentage of taken branches, techniques are needed to reduce or eliminate this degradation. A Branch History Table (BHT) stores past action and target for branches, and predicts that future behavior will repeat. Although past action is a good indicator of future action, the subroutine CALL/RETURN paradigm makes correct prediction of the branch target difficult. We propose a new stack mechanism for reducing this type of misprediction. Using traces of the SPEC benchmark suite running on an RS/6000, we provide an analysis of the performance enhancements possible using a BHT. We show that the proposed mechanism can reduce the number of branch wrong guesses by 18.2% on average.
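
The essence of the proposed fix is a return-address stack consulted in place of the BHT's remembered target whenever the predictor encounters a RETURN, since a return's target changes with each call site. A minimal Python sketch of that idea, with hypothetical names and a fixed depth chosen for illustration:

```python
# Minimal sketch of a return-address stack (RAS) used alongside a BHT.
# Names and the fixed depth are illustrative, not from the paper.
class ReturnAddressStack:
    def __init__(self, depth=8):
        self.depth = depth          # hardware stacks have a fixed depth
        self.stack = []

    def on_call(self, return_address):
        if len(self.stack) == self.depth:
            self.stack.pop(0)       # overflow: discard the oldest entry
        self.stack.append(return_address)

    def predict_return(self, bht_target):
        # A RETURN's target depends on the caller, so the BHT's remembered
        # target is often stale; prefer the top of the call stack.
        if self.stack:
            return self.stack.pop()
        return bht_target           # empty stack: fall back to the BHT

ras = ReturnAddressStack()
ras.on_call(0x1004)                 # CALL at 0x1000 pushes its return point
assert ras.predict_return(bht_target=0x2000) == 0x1004
```

The fallback to the BHT target on an empty stack mirrors what a real predictor must do when call depth exceeds the hardware stack.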


International Symposium on Microarchitecture | 2002

Optimizing pipelines for power and performance

Viji Srinivasan; David M. Brooks; Michael Karl Gschwind; Pradip Bose; Victor Zyuban; Philip N. Strenski; Philip G. Emma

During the concept phase and definition of next-generation high-end processors, power and performance must be weighted appropriately to deliver competitive cost/performance; it is not enough to adopt a CPI-centric view alone in early-stage definition studies. One of the fundamental issues confronting the architect at this stage is the choice of pipeline depth and target frequency. In this paper we present an optimization methodology that starts with an analytical power-performance model to derive the optimal pipeline depth for a superscalar processor. The results are validated and further refined using detailed simulation-based analysis. As part of the power-modeling methodology, we have developed equations that model the variation of energy as a function of pipeline depth. Our results using a set of SPEC2000 applications show that when both power and performance are considered for optimization, the optimal clock period is around 18 FO4. We also provide a detailed sensitivity analysis of the optimal pipeline depth against key assumptions of these energy models.
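
The trade-off the methodology optimizes can be seen in a toy analytical model: deeper pipelines shorten the clock period per stage but inflate CPI through hazard penalties, while latch and clock power grow with depth. The sketch below uses invented coefficients and a BIPS^3/W-style metric purely for illustration; the paper's actual energy equations are far more detailed:

```python
import numpy as np

# Toy power-performance model over pipeline depth (stages).
# All coefficients are illustrative only; the paper derives real ones.
total_logic_fo4 = 180          # total logic depth of the machine, in FO4
latch_fo4 = 3                  # latch overhead per stage, in FO4
hazard_fraction = 0.05         # stall cycles added per instruction per stage

depths = np.arange(5, 40)
cycle_fo4 = total_logic_fo4 / depths + latch_fo4   # clock period per stage
cpi = 1.0 + hazard_fraction * depths               # hazard penalties grow with depth
time_per_instr = cpi * cycle_fo4                   # FO4 units per instruction
power = 1.0 + 0.08 * depths                        # latch/clock power grows with depth

# Optimize a power-performance metric of the BIPS**3 / W flavor.
bips = 1.0 / time_per_instr
metric = bips**3 / power
best = depths[np.argmax(metric)]
print(f"optimum: {best} stages, {total_logic_fo4 / best + latch_fo4:.1f} FO4 per stage")
```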


IBM Journal of Research and Development | 2003

New methodology for early-stage, microarchitecture-level power-performance analysis of microprocessors

David M. Brooks; Pradip Bose; Viji Srinivasan; Michael Karl Gschwind; Philip G. Emma; Michael G. Rosenfield

The PowerTimer toolset has been developed for use in early-stage, microarchitecture-level power-performance analysis of microprocessors. The key component of the toolset is a parameterized set of energy functions that can be used in conjunction with any given cycle-accurate microarchitectural simulator. The energy functions model the power consumption of primitive and hierarchically composed building blocks which are used in microarchitecture-level performance models. Examples of structures modeled are pipeline stage latches, queues, buffers and component read/write multiplexers, local clock buffers, register files, and cache array macros. The energy functions can be derived using purely analytical equations that are driven by organizational, circuit, and technology parameters or behavioral equations that are derived from empirical, circuit-level simulation experiments. After describing the modeling methodology, we present analysis results in the context of a current-generation superscalar processor simulator to illustrate the use and effectiveness of such early-stage models. In addition to average power and performance tradeoff analysis, PowerTimer is useful in assessing the typical and worst-case power (or current) swings that occur between successive cycle windows in a given workload execution. Such a characterization of workloads at the early stage of microarchitecture definition helps pinpoint potential inductive noise problems on the voltage rail that can be addressed by designing an appropriate package or by suitably tuning the dynamic power management controls within the processor.
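
A flavor of the approach: each building block gets an energy function parameterized by its organization, and the simulator's per-cycle access counts scale those functions, with unclocked structures charged a residual energy unless gated. Everything below (names, coefficients, scaling exponents) is a hypothetical sketch, not PowerTimer's actual interface:

```python
# Sketch of parameterized energy functions in the PowerTimer style.
# Coefficients are hypothetical; real ones come from circuit-level
# simulation or analytical circuit models.
def array_access_energy(entries, bits_per_entry, ports,
                        e_per_bit=0.15e-12, e_port_overhead=0.4e-12):
    """Energy (J) per access to an SRAM-like array macro."""
    return entries**0.5 * bits_per_entry * e_per_bit + ports * e_port_overhead

def cycle_energy(access_counts, clock_gated_fraction=0.6,
                 idle_energy_fraction=0.25):
    """Total energy for one simulated cycle.

    access_counts maps a structure to (accesses this cycle, per-access energy).
    Idle structures still burn a fraction of active energy unless gated.
    """
    total = 0.0
    for accesses, e_access in access_counts.values():
        if accesses:
            total += accesses * e_access
        else:
            total += (1 - clock_gated_fraction) * idle_energy_fraction * e_access
    return total

regfile = array_access_energy(entries=128, bits_per_entry=64, ports=8)
icache = array_access_energy(entries=512, bits_per_entry=256, ports=1)
print(cycle_energy({"regfile": (4, regfile), "icache": (0, icache)}))
```

Summing this quantity per cycle gives average power; tracking its swing between successive cycle windows gives the inductive-noise characterization the abstract mentions.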


IEEE Transactions on Computers | 2004

Integrated analysis of power and performance for pipelined microprocessors

Victor Zyuban; David M. Brooks; Viji Srinivasan; Michael Karl Gschwind; Pradip Bose; Philip N. Strenski; Philip G. Emma

Choosing the pipeline depth of a microprocessor is one of the most critical design decisions that an architect must make in the concept phase of a microprocessor design. To be successful in today's cost/performance marketplace, modern CPU designs must effectively balance both performance and power dissipation. The choice of pipeline depth and target clock frequency has a critical impact on both of these metrics. We describe an optimization methodology based on both analytical models and detailed simulations for power and performance as a function of pipeline depth. Our results for a set of SPEC2000 applications show that, when both power and performance are considered for optimization, the optimal clock period is around 18 FO4. We also provide a detailed sensitivity analysis of the optimal pipeline depth against key assumptions of our energy models. Finally, we discuss the potential risks in design quality for overly aggressive or conservative choices of pipeline depth.
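
A sensitivity analysis of this kind amounts to perturbing each energy-model assumption and re-solving for the optimum. A compact sketch in the same toy-model spirit as above (all coefficients invented):

```python
import numpy as np

# Toy sensitivity sweep: how does the optimal pipeline depth move as the
# assumed latch-power growth rate changes? All coefficients are illustrative.
def optimal_depth(latch_power_slope, total_fo4=180, latch_fo4=3, stall=0.05):
    depths = np.arange(5, 40)
    bips = 1.0 / ((1 + stall * depths) * (total_fo4 / depths + latch_fo4))
    power = 1.0 + latch_power_slope * depths
    return depths[np.argmax(bips**3 / power)]

for slope in (0.04, 0.08, 0.16):
    print(f"latch power slope {slope}: optimum at {optimal_depth(slope)} stages")
```

A flat optimum across the sweep signals a robust design point; a fast-moving one flags the risk of overly aggressive or conservative depth choices that the paper discusses.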


IBM Journal of Research and Development | 1997

Understanding some simple processor-performance limits

Philip G. Emma

To understand processor performance, it is essential to use metrics that are intuitive, and it is essential to be familiar with a few aspects of a simple scalar pipeline before attempting to understand more complex structures. This paper shows that cycles per instruction (CPI) is a simple dot product of event frequencies and event penalties, and that it is far more intuitive than its more popular cousin, instructions per cycle (IPC). CPI is separable into three components that account for the inherent work, the pipeline, and the memory hierarchy, respectively. Each of these components is a fixed upper limit, or “hard bound,” for the superscalar equivalent components. In the last decade, the memory-hierarchy component has become the most dominant of the three components, and in the next decade, queueing at the memory data bus will become a very significant part of this. In a reaction to this trend, an evolution in bus protocols will ensue. This paper provides a general sketch of those protocols. An underlying theme in this paper is that power constraints have been a driving force in computer architecture since the first computers were built fifty years ago. In CMOS technology, power constraints will shape future microarchitecture in a positive and surprising way. Specifically, a resurgence of the RISC approach is expected in high-performance design which will cause the client and server microarchitectures to converge.


Computing Frontiers | 2006

Cache miss behavior: is it √2?

Allan M. Hartstein; Viji Srinivasan; Thomas R. Puzak; Philip G. Emma

It has long been empirically observed that the cache miss rate decreases as a power law of cache size, where the power is approximately -1/2. In this paper, we examine the dependence of the cache miss rate on cache size both theoretically and through simulation. By combining the observed time dependence of the cache reference pattern with a statistical treatment of cache entry replacement, we predict that the cache miss rate should vary with cache size as an inverse power law for a first-level cache. The exponent in the power law is directly related to the time dependence of cache references, and lies between -0.3 and -0.7. Results are presented for both direct-mapped and set-associative caches, and for various levels of the cache hierarchy. Our results demonstrate that the dependence of cache miss rate on cache size arises from the temporal dependence of the cache access pattern.
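
Stated as a formula, the observation is m(C) = m0 * (C / C0)^(-alpha) with alpha near 1/2, so doubling the cache size cuts the miss rate by roughly a factor of sqrt(2), which is the title's question. A quick numeric check with invented anchor values:

```python
# Power-law miss-rate model: m(C) = m0 * (C / C0) ** -alpha.
# m0 and C0 are invented anchor points; alpha ~ 0.5 is the classic rule of thumb.
def miss_rate(cache_kb, m0=0.05, c0_kb=16, alpha=0.5):
    return m0 * (cache_kb / c0_kb) ** -alpha

for kb in (16, 32, 64, 128):
    print(f"{kb:4d} KB: miss rate {miss_rate(kb):.4f}")

# Doubling the cache scales misses by 2 ** -0.5, i.e. 1 / sqrt(2).
print(miss_rate(32) / miss_rate(16))   # ~0.7071
```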


PACS '02: Proceedings of the 2nd International Conference on Power-Aware Computer Systems | 2002

Early-stage definition of LPX: a low power issue-execute processor

Pradip Bose; David M. Brooks; Alper Buyuktosunoglu; Peter W. Cook; Koushik K. Das; Philip G. Emma; Michael Karl Gschwind; Hans M. Jacobson; Tejas Karkhanis; Prabhakar Kudva; Stanley E. Schuster; James E. Smith; Viji Srinivasan; Victor Zyuban; David H. Albonesi; Sandhya Dwarkadas

We present the high-level microarchitecture of LPX: a low-power issue-execute processor prototype being designed by a joint industry-academia research team. LPX implements a very small subset of a RISC architecture, with a primary focus on a vector (SIMD) multimedia extension. The objective of the project is to validate key new ideas in power-aware microarchitecture, supported by recent advances in circuit design and clocking.


IEEE Transactions on Computers | 1997

Improving the accuracy of history-based branch prediction

David R. Kaeli; Philip G. Emma

In this paper, we present mechanisms that improve the accuracy and performance of history-based branch prediction. By studying the characteristics of the decision structures present in high-level languages, we propose two mechanisms that reduce the number of wrong predictions made by a branch target buffer (BTB). Execution-driven modeling is used to evaluate the improvement in branch prediction accuracy, as well as the reduction in overall program execution time.


IEEE Design & Test of Computers | 2009

Opportunities and Challenges for 3D Systems and Their Design

Philip G. Emma; Eren Kursun

This article presents the system-design opportunities offered by 3D integration and discusses the design and test challenges for 3D ICs, including new design-for-manufacture and design-for-test (DFT) issues.
