Network


Latest external collaborations at the country level.

Hotspot


Research topics in which Jeanine Cook is active.

Publication


Featured research published by Jeanine Cook.


IEEE Computer Architecture Letters | 2006

Performance modeling using Monte Carlo simulation

Ram Srinivasan; Jeanine Cook; Olaf M. Lubeck

Cycle-accurate simulation has long been the primary tool for micro-architecture design and evaluation. Though accurate, its slow speed often imposes constraints on the extent of design exploration. In this work, we propose a fast, accurate Monte Carlo-based model for predicting processor performance. We apply this technique to predict the CPI of in-order architectures and validate it against the Itanium-2. The Monte Carlo model uses micro-architecture-independent application characteristics, together with cache and branch-predictor statistics, to predict CPI with an average error of less than 7%. Since prediction is achieved in a few seconds, the model can be used for fast design space exploration that efficiently culls the space for cycle-accurate simulations. Besides accurately predicting CPI, the model also breaks down CPI into various components, where each component quantifies the effect of a particular stall condition (branch misprediction, cache miss, etc.) on overall CPI. Such a CPI decomposition can help processor designers quickly identify and resolve critical performance bottlenecks.
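
To make the flavor of the approach concrete, below is a minimal Python sketch of Monte Carlo CPI prediction with a per-component CPI breakdown. The event probabilities, stall penalties, and base CPI are invented placeholder values, not the paper's measured Itanium-2 statistics.

    import random

    # Microarchitecture-independent application statistics (assumed inputs).
    P_L1_MISS = 0.05         # probability an instruction misses in L1 (placeholder)
    P_BR_MISPRED = 0.02      # probability of a mispredicted branch (placeholder)

    # Architecture parameters (illustrative, not measured).
    BASE_CPI = 1.0           # ideal CPI with no stalls
    L1_MISS_PENALTY = 20     # cycles
    BR_MISPRED_PENALTY = 12  # cycles

    def monte_carlo_cpi(n_samples=1_000_000, seed=0):
        """Sample synthetic instructions; accumulate per-component stall cycles."""
        rng = random.Random(seed)
        stalls = {"l1_miss": 0.0, "br_mispred": 0.0}
        for _ in range(n_samples):
            if rng.random() < P_L1_MISS:
                stalls["l1_miss"] += L1_MISS_PENALTY
            if rng.random() < P_BR_MISPRED:
                stalls["br_mispred"] += BR_MISPRED_PENALTY
        # CPI decomposition: base CPI plus the average stall contribution of
        # each event class, mirroring the paper's per-component breakdown.
        components = {k: v / n_samples for k, v in stalls.items()}
        return BASE_CPI + sum(components.values()), components

    cpi, breakdown = monte_carlo_cpi()
    print(f"predicted CPI: {cpi:.3f}")
    for name, contrib in breakdown.items():
        print(f"  {name}: {contrib:.3f} CPI")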


International Symposium on Performance Analysis of Systems and Software (ISPASS) | 2005

Fast, Accurate Microarchitecture Simulation Using Statistical Phase Detection

Ram Srinivasan; Jeanine Cook

Simulation-based microarchitecture research is often hindered by the slow speed of simulators. In this work, we propose a novel statistical technique to identify highly representative unique behaviors, or phases, in a benchmark based on its IPC (instructions committed per cycle) trace. By simulating the timing of only the unique phases, the cycle-accurate simulation time for the SPEC suite is reduced from 5 months to 5 days, with a significant retention of the original dynamic behavior. Evaluation across many processor configurations within the same architecture family shows that the algorithm is robust. A cost function is provided that enables users to easily optimize the parameters of the algorithm for either simulation speed or accuracy, depending on preference. A new measure is introduced to quantify the ability of a simulation speedup technique to retain behavior realized in the original workload. Unlike a first-order statistic such as the mean value, the newly introduced measure captures important differences in dynamic behavior between the complete and the sampled simulations.
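
A toy Python sketch of the idea, using a much simpler phase test than the paper's statistical technique: intervals of the IPC trace are grouped into phases by mean IPC within a fixed tolerance, and whole-run IPC is reconstructed as a frequency-weighted mean of per-phase representatives. The interval size and tolerance are assumed parameters.

    import statistics

    def detect_phases(ipc_trace, interval=1000, tol=0.1):
        """Group fixed-size intervals of an IPC trace into unique phases."""
        phases = []        # representative mean IPC per unique phase
        assignment = []    # phase index of each interval
        for start in range(0, len(ipc_trace), interval):
            mean_ipc = statistics.fmean(ipc_trace[start:start + interval])
            for idx, rep in enumerate(phases):
                if abs(mean_ipc - rep) <= tol:
                    assignment.append(idx)   # matches a known behavior
                    break
            else:
                phases.append(mean_ipc)      # new unique behavior found
                assignment.append(len(phases) - 1)
        return phases, assignment

    def reconstruct_ipc(phases, assignment):
        """Only one interval per phase needs cycle-accurate simulation;
        whole-run IPC is a frequency-weighted mean of phase representatives."""
        weights = [assignment.count(i) for i in range(len(phases))]
        return sum(p * w for p, w in zip(phases, weights)) / sum(weights)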


International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) | 2005

Improved estimation for software multiplexing of performance counters

Wiplove Mathur; Jeanine Cook

On-chip performance counters are gaining popularity as an analysis and validation tool. Most contemporary processors have between two and six physical counters that can monitor an equal number of unique events simultaneously at fixed sampling periods. Through multiplexing and estimation, an even greater number of unique events can be monitored in a single program execution. When a program is sampled in multiplexed mode using round-robin scheduling of a specified event set, the number of events that are physically counted during each sampling period is limited by the number of counters that can be simultaneously accessed. During this period, the remaining events of the multiplexed event-set are not monitored, but their counts are estimated. Our work quantifies the estimation error of the event-counts in the multiplexed mode, which indicates that as many as 42% of sampled intervals are estimated with error greater than 10%. We propose new estimation algorithms that result in an accuracy improvement of up to 40%.
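
The estimation problem can be illustrated with a small Python sketch of round-robin multiplexing using the standard linear-scaling estimate (observed count scaled by total periods over monitored periods). The event names and counts are synthetic, and the paper's improved estimators are not reproduced here.

    import math

    def multiplex_estimate(true_counts, n_counters):
        """true_counts: {event: [count per sampling period]} (synthetic truth).

        Events are split into ceil(N/K) groups of up to K = n_counters;
        group g is physically counted in period p when p % n_groups == g.
        Unmonitored periods use the standard scaling estimate:
        observed_total * (total_periods / monitored_periods).
        """
        events = list(true_counts)
        n_groups = math.ceil(len(events) / n_counters)
        n_periods = len(next(iter(true_counts.values())))
        estimates = {}
        for i, ev in enumerate(events):
            group = i // n_counters
            observed = sum(true_counts[ev][p] for p in range(n_periods)
                           if p % n_groups == group)
            monitored = sum(1 for p in range(n_periods) if p % n_groups == group)
            estimates[ev] = observed * n_periods / monitored
        return estimates

    # Bursty events are exactly where linear scaling goes wrong.
    truth = {"l1_miss":  [100, 100, 100, 100],   # steady: estimated exactly
             "tlb_miss": [0,   500, 0,   500]}   # bursty: estimate is 2x truth
    est = multiplex_estimate(truth, n_counters=1)
    for ev in truth:
        print(ev, "true:", sum(truth[ev]), "estimated:", round(est[ev]))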


International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) | 2006

Ultra-Fast CPU Performance Prediction: Extending the Monte Carlo Approach

Ram Srinivasan; Jeanine Cook; Olaf M. Lubeck

Performance evaluation of contemporary processors is becoming increasingly difficult due to the lack of proper frameworks. Traditionally, cycle-accurate simulators have been extensively used due to their inherent accuracy and flexibility. However, the effort involved in building them, their slow speed, and their limited ability to provide insight often impose constraints on the extent of design exploration. In this paper, we refine our earlier Monte Carlo-based CPI prediction model (Srinivasan et al., 2006) to include software-assisted data prefetching and an improved memory model. Software-based prefetching is becoming an increasingly important feature in modern processors, but to the best of our knowledge, existing frameworks do not model it. Our model uses micro-architecture-independent application characteristics to predict CPI with an average error of less than 10% when validated against the Itanium-2 processor. Besides accurate performance prediction, we illustrate the applications of the model to processor bottleneck analysis, workload characterization, and design space exploration.


ACM SIGMETRICS: Measurement and Modeling of Computer Systems | 2011

A statistical performance model of the Opteron processor

Jeanine Cook; Jonathan E. Cook; Waleed Alkohlani

Cycle-accurate simulation is the dominant methodology for processor design space analysis and performance prediction. However, with the prevalence of multi-core, multi-threaded architectures, this method has become highly impractical as the sole means for design due to its extreme slowdowns. We have developed a statistical technique for modeling multi-core processors that is based on Monte Carlo methods. Using this method, processor models of contemporary architectures can be developed and applied to performance prediction, bottleneck detection, and limited design space analysis. To date, we have accurately modeled the IBM Cell, the Intel Itanium, and the Sun Niagara 1 and Niagara 2 processors [23, 22, 8]. In this paper, we present work in progress that applies this methodology to an out-of-order execution processor. We present the initial single-core model and results for the AMD Barcelona (Opteron) processor.


IEEE International Performance, Computing, and Communications Conference (IPCCC) | 2007

Compiler-Directed Functional Unit Shutdown for Microarchitecture Power Optimization

Santosh Talli; Ram Srinivasan; Jeanine Cook

Leakage power is a major concern in current microarchitectures, as it increases exponentially with decreasing transistor feature sizes. In this paper, we present a technique called functional unit shutdown that reduces the static leakage power consumption of a microprocessor by power gating functional units when they are not in use. We use profile information to identify functional-unit idle periods, which the compiler then uses to issue corresponding OFF/ON instructions. The decision to power gate during an idle period is based on a comparison between the energy consumed by leaving the unit ON and the overhead and leakage energy involved in power cycling it. This comparison identifies short idle periods where less power is consumed if a functional unit is left ON rather than power cycled. The results show that this technique saves up to 18% of the total energy, and between 4% and 11% on average, with a performance degradation of 1%.
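
The ON-versus-gate comparison amounts to a break-even test on idle-period length, sketched below in Python with invented power and overhead numbers (the paper derives these from the target microarchitecture).

    LEAK_POWER = 0.5      # leakage energy per idle cycle while ON (placeholder)
    GATE_OVERHEAD = 40.0  # energy cost of one OFF/ON power-cycling pair (placeholder)
    GATED_LEAK = 0.05     # residual leakage per cycle while gated (placeholder)

    def should_power_gate(idle_cycles):
        """Gate only if the idle period is long enough to amortize the overhead."""
        energy_on = idle_cycles * LEAK_POWER
        energy_gated = GATE_OVERHEAD + idle_cycles * GATED_LEAK
        return energy_gated < energy_on

    # Break-even length: GATE_OVERHEAD / (LEAK_POWER - GATED_LEAK), about 89 cycles.
    for idle in (50, 89, 200):
        print(idle, should_power_gate(idle))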


International Conference on Parallel Processing (ICPP) | 2010

Extending the Monte Carlo Processor Modeling Technique: Statistical Performance Models of the Niagara 2 Processor

Waleed Alkohlani; Jeanine Cook; Ram Srinivasan

With the complexity of contemporary single- and multi-core, multi-threaded processors comes a greater need for faster methods of performance analysis and design. It is no longer practical to use only cycle-accurate processor simulators for design space analysis of modern processors and systems. Therefore, we propose a statistical processor modeling method that is based on Monte Carlo techniques. In this paper, we present new details of the methodology and the recent extensions that we have made to it, including the capability to model multi-core processors. We detail the steps to develop a new model and then present statistical performance models of the Sun Niagara 2 processor micro-architecture that, together with a previously published Itanium 2 Monte Carlo model, demonstrate the validity of the technique and its new capabilities. We show that we can accurately predict single- and multi-core performance within 7% of actual on average, and we can use the models to quickly pinpoint performance problems at various components.


ACM SIGMETRICS: Measurement and Modeling of Computer Systems | 2002

Toward reducing processor simulation time via dynamic reduction of microarchitecture complexity

Jeanine Cook; Richard L. Oliver; Eric E. Johnson

As processor microarchitectures continue to increase in complexity, so does the time required to explore the design space. Performing cycle-accurate, detailed timing simulation of a realistic workload on a proposed processor microarchitecture often incurs a prohibitively large time cost. We propose a method to reduce the time cost of simulation by dynamically varying the complexity of the processor model throughout the simulation. In this paper, we give first evidence of the feasibility of this approach. We demonstrate that there are significant amounts of time during a simulation where a reduced processor model accurately tracks important behavior of a full model, and that by simulating the reduced model during these times the total simulation time can be reduced. Finally, we discuss metrics for detecting areas where the two processor models track each other, which is crucial for dynamically deciding when to use a reduced rather than a full model.
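
The paper discusses candidate metrics without fixing one; as a purely illustrative stand-in, the Python sketch below uses a windowed relative-IPC-difference test to decide whether the reduced model currently tracks the full model.

    def models_track(full_ipc, reduced_ipc, tol=0.05):
        """True if the reduced model's IPC stays within tol (relative) of the
        full model's over the window, i.e. it is currently safe to simulate
        only the cheaper reduced model."""
        return all(abs(f - r) / f <= tol for f, r in zip(full_ipc, reduced_ipc))

    # During stable regions the cheap model suffices; on divergence the
    # simulator would switch back to the full model.
    print(models_track([1.20, 1.21, 1.19], [1.18, 1.22, 1.17]))  # True
    print(models_track([1.20, 0.70, 1.19], [1.18, 1.05, 1.17]))  # False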


IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC) | 2012

Towards Performance Predictive Application-Dependent Workload Characterization

Waleed Alkohlani; Jeanine Cook

Workload characterization is important for users, designers, and those specifying future machine acquisitions. If the characterization method is carefully crafted to be comprehensive and consistent across platforms, it can be used to specify characteristics and components that comprise an optimal micro-architecture for the workload or application. Prior work has traditionally focused on two primary objectives: explaining application performance on a particular architecture through bottleneck identification, and studying application similarity. This work defines an efficient characterization methodology that enables performance prediction in the context of architecture resources, in addition to understanding application performance and similarity. We use four different and relatively new benchmark suites, two of which have not been characterized before. We apply this technique to two distinct micro-architectures to show that the characterization is consistent across platforms and can be used to accurately and optimally map applications to a machine in a testbed of available platforms.
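
One plausible reading of the methodology, sketched in Python: each application is summarized as a vector of microarchitecture-independent characteristics, the vectors are normalized, and similarity is Euclidean distance in that space. The feature names and values below are illustrative, not the paper's metric set.

    import math

    # Made-up characteristic vectors: [branch_ratio, mem_ratio, reuse_dist, ilp]
    apps = {
        "miniFE": [0.08, 0.35, 120.0, 2.1],
        "HPCCG":  [0.07, 0.38, 110.0, 2.0],
        "bzip2":  [0.15, 0.22,  40.0, 1.4],
    }

    def normalized(apps):
        """Min-max normalize each characteristic across applications."""
        cols = list(zip(*apps.values()))
        lo = [min(c) for c in cols]
        hi = [max(c) for c in cols]
        return {a: [(v - l) / (h - l) if h > l else 0.0
                    for v, l, h in zip(vec, lo, hi)]
                for a, vec in apps.items()}

    norm = normalized(apps)
    # Similar applications land close together in characteristic space.
    print(math.dist(norm["miniFE"], norm["HPCCG"]))  # small distance
    print(math.dist(norm["miniFE"], norm["bzip2"]))  # large distance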


IEEE International Performance, Computing, and Communications Conference (IPCCC) | 2014

Accurate statistical performance modeling and validation of out-of-order processors using Monte Carlo methods

Waleed Alkohlani; Jeanine Cook; Jonathan E. Cook

Although simulation is an indispensable tool in computer architecture research and development, there is a pressing need for new modeling techniques that improve simulation speeds while maintaining accuracy and robustness. It is no longer practical to use only cycle-accurate processor simulation (the dominant simulation method) for design space and performance studies due to its extremely slow speed. To address this and other problems of cycle-accurate simulation, we propose a fast and accurate statistical modeling methodology based on Monte Carlo methods to model the performance of modern out-of-order processors. Using these statistical models, simulation and performance prediction can be achieved in seconds regardless of the modeled application's size. This paper presents the proposed methodology and its first application to model a modern out-of-order execution processor. We present a statistical model for the Opteron (Magny-Cours) processor and validate it against real hardware. Using SPEC CPU2006 and Mantevo benchmarks, the model can predict performance in terms of cycles-per-instruction within 4.79% of actual on average. We also present a novel method for generating CPI stacks, which are CPI representations that quantify the contribution of individual performance components to the total CPI. To further validate these CPI stacks, we use a detailed processor simulator, build a statistical model of the simulator architecture, validate the model against the simulator, and then proceed to validate the CPI stacks predicted by our statistical model. The average CPI prediction error is 5.6%, and the average difference between the predicted and measured CPI components is 1.3%, with a maximum difference of 5.4%.
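
A CPI stack is simply the total CPI broken into additive per-cause components; the short Python sketch below assembles and renders one from placeholder values (not the paper's measured results).

    # Placeholder per-component CPI values (illustrative only).
    cpi_components = {
        "base":       0.40,   # ideal issue-limited CPI
        "L2_miss":    0.25,
        "L3_miss":    0.20,
        "br_mispred": 0.10,
        "other":      0.05,
    }

    total_cpi = sum(cpi_components.values())
    print(f"total CPI: {total_cpi:.2f}")
    for name, c in sorted(cpi_components.items(), key=lambda kv: -kv[1]):
        bar = "#" * round(40 * c / total_cpi)
        print(f"{name:>11}: {c:.2f} {bar}")   # text rendering of the CPI stack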

Collaboration


Dive into Jeanine Cook's collaborations.

Top Co-Authors

Ram Srinivasan
New Mexico State University

Waleed Alkohlani
New Mexico State University

Eric E. Johnson
New Mexico State University

Jonathan E. Cook
New Mexico State University

Olaf M. Lubeck
Los Alamos National Laboratory

Richard L. Oliver
New Mexico State University

Mustafa Elfituri
New Mexico State University

S. Ramanathan
New Mexico State University

Santosh Talli
New Mexico State University

Scott Pakin
Los Alamos National Laboratory