Roland E. Wunderlich | Researchain

Publication

Featured research published by Roland E. Wunderlich.

international symposium on computer architecture | 2003

SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling

Roland E. Wunderlich; Thomas F. Wenisch; Babak Falsafi; James C. Hoe

Current software-based microarchitecture simulators are many orders of magnitude slower than the hardware they simulate. Hence, most microarchitecture design studies draw their conclusions from drastically truncated benchmark simulations that are often inaccurate and misleading. This paper presents the Sampling Microarchitecture Simulation (SMARTS) framework as an approach to enable fast and accurate performance measurements of full-length benchmarks. SMARTS accelerates simulation by selectively measuring in detail only an appropriate benchmark subset. SMARTS prescribes a statistically sound procedure for configuring a systematic sampling simulation run to achieve a desired quantifiable confidence in estimates. Analysis of 41 of the 45 possible SPEC2K benchmark/input combinations shows that CPI and energy per instruction (EPI) can be estimated to within ±3% with 99.7% confidence by measuring fewer than 50 million instructions per benchmark. In practice, inaccuracy in microarchitectural state initialization introduces an additional uncertainty which we empirically bound to ∼2% for the tested benchmarks. Our implementation of SMARTS achieves an actual average error of only 0.64% on CPI and 0.59% on EPI for the tested benchmarks, running with average speedups of 35 and 60 over detailed simulation of 8-way and 16-way out-of-order processors, respectively.
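The idea of sizing a systematic sample for a target confidence can be illustrated with a minimal sketch. It uses the standard large-population formula n ≥ (z·V/ε)², where V is the coefficient of variation and ε the relative error; this is textbook sampling theory and not necessarily the exact procedure of the paper. All function names and the synthetic CPI values are assumptions made for illustration.

```python
import math
import random
import statistics

Z_997 = 3.0  # z-score for roughly 99.7% confidence (normal approximation)


def required_sample_size(coeff_var, rel_error, z=Z_997):
    """Smallest n with z * V / sqrt(n) <= rel_error (large-population approximation)."""
    return math.ceil((z * coeff_var / rel_error) ** 2)


def systematic_sample(values, n, offset=0):
    """Pick n units at a fixed stride k = len(values) // n, starting at `offset`."""
    k = max(1, len(values) // n)
    return values[offset::k][:n]


def estimate_mean(sample, z=Z_997):
    """Return (sample mean, relative half-width of the confidence interval)."""
    mean = statistics.fmean(sample)
    coeff_var = statistics.stdev(sample) / mean
    return mean, z * coeff_var / math.sqrt(len(sample))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic per-unit CPI values; purely illustrative, not benchmark data.
    population = [max(0.1, rng.gauss(1.5, 0.6)) for _ in range(100_000)]

    # A small pilot sample gives a first estimate of the coefficient of variation.
    pilot = systematic_sample(population, 1_000)
    v_hat = statistics.stdev(pilot) / statistics.fmean(pilot)

    # Size the real sample for +/-3% relative error at ~99.7% confidence.
    n = required_sample_size(v_hat, 0.03)
    mean, half_width = estimate_mean(systematic_sample(population, n))
    print(f"n={n}, estimated CPI={mean:.3f}, relative half-width={half_width:.2%}")
```

The pilot-then-resize step reflects a general practice: V is unknown before any measurement, so an initial sample is needed to choose n.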

international symposium on microarchitecture | 2006

SimFlex: Statistical Sampling of Computer System Simulation

Thomas F. Wenisch; Roland E. Wunderlich; Michael Ferdman; Anastassia Ailamaki; Babak Falsafi; James C. Hoe

Timing-accurate full-system multiprocessor simulations can take years because of architecture and application complexity. Statistical sampling makes simulation-based studies feasible by providing ten-thousand-fold reductions in simulation runtime and enabling thousand-way simulation parallelism.

measurement and modeling of computer systems | 2004

SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture

Nikolaos Hardavellas; Stephen Somogyi; Thomas F. Wenisch; Roland E. Wunderlich; Shelley Chen; Jangwoo Kim; Babak Falsafi; James C. Hoe; Andreas G. Nowatzyk

The new focus on commercial workloads in simulation studies of server systems has caused a drastic increase in the complexity and decrease in the speed of simulation tools. The complexity of a large-scale full-system model makes development of a monolithic simulation tool a prohibitively difficult task. Furthermore, detailed full-system models simulate so slowly that experimental results must be based on simulations of only fractions of a second of execution of the modelled system. This paper presents SIMFLEX, a simulation framework which uses component-based design and rigorous statistical sampling to enable development of complex models and ensure representative measurement results with fast simulation turnaround. The novelty of SIMFLEX lies in its combination of a unique, compile-time approach to component interconnection and a methodology for obtaining accurate results from sampled simulations on a platform capable of evaluating unmodified commercial workloads.

measurement and modeling of computer systems | 2005

TurboSMARTS: accurate microarchitecture simulation sampling in minutes

Thomas F. Wenisch; Roland E. Wunderlich; Babak Falsafi; James C. Hoe

Recent research proposes accelerating processor microarchitecture simulation through statistical sampling. Prior simulation sampling approaches construct accurate model state for each measurement by continuously warming large microarchitectural structures (e.g., caches and the branch predictor) while emulating the billions of instructions between measurements. This approach, called functional warming, occupies hours of runtime while the detailed simulation that is measured requires mere minutes. To eliminate the functional warming bottleneck, we propose TurboSMARTS, a simulation framework that stores functionally-warmed state in a library of small, reusable checkpoints. TurboSMARTS enables the creation of the thousands of checkpoints necessary for accurate sampling by storing only the subset of warmed state accessed during simulation of each brief execution window. TurboSMARTS matches the accuracy of prior simulation sampling techniques (i.e., ±3% error with 99.7% confidence), while estimating the performance of an 8-way out-of-order super-scalar processor running SPEC CPU2000 in 91 seconds per benchmark, on average, using a 12 GB checkpoint library.

international symposium on performance analysis of systems and software | 2006

Simulation sampling with live-points

Thomas F. Wenisch; Roland E. Wunderlich; Babak Falsafi; James C. Hoe

Current simulation-sampling techniques construct accurate model state for each measurement by continuously warming large microarchitectural structures (e.g., caches and the branch predictor) while functionally simulating the billions of instructions between measurements. This approach, called functional warming, is the main performance bottleneck of simulation sampling and requires hours of runtime while the detailed simulation of the sample requires only minutes. Existing simulators can avoid functional simulation by jumping directly to particular instruction stream locations with architectural state checkpoints. To replace functional warming, these checkpoints must additionally provide microarchitectural model state that is accurate and reusable across experiments while meeting tight storage constraints. In this paper, we present a simulation-sampling framework that replaces functional warming with live-points without sacrificing accuracy. A live-point stores the bare minimum of functionally-warmed state for accurate simulation of a limited execution window while placing minimal restrictions on microarchitectural configuration. Live-points can be processed in random rather than program order, allowing simulation results and their statistical confidence to be reported while simulations are in progress. Our framework matches the accuracy of prior simulation-sampling techniques (i.e., ±3% error with 99.7% confidence), while estimating the performance of an 8-way out-of-order superscalar processor running SPEC CPU2000 in 91 seconds per benchmark, on average, using a 12 GB live-point library.
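Processing checkpoints in random order, with confidence reported as results arrive, can be sketched as below. This is an illustrative assumption about how such a loop might look, not the framework's actual implementation: `RunningStats`, `run_until_confident`, and the `measure` callback (standing in for a detailed simulation of one checkpoint) are hypothetical names. Running statistics use Welford's online algorithm.

```python
import math
import random


class RunningStats:
    """Welford's online algorithm for the mean and sample variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def rel_half_width(self, z=3.0):
        """Relative confidence half-width z * V / sqrt(n); infinite until n >= 2."""
        if self.n < 2 or self.mean == 0:
            return math.inf
        std = math.sqrt(self._m2 / (self.n - 1))
        return z * (std / self.mean) / math.sqrt(self.n)


def run_until_confident(checkpoints, measure, target=0.03, min_samples=30, seed=0):
    """Measure checkpoints in shuffled order; stop once the target is reached.

    `measure` is a hypothetical callback returning one metric value (e.g. CPI)
    for a single checkpoint.
    """
    order = list(checkpoints)
    random.Random(seed).shuffle(order)  # random order, not program order
    stats = RunningStats()
    for checkpoint in order:
        stats.add(measure(checkpoint))
        # Because the order is random, every prefix is itself a random sample,
        # so the running estimate and its confidence are meaningful at any time.
        if stats.n >= min_samples and stats.rel_half_width() <= target:
            break
    return stats
```

The `min_samples` floor is a design choice of this sketch: it avoids stopping on a misleadingly tight interval computed from only a handful of measurements.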

international conference on computer design | 2004

In-system FPGA prototyping of an Itanium microarchitecture

Roland E. Wunderlich; James C. Hoe

In this paper, we describe an effort to prototype an Itanium microarchitecture using an FPGA. The microarchitecture model is written in the Bluespec hardware description language (HDL) and supports a subset of the Itanium instruction set architecture. The microarchitecture model includes details such as multi-bundle instruction fetch, decode and issue; parallel pipelined execution units with scoreboarding and predicated bypassing; and multiple levels of cache hierarchies. The microarchitecture model is synthesized and prototyped on a special FPGA card that allows the processor model to interface directly to the memory bus of a host PC. This is an effort toward developing a flexible microprocessor prototyping framework for rapid design exploration.

field programmable gate arrays | 2004

In-system FPGA prototyping of an Itanium microarchitecture

Roland E. Wunderlich; James C. Hoe

This work is part of our ongoing effort to prototype an Itanium microarchitecture on an FPGA. To conserve time and effort in model development, we described our microarchitecture in Bluespec, a synthesizable high-level hardware description language. The microarchitecture model currently supports a subset of the Itanium instruction set architecture (ISA). The model includes details such as multi-bundle instruction fetch, decode and issue; parallel pipelined execution units with scoreboarding and bypassing; and multiple levels of cache hierarchies. The microarchitecture model is synthesized and prototyped on an FPGA that interfaces directly to the memory bus of a host PC. The prototyped microprocessor core executes the supported ISA subset at 100 MHz and directly references the host PC's DRAM and I/O resources through the memory bus at up to 800 MB/s of bandwidth. This effort is a first step toward developing a convenient in-system microprocessor prototyping platform capable of executing realistic full-scale applications and operating systems.

international parallel and distributed processing symposium | 2006

Statistical sampling of microarchitecture simulation

Thomas F. Wenisch; Roland E. Wunderlich; Babak Falsafi; James C. Hoe

Current software-based microarchitecture simulators are many orders of magnitude slower than the hardware they simulate. Hence, most microarchitecture design studies draw their conclusions from drastically truncated benchmark simulations that are often inaccurate and misleading. The Sampling Microarchitecture Simulation (SMARTS) framework is an approach to enable fast and accurate performance measurements of full-length benchmarks. SMARTS accelerates simulation by selectively measuring in detail only an appropriate benchmark subset. SMARTS prescribes a statistically sound procedure for configuring a systematic sampling simulation run to achieve a desired quantifiable confidence in estimates. Analysis of the SPEC CPU2000 benchmark suite shows that CPI can be estimated to within ±3% with 99.7% confidence by measuring fewer than 50 million instructions per benchmark. In practice, inaccuracy in microarchitectural state initialization introduces an additional uncertainty which we empirically bound to ∼2% for the tested benchmarks. We present two implementations of SMARTS that both achieve an average error of only 0.64% on CPI. SMARTSim constructs accurate model state through functional warming - continuously warming large microarchitectural structures (e.g., caches and the branch predictor) while functionally simulating the billions of instructions between measurements - reducing average simulation turnaround from 5.5 days to 7.0 hours. TurboSMARTSim replaces functional warming with live-points - checkpoints that store a bare minimum of functionally-warmed state for accurate simulation of a limited execution window - further reducing average turnaround to 91 seconds.

Proceedings of the IEEE Workshop on Duplicating, Deconstructing and Debunking | 2004