
Publication


Featured research published by K. De Bosschere.


International Symposium on Computer Architecture | 2004

Control Flow Modeling in Statistical Simulation for Accurate and Efficient Processor Design Studies

Lieven Eeckhout; Robert H. Bell; B. Stougie; K. De Bosschere; Lizy Kurian John

Designing a new microprocessor is extremely time-consuming, in large part because computer designers rely heavily on detailed, and therefore slow, architectural simulations. Recent work has focused on statistical simulation to address this issue. The basic idea of statistical simulation is to measure characteristics during program execution, generate a synthetic trace with those characteristics, and then simulate the synthetic trace. The statistically generated synthetic trace is orders of magnitude smaller than the original program sequence and hence results in significantly faster simulation. This paper makes the following contributions to the statistical simulation methodology. First, we propose the use of a statistical flow graph to characterize the control flow of a program execution. Second, we model delayed update of branch predictors while profiling program execution characteristics. Experimental results show that statistical simulation using this improved control flow modeling attains significantly better accuracy than the previously proposed HLS system. We evaluate both the absolute and the relative accuracy of our approach for power/performance modeling of superscalar microarchitectures. The results show that our statistical simulation framework can be used to efficiently explore processor design spaces.
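
A minimal sketch of this flow under assumed inputs: profile a basic-block trace into a statistical flow graph (per-block successor probabilities), then random-walk that graph to emit a far shorter synthetic trace. The toy trace and function names are illustrative, not the paper's actual profiling infrastructure.

```python
import random
from collections import defaultdict

def build_statistical_flow_graph(block_trace):
    """Count transitions between basic blocks and normalize them into
    per-block successor probabilities (the statistical flow graph)."""
    counts = defaultdict(lambda: defaultdict(int))
    for src, dst in zip(block_trace, block_trace[1:]):
        counts[src][dst] += 1
    sfg = {}
    for src, succs in counts.items():
        total = sum(succs.values())
        sfg[src] = [(dst, n / total) for dst, n in succs.items()]
    return sfg

def synthesize_trace(sfg, start, length, seed=0):
    """Random-walk the statistical flow graph to generate a synthetic
    block trace that is far shorter than the original execution."""
    rng = random.Random(seed)
    trace, block = [start], start
    for _ in range(length - 1):
        succs = sfg.get(block)
        if not succs:                      # dead end: restart the walk
            block = start
        else:
            dsts, probs = zip(*succs)
            block = rng.choices(dsts, probs)[0]
        trace.append(block)
    return trace

# Toy profile: a loop (A -> B -> A) that occasionally exits to C.
profile = ["A", "B"] * 50 + ["C"]
sfg = build_statistical_flow_graph(profile)
print(synthesize_trace(sfg, "A", 10))
```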


International Conference on Parallel Architectures and Compilation Techniques | 2002

Workload design: selecting representative program-input pairs

Lieven Eeckhout; Hans Vandierendonck; K. De Bosschere

Having a representative workload of the target domain of a microprocessor is extremely important throughout its design. The composition of a workload involves two issues: (i) which benchmarks to select and (ii) which input data sets to select per benchmark. Unfortunately, it is impossible to select a huge number of benchmarks and respective input sets due to the large instruction counts per benchmark and due to limitations on the available simulation time. We use statistical data analysis techniques such as principal component analysis (PCA) and cluster analysis to efficiently explore the workload space. Within this workload space, different input data sets for a given benchmark can be displayed; a distance can be measured between program-input pairs that reflects their mutual behavioral differences; and representative input data sets can be selected for the given benchmark. This methodology is validated by showing that program-input pairs that are close to each other in this workload space indeed exhibit similar behavior. The final goal is to select a limited set of representative benchmark-input pairs that span the complete workload space. Beyond workload composition, there are a number of other possible applications, namely gaining insight into the impact of input data sets on program behavior and profile-guided compiler optimizations.
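
A sketch of the selection step, assuming each program-input pair has already been reduced to a vector of microarchitecture-independent characteristics; scikit-learn's PCA and k-means stand in for the paper's exact statistical procedure, and the feature data is randomly generated for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Each row: one program-input pair described by characteristics such as
# instruction mix, branch behavior, and locality metrics (invented here).
rng = np.random.default_rng(0)
features = rng.normal(size=(12, 6))        # 12 pairs, 6 characteristics
labels = [f"bench{i // 3}-input{i % 3}" for i in range(12)]

# Normalize, project into a low-dimensional workload space, then cluster.
X = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(features))
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Per cluster, pick the pair closest to the centroid as its representative.
for c in range(km.n_clusters):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
    print(f"cluster {c}: representative = {labels[members[np.argmin(dists)]]}")
```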


IEEE Micro | 2003

Statistical simulation: adding efficiency to the computer designer's toolbox

Lieven Eeckhout; S. Nussbaum; James E. Smith; K. De Bosschere

Statistical simulation enables quick and accurate design decisions in the early stages of computer design, at the processor and system levels. It complements detailed but slower architectural simulations, reducing total design time and cost.


High Performance Computer Architecture | 2001

Differential FCM: increasing value prediction accuracy by improving table usage efficiency

Bart Goeman; Hans Vandierendonck; K. De Bosschere

Value prediction is a relatively new technique to increase the instruction-level parallelism (ILP) in future microprocessors. An important problem when designing a value predictor is efficiency: an accurate predictor requires huge prediction tables. This is especially the case for the finite context method (FCM) predictor, the most accurate one. In this paper, we show that the prediction accuracy of the FCM can be greatly improved by making the FCM predict strides instead of values. This new predictor is called the differential finite context method (DFCM) predictor. The DFCM predictor outperforms a similar FCM predictor by as much as 33%, depending on the prediction table size. If we take the additional storage into account, the difference is still 15% for realistic predictor sizes. We use several metrics to show that the key to this success is reduced aliasing in the level-2 table. We also show that the DFCM is superior to hybrid predictors based on FCM and stride predictors, since its prediction accuracy is higher than that of a hybrid predictor using a perfect meta-predictor.
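
A minimal sketch of the DFCM mechanism, with illustrative table sizes and hashing rather than the paper's configuration: the level-1 table tracks each instruction's last value and recent stride history, and the level-2 table maps that history to a predicted next stride.

```python
class DFCMPredictor:
    """Differential finite context method sketch: predict the next stride
    from the recent stride history, then add it to the last seen value."""

    def __init__(self, order=2, l2_size=1024):
        self.order, self.l2_size = order, l2_size
        self.l1 = {}                 # pc -> (last_value, stride_history)
        self.l2 = {}                 # hashed stride history -> next stride

    def _index(self, history):
        return hash(history) % self.l2_size

    def predict(self, pc):
        last, hist = self.l1.get(pc, (0, (0,) * self.order))
        return last + self.l2.get(self._index(hist), 0)

    def update(self, pc, value):
        last, hist = self.l1.get(pc, (0, (0,) * self.order))
        stride = value - last
        self.l2[self._index(hist)] = stride          # learn the context
        self.l1[pc] = (value, hist[1:] + (stride,))  # shift the history

# A strided sequence becomes predictable after a short warm-up.
p = DFCMPredictor()
for v in [10, 13, 16, 19, 22]:
    print("predicted:", p.predict(0x40), "actual:", v)
    p.update(0x40, v)
```

Storing strides rather than raw values is what reduces level-2 aliasing: many different value streams share the same small set of stride patterns, so they map onto far fewer distinct table entries.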


International Symposium on Performance Analysis of Systems and Software | 2000

Performance analysis through synthetic trace generation

Lieven Eeckhout; K. De Bosschere; Henk Neefs

Most research in the area of microarchitectural performance analysis is done using trace-driven simulations. Although trace-driven simulations are fairly accurate, they are both time- and space-consuming, which sometimes makes them impractical. Modeling the execution of a computer program by a statistical profile and generating a synthetic benchmark trace from this statistical profile can be used to accelerate the design process. Thanks to the statistical nature of this technique, performance characteristics quickly converge to a steady-state solution during simulation, which makes this technique suitable for fast design space explorations. In this paper, it is shown how more detailed statistical profiles can be obtained and how the synthetic trace generation mechanism should be designed to generate syntactically correct benchmark traces. As a result, the performance predictions in this paper are far more accurate than those reported in previous research.
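
To make the "syntactically correct" requirement concrete, here is a toy sketch that samples instruction types and dependency distances from a made-up statistical profile, clamping each dependency so it points at an instruction that actually precedes its consumer; the profile format is an assumption, not the paper's.

```python
import random

# Made-up statistical profile: instruction-type mix plus a distribution of
# dependency distances (how many instructions upstream an operand is made).
PROFILE = {
    "mix": {"alu": 0.60, "load": 0.25, "branch": 0.15},
    "dep_distance": {1: 0.50, 2: 0.30, 4: 0.15, 8: 0.05},
}

def synthetic_trace(profile, length, seed=1):
    """Sample a syntactically correct synthetic trace: every dependency is
    clamped so the producing instruction really precedes the consumer."""
    rng = random.Random(seed)
    ops, op_w = zip(*profile["mix"].items())
    dist, dist_w = zip(*profile["dep_distance"].items())
    trace = []
    for i in range(length):
        dep = None
        if i > 0:
            dep = i - min(rng.choices(dist, dist_w)[0], i)
        trace.append({"op": rng.choices(ops, op_w)[0], "dep": dep})
    return trace

for instr in synthetic_trace(PROFILE, 6):
    print(instr)
```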


International Symposium on Signal Processing and Information Technology | 2005

DIABLO: a reliable, retargetable and extensible link-time rewriting framework

L. Van Put; Dominique Chanet; B. De Bus; B. De Sutter; K. De Bosschere

Modern software engineering techniques introduce overhead into programs in terms of performance and code size. A traditional development environment, where only the compiler optimizes the code, cannot completely eliminate this overhead. To effectively remove the overhead, tools are needed that have a whole-program overview. Link-time binary rewriting is an effective technique for whole-program optimization and instrumentation. In this paper, we describe a novel framework to reliably perform link-time program transformations. This framework is designed to be retargetable, supporting multiple architectures and development toolchains. Furthermore, it is extensible, which we illustrate by describing three different applications built on top of the framework.
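
DIABLO itself is a C framework with a far richer interface; purely as an illustration of the extensibility idea (independently written passes sharing one whole-program view), a hypothetical skeleton might look like this, with all names invented.

```python
from dataclasses import dataclass, field

@dataclass
class BasicBlock:
    addr: int
    instructions: list = field(default_factory=list)
    successors: list = field(default_factory=list)

class LinkTimeRewriter:
    """Hypothetical skeleton: the core lifts the whole program into a CFG
    once, and pluggable passes (optimizers, instrumenters) transform it."""

    def __init__(self, cfg):
        self.cfg = cfg                    # {addr: BasicBlock}
        self.passes = []

    def register(self, rewrite_pass):
        self.passes.append(rewrite_pass)

    def run(self):
        for p in self.passes:
            p(self.cfg)
        return self.cfg

def remove_unreachable(cfg, entry=0):
    """Example pass: the whole-program view lets us drop dead blocks."""
    seen, work = set(), [entry]
    while work:
        addr = work.pop()
        if addr in seen:
            continue
        seen.add(addr)
        work.extend(cfg[addr].successors)
    for addr in list(cfg):
        if addr not in seen:
            del cfg[addr]

cfg = {0: BasicBlock(0, ["call 4"], [4]),
       4: BasicBlock(4, ["ret"], []),
       8: BasicBlock(8, ["nop"], [])}    # dead code: nothing reaches 8
rw = LinkTimeRewriter(cfg)
rw.register(remove_unreachable)
print(sorted(rw.run()))                  # -> [0, 4]
```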


IEEE Computer | 2003

Designing computer architecture research workloads

Lieven Eeckhout; Hans Vandierendonck; K. De Bosschere

Although architectural simulators model microarchitectures at a high abstraction level, the increasing complexity of both the microarchitectures themselves and the applications that run on them makes simulator use extremely time-consuming. Simulators must execute huge numbers of instructions to create a workload representative of real applications, creating an unreasonably long simulation time and stretching the time to market. Using reduced input sets instead of reference input sets helps to solve this problem. The authors have developed a methodology that reliably quantifies program behavior similarity to verify whether reduced input sets result in program behavior similar to that of the reference inputs.


International Conference on Parallel Architectures and Compilation Techniques | 2001

Hybrid Analytical-Statistical Modeling for Efficiently Exploring Architecture and Workload Design Spaces

Lieven Eeckhout; K. De Bosschere

Microprocessor design time and effort are becoming impractical due to the huge number of simulations that need to be run to evaluate various processor configurations for various workloads. An early-design-stage methodology could be useful to efficiently cull huge design spaces and identify regions of interest to be further explored using more accurate simulations. The authors present an early-design-stage method that bridges the gap between analytical and statistical modeling. The hybrid analytical-statistical method presented is based on the observation that register traffic characteristics exhibit power-law properties, which allows us to fully characterize a workload with just a few parameters; this is much more efficient than the collection of distributions that need to be specified in classical statistical modeling. We evaluate the applicability and the usefulness of this hybrid analytical-statistical modeling technique to efficiently and accurately cull huge architectural design spaces. In addition, we demonstrate that this hybrid analytical-statistical modeling technique can be used to explore the entire workload space by varying just a few workload parameters.
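
A sketch of the power-law observation, on invented numbers: fit the two parameters of P(d) ~ c * d^(-beta) to a measured register dependence-distance distribution by log-log regression, after which those two numbers stand in for the whole distribution.

```python
import numpy as np

# Invented measurement: fraction of register operands whose producing
# instruction is d instructions upstream in the dynamic stream.
distances = np.array([1, 2, 4, 8, 16, 32])
fractions = np.array([0.46, 0.24, 0.13, 0.08, 0.05, 0.03])

# Power-law fit in log-log space: log P = log c - beta * log d.
slope, log_c = np.polyfit(np.log(distances), np.log(fractions), 1)
beta, c = -slope, np.exp(log_c)
print(f"workload characterized by two parameters: beta={beta:.2f}, c={c:.2f}")

# Any point of the distribution can be regenerated from (c, beta) alone,
# instead of storing and specifying the complete measured distribution.
print(np.round(c * distances.astype(float) ** -beta, 3))
```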


Design, Automation and Test in Europe | 2006

Efficient Design Space Exploration of High Performance Embedded Out-of-Order Processors

Stijn Eyerman; Lieven Eeckhout; K. De Bosschere

Previous work on efficient customized processor design primarily focused on in-order architectures. However, with the recent introduction of out-of-order processors for high-end, high-performance embedded applications, researchers and designers need to address how to automate the design process of customized out-of-order processors. Because of the parallel execution of independent instructions in out-of-order processors, in-order design methodologies that subdivide the search space into independent components are unlikely to be effective in terms of accuracy for designing out-of-order processors. In this paper we propose and evaluate various automated single- and multi-objective optimizations for exploring out-of-order processor designs. We conclude that the newly proposed genetic local search algorithm outperforms all other search algorithms in terms of accuracy. In addition, we propose two-phase simulation, in which the first phase explores the design space through statistical simulation; a region of interest is then simulated through detailed simulation in the second phase. We show that simulation time speedups of a factor of 2.2x to 7.3x can be obtained using two-phase simulation.
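
A toy sketch of genetic local search over a hypothetical three-parameter design space, with a stand-in cost function in place of simulation; the operators (crossover, hill-climbing refinement, replacement) follow the general scheme, not necessarily the paper's exact algorithm.

```python
import random

rng = random.Random(42)

# Hypothetical design space: a configuration is (ROB size, issue width, L1 kB).
SPACE = [(rob, width, l1) for rob in (32, 64, 128)
         for width in (2, 4, 8)
         for l1 in (16, 32, 64)]

def cost(cfg):
    """Stand-in for simulation: score a configuration (lower is better).
    A real flow would return, e.g., energy-delay product from statistical
    or detailed simulation of the candidate processor."""
    rob, width, l1 = cfg
    return rob * 0.01 + width * 1.5 + l1 * 0.05 - (rob * width) ** 0.5 * 0.2

def neighbors(cfg):
    """Configurations differing from cfg in exactly one parameter."""
    return [c for c in SPACE if sum(a != b for a, b in zip(c, cfg)) == 1]

def genetic_local_search(pop_size=6, generations=10):
    pop = rng.sample(SPACE, pop_size)
    for _ in range(generations):
        # Crossover: splice two parent configurations into an offspring.
        p1, p2 = rng.sample(pop, 2)
        child = tuple(rng.choice(genes) for genes in zip(p1, p2))
        # Local search: hill-climb the offspring to a local optimum.
        while True:
            best = min([child] + neighbors(child), key=cost)
            if best == child:
                break
            child = best
        # Replacement: the refined child evicts the worst individual.
        worst = max(pop, key=cost)
        if cost(child) < cost(worst):
            pop[pop.index(worst)] = child
    return min(pop, key=cost)

print("best configuration found:", genetic_local_search())
```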


Software: Practice and Experience | 2004

JaRec: a portable record/replay environment for multi-threaded Java applications

Andy Georges; Mark Christiaens; Michiel Ronsse; K. De Bosschere

This paper describes JaRec, a portable record/replay system for Java. It correctly replays multi-threaded, data-race-free Java applications by recording the order of synchronization operations, and by executing them in the same order during replay. The record/replay infrastructure is developed in Java, and does not require a modification of the Java Virtual Machine (JVM) if it provides the JVM Profiler Interface (JVMPI). If the JVM does not support JVMPI, which is used for intercepting the loaded classes, only a minor modification to the JVM is required in order to run the system. On systems with limited memory resources, JaRec can be executed in a distributed fashion. This also makes it suitable to aid debugging of multi-threaded applications on embedded systems.
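
A minimal Python illustration of the record/replay principle (JaRec itself instruments Java monitor operations through JVMPI; everything below is an invented stand-in): record the lock-acquisition order on the first run, then enforce that same order on replay.

```python
import threading

class ReplayLock:
    """While recording, log the order in which threads acquire the lock;
    while replaying, stall each acquire until the recorded schedule says
    it is that thread's turn, reproducing the original interleaving."""

    def __init__(self, log=None):
        self._lock = threading.Lock()
        self._cond = threading.Condition()
        self.replaying = log is not None
        self.log = [] if log is None else log
        self._next = 0                     # index into the recorded schedule

    def acquire(self):
        me = threading.current_thread().name
        if self.replaying:
            with self._cond:
                self._cond.wait_for(lambda: self.log[self._next] == me)
        self._lock.acquire()
        if not self.replaying:
            self.log.append(me)            # record the acquisition order

    def release(self):
        self._lock.release()
        if self.replaying:
            with self._cond:
                self._next += 1            # hand the turn to the next thread
                self._cond.notify_all()

def worker(lock, order):
    lock.acquire()
    order.append(threading.current_thread().name)  # the critical section
    lock.release()

def run(lock):
    order = []
    threads = [threading.Thread(target=worker, args=(lock, order), name=f"t{i}")
               for i in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order

recorder = ReplayLock()
recorded = run(recorder)                   # an arbitrary thread interleaving
replayed = run(ReplayLock(log=recorder.log))
assert recorded == replayed                # the same order, reproduced
print(recorded)
```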

Collaboration


Dive into K. De Bosschere's collaborations.

Top Co-Authors

Bart Demoen

Katholieke Universiteit Leuven
