Publication


Featured research published by Erez Perelman.


architectural support for programming languages and operating systems | 2002

Automatically characterizing large scale program behavior

Timothy Sherwood; Erez Perelman; Greg Hamerly; Brad Calder

Understanding program behavior is at the foundation of computer architecture and program optimization. Many programs have wildly different behavior on even the very largest of scales (over the complete execution of the program). This realization has ramifications for many architectural and compiler techniques, from thread scheduling, to feedback-directed optimizations, to the way programs are simulated. However, in order to take advantage of time-varying behavior, we must first develop the analytical tools necessary to automatically and efficiently analyze program behavior over large sections of execution. Our goal is to develop automatic techniques that are capable of finding and exploiting the Large Scale Behavior of programs (behavior seen over billions of instructions). The first step towards this goal is the development of a hardware-independent metric that can concisely summarize the behavior of an arbitrary section of execution in a program. To this end we examine the use of Basic Block Vectors. We quantify the effectiveness of Basic Block Vectors in capturing program behavior across several different architectural metrics, explore the large scale behavior of several programs, and develop a set of clustering-based algorithms capable of analyzing this behavior. We then demonstrate an application of this technology to automatically determine where to simulate within a program to help guide computer architecture research.
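As a rough illustration of the approach (not the authors' SimPoint code), the sketch below builds normalized Basic Block Vectors from per-interval execution counts, randomly projects them to a low dimension, and clusters them with k-means. The interval representation, the 15-dimension projection, and all function names are assumptions for illustration:

```python
# Minimal sketch of Basic Block Vector (BBV) clustering in the spirit of
# the paper; sizes and names here are illustrative, not the authors' code.
import numpy as np
from sklearn.cluster import KMeans

def make_bbvs(interval_traces, num_blocks):
    """Build one normalized BBV per execution interval.

    interval_traces: list of dicts mapping basic-block id -> times executed
                     (the paper also weights each count by block size).
    """
    bbvs = np.zeros((len(interval_traces), num_blocks))
    for i, counts in enumerate(interval_traces):
        for block_id, count in counts.items():
            bbvs[i, block_id] = count
        bbvs[i] /= bbvs[i].sum()          # normalize so intervals are comparable
    return bbvs

def project(bbvs, dims=15, seed=0):
    """Random projection keeps the clustering step cheap."""
    rng = np.random.default_rng(seed)
    return bbvs @ rng.uniform(-1, 1, size=(bbvs.shape[1], dims))

def cluster_intervals(bbvs, k):
    """Group intervals with similar code signatures into phases."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(project(bbvs))
```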


international conference on parallel architectures and compilation techniques | 2001

Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications

Timothy Sherwood; Erez Perelman; Brad Calder

Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry standard benchmark can take weeks to months to complete. To overcome this problem, researchers choose a very small portion of a program's execution to evaluate their results, rather than simulating the entire program. In this paper we propose Basic Block Distribution Analysis as an automated approach for finding these small portions of the program to simulate that are representative of the entire program's execution. This approach is based upon using profiles of a program's code structure (basic blocks) to uniquely identify different phases of execution in the program. We show that the periodicity of the basic block frequency profile reflects the periodicity of detailed simulation across several different architectural metrics (e.g., IPC, branch miss rate, cache miss rate, value misprediction, address misprediction, and reorder buffer occupancy). Since basic block frequencies can be collected using very fast profiling tools, our approach provides a practical technique for finding the periodicity and simulation points in applications.
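The periodicity analysis can be approximated offline as below; the paper's actual analysis differs in detail, so the distance signal and the autocorrelation step here are illustrative assumptions:

```python
# Hedged sketch: turn the per-interval BBVs into a 1-D distance signal and
# look for the lag at which the signal best repeats itself.
import numpy as np

def bbv_distance_signal(bbvs):
    """Manhattan distance of each interval's BBV from the whole-program BBV."""
    target = bbvs.mean(axis=0)
    return np.abs(bbvs - target).sum(axis=1)

def dominant_period(signal):
    """Lag (in intervals) with the strongest autocorrelation."""
    s = signal - signal.mean()
    ac = np.correlate(s, s, mode="full")[len(s) - 1:]  # ac[0] is lag 0
    return int(np.argmax(ac[1:])) + 1                  # best lag >= 1
```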


international symposium on microarchitecture | 2003

Discovering and exploiting program phases

Timothy Sherwood; Erez Perelman; Greg Hamerly; Suleyman Sair; Brad Calder

Understanding program behavior is at the foundation of computer architecture and program optimization. Many programs have wildly different behavior on even the largest of scales (that is, over the program's complete execution). During one part of the execution, a program can be completely memory bound; in another, it can repeatedly stall on branch mispredictions. Average statistics gathered about a program might not accurately portray where the real problems lie. This realization has ramifications for many architecture and compiler techniques, from how to best schedule threads on a multithreaded machine, to feedback-directed optimizations, power management, and the simulation and test of architectures. Taking advantage of time-varying behavior requires a set of automated analytic tools and hardware techniques that can discover similarities and changes in program behavior on the largest of time scales. The challenge in building such tools is that during a program's lifetime it can execute billions or trillions of instructions. How can high-level behavior be extracted from this sea of instructions? Some programs change behavior drastically, switching between periods of high and low performance, yet system design and optimization typically focus on average system behavior. We argue that instead of assuming average behavior, it is now time to model and optimize phase-based program behavior.
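A minimal sketch of the phase-change detection idea, assuming the offline BBV representation from the earlier sketch rather than the hardware tracking the paper describes; the threshold is an assumed parameter:

```python
# Flag a phase change when consecutive interval signatures differ by more
# than a threshold (illustrative offline version of the idea).
import numpy as np

def phase_changes(bbvs, threshold=0.5):
    """Indices where the code signature shifts enough to call it a new phase."""
    deltas = np.abs(np.diff(bbvs, axis=0)).sum(axis=1)  # Manhattan distance
    return np.flatnonzero(deltas > threshold) + 1
```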


international conference on parallel architectures and compilation techniques | 2003

Picking statistically valid and early simulation points

Erez Perelman; Greg Hamerly; Brad Calder

Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry standard benchmark can take weeks to months to complete. To address this issue we have recently proposed using simulation points (found by only examining basic block execution frequency profiles) to increase the efficiency and accuracy of simulation. Simulation points are a small set of execution samples that, when combined, represent the complete execution of the program. We present a statistically driven algorithm for forming clusters from which simulation points are chosen, and examine algorithms for picking simulation points earlier in a program's execution, in order to significantly reduce fast-forwarding time during simulation. In addition, we show that simulation points can be used independently of the underlying architecture. The points are generated once for a program/input pair by only examining the code executed. We show the points accurately track hardware metrics (e.g., performance and cache miss rates) between different architecture configurations. They can therefore be used across different architecture configurations to allow a designer to make accurate trade-off decisions between different configurations.
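One way to sketch the early-points idea: within each cluster, instead of taking the interval closest to the centroid, take the earliest interval that is nearly as representative, which cuts fast-forwarding time. The slack tolerance below is an assumed, illustrative parameter, not the paper's algorithm:

```python
# Hedged sketch of picking early simulation points from a clustering.
import numpy as np

def early_simulation_points(bbvs, labels, slack=1.05):
    """For each cluster, return the earliest 'representative enough' interval."""
    points = {}
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)            # intervals in this cluster
        centroid = bbvs[idx].mean(axis=0)
        dists = np.linalg.norm(bbvs[idx] - centroid, axis=1)
        ok = idx[dists <= dists.min() * slack]       # close to the best match
        points[c] = int(ok.min())                    # earliest such interval
    return points
```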


measurement and modeling of computer systems | 2003

Using SimPoint for accurate and efficient simulation

Erez Perelman; Greg Hamerly; Michael Van Biesbrouck; Timothy Sherwood; Brad Calder

Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of a single industry standard benchmark at this level of detail takes on the order of months to complete. This problem is exacerbated by the fact that a proper architectural evaluation requires multiple benchmarks to be evaluated across many separate runs. To address this issue we recently created a tool called SimPoint that automatically finds a small set of Simulation Points to represent the complete execution of a program for efficient and accurate simulation. In this paper we describe how to use the SimPoint tool, and introduce an improved SimPoint algorithm designed to significantly reduce the simulation time required when the simulation environment relies upon fast-forwarding.
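The way simulation-point results are combined can be sketched as a cluster-weighted average; the function below is a hypothetical illustration, not the SimPoint tool's interface:

```python
# Hypothetical sketch: estimate whole-program CPI by weighting each cluster's
# simulated CPI by the fraction of execution intervals that cluster covers.
import numpy as np

def estimate_whole_program_cpi(labels, simulated_cpi):
    """labels: per-interval cluster ids; simulated_cpi: cluster id -> CPI."""
    total = len(labels)
    return sum(
        simulated_cpi[c] * np.count_nonzero(labels == c) / total
        for c in simulated_cpi
    )
```

For example, if cluster 0 covers 70% of the intervals at CPI 1.2 and cluster 1 covers 30% at CPI 2.0, the estimate is 0.7 * 1.2 + 0.3 * 2.0 = 1.44.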


international symposium on performance analysis of systems and software | 2005

The Strong Correlation Between Code Signatures and Performance

Jeremy Lau; Jack Sampson; Erez Perelman; Greg Hamerly; Brad Calder

A recent study examined the use of sampled hardware counters to create sampled code signatures. This approach is attractive because sampled code signatures can be quickly gathered for any application. The conclusion of that study was that there exists a fuzzy correlation between sampled code signatures and performance predictability. The paper raises the question of how much information is lost in the sampling process, and our paper focuses on examining this issue. We first focus on showing that there exists a strong correlation between code signatures and performance. We then examine the relationship between sampled and full code signatures, and how these affect performance predictability. Our results confirm the fuzzy correlation found in recent work for the SPEC programs with sampled code signatures, but show that a strong correlation exists with full code signatures. In addition, we propose converting the sampled instruction counts, used in the prior work, into sampled code signatures representing loop and procedure execution frequencies. These sampled loop and procedure code signatures allow phase analysis to more accurately and easily find patterns, and they correlate better with performance.
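The correlation test can be sketched as follows, assuming per-interval code signatures and measured CPI values are available; the random pairing strategy and the use of Pearson correlation here are assumptions, not necessarily the paper's exact methodology:

```python
# Hedged sketch: for random pairs of intervals, compare the distance between
# their code signatures against the difference in measured CPI. A strong
# positive correlation supports the paper's claim.
import numpy as np

def signature_performance_correlation(bbvs, cpi, num_pairs=10000, seed=0):
    """bbvs: intervals x blocks array; cpi: per-interval CPI array."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(bbvs), num_pairs)
    j = rng.integers(0, len(bbvs), num_pairs)
    sig_dist = np.abs(bbvs[i] - bbvs[j]).sum(axis=1)   # signature distance
    cpi_diff = np.abs(cpi[i] - cpi[j])                 # performance difference
    return np.corrcoef(sig_dist, cpi_diff)[0, 1]       # Pearson correlation
```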


international parallel and distributed processing symposium | 2006

Detecting phases in parallel applications on shared memory architectures

Erez Perelman; Marzia Polito; Jean-Yves Bouguet; Jack Sampson; Brad Calder; Carole Dulong

Most programs are repetitive, where similar behavior can be seen at different execution times. Algorithms have been proposed that automatically group similar portions of a program's execution into phases, where samples of execution in the same phase have homogeneous behavior and similar resource requirements. In this paper, we examine how to apply these phase analysis algorithms and adapt them to parallel applications running on shared memory processors. Our approach relies on a separate representation of each thread's activity. We first focus on showing its ability to identify similar intervals of execution across threads for a single run. We then show that it is effective at identifying similar behavior of a program when the number of threads is varied between runs. This can be used by developers to examine how different phases scale across different numbers of threads. Finally, we examine using the phase analysis to pick simulation points to guide multithreaded simulation.
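A sketch of the separate per-thread representation, reusing the BBV arrays from the earlier sketches; clustering all (thread, interval) signatures together so that similar intervals from different threads land in the same phase is one plausible reading of the approach, not the authors' exact procedure:

```python
# Hedged sketch of per-thread phase analysis for shared memory applications.
import numpy as np
from sklearn.cluster import KMeans

def cluster_threads(per_thread_bbvs, k):
    """per_thread_bbvs: list of (intervals x blocks) arrays, one per thread."""
    stacked = np.vstack(per_thread_bbvs)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(stacked)
    # Split the labels back out per thread for inspection.
    out, pos = [], 0
    for t in per_thread_bbvs:
        out.append(labels[pos:pos + len(t)])
        pos += len(t)
    return out
```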


international symposium on performance analysis of systems and software | 2005

Motivation for Variable Length Intervals and Hierarchical Phase Behavior

Jeremy Lau; Erez Perelman; Greg Hamerly; Timothy Sherwood; Brad Calder

Most programs are repetitive, where similar behavior can be seen at different execution times. Proposed algorithms automatically group similar portions of a program's execution into phases, where the intervals in each phase have homogeneous behavior and similar resource requirements. These prior techniques focus on fixed length intervals (such as a hundred million instructions) to find phase behavior. Fixed length intervals can make a program's periodic phase behavior difficult to find, because the fixed interval length can be out of sync with the period of the program's actual phase behavior. In addition, a fixed interval length can only express one level of phase behavior. In this paper, we graphically show that there exists a hierarchy of phase behavior in programs and motivate the need for variable length intervals. We describe the changes applied to SimPoint to support variable length intervals. We finally conclude by providing an initial study into using variable length intervals to guide SimPoint.
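One simple way to form variable length intervals is to greedily merge adjacent fixed-length intervals with similar signatures; the paper's hierarchical procedure is more involved, so treat this threshold-based sketch as illustrative only:

```python
# Hedged sketch: merge adjacent fixed-length intervals into variable length
# intervals wherever the code signature stays stable. Threshold is assumed.
import numpy as np

def merge_intervals(bbvs, threshold=0.3):
    """Return (start, end) ranges of merged variable-length intervals."""
    ranges, start = [], 0
    for i in range(1, len(bbvs)):
        if np.abs(bbvs[i] - bbvs[i - 1]).sum() > threshold:
            ranges.append((start, i))   # signature changed: close the interval
            start = i
    ranges.append((start, len(bbvs)))
    return ranges
```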


measurement and modeling of computer systems | 2004

How to use SimPoint to pick simulation points

Greg Hamerly; Erez Perelman; Brad Calder

Understanding the cycle level behavior of a processor running an application is crucial to modern computer architecture research. To gain this understanding, detailed cycle level simulators are typically employed. Unfortunately, this level of detail comes at the cost of speed, and simulating the full execution of an industry standard benchmark on even the fastest simulator can take weeks to months to complete. This fact has not gone unnoticed, and several techniques have been developed aimed at reducing simulation time.


international conference on parallel architectures and compilation techniques | 2005

Variational path profiling

Erez Perelman; Trishul M. Chilimbi; Brad Calder

Current profiling techniques are good at identifying where time is being spent during program execution. These techniques are not as good at pinpointing exactly where in the execution there are definite opportunities that a programmer can exploit through optimization. In this paper we present a new type of profiling analysis called variational path profiling (VPP). VPP pinpoints exactly where in the program there are potentially significant optimization opportunities for speedup. VPP finds the acyclic control flow paths that vary the most in execution time (the time it takes to execute each occurrence of the path). This is calculated by sampling the time it takes to execute frequent paths using hardware performance counters. The motivation for concentrating on a path with a high net variation in its execution time is that it can potentially be optimized so that most or all executions of that path have the minimal execution time seen during profiling. We present a profiling and analysis approach to find these variational paths, so that they can be communicated back to a programmer to guide optimization. Our results show that this variation accounts for a significant fraction of overall program execution time and that a small number of paths account for a large fraction of this variation. By applying straightforward prefetching optimizations to these variational paths we see speedups of 8.5% on average.
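The ranking at the heart of VPP can be sketched directly from the abstract's definition: a path's net variation is the amount by which its sampled times exceed the fastest time observed for that path. The data layout below is an assumption for illustration:

```python
# Hedged sketch of the VPP ranking step. path_samples maps a path id to the
# list of sampled execution times collected for that path.
def rank_by_variation(path_samples):
    """Return (path, net_variation) pairs, highest variation first."""
    variation = {
        path: sum(t - min(times) for t in times)
        for path, times in path_samples.items()
    }
    return sorted(variation.items(), key=lambda kv: -kv[1])
```

Paths at the top of this list are the ones where optimization could, in principle, pull most executions down to the minimum time seen during profiling.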

Collaboration


Dive into Erez Perelman's collaborations.

Top Co-Authors

Brad Calder
University of California

Jeremy Lau
University of California

Jack Sampson
Pennsylvania State University