Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Brad Calder is active.

Publication


Featured research published by Brad Calder.


Architectural Support for Programming Languages and Operating Systems | 2002

Automatically characterizing large scale program behavior

Timothy Sherwood; Erez Perelman; Greg Hamerly; Brad Calder

Understanding program behavior is at the foundation of computer architecture and program optimization. Many programs have wildly different behavior on even the very largest of scales (over the complete execution of the program). This realization has ramifications for many architectural and compiler techniques, from thread scheduling, to feedback-directed optimizations, to the way programs are simulated. However, in order to take advantage of time-varying behavior, we must first develop the analytical tools necessary to automatically and efficiently analyze program behavior over large sections of execution. Our goal is to develop automatic techniques that are capable of finding and exploiting the Large Scale Behavior of programs (behavior seen over billions of instructions). The first step towards this goal is the development of a hardware independent metric that can concisely summarize the behavior of an arbitrary section of execution in a program. To this end we examine the use of Basic Block Vectors. We quantify the effectiveness of Basic Block Vectors in capturing program behavior across several different architectural metrics, explore the large scale behavior of several programs, and develop a set of algorithms based on clustering capable of analyzing this behavior. We then demonstrate an application of this technology to automatically determine where to simulate for a program to help guide computer architecture research.
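
The Basic Block Vector at the heart of this work is straightforward to reproduce in software: split the executed basic-block stream into fixed-length intervals, count how often each block executes per interval, and cluster the normalized vectors so that intervals in the same cluster share a phase. The sketch below is a minimal illustration of that pipeline; the interval length, plain k-means (the paper pairs its clustering with random projection and automatic selection of k), and the toy trace are illustrative choices, not the paper's exact configuration.

```python
import numpy as np
from collections import Counter

def basic_block_vectors(trace, interval_len):
    """Split a basic-block trace into fixed-length intervals and count,
    per interval, how often each block executes."""
    blocks = sorted(set(trace))
    index = {b: i for i, b in enumerate(blocks)}
    vectors = []
    for start in range(0, len(trace), interval_len):
        counts = Counter(trace[start:start + interval_len])
        v = np.zeros(len(blocks))
        for b, c in counts.items():
            v[index[b]] = c
        vectors.append(v / v.sum())   # normalize: compare intervals by mix, not length
    return np.array(vectors)

def kmeans(vectors, k, iters=20):
    """Plain k-means over interval vectors; each cluster is one phase.
    Farthest-point initialization keeps this toy example deterministic."""
    centers = [vectors[0]]
    while len(centers) < k:
        dists = np.min([np.linalg.norm(vectors - c, axis=1) for c in centers], axis=0)
        centers.append(vectors[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(vectors[:, None] - centers[None], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = vectors[labels == j].mean(axis=0)
    return labels, centers

# Toy trace: block ids as the program alternates between two behaviors.
trace = [1, 2, 3] * 400 + [7, 8] * 600 + [1, 2, 3] * 400
vectors = basic_block_vectors(trace, interval_len=300)
labels, centers = kmeans(vectors, k=2)
# The simulation point for each phase is the interval closest to its center.
for j in range(2):
    members = np.where(labels == j)[0]
    rep = members[np.argmin(np.linalg.norm(vectors[members] - centers[j], axis=1))]
    print(f"phase {j}: intervals {members.tolist()}, simulate interval {rep}")
```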


International Conference on Parallel Architectures and Compilation Techniques | 2001

Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications

Timothy Sherwood; Erez Perelman; Brad Calder

Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry standard benchmark can take weeks to months to complete. To overcome this problem researchers choose a very small portion of a program's execution to evaluate their results, rather than simulating the entire program. In this paper we propose Basic Block Distribution Analysis as an automated approach for finding these small portions of the program to simulate that are representative of the entire program's execution. This approach is based upon using profiles of a program's code structure (basic blocks) to uniquely identify different phases of execution in the program. We show that the periodicity of the basic block frequency profile reflects the periodicity of detailed simulation across several different architectural metrics (e.g., IPC, branch miss rate, cache miss rate, value misprediction, address misprediction, and reorder buffer occupancy). Since basic block frequencies can be collected using very fast profiling tools, our approach provides a practical technique for finding the periodicity and simulation points in applications.
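
The key claim, that the periodicity visible in basic block frequencies mirrors the periodicity of architectural metrics, suggests a cheap way to find a program's period: profile block frequencies per interval and look for a dominant frequency. Below is a small sketch using an FFT on a synthetic profile; the paper's actual analysis differs in detail, so treat this as an illustration of the signal-processing idea rather than the authors' method.

```python
import numpy as np

def dominant_period(signal):
    """Estimate the dominant period of a 1-D signal from the FFT
    magnitude spectrum (illustrative stand-in for the paper's analysis)."""
    signal = signal - signal.mean()          # remove the DC component
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal))
    peak = np.argmax(spectrum[1:]) + 1       # skip the zero-frequency bin
    return 1.0 / freqs[peak]

# Per-interval execution frequency of one basic block, oscillating as the
# program alternates between two phases every 25 intervals (plus noise).
t = np.arange(200)
profile = 50 + 40 * np.sin(2 * np.pi * t / 25) \
          + np.random.default_rng(0).normal(0, 3, 200)
print(f"estimated period: ~{dominant_period(profile):.1f} intervals")
```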


Symposium on Operating Systems Principles | 2011

Windows Azure Storage: a highly available cloud storage service with strong consistency

Brad Calder; Ju Wang; Aaron W. Ogus; Niranjan Nilakantan; Arild E. Skjolsvold; Sam McKelvie; Yikang Xu; Shashwat Srivastav; Jiesheng Wu; Huseyin Simitci; Jaidev Haridas; Chakravarthy Uddaraju; Hemal Khatri; Andrew James Edwards; Vaman Bedekar; Shane Mainali; Rafay Abbasi; Arpit Agarwal; Mian Fahim ul Haq; Muhammad Ikram ul Haq; Deepali Bhardwaj; Sowmya Dayanand; Anitha Adusumilli; Marvin McNett; Sriram Sankaran; Kavitha Manivannan; Leonidas Rigas

Windows Azure Storage (WAS) is a cloud storage system that provides customers the ability to store seemingly limitless amounts of data for any duration of time. WAS customers have access to their data from anywhere at any time and only pay for what they use and store. In WAS, data is stored durably using both local and geographic replication to facilitate disaster recovery. Currently, WAS storage comes in the form of Blobs (files), Tables (structured storage), and Queues (message delivery). In this paper, we describe the WAS architecture, global namespace, and data model, as well as its resource provisioning, load balancing, and replication systems.
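
The "strong consistency" WAS advertises means a write is acknowledged only once it is durable on all replicas, so every subsequent read observes it. The toy class below mimics just that property in-process; the names and structure are invented for illustration and do not reflect WAS's real design, which layers a partitioned index over an append-only, replicated stream layer.

```python
class Replica:
    """One replica's key-value state; apply() stands in for a durable append."""
    def __init__(self):
        self.store = {}

    def apply(self, key, value):
        self.store[key] = value

class ReplicatedBlobStore:
    """Toy strongly consistent store: acknowledge a write only after
    every replica has applied it, so any replica can serve a fresh read."""
    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]

    def put(self, key, value):
        for r in self.replicas:          # all replicas before the ack
            r.apply(key, value)
        return "ack"

    def get(self, key):
        return self.replicas[0].store.get(key)   # any replica is up to date

store = ReplicatedBlobStore()
store.put("container/blob.txt", b"hello")
print(store.get("container/blob.txt"))   # b'hello'
```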


International Symposium on Computer Architecture | 2003

Phase tracking and prediction

Timothy Sherwood; Suleyman Sair; Brad Calder

In a single second a modern processor can execute billions of instructions. Obtaining a bird's-eye view of the behavior of a program at these speeds can be a difficult task when all that is available is cycle by cycle examination. In many programs, behavior is anything but steady state, and understanding the patterns of behavior, at run-time, can unlock a multitude of optimization opportunities. In this paper, we present a unified profiling architecture that can efficiently capture, classify, and predict phase-based program behavior on the largest of time scales. By examining the proportion of instructions that were executed from different sections of code, we can find generic phases that correspond to changes in behavior across many metrics. By classifying phases generically, we avoid the need to identify phases for each optimization, and enable a unified prediction scheme that can forecast future behavior. Our analysis shows that our design can capture phases that account for over 80% of execution using less than 500 bytes of on-chip memory.
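
In software terms, the paper's phase-classification idea reduces to a distance test: summarize each interval by the proportion of execution spent in each code region, match it against previously seen phase signatures within a threshold, and open a new phase otherwise. The sketch below makes that concrete; the threshold, signature format, and last-value predictor (the hardware design uses a run-length-based Markov predictor) are illustrative stand-ins, not the paper's parameters.

```python
import numpy as np

def classify_phases(signatures, threshold=0.25):
    """Assign each interval signature to an existing phase if it falls
    within `threshold` (Manhattan distance) of that phase's stored
    signature; otherwise open a new phase."""
    phase_sigs, labels = [], []
    for sig in signatures:
        dists = [np.abs(sig - p).sum() for p in phase_sigs]
        if dists and min(dists) < threshold:
            labels.append(int(np.argmin(dists)))
        else:
            phase_sigs.append(sig)
            labels.append(len(phase_sigs) - 1)
    return labels

def predict_next(labels):
    """Trivial last-value predictor: assume the next interval stays in
    the current phase."""
    return labels[-1]

rng = np.random.default_rng(1)
# Interval signatures: proportion of execution in 4 code regions.
loop_a = np.array([0.7, 0.2, 0.1, 0.0])
loop_b = np.array([0.1, 0.1, 0.1, 0.7])
signatures = [loop_a + rng.normal(0, 0.02, 4) for _ in range(5)] + \
             [loop_b + rng.normal(0, 0.02, 4) for _ in range(5)]
labels = classify_phases(signatures)
print(labels, "next:", predict_next(labels))  # [0,0,0,0,0,1,1,1,1,1] next: 1
```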


International Conference on Computer Communications | 2004

Deterministic memory-efficient string matching algorithms for intrusion detection

Nathan Tuck; Timothy Sherwood; Brad Calder; George Varghese

Intrusion detection systems (IDSs) have become widely recognized as powerful tools for identifying, deterring and deflecting malicious attacks over the network. Essential to almost every intrusion detection system is the ability to search through packets and identify content that matches known attacks. Space and time efficient string matching algorithms are therefore important for identifying these packets at line rate. We examine string matching algorithms and their use for intrusion detection; in particular, we focus our efforts on providing worst-case performance that is amenable to hardware implementation. We contribute modifications to the Aho-Corasick string-matching algorithm that drastically reduce the amount of memory required and improve its performance on hardware implementations. We also show that these modifications do not drastically affect software performance on commodity processors, and therefore may be worth considering in these cases as well.
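
The baseline the authors start from is the classic Aho-Corasick automaton, which matches every pattern in a single pass over the input. The sketch below implements that baseline; the paper's contribution, bitmap and path compression of the per-node transition tables, is a memory optimization layered on exactly this structure and is not shown here.

```python
from collections import deque

def build_aho_corasick(patterns):
    """Build the classic Aho-Corasick automaton: a pattern trie plus
    failure links that let one scan of the input find all matches."""
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:                       # 1. build the trie
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({}); fail.append(0); out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    queue = deque(goto[0].values())            # 2. BFS to set failure links
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            queue.append(child)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] |= out[fail[child]]     # inherit matches via suffix
    return goto, fail, out

def search(text, goto, fail, out):
    """Scan text once, reporting (end_index, pattern) for every match."""
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i, pat))
    return hits

g, f, o = build_aho_corasick(["he", "she", "his", "hers"])
print(search("ushers", g, f, o))  # 'she' and 'he' end at 3, 'hers' at 5
```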


Journal of Parallel and Distributed Computing | 2003

Entropia: architecture and performance of an enterprise desktop grid system

Andrew A. Chien; Brad Calder; Stephen T. Elbert; Karan Bhatia

The exploitation of idle cycles on pervasive desktop PC systems offers the opportunity to increase the available computing power by orders of magnitude (10×-1000×). However, for desktop PC distributed computing to be widely accepted within the enterprise, the systems must achieve high levels of efficiency, robustness, security, scalability, manageability, unobtrusiveness, and openness/ease of application integration. We describe the Entropia distributed computing system as a case study, detailing its internal architecture and philosophy in attacking these key problems. Key aspects of the Entropia system include the use of: (1) binary sandboxing technology for security and unobtrusiveness, (2) a layered architecture for efficiency, robustness, scalability and manageability, and (3) an open integration model to allow applications from many sources to be incorporated. Typical applications for the Entropia system include molecular docking, sequence analysis, chemical structure modeling, and risk management. The applications come from a diverse set of domains including virtual screening for drug discovery, genomics for drug targeting, material property prediction, and portfolio management. In all cases, these applications scale to many thousands of nodes and have no dependences between tasks. We present representative performance results from several applications that illustrate the high performance, linear scaling, and overall capability presented by the Entropia system.
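
The scaling argument rests on the applications having no dependences between tasks: a master can hand independent work units to whichever desktop clients are idle, and throughput grows with the number of nodes. The sketch below illustrates that model with threads standing in for desktop PCs; the queue-based coordination is an assumption made for the example, not Entropia's actual protocol.

```python
import queue
import threading

tasks = queue.Queue()
for unit in range(12):        # independent work units, e.g. docking runs
    tasks.put(unit)
results = []

def desktop_client(node_id):
    """Consume work units whenever this desktop is idle; with no
    inter-task dependences, clients never wait on each other."""
    while True:
        try:
            unit = tasks.get_nowait()
        except queue.Empty:
            return
        results.append((node_id, unit * unit))   # stand-in computation
        tasks.task_done()

workers = [threading.Thread(target=desktop_client, args=(i,)) for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sorted(r for _, r in results))
```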


International Symposium on Computer Architecture | 2005

BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging

Satish Narayanasamy; Gilles Pokam; Brad Calder

Significant time is spent by companies trying to reproduce and fix the bugs that occur for released code. To assist developers, we propose the BugNet architecture to continuously record information on production runs. The information collected before the crash of a program can be used by the developers working in their execution environment to deterministically replay the last several million instructions executed before the crash. BugNet is based on the insight that recording the register file contents at any point in time, and then recording the load values that occur after that point, can enable deterministic replaying of a program's execution. BugNet focuses on being able to replay the application's execution and the libraries it uses, but not the operating system. Even so, our approach provides the ability to replay an application's execution across context switches and interrupts. Hence, BugNet obviates the need for tracking program I/O, interrupts and DMA transfers, which would have otherwise required more complex hardware support. In addition, BugNet does not require a final core dump of the system state for replaying, which significantly reduces the amount of data that must be sent back to the developer.
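
BugNet's central insight is easy to demonstrate: snapshot the registers at a checkpoint, log the value of every subsequent load, and the instruction stream replays deterministically with no need to capture I/O, interrupts, or DMA. The miniature register machine below illustrates this; its instruction set and log format are invented for the example.

```python
def run(program, regs, memory=None, load_log=None, record=False):
    """Execute a tiny register machine. When recording, loads read real
    memory and append to the log; when replaying, loads consume the log,
    so memory (and any other input source) is not needed at all."""
    regs = dict(regs)
    log = [] if record else list(load_log)
    for op, *args in program:
        if op == "load":                 # load rd, addr
            rd, addr = args
            if record:
                value = memory[addr]     # the nondeterministic input
                log.append(value)
            else:
                value = log.pop(0)       # replay from the log instead
            regs[rd] = value
        elif op == "add":                # add rd, rs1, rs2
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
    return regs, log

program = [("load", "r1", 0x10), ("load", "r2", 0x14),
           ("add", "r3", "r1", "r2")]
checkpoint = {"r1": 0, "r2": 0, "r3": 0}        # register-file snapshot
memory = {0x10: 7, 0x14: 35}

regs_live, log = run(program, checkpoint, memory=memory, record=True)
regs_replay, _ = run(program, checkpoint, load_log=log)  # no memory needed
assert regs_live == regs_replay                 # deterministic replay
print(regs_replay["r3"], log)                   # 42 [7, 35]
```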


Architectural Support for Programming Languages and Operating Systems | 1998

Cache-conscious data placement

Brad Calder; Chandra Krintz; Simmi John; Todd M. Austin

As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance. Compiler techniques have been used to improve instruction cache performance by mapping code with temporal locality to different cache blocks in the virtual address space, eliminating cache conflicts. These code placement techniques can be applied directly to the problem of placing data for improved data cache performance. In this paper we present a general framework for Cache Conscious Data Placement. This is a compiler-directed approach that creates an address placement for the stack (local variables), global variables, heap objects, and constants in order to reduce data cache misses. The placement of data objects is guided by a temporal relationship graph between objects generated via profiling. Our results show that profile-driven data placement significantly reduces the data miss rate by 24% on average.
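
The framework's two ingredients, a profiled temporal relationship graph and a conflict-avoiding placement, can be sketched compactly: weight pairs of objects by how often they are accessed close together in time, then greedily assign each object to the cache set where it conflicts least with its strongly related, already-placed neighbors. The window size, set count, and greedy order below are illustrative choices, not the paper's algorithm.

```python
from collections import defaultdict

WINDOW = 4   # accesses considered "close in time" (illustrative)
N_SETS = 8   # cache sets available for placement (illustrative)

def temporal_relationship_graph(trace, window=WINDOW):
    """Edge weight = how often two objects are accessed within
    `window` references of each other in the profile trace."""
    weight = defaultdict(int)
    for i, a in enumerate(trace):
        for b in trace[i + 1 : i + window]:
            if a != b:
                weight[frozenset((a, b))] += 1
    return weight

def place(objects, weight, n_sets=N_SETS):
    """Greedy placement: visit objects by total affinity and pick the
    cache set with the least conflict weight against already-placed
    neighbors, so temporally related objects avoid conflicting blocks."""
    total = {o: sum(w for e, w in weight.items() if o in e) for o in objects}
    placement = {}
    for o in sorted(objects, key=total.get, reverse=True):
        cost = [0] * n_sets
        for p, s in placement.items():
            cost[s] += weight.get(frozenset((o, p)), 0)
        placement[o] = min(range(n_sets), key=cost.__getitem__)
    return placement

trace = ["A", "B", "A", "B", "C", "A", "B", "D", "C", "A", "B"]
w = temporal_relationship_graph(trace)
print(place(["A", "B", "C", "D"], w))   # A and B land in different sets
```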


International Symposium on Microarchitecture | 2003

Discovering and exploiting program phases

Timothy Sherwood; Erez Perelman; Greg Hamerly; Suleyman Sair; Brad Calder

Understanding program behavior is at the foundation of computer architecture and program optimization. Many programs have wildly different behavior on even the largest of scales (that is, over the program's complete execution). During one part of the execution, a program can be completely memory bound; in another, it can repeatedly stall on branch mispredicts. Average statistics gathered about a program might not accurately picture where the real problems lie. This realization has ramifications for many architecture and compiler techniques, from how to best schedule threads on a multithreaded machine, to feedback-directed optimizations, power management, and the simulation and test of architectures. Taking advantage of time-varying behavior requires a set of automated analytic tools and hardware techniques that can discover similarities and changes in program behavior on the largest of time scales. The challenge in building such tools is that during a program's lifetime it can execute billions or trillions of instructions. How can high-level behavior be extracted from this sea of instructions? Some programs change behavior drastically, switching between periods of high and low performance, yet system design and optimization typically focus on average system behavior. It is argued that instead of assuming average behavior, it is now time to model and optimize phase-based program behavior.


International Conference on Parallel Architectures and Compilation Techniques | 2003

Picking statistically valid and early simulation points

Erez Perelman; Greg Hamerly; Brad Calder


Collaboration


Dive into Brad Calder's collaborations.

Top Co-Authors

Dirk Grunwald
University of Colorado Boulder

Jeremy Lau
University of California

Glenn Reinman
University of California

Suleyman Sair
North Carolina State University