Is this you? Create Your Porfile

Brendon Cahoon

University of Massachusetts Amherst

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Brendon Cahoon is active.

Explore More

Publication

Featured researches published by Brendon Cahoon.

international conference on parallel architectures and compilation techniques | 2001

Data Flow Analysis for Software Prefetching Linked Data Structures in Java

Brendon Cahoon; Kathryn S. McKinley

Describes an effective compile-time analysis for software prefetching in Java. Previous work in software data prefetching for pointer-based codes asses simple compiler algorithms and does not investigate prefetching for object-oriented language features that male compile-time analysis difficult. We develop a new data flow analysis to detect regular accesses to linked data structures in Java programs. We use intra- and inter-procedural analysis to identify profitable prefetching opportunities for greedy and jump-pointer prefetching, and we implement these techniques in a compiler for Java. Our results show that both prefetching techniques improve four of our ten programs. The largest performance improvement is 48% with jump-pointers, but consistent improvements are difficult to obtain.

ACM Transactions on Information Systems | 2000

Evaluating the performance of distributed architectures for information retrieval using a variety of workloads

Brendon Cahoon; Kathryn S. McKinley; Zhihong Lu

The information explosion across the Internet and elswhere offers access to an increasing number of document collections. In order for users to effectively access these collections, information retrieval (IR) systems must provide coordinated, concurrent, and distributed access. In this article, we explore how to achieve scalable performance in a distributed system for collection sizes ranging from 1GB to 128GB. We implement a fully functional distributed IR system based on a multithreaded version of the Inquery simulation model. We measure performance as a function of system parameters such as client command rate, number of document collections, ter ms per query, query term frequency, number of answers returned, and command mixture. Our results show that it is important to model both query and document commands because the heterogeneity of commands significantly impacts performance. Based on our results, we recommend simple changes to the prototype and evaluate the changes using the simulator. Because of the significant resource demands of information retrieval, it is not difficult to generate workloads that overwhelm system resources regardless of the architecture. However under some realistic workloads, we demonstrate system organizations for which response time gracefully degrades as the workload increases and performance scales with the number of processors. This scalable architecture includes a surprisingly small number of brokers through which a large number of clients and servers communicate.

international acm sigir conference on research and development in information retrieval | 1996

Performance evaluation of a distributed architecture for information retrieval

Brendon Cahoon; Kathryn S. McKinley

Information explosion across the Internet and elsewhere offersaccess toanincreasing number of document collections. In order for users to effectively access these collections, information retrieval (IR) systems must provide coordinated, concurrent, and distributed access. In this paper, we describe a fully functional distributed IR system based on the Inquery unified IR system. To refine this prototype, we implement a flexible simulation model that analyzes performance issues given a wide variety of system parameters and configurations. We present aseries ofexperiments that measure response time, system utilization, and identify bottlenecks. We vary numerous system parameters, such as the number of users, text collections, terms per query, and workload to ireneralize our results for other distributed IR systems. Based on our initial results, we recommend simple changes to the prototype and evaluate the changes using the simulator. Because of the significant resource demands of information retrieval, it is not difficult to generate workloads that overwhelm system resources regardless of the architecture. However under some realistic workloads. we demonstrate system organizations for which response’ time gracefully degrades as the workload increases and performance scales with the number of processors. This scalable architecture includes a surprisingly small number of brokers through which a large number of clients and servers communicate.

compiler construction | 2004

The Limits of Alias Analysis for Scalar Optimizations

Rezaul Alam Chowdhury; Peter Djeu; Brendon Cahoon; James H. Burrill; Kathryn S. McKinley

In theory, increasing alias analysis precision should improve compiler optimizations on C programs. This paper compares alias analysis algorithms on scalar optimizations, including an analysis that assumes no aliases, to establish a very loose upper bound on optimization opportunities. We then measure optimization opportunities on thirty-six C programs. In practice, the optimizations are rarely inhibited due to the precision of the alias analyses. Previous work finds similarly that the increased precision of specific alias algorithms provide little benefit for scalar optimizations, and that simple static alias algorithms find almost all dynamically determined aliases. This paper, however, is the first to provide a static methodology that indicates that additional precision is unlikely to yield improvements for a set of optimizations. For clients with higher alias accuracy demands, this methodology can help pinpoint the need for additional accuracy.

Proceedings of the 2002 joint ACM-ISCOPE conference on Java Grande | 2002

Simple and effective array prefetching in Java

Brendon Cahoon; Kathryn S. McKinley

Java is becoming a viable choice for numerical algorithms due to the software engineering benefits of object-oriented programming. Because these programs still use large arrays that do not fit in the cache, they continue to suffer from poor memory performance. To hide memory latency, we describe a new unified compile-time analysis for software prefetching arrays and linked structures in Java. Our previous work uses data-flow analysis to discover linked data structure accesses, and here we present a more general version that also identifies loop induction variables used in array accesses. Our algorithm schedules prefetches for all array references that contain induction variables. We evaluate our technique using a simulator of an out-of-order superscalar processor running a set of array-based Java programs. Across all our programs, prefetching reduces execution time by a geometric mean of 23%, and the largest improvement is 58%. We also evaluate prefetching on a PowerPC processor, and we show that prefetching reduces execution time by a geometric mean of 17%. Traditional software prefetching algorithms for C and Fortran use locality analysis and sophisticated loop transformations. Because our analysis is much simpler and quicker, it is suitable for including in a just-in-time compiler. We further show that the additional loop transformations and careful scheduling of prefetches used in previous work are not always necessary for modern architectures and Java programs.

european conference on parallel processing | 1998

The Hardware/Software Balancing Act for Information Retrieval on Symmetric Multiprocessors

Zhihong Lu; Kathryn S. McKinley; Brendon Cahoon

Web search engines, such as AltaVista and Infoseek, handle tremendous loads by exploiting the parallelism implicit in their tasks and using symmetric multiprocessors to support their services. The web searching problem that they solve is a special case of the more general information retrieval (IR) problem of locating documents relevant to the information need of users. In this paper, we investigate how to exploit a symmetric multiprocessor to build high performance IR servers. Although the problem can be solved by throwing lots of CPU and disk resources at it, the important questions are how much of which hardware and what software structure is needed to effectively exploit hardware resources. We have found, to our surprise, that in some cases adding hardware degrades performance rather than improves it. We compare the performance of multithreading and multitasking and show that multiple threads are needed to fully utilize hardware resources. Our investigation is based on InQuery, a state-of-the-art full-text information retrieval engine, that is widely used in Web search engines, large libraries, companies, and government agencies such as Infoseek, Library of Congress, White House, West Publishing, and Lotus.

Concurrency and Computation: Practice and Experience | 2005

Recurrence analysis for effective array prefetching in Java

Brendon Cahoon; Kathryn S. McKinley

Java is an attractive choice for numerical, as well as other, algorithms due to the software engineering benefits of object‐oriented programming. Because numerical programs often use large arrays that do not fit in the cache, they suffer from poor memory performance. To hide memory latency, we describe a new unified compile‐time analysis for software prefetching arrays and linked structures in Java. Our previous work used data‐flow analysis to discover linked data structure accesses. We generalize our prior approach to identify loop induction variables as well, which we call recurrence analysis. Our algorithm schedules prefetches for all array references that contain induction variables. We evaluate our technique using a simulator of an out‐of‐order superscalar processor running a set of array‐based Java programs. Across all of our programs, prefetching reduces execution time by a geometric mean of 23%, and the largest improvement is 58%. We also evaluate prefetching on a PowerPC processor, and we show that prefetching reduces execution time by a geometric mean of 17%. Because our analysis is much simpler and quicker than previous techniques, it is suitable for including in a just‐in‐time compiler. Traditional software prefetching algorithms for C and Fortran use locality analysis and sophisticated loop transformations. We further show that the additional loop transformations and careful scheduling of prefetches from previous work are not always necessary for modern architectures and Java programs. Copyright

Archive | 1999