Angela Demke Brown
University of Toronto
Publications
Featured research published by Angela Demke Brown.
ACM Transactions on Computer Systems | 2001
Angela Demke Brown; Todd C. Mowry; Orran Krieger
Current operating systems offer poor performance when a numeric application's working set does not fit in main memory. As a result, programmers who wish to solve “out-of-core” problems efficiently are typically faced with the onerous task of rewriting an application to use explicit I/O operations (e.g., read/write). In this paper, we propose and evaluate a fully automatic technique that liberates the programmer from this task, provides high performance, and requires only minimal changes to current operating systems. In our scheme, the compiler provides the crucial information on future access patterns without burdening the programmer; the operating system supports nonbinding prefetch and release hints for managing I/O; and the operating system cooperates with a run-time layer to accelerate performance by adapting to dynamic behavior and minimizing prefetch overhead. This approach maintains the abstraction of unlimited virtual memory for the programmer, gives the compiler the flexibility to aggressively insert prefetches ahead of references, and gives the operating system the flexibility to arbitrate between the competing resource demands of multiple applications. We implemented our compiler analysis within the SUIF compiler, and used it to target implementations of our run-time and OS support on both research and commercial systems (Hurricane and IRIX 6.5, respectively). Our experimental results show large performance gains for out-of-core scientific applications on both systems: more than 50% of the I/O stall time is eliminated in most cases, translating into overall speedups of roughly twofold in many cases.
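The prefetch/release hints can be pictured with a rough user-level analogue. The sketch below uses Linux madvise as a stand-in for the paper's nonbinding OS hints (their semantics differ, and in the paper the compiler inserts the hints automatically, not the programmer); it assumes a read-only scan of a large file-backed mapping.

```c
#include <stddef.h>
#include <sys/mman.h>

#define CHUNK (4UL << 20)   /* hint granularity: 4 MiB, a multiple of the page size */

/* Scan a large file-backed mapping 'a' of 'bytes' bytes, hinting ahead. */
double scan(double *a, size_t bytes)
{
    double s = 0.0;
    for (size_t off = 0; off < bytes; off += CHUNK) {
        size_t end = (off + CHUNK < bytes) ? off + CHUNK : bytes;
        if (end < bytes)   /* prefetch the next chunk ahead of its first use */
            madvise((char *)a + end, CHUNK, MADV_WILLNEED);
        for (size_t i = off / sizeof *a; i < end / sizeof *a; i++)
            s += a[i];
        /* release the chunk we just finished so it is evicted first */
        madvise((char *)a + off, end - off, MADV_DONTNEED);
    }
    return s;
}
```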
Journal of Parallel and Distributed Computing | 2007
Thomas E. Hart; Paul E. McKenney; Angela Demke Brown; Jonathan Walpole
Achieving high performance for concurrent applications on modern multiprocessors remains challenging. Many programmers avoid locking to improve performance, while others replace locks with non-blocking synchronization to protect against deadlock, priority inversion, and convoying. In both cases, dynamic data structures that avoid locking require a memory reclamation scheme that reclaims elements once they are no longer in use. The performance of existing memory reclamation schemes has not been thoroughly evaluated. We conduct the first fair and comprehensive comparison of three recent schemes (quiescent-state-based reclamation, epoch-based reclamation, and hazard-pointer-based reclamation) using a flexible microbenchmark. Our results show that there is no globally optimal scheme. When evaluating lockless synchronization, programmers and algorithm designers should thus carefully consider the data structure, the workload, and the execution environment, each of which can dramatically affect the memory reclamation performance. We discuss the consequences of our results for programmers and algorithm designers. Finally, we describe the use of one scheme, quiescent-state-based reclamation, in the context of an OS kernel, an execution environment which is well suited to this scheme.
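The flavor of one of the compared schemes, quiescent-state-based reclamation, can be sketched as follows. This is a minimal illustration, not the paper's benchmarked implementation: each thread periodically announces a quiescent state (a point where it holds no references into any shared lockless structure), and a retired node is freed only after every thread has announced one since the retirement. All names are invented for this sketch, and each thread reclaims only its own retired list.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NTHREADS 4

static atomic_ulong counter = 1;            /* global retirement clock */
static atomic_ulong seen[NTHREADS];         /* last value seen at a quiescent state */

struct retired { struct retired *next; void *ptr; unsigned long stamp; };
static _Thread_local struct retired *rlist; /* this thread's retired nodes */

/* Called by thread 'tid' at a point where it holds no references into
 * any lockless data structure (e.g., between operations). */
void quiescent_state(int tid)
{
    atomic_store(&seen[tid], atomic_load(&counter));
}

/* Defer freeing a node just unlinked from a lockless structure. */
void retire(void *ptr)
{
    struct retired *r = malloc(sizeof *r);
    r->ptr = ptr;
    r->stamp = atomic_fetch_add(&counter, 1) + 1;
    r->next = rlist;
    rlist = r;
}

/* Free this thread's retired nodes that every thread has quiesced past:
 * if min(seen[]) >= stamp, all threads announced a quiescent state after
 * the retirement, so no thread can still hold a reference. */
void reclaim(void)
{
    unsigned long min = ULONG_MAX;
    for (int i = 0; i < NTHREADS; i++) {
        unsigned long c = atomic_load(&seen[i]);
        if (c < min) min = c;
    }
    for (struct retired **pp = &rlist; *pp; ) {
        if ((*pp)->stamp <= min) {
            struct retired *dead = *pp;
            *pp = dead->next;
            free(dead->ptr);
            free(dead);
        } else {
            pp = &(*pp)->next;
        }
    }
}
```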
international conference on data engineering | 2011
Suprio Ray; Bogdan Simion; Angela Demke Brown
The volume of spatial data generated and consumed is rising exponentially and new applications are emerging as the costs of storage, processing power and network bandwidth continue to decline. Database support for spatial operations is fast becoming a necessity rather than a niche feature provided by a few products. However, the spatial functionality offered by current commercial and open-source relational databases differs significantly in terms of available features, true geodetic support, spatial functions and indexing. Benchmarks play a crucial role in evaluating the functionality and performance of a particular database, both for application users and developers, and for the database developers themselves. In contrast to transaction processing, however, there is no standard, widely used benchmark for spatial database operations. In this paper, we present a spatial database benchmark called Jackpine. Our benchmark is portable (it can support any database with a JDBC driver implementation) and includes both micro benchmarks and macro workload scenarios. The micro benchmark component tests basic spatial operations in isolation; it consists of queries based on the Dimensionally Extended 9-intersection model of topological relations and queries based on spatial analysis functions. Each macro workload includes a series of queries that are based on a common spatial data application. These macro scenarios include map search and browsing, geocoding, reverse geocoding, flood risk analysis, land information management and toxic spill analysis. We use Jackpine to evaluate the spatial features in two open-source databases and one commercial offering.
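To make the micro-benchmark component concrete, the sketch below lists the kind of queries a DE-9IM-based suite contains, written against PostGIS-style spatial functions. The table names (TIGER-like) and the exact queries are illustrative assumptions, not Jackpine's actual query set.

```c
/* Illustrative micro-benchmark queries in the spirit of Jackpine's DE-9IM
 * tests; PostGIS-style functions, hypothetical table names. */
static const char *micro_queries[] = {
    /* topological relations (DE-9IM) */
    "SELECT COUNT(*) FROM arealm a JOIN areawater w "
        "ON ST_Intersects(a.geom, w.geom)",
    "SELECT COUNT(*) FROM pointlm p JOIN arealm a "
        "ON ST_Contains(a.geom, p.geom)",
    "SELECT COUNT(*) FROM edges e JOIN arealm a "
        "ON ST_Crosses(e.geom, a.geom)",
    /* spatial analysis functions */
    "SELECT ST_Area(geom) FROM arealm",
    "SELECT a.id, ST_Distance(a.geom, b.geom) "
        "FROM pointlm a, pointlm b WHERE a.id < b.id",
};
```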
symposium on code generation and optimization | 2005
Marc Berndl; Benjamin Vitale; Mathew Zaleski; Angela Demke Brown
Direct-threaded interpreters use indirect branches to dispatch bytecodes, but deeply-pipelined architectures rely on branch prediction for performance. Due to the poor correlation between the virtual program's control flow and the hardware program counter, which we call the context problem, direct threading's indirect branches are poorly predicted by the hardware, limiting performance. Our dispatch technique, context threading, improves branch prediction and performance by aligning hardware and virtual machine state. Linear virtual instructions are dispatched with native calls and returns, aligning the hardware and virtual PC. Thus, sequential control flow is predicted by the hardware return stack. We convert virtual branching instructions to native branches, mobilizing the hardware's branch prediction resources. We evaluate the impact of context threading on both branch prediction and performance using interpreters for Java and OCaml on the Pentium and PowerPC architectures. On the Pentium IV our technique reduces mean mispredicted branches by 95%. On the PowerPC, it reduces mean branch stall cycles by 75% for OCaml and 82% for Java. Due to reduced branch hazards, context threading reduces mean execution time by 25% for Java and by 19% and 37% for OCaml on the P4 and PPC970, respectively. We also combine context threading with a conservative inlining technique and find its performance comparable to that of selective inlining.
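The two dispatch styles can be contrasted in a small sketch (GCC computed-goto syntax). In direct threading, every virtual instruction body ends in the same indirect jump, which the predictor handles poorly; under context threading, the real system emits one native call per virtual instruction so the hardware return stack mirrors the virtual PC. The function-pointer loop below only approximates those generated direct calls.

```c
/* --- direct threading: one shared, poorly predicted indirect jump --- */
#define NEXT() goto **vpc++              /* GCC computed-goto extension */
void interp_direct(void)
{
    static void *prog[] = { &&push1, &&push1, &&add, &&halt };
    void **vpc = prog;
    long stack[16], *sp = stack;
    NEXT();
push1: *sp++ = 1; NEXT();                /* every body ends in the same jump */
add:   sp--; sp[-1] += *sp; NEXT();
halt:  return;
}

/* --- context threading (approximation): a call/return per instruction,
 *     so the hardware return stack tracks the virtual PC. The real system
 *     emits direct native calls; this loop stands in for them. --- */
typedef void (*body_t)(void);
static long cstack[16], *csp = cstack;
static void b_push1(void) { *csp++ = 1; }           /* body ends in native ret */
static void b_add(void)   { csp--; csp[-1] += *csp; }
void interp_context(void)
{
    body_t ctt[] = { b_push1, b_push1, b_add };     /* context threading table */
    for (int i = 0; i < 3; i++)
        ctt[i]();
}
```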
virtual execution environments | 2005
Levon Stepanian; Angela Demke Brown; Allan Kielstra; Gita Koblents; Kevin A. Stoodley
We introduce a strategy for inlining native functions into Java™ applications using a JIT compiler. We perform further optimizations to transform inlined callbacks into semantically equivalent lightweight operations. We show that this strategy can substantially reduce the overhead of performing JNI calls, while preserving the key safety and portability properties of the JNI. Our work leverages the ability to store statically-generated IL alongside native binaries, to facilitate native inlining at Java callsites at JIT compilation time. Preliminary results with our prototype implementation show speedups of up to 93X when inlining and callback transformation are combined.
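A typical callsite the strategy targets looks like the hypothetical JNI function below: both the native call itself and the three env callbacks are normally opaque to the JIT, and it is exactly this overhead that native inlining plus callback transformation removes. The class and method names are invented for illustration.

```c
#include <jni.h>

/* C implementation of a hypothetical 'static native int sum(int[])' on a
 * hypothetical Java class Vec. The call and the three env callbacks are
 * what inlining and callback transformation turn into cheap inline code. */
JNIEXPORT jint JNICALL
Java_Vec_sum(JNIEnv *env, jclass cls, jintArray arr)
{
    jsize n = (*env)->GetArrayLength(env, arr);              /* callback 1 */
    jint *a = (*env)->GetIntArrayElements(env, arr, NULL);   /* callback 2 */
    jint s = 0;
    for (jsize i = 0; i < n; i++)
        s += a[i];
    (*env)->ReleaseIntArrayElements(env, arr, a, JNI_ABORT); /* callback 3 */
    return s;
}
```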
ACM Transactions on Storage | 2012
Daniel Fryer; Kuei Sun; Rahat Mahmood; TingHao Cheng; Shaun Benjamin; Ashvin Goel; Angela Demke Brown
File system bugs that corrupt metadata on disk are insidious. Existing reliability methods, such as checksums, redundancy, or transactional updates, merely ensure that the corruption is reliably preserved. Typical workarounds, based on using backups or repairing the file system, are painfully slow. Worse, the recovery may result in further corruption. We present Recon, a system that protects file system metadata from buggy file system operations. Our approach leverages file systems that provide crash consistency using transactional updates. We define declarative statements called consistency invariants for a file system. These invariants must be satisfied by each transaction being committed to disk to preserve file system integrity. Recon checks these invariants at commit, thereby minimizing the damage caused by buggy file systems. The major challenges to this approach are specifying invariants and interpreting file system behavior correctly without relying on the file system code. Recon provides a framework for file-system-specific metadata interpretation and invariant checking. We show the feasibility of interpreting metadata and writing consistency invariants for the Linux ext3 file system using this framework. Recon can detect random as well as targeted file-system corruption at runtime as effectively as the offline e2fsck file-system checker, with low overhead.
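A Recon-style invariant check can be sketched as follows. The change-record type and helper functions are hypothetical stand-ins for Recon's metadata-interpretation framework, and the invariant paraphrases one ext3-style allocation rule rather than quoting Recon's actual invariant set.

```c
#include <stdbool.h>
#include <stdint.h>

enum change_type { BITMAP_BIT_SET, BLOCK_REFERENCED /* , ... */ };

struct change {               /* one inferred metadata change in a transaction */
    enum change_type type;
    uint64_t blocknr;
};

/* Hypothetical framework hooks: enumerate the committing transaction's
 * inferred changes, and test whether a particular change is present. */
extern const struct change *txn_changes(int *count);
extern bool txn_has(enum change_type type, uint64_t blocknr);

/* Invariant (paraphrased): a bit newly set in the block bitmap must
 * correspond to a block that became referenced in the same transaction. */
bool check_alloc_invariant(void)
{
    int n;
    const struct change *c = txn_changes(&n);
    for (int i = 0; i < n; i++)
        if (c[i].type == BITMAP_BIT_SET &&
            !txn_has(BLOCK_REFERENCED, c[i].blocknr))
            return false;     /* block marked allocated but nothing points to it */
    return true;              /* commit may proceed */
}
```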
architectural support for programming languages and operating systems | 2012
Peter Feiner; Angela Demke Brown; Ashvin Goel
Dynamic binary translation (DBT) is a powerful technique that enables fine-grained monitoring and manipulation of an existing program binary. At the user level, it has been employed extensively to develop various analysis, bug-finding, and security tools. Such tools are currently not available for operating system (OS) binaries since no comprehensive DBT framework exists for the OS kernel. To address this problem, we have developed a DBT framework that runs as a Linux kernel module, based on the user-level DynamoRIO framework. Our approach is unique in that it controls all kernel execution, including interrupt and exception handlers and device drivers, enabling comprehensive instrumentation of the OS without imposing any overhead on user-level code. In this paper, we discuss the key challenges in designing and building an in-kernel DBT framework and how the design differs from user-space. We use our framework to build several sample instrumentations, including simple instruction counting as well as an implementation of shadow memory for the kernel. Using the shadow memory, we build a kernel stack overflow protection tool and a memory addressability checking tool. Qualitatively, the system is fast enough and stable enough to run the normal desktop workload of one of the authors for several weeks.
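Since the framework is based on user-level DynamoRIO, the shape of the instruction-counting instrumentation can be shown with DynamoRIO's documented user-level client API; the in-kernel API itself is not shown in the abstract, so this is only an approximation of the sample tool.

```c
#include "dr_api.h"

static uint64 insn_count;                 /* simplification: not thread-safe */

static void count(uint n) { insn_count += n; }

static dr_emit_flags_t
event_bb(void *drcontext, void *tag, instrlist_t *bb,
         bool for_trace, bool translating)
{
    uint n = 0;
    for (instr_t *i = instrlist_first(bb); i != NULL; i = instr_get_next(i))
        n++;
    /* Insert a call to count(n) at the top of each translated block. */
    dr_insert_clean_call(drcontext, bb, instrlist_first(bb),
                         (void *)count, false /* no fp state save */,
                         1, OPND_CREATE_INT32(n));
    return DR_EMIT_DEFAULT;
}

DR_EXPORT void dr_client_main(client_id_t id, int argc, const char *argv[])
{
    dr_register_bb_event(event_bb);
}
```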
european conference on computer systems | 2007
Marek Olszewski; Keir Mierle; Adam Czajkowski; Angela Demke Brown
As modern operating systems become more complex, understanding their inner workings is increasingly difficult. Dynamic kernel instrumentation is a well established method of obtaining insight into the workings of an OS, with applications including debugging, profiling and monitoring, and security auditing. To date, all dynamic instrumentation systems for operating systems follow the probe-based instrumentation paradigm. While efficient on fixed-length instruction set architectures, probes are extremely expensive on variable-length ISAs such as the popular Intel x86 and AMD x86-64. We propose using just-in-time (JIT) instrumentation to overcome this problem. While common in user space, JIT instrumentation has not until now been attempted in kernel space. In this work, we show the feasibility and desirability of kernel-based JIT instrumentation for operating systems with our novel prototype, implemented as a Linux kernel module. The prototype is fully SMP capable. We evaluate our prototype against the popular Kprobes Linux instrumentation tool. Our prototype outperforms Kprobes, at both micro and macro levels, by orders of magnitude when applying medium- and fine-grained instrumentation.
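For contrast with the JIT approach, a minimal probe-based instrumentation with Kprobes, the tool the prototype is evaluated against, looks like the module below. Each probe replaces the probed instruction with a breakpoint and traps into the handler on every hit, which is what makes fine-grained probing so expensive. The probed symbol is just an example.

```c
#include <linux/init.h>
#include <linux/kprobes.h>
#include <linux/module.h>

static int pre(struct kprobe *p, struct pt_regs *regs)
{
    pr_info("probed %s\n", p->symbol_name);   /* runs on every hit */
    return 0;
}

static struct kprobe kp = {
    .symbol_name = "do_sys_open",             /* example symbol */
    .pre_handler = pre,
};

static int __init probe_init(void)  { return register_kprobe(&kp); }
static void __exit probe_exit(void) { unregister_kprobe(&kp); }

module_init(probe_init);
module_exit(probe_exit);
MODULE_LICENSE("GPL");
```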
international parallel and distributed processing symposium | 2006
Thomas E. Hart; Paul E. McKenney; Angela Demke Brown
Achieving high performance for concurrent applications on modern multiprocessors remains challenging. Many programmers avoid locking to improve performance, while others replace locks with non-blocking synchronization to protect against deadlock, priority inversion, and convoying. In both cases, dynamic data structures that avoid locking require a memory reclamation scheme that reclaims nodes once they are no longer in use. The performance of existing memory reclamation schemes has not been thoroughly evaluated. We conduct the first fair and comprehensive comparison of three recent schemes (quiescent-state-based reclamation, epoch-based reclamation, and hazard-pointer-based reclamation) using a flexible microbenchmark. Our results show that there is no globally optimal scheme. When evaluating lockless synchronization, programmers and algorithm designers should thus carefully consider the data structure, the workload, and the execution environment, each of which can dramatically affect memory reclamation performance.
international symposium on memory management | 2007
Reza Azimi; Livio Soares; Michael Stumm; Thomas Walsh; Angela Demke Brown
Traditionally, operating systems use a coarse approximation of memory accesses to implement memory management algorithms by monitoring page faults or scanning page table entries. With finer-grained memory access information, however, the operating system can manage memory much more effectively. Previous work has proposed the use of a software mechanism based on virtual page protection and soft faults to track page accesses at finer granularity. In this paper, we show that while this approach is effective for some applications, for many others it results in an unacceptably high overhead. We propose simple Page Access Tracking Hardware (PATH) to provide accurate page access information to the operating system. The suggested hardware support is generic and can be used by various memory management algorithms. In this paper, we show how the information generated by PATH can be used to implement (i) adaptive page replacement policies, (ii) smart process memory allocation to improve performance or to provide isolation and better process prioritization, and (iii) effective prefetching of virtual memory pages when applications have non-trivial memory access patterns. Our simulation results show that these algorithms can dramatically improve performance (up to 500%) with PATH-provided information, especially when the system is under memory pressure. We show that the software overhead of processing PATH information is less than 6% across the applications we examined (less than 3% in all but two applications), at least an order of magnitude less than the overhead of the software-based tracking approach.
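The software mechanism PATH is compared against can be sketched in a few lines: map pages with no permissions so that each first touch raises a soft fault, record the faulting page in the handler, and restore access. The sketch below is a deliberately simplified, single-threaded illustration with no re-protection policy.

```c
#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>

#define PAGE   4096UL
#define NPAGES 16

static void on_fault(int sig, siginfo_t *si, void *ctx)
{
    uintptr_t page = (uintptr_t)si->si_addr & ~(PAGE - 1);
    /* ... record the page access for the memory manager ... */
    mprotect((void *)page, PAGE, PROT_READ | PROT_WRITE);  /* retry succeeds */
}

int main(void)
{
    struct sigaction sa = { .sa_sigaction = on_fault, .sa_flags = SA_SIGINFO };
    sigaction(SIGSEGV, &sa, NULL);

    char *buf = mmap(NULL, NPAGES * PAGE, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    buf[3 * PAGE] = 1;   /* soft fault: handler records page 3, access retries */
    return 0;
}
```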