Publication


Featured research published by Matthew Arnold.


Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA) | 2000

Adaptive optimization in the Jalapeño JVM

Matthew Arnold; Stephen J. Fink; David Grove; Michael Hind; Peter F. Sweeney

Future high-performance virtual machines will improve performance through sophisticated online feedback-directed optimizations. This paper presents the architecture of the Jalapeño Adaptive Optimization System, a system to support leading-edge virtual machine technology and enable ongoing research on online feedback-directed optimizations. We describe the extensible system architecture, based on a federation of threads with asynchronous communication. We present an implementation of the general architecture that supports adaptive multi-level optimization based purely on statistical sampling. We empirically demonstrate that this profiling technique has low overhead and can improve startup and steady-state performance, even without the presence of online feedback-directed optimizations. The paper also describes and evaluates an online feedback-directed inlining optimization based on statistical edge sampling. The system is written completely in Java, applying the described techniques not only to application code and standard libraries, but also to the virtual machine itself.
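The sampling-driven, multi-threaded design described above can be illustrated with a minimal sketch (hypothetical class and method names, not the Jalapeño code): a timer-based sampler records per-method sample counts, and a separate controller thread asynchronously receives hot methods and decides to recompile them at a higher optimization level.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal sketch of a sampling-driven adaptive optimization controller
// (hypothetical names; the real system is far richer).
public class AdaptiveController implements Runnable {
    // Sample counts recorded by a timer-based sampler thread.
    private final Map<String, Integer> sampleCounts = new ConcurrentHashMap<>();
    // Asynchronous hand-off between sampler and controller, in the spirit of
    // a "federation of threads" with asynchronous communication.
    private final BlockingQueue<String> hotMethodQueue = new LinkedBlockingQueue<>();
    private static final int PROMOTION_THRESHOLD = 50;

    // Called by the sampling thread on each timer tick for the method
    // observed at the top of the stack.
    public void recordSample(String methodId) {
        int count = sampleCounts.merge(methodId, 1, Integer::sum);
        if (count == PROMOTION_THRESHOLD) {
            hotMethodQueue.offer(methodId); // hand off asynchronously
        }
    }

    // Controller thread: reacts to hot methods as they are reported.
    @Override
    public void run() {
        try {
            while (true) {
                String method = hotMethodQueue.take();
                System.out.println("Recompiling " + method + " at a higher optimization level");
                // A real system would invoke the optimizing compiler here and
                // install the new code (e.g. via call patching or on-stack replacement).
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```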


Conference on Programming Language Design and Implementation (PLDI) | 2001

A framework for reducing the cost of instrumented code

Matthew Arnold; Barbara G. Ryder

Instrumenting code to collect profiling information can cause substantial execution overhead. This overhead makes instrumentation difficult to perform at runtime, often preventing many known offline feedback-directed optimizations from being used in online systems. This paper presents a general framework for performing instrumentation sampling to reduce the overhead of previously expensive instrumentation. The framework is simple and effective, using code-duplication and counter-based sampling to allow switching between instrumented and non-instrumented code. Our framework does not rely on any hardware or operating system support, yet provides a high frequency sample rate that is tunable, allowing the tradeoff between overhead and accuracy to be adjusted easily at runtime. Experimental results are presented to validate that our technique can collect accurate profiles (93-98% overlap with a perfect profile) with low overhead (averaging 6% total overhead with a naive implementation). A Jalapeño-specific optimization is also presented that reduces overhead further, resulting in an average total overhead of 3%.
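A minimal sketch of the counter-based sampling idea (hypothetical names, and hand-written rather than compiler-generated as in the paper): the cheap "checking" path only decrements a counter, and the fully instrumented "duplicated" path runs once per sampling interval.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of counter-based instrumentation sampling via code
// duplication (hypothetical names; the framework described in the paper
// performs this transformation automatically in the compiler).
public class SampledProfiler {
    private static final int SAMPLE_INTERVAL = 1000; // tunable at runtime
    private static int countdown = SAMPLE_INTERVAL;
    private static final Map<String, Long> edgeCounts = new HashMap<>();

    // "Checking" version: the common, cheap path whose only cost is
    // decrementing and testing a counter.
    public static void onEdge(String edgeId) {
        if (--countdown <= 0) {
            countdown = SAMPLE_INTERVAL;
            onEdgeInstrumented(edgeId); // rarely taken duplicated path
        }
    }

    // "Duplicated" version: full instrumentation, executed once per sample.
    private static void onEdgeInstrumented(String edgeId) {
        edgeCounts.merge(edgeId, 1L, Long::sum);
    }

    public static Map<String, Long> snapshot() {
        return new HashMap<>(edgeCounts);
    }
}
```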


Proceedings of the IEEE | 2005

A Survey of Adaptive Optimization in Virtual Machines

Matthew Arnold; Stephen J. Fink; David Grove; Michael Hind; Peter F. Sweeney

Virtual machines face significant performance challenges beyond those confronted by traditional static optimizers. First, portable program representations and dynamic language features, such as dynamic class loading, force the deferral of most optimizations until runtime, inducing runtime optimization overhead. Second, modular program representations preclude many forms of whole-program interprocedural optimization. Third, virtual machines incur additional costs for runtime services such as security guarantees and automatic memory management. To address these challenges, vendors have invested considerable resources into adaptive optimization systems in production virtual machines. Today, mainstream virtual machine implementations include substantial infrastructure for online monitoring and profiling, runtime compilation, and feedback-directed optimization. As a result, adaptive optimization has begun to mature as a widespread production-level technology. This paper surveys the evolution and current state of adaptive optimization technology in virtual machines.


Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA) | 2002

Online feedback-directed optimization of Java

Matthew Arnold; Michael Hind; Barbara G. Ryder

This paper describes the implementation of an online feedback-directed optimization system. The system is fully automatic; it requires no prior (offline) profiling run. It uses a previously developed low-overhead instrumentation sampling framework to collect control flow graph edge profiles. This profile information is used to drive several traditional optimizations, as well as a novel algorithm for performing feedback-directed control flow graph node splitting. We empirically evaluate this system and demonstrate improvements in peak performance of up to 17% while keeping overhead low, with no individual execution being degraded by more than 2% because of instrumentation.
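As a rough illustration of how an edge profile can drive node splitting, the sketch below (hypothetical CFG representation, not the paper's algorithm) flags merge blocks whose incoming flow is dominated by a single hot predecessor; such blocks are natural candidates for duplication so the hot path gets a specialized copy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch: use CFG edge profiles to find merge blocks dominated by
// one incoming edge -- candidates for feedback-directed node splitting.
public class SplitCandidateFinder {
    public record Edge(String from, String to, long count) {}

    public static List<String> findCandidates(List<Edge> edges, double biasThreshold) {
        Map<String, Long> totalIn = new HashMap<>();
        Map<String, Long> maxIn = new HashMap<>();
        Map<String, Integer> predCount = new HashMap<>();
        for (Edge e : edges) {
            totalIn.merge(e.to(), e.count(), Long::sum);
            maxIn.merge(e.to(), e.count(), Long::max);
            predCount.merge(e.to(), 1, Integer::sum);
        }
        List<String> candidates = new ArrayList<>();
        for (String block : totalIn.keySet()) {
            long total = totalIn.get(block);
            // Only true merges (two or more predecessors) with a heavily
            // biased incoming edge are worth splitting.
            if (predCount.get(block) >= 2 && total > 0
                    && (double) maxIn.get(block) / total >= biasThreshold) {
                candidates.add(block);
            }
        }
        return candidates;
    }
}
```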


Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA) | 2008

QVM: an efficient runtime for detecting defects in deployed systems

Matthew Arnold; Martin T. Vechev; Eran Yahav

Coping with software defects that occur in the post-deployment stage is a challenging problem: bugs may occur only when the system uses a specific configuration and only under certain usage scenarios. Nevertheless, halting production systems until the bug is tracked down and fixed is often impossible. Thus, developers have to try to reproduce the bug in laboratory conditions; often, reproducing the bug accounts for the lion's share of the debugging effort. In this paper we suggest an approach to address this problem using a specialized runtime environment (QVM, for Quality Virtual Machine). QVM efficiently detects defects by continuously monitoring the execution of the application in a production setting. QVM enables efficient checking of violations of user-specified correctness properties, e.g., typestate safety properties, Java assertions, and heap properties pertaining to ownership. QVM differs markedly from existing techniques for continuous monitoring in its use of a novel overhead manager, which enforces a user-specified overhead budget for quality checks. Existing tools for error detection in the field usually disrupt the operation of the deployed system; QVM, on the other hand, provides a balanced trade-off between the cost of the monitoring process and sufficient accuracy for detecting defects. Specifically, the overhead cost of using QVM instead of a standard JVM is low enough to be acceptable in production environments. We implemented QVM on top of IBM's J9 Java Virtual Machine and used it to detect and fix various errors in real-world applications.
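The overhead-budget idea can be sketched in a few lines (hypothetical names; QVM's actual overhead manager adapts sampling rates per allocation site and is considerably more sophisticated): measure the time spent inside checks, and skip a check whenever that time would exceed the user's budget as a fraction of total elapsed time.

```java
// Minimal sketch of an overhead manager that keeps time spent in correctness
// checks under a user-specified budget (hypothetical names).
public class OverheadManager {
    private final double budget;            // e.g. 0.05 == at most 5% overhead
    private long checkNanos = 0;             // time spent inside checks so far
    private final long startNanos = System.nanoTime();

    public OverheadManager(double budget) {
        this.budget = budget;
    }

    // Decide whether we can afford to run the next check right now.
    public synchronized boolean canAffordCheck() {
        long elapsed = System.nanoTime() - startNanos;
        return elapsed > 0 && (double) checkNanos / elapsed < budget;
    }

    // Run a property check (e.g. a typestate or heap-property check) only if
    // the budget allows, and charge its cost against the budget.
    public void maybeCheck(Runnable propertyCheck) {
        if (!canAffordCheck()) {
            return;                           // skip: over budget
        }
        long t0 = System.nanoTime();
        try {
            propertyCheck.run();
        } finally {
            synchronized (this) {
                checkNanos += System.nanoTime() - t0;
            }
        }
    }
}
```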


Conference on Programming Language Design and Implementation (PLDI) | 2010

Finding low-utility data structures

Guoqing Xu; Nick Mitchell; Matthew Arnold; Atanas Rountev; Edith Schonberg; Gary Sevitsky

Many opportunities for easy, big-win program optimizations are missed by compilers. This is especially true in highly layered Java applications. Often at the heart of these missed optimization opportunities lie computations that, with great expense, produce data values that have little impact on the program's final output. Constructing a new date formatter to format every date, or populating a large set full of expensively constructed structures only to check its size: these involve costs that are out of line with the benefits gained. This disparity between the formation costs and accrued benefits of data structures is at the heart of much runtime bloat. We introduce a run-time analysis to discover these low-utility data structures. The analysis employs dynamic thin slicing, which naturally associates costs with value flows rather than raw data flows. It constructs a model of the incremental, hop-to-hop costs and benefits of each data structure. The analysis then identifies suspicious structures based on imbalances between their incremental costs and benefits. To decrease the memory requirements of slicing, we introduce abstract dynamic thin slicing, which performs thin slicing over bounded abstract domains. We have modified the IBM J9 commercial JVM to implement this approach. We demonstrate two client analyses: one that finds objects that are expensive to construct but are not necessary for the forward execution, and a second that pinpoints ultimately-dead values. We have successfully applied them to large-scale and long-running Java applications. We show that these analyses are effective at detecting operations that have unbalanced costs and benefits.
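The cost/benefit bookkeeping at the core of this idea can be sketched as follows (hypothetical names; the paper derives the numbers with abstract dynamic thin slicing rather than explicit counters): charge each structure for the work flowing into it, credit it for the values flowing out toward program output, and flag large imbalances.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of cost/benefit tracking for data structures
// (hypothetical names and units).
public class UtilityTracker {
    private final Map<String, long[]> costBenefit = new HashMap<>(); // {cost, benefit}

    public void addCost(String structureId, long units)    { get(structureId)[0] += units; }
    public void addBenefit(String structureId, long units) { get(structureId)[1] += units; }

    private long[] get(String id) {
        return costBenefit.computeIfAbsent(id, k -> new long[2]);
    }

    // Report structures whose construction cost dwarfs the benefit drawn
    // from them -- candidates for the "low-utility" label.
    public void reportSuspicious(double ratioThreshold) {
        costBenefit.forEach((id, cb) -> {
            long cost = cb[0];
            long benefit = Math.max(cb[1], 1); // avoid division by zero
            if ((double) cost / benefit >= ratioThreshold) {
                System.out.printf("%s: cost=%d benefit=%d%n", id, cost, cb[1]);
            }
        });
    }
}
```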


Conference on Programming Language Design and Implementation (PLDI) | 2009

Go with the flow: profiling copies to find runtime bloat

Guoqing Xu; Matthew Arnold; Nick Mitchell; Atanas Rountev; Gary Sevitsky

Many large-scale Java applications suffer from runtime bloat. They execute large volumes of methods, and create many temporary objects, all to execute relatively simple operations. There are large opportunities for performance optimizations in these applications, but most are being missed by existing optimization and tooling technology. While JIT optimizations struggle for a few percent, performance experts analyze deployed applications and regularly find gains of 2x or more. Finding such big gains is difficult, for both humans and compilers, because of the diffuse nature of runtime bloat. Time is spread thinly across calling contexts, making it difficult to judge how to improve performance. Bloat results from a pile-up of seemingly harmless decisions. Each adds temporary objects and method calls, and often copies values between those temporary objects. While data copies are not the entirety of bloat, we have observed that they are excellent indicators of regions of excessive activity. By optimizing copies, one is likely to remove the objects that carry copied values, and the method calls that allocate and populate them. We introduce copy profiling, a technique that summarizes runtime activity in terms of chains of data copies. A flat copy profile counts copies by method. We show how flat profiles alone can be helpful. In many cases, diagnosing a problem requires data flow context. Tracking and making sense of raw copy chains does not scale, so we introduce a summarizing abstraction called the copy graph. We implement three client analyses that, using the copy graph, expose common patterns of bloat, such as finding hot copy chains and discovering temporary data structures. We demonstrate, with examples from a large-scale commercial application and several benchmarks, that copy profiling can be used by a programmer to quickly find opportunities for large performance gains.
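A flat copy profile is conceptually simple; a minimal sketch (hypothetical names, with the instrumentation hook left abstract) counts observed value copies per method and reports the heaviest copiers. The paper's copy-graph abstraction goes further by linking copies into chains to expose temporary data structures.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of a flat copy profile (hypothetical names).
public class FlatCopyProfile {
    private final Map<String, Long> copiesPerMethod = new ConcurrentHashMap<>();

    // Called by (hypothetical) instrumentation whenever a value is copied,
    // e.g. field-to-field or through a constructor argument.
    public void recordCopy(String method) {
        copiesPerMethod.merge(method, 1L, Long::sum);
    }

    // Print the n methods performing the most copies.
    public void printTop(int n) {
        copiesPerMethod.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .limit(n)
            .forEach(e -> System.out.println(e.getKey() + ": " + e.getValue() + " copies"));
    }
}
```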


SIGPLAN Notices | 2000

A comparative study of static and profile-based heuristics for inlining

Matthew Arnold; Stephen J. Fink; Vivek Sarkar; Peter F. Sweeney

In this paper, we present a comparative study of static and profile-based heuristics for inlining. Our motivation for this study is to use the results to design the best inlining algorithm that we can for the Jalapeño dynamic optimizing compiler for Java [6]. We use a well-known approximation algorithm for the KNAPSACK problem as a common “meta-algorithm” for the inlining heuristics studied in this paper. We present performance results for an implementation of these inlining heuristics in the Jalapeño dynamic optimizing compiler. Our performance results show that the inlining heuristics studied in this paper can lead to significant speedups in execution time (up to 1.68x) even with modest limits on code size expansion (at most 10%).
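A minimal sketch of the KNAPSACK-style "meta-algorithm" (hypothetical names; the paper's heuristics differ in how they estimate benefit, e.g. statically versus from profiles): treat each call site as an item with an estimated benefit and a code-size cost, sort by benefit-to-cost ratio, and inline greedily until a code-size expansion budget is exhausted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Minimal sketch of a greedy knapsack-style inlining selector
// (hypothetical names and cost model).
public class GreedyInliner {
    public record Candidate(String callSite, double benefit, int sizeCost) {}

    public static List<Candidate> select(List<Candidate> candidates, int sizeBudget) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        // Highest benefit per unit of code-size cost first.
        sorted.sort(Comparator.comparingDouble(
                (Candidate c) -> c.benefit() / c.sizeCost()).reversed());
        List<Candidate> chosen = new ArrayList<>();
        int used = 0;
        for (Candidate c : sorted) {
            if (used + c.sizeCost() <= sizeBudget) {
                chosen.add(c);
                used += c.sizeCost();
            }
        }
        return chosen; // call sites to inline within the budget
    }
}
```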


Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA) | 2008

Jolt: lightweight dynamic analysis and removal of object churn

Ajeet Shankar; Matthew Arnold; Rastislav Bodik

It has been observed that component-based applications exhibit object churn, the excessive creation of short-lived objects, often caused by trading performance for modularity. Because churned objects are short-lived, they appear to be good candidates for stack allocation. Unfortunately, most churned objects escape their allocating function, making escape analysis ineffective. We reduce object churn with three contributions. First, we formalize two measures of churn, capture and control (15). Second, we develop lightweight dynamic analyses for measuring both capture and control. Third, we develop an algorithm that uses capture and control to inline portions of the call graph to make churned objects non-escaping, enabling churn optimization via escape analysis. JOLT is a lightweight dynamic churn optimizer that uses our algorithms. We embedded JOLT in the JIT compiler of the IBM J9 commercial JVM, and evaluated JOLT on large application frameworks, including Eclipse and JBoss. We found that JOLT eliminates over 4 times as many allocations as a state-of-the-art escape analysis alone.
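A tiny illustration (hypothetical code, not from the paper) of the churn pattern Jolt targets: a short-lived object that escapes its allocating method only by being returned to an equally short-lived context. Per-method escape analysis cannot stack-allocate it, but once the call chain is inlined into the caller, the object no longer escapes and the allocation can be eliminated.

```java
// Hypothetical example of object churn.
public class ChurnExample {
    record Point(double x, double y) {}

    // The allocation escapes via the return value, defeating per-method
    // escape analysis.
    static Point midpoint(Point a, Point b) {
        return new Point((a.x() + b.x()) / 2, (a.y() + b.y()) / 2);
    }

    // The temporary Point dies immediately after use; after inlining
    // midpoint() here, it becomes non-escaping and stack-allocatable.
    static double distanceToMidpoint(Point a, Point b) {
        Point m = midpoint(a, b);
        double dx = m.x() - a.x(), dy = m.y() - a.y();
        return Math.sqrt(dx * dx + dy * dy);
    }
}
```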


Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA) | 2010

Performance analysis of idle programs

Erik R. Altman; Matthew Arnold; Stephen J. Fink; Nick Mitchell

This paper presents an approach for performance analysis of modern enterprise-class server applications. In our experience, performance bottlenecks in these applications differ qualitatively from bottlenecks in smaller, stand-alone systems. Small applications and benchmarks often suffer from CPU-intensive hot spots. In contrast, enterprise-class multi-tier applications often suffer from problems that manifest not as hot spots, but as idle time, indicating a lack of forward motion. Many factors can contribute to undesirable idle time, including locking problems, excessive system-level activities like garbage collection, various resource constraints, and problems driving load. We present the design and methodology for WAIT, a tool to diagnose the root cause of idle time in server applications. Given lightweight samples of Java activity on a single tier, the tool can often pinpoint the primary bottleneck on a multi-tier system. The methodology centers on an informative abstraction of the states of idleness observed in a running program. This abstraction allows the tool to distinguish, for example, between hold-ups on a database machine, insufficient load, lock contention in application code, and a conventional bottleneck due to a hot method. To compute the abstraction, we present a simple expert system based on an extensible set of declarative rules. WAIT can be deployed on the fly, without modifying or even restarting the application. Many groups in IBM have applied the tool to diagnose performance problems in commercial systems, and we present a number of examples as case studies.
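The flavor of a rule-based idle-state abstraction can be sketched as follows (hypothetical rules and categories, not WAIT's actual rule base): each sampled thread stack is labeled by the first matching rule, checked in order, and unmatched samples fall through to a default "running" category.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Minimal sketch of rule-based classification of sampled thread stacks
// into idleness categories (hypothetical rules).
public class IdleStateClassifier {
    private final Map<String, Predicate<List<String>>> rules = new LinkedHashMap<>();

    public IdleStateClassifier() {
        rules.put("waiting on database",
                stack -> stack.stream().anyMatch(f -> f.contains("java.sql")));
        rules.put("lock contention",
                stack -> stack.stream().anyMatch(f -> f.contains("Unsafe.park")));
        rules.put("idle / insufficient load",
                stack -> stack.stream().anyMatch(f -> f.contains("ServerSocket.accept")));
    }

    // Classify one sampled stack trace (frame names, outermost first);
    // the first matching rule wins.
    public String classify(List<String> stackFrames) {
        return rules.entrySet().stream()
                .filter(r -> r.getValue().test(stackFrames))
                .map(Map.Entry::getKey)
                .findFirst()
                .orElse("running (possible CPU hot spot)");
    }
}
```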
