Bjarne Steensgaard | Researchain

Publication

Featured research published by Bjarne Steensgaard.

Software - Practice and Experience | 2000

Marmot: an optimizing compiler for Java

Robert P. Fitzgerald; Todd B. Knoblock; Erik Ruf; Bjarne Steensgaard; David Tarditi

The Marmot system is a research platform for studying the implementation of high-level programming languages. It currently comprises an optimizing native-code compiler, runtime system, and libraries for a large subset of Java. Marmot integrates well-known representation, optimization, code generation, and runtime techniques with a few Java-specific features to achieve competitive performance. This paper contains a description of the Marmot system design, along with highlights of our experience applying and adapting traditional implementation techniques to Java. A detailed performance evaluation assesses both Marmot's overall performance relative to other Java and C++ implementations, and the relative costs of various Java language features in Marmot-compiled code. Our experience with Marmot has demonstrated that well-known compilation techniques can produce very good performance for static Java applications: comparable or superior to other Java systems, and approaching that of C++ in some cases.

symposium on principles of programming languages | 1994

Value dependence graphs: representation without taxation

Daniel Weise; Roger F. Crew; Michael D. Ernst; Bjarne Steensgaard

The value dependence graph (VDG) is a sparse dataflow-like representation that simplifies program analysis and transformation. It is a functional representation that represents control flow as data flow and makes explicit all machine quantities, such as stores and I/O channels. We are developing a compiler that builds a VDG representing a program, analyzes and transforms the VDG, then produces a control flow graph (CFG) [ASU86] from the optimized VDG. This framework simplifies transformations and improves upon several published results. For example, it enables more powerful code motion than [CLZ86, FOW87], eliminates as many redundancies as [AWZ88, RWZ88] (except for redundant loops), and provides important information to the code scheduler [BR91]. We exhibit a fast, one-pass method for elimination of partial redundancies that never performs redundant code motion [KFS92, DS93] and is simpler than the classical [MR79, Dha91] or SSA [RWZ88] methods. These results accrue from eliminating the CFG from the analysis/transformation phases and using demand dependences in preference to control dependences.

compiler construction | 2000

Fast Escape Analysis and Stack Allocation for Object-Based Programs

Bjarne Steensgaard

A fast and scalable interprocedural escape analysis algorithm is presented. The analysis computes a description of a subset of created objects whose lifetime is bounded by the lifetime of a runtime stack frame. The analysis results can be used for many purposes, including stack allocation of objects, thread synchronization elimination, dead-store removal, code motion, and iterator reduction. A method to use the analysis results for transforming a program to allocate some objects on the runtime stack is also presented. For non-trivial programs, typically 10%-20% of all allocated objects are placed on the runtime stack after the transformation.
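
The idea of an object whose lifetime is bounded by a stack frame can be illustrated with a toy reachability computation. The sketch below is not the algorithm from the paper. It assumes a hypothetical input format: a set of allocation sites, a list of (container, value) store facts, and a set of sites already known to escape (for example, because they are returned or stored in a global). All function and parameter names are invented for illustration.

```python
from collections import defaultdict


def escaping_sites(stores, escaping_roots):
    """Return every allocation site that may outlive the current frame.

    stores: iterable of (container, value) pairs, meaning an object from
        site `value` is stored into a field of an object from `container`.
    escaping_roots: sites assumed to escape directly (hypothetical input).
    """
    stored_in = defaultdict(set)
    for container, value in stores:
        stored_in[container].add(value)

    escaped = set(escaping_roots)
    worklist = list(escaped)
    while worklist:
        site = worklist.pop()
        # Anything stored inside an escaping object escapes as well.
        for value in stored_in[site]:
            if value not in escaped:
                escaped.add(value)
                worklist.append(value)
    return escaped


def stack_allocatable(all_sites, stores, escaping_roots):
    """Sites that never escape are candidates for stack allocation."""
    return set(all_sites) - escaping_sites(stores, escaping_roots)
```

With sites A, B, C, D, stores A.f = B and C.f = D, and A returned to the caller, this sketch marks A and B as escaping and leaves C and D as stack-allocation candidates.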

compiler construction | 1996

Points-to Analysis by Type Inference of Programs with Structures and Unions

Bjarne Steensgaard

We present an interprocedural flow-insensitive points-to analysis algorithm based on monomorphic type inference. The source language models the important features of C including pointers, pointer arithmetic, pointers to functions, structured objects, and unions. The algorithm is based on a non-standard type system where types represent nodes and edges in a storage shape graph.
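
A heavily simplified sketch of a unification-based, flow-insensitive points-to analysis may help make the storage shape graph concrete. It assumes a tiny language with only four statement forms (x = &y, x = y, x = *y, *x = y), unifies unconditionally, and ignores structures, unions, pointer arithmetic, and functions, so it is not the algorithm from the paper. The class and method names are invented for illustration.

```python
class PointsTo:
    """Each equivalence class of locations has at most one points-to target."""

    def __init__(self):
        self.parent = {}   # union-find parent links
        self.target = {}   # class representative -> node it points to
        self._fresh = 0

    def find(self, node):
        self.parent.setdefault(node, node)
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]  # path halving
            node = self.parent[node]
        return node

    def pts(self, node):
        """Class pointed to by `node`, created on demand."""
        rep = self.find(node)
        if rep not in self.target:
            self._fresh += 1
            self.target[rep] = ("tmp", self._fresh)
        return self.find(self.target[rep])

    def join(self, a, b):
        """Unify two classes and, recursively, their points-to targets."""
        a, b = self.find(a), self.find(b)
        if a == b:
            return
        ta = self.target.pop(a, None)
        tb = self.target.pop(b, None)
        self.parent[a] = b
        if ta is not None and tb is not None:
            self.target[b] = tb
            self.join(ta, tb)
        elif ta is not None or tb is not None:
            self.target[b] = ta if ta is not None else tb

    def address_of(self, x, y):   # x = &y
        self.join(self.pts(x), y)

    def assign(self, x, y):       # x = y
        self.join(self.pts(x), self.pts(y))

    def load(self, x, y):         # x = *y
        self.join(self.pts(x), self.pts(self.pts(y)))

    def store(self, x, y):        # *x = y
        self.join(self.pts(self.pts(x)), self.pts(y))

    def may_alias(self, p, q):
        """True if pointers p and q may point into the same class."""
        return self.pts(p) == self.pts(q)
```

Because statements are processed in any order and each one only merges classes, the result is flow-insensitive: after a = &x, b = &y, a = b, the locations x and y fall into one class, and a and b are reported as possible aliases.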

european conference on computer systems | 2007

Sealing OS processes to improve dependability and safety

Galen C. Hunt; Mark Aiken; Manuel Fähndrich; Chris Hawblitzel; Orion Hodson; James R. Larus; Steven P. Levi; Bjarne Steensgaard; David Tarditi; Ted Wobber

In most modern operating systems, a process is a hardware-protected abstraction for isolating code and data. This protection, however, is selective. Many common mechanisms (dynamic code loading, run-time code generation, shared memory, and intrusive system APIs) make the barrier between processes very permeable. This paper argues that this traditional open process architecture exacerbates the dependability and security weaknesses of modern systems. As a remedy, this paper proposes a sealed process architecture, which prohibits dynamic code loading, self-modifying code, and shared memory, and limits the scope of the process API. This paper describes the implementation of the sealed process architecture in the Singularity operating system, discusses its merits and drawbacks, and evaluates its effectiveness. Some benefits of this sealed process architecture are: improved program analysis by tools, stronger security and safety guarantees, elimination of redundant overlaps between the OS and language runtimes, and improved software engineering. Conventional wisdom says open processes are required for performance; our experience suggests otherwise. We present the first macrobenchmarks for a sealed-process operating system and applications. The benchmarks show that an experimental sealed-process system can achieve performance competitive with highly-tuned, commercial, open-process systems.

international symposium on memory management | 2000

Thread-specific heaps for multi-threaded programs

Bjarne Steensgaard

Garbage collection for a multi-threaded program typically involves either stopping all threads while doing the collection or copious amounts of synchronization between threads. However, a lot of data is only ever visible to a single thread, and such data should ideally be collected without involving other threads. Given an escape analysis, a memory management system may allocate thread-specific data in thread-specific heaps and allocate shared data in a shared heap. Garbage collection of data in a thread-specific heap can be done independently of other threads and of data in their thread-specific heaps. For multi-threaded programs, thread-specific heaps allow reduced garbage collection latency for active threads. On multi-processor computers, thread-specific heaps allow concurrent garbage collection of different thread-specific heaps with minimal synchronization overhead. We present an escape analysis and a sample memory management system using thread-specific heaps.

international symposium on memory management | 2007

Stopless: a real-time garbage collector for multiprocessors

Filip Pizlo; Daniel Frampton; Erez Petrank; Bjarne Steensgaard

We present Stopless: a concurrent real-time garbage collector suitable for modern multiprocessors running parallel multithreaded applications. Creating a garbage-collected environment that supports real-time on modern platforms is notoriously hard, especially if real-time implies lock-freedom. Known real-time collectors either restrict the real-time guarantees to uniprocessors only, rely on special hardware, or just give up supporting atomic operations (which are crucial for lock-free software). Stopless is the first collector that provides real-time responsiveness while preserving lock-freedom, supporting atomic operations, controlling fragmentation by compaction, and supporting modern parallel platforms. Stopless is adequate for modern languages such as C# or Java. It was implemented on top of the Bartok compiler and runtime for C#, and measurements demonstrate high responsiveness (a factor of 100 better than previously published systems), virtually no pause times, good mutator utilization, and acceptable overheads.

programming language design and implementation | 2008

A study of concurrent real-time garbage collectors

Filip Pizlo; Erez Petrank; Bjarne Steensgaard

Concurrent garbage collection is highly attractive for real-time systems, because offloading the collection effort from the executing threads allows faster response and extremely short deadlines at the microsecond level. Concurrent collectors also offer much better scalability than incremental collectors. The main problem with concurrent real-time collectors is their complexity. The first concurrent real-time garbage collector that can support fine synchronization, STOPLESS, has recently been presented by Pizlo et al. In this paper, we propose two additional (and different) algorithms for concurrent real-time garbage collection: CLOVER and CHICKEN. Both collectors obtain reduced complexity over the first collector STOPLESS, but need to trade a benefit for it. We study the algorithmic strengths and weaknesses of CLOVER and CHICKEN and compare them to STOPLESS. Finally, we have implemented all three collectors on the Bartok compiler and runtime for C# and we present measurements to compare their efficiency and responsiveness.

Sigplan Notices | 1995

Sparse functional stores for imperative programs

Bjarne Steensgaard

In recent years, the trend in program representations for imperative programs has been to make them more functional, or to make them more sparse. However, new sparse representations have been non-functional, and new functional representations have not been sparse in the presence of pointer operations. In this paper, we present a functional representation that is sparse even in the presence of pointer operations. Conventionally, a store is represented in a functional program representation by a single object—typically a mapping from locations to values. We show how such a store object may be fragmented into several objects, each representing part of the store. The result is a sparser representation, which has not only the usual benefit of directly linking producers to consumers, but which also for static program analysis often leads to smaller domains of abstract values for store objects. Store fragmentation corresponds to assignment factored SSA form (a factorization of SSA form introduced in this paper). We report on experiments with a thorough fragmentation based on a data flow points-to analysis and an intermediate level fragmentation based on an almost linear time complexity points-to analysis by type inference.

virtual execution environments | 2009

A lock-free, concurrent, and incremental stack scanning for garbage collectors

Gabriel Kliot; Erez Petrank; Bjarne Steensgaard

Two major efficiency parameters for garbage collectors are the throughput overheads and the pause times that they introduce. Highly responsive systems need to use collectors with pause times that are as short as possible. Pause lengths have decreased significantly over the years, especially through the use of concurrent garbage collectors. For modern concurrent collectors, the longest pause is typically created by the need to atomically scan the runtime stack. All practical concurrent collectors that we are aware of must obtain a snapshot of the pointers on each thread's runtime stack in order to reclaim objects correctly. To further reduce the length of the collector pauses, incremental stack scans were proposed. However, previous such methods employ locks to stop the mutator from accessing a stack frame while it is being scanned. Thus, these methods introduce potentially long and unpredictable pauses for a mutator thread. In this work we propose the first concurrent, incremental, and lock-free stack scanning for garbage collectors, allowing high responsiveness and support for programs that employ fine-synchronization to avoid locks. Our solution can be employed by all concurrent collectors that we are aware of; it is lock-free, it imposes a negligible overhead on the program execution, and it supports the special in-stack references existing in languages like C#.