
Publication


Featured research published by David Tarditi.


Architectural Support for Programming Languages and Operating Systems | 2006

Accelerator: using data parallelism to program GPUs for general-purpose uses

David Tarditi; Sidd Puri; Jose Oglesby

GPUs are difficult to program for general-purpose uses. Programmers can either learn graphics APIs and convert their applications to use graphics pipeline operations or they can use stream programming abstractions of GPUs. We describe Accelerator, a system that uses data parallelism to program GPUs for general-purpose uses instead. Programmers use a conventional imperative programming language and a library that provides only high-level data-parallel operations. No aspects of GPUs are exposed to programmers. The library implementation compiles the data-parallel operations on the fly to optimized GPU pixel shader code and API calls. We describe the compilation techniques used to do this. We evaluate the effectiveness of using data parallelism to program GPUs by providing results for a set of compute-intensive benchmarks. We compare the performance of Accelerator versions of the benchmarks against hand-written pixel shaders. The speeds of the Accelerator versions are typically within 50% of the speeds of hand-written pixel shader code. Some benchmarks significantly outperform C versions on a CPU: they are up to 18 times faster than C code running on a CPU.
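To make the programming model concrete, here is a minimal C sketch of the kind of whole-array, data-parallel interface the abstract describes: the programmer writes only element-wise operations and the library owns the loops (which Accelerator would compile on the fly to pixel-shader code). The ParallelArray type and pa_* functions are hypothetical illustrations, not the actual Accelerator API, and the sketch evaluates eagerly on the CPU rather than generating GPU code.

```c
/* Illustrative sketch only: a whole-array, data-parallel API in the spirit of
 * the abstract.  Names (ParallelArray, pa_*) are hypothetical.  A real system
 * would build an expression graph and JIT-compile it to GPU shader code; here
 * we evaluate eagerly on the CPU just to show the programming model. */
#include <stdlib.h>

typedef struct {
    float *data;
    size_t n;
} ParallelArray;

static ParallelArray pa_new(size_t n) {
    ParallelArray a = { malloc(n * sizeof(float)), n };
    return a;
}

/* Element-wise add: the programmer sees only a whole-array operation. */
static ParallelArray pa_add(ParallelArray x, ParallelArray y) {
    ParallelArray r = pa_new(x.n);
    for (size_t i = 0; i < x.n; i++)   /* the library, not the user, writes this loop */
        r.data[i] = x.data[i] + y.data[i];
    return r;
}

/* Element-wise scale by a constant. */
static ParallelArray pa_scale(ParallelArray x, float k) {
    ParallelArray r = pa_new(x.n);
    for (size_t i = 0; i < x.n; i++)
        r.data[i] = k * x.data[i];
    return r;
}

/* User code: 2*a + b, expressed with no loops, indices, or GPU concepts. */
ParallelArray saxpy_like(ParallelArray a, ParallelArray b) {
    return pa_add(pa_scale(a, 2.0f), b);
}
```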


Programming Language Design and Implementation | 1996

TIL: a type-directed optimizing compiler for ML

David Tarditi; J. Gregory Morrisett; Perry Cheng; Christopher A. Stone; Robert Harper; Peter Lee

The goal of the TIL project was to explore the use of Typed Intermediate Languages to produce high-performance native code from Standard ML (SML). We believed that existing SML compilers were doing a good job of conventional functional language optimizations, as one might find in a LISP compiler, but that inadequate use was made of the rich type information present in the source language. Our goal was to show that we could get much greater performance by propagating type information through to the back end of the compiler, without sacrificing the advantages afforded by loop-oriented and other optimizations. We also confirmed that using typed intermediate languages dramatically improved the reliability and maintainability of the compiler itself. In particular, we were able to use the type system to express critical invariants, and enforce those invariants through type checking. In this respect, TIL introduced and popularized the notion of a certifying compiler, which attaches a checkable certificate of safety to its generated code. In turn, this led directly to the idea of certified object code, inspiring the development of Proof-Carrying Code and Typed Assembly Language as certified object code formats.


Software: Practice and Experience | 2000

Marmot: an optimizing compiler for Java

Robert P. Fitzgerald; Todd B. Knoblock; Erik Ruf; Bjarne Steensgaard; David Tarditi

The Marmot system is a research platform for studying the implementation of high-level programming languages. It currently comprises an optimizing native-code compiler, runtime system, and libraries for a large subset of Java. Marmot integrates well-known representation, optimization, code generation, and runtime techniques with a few Java-specific features to achieve competitive performance. This paper contains a description of the Marmot system design, along with highlights of our experience applying and adapting traditional implementation techniques to Java. A detailed performance evaluation assesses both Marmot's overall performance relative to other Java and C++ implementations, and the relative costs of various Java language features in Marmot-compiled code. Our experience with Marmot has demonstrated that well-known compilation techniques can produce very good performance for static Java applications, comparable or superior to other Java systems, and approaching that of C++ in some cases.


ACM Letters on Programming Languages and Systems | 1992

No assembly required: compiling Standard ML to C

David Tarditi; Peter Lee; Anurag Acharya

C has been used as a portable target language for implementing languages like Standard ML and Scheme. Previous efforts at compiling these languages to C have produced efficient code, but have compromised on portability and proper tail recursion. We show how to compile Standard ML to C without making such compromises. The compilation technique is based on converting Standard ML to a continuation-passing style λ-calculus intermediate language and then compiling this language to C. The code generated by this compiler achieves an execution speed that is about a factor of two slower than that generated by a native code compiler. The compiler generates highly portable code, yet still supports advanced features like garbage collection and first-class continuations. We analyze the performance and determine the aspects of the compilation method that lead to the observed slowdown. We also suggest changes to C compilers that would better support such compilation methods.
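The main obstacle the abstract mentions, compiling to C while preserving proper tail recursion, is commonly handled by having each compiled function return the next function to run to a small dispatch ("trampoline") loop, so tail calls never grow the C stack. The C sketch below illustrates that general technique; it is not necessarily the paper's exact code generation scheme, and all names are invented.

```c
/* Minimal trampoline sketch: each "compiled" function returns the next
 * function to run instead of calling it directly, so tail calls use constant
 * C stack.  This shows the general technique only, with invented names. */
#include <stdio.h>
#include <stddef.h>

typedef struct thunk {
    struct thunk (*fn)(void);   /* the next piece of code to run */
} thunk;

static long counter, limit;     /* stand-ins for the machine state */

static thunk done(void);
static thunk loop_step(void);

static thunk loop_step(void) {
    if (counter >= limit)
        return (thunk){ done };      /* "tail call": hand back the next function */
    counter++;
    return (thunk){ loop_step };     /* self tail call, no C stack growth */
}

static thunk done(void) {
    printf("counted to %ld\n", counter);
    return (thunk){ NULL };          /* NULL terminates the driver loop */
}

int main(void) {
    limit = 1000000;                 /* deep enough to break a naive C-stack encoding */
    counter = 0;
    thunk next = { loop_step };
    while (next.fn != NULL)          /* the dispatch ("trampoline") loop */
        next = next.fn();
    return 0;
}
```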


European Conference on Computer Systems | 2007

Sealing OS processes to improve dependability and safety

Galen C. Hunt; Mark Aiken; Manuel Fähndrich; Chris Hawblitzel; Orion Hodson; James R. Larus; Steven P. Levi; Bjarne Steensgaard; David Tarditi; Ted Wobber

In most modern operating systems, a process is a hardware-protected abstraction for isolating code and data. This protection, however, is selective. Many common mechanisms (dynamic code loading, run-time code generation, shared memory, and intrusive system APIs) make the barrier between processes very permeable. This paper argues that this traditional open process architecture exacerbates the dependability and security weaknesses of modern systems. As a remedy, this paper proposes a sealed process architecture, which prohibits dynamic code loading, self-modifying code, and shared memory, and limits the scope of the process API. This paper describes the implementation of the sealed process architecture in the Singularity operating system, discusses its merits and drawbacks, and evaluates its effectiveness. Some benefits of this sealed process architecture are: improved program analysis by tools, stronger security and safety guarantees, elimination of redundant overlaps between the OS and language runtimes, and improved software engineering. Conventional wisdom says open processes are required for performance; our experience suggests otherwise. We present the first macrobenchmarks for a sealed-process operating system and applications. The benchmarks show that an experimental sealed-process system can achieve performance competitive with highly-tuned, commercial, open-process systems.


ACM Transactions on Computer Systems | 1995

Memory system performance of programs with intensive heap allocation

Amer Diwan; David Tarditi; J. Eliot B. Moss

Heap allocation with copying garbage collection is a general storage management technique for programming languages. It is believed to have poor memory system performance. To investigate this, we conducted an in-depth study of the memory system performance of heap allocation for memory systems found on many machines. We studied the performance of mostly functional Standard ML programs which made heavy use of heap allocation. We found that most machines support heap allocation poorly. However, with the appropriate memory system organization, heap allocation can have good performance. The memory system property crucial for achieving good performance was the ability to allocate and initialize a new object into the cache without a penalty. This can be achieved by having subblock placement with a subblock size of one word and a write-allocate policy, along with fast page-mode writes or a write buffer. For caches with subblock placement, the data cache overhead was under 9% for a 64K or larger data cache; without subblock placement the overhead was often higher than 50%.
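A minimal sketch of why the allocation pattern matters: with copying collection, allocation is a pointer bump into fresh memory that is immediately initialized, so a cache with write-allocate and one-word subblock placement never has to fetch the stale contents of the newly touched lines. The allocator below is an illustrative stand-in (the studied programs were Standard ML, and collection itself is elided).

```c
/* Sketch of bump-pointer allocation as used with copying collection.
 * Every allocated word is written immediately, so with a write-allocate
 * cache and subblock placement the processor need not fetch the line's
 * stale contents from memory -- the property the study identifies as
 * crucial.  Names and sizes are illustrative. */
#include <stdint.h>
#include <stdlib.h>

#define HEAP_WORDS (1 << 20)
static uintptr_t heap[HEAP_WORDS];
static uintptr_t *heap_free  = heap;
static uintptr_t *heap_limit = heap + HEAP_WORDS;

/* Allocate and fully initialize a two-field cell. */
uintptr_t *alloc_pair(uintptr_t car, uintptr_t cdr) {
    if (heap_free + 2 > heap_limit)
        abort();                  /* stand-in for invoking the copying collector */
    uintptr_t *obj = heap_free;
    heap_free += 2;               /* allocation is just a pointer bump */
    obj[0] = car;                 /* initializing writes touch only fresh memory */
    obj[1] = cdr;
    return obj;
}
```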


International Symposium on Memory Management | 2000

The case for profile-directed selection of garbage collectors

Robert P. Fitzgerald; David Tarditi

Many garbage-collected systems use a single garbage collection algorithm across all applications. It has long been known that this can produce poor performance on applications for which that collector is not well suited. In some systems, such as those that execute stand-alone compiled executables, an appropriate collector for each application can be selected from a pool of available collectors and tuned by using profile information. In a study of 20 benchmarks and several collectors, compiled with the Marmot optimizing Java-to-native compiler, for every collector there was at least one benchmark that would have been at least 15% faster with a more appropriate collector. The collectors are a copying collector, a generational copying collector combined with each of four different write barriers, and the null collector, which allocates but never collects. A detailed analysis of storage management costs shows how they vary by application and collector.
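As a purely illustrative sketch of what profile-directed selection might look like, the fragment below picks one of the three collector families named in the abstract from two profiled quantities. The metrics, thresholds, and names are assumptions invented for illustration, not the paper's actual selection procedure.

```c
/* Illustrative sketch only: choose a collector for a compiled executable from
 * profile data.  Metrics and thresholds are invented; the paper's pool
 * contains a copying collector, a generational collector with several write
 * barriers, and a "null" collector that allocates but never collects. */
typedef enum { GC_NULL, GC_COPYING, GC_GENERATIONAL } gc_choice;

typedef struct {
    double mb_allocated;      /* total allocation over the profiled run */
    double mb_heap_budget;    /* heap the application is allowed to use */
    double survival_rate;     /* fraction of allocated data live at collections */
} profile;

gc_choice choose_collector(profile p) {
    if (p.mb_allocated <= p.mb_heap_budget)
        return GC_NULL;            /* everything fits: never collect */
    if (p.survival_rate < 0.10)
        return GC_GENERATIONAL;    /* mostly short-lived data: cheap minor collections */
    return GC_COPYING;             /* otherwise a simple copying collector */
}
```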


Symposium on Principles of Programming Languages | 1994

Memory subsystem performance of programs using copying garbage collection

Amer Diwan; David Tarditi; J. Eliot B. Moss

Heap allocation with copying garbage collection is believed to have poor memory subsystem performance. We conducted a study of the memory subsystem performance of heap allocation for memory subsystems found on many machines. We found that many machines support heap allocation poorly. However, with the appropriate memory subsystem organization, heap allocation can have good memory subsystem performance.


International Symposium on Memory Management | 2000

Compact garbage collection tables

David Tarditi

Garbage collection tables for finding pointers on the stack can be represented in 20-25% of the space previously reported. Live pointer information is often the same at many call sites because there are few pointers live across most call sites. This allows live pointer information to be represented compactly by a small index into a table of descriptions of pointer locations. The mapping from program counter values to those small indexes can be represented compactly using several techniques. The techniques assign numbers to call sites and use those numbers to index an array of small indexes. One technique is to represent an array of return addresses by using a two-level table with 16-bit offsets. Another technique is to use a sparse array of return addresses and interpolate the exact call-site number via disassembly of the executable code.
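A rough C sketch of the table organization the abstract describes: call sites share a small set of live-pointer descriptions, each site stores only a small index, and return addresses are encoded as a base address plus 16-bit offsets in a two-level table. The layout, field names, and example data below are invented for illustration and are not the paper's exact encoding.

```c
/* Illustrative compact GC table in the spirit of the abstract.  Many call
 * sites share a live-pointer description, so each site stores only a small
 * index; return addresses use a two-level (base + 16-bit offset) encoding. */
#include <stdint.h>
#include <stddef.h>

/* One description of which slots hold live pointers (shared by many sites). */
typedef struct {
    uint32_t stack_slot_bitmap;
    uint32_t reg_bitmap;
} ptr_desc;

/* Few distinct descriptions suffice, since few pointers are live across most calls. */
static const ptr_desc descs[] = {
    { 0x0, 0x0 },   /* no live pointers */
    { 0x1, 0x0 },   /* one live pointer in stack slot 0 */
    { 0x3, 0x2 },   /* slots 0-1 plus one callee-saved register */
};

/* Per call site: a small (here 8-bit) index into descs. */
static const uint8_t site_desc_idx[] = { 0, 1, 1, 2, 0 };

/* Two-level encoding of return addresses: each chunk has a base address and
 * sorted 16-bit offsets of its call sites. */
typedef struct {
    uintptr_t       base;
    const uint16_t *offsets;
    uint32_t        first_site;   /* global number of the chunk's first call site */
    uint32_t        count;
} chunk;

static const uint16_t chunk0_offsets[] = { 0x0010, 0x0044, 0x00a2 };
static const uint16_t chunk1_offsets[] = { 0x0008, 0x0120 };
static const chunk chunks[] = {
    { 0x401000, chunk0_offsets, 0, 3 },
    { 0x402000, chunk1_offsets, 3, 2 },
};

/* Map a return address to its live-pointer description (linear scan for
 * clarity; a real implementation would binary-search both levels). */
const ptr_desc *lookup(uintptr_t ret_addr) {
    for (size_t c = 0; c < sizeof chunks / sizeof chunks[0]; c++) {
        const chunk *ch = &chunks[c];
        for (uint32_t i = 0; i < ch->count; i++)
            if (ch->base + ch->offsets[i] == ret_addr)
                return &descs[site_desc_idx[ch->first_site + i]];
    }
    return NULL;   /* not a GC-safe call site */
}
```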


Symposium on Principles of Programming Languages | 2005

A simple typed intermediate language for object-oriented languages

Juan Chen; David Tarditi

Traditional class and object encodings are difficult to use in practical type-preserving compilers because of the complexity of the encodings. We propose a simple typed intermediate language for compiling object-oriented languages and prove its soundness. The key ideas are to preserve lightweight notions of classes and objects instead of compiling them away and to separate name-based subclassing from structure-based subtyping. The language can express standard implementation techniques for both dynamic dispatch and runtime type tests. It has decidable type checking even with subtyping between quantified types with different bounds. Because of its simplicity, the language is a more suitable starting point for a practical type-preserving compiler than traditional encoding techniques.
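For context on the "standard implementation techniques" the language is said to express, the C sketch below shows vtable-based dynamic dispatch and a runtime type test that walks a superclass chain. This is a generic illustration of those well-known techniques, not the paper's intermediate language or type system.

```c
/* Generic illustration of dynamic dispatch (via vtables) and runtime type
 * tests (via class identifiers); not the paper's intermediate language. */
#include <stddef.h>

typedef struct class_info class_info;
struct class_info {
    const class_info *super;          /* name-based subclassing: superclass link */
};

typedef struct object object;
typedef struct {
    const class_info *class_id;       /* used by runtime type tests */
    int (*area)(const object *self);  /* one virtual method slot */
} vtable;

struct object {
    const vtable *vt;                 /* every object points at its vtable */
};

/* Dynamic dispatch: indirect call through the object's vtable. */
static int call_area(const object *o) {
    return o->vt->area(o);
}

/* Runtime type test: walk the superclass chain ("instanceof"). */
static int is_instance_of(const object *o, const class_info *c) {
    for (const class_info *k = o->vt->class_id; k != NULL; k = k->super)
        if (k == c)
            return 1;
    return 0;
}

/* A tiny class hierarchy: Square <: Shape. */
static const class_info shape_class  = { NULL };
static const class_info square_class = { &shape_class };

typedef struct {
    object base;
    int side;
} square;

static int square_area(const object *self) {
    const square *s = (const square *)self;
    return s->side * s->side;
}

static const vtable square_vtable = { &square_class, square_area };

int demo(void) {
    square s = { { &square_vtable }, 3 };
    int a  = call_area(&s.base);                    /* dispatches to square_area */
    int ok = is_instance_of(&s.base, &shape_class); /* 1: a Square is a Shape */
    return a + ok;                                  /* 10 */
}
```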

Collaboration


Dive into David Tarditi's collaborations.

Top Co-Authors

James R. Larus

École Polytechnique Fédérale de Lausanne


Peter Lee

Carnegie Mellon University


Amer Diwan

University of Massachusetts Amherst
