Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Christopher J. Rossbach is active.

Publications


Featured research published by Christopher J. Rossbach.


Symposium on Operating Systems Principles | 2011

PTask: operating system abstractions to manage GPUs as compute devices

Christopher J. Rossbach; Jon Currey; Mark Silberstein; Baishakhi Ray; Emmett Witchel

We propose a new set of OS abstractions to support GPUs and other accelerator devices as first-class computing resources. These new abstractions, collectively called the PTask API, support a dataflow programming model. Because a PTask graph consists of OS-managed objects, the kernel has sufficient visibility and control to provide system-wide guarantees like fairness and performance isolation, and can streamline data movement in ways that are impossible under current GPU programming models. Our experience developing the PTask API, along with a gestural interface on Windows 7 and a FUSE-based encrypted file system on Linux, shows that the PTask API can provide important system-wide guarantees where there were previously none, and can enable significant performance improvements, for example gaining a 5× improvement in maximum throughput for the gestural interface.
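
As a rough illustration of the dataflow style the abstract describes, the C sketch below wires two tasks into a graph and fires each one when its input channel becomes ready. Every name here (ptask, channel, schedule) is a hypothetical stand-in, not the actual PTask API; in the real system the kernels would be GPU programs and the scheduler would be the OS.

    #include <stdio.h>

    typedef struct channel { int data; int ready; } channel;
    typedef void (*kernel_fn)(channel *in, channel *out);

    typedef struct ptask {
        const char *name;
        kernel_fn   run;
        channel    *in, *out;
    } ptask;

    /* Stand-in kernels; in PTask these would run on an accelerator. */
    static void scale(channel *in, channel *out)  { out->data = in->data * 2; out->ready = 1; }
    static void offset(channel *in, channel *out) { out->data = in->data + 3; out->ready = 1; }

    /* Toy "scheduler": fire any task whose input channel is ready. */
    static void schedule(ptask *graph, int n) {
        for (int fired = 1; fired; ) {
            fired = 0;
            for (int i = 0; i < n; i++) {
                if (graph[i].in->ready) {
                    graph[i].in->ready = 0;
                    graph[i].run(graph[i].in, graph[i].out);
                    printf("ran %s -> %d\n", graph[i].name, graph[i].out->data);
                    fired = 1;
                }
            }
        }
    }

    int main(void) {
        channel a = { 5, 1 }, b = { 0, 0 }, c = { 0, 0 };
        ptask graph[] = { { "scale", scale, &a, &b }, { "offset", offset, &b, &c } };
        schedule(graph, 2);   /* data flows a -> scale -> b -> offset -> c */
        return 0;
    }

Because the graph objects are visible to the scheduler rather than hidden inside a user-mode driver, a kernel in this position can apply fairness and isolation policies across all tasks, which is the point the abstract makes.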


Symposium on Operating Systems Principles | 2009

Operating System Transactions

Donald E. Porter; Owen S. Hofmann; Christopher J. Rossbach; Alexander Benn; Emmett Witchel

Applications must be able to synchronize accesses to operating system resources in order to ensure correctness in the face of concurrency and system failures. System transactions allow the programmer to specify updates to heterogeneous system resources with the OS guaranteeing atomicity, consistency, isolation, and durability (ACID). System transactions efficiently and cleanly solve persistent concurrency problems that are difficult to address with other techniques. For example, system transactions eliminate security vulnerabilities in the file system that are caused by time-of-check-to-time-of-use (TOCTTOU) race conditions. System transactions enable an unsuccessful software installation to roll back without disturbing concurrent, independent updates to the file system. This paper describes TxOS, a variant of Linux 2.6.22 that implements system transactions. TxOS uses new implementation techniques to provide fast, serializable transactions with strong isolation and fairness between system transactions and non-transactional activity. The prototype demonstrates that a mature OS running on commodity hardware can provide system transactions at a reasonable performance cost. For instance, a transactional installation of OpenSSH incurs only 10% overhead, and a non-transactional compilation of Linux incurs negligible overhead on TxOS. By making transactions a central OS abstraction, TxOS enables new transactional services. For example, one developer prototyped a transactional ext3 file system in less than one month.
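
The TOCTTOU example can be made concrete. In this minimal C sketch, a check (access) and a use (open) are wrapped in one system transaction so nothing can swap the path between them; the xbegin()/xend() functions are stand-ins in the spirit of the TxOS interface, stubbed as no-ops so the sketch compiles. Only a real transactional kernel makes the pair atomic.

    #include <fcntl.h>
    #include <unistd.h>

    static void xbegin(void) { /* would begin a system transaction */ }
    static void xend(void)   { /* would commit it atomically       */ }

    int open_if_allowed(const char *path) {
        int fd = -1;
        xbegin();
        if (access(path, R_OK) == 0)     /* check ...     */
            fd = open(path, O_RDONLY);   /* ... then use  */
        xend();  /* inside the transaction, no concurrent rename or
                    symlink swap can land between check and use */
        return fd;
    }

    int main(void) { return open_if_allowed("/etc/hostname") >= 0 ? 0 : 1; }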


Symposium on Operating Systems Principles | 2007

TxLinux: using and managing hardware transactional memory in an operating system

Christopher J. Rossbach; Owen S. Hofmann; Donald E. Porter; Hany E. Ramadan; Aditya Bhandari; Emmett Witchel

TxLinux is a variant of Linux that is the first operating system to use hardware transactional memory (HTM) as a synchronization primitive, and the first to manage HTM in the scheduler. This paper describes and measures TxLinux and discusses two innovations in detail: cooperation between locks and transactions, and the integration of transactions with the OS scheduler. Mixing locks and transactions requires a new primitive, cooperative transactional spinlocks (cxspinlocks), which allow locks and transactions to protect the same data while maintaining the advantages of both synchronization primitives. Cxspinlocks allow the system to attempt execution of critical regions with transactions and automatically roll back to use locking if the region performs I/O. Integrating the scheduler with HTM eliminates priority inversion. On a series of real-world benchmarks TxLinux has similar performance to Linux, exposing concurrency with as many as 32 concurrent threads on 32 CPUs in the same critical region.
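
The cxspinlock control flow can be sketched in a few lines of C. The function names and plain-C bodies below are illustrative stand-ins (a real cxspinlock needs hardware transactional memory); what the sketch shows is the policy the abstract describes: run optimistically as a transaction, and re-execute under the real lock if the region performs I/O.

    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_flag cx_lock = ATOMIC_FLAG_INIT;
    static int in_tx;   /* stand-in for "currently inside a hardware tx" */

    static void cx_try_transaction(void) { in_tx = 1; /* would issue xbegin */ }
    static void cx_take_lock(void) { while (atomic_flag_test_and_set(&cx_lock)) ; in_tx = 0; }
    static void cx_release(void)   { if (in_tx) in_tx = 0; /* commit */ else atomic_flag_clear(&cx_lock); }

    static void critical_region(int needs_io) {
        cx_try_transaction();      /* optimistic path: run as a transaction */
        if (needs_io && in_tx) {   /* I/O cannot be rolled back, so ...     */
            in_tx = 0;             /* ... abort the transaction and         */
            cx_take_lock();        /* re-execute under the real lock        */
        }
        if (needs_io) puts("performed I/O under the lock");
        cx_release();
    }

    int main(void) { critical_region(0); critical_region(1); return 0; }

The key property, per the abstract, is that both paths protect the same data, so transactional and lock-based threads can safely coexist in one critical region.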


International Symposium on Computer Architecture | 2007

MetaTM/TxLinux: transactional memory for an operating system

Hany E. Ramadan; Christopher J. Rossbach; Donald E. Porter; Owen S. Hofmann; Aditya Bhandari; Emmett Witchel

Hardware transactional memory can reduce synchronization complexity while retaining high performance. MetaTM models changes to the x86 architecture to support transactional memory for user processes and the operating system. TxLinux is an operating system that uses transactional memory to facilitate synchronization in a large, complicated code base, where the burdens of current lock-based approaches are most evident.


International Symposium on Microarchitecture | 2008

Dependence-aware transactional memory for increased concurrency

Hany E. Ramadan; Christopher J. Rossbach; Emmett Witchel

Transactional memory (TM) is a promising paradigm for helping programmers take advantage of emerging multi-core platforms. Though they perform well under low contention, hardware TM systems have a reputation of not performing well under high contention, as compared to locks. This paper presents a model and implementation of dependence-aware transactional memory (DATM), a novel solution to the problem of scaling under contention. Unlike many proposals to deal with write-shared data (which arise in common data structures like counters and linked lists), DATM operates transparently to the programmer. The main idea in DATM is to accept any transaction execution interleaving that is conflict serializable, including interleavings that contain simple conflicts. Current TM systems reduce useful concurrency by restarting conflicting transactions, even if the execution interleaving is conflict serializable. DATM manages dependences between uncommitted transactions, sometimes forwarding data between them to safely commit conflicting transactions. The evaluation of our prototype shows that DATM increases concurrency, for example by reducing the runtime of STAMP benchmarks by up to 39% and reducing transaction restarts by up to 94%.
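
A toy C model makes the forwarding idea concrete. Two transactions increment the same counter; instead of restarting the second, the first transaction's uncommitted value is forwarded to it, and a recorded dependence forces commit order. The structures below are illustrative only, not the hardware design from the paper.

    #include <stdio.h>

    typedef struct tx {
        int id;
        int speculative_val;
        int depends_on;   /* index of the tx we read from, or -1 */
        int committed;
    } tx;

    static int shared_counter = 10;

    static void tx_write(tx *t, int v) { t->speculative_val = v; }

    /* Reader takes the writer's uncommitted value and records a
     * dependence: it may only commit after the writer commits. */
    static void tx_read_forwarded(tx *reader, tx *writer) {
        reader->speculative_val = writer->speculative_val;
        reader->depends_on = writer->id;
    }

    static int tx_commit(tx *t, tx *all) {
        if (t->depends_on >= 0 && !all[t->depends_on].committed)
            return 0;                    /* must wait, but need not restart */
        shared_counter = t->speculative_val;
        t->committed = 1;
        return 1;
    }

    int main(void) {
        tx t[2] = { { 0, 0, -1, 0 }, { 1, 0, -1, 0 } };
        tx_write(&t[0], shared_counter + 1);   /* T0: counter++ (uncommitted) */
        tx_read_forwarded(&t[1], &t[0]);       /* T1 reads T0's forwarded value */
        tx_write(&t[1], t[1].speculative_val + 1);

        printf("T1 commit first? %s\n", tx_commit(&t[1], t) ? "yes" : "no (waits)");
        tx_commit(&t[0], t);                   /* T0 commits ...   */
        tx_commit(&t[1], t);                   /* ... then T1 may  */
        printf("counter = %d\n", shared_counter);  /* 12: both increments kept */
        return 0;
    }

A conventional TM would have restarted T1 on the conflicting read; here the interleaving is conflict serializable (T0 before T1), so both commits survive.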


Symposium on Operating Systems Principles | 2013

Dandelion: a compiler and runtime for heterogeneous systems

Christopher J. Rossbach; Yuan Yu; Jon Currey; Jean-Philippe Martin; Dennis Fetterly

Computer systems increasingly rely on heterogeneity to achieve greater performance, scalability and energy efficiency. Because heterogeneous systems typically comprise multiple execution contexts with different programming abstractions and runtimes, programming them remains extremely challenging. Dandelion is a system designed to address this programmability challenge for data-parallel applications. Dandelion provides a unified programming model for heterogeneous systems that span diverse execution contexts including CPUs, GPUs, FPGAs, and the cloud. It adopts the .NET LINQ (Language INtegrated Query) approach, integrating data-parallel operators into general purpose programming languages such as C# and F#. It therefore provides an expressive data model and native language integration for user-defined functions, enabling programmers to write applications using standard high-level languages and development tools. Dandelion automatically and transparently distributes data-parallel portions of a program to available computing resources, including compute clusters for distributed execution and CPU and GPU cores of individual nodes for parallel execution. To enable automatic execution of .NET code on GPUs, Dandelion cross-compiles .NET code to CUDA kernels and uses the PTask runtime [85] to manage GPU execution. This paper discusses the design and implementation of Dandelion, focusing on the distributed CPU and GPU implementation. We evaluate the system using a diverse set of workloads.
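
Dandelion pipelines are written as LINQ queries in C# or F#; as a rough C analogue, the loop below is the kind of per-element Where/Select computation the system could cross-compile to a CUDA kernel or distribute across a cluster. It is purely illustrative of the data-parallel structure, not of Dandelion's actual generated code.

    #include <stdio.h>

    static int keep(int x)      { return x % 2 == 0; }  /* Where-style predicate */
    static int transform(int x) { return x * x;       } /* Select-style function */

    int main(void) {
        int in[] = { 1, 2, 3, 4, 5, 6 }, out[6], n = 0;
        /* Each iteration is independent of the others, which is what
         * lets a compiler map the loop across GPU threads or nodes. */
        for (int i = 0; i < 6; i++)
            if (keep(in[i])) out[n++] = transform(in[i]);
        for (int i = 0; i < n; i++) printf("%d ", out[i]);
        printf("\n");   /* prints: 4 16 36 */
        return 0;
    }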


Programming Language Design and Implementation | 2007

Improved error reporting for software that uses black-box components

Jungwoo Ha; Christopher J. Rossbach; Jason V. Davis; Indrajit Roy; Hany E. Ramadan; Donald E. Porter; David L. Chen; Emmett Witchel

An error occurs when software cannot complete a requested action as a result of some problem with its input, configuration, or environment. A high-quality error report allows a user to understand and correct the problem. Unfortunately, the quality of error reports has been decreasing as software becomes more complex and layered. End-users take the cryptic error messages given to them by programs and struggle to fix their problems using search engines and support websites. Developers cannot improve their error messages when they receive an ambiguous or otherwise insufficient error indicator from a black-box software component. We introduce Clarify, a system that improves error reporting by classifying application behavior. Clarify uses minimally invasive monitoring to generate a behavior profile, which is a summary of the program's execution history. A machine learning classifier uses the behavior profile to classify the application's behavior, thereby enabling a more precise error report than the output of the application itself. We evaluate a prototype Clarify system on ambiguous error messages generated by large, modern applications like gcc, LaTeX, and the Linux kernel. For a performance cost of less than 1% on user applications and 4.7% on the Linux kernel, the prototype correctly disambiguates at least 85% of application behaviors that result in ambiguous error reports. This accuracy does not degrade significantly with more behaviors: a Clarify classifier for 81 LaTeX error messages is at most 2.5% less accurate than a classifier for 27 LaTeX error messages. Finally, we show that without any human effort to build a classifier, Clarify can provide nearest-neighbor software support, where users who experience a problem are told about 5 other users who might have had the same problem. On average, 2.3 of the 5 users that Clarify identifies have experienced the same problem.
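
The nearest-neighbor support mode reduces to a simple computation over behavior profiles. In this C sketch, each run is a vector of (hypothetical, made-up) call-site counts, and the failing run is matched to the closest stored profile; Clarify's real features and distance metric may differ.

    #include <stdio.h>

    #define NFEAT 4

    /* Hypothetical behavior profiles: call-site counts per run. */
    static const int runs[3][NFEAT] = {
        { 12, 0, 3, 1 },   /* user A's failing run  */
        { 11, 0, 4, 1 },   /* user B: similar error */
        {  0, 9, 0, 7 },   /* user C: unrelated     */
    };

    /* Squared Euclidean distance between two profiles. */
    static int dist(const int *a, const int *b) {
        int d = 0;
        for (int i = 0; i < NFEAT; i++) d += (a[i] - b[i]) * (a[i] - b[i]);
        return d;
    }

    int main(void) {
        const int *query = runs[0];
        int best = -1, bestd = 1 << 30;
        for (int i = 1; i < 3; i++) {      /* find the nearest stored run */
            int d = dist(query, runs[i]);
            if (d < bestd) { bestd = d; best = i; }
        }
        printf("most similar run: user %c\n", 'A' + best);  /* user B */
        return 0;
    }

Note that this mode needs no labeled training data at all, which is why the abstract highlights it as working "without any human effort to build a classifier".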


Architectural Support for Programming Languages and Operating Systems | 2009

Maximum benefit from a minimal HTM

Owen S. Hofmann; Christopher J. Rossbach; Emmett Witchel

A minimal, bounded hardware transactional memory implementation significantly improves synchronization performance when used in an operating system kernel. We add HTM to Linux 2.4, a kernel with a simple, coarse-grained synchronization structure. The transactional Linux 2.4 kernel can improve performance of user programs by as much as 40% over the non-transactional 2.4 kernel. It closes 68% of the performance gap with the Linux 2.6 kernel, which has had significant engineering effort applied to improve scalability. We then extend our minimal HTM to a fast, unbounded transactional memory with a novel technique for coordinating hardware transactions and software synchronization. Overflowed transactions run in software, with only a minimal coupling between hardware and software systems. There is no performance penalty for overflow rates of less than 1%. In one instance, at 16 processors and an overflow rate of 4%, performance degrades from an ideal 4.3x to 3.6x.
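
The hardware/software split the abstract describes follows a recognizable shape: attempt the critical section as a bounded hardware transaction, and on overflow run it in software. The C sketch below shows that general pattern only, not the paper's specific coordination technique; htm_begin()/htm_end() are hypothetical stubs (here always reporting overflow, so the demo exercises the software path).

    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_flag sw_lock = ATOMIC_FLAG_INIT;

    /* Stubs: real hardware would usually return 1 (tx started). */
    static int  htm_begin(void) { return 0; }  /* 0 = overflow/abort */
    static void htm_end(void)   { }

    static void critical(void (*body)(void)) {
        if (htm_begin()) {        /* common case: bounded hardware tx */
            body();
            htm_end();
            return;
        }
        /* rare overflow case: run in software under a lock */
        while (atomic_flag_test_and_set(&sw_lock)) ;
        body();
        atomic_flag_clear(&sw_lock);
    }

    static void increment(void) { static int n; printf("n = %d\n", ++n); }

    int main(void) { critical(increment); critical(increment); return 0; }

The abstract's performance claim is about exactly this split: as long as overflows stay under about 1%, the slow software path costs nothing measurable overall.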



European Conference on Machine Learning | 2006

Cost-Sensitive decision tree learning for forensic classification

Jason V. Davis; Jungwoo Ha; Christopher J. Rossbach; Hany E. Ramadan; Emmett Witchel

In some learning settings, the cost of acquiring features for classification must be paid up front, before the classifier is evaluated. In this paper, we introduce the forensic classification problem and present a new algorithm for building decision trees that maximizes classification accuracy while minimizing total feature costs. By expressing the ID3 decision tree algorithm in an information theoretic context, we derive our algorithm from a well-formulated problem objective. We evaluate our algorithm across several datasets and show that, for a given level of accuracy, our algorithm builds cheaper trees than existing methods. Finally, we apply our algorithm to a real-world system, Clarify. Clarify classifies unknown or unexpected program errors by collecting statistics during program runtime which are then used for decision tree classification after an error has occurred. We demonstrate that if the classifier used by the Clarify system is trained with our algorithm, the computational overhead (equivalently, total feature costs) can decrease by many orders of magnitude with only a slight (<1%) reduction in classification accuracy.
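
The cost-sensitive split choice can be illustrated in a few lines. Like ID3, each candidate feature gets a score based on information gain, but discounted by its acquisition cost; gain divided by cost is one simple such rule, used here only for illustration (the paper derives its own objective information-theoretically, and all values below are made up).

    #include <stdio.h>

    #define NFEAT 3

    int main(void) {
        /* hypothetical per-feature information gain (bits) and cost */
        double gain[NFEAT] = { 0.80, 0.75, 0.30 };
        double cost[NFEAT] = { 10.0, 1.0,  0.5  };

        int best = 0;
        for (int f = 1; f < NFEAT; f++)
            if (gain[f] / cost[f] > gain[best] / cost[best]) best = f;

        /* plain ID3 would split on feature 0 (highest raw gain);
         * the cost-aware rule picks the far cheaper feature 1 */
        printf("split on feature %d (gain/cost = %.2f)\n",
               best, gain[best] / cost[best]);
        return 0;
    }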

Collaboration


Dive into Christopher J. Rossbach's collaborations.

Top Co-Authors

Emmett Witchel
University of Texas at Austin

Hany E. Ramadan
University of Texas at Austin

Owen S. Hofmann
University of Texas at Austin

Donald E. Porter
University of North Carolina at Chapel Hill

Jayneel Gandhi
University of Wisconsin-Madison

Joshua Landgraf
University of Texas at Austin

Saugata Ghose
Carnegie Mellon University