Mark Oskin | Researchain

Publication

Featured research published by Mark Oskin.

architectural support for programming languages and operating systems | 2009

DMP: deterministic shared memory multiprocessing

Joseph Devietti; Brandon Lucia; Luis Ceze; Mark Oskin

Current shared memory multicore and multiprocessor systems are nondeterministic. Each time these systems execute a multithreaded application, even if supplied with the same input, they can produce a different output. This frustrates debugging and limits the ability to properly test multithreaded code, becoming a major stumbling block to the much-needed widespread adoption of parallel programming. In this paper we make the case for fully deterministic shared memory multiprocessing (DMP). The behavior of an arbitrary multithreaded program on a DMP system is only a function of its inputs. The core idea is to make inter-thread communication fully deterministic. Previous approaches to coping with nondeterminism in multithreaded programs have focused on replay, a technique useful only for debugging. In contrast, while DMP systems are directly useful for debugging by offering repeatability by default, we argue that parallel programs should execute deterministically in the field as well. This has the potential to make testing more assuring and increase the reliability of deployed multithreaded software. We propose a range of approaches to enforcing determinism and discuss their implementation trade-offs. We show that determinism can be provided with little performance cost using our architecture proposals on future hardware, and that software-only approaches can be utilized on existing systems.
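The core idea in the abstract above, making inter-thread communication deterministic, can be illustrated with a minimal sketch. This is not the paper's mechanism: it assumes a simple scheme in which threads may touch shared state only when holding a turn that rotates in fixed round-robin order. The names `DeterministicTurns` and `run_once` are hypothetical and introduced only for this example.

```python
import threading


class DeterministicTurns:
    """Hypothetical helper: lets threads touch shared state in fixed round-robin order."""

    def __init__(self, n_threads):
        self.n = n_threads
        self.turn = 0  # ID of the thread currently allowed to communicate
        self.cond = threading.Condition()

    def run_turn(self, tid, action):
        with self.cond:
            # Block until it is this thread's turn, whatever the OS scheduler does.
            self.cond.wait_for(lambda: self.turn == tid)
            action()
            # Hand the turn to the next thread in the fixed order.
            self.turn = (self.turn + 1) % self.n
            self.cond.notify_all()


def run_once(n_threads=3, rounds=4):
    """Run the workers and return the order in which they updated shared state."""
    log = []  # shared memory that every thread writes to
    turns = DeterministicTurns(n_threads)

    def worker(tid):
        for r in range(rounds):
            turns.run_turn(tid, lambda: log.append((r, tid)))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In this sketch the order of shared-memory updates depends only on the thread IDs and the number of rounds, so repeated calls to `run_once()` return the same log regardless of how the operating system schedules the threads.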

international symposium on computer architecture | 1998

Active pages: a computation model for intelligent memory

Mark Oskin; Frederic T. Chong; Timothy Sherwood

Microprocessors and memory systems suffer from a growing gap in performance. We introduce Active Pages, a computation model which addresses this gap by shifting data-intensive computations to the memory system. An Active Page consists of a page of data and a set of associated functions which can operate upon that data. We describe an implementation of Active Pages on RADram (Reconfigurable Architecture DRAM), a memory system based upon the integration of DRAM and reconfigurable logic. Results from the SimpleScalar simulator [BA97] demonstrate up to 1000X speedups on several applications using the RADram system versus conventional memory systems. We also explore the sensitivity of our results to implementations in other memory technologies.
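The definition above (a page of data plus a set of associated functions) can be modeled in software as a rough sketch. The class `ActivePage` and the helpers `count_matches`, `make_pages`, and `total_count` are hypothetical names for this illustration; they do not reflect the RADram interface, which the abstract does not describe.

```python
class ActivePage:
    """Hypothetical model of one page: data plus the functions bound to it."""

    def __init__(self, data, functions):
        self.data = list(data)
        self.functions = dict(functions)

    def invoke(self, name, *args):
        # The computation runs "at" the page; only the small result is returned.
        return self.functions[name](self.data, *args)


def count_matches(data, key):
    """Example page function: count the items in the page equal to key."""
    return sum(1 for item in data if item == key)


def make_pages(values, page_size):
    """Split values into fixed-size pages, each carrying the same function set."""
    return [
        ActivePage(values[i:i + page_size], {"count": count_matches})
        for i in range(0, len(values), page_size)
    ]


def total_count(pages, key):
    """The processor only combines per-page results; it never scans the raw data."""
    return sum(page.invoke("count", key) for page in pages)
```

For example, `total_count(make_pages([1, 2, 3, 2, 2, 5, 2], 3), 2)` asks each of three pages to count its own matches and sums the three small results, which mirrors the shift of data-intensive work toward the memory system.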

international symposium on microarchitecture | 2002

Using modern graphics architectures for general-purpose computing: a framework and analysis

Chris J. Thompson; Sahngyun Hahn; Mark Oskin

Recently, graphics hardware architectures have begun to emphasize versatility, offering rich new ways to programmatically reconfigure the graphics pipeline. In this paper we explore whether current graphics architectures can be applied to problems where general-purpose vector processors might traditionally be used. We develop a programming framework and apply it to a variety of problems, including matrix multiplication and 3-SAT. Comparing the speed of our graphics card implementations to standard CPU implementations, we demonstrate startling performance improvements in many cases, as well as room for improvement in others. We analyze the bottlenecks and propose minor extensions to current graphics architectures which would improve their effectiveness for solving general-purpose problems. Based on our results and current trends in microarchitecture, we believe that efficient use of graphics hardware will become increasingly important to high-performance computing on commodity hardware.

international symposium on microarchitecture | 2007

RAMP: Research Accelerator for Multiple Processors

John Wawrzynek; David A. Patterson; Mark Oskin; Shih-Lien Lu; Christoforos E. Kozyrakis; James C. Hoe; Derek Chiou; Krste Asanovic

The RAMP project's goal is to enable the intensive, multidisciplinary innovation that the computing industry will need to tackle the problems of parallel processing. RAMP itself is an open-source, community-developed, FPGA-based emulator of parallel architectures. Its design framework lets a large, collaborative community develop and contribute reusable, composable design modules. Three complete designs (for transactional memory, distributed systems, and distributed-shared memory) demonstrate the platform's potential.

international symposium on computer architecture | 2000

HLS: combining statistical and symbolic simulation to guide microprocessor designs

Mark Oskin; Frederic T. Chong; Matthew K. Farrens

As microprocessors continue to evolve, many optimizations reach a point of diminishing returns. We introduce HLS, a hybrid processor simulator which uses statistical models and symbolic execution to evaluate design alternatives. This simulation methodology allows for quick and accurate contour maps to be generated of the performance space spanned by design parameters. We validate the accuracy of HLS through correlation with existing cycle-by-cycle simulation techniques and current generation hardware. We demonstrate the power of HLS by exploring design spaces defined by two parameters: code properties and value prediction. These examples motivate how HLS can be used to set design goals and individual component performance targets.

ACM Transactions on Computer Systems | 2007

The WaveScalar architecture

Steven Swanson; Andrew Schwerin; Martha Mercaldi; Andrew Petersen; Andrew Putnam; Ken Michelson; Mark Oskin; Susan J. Eggers

Silicon technology will continue to provide an exponential increase in the availability of raw transistors. Effectively translating this resource into application performance, however, is an open challenge that conventional superscalar designs will not be able to meet. We present WaveScalar as a scalable alternative to conventional designs. WaveScalar is a dataflow instruction set and execution model designed for scalable, low-complexity/high-performance processors. Unlike previous dataflow machines, WaveScalar can efficiently provide the sequential memory semantics that imperative languages require. To allow programmers to easily express parallelism, WaveScalar supports pthread-style, coarse-grain multithreading and dataflow-style, fine-grain threading. In addition, it permits blending the two styles within an application, or even a single function. To execute WaveScalar programs, we have designed a scalable, tile-based processor architecture called the WaveCache. As a program executes, the WaveCache maps the program's instructions onto its array of processing elements (PEs). The instructions remain at their processing elements for many invocations, and as the working set of instructions changes, the WaveCache removes unused instructions and maps new ones in their place. The instructions communicate directly with one another over a scalable, hierarchical on-chip interconnect, obviating the need for long wires and broadcast communication. This article presents the WaveScalar instruction set and evaluates a simulated implementation based on current technology. For single-threaded applications, the WaveCache achieves performance on par with conventional processors, but in less area. For coarse-grain threaded applications the WaveCache achieves nearly linear speedup with up to 64 threads and can sustain 7 to 14 multiply-accumulates per cycle on fine-grain threaded versions of well-known kernels. Finally, we apply both styles of threading to equake from Spec2000 and speed it up by 9x compared to the serial version.
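The abstract above describes WaveScalar as a dataflow execution model in which instructions communicate directly with one another. As a hedged illustration of the general dataflow firing rule only (an instruction executes once all of its operands have arrived), here is a minimal sketch. It is generic dataflow, not the WaveScalar instruction set or the WaveCache; the function `run_dataflow` and the graph encoding are assumptions made for this example.

```python
import operator
from collections import deque


def run_dataflow(graph, inputs):
    """Evaluate a dataflow graph.

    graph:  maps an instruction name to (operation, [operand names]).
    inputs: maps an input name to its value.
    Returns a dict with every input and every computed instruction result.
    """
    values = dict(inputs)
    consumers = {}  # producer name -> instructions waiting on its result
    pending = {}    # instruction name -> number of operands not yet available
    for name, (_, srcs) in graph.items():
        pending[name] = sum(1 for s in srcs if s not in values)
        for s in srcs:
            consumers.setdefault(s, []).append(name)

    # Instructions whose operands are all program inputs can fire immediately.
    ready = deque(name for name, count in pending.items() if count == 0)
    while ready:
        name = ready.popleft()
        op, srcs = graph[name]
        values[name] = op(*(values[s] for s in srcs))
        # Deliver the result directly to each consumer; fire any that are now complete.
        for consumer in consumers.get(name, []):
            pending[consumer] -= 1
            if pending[consumer] == 0:
                ready.append(consumer)
    return values


# Hypothetical example graph computing (a + b) * (a - b).
example_graph = {
    "sum": (operator.add, ["a", "b"]),
    "diff": (operator.sub, ["a", "b"]),
    "prod": (operator.mul, ["sum", "diff"]),
}
```

Calling `run_dataflow(example_graph, {"a": 5, "b": 3})` fires `sum` and `diff` independently and then `prod`; no program counter decides the order, only operand availability.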

IEEE Computer | 2002

A practical architecture for reliable quantum computers

Mark Oskin; Frederic T. Chong; Isaac L. Chuang

Quantum computation has advanced to the point where system-level solutions can help close the gap between emerging quantum technologies and real-world computing requirements. Empirical studies of practical quantum architectures are just beginning to appear in the literature. Elementary architectural concepts are still lacking: How do we provide quantum storage, data paths, classical control circuits, parallelism, and system integration? And, crucially, how can we design architectures to reduce error-correction overhead? The authors describe a proposed architecture that uses code teleportation, quantum memory refresh units, dynamic compilation of quantum programs, and scalable error correction to achieve system-level efficiencies. They assert that their work indicates the underlying technology's reliability is crucial; practical architectures will require quantum technologies with error rates between 10^-6 and 10^-9.

ACM Journal on Emerging Technologies in Computing Systems | 2006

Architectural implications of quantum computing technologies

Rodney Van Meter; Mark Oskin

In this article we present a classification scheme for quantum computing technologies that is based on the characteristics most relevant to computer systems architecture. The engineering trade-offs of execution speed, decoherence of the quantum states, and size of systems are described. Concurrency, storage capacity, and interconnection network topology influence algorithmic efficiency, while quantum error correction and necessary quantum state measurement are the ultimate drivers of logical clock speed. We discuss several proposed technologies. Finally, we use our taxonomy to explore architectural implications for common arithmetic circuits, examine the implementation of quantum error correction, and discuss cluster-state quantum computation.

IEEE Journal of Selected Topics in Quantum Electronics | 2003

Toward a scalable, silicon-based quantum computing architecture

Dean Copsey; Mark Oskin; Francois Impens; Tzvetan Metodiev; Andrew W. Cross; Frederic T. Chong; Isaac L. Chuang; John Kubiatowicz

Advances in quantum devices have brought scalable quantum computation closer to reality. We focus on the system-level issues of how quantum devices can be brought together to form a scalable architecture. In particular, we examine promising silicon-based proposals. We discover that communication of quantum data is a critical resource in such proposals. We find that traditional techniques using quantum SWAP gates are exponentially expensive as distances increase and propose quantum teleportation as a means to communicate data over longer distances on a chip. Furthermore, we find that realistic quantum error-correction circuits use a recursive structure that benefits from using teleportation for long-distance communication. We identify a set of important architectural building blocks necessary for constructing scalable communication and computation. Finally, we explore an actual layout scheme for recursive error correction, and demonstrate the exponential growth in communication costs with levels of recursion, and that teleportation limits those costs.

ieee hot chips symposium | 2006

Research accelerator for multiple processors

David A. Patterson; Arvind; Krste Asanovic; Derek Chiou; James C. Hoe; Christos Kozyrakis; Shih-Lien Lu; Mark Oskin; Jan M. Rabaey; John Wawrzynek

This article consists of a collection of slides from the authors' conference presentation on RAMP, the Research Accelerator for Multiple Processors. Some of the specific topics discussed include: system specifications and architecture; uniprocessor performance capabilities; RAMP hardware and description language features; RAMP applications development; storage capabilities; and future areas of technological development.
