
Publication


Featured research published by Seth Copen Goldstein.


International Symposium on Computer Architecture | 1992

Active messages: a mechanism for integrated communication and computation

Thorsten von Eicken; David E. Culler; Seth Copen Goldstein; Klaus E. Schauser

The design challenge for large-scale multiprocessors is (1) to minimize communication overhead, (2) to allow communication to overlap computation, and (3) to coordinate the two without sacrificing processor cost/performance. We show that existing message-passing multiprocessors have unnecessarily high communication costs. Research prototypes of message-driven machines demonstrate low communication overhead but poor processor cost/performance. We introduce a simple communication mechanism, Active Messages, show that it is intrinsic to both architectures, allows cost-effective use of the hardware, and offers tremendous flexibility. Implementations on nCUBE/2 and CM-5 are described and evaluated using a split-phase shared-memory extension to C, Split-C. We further show that active messages are sufficient to implement the dynamically scheduled languages for which message-driven machines were designed. With this mechanism, latency tolerance becomes a programming/compiling concern. Hardware support for active messages is desirable, and we outline a range of enhancements to mainstream processors.
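The essence of the mechanism is that every message carries the address of a user-level handler that runs immediately on arrival, feeding data straight into the ongoing computation. A minimal single-process C sketch of that idea follows; the am_msg_t layout and am_deliver() delivery function are illustrative stand-ins, not the paper's API.

```c
#include <stdio.h>

typedef void (*am_handler_t)(void *arg);

/* An active message: a handler address plus a small payload. */
typedef struct {
    am_handler_t handler;
    int          payload;
    int         *reply_slot;   /* where the handler writes its result */
} am_msg_t;

/* Handler executed on the receiving node: no buffering, no scheduling;
 * it just integrates the payload into the computation and returns. */
static void deposit_handler(void *arg) {
    am_msg_t *m = (am_msg_t *)arg;
    *m->reply_slot = m->payload * 2;   /* stand-in for real work */
}

/* Stand-in for the network: "delivering" a message means invoking
 * its handler right away, which is the core of the mechanism. */
static void am_deliver(am_msg_t *m) {
    m->handler(m);
}

int main(void) {
    int slot = 0;
    am_msg_t msg = { deposit_handler, 21, &slot };
    am_deliver(&msg);              /* arrival triggers the handler */
    printf("result: %d\n", slot);  /* prints 42 */
    return 0;
}
```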


Conference on High Performance Computing (Supercomputing) | 1993

Parallel programming in Split-C

Arvind Krishnamurthy; David E. Culler; Andrea C. Dusseau; Seth Copen Goldstein; Steven S. Lumetta; Thorsten von Eicken; Katherine A. Yelick

The authors introduce the Split-C language, a parallel extension of C intended for high performance programming on distributed memory multiprocessors, and demonstrate the use of the language in optimizing parallel programs. Split-C provides a global address space with a clear concept of locality and unusual assignment operators. These are used as tools to reduce the frequency and cost of remote access. The language allows a mixture of shared memory, message passing, and data parallel programming styles while providing efficient access to the underlying machine. The authors demonstrate the basic language concepts using regular and irregular parallel programs and give performance results for various stages of program optimization.
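Split-C's split-phase assignment (written := in the language) issues a remote access, lets local work proceed, and completes the access at an explicit sync point. The plain-C sketch below emulates that overlap pattern; split_get() and sync_all() are hypothetical stand-ins for the language's runtime, not its actual interface.

```c
#include <stdio.h>

static int outstanding = 0;

/* Begin a "remote" read: start the transfer, don't wait for it. */
static void split_get(int *dst, const int *remote_src) {
    *dst = *remote_src;   /* in a real runtime this would be async */
    outstanding++;
}

/* Complete all outstanding split-phase accesses. */
static void sync_all(void) {
    outstanding = 0;      /* a real runtime would block here */
}

int main(void) {
    int remote_value = 7;          /* stands in for data on another node */
    int local_copy;
    split_get(&local_copy, &remote_value);  /* issue the read ...   */
    int overlap = 3 * 14;                   /* ... overlap local work */
    sync_all();                             /* ... then wait before use */
    printf("%d %d\n", local_copy, overlap);
    return 0;
}
```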


International Symposium on Computer Architecture | 1999

PipeRench: a co/processor for streaming multimedia acceleration

Seth Copen Goldstein; Herman Schmit; Matthew Moe; Mihai Budiu; Srihari Cadambi; R. Reed Taylor; Ronald Laufer

Future computing workloads will emphasize an architecture's ability to perform relatively simple calculations on massive quantities of mixed-width data. This paper describes a novel reconfigurable fabric architecture, PipeRench, optimized to accelerate these types of computations. PipeRench enables fast, robust compilers, supports forward compatibility, and virtualizes configurations, thus removing the fixed-size constraint present in other fabrics. For the first time, we explore how the bit-width of processing elements affects performance and show how the PipeRench architecture has been optimized to balance the needs of the compiler against the realities of silicon. Finally, we demonstrate extreme performance speedup on certain computing kernels (up to 190x versus a modern RISC processor) and analyze how this acceleration translates to application speedup.
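The key enabler is pipeline virtualization: a virtual pipeline with more stages than the fabric has physical stripes is executed by time-multiplexing stages onto stripes. A toy C model of that mapping, assuming a round-robin schedule and illustrative stage logic:

```c
#include <stdio.h>

#define N_VIRT 8   /* stages the application needs */
#define N_PHYS 4   /* stripes the fabric actually has */

static int stage(int v, int data) { return data + v; } /* stand-in logic */

int main(void) {
    int data = 0;
    /* Each "cycle", the next virtual stage is loaded into a physical
     * stripe chosen round-robin; the fixed fabric size never limits
     * the length of the virtual pipeline. */
    for (int v = 0; v < N_VIRT; v++) {
        int stripe = v % N_PHYS;   /* reconfigure this stripe ...  */
        data = stage(v, data);     /* ... then run the stage on it */
        printf("virtual stage %d -> physical stripe %d\n", v, stripe);
    }
    printf("pipeline output: %d\n", data);
    return 0;
}
```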


IEEE Computer | 2005

Programmable matter

Seth Copen Goldstein; Jason Campbell; Todd C. Mowry

In the past 50 years, computers have shrunk from room-size mainframes to lightweight handhelds. This fantastic miniaturization is primarily the result of high-volume nanoscale manufacturing. While this technology has predominantly been applied to logic and memory, it's now being used to create advanced microelectromechanical systems using both top-down and bottom-up processes. One possible outcome of continued progress in high-volume nanoscale assembly is the ability to inexpensively produce millimeter-scale units that integrate computing, sensing, actuation, and locomotion mechanisms. A collection of such units can be viewed as a form of programmable matter.


Journal of Parallel and Distributed Computing | 1993

TAM—a compiler-controlled threaded abstract machine

David E. Culler; Seth Copen Goldstein; Klaus E. Schauser; Thorsten von Eicken

The Threaded Abstract Machine (TAM) refines dataflow execution models to address the critical constraints that modern parallel architectures place on the compilation of general-purpose parallel programming languages. TAM defines a self-scheduled machine language of parallel threads, which provides a path from dataflow-graph program representations to conventional control flow. The most important feature of TAM is the way it exposes the interaction between the handling of asynchronous message events, the scheduling of computation, and the utilization of the storage hierarchy. This paper provides a complete description of TAM and codifies the model in terms of a pseudo machine language, TL0. Issues in compilation from a high-level parallel language to TL0 are discussed in general and specifically in regard to the Id90 language. The implementation of TL0 on the CM-5 multiprocessor is explained in detail. Using this implementation, a cost model is developed for the various TAM primitives. The TAM approach is evaluated on sizable Id90 programs on a 64-processor system. The scheduling hierarchy of quanta and threads is shown to provide substantial locality while tolerating long latencies, which allows the average thread-scheduling cost to be extremely low.
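A defining TAM mechanism is counter-based thread synchronization: each thread in an activation frame carries a synchronization counter, and the thread is enabled only when all of its inputs have arrived. The C sketch below models that discipline; the frame layout and names are illustrative, not TL0 syntax.

```c
#include <stdio.h>

#define MAX_THREADS 4

typedef struct {
    int  counter;           /* inputs still outstanding */
    void (*body)(void);     /* code to run when enabled */
} tam_thread_t;

static void thread2(void) { puts("thread 2: both inputs arrived, running"); }

int main(void) {
    tam_thread_t frame[MAX_THREADS] = {0};
    frame[2].counter = 2;   /* thread 2 waits on two events */
    frame[2].body    = thread2;

    /* Two asynchronous events (e.g. message arrivals) post to thread 2;
     * the second post drops the counter to zero and enables the thread. */
    for (int event = 0; event < 2; event++) {
        if (--frame[2].counter == 0)
            frame[2].body();   /* enabled: run within the current quantum */
    }
    return 0;
}
```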


International Test Conference | 2003

Defect tolerance at the end of the roadmap

Mahim Mishra; Seth Copen Goldstein

As feature sizes shrink toward single-digit nanometer dimensions, defect tolerance will become increasingly important. This is true whether the chips are manufactured using top-down methods, such as photolithography, or bottom-up assembly processes, such as Chemically Assembled Electronic Nanotechnology (CAEN). In this paper, we examine the consequences of this increased defect rate and describe a defect-tolerance methodology centered around reconfigurable devices, a scalable testing method, and dynamic place-and-route. We summarize some of our own results in this area as well as those of others, and enumerate some future research directions required to make nanometer-scale computing a reality.
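The methodology's core loop is simple: a scalable post-fabrication test produces a per-cell defect map, and place-and-route then assigns logic only to cells the map reports as good. A toy C sketch of defect-map-driven placement, with an assumed array size and a first-fit policy chosen for illustration:

```c
#include <stdio.h>

#define CELLS 8

int main(void) {
    /* 1 = defective, as a scalable test phase might report. */
    int defect_map[CELLS] = {0, 1, 0, 0, 1, 0, 0, 0};
    const char *netlist[] = {"add0", "mul0", "add1", "reg0"};
    int n = sizeof netlist / sizeof netlist[0];

    /* First-fit placement that routes around defective cells. */
    int cell = 0;
    for (int i = 0; i < n; i++) {
        while (cell < CELLS && defect_map[cell]) cell++;  /* skip bad cells */
        if (cell == CELLS) { puts("not enough good cells"); return 1; }
        printf("%s -> cell %d\n", netlist[i], cell++);
    }
    return 0;
}
```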


Architectural Support for Programming Languages and Operating Systems | 2004

Spatial computation

Mihai Budiu; Girish Venkataramani; Tiberiu Chelcea; Seth Copen Goldstein

This paper describes a computer architecture, Spatial Computation (SC), which is based on the translation of high-level language programs directly into hardware structures. SC program implementations are completely distributed, with no centralized control. SC circuits are optimized for wires at the expense of computation units. In this paper we investigate a particular implementation of SC: ASH (Application-Specific Hardware). Under the assumption that computation is cheaper than communication, ASH replicates computation units to simplify interconnect, building a system which uses very simple, completely dedicated communication channels. As a consequence, communication on the datapath never requires arbitration; the only arbitration required is for accessing memory. ASH relies on very simple hardware primitives, using no associative structures, no multiported register files, no scheduling logic, no broadcast, and no clocks. As a consequence, ASH hardware is fast and extremely power-efficient. In this work we demonstrate three features of ASH: (1) that such architectures can be built by automatic compilation of C programs; (2) that distributed computation is in some respects fundamentally different from monolithic superscalar processors; and (3) that ASIC implementations of ASH use three orders of magnitude less energy than high-end superscalar processors, while being on average only 33% slower (3.5x in the worst case).
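The translation idea can be seen on a one-line program: in y = a*b + c*d, each operation becomes its own dedicated functional unit, and values flow over point-to-point wires, so nothing on the datapath is shared or arbitrated. A schematic C rendering of that structure (the unit and wire names are illustrative):

```c
#include <stdio.h>

/* One dedicated unit per operation in the source program. */
static int mul_unit_0(int a, int b) { return a * b; }
static int mul_unit_1(int c, int d) { return c * d; }
static int add_unit_0(int x, int y) { return x + y; }

int main(void) {
    int a = 2, b = 3, c = 4, d = 5;
    int wire0 = mul_unit_0(a, b);   /* dedicated channel from mul 0 */
    int wire1 = mul_unit_1(c, d);   /* dedicated channel from mul 1 */
    int y     = add_unit_0(wire0, wire1);
    printf("y = %d\n", y);          /* prints 26 */
    return 0;
}
```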


International Conference on Computer-Aided Design | 2002

Molecular electronics: devices, systems and tools for gigagate, gigabit chips

Michael Butts; André DeHon; Seth Copen Goldstein

New electronics technologies are emerging that may carry us beyond the limits of lithographic processing, down to molecular-scale feature sizes. Devices and interconnects can be made from a variety of molecules and materials, including bistable and switchable organic molecules, carbon nanotubes, and single-crystal semiconductor nanowires. They can be self-assembled into organized structures and attached onto lithographic substrates. This tutorial reviews emerging molecular-scale electronics technology for CAD and system designers and highlights where ICCAD research can help support this technology.


Journal of Parallel and Distributed Computing | 1996

Lazy Threads

Seth Copen Goldstein; Klaus E. Schauser; David E. Culler

In this paper, we describe lazy threads, a new approach for implementing multithreaded execution models on conventional machines. We show how they can implement a parallel call at nearly the efficiency of a sequential call. The central idea is to specialize the representation of a parallel call so that it can execute as a parallel-ready sequential call. This allows excess parallelism to degrade into sequential calls with the attendant efficient stack management and direct transfer of control and data, yet a call that truly needs to execute in parallel gets its own thread of control. The efficiency of lazy threads is achieved through careful attention to storage management and a code-generation strategy that allows us to represent potential parallel work with no overhead.
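The fast path is the whole trick: a potentially parallel call begins as an ordinary sequential call on the caller's stack and is promoted to an independent thread only if it actually needs to suspend. The C sketch below caricatures that control flow; would_block() and promote_and_run() are simplified stand-ins for the runtime's suspension test and promotion path.

```c
#include <stdio.h>
#include <stdbool.h>

static bool would_block(int n) { return n > 2; }  /* stand-in condition */

static int work(int n) { return n * n; }

/* Hypothetical promotion path: a real runtime would allocate a thread
 * of control here; this sketch just records that promotion happened. */
static int promote_and_run(int n) {
    printf("call with n=%d suspended: promoted to its own thread\n", n);
    return work(n);
}

/* A parallel-ready sequential call: the common case is a plain call. */
static int lazy_fork(int n) {
    if (!would_block(n))
        return work(n);          /* fast path: sequential call */
    return promote_and_run(n);   /* rare path: real parallelism */
}

int main(void) {
    printf("%d\n", lazy_fork(2));   /* runs as a cheap sequential call */
    printf("%d\n", lazy_fork(5));   /* gets its own thread of control */
    return 0;
}
```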


Field Programmable Gate Arrays | 1998

Managing pipeline-reconfigurable FPGAs

Srihari Cadambi; Jeffrey Weener; Seth Copen Goldstein; Herman Schmit; Donald E. Thomas

While reconfigurable computing promises to deliver incomparable performance, it remains a marginal technology due to the high cost of developing and upgrading applications. Hardware virtualization can significantly reduce both of these costs. In this paper we describe the benefits of hardware virtualization and show how it can be achieved using a combination of pipeline reconfiguration and run-time scheduling of both configuration streams and data streams. The result is PipeRench, an architecture that supports robust compilation and provides forward compatibility. Our preliminary performance analysis predicts that PipeRench will outperform commercial FPGAs and DSPs both in overall performance and in performance per mm².
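Pipeline reconfiguration interleaves the two streams: on each cycle one physical stripe receives its next configuration while the remaining stripes keep processing data. A toy C model of that schedule, with illustrative stripe counts and a round-robin loading order:

```c
#include <stdio.h>

#define PHYS 3   /* physical stripes in the fabric */
#define VIRT 6   /* virtual stripes in the application */

int main(void) {
    for (int cycle = 0; cycle < VIRT; cycle++) {
        int loading = cycle % PHYS;   /* stripe receiving its next config */
        printf("cycle %d: reconfiguring stripe %d, computing on", cycle, loading);
        for (int s = 0; s < PHYS; s++)
            if (s != loading) printf(" %d", s);  /* stripes still streaming data */
        printf("\n");
    }
    return 0;
}
```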

Collaboration


Dive into Seth Copen Goldstein's collaborations.

Top Co-Authors

Tiberiu Chelcea (Carnegie Mellon University)
Peter Lee (Carnegie Mellon University)
Srihari Cadambi (Carnegie Mellon University)
Todd C. Mowry (Carnegie Mellon University)