
Publication


Featured research published by Peter R. Nuth.


international symposium on microarchitecture | 1992

The message-driven processor: a multicomputer processing node with efficient mechanisms

William J. Dally; J.A.S. Fiske; John S. Keen; Richard Lethin; Michael D. Noakes; Peter R. Nuth; R.E. Davison; G. Fyler

The message-driven processor (MDP), a 36-b, 1.1-million-transistor VLSI microcomputer specialized to operate efficiently in a multicomputer, is described. The MDP chip includes a processor, a 4096-word by 36-b memory, and a network port. An on-chip memory controller with error checking and correction (ECC) permits local memory to be expanded to one million words by adding external DRAM chips. The MDP incorporates primitive mechanisms for communication, synchronization, and naming which support most proposed parallel programming models. The MDP system architecture, instruction set architecture, network architecture, implementation, and software are discussed.
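The core idea above, that a task is created and dispatched when a message arrives rather than a resident program polling for input, can be sketched as a toy Python model (the function, message format, and handler table are illustrative assumptions, not the MDP's actual interface):

```python
import queue

def run_message_driven_node(handlers, inbox):
    # Toy model of message-driven execution: each arriving message names a
    # handler and carries its arguments; the handler runs on arrival, and its
    # results stand in for the tasks an MDP node would create and dispatch.
    results = []
    while not inbox.empty():
        msg = inbox.get()
        results.append(handlers[msg["op"]](*msg["args"]))
    return results
```

A message such as `{"op": "add", "args": (1, 2)}` placed on the inbox would invoke the `add` handler with those arguments; the real MDP implements the equivalent dispatch in hardware.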


international conference on computer design | 1992

The J-machine network

Peter R. Nuth; William J. Dally

The structure and implementation of the J-Machine network, a 3-D mesh that uses wormhole routing and virtual channels to provide two network priorities, are described. Each network channel is 9 bits wide and operates at 32 MHz. Each J-Machine node contains network routers that guide messages between the six bidirectional channels incident on the node, and a network interface that handles messages originating or terminating at the node. The router is fully synchronous, uses absolute addressing, and performs dimension-order routing. A novel pad design permits the bidirectional channels to reverse direction each cycle without danger of conduction overlap. The J-Machine network provides a low-latency communication fabric for up to 64K Message-Driven Processor nodes.
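Dimension-order routing, which the router above performs, resolves each mesh dimension completely before moving to the next; a minimal sketch in Python (the function name and path representation are assumptions for illustration):

```python
def dimension_order_route(src, dst):
    # Dimension-order routing in a 3-D mesh: correct the first coordinate
    # fully, then the second, then the third. Finishing each dimension before
    # starting the next is what keeps wormhole routing deadlock-free on a mesh.
    path = []
    cur = list(src)
    for dim in range(3):
        step = 1 if dst[dim] > cur[dim] else -1
        while cur[dim] != dst[dim]:
            cur[dim] += step
            path.append(tuple(cur))
    return path
```

For example, routing from (0, 0, 0) to (2, 1, 0) visits (1, 0, 0) and (2, 0, 0) before turning in the second dimension to reach (2, 1, 0).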


high performance computer architecture | 1995

The Named-State Register File: implementation and performance

Peter R. Nuth; William J. Dally

Context switches are slow in conventional processors because the entire processor state must be saved and restored, even if much of the state is not used before the next context switch. This paper introduces the Named-State Register File (NSF), a fine-grain associative register file. The NSF uses hardware and software techniques to efficiently manage registers among sequential or parallel procedure activations. The NSF holds more live data per register than conventional register files, and requires much less spill and reload traffic to switch between concurrent contexts. The NSF speeds execution of some sequential and parallel programs by 9% to 17% over alternative register file organizations. The NSF has access time comparable to a conventional register file and adds only 5% to the area of a typical processor chip.
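The fine-grain associative binding described above, registers named by (context, variable) pairs so that a context switch moves no state until it is actually needed, can be sketched as a toy software model (the class and its LRU spill policy are illustrative assumptions, not the paper's hardware design):

```python
from collections import OrderedDict

class NamedStateRegisterFile:
    # Toy model: register slots are bound to (context, name) pairs, so only
    # live values occupy slots. On overflow the least-recently-used binding
    # spills to memory; a context switch itself touches no registers.
    def __init__(self, nregs):
        self.nregs = nregs
        self.regs = OrderedDict()   # (ctx, name) -> value, in LRU order
        self.memory = {}            # spill area

    def write(self, ctx, name, value):
        key = (ctx, name)
        if key not in self.regs and len(self.regs) >= self.nregs:
            victim, v = self.regs.popitem(last=False)  # spill the LRU binding
            self.memory[victim] = v
        self.regs[key] = value
        self.regs.move_to_end(key)

    def read(self, ctx, name):
        key = (ctx, name)
        if key not in self.regs:                       # miss: reload the spill
            self.write(ctx, name, self.memory.pop(key))
        self.regs.move_to_end(key)
        return self.regs[key]
```

Switching from context 0 to context 1 here is free; values of context 0 spill individually, and only if context 1 actually needs their slots.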


international conference on computer design | 1991

A mechanism for efficient context switching

Peter R. Nuth; William J. Dally

Context switches are slow in conventional processors because the entire processor state must be saved and restored, even if much of the restored state is not used before the next context switch. This unnecessary data movement is required because of the coarse granularity of binding between names and registers. The context cache is introduced, which binds variable names to individual registers. This makes context switches very inexpensive, since registers are loaded and saved only as needed. Analysis shows that the context cache holds more live data than a multithreaded register file, and supports more tasks without spilling to memory. Circuit simulations show that the access time of a context cache is 7% greater than that of a conventional register file of the same size.


Archive | 1994

Named State and Efficient Context Switching

Peter R. Nuth; William J. Dally

Context switches are slow in conventional processors because the entire processor state must be saved and restored, even if much of the restored state is not used before the next context switch. This unnecessary data movement is required because of the coarse granularity of binding between names and registers. In this paper we introduce the Named-State Register File (NSF), which binds variable names to individual registers. This makes context switches very inexpensive, since registers are loaded and saved only as needed. Analysis shows that the Named-State Register File uses registers more efficiently than a multithreaded register file, and supports more tasks without spilling to memory. Circuit simulations indicate that the access time of an NSF is only 6% greater than that of a conventional register file. The NSF requires only 25% more VLSI chip area to implement than a conventional register file.


international conference on computer design | 1992

The Message Driven Processor: an integrated multicomputer processing element

William J. Dally; Andrew A. Chien; J.A.S. Fiske; G. Fyler; Waldemar Horwat; John S. Keen; Richard Lethin; Michael D. Noakes; Peter R. Nuth; D.S. Wills

A description is given of the Message-Driven Processor (MDP), an integrated multicomputer node. It incorporates a 36-bit integer processor, a memory management unit, a router for a 3-D mesh network, a network interface, a 4K × 36-bit-word static RAM (SRAM), and an ECC dynamic RAM (DRAM) controller on a single 1.1-million-transistor VLSI chip. The MDP is not specialized for a single model of computation. Instead, it incorporates efficient primitive mechanisms for communication, synchronization, and naming. These mechanisms support most proposed parallel programming models. Each processing node of the MIT J-Machine consists of an MDP with 1 Mbit of DRAM.


international symposium on computer architecture | 1998

Retrospective: the J-machine

William J. Dally; Andrew A. Chien; Stuart Fiske; Waldemar Horwat; Richard Lethin; Michael D. Noakes; Peter R. Nuth; Ellen Spertus; Deborah A. Wallach; D. Scott Wills; Andrew Chang; John S. Keen

1 Computer Systems Laboratory, Stanford University; 2 Department of Computer Science, University of Illinois, Urbana-Champaign; 3 Department of Electrical and Computer Engineering, Georgia Institute of Technology; 4 Netscape Communications; 5 Equator Technologies; 6 Hewlett Packard Laboratories; 7 Department of Computer Science, Mills College; 8 DEC, Western Research Laboratory; 9 Silicon Graphics Computer Systems


Computing Systems in Engineering | 1992

The J-Machine: A fine-grain parallel computer

William J. Dally; Andrew A. Chien; R.E. Davison; J.A.S. Fiske; S. Furman; G. Fyler; D.B. Gaunce; Waldemar Horwat; S. Kaneshiro; John S. Keen; Richard Lethin; Michael D. Noakes; Peter R. Nuth; Ellen Spertus; Brian Totty; Deborah A. Wallach; D.S. Wills

Most modern computers, whether parallel or sequential, are coarse-grained. They are composed of physically large nodes with tens of megabytes of memory. Only a small fraction of the silicon area in the machine is devoted to computation. By increasing the ratio of computation area to memory area, fine-grain computers offer the potential of improving cost/performance by several orders of magnitude. To operate efficiently at such a fine grain, however, a machine must provide mechanisms that permit rapid access to global data and fast interaction between nodes. The MIT J-Machine is a fine-grain concurrent computer that provides low-overhead mechanisms for parallel computing. Prototype J-Machines have been operational since July 1991. The J-Machine communication mechanism permits a node to send a message to any other node in the machine in a few microseconds, and on message arrival a task is created and dispatched on a similar timescale. A translation mechanism supports a global virtual address space. These mechanisms efficiently support most proposed models of concurrent computation and allow parallelism to be exploited at a grain size of 10 operations. The hardware is an ensemble of up to 65,536 nodes, each containing a 36-bit processor, 4K 36-bit words of on-chip memory, 256K words of DRAM, and a router. The nodes are connected by a high-speed three-dimensional mesh network.
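The argument above, that fine-grain machines only pay off when per-task overhead is low, can be made concrete with a one-line cost model (the function and its parameters are an illustrative assumption, not a model from the paper): a task of g operations does useful work for g times the per-operation time, and pays a fixed overhead for creation and dispatch.

```python
def useful_fraction(grain_ops, op_time_us, task_overhead_us):
    # Fraction of time spent on useful work for a task of `grain_ops`
    # operations: useful / (useful + overhead). As the fixed per-task
    # overhead shrinks, small grains approach full efficiency.
    useful = grain_ops * op_time_us
    return useful / (useful + task_overhead_us)
```

At a grain of 10 operations, halving the task overhead raises the useful fraction directly, which is why the J-Machine's low-overhead communication and dispatch mechanisms matter at this scale.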


ACM Sigplan Lisp Pointers | 1989

A study of LISP on a multiprocessor (preliminary version)

Peter R. Nuth; Robert H. Halstead

Parallel symbolic computation has attracted considerable interest in recent years. Research groups building multiprocessors for such applications have been frustrated by the lack of data on how symbolic programs run on a parallel machine. This report describes the behavior of Multilisp programs running on a shared memory multiprocessor. Data was collected for a set of application programs on the frequency of different instructions, the type of objects accessed, and where the objects were located in the memory of the multiprocessor. The locality of data references for different multiprocessor organizations was measured. Finally, the effect of different task scheduling strategies on the locality of accesses was studied. This data is summarized here, and compared to other studies of LISP performance on uniprocessors.


Archive | 1992

The message-driven processor

William J. Dally; J.A.S. Fiske; John S. Keen; Richard Lethin; Michael D. Noakes; Peter R. Nuth; Robert Davison; G. Fyler

Collaboration

Dive into Peter R. Nuth's collaboration.

Top Co-Authors

John S. Keen, Massachusetts Institute of Technology
Michael D. Noakes, Massachusetts Institute of Technology
Richard Lethin, Massachusetts Institute of Technology
Andrew A. Chien, Massachusetts Institute of Technology
J.A.S. Fiske, Massachusetts Institute of Technology
Waldemar Horwat, Massachusetts Institute of Technology
Brian Totty, Massachusetts Institute of Technology
D.S. Wills, Massachusetts Institute of Technology