Charles L. Seitz
California Institute of Technology
Publication
Featured research published by Charles L. Seitz.
international symposium on microarchitecture | 1995
Nanette J. Boden; Danny Cohen; Robert E. Felderman; Alan E. Kulawik; Charles L. Seitz; Jakov Seizovic; Wen-King Su
The Myrinet local area network employs the same technology used for packet communication and switching within massively parallel processors (MPPs). In realizing this distributed MPP network, we developed specialized communication channels, cut-through switches, host interfaces, and software. To our knowledge, Myrinet demonstrates the highest performance per unit cost of any current LAN.
Distributed Computing | 1986
William J. Dally; Charles L. Seitz
The torus routing chip (TRC) is a self-timed chip that performs deadlock-free cut-through routing in k-ary n-cube multiprocessor interconnection networks using a new method of deadlock avoidance called virtual channels. A prototype TRC with byte-wide self-timed communication channels achieved on first silicon a throughput of 64 Mbits/s in each dimension, about an order of magnitude better performance than the communication networks used by machines such as the Caltech Cosmic Cube or Intel iPSC. The cut-through routing latency of only 150 ns per routing step largely eliminates message-locality considerations in the concurrent programs for such machines. The design and testing of the TRC as a self-timed chip was no more difficult than it would have been for a synchronous chip.
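The deadlock-avoidance idea in this paper can be illustrated with a small sketch (a hypothetical simplification, not the TRC's actual logic): packets are routed in dimension order, and each unidirectional ring is split into a "high" and a "low" virtual channel, with a packet dropping permanently to the low channel when it crosses the wrap-around link, so the channel-dependency graph stays acyclic.

```python
def route(src, dst, k):
    """Trace a dimension-order (e-cube) route from src to dst in a k-ary
    n-cube built from unidirectional rings. Returns the list of
    (node, virtual_channel) hops. Hypothetical sketch of the virtual-channel
    scheme: a packet starts each dimension on the 'high' channel and switches
    to 'low' for good once it crosses the wrap-around edge, which breaks the
    cyclic channel dependency that would otherwise permit deadlock."""
    node = list(src)
    hops = []
    for d in range(len(src)):        # resolve lowest dimension first
        vc = 'high'
        while node[d] != dst[d]:
            nxt = (node[d] + 1) % k
            if nxt < node[d]:        # crossed the wrap-around link
                vc = 'low'
            node[d] = nxt
            hops.append((tuple(node), vc))
    return hops
```

Because every packet finishes one dimension before starting the next, and channel classes are only ever "used up" in one direction, no cycle of waiting packets can form.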
hypercube concurrent computers and applications | 1988
Charles L. Seitz; William C. Athas; Charles M. Flaig; Alain J. Martin; Jakov Seizovic; Craig S. Steele; Wen-King Su
During the period following the completion of the Cosmic Cube experiment [1], and while commercial descendants of this first-generation multicomputer (message-passing concurrent computer) were spreading through a community that includes many of the attendees of this conference, members of our research group were developing a set of ideas about the physical design and programming for the second generation of medium-grain multicomputers. Our principal goal was to improve by as much as two orders of magnitude the relationship between message-passing and computing performance, and also to make the topology of the message-passing network practically invisible. Decreasing the communication latency relative to instruction execution times extends the application span of multicomputers from easily partitioned and distributed problems (e.g., matrix computations, PDE solvers, finite element analysis, finite difference methods, distant or local field many-body problems, FFTs, ray tracing, distributed simulation of systems composed of loosely coupled physical processes) to computing problems characterized by “high flux” [2] or relatively fine-grain concurrent formulations [3, 4] (e.g., searching, sorting, concurrent data structures, graph problems, signal processing, image processing, and distributed simulation of systems composed of many tightly coupled physical processes). Such applications place heavy demands on the message-passing network for high bandwidth, low latency, and non-local communication. Decreased message latency also improves the efficiency of the class of applications that have been developed on first-generation systems, and the insensitivity of message latency to process placement simplifies the concurrent formulation of application programs.
Our other goals included a streamlined and easily layered set of message primitives, a node operating system based on a reactive programming model, open interfaces for accelerators and peripheral devices, and node performance improvements that could be achieved economically by using the same technology employed in contemporary workstation computers. By the autumn of 1986, these ideas had become sufficiently developed, molded together, and tested through simulation to be regarded as a complete architectural design. We were fortunate that the Ametek Computer Research Division was ready and willing to work with us to develop this system as a commercial product. The Ametek Series 2010 multicomputer is the result of this joint effort.
ACM Sigarch Computer Architecture News | 1991
John Ngai; Charles L. Seitz
Message-passing concurrent computers, also known as multicomputers, such as the Caltech Cosmic Cube [47] and its commercial descendants, consist of many computing nodes that interact with each other by sending and receiving messages over communication channels between the nodes. The communication networks of the second-generation machines, such as the Symult Series 2010 and the Intel iPSC/2 [2], employ an oblivious wormhole-routing technique that guarantees deadlock freedom. The network performance of this highly evolved oblivious technique has reached a limit of being capable of delivering, under random traffic, a stable maximum sustained throughput of ≈ 45 to 50% of the limit set by the network bisection bandwidth, while maintaining acceptable network latency. This thesis examines the possibility of performing adaptive routing as an approach to further improving upon the performance and reliability of these networks. In an adaptive multipath routing scheme, message trajectories are no longer deterministic, but are continuously perturbed by local message loading. Message packets tend to follow their shortest-distance routes to destinations under normal traffic loading, but can be detoured to longer but less-loaded routes as local congestion occurs. A simple adaptive cut-through packet-switching framework is described, and a number of fundamental issues concerning the theoretical feasibility of the adaptive approach are studied. Freedom from communication deadlock is achieved by following a coherent channel protocol and by applying voluntary misrouting as needed. Packet deliveries are assured by resolving channel-access conflicts according to a priority assignment. Fairness of network access is assured either by sending round-trip packets or by having each node follow a local injection-synchronization protocol.
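The arbitration idea described above, priority-resolved channel conflicts plus voluntary misrouting, can be sketched in a few lines. This is a hypothetical simplification (the names `route_cycle`, `profitable`, and the packet tuples are invented for illustration, not taken from the thesis): in each cycle, older packets choose first among their distance-reducing ("profitable") output channels, and a packet whose profitable channels are all taken is forwarded on any remaining free channel instead of blocking.

```python
def route_cycle(packets, free_channels):
    """One arbitration cycle of a hypothetical adaptive cut-through router.
    Each packet is (packet_id, priority, profitable), where `profitable` is
    the ordered list of output channels that reduce its distance to the
    destination. Higher-priority (older) packets choose first; a packet
    whose profitable channels are all taken is voluntarily misrouted on any
    free channel, so every arriving packet is always forwarded and no
    wait-for cycle (deadlock) can form."""
    free = list(free_channels)
    out = {}
    for pid, prio, profitable in sorted(packets, key=lambda p: -p[1]):
        pick = next((c for c in profitable if c in free), None)
        if pick is None and free:
            pick = free[0]          # voluntary misrouting: detour, don't block
        if pick is not None:
            out[pid] = pick
            free.remove(pick)
    return out
```

With priorities derived from packet age, every packet eventually becomes the oldest in every conflict and is guaranteed to make progress toward delivery.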
The performance behavior of the proposed adaptive cut-through framework is studied with stochastic modeling and analysis, as well as through extensive simulation experiments for 2D and 3D rectilinear networks. Theoretical bounds on various average network-performance metrics are derived for these rectilinear networks; these bounds provide a standard frame of reference for interpreting the performance results. In addition to the potential gain in network performance, the adaptive approach offers the potential for exploiting the inherent path redundancy found in richly connected networks in order to perform fault-tolerant routing. Two convexity-related notions are introduced to characterize the conditions under which our adaptive routing formulation is adequate to provide fault-tolerant routing with minimal change in routing hardware. The effectiveness of these notions is studied through extensive simulations. The 2D octagonal-mesh network is suggested; it displays excellent fault-tolerant potential under the adaptive routing framework. Both performance and reliability behaviors of the octagonal mesh are studied in detail.
Proceedings of the International Conference on Future Tendencies in Computer Science, Control and Applied Mathematics | 1992
Charles L. Seitz
Commercial medium-grain multicomputers aimed at ultra-supercomputer performance are pursuing a less profitable scaling track than fine-grain multicomputers. The Caltech Mosaic C is an experimental, fine-grain multicomputer that employs single-chip nodes and advanced packaging technology to demonstrate the performance/cost advantages of the fine-grain-multicomputer architecture. Each Mosaic node includes 64KB of memory, an 11MIPS processor, a packet interface, and a router. The nodes are tied together with a 60MBytes/s, two-dimensional, routing-mesh network. The compilation-based programming system allows fine-grain, reactive-process, message-passing programs to be expressed in an extension of C++, and the runtime system performs automatic, distributed management of system resources. Mosaic components and programming tools have already been used by another project to implement the 400Mbits/s ATOMIC local-area network, and a 16K-node machine is under construction at Caltech to explore the programmability and application span of the architecture for large-scale computing problems.
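The reactive-process model that the Mosaic programming system is built around can be sketched minimally (a hypothetical illustration; the class and function names here are invented, and Mosaic programs were written in an extension of C++, not Python): a process has no thread of its own and runs only in response to each arriving message, updating local state and possibly sending further messages.

```python
from collections import deque

class ReactiveProcess:
    """Hypothetical sketch of a reactive process: purely message-driven,
    with a handler invoked once per arriving message."""
    def __init__(self, handler, state):
        self.handler, self.state = handler, state

def run(processes, initial_messages):
    """Deliver (dest_pid, payload) messages until quiescence. A handler
    returns the list of messages it wants to send, which are enqueued."""
    queue = deque(initial_messages)
    while queue:
        dest, payload = queue.popleft()
        proc = processes[dest]
        queue.extend(proc.handler(proc.state, payload))

# Example: two processes ping-pong a decrementing token until it reaches 0.
def counter(state, payload):
    state['count'] += 1
    if payload > 0:
        return [(state['peer'], payload - 1)]
    return []
```

The runtime's job, on a real machine, is exactly what the toy `run` loop hides: placing processes on nodes, routing their messages, and scheduling handlers as messages arrive.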
Nuclear Physics | 1983
Eugene Brooks; Geoffrey C. Fox; Steve W. Otto; Mohit Randeria; Bill Athas; Erik P. DeBenedictis; Mike Newton; Charles L. Seitz
We have calculated the mass of the 0⁺ glueball in SU(2) pure gauge theory in 4 dimensions, with very high statistics. The computation was done on an array of microprocessors with nearest-neighbor connections which run concurrently. We discuss, in detail, the implementation of the pure gauge algorithm for SU(2) and SU(3) and also the algorithm for calculating arbitrarily shaped Wilson loops on the array. The extension of these algorithms to the inclusion of dynamical fermions is also discussed. Finally, we present the results of our variational calculation of glueball masses, which are in agreement with published results.
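The reason a lattice gauge computation maps so naturally onto a nearest-neighbor processor array is that the lattice itself decomposes into blocks whose updates need only data from adjacent blocks. A minimal sketch of such a block decomposition (hypothetical; the `owner` function and its signature are invented for illustration):

```python
def owner(site, lattice, grid):
    """Map a lattice site to the processor that owns it under a block
    decomposition: each processor in a `grid`-shaped array holds a
    contiguous (lattice/grid)-sized block of the lattice, so updating a
    link variable needs at most the neighboring processors' boundary
    data. `site`, `lattice`, and `grid` are same-length tuples."""
    return tuple(s * g // L for s, g, L in zip(site, grid, lattice))
```

Sites on a block boundary have neighbors owned by an adjacent processor, which is exactly the nearest-neighbor communication pattern the microprocessor array provides in hardware.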
Archive | 1989
John Ngai; Charles L. Seitz
Multicomputer Networks
Message-passing concurrent computers, more commonly known as multicomputers, such as the Caltech Cosmic Cube [1] and its commercial descendants, consist of many computing nodes that interact with each other by sending and receiving messages over communication channels between the nodes [2]. The existing communication networks of second-generation machines such as the Ametek 2010 employ an oblivious wormhole-routing technique [6,7] which guarantees deadlock freedom. The network performance of this highly evolved oblivious technique has reached a limit of being as fast as physically possible while capable of delivering, under random traffic, a stable maximum sustained throughput of ≈ 45 to 50% of the limit set by the network bisection bandwidth. Any further improvement of these networks will require adaptive utilization of the available network bandwidth to diffuse local congestion.
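The bisection-bandwidth limit quoted above comes from a standard back-of-the-envelope argument, which can be sketched as follows (a hypothetical illustration; the function name and parameters are invented): under uniform random traffic in a k × k 2D mesh, half of all packets must cross the bisection, which contains k channels in each direction.

```python
def bisection_bound(k, channel_bw):
    """Upper bound on the sustainable per-node injection rate under uniform
    random traffic in a k x k 2D mesh. The bisection has 2*k unidirectional
    channels (k each way), and on average half of all injected traffic
    must cross it, so:  rate * k^2 * 0.5 <= 2 * k * channel_bw."""
    nodes = k * k
    bisection_capacity = 2 * k * channel_bw
    crossing_fraction = 0.5
    return bisection_capacity / (nodes * crossing_fraction)
```

The "≈ 45 to 50%" figure in the abstract is the fraction of this ideal bound that oblivious wormhole routing actually sustains as stable throughput.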
Archive | 1989
Charles L. Seitz
Just as designers of concurrent programs and algorithms for multiple-process programs with shared variables may find the analysis or portrayal of the behavior of these programs clarified by using a simplified model of the shared-memory multiprocessor (e.g., the PRAM), designers of concurrent programs and algorithms for multiple-process programs that operate by message passing can benefit from using a simplified model of the message-passing multicomputer. My students and I have developed informally what we believe is a reasonably pleasing and physically credible model of the execution of reactive message-passing programs. This sweep model is described in detail in [1], and its essentials are summarized in [2].
Interconnection networks for high-performance parallel computers | 1994
William J. Dally; Charles L. Seitz
Archive | 1988
Charles M. Flaig; Charles L. Seitz