Volker Strumpen

Massachusetts Institute of Technology

Publications

Featured research published by Volker Strumpen.

International Symposium on Microarchitecture | 2002

The Raw microprocessor: a computational fabric for software circuits and general-purpose programs

Michael Bedford Taylor; Jason Kim; Jason Miller; David Wentzlaff; Fae Ghodrat; Ben Greenwald; Henry Hoffman; Paul Johnson; Jaewook Lee; Walter Lee; Albert Ma; Arvind Saraf; Mark Seneski; Nathan Shnidman; Volker Strumpen; Matthew I. Frank; Saman P. Amarasinghe; Anant Agarwal

Wire delay is emerging as the natural limiter to microprocessor scalability. A new architectural approach could solve this problem, as well as deliver unprecedented performance, energy efficiency and cost effectiveness. The Raw microprocessor research prototype uses a scalable instruction set architecture to attack the emerging wire-delay problem by providing a parallel, software interface to the gate, wire and pin resources of the chip. An architecture that has direct, first-class analogs to all of these physical resources will ultimately let programmers achieve the maximum amount of performance and energy efficiency in the face of wire delay.

International Symposium on Computer Architecture | 2004

Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams

Michael Bedford Taylor; James Psota; Arvind Saraf; Nathan Shnidman; Volker Strumpen; Matthew I. Frank; Saman P. Amarasinghe; Anant Agarwal; Walter Lee; Jason E. Miller; David Wentzlaff; Ian Rudolf Bratt; Ben Greenwald; Henry Hoffmann; Paul Johnson; Jason Kim

This paper evaluates the Raw microprocessor. Raw addresses the challenge of building a general-purpose architecture that performs well on a larger class of stream and embedded computing applications than existing microprocessors, while still running existing ILP-based sequential programs with reasonable performance in the face of increasing wire delays. Raw approaches this challenge by implementing plenty of on-chip resources, including logic, wires, and pins, in a tiled arrangement, and exposing them through a new ISA, so that the software can take advantage of these resources for parallel applications. Raw supports both ILP and streams by routing operands between architecturally exposed functional units over a point-to-point scalar operand network. This network offers low latency for scalar data transport. Raw manages the effect of wire delays by exposing the interconnect and using software to orchestrate both scalar and stream data transport. We have implemented a prototype Raw microprocessor in IBM's 180 nm, 6-layer copper, CMOS 7SF standard-cell ASIC process. We have also implemented ILP and stream compilers. Our evaluation attempts to determine the extent to which Raw succeeds in meeting its goal of serving as a more versatile, general-purpose processor. Central to achieving this goal is Raw's ability to exploit all forms of parallelism, including ILP, DLP, TLP, and stream parallelism. Specifically, we evaluate the performance of Raw on a diverse set of codes including traditional sequential programs, streaming applications, server workloads, and bit-level embedded computation. Our experimental methodology makes use of a cycle-accurate simulator validated against our real hardware. Compared to a 180 nm Pentium III, using commodity PC memory system components, Raw performs within a factor of 2× for sequential applications with a very low degree of ILP, about 2× to 9× better for higher levels of ILP, and 10× to 100× better when highly parallel applications are coded in a stream language or optimized by hand. The paper also proposes a new versatility metric and uses it to discuss the generality of Raw.
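The abstracts above describe operands being routed between tiles over a point-to-point scalar operand network, so the cost of a transfer depends on how far apart the communicating tiles are. The following minimal Python sketch illustrates that idea under assumptions not taken from the papers: a 4×4 grid of tiles, (column, row) coordinates, and dimension-ordered X-then-Y routing. Raw's compiler-orchestrated routing is more general; `xy_route`, `hop_count`, and `MESH_SIZE` are hypothetical names.

```python
from typing import List, Tuple

Tile = Tuple[int, int]  # (column, row)
MESH_SIZE = 4           # assumption: a 4x4 grid of tiles


def _check(tile: Tile) -> None:
    if not all(0 <= c < MESH_SIZE for c in tile):
        raise ValueError(f"tile {tile} is outside the {MESH_SIZE}x{MESH_SIZE} mesh")


def xy_route(src: Tile, dst: Tile) -> List[Tile]:
    """Tiles visited by an operand under X-then-Y routing, endpoints included."""
    _check(src)
    _check(dst)
    x, y = src
    path = [(x, y)]
    while x != dst[0]:              # move along the row first ...
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:              # ... then along the column
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def hop_count(src: Tile, dst: Tile) -> int:
    """Number of tile-to-tile links traversed, i.e. the Manhattan distance."""
    return len(xy_route(src, dst)) - 1


if __name__ == "__main__":
    print(xy_route((0, 0), (3, 2)))   # [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
    print(hop_count((0, 0), (3, 2)))  # 5
```

In this simplified model the routing distance between two tiles is their Manhattan distance, which is one way to see why placing communicating instructions on nearby tiles matters once wire delay is exposed to software.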

International Solid-State Circuits Conference | 2003

A 16-issue multiple-program-counter microprocessor with point-to-point scalar operand network

Michael Bedford Taylor; Jang Kim; Jason Miller; David Wentzlaff; Fae Ghodrat; Ben Greenwald; Henry Hoffman; Paul Johnson; Walter Lee; Arvind Saraf; Nathan Shnidman; Volker Strumpen; Saman P. Amarasinghe; Anant Agarwal

This microprocessor explores an architectural solution to scalability problems in scalar operand networks. The 331 mm² research prototype, fabricated in a 0.15 μm 6M process, issues 16 unique instructions per cycle and uses an on-chip point-to-point scalar operand network to transfer operands among distributed functional units.

IEEE International Symposium on Fault-Tolerant Computing | 1997

Portable checkpointing for heterogeneous architectures

Balkrishna Ramkumar; Volker Strumpen

Current approaches to checkpointing assume system homogeneity, where checkpointing and recovery are both performed on the same processor architecture and operating system configuration. Sometimes, however, it is desirable or necessary to recover a failed computation on a different processor architecture. In such situations, checkpointing and recovery must be portable. In this paper, we argue that source-to-source compilation is an appropriate concept for this purpose. We describe the compilation techniques that we developed for the design of the c2ftc prototype. The c2ftc compiler enables machine-independent checkpoints by automatically generating checkpointing and recovery code. Sequential C programs are compiled into fault-tolerant C programs whose checkpoints can be migrated across heterogeneous networks and restarted on binary-incompatible architectures. Experimental results on several systems provide evidence that the performance penalty of portable checkpointing is negligible for realistic checkpointing frequencies.
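c2ftc works by source-to-source compilation of C programs, and the following Python sketch does not reproduce it. It only illustrates the property the abstract relies on: the checkpoint stores program state in a fixed, machine-independent byte layout (explicit byte order and field widths) rather than as a raw memory image, so a restart does not depend on the writer's byte order or word size. The file layout, the function names, and the toy computation are hypothetical.

```python
import os
import struct
import tempfile
from typing import Optional, Tuple

# Hypothetical checkpoint record: explicit big-endian byte order ('>'), a 64-bit
# signed loop index ('q') and a 64-bit IEEE 754 double ('d'). Fixing order and
# widths, instead of dumping raw memory, is what lets another architecture read it.
CHECKPOINT_FORMAT = ">qd"


def save_checkpoint(path: str, i: int, acc: float) -> None:
    with open(path, "wb") as f:
        f.write(struct.pack(CHECKPOINT_FORMAT, i, acc))


def load_checkpoint(path: str) -> Tuple[int, float]:
    with open(path, "rb") as f:
        data = f.read(struct.calcsize(CHECKPOINT_FORMAT))
    i, acc = struct.unpack(CHECKPOINT_FORMAT, data)
    return i, acc


def sum_of_squares(n: int, path: str, every: int = 1000,
                   start: Optional[Tuple[int, float]] = None) -> float:
    """Toy computation that checkpoints its loop state every `every` iterations."""
    i, acc = start if start is not None else (0, 0.0)
    while i < n:
        acc += float(i * i)
        i += 1
        if i % every == 0:
            save_checkpoint(path, i, acc)
    return acc


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "state.ckpt")
        full = sum_of_squares(2500, path)                              # last checkpoint at i = 2000
        resumed = sum_of_squares(2500, path, start=load_checkpoint(path))
        print(full == resumed)  # True
```

A restarted run resumes from the last saved loop state and reaches the same result as an uninterrupted run.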

Parallel Computing | 1993

Efficient parallel computing in distributed workstation environments

Clemens H. Cap; Volker Strumpen

The typical workstation in a LAN is idle for long periods of time. Within the concept of a hypercomputer, this free, distributed computing power can be placed at the user's disposal. The main problem with this approach is the constantly changing load situation in the network. We show that heterogeneous partitioning according to the load situation at startup and dynamic load balancing throughout the entire computation are essential techniques for obtaining high efficiency with the hypercomputer approach. We describe a parallel programming platform called THE PARFORM, which supports both features and is therefore faster than related approaches. The paper concludes with performance measurements and a scalability model for an explicit finite difference solver of a partial differential equation.
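Heterogeneous partitioning at startup can be sketched as splitting the work in proportion to each workstation's currently available speed. This Python sketch assumes the speeds have already been measured (for example, as relative throughput under the current load) and uses the largest-remainder method so that the shares add up exactly. THE PARFORM's actual partitioning scheme and its dynamic load balancing are not shown, and `partition` is a hypothetical name.

```python
from typing import List


def partition(n_items: int, speeds: List[float]) -> List[int]:
    """Split n_items among workers in proportion to their measured speeds.

    The largest-remainder method makes the shares sum exactly to n_items.
    """
    if not speeds or any(s <= 0 for s in speeds):
        raise ValueError("speeds must be a non-empty list of positive numbers")
    total = sum(speeds)
    ideal = [n_items * s / total for s in speeds]
    shares = [int(x) for x in ideal]  # round every ideal share down
    leftover = n_items - sum(shares)
    # Give the remaining items to the workers with the largest fractional parts
    # (ties keep worker order because Python's sort is stable).
    by_remainder = sorted(range(len(speeds)), key=lambda k: ideal[k] - shares[k], reverse=True)
    for k in by_remainder[:leftover]:
        shares[k] += 1
    return shares


if __name__ == "__main__":
    # Three workstations; the second and third are slowed by other users' load.
    print(partition(1000, [1.0, 0.5, 0.25]))  # [571, 286, 143]
```

A static split like this only reflects the load at startup; the abstract's point is that it must be combined with dynamic load balancing as the load changes during the run.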

International Conference on Parallel and Distributed Systems | 1994

Exploiting communication latency hiding for parallel network computing: model and analysis

Volker Strumpen; Thomas L. Casavant

Very large problems with high computation and communication requirements could be tackled with large numbers of workstations. However, in LAN-based networks contention becomes a limiting factor, whereas in WAN-based networks, namely the Internet, latency appears to limit communication. We describe a model to analyze the gain of communication latency hiding, that is, of overlapping computation and communication. This model illustrates the limitations and opportunities of communication latency hiding for improving the speedup of parallel computations that can be structured appropriately. Experiments show that latency hiding techniques increase the feasibility of parallel computing in high-latency networks of workstations across the Internet as well as in multiprocessor systems.
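The abstract does not give the model's equations. As a hedged illustration only, a common simplification treats one step of a parallel program as computation time `t_comp` plus communication time `t_comm`, and assumes that perfect overlap replaces the sum with the maximum. In that simplified model the gain from latency hiding is at most 2 (with zero overhead), reached when both terms are equal. The function names and the `overhead` term are assumptions, not the paper's model.

```python
def step_time(t_comp: float, t_comm: float, overlap: bool, overhead: float = 0.0) -> float:
    """Time for one compute-and-communicate step in a simple two-term model."""
    if overlap:
        # Communication proceeds in the background; the longer activity dominates.
        return max(t_comp, t_comm) + overhead
    # Without overlap the processor waits for communication to finish.
    return t_comp + t_comm


def hiding_gain(t_comp: float, t_comm: float, overhead: float = 0.0) -> float:
    """Speedup of the overlapped step over the non-overlapped step."""
    return step_time(t_comp, t_comm, False) / step_time(t_comp, t_comm, True, overhead)


if __name__ == "__main__":
    print(hiding_gain(10.0, 10.0))  # 2.0: computation and communication are balanced
    print(hiding_gain(10.0, 2.0))   # 1.2: little communication left to hide
```

The bound follows from t_comp + t_comm ≤ 2 · max(t_comp, t_comm); any overlap overhead only lowers the gain further.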

Software: Practice and Experience | 1995

Coupling hundreds of workstations for parallel molecular sequence analysis

Volker Strumpen

We present a highly scalable approach to distributed parallel computing on workstations in the Internet that provides significant speedup for molecular biology sequence analysis. Recent developments show that smaller numbers of workstations connected via a local area network can be used efficiently for parallel computing. This work emphasizes scalability with respect to the number of workstations employed. We show that a massively parallel approach using several hundred workstations, dispersed over all continents, can successfully be applied to problems with low communication bandwidth requirements. We calculated the optimal local alignment scores between a single genetic sequence and all sequences of a genetic sequence database using the ssearch code, which is well known among molecular biologists. In a heterogeneous network of more than 800 workstations, this job finished after several minutes, compared with the several days it would have taken on a single machine.
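ssearch computes Smith-Waterman local alignment scores. The job parallelizes well across many workstations because each database sequence is scored independently: a worker needs only the query and its share of the database, and returns one score per sequence. The Python sketch below uses a linear gap penalty and simple match/mismatch scores chosen for illustration; real tools such as ssearch typically use substitution matrices and affine gap penalties. The function names are hypothetical.

```python
from typing import Dict


def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal local alignment score with a linear gap penalty, O(len(a) * len(b)) time."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below 0, so an alignment may start anywhere.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def score_database(query: str, database: Dict[str, str]) -> Dict[str, int]:
    """Score one query against every entry; entries are independent, so they can be
    distributed across workers with little communication."""
    return {name: smith_waterman_score(query, seq) for name, seq in database.items()}


if __name__ == "__main__":
    db = {"s1": "ACGT", "s2": "ACT", "s3": "TTTT"}
    print(score_database("ACGT", db))  # {'s1': 8, 's2': 4, 's3': 2}
```

Only the query and the final scores cross the network, which is the kind of low-bandwidth workload the abstract describes.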

International Conference on Parallel Processing | 1996

Software-based communication latency hiding for commodity workstation networks

Volker Strumpen

A variety of latency hiding techniques have been investigated at the hardware level. However, apart from multithreading, which may require substantial program restructuring effort, software-based latency hiding methods have not been investigated. In this paper, we consider design alternatives to multithreading for latency hiding. Furthermore, we present experimental evidence for the validity of a new technique for software-based communication latency hiding in commodity workstation networks: up to 89 percent of useful computational power can be squeezed out of a workstation CPU while communicating with TCP/IP via an Ethernet, and almost 90 percent while communicating across the Internet.

Journal of Parallel and Distributed Computing | 2005

A collision model for randomized routing in fat-tree networks

Volker Strumpen; Arvind Krishnamurthy

We present a proof that in a model of a fat-tree network with n processing nodes m=0. Unlike previously applied proof methods, we use an approximating model for the collision behavior of the network amenable to concise yet simple theoretical analysis. We justify the accuracy of the approximation by means of behavioral simulations based on a gate-level implementation of a fat-tree network.
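The listing truncates this abstract, so the paper's collision model cannot be reproduced here. As a generic illustration of randomized-routing collisions only, the Python sketch below assumes a single switch stage in which each of m messages picks one of c up-links uniformly at random, and counts pairs of messages that choose the same link. For this balls-into-bins approximation the expected number of colliding pairs is m(m-1)/(2c), which a small simulation can check. All names and the single-stage assumption are hypothetical.

```python
import random
from collections import Counter


def colliding_pairs(num_messages: int, num_links: int, rng: random.Random) -> int:
    """Each message picks an up-link uniformly at random; count pairs that share a link."""
    choices = Counter(rng.randrange(num_links) for _ in range(num_messages))
    return sum(k * (k - 1) // 2 for k in choices.values())


def expected_pairs(num_messages: int, num_links: int) -> float:
    """Each of the m(m-1)/2 message pairs lands on the same link with probability 1/c."""
    return num_messages * (num_messages - 1) / (2 * num_links)


def simulate(num_messages: int, num_links: int, trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of the mean number of colliding pairs."""
    rng = random.Random(seed)
    total = sum(colliding_pairs(num_messages, num_links, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    print(expected_pairs(8, 4))  # 7.0
    print(simulate(8, 4))        # close to 7.0
```

Comparing an analytic approximation against simulation mirrors, in miniature, how the abstract justifies its model, although the paper validates against a gate-level fat-tree implementation rather than a toy like this.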

IEEE International Conference on High Performance Computing, Data, and Analytics | 1995

Implementing communication latency hiding in high-latency computer networks

Volker Strumpen; Thomas L. Casavant

We present a latency hiding protocol for asynchronous message passing in UNIX environments. With this protocol, distributed parallel computing can solve applications that can be structured so that useful computation overlaps communication more efficiently than is possible with current standard technologies. To maintain portability, our protocol is layered on top of the Berkeley socket interface and the TCP/IP protocol. We present experimental data that validate our model of latency hiding and demonstrate the capability of our implementation.
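The protocol described above is layered on Berkeley sockets and TCP/IP. Without claiming to reproduce it, this Python sketch shows the underlying overlap: a background thread pushes a message through a stream socket while the main thread performs useful computation, and the receiver reads the message only afterwards. A local `socket.socketpair()` stands in for two networked hosts; `overlapped_step` and `recv_exact` are hypothetical names. CPython releases the global interpreter lock during blocking socket calls, so the transfer and the computation can proceed concurrently.

```python
import socket
import threading
from typing import Tuple


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def overlapped_step(out_sock: socket.socket, in_sock: socket.socket,
                    payload: bytes, work: int) -> Tuple[int, bytes]:
    """Send payload in the background while doing local computation."""
    sender = threading.Thread(target=out_sock.sendall, args=(payload,))
    sender.start()                                 # communication begins ...
    result = sum(i * i for i in range(work))       # ... while useful work proceeds
    received = recv_exact(in_sock, len(payload))   # the message is consumed afterwards
    sender.join()
    return result, received


if __name__ == "__main__":
    a, b = socket.socketpair()  # connected pair standing in for two hosts
    try:
        data = bytes(range(256)) * 4096  # 1 MiB message
        result, received = overlapped_step(a, b, data, 100_000)
        print(received == data)  # True
    finally:
        a.close()
        b.close()
```

Receiving in the main thread after the computation avoids deadlock even when the message is larger than the socket buffers: the sender simply blocks until the receiver starts draining.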
