
Publication


Featured research published by Diana Hecht.


IEEE Transactions on Parallel and Distributed Systems | 2004

Fault-tolerant distributed shared memory on a broadcast-based architecture

Constantine Katsinis; Diana Hecht

Due to advances in fiber-optics and VLSI technology, interconnection networks that allow multiple simultaneous broadcasts are becoming feasible. Distributed-shared-memory implementations on such networks promise high performance even for applications with small granularity. This paper presents the architecture of one such implementation, called the simultaneous optical multiprocessor exchange bus, and examines the performance of augmented DSM protocols that exploit the natural duplication of data to maintain a recovery memory in each processing node and provide basic fault tolerance. Simulation results show that the additional data duplication necessary to create fault-tolerant DSM causes no reduction in system performance during normal operation and eliminates most of the overhead at checkpoint creation. Under certain conditions, data blocks that are duplicated to maintain the recovery memory are utilized by the underlying DSM protocol, reducing network traffic, and increasing the processor utilization significantly.
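The recovery-memory mechanism can be illustrated with a toy model, assuming only that every write is broadcast and therefore visible to all nodes; the class and method names below are illustrative, not taken from the paper:

```python
# Toy model of fault-tolerant DSM with a recovery memory (illustrative sketch).
# On a broadcast bus every write is seen by all nodes, so each node can keep
# copies of shared blocks essentially for free; a checkpoint freezes those
# copies into a recovery memory that a faulty node can be rolled back to.

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.working = {}    # current view of shared blocks
        self.recovery = {}   # state frozen at the last checkpoint

    def observe_write(self, block, value):
        # Every node snoops the broadcast and updates its copy.
        self.working[block] = value

    def checkpoint(self):
        # Freeze the working copy; this is the state a failed node
        # is rolled back to (backward error recovery).
        self.recovery = dict(self.working)

def broadcast_write(nodes, block, value):
    for n in nodes:
        n.observe_write(block, value)

nodes = [Node(i) for i in range(4)]
broadcast_write(nodes, "A", 1)
for n in nodes:
    n.checkpoint()
broadcast_write(nodes, "A", 2)   # update after the checkpoint

# After a fault, node 3 is rolled back to its recovery memory:
nodes[3].working = dict(nodes[3].recovery)
print(nodes[3].working["A"])  # state as of the last checkpoint: 1
```

The point of the sketch is that maintaining `recovery` costs no extra network traffic: the data needed for it already arrives via the normal broadcasts.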


Network Computing and Applications | 2001

Simulation experiments of a high-performance RapidIO-based processing architecture

J. Adams; C. Katsinis; W. Rosen; Diana Hecht; V. Adams; H.V. Narravula; S. Sukhtankar; R. Lachenmaier

This paper describes the results of our simulation analysis of a high-performance processing architecture based on the RapidIO network protocol. RapidIO is a 10-Gb/s, low-latency packet-switched interconnect technology designed for processor-to-processor, processor-to-memory, and processor-to-peripheral interconnects. Two network topologies were simulated: a simple network consisting of an 8-port switch and eight processing nodes, and a more extensive network consisting of five 8-port switches and 24 processing nodes. The results indicate that latencies as low as 92 ns may be achieved for a remote 64-bit read request/response transaction in an unloaded single-switch system. The effectiveness of the various flow control mechanisms provided by the protocol is also explored; when used in combination, they yield a 10% increase in link utilization.
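Flow control in interconnects of this kind is often credit-based: a sender may transmit only while the receiver has advertised free buffer space. The sketch below illustrates that general technique, not RapidIO's exact mechanisms; the `Link` class and credit counts are invented for illustration:

```python
# Credit-based flow control sketch (illustrative, not the RapidIO protocol).
# The sender holds one credit per free receiver buffer; when credits run
# out it stalls rather than dropping packets, and consuming a packet at
# the receiver returns a credit.
from collections import deque

class Link:
    def __init__(self, credits):
        self.credits = credits   # free receiver buffers
        self.rx = deque()        # receiver buffer contents

    def send(self, pkt):
        if self.credits == 0:
            return False         # no buffer space: sender must stall
        self.credits -= 1
        self.rx.append(pkt)
        return True

    def consume(self):
        pkt = self.rx.popleft()
        self.credits += 1        # buffer freed: credit returned
        return pkt

link = Link(credits=2)
link.send("a")
link.send("b")
print(link.send("c"))   # False: out of credits, sender stalls
link.consume()          # receiver drains one packet
print(link.send("c"))   # True: a credit came back
```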


Network Computing and Applications | 2004

A novel switch architecture for high-performance computing and signal processing networks

Satyen Sukhtankar; Diana Hecht; Warren Rosen

This work describes a low-latency switch architecture for high-performance packet-switched networks. The architecture combines input buffers capable of avoiding head-of-line blocking with an internal switch interconnect that allows different input ports to access a single output port simultaneously. The switch was designed for the RapidIO protocol, but provides improved performance in other switched fabrics as well. OPNET Modeler was used to develop models of the proposed switch architecture and to evaluate its performance for three different network topologies. Models of two standard switch architectures were also developed and simulated for comparison.
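Input buffers that avoid head-of-line blocking are commonly organized as virtual output queues, one FIFO per output port at each input; the paper does not publish its design, so the sketch below is a generic illustration of that technique with invented names:

```python
# Virtual output queues (VOQ): each input port keeps a separate FIFO per
# output port, so a packet blocked on a busy output never delays packets
# behind it that are headed to a free output.
from collections import deque

class VOQInput:
    def __init__(self, num_outputs):
        self.queues = [deque() for _ in range(num_outputs)]

    def enqueue(self, packet, out_port):
        self.queues[out_port].append(packet)

    def dequeue_for(self, out_port):
        # Serve only the queue for this output; returns None if empty.
        q = self.queues[out_port]
        return q.popleft() if q else None

port = VOQInput(num_outputs=4)
port.enqueue("p1 -> out 0", 0)   # arrives first, destined to a busy output
port.enqueue("p2 -> out 2", 2)   # would be stuck behind p1 in a single FIFO

# Output 0 is busy, but output 2 can still be served immediately:
print(port.dequeue_for(2))   # p2 -> out 2
```

With a single shared FIFO per input, `p2` could not depart until `p1` did; the per-output queues remove that coupling.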


International Parallel and Distributed Processing Symposium | 2003

Performance analysis of a fault-tolerant distributed-shared memory protocol on the SOME-bus multiprocessor architecture

Diana Hecht; Constantine Katsinis

Interconnection networks allowing multiple simultaneous broadcasts are becoming feasible, mostly due to advances in fiber-optics and VLSI technology. Distributed-shared-memory implementations on such networks promise high performance even for applications with small granularity. This paper summarizes the architecture of one such implementation, the simultaneous optical multiprocessor exchange bus, and examines the performance of an augmented DSM protocol which provides fault tolerance by exploiting the natural DSM replication of data in order to maintain a recovery memory in each processing node. Theoretical and simulation results show that the additional data replication necessary to create fault-tolerant DSM causes no reduction in system performance during normal operation and eliminates most of the overhead at checkpoint creation. Data blocks which are duplicated to maintain the recovery memory may be utilized by the regular DSM protocol, reducing network traffic, and increasing the processor utilization significantly.


International Parallel and Distributed Processing Symposium | 2004

Fault-tolerant DSM on the SOME-Bus multiprocessor architecture with message combining

Constantine Katsinis; Diana Hecht

Summary form only given. We present a broadcast-based architecture called the SOME-Bus interconnection network, which directly links processor nodes without contention, and can efficiently interconnect several hundred nodes. Each node has a dedicated output channel and an array of receivers, with one receiver dedicated to every other node's output channel. The SOME-Bus eliminates the need for global arbitration and provides bandwidth that scales directly with the number of nodes in the system. Under the distributed shared memory (DSM) paradigm, the SOME-Bus allows strong integration of the transmitter, receiver and cache controller hardware to produce a highly integrated system-wide cache coherence mechanism. Backward error recovery fault-tolerance techniques can exploit DSM data replication and SOME-Bus broadcasts with little additional network traffic and corresponding performance degradation. Simulation results show that in the SOME-Bus architecture under the DSM paradigm, messages tend to wait at the node output network interface. Consequently, to minimize the effect of increased network traffic, messages can be combined at the node output queue to form a new message containing the payloads of all original messages. We use simulation to examine the effect of such message combining on the performance of the SOME-Bus, in the presence of additional traffic due to fault tolerance, and we compare it to similar performance measures of a reduced SOME-Bus network where two nodes share one channel.
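Message combining at the output queue can be sketched as follows; the header-size constant and function names are assumptions for illustration, not values from the paper:

```python
# Message combining sketch: while a node's channel is busy, queued messages
# are merged into a single broadcast carrying all payloads, amortizing the
# per-message header overhead over many payloads.
from collections import deque

HEADER_COST = 8  # assumed per-message header overhead in bytes

def combine(queue):
    """Drain the output queue into one combined broadcast message."""
    payloads = []
    while queue:
        payloads.append(queue.popleft())
    return {"payloads": payloads}

def wire_bytes(msgs):
    # total bytes on the channel: one header per message plus its payloads
    return sum(HEADER_COST + sum(len(p) for p in m["payloads"]) for m in msgs)

q = deque([b"blk1", b"blk2", b"blk3"])
separate = [{"payloads": [p]} for p in q]   # one message per payload
combined = [combine(q)]                     # one message for all payloads
print(wire_bytes(separate), wire_bytes(combined))  # 36 20
```

Three headers collapse into one, so the saving grows with the number of messages waiting in the queue, which is exactly the situation the simulations report.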


Network Computing and Applications | 2003

SOME-Bus-NOW: a Network of Workstations with broadcast

Constantine Katsinis; Diana Hecht

Networks of Workstations have been mostly designed using switch-based architectures and programming based on message passing. This paper describes a network of workstations based on the Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus), which is a low-latency, high-bandwidth interconnection network that directly links arbitrary pairs of processor nodes without contention, and can efficiently interconnect several hundred nodes. Each node has a dedicated output channel and an array of receivers, with one receiver dedicated to every other node's output channel. The SOME-Bus eliminates the need for global arbitration and provides bandwidth that scales directly with the number of nodes in the system. Under the Distributed Shared Memory (DSM) paradigm, the SOME-Bus allows strong integration of the transmitter, receiver and cache controller hardware to produce a highly integrated system-wide cache coherence mechanism. This paper examines switch-based networks that maintain high performance under varying degrees of application locality, and compares them to the SOME-Bus, in terms of latency and processor utilization.


International Parallel and Distributed Processing Symposium | 2002

Protocols for fault-tolerant distributed-shared-memory on the SOME-bus multiprocessor architecture

Diana Hecht; Constantine Katsinis

The Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus) is a low-latency, high-bandwidth interconnection network that directly links arbitrary pairs of processor nodes without contention, and can efficiently interconnect over one hundred nodes. Each node has a dedicated output channel and an array of receivers, with one receiver dedicated to every other node's output channel. The SOME-Bus eliminates the need for global arbitration and provides bandwidth that scales directly with the number of nodes in the system. Under the Distributed Shared Memory (DSM) paradigm, the SOME-Bus allows strong integration of the transmitter, receiver and cache controller hardware to produce a highly integrated system-wide cache coherence mechanism. Backward Error Recovery fault-tolerance techniques can rely on DSM data replication and SOME-Bus broadcasts with little additional network traffic and corresponding performance degradation. This paper presents three protocols for fault-tolerant DSM and uses simulation to examine the performance of the protocols on the SOME-Bus multiprocessor architecture.


Concurrency and Computation: Practice and Experience | 2006

The performance of parallel matrix algorithms on a broadcast-based architecture

Constantine Katsinis; Diana Hecht; Ming Zhu; Harsha Narravula

Due to advances in fiber-optics and very large scale integration (VLSI) technology, interconnection networks which allow multiple simultaneous broadcasts are becoming feasible. This paper summarizes one such multiprocessor architecture called the Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus). It also presents enhancements to the network interface and the cache and directory controllers which support cache block combining, capture and prefetch, and allow complete overlap of processing time with the communication time due to compulsory misses. The paper uses two fundamental matrix algorithms to characterize the impact of each enhancement on performance. Cache miss analysis and results from the execution of these programs on a SOME-Bus simulator show that block capture and prefetch combined with an effective block replacement policy succeed in significantly reducing the miss rate due to compulsory misses as the cache size increases, while a similar increase of cache size in traditional architectures leaves the miss rate due to compulsory misses unaffected.


International Parallel and Distributed Processing Symposium | 2004

A channel caching scheme on an optical bus-based distributed architecture

Ming Zhu; Harsha Narravula; Constantine Katsinis; Diana Hecht

Summary form only given. Reducing the effect of hot spots is increasingly important to gain performance out of modern processor clusters. Traditionally, compiler techniques have been used for static analysis of hot spot patterns in parallel applications; the operating system then performs the optimization to reduce the overhead of hot spots. However, hot spots cannot be avoided entirely due to the dynamic nature of applications. We propose a new hot spot optimization scheme based on a broadcast-based optical interconnection network, the SOME-Bus, where each node has a dedicated broadcast channel to connect with other nodes without any contention. The scheme introduces additional hardware to considerably reduce the latency of hot-spot requests and acknowledgements. Hot spots are assumed to be identifiable either through static analysis, or by a run-time profiler. Our scheme then provides a way to cache these hot-spot blocks much closer to the network/channel, thereby providing a very low latency path between the input and the output queues in the network. The technique has been implemented in a SOME-Bus simulator, and verified with popular parallel algorithms like matrix-matrix multiplication. Preliminary results show that the scheme reduces application completion times by up to 24% over a system without channel caching.
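The channel-caching idea, holding known hot-spot blocks close to the network interface so requests get a fast response path, can be sketched as below; the latency numbers and class names are invented for illustration, not measurements from the paper:

```python
# Channel cache sketch: blocks identified as hot spots (by static analysis
# or a run-time profiler) sit in a small cache next to the network channel,
# so a request for them is answered without a full trip into node memory.
MEM_LATENCY = 100   # assumed cost of a full memory access (arbitrary units)
CACHE_LATENCY = 10  # assumed cost of a channel-cache hit

class ChannelCache:
    def __init__(self, hot_blocks, memory):
        # Preload only the blocks flagged as hot spots.
        self.store = {b: memory[b] for b in hot_blocks}

    def serve(self, block, memory):
        """Return (value, latency) for a request arriving on the channel."""
        if block in self.store:
            return self.store[block], CACHE_LATENCY   # fast path
        return memory[block], MEM_LATENCY             # slow path

memory = {"hot": 1, "cold": 2}
cc = ChannelCache(hot_blocks=["hot"], memory=memory)
print(cc.serve("hot", memory))   # (1, 10)
print(cc.serve("cold", memory))  # (2, 100)
```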


International Parallel and Distributed Processing Symposium | 2004

Parallel matrix algorithms on a broadcast-based architecture

Constantine Katsinis; Diana Hecht; Ming Zhu; Harsha Narravula

Summary form only given. Due to advances in fiber-optics and VLSI technology, interconnection networks which allow multiple simultaneous broadcasts are becoming feasible. This paper summarizes one such multiprocessor architecture called the Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus). It also presents the design of the network interface and the cache and directory controllers which support cache block combining, capture and prefetch and allow complete overlap of processing time with the communication time due to compulsory misses. The paper uses two fundamental matrix algorithms to characterize the architecture performance. Cache miss analysis and results from the execution of these programs on a SOME-Bus simulator show that block capture and prefetch combined with an effective block replacement policy succeed in significantly reducing the miss rate due to compulsory misses as the cache size increases, while a similar increase of cache size in traditional architectures leaves the miss rate due to compulsory misses unaffected.
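How block prefetch attacks compulsory misses can be seen in a deliberately simplified model of a sequential scan, where each miss also fetches the next block; this illustrates the general effect only, not the paper's controller design:

```python
# Simplified compulsory-miss model: scanning `blocks` cache blocks that
# were never referenced before, with an unbounded cache. Without prefetch
# every first touch misses; with next-block prefetch, each miss also pulls
# in the following block, halving the compulsory misses.
def scan(blocks, prefetch):
    cache, misses = set(), 0
    for b in range(blocks):
        if b not in cache:
            misses += 1          # compulsory miss: first reference
            cache.add(b)
            if prefetch:
                cache.add(b + 1) # bring in the next block early
    return misses

print(scan(16, prefetch=False), scan(16, prefetch=True))  # 16 8
```

In this toy model enlarging the cache changes nothing, mirroring the paper's observation that bigger caches alone leave compulsory misses unaffected; only fetching blocks before first use (capture/prefetch) removes them.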
