Ioannis Schoinas | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Ioannis Schoinas is active.

Explore More

Publication

Featured researches published by Ioannis Schoinas.

architectural support for programming languages and operating systems | 1994

Fine-grain access control for distributed shared memory

Ioannis Schoinas; Babak Falsafi; Alvin R. Lebeck; Steven K. Reinhardt; James R. Larus; David A. Wood

This paper discusses implementations of fine-grain memory access control, which selectively restricts reads and writes to cache-block-sized memory regions. Fine-grain access control forms the basis of efficient cache-coherent shared memory. This paper focuses on low-cost implementations that require little or no additional hardware. These techniques permit efficient implementation of shared memory on a wide range of parallel systems, thereby providing shared-memory codes with a portability previously limited to message passing. This paper categorizes techniques based on where access control is enforced and where access conflicts are handled. We incorporated three techniques that require no additional hardware into Blizzard, a system that supports distributed shared memory on the CM-5. The first adds a software lookup before each shared-memory reference by modifying the programs executable. The second uses the memorys error correcting code (ECC) as cache-block valid bits. The third is a hybrid. The software technique ranged from slightly faster to two times slower than the ECC approach. Blizzards performance is roughly comparable to a hardware shared-memory machine. These results argue that clusters of workstations or personal computers with networks comparable to the CM-5s will be able to support the same shared-memory interfaces as supercomputers.

conference on high performance computing (supercomputing) | 1994

Application-specific protocols for user-level shared memory

Babak Falsafi; Alvin R. Lebeck; Steven K. Reinhardt; Ioannis Schoinas; Mark D. Hill; James R. Larus; Anne Rogers; David A. Wood

Recent distributed shared memory (DSM) systems and proposed shared-memory machines have implemented some or all of their cache coherence protocols in software. One way to exploit the flexibility of this software is to tailor a coherence protocol to match an applications communication patterns and memory semantics. This paper presents evidence that this approach can lead to large performance improvements. It shows that application-specific protocols substantially improved the performance of three application programs-appbt, em3d, and barnes-over carefully tuned transparent shared memory implementations. The speed-ups were obtained on Blizzard, a fine-grained DSM system running on a 32-node Thinking Machines CM-5.<<ETX>>

acm sigplan symposium on principles and practice of parallel programming | 1997

Relaxed consistency and coherence granularity in DSM systems: a performance evaluation

Yuanyuan Zhou; Liviu Iftode; Jaswinder Pal Sing; Kai Li; Brian R. Toonen; Ioannis Schoinas; Mark D. Hill; David A. Wood

During the past few years, two main approaches have been taken to improve the performance of software shared memory implementations: relaxing consistency models and providing fine-grained access control. Their performance tradeoffs, however, we not well understood. This paper studies these tradeoffs on a platform that provides access control in hardware but runs coherence protocols in software, We compare the performance of three protocols across four coherence granularities, using 12 applications on a 16-node cluster of workstations. Our results show that no single combination of protocol and granularity performs best for all the applications. The combination of a sequentially consistent (SC) protocol and fine granularity works well with 7 of the 12 applications. The combination of a multiple-writer, home-based lazy release consistency (HLRC) protocol and page granularity works well with 8 out of the 12 applications. For applications that suffer performance losses in moving to coarser granularity under sequential consistency, the performance can usually be regained quite effectively using relaxed protocols, particularly HLRC. We also find that the HLRC protocol performs substantially better than a single-writer lazy release consistent (SW-LRC) protocol at coase granularity for many irregular applications. For our applications and platform, when we use the original versions of the applications ported directly from hardware-coherent shared memory, we find that the SC protocol with 256-byte granularity performs best on average. However, when the best versions of the applications are compared, the balance shifts in favor of HLRC at page granularity.

high-performance computer architecture | 1998

Address translation mechanisms in network interfaces

Ioannis Schoinas; Mark D. Hill

Good network hardware performance is often squandered by overheads for accessing the network interface (NI) within a host. NIs that support user-level messaging avoid frequent operating system (OS) action yet unnecessary copying can still result in low performance. We explore improving application messaging performance by eliminating all unnecessary copies (minimal messaging). For minimal messaging, NIs must support address translation and must do so more richly than has been done in the past. NI address translation should flexibly support higher-level abstractions, map all user space, exploit translation locality, and degrade gracefully, when locality is poor. We classify NI address translation implementations based on where the lookup and the miss handling are performed (CPU or NI). We present alternative designs and we consider how they interact with the OS. We provide simulation results that evaluate the alternative design points and we demonstrate feasibility with a real implementation using Myrinet. We find: NIs need not have hardware lookup structures, as software schemes are fast enough; it is difficult for an NI to handle its own translation misses unless commercial operating systems are substantially modified to view an NI as CPU peer; in the conventional situation where the operating system views the NI as a device, minimal messaging should be used only when the translation is present, while a single-copy protocol is used when it is not; and alternatively one can currently get acceptable performance when the CPU handle misses if the kernel provides very fast trap interfaces but microprocessor and operating system trends may make this alternative less viable in the long run.

Archive | 2004