Publication


Featured research published by Matthew T. O'Keefe.


IEEE Conference on Mass Storage Systems and Technologies | 1999

A 64-bit, shared disk file system for Linux

Kenneth W. Preslan; Andrew P. Barry; Jonathan Brassow; Grant Erickson; Erling Nygaard; Christopher Sabol; Steven R. Soltis; David Teigland; Matthew T. O'Keefe

In computer systems today, speed and responsiveness are often determined by network and storage subsystem performance. Faster, more scalable networking interfaces like Fibre Channel and Gigabit Ethernet provide the scaffolding from which higher performance implementations may be constructed, but new thinking is required about how machines interact with network-enabled storage devices. We have developed a Linux file system called GFS (the Global File System) that allows multiple Linux machines to access and share disk and tape devices on a Fibre Channel or SCSI storage network. We plan to extend GFS by transporting packetized SCSI commands over IP so that any GFS-enabled Linux machine can access shared network devices. GFS will perform well as a local file system, as a traditional network file system running over IP, and as a high-performance cluster file system running over storage networks like Fibre Channel. GFS device sharing provides a key cluster-enabling technology for Linux, helping to bring the availability, scalability, and load balancing benefits of clustering to Linux. Our goal is to develop a scalable (in number of clients and devices, capacity, connectivity, and bandwidth), server-less file system that integrates IP-based network attached storage (NAS) and Fibre-Channel-based storage area networks (SAN). We call this new architecture Storage Area InterNetworking (SAINT). It exploits the speed and device scalability of SAN clusters, and provides the client scalability and network interoperability of NAS appliances. Our Linux port shows that the GFS architecture is portable across different platforms, and we are currently working on a port to NetBSD. The GFS code is open source (GPL) software freely available on the Internet at http://gfs.lcse.umn.edu.
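
As a rough illustration of the shared-disk model described above (not GFS code), the sketch below shows why multiple machines writing to one shared device need cluster-wide locking: a read-modify-write of shared metadata must be serialized or updates are lost. The threading.Lock merely stands in for a device- or cluster-level lock.

```python
# Hypothetical sketch (not GFS code): machines sharing one disk must serialize
# read-modify-write updates to shared on-disk metadata, here a record count.
import threading

shared_device = {"record_count": 0}   # stands in for a metadata block on the shared disk
device_lock = threading.Lock()        # stands in for a cluster-wide device lock

def append_record() -> None:
    # Each "node" performs a read-modify-write cycle on the shared metadata.
    with device_lock:                 # without this, concurrent updates can be lost
        count = shared_device["record_count"]
        shared_device["record_count"] = count + 1

nodes = [threading.Thread(target=append_record) for _ in range(8)]
for t in nodes:
    t.start()
for t in nodes:
    t.join()
print(shared_device["record_count"])  # 8 with locking; possibly fewer without it
```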


Programming Language Design and Implementation | 1997

Spill code minimization via interference region spilling

Peter Bergner; Peter Dahl; David Robert Engebretsen; Matthew T. O'Keefe

Many optimizing compilers perform global register allocation using a Chaitin-style graph coloring algorithm. Live ranges that cannot be allocated to registers are spilled to memory. The amount of code required to spill the live range depends on the spilling heuristic used. Chaitin's spilling heuristic offers some guidance in reducing the amount of spill code produced. However, this heuristic does not allow the partial spilling of live ranges, and the reduction in spill code is limited to a local level. In this paper, we present a global technique called interference region spilling that improves the spilling granularity of any local spilling heuristic. Our technique works above the local spilling heuristic, limiting the normal insertion of spill code to a portion of each spilled live range. By partially spilling live ranges, we can achieve large reductions in dynamically executed spill code: up to 75% in some cases and an average of 33.6% across the benchmarks tested.
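
As a hedged illustration of the idea (not the paper's implementation), the sketch below contrasts whole-range spilling with spilling restricted to the interference region: the set of program points where the spilled live range actually overlaps the range it competes with. Representing live ranges as sets of instruction numbers is an assumption made purely for illustration.

```python
# Hypothetical illustration of interference-region spilling: a live range is
# spilled only over the program points where it overlaps the interfering live
# range, rather than over its entire extent.

def interference_region(live_a: set[int], live_b: set[int]) -> set[int]:
    """Program points where both live ranges are simultaneously live."""
    return live_a & live_b

def spill_points(victim: set[int], interferer: set[int], whole_range: bool) -> set[int]:
    """Points at which loads/stores must be inserted for the spilled range."""
    return set(victim) if whole_range else interference_region(victim, interferer)

# Live ranges as sets of instruction numbers.
v = set(range(0, 100))        # the live range chosen for spilling
w = set(range(40, 50))        # the live range it interferes with

full = spill_points(v, w, whole_range=True)      # Chaitin-style: spill everywhere
partial = spill_points(v, w, whole_range=False)  # interference-region spilling

print(len(full), len(partial))   # 100 vs. 10 points needing spill code
```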


Parallel Computing | 1995

A comparison of data-parallel and message-passing versions of the Miami Isopycnic Coordinate Ocean Model (MICOM)

Rainer Bleck; Sumner Dean; Matthew T. O'Keefe; Aaron Sawdey

A two-pronged effort to convert a recently developed ocean circulation model written in Fortran-77 for execution on massively parallel computers is described. A data-parallel version was developed for the CM-5 manufactured by Thinking Machines, Inc., while a message-passing version was developed for both the Cray T3D and the Silicon Graphics ONYX workstation. Since the time differentiation scheme in the ocean model is fully explicit and does not require solution of elliptic partial differential equations, adequate machine utilization has been achieved without major changes to the original algorithms. We developed a partitioning strategy for the message-passing version that significantly reduces memory requirements and increases model speed. On a per-node basis (a T3D node is one Alpha processor, a CM-5 node is one Sparc chip and four vector units), the T3D and CM-5 are found to execute our “large” model version consisting of 511 × 511 horizontal mesh points at roughly the same speed.
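
A minimal sketch of the kind of partitioning a message-passing port relies on, assuming a standard block decomposition with one-cell halos (the paper's actual partitioning strategy is not reproduced here). The mesh size and subdomain counts below are illustrative, and the halo copies stand in for real message exchanges.

```python
# Illustrative sketch (not the paper's code): block-partition a 2D ocean mesh
# and exchange one-cell halos between neighbouring subdomains, the standard
# structure of a message-passing port of an explicit finite-difference model.
import numpy as np

N, P = 512, 4                         # mesh points per side, subdomains per side
block = N // P

grid = np.random.rand(N, N)
# Each subdomain holds its block plus a one-cell halo on every side.
subdomains = {}
for bi in range(P):
    for bj in range(P):
        i0, j0 = bi * block, bj * block
        padded = np.zeros((block + 2, block + 2))
        padded[1:-1, 1:-1] = grid[i0:i0 + block, j0:j0 + block]
        subdomains[(bi, bj)] = padded

def exchange_halos(subs):
    """Copy neighbours' edge rows/columns into each halo (stands in for message passing)."""
    for (bi, bj), s in subs.items():
        if bi > 0:      s[0, 1:-1]  = subs[(bi - 1, bj)][-2, 1:-1]   # north neighbour's bottom row
        if bi < P - 1:  s[-1, 1:-1] = subs[(bi + 1, bj)][1, 1:-1]    # south neighbour's top row
        if bj > 0:      s[1:-1, 0]  = subs[(bi, bj - 1)][1:-1, -2]   # west neighbour's east column
        if bj < P - 1:  s[1:-1, -1] = subs[(bi, bj + 1)][1:-1, 1]    # east neighbour's west column

exchange_halos(subdomains)            # in a real code this is an MPI/SHMEM message exchange
# After the exchange, each halo equals the neighbour's adjacent interior row/column.
assert np.array_equal(subdomains[(0, 0)][-1, 1:-1], subdomains[(1, 0)][1, 1:-1])
```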


IEEE Conference on Mass Storage Systems and Technologies | 2010

High performance solid state storage under Linux

Eric Seppanen; Matthew T. O'Keefe; David J. Lilja

Solid state drives (SSDs) allow single-drive performance that is far greater than disks can produce. Their low latency and potential for parallel operations mean that they are able to read and write data at speeds that strain operating system I/O interfaces. Additionally, their performance characteristics expose gaps in existing benchmarking methodologies. We discuss the impact on Linux system design of a prototype PCI Express SSD that operates at least an order of magnitude faster than most drives available today. We develop benchmarking strategies and focus on several areas where current Linux systems need improvement, and suggest methods of taking full advantage of such high-performance solid state storage. We demonstrate that an SSD can perform with high throughput, high operation rates, and low latency under the most difficult conditions. This suggests that high-performance SSDs can dramatically improve parallel I/O performance for future high performance computing (HPC) systems.
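
A minimal sketch of a random-read micro-benchmark in the spirit of the strategies discussed, assuming a pre-existing test file (testfile.bin is a placeholder). A serious measurement of such a drive would bypass the page cache (O_DIRECT or raw device access) and keep many requests in flight, which this single-threaded, cached version deliberately omits.

```python
# Minimal sketch of a random 4 KiB read micro-benchmark (illustrative only).
# It shows the measurement structure, not a rigorous SSD benchmark.
import os, random, time

PATH = "testfile.bin"         # hypothetical test file, assumed to exist and be large
BLOCK = 4096
IOS = 10_000

fd = os.open(PATH, os.O_RDONLY)
size = os.fstat(fd).st_size
offsets = [random.randrange(0, size - BLOCK) // BLOCK * BLOCK for _ in range(IOS)]

start = time.perf_counter()
for off in offsets:
    os.pread(fd, BLOCK, off)  # one random 4 KiB read
elapsed = time.perf_counter() - start
os.close(fd)

print(f"{IOS / elapsed:,.0f} IOPS, {IOS * BLOCK / elapsed / 1e6:.1f} MB/s, "
      f"{elapsed / IOS * 1e6:.1f} us/op average")
```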


The Journal of Supercomputing | 1992

Static scheduling for barrier MIMD architectures

Henry G. Dietz; Abderrazek Zaafrani; Matthew T. O'Keefe

In a SIMD or VLIW machine, conceptual synchronizations are accomplished by using a static code schedule that does not require run-time synchronization. The lack of run-time synchronization overhead makes these machines very effective for fine-grain parallelism, but they cannot execute parallel code structures as general as those executed by MIMD architectures, and this limits their utility. In this paper we present a timing analysis that allows a compiler for a MIMD machine to eliminate a large fraction of the run-time synchronization by making efficient use of static code scheduling. Although these techniques can be adapted to be applied to most MIMD machines, this paper centers on the analysis and scheduling for barrier MIMD machines. Barrier MIMDs are asynchronous multiple instruction stream/multiple data stream architectures capable of parallel execution of variable execution-time instructions and arbitrary control flow (e.g., while loops and calls). However, they also incorporate a special hardware barrier synchronization mechanism that facilitates static scheduling by providing a mechanism which the compiler can use to enforce precise timing constraints. In other words, the compiler tracks relative timing between processors and uses static code scheduling until the timing imprecision becomes too large, at which point the compiler simply inserts a barrier to reduce that timing imprecision to zero (or a small constant). This paper describes new scheduling and barrier placement algorithms for barrier MIMDs that are based loosely on the list scheduling approach employed for VLIWs [Ellis 1985]. In addition, the experimental results from scheduling thousands of synthetic benchmark programs for a parameterized barrier MIMD machine are presented.
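
A toy sketch of the mechanism described above (not the paper's scheduling algorithm): the compiler tracks each processor's best- and worst-case completion times and inserts a barrier only when the worst-case skew exceeds a bound, at which point the timing uncertainty collapses to zero. The operation format and skew bound below are assumptions chosen for illustration.

```python
# Toy sketch (not the paper's scheduler): track each processor's completion-time
# window and insert a barrier when the skew between processors exceeds a bound.

def schedule(per_proc_ops, max_skew):
    """per_proc_ops: one list per processor of (min_cycles, max_cycles) operations."""
    earliest = [0] * len(per_proc_ops)
    latest = [0] * len(per_proc_ops)
    barriers = []
    for step, ops in enumerate(zip(*per_proc_ops)):   # one op per processor per step
        for p, (lo, hi) in enumerate(ops):
            earliest[p] += lo
            latest[p] += hi
        skew = max(latest) - min(earliest)            # worst-case relative timing error
        if skew > max_skew:
            barriers.append(step)                     # barrier realigns all processors
            t = max(latest)
            earliest = [t] * len(earliest)
            latest = [t] * len(latest)
    return barriers

# Two processors, four variable-time operations each (min, max cycles).
ops = [[(2, 4), (2, 4), (2, 4), (2, 4)],
       [(3, 3), (1, 5), (3, 3), (1, 5)]]
print(schedule(ops, max_skew=6))      # steps after which a barrier is inserted, e.g. [3]
```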


IEEE Conference on Mass Storage Systems and Technologies | 1999

Device Locks: mutual exclusion for storage area networks

Kenneth W. Preslan; Steven R. Soltis; C.J. Sabol; Matthew T. O'Keefe; G. Houlder; J. Coomes

Device Locks are mechanisms used in distributed environments to facilitate mutual exclusion of shared resources. They can further be used to maintain coherence of data that is cached in several locations. The locks are implemented on the storage devices and accessed with the SCSI device lock command, Dlock. The paper presents the Dlock command and discusses how it can be used as a mutual exclusion device for storage area networks and shared disk file systems. Methods for the recovery of a Dlock held by a failed initiator are also presented. The Dlock command is in the process of being standardized as part of the SCSI 3 specification.
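
A simplified, hypothetical model of a device-resident lock (the real Dlock command fields and semantics are defined in the paper and the SCSI-3 proposal, not here): the storage device holds the lock state, a version counter lets initiators detect stale cached data, and an expire path stands in for recovery from a failed initiator.

```python
# Simplified, hypothetical model of a device-resident lock (not the SCSI Dlock
# command itself): the device owns the lock state and a version counter used
# for cache coherence across initiators.

class DeviceLock:
    def __init__(self):
        self.holder = None          # initiator currently holding the lock
        self.version = 0            # bumped on every release, for cache coherence

    def acquire(self, initiator: str) -> tuple[bool, int]:
        """Try to take the lock; returns (granted, current_version)."""
        if self.holder is None:
            self.holder = initiator
            return True, self.version
        return False, self.version

    def release(self, initiator: str) -> None:
        if self.holder == initiator:
            self.holder = None
            self.version += 1       # data cached under the old version may be stale

    def expire(self) -> None:
        """Recovery path: clear a lock held by a failed initiator."""
        self.holder = None
        self.version += 1

lock = DeviceLock()
ok, v = lock.acquire("nodeA")       # nodeA caches data tagged with version v
lock.release("nodeA")
ok2, v2 = lock.acquire("nodeB")     # v2 > v signals that caches tagged with v are stale
print(ok, v, ok2, v2)
```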


IEEE Transactions on Parallel and Distributed Systems | 1994

On loop transformations for generalized cycle shrinking

Weijia Shang; Matthew T. O'Keefe; José A. B. Fortes

This paper describes several loop transformation techniques for extracting parallelism from nested loop structures. Nested loops can then be scheduled to run in parallel so that execution time is minimized. One technique is called selective cycle shrinking, and the other is called true dependence cycle shrinking. It is shown how selective shrinking is related to linear scheduling of nested loops and how true dependence shrinking is related to conflict-free mappings of higher dimensional algorithms into lower dimensional processor arrays. Methods are proposed in this paper to find the selective and true dependence shrinkings with minimum total execution time by applying the techniques of finding optimal linear schedules and optimal and conflict-free mappings proposed by W. Shang and A.B. Fortes.
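
A hedged sketch of plain cycle shrinking, the idea underlying both variants (the selective and true-dependence forms in the paper are not reproduced): when every dependence in a loop has distance at least d, iterations can execute in parallel groups of d. The loop size and distances below are illustrative.

```python
# Illustrative sketch of basic cycle shrinking: if every dependence in a loop
# has distance >= d, iterations can run in parallel groups ("waves") of d.

def shrink(n_iterations: int, dependence_distances: list[int]) -> list[range]:
    d = min(dependence_distances)             # the shrinking factor
    return [range(start, min(start + d, n_iterations))
            for start in range(0, n_iterations, d)]

# A loop of 12 iterations whose dependences have distances 3 and 5:
waves = shrink(12, [3, 5])
for k, wave in enumerate(waves):
    print(f"wave {k}: iterations {list(wave)} run in parallel")
# Execution time drops from 12 sequential steps to len(waves) = 4 parallel steps.
```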


Conference on High Performance Computing (Supercomputing) | 1989

Static synchronization beyond VLIW

Henry G. Dietz; Thomas Schwederski; Matthew T. O'Keefe; Abderrazek Zaafrani

A key advantage of SIMD (Single Instruction stream, Multiple Data stream) architectures is that synchronization is effected statically at compile-time, hence the execution-time cost of synchronization between “processes” is essentially zero. VLIW (Very Long Instruction Word) machines are successful in large part because they preserve this property while providing more flexibility in terms of what kinds of operations can be parallelized. In this paper, we propose a new kind of architecture — the “static barrier MIMD” or SBM — which can be viewed as a further generalization of the parallel execution abilities of static synchronization machines. Barrier MIMDs are asynchronous Multiple Instruction stream Multiple Data stream architectures capable of parallel execution of loops, subprogram calls, and variable-execution-time instructions. However, instead of using barriers as a synchronization mechanism, the proposed barrier hardware is used to impose static timing constraints. Since the compiler can know at compile time all instructions which each processor could be executing when a particular conceptual synchronization operation is needed, it can resolve most synchronizations by using VLIW-like compile-time instruction scheduling — without use of a runtime synchronization mechanism. The effect is that the proposed barrier mechanism greatly extends the generality of efficient static scheduling without adding a significant hardware cost. Traditional, directed-synchronization, MIMD architectures are more flexible than barrier MIMDs, but the benefits of static scheduling make barrier MIMDs superior for fine to medium grain parallelism. Both the barrier architecture and the supporting compiler technology are discussed in this paper.


IEEE Transactions on Parallel and Distributed Systems | 1993

Loop coalescing and scheduling for barrier MIMD architectures

Matthew T. O'Keefe; Henry G. Dietz

Barrier MIMDs are asynchronous multiple instruction stream, multiple data stream architectures capable of parallel execution of variable execution time instructions and arbitrary control flow (e.g., while loops and calls); however, they differ from conventional MIMDs in that the need for run-time synchronization is significantly reduced. The authors consider the problem of scheduling nested loop structures on a barrier MIMD. The basic approach employs loop coalescing, a technique for transforming a multiply-nested loop into a single loop. Loop coalescing is extended to nested triangular loops, in which inner loop bounds are functions of outer loop indices. In addition, a more efficient scheme to generate the original loop indices from the coalesced index is proposed for the case of constant loop bounds. These results are general, and can be applied to extend previous work using loop coalescing techniques. The authors concentrate on using loop coalescing for scheduling barrier MIMDs, and show how previous work in loop transformations and linear scheduling theory can be applied to this problem.
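
A brief sketch of loop coalescing on a rectangular and a triangular nest, using standard index-recovery formulas rather than the paper's more efficient scheme for constant bounds; the bounds and loop shapes below are illustrative.

```python
# Illustrative sketch of loop coalescing: a doubly nested loop is flattened into
# one loop over k, and (i, j) are recovered from k.  The triangular case covers
# inner bounds that depend on the outer index (here j <= i).
from math import isqrt

def rect_indices(k: int, n_inner: int) -> tuple[int, int]:
    """Recover (i, j) for: for i in range(M): for j in range(n_inner)."""
    return k // n_inner, k % n_inner

def tri_indices(k: int) -> tuple[int, int]:
    """Recover (i, j) for: for i in range(M): for j in range(i + 1)."""
    i = (isqrt(8 * k + 1) - 1) // 2          # largest i with i*(i+1)/2 <= k
    j = k - i * (i + 1) // 2
    return i, j

# The coalesced triangular loop visits the same (i, j) pairs in the same order:
M = 4
original = [(i, j) for i in range(M) for j in range(i + 1)]
coalesced = [tri_indices(k) for k in range(M * (M + 1) // 2)]
assert original == coalesced
print(coalesced)
```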


Scientific Programming | 1995

The Fortran-P Translator: Towards Automatic Translation of Fortran 77 Programs for Massively Parallel Processors

Matthew T. O'Keefe; Terence John Parr; Kevin Edgar; Steve Anderson; Paul R. Woodward; Hank Dietz

Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how application codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.

Collaboration


Dive into Matthew T. O'Keefe's collaborations.

Top Co-Authors
Aaron Sawdey

University of Minnesota
