Publication


Featured research published by Orran Krieger.


operating systems design and implementation | 1996

Automatic compiler-inserted I/O prefetching for out-of-core applications

Todd C. Mowry; Angela K. Demke; Orran Krieger

Current operating systems offer poor performance when a numeric application’s working set does not fit in main memory. As a result, programmers who wish to solve “out-of-core” problems efficiently are typically faced with the onerous task of rewriting an application to use explicit I/O operations (e.g., read/write). In this paper, we propose and evaluate a fully-automatic technique which liberates the programmer from this task, provides high performance, and requires only minimal changes to current operating systems. In our scheme, the compiler provides the crucial information on future access patterns without burdening the programmer, the operating system supports non-binding prefetch and release hints for managing I/O, and the operating system cooperates with a run-time layer to accelerate performance by adapting to dynamic behavior and minimizing prefetch overhead. This approach maintains the abstraction of unlimited virtual memory for the programmer, gives the compiler the flexibility to aggressively move prefetches back ahead of references, and gives the operating system the flexibility to arbitrate between the competing resource demands of multiple applications. We have implemented our scheme using the SUIF compiler and the Hurricane operating system. Our experimental results demonstrate that our fully-automatic scheme effectively hides the I/O latency in out-of-core versions of the entire NAS Parallel benchmark suite, thus resulting in speedups of roughly twofold for five of the eight applications, with one application speeding up by over threefold.
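
As a rough illustration of the hint-based approach described above (a sketch, not the SUIF/Hurricane implementation), the loop below approximates compiler-inserted prefetch and release hints with POSIX madvise; the file name and chunk size are hypothetical.

```cpp
// Sketch: the kind of non-binding prefetch/release hints a compiler could
// insert around an out-of-core loop, approximated with POSIX madvise.
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <algorithm>
#include <cstdio>

int main() {
    const off_t CHUNK = 1 << 20;                 // 1 MiB hint granularity (arbitrary)
    int fd = open("bigdata.bin", O_RDONLY);      // hypothetical out-of-core data set
    if (fd < 0) { perror("open"); return 1; }
    off_t len = lseek(fd, 0, SEEK_END);
    char *p = (char *)mmap(nullptr, (size_t)len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    long sum = 0;
    for (off_t off = 0; off < len; off += CHUNK) {
        size_t n = (size_t)std::min(CHUNK, len - off);
        // "Prefetch" hint: ask the OS to start I/O for the next chunk
        // while we compute on the current one (non-binding, advisory).
        if (off + CHUNK < len)
            madvise(p + off + CHUNK, (size_t)std::min(CHUNK, len - off - CHUNK),
                    MADV_WILLNEED);
        for (size_t i = 0; i < n; i++) sum += p[off + i];
        // "Release" hint: the chunk behind us will not be reused; let the OS
        // reclaim those pages.
        madvise(p + off, n, MADV_DONTNEED);
    }
    std::printf("sum = %ld\n", sum);
    munmap(p, (size_t)len);
    close(fd);
    return 0;
}
```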


IBM Systems Journal | 2003

Enabling autonomic behavior in systems software with hot swapping

Jonathan Appavoo; Kevin Hui; Craig A.N. Soules; Robert W. Wisniewski; Dilma Da Silva; Orran Krieger; Marc A. Auslander; David Edelsohn; Benjamin Gamsa; Gregory R. Ganger; Paul E. McKenney; Michal Ostrowski; Bryan S. Rosenburg; Michael Stumm; Jimi Xenidis

Autonomic computing systems are designed to be self-diagnosing and self-healing, such that they detect performance and correctness problems, identify their causes, and react accordingly. These abilities can improve performance, availability, and security, while simultaneously reducing the effort and skills required of system administrators. One way that systems can support these abilities is by allowing monitoring code, diagnostic code, and function implementations to be dynamically inserted and removed in live systems. This hot swapping avoids the requisite prescience and additional complexity inherent in creating systems that have all possible configurations built in ahead of time. For already-complex pieces of code such as operating systems, hot swapping provides a simpler, higher-performance, and more maintainable method of achieving autonomic behavior. In this paper, we discuss hot swapping as a technique for enabling autonomic computing in systems software. First, we discuss its advantages and describe the required system structure. Next, we describe K42, a research operating system that explicitly supports interposition and replacement of active operating system code. Last, we describe the infrastructure of K42 for hot swapping and several instances of its use demonstrating autonomic behavior.
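
The following is a minimal sketch of the interposition idea, assuming C++20 and an invented PacketFilter component; K42's actual mechanism also detects quiescence before transferring state, which this toy omits by relying on shared_ptr lifetime management.

```cpp
// Sketch: live requests go through an atomically swappable component
// pointer, so monitoring code can be interposed at run time.
#include <atomic>
#include <cstdio>
#include <memory>

struct PacketFilter {                          // hypothetical component interface
    virtual bool admit(int port) = 0;
    virtual ~PacketFilter() = default;
};

struct Basic : PacketFilter {
    bool admit(int port) override { return port != 23; }
};

struct Monitored : PacketFilter {              // dynamically interposed monitor
    std::shared_ptr<PacketFilter> inner;
    explicit Monitored(std::shared_ptr<PacketFilter> i) : inner(std::move(i)) {}
    bool admit(int port) override {
        std::printf("admit(%d) called\n", port);   // diagnostic hook
        return inner->admit(port);
    }
};

// C++20 atomic<shared_ptr>: in-flight callers keep the old object alive.
std::atomic<std::shared_ptr<PacketFilter>> active{std::make_shared<Basic>()};

int main() {
    active.load()->admit(80);                  // normal path
    // Hot-interpose the monitor while the system keeps running:
    active.store(std::make_shared<Monitored>(active.load()));
    active.load()->admit(80);                  // now traced
}
```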


The Journal of Supercomputing | 1995

Hierarchical clustering: a structure for scalable multiprocessor operating system design

Ronald C. Unrau; Orran Krieger; Benjamin Gamsa; Michael Stumm

We introduce the concept of hierarchical clustering as a way to structure shared-memory multiprocessor operating systems for scalability. The concept is based on clustering and hierarchical system design. Hierarchical clustering leads to a modular system, composed of easy-to-design and efficient building blocks. The resulting structure is scalable because it 1) maximizes locality, which is key to good performance in NUMA (non-uniform memory access) systems, and 2) provides for concurrency that increases linearly with the number of processors. At the same time, there is tight coupling within a cluster, so the system performs well for local interactions that are expected to constitute the common case. A clustered system can easily be adapted to different hardware configurations and architectures by changing the size of the clusters. We show how this structuring technique is applied to the design of a microkernel-based operating system called Hurricane. This prototype system is the first complete and running implementation of its kind and demonstrates the feasibility of a hierarchically clustered system. We present performance results based on the prototype, demonstrating the characteristics and behavior of a clustered system. In particular, we show how clustering trades off the efficiencies of tight coupling for the advantages of replication, increased locality, and decreased lock contention.
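
A toy example of the locality principle (not Hurricane code): state is replicated per cluster so common-case updates stay cluster-local, while rare global operations aggregate across clusters. The cluster count and counter semantics are invented.

```cpp
// Sketch: per-cluster replicas keep the common case local; a cross-cluster
// read aggregates the replicas.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

constexpr int CLUSTERS = 4;                    // hypothetical cluster count

struct alignas(64) ClusterCounter {            // pad to avoid false sharing
    std::atomic<long> v{0};
};

ClusterCounter shards[CLUSTERS];

// Common case: update only this cluster's replica (local memory on NUMA).
void add(int cluster, long n) {
    shards[cluster].v.fetch_add(n, std::memory_order_relaxed);
}

// Rare global operation: walk all clusters.
long total() {
    long t = 0;
    for (auto &s : shards) t += s.v.load(std::memory_order_relaxed);
    return t;
}

int main() {
    std::vector<std::thread> ts;
    for (int c = 0; c < CLUSTERS; c++)
        ts.emplace_back([c] { for (int i = 0; i < 100000; i++) add(c, 1); });
    for (auto &t : ts) t.join();
    std::printf("total = %ld\n", total());     // 400000
}
```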


international conference on parallel processing | 1993

A Fair Fast Scalable Reader-Writer Lock

Orran Krieger; Michael Stumm; Ronald C. Unrau; Jonathan Hanna

A reader-writer (RW) lock allows either multiple readers to inspect shared data or a single writer exclusive access for modifying that data. On shared-memory multiprocessors, the cost of acquiring and releasing these locks can have a large impact on the performance of parallel applications. A major problem with naive implementations of these locks, where processors spin on a global lock variable waiting for the lock to become available, is that the memory containing the lock and the interconnection network to that memory will also become contended when the lock is contended.
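
To make the fairness aspect concrete, here is a classic ticket-based reader-writer lock sketch. It grants the lock in FIFO order, but note that it still spins on shared counters, exactly the contention problem the abstract describes; Krieger et al.'s design additionally has each waiter spin on processor-local memory in a queue, which this toy version does not attempt.

```cpp
// Sketch: a fair (FIFO) ticket-based reader-writer lock.
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <thread>

class FairRWLock {
    std::atomic<uint32_t> ticket{0};  // next ticket to hand out
    std::atomic<uint32_t> read{0};    // turn counter that admits readers
    std::atomic<uint32_t> write{0};   // turn counter that admits writers
public:
    void read_lock() {
        uint32_t t = ticket.fetch_add(1);
        while (read.load() != t) { /* spin (on shared memory: the problem!) */ }
        read.fetch_add(1);            // let the next queued reader in too
    }
    void read_unlock() { write.fetch_add(1); }
    void write_lock() {
        uint32_t t = ticket.fetch_add(1);
        while (write.load() != t) { /* spin until all earlier holders leave */ }
    }
    void write_unlock() {
        write.fetch_add(1);           // admit a queued writer, if next
        read.fetch_add(1);            // or a queued reader, if next
    }
};

int main() {
    FairRWLock lk;
    int x = 0;
    std::thread w([&] { lk.write_lock(); x = 42; lk.write_unlock(); });
    lk.read_lock();
    std::printf("reader saw %d\n", x);  // 0 or 42, never a torn value
    lk.read_unlock();
    w.join();
}
```

Because tickets are granted in arrival order, readers cannot starve writers or vice versa; the cost is that every waiter spins on the same cache lines, which the paper's queue-based design avoids.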


IEEE Computer | 1994

The Alloc Stream Facility: a redesign of application-level stream I/O

Orran Krieger; Michael Stumm; Ronald C. Unrau

The authors introduce an application-level I/O facility, the Alloc Stream Facility, that addresses three primary goals. First, ASF addresses recent computing substrate changes to improve performance, allowing applications to benefit from specific features such as mapped files. Second, it is designed for parallel systems, maximizing concurrency and reporting errors properly. Finally, its modular and object-oriented structure allows it to support a variety of popular I/O interfaces (including stdio and C++ stream I/O) and to be tuned to system behavior, exploiting a system's strengths while avoiding its weaknesses. On a number of standard Unix systems, I/O-intensive applications perform substantially better when linked to the Alloc facility. Also, modifying applications to use a new interface provided by the facility can improve performance by another factor of two. These performance improvements are achieved primarily by reducing data copying and the number of system calls. Not visible in these improvements is the extra degree of concurrency the facility brings to multithreaded and parallel applications.
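
A loose sketch of the zero-copy idea behind an allocation-oriented stream interface (the names sopen/sread are invented approximations, not the real ASF API): rather than copying file data into a caller's buffer as read(2) does, the stream hands back a pointer into a mapped region, cutting both copies and system calls.

```cpp
// Sketch: pointer-returning reads over a memory-mapped file.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>
#include <cstdio>

struct Stream {
    char  *base = nullptr;
    size_t len = 0, pos = 0;
};

bool sopen(Stream &s, const char *path) {      // invented name
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;
    struct stat st;
    fstat(fd, &st);
    s.len = (size_t)st.st_size;
    if (s.len == 0) { close(fd); return false; }
    s.base = (char *)mmap(nullptr, s.len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                                 // mapping keeps the data reachable
    return s.base != MAP_FAILED;
}

// Zero-copy read: return a pointer to the next n bytes instead of copying.
const char *sread(Stream &s, size_t n) {       // invented name
    if (s.pos + n > s.len) return nullptr;
    const char *p = s.base + s.pos;
    s.pos += n;
    return p;
}

int main() {
    Stream s;
    if (!sopen(s, "/etc/hostname")) return 1;  // arbitrary example file
    if (const char *p = sread(s, 4))
        std::printf("%.4s\n", p);              // no copy into a user buffer
    munmap(s.base, s.len);
}
```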


international conference on parallel processing | 1994

Optimizing IPC Performance for Shared-Memory Multiprocessors

Benjamin Gamsa; Orran Krieger; Michael Stumm

We assert that in order to perform well, a shared-memory multiprocessor inter-process communication (IPC) facility must avoid a) accessing any shared data, and b) acquiring any locks. In addition, such a multiprocessor IPC facility must preserve the locality and concurrency of the applications themselves so that the high performance of the IPC facility can be fully exploited. In this paper we describe the design and implementation of a new shared-memory multiprocessor IPC facility that in the common case internally requires no accesses to shared data and no locking. In addition, the model of IPC we support and our implementation ensure that local resources are made available to the server to allow it to exploit any locality and concurrency available in the service. To the best of our knowledge, this is the first IPC subsystem with these attributes. The performance data we present demonstrates that the end-to-end performance of our multiprocessor IPC facility is competitive with the fastest uniprocessor IPC times.
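
The common-case property, no locks and no data shared beyond the two communicating parties, can be illustrated with a single-producer single-consumer ring per client-server pair. This is an invented toy, not the Hurricane IPC path, but it shows why such a design scales: no other processor ever touches this channel's memory.

```cpp
// Sketch: lock-free SPSC ring, one per communicating pair.
#include <atomic>
#include <cstddef>
#include <cstdio>
#include <thread>

template <size_t N>                        // N must be a power of two
struct SpscRing {
    int buf[N];
    std::atomic<size_t> head{0}, tail{0};  // head: consumer, tail: producer
    bool send(int msg) {                   // producer side only
        size_t t = tail.load(std::memory_order_relaxed);
        if (t - head.load(std::memory_order_acquire) == N) return false; // full
        buf[t % N] = msg;
        tail.store(t + 1, std::memory_order_release);
        return true;
    }
    bool recv(int &msg) {                  // consumer side only
        size_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire)) return false;     // empty
        msg = buf[h % N];
        head.store(h + 1, std::memory_order_release);
        return true;
    }
};

int main() {
    static SpscRing<64> ring;              // one ring per client-server pair
    std::thread server([] {
        int m, got = 0;
        while (got < 3)
            if (ring.recv(m)) { std::printf("req %d\n", m); got++; }
    });
    for (int i = 0; i < 3; i++) while (!ring.send(i)) {}
    server.join();
}
```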


international workshop on object orientation in operating systems | 1995

(De-)clustering objects for multiprocessor system software

Eric W. Parsons; Ben Gamsa; Orran Krieger; Michael Stumm

Designing system software for large-scale shared-memory multiprocessors is challenging because of the level of performance demanded by the application workload and the distributed nature of the system. Adopting an object-oriented approach for our system, we have developed a framework for de-clustering objects, where each object may migrate, replicate, and distribute all or part of its data across the system memory using the policies that will best meet the locality requirements for that data. The mechanism for object invocation hides the internal structure of an object, allowing a request to be made directly to the most suitable part of the object on a per-processor basis without any knowledge of how the object is de-clustered. Method invocation is very efficient, both within and across address spaces, involving no remote memory accesses in the common case. We describe the design and implementation of this framework in Tornado, our multiprocessor operating system.
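
A hand-wavy sketch in the spirit of this design (greatly simplified, with invented names): calls go through one interface, are routed to a per-processor representative, and representatives are created lazily on first access. For portability this toy keys representatives off a hash of the thread id; a real system would use the current processor id (e.g., sched_getcpu on Linux).

```cpp
// Sketch: invocation hides the object's internal de-clustering; each caller
// reaches its own representative, created on first "miss".
#include <atomic>
#include <cstdio>
#include <functional>
#include <thread>

constexpr unsigned NREPS = 64;

class ClusteredCounter {
    struct Rep { std::atomic<long> v{0}; };
    std::atomic<Rep *> reps[NREPS] = {};

    Rep &localRep() {                      // translation step: caller -> rep
        unsigned i = std::hash<std::thread::id>{}(std::this_thread::get_id()) % NREPS;
        Rep *r = reps[i].load(std::memory_order_acquire);
        if (!r) {                          // miss: lazily create this rep
            Rep *fresh = new Rep;
            if (reps[i].compare_exchange_strong(r, fresh))
                r = fresh;
            else
                delete fresh;              // another thread installed one first
        }
        return *r;
    }
public:
    void inc() { localRep().v.fetch_add(1, std::memory_order_relaxed); }
    long value() {                         // rare whole-object operation
        long t = 0;
        for (auto &p : reps)
            if (Rep *r = p.load(std::memory_order_acquire))
                t += r->v.load(std::memory_order_relaxed);
        return t;
    }
};

int main() {
    ClusteredCounter c;
    std::thread a([&] { for (int i = 0; i < 1000; i++) c.inc(); });
    for (int i = 0; i < 1000; i++) c.inc();
    a.join();
    std::printf("%ld\n", c.value());       // 2000
}
```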


workshop on i/o in parallel and distributed systems | 1996

HFS: a performance-oriented flexible file system based on building-block compositions

Orran Krieger; Michael Stumm

The Hurricane File System (HFS) is designed for (potentially large-scale) shared-memory multiprocessors. Its architecture is based on the principle that, in order to maximize performance for applications with diverse requirements, a file system must support a wide variety of file structures, file system policies, and I/O interfaces. Files in HFS are implemented using simple building blocks composed in potentially complex ways. This approach yields great flexibility, allowing an application to customize the structure and policies of a file to exactly meet its requirements. As an extreme example, HFS allows a file’s structure to be optimized for concurrent random-access write-only operations by 10 threads, something no other file system can do. Similarly, the prefetching, locking, and file cache management policies can all be chosen to match an application’s access pattern. In contrast, most parallel file systems support a single file structure and a small set of policies. We have implemented HFS as part of the Hurricane operating system running on the Hector shared-memory multiprocessor. We demonstrate that the flexibility of HFS comes with little processing or I/O overhead. We also show that for a number of file access patterns, HFS is able to deliver to the applications the full I/O bandwidth of the disks on our system.
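
The building-block composition can be pictured with an invented Store interface (not HFS's actual classes): leaf blocks hold data, composite blocks such as a striping layer combine sub-blocks, and a file's structure is whatever composition the application chooses.

```cpp
// Sketch: simple building blocks composed per file.
#include <cstdio>
#include <memory>
#include <string>
#include <vector>

struct Store {                             // common building-block interface
    virtual void write(size_t block, const std::string &data) = 0;
    virtual std::string read(size_t block) = 0;
    virtual ~Store() = default;
};

struct MemStore : Store {                  // leaf block: in-memory "disk"
    std::vector<std::string> blocks;
    MemStore() : blocks(256) {}
    void write(size_t b, const std::string &d) override { blocks.at(b) = d; }
    std::string read(size_t b) override { return blocks.at(b); }
};

struct Striped : Store {                   // composite block: round-robin striping
    std::vector<std::unique_ptr<Store>> subs;
    void write(size_t b, const std::string &d) override {
        subs[b % subs.size()]->write(b / subs.size(), d);
    }
    std::string read(size_t b) override {
        return subs[b % subs.size()]->read(b / subs.size());
    }
};

int main() {
    Striped file;                          // per-file structure built from blocks
    file.subs.push_back(std::make_unique<MemStore>());
    file.subs.push_back(std::make_unique<MemStore>());
    file.write(0, "hello");
    file.write(1, "world");                // lands on the second sub-store
    std::printf("%s %s\n", file.read(0).c_str(), file.read(1).c_str());
}
```

A replication or prefetching layer would slot in the same way, as another Store wrapping its children, which is the sense in which structure and policy are chosen per file.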


workshop on hot topics in operating systems | 2001

Supporting hot-swappable components for system software

Kevin Hui; Jonathan Appavoo; Robert W. Wisniewski; Marc A. Auslander; David Edelsohn; Benjamin Gamsa; Orran Krieger; Bryan S. Rosenburg; Michael Stumm

A hot-swappable component is one that can be replaced with a new or different implementation while the system is running and actively using the component. For example, a component of a TCP/IP protocol stack, when hot-swappable, can be replaced (perhaps to handle new denial-of-service attacks or improve performance) without disturbing existing network connections. The capability to swap components offers a number of potential advantages: online upgrades for high-availability systems, improved performance due to dynamic adaptability, and simplified software structures, achieved by implementing distinct policy and implementation options in separate components (rather than as a single monolithic component) that are dynamically swapped as needed. In order to hot-swap a component, it is necessary to (i) instantiate a replacement component; (ii) establish a quiescent state in which the component is temporarily idle; (iii) transfer state from the old component to the new component; (iv) swap the new component for the old; and (v) deallocate the old component.
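
A toy rendering of steps (i) through (v), using a std::shared_mutex as the quiescence mechanism. K42's real quiescence detection is generation-based and far less intrusive than a global lock, so treat this only as a sketch of the protocol's shape, with an invented Counter component.

```cpp
// Sketch: the five-step hot-swap protocol behind a stable interface.
#include <cstdio>
#include <memory>
#include <mutex>
#include <shared_mutex>

struct Counter {                           // hypothetical swappable component
    virtual void add(int) = 0;
    virtual int  state() const = 0;        // exported for state transfer
    virtual void restore(int) = 0;
    virtual ~Counter() = default;
};

struct V1 : Counter {
    int n = 0;
    void add(int d) override { n += d; }
    int  state() const override { return n; }
    void restore(int s) override { n = s; }
};

struct V2 : Counter {                      // new implementation, same interface
    long n = 0;
    void add(int d) override { n += 2L * d; }  // e.g. a changed policy
    int  state() const override { return (int)n; }
    void restore(int s) override { n = s; }
};

std::shared_mutex gate;                    // callers hold it shared
std::unique_ptr<Counter> comp = std::make_unique<V1>();

void client_add(int d) {
    std::shared_lock lk(gate);             // normal invocations
    comp->add(d);
}

void hot_swap() {
    auto fresh = std::make_unique<V2>();   // (i)  instantiate replacement
    std::unique_lock lk(gate);             // (ii) quiesce: no calls in flight
    fresh->restore(comp->state());         // (iii) transfer state
    comp = std::move(fresh);               // (iv) swap; (v) old one deallocated
}

int main() {
    client_add(5);
    hot_swap();
    client_add(5);
    std::printf("%d\n", comp->state());    // 15 under V2's doubled policy
}
```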


hawaii international conference on system sciences | 1990

An optimistic algorithm for consistent replicated shared data

Orran Krieger; Michael Stumm

The authors present the Toris algorithm for implementing the shared data model of interprocess communication in a distributed environment. Data are replicated at all processing sites. An optimistic algorithm maintains consistency between replicas of shared data and ensures that the synchronization requirements of the processes are met. Transactions are used to improve the performance of the optimistic algorithm and provide a mechanism for application-level synchronization.
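
A generic optimistic-concurrency sketch (not the Toris algorithm itself, which maintains consistency across replicas at multiple sites): a transaction reads a snapshot, computes without holding locks, and validates at commit time that the data is unchanged, aborting and retrying on conflict.

```cpp
// Sketch: optimistic transactions with validate-and-commit via CAS.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

std::atomic<int> shared_balance{0};        // invented shared datum

void txn_deposit(int amount) {
    for (;;) {
        int snapshot = shared_balance.load(std::memory_order_acquire); // read
        int result = snapshot + amount;                                // compute
        // Validate-and-commit: succeeds only if no other transaction
        // committed since our read; otherwise abort and retry.
        if (shared_balance.compare_exchange_strong(snapshot, result,
                                                   std::memory_order_acq_rel))
            return;
    }
}

int main() {
    std::vector<std::thread> ts;
    for (int i = 0; i < 4; i++)
        ts.emplace_back([] { for (int j = 0; j < 1000; j++) txn_deposit(1); });
    for (auto &t : ts) t.join();
    std::printf("%d\n", shared_balance.load());   // 4000
}
```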

Collaboration


Dive into Orran Krieger's collaborations.

Top Co-Authors

Kevin Hui

University of Toronto
