
Publications


Featured research published by Ravindra Kuramkote.


high-performance computer architecture | 1999

Impulse: building a smarter memory controller

John B. Carter; Wilson C. Hsieh; Leigh Stoller; Mark R. Swanson; Lixin Zhang; Erik Brunvand; Al Davis; Chen-Chi Kuo; Ravindra Kuramkote; Michael A. Parker; Lambert Schaelicke; Terry Tateyama

Impulse is a new memory system architecture that adds two important features to a traditional memory controller. First, Impulse supports application-specific optimizations through configurable physical address remapping. By remapping physical addresses, applications control how their data is accessed and cached, improving their cache and bus utilization. Second, Impulse supports prefetching at the memory controller, which can hide much of the latency of DRAM accesses. In this paper we describe the design of the Impulse architecture, and show how an Impulse memory system can be used to improve the performance of memory-bound programs. For the NAS conjugate gradient benchmark, Impulse improves performance by 67%. Because it requires no modification to processor, cache, or bus designs, Impulse can be adopted in conventional systems. In addition to scientific applications, we expect that Impulse will benefit regularly strided memory-bound applications of commercial importance, such as database and multimedia programs.
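The remapping idea can be sketched in software. The toy Python model below (hypothetical names, not the hardware interface) shows the effect the abstract describes: a shadow region presents scattered memory locations as one dense array, so a strided gather fills cache lines with only useful words.

```python
# Toy model of Impulse-style address remapping (hypothetical names;
# the real controller remaps physical addresses in hardware).
class ShadowRegion:
    """Presents scattered memory locations as one dense region."""
    def __init__(self, memory, indices):
        self.memory = memory      # backing "physical" memory
        self.indices = indices    # remap table: shadow offset -> real address

    def __getitem__(self, shadow_offset):
        # One dense access fetches a remapped (here, strided) element,
        # so consecutive shadow words hold only useful data.
        return self.memory[self.indices[shadow_offset]]

memory = list(range(100))
# Expose every 10th word (a strided column) as a dense vector.
column = ShadowRegion(memory, indices=range(0, 100, 10))
dense = [column[i] for i in range(10)]
```

In the real system the cache sees the dense shadow addresses, so bus and cache utilization improve without any processor or cache changes.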


international conference on distributed computing systems | 1993

Run-time support and storage management for memory-mapped persistent objects

Bruce R. Millard; Partha Dasgupta; Sanjay G. Rao; Ravindra Kuramkote

The authors present the design and implementation of a persistent store called SPOMS. SPOMS is a language-independent runtime system that provides a store for persistent objects. Objects are created via calls to SPOMS and, when used, are mapped by SPOMS directly into the address spaces of all requesting processes. Objects are stored in native format and are concurrently sharable, and the store can handle distributed applications. The system uses the concept of a compiled class to manage persistent objects: the compiled class is a template used to create and store objects in a language-independent manner, so that objects can be reused without recompiling or relinking the applications that use them. A prototype of SPOMS has been built on top of the Mach operating system. The motivations, the design, and implementation details are presented, and related and future work are discussed.
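As a rough illustration of the memory-mapping concept SPOMS builds on (this is not the SPOMS API; SPOMS itself runs on Mach), a persistent object can live in a file that is mapped directly into a process's address space and updated in place, in its native binary format:

```python
import mmap
import os
import struct
import tempfile

# Illustrative sketch only: a "persistent object" (one 64-bit counter)
# stored in native format and updated through a memory mapping.
path = os.path.join(tempfile.mkdtemp(), "counter.obj")
with open(path, "wb") as f:
    f.write(struct.pack("q", 0))          # create the persistent object

def bump(path):
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 8) as m:     # map object into our space
            (value,) = struct.unpack_from("q", m)
            struct.pack_into("q", m, 0, value + 1)  # update in place

bump(path)
bump(path)
with open(path, "rb") as f:               # object state survives the mapping
    (value,) = struct.unpack("q", f.read())
```

Because the object is accessed through the mapping rather than copied through a serialization layer, any language that can read the native layout can share it, which is the property SPOMS generalizes.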


workshop on hot topics in operating systems | 1993

FLEX: a tool for building efficient and flexible systems

John B. Carter; Bryan Ford; Mike Hibler; Ravindra Kuramkote; Jeffrey Law; Jay Lepreau; Douglas B. Orr; Leigh Stoller; Mark R. Swanson

Modern operating systems must support a wide variety of services for a diverse set of users. Designers of these systems face a tradeoff between functionality and performance. Systems like Mach provide a set of general abstractions and attempt to handle every situation, which can lead to poor performance for common cases. Other systems, such as Unix, provide a small set of abstractions that can be made very efficient, at the expense of functionality. We are implementing a flexible system building tool, FLEX, that allows us to support a powerful operating systems interface efficiently by constructing specialized module implementations at runtime. FLEX improves the performance of existing systems by optimizing interprocess communication paths and relocating servers and clients to reduce communication overhead. These facilities improve the performance of Unix system calls on Mach by 20-400%. Furthermore, FLEX can dynamically extend the kernel in a controlled fashion, which gives user programs access to privileged data and devices not envisioned by the original operating system implementor.


international conference on parallel processing | 1998

ASCOMA: an adaptive hybrid shared memory architecture

Chen-Chi Kuo; John B. Carter; Ravindra Kuramkote; Mark R. Swanson

Scalable shared memory multiprocessors traditionally use either a cache coherent non-uniform memory access (CC-NUMA) or simple cache-only memory architecture (S-COMA) memory architecture. Recently, hybrid architectures that combine aspects of both CC-NUMA and S-COMA have emerged. We present two improvements over other hybrid architectures. The first improvement is a page allocation algorithm that prefers S-COMA pages at low memory pressure. Once the local free page pool is drained, additional pages are mapped in CC-NUMA mode until they suffer sufficient remote misses to warrant upgrading to S-COMA mode. The second improvement is a page replacement algorithm that dynamically backs off the rate of page remappings from CC-NUMA to S-COMA mode at high memory pressure. This design dramatically reduces the amount of kernel overhead and the number of induced cold misses caused by needless thrashing of the page cache. The resulting hybrid architecture is called adaptive S-COMA (AS-COMA). AS-COMA exploits the best of S-COMA and CC-NUMA, performing like an S-COMA machine at low memory pressure and like a CC-NUMA machine at high memory pressure. AS-COMA outperforms CC-NUMA under almost all conditions, and outperforms other hybrid architectures by up to 17% at low memory pressure and up to 90% at high memory pressure.
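The allocation policy described above can be sketched as a small state machine (hypothetical names and a made-up miss threshold; the real policy lives in the kernel and also throttles remappings under pressure):

```python
# Sketch of the AS-COMA page allocation policy: prefer S-COMA while
# local pages are free, fall back to CC-NUMA, and upgrade a CC-NUMA
# page only after it suffers enough remote misses.
UPGRADE_THRESHOLD = 4   # remote misses before a page earns S-COMA mode

class Node:
    def __init__(self, free_local_pages):
        self.free_local_pages = free_local_pages
        self.mode = {}           # page -> "S-COMA" | "CC-NUMA"
        self.remote_misses = {}  # per-page miss counter

    def map_page(self, page):
        if self.free_local_pages > 0:        # low pressure: prefer S-COMA
            self.free_local_pages -= 1
            self.mode[page] = "S-COMA"
        else:                                # pool drained: CC-NUMA mode
            self.mode[page] = "CC-NUMA"
            self.remote_misses[page] = 0

    def remote_miss(self, page):
        if self.mode[page] != "CC-NUMA":
            return
        self.remote_misses[page] += 1
        # Upgrade only pages that keep missing remotely, and only if a
        # local page is available to back the S-COMA copy.
        if (self.remote_misses[page] >= UPGRADE_THRESHOLD
                and self.free_local_pages > 0):
            self.free_local_pages -= 1
            self.mode[page] = "S-COMA"

node = Node(free_local_pages=2)
for p in ("a", "b", "c"):
    node.map_page(p)            # "a", "b" get S-COMA; "c" falls to CC-NUMA
node.free_local_pages = 1       # pretend a local page was later freed
for _ in range(UPGRADE_THRESHOLD):
    node.remote_miss("c")       # enough remote misses: "c" upgrades
```

This captures the adaptive behavior the abstract claims: S-COMA-like at low pressure, CC-NUMA-like once local memory is exhausted.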


Lecture Notes in Computer Science | 1998

Memory System Support for Irregular Applications

John B. Carter; Wilson C. Hsieh; Mark R. Swanson; Lixin Zhang; Erik Brunvand; Al Davis; Chen-Chi Kuo; Ravindra Kuramkote; Michael A. Parker; Lambert Schaelicke; Leigh Stoller; Terry Tateyama

Because irregular applications have unpredictable memory access patterns, their performance is dominated by memory behavior. The Impulse configurable memory controller will enable significant performance improvements for irregular applications, because it can be configured to optimize memory accesses on an application-by-application basis. In this paper we describe the optimizations that the Impulse controller supports for sparse matrix-vector product, an important computational kernel, and outline the transformations that the compiler and runtime system must perform to exploit these optimizations.
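The kernel in question, sparse matrix-vector product, is shown below in a standard compressed sparse row (CSR) formulation (this is the textbook kernel, not code from the paper). The indirect load `x[cols[j]]` is exactly the unpredictable access pattern that a configurable controller like Impulse can turn into a dense gather.

```python
# Sparse matrix-vector product y = A*x with A in CSR form.
def spmv(row_ptr, cols, vals, x):
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for j in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[j] * x[cols[j]]   # indirect, cache-unfriendly load
    return y

# The 3x3 matrix [[2,0,1],[0,3,0],[4,0,5]] in CSR:
row_ptr = [0, 2, 3, 5]
cols    = [0, 2, 1, 0, 2]
vals    = [2.0, 1.0, 3.0, 4.0, 5.0]
y = spmv(row_ptr, cols, vals, [1.0, 1.0, 1.0])
```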


high-performance computer architecture | 1999

MP-LOCKs: replacing H/W synchronization primitives with message passing

Chen-Chi Kuo; John B. Carter; Ravindra Kuramkote

Shared memory programs guarantee the correctness of concurrent accesses to shared data using interprocessor synchronization operations. The most common synchronization operators are locks, which are traditionally implemented via a mix of shared memory accesses and hardware synchronization primitives like test-and-set. In this paper, we argue that synchronization operations implemented using fast message passing and kernel-embedded lock managers are an attractive alternative to dedicated synchronization hardware. We propose three message passing lock (MP-LOCK) algorithms (centralized, distributed, and reactive) and provide implementation guidelines. MP-LOCKs reduce the design complexity and runtime occupancy of DSM controllers and can exploit software's inherent flexibility to adapt to differing applications' lock access patterns. We compared the performance of MP-LOCKs with two common shared memory lock algorithms, test-and-test-and-set and MCS locks, and found that MP-LOCKs scale better. For machines with 16 to 32 nodes, applications using MP-LOCKs ran up to 186% faster than the same applications with shared memory locks. For small systems (up to 8 nodes), three applications with MP-LOCKs slowed down by no more than 18%, while the other two slowed by no more than 180% due to higher software overhead. We conclude that locks based on message passing should be considered as a replacement for hardware locks in future scalable multiprocessors that support efficient message passing mechanisms.
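The centralized variant can be sketched with threads and queues standing in for nodes and fast message passing (hypothetical names; the paper's lock manager is kernel-embedded and uses real inter-node messages). Clients never touch shared lock state; they send acquire/release messages to the manager and block until granted.

```python
import queue
import threading

# Toy centralized MP-LOCK: one lock-manager thread services
# acquire/release messages; clients hold no shared lock state.
requests = queue.Queue()

def lock_manager():
    held, waiters = False, []
    while True:
        op, reply = requests.get()
        if op == "acquire":
            if held:
                waiters.append(reply)          # queue the requester
            else:
                held = True
                reply.put("granted")
        elif op == "release":
            if waiters:
                waiters.pop(0).put("granted")  # hand off to next waiter
            else:
                held = False
        elif op == "stop":
            return

threading.Thread(target=lock_manager, daemon=True).start()

counter = 0
def worker():
    global counter
    for _ in range(1000):
        reply = queue.Queue()
        requests.put(("acquire", reply))
        reply.get()                  # block until the manager grants the lock
        counter += 1                 # critical section
        requests.put(("release", None))

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
requests.put(("stop", None))
```

Serializing grants through one manager is what makes the centralized scheme simple; the distributed and reactive variants in the paper spread this role across nodes.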


ieee international conference on high performance computing data and analytics | 1998

Design alternatives for shared memory multiprocessors

John B. Carter; Chen-Chi Kuo; Ravindra Kuramkote; Mark R. Swanson

We consider the design alternatives available for building the next generation DSM machine (e.g., the choice of memory architecture, network technology, and amount and location of per-node remote data cache). To investigate this design space, we have simulated five applications on a wide variety of possible DSM architectures that employ significantly different caching techniques. We also examine the impact of using a special-purpose system interconnect designed specifically to support low latency DSM operation versus using a powerful off-the-shelf system interconnect. We found that two architectures have the best combination of good average performance and reasonable worst case performance: CC-NUMA employing a moderate sized DRAM remote access cache (RAC) and a hybrid CC-NUMA/S-COMA architecture called AS-COMA or adaptive S-COMA. Both pure CC-NUMA and pure S-COMA have serious performance problems for some applications, while CC-NUMA employing an SRAM RAC does not perform as well as the two architectures that employ larger DRAM caches. The paper concludes with several recommendations to designers of next generation DSM machines, complete with a discussion of the issues that led to each recommendation so that designers can decide which ones are relevant to them given changes in technology and corporate priorities.


international workshop on object orientation in operating systems | 1993

Strange bedfellows: issues in object naming under Unix

Douglas B. Orr; Robert W. Mecklenburg; Ravindra Kuramkote

Naming plays a key role in the design of any system that exports services or resources. Object systems may export many different categories of names: instances, components of records, types, etc. Operating systems export the names of files, devices, and services. Integrating an object base with existing operating system facilities can improve accessibility of the object base resources. We consider the benefits and pitfalls of integrating an object base namespace with the Unix namespace.


Archive | 1996

Paint: PA instruction set interpreter

Leigh Stoller; Ravindra Kuramkote; Mitchell D. Swanson


Archive | 1995

Avalanche: A Communication and Memory Architecture for Scalable Parallel Computing

John B. Carter; Al Davis; Ravindra Kuramkote; E. R. Panier; Leigh Stoller

Collaboration


Dive into Ravindra Kuramkote's collaborations.

Top Co-Authors

Lixin Zhang

Chinese Academy of Sciences
