Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Chris Gniady is active.

Publication


Featured research published by Chris Gniady.


international symposium on computer architecture | 1999

Is SC + ILP = RC?

Chris Gniady; Babak Falsafi; T. N. Vijaykumar

Sequential consistency (SC) is the simplest programming interface for shared-memory systems but imposes program order among all memory operations, possibly precluding high performance implementations. Release consistency (RC), however, enables the highest performance implementations but puts the burden on the programmer to specify which memory operations need to be atomic and in program order. This paper shows, for the first time, that SC implementations can perform as well as RC implementations if the hardware provides enough support for speculation. Both SC and RC implementations rely on reordering and overlapping memory operations for high performance. To enforce order when necessary, an RC implementation uses software guarantees, whereas an SC implementation relies on hardware speculation. Our SC implementation, called SC++, closes the performance gap because: (1) the hardware allows not just loads, as some current SC implementations do, but also stores to bypass each other speculatively to hide remote latencies, (2) the hardware provides large speculative state for not only the processor, as previously proposed, but also the memory to allow out-of-order memory operations, (3) the support for hardware speculation does not add excessive overheads to processor pipeline critical paths, and (4) well-behaved applications incur infrequent rollbacks of speculative execution. Using simulation, we show that SC++ achieves an RC implementation's performance in all six applications we studied.
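The SC-versus-RC gap the paper targets can be illustrated with the classic store-buffering litmus test. The sketch below (an illustrative enumeration, not the paper's hardware mechanism) shows that the outcome r1 = r2 = 0 is impossible under SC's interleaving semantics, but becomes reachable once a store is allowed to drain after a later load, the kind of reordering relaxed models permit and SC++ performs only speculatively.

```python
from itertools import permutations

def outcomes(thread_orders):
    """Enumerate final (r1, r2) values over all legal interleavings.

    thread_orders: for each thread, a list of allowed instruction orders;
    each instruction is ('st', var) or ('ld', var, reg).
    """
    results = set()
    for ops0 in thread_orders[0]:
        for ops1 in thread_orders[1]:
            streams = [ops0, ops1]
            # Each mask picks which thread issues the next instruction.
            for mask in set(permutations([0] * len(ops0) + [1] * len(ops1))):
                mem = {'x': 0, 'y': 0}
                regs = {}
                idx = [0, 0]
                for t in mask:
                    op = streams[t][idx[t]]
                    idx[t] += 1
                    if op[0] == 'st':
                        mem[op[1]] = 1
                    else:
                        regs[op[2]] = mem[op[1]]
                results.add((regs['r1'], regs['r2']))
    return results

# Store-buffering litmus test: T0: x=1; r1=y    T1: y=1; r2=x
t0_po = [('st', 'x'), ('ld', 'y', 'r1')]
t1_po = [('st', 'y'), ('ld', 'x', 'r2')]

sc = outcomes([[t0_po], [t1_po]])            # program order only
relaxed = outcomes([[t0_po, t0_po[::-1]],    # store may drain after the load
                    [t1_po, t1_po[::-1]]])

print((0, 0) in sc)       # False: SC forbids r1 = r2 = 0
print((0, 0) in relaxed)  # True: store->load reordering allows it
```

SC++'s point is that hardware can perform such reorderings speculatively and roll back on a violation, so the programmer keeps the SC outcome set while the machine gets relaxed-model overlap.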


measurement and modeling of computer systems | 2005

The performance impact of kernel prefetching on buffer cache replacement algorithms

Ali Raza Butt; Chris Gniady; Y. Charlie Hu

A fundamental challenge in improving file system performance is to design effective block replacement algorithms to minimize buffer cache misses. Despite the well-known interactions between prefetching and caching, almost all buffer cache replacement algorithms have been proposed and studied comparatively without taking into account file system prefetching, which exists in all modern operating systems. This paper shows that such kernel prefetching can have a significant impact on the relative performance, in terms of the number of actual disk I/Os, of many well-known replacement algorithms; it can not only narrow the performance gap but also change the relative performance benefits of different algorithms. Moreover, since prefetching can increase the number of blocks clustered for each disk I/O and, hence, the time to complete the I/O, the reduction in the number of disk I/Os may not translate into a proportional reduction in the total I/O time. These results demonstrate the importance of buffer caching research taking file system prefetching into consideration and comparing the actual disk I/Os and the execution time under different replacement algorithms.
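The effect the abstract describes can be seen in a toy model. The sketch below (an illustrative simulation with made-up parameters, not the paper's methodology) replays a block trace through an LRU buffer cache with and without a crude model of kernel readahead: with prefetching on, the disk I/O count collapses for a sequential workload, regardless of the replacement policy, which is why comparing algorithms without modeling prefetching can be misleading.

```python
from collections import OrderedDict

def run_trace(trace, cache_size, readahead=0):
    """Replay a block-access trace through an LRU buffer cache.

    A miss triggers one disk I/O that also prefetches the next
    `readahead` sequential blocks (a crude model of kernel readahead).
    Returns the number of disk I/Os issued.
    """
    cache = OrderedDict()   # block -> None, kept in LRU order
    ios = 0

    def insert(block):
        cache[block] = None
        cache.move_to_end(block)
        while len(cache) > cache_size:
            cache.popitem(last=False)   # evict the LRU block

    for block in trace:
        if block in cache:
            cache.move_to_end(block)    # hit: refresh recency
        else:
            ios += 1                    # one I/O fetches block + readahead
            for b in range(block, block + readahead + 1):
                insert(b)
    return ios

seq = list(range(100))                               # sequential workload
print(run_trace(seq, cache_size=16))                 # 100 I/Os, no prefetch
print(run_trace(seq, cache_size=16, readahead=7))    # 13 I/Os with prefetch
```

Note the paper's further caveat: each prefetching I/O moves more blocks and takes longer, so 13 I/Os here would not be 13/100 of the original I/O time.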


high-performance computer architecture | 2004

Program counter based techniques for dynamic power management

Chris Gniady; Y.C. Hu; Yung-Hsiang Lu

Reducing energy consumption has become one of the major challenges in designing future computing systems. We propose a novel idea of using program counters to predict I/O activities in the operating system. We present a complete design of a program-counter access predictor (PCAP) that dynamically learns the access patterns of applications and predicts when an I/O device can be shut down to save energy. PCAP uses path-based correlation to observe a particular sequence of program counters leading to each idle period, and predicts future occurrences of that idle period. PCAP differs from previously proposed shutdown predictors in its ability to: (1) correlate I/O operations to particular behavior of the applications and users, (2) carry prediction information across multiple executions of the applications, and (3) attain better energy savings while incurring few mispredictions.
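Path-based correlation, as described above, can be sketched in a few lines: hash the last few program counters that led to an I/O into a signature, and remember whether a long idle period followed that signature. The class and constants below are illustrative inventions for this sketch, not the paper's actual structures or thresholds.

```python
from collections import deque

class PathPredictor:
    """Sketch of a PCAP-style path-based idle-period predictor.

    Hashes the last `depth` program counters that led to an I/O into a
    signature and remembers whether a long idle period followed it.
    """
    def __init__(self, depth=3):
        self.path = deque(maxlen=depth)  # recent I/O call-site PCs
        self.table = {}                  # signature -> long idle followed?

    def _sig(self):
        return hash(tuple(self.path))

    def observe_io(self, pc):
        """Record the PC of an I/O call; return the shutdown prediction."""
        self.path.append(pc)
        return self.table.get(self._sig(), False)

    def train(self, idle_was_long):
        """Once the idle period ends, update the entry for this path."""
        self.table[self._sig()] = idle_was_long

pred = PathPredictor()
# First execution: observe a path of three (hypothetical) call-site PCs,
# then learn that a long idle period followed it.
for pc in (0x400a10, 0x400b20, 0x400c30):
    shutdown = pred.observe_io(pc)
pred.train(idle_was_long=True)

# Re-executing the same path now predicts a shutdown opportunity.
for pc in (0x400a10, 0x400b20, 0x400c30):
    shutdown = pred.observe_io(pc)
print(shutdown)   # True
```

Because the table is keyed by program behavior rather than by elapsed time, it can be saved and reloaded across runs, which is what lets PCAP carry predictions across multiple executions of an application.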


international conference on parallel architectures and compilation techniques | 2002

Speculative sequential consistency with little custom storage

Chris Gniady; Babak Falsafi

This paper proposes SC++lite, a sequentially consistent system that relaxes memory order speculatively to bridge the performance gap among memory consistency models. Prior proposals to speculatively relax memory order require large custom on-chip storage to maintain a history of speculative processor and memory state while memory order is relaxed. SC++lite uses the memory hierarchy to store the speculative history, providing a scalable path for speculative SC systems across a wide range of applications and system latencies. We use cycle-accurate simulation of shared-memory multiprocessors to show that SC++lite can fully relax memory order while virtually obviating the need for custom on-chip storage. Moreover, while demand for storage increases significantly with larger memory latencies, SC++lite's ability to relax memory order remains insensitive to memory latency. An SC++lite system can improve performance over a base SC system by 28% with only 2 KB of custom storage in a system with 16 processors. In contrast, speculative SC systems with custom storage require 51 KB of storage to improve performance by 31% over a base SC system.


IEEE Transactions on Computers | 2006

Program counter-based prediction techniques for dynamic power management

Chris Gniady; Ali Raza Butt; Y.C. Hu; Yung-Hsiang Lu

Reducing energy consumption has become one of the major challenges in designing future computing systems. This paper proposes a novel idea of using program counters to predict I/O activities in the operating system. It presents a complete design of a program-counter access predictor (PCAP) that dynamically learns the access patterns of applications and predicts when an I/O device can be shut down to save energy. PCAP uses path-based correlation to observe a particular sequence of program counters leading to each idle period and predicts future occurrences of that idle period. PCAP differs from previously proposed shutdown predictors in its ability to: 1) correlate I/O operations to particular behavior of the applications and users, 2) carry prediction information across multiple executions of the applications, and 3) attain higher energy savings while incurring fewer mispredictions. We perform an extensive evaluation study of PCAP using a detailed trace-driven simulation and an actual Linux implementation. Our results show that PCAP achieves fewer average mispredictions and higher energy savings than the simple timeout scheme and the state-of-the-art learning tree scheme.


2011 International Green Computing Conference and Workshops | 2011

Exploring memory energy optimizations in smartphones

Ran Duan; Mingsong Bi; Chris Gniady

Recent development of sophisticated smartphones has made them an indispensable part of our everyday life. However, advances in battery technology cannot keep up with the demand for longer battery life. Subsequently, energy efficiency has become one of the most important factors in designing smartphones. Multitasking and better multimedia features in mobile applications continuously push memory requirements further, making energy optimizations for memory critical. Mobile RAM is already optimized for energy efficiency at the hardware level. It also provides power state switching interfaces to the operating system, which enables OS-level energy optimizations. Many RAM optimizations have been explored for computer systems, and in this paper we explore their applicability to smartphone hardware. In addition, we apply those optimizations to the newly emerging Phase Change Memory and study their energy efficiency and performance. Finally, we propose a hybrid approach to take advantage of both Mobile RAM and Phase Change Memory. Results show that our hybrid mechanism can save more than 98% of memory energy as compared to the standard smartphone system with negligible impact on user experience.
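The intuition behind a DRAM/PCM hybrid can be captured in a toy energy model: DRAM pays a continuous refresh/standby cost but has cheap accesses, while PCM pays nothing at rest but charges heavily per write. The constants below are made-up placeholders, not measurements from the paper; the placement rule is a simplified sketch of the trade-off, not the paper's mechanism.

```python
# Illustrative per-page energy model (all constants are placeholders).
DRAM_STANDBY_MW = 1.0    # per-page standby/refresh power
PCM_STANDBY_MW = 0.0     # PCM is non-volatile: no refresh
DRAM_ACCESS_NJ = 1.0     # energy per read or write
PCM_READ_NJ = 2.0
PCM_WRITE_NJ = 10.0      # PCM writes are expensive

def page_energy(reads, writes, seconds, in_dram):
    """Toy energy cost of keeping one page in DRAM vs. PCM."""
    if in_dram:
        return DRAM_STANDBY_MW * seconds + (reads + writes) * DRAM_ACCESS_NJ
    return PCM_STANDBY_MW * seconds + reads * PCM_READ_NJ + writes * PCM_WRITE_NJ

def place(reads, writes, seconds):
    """Hybrid policy: keep a page wherever it costs less energy."""
    if page_energy(reads, writes, seconds, True) < \
       page_energy(reads, writes, seconds, False):
        return 'dram'
    return 'pcm'

print(place(reads=5, writes=2, seconds=100))    # cold page -> 'pcm'
print(place(reads=500, writes=400, seconds=1))  # write-hot page -> 'dram'
```

Under this model, cold pages belong in PCM (no standby cost) and write-hot pages in Mobile RAM, which is the general shape of the hybrid placement the abstract describes.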


IEEE Transactions on Computers | 2007

The Performance Impact of Kernel Prefetching on Buffer Cache Replacement Algorithms

Ali Raza Butt; Chris Gniady; Y.C. Hu

A fundamental challenge in improving file system performance is to design effective block replacement algorithms to minimize buffer cache misses. Despite the well-known interactions between prefetching and caching, almost all buffer cache replacement algorithms have been proposed and studied comparatively without taking into account file system prefetching, which exists in all modern operating systems. This paper shows that such kernel prefetching can have a significant impact on the relative performance, in terms of the number of actual disk I/Os, of many well-known replacement algorithms; it can not only narrow the performance gap but also change the relative performance benefits of different algorithms. Moreover, since prefetching can increase the number of blocks clustered for each disk I/O and, hence, the time to complete the I/O, the reduction in the number of disk I/Os may not translate into a proportional reduction in the total I/O time. These results demonstrate the importance of buffer caching research taking file system prefetching into consideration and comparing the actual disk I/Os and the execution time under different replacement algorithms.


high-performance computer architecture | 2010

Delay-Hiding energy management mechanisms for DRAM

Mingsong Bi; Ran Duan; Chris Gniady

Current trends in data-intensive applications increase the demand for larger physical memory, resulting in the memory subsystem consuming a significant portion of the system's energy. Furthermore, data-intensive applications heavily rely on a large buffer cache that occupies a majority of physical memory. Subsequently, we focus on power management for the physical memory dedicated to the buffer cache. Several techniques have been proposed to reduce energy consumption by transitioning DRAM into low-power states. However, transitions between different power states incur delays and may affect whole-system performance. We take advantage of the I/O handling routines in the OS kernel to hide the delay incurred by the memory state transition so that performance degradation is minimized while maintaining high memory energy savings. Our evaluation shows that the best of the proposed mechanisms hides almost all transition latencies while only consuming 3% more energy as compared to the existing on-demand mechanism, which can expose significant delays.
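The delay-hiding idea above reduces to a timeline argument: when the kernel's I/O routine is entered, it already knows the buffer-cache memory will be touched once the device transfer completes, so it can start waking the low-power DRAM rank immediately and overlap the transition with the device service time. The sketch below is a simplified timeline model with illustrative numbers, not the paper's kernel implementation.

```python
def exposed_delay(transition_ms, io_service_ms, eager):
    """Transition delay visible to the application, in ms.

    On-demand: the rank is woken only when the transfer needs it, so the
    full transition latency is exposed. Delay-hiding (eager): the wake-up
    starts when the I/O is issued, overlapping the device service time.
    """
    if eager:
        return max(transition_ms - io_service_ms, 0.0)
    return transition_ms

# A 5 ms disk access fully hides a 0.5 ms power-up transition.
print(exposed_delay(0.5, 5.0, eager=False))  # 0.5 ms exposed on demand
print(exposed_delay(0.5, 5.0, eager=True))   # 0.0 ms exposed when hidden
```

Because device service times (milliseconds) dwarf DRAM power-state transitions (microseconds to fractions of a millisecond), the overlap usually hides the transition entirely, which matches the abstract's claim of near-complete latency hiding at a small energy cost.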


international conference on parallel architectures and compilation techniques | 2005

Store-ordered streaming of shared memory

Thomas F. Wenisch; Stephen Somogyi; Nikolaos Hardavellas; Jangwoo Kim; Chris Gniady; Anastassia Ailamaki; Babak Falsafi

Coherence misses in shared-memory multiprocessors account for a substantial fraction of execution time in many important scientific and commercial workloads. Memory streaming provides a promising solution to the coherence miss bottleneck because it improves memory level parallelism and lookahead while using on-chip resources efficiently. We observe that the order in which shared data are consumed by one processor is correlated to the order in which they were produced by another. We investigate this phenomenon and demonstrate that it can be exploited to send store-ordered streams (SORDS) of shared data from producers to consumers, thereby eliminating coherent read misses. Using a trace-driven analysis of all user and OS memory references in a cache-coherent distributed shared-memory multiprocessor, we show that SORDS-based memory streaming can eliminate between 36% and 100% of all coherent read misses in scientific workloads and between 23% and 48% in online transaction processing workloads.
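The core observation, that consumption order tracks production order, suggests a simple mechanism: log the order in which a producer writes shared blocks, and when a consumer misses on a logged block, stream the blocks that followed it in that store order. The class below is a toy software sketch of that idea with invented names and a fixed stream depth; the actual SORDS design operates in coherence hardware.

```python
class SordsDirectory:
    """Toy sketch of store-ordered streaming (SORDS).

    Logs the order in which a producer writes shared blocks; on a
    consumer miss to a logged block, the next few blocks in that
    store order are streamed (prefetched) to the consumer, turning
    would-be coherent read misses into hits.
    """
    def __init__(self, stream_depth=4):
        self.log = []            # producer's store order of shared blocks
        self.depth = stream_depth

    def record_store(self, block):
        self.log.append(block)

    def on_consumer_miss(self, block):
        """Return the blocks to stream ahead after a miss on `block`."""
        try:
            i = self.log.index(block)
        except ValueError:
            return []            # block was never logged: nothing to stream
        return self.log[i + 1 : i + 1 + self.depth]

d = SordsDirectory()
for b in [10, 11, 17, 12, 13, 99]:   # producer's store order
    d.record_store(b)
print(d.on_consumer_miss(11))        # [17, 12, 13, 99] streamed ahead
```

The payoff depends on how faithfully consumption follows production, which is why the abstract reports a wide range, from 36-100% of coherent read misses eliminated in scientific codes down to 23-48% in OLTP.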


IEEE Communications Magazine | 2014

Context-aware networking and communications: Part 1 [Guest Editorial]

Jinsong Wu; Igor Bisio; Chris Gniady; Ekram Hossain; Massimo Valla; Haibo Li

Context refers to information characterizing the situation of an entity or a group of entities, and it provides information about the present status of the entities. The term context may be understood differently in different scenarios and for different involved users. The involved entities can be either concrete entities or virtual entities. Involved concrete entities could be either a single entity, such as a person, a machine device, an object, or a location, or a group of entities. An involved virtual entity could be a software function, a software application, a service, an activity, and so on. Conventionally, much of the functionality of communications and networking is context-irrelevant. With the rapid development of modern communications and networking technologies in recent years, especially the increasing functionalities and complexities of the Internet, context-aware communications and networking (CACN) systems and applications have been developed in some limited areas and aspects. In the foreseeable future, context-aware functionalities would be much more extensively applied in information and communication technologies. CACN could be performed at all layers of communications and networking, from the physical and networking layers to transport and application layers. Context awareness may be considered as a response mechanism to the context information obtained from the involved concrete or virtual entities. Context information may have many different meanings, such as activities, geospatial information, network states, battery levels, situations of social networks, energy consumptions, environmental parameters, and signal-to-noise ratios. Context awareness allows for customization or creation of applications to match the preferences of the involved entities.

Collaboration


Dive into Chris Gniady's collaborations.

Top Co-Authors

Igor Crk
University of Arizona

Babak Falsafi
École Polytechnique Fédérale de Lausanne

Lei Ye
University of Arizona