Ann Gordon-Ross | Researchain

Publication

Featured research published by Ann Gordon-Ross.

Design, Automation, and Test in Europe | 2004

Automatic tuning of two-level caches to embedded applications

Ann Gordon-Ross; Frank Vahid; Nikil D. Dutt

The power consumed by the memory hierarchy of a microprocessor can account for as much as 50% of the total microprocessor system power, and is thus a good candidate for optimization. We present an automated method for tuning two-level caches to embedded applications for reduced energy consumption. The method is applicable to both a simulation-based exploration environment and a hardware-based system prototyping environment. We introduce the two-level cache tuner, or TCaT - a heuristic for searching the huge solution space of possible configurations. The heuristic interlaces the exploration of the two cache levels and searches the various cache parameters in a specific order based on their impact on energy. We show the integrity of our heuristic across multiple memory configurations and even in the presence of hardware/software partitioning - a common optimization capable of achieving significant speedups and/or reduced energy consumption. We apply our exploration heuristic to a large set of embedded applications. Our experiments demonstrate the efficacy of our heuristic: on average the heuristic examines only 7% of the possible cache configurations, but results in cache sub-system energy savings of 53%, only 1% more than the optimal cache configuration. In addition, the configured cache achieves an average speedup of 30% over the base cache configuration due to tuning of cache line size to the application's needs.
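The search idea in this abstract (tune one parameter at a time, alternating between the two cache levels) can be illustrated with a minimal Python sketch. This is not the published TCaT algorithm: the parameter names, value ranges, the exploration order in `ORDER`, and the `toy_energy` function are all hypothetical stand-ins for what a simulator or hardware prototype would supply.

```python
from itertools import product
from math import log2

# Hypothetical parameter ranges; the abstract does not list the real ones.
RANGES = {
    "l1_size": [2048, 4096, 8192],
    "l2_size": [16384, 32768, 65536],
    "l1_line": [16, 32, 64],
    "l2_line": [16, 32, 64],
    "l1_ways": [1, 2, 4],
    "l2_ways": [1, 2, 4],
}

# Assumed order: highest-impact parameter first, alternating L1 and L2.
ORDER = ["l1_size", "l2_size", "l1_line", "l2_line", "l1_ways", "l2_ways"]

# Made-up energy model: distance from an arbitrary "ideal" configuration.
TARGET = {"l1_size": 4096, "l2_size": 32768, "l1_line": 32,
          "l2_line": 64, "l1_ways": 2, "l2_ways": 4}
WEIGHT = {"l1_size": 4, "l2_size": 4, "l1_line": 2,
          "l2_line": 2, "l1_ways": 1, "l2_ways": 1}


def toy_energy(cfg):
    """Stand-in for a measured or simulated energy value (not real data)."""
    return sum(WEIGHT[k] * abs(log2(cfg[k]) - log2(TARGET[k])) for k in TARGET)


def tune(order, ranges, start, energy):
    """Greedy search: sweep one parameter at a time, keeping the best value."""
    cfg = dict(start)
    best = energy(cfg)
    seen = {tuple(sorted(cfg.items()))}  # distinct configurations examined
    for name in order:
        for value in ranges[name]:
            trial = {**cfg, name: value}
            seen.add(tuple(sorted(trial.items())))
            e = energy(trial)
            if e < best:
                best, cfg = e, trial
    return cfg, best, len(seen)


START = {name: values[0] for name, values in RANGES.items()}
tuned, tuned_energy, examined = tune(ORDER, RANGES, START, toy_energy)
total = len(list(product(*RANGES.values())))
print(f"examined {examined} of {total} configurations")
```

Because the toy energy model treats each parameter independently, the greedy sweep reaches the same minimum as an exhaustive search here; a real cache has interacting parameters, which is why the abstract reports a small gap from the optimal configuration.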

IEEE Computer Architecture Letters | 2002

Exploiting Fixed Programs in Embedded Systems: A Loop Cache Example

Ann Gordon-Ross; Susan Cotterell; Frank Vahid

Embedded systems commonly execute one program for their lifetime. Designing embedded system architectures with configurable components, such that those components can be tuned to that one program based on a program pre-analysis, can yield significant power and performance benefits. We illustrate such benefits by designing a loop cache specifically with tuning in mind. Our results show a 70% reduction in instruction memory access, for MIPS and 8051 processors – representing twice the reduction from a regular loop cache, translating to good power savings.

Design Automation Conference | 2007

A self-tuning configurable cache

Ann Gordon-Ross; Frank Vahid

The memory hierarchy of a system can consume up to 50% of microprocessor system power. Previous work has shown that tuning a configurable cache to a particular application can reduce memory subsystem energy by 62% on average. We introduce a self-tuning cache that performs transparent runtime cache tuning, thus relieving the application designer and/or compiler from predetermining an application's cache configuration. The self-tuning cache applies tuning at a determined tuning interval. A good interval balances tuning process energy overhead against the energy overhead of running in a sub-optimal cache configuration, which we show wastes much energy. We present a self-tuning cache that dynamically varies the tuning interval, resulting in average energy reduction of as much as 29%, falling within 13% of an oracle-based optimal method.

ACM Transactions on Reconfigurable Technology and Systems | 2012

Reconfigurable Fault Tolerance: A Comprehensive Framework for Reliable and Adaptive FPGA-Based Space Computing

Adam Jacobs; Grzegorz Cieslewski; Alan D. George; Ann Gordon-Ross; Herman Lam

Commercial SRAM-based, field-programmable gate arrays (FPGAs) have the potential to provide space applications with the necessary performance to meet next-generation mission requirements. However, mitigating an FPGA’s susceptibility to single-event upset (SEU) radiation is challenging. Triple-modular redundancy (TMR) techniques are traditionally used to mitigate radiation effects, but TMR incurs substantial overheads such as increased area and power requirements. In order to reduce these overheads while still providing sufficient radiation mitigation, we propose a reconfigurable fault tolerance (RFT) framework that enables system designers to dynamically adjust a system’s level of redundancy and fault mitigation based on the varying radiation incurred at different orbital positions. This framework includes an adaptive hardware architecture that leverages FPGA reconfigurable techniques to enable significant processing to be performed efficiently and reliably when environmental factors permit. To accurately estimate upset rates, we propose an upset rate modeling tool that captures time-varying radiation effects for arbitrary satellite orbits using a collection of existing, publicly available tools and models. We perform fault-injection testing on a prototype RFT platform to validate the RFT architecture and RFT performability models. We combine our RFT hardware architecture and the modeled upset rates using phased-mission Markov modeling to estimate performability gains achievable using our framework for two case-study orbits.

Local Computer Networks | 2008

Real-time performance analysis of Adaptive Link Rate

Baoke Zhang; Karthikeyan Sabhanatarajan; Ann Gordon-Ross; Alan D. George

High-speed links are widely deployed in modern-day computer networks to meet the ever-growing need for data bandwidth. However, with the increase in the link rate, the power consumption of the network interfaces increases exponentially, compounding growing concerns about network power consumption. Fortunately, network traffic characteristics show that rapid link rates are not always required. During times of reduced network traffic, the adaptive link rate (ALR) mechanism allows link rates to be reduced with little impact on network performance. Current research has focused on policies to control when and how to change link rates, and has shown promising energy savings. However, these works have been largely simulative, and have not addressed many of the challenges involved in implementation. In this paper, we develop a hardware prototype ALR system and address real-time challenges involved in realizing such an implementation. We also identify new considerations for control policy development given current technology capabilities as well as future projections.

Field-Programmable Custom Computing Machines | 2009

Exploiting Partially Reconfigurable FPGAs for Situation-Based Reconfiguration in Wireless Sensor Networks

Rafael Garcia; Ann Gordon-Ross; Alan D. George

Wireless sensor networks (WSNs) are typically composed of very small, battery-operated devices (sensor nodes) containing simple microprocessors with few computational resources. However, the rapidly increasing popularity of WSNs has placed increased computational demands upon these systems, due to increasingly complex operating environments and enhanced data-sensing technology. Whereas introducing more powerful microprocessors into sensor nodes addresses these demands, sensor nodes do not contain sufficient energy reserves to support these microprocessors. In this paper, we present a partially reconfigurable FPGA-based architecture and methodology to provide increased WSN flexibility and computational resources, resulting in superior power consumption and performance compared to a microprocessor capable of satisfying similar demands.

IEEE Transactions on Mobile Computing | 2010

SIP-Based IMS Signaling Analysis for WiMax-3G Interworking Architectures

Arslan Munir; Ann Gordon-Ross

The third-generation partnership project (3GPP) and 3GPP2 have standardized the IP multimedia subsystem (IMS) to provide ubiquitous and access network-independent IP-based services for next-generation networks via merging cellular networks and the Internet. The application layer Session Initiation Protocol (SIP), standardized by 3GPP and 3GPP2 for IMS, is responsible for IMS session establishment, management, and transformation. The IEEE 802.16 worldwide interoperability for microwave access (WiMax) promises to provide high data rate broadband wireless access services. In this paper, we propose two novel interworking architectures to integrate WiMax and third-generation (3G) networks. Moreover, we analyze the SIP-based IMS registration and session setup signaling delay for 3G and WiMax networks with specific reference to their interworking architectures. Finally, we explore the effects of different WiMax-3G interworking architectures on the IMS registration and session setup signaling delay.

Great Lakes Symposium on VLSI | 2008

Phase-based cache reconfiguration for a highly-configurable two-level cache hierarchy

Ann Gordon-Ross; Jeremy Lau; Brad Calder

Phase-based tuning methodologies specialize system parameters for each application phase of execution. Parameters are varied during execution, as opposed to remaining fixed as in an application-based tuning methodology. Prior work and logic suggest phase-based tuning may provide significant savings over application-based tuning. We investigate this hypothesis using a detailed cache model, tuning a highly-configurable cache on a per-phase basis and comparing it to tuning once per application, and find phase-based tuning to yield improvements of up to 37% in performance and 20% in energy over application-based tuning. Furthermore, we extend previous phase-based tuning of a configurable cache by significantly increasing configurability and show 14% energy improvement compared to previous methods. In addition, we quantify the overhead imposed due to cache reconfiguration.

IEEE Computer Society Annual Symposium on VLSI | 2008

Smart-NICs: Power Proxying for Reduced Power Consumption in Network Edge Devices

Karthikeyan Sabhanatarajan; Ann Gordon-Ross; Mark Oden; Mukund Navada; Alan D. George

The number of edge devices connected to the Internet is increasing at a rapid rate. To maintain network connectivity, the majority of these devices remain completely powered on when idle, wasting energy unnecessarily. A novel idea to conserve energy while maintaining network connectivity is to place the computer in standby mode during idle periods and delegate the packet-handling functions to its network interface card (NIC). The NIC, acting as a liaison for the host, can proxy a variety of network protocols, increasing the standby time of the host without compromising its active connections. In this paper, we analyze the requirements of such a packet classifier and design a low-power hardware-based packet classification technique, which, compared to a software-based packet classification technique, consumes 59% less energy with a 9x speedup.

Design, Automation, and Test in Europe | 2009

Bitstream relocation with local clock domains for partially reconfigurable FPGAs

Adam Flynn; Ann Gordon-Ross; Alan D. George

Partial Reconfiguration (PR) of FPGAs presents many opportunities for application design flexibility, enabling tasks to dynamically swap in and out of the FPGA without entire system interruption. However, mapping a task to any available PR region (PRR) requires a unique partial bitstream for each PRR. This replication can introduce significant overheads in terms of bitstream storage and communication requirements. Previous research in partial bitstream relocation can alleviate these overheads by transforming a single partial bitstream to map to any available PRR. However, careful steps are necessary to ensure proper functionality of relocated partial bitstreams and may result in clock routing inefficiencies. These routing inefficiencies can be alleviated by using regional clock resources introduced in the Virtex-4 FPGAs to implement local clock domains. PRRs can internally drive local clock domains, enabling each PRR to vary its clock frequency with respect to a single global clock signal, as opposed to sending multiple global clock signals (one for each desired clock frequency) to each PRR. We introduce this novel local clock domain (LCD) concept, which provides enhanced PR design flexibility. However, integration of LCDs and partial bitstream relocation introduces new challenges. In this paper, we identify motivating application domains for this integration, analyze integration benefits, and provide a detailed integration methodology.
