Aditya Yanamandra | Researchain

Publication

Featured research published by Aditya Yanamandra.

Great Lakes Symposium on VLSI | 2008

A low-power phase change memory based hybrid cache architecture

Prasanth Mangalagiri; K. Sarpatwari; Aditya Yanamandra; Vijaykrishnan Narayanan; Yuan Xie; Mary Jane Irwin; Osama Awadel Karim

Sub-threshold leakage in SRAM-based cache memories is becoming a predominant source of power consumption in deep-submicron CMOS designs. Phase Change Random Access Memory (PRAM), a high-density, fast-access, non-volatile memory, is being considered as a candidate for future universal memory technologies. In this paper, we investigate the architectural challenges in integrating a PRAM-based memory into the conventional cache hierarchy. First, we develop PRAM cache delay and energy models. We then propose a hybrid PRAM architecture for L1 instruction caches on embedded processors. We also propose a PRAM-based unified cache architecture for L2 caches on high-end microprocessors. Finally, we evaluate the proposed architectures in terms of area, performance, and energy. The experimental results show that the PRAM-based cache architectures achieve close to 80% reduction in the leakage energy consumption of an L1-L2 cache hierarchy.

IEEE Transactions on Dependable and Secure Computing | 2010

On the Effects of Process Variation in Network-on-Chip Architectures

Chrysostomos Nicopoulos; Suresh Srinivasan; Aditya Yanamandra; Dongkook Park; Vijaykrishnan Narayanan; Chita R. Das; Mary Jane Irwin

The advent of diminutive technology feature sizes has led to escalating transistor densities. Burgeoning transistor counts are casting a dark shadow on modern chip design: global interconnect delays are dominating gate delays and affecting overall system performance. Networks-on-Chip (NoC) are viewed as a viable solution to this problem because of their scalability and optimized electrical properties. However, on-chip routers are susceptible to another artifact of deep submicron technology, Process Variation (PV). PV is a consequence of manufacturing imperfections, which may lead to degraded performance and even erroneous behavior. In this work, we present the first comprehensive evaluation of NoC susceptibility to PV effects, and we propose an array of architectural improvements in the form of a new router design, called SturdiSwitch, to increase resiliency to these effects. Through extensive reengineering of critical components, SturdiSwitch provides increased immunity to PV while improving performance and increasing area and power efficiency.

Journal of Parallel and Distributed Computing | 2011

RAFT: A router architecture with frequency tuning for on-chip networks

Asit K. Mishra; Aditya Yanamandra; Reetuparna Das; Soumya Eachempati; Ravi R. Iyer; Narayanan Vijaykrishnan; Chita R. Das

With an increasing number of cores being integrated on a single die, Networks-on-Chip (NoCs) have become the de facto standard in providing scalable communication backbones for these multi-core chips. NoCs have a significant impact on the system's performance, power, and reliability. However, NoCs can be plagued by higher power consumption and degraded throughput if the network and router are not designed properly. To this end, this paper proposes a novel router architecture, where we tune the frequency of a router in response to network load to manage both performance and power. We propose three dynamic frequency tuning techniques, FreqBoost, FreqThrtl and FreqTune, targeted at congestion and power management in NoCs. We also propose and evaluate a novel fine-grained frequency tuning scheme where we vary the number of virtual channels in a router dynamically. As a further optimization to these schemes, we propose a frequency tuning scheme where we tune the frequency of the four ports of a mesh router separately from the local port. As enablers for these techniques, we exploit Dynamic Voltage and Frequency Scaling (DVFS) and the imbalance in a generic router pipeline through time stealing. We also evaluate and analyze the proposed schemes from the point of view of reliability against soft error vulnerability and provide guidelines for choosing the appropriate scheme when reliability is the prime design constraint. Experiments using synthetic workloads on an 8 x 8 wormhole-switched mesh interconnect show that FreqBoost is a better choice for reducing average latency (maximum 40%), while FreqThrtl provides the maximum benefits in terms of power saving and energy delay product (EDP). The FreqTune scheme is a better candidate for optimizing both performance and power, achieving on average a 36% reduction in latency, 13% savings in power (up to 24% at high load), and 40% savings (up to 70% at high load) in EDP. With application benchmarks, we observe IPC improvement of up to 23% using our design. Our analysis shows FreqBoost to be the most robust of the three schemes when reliability is a concern.
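
The abstract above describes tuning a router's clock frequency in response to network load. The following is a minimal sketch of that general idea only, assuming buffer occupancy is used as the load signal. The class name, the two thresholds, and the three frequency levels are hypothetical, and the code does not reproduce the paper's FreqBoost, FreqThrtl, or FreqTune policies.

```python
from dataclasses import dataclass


@dataclass
class RouterFrequencyController:
    """Selects a router clock level from buffer occupancy (hypothetical policy)."""

    low_threshold: float = 0.25   # assumed: at or below this load, slow down to save power
    high_threshold: float = 0.75  # assumed: at or above this load, speed up to drain congestion
    slow_ghz: float = 0.75        # assumed frequency levels, not taken from the paper
    base_ghz: float = 1.0
    fast_ghz: float = 1.25

    def select(self, occupied_slots: int, total_slots: int) -> float:
        """Return the clock frequency (GHz) for the current buffer occupancy."""
        if total_slots <= 0:
            raise ValueError("total_slots must be positive")
        load = occupied_slots / total_slots
        if load >= self.high_threshold:
            return self.fast_ghz
        if load <= self.low_threshold:
            return self.slow_ghz
        return self.base_ghz


controller = RouterFrequencyController()
congested = controller.select(occupied_slots=15, total_slots=16)
idle = controller.select(occupied_slots=1, total_slots=16)
moderate = controller.select(occupied_slots=8, total_slots=16)
```

A threshold-based controller like this one trades simplicity for responsiveness; a real design would also need hysteresis and a model of the voltage and frequency transition cost, neither of which is covered here.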

Asia and South Pacific Design Automation Conference | 2010

Optimizing power and performance for reliable on-chip networks

Aditya Yanamandra; Soumya Eachempati; Niranjan Soundararajan; Vijaykrishnan Narayanan; Mary Jane Irwin; Ramakrishnan Krishnan

We propose novel techniques to minimize the power and performance penalties in protecting the NoC against soft errors, while giving desired reliability guarantees. Some applications have inherent error tolerance which can be exploited to save power, by turning off the error correction mechanisms for a fraction of the total time without trading off reliability. To further increase the power savings, we bound the vulnerability of a router by throttling the traffic into the router. In order to minimize the throughput loss due to throttling, we propose dividing the die into domains and using multiple vulnerability bounds across these domains. We explore both static and dynamic selection of vulnerability bounds. We find that for applications with an error tolerance of 10% of the raw error rate, the dynamic multiple vulnerability bound scheme can save up to 44% of power expended for error correction at a marginal network throughput loss of 3%.

Asilomar Conference on Signals, Systems and Computers | 2007

Variation-Aware Low-Power Buffer Design

Chrysostomos Nicopoulos; Aditya Yanamandra; Suresh Srinivasan; Narayanan Vijaykrishnan; Mary Jane Irwin

Process variation (PV) is a consequence of manufacturing imperfections, which may lead to degraded performance or higher leakage power. In this paper, we focus on the design of an intelligent buffer that logically reorders the entries in a FIFO buffer to minimize overall leakage power consumption. The buffer architecture, called IntelliBuffer, has been designed and evaluated in 90 nm and 32 nm CMOS technology. Our synthesized results show that our proposed design is as fast as a conventional buffer structure, while providing the ability to reduce power consumption significantly. When our buffer was used in a network-on-chip (NoC) implementation, we obtained 24% leakage savings at 90 nm, and savings of 28% at 32 nm. To further validate the efficacy of our proposed design, we incorporated IntelliBuffer into ViChaR, a recently introduced dynamic buffer management system for NoC routers. Experimental results indicate a marked reduction in ViChaR's leakage power consumption (21% at 90 nm) when IntelliBuffer is employed.
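
The abstract above describes a buffer that logically reorders FIFO entries to reduce leakage. As a rough illustration of what logical reordering can mean, the sketch below keeps FIFO order in a separate queue of slot indices and always allocates the free physical slot with the lowest leakage. It assumes that unused slots can be held in a low-leakage state, so that preferring low-leakage slots lowers total leakage. That assumption, the class name, and the per-slot leakage values are hypothetical and are not taken from the IntelliBuffer design.

```python
from collections import deque


class LeakageAwareFifo:
    """FIFO whose physical slot choice prefers low-leakage slots (illustrative only)."""

    def __init__(self, slot_leakage):
        self.slot_leakage = list(slot_leakage)      # assumed per-slot leakage estimates
        self.data = [None] * len(self.slot_leakage)
        self.free = set(range(len(self.slot_leakage)))
        self.order = deque()                        # physical slot indices in arrival order

    def push(self, item):
        """Store item in the lowest-leakage free slot and return that slot index."""
        if not self.free:
            raise OverflowError("buffer full")
        slot = min(self.free, key=lambda s: (self.slot_leakage[s], s))
        self.free.remove(slot)
        self.data[slot] = item
        self.order.append(slot)
        return slot

    def pop(self):
        """Remove and return the oldest item, regardless of which slot holds it."""
        if not self.order:
            raise IndexError("buffer empty")
        slot = self.order.popleft()
        item = self.data[slot]
        self.data[slot] = None
        self.free.add(slot)
        return item


buf = LeakageAwareFifo([3.0, 1.0, 2.0])
first_slot = buf.push("a")    # slot 1 has the lowest leakage
second_slot = buf.push("b")   # slot 2 is the next lowest
oldest = buf.pop()            # FIFO order is preserved
reused_slot = buf.push("c")   # slot 1 is free again and is reused
```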

Dependable Systems and Networks | 2008

Analysis and solutions to issue queue process variation

Niranjan Soundararajan; Aditya Yanamandra; Chrysostomos Nicopoulos; Narayanan Vijaykrishnan; Anand Sivasubramaniam; Mary Jane Irwin

The last few years have witnessed an unprecedented explosion in transistor densities. Diminutive feature sizes have enabled microprocessor designers to break the billion-transistors-per-chip mark. However, various new reliability challenges such as process variation (PV) have emerged that can no longer be ignored by chip designers. In this paper, we provide a comprehensive analysis of the effects of PV on the microprocessor's issue queue. Variations can slow down issue queue entries and result in as much as 20.5% performance degradation. To counter this, we look at different solutions that include instruction steering and operand- and port-switching mechanisms. Given that PV is non-deterministic at design time, our mechanisms allow the fast and slow issue-queue entries to co-exist, in turn enabling instruction dispatch, issue and forwarding to proceed with minimal stalls. Evaluation on a detailed simulation environment indicates that the proposed mechanisms can reduce performance degradation due to PV to a low 1.3%.

International Parallel and Distributed Processing Symposium | 2008

Evaluating the role of scratchpad memories in chip multiprocessors for sparse matrix computations

Aditya Yanamandra; Bryan Cover; Padma Raghavan; Mary Jane Irwin; Mahmut T. Kandemir

Scratchpad memories (SPMs) have been shown to be more energy efficient and have faster access times than traditional hardware-managed caches. This, coupled with the predictability of data presence, makes SPMs an attractive alternative to cache for many scientific applications. In this work, we consider an SPM based system for increasing the performance and the energy efficiency of sparse matrix-vector multiplication on a chip multi-processor. We ensure the efficient utilization of the SPM by profiling the application for the data structures which do not perform well in traditional cache. We evaluate the impact of using an SPM at all levels of the on-chip memory hierarchy. Our experimental results show an average increase in performance by 13.5-15% and an average decrease in the energy consumption by 28-33% on an 8-core system depending on which level of the hierarchy the SPM is utilized.

High Performance Embedded Architectures and Compilers | 2008

In-Network Caching for Chip Multiprocessors

Aditya Yanamandra; Mary Jane Irwin; Vijaykrishnan Narayanan; Mahmut T. Kandemir; Sri Hari Krishna Narayanan

Effective management of data is critical to the performance of emerging multi-core architectures. Our analysis of applications from SpecOMP reveals that a small fraction of shared addresses corresponds to a large portion of accesses. Utilizing this observation, we propose a technique that augments a router in an on-chip network with a small data store to reduce the memory access latency of the shared data. In the proposed technique, shared data from read response packets that pass through the router are cached in its data store to reduce the number of hops required to service future read requests. Our limit study reveals that such caching has the potential to reduce memory access latency by 27% on average. Further, two practical caching strategies are shown to reduce memory access latency by 14% and 17% respectively with a data store of just four entries at 2.5% area overhead.
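
The mechanism described in the abstract above, a small per-router data store filled from passing read responses and consulted on later read requests, can be sketched as follows. This is an illustration under stated assumptions: the class and method names are hypothetical, least-recently-used replacement is an assumed policy (the abstract does not name the two practical strategies it evaluates), and the `invalidate` hook stands in for whatever coherence handling a real design would need.

```python
from collections import OrderedDict


class RouterDataStore:
    """Tiny per-router cache for shared read data (illustrative, LRU assumed)."""

    def __init__(self, capacity=4):
        self.capacity = capacity          # the abstract reports results with four entries
        self.entries = OrderedDict()      # address -> data, oldest entry first

    def on_read_response(self, address, data):
        """Cache data carried by a read response passing through this router."""
        if address in self.entries:
            self.entries.move_to_end(address)
        self.entries[address] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry

    def on_read_request(self, address):
        """Return cached data on a hit, or None so the request keeps travelling."""
        if address in self.entries:
            self.entries.move_to_end(address)
            return self.entries[address]
        return None

    def invalidate(self, address):
        """Drop an entry, for example when the address is written (assumed hook)."""
        self.entries.pop(address, None)


store = RouterDataStore(capacity=2)
store.on_read_response(0x10, "a")
store.on_read_response(0x20, "b")
hit = store.on_read_request(0x10)      # hit; 0x10 becomes most recently used
store.on_read_response(0x30, "c")      # evicts 0x20, the least recently used
miss = store.on_read_request(0x20)
```

A hit at an intermediate router lets the reply start from that router instead of the home node, which is where the hop savings described in the abstract would come from.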

International Symposium on Nanoscale Architectures | 2009

Power and area reduction using carbon nanotube bundle interconnect in global clock tree distribution network

Yuan Xie; Soumya Eachempati; Aditya Yanamandra; Vijaykrishnan Narayanan; Mary Jane Irwin

The gigahertz frequency regime, together with the rising delay of on-chip interconnect and increased device densities, has aggravated the clock skew problem. Skew and power dissipation of clock distribution networks are key factors in determining the maximum attainable clock frequency as well as the chip power consumption. Traditional skew balancing schemes incur the additional cost of increased area and power. In this paper, we propose a novel skew reduction mechanism using dissimilar interconnect materials for balancing the non-uniform loads in a clock network. Single-walled carbon nanotube (SWCNT) bundles have been shown to have high electrical conductivity for future process technology nodes. We design an H-tree clock network made up of both SWCNT bundles and copper interconnect at the 22 nm technology node. Our experiments show that such a network saves an average of 65% in area and 22% in power over a pure copper distribution network.

Archive | 2010