Soumya Eachempati | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Soumya Eachempati is active.

Explore More

Publication

Featured researches published by Soumya Eachempati.

high-performance computer architecture | 2009

Design and evaluation of a hierarchical on-chip interconnect for next-generation CMPs

Reetuparna Das; Soumya Eachempati; Asit K. Mishra; Vijaykrishnan Narayanan; Chita R. Das

Performance and power consumption of an on-chip interconnect that forms the backbone of Chip Multiprocessors (CMPs), are directly influenced by the underlying network topology. Both these parameters can also be optimized by application induced communication locality since applications mapped on a large CMP system will benefit from clustered communication, where data is placed in cache banks closer to the cores accessing it. Thus, in this paper, we design a hierarchical network topology that takes advantage of such communication locality. The two-tier hierarchical topology consists of local networks that are connected via a global network. The local network is a simple, high-bandwidth, low-power shared bus fabric, and the global network is a low-radix mesh. The key insight that enables the hybrid topology is that most communication in CMP applications can be limited to the local network, and thus, using a fast, low-power bus to handle local communication will improve both packet latency and power-efficiency. The proposed hierarchical topology provides up to 63% reduction in energy-delay-product over mesh, 47% over flattened butterfly, and 33% with respect to concentrated mesh across network sizes with uniform and non-uniform synthetic traffic. For real parallel workloads, the hybrid topology provides up to 14% improvement in system performance (IPC) and in terms of energy-delay-product, improvements of 70%, 22%, 30% over the mesh, flattened butterfly, and concentrated mesh, respectively, for a 32-way CMP. Although the hybrid topology scales in a power- and bandwidth-efficient manner with network size, while keeping the average packet latency low in comparison to high radix topologies, it has lower throughput due to high concentration. To improve the throughput of the hybrid topology, we propose a novel router micro-architecture, called XShare, which exploits data value locality and bimodal traffic characteristics of CMP applications to transfer multiple small flits over a single channel. This helps in enhancing the network throughput by 35%, providing a latency reduction of 14% with synthetic traffic, and improving IPC on an average 4% with application workloads.

international symposium on microarchitecture | 2009

A case for dynamic frequency tuning in on-chip networks

Asit K. Mishra; Reetuparna Das; Soumya Eachempati; Ravishankar R. Iyer; Narayanan Vijaykrishnan; Chita R. Das

Performance and power are the first order design metrics for network-on-chips (NoCs) that have become the de-facto standard in providing scalable communication backbones for multicores/CMPs. However, NoCs can be plagued by higher power consumption and degraded throughput if the network and router are not designed properly. Towards this end, this paper proposes a novel router architecture, where we tune the frequency of a router in response to network load to manage both performance and power. We propose three dynamic frequency tuning techniques, FreqBoost, FreqThrtl and FreqTune, targeted at congestion and power management in NoCs. As enablers for these techniques, we exploit Dynamic Voltage and Frequency Scaling (DVFS) and the imbalance in a generic router pipeline through time stealing. Experiments using synthetic workloads on a 8x8 wormhole-switched mesh interconnect show that FreqBoost is a better choice for reducing average latency (maximum 40%) while, FreqThrtl provides the maximum benefits in terms of power saving and energy delay product (EDP). The FreqTune scheme is a better candidate for optimizing both performance and power, achieving on an average 36% reduction in latency, 13% savings in power (up to 24% at high load), and 40% savings (up to 70% at high load) in EDP. With application benchmarks, we observe IPC improvement up to 23% using our design. The performance and power benefits also scale for larger NoCs.

design, automation, and test in europe | 2007

Assessing carbon nanotube bundle interconnect for future FPGA architectures

Soumya Eachempati; Arthur Nieuwoudt; Aman Gayasen; Narayanan Vijaykrishnan; Yehia Massoud

Field programmable gate arrays (FPGAs) are important hardware platforms in various applications due to increasing design complexity and mask costs. However, as CMOS process technology continues to scale, standard copper interconnect becomes a major bottleneck for FPGA performance. This paper proposed utilizing bundles of single-walled carbon nanotubes (SWCNT) as wires in the FPGA interconnect fabric and compare their performance to standard copper interconnect in future process technologies. To leverage the performance advantages of nanotube-based interconnect, several important aspects of the FPGA routing architecture were explored including the segmentation distribution and the internal population of the wires. The results demonstrate that FPGAs utilizing SWCNT bundle interconnect can achieve a 19% improvement in average area delay product over the best performing architecture for standard copper interconnect in 22 nm process technology

Journal of Parallel and Distributed Computing | 2011

RAFT: A router architecture with frequency tuning for on-chip networks

Asit K. Mishra; Aditya Yanamandra; Reetuparna Das; Soumya Eachempati; Ravi R. Iyer; Narayanan Vijaykrishnan; Chita R. Das

With increasing number of cores being integrated on a single die, Network-on-Chips (NoCs) have become the de-facto standard in providing scalable communication backbones for these multi-core chips. NoCs have a significant impact on the systems performance, power and reliability. However, NoCs can be plagued by higher power consumption and degraded throughput if the network and router are not designed properly. Towards this end, this paper proposes a novel router architecture, where we tune the frequency of a router in response to network load to manage both performance and power. We propose three dynamic frequency tuning techniques, FreqBoost, FreqThrtl and FreqTune, targeted at congestion and power management in NoCs. We also propose and evaluate a novel fine-grained frequency tuning scheme where we vary the number of virtual-channels in a router dynamically. As a further optimization to these schemes, we propose a frequency tuning scheme where we tune the frequency of the four ports of a mesh router separately from the local port. As enablers for these techniques, we exploit Dynamic Voltage and Frequency Scaling (DVFS) and the imbalance in a generic router pipeline through time stealing. We also evaluate and analyze the proposed schemes from the point of view of reliability against soft error vulnerability and provide guidelines in choosing the appropriate scheme when reliability is the prime design constraint. Experiments using synthetic workloads on an 8 x 8 wormhole-switched mesh interconnect show that FreqBoost is a better choice for reducing average latency (maximum 40%) while, FreqThrtl provides the maximum benefits in terms of power saving and energy delay product (EDP). The FreqTune scheme is a better candidate for optimizing both performance and power, achieving on an average 36% reduction in latency, 13% savings in power (up to 24% at high load), and 40% savings (up to 70% at high load) in EDP. With application benchmarks, we observe IPC improvement up to 23% using our design. Our analysis shows FreqBoost to be the most robust scheme amongst the three schemes when reliability is a concern.

international symposium on nanoscale architectures | 2008

Reconfigurable BDD based quantum circuits

Soumya Eachempati; Vinay Saripalli; Narayanan Vijaykrishnan; Suman Datta

We propose a novel binary decision diagram (BDD) based reconfigurable logic architecture based on split-gate quantum nanodots using III-V compound semiconductor-based quantum wells. While BDD based quantum devices architectures have already been demonstrated to be attractive for achieving ultra-low power operation, our design provides the ability to reconfigure the functionality of the logic architecture. This work proposes device and architectural innovations to support such reconfiguration. At the device level, a unique programmability feature is incorporated in our proposed nanodot devices which can operate in 3 distinct operation modes: a) active b) open and c) short mode based on the split gate bias voltages and enable functional reconfiguration. At the architectural level, we address programmability and design fabric issues involved with mapping BDDpsilas into a reconfigurable architecture. By mapping a set of logic circuits, we demonstrate that our underlying device and architectural structure is flexible to support different functions.

design automation conference | 2011

Automated mapping for reconfigurable single-electron transistor arrays

Yung-Chih Chen; Soumya Eachempati; Chun-Yao Wang; Suman Datta; Yuan Xie; Vijaykrishnan Narayanan

Reducing power consumption has become one of the primary challenges in chip design, and therefore significant efforts are being devoted to find holistic solutions on power reduction from the device level up to the system level. Among a plethora of low power devices that are being explored, single-electron transistors (SETs) at room temperature are particularly attractive. Although prior work has proposed a binary decision diagram-based reconfigurable logic architecture using SETs, it lacks an automated synthesis tool for the device. Consequently, in this work, we develop a product-term-based approach that synthesizes a logic circuit by mapping all its product terms into the SET architecture. The experimental results show the effectiveness and efficiency of the proposed approach on a set of MCNC benchmarks.

ACM Journal on Emerging Technologies in Computing Systems | 2013

A Synthesis Algorithm for Reconfigurable Single-Electron Transistor Arrays

Yung-Chih Chen; Soumya Eachempati; Chun-Yao Wang; Suman Datta; Yuan Xie; Vijaykrishnan Narayanan

Reducing power consumption has become one of the primary challenges in chip design, and therefore significant efforts are being devoted to find holistic solutions on power reduction from the device level up to the system level. Among a plethora of low power devices that are being explored, single-electron transistors (SETs) at room temperature are particularly attractive. Although prior work has proposed a binary decision diagram-based reconfigurable logic architecture using SETs, it lacks an automatic synthesis algorithm for the architecture. Consequently, in this work, we develop a product-term-based approach that synthesizes a logic circuit by mapping all its product terms into the SET architecture. The experimental results show the effectiveness and efficiency of the proposed approach on a set of MCNC benchmarks.

asia and south pacific design automation conference | 2010

Optimizing power and performance for reliable on-chip networks

Aditya Yanamandra; Soumya Eachempati; Niranjan Soundararajan; Vijaykrishnan Narayanan; Mary Jane Irwin; Ramakrishnan Krishnan

We propose novel techniques to minimize the power and performance penalties in protecting the NoC against soft errors, while giving desired reliability guarantees. Some applications have inherent error tolerance which can be exploited to save power, by turning off the error correction mechanisms for a fraction of the total time without trading off reliability. To further increase the power savings, we bound the vulnerability of a router by throttling the traffic into the router. In order to minimize the throughput loss due to throttling, we propose dividing the die into domains and using multiple vulnerability bounds across these domains. We explore both static and dynamic selection of vulnerability bounds. We find that for applications with an error tolerance of 10% of the raw error rate, the dynamic multiple vulnerability bound scheme can save up to 44% of power expended for error correction at a marginal network throughput loss of 3%.

Archive | 2011

Leveraging Emerging Technology Through Architectural Exploration for the Routing Fabric of Future FPGAs

Soumya Eachempati; Aman Gayasen; Narayanan Vijaykrishnan; Mary Jane Irwin

Field-programmable gate arrays (FPGAs) have become very popular in recent times. With their regular structures, they are particularly amenable to scaling to smaller technologies. They form an excellent platform for studying emerging technologies. Recently, there have been significant advances in nanoelectronics fabrication that make them a viable alternative to CMOS. One such promising alternative is bundles of single-walled carbon nanotube (SWCNT).

Iet Circuits Devices & Systems | 2009

Predicting the performance and reliability of future field programmable gate arrays routing architectures with carbon nanotube bundle interconnect

Soumya Eachempati; Narayanan Vijaykrishnan; Arthur Nieuwoudt; Yehia Massoud

The authors investigate the performance and reliability of routing architectures in field programmable gate arrays (FPGA) that utilise bundles of single-walled carbon nanotubes (SWCNT) as wires in the FPGA interconnect fabric in future process technologies here. To leverage the performance advantages of nanotube-based interconnect, we explore several important aspects of the FPGA routing architecture including the wire length segmentation distribution and the switch/connection block configurations. The authors also investigate the impact of statistical variations in interconnect properties on FPGA timing yield. The results demonstrate that FPGAs utilising SWCNT bundle interconnect can achieve up to a 54% improvement in area-delay product over the best performing architecture with standard copper interconnect in 22 nm process technology. Furthermore, FPGAs implemented using SWCNT-based interconnect can provide a superior performance-yield trade-off of up to 43% over FPGAs implemented using traditional copper interconnect in future process technologies.

Explore More