Sudhir Satpathy | Researchain

Publication

Featured research published by Sudhir Satpathy.

International Symposium on Computer Architecture | 2013

Catnap: energy proportional multiple network-on-chip

Reetuparna Das; Satish Narayanasamy; Sudhir Satpathy; Ronald G. Dreslinski

Multiple networks have been used in several processor implementations to scale bandwidth and ensure protocol-level deadlock freedom for different message classes. In this paper, we observe that a multiple-network design is also attractive from a power perspective and can be leveraged to achieve energy proportionality by effective power gating. Unlike a single-network design, a multiple-network design is more amenable to power gating, as its subnetworks (subnets) can be power gated without compromising the connectivity of the network. To exploit this opportunity, we propose the Catnap architecture which consists of synergistic subnet selection and power-gating policies. Catnap maximizes the number of consecutive idle cycles in a router, while avoiding performance loss due to overloading a subnet. We evaluate a 256-core processor with a concentrated mesh topology using synthetic traffic and 35 applications. We show that the average network power of a power-gating optimized multiple-network design with four subnets could be 44% lower than a bandwidth equivalent single-network design for an average performance cost of about 5%.

International Solid-State Circuits Conference | 2012

Centip3De: A 3930DMIPS/W configurable near-threshold 3D stacked system with 64 ARM Cortex-M3 cores

David Fick; Ronald G. Dreslinski; Bharan Giridhar; Gyouho Kim; Sangwon Seo; Matthew Fojtik; Sudhir Satpathy; Yoonmyung Lee; Daeyeon Kim; Nurrachman Liu; Michael Wieckowski; Gregory K. Chen; Trevor N. Mudge; Dennis Sylvester; David T. Blaauw

Recent high performance IC design has been dominated by power density constraints. 3D integration increases device density even further, and these devices will not be usable without viable strategies to reduce power consumption. This paper proposes the use of near-threshold computing (NTC) to address this issue in a stacked 3D system. In NTC, cores are operated near the threshold voltage (~200mV above Vth) to optimally balance power and performance [1]. In Centip3De, we operate cores at 650mV, as opposed to the wear-out limited supply voltage of 1.5V. This improves measured energy efficiency by 5.1×. The dramatically lower power consumption of NTC makes it an attractive match for 3D design, which has limited power dissipation capabilities, but also has improved innate power and performance compared to 2D design.

IEEE Journal on Emerging and Selected Topics in Circuits and Systems | 2012

Swizzle-Switch Networks for Many-Core Systems

Korey Sewell; Ronald G. Dreslinski; Thomas Manville; Sudhir Satpathy; Nathaniel Ross Pinckney; Geoffrey Blake; Michael Cieslak; Reetuparna Das; Thomas F. Wenisch; Dennis Sylvester; David T. Blaauw; Trevor N. Mudge

This work revisits the design of crossbar and high-radix interconnects in light of advances in circuit and layout techniques that improve crossbar scalability, obviating the need for deep multi-stage networks. We employ a new building block, the Swizzle-Switch, an energy- and area-efficient switching element that can readily scale to radix 64 and that has recently been validated via silicon test chips in 45 nm technology. We evaluate the Swizzle-Switch as both the high-radix building block of a Flattened Butterfly and as a single-stage interconnect, the Swizzle-Switch Network. In the process we address the architectural and layout challenges associated with centralized crossbar systems. Compared to a conventional Mesh, the Flattened Butterfly provides a 15% performance improvement with a 2.5× reduction in the standard deviation of on-chip access times. The Swizzle-Switch Network achieves further gains, providing a 21% improvement in performance, a 3× reduction in on-chip access variability, a 33% reduction in interconnect power, and a 25% reduction in total system energy while only increasing chip area by 7%. Finally, this paper details a 3-D integrated version of the Swizzle-Switch Network, showing up to a 30% gain in performance over the 2-D Swizzle-Switch Network for benchmarks sensitive to interconnect latency. One major concern with 3-D designs is thermal dissipation. We show through detailed thermal analysis that, with the highly energy-efficient Swizzle-Switch Network design, the thermal budget is well within that of passive cooling solutions.

IEEE Journal of Solid-State Circuits | 2013

Centip3De: A Cluster-Based NTC Architecture With 64 ARM Cortex-M3 Cores in 3D Stacked 130 nm CMOS

We present Centip3De, a large-scale 3D CMP with a cluster-based near-threshold computing (NTC) architecture. Centip3De uses a 3D stacking technology in conjunction with 130 nm CMOS. Measured results for a two-layer, 64-core system are discussed, with the system achieving 3930 DMIPS/W energy efficiency, which is a >3× improvement over traditional operation at full supply voltage. This project demonstrates the feasibility of large-scale 3D design, a synergy between 3D and NTC architectures, a unique cluster-based NTC cache design, and how to maximize performance in a thermally-constrained design.

IEEE Micro | 2013

Centip3De: A 64-Core, 3D Stacked Near-Threshold System

Ronald G. Dreslinski; David Fick; Bharan Giridhar; Gyouho Kim; Sangwon Seo; Matthew Fojtik; Sudhir Satpathy; Yoonmyung Lee; Daeyeon Kim; Nurrachman Liu; Michael Wieckowski; Gregory K. Chen; Dennis Sylvester; David T. Blaauw; Trevor N. Mudge

Centip3De uses the synergy between 3D integration and near-threshold computing to create a reconfigurable system that provides both energy-efficient operation and techniques to address single-thread performance bottlenecks. The original Centip3De design is a seven-layer 3D stacked design with 128 cores and 256 Mbytes of DRAM. Silicon results show a two-layer, 64-core system in 130-nm technology, which achieved an energy efficiency of 3,930 DMIPS/W.

International Solid-State Circuits Conference | 2012

A 4.5Tb/s 3.4Tb/s/W 64×64 switch fabric with self-updating least-recently-granted priority and quality-of-service arbitration in 45nm CMOS

Sudhir Satpathy; Korey Sewell; Thomas Manville; Yen-Po Chen; Ronald G. Dreslinski; Dennis Sylvester; Trevor N. Mudge; David T. Blaauw

High-speed and low-power routers form the basic building blocks of on-die interconnect fabrics that are critical to overall throughput and energy efficiency of high performance systems. Conventional routers use distinct logic blocks for routing data and handling arbitration. At higher radices, connections between these blocks become a bottleneck, limiting router scalability and degrading performance. Recently, two switch topologies merged the data routing fabric with arbitration control, avoiding this bottleneck. However, one relies on centralized control for channel allocation, limiting performance, while the other is restricted to a small set of fixed priorities, rendering input ports prone to starvation. In addition, ever larger CMPs will require continued increases in bandwidth over previous designs. To address these issues, we present a 64×64 single-stage swizzle-switch network (SSN) with 128b data buses (8192 total input/output wires). The SSN can connect any input to any output, including multicast. It has a peak measured throughput of 4.5Tb/s at 1.1V in 45nm SOI CMOS at 25°C. The SSN's key features are: 1) a single-cycle least-recently granted (LRG) priority arbitration technique that reuses the already present input and output data buses and their drivers and sense amps; 2) an additional 4-level message-based priority arbitration for quality of service (QoS) with 2% logic and 3% wiring overhead; 3) a bidirectional bitline repeater that allows the router to scale to >8000 wires. These features result in a compact fabric (4.06mm²) with a throughput gain of 2.1× over prior work at 3.4Tb/s/W efficiency, which improves to 7.4Tb/s/W at 600mV.
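
As a rough illustration of the least-recently-granted policy named in this abstract, the following Python sketch models the arbitration behavior for a single output port. It is a software model only and says nothing about the paper's circuit implementation; the class name `LRGArbiter` and its methods are hypothetical.

```python
from typing import Iterable, List, Optional


class LRGArbiter:
    """Behavioral model of least-recently-granted (LRG) arbitration
    for one output port. Hypothetical helper, not the paper's design."""

    def __init__(self, num_inputs: int) -> None:
        # Index 0 holds the highest-priority input (granted longest ago).
        self.order: List[int] = list(range(num_inputs))

    def arbitrate(self, requests: Iterable[int]) -> Optional[int]:
        """Grant the requesting input that was granted least recently."""
        wanted = set(requests)
        winner = next((p for p in self.order if p in wanted), None)
        if winner is not None:
            # The winner becomes the most recently granted input,
            # so it moves to the lowest-priority position.
            self.order.remove(winner)
            self.order.append(winner)
        return winner


arb = LRGArbiter(4)
grants = [arb.arbitrate({1, 3}) for _ in range(3)]
print(grants)
```

Because a winner always drops to the lowest priority, two inputs that request continuously alternate grants, which is the property that prevents starvation.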

Design, Automation, and Test in Europe | 2011

Low power interconnects for SIMD computers

Mark Woh; Sudhir Satpathy; Ronald G. Dreslinski; Danny Kershaw; Dennis Sylvester; David T. Blaauw; Trevor N. Mudge

Driven by continued scaling of Moore's Law, the number of processing elements on a die is increasing dramatically. Recently there has been a surge of wide single instruction multiple data (SIMD) architectures designed to handle computationally intensive applications like 3D graphics, high definition video, image processing, and wireless communication. A limit on the SIMD width of these types of architectures is the scalability of the interconnect network between the processing elements in terms of both area and power. To mitigate this problem, we propose the use of a new interconnect topology, XRAM, which is a low-power, high-performance matrix-style crossbar. It re-uses output buses for control programming, and stores multiple swizzle configurations at the cross points using SRAM cells, significantly reducing routing congestion and control signaling. We show that compared to conventionally implemented crossbars, the area scales with the product of input × output ports while consuming almost 50% less energy. We present an application case study, color-space conversion, utilizing XRAM and show a 1.4× gain in performance while consuming 1.5–2.5× less power.

Symposium on VLSI Circuits | 2010

A 1.07 Tbit/s 128×128 swizzle network for SIMD processors

Sudhir Satpathy; Zhiyoong Foo; Bharan Giridhar; Ronald G. Dreslinski; Dennis Sylvester; Trevor N. Mudge; David T. Blaauw

A novel circuit-switched swizzle network called XRAM is presented. XRAM uses an SRAM-based approach producing a compact footprint that scales well with network dimensions while supporting all permutations and multicasts. Capable of storing multiple shuffle configurations and aided by a novel sense-amp for robust bit-line evaluation, a 128×128 XRAM fabricated in 65nm achieves a bandwidth exceeding 1Tbit/s, enabling a 64-lane SIMD engine operating at 0.72V to save 46.8% energy over an iso-throughput conventional 16-lane implementation at 1.1V.
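
The idea of a crossbar that stores several shuffle configurations and supports both permutations and multicast can be sketched functionally as below. This is a minimal behavioral model under assumed names (`SwizzleCrossbar`, `program`, `route`), not a description of the XRAM circuit: each stored configuration simply records, for every output port, which input port drives it.

```python
from typing import List, Optional, Sequence


class SwizzleCrossbar:
    """Functional model of a crossbar holding several stored
    configurations. Hypothetical illustration only."""

    def __init__(self, num_ports: int, num_configs: int) -> None:
        self.num_ports = num_ports
        # configs[slot][out] = index of the input driving output `out`,
        # or None if that output is left undriven.
        self.configs: List[List[Optional[int]]] = [
            [None] * num_ports for _ in range(num_configs)
        ]

    def program(self, slot: int, select: Sequence[Optional[int]]) -> None:
        """Store a configuration; repeated input indices give multicast."""
        if len(select) != self.num_ports:
            raise ValueError("one input selection is required per output")
        for src in select:
            if src is not None and not 0 <= src < self.num_ports:
                raise ValueError(f"input index out of range: {src}")
        self.configs[slot] = list(select)

    def route(self, slot: int, inputs: Sequence) -> list:
        """Apply a stored configuration to one vector of input values."""
        return [None if src is None else inputs[src]
                for src in self.configs[slot]]


xbar = SwizzleCrossbar(num_ports=4, num_configs=2)
xbar.program(0, [3, 2, 1, 0])   # permutation: reverse the lanes
xbar.program(1, [0, 0, 0, 0])   # multicast: broadcast lane 0
lanes = ["a", "b", "c", "d"]
print(xbar.route(0, lanes))
print(xbar.route(1, lanes))
```

Switching between shuffles is then a matter of choosing a different `slot`, with no need to reprogram the selections.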

Design Automation Conference | 2012

High radix self-arbitrating switch fabric with multiple arbitration schemes and quality of service

Sudhir Satpathy; Reetuparna Das; Ronald G. Dreslinski; Trevor N. Mudge; Dennis Sylvester; David T. Blaauw

A scalable architecture for designing high-radix switch fabrics is presented. It uses circuit techniques to re-use existing input and output data buses and switching logic for fabric configuration and for supporting multiple arbitration policies. In addition, it integrates a 4-level message-based priority arbitration for quality of service. Fine-grain clock gating, a tiled fabric topology, and self-regenerating bit-line repeaters enable scaling the router to 8k wires. A 64×64 (128b data) switch fabric fabricated in 45nm SOI CMOS spans 4.06mm² and achieves a throughput of 4.5Tb/s at 3.4Tb/s/W at 1.1V, with a peak measured efficiency of 7.4Tb/s/W at 0.6V.

IEEE Hot Chips Symposium | 2012

Swizzle Switch: A self-arbitrating high-radix crossbar for NoC systems

Ronald G. Dreslinski; Korey Sewell; Thomas Manville; Sudhir Satpathy; Nathaniel Ross Pinckney; Geoff Blake; Michael Cieslak; Reetuparna Das; Thomas F. Wenisch; Dennis Sylvester; David T. Blaauw; Trevor N. Mudge

This article consists of a collection of slides from the authors' conference presentation on Swizzle Switch networks for use in many-core systems. Some of the specific topics discussed include: the special features and design of swizzle-switch networks; data routing capabilities; system architecture; processing capabilities; interconnects; and network evaluation techniques.
