Sumit Dharampal Mediratta

Publication

Featured research published by Sumit Dharampal Mediratta.

international symposium on circuits and systems | 2007

Characterization of a Fault-tolerant NoC Router

Sumit Dharampal Mediratta; Jeffrey Draper

With increasing reliability concerns for current and next-generation VLSI technologies, fault tolerance is fast becoming an integral part of system-on-chip (SoC) and multi-core architectures. Another concern for these architectures is increasing global wire length, whose associated issues are making networks-on-chip (NoCs) the standard for on-chip global communication. We recognize these issues and present a generic on-chip fault-tolerant routing algorithm. The microarchitecture of an NoC router implementing the proposed routing algorithm for a k-ary 2-cube topology is provided. The proposed router works in two phases. In the first phase, the network is explored for an existing path between source-destination pairs after reset or during system reconfiguration after fault detection. Existing paths are cached and used in the second phase, data communication during normal system operation. The presented router architecture also proposes dynamic multiplexing of virtual channels on physical channels to utilize physical channel bandwidth efficiently. These approaches complement each other and, when combined, result in an efficiently realizable, high-performance fault-tolerant NoC router. An implementation characterization of this k-ary 2-cube torus router in terms of area, power and critical path delay in IBM Cu-08 technology is presented, along with bandwidth and latency characterization for relevant cases.
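The two-phase idea in this abstract (explore for a working path, then reuse the cached path) can be illustrated with a small software model. This is only a sketch under stated assumptions and not the authors' routing algorithm or router microarchitecture: it uses a plain breadth-first search for exploration, treats faults as broken bidirectional links, and fills the cache lazily on first use. The names `explore_path`, `PathCache`, `route` and `reconfigure` are hypothetical.

```python
from collections import deque


def torus_neighbors(node, k):
    """Return the four neighbours of (x, y) in a k-ary 2-cube (2D torus)."""
    x, y = node
    return [((x + 1) % k, y), ((x - 1) % k, y),
            (x, (y + 1) % k), (x, (y - 1) % k)]


def explore_path(src, dst, k, faulty_links):
    """Exploration phase (assumed BFS): find a shortest path avoiding faulty links."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            # Walk the parent pointers back to the source.
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nxt in torus_neighbors(cur, k):
            if nxt in parent or frozenset((cur, nxt)) in faulty_links:
                continue
            parent[nxt] = cur
            queue.append(nxt)
    return None  # destination unreachable with the current fault set


class PathCache:
    """Hypothetical model: explored paths are cached and reused until reconfiguration."""

    def __init__(self, k, faulty_links=()):
        self.k = k
        self.faulty = {frozenset(link) for link in faulty_links}
        self.cache = {}

    def reconfigure(self, faulty_links):
        """After fault detection: record the new fault set and drop stale paths."""
        self.faulty = {frozenset(link) for link in faulty_links}
        self.cache.clear()

    def route(self, src, dst):
        """Communication phase: reuse a cached path, exploring only on a cache miss."""
        key = (src, dst)
        if key not in self.cache:
            self.cache[key] = explore_path(src, dst, self.k, self.faulty)
        return self.cache[key]
```

For example, `PathCache(4).route((0, 0), (1, 0))` returns the direct one-hop path; after `reconfigure([((0, 0), (1, 0))])` the same call returns a three-hop detour around the broken link.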

international workshop on computer architecture for machine perception | 2007

Performance Evaluation of Probe-Send Fault-tolerant Network-on-chip Router

Sumit Dharampal Mediratta; Jeffrey Draper

With increasing reliability concerns for current and next-generation VLSI technologies, fault tolerance is fast becoming an integral part of system-on-chip and multi-core architectures. Another trend for such architectures is network-on-chip (NoC) becoming a standard for on-chip global communication. In an earlier work, a generic fault-tolerant routing algorithm in the context of NoCs was presented. The proposed routing algorithm works in two phases, namely path exploration (PE) and normal communication. This paper presents fundamental insights into various novel PE approaches, their feasibility and performance trade-offs for k-ary 2-cube NoCs. The dependence of the normal communication phase on the probability of finding paths, and on their quality, in the first phase emphasizes PE's significance. One major contribution of this work is the investigation of applying constrained randomness to PE to optimize the quality of paths. Another contribution is the proposed merging of traffic to reduce the reconfiguration time by a large amount (73.8% on average).

signal processing systems | 2005

A Prototype Processing-In-Memory (PIM) Chip for the Data-Intensive Architecture (DIVA) System

Jeffrey Draper; Tim Barrett; Jeff Sondeen; Sumit Dharampal Mediratta; Chang Woo Kang; Ihn Kim; Gokhan Daglikoca

The Data-Intensive Architecture (DIVA) system employs Processing-In-Memory (PIM) chips as smart-memory coprocessors. This architecture exploits inherent memory bandwidth both on chip and across the system to target several classes of bandwidth-limited applications, including multimedia applications and pointer-based and sparse-matrix computations. The DIVA project has built a prototype development system using PIM chips in place of standard DRAMs to demonstrate these concepts. We have recently ported several demonstration kernels to this platform and have exhibited a speedup of 35X on a matrix transpose operation. This paper focuses on the 32-bit scalar and 256-bit WideWord integer processing components of the first DIVA prototype PIM chip, which was fabricated in TSMC 0.18 μm technology. In conjunction with other publications, this paper demonstrates that impressive gains can be achieved with very little “smart” logic added to memory devices. A second PIM prototype that includes WideWord floating-point capability is scheduled to tape out in August 2003.

international midwest symposium on circuits and systems | 2006

On-chip Fault-tolerance Utilizing BIST Resources

Sumit Dharampal Mediratta; Jeffrey Draper

Recent and projected advances in VLSI fabrication technology will allow for integration of billions of transistors and advanced architectures on a single chip. According to the International Technology Roadmap for Semiconductors (ITRS), widespread reliability challenges are expected for these VLSI fabrication technologies (65 nm and below). Effective and efficient on-chip fault-tolerance solutions are needed. A new approach of achieving on-chip fault-tolerance using built-in-self-test (BIST) is proposed in this paper. The proposed approach reduces production cost, implementation overhead and time-to-market; increases reusability, post-fabrication reconfigurability and productivity; and is scalable across multiple VLSI processes and feature sizes. This will result in obvious advantages of yield enhancement and prolonged lifetime of VLSI chips as well.

international conference on vlsi design | 2004

An area-efficient router for the Data-Intensive Architecture (DIVA) system

Sumit Dharampal Mediratta; Jeff Sondeen; Jeffrey Draper

A key component of the Data-Intensive Architecture (DIVA) is the Processing-In-Memory (PIM) Routing Component (PiRC) that is responsible for efficient communication between PIM chips. This paper presents the design of a low area, delay and power router for DIVA. A 58.5% saving in area and 86% reduction in load on the clock as compared to an earlier PIM router design makes the presented design ideal for use in the second version of DIVA, with low area being a critical design requirement for DIVA. This paper also gives a comparison of the presented design with an earlier PIM router design in terms of delay and power to justify the new design choice.

international symposium on circuits and systems | 2006

A double-data rate (DDR) processing-in-memory (PIM) device with wideword floating-point capability

Tim Barrett; Sumit Dharampal Mediratta; Taek-Jun Kwon; Ravinder Singh; Sachit Chandra; Jeff Sondeen; Jeffrey Draper

The data-intensive architecture (DIVA) system incorporates processing-in-memory (PIM) chips as smart-memory coprocessors to a microprocessor. This architecture exploits inherent memory bandwidth both on chip and across the system to target several classes of bandwidth-limited applications. A recently developed PIM chip in TSMC 0.18 μm technology incorporates a DDR SDRAM interface for its inclusion in commodity systems, such as the HP zx6000 workstation used on this project. Each PIM chip includes eight single-precision floating-point units (FPU) in the wideword pipeline, enabling significant speedups in the target system. This paper focuses on the integration of new subcomponents into the PIM chip design, system integration, and measured system results, demonstrating the significant GFLOP/W feature offered by PIM computing.

international symposium on circuits and systems | 2005

An area-efficient and protected network interface for processing-in-memory systems

Sumit Dharampal Mediratta; Craig S. Steele; Jeff Sondeen; Jeffrey Draper

This paper describes the implementation of an area-efficient and protected user memory-mapped network interface, the pbuf (parcel buffer), for the data intensive architecture (DIVA) processing-in-memory (PIM) system. This implementation of the pbuf in TSMC 0.18 μm CMOS technology displays an aggregate bi-directional throughput of 48.08 Gbps, using low area (0.56 mm²) and power consumption (32.30 mW). These characteristics, especially the low area and power, have made the current implementation an ideal choice for assimilation in DIVA PIM systems, since low area and power are critical design requirements in the PIM philosophy. The pbuf implementation has been verified by the execution of a 2-PIM transitive closure benchmark at 140 MHz on an HP Itanium2-based Longs Peak server containing DIMMs populated with DIVA-H PIM chips.
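The three figures quoted in this abstract can be combined into simple efficiency ratios. The short calculation below is only illustrative arithmetic on the stated numbers; the function name `pbuf_efficiency` is hypothetical, and the ratios assume the quoted throughput, area and power all describe the same operating point, which the abstract does not state explicitly.

```python
def pbuf_efficiency(throughput_gbps=48.08, area_mm2=0.56, power_mw=32.30):
    """Return (Gbps per mm^2, Gbps per mW) from the figures quoted in the abstract."""
    return throughput_gbps / area_mm2, throughput_gbps / power_mw


per_area, per_power = pbuf_efficiency()
print(f"{per_area:.1f} Gbps/mm^2, {per_power:.2f} Gbps/mW")
# prints: 85.9 Gbps/mm^2, 1.49 Gbps/mW
```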

ieee international conference on high performance computing data and analytics | 2005

Performance analysis of user-level PIM communication in the data intensive architecture (DIVA) system

Sumit Dharampal Mediratta; Jeffrey Draper

The performance of user-level messaging in PIM (Processing-In-Memory) to PIM communication is modeled and analyzed for the DIVA (Data IntensiVe Architecture) system. Six benchmarks have been used for this purpose, two from each category, namely single message transfer, parallel transfer and collective communication, as described for the PMB (Pallas MPI Benchmarks). The benchmarks used are PingPong, PingPing, SendReceive, Exchange, Barrier synchronization and AllToAll personalized exchange. The main significance of this work lies in the evaluation of an implementation of system-wide support for memory-to-memory and memory-to-host communication via a parcel buffer (used as a network interface). Another notable feature of this evaluation is the presentation of an optimal algorithm for Barrier synchronization and an optimal algorithm, with full channel utilization, for AllToAll personalized exchange for the bi-directional ring configuration of up to 8 DIVA PIMs in the memory system of a Hewlett-Packard zx6000 server. The algorithms presented can be scaled to a higher number of PIM chips with little degradation in performance relative to the optimal solution. Our analysis shows that the currently employed communication mechanism can be used very efficiently for collective communication operations, and it also exposes the bottlenecks in the current design for future improvements.
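To give a feel for what "full channel utilization" means for AllToAll personalized exchange on a bi-directional ring, the sketch below computes a simple link-load lower bound on the number of communication steps. It is not the algorithm from the paper. It assumes unit-size messages and that each directed link carries one message per step; the function names are hypothetical.

```python
def ring_distance(src, dst, n):
    """Shortest hop count between two nodes on a bi-directional ring of n nodes."""
    d = (dst - src) % n
    return min(d, n - d)


def alltoall_step_lower_bound(n):
    """Lower bound on steps: total hops needed divided by the 2n directed links."""
    total_hops = sum(ring_distance(s, d, n)
                     for s in range(n) for d in range(n) if s != d)
    directed_links = 2 * n
    return -(-total_hops // directed_links)  # ceiling division


print(alltoall_step_lower_bound(8))  # prints: 8
```

For 8 nodes the messages need 128 link traversals in total, and the ring offers 16 directed links per step, so any schedule that finishes in 8 steps must keep every link busy in every step.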

midwest symposium on circuits and systems | 2004

A 0.18 μm CMOS implementation of an area efficient precise exception handling unit for processing-in-memory systems

Sumit Dharampal Mediratta; Craig S. Steele; Ravinder Singh; Jeff Sondeen; Jeffrey Draper

This paper describes the implementation of the exception handling mechanism in the second prototype version of the Data-Intensive Architecture (DIVA) processing-in-memory (PIM) chip. This implementation features architectural simplicity, low area (54289 μm²), delay (2.643 nanoseconds) and power consumption (7.6 milliwatts), and effective hardware support for complex cases of exception handling. This work provides a description of handling memory-access, execution and communication-related exceptions in an area- and power-efficient manner, which are key design specifications for DIVA. The current implementation has been tested by verifying various exceptions on DIVA-II PIM chips running at 140 MHz in the memory system of an HP Itanium2-based Long's Peak server. The generic nature of the DIVA exceptions and their classification makes the current implementation suitable and easy to use in diverse microarchitectures with little modification.

application-specific systems, architectures, and processors | 2002