Sumeet S. Kumar | Researchain


Publication

Featured research published by Sumeet S. Kumar.

International Conference on Design and Technology of Integrated Systems in Nanoscale Era | 2011

A 3D Network-on-Chip for stacked-die transactional chip multiprocessors using Through Silicon Vias

Sumeet S. Kumar; Rene van Leuken

Effective utilization of computing power offered by modern chip multiprocessors (CMP) depends on the design and performance of the interconnect that connects them. We present a three-dimensional Network-on-Chip (NoC) based on the R3 router architecture for transactional CMPs utilizing advanced Through Silicon Vias (TSV) in a stacked-die architecture, facilitating low-latency and high-throughput communication between CMP nodes. We report the performance of an R3-based three-dimensional mesh in a stacked-die transactional CMP, highlighting the limitations of performance scale-up with stacking. Furthermore, we present data on the area penalty associated with the use of TSVs in different configurations in 90 nm UMC technology.

IEEE Transactions on Very Large Scale Integration Systems | 2014

System Level Methodology for Interconnect Aware and Temperature Constrained Power Management of 3-D MP-SOCs

Sumeet S. Kumar; Arnica Aggarwal; Radhika Sanjeev Jagtap; Amir Zjajo; Rene van Leuken

Modern 3-D multiprocessor systems-on-chip (MP-SoC) incorporate processing elements (PEs) and memories within die-stacks interconnected using through-silicon vias (TSVs). The resulting power density of these systems necessitates the inclusion of thermal effects in the architecture space exploration stage of the design process. The number and placement of TSVs influence the thermal conductivity in the vertical direction in die-stacks, and consequently these must be considered during thermal analysis. However, the special requirement of keep out zones (KOZs) for TSVs due to mechanical stress considerations complicates the design of the vertical interconnect, potentially impacting its electrical performance as well. This paper presents an integrated methodology that allows for TSV topology exploration to evaluate the best vertical interconnect structure while considering crosstalk, area overheads, and KOZ requirements using an initial system floorplan. After incorporating feedback from the exploration, the resulting vertical interconnect is included within a temperature-power simulation that estimates the thermal profile of the 3-D stack. Within this methodology, a novel power management scheme for 3-D MP-SoCs is introduced that considers temperature, positional information, and thermal relationships between PEs while performing dynamic voltage-frequency scaling (DVFS). The scheme effectively maintains smooth temperature profiles, decreases fluctuations in voltage-frequency levels, and increases the aggregate frequency of operation at a lower total power dissipation. Further, the scheme is applied to a stack partitioned into voltage islands, where it is shown to match the conventional per-core DVFS schemes in its performance.
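
The idea of letting thermal relationships between PEs influence DVFS decisions can be illustrated with a small sketch. This is not the scheme from the paper: the function `select_levels`, the `coupling` matrix, and the fixed `step` of degrees per frequency level are all hypothetical choices made only to show how a hotter neighbour might reduce the frequency a PE is allowed to run at.

```python
def select_levels(temps, coupling, levels, t_limit, step):
    """Pick one frequency per PE from `levels` (ascending, e.g. in MHz).

    temps    -- current temperature of each PE (degrees C)
    coupling -- coupling[i][j]: assumed weight with which PE j heats PE i
    t_limit  -- temperature limit (degrees C)
    step     -- assumed thermal margin (degrees C) needed per level increase
    """
    chosen = []
    for i, t_own in enumerate(temps):
        # Only neighbours hotter than this PE add to its effective temperature.
        neighbour_heat = sum(
            coupling[i][j] * max(t_other - t_own, 0.0)
            for j, t_other in enumerate(temps)
            if j != i
        )
        margin = t_limit - (t_own + neighbour_heat)
        if margin <= 0:
            index = 0  # no headroom: fall back to the lowest level
        else:
            index = min(int(margin // step), len(levels) - 1)
        chosen.append(levels[index])
    return chosen


levels = [200, 400, 600, 800]
coupled = select_levels([70.0, 80.0], [[0.0, 0.5], [0.5, 0.0]], levels, 85.0, 5.0)
isolated = select_levels([70.0, 80.0], [[0.0, 0.0], [0.0, 0.0]], levels, 85.0, 5.0)
```

In this toy setup the cooler PE is held one level lower when its hot neighbour is taken into account than when each PE is treated in isolation, which is the kind of positional awareness the abstract describes at a high level.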

International Symposium on Computing and Networking | 2013

Low Overhead Message Passing for High Performance Many-Core Processors

Sumeet S. Kumar; Mitzi Tijin A. Djie; Rene van Leuken

Many-core processors provide the raw computation power required by modern high-performance multimedia and signal processing workloads. The translation of this into execution performance is often constrained by the overheads of communication between concurrent tasks. This paper presents Pronto, a low-overhead message passing system that simplifies the semantics of data movement between communicating tasks by performing buffer management, message synchronization and address translation directly in hardware. The integration of these functions into hardware results in transfer latencies up to 30% shorter than state-of-the-art MPI derivatives. With Pronto, the communication overheads in a 16-core processor array are under 5% for 64-word burst transfers, using workloads such as the JPEG decoder and FIR filter. Furthermore, this paper studies the effect of task mapping and interconnect traffic on the predictability of data block arrival times, and illustrates a method to reduce variations.

Parallel, Distributed and Network-Based Processing | 2015

Ctherm: An Integrated Framework for Thermal-Functional Co-simulation of Systems-on-Chip

Sumeet S. Kumar; Amir Zjajo; Rene van Leuken

This paper presents Ctherm, an integrated framework for cycle-accurate thermal and functional evaluation of systems-on-chip. The presented framework enables accurate characterization of thermal behaviour by generating detailed physical models for components based on input specifications, and simulating them within a tightly integrated co-simulation platform with an embedded thermal simulator. Ctherm's fine-grained modelling approach yields 70% higher accuracy in hotspot resolution as compared to conventional approaches that abstract component internals. Simulation runtime is reduced by up to 36% over conventional continuous approaches through the use of thermal checkpointing, enabling the fast-forwarding of thermal simulations without loss of thermal continuity.

Digital Systems Design | 2012

A Methodology for Early Exploration of TSV Placement Topologies in 3D Stacked ICs

Radhika Sanjeev Jagtap; Sumeet S. Kumar; Rene van Leuken

As planar scaling to achieve higher chip integration seems to be on the brink of saturation, three-dimensional (3D) integration has emerged as a promising technology. It is critical to have efficient early stage estimation methodologies to build high performance digital systems as well as to shorten design time. In this paper, a novel methodology is proposed which takes into account key physical effects and explores Through-Silicon-Via (TSV) placement topologies for a 2-tier 3D stack. It estimates the interconnect electrical performance and TSV area penalty across two TSV performance corners. The methodology offers flexibility in selection of the CMOS technology node and the 3D stacking level. A SystemC implementation provides for parameterizability and modeling with ease and also enables integration into a high-level system simulation framework. Using our methodology, TSV placement topologies were explored for a 7-port 3D router. Our results present optimal topologies for the router for typical 45 nm and 32 nm technology nodes. They also point out unreliable topologies and give important feedback for 3D system design.

International Conference on Electronic Packaging Technology | 2010

Effects of crosstalk and simultaneous switching noise on high performance digital system packages

Sumeet S. Kumar; Gokulraj Chandramohan; Willem van Driel; G.Q. Zhang

The non-ideal nature of package level interconnects gives rise to issues such as crosstalk induced noise and simultaneous switching noise which affect the reliable operation of high performance digital systems. We examine the causes of various signal losses that occur in the package level interconnect, and with an equivalent circuit model, highlight the effects of crosstalk induced noise and detail methods to mitigate its effects. The causes of simultaneous switching noise are also examined and techniques for better design of the power delivery network are suggested with the use of decoupling capacitors and shielding lines.

International Symposium on System on Chip | 2017

Energy-efficient neuromorphic receptors for wide-range temporal patterns of post-synaptic responses

Xuefei You; Amir Zjajo; Sumeet S. Kumar; Rene van Leuken

In a neuromorphic integrated circuit, synaptic dynamics are of great importance for capturing accurate neural behaviors. In this paper, we propose a current-based synapse design mediated with multiple receptor types, namely AMPA, NMDA and GABAa, and a weight-dependent learning algorithm. Due to various biological conducting mechanisms, the receptors demonstrate different kinetics in response to stimulus. The designed circuit offers distinctive features of receptors as well as the joint synaptic function. An increased computation ability is verified through synchrony detection in a two-layer recurrent network of synapse clusters. The design, implemented in TSMC 65 nm CMOS technology, consumes 1.92, 3.36, 1.11 and 35.22 pJ of energy per spike event for the AMPA, NMDA and GABAa receptors and the advanced learning circuit, respectively.

International Symposium on Circuits and Systems | 2015

Physical characterization of steady-state temperature profiles in three-dimensional integrated circuits

Sumeet S. Kumar; Amir Zjajo; Rene van Leuken

The thermal performance of three-dimensional integrated circuits is influenced by a number of design and technology parameters. However, the relationship between these parameters and thermal behaviour of die stacks is complex and not well understood. In this paper, we perform a detailed evaluation of the influence of stack composition and depth, thickness of dies, physical location of power dissipating elements and stack power density on steady-state temperature profiles. We examine how each of these parameters affects heat spread within 3D ICs, and highlight the causes for hotspot formation. The results of our analysis illustrate the implications of effective thermal conductivity on temperature sensing zones on dies, and the significant impact of stack power density on overall operating temperature.

International Symposium on Computing and Networking | 2014

Cache Balancer: Access Rate and Pain Based Resource Management for Chip Multiprocessors

Jurrien de Klerk; Sumeet S. Kumar; Rene van Leuken

This paper presents a runtime resource management scheme named Cache Balancer that improves the utilization of on-chip shared caches and reduces access latencies in chip multiprocessor systems. Cache Balancer incorporates an access rate based memory allocator that improves utilization of on-chip cache resources resulting in up to 60% lower contention at cache banks. Furthermore, it uses information regarding the memory access characteristics of application tasks in order to obtain an optimal task mapping at runtime, and consequently achieves up to 22% lower execution times as compared to existing proposals.

International Symposium on Circuits and Systems | 2014

Improving data cache performance using Persistence Selective Caching

Sumeet S. Kumar; Rene van Leuken

This paper presents Persistence Selective Caching (PSC), a selective caching scheme that tracks the reusability of L1 data cache (L1D) lines at runtime, and moves lines with sufficient potential for reuse to a low-latency, low-energy assist cache from where subsequent references to them are serviced. Unlike existing schemes, the selectivity of PSC is configurable, and can be adjusted to suit the varying memory access characteristics of different applications. By effectively identifying reusable cache lines and storing them in the assist, PSC reduces average memory access time by up to 59% as compared to competing schemes and conventional data caches. Furthermore, by ensuring that only reusable lines are cached by the assist, PSC reduces cache line movements, and thus decreases average energy per access by up to 75% over other assists.
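
The general idea of promoting reusable lines to an assist cache can be sketched as a toy software model. This is only an illustration under stated assumptions, not the PSC hardware design: the class `SelectiveAssistCache`, the hit-count threshold used as the selectivity knob, and the LRU replacement in both structures are hypothetical choices.

```python
from collections import OrderedDict


class SelectiveAssistCache:
    """Toy model: move an L1D line to a small assist once it is reused enough."""

    def __init__(self, l1_lines=8, assist_lines=2, reuse_threshold=2):
        self.l1 = OrderedDict()      # line address -> hits since fill (LRU order)
        self.assist = OrderedDict()  # line address -> True (LRU order)
        self.l1_lines = l1_lines
        self.assist_lines = assist_lines
        self.reuse_threshold = reuse_threshold  # assumed selectivity setting
        self.stats = {"assist_hit": 0, "l1_hit": 0, "miss": 0}

    def access(self, line):
        if line in self.assist:
            self.assist.move_to_end(line)
            self.stats["assist_hit"] += 1
            return "assist_hit"
        if line in self.l1:
            self.l1[line] += 1
            self.l1.move_to_end(line)
            self.stats["l1_hit"] += 1
            if self.l1[line] >= self.reuse_threshold:
                self._promote(line)
            return "l1_hit"
        # Miss: fill the L1D, evicting the least recently used line if full.
        self.stats["miss"] += 1
        if len(self.l1) >= self.l1_lines:
            self.l1.popitem(last=False)
        self.l1[line] = 0
        return "miss"

    def _promote(self, line):
        # Only lines that proved reusable ever enter the assist.
        del self.l1[line]
        if len(self.assist) >= self.assist_lines:
            self.assist.popitem(last=False)
        self.assist[line] = True


cache = SelectiveAssistCache(reuse_threshold=2)
trace_result = [cache.access(0xA) for _ in range(4)]
```

Raising `reuse_threshold` in this model makes promotion more selective, which mirrors the configurable selectivity the abstract mentions; a line touched only once never reaches the assist.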
