Sumeet S. Kumar
Delft University of Technology
Publication
Featured research published by Sumeet S. Kumar.
international conference on design and technology of integrated systems in nanoscale era | 2011
Sumeet S. Kumar; Rene van Leuken
Effective utilization of computing power offered by modern chip multiprocessors (CMP) depends on the design and performance of the interconnect that connects them. We present a three-dimensional Network-on-Chip (NoC) based on the R3 router architecture for transactional CMPs utilizing advanced Through Silicon Vias (TSV) in a stacked-die architecture, facilitating low latency and high throughput communication between CMP nodes. We report the performance of an R3 based three-dimensional mesh in a stacked-die transactional CMP highlighting the limitations of performance scale-up with stacking. Furthermore, we present data on area penalty associated with the use of TSVs in different configurations in 90nm UMC technology.
IEEE Transactions on Very Large Scale Integration Systems | 2014
Sumeet S. Kumar; Arnica Aggarwal; Radhika Sanjeev Jagtap; Amir Zjajo; Rene van Leuken
Modern 3-D multiprocessor systems-on-chip (MP-SoC) incorporate processing elements (PEs) and memories within die-stacks interconnected using through-silicon vias (TSVs). The resulting power density of these systems necessitates the inclusion of thermal effects in the architecture space exploration stage of the design process. The number and placement of TSVs influence the thermal conductivity in the vertical direction in die-stacks, and consequently these must be considered during thermal analysis. However, the special requirement of keep out zones (KOZs) for TSVs due to mechanical stress considerations complicates the design of the vertical interconnect, potentially impacting its electrical performance as well. This paper presents an integrated methodology that allows for TSV topology exploration to evaluate the best vertical interconnect structure while considering crosstalk, area overheads, and KOZ requirements using an initial system floorplan. After incorporating feedback from the exploration, the resulting vertical interconnect is included within a temperature-power simulation that estimates the thermal profile of the 3-D stack. Within this methodology, a novel power management scheme for 3-D MP-SoCs is introduced that considers temperature as well as positional information and thermal relationships between PEs while performing dynamic voltage-frequency scaling (DVFS). The scheme effectively maintains smooth temperature profiles, decreases fluctuations in voltage-frequency levels, and increases the aggregate frequency of operation at a lower total power dissipation. Further, the scheme is applied to a stack partitioned into voltage islands, where it is shown to match the conventional per-core DVFS schemes in its performance.
international symposium on computing and networking | 2013
Sumeet S. Kumar; Mitzi Tijin A. Djie; Rene van Leuken
Many-core processors provide the raw computation power required by modern high-performance multimedia and signal processing workloads. The translation of this into execution performance is often constrained by the overheads of communication between concurrent tasks. This paper presents Pronto, a low-overhead message passing system which simplifies the semantics of data movement between communicating tasks by performing buffer management, message synchronization and address translation directly in hardware. The integration of these functions into hardware results in transfer latencies up to 30% shorter than state-of-the-art MPI derivatives. The overheads for communication in a 16-core processor array are under 5% for 64-word burst transfers with Pronto using workloads such as the JPEG decoder and FIR filter. Furthermore, this paper also studies the effect of task mapping and interconnect traffic on the predictability of data block arrival times, and illustrates a method to reduce variations.
parallel, distributed and network-based processing | 2015
Sumeet S. Kumar; Amir Zjajo; Rene van Leuken
This paper presents therm, an integrated framework for cycle-accurate thermal and functional evaluation of systems-on-chip. The presented framework enables accurate characterization of thermal behaviour by generating detailed physical models for components based on input specifications, and simulating them within a tightly integrated co-simulation platform with an embedded thermal simulator. Therm's fine-grained modelling approach yields 70% higher accuracy in hotspot resolution as compared to conventional approaches that abstract component internals. Simulation runtime is reduced by up to 36% over conventional continuous approaches through the use of thermal checkpointing, enabling the fast-forwarding of thermal simulations without loss of thermal continuity.
digital systems design | 2012
Radhika Sanjeev Jagtap; Sumeet S. Kumar; Rene van Leuken
As planar scaling to achieve higher chip integration seems to be on the brink of saturation, three-dimensional (3D) integration has emerged as a promising technology. It is critical to have efficient early stage estimation methodologies to build high performance digital systems as well as to shorten design time. In this paper, a novel methodology is proposed which takes into account key physical effects and explores Through-Silicon-Via (TSV) placement topologies for a 2-tier 3D stack. It estimates the interconnect electrical performance and TSV area penalty across two TSV performance corners. The methodology offers flexibility in selection of the CMOS technology node and the 3D stacking level. A SystemC implementation provides for parameterizability and modeling with ease and also enables integration into a high-level system simulation framework. Using our methodology, TSV placement topologies were explored for a 7-port 3D router. Our results present optimal topologies for the router for typical 45 nm and 32 nm technology nodes. They also point out unreliable topologies and give important feedback for 3D system design.
international conference on electronic packaging technology | 2010
Sumeet S. Kumar; Gokulraj Chandramohan; Willem van Driel; G.Q. Zhang
The non-ideal nature of package level interconnects gives rise to issues such as crosstalk induced noise and simultaneous switching noise which affect the reliable operation of high performance digital systems. We examine the causes of various signal losses that occur in the package level interconnect, and with an equivalent circuit model, highlight the effects of crosstalk induced noise and detail methods to mitigate its effects. The causes of simultaneous switching noise are also examined and techniques for better design of the power delivery network are suggested with the use of decoupling capacitors and shielding lines.
international symposium on system on chip | 2017
Xuefei You; Amir Zjajo; Sumeet S. Kumar; Rene van Leuken
In a neuromorphic integrated circuit, synaptic dynamics are of great importance for capturing accurate neural behaviors. In this paper, we propose a current-based synapse design mediated with multiple receptor types, namely AMPA, NMDA and GABAa, and a weight-dependent learning algorithm. Due to various biological conducting mechanisms, the receptors demonstrate different kinetics in response to stimulus. The designed circuit offers distinctive features of receptors as well as the joint synaptic function. An increased computation ability is verified through synchrony detection in a two-layer recurrent network of synapse clusters. The design, implemented in TSMC 65 nm CMOS technology, consumes 1.92, 3.36, 1.11 and 35.22 pJ of energy per spike event for the AMPA, NMDA and GABAa receptors and the advanced learning circuit, respectively.
international symposium on circuits and systems | 2015
Sumeet S. Kumar; Amir Zjajo; Rene van Leuken
The thermal performance of three-dimensional integrated circuits is influenced by a number of design and technology parameters. However, the relationship between these parameters and thermal behaviour of die stacks is complex and not well understood. In this paper, we perform a detailed evaluation of the influence of stack composition and depth, thickness of dies, physical location of power dissipating elements and stack power density on steady-state temperature profiles. We examine how each of these parameters affects heat spread within 3D ICs, and highlight the causes for hotspot formation. The results of our analysis illustrate the implications of effective thermal conductivity on temperature sensing zones on dies, and the significant impact of stack power density on overall operating temperature.
international symposium on computing and networking | 2014
Jurrien de Klerk; Sumeet S. Kumar; Rene van Leuken
This paper presents a runtime resource management scheme named Cache Balancer that improves the utilization of on-chip shared caches and reduces access latencies in chip multiprocessor systems. Cache Balancer incorporates an access rate based memory allocator that improves utilization of on-chip cache resources resulting in up to 60% lower contention at cache banks. Furthermore, it uses information regarding the memory access characteristics of application tasks in order to obtain an optimal task mapping at runtime, and consequently achieves up to 22% lower execution times as compared to existing proposals.
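The idea of allocating memory by access rate can be illustrated with a small sketch. This is not the Cache Balancer algorithm from the paper, whose details are not given in the abstract; it is a generic greedy load-balancing heuristic, and the function name, block names and rates are all hypothetical. It places the most frequently accessed blocks first, each on the cache bank with the lowest accumulated access rate so far.

```python
import heapq

def allocate_by_access_rate(access_rates, num_banks):
    """Hypothetical illustration: map blocks to cache banks, balancing access rate.

    access_rates: dict of block name -> expected accesses per unit time (assumed input).
    Returns a dict of block name -> bank index.
    """
    # Min-heap of (accumulated access rate, bank index).
    heap = [(0.0, bank) for bank in range(num_banks)]
    heapq.heapify(heap)
    placement = {}
    # Place the heaviest blocks first; ties are broken by block name.
    for block, rate in sorted(access_rates.items(), key=lambda kv: (-kv[1], kv[0])):
        load, bank = heapq.heappop(heap)  # least-loaded bank so far
        placement[block] = bank
        heapq.heappush(heap, (load + rate, bank))
    return placement

print(allocate_by_access_rate({"a": 50, "b": 30, "c": 20, "d": 10}, 2))
```

In this toy input the two banks end up with accumulated rates of 60 and 50, instead of one bank absorbing most of the traffic.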
international symposium on circuits and systems | 2014
Sumeet S. Kumar; Rene van Leuken
This paper presents Persistence Selective Caching (PSC), a selective caching scheme that tracks the reusability of L1 data cache (L1D) lines at runtime, and moves lines with sufficient potential for reuse to a low-latency, low-energy assist cache from where subsequent references to them are serviced. The selectivity of PSC is configurable, and can be adjusted to suit the varying memory access characteristics of different applications, unlike existing schemes. By effectively identifying reusable cache lines and storing them in the assist, PSC reduces average memory access time by up to 59% as compared to competing schemes and conventional data caches. Furthermore, by ensuring that only reusable lines are cached by the assist, PSC reduces cache line movements, and thus decreases average energy per access by up to 75% over other assists.
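The general mechanism described above, counting reuse per line and promoting lines that cross a configurable threshold, can be sketched as a toy model. This is an illustration under assumptions, not the PSC hardware design: the class name, the simple hit counter, the LRU replacement in the assist and the default parameters are all hypothetical, and L1D misses and capacity are not modelled.

```python
from collections import OrderedDict

class SelectiveAssistCache:
    """Toy model: lines referenced `threshold` times are promoted to a small assist cache."""

    def __init__(self, threshold=2, assist_capacity=4):
        self.threshold = threshold            # configurable selectivity (assumed knob)
        self.assist_capacity = assist_capacity
        self.reuse_count = {}                 # line -> references seen while in L1D
        self.assist = OrderedDict()           # promoted lines, least recently used first

    def access(self, line):
        """Return 'assist' if the assist cache services the reference, else 'l1d'."""
        if line in self.assist:
            self.assist.move_to_end(line)     # refresh LRU position
            return "assist"
        count = self.reuse_count.get(line, 0) + 1
        self.reuse_count[line] = count
        if count >= self.threshold:
            # The line has shown enough reuse: promote it, evicting the LRU line if full.
            if len(self.assist) >= self.assist_capacity:
                self.assist.popitem(last=False)
            self.assist[line] = True
            del self.reuse_count[line]
        return "l1d"

cache = SelectiveAssistCache(threshold=2, assist_capacity=1)
print([cache.access(x) for x in ["A", "A", "A", "B", "A"]])
```

Raising `threshold` makes the model more selective, so fewer lines move to the assist. That is the trade-off the abstract refers to when it describes PSC's selectivity as configurable.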