Shakti Kapoor | Researchain

Publication

Featured research published by Shakti Kapoor.

design automation conference | 2010

Bridging pre-silicon verification and post-silicon validation

Amir Nahir; Avi Ziv; Miron Abramovici; Albert Camilleri; Rajesh Galivanche; Bob Bentley; Harry Foster; Alan J. Hu; Valeria Bertacco; Shakti Kapoor

Post-silicon validation is a necessary step in a design's verification process. Pre-silicon techniques such as simulation and emulation are limited in scope and volume as compared to what can be achieved on the silicon itself. Some parts of the verification, such as full-system functional verification, cannot be practically covered with current pre-silicon technologies. This panel brings together experts from industry, academia, and EDA to review the differences and similarities between pre- and post-silicon, discuss how the fundamental aspects of verification are affected by these differences, and explore how the gaps between the two worlds can be bridged.

ieee international conference on high performance computing data and analytics | 2007

Optimization of collective communication in intra-cell MPI

M. K. Velamati; Arun Kumar; Naresh Jayam; Ganapathy Senthilkumar; Pallav Kumar Baruah; Raghunath Sharma; Shakti Kapoor; Ashok Srinivasan

The Cell is a heterogeneous multi-core processor, which has eight coprocessors, called SPEs. The SPEs can access a common shared main memory through DMA, and each SPE can directly operate on a small distinct local store. An MPI implementation can use each SPE as if it were a node for an MPI process. In this paper, we discuss the efficient implementation of collective communication operations for intra-Cell MPI, both for cores on a single chip, and for a Cell blade. While we have implemented all the collective operations, we describe in detail the following: barrier, broadcast, and reduce. The main contributions of this work are (i) describing our implementation, which achieves low latencies and high bandwidths using the unique features of the Cell, and (ii) comparing different algorithms, and evaluating the influence of the architectural features of the Cell processor on their effectiveness.
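The abstract compares several algorithms for collectives such as broadcast without naming them. As a general illustration, not necessarily one of the algorithms evaluated in the paper, a binomial-tree broadcast over 8 MPI ranks (one per SPE) finishes in log2(8) = 3 rounds, because the set of ranks already holding the data doubles every round. The function name and the rank renumbering below are hypothetical.

```python
def binomial_broadcast_schedule(num_ranks, root=0):
    """Return a list of rounds; each round is a list of (sender, receiver) pairs.

    Ranks are renumbered relative to `root` so that any rank can be the root.
    Before round k, relative ranks 0 .. 2**k - 1 hold the data; in round k each
    of them sends to the rank 2**k positions further on, doubling the informed set.
    """
    rounds = []
    step = 1
    while step < num_ranks:
        pairs = []
        for rel_src in range(step):
            rel_dst = rel_src + step
            if rel_dst < num_ranks:  # skip partners beyond the last rank
                src = (rel_src + root) % num_ranks
                dst = (rel_dst + root) % num_ranks
                pairs.append((src, dst))
        rounds.append(pairs)
        step *= 2
    return rounds


if __name__ == "__main__":
    for k, pairs in enumerate(binomial_broadcast_schedule(8)):
        print(f"round {k}: {pairs}")
```

A reduce can follow the same tree with the arrows reversed, combining partial results on the way back to the root; whether that beats a flat or pipelined scheme on the Cell depends on the architectural factors the paper evaluates.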

international conference on conceptual structures | 2007

A Buffered-Mode MPI Implementation for the Cell BE™ Processor

Arun Kumar; Ganapathy Senthilkumar; Murali Krishna; Naresh Jayam; Pallav Kumar Baruah; Raghunath Sharma; Ashok Srinivasan; Shakti Kapoor

The Cell Broadband Engine™ is a heterogeneous multi-core architecture developed by IBM, Sony and Toshiba. It has eight computation-intensive cores (SPEs), each with a small local memory, and a single PowerPC core. The SPEs have a total peak single-precision performance of 204.8 Gflop/s, and 14.64 Gflop/s in double precision. Therefore, the Cell has good potential for high performance computing. However, its unconventional architecture makes it difficult to program. We propose an implementation of the core features of MPI as a solution to this problem. This can enable a large class of existing applications to be ported to the Cell. Our MPI implementation attains bandwidth up to 6.01 GB/s, and latency as small as 0.41 μs. The significance of our work is in demonstrating the effectiveness of intra-Cell MPI, consequently enabling the porting of MPI applications to the Cell with minimal effort.
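The title refers to buffered-mode communication, but the abstract does not describe the protocol. For orientation only, the sketch below models MPI buffered-mode semantics in general (as with MPI_Bsend): the send copies the message into a bounded buffer and returns without waiting for a matching receive, and it fails if buffer space runs out. BufferedChannel and its methods are hypothetical names; this is not the authors' implementation.

```python
class BufferedChannel:
    """Toy, single-process model of buffered-mode send/receive semantics.

    Hypothetical illustration: a sender copies its payload into a bounded
    staging buffer and returns at once; a receiver later copies it out.
    """

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.pending = []  # list of (tag, bytes), oldest first

    def bsend(self, payload, tag=0):
        """Copy `payload` into the buffer; completes without any receiver."""
        if self.used + len(payload) > self.capacity:
            # Buffered sends fail when the staging buffer is too small.
            raise BufferError("insufficient buffer space for buffered send")
        # bytes() makes a copy, so the caller may reuse its own buffer at once.
        self.pending.append((tag, bytes(payload)))
        self.used += len(payload)

    def recv(self, tag=0):
        """Return the oldest buffered message with `tag` and free its space."""
        for i, (t, data) in enumerate(self.pending):
            if t == tag:
                del self.pending[i]
                self.used -= len(data)
                return data
        raise LookupError(f"no buffered message with tag {tag}")
```

The copy into the buffer is what lets the sender return early; the cost is an extra memory copy and a hard limit on outstanding data, which matters on a core with a small local store.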

design automation conference | 2014

Post-Silicon Validation of the IBM POWER8 Processor

Amir Nahir; Manoj Dusanapudi; Shakti Kapoor; Kevin Franklin Reick; Wolfgang Roesner; Klaus-Dieter Schubert; Keith Sharp; Greg Wetli

The post-silicon validation phase in a processor's design life cycle is geared towards finding all remaining bugs in the system. It is, in fact, our last opportunity to find functional and electrical bugs in the design before shipping it to customers. In this paper, we provide a high-level overview of the methodology and technologies put into use as part of the POWER8 post-silicon functional validation phase. We describe the results and list the primary factors that contributed to this highly successful bring-up.

acm symposium on parallel algorithms and architectures | 2007

Feasibility study of MPI implementation on the heterogeneous multi-core Cell BE™ architecture

Arun Kumar; Naresh Jayam; Ashok Srinivasan; Ganapathy Senthilkumar; Pallav Kumar Baruah; Shakti Kapoor; Murali Krishna; Raghunath Sarma

The Cell Broadband Engine™ is a new heterogeneous multi-core processor from IBM, Sony, and Toshiba. It contains eight co-processors, called Synergistic Processing Elements (SPEs), which operate directly on distinct 256 KB local stores, and also have access to a shared 512 MB to 2 GB main memory. The combined peak speed of the SPEs is 204.8 Gflop/s in single precision and 14.64 Gflop/s in double precision. There is, therefore, much interest in using the Cell BE™ for high performance computing applications. However, the unconventional architecture of the SPEs, in particular their local stores, creates some programming challenges. We describe our implementation of certain core features of MPI, such as blocking point-to-point calls and collective communication calls, which can help meet these challenges by enabling a large class of MPI applications to be ported to the Cell BE™ processor. This implementation views each SPE as a node for an MPI process. We store the application data in main memory in order to avoid being limited by the local store size. The local store is abstracted in the library and thus hidden from the application with respect to MPI calls. We have achieved bandwidth up to 6.01 GB/s and latency as low as 0.41 μs on the ping-pong test. The contribution of this work lies in (i) demonstrating that the Cell BE™ has good potential for running intra-Cell BE™ MPI applications, (ii) enabling such applications to be ported to the Cell BE™ with minimal effort, and (iii) evaluating the performance impact of different design choices.
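The latency and bandwidth figures above come from a ping-pong test, in which two ranks bounce a message back and forth. A common way to reduce such a run to the two reported metrics is sketched below, assuming the timer covers a fixed number of complete round trips: one-way time is half the average round trip, and bandwidth is message size divided by one-way time. The function name and the 1 GB = 10^9 bytes convention are assumptions; the paper may define its metrics differently.

```python
def pingpong_metrics(message_bytes, total_seconds, iterations):
    """Convert a timed ping-pong run into (one-way seconds, bandwidth in GB/s).

    total_seconds: wall time measured around `iterations` full round trips.
    Assumes 1 GB = 1e9 bytes; some reports use 2**30 instead.
    """
    if iterations <= 0 or total_seconds <= 0:
        raise ValueError("iterations and total_seconds must be positive")
    # Each iteration is two transfers (ping + pong), so halve the per-trip time.
    one_way = total_seconds / (2 * iterations)
    bandwidth_gb_s = message_bytes / one_way / 1e9
    return one_way, bandwidth_gb_s
```

Latency is usually quoted from the smallest (often zero-byte) messages, where one-way time is dominated by software and synchronization overhead, while peak bandwidth is quoted from large messages, where that fixed overhead is amortized.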

international symposium on parallel and distributed processing and applications | 2007

A synchronous mode MPI implementation on the Cell BE™ architecture

Murali Krishna; Arun Kumar; Naresh Jayam; Ganapathy Senthilkumar; Pallav Kumar Baruah; Raghunath Sharma; Shakti Kapoor; Ashok Srinivasan

The Cell Broadband Engine shows much promise in high performance computing applications. The Cell is a heterogeneous multicore processor, with the bulk of the computational workload meant to be borne by eight co-processors called SPEs. Each SPE operates on a distinct 256 KB local store, and all the SPEs also have access to a shared 512 MB to 2 GB main memory through DMA. The unconventional architecture of the SPEs, and in particular their small local store, creates some programming challenges. We have provided an implementation of core features of MPI for the Cell to help deal with this. This implementation views each SPE as a node for an MPI process, with the local store used as if it were a cache. In this paper, we describe synchronous mode communication in our implementation, using the rendezvous protocol, which makes MPI communication for long messages efficient. We further present experimental results on the Cell hardware, where the implementation demonstrates good performance, such as throughput up to 6.01 GB/s and latency as low as 0.65 µs on the ping-pong test. This demonstrates that MPI calls can be implemented efficiently even on the simple SPE cores.
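The abstract states that synchronous-mode sends use the rendezvous protocol. In a rendezvous, the sender first transmits only a small request-to-send (RTS) envelope; the payload moves only after the receiver has posted a matching receive and answers clear-to-send (CTS), so a long message can be copied once, directly into the destination buffer, rather than being staged in an intermediate buffer. Waiting for this handshake also satisfies the MPI synchronous-mode rule that a send completes only after the matching receive has started. The toy model below is a hypothetical single-channel illustration (class and method names are invented; tags and receives posted before the RTS are omitted), not the paper's code.

```python
class RendezvousChannel:
    """Toy model of a rendezvous (RTS / CTS / DATA) handshake on one channel."""

    def __init__(self):
        self.pending_payload = None  # payload held by the sender until CTS
        self.trace = []              # (message kind, size) in the order sent

    def ssend(self, payload):
        """Start a synchronous send: only the RTS envelope (size) goes out."""
        self.trace.append(("RTS", len(payload)))
        self.pending_payload = payload  # data stays with the sender for now

    def post_recv(self, buffer):
        """Post a matching receive: reply CTS, then move the data in one copy."""
        if self.pending_payload is None:
            # Simplification: this model requires the RTS to arrive first.
            raise RuntimeError("no request-to-send pending")
        payload = self.pending_payload
        if len(buffer) < len(payload):
            raise ValueError("receive buffer too small")
        self.trace.append(("CTS", len(buffer)))
        buffer[: len(payload)] = payload  # single direct copy into the receive buffer
        self.trace.append(("DATA", len(payload)))
        self.pending_payload = None
        return len(payload)

    @property
    def send_complete(self):
        """True once the last started send has delivered its data."""
        delivered = any(kind == "DATA" for kind, _ in self.trace)
        return self.pending_payload is None and delivered
```

The extra control round trip makes rendezvous a poor fit for tiny messages, which is why implementations typically reserve it for long messages and use an eager or buffered path for short ones.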

IBM Journal of Research and Development | 2015

Debugging post-silicon fails in the IBM POWER8 bring-up lab

Manoj Dusanapudi; S. Fields; Michael Stephen Floyd; Guy Lynn Guthrie; Ronald Nick Kalla; Shakti Kapoor; Larry Scott Leitner; C. F. Marino; Joseph McGill; Amir Nahir; Kevin Franklin Reick; Hugh Shen; Kenneth L. Wright

Debugging post-silicon fails continues to be a difficult problem that is becoming even more challenging as chips integrate more functionality and implement increasingly complicated functions. Additionally, the complexity of hardware systems, coupled with the difficulty of observing the system state that led to the failure, makes the debugging effort a unique challenge. In this paper, we review the techniques and mechanisms used to facilitate effective debugging in the POWER8™ processor post-silicon validation phase. We further describe several functional bugs and the debugging process that drove the identification of their root cause.

high level design validation and test | 2007

Challenges in post-silicon verification of IBM’s Cell/B.E. and other game processors

Shakti Kapoor

Recent IBM processors used in various computer systems, including gaming systems, are very aggressive designs that address three main challenges of processor design: the memory wall, the power wall, and the ILP wall. To break through these walls, some designs combine multithreading and multiple cores with high clock frequency. Such designs increase the complexity of test-stream generation for processor verification, especially in a stress-test environment. Moreover, the Cell Broadband Engine™ (Cell/B.E.) is a heterogeneous, multi-core, multithreaded design running at high frequency, which further increases the complexity of verification. This paper describes some of the scenarios the post-silicon verification team addressed in verifying the Cell/B.E. and other game processors.

Archive | 2007