Debapriya Chatterjee | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Debapriya Chatterjee is active.

Explore More

Publication

Featured researches published by Debapriya Chatterjee.

design automation conference | 2009

Event-driven gate-level simulation with GP-GPUs

Debapriya Chatterjee; Andrew DeOrio; Valeria Bertacco

Logic simulation is a critical component of the design tool flow in modern hardware development efforts. It is used widely from high level descriptions down to gate level ones to validate several aspects of the design, particularly functional correctness. Despite development houses investing vast resources in the simulation task, particularly at the gate level, it is still far from achieving the performance demands required to validate complex modern designs. In this work, we propose the first event driven logic simulator accelerated by a parallel, general purpose graphics processor (GPGPU). Our simulator leverages a gate level event driven design to exploit the benefits of the low switching activity that is typical of large hardware designs. We developed novel algorithms for circuit netlist partitioning and optimized for a highly parallel GPGPU host. Moreover, our flow is structured to extract the best simulation performance from the target hardware platform. We found that our experimental prototype could handle large, industrial scale designs comprised of millions of gates and deliver a 13x speedup on average over current commercial event driven simulators.

design, automation, and test in europe | 2009

GCS: high-performance gate-level simulation with GP-GPUs

Debapriya Chatterjee; Andrew DeOrio; Valeria Bertacco

In recent years, the verification of digital designs has become one of the most challenging, time consuming and critical tasks in the entire hardware development process. Within this area, the vast majority of the verification effort in industry relies on logic simulation tools. However, logic simulators deliver limited performance when faced with vastly complex modern systems, especially synthesized netlists. The consequences are poor design coverage, delayed product releases and bugs that escape into silicon. Thus, we developed a novel GPU-accelerated logic simulator, called GCS, optimized for large structural netlists. By leveraging the vast parallelism offered by GP-GPUs and a novel netlist balancing algorithm tuned for the target architecture, we can attain an order-of-magnitude performance improvement on average over commercial logic simulators, and simulate large industrial-size designs, such as the OpenSPARC processor core design.

international conference on computer aided design | 2011

Simulation-based signal selection for state restoration in silicon debug

Debapriya Chatterjee; Calvin McCarter; Valeria Bertacco

Post-silicon validation has become a crucial part of modern integrated circuit design to capture and eliminate functional bugs that escape pre-silicon verification. The most critical roadblock in post-silicon validation is the limited observability of internal signals of a design, since this aspect hinders the ability to diagnose detected bugs. A solution to address this issue leverage trace buffers: these are register buffers embedded into the design with the goal of recording the value of a small number of state elements, over a time interval, triggered by a user-specified event. Due to the trace buffers area overhead, only a very small fraction of signals can be traced. Thus, the selection of which signals to trace is of paramount importance in post-silicon debugging and diagnosis. Ideally, we would like to select signals enabling the maximum amount of reconstruction of internal signal values. Several signal selection algorithms for post-silicon debug have been proposed in the literature: they rely on a probability-based state-restoration capacity metric coupled with a greedy algorithm. In this work we propose a more accurate restoration capacity metric, based on simulation information, and present a novel algorithm that overcomes some key shortcomings of previous solutions. We show that our technique provides up to 34% better state restoration compared to all previous techniques while showing a much better trend with increasing trace buffer size.

design automation conference | 2012

SAGA: SystemC acceleration on GPU architectures

Sara Vinco; Debapriya Chatterjee; Valeria Bertacco; Franco Fummi

SystemC is a widespread language for HW/SW system simulation and design exploration, and thus a key development platform in embedded system design. However, the growing complexity of SoC designs is having an impact on simulation performance, leading to limited SoC exploration potential, which in turns affects development and verification schedules and time-to-market for new designs. Previous efforts have attempted to parallelize SystemC simulation, targeting both multiprocessors and GPUs. However, for practical designs, those approaches fall far short of satisfactory performance. This paper proposes SAGA, a novel simulation approach that fully exploits the intrinsic parallelism of RTL SystemC descriptions, targeting GPU platforms. By limiting synchronization events with ad-hoc static scheduling and separate independent dataflows, we shows that we can simulate complex SystemC descriptions up to 16 times faster than traditional simulators.

ACM Transactions on Design Automation of Electronic Systems | 2011

Gate-Level Simulation with GPU Computing

Debapriya Chatterjee; Andrew DeOrio; Valeria Bertacco

Functional verification of modern digital designs is a crucial, time-consuming task impacting not only the correctness of the final product, but also its time to market. At the heart of most of today’s verification efforts is logic simulation, used heavily to verify the functional correctness of a design for a broad range of abstraction levels. In mainstream industry verification methodologies, typical setups coordinate the validation effort of a complex digital system by distributing logic simulation tasks among vast server farms for months at a time. Yet, the performance of logic simulation is not sufficient to satisfy the demand, leading to incomplete validation processes, escaped functional bugs, and continuous pressure on the EDA industry to develop faster simulation solutions. In this work we propose GCS, a solution to boost the performance of logic simulation, gate-level simulation in particular, by more than a factor of 10 using recent hardware advances in Graphic Processing Unit (GPU) technology. Noting the vast available parallelism in the hardware of modern GPUs and the inherently parallel structures of gate-level netlists, we propose novel algorithms for the efficient mapping of complex designs to parallel hardware. Our novel simulation architecture maximizes the utilization of concurrent hardware resources while minimizing expensive communication overhead. The experimental results show that our GPU-based simulator is capable of handling the validation of industrial-size designs while delivering more than an order-of-magnitude performance improvements on average, over the fastest multithreaded simulators commercially available.

international conference on hardware/software codesign and system synthesis | 2012

SystemC simulation on GP-GPUs: CUDA vs. OpenCL

Nicola Bombieri; Sara Vinco; Valeria Bertacco; Debapriya Chatterjee

SystemC is a widespread language for developing SoC designs. Unfortunately, most SystemC simulators are based on a strictly sequential scheduler that heavily limits their performance, impacting verification schedules and time-to-market of new designs. Parallelizing SystemC simulation entails a complete re-design of the simulator kernel for the specific target parallel architectures. This paper proposes an automatic methodology to generate a parallel SystemC simulator kernel, exploiting the massive parallelism of GP-GPU architectures. Our solution leverages static scheduling to reduce synchronization overheads. The generated simulator code targets both CUDA and OpenCL libraries, to boost scalability and provide support for multiple GP-GPU architectures. Finally, the paper compares the performance of our solution on CUDA vs. OpenCL platforms, with the goal of investigating advantages and drawbacks that the two thread management libraries offer to concurrent SystemC simulation.

design automation conference | 2012

Checking architectural outputs instruction-by-instruction on acceleration platforms

Debapriya Chatterjee; Anatoly Koyfman; Ronny Morad; Avi Ziv; Valeria Bertacco

Simulation-based verification is an integral part of a modern microprocessors design effort. Commonly, several checking techniques are deployed alongside the simulator to detect and localize each functional bug manifestation. Among these, a widespread technique entails comparing a microprocessor designs outputs with a golden model at the architectural granularity, instruction-by-instruction. However, due to exponential growth in design complexity, the performance of software-based simulation falls far short of achieving an acceptable level of coverage, which typically requires billions of simulation cycles. Hence, verification engineers rely on simulation acceleration platforms. Unfortunately, the intrinsic characteristics of these platforms make the adoption of the checking solutions mentioned above a challenging goal: for instance, the lockstep execution of a software checker together with the designs simulation is no longer feasible. To address this challenge we propose an innovative solution for instruction-by-instruction (IBI) checking tailored to acceleration platforms. We provide novel design techniques to decouple event tracing from checking by including specialized tracing logic and by adding a post-simulation checking phase. Note that simulation performance in acceleration platforms degrades when increasing the number of signals that are traced; hence, it is imperative to generate a compact summary of the information required for checking, collecting and tracing only a few bits of information per cycle.

design, automation, and test in europe | 2013

On the use of GP-GPUs for accelerating compute-intensive EDA applications

Valeria Bertacco; Debapriya Chatterjee; Nicola Bombieri; Franco Fummi; Sara Vinco; Anirudh M. Kaushik; Hiren D. Patel

General purpose graphics processing units (GP-GPUs) have recently been explored as a new computing paradigm for accelerating compute-intensive EDA applications. Such massively parallel architectures have been applied in accelerating the simulation of digital designs during several phases of their development - corresponding to different abstraction levels, specifically: (i) gate-level netlist descriptions, (ii) register-transfer level and (iii) transaction-level descriptions. This embedded tutorial presents a comprehensive analysis of the best results obtained by adopting GP-GPUs in all these EDA applications.

design, automation, and test in europe | 2012

Approximating checkers for simulation acceleration

Biruk Mammo; Debapriya Chatterjee; Dmitry Pidan; Amir Nahir; Avi Ziv; Ronny Morad; Valeria Bertacco

Simulation-based functional verification is the key validation methodology the industry. The performance of logic simulators, however, is not sufficient to attain acceptable verification coverage on large industrial designs within the time-frame available. Acceleration platforms are a valuable addition to the verification effort in that they can provide much higher coverage in less time. Unfortunately, these platforms do not provide the rich checking capability of software-based simulation. We propose a novel solution to deploy those complex checkers, typical of simulation-based environments, onto acceleration platforms. To this end, checkers must be transformed into synthesizable, compact logic blocks with bug-detection capabilities similar to that of their software counterparts. Our “approximate checkers” trade off logic complexity with bug detection accuracy by leveraging novel techniques to approximate complex software checkers into small synthesizable hardware blocks, which can be simulated along with the design on an acceleration platform. We present a general checker taxonomy, propose a range of approximation techniques based on a checkers characteristic and provide metrics for evaluating its bug detection capabilities.

design, automation, and test in europe | 2014

ArChiVED: Architectural checking via event digests for high performance validation

Chang-Hong Hsu; Debapriya Chatterjee; Ronny Morad; Raviv Ga; Valeria Bertacco

Simulation-based techniques play a key role in validating the functional correctness of microprocessor designs. A common approach for validating microprocessors (called instruction-by-instruction, or IBI checking) consists of running a RTL and an architectural simulation in lock-step, while comparing processor architectural state at each instruction retirement. This solution, however, cannot be deployed on long regression tests, because of the limited performance of RTL simulators. Acceleration platforms have the performance power to overcome this issue, but are not amenable to the deployment of an IBI checking methodology. Indeed, validation on these platforms requires logging activity on-platform and then checking it against a golden model off-platform. Unfortunately, an IBI checking approach following this paradigm entails a large slowdown for the acceleration platform, because of the sizable amount of data that must be transferred off-platform for comparison against the golden model. In this work we propose a sequence-by-sequence (SBS) checking approach that is efficient and practical for acceleration platforms. Our solution validates the test execution over sequences of instructions (instead of individual ones), thus greatly reducing the amount of data transferred for off-platform checking. We found that SBS checking delivers the same bug-detection accuracy as traditional IBI checking, while reducing the amount of traced data by more than 90%.

Explore More