Dipan Kumar Mandal
Texas Instruments
Publication
Featured research published by Dipan Kumar Mandal.
international symposium on circuits and systems | 2014
Dipan Kumar Mandal; Jagadeesh Sankaran; Akshay Gupta; Kyle Castille; Shraddha Gondkar; Sanmati Kamath; Pooja Sundar; Alan Phipps
This paper introduces the Embedded Vision Engine (EVE), a fully programmable, specialized vector processor architecture aimed at the challenging computer vision applications encountered in Advanced Driver Assistance Systems (ADAS). The paper outlines the complexity of automotive vision applications, establishes why a specialized architecture such as EVE is needed, and describes the EVE architecture, its components, and its programming model. We present comparative benchmarks and provide an overview of carefully crafted EVE features for power management, inter-processor communication, functional safety, and software debug that help in building scalable, area- and power-efficient System-on-Chip (SoC) solutions for the cost-, power-, and safety-sensitive automotive vision space.
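As a rough illustration of the kind of workload EVE is built to absorb, the sketch below shows a plain C 3x3 box filter: the per-pixel inner loop is the regular, data-parallel computation a wide vector unit accelerates. This is a generic scalar reference kernel, not EVE's actual kernel programming model; the function name and signature are hypothetical.

    #include <stdint.h>

    /* Illustrative 3x3 box filter over an 8-bit grayscale image.
     * Each output pixel is independent of the others, which is the
     * kind of data-parallel work a vision vector processor targets;
     * this scalar C version is only a functional reference. */
    void box3x3(const uint8_t *src, uint8_t *dst, int width, int height)
    {
        for (int y = 1; y < height - 1; y++) {
            for (int x = 1; x < width - 1; x++) {
                int sum = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                        sum += src[(y + dy) * width + (x + dx)];
                dst[y * width + x] = (uint8_t)(sum / 9);
            }
        }
    }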
international symposium on circuits and systems | 2014
Mihir Mody; Hrushikesh Garud; Soyeb Nagori; Dipan Kumar Mandal
This paper presents a high-performance, silicon-area-efficient, and software-configurable hardware architecture for sample adaptive offset (SAO) encoding. The paper proposes a novel architecture consisting of single largest coding unit (LCU) stage SAO operation, a unified data path for luma and chroma channels, and add-on external interfaces on the frame-level statistics collection units that allow fine control over the parameter estimation process, flexible rate control, and artifact avoidance algorithms. The unified data path uses 2D block-based processing with three pipeline stages for statistics generation and multiple offset rate-distortion cost estimation blocks for high performance. After placement and routing, the proposed design is expected to take up approximately 0.15 mm² of silicon area in a 28 nm CMOS process. At 200 MHz, the design supports 4K Ultra HD video encoding at 60 fps. Simulation experiments show an average bit-rate saving of up to 4.3% with in-loop SAO filtering across various encoder configurations.
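For context on what the statistics-generation stage computes, the sketch below classifies one reconstructed sample into an HEVC SAO edge-offset category from its two neighbours along the chosen direction; an encoder runs this per sample and accumulates per-category sums and counts before selecting offsets by rate-distortion cost. The classification follows the HEVC specification, but the function name and surrounding structure are illustrative, not the paper's hardware pipeline.

    /* HEVC SAO edge-offset classification for one reconstructed sample c
     * with its two neighbours a and b along the chosen direction.
     * Returns category 0..4 (0 = no offset, 1 = local minimum,
     * 4 = local maximum), matching the edgeIdx remapping in the spec. */
    static int sign3(int v) { return (v > 0) - (v < 0); }

    int sao_eo_category(int a, int c, int b)
    {
        int edge_idx = 2 + sign3(c - a) + sign3(c - b);
        if (edge_idx <= 2)
            return (edge_idx == 2) ? 0 : edge_idx + 1;
        return edge_idx;   /* 3 or 4 */
    }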
international symposium on circuits and systems | 2014
Hetul Sanghvi; Mihir Mody; Niraj Nandan; Mahesh Mehendale; Subrangshu Das; Dipan Kumar Mandal; Pavan Shastry
Video codec standards like H.264 and HEVC are driving the need for high computation and high memory bandwidth in current SoCs. On the other hand, portable devices like smartphones and tablets are driving the need to reduce power consumption for longer battery life. In this paper, we present a scalable H.264 Ultra HD video codec engine that dissipates 9 mW of decode power and 18 mW of encode power (for a typical High Profile H.264 1080p30 bit-stream) in a 28 nm low-power process technology node, using low-power optimization techniques across architecture, design, circuit, software, and systems.
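As background on why voltage and clock frequency are the main levers such low-power designs pull, the sketch below evaluates the standard CMOS dynamic-power relation P = a·C·V²·f at two operating points. The activity factor, capacitance, voltage, and frequency values are made-up illustrative numbers, not figures from the paper.

    #include <stdio.h>

    /* Standard CMOS dynamic-power relation P = a * C * V^2 * f.
     * All values below are illustrative assumptions; the point is only
     * that lowering V and f together pays off super-linearly, which is
     * why codec engines clock-gate and run at the lowest frequency that
     * still meets real-time deadlines. */
    int main(void)
    {
        double a = 0.15;     /* switching activity factor    */
        double C = 2e-9;     /* switched capacitance, farads */

        double p_hi = a * C * 1.1 * 1.1 * 400e6;   /* 1.1 V, 400 MHz */
        double p_lo = a * C * 0.9 * 0.9 * 200e6;   /* 0.9 V, 200 MHz */

        printf("P(1.1 V, 400 MHz) = %.1f mW\n", p_hi * 1e3);
        printf("P(0.9 V, 200 MHz) = %.1f mW\n", p_lo * 1e3);
        return 0;
    }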
international conference on acoustics, speech, and signal processing | 2014
Hetul Sanghvi; Mihir Mody; Niraj Nandan; Mahesh Mehendale; Subrangshu Das; Dipan Kumar Mandal; Nainala Vyagrheswarudu; Vijayavardhan Baireddy; Pavan Shastry
With advances in video coding standards like H.264 and HEVC, coupled with advances in display technology, Ultra HD content has started going mainstream. This is driving the need for high computation and memory bandwidth in current multimedia SoCs. In this paper, we present a monolithic multi-format video codec engine that achieves Ultra HD performance for H.264 High Profile, reduces the external memory bandwidth requirement by 2x compared to its predecessor, and occupies only 5.9 mm² of silicon area in a low-power 28 nm process.
international conference on consumer electronics | 2015
Dipan Kumar Mandal; Mihir Mody; Mahesh Mehendale; Naresh Yadav; Ghone Chaitanya; Piyali Goswami; Hetul Sanghvi; Niraj Nandan
Video coding standards (e.g., H.264, HEVC) use the slice, consisting of a header and payload video data, as an independent coding unit for low-latency encode-decode and better resilience to transmission errors. In typical video streams, decoding the slice header is simple enough to be done on standard embedded RISC processor architectures. However, universal decoding scenarios require handling worst-case slice header complexity, which grows to an unmanageable level, well beyond the capacity of most embedded RISC processors. Hardwiring the slice-processing control logic is potentially helpful, but it reduces the flexibility to tune the decoder for error conditions, an important differentiator for the end user. This paper presents a programmable approach to accelerating slice header decoding using an Application Specific Instruction Set Processor (ASIP). Purpose-built instructions, implemented as extensions to a RISC processor (ARP32), accelerate slice processing by 30% in typical cases, reaching up to 70% for slices with worst-case decoding complexity. The approach enables real-time universal video decoding across all slice-complexity scenarios without sacrificing the flexibility to customize and differentiate the codec solution via software programmability.
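Much of a slice header is coded with Exp-Golomb (ue(v)/se(v)) syntax elements, whose bit-serial, data-dependent parsing is what strains a general-purpose RISC core. The sketch below shows a minimal software bitstream reader and ue(v) decoder in C as an illustrative baseline for the kind of work custom bit-parsing instructions accelerate; it is not the ARP32 extension itself, and the type and function names are hypothetical.

    #include <stdint.h>
    #include <stddef.h>

    /* Minimal MSB-first bitstream reader and unsigned Exp-Golomb (ue(v))
     * decoder, as used when parsing H.264/HEVC slice headers in software.
     * Each syntax element costs a data-dependent chain of single-bit
     * reads and branches. */
    typedef struct {
        const uint8_t *buf;
        size_t         bitpos;   /* absolute bit position in buf */
    } bitreader_t;

    static unsigned read_bit(bitreader_t *br)
    {
        unsigned bit = (br->buf[br->bitpos >> 3] >> (7 - (br->bitpos & 7))) & 1u;
        br->bitpos++;
        return bit;
    }

    static unsigned read_ue(bitreader_t *br)
    {
        int leading_zeros = 0;
        while (read_bit(br) == 0)
            leading_zeros++;
        unsigned value = 0;
        for (int i = 0; i < leading_zeros; i++)
            value = (value << 1) | read_bit(br);
        return (1u << leading_zeros) - 1u + value;
    }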
international conference on communications | 2014
Prashant Karandikar; Mihir Mody; Hetul Sanghvi; Vasant Easwaran; Y A Prithvi Shankar; Rahul Gulati; Neeraj Nandan; Dipan Kumar Mandal; Subrangshu Das
A typical multimedia SoC consists of all or a subset of hardware components for image capture and processing, video compression and decompression, computer vision, graphics, and display processing. Each of these components accesses and competes for the limited bandwidth available in the shared external memory. Meeting latency (e.g., display) and throughput (e.g., video encode) requirements is a critical problem for such SoCs. In typical SoCs, this problem is addressed using system-level caches. In this paper, however, we present results indicating that system-level caches are not beneficial for multimedia traffic, either in terms of DDR bandwidth savings or latency reduction. We also discuss the features a system cache would need in order to improve multimedia performance in such SoCs.
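To make the bandwidth-contention point concrete, the sketch below adds up raw DDR traffic for two hypothetical streams sharing one external memory. All stream parameters (display format, encoder reference traffic, frame rates) are illustrative assumptions, not measurements from the paper.

    #include <stdio.h>

    /* Back-of-the-envelope DDR bandwidth budget for a multimedia SoC.
     * The streams and their parameters are illustrative assumptions:
     * a 1080p60 ARGB display refresh and a 1080p30 NV12 encoder that
     * reads source plus roughly 2x reference data and writes back the
     * reconstructed frame. */
    int main(void)
    {
        double MB = 1e6;

        double display_rd = 1920.0 * 1080 * 4 * 60 / MB;        /* ~498 MB/s */
        double enc_rd     = 1920.0 * 1080 * 1.5 * 30 * 3 / MB;  /* ~280 MB/s */
        double enc_wr     = 1920.0 * 1080 * 1.5 * 30 / MB;      /* ~93 MB/s  */

        double total = display_rd + enc_rd + enc_wr;
        printf("aggregate DDR traffic: %.0f MB/s\n", total);
        return 0;
    }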
Archive | 2012
Kanika Ghai Bansal; Dipan Kumar Mandal; Gary A. Cooper; Bryan Thome
Archive | 2006
Dipan Kumar Mandal; Bryan Thome
Archive | 2010
Dipan Kumar Mandal; Brian Joseph Thome