Hetul Sanghvi | Researchain

Publication

Featured research published by Hetul Sanghvi.

IEEE International Conference on Image Information Processing | 2013

Low power architecture for motion compensation in a 4K Ultra-HD AVC and HEVC video codec system

Hetul Sanghvi

Motion compensation in a video codec is the step in which blocks of pixels from a reference picture are fetched and interpolated to form the prediction image for the current picture. Implementing this functionality in hardware poses several challenges: (1) efficiently identifying the minimum set of reference pixels in external memory needed to produce the required prediction image, (2) fetching these pixels from external memory at a rate that matches 4K Ultra-HD frame processing, and (3) performing the processing in a power-optimal manner. This paper describes a motion compensation hardware architecture that integrates command preparation, a 2D reference pixel data caching scheme, a DMA engine, and a power-efficient pixel interpolation engine. For a 4K Ultra-HD decoder, the 2D caching technique reduces LPDDR2 SDRAM power by up to 70 mW and bandwidth by 800 MB/s (a 50% reduction), increasing typical 1080p30 HDMI playback time by 2 hours. The motion compensation hardware module dissipates 3 mW for 1080p30 decode.
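
To make challenge (1) concrete, the sketch below computes the reference-picture window that one luma block needs, assuming HEVC's 8-tap luma interpolation filter and quarter-sample motion vectors (3 samples of filter support before the block and 4 after, in each dimension whose MV component is fractional). For AVC's 6-tap luma filter the support would be 2 before and 3 after. The function name and the example block are illustrative only; this is not the paper's command-preparation logic.

```python
from typing import NamedTuple

class Window(NamedTuple):
    x0: int
    y0: int
    width: int
    height: int

def luma_ref_window(x: int, y: int, w: int, h: int, mv_x: int, mv_y: int,
                    before: int = 3, after: int = 4) -> Window:
    """Reference window needed to predict a w x h luma block at (x, y).

    mv_x/mv_y are in quarter-sample units. Defaults assume HEVC's 8-tap
    luma filter. Coordinates are not clipped to the picture; samples
    outside the picture would be produced by boundary padding.
    """
    # Split each MV component into integer and fractional (quarter-sample) parts.
    x_int, x_frac = x + (mv_x >> 2), mv_x & 3
    y_int, y_frac = y + (mv_y >> 2), mv_y & 3
    # Only a fractional component needs filter support, and only in its own dimension.
    if x_frac:
        x_int, w = x_int - before, w + before + after
    if y_frac:
        y_int, h = y_int - before, h + before + after
    return Window(x_int, y_int, w, h)

if __name__ == "__main__":
    # 8x8 block at (64, 32) with MV (+1.25, -1.5) pixels.
    print(luma_ref_window(64, 32, 8, 8, mv_x=5, mv_y=-6))
    # Window(x0=62, y0=27, width=15, height=15)
```

In this example 225 reference pixels are fetched to predict 64 output pixels, which illustrates why small partitions with fractional motion vectors weigh heavily on external memory bandwidth.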

Advances in Computing and Communications | 2014

High performance and flexible imaging Sub-system

Mihir Mody; Hetul Sanghvi; Niraj Nandan; Shashank Dabral; Rajasekhar Allu; Dharmendra Soni; Sunil Sah; Gayathri Seshadri; Prashant Karandikar

An Imaging Sub-system (ISS) enables capturing photographs or live video from raw image sensors. It consists of a set of sensor interfaces and a cascaded set of algorithms that improve image/video quality. This paper illustrates typical imaging sub-system architectures consisting of a sensor front end, an Image Signal Processor (ISP), and an Image Co-processor (sIMCOP). Here we describe the ISS developed by Texas Instruments (TI) for the OMAP 5432 processor. The solution can interface with various kinds of image sensors and provides hooks to tune visual quality for specific customers and end applications. It also offers options to enable customized data flows based on actual algorithm needs. The overall solution sustains a throughput of 1 pixel per clock cycle, enabling full HD video at high visual quality.
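
As a back-of-the-envelope check of what a throughput of 1 pixel per clock cycle implies, the sketch below computes the minimum clock rate for full HD. It assumes 1920×1080 active pixels and ignores sensor blanking intervals and pipeline stalls, so a real ISS clock would need headroom above these figures; the frame rates are examples, not values from the paper.

```python
def min_clock_hz(width: int, height: int, fps: float, pixels_per_clock: float = 1.0) -> float:
    """Minimum clock rate needed to process every active pixel at the given frame rate."""
    return width * height * fps / pixels_per_clock

if __name__ == "__main__":
    for fps in (30, 60):
        mhz = min_clock_hz(1920, 1080, fps) / 1e6
        print(f"1080p{fps}: {mhz:.1f} MHz")
    # 1080p30: 62.2 MHz
    # 1080p60: 124.4 MHz
```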

International Conference on Consumer Electronics | 2014

2D cache architecture for motion compensation in a 4K Ultra-HD AVC and HEVC video codec system

Hetul Sanghvi

Motion compensation in an AVC or HEVC video codec fetches reference pixels stored in external SDRAM and interpolates them to form the predictor image. These fetches account for a significant share (70-80%) of total SDRAM bandwidth and hence drive the bandwidth requirements. There is a lot of overlap between the reference data required by different partitions. This paper describes a 2D (block-based) caching scheme that exploits the commonality of reference pixel fetches across partitions, thereby reducing SDRAM bandwidth and power. Prior techniques rely heavily on a video CPU to achieve this, and even then do so only partially. For a 4K Ultra-HD decoder, this technique reduces LPDDR2 SDRAM power by up to 70 mW and bandwidth by 800 MB/s (a 50% reduction), increasing typical 1080p30 HDMI playback time by 2 hours.
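
The general idea of a 2D (block-based) reference cache can be sketched as follows: the reference picture is divided into fixed-size 2D tiles, each partition's fetch window is mapped to the tiles it covers, and only tiles missing from the cache are read from SDRAM. The tile size (16×4), the capacity, and the fully associative LRU policy are assumptions for illustration; the abstract does not specify the paper's cache organization.

```python
from collections import OrderedDict

class Block2DCache:
    """Toy fully associative LRU cache of 2D reference-pixel tiles (hypothetical parameters)."""

    def __init__(self, tile_w: int = 16, tile_h: int = 4, capacity: int = 64):
        self.tile_w, self.tile_h, self.capacity = tile_w, tile_h, capacity
        self.tiles = OrderedDict()  # (tile_x, tile_y) -> None, ordered least to most recently used
        self.hits = self.misses = 0

    def fetch(self, x0: int, y0: int, w: int, h: int) -> int:
        """Request window [x0, x0+w) x [y0, y0+h); return the number of tiles read from SDRAM."""
        reads = 0
        for ty in range(y0 // self.tile_h, (y0 + h - 1) // self.tile_h + 1):
            for tx in range(x0 // self.tile_w, (x0 + w - 1) // self.tile_w + 1):
                key = (tx, ty)
                if key in self.tiles:
                    self.hits += 1
                    self.tiles.move_to_end(key)  # mark as most recently used
                else:
                    self.misses += 1
                    reads += 1
                    self.tiles[key] = None
                    if len(self.tiles) > self.capacity:
                        self.tiles.popitem(last=False)  # evict the least recently used tile
        return reads

if __name__ == "__main__":
    cache = Block2DCache()
    # Two 15x15 fetch windows from neighbouring partitions, overlapping by 7 columns.
    print(cache.fetch(62, 27, 15, 15))  # 10
    print(cache.fetch(70, 27, 15, 15))  # 5
    print(cache.hits, cache.misses)     # 5 15
```

The second partition reuses the five tiles shared with the first, so it reads only half as many tiles from SDRAM; a hardware design would of course use fixed tag arrays rather than a dictionary.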

International Symposium on Circuits and Systems | 2014

A 28nm programmable and low power ultra-HD video codec engine

Hetul Sanghvi; Mihir Mody; Niraj Nandan; Mahesh Mehendale; Subrangshu Das; Dipan Kumar Mandal; Pavan Shastry

Video codec standards such as H.264 and HEVC are driving the need for high computation and high memory bandwidth in current SoCs. At the same time, portable devices such as smartphones and tablets are driving the need to reduce power consumption for longer battery life. In this paper, we present a scalable H.264 Ultra-HD video codec engine that dissipates 9 mW in decode and 18 mW in encode (for a typical H.264 High Profile 1080p30 bitstream) in a 28 nm low-power process technology node, using low-power optimization techniques across architecture, design, circuits, software, and systems.

International Symposium on Signal Processing and Information Technology | 2014

Customization of de-blocking filter edge order for high performance: Study of H.264 AVC/SVC, H.265

Niraj Nandan; Mihir Mody; Hetul Sanghvi; Prithvi Shankar

Transform and quantization in block-based video codecs introduce blocking artifacts at block edges. A specially optimized filter, the de-blocking filter, is applied on 4×4/8×8 block boundaries to enhance visual quality and improve prediction efficiency. Most recent video codecs, including H.264, H.265 (HEVC), and VC-1, use an in-loop de-blocking filter (LPF) in the decoder path. Each video codec standard defines a fixed order of filter operations so that all decoders produce consistent output. This standard-defined edge order is not optimal for many de-blocking hardware accelerator (HWA) architectures, which must then compromise on performance, power, or area. Pipelining unfiltered pixel loads with filter operations, providing internal storage for partially or fully filtered pixels, and ordering the storage of fully filtered pixels are among the challenges that are difficult to meet with the standard-defined edge order. This paper discusses a novel approach to customizing the edge order for differing architectural requirements and for various video codec standards. The filtered output produced with the optimized edge order matches that produced with the standard-defined edge order.
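
For reference, the standard-defined baseline can be enumerated for H.264 luma: macroblocks are processed in raster order, and within each macroblock the four vertical edges are filtered left to right before the four horizontal edges are filtered top to bottom; edges on the picture boundary are not filtered. The sketch below lists only that baseline order. It is simplified (it ignores the 8×8-transform case that skips internal edges, disable_deblocking_filter_idc, field/MBAFF coding, and chroma) and does not reproduce the paper's customized order.

```python
from typing import Iterator, NamedTuple

class Edge(NamedTuple):
    mb_x: int
    mb_y: int
    direction: str  # "V" = vertical edge, "H" = horizontal edge
    offset: int     # luma sample offset of the edge inside the 16x16 macroblock

def h264_luma_edge_order(mbs_wide: int, mbs_high: int) -> Iterator[Edge]:
    """Yield H.264 luma deblocking edges in the standard-defined order (simplified)."""
    for mb_y in range(mbs_high):
        for mb_x in range(mbs_wide):
            # Vertical edges first, left to right; the picture's left boundary is skipped.
            for offset in (0, 4, 8, 12):
                if not (offset == 0 and mb_x == 0):
                    yield Edge(mb_x, mb_y, "V", offset)
            # Then horizontal edges, top to bottom; the picture's top boundary is skipped.
            for offset in (0, 4, 8, 12):
                if not (offset == 0 and mb_y == 0):
                    yield Edge(mb_x, mb_y, "H", offset)

if __name__ == "__main__":
    for edge in list(h264_luma_edge_order(2, 1))[:8]:
        print(edge)
```

Because each horizontal edge operates on samples already modified by the vertical edges of the same macroblock, any reordering in hardware has to respect such data dependencies to keep the output bit-exact with this order.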

International Conference on Acoustics, Speech, and Signal Processing | 2014

A monolithic programmable Ultra-HD video codec engine

Hetul Sanghvi; Mihir Mody; Niraj Nandan; Mahesh Mehendale; Subrangshu Das; Dipan Kumar Mandal; Nainala Vyagrheswarudu; Vijayavardhan Baireddy; Pavan Shastry

With advances in video coding standards such as H.264 and HEVC, coupled with advances in display technology, Ultra HD content has started to enter the mainstream. This is driving the need for high computation and memory bandwidth in current multimedia SoCs. In this paper, we present a monolithic multi-format video codec engine that achieves Ultra HD performance for H.264 High Profile, halves the external memory bandwidth requirement compared with its predecessor, and occupies only 5.9 mm² of silicon area in a low-power 28 nm process.

International Conference on Consumer Electronics | 2015

Accelerating H.264/HEVC video slice processing using application specific instruction set processor

Dipan Kumar Mandal; Mihir Mody; Mahesh Mehendale; Naresh Yadav; Ghone Chaitanya; Piyali Goswami; Hetul Sanghvi; Niraj Nandan

Video coding standards (e.g., H.264 and HEVC) use the slice, consisting of a header and payload video data, as an independent coding unit for low-latency encoding and decoding and for better resilience to transmission errors. In typical video streams, decoding the slice header is simple enough to be done on standard embedded RISC processors. However, universal decoding requires handling worst-case slice header complexity, which grows to an unmanageable level well beyond the capacity of most embedded RISC processors. Hardwiring the slice processing control logic could help, but it reduces the flexibility to tune the decoder for error conditions, an important differentiator for the end user. This paper presents a programmable approach to accelerating slice header decoding using an Application Specific Instruction Set Processor (ASIP). Purpose-built instructions, implemented as extensions to a RISC processor (ARP32), accelerate slice processing by 30% for typical cases and by up to 70% for slices with worst-case decoding complexity. The approach enables real-time universal video decoding for all slice-complexity scenarios without sacrificing the flexibility to customize and differentiate the codec solution through software programmability.
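
Much of the H.264 and HEVC slice-header syntax is coded with Exp-Golomb codes, ue(v) and se(v), whose bit-serial parsing is one plausible target for instruction-set extensions of this kind (an assumption; the abstract does not list the ARP32 instructions). The sketch below is a generic software decoder for these codes, for illustration only; `BitReader` is a hypothetical helper, not part of any TI software.

```python
class BitReader:
    """Hypothetical MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read_bit(self) -> int:
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value

    def ue(self) -> int:
        """Unsigned Exp-Golomb ue(v): count leading zeros, then read that many suffix bits."""
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)

    def se(self) -> int:
        """Signed Exp-Golomb se(v): maps codeNum 1, 2, 3, 4, ... to 1, -1, 2, -2, ..."""
        k = self.ue()
        return (k + 1) // 2 if k & 1 else -(k // 2)

if __name__ == "__main__":
    # Bits 1 010 011 00100 00101, zero-padded to three bytes.
    reader = BitReader(bytes([0xA6, 0x42, 0x80]))
    print([reader.ue() for _ in range(5)])  # [0, 1, 2, 3, 4]
```

Each codeword costs a data-dependent loop of single-bit operations in plain software, which is the kind of overhead that dedicated instructions (for example, a leading-zero count on the bitstream) can reduce.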

International Conference on Communications | 2014

Generic transfer scheme for outputting video frame in Ultra-HD video Engine

Mihir Mody; Niraj Nandan; Hetul Sanghvi

Typically, industry video solutions perform in-loop filtering on the fly (e.g., at the macroblock level for H.264 or the coding unit level for HEVC) and transfer YUV data to external memory using a DMA engine. These transfers are unaligned with the processing unit (MB/CU) and variable in size because of the loop filtering operation. This poses challenges for the loop filter and the DMA engine in handling such transfers. This paper proposes the concept of a “region” to handle such transfers for H.264 and previous-generation standards. For H.264, the paper proposes dividing the video frame into nine regions with common transfer attributes. For HEVC, TILES (with and without loop filtering) add complexity and create a variable number of regions in the output frame. The paper also proposes a flexible and generic scheme to handle output YUV transfers for TILES in HEVC. The updated scheme is flexible enough to also cover previous-generation video standards such as H.264, with better performance.
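
One plausible reading of the nine-region idea, sketched under explicit assumptions: if pixels near a macroblock's right and bottom edges can still be modified when its neighbours are filtered, each macroblock's output window is shifted up and left by a fixed delay D. Macroblocks in the first column or row then output a smaller area, those in the last column or row output a larger area, and the rest output a full 16×16 block, so classifying by column (first/middle/last) and row (first/middle/last) gives 3 × 3 = 9 regions. The delay value, function names, and region labels are hypothetical; the abstract does not specify how the paper defines its regions.

```python
from typing import NamedTuple

MB = 16  # macroblock size in luma samples
D = 4    # hypothetical output delay in samples (must cover pixels still modified by later filtering)

class Transfer(NamedTuple):
    region: str
    x0: int
    y0: int
    width: int
    height: int

def _span(index: int, count: int) -> tuple[str, int, int]:
    """Return (position label, start, length) along one axis for macroblock `index` of `count`."""
    start = 0 if index == 0 else index * MB - D
    end = count * MB if index == count - 1 else (index + 1) * MB - D
    label = "first" if index == 0 else "last" if index == count - 1 else "middle"
    return label, start, end - start

def output_transfer(mb_x: int, mb_y: int, mbs_wide: int, mbs_high: int) -> Transfer:
    """Luma rectangle that can be written out once macroblock (mb_x, mb_y) is filtered."""
    col, x0, w = _span(mb_x, mbs_wide)
    row, y0, h = _span(mb_y, mbs_high)
    return Transfer(f"{row}-row/{col}-col", x0, y0, w, h)

if __name__ == "__main__":
    # 1920x1088 coded frame = 120 x 68 macroblocks.
    for pos in [(0, 0), (5, 5), (119, 67)]:
        print(output_transfer(*pos, 120, 68))
```

With these assumptions the per-macroblock rectangles tile the whole frame exactly, and every macroblock in the same region shares one transfer size, which is what makes a small, fixed set of DMA configurations sufficient.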

International Conference on Communications | 2014

Understanding System Level Caching Behavior in Multimedia SoC

Prashant Karandikar; Mihir Mody; Hetul Sanghvi; Vasant Easwaran; Y A Prithvi Shankar; Rahul Gulati; Neeraj Nandan; Dipan Kumar Mandal; Subrangshu Das

A typical multimedia SoC consists of all or a subset of hardware components for image capture and processing, video compression and decompression, computer vision, graphics, and display processing. Each of these components accesses, and competes for, the limited bandwidth of the shared external memory. Meeting latency (e.g., display) and throughput (e.g., video encode) requirements is a critical problem for such SoCs. In typical SoCs, this problem is addressed using system-level caches. However, in this paper we show results indicating that system-level caches are not beneficial for multimedia traffic, either for DDR bandwidth savings or for latency reduction. We also show results for desirable features that improve multimedia performance in SoCs using a system cache.

IEEE International Conference on Electronics Computing and Communication Technologies | 2014

Performance estimation and architecture exploration of a video IP design in a smart phone SoC context

Gautam Hazari; Prashant Karandikar; Hetul Sanghvi

H.264 and HEVC are popular video standards for current and next-generation systems. Texas Instruments has developed an IP architecture for real-time encoding and decoding of 4k×2k video streams at 30 frames per second (fps) or 1080p streams at 120 fps in these standards. We present the architecture modeling methodology that drives the design and performance validation of the IP. The IP model contains three key components. First, a pipeline modeling framework captures the dynamic control flow across the algorithm blocks, which are implemented as hardware modules; the framework enables quick representation of all pipeline configurations under consideration. Second, micro-architecture models represent the DMA engines. Third, trace players generate the L2 and L3 traffic for various use cases expressed as traces. The IP model is integrated into an SoC model to study the impact of system traffic on the L3 responses. We run simulations to evaluate the design against the performance targets and to explore the architecture. Our H.264 results show that the IP meets its targets for typical use cases but is at risk for worst-case streams. Our HEVC architecture exploration study enables us to select recommended design configurations. We see this as a powerful methodology for performance validation and architecture definition.
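
A quick calculation suggests why the two operating points are quoted together: assuming 4k×2k means 3840×2160 (the abstract does not give exact dimensions), both require the same luma sample rate. The sketch also counts 16×16 macroblocks per second, rounding frame dimensions up to whole macroblocks as H.264 coding does.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Luma samples per second."""
    return width * height * fps

def macroblock_rate(width: int, height: int, fps: int, mb: int = 16) -> int:
    """Macroblocks per second; partial macroblock rows/columns round up."""
    mbs_wide = (width + mb - 1) // mb
    mbs_high = (height + mb - 1) // mb
    return mbs_wide * mbs_high * fps

if __name__ == "__main__":
    cases = {"4k x 2k @ 30": (3840, 2160, 30), "1080p @ 120": (1920, 1080, 120)}
    for name, (w, h, fps) in cases.items():
        print(f"{name}: {pixel_rate(w, h, fps):,} pixels/s, {macroblock_rate(w, h, fps):,} MBs/s")
    # 4k x 2k @ 30: 248,832,000 pixels/s, 972,000 MBs/s
    # 1080p @ 120: 248,832,000 pixels/s, 979,200 MBs/s
```

The small difference in macroblock rate comes from 1080 not being a multiple of 16: a 1080p frame is coded as 68 macroblock rows (1088 lines).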