Mihir Mody | Researchain

Publication

Featured research published by Mihir Mody.

international solid-state circuits conference | 2012

A true multistandard, programmable, low-power, full HD video-codec engine for smartphone SoC

Mahesh Mehendale; Subrangshu Das; Mohit Sharma; Mihir Mody; Ratna M. V. Reddy; Joseph Patrick Meehan; Hideo Tamama; Brian Carlson; Mike Polley

In this paper, we present IVA-HD, a true multistandard, programmable, full HD video coding engine which adopts optimal hardware-software partitioning to achieve the low-power and area requirements of the OMAP 4 processor. Unlike the approach of using separate IPs for encoder and decoder, IVA-HD uses an integrated codec engine which is area efficient, as most of the decoder logic is reused for the encoder. IVA-HD is architected to perform stream-rate and pixel-rate processing in a single pipeline (that processes one 16x16 macroblock at a time), so as to support the latency requirements of video conferencing.

international conference on consumer electronics berlin | 2013

High throughput VLSI architecture supporting HEVC loop filter for Ultra HDTV

Mihir Mody; Niraj Nandan; Tamama Hideo

In-loop filtering in HEVC/H.265 is one of the most computation-intensive blocks, accounting for around 15-20% of overall decoding complexity. Loop filtering in HEVC is more sophisticated than in H.264 because it adds a sample adaptive offset (SAO) filter to the de-blocking filter. This paper proposes a high-performance, area-efficient VLSI architecture for an HEVC decoder that supports 4K@60fps for next-generation Ultra HDTV at a 200 MHz clock. The design can process a Largest Coding Unit (LCU) of size 64×64 in fewer than 1200 cycles, with the cycle count scaling down directly with LCU size. The architecture uses LCU-level pipelining across de-blocking and SAO filtering, with four- and three-stage internal pipelines within the blocks. To achieve high performance, the architecture proposes fully on-the-fly filtering that avoids memory bandwidth, a custom filtering and scanning order, 4×4 block-based processing and a FIFO-based asynchronous architecture. The final design in a 28nm CMOS process is expected to take around 0.2 mm² after place and route. The proposed design handles 4K@60fps and is fully compliant with the HEVC video standard specification, including corner cases such as slice and tile processing, which are handled via a generic region definition.
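The cycle budget implied by these figures can be checked with a little arithmetic. The sketch below assumes that "4K" means a 3840×2160 frame (the abstract does not state the exact resolution) and counts partial LCUs at the frame edge as full LCUs; the function names are illustrative only.

```python
import math

def lcus_per_frame(width, height, lcu_size=64):
    """Number of LCUs covering a frame, counting partial edge LCUs as whole."""
    return math.ceil(width / lcu_size) * math.ceil(height / lcu_size)

def cycle_budget_per_lcu(clock_hz, width, height, fps, lcu_size=64):
    """Clock cycles available per LCU if every frame must finish in real time."""
    return clock_hz / (lcus_per_frame(width, height, lcu_size) * fps)

# Assumed 4K resolution: 3840x2160 (not stated in the abstract).
lcus = lcus_per_frame(3840, 2160)
budget = cycle_budget_per_lcu(200e6, 3840, 2160, 60)
print(f"LCUs per frame: {lcus}")
print(f"Cycle budget per 64x64 LCU: {budget:.0f}")
```

Under these assumptions the budget comes to roughly 1,634 cycles per LCU, so a design that finishes a 64×64 LCU in fewer than 1200 cycles leaves some headroom at 200 MHz.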

midwest symposium on circuits and systems | 2014

Trends in camera based Automotive Driver Assistance Systems (ADAS)

Shashank Dabral; Sanmati Kamath; Vikram V. Appia; Mihir Mody; Buyue Zhang; Umit Batur

Advanced Driver Assistance Systems (ADAS), once limited to high-end luxury automobiles, are fast becoming popular in mid- and entry-level segments, driven in part by legislation coming into effect in the latter part of this decade. These systems require support for a wide variety of applications, from surround-view visual systems to safety-critical vision applications (e.g. pedestrian detection, automatic braking, etc.). In this white paper we describe some of the existing and emerging trends and applications in each of these segments, along with the requirements and motivations for each of these features. We also highlight TI's automotive-class TDA2x device, a state-of-the-art automotive-grade device capable of handling complex ADAS applications within a low power and cost budget.

international symposium on circuits and systems | 2014

High throughput VLSI architecture for HEVC SAO encoding for ultra HDTV

Mihir Mody; Hrushikesh Garud; Soyeb Nagori; Dipan Kumar Mandal

This paper presents a high-performance, silicon-area-efficient and software-configurable hardware architecture for sample adaptive offset (SAO) encoding. The paper proposes a novel architecture consisting of a single largest coding unit (LCU) stage SAO operation, a unified data path for luma and chroma channels, add-on external interfaces on frame-level statistics collection units to allow fine control over the parameter estimation process, flexible rate control and artifact avoidance algorithms. The unified data path consists of 2D-block-based processing with 3 pipeline stages for statistics generation and multiple offset rate-distortion cost estimation blocks for high performance. After placement and routing, the proposed design is expected to take up approximately 0.15 mm² of silicon area in a 28nm CMOS process. At 200 MHz, the proposed design supports 4K Ultra HD video encoding at 60fps. Simulation experiments have shown an average bit-rate saving of up to 4.3% with in-loop SAO filtering and various encoder configurations.
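To make "statistics generation" concrete, here is a minimal software sketch of SAO edge-offset statistics for a single horizontal row of samples. It is not the paper's hardware design: the category rules follow the usual HEVC edge-offset classification, the function names are hypothetical, and offsets are estimated by simple averaging rather than by the rate-distortion cost estimation the abstract describes.

```python
def edge_category(a, c, b):
    """Edge-offset category of sample c given its two neighbours a and b."""
    if c < a and c < b:
        return 1  # local minimum
    if (c < a and c == b) or (c == a and c < b):
        return 2  # concave corner
    if (c > a and c == b) or (c == a and c > b):
        return 3  # convex corner
    if c > a and c > b:
        return 4  # local maximum
    return 0      # flat or monotonic: no offset

def collect_stats(orig_row, recon_row):
    """Accumulate per-category error sums and sample counts along one row."""
    sums = [0] * 5
    counts = [0] * 5
    for i in range(1, len(recon_row) - 1):
        cat = edge_category(recon_row[i - 1], recon_row[i], recon_row[i + 1])
        sums[cat] += orig_row[i] - recon_row[i]
        counts[cat] += 1
    return sums, counts

def estimate_offsets(sums, counts):
    """Mean error per category 1..4 (a simplification of RD-based selection)."""
    return [round(s / n) if n else 0 for s, n in zip(sums[1:], counts[1:])]

# Toy data: original samples and their reconstruction after coding.
orig = [10, 10, 10, 10, 11, 10]
recon = [10, 8, 10, 10, 13, 10]
sums, counts = collect_stats(orig, recon)
offsets = estimate_offsets(sums, counts)
print(offsets)
```

In this toy row the local minimum is pulled up and the local maximum is pulled down, which is the smoothing effect SAO edge offsets are meant to achieve.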

advances in computing and communications | 2014

High performance and flexible imaging Sub-system

Mihir Mody; Hetul Sanghvi; Niraj Nandan; Shashank Dabral; Rajasekhar Allu; Dharmendra Soni; Sunil Sah; Gayathri Seshadri; Prashant Karandikar

Imaging Sub-system (ISS) enables capturing photographs or live video from raw image sensors. This consists of a set of sensor interfaces and cascaded set of algorithms to improve image/video quality. This paper illustrates typical imaging sub-system architectures consisting of a Sensor front end, an Image Signal Processor (ISP) and an Image Co-processor (sIMCOP). Here we describe the ISS developed by Texas Instruments (TI) for the OMAP 5432 processor. The given solution is flexible to interface with various kinds of image sensors and provides hooks to tune visual quality for specific customers as well as end applications. This solution is also flexible by providing options to enable customized data flows based on actual algorithm needs. The overall solution runs at a high throughput of 1 pixel/clock cycle to enable full HD video at high visual quality.

ieee international conference on high performance computing data and analytics | 2015

High Performance Front Camera ADAS Applications on TI's TDA3X Platform

Mihir Mody; Pramod Swami; Kedar Chitnis; Shyam Jagannathan; Kumar Desappan; Anshu Jain; Deepak Kumar Poddar; Zoran Nikolic; Prashanth Viswanath; Manu Mathew; Soyeb Nagori; Hrushikesh Garud

Advanced driver assistance systems (ADAS) are designed to increase drivers' situational awareness and road safety by providing essential information, warnings and automatic intervention to reduce the possibility or severity of an accident. Of the various ADAS modalities available, camera-based ADAS are being widely adopted for their usefulness in varied applications, overall reliability and adaptability to new requirements. But camera-based ADAS also represent a complex, high-performance and low-power compute problem that requires specialized solutions. This paper introduces a high-performance front camera ADAS based on a small-area, low-power System-on-Chip (SoC) solution from Texas Instruments called Texas Instruments Driver Assist 3x (TDA3x). The paper illustrates the compute capabilities of the device in the implementation of a typical front camera ADAS. The paper also introduces key programming concepts related to the heterogeneous programmable compute cores in the SoC and the software framework for using those cores to develop front camera solutions. These aspects will be of interest not only to ADAS developers but also to developers of computer vision and compute-intensive embedded systems.

ieee international conference on image information processing | 2013

Scalable high performance loop filter architecture for video codecs

Niraj Nandan; Mihir Mody

There is a continuous push for improved and innovative video solutions for video conferencing, video surveillance, transcoding, streaming video and many other customer-centric applications. Increasing frame rates and frame sizes demand high-performance hardware accelerators (HWA) to enable efficient pipelining at the 16×16-pixel macroblock (MB) level inside the video processing engine (IVAHD). The in-loop de-blocking filter of the H.264 codec reduces blocking artifacts in each MB and is very demanding in terms of cycles and resources (memory access and memory storage). Removing the blocking artifacts introduced by block-based video codecs takes around 20-25% of overall decoder complexity in the current generation of standards (H.264), and the trend will continue in H.265. Several factors require a large number of accesses to the shared memory of IVAHD (SL2), more processing cycles and a larger internal pixel buffer (IPB): higher adaptability of the filter process, smaller block sizes (4×4), motion vector (MV) dependent boundary strength (BS) computation for each edge of a 4×4 block, a predefined filtering order (vertical edges followed by horizontal edges) and loading of pixel data for the current and neighboring MBs. This paper discusses a novel approach to loop filter (LPF) operation that overcomes these barriers and enables IVAHD to reach a frame rate of 240 fps in full HD processing of the H.264 codec with leadership area and power. The final design in a 28nm CMOS process is expected to take around 0.10 mm² after place and route (220 KGate with 5 KB of internal memory). The proposed design is capable of handling 4K@60fps and is scalable to support the H.265 in-loop de-blocking filter.
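The two throughput targets in this abstract, 240 fps at full HD and 4K at 60 fps, correspond to nearly the same macroblock rate. The sketch below makes that comparison explicit. It assumes frame sizes of 1920×1080 and 3840×2160 (the abstract does not give exact dimensions) and pads frame edges up to whole 16×16 macroblocks; the function name is illustrative.

```python
import math

MB_SIZE = 16  # macroblock edge in pixels, as stated in the abstract

def macroblocks_per_second(width, height, fps):
    """Macroblock rate for a given frame size and frame rate."""
    mbs_per_frame = math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE)
    return mbs_per_frame * fps

# Assumed resolutions (not stated in the abstract).
full_hd_240 = macroblocks_per_second(1920, 1080, 240)
uhd_60 = macroblocks_per_second(3840, 2160, 60)
print(f"Full HD @ 240 fps: {full_hd_240:,} MB/s")
print(f"4K @ 60 fps:       {uhd_60:,} MB/s")
```

Under these assumptions both cases land close to two million macroblocks per second, which is why a filter sized for one target can plausibly serve the other.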

international symposium on circuits and systems | 2014

A 28nm programmable and low power ultra-HD video codec engine

Hetul Sanghvi; Mihir Mody; Niraj Nandan; Mahesh Mehendale; Subrangshu Das; Dipan Kumar Mandal; Pavan Shastry

Video codec standards like H.264 and HEVC are driving the need for high computation and high memory bandwidth in current SoCs. On the other hand, portable devices like smartphones and tablets are driving the need to reduce power consumption for enhanced battery life. In this paper, we present a scalable H.264 Ultra-HD video codec engine that dissipates 9 mW of decode and 18 mW of encode power (for a typical HP H.264 1080p30 bit-stream) in a 28 nm low-power process technology node, using various low-power optimization techniques across architecture, design, circuit, software and systems.

ieee international conference on electronics computing and communication technologies | 2014

Ultra-low latency video codec for video conferencing

Mihir Mody; Pramod Swami; Pavan Shastry

Video codecs (e.g. HEVC, H.264, H.263, H.261) are used for real-time video conferencing over the internet. End-to-end (or round-trip) latency has a significant impact on the perceived quality of a video call. This paper explains the overall latency of the entire signal chain, with a particular focus on the video codec. It describes a typical configuration that optimizes the overall latency of video processing down to the range of one video frame processing time. It then proposes a new sub-frame-based data flow that cuts overall latency to a fraction of a video frame. The paper proposes a new video codec engine design to enable this sub-frame-based data flow, consisting of a novel way of exchanging data between the entropy engine and the application, pre-fetching of video data without stalling video performance, and sending of partial video output to the network. The overall design reduces the processing latency of the video engine from multiple frames to a few lines of video. The overall solution on TI's DaVinci series (DM816x) device achieves latency of up to 2 msec, compared with a prior-art measurement of 33 msec, resulting in a better user experience due to large improvements in perceived visual quality.
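To put the 33 msec and 2 msec figures in perspective, the sketch below converts a latency into a fraction of a frame period and an equivalent number of scan lines. The frame rate (30 fps) and frame height (1080 lines) are assumptions made purely for illustration; the abstract states neither, and the function names are hypothetical.

```python
def frame_period_ms(fps):
    """Duration of one video frame in milliseconds."""
    return 1000.0 / fps

def lines_in_interval(interval_ms, fps, lines_per_frame):
    """Number of scan lines that elapse during interval_ms."""
    return interval_ms / frame_period_ms(fps) * lines_per_frame

# Assumptions (not stated in the abstract): 30 fps, 1080 lines per frame.
FPS = 30
LINES_PER_FRAME = 1080

period = frame_period_ms(FPS)
fraction = 2.0 / period
lines = lines_in_interval(2.0, FPS, LINES_PER_FRAME)
print(f"Frame period: {period:.1f} ms")
print(f"2 ms is {fraction:.0%} of a frame, about {lines:.0f} lines")
```

At an assumed 30 fps, one frame lasts about 33 ms, which matches the prior-art figure, while 2 ms is roughly 6% of a frame period.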

computer vision and pattern recognition | 2016

A Diverse Low Cost High Performance Platform for Advanced Driver Assistance System (ADAS) Applications

Prashanth Viswanath; Kedar Chitnis; Pramod Swami; Mihir Mody; Sujith Shivalingappa; Soyeb Nagori; Manu Mathew; Kumar Desappan; Shyam Jagannathan; Deepak Kumar Poddar; Anshu Jain; Hrushikesh Garud; Vikram V. Appia; Mayank Mangla; Shashank Dabral

Advanced driver assistance systems (ADAS) are becoming more and more popular. Many ADAS applications, such as Lane Departure Warning (LDW), Forward Collision Warning (FCW), Automatic Cruise Control (ACC), Auto Emergency Braking (AEB) and Surround View (SV), that were present only in high-end cars in the past have trickled down to low- and mid-end vehicles. Many of these applications are also mandated by safety authorities such as Euro NCAP and NHTSA. To make these applications affordable in low- and mid-end vehicles, it is important to have a cost-effective yet high-performance and low-power solution. Texas Instruments' (TI's) TDA3x is an ideal platform which addresses these needs. This paper illustrates the mapping of different algorithms, such as SV, LDW, Object Detection (OD), Structure From Motion (SFM) and Camera-Monitor Systems (CMS), to the TDA3x device, thereby demonstrating its compute capabilities. We also share the performance for these embedded vision applications, showing that TDA3x is an excellent high-performance device for ADAS applications.
