Minhua Zhou
Texas Instruments
Publications
Featured research published by Minhua Zhou.
IEEE Journal of Selected Topics in Signal Processing | 2013
Kiran Misra; C. Andrew Segall; Michael Horowitz; Shilin Xu; Arild Fuldseth; Minhua Zhou
Tiles is a new feature in the High Efficiency Video Coding (HEVC) standard that divides a picture into independent, rectangular regions. This division provides a number of advantages. Specifically, it increases the “parallel friendliness” of the new standard by enabling improved coding efficiency for parallel architectures, as compared to previous slice-based methods. Additionally, tiles facilitate improved maximum transmission unit (MTU) size matching, reduced line buffer memory, and additional region-of-interest functionality. In this paper, we introduce the tiles feature and survey the performance of the tool. Coding efficiency is reported for different parallelization factors and MTU size requirements. Additionally, a tile-based region-of-interest coding method is developed.
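As a concrete illustration of the partitioning the tiles feature performs, the sketch below computes uniform tile column and row boundaries in coding-tree-block (CTB) units using a simple even-split rule; the function and the example numbers are illustrative, not code from the standard or the paper.

```python
def uniform_tile_boundaries(pic_size_in_ctbs: int, num_tiles: int) -> list[int]:
    """Split a picture dimension (in CTB units) into num_tiles nearly equal
    spans: boundary i sits at floor(i * size / num_tiles)."""
    return [(i * pic_size_in_ctbs) // num_tiles for i in range(num_tiles + 1)]

# Example: a 1080p picture with 64x64 CTBs is 30 x 17 CTBs.
cols = uniform_tile_boundaries(30, 4)   # column boundaries in CTB units
rows = uniform_tile_boundaries(17, 2)   # row boundaries in CTB units
print(cols)  # [0, 7, 15, 22, 30] -> tile widths 7, 8, 7, 8 CTBs
print(rows)  # [0, 8, 17]         -> tile heights 8, 9 CTBs
```

Because each tile is independently decodable, a core (or packet) can be assigned one of these rectangles without referencing its neighbors.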
International Conference on Acoustics, Speech, and Signal Processing | 2008
Madhukar Budagavi; Minhua Zhou
Handheld battery-operated consumer electronics video devices such as camera phones, digital still cameras, digital camcorders, and personal media players have limited system memory bandwidth available because of cost and power consumption constraints. Video coding consumes a significant amount of this limited system memory bandwidth, especially at high-definition (HD) resolution. Techniques that reduce memory bandwidth in video coding are crucial for implementing video coding at HD resolutions on portable video devices. Memory bandwidth reduction is also desirable from a power consumption point of view, since memory accesses consume a significant amount of power. In this paper we present our technique, in-loop compression of reference frames, for reducing memory bandwidth in video coding.
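For a rough sense of the bandwidth at stake, the back-of-the-envelope sketch below estimates reference-frame DDR traffic for 1080p video and what a 2:1 in-loop compression ratio would save; the read factor and compression ratio are illustrative assumptions, not figures from the paper.

```python
def ref_frame_traffic_mb_per_s(width, height, fps, bytes_per_pixel=1.5,
                               read_factor=3.0, compression_ratio=1.0):
    """Very rough memory traffic estimate for reference-frame accesses.

    bytes_per_pixel=1.5 assumes 4:2:0 8-bit video; read_factor models the fact
    that motion compensation re-reads reference pixels several times
    (overlapping search and interpolation windows); compression_ratio > 1
    models in-loop compression of the stored reference frames.
    """
    frame_bytes = width * height * bytes_per_pixel
    writes = frame_bytes * fps                 # store each reconstructed frame
    reads = frame_bytes * fps * read_factor    # motion-compensation fetches
    return (writes + reads) / compression_ratio / 1e6

print(ref_frame_traffic_mb_per_s(1920, 1080, 30))                         # ~373 MB/s uncompressed
print(ref_frame_traffic_mb_per_s(1920, 1080, 30, compression_ratio=2.0))  # ~187 MB/s with 2:1 compression
```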
International Conference on Image Processing | 2008
Vivienne Sze; Anantha P. Chandrakasan; Madhukar Budagavi; Minhua Zhou
With the growing presence of high definition video content on battery-operated handheld devices such as camera phones, digital still cameras, digital camcorders, and personal media players, it is becoming ever more important that video compression be power efficient. A popular form of entropy coding called context-based adaptive binary arithmetic coding (CABAC) provides high coding efficiency but has limited throughput. This can lead to high operating frequencies, resulting in high power dissipation. This paper presents a novel parallel CABAC scheme which enables an N-fold throughput increase (depending on the degree of parallelism), reducing the frequency requirement and expected power consumption of the coding engine. Experiments show that this new scheme (with N=2) can deliver ~2x throughput improvement at a cost of a 0.76% average increase in bit-rate, or equivalently a decrease in average PSNR of 0.025 dB, on five 720p resolution video clips when compared with H.264/AVC.
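The frequency argument can be seen with a one-line calculation, assuming a baseline engine that handles one bin per cycle; the 300 Mbins/s worst-case bin rate used below is an illustrative figure, not one from the paper.

```python
def cabac_clock_mhz(bin_rate_mbins_per_s: float, bins_per_cycle: int = 1) -> float:
    """Required engine clock (MHz) if each cycle handles bins_per_cycle bins."""
    return bin_rate_mbins_per_s / bins_per_cycle

f1 = cabac_clock_mhz(300, bins_per_cycle=1)  # 300 MHz for a single-bin engine
f2 = cabac_clock_mhz(300, bins_per_cycle=2)  # 150 MHz with N=2 parallel bins
print(f1, f2)
# Dynamic power scales roughly with frequency (and further with any voltage
# scaling the lower clock allows), so halving the clock cuts expected power.
```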
IEEE Journal of Selected Topics in Signal Processing | 2013
Mahmut E. Sinangil; Vivienne Sze; Minhua Zhou; Anantha P. Chandrakasan
This paper focuses on motion estimation engine design for future High Efficiency Video Coding (HEVC) encoders. First, a methodology is explained to analyze hardware implementation cost in terms of hardware area, memory size, and memory bandwidth for various possible motion estimation engine designs. For 11 different configurations, hardware cost as well as coding efficiency are quantified and compared through a graphical analysis to make design decisions. It is shown that using smaller block sizes (e.g., 4 × 4) imposes significantly larger hardware requirements in exchange for only modest improvements in coding efficiency. Second, based on the analysis of the various configurations, one configuration is chosen and algorithm improvements are presented to further reduce its hardware implementation cost. Overall, the proposed changes provide 56× on-chip bandwidth, 151× off-chip bandwidth, 4.3× core area, and 4.5× on-chip memory area savings when compared to the hardware implementation of the HM-3.0 design.
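To give a feel for the kind of bookkeeping such an analysis involves, the sketch below estimates the reference window size and per-block fetch traffic for a given block size and search range under a naive no-sharing model; the cost model and parameters are illustrative, not the paper's actual methodology.

```python
def search_window_bytes(block: int, search_range: int, bytes_per_pixel: int = 1) -> int:
    """On-chip reference window needed to search +/-search_range around one block."""
    side = block + 2 * search_range
    return side * side * bytes_per_pixel

def fetch_bytes_per_ctu(ctu: int, block: int, search_range: int) -> int:
    """Naive traffic if every block inside a CTU fetches its own window."""
    blocks_per_ctu = (ctu // block) ** 2
    return blocks_per_ctu * search_window_bytes(block, search_range)

# Illustrative comparison for a 64x64 CTU and a +/-64 search range:
for b in (64, 32, 16, 8, 4):
    print(b, fetch_bytes_per_ctu(64, b, 64))
# Smaller blocks multiply the number of (heavily overlapping) windows, which is
# why supporting them is expensive unless window sharing is added in hardware.
```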
Proceedings of SPIE | 2012
Minhua Zhou; Vivienne Sze; Madhukar Budagavi
HEVC (High Efficiency Video Coding) is the next-generation video coding standard being jointly developed by the ITU-T VCEG and ISO/IEC MPEG JCT-VC team. In addition to high coding efficiency, which is expected to provide 50% more bit-rate reduction when compared to H.264/AVC, HEVC has built-in parallel processing tools to address bit-rate, pixel-rate, and motion estimation (ME) throughput requirements. This paper describes how CABAC, which is also used in H.264/AVC, has been redesigned for improved throughput, and how parallel merge/skip and tiles, which are new tools introduced for HEVC, enable high-throughput processing. CABAC has data dependencies which make it difficult to parallelize and thus limit its throughput. The prediction error/residual, represented as quantized transform coefficients, accounts for the majority of the CABAC workload. Various improvements have been made to the context selection and scans in transform coefficient coding that enable CABAC in HEVC to potentially achieve higher throughput and increased coding gains relative to H.264/AVC. The merge/skip mode is a coding efficiency enhancement tool in HEVC; parallel merge/skip breaks the dependency between the regular and merge/skip ME, which provides flexibility for high-throughput, high-efficiency HEVC encoder designs. For ultra high definition (UHD) video, such as 4kx2k and 8kx4k resolutions, low-latency and real-time processing may be beyond the capability of a single-core codec. Tiles are an effective tool that enables pixel-rate balancing among the cores to achieve parallel processing with a throughput-scalable implementation of a multi-core UHD video codec. With evenly divided tiles, a multi-core video codec can be realized by simply replicating the single-core codec and adding a tile boundary processing core on top of that. These tools illustrate that accounting for implementation cost when designing video coding algorithms can enable higher processing speed and reduce implementation cost, while still delivering high coding efficiency in the next-generation video coding standard.
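The pixel-rate balancing argument is easy to check numerically; the resolutions and core count below are chosen for illustration.

```python
def pixel_rate_mpix_per_s(width: int, height: int, fps: int) -> float:
    """Luma pixel rate in megapixels per second."""
    return width * height * fps / 1e6

uhd_4k  = pixel_rate_mpix_per_s(3840, 2160, 60)   # ~498 Mpixel/s
hd_1080 = pixel_rate_mpix_per_s(1920, 1080, 60)   # ~124 Mpixel/s

# Splitting the 4K picture into 4 evenly sized tiles (e.g. 4 tile columns)
# leaves each core with ~124 Mpixel/s, roughly the load of a 1080p60
# single-core codec, so the multi-core codec can reuse that design.
per_core = uhd_4k / 4
print(uhd_4k, hd_1080, per_core)
```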
Signal Processing Systems | 1999
Madhukar Budagavi; Jennifer L. H. Webb; Minhua Zhou; Jie Liang; Raj Talluri
The emerging MPEG-4 standard encompasses a wide variety of applications, many of which are suitable for implementation on a Digital Signal Processor (DSP). In particular, consumer products with embedded multimedia capability, such as set-top boxes and wireless communicators, are suitable for DSP-based implementation. With a programmable approach, various algorithmic tradeoffs can be made, based on processing capability. For best performance, careful attention must be paid to memory allocation, data transfer, and ordering of instructions to best match the DSP architecture. We discuss implementing simple profile MPEG-4 video on the low-power TMS320C54x, core profile on the TMS320C6x, and scalable texture profile, which could be implemented on either processor family.
International Conference on Image Processing | 2012
Mahmut E. Sinangil; Anantha P. Chandrakasan; Vivienne Sze; Minhua Zhou
This paper presents a comparison between various High Efficiency Video Coding (HEVC) motion estimation configurations in terms of coding efficiency and memory cost in hardware. An HEVC motion estimation hardware model suitable for implementing the HEVC reference software (HM) search algorithm is created, and memory area and data bandwidth requirements are calculated based on this model. Eleven different motion estimation configurations are considered. Supporting smaller block sizes is shown to impose significant memory cost in hardware, although the coding gain achieved through supporting them is relatively small. Hence, depending on target encoder specifications, the decision can be made not to support certain block sizes. Specifically, supporting only 64x64, 32x32, and 16x16 block sizes provides 3.2X on-chip memory area, 26X on-chip bandwidth, and 12.5X off-chip bandwidth savings at the expense of a 12% bit-rate increase when compared to the anchor configuration supporting all block sizes.
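Part of the asymmetry is visible from simple block counts: the sketch below counts how many square blocks of each size fit in one 64x64 region, each of which needs its own motion search in a straightforward engine (a simplified model, not the paper's memory cost metric).

```python
CTU = 64

def blocks_per_ctu(block_size: int) -> int:
    """Number of non-overlapping square blocks of a given size in one 64x64 CTU."""
    return (CTU // block_size) ** 2

for size in (64, 32, 16, 8, 4):
    print(f"{size}x{size}: {blocks_per_ctu(size)} blocks per CTU")
# 64x64: 1, 32x32: 4, 16x16: 16, 8x8: 64, 4x4: 256 -- restricting the encoder
# to {64, 32, 16} leaves 21 square block candidates per CTU instead of 341.
```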
International Conference on Image Processing | 2012
Mahmut E. Sinangil; Anantha P. Chandrakasan; Vivienne Sze; Minhua Zhou
This work presents a hardware-aware search algorithm for HEVC motion estimation. The implications of several decisions in the search algorithm are considered with respect to their hardware implementation costs (in terms of area and bandwidth). The proposed algorithm provides 3X logic area savings in integer motion estimation, 16% on-chip reference buffer area savings, and 47X maximum off-chip bandwidth savings when compared to the HM-3.0 fast search algorithm.
Conference on Image and Video Communications and Processing | 2000
Minhua Zhou; Raj Talluri
This paper describes the implementation of real-time H.263 video encoding on the TI TMS320C6x. This series of DSPs uses a common core based on VelociTI™, an advanced Very Long Instruction Word (VLIW) DSP architecture, which makes them well suited for high-performance embedded multimedia applications. In this paper we discuss in detail the methodologies used to structure the video coding algorithm so as to exploit the DSP architecture. In particular, a novel DSP-friendly motion estimation algorithm has been developed to achieve a good trade-off between coding efficiency and coding complexity. This algorithm plays a key role in realizing real-time video encoding on DSPs. On an EVM board for this DSP family (CPU frequency 167 MHz), we demonstrated H.263 baseline video encoding of CIF (352 x 288) video at 1 Mbit/s and a speed of about 30 fps. Multimedia applications such as consumer set-top boxes, videophones, videoconferencing, and network cameras will benefit from this performance.
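A quick cycle-budget calculation, using the figures quoted in the abstract plus the C6x's 8-way VLIW issue width, shows how tight the real-time constraint is.

```python
# CIF is 352x288 pixels -> 22x18 = 396 macroblocks per frame.
macroblocks_per_frame = (352 // 16) * (288 // 16)       # 396
macroblocks_per_sec = macroblocks_per_frame * 30        # 11,880 at 30 fps

cpu_hz = 167_000_000                                     # 167 MHz C6x EVM
cycles_per_macroblock = cpu_hz / macroblocks_per_sec     # ~14,000 cycles
issue_slots_per_macroblock = cycles_per_macroblock * 8   # up to 8 instructions/cycle

print(round(cycles_per_macroblock), round(issue_slots_per_macroblock))
# Motion estimation, DCT/IDCT, quantization, entropy coding, and reconstruction
# all have to fit within that per-macroblock budget.
```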
International Conference on Image Processing | 2011
Madhukar Budagavi; Vivienne Sze; Minhua Zhou
This paper analyzes the decoder implementation complexity of a new tool called Adaptive Loop Filtering (ALF) being considered for the ITU-T/ISO/IEC High Efficiency Video Coding (HEVC) standard, and proposes new luma filters (Nx7 and Nx5) for ALF that reduce memory bandwidth, memory size requirements, and the number of computations. The luma filters in ALF of the initial version of the HEVC Test Model (HM-1.0) have a maximum vertical size of 9. The vertical size of the ALF filters determines the memory size (line buffers) and memory bandwidth requirements. Accordingly, this paper proposes reducing the vertical size of the ALF filters to 7 and 5, referred to as the Nx7 and Nx5 filter sets respectively. These filters reduce memory bandwidth and size requirements by 25% and 50% respectively, with minimal impact on coding efficiency. In addition, the worst-case computational complexity is reduced by ~10% and ~20% respectively. Reduced vertical size luma ALF filters are under consideration for inclusion in the HEVC standard, with Nx7 having been adopted into HM-2.0 and Nx5 under consideration for HM-4.0.
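The reported memory savings follow directly from line-buffer arithmetic, assuming (as is typical for an in-loop filter) that a vertical filter of size V keeps roughly V-1 previously processed lines on chip; the picture width and sample size below are illustrative.

```python
def alf_line_buffer_bytes(vertical_taps: int, pic_width: int, bytes_per_sample: int = 1) -> int:
    """Approximate luma line-buffer memory for a vertical filter of the given size."""
    lines_needed = vertical_taps - 1   # previously processed lines kept on chip
    return lines_needed * pic_width * bytes_per_sample

width = 1920  # 1080p luma width
for taps in (9, 7, 5):
    print(taps, alf_line_buffer_bytes(taps, width))
# 9 -> 15360 bytes, 7 -> 11520 (25% less), 5 -> 7680 (50% less),
# matching the reported 25% and 50% memory reductions.
```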