Mateus Grellert

Universidade Federal do Rio Grande do Sul

Publication

Featured research published by Mateus Grellert.

International Conference on Multimedia and Expo | 2012

Motion Vectors Merging: Low Complexity Prediction Unit Decision Heuristic for the Inter-prediction of HEVC Encoders

Felipe Sampaio; Sergio Bampi; Mateus Grellert; Luciano Volcan Agostini; Júlio C. B. de Mattos

This paper presents the Motion Vectors Merging (MVM) heuristic, a method that reduces HEVC inter-prediction complexity by targeting the PU partition size decision. In the HM test model of the emerging HEVC standard, computational complexity is concentrated mostly in the inter-frame prediction step (up to 96% of the total encoder execution time under common test conditions). The goal of this work is to avoid several Motion Estimation (ME) calls during the PU inter-prediction decision in order to reduce the execution time of the overall encoding process. The MVM algorithm merges NxN PU partitions to compose larger ones. After the best PU partition is decided, ME is called to produce the best possible rate-distortion results for the selected partitions. The proposed method was implemented in the HM test model version 3.4 and provides an execution time reduction of up to 34% with insignificant rate-distortion losses (0.08 dB drop and 1.9% bitrate increase in the worst case). In addition, no related work in the literature proposes PU-level decision optimizations. When compared with works that target CU-level fast decision methods, MVM is competitive, achieving results as good as those of these works.
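The merging idea can be illustrated with a minimal sketch. The abstract does not give the actual merging criteria, so the exact-equality rule, the `MV` tuple representation, and the function name `merge_partitions` below are all assumptions made for illustration, not the authors' algorithm.

```python
from typing import Tuple

# Hypothetical representation: a motion vector as (dx, dy) in pixels.
MV = Tuple[int, int]


def merge_partitions(tl: MV, tr: MV, bl: MV, br: MV) -> str:
    """Suggest a PU partition from the motion vectors of the four NxN blocks.

    tl, tr, bl, br are the vectors of the top-left, top-right, bottom-left
    and bottom-right NxN partitions. Assumption: two partitions are merged
    only when their vectors are exactly equal.
    """
    top_same = tl == tr        # top row could become one 2NxN partition
    bottom_same = bl == br     # bottom row could become one 2NxN partition
    left_same = tl == bl       # left column could become one Nx2N partition
    right_same = tr == br      # right column could become one Nx2N partition

    if top_same and bottom_same and left_same:
        return "2Nx2N"         # all four vectors agree
    if top_same and bottom_same:
        return "2NxN"          # two horizontal halves
    if left_same and right_same:
        return "Nx2N"          # two vertical halves
    return "NxN"               # nothing can be merged


print(merge_partitions((1, 0), (1, 0), (3, 2), (3, 2)))
```

In this sketch, motion estimation would then be run only for the partition shape returned by the function rather than for every candidate shape.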

Design, Automation, and Test in Europe | 2013

Hardware-software collaborative complexity reduction scheme for the emerging HEVC intra encoder

Muhammad Usman Karim Khan; Muhammad Shafique; Mateus Grellert; Jörg Henkel

High Efficiency Video Coding (HEVC/H.265) is an emerging standard for video compression that provides almost double the compression efficiency of the current industry-standard Advanced Video Coding (AVC/H.264), at the cost of a major increase in computational complexity. This work proposes a collaborative hardware and software scheme for complexity reduction in an HEVC Intra encoding system, with run-time adaptivity. Our scheme leverages video content properties, which drive the complexity management layer (software) to generate a highly probable coding configuration. The intra prediction size and direction are estimated for the prediction unit, which reduces computational complexity. At the hardware layer, specialized coprocessors with enhanced reusability are employed as accelerators. Additionally, depending upon the video properties, the software layer administers the energy management of the hardware coprocessors. Experimental results show that a complexity reduction of up to 60% and an energy reduction of up to 42% are achieved.

Southern Conference on Programmable Logic | 2012

Low cost and high throughput multiplierless design of a 16 point 1-D DCT of the new HEVC video coding standard

Ricardo Jeske; José Cláudio de Souza; Gustavo Wrege; Ruhan Conceição; Mateus Grellert; Júlio C. B. de Mattos; Luciano Volcan Agostini

This paper presents the hardware design of a 16-point 1-D DCT used in the emerging video coding standard HEVC (High Efficiency Video Coding). The 1-D DCT is used by the 16×16 2-D DCT of the HEVC standard. The transform stage is one of the innovations proposed by HEVC, not only because of the variable size (from 4×4 to 32×32) but also because transforms larger than the traditional 4×4 and 8×8 are used. The hardware design presented in this work focuses on low cost and high throughput. To achieve these objectives, the 16-point algorithm from HEVC was simplified so that a more efficient hardware design could be implemented. Several strategies were used during this simplification, such as operation reordering, factoring to reduce the length of the operators, replacing multiplications by constants with shifts and adds, and sub-expression sharing, among others. The architecture was designed in a fully combinational way in order to reduce hardware overhead. Synthesis results obtained using Altera FPGAs from the Cyclone II and Stratix III families showed a hardware resource reduction reaching 72% when compared to an architecture described as a direct transcription of the non-optimized version of the algorithm. Even with a purely combinational implementation, the designed architecture achieved a throughput between 376 Msamples/s and 1.4 Gsamples/s. With these results, the architecture is capable of processing, in the worst case, more than 30 QFHD frames (3840×2160 pixels) per second. Therefore, the architecture is capable of processing videos with significantly high resolutions in real time. To the best of our knowledge, this is the first work in the literature that presents hardware results for the HEVC transforms.
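Two points in this abstract lend themselves to a short sketch: replacing a constant multiplication with shifts and adds, and the frame-rate arithmetic behind the 30 QFHD frames per second figure. The constants 90 and 83 are used only as illustrative examples of integer transform coefficients, the function names are hypothetical, and the 1.5 samples per pixel (4:2:0 chroma subsampling) is an assumption not stated in the abstract.

```python
def times_90(x: int) -> int:
    """Multiply by 90 without a multiplier: 90 = 64 + 16 + 8 + 2."""
    return (x << 6) + (x << 4) + (x << 3) + (x << 1)


def times_83(x: int) -> int:
    """Multiply by 83 without a multiplier: 83 = 64 + 16 + 2 + 1."""
    return (x << 6) + (x << 4) + (x << 1) + x


def frames_per_second(samples_per_second: float, width: int, height: int,
                      samples_per_pixel: float = 1.5) -> float:
    """Frame rate sustained by a given sample throughput.

    samples_per_pixel = 1.5 assumes 4:2:0 video (one luma sample plus
    half a chroma sample per pixel); this is an assumption of the sketch.
    """
    return samples_per_second / (width * height * samples_per_pixel)


print(times_90(7), times_83(-5))
print(frames_per_second(376e6, 3840, 2160))
```

Under the 4:2:0 assumption, the worst-case throughput of 376 Msamples/s works out to slightly more than 30 QFHD frames per second, which is consistent with the figure quoted in the abstract.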

International Conference on Image Processing | 2013

An adaptive workload management scheme for HEVC encoding

Mateus Grellert; Muhammad Shafique; Muhammad Usman Karim Khan; Luciano Volcan Agostini; Júlio C. B. de Mattos; Jörg Henkel

Managing the complexity of the emerging HEVC standard has been a subject of academic and industrial research since its earliest versions. The sophisticated and computation-intensive tools involved in the encoding process must be leveraged if real-time applications are considered. In this paper, we propose a workload management scheme for dynamically controlling the computational complexity of HEVC under a user-defined operating frequency and target FPS. Our scheme receives these two parameters as input and aims to meet the target FPS by adjusting different encoding parameters at run time. Experiments demonstrate that our scheme successfully meets the target FPS while introducing negligible rate-distortion losses. A comparison with the state of the art shows that our scheme is capable of achieving a time reduction of up to 43% for Full HD sequences, with a maximum loss of 0.03 dB in Y-PSNR and a 3.5% increase in bitrate.

International Symposium on Circuits and Systems | 2011

A multilevel data reuse scheme for Motion Estimation and its VLSI design

Mateus Grellert; Felipe Sampaio; Júlio C. B. de Mattos; Luciano Volcan Agostini

Motion Estimation (ME) is a vital component of video coding that stands out not only for its computational complexity but also for its off-chip memory bandwidth demand. These two issues are considered critical constraints for High Definition (HD) video coding, since a large volume of data must be processed. The multilevel data reuse scheme proposed in this paper reduces the off-chip memory bandwidth, with a direct impact on throughput and energy consumption. This scheme exploits the concept of overlapped Search Windows (SW) at more than one level and does not harm video quality. Comparisons with related works show that this solution provides the best tradeoff between the use of on-chip memory and the reduction of off-chip memory bandwidth. The data reuse scheme was applied to an ME architecture, and the synthesis results show that this solution presented the lowest use of hardware resources and the highest operating frequency among related works. The proposed architecture is able to process 1080p videos at 25 fps, and the reduction ratio of off-chip memory access achieved by the architecture is greater than 95% when compared to the traditional method.
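The benefit of overlapped search windows can be sketched with simple arithmetic. The example below covers only a single level of reuse between horizontally adjacent blocks, which is a simplification and not the multilevel scheme of the paper; the function names and the 16×16 block with a ±16 search range are assumptions chosen for illustration.

```python
def pixels_without_reuse(block: int, search_range: int) -> int:
    """Reference pixels fetched per block when the whole search window
    of side (block + 2 * search_range) is reloaded every time."""
    side = block + 2 * search_range
    return side * side


def pixels_with_horizontal_reuse(block: int, search_range: int) -> int:
    """Reference pixels fetched per block when the window of the previous
    (left) block is kept on chip and only the new vertical strip,
    `block` columns wide, is loaded."""
    return block * (block + 2 * search_range)


def reduction_ratio(block: int, search_range: int) -> float:
    """Fraction of off-chip reads saved by the single-level reuse above."""
    return 1.0 - (pixels_with_horizontal_reuse(block, search_range)
                  / pixels_without_reuse(block, search_range))


print(reduction_ratio(16, 16))
```

For these example values the saving is about two thirds of the reads, which suggests why additional reuse levels are needed to approach the reduction above 95% reported in the abstract.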

International Symposium on Circuits and Systems | 2015

Rate-distortion and energy performance of HEVC and H.264/AVC encoders: A comparative analysis

Eduarda Monteiro; Mateus Grellert; Sergio Bampi; Bruno Zatt

A quantitative, systematic, and detailed analysis of the energy impacts of the tools that comprise two of the most recent video coding standards, High Efficiency Video Coding (HEVC) and H.264/AVC, is presented. Our comparative study measures the energy consumption effects of important video-coding parameters, such as Search Range (SR), Quantization Parameter (QP), and video resolution, on both encoders. The results obtained for HEVC showed, for the Random Access (RA) prediction structure, gains of 25% in BD-Rate over H.264/AVC at the expense of 17% higher energy consumption. A new metric defined herein, called BD-Energy, was used in the SR analysis, and the results from this investigation showed that HEVC reached an energy consumption up to 37.6% higher for a BD-Rate gain of 32.2%. The QP analysis demonstrated that the energy consumption gap between both encoders varies greatly as QP increases, resulting in a 15.08% difference from QP 22 to QP 37, on average. The major finding from our work is that the HEVC encoder presents better results in the energy/compression trade-off, but this efficiency is reduced as encoding becomes more complex, since our results show that the HEVC energy consumption scales faster.

Power and Timing Modeling, Optimization and Simulation | 2014

Rate-distortion and energy performance of HEVC video encoders

Eduarda Monteiro; Mateus Grellert; Bruno Zatt; Sergio Bampi

A quantitative and systematic analysis of the energy impacts of the tools that comprise the most recent video-coding standard, High Efficiency Video Coding (HEVC), is comprehensively presented in this paper. Our comparative study measures the energy consumption effects of several video coding parameters, such as Motion Estimation Search Range (SR), Quantization Parameter (QP), and video resolution, on a general-purpose platform. In order to jointly compare important video-coding results such as bitrate and PSNR, three new metrics that combine these values with the energy consumption are proposed, namely: (i) BD-Energy; (ii) ECR (Energy-Compression Rate); and (iii) WNERD (Weighted Normalized Energy Compression Distortion). We conclude in our analysis that increasing the SR causes little to no gain in compression, whereas the energy consumption increases constantly. In the RA prediction structure, the average compression gains achieved were 0.48%, while the energy consumption increased by 13.7% on average. For the LB configuration, a similar result was obtained: 0.08% of BD-Rate savings at the expense of a 19.6% energy increase. The QP investigation demonstrates that compression efficiency and energy consumption are greatly affected as this parameter increases, and there is no QP capable of maximizing the three axes of optimization (rate, distortion, and energy).

International Conference on Electronics, Circuits, and Systems | 2010

Memory-aware multiple reference frame motion estimation for the H.264/AVC standard

Mateus Grellert; Felipe Sampaio; Bruno Hecktheuer; Júlio C. B. de Mattos; Luciano Volcan Agostini

This paper presents an architecture for Multiple Reference Frame Motion Estimation (MRF-ME) targeting the H.264/AVC standard. MRF introduces issues regarding processing throughput and memory access. In this context, this work proposes a memory-aware architecture for MRF-ME that relies on data reuse and current block parallelism. The data reuse scheme guarantees a reduction of almost 70% in the number of external memory accesses when compared with a traditional approach. The architecture was synthesized targeting a Xilinx FPGA device and is capable of processing high definition (HD) videos in real time (30 fps), with very good processing rates when compared with related works. This solution is based on a single view, but it is being used as the basis for designing an architecture that will be able to process multiview videos as defined in the H.264/AVC MVC extension.

IEEE Transactions on Circuits and Systems | 2017

Power-Efficient Sum of Absolute Differences Hardware Architecture Using Adder Compressors for Integer Motion Estimation Design

Bianca Silveira; Guilherme Paim; Brunno Abreu; Mateus Grellert; Cláudio Machado Diniz; Eduardo Costa; Sergio Bampi

Sum of absolute differences (SAD) calculation is one of the most time-consuming operations of video encoders compatible with the High Efficiency Video Coding standard. SAD hardware architectures employ an adder tree to accumulate the coefficients from the absolute differences between two video blocks. This paper explores different adder compressor structures in the SAD hardware architecture. The architectures were synthesized to 45-nm CMOS standard cells. Synthesis results show that the SAD architecture using an 8–2 compressor composed of 4–2 compressors and a Kogge–Stone adder in the recombination line reduces power dissipation by 25.5% on average when compared with the SAD architecture using conventional adders from a state-of-the-art synthesis tool. Our throughput analysis shows that the designed SAD units are capable of encoding full HD (
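For reference, the SAD operation named in this abstract can be written as a minimal software sketch. It shows only the arithmetic, not the adder-compressor hardware the paper studies; the function name `sad` and the list-of-rows block representation are assumptions made for illustration.

```python
from typing import Sequence


def sad(block_a: Sequence[Sequence[int]],
        block_b: Sequence[Sequence[int]]) -> int:
    """Sum of absolute differences between two equally sized pixel blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of rows")
    total = 0
    for row_a, row_b in zip(block_a, block_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same length")
        # In hardware, this accumulation is what the adder tree performs.
        total += sum(abs(a - b) for a, b in zip(row_a, row_b))
    return total


current = [[1, 2], [3, 4]]
candidate = [[4, 2], [1, 8]]
print(sad(current, candidate))  # |1-4| + |2-2| + |3-1| + |4-8| = 9
```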

Journal of Real-time Image Processing | 2017