Marcelo Schiavon Porto
Universidade Federal do Rio Grande do Sul
Publication
Featured research published by Marcelo Schiavon Porto.
International Conference on Image Processing | 2014
Gustavo Sanchez; Mário Saldanha; Gabriel Balota; Bruno Zatt; Marcelo Schiavon Porto; Luciano Volcan Agostini
This paper presents a new mode decision for depth-map intra-frame prediction in 3D-HEVC. The proposed technique decides whether the traditional High Efficiency Video Coding (HEVC) intra-frame prediction should be performed or skipped. The technique is motivated by the fact that traditional intra-frame prediction may generate artifacts in the synthesized views when an edge is encoded. The Simplified Edge Detector (SED) algorithm is proposed to classify, with minimal processing overhead, whether a block contains an edge or a nearly constant region. In software evaluations, the SED algorithm achieved an average complexity reduction of 23.8% for depth-map coding with no quality losses.
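The abstract above does not state the decision rule SED uses, so the following is only a minimal sketch of one low-overhead way to separate edge blocks from nearly constant ones. The corner-sample comparison, the function name `has_edge`, and the default `threshold` are all assumptions made for illustration, not the authors' method.

```python
def has_edge(block, threshold=8):
    """Return True when a depth block probably contains an edge.

    Hypothetical rule (not taken from the paper): look only at the four
    corner samples, so the cost is a few comparisons for any block size.
    """
    last_row = len(block) - 1
    last_col = len(block[0]) - 1
    corners = (
        block[0][0],
        block[0][last_col],
        block[last_row][0],
        block[last_row][last_col],
    )
    # A large spread between corners suggests a depth discontinuity.
    return max(corners) - min(corners) > threshold


# A flat 8x8 block and a block split by a vertical depth edge.
flat_block = [[50] * 8 for _ in range(8)]
edge_block = [[50] * 4 + [200] * 4 for _ in range(8)]
```

Under this assumed rule, `has_edge(flat_block)` is false and `has_edge(edge_block)` is true; an encoder could use such a flag to choose between running and skipping the conventional intra-frame prediction.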
International Conference on Image Processing | 2010
Bruno Zatt; Marcelo Schiavon Porto; Jacob Scharcanski; Sergio Bampi
This paper presents a new method for high-efficiency video coding using an adaptive GOP structure based on video content for the H.264/AVC standard. The available H.264/AVC encoders typically use static GOP sizes that define how the I (Intra), P (Predictive) and B (Bi-predictive) frames are positioned during the coding process. However, by analyzing the video content it is possible to identify the optimum position for each type of frame inside the GOP. The proposed method analyzes the video content and finds the best position for inserting I frames in the video sequence. Thus the GOP structure can assume different sizes, depending on the video content. The results for test sequences and real videos show that the proposed method can significantly reduce the required bit rate, compared to static GOP sizes, with small PSNR losses. For real movies, the proposed adaptive GOP reduces the bit rate by 8.6%, 15%, 24.7% and 40.8% in comparison with static GOP sizes of 32, 16, 8 and 4, respectively.
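The abstract above does not describe how the content analysis is performed. As a rough sketch of the general idea of content-driven I-frame placement, the code below starts a new GOP when consecutive frames differ strongly. The mean-absolute-difference measure, the `threshold`, and the `max_gop` safety limit are assumptions for illustration only.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute difference between two frames given as flat sample lists."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def place_i_frames(frames, threshold=30.0, max_gop=32):
    """Return the indices where an I frame would be inserted.

    Hypothetical policy: insert an I frame at a detected content change,
    or when the current GOP reaches max_gop frames.
    """
    i_positions = [0]  # the first frame is always intra coded
    for n in range(1, len(frames)):
        content_change = mean_abs_diff(frames[n - 1], frames[n]) > threshold
        gop_full = n - i_positions[-1] >= max_gop
        if content_change or gop_full:
            i_positions.append(n)
    return i_positions


# Five dark frames followed by four bright frames: one content change at index 5.
frames = [[10] * 16] * 5 + [[200] * 16] * 4
print(place_i_frames(frames))
```

With this toy input the GOP boundaries follow the content change instead of a fixed period, which is the property the abstract attributes to the adaptive structure.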
Symposium on Integrated Circuits and Systems Design | 2011
Marcelo Schiavon Porto; Gustavo Sanchez; Diego Noble; Luciano Volcan Agostini; Sergio Bampi
This paper presents an efficient hardware architecture for motion estimation (ME) in high-resolution digital videos. The architecture is built upon the new Multi-Point Diamond Search (MPDS) algorithm, a fast algorithm that increases ME quality when compared with other fast algorithms. MPDS reduces the number of falls into local minima, which improves the quality of the motion vectors because less residual energy needs to be encoded, especially in high-definition videos. Two versions of the architecture are presented. The first one focuses on high performance, and the second one focuses on quality improvement. Both architectures were described in VHDL and synthesized to a Stratix 4 FPGA and to TSMC 90 nm standard-cell technology. The synthesis results show that both architectures are fast enough to process HD 1080p videos at 30 fps. These solutions also presented the lowest power consumption (4.5 mW and 9 mW) in comparison to related works in the field, with low hardware resource utilization.
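For context, the sketch below shows the classic Diamond Search with a Sum of Absolute Differences (SAD) cost, plus a hypothetical multi-start wrapper that illustrates why searching from several points can avoid a local minimum. The abstract does not specify how MPDS chooses its extra points, so `multi_point_search` and its `starts` argument are assumptions, not the published algorithm.

```python
# Large and small diamond patterns as (dx, dy) offsets; the center comes first
# so that ties are resolved in favor of staying in place.
LDSP = [(0, 0), (0, -2), (0, 2), (-2, 0), (2, 0), (-1, -1), (-1, 1), (1, -1), (1, 1)]
SDSP = [(0, 0), (0, -1), (0, 1), (-1, 0), (1, 0)]


def sad(cur, ref, bx, by, mvx, mvy, n):
    """SAD between the n x n block at (bx, by) in cur and its displaced match in ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + mvy + y][bx + mvx + x])
        for y in range(n)
        for x in range(n)
    )


def diamond_search(cur, ref, bx, by, n, start=(0, 0), search_range=7):
    """Return (motion_vector, cost) found by Diamond Search from a start vector."""
    height, width = len(ref), len(ref[0])

    def cost(mv):
        mvx, mvy = mv
        inside = (
            abs(mvx) <= search_range and abs(mvy) <= search_range
            and 0 <= bx + mvx and bx + mvx + n <= width
            and 0 <= by + mvy and by + mvy + n <= height
        )
        return sad(cur, ref, bx, by, mvx, mvy, n) if inside else float("inf")

    best = start
    while True:
        # Move the large diamond until its center is the best candidate.
        cand = min(((best[0] + dx, best[1] + dy) for dx, dy in LDSP), key=cost)
        if cand == best:
            break
        best = cand
    # Final refinement with the small diamond.
    best = min(((best[0] + dx, best[1] + dy) for dx, dy in SDSP), key=cost)
    return best, cost(best)


def multi_point_search(cur, ref, bx, by, n, starts):
    """Hypothetical multi-start variant: run Diamond Search from each start, keep the best."""
    results = [diamond_search(cur, ref, bx, by, n, start=s) for s in starts]
    return min(results, key=lambda r: r[1])


# Toy frames: the current frame is the reference shifted by two samples.
ref_frame = [[x * x + y * y for x in range(16)] for y in range(16)]
cur_frame = [[ref_frame[y][min(x + 2, 15)] for x in range(16)] for y in range(16)]
mv, mv_cost = diamond_search(cur_frame, ref_frame, 4, 4, 4)
```

Each extra start point costs one more Diamond Search run, which matches the quality-versus-complexity trade-off the abstract describes in general terms.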
International Journal of Reconfigurable Computing | 2012
Gustavo Sanchez; Felipe Sampaio; Marcelo Schiavon Porto; Sergio Bampi; Luciano Volcan Agostini
This paper presents a new fast motion estimation (ME) algorithm targeting high-resolution digital videos and its efficient hardware architecture design. The new Dynamic Multipoint Diamond Search (DMPDS) algorithm is a fast algorithm that increases ME quality when compared with other fast ME algorithms. DMPDS achieves better video quality by reducing the occurrence of falls into local minima, especially in high-definition videos. The quality results show that DMPDS reaches an average PSNR gain of 1.85 dB when compared with the well-known Diamond Search (DS) algorithm. When compared to the optimum results generated by the Full Search (FS) algorithm, DMPDS shows a loss of only 1.03 dB in PSNR. On the other hand, DMPDS reached a complexity reduction of more than 45 times when compared to FS. The quality gains over DS come with an expected increase in complexity: DMPDS uses 6.4 times more calculations than DS. The DMPDS architecture was designed for high performance and low cost, targeting real-time processing (30 frames per second) of Quad Full High Definition (QFHD) videos. The architecture was described in VHDL and synthesized to Altera Stratix 4 and Xilinx Virtex 5 FPGAs. The synthesis results show that the architecture achieves processing rates higher than 53 QFHD fps, meeting the real-time requirements. The DMPDS architecture achieved the highest processing rate when compared to related works in the literature. This high processing rate was obtained by designing an architecture with a high operating frequency and a low number of cycles needed to process each block.
Symposium on Integrated Circuits and Systems Design | 2008
Marcelo Schiavon Porto; Sergio Bampi; Altamiro Amadeu Susin; Luciano Volcan Agostini
This paper presents a high-throughput and low-hardware-cost architecture for motion estimation (ME) using a Quarter Subsampled Diamond Search algorithm (QSDS) with Dynamic Iteration Control (DIC). QSDS-DIC is a new algorithm proposed in this paper, developed to enable an efficient hardware design for ME. QSDS-DIC is based on the well-known Diamond Search algorithm (DS) and on the sub-sampling technique. In addition, DIC was developed to allow better synchronization and quality. A software evaluation presents the average results for quality and computational cost of the QSDS, QSDS-DIC, Full Search and DS block matching algorithms. The designed hardware architecture considers blocks of 16x16 samples. The architecture was described in VHDL and mapped to a Xilinx Virtex-4 FPGA and to TSMC 0.18 µm CMOS standard cells. Synthesis results for the FPGA indicate that the QSDS-DIC architecture is able to run at 213.3 MHz, while taking only 3610 LUTs. This architecture can reach real time for HDTV (1920x1080 pixels) in the worst case, and it can process 188 HDTV frames per second in the average case.
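The effect of sub-sampling on the block-matching cost can be sketched as follows. The abstract does not give the sampling pattern, so taking every second sample in each dimension (one sample in four) is an assumption, as are the function names.

```python
def sad_full(block_a, block_b):
    """SAD over every sample of two equally sized square blocks."""
    n = len(block_a)
    return sum(
        abs(block_a[y][x] - block_b[y][x]) for y in range(n) for x in range(n)
    )


def sad_quarter(block_a, block_b, step=2):
    """SAD over one sample in four (assumed pattern: every second row and column)."""
    n = len(block_a)
    return sum(
        abs(block_a[y][x] - block_b[y][x])
        for y in range(0, n, step)
        for x in range(0, n, step)
    )


# Two 16x16 blocks that differ by 3 at every sample.
block_a = [[10] * 16 for _ in range(16)]
block_b = [[13] * 16 for _ in range(16)]
```

For these 16x16 blocks the full SAD visits 256 samples and the sub-sampled SAD visits 64, so the number of absolute differences per candidate block drops by a factor of four.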
International Midwest Symposium on Circuits and Systems | 2006
Luciano Volcan Agostini; Marcelo Schiavon Porto; José Luís Almada Güntzel; Roger Endrigo Carvalho Porto; Sergio Bampi
This paper presents the design, validation and prototyping of an H.264/AVC inverse transform and quantization architecture. The architecture was designed to reach high throughput and to be easily integrated with other H.264/AVC modules. It was completely described in VHDL, and the VHDL code was validated through behavioral and post-place-and-route simulations, comparing the data generated by the architecture with the data extracted from the H.264/AVC reference software. Finally, the architecture was prototyped using a Digilent XUP V2P board that contains a Xilinx Virtex-II Pro VP30 FPGA. The architecture mapped to the target FPGA was stimulated on the prototyping board using a PowerPC processor that is hardwired in that FPGA. The prototype was validated, and the results show that the designed architecture works in accordance with the H.264/AVC standard. The post-place-and-route synthesis results indicate that the complete architecture is able to process 132 million samples per second, allowing its use in H.264/AVC coders and decoders for HDTV.
Visual Communications and Image Processing | 2014
Gustavo Sanchez; Mário Saldanha; Gabriel Balota; Bruno Zatt; Marcelo Schiavon Porto; Luciano Volcan Agostini
This paper proposes a complexity reduction algorithm for the depth-map intra prediction of the emerging 3D High Efficiency Video Coding standard (3D-HEVC). 3D-HEVC introduces a new set of specific tools for depth-map coding, including four Depth Modeling Modes (DMM), and these new features add extra effort to the intra prediction. This extra effort is undesirable and increases power consumption, which is a serious problem for embedded systems in particular. For this reason, this paper proposes a complexity reduction algorithm for DMM 1, called Gradient-Based Mode One Filter (GMOF). The algorithm applies a filter to the borders of the encoded block and determines the best positions at which to evaluate DMM 1, reducing the computational effort of the DMM 1 process. Experimental analysis showed that GMOF achieves, on average, a complexity reduction of 9.8% in depth-map prediction when evaluated under the Common Test Conditions (CTC), with minor impact on the quality of the synthesized views.
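The idea of filtering a block border to pick candidate positions can be sketched as below. The abstract does not specify the filter, so the neighbor-difference gradient, the `keep` parameter, and the function name are assumptions for illustration rather than the GMOF definition.

```python
def border_gradient_peaks(border, keep=2):
    """Return the indices along one block border with the strongest gradient.

    Hypothetical filter: the absolute difference between neighboring border
    samples. Index i refers to the transition between samples i and i + 1.
    """
    gradients = [
        (abs(border[i + 1] - border[i]), i) for i in range(len(border) - 1)
    ]
    # Strongest gradient first; ties are broken by the lower index.
    gradients.sort(key=lambda g: (-g[0], g[1]))
    return sorted(index for _, index in gradients[:keep])


# A border whose depth jumps between samples 2 and 3.
top_border = [10, 10, 10, 90, 90, 90, 90, 90]
print(border_gradient_peaks(top_border, keep=1))
```

Restricting the DMM 1 evaluation to positions near such peaks, instead of testing every border position, is one way the search could be narrowed in the spirit of the abstract.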
Latin American Symposium on Circuits and Systems | 2014
Henrique Maich; Vladimir Afonso; Bruno Zatt; Luciano Volcan Agostini; Marcelo Schiavon Porto
This paper presents a compression analysis of the High Efficiency Video Coding (HEVC) standard aimed at reducing the computational effort of motion estimation (ME). By restricting the Prediction Units (PUs) in HEVC inter-frame prediction to the 4 square-shaped sizes, out of a total of 24 sizes, it is possible to reduce the number of operations by 74% at the cost of a 4% increase in bit rate, considering the Y-BD-Rate metric. Based on this evaluation, a simple hardware architecture is proposed to implement the Sum of Absolute Differences (SAD) used in the Fractional Motion Estimation (FME). The proposed architecture is able to calculate SAD at a rate of 30 Full HD (1920×1080) frames per second, requiring a frequency of 1.17 GHz. This represents a 63% frequency reduction compared to a scenario where all 24 PU sizes are evaluated.
International Conference on Multimedia and Expo | 2013
Dieison Silveira; Marcelo Schiavon Porto; Luciano Volcan Agostini
This paper presents the Reference Frame Context Adaptive Variable-Length Coder (RFCAVLC), a lossless solution for reducing external memory bandwidth in current video coding systems. The proposed approach is based on an adaptation of the traditional Huffman algorithm, and it uses eight static tables to avoid the cost of on-the-fly statistical analysis. The best table to encode a block is selected using a context evaluation, resulting in a context-adaptive configuration. RFCAVLC reached an average compression rate higher than 31% for the evaluated video sequences. The architectures that implement the RFCAVLC encoder and decoder were designed and synthesized to an FPGA device. The RFCAVLC design is able to reach real-time encoding for WQSXGA (3200 × 2048 pixels) at 30 fps. The synthesis results show that this solution can be easily coupled to a complete video encoder system with negligible hardware overhead and without compromising the throughput for real-time high-definition multimedia applications.
Symposium on Integrated Circuits and Systems Design | 2013
Ruhan Conceicao; J. Claudio de Souza; Ricardo Jeske; Marcelo Schiavon Porto; Júlio C. B. de Mattos; Luciano Volcan Agostini
This paper focuses on the inverse transforms defined in the High Efficiency Video Coding (HEVC) standard. The transform stage is one of the innovations proposed by HEVC, since it allows the largest number of transform sizes (four) and also the largest transform size (up to 32×32) when compared with previous standards. The inverse DCT is performed by both the video encoder and the decoder. This paper presents an efficient hardware design for the 32×32 HEVC IDCT based on the separability principle. The hardware design was planned to reach real-time processing (at least 30 frames per second) for high-resolution videos, exploiting a high level of parallelism (32 samples consumed per clock cycle). The architecture was also planned for low latency and low cost, so it was designed in a purely combinational way using a multiplierless approach. The synthesis process targeted an Altera Stratix IV FPGA. The synthesis results show that the designed architecture is capable of processing more than 30 QFHD frames (3840×2160 pixels) per second, with a latency of 33 clock cycles.
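The separability principle mentioned above means that a 2-D inverse DCT can be computed as 1-D inverse transforms over the rows followed by 1-D inverse transforms over the columns. The sketch below illustrates this with a floating-point orthonormal DCT on a small block. HEVC itself specifies integer transform matrices, and the paper's design is multiplierless, so this is an assumption-laden illustration of the principle only.

```python
import math


def idct_1d(coeffs):
    """Floating-point orthonormal 1-D inverse DCT (not the HEVC integer transform)."""
    n = len(coeffs)
    out = []
    for x in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
        out.append(total)
    return out


def idct_2d_separable(block):
    """2-D inverse DCT computed as a row pass followed by a column pass."""
    size = len(block)
    rows = [idct_1d(row) for row in block]
    cols = [idct_1d([rows[y][x] for y in range(size)]) for x in range(size)]
    # cols[x][y] is the sample at row y, column x.
    return [[cols[x][y] for x in range(size)] for y in range(size)]


# A 4x4 block with only a DC coefficient reconstructs to a flat block.
dc_block = [[4.0, 0.0, 0.0, 0.0]] + [[0.0] * 4 for _ in range(3)]
samples = idct_2d_separable(dc_block)
```

For an N×N block, the separable form needs 2N one-dimensional transforms of length N rather than a direct 2-D summation over all N² coefficients for every output sample, which is why it suits a 32×32 hardware design.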