Roger Endrigo Carvalho Porto
Universidade Federal do Rio Grande do Sul
Publication
Featured research published by Roger Endrigo Carvalho Porto.
International Symposium on Circuits and Systems | 2006
Luciano Volcan Agostini; Roger Endrigo Carvalho Porto; José Luís Almada Güntzel; I. Saraiva Silva; Sergio Bampi
This paper presents the design of a high-throughput multitransform and multiparallelism IP for the H.264/AVC standard. The solution supports the five H.264/AVC transforms and five different levels of parallelism. The proposed architecture was described in VHDL and synthesized to Altera Stratix and Xilinx Virtex-II Pro FPGAs and to TSMC 0.35 μm standard cells. Mapped to FPGAs, the multitransform and multiparallelism architecture can process from 124 million to 3.2 billion samples per second, depending on the parallelism level selected. The standard-cell version can process from 218.7 million to 3.5 billion samples per second. These results indicate that the proposed solution is highly flexible and can be used in various H.264/AVC codecs with different performance requirements. The performance results of all experiments indicate that the architecture can be used in high-definition applications such as HDTV.
IEEE Computer Society Annual Symposium on VLSI | 2009
Roger Endrigo Carvalho Porto; Luciano Volcan Agostini; Sergio Bampi
Among the video compression standards, the latest one is H.264/AVC. This standard reaches the highest compression rates when compared to the previous standards, but it has a high computational complexity. This complexity makes it difficult to develop software applications that run on a current processor when high-definition videos are considered. Thus, hardware implementations become essential. This work presents the architectural design for the variable block size motion estimation (VBSME) defined in the H.264/AVC standard. The architecture is based on the full search motion estimation algorithm and SAD calculation, and it is able to produce the 41 motion vectors within a macroblock as specified in the standard. The implementation was based on a standard-cell methodology in 0.18 μm CMOS technology. The architecture reached a throughput of 34 1080HD frames per second.
International Conference on Electronics, Circuits, and Systems | 2010
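The count of 41 motion vectors in the abstract above follows from the H.264/AVC partition sizes of a 16×16 macroblock: 16 + 8 + 8 + 4 + 2 + 2 + 1 = 41. The following is a minimal software sketch, not the authors' hardware design, of how SADs for all 41 partitions could be obtained for one candidate position by computing the sixteen 4×4 SADs once and summing them into larger blocks. The function names `sad_4x4_grid`, `merge`, and `vbsme_sads` are hypothetical.

```python
# Illustrative sketch only: SAD reuse across block sizes in one 16x16 macroblock.
# All names here are assumptions for illustration, not taken from the paper.

def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid holding the SAD of each 4x4 sub-block of a 16x16 block."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid

def merge(grid, bh, bw):
    """Sum 4x4 SADs into partitions spanning bh x bw grid cells (row-major order)."""
    return [
        sum(grid[r + i][c + j] for i in range(bh) for j in range(bw))
        for r in range(0, 4, bh)
        for c in range(0, 4, bw)
    ]

def vbsme_sads(cur, ref):
    """Map each partition size (height, width) in pixels to its list of SADs."""
    grid = sad_4x4_grid(cur, ref)
    sizes = [(4, 4), (4, 8), (8, 4), (8, 8), (8, 16), (16, 8), (16, 16)]
    return {(h, w): merge(grid, h // 4, w // 4) for h, w in sizes}

# Usage: one candidate block compared against the current macroblock.
cur = [[1] * 16 for _ in range(16)]
ref = [[0] * 16 for _ in range(16)]
sads = vbsme_sads(cur, ref)
total_partitions = sum(len(v) for v in sads.values())
```

In a full search, this computation would be repeated for every candidate position in the search window, keeping the position with the lowest SAD for each of the 41 partitions.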
Guilherme Corrêa; Cláudio Machado Diniz; Sergio Bampi; Daniel Palomino; Roger Endrigo Carvalho Porto; Luciano Volcan Agostini
To achieve the best coding performance, the H.264/AVC encoder must choose the best coding mode and the best block size in terms of bit-rate and distortion. The H.264/AVC reference software applies the Rate-Distortion Optimization (RDO) technique, which makes the encoding process a complex task in applications that require real-time operation. This paper presents a fast intra decision process and its architecture, where the mode and block size decisions are based on block distortion and homogeneity. The tests show that the method achieves PSNR results similar to the RDO technique with a low bit-rate increase, while the complexity reduction is close to 148 times when compared to the RDO method. The implemented architecture is also capable of processing 1080p videos in real time.
Journal of the Brazilian Computer Society | 2007
Luciano Volcan Agostini; Arnaldo Pereira de Azevedo Filho; Wagston Tassoni Staehler; Vagner S. Rosa; Bruno Zatt; Ana Cristina Medina Pinto; Roger Endrigo Carvalho Porto; Sergio Bampi; Altamiro Amadeu Susin
This paper presents the architecture, design, validation, and hardware prototyping of the main architectural blocks of a main profile H.264/AVC decoder, namely inverse transforms and quantization, intra prediction, motion compensation, and deblocking filter. These architectures were designed to reach high throughputs and to be easily integrated with the other H.264/AVC modules. The architectures, all fully H.264/AVC compliant, were completely described in VHDL and further validated through simulations and FPGA prototyping. They were prototyped using a Digilent XUP V2P board, containing a Virtex-II Pro XC2VP30 Xilinx FPGA. The post place-and-route synthesis results indicate that the designed architectures are able to process 114 million samples per second and, in the worst case, 64 HDTV frames (1080×1920) per second, allowing their use in H.264/AVC decoders targeting real-time HDTV applications.
International Midwest Symposium on Circuits and Systems | 2006
Luciano Volcan Agostini; Marcelo Schiavon Porto; José Luís Almada Güntzel; Roger Endrigo Carvalho Porto; Sergio Bampi
This paper presents the design, validation, and prototyping of an H.264/AVC inverse transform and quantization architecture. This architecture was designed to reach high throughputs and to be easily integrated with other H.264/AVC modules. The architecture was completely described in VHDL, and the VHDL code was validated through behavioral and post place-and-route simulations, comparing the data generated by the architecture with the data extracted from the H.264/AVC reference software. Finally, the architecture was prototyped using a Digilent XUP V2P board that contains a Virtex-II Pro VP30 Xilinx FPGA. The architecture mapped to the target FPGA was stimulated on the prototyping board using a PowerPC processor that is hardwired in that FPGA. The prototype was validated, and the results show that the designed architecture works in accordance with the H.264/AVC standard. The post place-and-route synthesis results indicate that the global architecture is able to process 132 million samples per second, allowing its use in H.264/AVC coders and decoders for HDTV.
Great Lakes Symposium on VLSI | 2006
Luciano Volcan Agostini; Roger Endrigo Carvalho Porto; Sergio Bampi; Leandro Rosa; José Luís Almada Güntzel; Ivan Saraiva Silva
This paper presents a high-throughput hardware design for the complete H.264/AVC forward transforms block. There are three different transforms inside this block, and the presented architecture synchronizes them, generating a constant processing rate at its outputs. This is an important characteristic of the architecture, which was designed to be easily integrated with the other H.264/AVC blocks. The architecture does not use memory bits, and the two-dimensional transforms are calculated directly, without using the separability property. The architecture was described in VHDL and was validated and prototyped using a Xilinx Virtex-II Pro FPGA. The synthesis targeted a VP30 FPGA and a TSMC 0.35 μm standard-cell technology. The throughput of the T block architecture for these two technologies is higher than 120 million samples per second, allowing its use in H.264/AVC codecs directed to HDTV.
Digital Systems Design | 2005
Luciano Volcan Agostini; Roger Endrigo Carvalho Porto; Sergio Bampi; Ivan Saraiva Silva
This paper presents the design and implementation of a multiplierless JPEG compressor for grayscale images. The modules of this architecture were fully pipelined and targeted to FPGA device implementation. The designed architectures are detailed in this paper; they were described in VHDL, simulated, and physically mapped to Altera Flex10KE FPGAs. The JPEG compressor pipeline has a minimum latency of 238 clock cycles, given the full modular pipeline depth. The minimum compressor period is 26.6 ns, and the compressor is able to process 37.6 million pixels per second. For example, the compressor can process a 640×480-pixel still image in 8.2 ms, reaching a maximum processing rate of 122.4 frames per second.
Design, Automation, and Test in Europe | 2004
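The throughput figures in the abstract above are mutually consistent if one assumes the pipeline accepts one pixel per clock cycle (an assumption; the abstract does not state it explicitly). A short check of the arithmetic, ignoring the 238-cycle pipeline latency:

```python
# Arithmetic check of the reported figures, assuming one pixel per clock cycle
# (this rate is an assumption that happens to reproduce the abstract's numbers).
period_ns = 26.6                      # minimum clock period from the abstract
pixels_per_frame = 640 * 480          # still image size from the abstract

mpixels_per_s = 1e3 / period_ns                      # 1 / 26.6 ns, in Mpixels/s
frame_time_ms = pixels_per_frame * period_ns / 1e6   # ns converted to ms
frames_per_s = 1e3 / frame_time_ms

print(round(mpixels_per_s, 1), round(frame_time_ms, 1), round(frames_per_s, 1))
```

The rounded values match the 37.6 million pixels per second, 8.2 ms, and 122.4 frames per second quoted above.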
Roger Endrigo Carvalho Porto; Luciano Volcan Agostini
This paper presents a design space exploration of the baseline JPEG compressor proposed and implemented in previous works. The exploration was based on substituting the operators used in the 2-D DCT calculation architecture of the compressor and evaluating the resulting impact on performance and resource utilization. The substitution focused on the carry lookahead, hierarchical carry lookahead, and carry select architectures, with the objective of increasing the JPEG compressor performance. Because the compressor architecture was designed hierarchically, substituting the operators was quite simple, since it did not involve the other hierarchy levels. The operators were described in VHDL, synthesized, and validated. They were then inserted in the 2-D DCT architecture for synthesis of the whole module. The 2-D DCT was synthesized for an Altera FPGA. With this design space exploration, the highest performance obtained for the 2-D DCT was 23% higher than the original, using 11% more logic cells.
Rapid System Prototyping | 2007
Vagner S. Rosa; Wagston Tassoni Staehler; Arnaldo Azevedo; Bruno Zatt; Roger Endrigo Carvalho Porto; Luciano Volcan Agostini; Sergio Bampi; Altamiro Amadeu Susin
This paper presents the prototyping strategy used to validate the designed modules of a main profile H.264/AVC video decoder designed to achieve 1080p HDTV resolution, implemented in an FPGA. All modules were completely described in VHDL and further validated through simulations. The post place-and-route synthesis results indicate that the designed architectures are able to reach real time when processing HDTV 1080p frames (1080×1920). The architectures were prototyped using a Digilent XUP V2P board, containing a Virtex-II Pro XC2VP30 Xilinx FPGA. The prototyping strategy used an embedded PowerPC and associated logic and buffering to control the modules under prototyping. A host computer, running the reference software, was used to generate the input stimuli and to compare the results through an RS-232 serial interface.
Multimedia Signal Processing | 2017
Roger Endrigo Carvalho Porto; Luciano Volcan Agostini; Bruno Zatt; Marcelo Schiavon Porto; Nuno Roma; Leonel Sousa
Energy efficiency has become a primary concern in the design of multimedia digital systems, particularly when targeting mobile devices. Approximate computing is a highly promising approach to address this challenge. This paper presents an architectural exploration in a variable block size motion estimation (VBSME) architecture using imprecise Lower-Part-OR Adders (LOA). These adders were applied to Sum of Absolute Differences units (SAD) in order to reduce the energy consumption while introducing a minimum impact on the coding efficiency. Three VBSME architectures with LOA operators were developed by considering different imprecision levels. The conducted evaluations, performed using the High-Efficiency Video Coding standard (HEVC) reference software, showed that this technique introduces a negligible impact on the coding efficiency (between 0.6% and 2.5% increase of the BD-Rate). Nevertheless, when the designed architectures were synthesized for a 45nm standard cells technology, significant power savings were observed (between 7% and 11.5%, depending on the used LOA version), demonstrating the viability and significant gains of the proposed approach.
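As a rough illustration of the Lower-Part-OR Adder mentioned in the abstract above, the sketch below follows the commonly described LOA formulation: the lower `k` bits of the result are the bitwise OR of the operands, the upper bits use an exact adder, and the carry into the upper part is the AND of the most significant bits of the two lower parts. The exact LOA variants and imprecision levels used in the paper are not given in the abstract, so the function `loa_add` and the choice of `k` are assumptions.

```python
# Behavioral sketch of a Lower-Part-OR Adder (LOA); names and k are assumptions.

def loa_add(a, b, k):
    """Approximate a + b for non-negative integers with k imprecise low bits."""
    if k == 0:
        return a + b                      # no imprecise part: exact addition
    low_mask = (1 << k) - 1
    low = (a | b) & low_mask              # lower part: bitwise OR, no carry chain
    carry_in = (a >> (k - 1)) & (b >> (k - 1)) & 1   # AND of the lower parts' MSBs
    high = (a >> k) + (b >> k) + carry_in            # upper part: exact adder
    return (high << k) | low

def approx_sad(cur, ref, k):
    """Accumulate absolute differences of two pixel lists using the LOA."""
    total = 0
    for c, r in zip(cur, ref):
        total = loa_add(total, abs(c - r), k)
    return total

exact = 5 + 3
approx = loa_add(5, 3, 2)
```

With `k = 2`, adding 5 and 3 yields 7 instead of 8, which shows the kind of small error the technique trades for a shorter carry chain.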