Roberto R. Osorio
University of A Coruña
Publication
Featured research published by Roberto R. Osorio.
IEEE Transactions on Circuits and Systems for Video Technology | 2006
Roberto R. Osorio; Javier D. Bruguera
New image and video coding standards have pushed the limits of compression by introducing new techniques with high computational demands. The Advanced Video Coder (ITU-T H.264, AVC MPEG-4 Part 10) is the latest international standard, and its enhanced features require new levels of performance. Among the new tools present in AVC, the context-based binary arithmetic coder (CABAC) offers a significant compression advantage over baseline entropy coders. CABAC is meant to be used in AVC's Main and High Profiles, which target broadcast, storage, and distribution of standard- and high-definition content. In these applications, hardware acceleration is needed because the computational load of CABAC is high, challenging programmable processors. Moreover, rate-distortion optimization (RDO) increases CABAC's load by two orders of magnitude. In this paper, we present a new, fast architecture for arithmetic coding adapted to the characteristics of CABAC, including optimized memory use and context management, and fast processing able to encode more than two symbols per cycle. A maximum processing speed of 185 MHz has been obtained in 0.35 µm technology, enough to encode high-quality video in real time. Some of the proposed optimizations may also be applied to software implementations, obtaining significant improvements.
digital systems design | 2004
Roberto R. Osorio; Javier D. Bruguera
In this paper we propose an efficient implementation of CABAC's binary arithmetic coder and context management system. CABAC is the context adaptive binary arithmetic coder used in the new H.264/AVC video standard. Arithmetic coding allows a significant enhancement in compression. However, implementation complexity is a drawback due to hardware cost and low speed. In this paper we show the need for a hardware implementation of arithmetic coding in current video compression systems. We propose a fast and efficient implementation of the encoding algorithm. We show that memory accesses constitute a bottleneck and propose solutions that apply to the encoding algorithm and context management system. As a result, a fast architecture is presented, able to process one symbol per cycle.
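As a rough illustration of the recurrence behind binary arithmetic coding with an adaptive context, the following Python sketch narrows an interval once per bit and updates a probability estimate after each symbol. It is not the architecture described in the abstract and not the table-driven CABAC procedure of H.264: it uses exact rational arithmetic, a single count-based context, and hypothetical names (`AdaptiveBitModel`, `encode`, `decode`) chosen only for this example.

```python
from fractions import Fraction


class AdaptiveBitModel:
    """Hypothetical context model: estimates P(bit = 0) from running counts."""

    def __init__(self):
        self.counts = [1, 1]  # smoothed counts of zeros and ones

    def p_zero(self):
        return Fraction(self.counts[0], self.counts[0] + self.counts[1])

    def update(self, bit):
        self.counts[bit] += 1


def encode(bits):
    """Narrow [low, low + width) once per bit; return a value inside the final interval."""
    low, width = Fraction(0), Fraction(1)
    model = AdaptiveBitModel()
    for bit in bits:
        split = width * model.p_zero()  # size of the sub-interval assigned to 0
        if bit == 0:
            width = split
        else:
            low += split
            width -= split
        model.update(bit)  # the next step depends on this update
    return low + width / 2


def decode(value, n_bits):
    """Replay the same interval subdivision to recover n_bits bits from value."""
    low, width = Fraction(0), Fraction(1)
    model = AdaptiveBitModel()
    bits = []
    for _ in range(n_bits):
        split = width * model.p_zero()
        if value < low + split:
            bit = 0
            width = split
        else:
            bit = 1
            low += split
            width -= split
        bits.append(bit)
        model.update(bit)
    return bits


message = [0, 0, 1, 0, 0, 0, 1, 0]
code_value = encode(message)
recovered = decode(code_value, len(message))
```

Each iteration reads the model, updates the interval, and writes the model back before the next bit can be handled. That serial dependency, together with the per-symbol context read and write, is plausibly the kind of memory bottleneck the abstract refers to.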
digital systems design | 2005
Roberto R. Osorio; Javier D. Bruguera
In this work, a new architecture for binary arithmetic coding is presented in the context of the new AVC/H.264 standard for video coding. Among the new technologies included in AVC/H.264, a context adaptive binary arithmetic coder (CABAC) is used that significantly outperforms the baseline entropy coder. In this work we justify the need for a new architecture that implements the unique characteristics of CABAC that are not found in other implementations of arithmetic coding. We show that a fast architecture is needed that combines short cycle time and application-aware scheduling in order to meet the high computational demands. A number of optimizations are introduced that allow processing several symbols per cycle and reduce data binarization overhead. Implementation results are shown for a Virtex-II FPGA and the main conclusions are presented.
international conference on application specific array processors | 1995
Roberto R. Osorio; Elisardo Antelo; Javier D. Bruguera; Julio Villalba; E.L. Zapata
Many applications require the evaluation of rotations at high speeds. However, there is a trade-off between chip area and latency. In this paper we develop a digit on-line pipelined array architecture based on the radix-4 CORDIC algorithm in rotation mode. The radix-4 CORDIC algorithm halves the number of microrotations with respect to the traditional radix-2 algorithm, with the drawback of a non-constant scale factor. Seeking a good compromise between silicon area and latency, we have used digit on-line processing. This way, the data enter the processor in blocks of bits (digits) in MSD-first mode. We have used redundant carry-save arithmetic to allow carry-free additions and on-line processing. The designed processor achieves better performance than previous digit on-line architectures.
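For readers unfamiliar with CORDIC in rotation mode, the sketch below shows the traditional radix-2 form that the abstract uses as its baseline. It is a floating-point illustration only: the radix-4, digit on-line, carry-save design of the paper is not reproduced, and the function name `cordic_rotate` and its parameters are assumptions made for this example.

```python
import math


def cordic_rotate(x, y, angle, iterations=32):
    """Rotate (x, y) by angle in radians (|angle| below about 1.74) with radix-2 CORDIC."""
    # Every microrotation uses a direction of +1 or -1, so the accumulated
    # gain is the same for any angle and can be computed independently.
    gain = 1.0
    for i in range(iterations):
        gain *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        # Shift-and-add microrotation by atan(2^-i); uses the old x and y.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    return x * gain, y * gain


rx, ry = cordic_rotate(1.0, 0.0, math.pi / 6)
```

Because the direction `d` is never zero here, the scale factor is a constant. In a radix-4 scheme each step selects from a larger digit set, which is why the abstract lists a non-constant scale factor as the price of halving the number of microrotations.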
New Generation Computing | 2013
Iván Cores; Gabriel Rodríguez; María J. Martín; Patricia González; Roberto R. Osorio
The execution times of large-scale parallel applications on today's multi/many-core systems are usually longer than the mean time between failures. Therefore, parallel applications must tolerate hardware failures to ensure that not all computation done is lost on machine failures. Checkpointing and rollback recovery is one of the most popular techniques to implement fault-tolerant applications. However, checkpointing parallel applications is expensive in terms of computing time, network utilization and storage resources. Thus, current checkpoint-recovery techniques should minimize these costs in order to be useful for large-scale systems. In this paper, three different and complementary techniques to reduce the size of the checkpoints generated by application-level checkpointing are proposed and implemented. Detailed experimental results obtained on a multicore cluster show the effectiveness of the proposed methods in reducing checkpointing cost.
application specific systems architectures and processors | 2008
Roberto R. Osorio; Javier D. Bruguera
Arithmetic coding is an efficient entropy compression method that achieves results close to the entropy limit, and it is used in modern standards such as JPEG-2000 and H.264. Arithmetic decoding (AD) in the H.264 video coding standard is a sequential task that takes a significant part of computing time. In present and future multicore and manycore systems, AD becomes a bottleneck as it cannot be parallelized, limiting the concurrent execution of other tasks. In this paper, an FPGA-based accelerator is proposed to speed up AD in H.264 and enable parallel decoding at macroblock and frame levels, scaling up to tens or hundreds of cores.
Eurasip Journal on Image and Video Processing | 2011
Alejandro Nieto; Victor M. Brea; David López Vilariño; Roberto R. Osorio
This paper examines the implementation of a retinal vessel tree extraction technique on different hardware platforms and architectures. Retinal vessel tree extraction is a representative application of those found in the domain of medical image processing. The low signal-to-noise ratio of the images leads to a large number of low-level tasks in order to meet the accuracy requirements. In some applications, this might compromise computing speed. This paper focuses on assessing the performance of a retinal vessel tree extraction method on different hardware platforms. In particular, the retinal vessel tree extraction method is mapped onto a massively parallel SIMD (MP-SIMD) chip, a massively parallel processor array (MPPA), and a field-programmable gate array (FPGA).
digital systems design | 2006
Javier D. Bruguera; Roberto R. Osorio
AVC/H.264 is the new international standard for video coding jointly developed by ISO-MPEG and ITU-T, which offers a substantial compression gain when compared with H.263 and MPEG-4 simple profile. One of the main characteristics of H.264 is the introduction of an integer version of the discrete cosine transform, initially applied to 4×4 pixel blocks and later extended to 8×8 pixel blocks for high-quality video encoding. In this work, a unified architecture is proposed for parallel 8×8 integer DCT and iDCT, also able to process the 4×4 DCT, iDCT and Hadamard transform. A very fast quantization/de-quantization scheme is presented, based on prediction, that allows parallel quantization with a single multiplier. This architecture also implements all-zero detection, eliminating coefficients with high cost as specified in the standard, and anticipates entropy encoding. The proposed design has been synthesized in AMS 0.35 µm technology and achieves a maximum speed of 67 MHz.
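The 4×4 integer transform mentioned in the abstract can be illustrated with the widely documented H.264 forward core matrix. The sketch below computes Y = C·X·Cᵀ in plain integer arithmetic and leaves out the scaling that the standard folds into quantization. It does not model the unified 8×8 architecture or the prediction-based quantizer of the paper, and the helper names are assumptions for this example.

```python
# Forward core transform matrix of the H.264 4x4 integer transform.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Return the transpose of a 4x4 matrix."""
    return [[m[j][i] for j in range(4)] for i in range(4)]


def forward_core_transform(block):
    """Compute CF * block * CF^T using only integer additions and multiplications."""
    return matmul(matmul(CF, block), transpose(CF))


flat_block = [[3] * 4 for _ in range(4)]
coeffs = forward_core_transform(flat_block)
# A flat block puts all of its energy in the DC coefficient coeffs[0][0].
```

Since every matrix entry is ±1 or ±2, a hardware implementation needs only additions and one-bit shifts, which is what makes the transform attractive for a unified datapath.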
application-specific systems, architectures, and processors | 1997
Roberto R. Osorio; Javier D. Bruguera
In this paper we present new VLSI architectures for the arithmetic encoding and decoding of multilevel images. In these algorithms the speed is limited by their recursive nature and by the arithmetic and memory access operations. These become especially critical in the case of decoding. In order to reduce the cycle length, we propose working with two executions of the algorithm which alternate in the use of the pipelined hardware with a minimum increase in its cost.
digital systems design | 2006
Roberto R. Osorio; Javier D. Bruguera
In this paper, a new technique is presented that combines memory compression in video encoders with fast and efficient motion estimation (ME). This technique is mainly oriented to embedded systems, which demand simple and power-aware algorithms. Video encoding needs increasing amounts of memory for storing reference pictures. Memory compression reduces the footprint of the application, lowering the total implementation cost. In this paper, we combine memory compression and hierarchical ME so that the overhead associated with implementing both techniques is shared. Thus, a net gain in processing speed is obtained, while reducing costs and power consumption.