Mauricio Alvarez-Mesa

Publication

Featured research published by Mauricio Alvarez-Mesa.

IEEE Transactions on Circuits and Systems for Video Technology | 2012

Parallel Scalability and Efficiency of HEVC Parallelization Approaches

Chi Ching Chi; Mauricio Alvarez-Mesa; Ben H. H. Juurlink; Gordon Clare; Félix Henry; Stéphane Pateux; Thomas Schierl

Unlike H.264/advanced video coding, where parallelism was an afterthought, High Efficiency Video Coding currently contains several proposals aimed at making it more parallel-friendly. A performance comparison of the different proposals, however, has not yet been performed. In this paper, we will fill this gap by presenting efficient implementations of the most promising parallelization proposals, namely tiles and wavefront parallel processing (WPP). In addition, we present a novel approach called overlapped wavefront (OWF), which achieves higher performance and efficiency than tiles and WPP. Experiments conducted on a 12-core system running at 3.33 GHz show that our implementations achieve average speedups, for 4k sequences, of 8.7, 9.3, and 10.7 for WPP, tiles, and OWF, respectively.
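The speedups quoted above can be turned into parallel efficiency figures. The sketch below assumes the common definition of efficiency as speedup divided by core count; the paper may define efficiency differently, and the function name is illustrative.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Parallel efficiency: achieved speedup divided by the number of cores."""
    return speedup / cores


CORES = 12  # core count reported in the abstract
speedups_4k = {"WPP": 8.7, "tiles": 9.3, "OWF": 10.7}  # values from the abstract

for name, speedup in speedups_4k.items():
    efficiency = parallel_efficiency(speedup, CORES)
    print(f"{name}: speedup {speedup:.1f}, efficiency {efficiency:.1%}")
```

Under this definition the three approaches reach roughly 72.5%, 77.5%, and 89.2% efficiency on 12 cores.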

International Conference on Acoustics, Speech, and Signal Processing | 2012

Parallel video decoding in the emerging HEVC standard

Mauricio Alvarez-Mesa; Chi Ching Chi; Ben H. H. Juurlink; Valeri George; Thomas Schierl

In this paper we propose and evaluate a parallelization strategy for the emerging HEVC video coding standard. The proposed strategy is based on entropy slices, which allow parallelism to be exploited in the entropy decoding stage while maintaining high coding efficiency. Our approach requires videos to be encoded with one entropy slice per LCU row so that multiple LCU rows can be decoded in a wavefront parallel manner. Evaluations performed on a PC with 12 Intel Xeon cores running at 3.3 GHz show that it is possible to achieve real-time performance for 1920×1080p50 (53.1 fps) and 2560×1600 (29.5 fps) video resolutions, with speedups of 5.2× and 6.3× over sequential execution, respectively.
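The wavefront idea can be illustrated with a small scheduling model. This is a sketch under stated assumptions, not the paper's decoder: each LCU takes one time step, there are enough threads for every row, and an LCU waits for its left neighbour and its top-right neighbour (the usual wavefront dependency pattern). The function name and the picture size are hypothetical.

```python
def wavefront_finish_times(rows: int, cols: int) -> list[list[int]]:
    """Earliest finish step of every block under wavefront dependencies.

    Assumptions (illustrative): unit time per block, unlimited threads, and
    block (r, c) waits for (r, c-1) and (r-1, c+1), clamped to the last column.
    """
    finish = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            deps = []
            if c > 0:
                deps.append(finish[r][c - 1])  # left neighbour in the same row
            if r > 0:
                deps.append(finish[r - 1][min(c + 1, cols - 1)])  # top-right
            finish[r][c] = max(deps, default=0) + 1
    return finish


rows, cols = 3, 4  # hypothetical tiny picture: 3 LCU rows of 4 LCUs each
finish = wavefront_finish_times(rows, cols)
total_steps = finish[-1][-1]
print(total_steps)                # 8 steps, i.e. cols + 2 * (rows - 1)
print(rows * cols / total_steps)  # 1.5 blocks completed per step on average
```

Each row starts two blocks behind the row above it, so parallelism ramps up at the start of a picture and ramps down at the end.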

International Symposium on Performance Analysis of Systems and Software | 2013

How a single chip causes massive power bills GPUSimPow: A GPGPU power simulator

Jan Lucas; Sohan Lal; Michael Andersch; Mauricio Alvarez-Mesa; Ben H. H. Juurlink

Modern GPUs are true power houses in every meaning of the word: While they offer general-purpose (GPGPU) compute performance an order of magnitude higher than that of conventional CPUs, they have also been rapidly approaching the infamous “power wall”, as a single chip sometimes consumes more than 300W. Thus, the design space of GPGPU microarchitecture has been extended by another dimension: power. While GPU researchers have previously relied on cycle-accurate simulators for estimating performance during design cycles, there are no simulation tools that include power as well. To mitigate this issue, we introduce the GPUSimPow power estimation framework for GPGPUs consisting of both analytical and empirical models for regular and irregular hardware components. To validate this framework, we build a custom measurement setup to obtain power numbers from real graphics cards. An evaluation on a set of well-known benchmarks reveals an average relative error of 11.7% between simulated and hardware power for GT240 and an average relative error of 10.8% for GTX580. The simulator has been made available to the public [1].
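An average relative error of the kind reported above can be computed as in the following sketch. It assumes the mean of per-benchmark absolute relative errors, which may not match the paper's exact formula; the power values are hypothetical and do not come from the paper.

```python
def average_relative_error(simulated: list[float], measured: list[float]) -> float:
    """Mean of |simulated - measured| / measured over all benchmarks."""
    if len(simulated) != len(measured) or not measured:
        raise ValueError("need two non-empty lists of equal length")
    errors = [abs(s - m) / m for s, m in zip(simulated, measured)]
    return sum(errors) / len(errors)


# Hypothetical power numbers in watts for three benchmarks (not from the paper).
simulated_w = [100.0, 45.0, 80.0]
measured_w = [110.0, 50.0, 80.0]
print(f"{average_relative_error(simulated_w, measured_w):.1%}")
```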

Signal Processing Systems | 2013

Parallel HEVC Decoding on Multi- and Many-core Architectures

Chi Ching Chi; Mauricio Alvarez-Mesa; Jan Lucas; Ben H. H. Juurlink; Thomas Schierl

The Joint Collaborative Team on Video Coding is developing a new standard named High Efficiency Video Coding (HEVC) that aims at reducing the bitrate of H.264/AVC by another 50%. In order to fulfill the computational demands of the new standard, in particular for high resolutions and at low power budgets, exploiting parallelism is no longer an option but a requirement. Therefore, HEVC includes several coding tools that allow each picture to be divided into several partitions that can be processed in parallel, without degrading either the quality or the bitrate. In this paper we adapt one of these approaches, Wavefront Parallel Processing (WPP) coding, and show how it can be implemented on multi- and many-core processors. Our approach, named Overlapped Wavefront (OWF), processes several partitions as well as several pictures in parallel. This has the advantage that the amount of (thread-level) parallelism stays constant during execution. In addition, performance and power results are provided for three platforms: a server Intel CPU with 8 cores, a laptop Intel CPU with 4 cores, and a TILE-Gx36 with 36 cores from Tilera. The results show that our parallel HEVC decoder is capable of achieving an average frame rate of 116 fps for 4k resolution on a standard multicore CPU. The results also demonstrate that exploiting more parallelism by increasing the number of cores can substantially improve energy efficiency, measured in Joules per frame.
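The Joules-per-frame metric mentioned above is average power divided by frame rate. The sketch below shows the arithmetic; the function name and the power and frame-rate numbers are hypothetical and are not results from the paper.

```python
def joules_per_frame(avg_power_w: float, frames_per_second: float) -> float:
    """Energy per decoded frame: watts (J/s) divided by frames per second."""
    if frames_per_second <= 0:
        raise ValueError("frame rate must be positive")
    return avg_power_w / frames_per_second


# Hypothetical configurations (not from the paper): doubling the core count
# raises power, but the frame rate rises faster, so energy per frame drops.
configs = {"4 cores": (40.0, 30.0), "8 cores": (60.0, 58.0)}
for name, (power_w, fps) in configs.items():
    print(f"{name}: {joules_per_frame(power_w, fps):.2f} J/frame")
```

More cores lower the energy per frame only when the frame rate grows faster than the power draw, which is the case in this made-up example.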

IEEE Transactions on Circuits and Systems for Video Technology | 2015

SIMD Acceleration for HEVC Decoding

Chi Ching Chi; Mauricio Alvarez-Mesa; Benjamin Bross; Ben H. H. Juurlink; Thomas Schierl

Single instruction multiple data (SIMD) instructions have been commonly used to accelerate video codecs. The recently introduced High Efficiency Video Coding (HEVC) codec, like its predecessors, is based on the hybrid video codec principle and is therefore also well suited to SIMD acceleration. In this paper we present SIMD optimizations for the entire HEVC decoder for all major SIMD instruction set architectures. Evaluation has been performed on 14 mobile and PC platforms covering most major architectures released in recent years. With SIMD, up to 5× speedup can be achieved over the entire HEVC decoder, resulting in up to 133 and 37.8 frames/s on average on a single core for Main profile 1080p and Main10 profile 2160p sequences, respectively.

Scalable Parallel Programming Applied to H.264/AVC Decoding | 2012

Scalable Parallel Programming Applied to H.264/AVC Decoding

Ben H. H. Juurlink; Mauricio Alvarez-Mesa; Chi Ching Chi; Arnaldo Azevedo; Cor Meenderinck; Alex Ramirez

Existing software applications should be redesigned if programmers want to benefit from the performance offered by multi- and many-core architectures. Performance scalability now depends on the possibility of finding and exploiting enough Thread-Level Parallelism (TLP) in applications to use the increasing number of cores on a chip. Video decoding is an example of an application domain whose computational requirements increase with every new generation. This is due, on the one hand, to the trend towards high quality video systems (high definition and frame rate, 3D displays, etc.) that results in a continuous increase in the amount of data that has to be processed in real time. On the other hand, there is the requirement to maintain high compression efficiency, which is only possible with video codecs like H.264/AVC that use advanced coding techniques. In this book, the parallelization of H.264/AVC decoding is presented as a case study of parallel programming. H.264/AVC decoding is an example of a complex application with many levels of dependencies, different kernels, and irregular data structures. The book presents a detailed methodology for parallelizing this type of application. It begins with a description of the algorithm, an analysis of the data dependencies, and an evaluation of the different parallelization strategies. Then the design and implementation of a novel parallelization approach is presented that is scalable to many-core architectures. Experimental results on different parallel architectures are discussed in detail. Finally, an outlook is given on parallelization opportunities in the upcoming HEVC standard.

International Conference on Image Processing | 2012

Improving the parallelization efficiency of HEVC decoding

Chi Ching Chi; Mauricio Alvarez-Mesa; Ben H. H. Juurlink; Valeri George; Thomas Schierl

In this paper we present a new parallelization approach for HEVC decoding called Overlapped Wavefront (OWF). It is based on wavefront processing and improves its parallelization efficiency by allowing overlapped execution of consecutive pictures. Furthermore, in this strategy all of the decoding steps are performed on a CTB basis rather than on a picture basis, which improves data locality. Our implementation achieves 29.6%, 42.4%, and 66.6% higher frame rates compared to previous results and 11.3%, 21.0%, and 38.0% higher frame rates compared to Tiles, for 2160p, 1600p, and 1080p, respectively.

Proceedings of SPIE | 2013

HEVC real-time decoding

Benjamin Bross; Mauricio Alvarez-Mesa; Valeri George; Chi Ching Chi; Tobias Mayer; Ben H. H. Juurlink; Thomas Schierl

The new High Efficiency Video Coding Standard (HEVC) was finalized in January 2013. Compared to its predecessor H.264/MPEG-4 AVC, this new international standard is able to reduce the bitrate by 50% for the same subjective video quality. This paper investigates decoder optimizations that are needed to achieve HEVC real-time software decoding on a mobile processor. It is shown that HEVC real-time decoding up to high definition video is feasible using instruction extensions of the processor, while decoding 4K ultra high definition video in real time requires additional parallel processing. For parallel processing, a picture-level parallel approach has been chosen because it is generic and does not require bitstreams with special indication.

ACM Transactions on Architecture and Code Optimization | 2015

Low-Power High-Efficiency Video Decoding using General-Purpose Processors

Chi Ching Chi; Mauricio Alvarez-Mesa; Ben H. H. Juurlink

In this article, we investigate how code optimization techniques and low-power states of general-purpose processors improve the power efficiency of HEVC decoding. The power and performance efficiency of the use of SIMD instructions, multicore architectures, and low-power active and idle states are analyzed in detail for offline video decoding. In addition, the power efficiency of techniques such as “race to idle” and “exploiting slack” with DVFS is evaluated for real-time video decoding. Results show that “exploiting slack” is more power efficient than “race to idle” for all evaluated platforms representing smartphone, tablet, laptop, and desktop computing systems.
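The two strategies can be compared with a simple per-frame energy model. This is a sketch under stated assumptions rather than the article's methodology: power is constant within each state, state transitions are free, and all numbers and function names are hypothetical.

```python
def race_to_idle_energy(p_active_w: float, t_active_s: float,
                        p_idle_w: float, period_s: float) -> float:
    """Decode at full speed, then idle until the next frame deadline."""
    if t_active_s > period_s:
        raise ValueError("decoding misses the frame deadline")
    return p_active_w * t_active_s + p_idle_w * (period_s - t_active_s)


def exploit_slack_energy(p_scaled_w: float, period_s: float) -> float:
    """Lower frequency and voltage so decoding fills the whole frame period."""
    return p_scaled_w * period_s


period_s = 1 / 25  # hypothetical 25 fps playback, 40 ms per frame
# Hypothetical platform: 10 W for 20 ms at full speed, 1 W idle, 4 W when scaled.
race = race_to_idle_energy(p_active_w=10.0, t_active_s=0.02,
                           p_idle_w=1.0, period_s=period_s)
slack = exploit_slack_energy(p_scaled_w=4.0, period_s=period_s)
print(f"race to idle: {race:.2f} J/frame, exploiting slack: {slack:.2f} J/frame")
```

In this model the winner depends entirely on the chosen power numbers. Exploiting slack comes out ahead when lowering frequency and voltage cuts active power by more than the decoding time grows.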

Archive | 2012

Understanding the Application: An Overview of the H.264 Standard

Ben H. H. Juurlink; Mauricio Alvarez-Mesa; Chi Ching Chi; Arnaldo Azevedo; Cor Meenderinck; Alex Ramirez

Before any attempt to parallelize an application can be made, it is necessary to understand the application. Therefore, in this chapter we present a brief overview of the state-of-the-art H.264/AVC video coding standard. The H.264/AVC standard is based on the same hybrid structure as previous standards, but contains several new coding tools that increase the coding efficiency and quality. These new features increase the computational complexity of video encoding as well as decoding, however. Therefore, parallelism is a solution to obtain the performance required for real-time processing. The goal of this chapter is not to provide a detailed overview of H.264/AVC, but to provide sufficient background to be able to understand the remaining chapters.
