Publication


Featured research published by Bart Pieters.


IEEE Transactions on Circuits and Systems for Video Technology | 2011

Parallel Deblocking Filtering in MPEG-4 AVC/H.264 on Massively Parallel Architectures

Bart Pieters; Charles-Frederik Hollemeersch; Jan De Cock; Peter Lambert; Wesley De Neve; Rik Van de Walle

The deblocking filter in the MPEG-4 AVC/H.264 standard is computationally complex because of its high content adaptivity, resulting in a significant number of data dependencies. These data dependencies interfere with parallel filtering of multiple macroblocks (MBs) on massively parallel architectures. In this letter, we introduce a novel MB partitioning scheme for concurrent deblocking in the MPEG-4 AVC/H.264 standard, based on our idea of deblocking filter independency, a corrected version of the limited error propagation effect proposed in the literature. Our proposed scheme enables concurrent MB deblocking of luma samples with limited synchronization effort, independently of slice configuration, and is compliant with the MPEG-4 AVC/H.264 standard. We implemented the method on the massively parallel architecture of the graphics processing unit (GPU). Experimental results show that our GPU implementation achieves faster-than-real-time deblocking at 1309 frames per second for 1080p video pictures. Both software-based deblocking filters and state-of-the-art GPU-enabled algorithms are outperformed in terms of speed, by factors of up to 10.2 and 19.5, respectively, for 1080p video pictures.
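
The paper's partitioning scheme itself is not reproduced here, but the dependency structure it works around can be illustrated with a baseline wavefront launch: because each macroblock's filtering depends on its left and top neighbours, all macroblocks on one anti-diagonal can be filtered concurrently. The CUDA sketch below shows only that baseline idea; the kernel names, the one-thread-per-MB launch, and the placeholder filter body are illustrative assumptions, not the paper's implementation.

```cuda
// Baseline wavefront sketch (not the paper's partitioning scheme): launch one
// kernel per macroblock anti-diagonal; every MB on that diagonal is filtered
// concurrently because its left and top neighbours were handled earlier.
#include <cuda_runtime.h>

// Placeholder for per-MB luma edge filtering (vertical then horizontal edges).
__device__ void filter_mb_luma(float* luma, int mbx, int mby, int stride) {}

__global__ void deblock_diagonal(float* luma, int diag, int mbs_x, int mbs_y, int stride)
{
    int mbx = blockIdx.x;              // position along the anti-diagonal
    int mby = diag - mbx;
    if (mbx < mbs_x && mby >= 0 && mby < mbs_y)
        filter_mb_luma(luma, mbx, mby, stride);
}

void deblock_frame(float* d_luma, int mbs_x, int mbs_y, int stride)
{
    // One launch per diagonal; per-MB thread parallelism is omitted for brevity.
    for (int diag = 0; diag < mbs_x + mbs_y - 1; ++diag)
        deblock_diagonal<<<diag + 1, 1>>>(d_luma, diag, mbs_x, mbs_y, stride);
    cudaDeviceSynchronize();
}
```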


Proceedings of SPIE | 2009

Motion estimation for H.264/AVC on multiple GPUs using NVIDIA CUDA

Bart Pieters; Charles Hollemeersch; Peter A. Lambert; Rik Van de Walle

Achieving the high coding efficiency the H.264/AVC standard offers makes the encoding process computationally demanding. One of the most intensive encoding phases is motion estimation. Even modern CPUs struggle to process high-definition video sequences in real time. While personal computers are typically equipped with powerful Graphics Processing Units (GPUs) to accelerate graphics operations, these GPUs lie dormant when encoding a video sequence. Furthermore, recent developments show that more and more computer configurations come with multiple GPUs. However, no existing GPU-enabled motion estimation architectures target multiple GPUs. In addition, these architectures provide no early-out behavior, nor can they enforce a specific processing order. We developed a motion search architecture capable of executing motion estimation and partitioning for an H.264/AVC sequence entirely on the GPU using the NVIDIA CUDA (Compute Unified Device Architecture) platform. This paper describes our architecture and presents a novel job scheduling system we designed, making it possible to control the GPU in a flexible way. This job scheduling system can enforce the real-time demands of the video encoder by prioritizing calculations and providing an early-out mode. Furthermore, the job scheduling system allows the use of multiple GPUs in one computer system and efficient load balancing of the motion search over these GPUs. This paper focuses on the execution speed of the novel job scheduling system on both single- and multi-GPU systems. Initial results show that real-time full motion search of 720p high-definition content is possible with a 32-by-32 search window on a system with four GPUs.
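
As an illustration of the motion search workload itself (the job scheduling, multi-GPU load balancing, and early-out logic described above are not shown), the hedged CUDA sketch below performs an exhaustive SAD search for one 16x16 macroblock over a 32-by-32 window, with one thread per candidate vector. All names and the packed-reduction trick are assumptions made for this sketch, not the paper's code.

```cuda
// Hypothetical full-search sketch: one thread block per 16x16 macroblock,
// one thread per candidate vector in a 32x32 window (dx, dy in [-16, 15]).
// Picture dimensions are assumed to be multiples of 16. SAD costs are reduced
// with a packed atomicMin: (SAD << 10) | candidate index.
#include <cuda_runtime.h>
#include <climits>

#define SEARCH 16  // half the window size; assumed to match a 32x32 window

__global__ void full_search_16x16(const unsigned char* cur, const unsigned char* ref,
                                  int width, int height, int mbx, int mby, int* best_mv)
{
    __shared__ int best;                       // packed (SAD << 10) | candidate
    int cand = threadIdx.y * blockDim.x + threadIdx.x;
    if (cand == 0) best = INT_MAX;
    __syncthreads();

    int dx = (int)threadIdx.x - SEARCH;
    int dy = (int)threadIdx.y - SEARCH;
    int ox = mbx * 16, oy = mby * 16;

    int sad = 0;
    for (int y = 0; y < 16; ++y)
        for (int x = 0; x < 16; ++x) {
            int rx = min(max(ox + x + dx, 0), width - 1);   // clamp at picture borders
            int ry = min(max(oy + y + dy, 0), height - 1);
            sad += abs((int)cur[(oy + y) * width + ox + x] - (int)ref[ry * width + rx]);
        }

    atomicMin(&best, (sad << 10) | cand);      // SAD <= 65280, so the packed value fits
    __syncthreads();

    if (cand == (best & 1023)) {               // the winning thread writes its vector
        best_mv[0] = dx;
        best_mv[1] = dy;
    }
}
// Launch per macroblock:
// full_search_16x16<<<1, dim3(32, 32)>>>(d_cur, d_ref, w, h, mbx, mby, d_mv);
```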


Motion in Games | 2011

Hybrid path planning for massive crowd simulation on the GPU

Aljosha Demeulemeester; Charles-Frederik Hollemeersch; Pieter Mees; Bart Pieters; Peter Lambert; Rik Van de Walle

In modern-day games, it is often desirable to have many agents navigating intelligently through detailed environments. However, intelligent navigation remains a computationally expensive and complicated problem. In the past, the continuum crowds algorithm demonstrated the value of using a dynamic potential field to guide many agents to a common goal location. However, this algorithm is prohibitively resource-intensive for real-time applications using large and detailed virtual worlds. In this paper, we propose a novel hybrid system that first uses a coarse A* pathfinding step. This helps to eliminate unnecessary work during potential field generation by excluding areas of the world from the potential field calculation. Additionally, we show how an optimized potential field solver can be implemented on the GPU using the concepts of persistent threads and inter-block communication. Results show that our system achieves considerable speedups compared to existing path planning systems and that up to 100,000 agents can be simulated and rendered in real time on a mainstream GPU.
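
To make the division of labour concrete, here is a hedged sketch of the potential-field stage only: a simple relaxation over the grid cells that the coarse A* pass marked as active, iterated from the host. The paper's optimized solver instead uses persistent threads and inter-block communication, which this sketch does not attempt; the cost model and all names are assumptions.

```cuda
// Hypothetical sketch of the potential-field stage: Jacobi-style relaxation over
// the cells that the coarse A* pass marked as active. The paper's optimized
// solver uses persistent threads and inter-block communication instead of the
// repeated host-side launches shown here.
#include <cuda_runtime.h>

__global__ void relax_potential(const float* in, float* out, const unsigned char* active,
                                const float* cost, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    int i = y * w + x;
    // Cells pruned by A* (and goal cells, assumed marked inactive) keep their value.
    if (!active[i]) { out[i] = in[i]; return; }

    // Take the cheapest neighbour plus the local traversal cost.
    float best = in[i];
    if (x > 0)     best = fminf(best, in[i - 1] + cost[i]);
    if (x < w - 1) best = fminf(best, in[i + 1] + cost[i]);
    if (y > 0)     best = fminf(best, in[i - w] + cost[i]);
    if (y < h - 1) best = fminf(best, in[i + w] + cost[i]);
    out[i] = best;
}

void solve_field(float* d_a, float* d_b, const unsigned char* d_active,
                 const float* d_cost, int w, int h, int iterations)
{
    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    for (int it = 0; it < iterations; ++it) {  // ping-pong between the two buffers
        relax_potential<<<grid, block>>>(d_a, d_b, d_active, d_cost, w, h);
        float* t = d_a; d_a = d_b; d_b = t;
    }
    cudaDeviceSynchronize();
    // Agents then steer along the negative gradient of the converged field.
}
```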


Workshop on Image Analysis for Multimedia Interactive Services | 2007

Motion Compensation and Reconstruction of H.264/AVC Video Bitstreams using the GPU

Bart Pieters; D. Van Rijsselbergen; W. De Neve; R. Van de Walle

Most modern computers are equipped with powerful yet cost-effective graphics processing units (GPUs) to accelerate graphics operations. Although programmable shaders on these GPUs were designed for the creation of 3-D rendering effects, they can also be used as generic processing units for vector data. This paper proposes a hardware renderer capable of executing motion compensation, reconstruction, and visualization entirely on the GPU by the use of vertex and pixel shaders. Our measurements show that a speedup of 297% can be achieved by relying on the processing power of the GPU, relative to the CPU. As an example, real-time playback of high-definition video (1080p) was achieved at 62.0 frames per second, consuming only 68.2% of all CPU cycles on a modern machine.
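
For readers unfamiliar with mapping decoder stages to the GPU, the following sketch shows motion compensation plus reconstruction for one macroblock, written as a CUDA kernel for readability rather than the Direct3D vertex and pixel shaders the paper actually uses. It assumes integer-pel motion vectors and omits the standard's sub-pel interpolation filter, so it illustrates the data flow only.

```cuda
// Hypothetical sketch of motion compensation plus reconstruction for one 16x16
// macroblock. Integer-pel vectors are assumed; sub-pel interpolation is omitted.
#include <cuda_runtime.h>

__global__ void mc_reconstruct(const unsigned char* ref, const short* residual,
                               unsigned char* out, int width, int height,
                               const short* mvs /* 2 per MB: x, y */)
{
    int mbx = blockIdx.x, mby = blockIdx.y;
    int x = mbx * 16 + threadIdx.x;
    int y = mby * 16 + threadIdx.y;
    if (x >= width || y >= height) return;

    int mb = mby * gridDim.x + mbx;
    int rx = min(max(x + mvs[2 * mb], 0), width - 1);   // clamp at picture borders
    int ry = min(max(y + mvs[2 * mb + 1], 0), height - 1);

    // Prediction from the reference picture plus the decoded residual, clipped to 8 bit.
    int pred = ref[ry * width + rx];
    int rec  = pred + residual[y * width + x];
    out[y * width + x] = (unsigned char)min(max(rec, 0), 255);
}
// Launch: mc_reconstruct<<<dim3(width / 16, height / 16), dim3(16, 16)>>>(...);
```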


Proceedings of SPIE | 2007

Performance evaluation of H.264/AVC decoding and visualization using the GPU

Bart Pieters; Dieter Van Rijsselbergen; Wesley De Neve; Rik Van de Walle

The coding efficiency of the H.264/AVC standard makes the decoding process computationally demanding, which has limited the availability of cost-effective, high-performance solutions. Modern computers are typically equipped with powerful yet cost-effective Graphics Processing Units (GPUs) to accelerate graphics operations. These GPUs can be addressed by means of a 3-D graphics API such as Microsoft Direct3D or OpenGL, using programmable shaders as generic processing units for vector data. The new CUDA (Compute Unified Device Architecture) platform from NVIDIA provides a straightforward way to address the GPU directly, without the need for a 3-D graphics API in the middle. In CUDA, a compiler generates executable code from C code with specific modifiers that determine the execution model. This paper first presents our own H.264/AVC renderer, which is capable of executing motion compensation (MC), reconstruction, and color space conversion (CSC) entirely on the GPU. To steer the GPU, Direct3D combined with programmable pixel and vertex shaders is used. Next, we present a GPU-enabled decoder utilizing the new CUDA architecture from NVIDIA. This decoder performs MC, reconstruction, and CSC on the GPU as well. Our results compare both GPU-enabled decoders, as well as a CPU-only decoder, in terms of speed, complexity, and CPU requirements. Our measurements show that a significant speedup is possible relative to a CPU-only solution. As an example, real-time playback of high-definition video (1080p) was achieved with our Direct3D- and CUDA-based H.264/AVC renderers.
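
Of the three stages moved to the GPU, color space conversion is the most self-contained; the sketch below shows it as a CUDA kernel for 4:2:0 YCbCr input. The BT.601 fixed-point coefficients and the packed RGBA output layout are assumptions for illustration; the paper does not state which conversion matrix its renderers use.

```cuda
// Hypothetical CSC kernel: 4:2:0 planar YCbCr to packed RGBA, one thread per pixel,
// using fixed-point BT.601 coefficients (an assumption for this sketch).
#include <cuda_runtime.h>

__global__ void yuv420_to_rgba(const unsigned char* y_plane, const unsigned char* u_plane,
                               const unsigned char* v_plane, unsigned char* rgba,
                               int width, int height)
{
    int x  = blockIdx.x * blockDim.x + threadIdx.x;
    int yy = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || yy >= height) return;

    int c = y_plane[yy * width + x] - 16;
    int d = u_plane[(yy / 2) * (width / 2) + x / 2] - 128;  // chroma is subsampled 2x2
    int e = v_plane[(yy / 2) * (width / 2) + x / 2] - 128;

    int r = (298 * c + 409 * e + 128) >> 8;
    int g = (298 * c - 100 * d - 208 * e + 128) >> 8;
    int b = (298 * c + 516 * d + 128) >> 8;

    unsigned char* p = rgba + 4 * (yy * width + x);
    p[0] = (unsigned char)min(max(r, 0), 255);
    p[1] = (unsigned char)min(max(g, 0), 255);
    p[2] = (unsigned char)min(max(b, 0), 255);
    p[3] = 255;
}
// Launch: yuv420_to_rgba<<<dim3((w + 15) / 16, (h + 15) / 16), dim3(16, 16)>>>(...);
```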


International Conference on Image Processing | 2011

Ultra High Definition video decoding with Motion JPEG XR using the GPU

Bart Pieters; Jan De Cock; Charles-Frederik Hollemeersch; Jeroen Wielandt; Peter Lambert; Rik Van de Walle

Many applications require real-time decoding of high-resolution video pictures, for example, quick editing of sequences in video editing applications. To increase decoding speed, parallelism can be exploited; however, block-based image and video coding standards are difficult to decode in parallel because of the high number of dependencies between blocks. This paper investigates the parallel decoding capabilities of the new JPEG XR image coding standard for use on the massively parallel architecture of the GPU. The parallelization potential of the hierarchical frequency coding scheme used in the standard is addressed, and a parallel decoding scheme is described that is suitable for real-time decoding of Ultra High Definition (4320p) Motion JPEG XR video sequences. Our results show a decoding speed of up to 46 frames per second for Ultra High Definition (4320p) sequences with high-dynamic-range (32-bit/4:2:0) luma and chroma components.
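
The appeal of the hierarchical coefficient layout is that, once entropy decoding and prediction are resolved, the inverse frequency transforms of individual macroblocks can proceed independently. The sketch below only shows that launch layout (one thread block per 16x16 macroblock); the inverse transform itself is placeholdered, and the 8-bit output and coefficient layout are simplifying assumptions rather than the 32-bit high-dynamic-range path evaluated in the paper.

```cuda
// Hypothetical launch layout: one thread block per JPEG XR macroblock, with the
// per-MB inverse frequency transform left as a placeholder.
#include <cuda_runtime.h>

__device__ void inverse_frequency_transform(short* coeffs /* 256 per MB */)
{
    // Placeholder: second-level inverse transform on the LP band followed by
    // first-level 4x4 inverse transforms on each of the 16 sub-blocks.
}

__global__ void decode_macroblocks(short* coeffs, unsigned char* picture,
                                   int mbs_x, int width)
{
    int mb = blockIdx.y * mbs_x + blockIdx.x;   // one thread block per macroblock
    short* mb_coeffs = coeffs + mb * 256;
    if (threadIdx.x == 0) inverse_frequency_transform(mb_coeffs);  // serial placeholder
    __syncthreads();

    // All 256 threads write the reconstructed samples of this macroblock.
    int x = blockIdx.x * 16 + (threadIdx.x % 16);
    int y = blockIdx.y * 16 + (threadIdx.x / 16);
    int s = mb_coeffs[threadIdx.x] + 128;       // assumed bias back to the unsigned range
    picture[y * width + x] = (unsigned char)min(max(s, 0), 255);
}
// Launch: decode_macroblocks<<<dim3(mbs_x, mbs_y), 256>>>(d_coeffs, d_picture, mbs_x, width);
```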


The Visual Computer | 2012

A new approach to combine texture compression and filtering

Charles-Frederik Hollemeersch; Bart Pieters; Peter Lambert; Rik Van de Walle

Texture mapping has been widely used to improve the quality of 3D rendered images. To reduce the storage and bandwidth impact of texture mapping, compression systems are commonly used. To further increase the quality of the rendered images, texture filtering is also often adopted. These two techniques are generally considered to be independent. First, a decompression step is executed to gather texture samples, which is then followed by a separate filtering step. We have investigated a system based on linear transforms that merges both phases together. This allows more efficient decompression and filtering at higher compression ratios. This paper formally presents our approach for any linear transformation, how the commonly used discrete cosine transform can be adapted to this new approach, and how this method can be implemented in real time on current-generation graphics cards using shaders. Through reuse of the existing hardware filtering, fast magnification and minification filtering is achieved. Our implementation provides fully anisotropically filtered samples four to six times faster than an implementation using two separate phases for decompression and filtering. Additionally, our transform-based compression also provides increased and variable compression ratios over standard hardware compression systems at a comparable or better quality level.
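
The core observation, that both the inverse transform and texture filtering are linear and can therefore be folded into one weighted sum over the coefficients, can be illustrated in one dimension. The host-side sketch below (compilable as CUDA C++) compares decompress-then-filter with filtering applied directly to the DCT basis functions; the 8-sample block size and the linear filter are assumptions, and the paper's actual 2D formulation, hardware-filtering reuse, and shader implementation are not reproduced.

```cuda
// 1D illustration of merging decompression and linear filtering: because both
// steps are linear, filtering the reconstructed samples equals evaluating the
// coefficients against filtered basis functions, in a single pass.
#include <cmath>

#define N 8                          // assumed 1D DCT block size
static const float PI = 3.14159265358979f;

// DCT-II basis function k evaluated at sample position t.
static float dct_basis(int k, float t)
{
    float a = (k == 0) ? sqrtf(1.0f / N) : sqrtf(2.0f / N);
    return a * cosf(PI / N * (t + 0.5f) * (float)k);
}

// Reference path: reconstruct the two neighbouring samples, then blend them.
static float filter_after_decode(const float* coeff, float t)
{
    int i0 = (int)t; float f = t - (float)i0;
    float s0 = 0.0f, s1 = 0.0f;
    for (int k = 0; k < N; ++k) {
        s0 += coeff[k] * dct_basis(k, (float)i0);
        s1 += coeff[k] * dct_basis(k, (float)(i0 + 1));
    }
    return (1.0f - f) * s0 + f * s1;
}

// Merged path: apply the same blend to the basis functions instead.
static float filter_in_transform_domain(const float* coeff, float t)
{
    int i0 = (int)t; float f = t - (float)i0;
    float s = 0.0f;
    for (int k = 0; k < N; ++k)
        s += coeff[k] * ((1.0f - f) * dct_basis(k, (float)i0)
                       + f * dct_basis(k, (float)(i0 + 1)));
    return s;                        // identical result, a single pass over the coefficients
}
```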


Conference on Multimedia Modeling | 2012

Real-time visualizations of gigapixel texture data sets using HTML5

Charles-Frederik Hollemeersch; Bart Pieters; Aljosha Demeulemeester; Peter Lambert; Rik Van de Walle

With the recent standardization of WebGL as part of HTML5, new possibilities have arisen for graphically intensive web-based applications. This paper presents our gigapixel texture visualization system, which runs entirely within the limitations of a standards-compatible browser. Compared to existing approaches, our system offers high-performance 3D texture visualization and streaming without any dedicated plugins. We show that real-time performance can be achieved (less than 12 ms render time per frame) on current-generation desktop hardware for texture data sets of at least 15 gigapixels.


Signal Processing: Image Communication | 2012

Data-parallel intra decoding for block-based image and video coding on massively parallel architectures

Bart Pieters; Charles-Frederik Hollemeersch; Jan De Cock; Peter Lambert; Rik Van de Walle

With the increasing number of processor cores available in modern computing architectures, task or data parallelism is required to maximally exploit the available hardware and achieve optimal processing speed. Current state-of-the-art data-parallel processing methods for decoding image and video bitstreams are limited in parallelism by dependencies introduced by the coding tools and the number of synchronization points these dependencies introduce, allowing only task or coarse-grain data parallelism. In particular, entropy decoding and data prediction are bottleneck coding tools for parallel image and video decoding. We propose a new data-parallel processing scheme for block-based intra sample and coefficient prediction that allows fine-grain parallelism and is suitable for integration in current and future state-of-the-art image and video codecs. Our prediction scheme enables maximum concurrency, independent of slice or tile configuration, while minimizing synchronization points. This paper describes our data-parallel processing scheme for one- and two-dimensional prediction and investigates its application to block-based image and video codecs using JPEG XR and H.264/AVC Intra as a starting point. We show how our scheme enables faster decoding than the state-of-the-art wavefront method, with speedup factors of up to 21.5 and 7.9 for the JPEG XR and H.264/AVC Intra coding tools, respectively. Using the H.264/AVC Intra coding tool, we discuss the requirements of the algorithm and the impact on decoded image quality when these requirements are not met. Finally, we discuss the impact on coding rate needed to allow for optimal parallel intra decoding.
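
The paper's prediction scheme itself is not reproduced here, but the general principle that a one-dimensional left-neighbour prediction chain need not be serial can be illustrated with a prefix sum: reconstructing absolute values from differentially predicted ones is a scan, for which standard data-parallel algorithms exist. The CUDA sketch below shows a Hillis-Steele scan over one block row under that (assumed) differential-DC model; it illustrates the principle only, not the scheme proposed in the paper.

```cuda
// Hypothetical illustration: reconstructing per-block DC values from transmitted
// left-neighbour differences is an inclusive prefix sum, computed here with a
// Hillis-Steele scan over one block row (n <= blockDim.x assumed).
#include <cuda_runtime.h>

__global__ void reconstruct_dc_row(const int* diff, int* dc, int n)
{
    extern __shared__ int buf[];
    int i = threadIdx.x;
    if (i < n) buf[i] = diff[i];
    __syncthreads();

    // After log2(n) steps, buf[i] holds the sum of diff[0..i].
    for (int offset = 1; offset < n; offset <<= 1) {
        int v = (i >= offset && i < n) ? buf[i - offset] : 0;
        __syncthreads();
        if (i < n) buf[i] += v;
        __syncthreads();
    }
    if (i < n) dc[i] = buf[i];   // absolute DC values of the row
}
// Launch: reconstruct_dc_row<<<1, blocks_per_row, blocks_per_row * sizeof(int)>>>(d_diff, d_dc, blocks_per_row);
```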


Computers & Graphics | 2010

Graphics for Serious Games: Infinitex: An interactive editing system for the production of large texture data sets

Charles-Frederik Hollemeersch; Bart Pieters; Aljosha Demeulemeester; Frederik Cornillie; Bert Van Semmertier; Erik Mannens; Peter Lambert; Piet Desmet; Rik Van de Walle

Recent advancements in graphics hardware have made the use of texture streaming methods feasible for real-time applications. Using these methods, not only can texture resolution and detail be increased up to gigapixel resolution, but, when used together with well-authored textures, these techniques can offer dramatically improved visual quality. However, systems aiding in texture data production itself have received far less attention than the streaming and rendering problem. When used in a texture streaming environment, current production methods tend to break down and reduce artist efficiency to the point where the technology is no longer used to its full potential. In this paper, we describe the details behind our Infinitex system. Infinitex is a texture creation and editing system that allows users, i.e., artists, to produce large texture data sets in an intuitive and interactive way. Our system goes beyond a simple editor, as it incorporates the whole production process from the initial empty environment to the final finished product and addresses all the challenges that arise along the way when producing gigabytes of texture data. In particular, we focus on versioning, management, continuity, and security. We show how our system, through the use of just-in-time tile generation, offers interactive editing and management operations while meeting all the other constraints imposed on the system.
