Svetislav Momcilovic | Researchain

Publication

Featured research published by Svetislav Momcilovic.

IEEE Transactions on Multimedia | 2014

Dynamic Load Balancing for Real-Time Video Encoding on Heterogeneous CPU+GPU Systems

Svetislav Momcilovic; Aleksandar Ilic; Nuno Roma; Leonel Sousa

The high computational demands and overall encoding complexity make real-time processing of high-definition video sequences hard to achieve. In this manuscript, we target an efficient parallelization and RD performance analysis of H.264/AVC inter-loop modules and their collaborative execution in hybrid multi-core CPU and multi-GPU systems. The proposed dynamic load balancing algorithm allows efficient and concurrent video encoding across several heterogeneous devices by relying on realistic run-time performance modeling and module-device execution affinities when distributing the computations. Because load balancing decisions are adjusted online, this approach also adapts itself to different execution scenarios. Experimental results show the proposed algorithm's ability to achieve real-time encoding for different resolutions of high-definition sequences on various heterogeneous platforms. Speed-up values of up to 2.6 were obtained when compared to the video inter-loop encoding on a single GPU device, and up to 8.5 when compared to a highly optimized multi-core CPU execution. Moreover, the proposed algorithm also provides automatic tuning of the encoding parameters in order to meet strict encoding constraints.
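
The idea of distributing work according to measured run-time performance can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names, the device names, the row-based partitioning, and the smoothing factor `alpha` are all assumptions made for illustration.

```python
def split_rows(total_rows, seconds_per_row):
    """Assign macroblock rows to devices in proportion to measured speed.

    seconds_per_row maps a (hypothetical) device name to the time it
    needed per row in the previous frame.
    """
    speeds = {dev: 1.0 / t for dev, t in seconds_per_row.items()}
    total_speed = sum(speeds.values())
    # Floor each proportional share, then fix up the rounding loss.
    shares = {dev: int(total_rows * s / total_speed) for dev, s in speeds.items()}
    leftover = total_rows - sum(shares.values())
    # Rows lost to rounding go to the fastest devices first.
    for dev in sorted(speeds, key=speeds.get, reverse=True)[:leftover]:
        shares[dev] += 1
    return shares


def update_timing(old, measured, alpha=0.5):
    """Blend the newest measurement into the running estimate (online adjustment)."""
    return (1.0 - alpha) * old + alpha * measured
```

For example, `split_rows(68, {"cpu": 0.004, "gpu0": 0.001, "gpu1": 0.002})` divides 68 rows so that the fastest device receives the largest share; calling `update_timing` after each frame lets the split follow changes in device performance.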

EURASIP Journal on Embedded Systems | 2007

Adaptive motion estimation processor for autonomous video devices

Tiago Dias; Svetislav Momcilovic; Nuno Roma; Leonel Sousa

Motion estimation is the most demanding operation of a video encoder, corresponding to at least 80% of the overall computational cost. As a consequence, with the proliferation of autonomous and portable handheld devices that support digital video coding, data-adaptive motion estimation algorithms are required to dynamically configure the search pattern, not only to avoid unnecessary computations and memory accesses but also to save energy. This paper proposes an application-specific instruction set processor (ASIP) to implement data-adaptive motion estimation algorithms, characterized by a specialized datapath and a minimal, optimized instruction set. Due to its low-power nature, this architecture is highly suitable for developing motion estimators for portable, mobile, and battery-supplied devices. Based on the proposed architecture and the considered adaptive algorithms, several motion estimators were synthesized both for a Virtex-II Pro XC2VP30 FPGA from Xilinx, integrated within an ML310 development platform, and using a StdCell library based on a 0.18 μm CMOS process. Experimental results show that the proposed architecture is able to estimate motion vectors in real time for QCIF and CIF video sequences with very low power consumption. Moreover, it is also able to adapt its operation to the available energy level at run time. By adjusting the search pattern and setting up a more convenient operating frequency, it can vary its power consumption between 1.6 mW and 15 mW.

Signal Processing Systems | 2009

Development and evaluation of scalable video motion estimators on GPU

Svetislav Momcilovic; Leonel Sousa

This work proposes a scalable parallelization approach for H.264/AVC motion estimation on multi-cores, and its efficient implementation on the most recent Graphics Processing Units (GPUs). Very efficient motion estimators are achieved by applying efficient data reuse techniques and exploiting the computational power of the most recent GPUs. The proposed motion estimators have been programmed on a GPU with the Tesla architecture using CUDA. Experimental results show that the proposed approach outperforms the motion estimators presented in the most recent publications by more than 3 times. Moreover, real-time motion estimation is achieved even for 720×576 resolution at 25 frames per second. The scalability of the solution is shown by implementing the motion estimators on two GPUs with the same architecture but a different number of cores. Therefore, the proposed approach is also useful for more powerful future GPUs.

Complex, Intelligent and Software Intensive Systems | 2008

A Parallel Algorithm for Advanced Video Motion Estimation on Multicore Architectures

Svetislav Momcilovic; Leonel Sousa

The new advanced video coding (AVC) standards further exploit temporal correlation between images in a sequence by considering multiple reference frames and variable block sizes. This improves the compression efficiency at the cost of a significant increase in computational load. Specialized hardware processors have been proposed to perform real-time motion estimation for AVC, but the non-recurring engineering cost of these solutions is too high. This paper describes a parallel algorithm that exploits the capacity of current multi-core processors to implement real-time motion estimation for AVC. In particular, by using the computational capacity and the fast memory system of the heterogeneous multicore CELL processor, the synergistic processors can be used to speed up motion estimation while the main processor executes the other parts of the AVC system in parallel. Experimental results show that, by programming the proposed parallel algorithm on the CELL processor, motion estimation can be performed in less than 40 ms per frame for the CIF video format, with up to 5 reference frames and variable block sizes.

Conference on Ph.D. Research in Microelectronics and Electronics | 2007

An ASIP approach for adaptive AVC Motion Estimation

Svetislav Momcilovic; Nuno Roma; Leonel Sousa

A new algorithm and an adapted hardware architecture of an ASIP are proposed in this paper. When compared with other hardware ASIP implementations, this architecture significantly speeds up the motion estimation procedure and substantially decreases the memory requirements. Moreover, it also requires significantly fewer memory accesses, while maintaining coding quality in terms of both the obtained bit rate and PSNR. As a consequence, the proposed algorithm proves to be especially well suited to implementation in most embedded systems with restricted computational and power resources, which are often adopted by portable and battery-supplied devices.

Digital Systems Design | 2006

Application Specific Instruction Set Processor for Adaptive Video Motion Estimation

Svetislav Momcilovic; Tiago Dias; Nuno Roma; Leonel Sousa

Motion estimation is the most demanding operation of a video encoder, corresponding to at least 80% of the overall computational cost. With the proliferation of portable handheld devices that support digital video coding, data-adaptive motion estimation algorithms are required to dynamically configure the search pattern, not only to avoid unnecessary computations and memory accesses but also to save energy. This paper proposes an application-specific instruction set processor (ASIP) to implement data-adaptive motion estimation algorithms, characterized by a specialized datapath and a minimal, optimized instruction set. Due to its low-power nature, this architecture is especially well suited to developing motion estimators for portable, mobile, and battery-supplied devices. A cycle-accurate simulator was also developed for the proposed ASIP, and fast and data-adaptive search algorithms have been implemented, namely the four-step search and the motion vector field adaptive search algorithms. Based on the proposed ASIP and the considered adaptive algorithms, several motion estimators were synthesized in 0.13 μm CMOS technology. Experimental results show that very low-power adaptive motion estimators have been achieved to encode QCIF video sequences.
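
For readers unfamiliar with the four-step search mentioned above, the following Python sketch shows the general shape of the algorithm in software. It is a simplified illustration, not the ASIP implementation: the `cost` callback (a hypothetical function returning the matching error for a candidate displacement) and the function names are assumptions, and every window is fully re-evaluated, whereas practical implementations only test the points not checked before.

```python
def four_step_search(cost):
    """Simplified four-step search over displacements in [-7, 7].

    cost(dx, dy) is an assumed callback returning the block matching
    error (for example, a sum of absolute differences) at (dx, dy).
    """
    def best_around(cx, cy, step):
        # Test the 3x3 grid of points spaced `step` apart around the centre.
        best, best_cost = (cx, cy), cost(cx, cy)
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                c = cost(cx + dx, cy + dy)
                if c < best_cost:  # strict comparison: ties keep the centre
                    best, best_cost = (cx + dx, cy + dy), c
        return best

    cx, cy = 0, 0
    for _ in range(3):  # at most three coarse steps on a 5x5 window
        nx, ny = best_around(cx, cy, 2)
        if (nx, ny) == (cx, cy):
            break  # minimum at the centre: jump to the final step
        cx, cy = nx, ny
    return best_around(cx, cy, 1)  # final step: 3x3 refinement
```

Because each coarse step moves the centre by at most 2 pixels and the final step by 1, the search covers displacements of up to 7 pixels while testing far fewer candidates than an exhaustive search.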

International Conference on Parallel Processing | 2012

Multi-level parallelization of advanced video coding on hybrid CPU+GPU platforms

Svetislav Momcilovic; Nuno Roma; Leonel Sousa

A dynamic model for parallel H.264/AVC video encoding on hybrid GPU+CPU systems is proposed. The entire inter-prediction loop of the encoder is parallelized on both the CPU and the GPU, and a computationally efficient model is proposed to dynamically distribute the computational load among these processing devices on hybrid platforms. The presented model includes both dependency-aware task scheduling and load balancing algorithms. According to the obtained experimental results, the proposed dynamic load balancing model is able to push forward the computational capabilities of these hybrid parallel platforms, achieving a speedup of up to 2 when compared with other equivalent state-of-the-art solutions. With the presented implementation, it was possible to encode 25 frames per second at HD 1920×1080 resolution, even when exhaustive motion estimation is considered.

International Conference on Digital Signal Processing | 2007

Adaptive Motion Estimation Algorithm for H.264/AVC

Svetislav Momcilovic; Nuno Roma; Leonel Sousa

A new adaptive motion estimation algorithm is proposed in this paper. When compared with other fast search approaches, such as the H.264/AVC-oriented EPZS algorithm, this algorithm significantly speeds up the motion estimation procedure and substantially decreases the memory requirements. Moreover, it also requires significantly fewer memory accesses, while maintaining coding quality in terms of both the obtained bit rate and PSNR. As a consequence, the proposed algorithm proves to be especially well suited to implementation in most embedded systems with restricted computational and power requirements, which are often adopted by portable and battery-supplied devices.

IEEE Transactions on Circuits and Systems for Video Technology | 2016

Adaptive Scheduling Framework for Real-Time Video Encoding on Heterogeneous Systems

Aleksandar Ilic; Svetislav Momcilovic; Nuno Roma; Leonel Sousa

To tackle real-time encoding of high-definition video sequences on heterogeneous desktop systems, a collaborative central processing unit (CPU) + graphics processing unit (GPU) framework for interloop video encoding is proposed herein. The proposed framework considers the overall complexity of the collaborative interloop encoding as a unified optimization problem. Several functional blocks are integrated for simultaneous execution control, automatic data access management, performance characterization, and adaptive scheduling and load balancing. These blocks aim at fully exploiting the performance of heterogeneous devices, the asymmetric bandwidth of communication links, and several levels of concurrency between computation and communication. To support a wide range of CPU and GPU architectures, a specific encoding library is developed with highly optimized algorithms for all interloop modules. The experimental results show that the proposed framework achieves real-time encoding of full high-definition sequences on several CPU + GPU systems. It also delivers performance improvements of up to 61.2% over the state-of-the-art solution, while outperforming individual GPU and quad-core CPU executions by more than 2 and 5 times, respectively.

International Conference on Image Processing | 2014

Reconfigurable data flow engine for HEVC motion estimation

Thomas D'huys; Svetislav Momcilovic; Frederico Pratas; Leonel Sousa

The High Efficiency Video Coding (HEVC) standard achieves enhanced compression efficiency in comparison to previous standards, at the cost of a dramatic increase in computational load. In order to cope with such computational requirements, and to tackle the real-time encoding of High Definition (HD) video sequences with the HEVC standard, we propose herein a reconfigurable architecture design for the most computationally demanding module, motion estimation, considering a highly efficient Full-Search Block-Matching algorithm. The proposed architecture supports Prediction Block (PB) sizes ranging from 8×8 to 64×64 pixels (also considering non-square shapes), and search areas as large as 256×256 pixels. Furthermore, this reconfigurable approach leverages the trade-off between maximum performance and minimum resource usage. Experimental results show that the proposed architecture is able to achieve real-time motion estimation at more than 26.9 fps for 1080p video formats, with a 64×64 pixel search area and 1 reference frame, by relying on a Xilinx Virtex 5 FPGA implementation. Moreover, it outperforms the NVIDIA Fermi-based GPU implementation by up to 25%.
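
As a software reference point for the Full-Search Block-Matching algorithm named above, the following Python sketch exhaustively tests every displacement within a search radius and keeps the one with the lowest sum of absolute differences (SAD). It is only an illustration under stated assumptions: frames are plain lists of pixel rows, the block is square, and the function names and parameters are hypothetical rather than taken from the paper's hardware design.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n, radius):
    """Return (motion_vector, cost) of the best match within +/- radius."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= by + dy and by + dy + n <= height
                    and 0 <= bx + dx and bx + dx + n <= width):
                continue
            c = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or c < best_cost:
                best_mv, best_cost = (dx, dy), c
    return best_mv, best_cost
```

Every candidate is independent of the others, which is what makes full search attractive for data-flow hardware and GPUs despite its high operation count.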
