Hassen Loukil | Researchain

Publication

Featured research published by Hassen Loukil.

international conference on advanced technologies for signal and image processing | 2014

TZSearch pattern search improvement for HEVC motion estimation modules

Hassan Kibeya; Fatma Belghith; Hassen Loukil; Mohamed Ali Ben Ayed; Nouri Masmoudi

Motion estimation (ME) is a key operation for video compression. It contributes heavily to compression efficiency by removing temporal redundancies. This process is the most critical part of a video encoder and can by itself consume more than 50% of the coding complexity or computational coding time. To reduce the computational time, many fast ME algorithms have been proposed and implemented. The present paper proposes a fast ME algorithm that improves the basic Test Zone Search (TZS) ME algorithm, which is considered to be one of the fastest ME algorithms and was implemented in the HEVC reference software HM8.0. Experimental results show an improvement ranging from 8% up to 49% compared to the basic TZSearch.
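To make the block-matching idea behind fast ME concrete, here is a minimal sketch of a sum-of-absolute-differences (SAD) cost and a greedy small-diamond search. It is not the TZSearch algorithm or the improvement proposed in the paper; the function names, the four-point pattern, and the stopping rule are assumptions chosen only for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def diamond_search(cur, ref, bx, by, n, search_range):
    """Greedy small-diamond search (illustrative, hypothetical helper).

    Starts at displacement (0, 0) and repeatedly moves to a neighbouring
    displacement with a lower SAD until no neighbour improves the cost.
    Returns the best (dx, dy) and its SAD.
    """
    height, width = len(ref), len(ref[0])

    def valid(dx, dy):
        # Stay inside the search window and inside the reference frame.
        return (abs(dx) <= search_range and abs(dy) <= search_range
                and 0 <= bx + dx and bx + dx + n <= width
                and 0 <= by + dy and by + dy + n <= height)

    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    improved = True
    while improved:
        improved = False
        cx, cy = best
        for ox, oy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            dx, dy = cx + ox, cy + oy
            if not valid(dx, dy):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
                improved = True
    return best, best_cost
```

A greedy search like this visits far fewer candidates than a full search, but it can stop at a local minimum on frames whose cost surface is not smooth. Practical fast ME algorithms add further search stages to reduce that risk.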

international multi-conference on systems, signals and devices | 2009

Hardware architecture for H.264/AVC deblocking filter algorithm

Hassen Loukil; A. Ben Atitallah; Nouri Masmoudi

This paper presents a novel hardware architecture for real-time implementation of the adaptive deblocking filter algorithm used in the H.264/AVC baseline profile video coding standard. This hardware is designed to be used as part of a complete H.264 video coding system for video conference applications. We use a novel edge filter ordering in a macroblock to prevent the deblocking filter hardware from unnecessarily waiting for the pixels that will be filtered to become available. This architecture presents minimum latency, maximum throughput, and full utilization of hardware resources, and combines both pipelining and parallel processing techniques. The proposed architecture is implemented in VHDL. The VHDL code is verified to work at 150 MHz in an ALTERA Stratix II FPGA.

international multi-conference on systems, signals and devices | 2010

An efficient pipeline execution of H.264/AVC intra 4×4 frame design

S. Smaoui; Hassen Loukil; A. Ben Atitallah; Nouri Masmoudi

In this paper, we present an implementation of an optimized H.264 intra 4×4 algorithm that reduces the time of the intra 4×4 process. The source of wasted time in the conventional intra 4×4 architecture is the serialization of intra prediction and reconstruction of the sixteen 4×4 blocks in one macroblock: the intra prediction of the current 4×4 block cannot be performed before the reconstruction of the previous 4×4 block. Therefore, for a high-speed implementation, we replaced the conventional architecture with a pipelined one while maintaining consistency with the standard. We studied ten alternative scanning orders based on rearranging the order of the intra 4×4 blocks and chose the one that best reduces dependencies between consecutively executed blocks without performance degradation. This order is implemented by a pipelined architecture in VHDL. The VHDL code is verified to work at 100 MHz in an ALTERA Stratix II EP2S60F1020C3 FPGA. As a result, the processing time is reduced by 31.25% compared to the conventional implementation, which makes it a good solution for real-time video applications. The H.264 intra 4×4 hardware and software are demonstrated to work together on an ALTERA NIOS-II development board with a Stratix II EP2S60F1020C3 FPGA.

signal processing systems | 2012

FPGA architecture of the LDPS Motion Estimation for H.264/AVC Video Coding

Moez Kthiri; Hassen Loukil; Ahmed Ben Atitallah; Patrice Kadionik; Dominique Dallet; Nouri Masmoudi

Motion estimation is a highly computationally demanding operation during the video compression process and significantly affects the output quality of an encoded sequence. Special hardware architectures are required to achieve real-time compression performance. Many fast search block matching motion estimation (BMME) algorithms have been developed to minimize search positions and speed up computation, but they do not take into account how they can be effectively implemented in hardware. In this paper, we propose three new hardware architectures of a fast search block matching motion estimation algorithm using Line Diamond Parallel Search (LDPS) for an H.264/AVC video coding system. These architectures use pipeline and parallel processing techniques and present minimum latency, maximum throughput and full utilization of hardware resources. The VHDL code has been tested and can work at high frequency in a Xilinx Virtex-5 FPGA circuit for the three proposed architectures.

international symposium on electronics and telecommunications | 2010

A parallel hardware architecture of deblocking filter in H264/AVC

Moez Kthiri; Patrice Kadionik; H. Levi; Hassen Loukil; A. Ben Atitallah; Nouri Masmoudi

This paper describes an efficient hardware architecture for the deblocking filter used in the H.264/AVC baseline profile video coding standard, optimized for real-time implementation. The deblocking filter is a computationally and data intensive tool, resulting in an increased execution time of both the encoding and decoding processes. The processing order of the filter and the memory organization are rearranged to facilitate the deblocking of the pixels in a parallel fashion and to prevent the deblocking filter hardware from unnecessarily waiting for the pixels that will be filtered to become available. The proposed architecture is implemented in synthesizable HDL at RTL level and verified with the reference software. This hardware is designed to be used as part of a complete H.264/AVC decoder.

international multi-conference on systems, signals and devices | 2009

Hardware implementation of fast block matching algorithm in FPGA for H.264/AVC

M. Kthiri; Hassen Loukil; Imen Werda; A. Ben Atitallah; A. Samet; Nouri Masmoudi

Motion estimation (ME) is one of the most time-consuming parts of a video encoding system, and significantly affects the output quality of an encoded sequence. In this paper, we present a hardware implementation of the Large Diamond Parallel search algorithm. This hardware is designed to be used as part of a complete H.264 video coding system. This architecture is simulated and tested using VHDL and synthesized using Altera Quartus II version 5.1. It presents minimum latency, maximum throughput, and full utilization of hardware resources, and combines both pipelining and parallel processing techniques. The VHDL code is verified to work at 100 MHz in an ALTERA Stratix II FPGA.

international symposium on visual computing | 2010

An FPGA implementation of motion estimation algorithm for H.264/AVC

Moez Kthiri; Patrice Kadionik; H. Levi; Hassen Loukil; A. Ben Atitallah; Nouri Masmoudi

The H.264/AVC standard achieves much higher coding efficiency than previous video coding standards. Unfortunately, this comes at the cost of considerably increased complexity at the encoder, mainly due to motion estimation. Therefore, various fast algorithms have been proposed for reducing computation, but they do not consider how they can be effectively implemented in hardware. In this paper, we propose a hardware architecture of a fast search block matching motion estimation algorithm using Line Diamond Parallel Search (LDPS) for an H.264/AVC video coding system. This architecture presents pipeline processing techniques, minimum latency, maximum throughput and full utilization of hardware resources. The VHDL code has been tested and can work at high frequency in a Xilinx Virtex-5 FPGA circuit.

Design and Test Workshop, 2008. IDT 2008. 3rd International | 2009

An efficient hardware architecture design for H.264/AVC INTRA 4×4 algorithm

Hassen Loukil; B. Kaanich; N. Masmoudi; A. Ben Atitallah; P. Kadionik

In this work, we present an architecture for real-time implementation of the INTRA 4×4 algorithm used in the H.264/AVC baseline profile video coding standard. The INTRA 4×4 is composed of 4×4 intra prediction, 4×4 integer transform, 4×4 quantization, 4×4 inverse integer transform, and 4×4 inverse quantization. This hardware is designed to be used as part of a complete H.264 video coding system for video conference applications. This architecture presents minimum latency, maximum throughput, and full utilization of hardware resources, and combines both pipelining and parallel processing techniques. The proposed architecture is implemented in VHDL. The VHDL code is verified to work at 160 MHz in an ALTERA Stratix II FPGA. This architecture can process one macroblock (MB) in 432 clock cycles.
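The two reported figures (160 MHz and 432 clock cycles per macroblock) imply a macroblock rate, which the short calculation below works out. The CIF frame size (352×288) in the usage note is an assumption picked as a typical video conference format, and the function names are hypothetical. The result is an upper bound for this block alone, not a measured encoder frame rate.

```python
CLOCK_HZ = 160_000_000   # operating frequency reported in the abstract
CYCLES_PER_MB = 432      # clock cycles per macroblock reported in the abstract
MB_SIZE = 16             # an H.264 macroblock covers 16 x 16 luma samples


def macroblocks_per_second(clock_hz, cycles_per_mb):
    """Macroblocks processed per second if the pipeline never stalls."""
    return clock_hz / cycles_per_mb


def max_frame_rate(clock_hz, cycles_per_mb, width, height):
    """Upper bound on frames per second for a frame of width x height."""
    mbs_per_frame = (width // MB_SIZE) * (height // MB_SIZE)
    return macroblocks_per_second(clock_hz, cycles_per_mb) / mbs_per_frame
```

For example, `max_frame_rate(CLOCK_HZ, CYCLES_PER_MB, 352, 288)` divides the macroblock rate by the 396 macroblocks of a CIF frame.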

international conference on sciences and techniques of automatic control and computer engineering | 2013

A fast coding algorithm based on fast mode decision for HEVC standard

Hassan Kibeya; Fatma Belghith; Hassen Loukil; Mohamed Ali Ben Ayed; Nouri Masmoudi

As the new generation standard of video coding, High Efficiency Video Coding (HEVC) is intended to provide significantly better coding efficiency than all existing video coding standards. One of its improvements concerns the encoding process of the block structure. It was established that this process requires high computing power because it is performed using all the possible depth levels and prediction modes to find the one with the lowest rate-distortion (RD) cost using a Lagrange multiplier. To reduce the computational complexity, fast coding unit size decision algorithms were proposed and implemented in HM. These algorithms can significantly reduce computational complexity while maintaining almost the same performance as the original HEVC encoder.
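The Lagrangian selection described above can be sketched as choosing the candidate that minimizes J = D + λ·R. The sketch below is an illustration under stated assumptions, not the HM implementation or the fast algorithms of the paper. The `Candidate` type and its fields are hypothetical, and real encoders derive λ from the quantization parameter.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One depth level / prediction mode combination (hypothetical type)."""
    name: str
    distortion: float  # D, e.g. sum of squared errors after reconstruction
    rate_bits: float   # R, bits needed to code the candidate


def rd_cost(candidate, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return candidate.distortion + lam * candidate.rate_bits


def best_candidate(candidates, lam):
    """Exhaustive decision: evaluate every candidate and keep the cheapest."""
    return min(candidates, key=lambda c: rd_cost(c, lam))
```

An exhaustive decision evaluates every candidate, which is why it is costly. Fast decision schemes aim to skip candidates that are unlikely to win before their cost is computed.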

international multi-conference on systems, signals and devices | 2009

Hardware architecture for H.264/AVC intra 16×16 frame processing

Hassen Loukil; S. Arous; Imen Werda; A. Ben Atitallah; Patrice Kadionik; Nouri Masmoudi

In this paper, we present an efficient H.264/AVC Intra 16×16 Frame Coder System. The system achieves real-time performance for video conference applications. The INTRA 16×16 is composed of intra 16×16 prediction, integer transform, quantization AC, inverse integer transform, inverse quantization AC, quantization DC, Hadamard, inverse quantization DC, and inverse integer transform. The proposed hardware is implemented in VHDL. The VHDL RTL code works at 160 MHz in an Altera Stratix II FPGA and encodes 129 Mpixels per second. This work will be used as an Intellectual Property (IP) block integrated in an H.264/AVC encoder.
