Randa Khemiri | Researchain

Publication

Featured research published by Randa Khemiri.

2016 International Image Processing, Applications and Systems (IPAS) | 2016

Fast motion estimation for HEVC video coding

Randa Khemiri; Nejmeddine Bahri; Fatma Belghith; Fatma Ezahra Sayadi; Mohamed Atri; Nouri Masmoudi

In this paper, a fast configuration for Motion Estimation (ME) is described in order to reduce the computational time of the new High Efficiency Video Coding (HEVC) standard. This configuration uses the Coded Block Flag (CBF) Fast Method (CFM), the Early Coding Unit (CU) termination (ECU) and the Early Skip Detection (ESD) modes. The Diamond Pattern is used as the search algorithm for ME in the encoding process. Compared to the original HEVC reference software test model (HM) 16.2, experimental results showed that the complexity is reduced, on average, by 56.75% with a small bit-rate and PSNR degradation.
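To illustrate the diamond-pattern search mentioned in the abstract, here is a minimal Python sketch of a generic diamond search over integer displacements, using the sum of absolute differences as the matching cost. It is not the HM 16.2 implementation and does not model CFM, ECU or ESD; the function names, the pattern tables and the `search_range` parameter are assumptions made for illustration.

```python
# Hypothetical sketch of a generic diamond search; not the HM 16.2 code.
LARGE = [(0, 0), (0, -2), (0, 2), (-2, 0), (2, 0),
         (-1, -1), (1, -1), (-1, 1), (1, 1)]
SMALL = [(0, 0), (0, -1), (0, 1), (-1, 0), (1, 0)]


def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by (dx, dy). Frames are lists of rows."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def diamond_search(cur, ref, bx, by, n, search_range):
    """Return the displacement (dx, dy) found by a large-then-small diamond search."""
    h, w = len(ref), len(ref[0])

    def valid(dx, dy):
        # Stay inside the search window and inside the reference frame.
        return (abs(dx) <= search_range and abs(dy) <= search_range
                and 0 <= bx + dx and bx + dx + n <= w
                and 0 <= by + dy and by + dy + n <= h)

    def best_around(cx, cy, pattern):
        cands = [(cx + px, cy + py) for px, py in pattern]
        cands = [c for c in cands if valid(*c)]
        # The centre is listed first, so ties keep the current centre.
        return min(cands, key=lambda c: sad(cur, ref, bx, by, c[0], c[1], n))

    cx, cy = 0, 0
    while True:
        nx, ny = best_around(cx, cy, LARGE)
        if (nx, ny) == (cx, cy):      # centre is best: stop the coarse stage
            break
        cx, cy = nx, ny               # cost strictly decreased: recentre
    return best_around(cx, cy, SMALL)  # final refinement with the small diamond
```

Because the search only follows locally decreasing cost, it evaluates far fewer candidates than a full search but can stop in a local minimum; that trade-off is the usual reason for pairing it with early-termination heuristics.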

IET Image Processing | 2018

Optimisation of HEVC motion estimation exploiting SAD and SSD GPU-based implementation

Randa Khemiri; Hassan Kibeya; Fatma Ezahra Sayadi; Nejmeddine Bahri; Mohamed Atri; Nouri Masmoudi

The new High-Efficiency Video Coding (HEVC) standard doubles the video compression ratio compared to the previous H.264/AVC at the same video quality. However, this performance is achieved by increasing the encoder's computational complexity, which is why HEVC complexity is a crucial subject. The most time-consuming and computationally intensive part of HEVC is motion estimation, which is based principally on the sum of absolute differences (SAD) or the sum of square differences (SSD) algorithms. For these reasons, the authors proposed an implementation of these algorithms on a low-cost NVIDIA GPU (graphics processing unit) using the Fermi architecture, developed with the Compute Unified Device Architecture language. The proposed algorithm is based on the parallel-difference and the parallel-reduction process. The experimental results show a significant speed-up in terms of execution time for most 64 × 64 pixel blocks. In fact, the proposed parallel algorithm permits a reduction in the execution time that reaches up to 56.17 and 30.4%, compared to the CPU, for SAD and SSD algorithms, respectively. This improvement proves that parallelising the algorithm with the new proposed reduction process for the Fermi-GPU generation leads to better results. These findings are based on a static study that determines the PU percentage utilisation for each dimension in the HEVC. This study shows that the larger PUs are the most utilised in temporal levels 3 and 4, which attain 84.56% for class E. This improvement is accompanied by an average peak signal-to-noise ratio loss of 0.095 dB and a decrease of 0.64% in terms of BitRate.
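The two phases named in the abstract, a per-pixel difference followed by a reduction, can be sketched sequentially in Python as below. This only mirrors the data-access pattern of a typical GPU tree reduction; it is not the authors' CUDA kernel, and the function names `tree_reduce` and `block_cost` are assumptions made for illustration.

```python
# Hypothetical sequential sketch of "parallel difference + parallel reduction".
def tree_reduce(values):
    """Sum `values` with a pairwise (tree) reduction.

    Each pass halves the active range; on a GPU every index `i` of the
    inner loop would be handled by a separate thread."""
    data = list(values)
    size = 1
    while size < len(data):           # pad to a power of two with zeros
        size *= 2
    data += [0] * (size - len(data))
    stride = size // 2
    while stride > 0:
        for i in range(stride):
            data[i] += data[i + stride]
        stride //= 2
    return data[0]


def block_cost(cur, ref, metric="sad"):
    """SAD or SSD between two equally sized blocks given as lists of rows."""
    pairs = [(a, b) for row_c, row_r in zip(cur, ref)
             for a, b in zip(row_c, row_r)]
    # Difference phase: every pixel is independent of the others.
    if metric == "sad":
        diffs = [abs(a - b) for a, b in pairs]
    elif metric == "ssd":
        diffs = [(a - b) ** 2 for a, b in pairs]
    else:
        raise ValueError("metric must be 'sad' or 'ssd'")
    # Reduction phase: log2(N) passes instead of N - 1 serial additions.
    return tree_reduce(diffs)
```

For a 64 × 64 block the difference phase has 4096 independent operations and the reduction needs 12 passes, which is the kind of structure that maps well onto GPU threads.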

IET Computers and Digital Techniques | 2017

Image feature extraction algorithm based on CUDA architecture: case study GFD and GCFD

Haythem Bahri; Fatma Ezahra Sayadi; Randa Khemiri; Marwa Chouchene; Mohamed Atri

Optimising the computing time of applications is an increasingly important task in many different areas such as scientific and industrial applications. The graphics processing unit (GPU) is considered one of the most powerful engines for computationally demanding applications since it offers a highly parallel architecture. In this context, the authors introduce an algorithm to optimise the computing time of feature extraction methods for colour images. They choose the generalised Fourier descriptor (GFD) and generalised colour Fourier descriptor (GCFD) models as methods to extract image features for various applications such as real-time colour object recognition or image retrieval. They compare the measured computing times on a central processing unit and a GPU. They also present a case study of these descriptors using two platforms: an NVIDIA GeForce GT525M and an NVIDIA GeForce GTX480. Their experimental results demonstrate that the execution time can be reduced considerably, by up to 34× for GFD and 56× for GCFD.

IET Computers and Digital Techniques | 2018

CUDA memory optimisation strategies for motion estimation

Fatma Elzahra Sayadi; Marwa Chouchene; Haithem Bahri; Randa Khemiri; Mohamed Atri

As video processing technologies continue to grow in complexity and image resolution faster than central processing unit (CPU) performance, data-parallel computing methods will become even more important. In fact, the high-performance, data-parallel architecture of modern graphics processing units (GPUs) can reduce execution times by orders of magnitude. However, creating an optimal GPU implementation requires not only converting sequential implementations of algorithms into parallel ones but, more importantly, carefully balancing the GPU resources. It also requires an understanding of the bottlenecks and defects caused by memory latency and code computation. The challenge is even greater when an implementation exceeds the GPU resources. In this study, the authors discuss the parallelisation and memory optimisation strategies of a computer vision application for motion estimation using the NVIDIA compute unified device architecture (CUDA). The study addresses optimisation techniques for algorithms that exceed the GPU resources in either computation or memory for the CUDA architecture. The proposed implementation reveals a substantial improvement in both speed-up (SU) and peak signal-to-noise ratio (PSNR). Indeed, the implementation is up to 50 times faster than the CPU counterpart. It also provides an increase in PSNR of the coded test sequence of up to 8 dB.

Computer and Information Technology | 2014

MatLab acceleration for DWT “Daubechies 9/7” for JPEG2000 standard on GPU

Randa Khemiri; Fatma Ezahra Sayadi; Mohamed Atri; Rached Tourki

The discrete wavelet transform (DWT) has diverse applications in the signal and image processing fields. In this paper, we have implemented the lifting “Cohen-Daubechies-Feauveau 9/7” algorithm on a low-cost NVIDIA GPU (Graphics Processing Unit) with MatLab to achieve a speed-up in computation. The efficiency of our GPU-based implementation is measured and compared with CPU-based algorithms. Our experimental results with the GPU show a performance enhancement by a factor of over 1.82 compared with the CPU for an image of size 4096×4096 pixels.
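As a rough illustration of the lifting scheme named in the abstract, the following Python sketch applies one level of the Cohen-Daubechies-Feauveau 9/7 transform to a 1-D signal of even length, with symmetric boundary extension. It is not the authors' MatLab GPU code, which works on 2-D images. The function names are hypothetical, and the final scaling (low-pass divided by K, high-pass multiplied by K) is one common convention assumed here; other references scale differently.

```python
# Hypothetical 1-D sketch of CDF 9/7 lifting; scaling convention is an assumption.
ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971
K = 1.230174104914001


def _predict(s, d, c):
    """Update odd samples d from their even neighbours (mirror at the right edge)."""
    for i in range(len(d)):
        right = s[i + 1] if i + 1 < len(s) else s[i]
        d[i] += c * (s[i] + right)


def _update(s, d, c):
    """Update even samples s from their odd neighbours (mirror at the left edge)."""
    for i in range(len(s)):
        left = d[i - 1] if i > 0 else d[0]
        s[i] += c * (left + d[i])


def forward_97(x):
    """One decomposition level: returns (low-pass, high-pass) coefficient lists."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("signal length must be even and at least 2")
    s = [float(v) for v in x[0::2]]   # even samples
    d = [float(v) for v in x[1::2]]   # odd samples
    _predict(s, d, ALPHA)
    _update(s, d, BETA)
    _predict(s, d, GAMMA)
    _update(s, d, DELTA)
    return [v / K for v in s], [v * K for v in d]


def inverse_97(low, high):
    """Undo forward_97 by running the lifting steps backwards with negated weights."""
    s = [v * K for v in low]
    d = [v / K for v in high]
    _update(s, d, -DELTA)
    _predict(s, d, -GAMMA)
    _update(s, d, -BETA)
    _predict(s, d, -ALPHA)
    x = [0.0] * (2 * len(s))
    x[0::2] = s
    x[1::2] = d
    return x
```

A 2-D transform would apply `forward_97` to every row and then to every column. Within each lifting step the samples are independent of one another, which is what makes the scheme a natural fit for GPU execution.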

Indian Journal of Science and Technology | 2016