
Publication


Featured research published by Ab Al Hadi Ab Rahman.


EURASIP Journal on Image and Video Processing | 2011

Pipeline synthesis and optimization of FPGA-based video processing applications with CAL

Ab Al Hadi Ab Rahman; Anatoly Prihozhy; Marco Mattavelli

This article describes a pipeline synthesis and optimization technique that increases the data throughput of FPGA-based systems using minimal pipeline resources. The technique is applied to the CAL dataflow language and is designed based on relations, matrices, and graphs. First, the initial as-soon-as-possible (ASAP) and as-late-as-possible (ALAP) schedules, and the corresponding mobility of operators, are generated. From these, an operator coloring technique is applied to conflict and non-conflict directed graphs using recursive functions and explicit stack mechanisms. For each feasible number of pipeline stages, the pipeline schedule with minimum total register width is taken as an optimal coloring, which is then automatically transformed into a CAL description. The generated pipelined CAL descriptions are finally synthesized to hardware description languages for FPGA implementation. Experimental results on three video processing applications demonstrate up to 3.9× higher throughput for pipelined compared to non-pipelined implementations, and average total pipeline register width reductions of 39.6% and 49.9% between the optimal and the ASAP and ALAP pipeline schedules, respectively.
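
The ASAP/ALAP schedules and operator mobility mentioned in the abstract can be illustrated with a short sketch. This is not the paper's implementation: the operator names, unit stage delays, and example dependency graph below are all hypothetical.

```python
# Illustrative sketch of ASAP/ALAP scheduling and operator mobility on a
# small operator dependency graph (unit-delay operators, hypothetical names).

def asap_schedule(order, preds):
    # As-soon-as-possible: start right after the latest predecessor.
    stage = {}
    for op in order:  # 'order' must be topologically sorted
        stage[op] = max((stage[p] + 1 for p in preds[op]), default=0)
    return stage

def alap_schedule(order, succs, latency):
    # As-late-as-possible: start right before the earliest successor.
    stage = {}
    for op in reversed(order):
        stage[op] = min((stage[s] - 1 for s in succs[op]), default=latency)
    return stage

# Tiny example graph: a -> b -> d, and c -> d.
order = ["a", "b", "c", "d"]
preds = {"a": [], "b": ["a"], "c": [], "d": ["b", "c"]}
succs = {"a": ["b"], "b": ["d"], "c": ["d"], "d": []}

asap = asap_schedule(order, preds)
alap = alap_schedule(order, succs, latency=2)
# Mobility = ALAP - ASAP: zero-mobility operators are on the critical path;
# the others can be moved between pipeline stages during optimization.
mobility = {op: alap[op] - asap[op] for op in order}
```

Operators with nonzero mobility are exactly the ones the coloring step is free to place for minimum register width.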


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2015

Synthesis and Optimization of Pipelines for HW Implementations of Dataflow Programs

Anatoly Prihozhy; Endri Bezati; Ab Al Hadi Ab Rahman; Marco Mattavelli

This paper introduces a new methodology for pipeline synthesis with applications to dataflow high-level system design. The pipeline synthesis is applied to dataflow programs whose operators are translated into graphs and dependency relations that are then processed for pipeline architecture optimization. For each pipeline-stage time, a minimal number of pipeline stages is first determined, and then an optimal assignment of operators to stages is generated with the objective of minimizing the total pipeline register size. The obtained "optimal" pipeline schedule is automatically transformed back into a dataflow program that can be synthesized to efficient hardware implementations. Two new pipeline scheduling algorithms have been developed: a "least cost search branch and bound" algorithm and a heuristic technique. The first yields globally optimal solutions for middle-size designs, whereas the second generates close-to-optimal solutions for large designs. Experimental results on FPGA designs show that total pipeline register size gains of up to 4.68× can be achieved. The new algorithms outperform the known downward- and upward-direction dataflow graph traversal algorithms in pipeline register size by up to 100% on average.
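
The core optimization, assigning operators to stages so that the total pipeline register size is minimal, can be sketched with a toy exhaustive search. The operator windows, edge bit widths, and names below are hypothetical; the paper's least-cost branch-and-bound prunes this search space rather than enumerating it.

```python
from itertools import product

# Hypothetical sketch: exhaustive stage assignment minimizing total pipeline
# register width. 'windows' maps operator -> (ASAP, ALAP) stage window;
# 'edges' maps (src, dst) -> bit width of the value carried on that edge.

def register_cost(stage, edges):
    # A value produced in stage s and consumed in stage t needs a register
    # at every stage boundary it crosses: (t - s) copies of its bit width.
    return sum(w * (stage[d] - stage[s]) for (s, d), w in edges.items())

def best_assignment(windows, edges):
    ops = list(windows)
    best, best_cost = None, float("inf")
    for choice in product(*(range(lo, hi + 1) for lo, hi in windows.values())):
        stage = dict(zip(ops, choice))
        # Keep only precedence-feasible assignments.
        if any(stage[d] < stage[s] for (s, d) in edges):
            continue
        cost = register_cost(stage, edges)
        if cost < best_cost:
            best, best_cost = stage, cost
    return best, best_cost

# Example: placing 'b' late keeps its wide 16-bit output out of a register.
windows = {"a": (0, 0), "b": (0, 1), "c": (1, 1)}
edges = {("a", "b"): 8, ("b", "c"): 16}
stages, cost = best_assignment(windows, edges)
```

The example shows why assignment matters: scheduling the middle operator in the later stage registers its narrow 8-bit input instead of its wide 16-bit output.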


Journal of Signal Processing Systems | 2011

Methodology and technique to improve throughput of FPGA-based CAL dataflow programs: Case study of the RVC MPEG-4 SP Intra decoder

Hossam Amer; Ab Al Hadi Ab Rahman; Ihab Amer; Christophe Lucarz; Marco Mattavelli

The specification of complex signal processing systems in hardware by means of HDLs is no longer appropriate, since such designs are known to be time-consuming to develop and less flexible to extend. Recently, the CAL dataflow language was specified to increase productivity and scalability, with the ability to synthesize to HDL for hardware implementation. In this paper, a new methodology to improve the throughput of dataflow-based hardware designs is given by analyzing CAL programs using a profiling tool. As a case study, we analyzed the RVC MPEG-4 SP Intra decoder and found that the texture decoding part has the highest improvement factor. We also introduce a luminance texture splitting technique as the improvement method, increasing the level of parallelism in the decoder. Experimental results of an implementation on a Virtex-5 FPGA confirmed our analysis, with a throughput increase of up to 50.5% using only 4.3% additional slices.


International Conference on Acoustics, Speech, and Signal Processing | 2014

A methodology for optimizing buffer sizes of dynamic dataflow FPGA implementations

Ab Al Hadi Ab Rahman; Simone Casale-Brunet; Claudio Alberti; Marco Mattavelli

Minimizing the buffer sizes of dynamic dataflow implementations without introducing deadlocks or reducing design performance is in general an important and useful design objective. Indeed, buffer sizes that are too small cause a system to deadlock during execution, while unnecessarily large sizes lead to a resource-inefficient design; neither is a desirable design option. This paper presents an implementation, validation, and comparison of several buffer size optimization techniques for a generic class of dynamic dataflow models of computation called dataflow process networks. The paper presents a heuristic capable of finding a close-to-minimum buffer size configuration for deadlock-free execution, and a methodology to efficiently explore different configurations for feasible design alternatives. The approach is demonstrated using an MPEG-4 AVC/H.264 decoder implemented on an FPGA as the experimental design case.
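
The deadlock-versus-buffer-size trade-off described above can be sketched with a naive simulate-and-grow heuristic. This is not the paper's algorithm: the firing rules, token rates, and grow-the-smallest-buffer policy below are illustrative assumptions.

```python
# Hypothetical sketch of a buffer-sizing heuristic for a dataflow process
# network: simulate execution with bounded FIFOs and grow a buffer whenever
# the network deadlocks, until a target number of firings completes.

def simulate(buffers, actors, firings_needed):
    # buffers: {name: [occupancy, capacity]}.
    # actors: list of (inputs {buf: tokens}, outputs {buf: tokens}) rules.
    fired = 0
    while fired < firings_needed:
        progress = False
        for ins, outs in actors:
            # An actor fires only if input tokens and output space exist.
            if all(buffers[b][0] >= n for b, n in ins.items()) and \
               all(buffers[b][0] + n <= buffers[b][1] for b, n in outs.items()):
                for b, n in ins.items():
                    buffers[b][0] -= n
                for b, n in outs.items():
                    buffers[b][0] += n
                fired += 1
                progress = True
        if not progress:
            return False  # deadlock: no actor can fire
    return True

def minimize_buffers(names, actors, firings_needed):
    sizes = {b: 1 for b in names}
    while not simulate({b: [0, c] for b, c in sizes.items()},
                       actors, firings_needed):
        # Naive heuristic: grow the smallest buffer and retry.
        smallest = min(sizes, key=sizes.get)
        sizes[smallest] += 1
    return sizes

# Producer emits 2 tokens per firing, so a capacity-1 FIFO deadlocks at once.
producer = ({}, {"q": 2})
consumer = ({"q": 1}, {})
sizes = minimize_buffers(["q"], [producer, consumer], firings_needed=6)
```

Even this toy run shows the paper's point: capacity 1 deadlocks immediately, while capacity 2 is already sufficient for sustained execution.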


Conference on Design and Architectures for Signal and Image Processing | 2011

Optimization methodologies for complex FPGA-based signal processing systems with CAL

Ab Al Hadi Ab Rahman; Hossam Amer; Anatoly Prihozhy; Christophe Lucarz; Marco Mattavelli

Signal processing designs are becoming increasingly complex with demands for more advanced algorithms. Designers are now seeking high-level tools and methodologies to help manage complexity and increase productivity. Recently, the CAL dataflow language has been specified, which is capable of synthesizing dataflow descriptions into RTL code for hardware implementation and, based on several case studies, has shown promising results. However, no work has been done on global network analysis, which could increase the optimization space. In this paper, we introduce methodologies to analyze and optimize CAL programs by determining which actions should be parallelized, pipelined, or refactored for the highest throughput gain, and then provide tools and techniques to achieve this using minimal resources. In a case study on the RVC MPEG-4 SP Intra decoder implemented on a Virtex-5 FPGA, experimental results confirmed our analysis with a throughput gain of up to 3.5× using relatively few additional slices compared to the reference design.


International Conference on Computer and Communication Engineering | 2008

A genetic algorithm approach to VLSI macro cell non-slicing floorplans using binary tree

H.A. Rahim; Ab Al Hadi Ab Rahman; G. Andaljayalakshmi; R. B. Ahmad; W.N.S. Firuz Wan Arrifin

This paper proposes an optimization approach for macro-cell placement that minimizes chip area. A binary tree method for the non-slicing tree construction process is utilized for the placement and area optimization of macro-cell layouts in very-large-scale integration (VLSI) design. Three types of genetic algorithms are employed in order to examine their performance in converging to the global minimum: the simple genetic algorithm (SGA), the steady-state genetic algorithm (SSGA), and the adaptive genetic algorithm (AGA). Experimental results on Microelectronics Center of North Carolina (MCNC) benchmark problems show that the developed algorithm achieves acceptable performance quality compared to the slicing floorplan. Furthermore, the robustness of the genetic algorithms has been investigated to validate the stability of reaching the optimal solution on every run. The experiments demonstrate that SSGA converges to the optimal result faster than SGA and AGA, and also outperforms them in terms of robustness.
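
The steady-state scheme (SSGA) that the abstract singles out can be sketched on a toy objective. The paper's real encoding is a binary tree of macro-cell placements with chip area as fitness; the bitstring objective, population size, and operator choices below are purely illustrative.

```python
import random

# Minimal steady-state GA (SSGA) sketch on a toy maximize-the-ones objective.
# Unlike a simple GA, which builds a whole new generation each step, a
# steady-state GA inserts each child into the population immediately.

def ssga(fitness, n_bits, pop_size=20, steps=500, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(steps):
        # Tournament selection of two parents (tournament size 3).
        a, b = (max(rng.sample(pop, 3), key=fitness) for _ in range(2))
        cut = rng.randrange(1, n_bits)      # one-point crossover
        child = a[:cut] + b[cut:]
        i = rng.randrange(n_bits)           # single-bit mutation
        child[i] ^= 1
        # Steady state: the child replaces the current worst individual.
        worst = min(range(pop_size), key=lambda j: fitness(pop[j]))
        pop[worst] = child
    return max(pop, key=fitness)

best = ssga(sum, 16)  # fitness = number of ones in the bitstring
```

Replacing the worst individual on every step is what gives SSGA the fast, elitist convergence the paper reports relative to generational SGA.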


Lecture Notes in Electrical Engineering | 2015

Efficient Motion Estimation Algorithms for HEVC/H.265 Video Coding

Edward Tamunoiyowuna Jaja; Zaid Omar; Ab Al Hadi Ab Rahman; Muhammad Mun’im Ahmad Zabidi

This paper presents two fast motion estimation algorithms, based on the structures of the triangle and the pentagon respectively, for HEVC/H.265 video coding. These new search patterns determine motion vectors faster than the two TZSearch patterns (diamond and square) that are built into the motion estimation engine of HEVC. The proposed algorithms achieve a faster run-time with negligible video quality loss and bit rate increase. Experimental results show that, at their best, the triangle and pentagon algorithms offer 63% and 61.9% run-time speed-ups respectively, compared to the TZSearch algorithms in the HEVC reference software.
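
Pattern-based motion search of the kind described above can be sketched briefly: evaluate a small set of offsets around the current best vector, move to the winner, and stop when the centre wins. The triangle offsets, frame contents, and block size below are illustrative assumptions, not the paper's exact geometry, and the sketch does no bounds checking.

```python
# Hypothetical pattern-based block-matching sketch with a triangle-like
# pattern: centre plus three vertex offsets (illustrative geometry).
TRIANGLE = [(0, 0), (0, -2), (-2, 2), (2, 2)]

def sad(cur, ref, bx, by, dx, dy, n):
    # Sum of absolute differences between an n x n block of the current
    # frame at (bx, by) and the reference block shifted by (dx, dy).
    return sum(abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
               for y in range(n) for x in range(n))

def pattern_search(cur, ref, bx, by, n):
    mvx, mvy = 0, 0
    while True:
        best = min(TRIANGLE, key=lambda d: sad(cur, ref, bx, by,
                                               mvx + d[0], mvy + d[1], n))
        if best == (0, 0):
            return mvx, mvy  # centre is best: search has converged
        mvx += best[0]
        mvy += best[1]

# Synthetic frames where the true motion of the block at (6, 6) is (2, 2).
N = 16
ref = [[x + N * y for x in range(N)] for y in range(N)]
cur = [[(x + 2) + N * (y + 2) for x in range(N)] for y in range(N)]
mv = pattern_search(cur, ref, 6, 6, 2)
```

The speed-up the paper measures comes from exactly this effect: fewer candidate offsets per step than the built-in diamond and square patterns, with the same move-and-refine loop.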


Journal of Physics: Conference Series | 2018

VLSI Implementation of a Fast Kogge-Stone Parallel-Prefix Adder

Lee Mei Xiang; Muhammad Mun’im Ahmad Zabidi; Ainy Haziyah Awab; Ab Al Hadi Ab Rahman

This paper presents a VLSI implementation of a high-speed Kogge-Stone adder (KSA) using a 0.18µm process technology. The KSA is known to be one of the fastest adder architectures, and this is validated through a comparison with other adder architectures, including the standard ripple-carry adder and the carry-lookahead adder. Furthermore, our KSA is also compared with a default optimized adder from the Artisan standard cell library. The adders are compared at bit widths of 8, 16, and 32. They are designed in Verilog and synthesized using both front-end and back-end tools, with complete validation and verification stages, including analysis of performance, power, and area. Results show that the KSA has the lowest propagation delay, almost constant across all bit widths, with up to 70% less delay than all other architectures. The area and power penalties, however, increase by roughly 59%. In terms of energy usage, the KSA uses up to 64% less. Where speed and energy are critical, this fast and energy-efficient KSA can be readily integrated into custom VLSI designs.
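
The Kogge-Stone structure itself is easy to sketch at the bit level: generate/propagate signals are combined over log2(n) prefix levels, which is why the hardware delay stays nearly constant as the bit width grows. The sketch below mirrors that structure in Python integer arithmetic; it is a model of the prefix network, not an efficient software adder.

```python
# Bit-level model of Kogge-Stone parallel-prefix addition for a fixed width.
# Each loop iteration corresponds to one prefix level of the hardware.

def kogge_stone_add(a, b, width=32):
    g = a & b            # generate: this bit position produces a carry
    p = a ^ b            # propagate: this bit position passes a carry on
    dist = 1
    while dist < width:  # log2(width) levels, like the adder's tree depth
        # Combine (g, p) pairs 'dist' positions apart (the prefix operator).
        g = g | (p & (g << dist))
        p = p & (p << dist)
        dist <<= 1
    carries = (g << 1) & ((1 << width) - 1)   # carry into each bit position
    return ((a ^ b) ^ carries) & ((1 << width) - 1)
```

The while loop runs only five times for a 32-bit adder, versus 32 sequential carry steps in a ripple-carry adder, which is the delay advantage the paper measures.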


International Conference on Signal and Image Processing Applications | 2017

An alphabetic contour-based descriptor for shape-based image retrieval

Ali Taheri Anaraki; Usman Ullah Sheikh; Ab Al Hadi Ab Rahman; Zaid Omar

Content-based image retrieval methods use the color, texture, and shape of an object for indexing and retrieval. Among these features, shape is a basic visual feature that holds significant information about the object. In this paper, an alphabetic contour-based shape description method is proposed to facilitate shape classification and retrieval. The proposed method breaks a shape's contour down into small segments and assigns a unique alphabetic symbol to each segment based on its geometrical features. These symbols are used to create a feature string, which we call an alphabet string. The alphabet strings are compared using dynamic programming during classification. The proposed method was tested on the BROWN dataset, which consists of occluded, articulated, and missing-part shapes. Results show the feasibility of the method and highlight its advantages over some state-of-the-art methods.
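
The segment-to-symbol idea plus dynamic-programming comparison can be sketched as follows. The labeling rule here (dominant direction only, image coordinates with y pointing down) is a deliberately crude stand-in for the paper's richer geometrical features, and plain edit distance stands in for its matching cost.

```python
# Hypothetical sketch: label contour segments with letters, then compare two
# "alphabet strings" with the classic dynamic-programming edit distance.

def to_alphabet(points):
    # Label each consecutive segment by its dominant axis and sign
    # (R/L = right/left, D/U = down/up in image coordinates).
    letters = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if abs(dx) >= abs(dy):
            letters.append("R" if dx >= 0 else "L")
        else:
            letters.append("D" if dy >= 0 else "U")
    return "".join(letters)

def edit_distance(s, t):
    # Row-by-row Levenshtein distance (insert/delete/substitute cost 1).
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,          # delete from s
                           cur[j - 1] + 1,       # insert into s
                           prev[j - 1] + (cs != ct)))  # match/substitute
        prev = cur
    return prev[-1]

square = to_alphabet([(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)])
```

String-based matching like this is what makes the descriptor tolerant to occlusion and missing parts: a dropped segment costs one edit rather than breaking the whole comparison.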


International Conference on Signal and Image Processing Applications | 2017

Wavelet-based aortic annulus sizing of echocardiography images

Norhasmira Mohammad; Zaid Omar; Usman Ullah Sheikh; Ab Al Hadi Ab Rahman; Musab Sahrim

Aortic stenosis (AS) is a condition in which calcification deposits within the heart leaflets narrow the valve and restrict the blood flowing through it. The disease is progressive over time and may affect the mechanism of the heart valve. To alleviate this condition without resorting to surgery, which runs a risk of mortality, a new method of treatment has been introduced: Transcatheter Aortic Valve Implantation (TAVI), in which imagery acquired from real-time echocardiography (Echo) is needed to determine the exact size of the aortic annulus. However, Echo data often suffers from speckle noise and low pixel resolution, which may result in incorrect sizing of the annulus. Our study therefore aims to perform automated detection and measurement of the aortic annulus size from Echo imagery. Two algorithm stages are presented: image denoising and object detection. For the removal of speckle noise, a wavelet thresholding technique is applied. It consists of three sequential steps: applying a linear discrete wavelet transform, thresholding the wavelet coefficients, and performing the linear inverse wavelet transform. For the next stage of analysis, several morphological operations are used to perform object detection as well as valve sizing. The results showed that the automated system is able to produce more accurate sizing when compared against the ground truth.
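
The three denoising steps listed above (forward transform, coefficient thresholding, inverse transform) can be sketched with a single-level Haar transform on a 1-D signal. The Haar basis, soft-threshold rule, and threshold value are illustrative assumptions; the study's wavelet choice and thresholding rule may differ.

```python
# Sketch of transform -> threshold -> inverse-transform wavelet denoising,
# using a single-level Haar transform for illustration.

def haar_forward(x):
    # Pairwise averages (approximation) and differences (detail).
    avg = [(a + b) / 2 for a, b in zip(x[0::2], x[1::2])]
    det = [(a - b) / 2 for a, b in zip(x[0::2], x[1::2])]
    return avg, det

def soft_threshold(coeffs, t):
    # Shrink detail coefficients toward zero; small ones (likely noise)
    # become exactly zero.
    return [max(abs(c) - t, 0.0) * (1 if c >= 0 else -1) for c in coeffs]

def haar_inverse(avg, det):
    out = []
    for a, d in zip(avg, det):
        out += [a + d, a - d]
    return out

def denoise(signal, t):
    avg, det = haar_forward(signal)
    return haar_inverse(avg, soft_threshold(det, t))

smooth = denoise([1.0, 1.2, 1.0, 0.8], 0.3)
```

Speckle shows up mainly in the small detail coefficients, which is why zeroing them suppresses noise while the approximation coefficients preserve the annulus edges needed for sizing.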

Collaboration


Dive into Ab Al Hadi Ab Rahman's collaborations.

Top Co-Authors

- Marco Mattavelli (École Polytechnique Fédérale de Lausanne)
- Usman Ullah Sheikh (Universiti Teknologi Malaysia)
- Zaid Omar (Universiti Teknologi Malaysia)
- Richard Thavot (École Polytechnique Fédérale de Lausanne)
- Anatoly Prihozhy (École Polytechnique Fédérale de Lausanne)
- R. B. Ahmad (Universiti Malaysia Perlis)
- Christophe Lucarz (École Polytechnique Fédérale de Lausanne)
- Endri Bezati (École Polytechnique Fédérale de Lausanne)