Naim Harb
University of Mons
Publication
Featured research published by Naim Harb.
IEEE Transactions on Computers | 2014
Paulo Ricardo Possa; Sidi Ahmed Mahmoudi; Naim Harb; Carlos Valderrama; Pierre Manneback
This work presents a new flexible parameterizable architecture for image and video processing with reduced latency and memory requirements, supporting a variable input resolution. The proposed architecture is optimized for feature detection, more specifically, the Canny edge detector and the Harris corner detector. The architecture contains neighborhood extractors and threshold operators that can be parameterized at runtime. Also, algorithm simplifications are employed to reduce mathematical complexity, memory requirements, and latency without losing reliability. Furthermore, we present the proposed architecture implementation on an FPGA-based platform and its analogous optimized implementation on a GPU-based architecture for comparison. A performance analysis of the FPGA and the GPU implementations, and an extra CPU reference implementation, shows the competitive throughput of the proposed architecture even at a much lower clock frequency than those of the GPU and the CPU. The results also show a clear advantage of the proposed architecture in terms of power consumption, and it maintains reliable performance with noisy images while keeping latency and memory requirements low.
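To make the neighborhood-extractor and threshold-operator idea concrete, here is a minimal software sketch. It is not the architecture from the paper: it assumes a 3x3 window, Sobel kernels, and an |gx| + |gy| magnitude approximation, and it covers only the gradient-and-threshold stage that precedes a full Canny or Harris pipeline. All function names are hypothetical.

```python
# Illustrative only: 3x3 neighborhood extraction followed by a runtime threshold.
# The kernels and the |gx| + |gy| approximation are assumptions, not the paper's design.

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def neighborhoods_3x3(image):
    """Yield (row, col, window) for every interior pixel of a 2D list image."""
    height, width = len(image), len(image[0])
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            window = [image[r + dr][c - 1:c + 2] for dr in (-1, 0, 1)]
            yield r, c, window


def apply_kernel(window, kernel):
    """Multiply a 3x3 window by a 3x3 kernel element-wise and sum the result."""
    return sum(window[i][j] * kernel[i][j] for i in range(3) for j in range(3))


def edge_map(image, threshold):
    """Return a binary map: 1 where the approximate gradient magnitude >= threshold."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]  # border pixels stay 0
    for r, c, window in neighborhoods_3x3(image):
        gx = apply_kernel(window, SOBEL_X)
        gy = apply_kernel(window, SOBEL_Y)
        out[r][c] = 1 if abs(gx) + abs(gy) >= threshold else 0
    return out


if __name__ == "__main__":
    # A vertical step edge between columns 1 and 2.
    step = [[0, 0, 10, 10, 10] for _ in range(5)]
    for row in edge_map(step, threshold=30):
        print(row)
```

Because the threshold is an ordinary argument, the same extractor can be reused with a different value on each call, which is the software analogue of parameterizing the operator at runtime.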
symposium on application specific processors | 2011
Naim Harb; Smail Niar; Mazen A. R. Saghir; Yassin El Hillali; Rabie Ben Atitallah
Application-specific programmable processors are increasingly being replaced by FPGAs, which offer high levels of logic density, rich sets of embedded hardware blocks, and a high degree of customizability and reconfigurability. New FPGA features such as Dynamic Partial Reconfiguration (DPR) can be leveraged to reduce resource utilization and power consumption while still providing high levels of performance. In this paper, we describe our implementation of a dynamically reconfigurable multiple-target tracking (MTT) module for an automotive driver assistance system. Our module implements a dynamically reconfigurable filtering block that changes with changing driving conditions.
field programmable logic and applications | 2012
Paulo Da Cunha Possa; Sidi Ahmed Mahmoudi; Naim Harb; Carlos Valderrama
In this paper, we present an FPGA-based flexible self-adapting architecture for two feature detectors, the Canny edge detector and the Harris corner detector, with reduced latency and memory requirements, and supporting variable-resolution images. The new architecture uses neighbourhood extractors that can self-adapt their parameters on the fly, together with algorithm simplifications that reduce mathematical complexity, memory requirements and latency without losing reliability.
digital systems design | 2010
Tobias Lange; Naim Harb; Haisheng Liu; Smail Niar; Rabie Ben Atitallah
Multiple Target Tracking (MTT) algorithms are widely used in various military and civilian applications, but their use in automotive safety has been little investigated. When MTT algorithms are implemented in embedded systems, it is important to use the minimum required resources so that the entire Driver Assistance System (DAS) can be integrated on the same chip (data acquisition, MTT and alarm restitution). This reduces the complexity and cost of the System on Chip (SoC). This paper presents an efficient DAS based on an MTT application. To do so, we first identified the performance bottlenecks in the application. We then applied a set of optimizations to reduce the MTT algorithm's complexity. Tuning the hardware and the software in conjunction allowed us to optimize the final system and meet the functional requirements. The result is a complete embedded MTT application running on an embedded system that fits in a contemporary medium-sized FPGA device.
ACM Sigbed Review | 2009
Naim Harb; Smail Niar; Jehangir Khan; Mazen A. R. Saghir
This paper reports on our progress in developing a dynamically reconfigurable processing platform for an automotive multiple-target tracking (MTT) system. In addition to motivating our work, we present an overview of our design, some preliminary results, and report on the status of our work.
international symposium on industrial embedded systems | 2017
Naim Harb; Carlos Valderrama; Jonathan Pisane
Wireless digital repeaters are used to amplify cellular signals in different scenarios and environments using several frequency bands and radio links. The market for such repeaters varies between the demand for a specific scenario and the demand for a specific radio frequency band. In this work, we present a dynamic, all-purpose FPGA-based repeater implemented in an actual industrial product. The FPGA-based digital repeater can be customised for different scenarios, frequency bands and numbers of channels. The core of this work is the procedure that leads from developing an emulator, used to acquire test data and evaluate potential processing modules, through to the FPGA-based implementation and industrialisation. We show that the main advantage of our solution is reconfigurability, not just in filter coefficients, number of channels and taps, but also in adding new functional blocks and using different configurations, thanks to the use of FPGAs.
international symposium on industrial embedded systems | 2017
Boutheina Maaloul; Abdelmalik Taleb-Ahmed; Smail Niar; Naim Harb; Carlos Valderrama
For the past few decades, automatic accident detection, especially using video analysis, has become a very important subject. It is important not only for traffic management but also for Intelligent Transportation Systems (ITS), through its contribution to avoiding the escalation of accidents, especially on highways. In this paper, a novel vision-based road accident detection algorithm for highways and expressways is proposed. This algorithm is based on an adaptive traffic motion flow modeling technique, using Farneback Optical Flow for motion detection and a statistical heuristic method for accident detection. The algorithm was applied to a set of collected videos of traffic and accidents on highways. The results demonstrate the efficiency and practicability of the proposed algorithm using only 240 frames for traffic motion modeling. This method avoids the need for a large database, which matters because adequate and common accident video benchmarks do not exist.
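The abstract does not spell out the statistical heuristic, so the following is only a generic sketch of the "model normal traffic, then flag deviations" pattern. It assumes that each frame has already been reduced to a single mean optical-flow magnitude (for example by a Farneback implementation, not shown here), and it uses a simple k-sigma rule that is an assumption of this example, not the paper's method. The function names and the value of `k` are hypothetical; only the 240-frame modeling window comes from the abstract.

```python
from statistics import mean, stdev


def build_baseline(magnitudes):
    """Summarise normal traffic motion as (mean, sample standard deviation)."""
    return mean(magnitudes), stdev(magnitudes)


def flag_anomalies(magnitudes, model_frames=240, k=3.0):
    """Return indices of frames whose flow magnitude deviates from the baseline.

    The first `model_frames` values build the baseline; every later frame is
    flagged when it lies more than k standard deviations from the baseline mean.
    """
    mu, sigma = build_baseline(magnitudes[:model_frames])
    flagged = []
    for index, value in enumerate(magnitudes[model_frames:], start=model_frames):
        if abs(value - mu) > k * sigma:
            flagged.append(index)
    return flagged


if __name__ == "__main__":
    # Synthetic per-frame magnitudes: 240 frames of steady traffic, then a spike.
    normal = [1.0 + 0.1 * ((i % 5) - 2) for i in range(240)]
    observed = normal + [1.0, 5.0, 1.1]
    print(flag_anomalies(observed))
```

A real detector would work on per-region flow statistics rather than one number per frame; the sketch only shows how a short modeling window can replace a large labelled database.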
applied reconfigurable computing | 2017
Álvaro Avelino; Valentin Obac; Naim Harb; Carlos Valderrama; Glauberto Leilson Alves De Albuquerque; Paulo Da Cunha Possa
Power consumption reduction is crucial for portable equipment and for equipment in remote locations, where battery replacement is impracticable. P²IP is an architecture targeting real-time embedded image and video processing, which combines runtime reconfigurable processing, low latency and high performance. Being a configurable architecture, it allows powerful video processing operators (Processing Elements, or PEs) to be combined to build the target application. However, many applications do not require all available PEs. Even when idle, these PEs still consume power, a problem that Partial Reconfiguration can mitigate. To assess the impact on energy consumption, another P²IP implementation based on Partial Reconfiguration was developed and tested with three different image processing applications. Measurements were made to analyze energy consumption when executing each of the three applications. Results show that, compared to the original implementation of the architecture, the use of Partial Reconfiguration leads to power savings of up to 45%.
Microprocessors and Microsystems | 2017
Fernando A. Escobar; Anthony Kolar; Naim Harb; Filipe Vinci dos Santos; Carlos Valderrama
Dynamic Programming (DP) is used to solve combinatorial optimization problems and constitutes one of the 13 High Performance Computing (HPC) patterns. DP suffers from irregular, data-dependent memory accesses that deteriorate performance. The Knapsack 0/1 problem belongs to the simplest class of DP algorithms, called Serial Monadic, and has been treated in software with cache-efficient algorithms as well as parallel threads, OpenMP or MPI. In this paper we propose a shared-memory, parametrizable architecture to compute the DP matrix for the Knapsack 0/1. Our system has a parallel runtime of Θ(mC/q) for a knapsack of capacity C with m items and q operators. Using additional off-chip space and DMA transfers, it can solve knapsacks of any size. The architecture is implemented on the ZYNQ-7020 System on Chip (SoC), which contains a dual-core ARM processor plus Artix FPGA fabric. On this platform we make use of 64-bit High Performance ports for off-chip transfers and asymmetric 32-bit write/64-bit read BRAMs to minimize data exchange times. We also exploit computation synchronization to minimize BRAM address propagation and reduce routing congestion. We present results for a base system with 70 Processing Elements (PEs) capable of solving problems with a maximum item weight ω_max = 1024. For more complex instances we configure the architecture with 58 PEs and ω_max = 6144, where a single BRAM is shared among 13 computing units. We thus solve problems with 6× larger weights than previous works, attain a 16× speed-up versus optimized software on an Intel Xeon E5, and achieve the highest efficiency per core versus other architectures. We achieve between 2.4× and 3.3× acceleration versus previous FPGA solutions.
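For readers unfamiliar with the DP matrix mentioned above, the sketch below shows the textbook Knapsack 0/1 recurrence in software. It is not the hardware design from the paper. The helper names and the default `q` are hypothetical, and the loop over `op` only models how the C + 1 cells of one row could be split among q operators; here it runs sequentially. Each cell of a row depends only on the previous row, which is what makes that split possible and gives roughly m · C / q steps per operator.

```python
def update_row(prev_row, weight, value, q):
    """Compute one row of the DP matrix for a single item.

    The row is partitioned into q contiguous chunks. Every chunk reads only
    from prev_row, so the chunks are independent of one another.
    """
    n_cells = len(prev_row)
    new_row = [0] * n_cells
    chunk = -(-n_cells // q)  # ceiling division
    for op in range(q):  # one iteration stands in for one operator
        start = op * chunk
        stop = min(start + chunk, n_cells)
        for c in range(start, stop):
            skip = prev_row[c]  # best value without this item
            take = prev_row[c - weight] + value if c >= weight else skip
            new_row[c] = max(skip, take)
    return new_row


def knapsack_01(weights, values, capacity, q=4):
    """Return the best total value for a 0/1 knapsack of the given capacity."""
    row = [0] * (capacity + 1)  # row for "no items considered yet"
    for weight, value in zip(weights, values):
        row = update_row(row, weight, value, q)
    return row[capacity]


if __name__ == "__main__":
    print(knapsack_01([10, 20, 30], [60, 100, 120], capacity=50))
```

The result does not depend on `q`; changing it only changes how the cells of each row are grouped.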
international symposium on industrial embedded systems | 2016
Naim Harb; Carlos Valderrama; Esteban Pelaez; Alexandre Girardi
At the heart of the European Rail Traffic Management System (ERTMS) is the European Train Control System (ETCS). One major goal of the ERTMS-ETCS project is the standardization and unification of all train control and command systems in Europe. Hence, it is critical to have a reliable test bed for ease of validation and certification, enforcing the reliability of ERTMS-ETCS train equipment. In this context, we present a low-cost system comprising several connected Heterogeneous System on Chip (HSoC) cards that are used for certifying train equipment. The proposed system mimics real train behaviors in operation. Train behavior scenarios are controlled by a train motion simulator running on a host PC, and train behavior data is fed from our system to the train equipment undergoing testing. An intermediate extension is used to guarantee real-time data transmission, since the simulator cannot do so itself due to its high computation demands and communication latencies. In our intermediate extension, each HSoC card contains an NVIDIA Tegra 2 microprocessor chip, an Altera Cyclone II Field Programmable Gate Array (FPGA) chip and several custom Application Specific Integrated Circuit (ASIC) chips. Each card can be accessed by the simulator over a Gigabit Ethernet port, and all cards intercommunicate using a 1 Mbps back-plane serial bus. We show that by using simulations as a starting point, our system is able to generate authentic train control signals 20 times faster than the software simulator in real time, presenting the train equipment with a real test case scenario accurately modelling train behavior over a track.