Affaq Qamar | Researchain

Publication

Featured research published by Affaq Qamar.

Mediterranean Electrotechnical Conference | 2014

Design space exploration of a stereo vision system using high-level synthesis

Affaq Qamar; Claudio Passerone; Luciano Lavagno; Francesco Gregoretti

Stereoscopic vision is an essential building block of modern assisted-driving and surveillance applications. Semi-Global Matching (SGM) is a very efficient approach that outperforms most local algorithms and can deliver real-time performance if properly implemented in hardware. In this paper we describe the design space exploration of the SGM algorithm for automotive applications. The paper also highlights the methodology we used to transform the high-level code of a reference software implementation, which was unsuitable as a starting point for high-level synthesis, into a hardware implementation. Despite the complex data dependencies of the SGM algorithm, stream-based processing is achieved by focusing on the innermost loops of the algorithm. Varying the loop implementation choices and the type of target memory yields different RTL code with a broad range of area versus performance trade-offs.
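To make the "innermost loops" concrete, the following Python sketch models the standard SGM path-cost recurrence along a single left-to-right scanline. It is an illustrative software model, not the authors' HLS source; the function name `aggregate_path` and the penalties `p1` (small disparity changes) and `p2` (large disparity jumps) are assumptions chosen for this example.

```python
from typing import List


def aggregate_path(cost: List[List[int]], p1: int, p2: int) -> List[List[int]]:
    """Aggregate matching costs along one scanline, left to right.

    cost[x][d] is the pixel-wise matching cost C(p, d) for pixel x of a
    single image row and disparity d. Returns L_r(p, d) computed with the
    SGM recurrence:
        L_r(p,d) = C(p,d) + min(L_r(p-r,d),
                                L_r(p-r,d-1) + P1,
                                L_r(p-r,d+1) + P1,
                                min_k L_r(p-r,k) + P2) - min_k L_r(p-r,k)
    """
    ndisp = len(cost[0])
    out = [list(cost[0])]  # the first pixel on the path has no predecessor
    for x in range(1, len(cost)):
        prev = out[-1]          # only the previous pixel's vector is needed
        prev_min = min(prev)
        row = []
        for d in range(ndisp):
            best = prev[d]                           # same disparity
            if d > 0:
                best = min(best, prev[d - 1] + p1)   # disparity change of 1
            if d < ndisp - 1:
                best = min(best, prev[d + 1] + p1)
            best = min(best, prev_min + p2)          # any larger jump
            # Subtracting prev_min keeps the values bounded along the path.
            row.append(cost[x][d] + best - prev_min)
        out.append(row)
    return out
```

Each pixel depends only on the previous pixel's cost vector along the path, which is the property that allows the computation to be streamed through a small on-chip buffer instead of a full-image array.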

Microprocessors and Microsystems | 2017

LP-HLS: Automatic power-intent generation for high-level synthesis based hardware implementation flow

Affaq Qamar; Fahad Bin Muslim; Javed Iqbal; Luciano Lavagno

The abstraction level for digital designs is rising from the register transfer level (RTL) to untimed algorithmic or transaction-based descriptions, followed by an automated high-level synthesis (HLS) flow. However, it is still a significant challenge for chip architects and designers to describe low-power design decisions at the system level. Today, low-power design techniques for digital blocks are applied at RTL, and no commercial tool or methodology can automatically derive the power intent from the system-level description. The process requires a considerable amount of human intervention and many low-level details to implement low-power schemes at RTL. This research aims to integrate low-power techniques, specifically Power Shut-Off (PSO), within a model-based hardware flow and to derive an automated Low Power-High Level Synthesis (LP-HLS) methodology. The methodology aims to minimize low-power design effort by automatically deriving the low-level power intent for model-based designs, while using high-level synthesis to reach a broad set of target system implementations. LP-HLS uses a set of pragmas and a directive file to derive power intent information. To illustrate the methodology, three model designs, ranging from simple designs to medium-complexity hardware accelerators, are considered. Finally, the power-saving results for the design scenarios validate the effectiveness of the LP-HLS methodology.
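The following minimal sketch illustrates the general idea of deriving power intent from source annotations. It assumes a hypothetical `#pragma power_domain <name> [shutoff]` annotation placed immediately before each function definition; this syntax, the function `derive_power_intent`, and the output dictionary are invented for illustration and are not the actual LP-HLS pragma set or directive-file format. A real flow would then translate such information into a power-format description for the RTL tools.

```python
import re
from collections import defaultdict
from typing import Dict

# Hypothetical annotation syntax, invented for this sketch:
#   #pragma power_domain <domain> [shutoff]
PRAGMA = re.compile(r"#pragma\s+power_domain\s+(\w+)(\s+shutoff)?")
# Crude C function-definition matcher: "<type words> name(".
FUNC = re.compile(r"^\s*(?:\w+\s+)+(\w+)\s*\(")


def derive_power_intent(source: str) -> Dict[str, dict]:
    """Group annotated functions into power domains.

    Returns {domain: {"shutoff": bool, "instances": [function names]}}.
    """
    domains: Dict[str, dict] = defaultdict(
        lambda: {"shutoff": False, "instances": []})
    pending = None  # (domain, shutoff) waiting for the next function
    for line in source.splitlines():
        m = PRAGMA.search(line)
        if m:
            pending = (m.group(1), bool(m.group(2)))
            continue
        f = FUNC.match(line)
        if f and pending:
            name, shutoff = pending
            domains[name]["shutoff"] |= shutoff  # domain may be switched off
            domains[name]["instances"].append(f.group(1))
            pending = None
    return dict(domains)
```

A shut-off domain in this toy output corresponds to a block that a Power Shut-Off scheme could power down; deciding when to do so is left to the rest of the flow.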

Vehicular Technology Conference | 2015

Analysis and Implementation of the Semi-Global Matching 3D Vision Algorithm Using Code Transformations and High-Level Synthesis

Affaq Qamar; Fahad Bin Muslim; Luciano Lavagno

High-level synthesis (HLS) offers several advantages, such as faster simulation run-time and better design reuse, thanks to its higher level of abstraction. This work uses HLS to implement the Semi-Global Matching (SGM) algorithm, which is frequently used in stereo vision systems, e.g. for automotive applications. The hardware implementation targets a Xilinx® Virtex-7 FPGA. The initial algorithmic “golden” model used very large arrays, which had to be mapped to an external DRAM and brought into the on-chip RAM of the FPGA on demand. This required both adding memory transfer loops and inserting calls to the AXI transactors that access the DRAM through the on-chip DDR slave. Moreover, the initial single-threaded algorithm had to be parallelized by converting the top-level sweeps of the image in eight directions into as many threads. Access to the DRAM was then managed by a centralized controller. This modified SystemC design proved suitable for achieving the target real-time performance. The design space was then explored by making several fairly different micro-architectural choices. In the end, it was possible to obtain an implementation comparable to a previously developed, very efficient (and hence very inflexible) manual RTL design that includes sophisticated fine-grained management of data and computation.
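The following Python sketch illustrates the parallelization step described above: each of the eight path orientations becomes an independent sweep, run here in a separate thread, and the per-direction costs are summed afterwards. The helper names, the `(dy, dx)` direction encoding, and the penalties are assumptions for illustration; the DRAM transfers and the centralized controller are not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Cube = List[List[List[int]]]  # cost[y][x][d]

# The eight SGM path orientations as (dy, dx) steps.
DIRECTIONS: List[Tuple[int, int]] = [
    (0, 1), (0, -1), (1, 0), (-1, 0),
    (1, 1), (1, -1), (-1, 1), (-1, -1),
]


def sweep(cost: Cube, r: Tuple[int, int], p1: int, p2: int) -> Cube:
    """Aggregate costs along direction r (one independent sweep)."""
    h, w, nd = len(cost), len(cost[0]), len(cost[0][0])
    dy, dx = r
    agg = [[[0] * nd for _ in range(w)] for _ in range(h)]
    # Visit pixels so that the predecessor p - r is always computed first.
    ys = range(h) if dy >= 0 else range(h - 1, -1, -1)
    xs = range(w) if dx >= 0 else range(w - 1, -1, -1)
    for y in ys:
        for x in xs:
            py, px = y - dy, x - dx
            if not (0 <= py < h and 0 <= px < w):
                agg[y][x] = list(cost[y][x])  # path starts at the border
                continue
            prev = agg[py][px]
            m = min(prev)
            for d in range(nd):
                best = min(prev[d], m + p2)
                if d > 0:
                    best = min(best, prev[d - 1] + p1)
                if d < nd - 1:
                    best = min(best, prev[d + 1] + p1)
                agg[y][x][d] = cost[y][x][d] + best - m
    return agg


def aggregate_all(cost: Cube, p1: int, p2: int) -> Cube:
    """Run the eight sweeps concurrently and sum their results."""
    with ThreadPoolExecutor(max_workers=len(DIRECTIONS)) as pool:
        sweeps = list(pool.map(lambda r: sweep(cost, r, p1, p2), DIRECTIONS))
    h, w, nd = len(cost), len(cost[0]), len(cost[0][0])
    return [[[sum(s[y][x][d] for s in sweeps) for d in range(nd)]
             for x in range(w)] for y in range(h)]
```

Because of Python's global interpreter lock, the threads here show the decomposition rather than a real speedup; in a SystemC/HLS design each sweep would instead map to its own concurrently running hardware thread.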

Federated Conference on Computer Science and Information Systems | 2016

Energy-efficient FPGA Implementation of the k-Nearest Neighbors Algorithm Using OpenCL

Fahad Bin Muslim; Alexandros Demian; Liang Ma; Luciano Lavagno; Affaq Qamar

Modern SoCs are becoming increasingly heterogeneous, combining multi-core architectures with hardware accelerators to speed up compute-intensive tasks at considerably lower power consumption. Thanks to their reasonable execution speed and comparatively low power consumption, modern FPGAs are strong competitors to traditional GPU-based accelerators. High-level synthesis (HLS) simplifies FPGA programming by allowing designers to program FPGAs in high-level languages such as C/C++, OpenCL and SystemC. This work focuses on using an HLS-based methodology to implement a widely used classification algorithm, k-nearest neighbors (kNN), on an FPGA-based platform directly from its OpenCL code. Multiple, fairly different implementations of the algorithm are considered, and their performance on FPGA and GPU is compared. It is concluded that the FPGA is generally more power-efficient than the GPU. Furthermore, using an FPGA-specific OpenCL coding style and providing appropriate HLS directives can yield an FPGA implementation comparable to a GPU in execution time as well. Keywords: kNN; FPGA; High-Level Synthesis; Hardware Acceleration; low-power low-energy computation; Parallel Computing; OpenCL.
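For reference, the kNN classification step can be sketched in a few lines of Python. This is a plain brute-force software model, not the paper's OpenCL kernels; the function name `knn_classify` and the choice of squared Euclidean distance with majority voting are assumptions for the example.

```python
from collections import Counter
from typing import Sequence


def knn_classify(train: Sequence[Sequence[float]], labels: Sequence[str],
                 query: Sequence[float], k: int) -> str:
    """Label `query` by majority vote among its k nearest training points."""
    # Squared Euclidean distance to every training point; each distance is
    # independent, which is the data parallelism an accelerator exploits.
    dists = [sum((a - b) ** 2 for a, b in zip(p, query)) for p in train]
    # Indices of the k closest points.
    nearest = sorted(range(len(train)), key=lambda i: dists[i])[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

The distance loop is the natural target for parallel OpenCL work-items and for HLS unrolling or pipelining directives, while the final selection and vote are comparatively cheap.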

International Conference on Software, Telecommunications and Computer Networks | 2015

Low power methodology for an ASIC design flow based on high-level synthesis

Fahad Bin Muslim; Affaq Qamar; Luciano Lavagno

Power management in system-on-chip (SoC) design has become very important in modern nanometric technologies. It is desirable to consider power optimization at the system level, where the higher level of abstraction offers the largest potential savings. Clock gating and power gating are two well-known techniques for reducing dynamic and leakage power, respectively. They can also be combined for maximum power reduction by driving both with the same control signal. This work presents a methodology that uses both techniques to save power in an inverse discrete cosine transform (IDCT) design whose register transfer level (RTL) description is generated automatically by high-level synthesis (HLS). Power gating is implemented by capturing the power intent in the Common Power Format (CPF). This work mainly highlights the prospects of integrating CPF with RTL generated automatically by an HLS flow. Clock gating reduces dynamic power by a factor of around 10, while power gating reduces static power by more than 50%. Power gating also introduces some area overhead.
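The sketch below is a toy cycle-level energy model showing why one control signal can drive both techniques: during idle cycles the same `active` flag stops the clock (removing dynamic power) and cuts the supply (removing most leakage). All power values and the `residual_leak` fraction are hypothetical and unrelated to the IDCT measurements reported in the paper.

```python
from typing import Sequence


def block_energy(active: Sequence[bool], p_dyn: float, p_leak: float,
                 gated: bool, t_cycle: float = 1.0,
                 residual_leak: float = 0.0) -> float:
    """Energy of a block over a cycle-by-cycle activity trace.

    Without gating, the clock toggles and the block leaks on every cycle
    (a simplifying assumption). With gating, one shared `active` signal
    both stops the clock and switches off the supply on idle cycles, so
    only a residual fraction of leakage remains.
    """
    energy = 0.0
    for on in active:
        if on or not gated:
            energy += (p_dyn + p_leak) * t_cycle
        else:
            energy += residual_leak * p_leak * t_cycle
    return energy
```

For example, `block_energy([True] * 2 + [False] * 8, 10.0, 2.0, gated=True, residual_leak=0.1)` can be compared against the ungated case. The model ignores power-gating wake-up latency and the area overhead of the power switches, both of which a real design must account for.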

Security and Communication Networks | 2017

Design and Analysis of Self-Healing Tree-Based Hybrid Spectral Amplitude Coding OCDMA System

Waqas Ahmed Imtiaz; Affaq Qamar; Atiq ur Rehman; Haider Ali; Adnan Rashid Chaudhry; Javed Iqbal

This paper presents an efficient tree-based hybrid spectral amplitude coding optical code division multiple access (SAC-OCDMA) system that provides high-capacity transmission along with fault detection and restoration throughout the passive optical network (PON). The enhanced multidiagonal (EMD) code is adopted to improve the system's performance; its efficient two-matrix structure suppresses multiple access interference and the associated phase-induced intensity noise. Moreover, system connection availability is enhanced through an efficient protection architecture with tree and star-ring topologies at the feeder and distribution levels, respectively. The proposed hybrid architecture aims to provide seamless transmission of information at minimum cost. A mathematical model based on the Gaussian approximation is developed to analyze the performance of the proposed setup, followed by simulation analysis for validation. It is observed that the proposed system supports 64 subscribers operating at data rates of 2.5 and above. Moreover, survivability and cost analyses in comparison with existing schemes show that the proposed tree-based hybrid SAC-OCDMA system provides the required redundancy at minimum infrastructure and operation cost.
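As an illustration of the Gaussian-approximation analysis, the bit error rate (BER) of SAC-OCDMA systems is commonly computed from the electrical signal-to-noise ratio as BER = ½·erfc(√(SNR/8)). The sketch below implements only this final step; the SNR expression for the EMD code, which depends on the code parameters and the individual noise terms, is not reproduced here, and the function names are illustrative.

```python
import math


def ber_gaussian(snr: float) -> float:
    """BER from a linear electrical SNR under the Gaussian approximation:
    BER = 0.5 * erfc(sqrt(SNR / 8))."""
    if snr < 0:
        raise ValueError("SNR must be non-negative (linear scale)")
    return 0.5 * math.erfc(math.sqrt(snr / 8.0))


def snr_db_to_linear(snr_db: float) -> float:
    """Convert an SNR in decibels to a linear power ratio."""
    return 10 ** (snr_db / 10.0)
```

With this expression, a BER of about 10⁻⁹ corresponds to an SNR of roughly 144 (about 21.6 dB).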

IEEE Access | 2017

High-Level Synthesis for Semi-Global Matching: Is the Juice Worth the Squeeze?

Affaq Qamar; Fahad Bin Muslim; Francesco Gregoretti; Luciano Lavagno; Mihai Teodor Lazarescu

High-level synthesis (HLS)-based design methodologies are highly attractive for industries that are sensitive to production costs. To gain a competitive advantage, it is highly desirable to be able to produce several different implementations of the same algorithm that satisfy a diverse range of resolution, cost, and performance constraints. In this paper, we present multiple hardware implementations of the semi-global matching (SGM) algorithm, which is used in stereo vision systems, e.g., for automotive applications. The hardware platform considered in this paper is a Xilinx Zynq system-on-chip. We also compare the HLS-based designs with a manual register transfer level (RTL) design in terms of quality of results, flexibility, and design time. SGM mainly consists of three processing steps: the “cost cube calculation”, followed by the “path cost computation”, and finally the “disparity approximation and minimization”. The path cost processor performs pixel-wise processing of the cost cube data along eight distinct path orientations. The baseline algorithmic model, usually called the “golden” model, uses very large arrays that must be mapped to an external DRAM and brought into the on-chip RAM when required. This necessitates adding memory transfer loops as well as inserting calls to the AXI transactors that access the DRAM through the on-chip DDR slave. Furthermore, the initial algorithm (typically single-threaded) must be parallelized to fully exploit the concurrency offered by the target hardware platform. The design space exploration was therefore performed by making several considerably different micro-architectural choices. Eventually, we were able to obtain an implementation comparable with the manual RTL design. Both the manual RTL and the HLS designs achieved the target real-time performance of 30 frames/s for an image resolution of 640 × 480 with a disparity depth of 128 pixels per frame.

International Multi Topic Conference | 2016

Dramatically enhanced oxygen surface exchange kinetics of mixed conducting SOFC cathode by surface modification

Saim Saher; Shafi ur Rehman; Abdul Basit; Muhammad Noman; Ayesha Samreen; Shahid Ali; Affaq Qamar; Muhammad Alam Zaib

International Conference on Emerging Technologies | 2016

Tailoring the surface of perovskite oxide for enhanced oxygen exchange kinetics

Saim Saher; Shafi ur Rehman; Abdul Basit; Muhammad Noman; Muhammad Alam Zaib; Ayesha Samreen; Shahid Ali; Affaq Qamar

Microsystem Technologies: Micro- and Nanosystems, Information Storage and Processing Systems | 2017

Design, development and implementation of a low power and high speed pipeline A/D converter in submicron CMOS technology

Muhammad Imran Khan; Affaq Qamar; Faisal Shabbir; Rizwan Shoukat

Lanthanum strontium cobalt ferrite (LSCF) mixed conducting perovskites have been widely studied as cathode materials for intermediate-temperature solid oxide fuel cells (SOFCs). The performance of LSCF cathodes is limited by the oxygen surface exchange process, which is the rate-determining step of the oxygen reduction reaction. Building on previous work, in this study a dense La0.6Sr0.4Co0.2Fe0.8O3−δ electrode is coated with a La2NiO4+δ nanoparticle electrolyte. La2NiO4+δ has a relatively high ionic conductivity in the intermediate temperature range, from 0.0014 S·cm⁻¹ at 500 °C to 0.0371 S·cm⁻¹ at 800 °C; the resulting increase in surface reaction sites therefore promotes the oxygen reduction reaction. The electrical conductivity relaxation (ECR) technique is used to study the oxygen surface exchange kinetics of bare LSCF and of LSCF coated with La2NiO4+δ. The results show that the coating affects the oxygen surface exchange kinetics of LSCF. At 800 °C, for a pO2 step change from 0.2 to 0.8 bar, the oxygen surface exchange coefficient, k_chem, of LSCF coated with La2NiO4+δ increases by a factor of 7. This enhancement in k_chem is comparable to the values reported by T. Hong et al. It is concluded that the ionic conductivity of the coating phase is the key factor responsible for the improved surface exchange kinetics. Highly ionic-conductive oxide coatings promote the oxygen surface exchange kinetics of mixed conducting oxides and improve the performance of electrochemical devices such as solid oxide fuel cells.
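The ECR measurement can be illustrated with a simplified fit. Assuming that surface exchange is rate-limiting in a thin plate of half-thickness a, the normalized conductivity follows g(t) = 1 − exp(−k_chem·t/a), so k_chem can be estimated from the slope of ln(1 − g) versus t. The sketch below, with hypothetical function and parameter names, is not the analysis used in the paper, which may also account for bulk diffusion.

```python
import math
from typing import Sequence


def fit_k_chem(t: Sequence[float], sigma: Sequence[float],
               sigma0: float, sigma_inf: float, half_thickness: float) -> float:
    """Estimate k_chem from an ECR transient, assuming surface-limited kinetics:
        g(t) = (sigma(t) - sigma0) / (sigma_inf - sigma0) = 1 - exp(-t / tau),
    with tau = half_thickness / k_chem (units follow half_thickness, e.g. cm/s).
    """
    xs, ys = [], []
    for ti, si in zip(t, sigma):
        g = (si - sigma0) / (sigma_inf - sigma0)
        if 0.0 < g < 1.0:                 # log is undefined at the end points
            xs.append(ti)
            ys.append(math.log(1.0 - g))  # equals -t / tau under the model
    # Least-squares slope of a line through the origin: ln(1 - g) = -t / tau.
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    tau = -1.0 / slope
    return half_thickness / tau
```

Under this model, a factor-of-7 increase in k_chem corresponds to a relaxation time seven times shorter for the same sample geometry.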
