Syed Waqar Nabi | Researchain

Publication

Featured research published by Syed Waqar Nabi.

symposium on cloud computing | 2008

A coarse-grained Dynamically Reconfigurable MAC Processor for power-sensitive multi-standard devices

Syed Waqar Nabi; Cade C. Wells; Wim Vanderbauwhede

We have designed a coarse-grained, dynamically reconfigurable architecture specifically for implementing the wireless MAC layer in consumer hand-held devices. The dynamically reconfigurable MAC Processor is a SoC architecture that delegates critical tasks to a reconfigurable hardware co-processor. The co-processor can reconfigure packet-by-packet, handling up to 3 data streams of different protocols concurrently. We present results of simulations involving transmission and reception of packets, showing that the platform concurrently handles three protocol streams and reconfigures dynamically, yet meets and exceeds the protocol timing constraints, all at a moderate frequency. Thus we show that this architecture is capable of replacing up to three MAC processors in a wireless device. Its heterogeneous, coarse-grained functional units, the limited connectivity required between these units, and the idle time of hardware resources promise very modest power consumption, suitable for mobile devices.
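The packet-by-packet reconfiguration described above can be illustrated with a small behavioural model. This is a minimal sketch under stated assumptions, not the authors' architecture: the `Packet` and `CoProcessor` classes, the protocol tags, and the rule that the co-processor reconfigures only when consecutive packets need different protocols are all hypothetical. The real design reconfigures coarse-grained functional units in hardware rather than swapping a configuration label.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

MAX_STREAMS = 3  # the abstract states up to three concurrent protocol streams


@dataclass
class Packet:
    stream_id: int   # which concurrent stream the packet belongs to
    protocol: str    # hypothetical protocol tag, e.g. "A", "B", "C"
    payload: bytes


@dataclass
class CoProcessor:
    """Toy model: exactly one protocol configuration is loaded at a time."""
    active_config: Optional[str] = None
    reconfigurations: int = 0
    log: List[Tuple[int, str, int]] = field(default_factory=list)

    def process(self, pkt: Packet) -> None:
        # Reconfigure only when the next packet needs a different protocol
        # than the one currently loaded (the initial load is also counted).
        if pkt.protocol != self.active_config:
            self.active_config = pkt.protocol
            self.reconfigurations += 1
        # Stand-in for the actual MAC task: record stream, protocol, size.
        self.log.append((pkt.stream_id, pkt.protocol, len(pkt.payload)))


def run(packets: List[Packet]) -> CoProcessor:
    """Serve packets in arrival order on a single hypothetical co-processor."""
    if len({p.stream_id for p in packets}) > MAX_STREAMS:
        raise ValueError("more concurrent streams than the co-processor supports")
    cp = CoProcessor()
    for pkt in packets:
        cp.process(pkt)
    return cp
```

Interleaving packets with protocols A, B, A, A, C triggers four configuration loads in this model, because the two consecutive A packets share one configuration.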

international parallel and distributed processing symposium | 2016

A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Applications

Syed Waqar Nabi; Wim Vanderbauwhede

Heterogeneous High-Performance Computing (HPC) platforms present a significant programming challenge, especially because the key users of HPC resources are scientists, not parallel programmers. We contend that compiler technology has to evolve to automatically create the best program variant by transforming a given original program. We have developed a novel methodology based on type transformations for generating correct-by-construction design variants, and an associated lightweight cost model for evaluating these variants for implementation on FPGAs. In this paper we present a key enabler of our approach, the cost model. We discuss how we are able to quickly derive accurate estimates of performance and resource utilization from the design's representation in our intermediate language. We show results confirming the accuracy of our cost model by testing it on three different scientific kernels. We conclude with a case study that compares a solution generated by our framework with one from a conventional high-level synthesis tool, showing better performance and power efficiency using our cost-model-based approach.
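To make the idea of a lightweight cost model concrete, the sketch below ranks design variants with a first-order analytical estimate. It is not the TyTra cost model: the `Variant` fields, the fill-latency formula, and the single LUT budget are assumptions chosen for illustration, whereas the paper derives its estimates from the design's representation in the intermediate language.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Variant:
    name: str
    lanes: int          # hypothetical: number of parallel pipeline lanes
    depth: int          # hypothetical: pipeline depth (fill latency) in cycles
    luts_per_lane: int  # hypothetical: resource cost of one lane


def estimate_cycles(v: Variant, n_items: int) -> int:
    # Each fully pipelined lane accepts one item per cycle once the
    # pipeline is full, so total cycles = fill latency + items per lane - 1.
    return v.depth + math.ceil(n_items / v.lanes) - 1


def estimate_luts(v: Variant) -> int:
    # Resources are assumed to scale linearly with the number of lanes.
    return v.lanes * v.luts_per_lane


def best_variant(variants: List[Variant], n_items: int,
                 lut_budget: int) -> Optional[Variant]:
    """Return the fastest variant that fits the budget, or None if none fits."""
    feasible = [v for v in variants if estimate_luts(v) <= lut_budget]
    if not feasible:
        return None
    return min(feasible, key=lambda v: estimate_cycles(v, n_items))
```

With variants of 1, 2, 4 and 8 lanes (depth 20, 5000 LUTs per lane), 1024 work items and a 25000-LUT budget, the 8-lane variant is rejected as infeasible and the 4-lane variant is selected at an estimated 20 + 256 - 1 = 275 cycles. Because every estimate is a closed-form expression, many variants can be compared without running synthesis, which is the property the abstract emphasises.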

reconfigurable computing and fpgas | 2015

Using type transformations to generate program variants for FPGA design space exploration

Syed Waqar Nabi; Wim Vanderbauwhede

We present preliminary results with the TyTra design flow. Our aim is to create a parallelising compiler for high-performance scientific code on heterogeneous platforms, with a focus on Field-Programmable Gate Arrays (FPGAs). Using the functional language Idris, we show how this programming paradigm facilitates the generation of different correct-by-construction program variants through type transformations. We have developed a custom Intermediate Representation (IR) language, the TyTra-IR, which is similar to the LLVM IR but with extensions to express parallelism, allowing us to express the design variants associated with each program variant. The key innovation of the TyTra-IR is the ability to construct and cost design variants for FPGAs. Our prototype compiler generates Verilog code for FPGA synthesis from a given IR description. Using a real-world Successive Over-Relaxation (SOR) kernel, we illustrate the generation of program variants in Idris, their representation in TyTra-IR, and the evaluation of variants using our cost model. We compare the estimates from the cost model with results from synthesis and simulation of equivalent HDL.
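The type transformations mentioned above can be illustrated with the simplest such rewrite: reshaping a flat vector into parallel lanes and mapping over each lane. The paper performs transformations of this kind in Idris, where the type system makes the variant correct by construction; the Python sketch below only mimics the shape of the rewrite, and its equivalence to the baseline is checked by testing rather than proven. The functions `baseline`, `reshape`, `flatten` and `lane_variant` are hypothetical names.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def baseline(f: Callable[[T], U], xs: List[T]) -> List[U]:
    # Original program: map f over a flat vector of length N.
    return [f(x) for x in xs]


def reshape(xs: List[T], lanes: int) -> List[List[T]]:
    # Type transformation: Vect N -> Vect lanes (Vect (N / lanes)).
    if lanes <= 0 or len(xs) % lanes != 0:
        raise ValueError("N must be divisible by the number of lanes")
    k = len(xs) // lanes
    return [xs[i * k:(i + 1) * k] for i in range(lanes)]


def flatten(xss: List[List[T]]) -> List[T]:
    # Inverse of reshape: concatenate the lanes back into one vector.
    return [x for xs in xss for x in xs]


def lane_variant(f: Callable[[T], U], xs: List[T], lanes: int) -> List[U]:
    # Program variant: flatten . map (map f) . reshape lanes,
    # which computes the same result as map f for any valid `lanes`.
    return flatten([[f(x) for x in lane] for lane in reshape(xs, lanes)])
```

Each choice of `lanes` would correspond to a different hardware design (for example, one pipeline per lane), while the computed result stays identical to the baseline `map`; a cost model can then choose among these variants.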

parallel computing | 2015

FPGAs as Components in Heterogeneous High-Performance Computing Systems: Raising the Abstraction Level.

Wim Vanderbauwhede; Syed Waqar Nabi

We present an overview of the evolution of programming techniques for Field-Programmable Gate Arrays (FPGAs), with a particular focus on High-Level Synthesis (HLS) and Heterogeneous Computing (HC), and we argue that, in the context of High-Performance Computing (HPC), FPGAs should be treated as components of a larger heterogeneous compute platform. Consequently, HLS and HC tools become compilation targets rather than high-level development tools. Compiler technology has to evolve to automatically create the best compiled program variant by transforming a given original program. We describe our methodology based on type transformations and cost models, which allows us to automatically generate correct-by-construction program variants and accurately estimate their performance, so that an optimal program can be constructed by the compilation system.

adaptive hardware and systems | 2008

A Dynamically Reconfigurable Hardware Co-Processor for a Multi-Standard Wireless MAC Processor

Syed Waqar Nabi; Cade C. Wells; Wim Vanderbauwhede

The dynamically reconfigurable MAC processor is an innovative architecture specialized for the wireless MAC layer and aimed at consumer hand-held devices. It is a software/hardware partitioned platform in which the microprocessor delegates critical tasks to a reconfigurable hardware co-processor. This allows the microprocessor to handle fast and complex MAC protocols while clocking at relatively slow speeds, thus consuming less power. The architecture as a whole is designed to be dynamically reconfigurable. It will handle data streams of up to three different protocol standards by reconfiguring on a packet-by-packet basis. Results of simulating packet transmission and reception on a prototype Simulink model indicate that packet-by-packet reconfiguration for three concurrent data streams, while meeting protocol real-time requirements, will indeed be possible.

International Journal of Parallel Programming | 2018

Type-Driven Automated Program Transformations and Cost Modelling for Optimising Streaming Programs on FPGAs

Wim Vanderbauwhede; Syed Waqar Nabi; Cristian Urlea

In this paper we present a novel approach to program optimisation based on compiler-based, type-driven program transformations and a fast and accurate cost/performance model for the target architecture. We target streaming programs for the problem domain of scientific computing, such as numerical weather prediction. We present our theoretical framework for type-driven program transformation, our target high-level language and intermediate representation languages, and the cost model, and we demonstrate the effectiveness of our approach by comparison with a commercial toolchain.

parallel computing | 2015

FPGA port of a large scientific model from legacy code: the Emanuel convection scheme

Kristian Thorin Hentschel; Wim Vanderbauwhede; Syed Waqar Nabi

The potential of FPGAs for High-Performance Computing is increasingly recognized, but most work focuses on the acceleration of small, isolated kernels. We present a parallel FPGA implementation of a legacy algorithm, the seminal scheme for cumulus convection in large-scale models developed by Emanuel [1]. Our design makes use of pipelines at both the arithmetic and the logical stage level, keeping the entire algorithm on the FPGA. We assert that modern FPGAs have the resources to support large algorithms of this type. Through a practical and theoretical evaluation of our design, we show how such an FPGA implementation compares to GPU implementations or multi-core approaches such as OpenMP.

field-programmable logic and applications | 2008

Interface and Reconfiguration Controller for a wireless MAC-oriented dynamically reconfigurable hardware co-processor

Syed Waqar Nabi; Cade C. Wells; Wim Vanderbauwhede

To address the challenges of the consumer wireless device industry, we have designed a dynamically reconfigurable architecture whose flexibility is limited to the MAC layer. It is a software/hardware partitioned platform in which critical tasks are delegated to a dynamically reconfigurable hardware co-processor. It will handle data streams of up to three different protocol standards by reconfiguring on a packet-by-packet basis. The Interface and Reconfiguration Controller uses a combination of controllers to dynamically reconfigure the functional units in the architecture and delegate MAC tasks to them. Results of packet transmission on a prototype model indicate that the device handles three transmission requests from different protocol modes in a fraction of the packet durations.

conference on ph.d. research in microelectronics and electronics | 2007