Tassadaq Hussain | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Tassadaq Hussain is active.

Explore More

Publication

Featured researches published by Tassadaq Hussain.

applied reconfigurable computing | 2012

PPMC: a programmable pattern based memory controller

Tassadaq Hussain; Muhammad Shafiq; Miquel Pericàs; Nacho Navarro; Eduard Ayguadé

One of the main challenges in the design of hardware accelerators is the efficient access of data from the external memory. Improving and optimizing the functionality of the memory controller between the external memory and the accelerators is therefore critical. In this paper, we advance toward this goal by proposing PPMC, the Programmable Pattern-based Memory Controller. This controller supports scatter-gather and strided 1D, 2D and 3D accesses with programmable tiling. Compared to existing solutions, the proposed system provides better performance, simplifies programming access patterns and eases software integration by interfacing to high-level programming languages. In addition, the controller offers an interface for automating domain decomposition via tiling. We implemented and tested PPMC on a Xilinx ML505 evaluation board using a MicroBlaze soft-core as the host processor. The evaluation uses six memory intensive application kernels: Laplacian solver, FIR, FFT, Thresholding, Matrix Multiplication, and 3D-Stencil. The results show that the PPMC-enhanced system achieves at least 10x speed-ups for 1D, 2D and 3D memory accesses as compared to a non-PPMC based setup.

high performance computing systems and applications | 2014

Advanced Pattern based Memory Controller for FPGA based HPC applications

Tassadaq Hussain; Oscar Palomar; Osman S. Unsal; Adrián Cristal; Eduard Ayguadé; Mateo Valero

The ever-increasing complexity of high-performance computing applications limits performance due to memory constraints in FPGAs. To address this issue, we propose the Advanced Pattern based Memory Controller (APMC), which supports both regular and irregular memory patterns. The proposed memory controller systematically reduces the latency faced by processors/accelerators due to irregular memory access patterns and low memory bandwidth by using a smart mechanism that collects and stores the different patterns and reuses them when it is needed. In order to prove the effectiveness of the proposed controller, we implemented and tested it on a Xilinx ML505 FPGA board. In order to prove that our controller is efficient in a variety of scenarios, we used several benchmarks with different memory access patterns. The benchmarking results show that our controller consumes 20% less hardware resources, 32% less on chip power and achieves a maximum speedup of 52× and 2.9× for regular and irregular applications respectively.

field-programmable technology | 2011

Implementation of a Reverse Time Migration kernel using the HCE High Level Synthesis tool

Tassadaq Hussain; Miquel Pericàs; Nacho Navarro; Eduard Ayguadé

Reconfigurable computers have started to appear in the HPC landscape, albeit at a slow pace. Adoption is still being hindered by the design methodologies and slow implementation cycles. Recently, methodologies based on High Level Synthesis (HLS) have begun to flourish and the reconfigurable supercomputing community is slowly adopting these techniques. In this paper we took a geophysics application and implemented it on FPGA using a HLS tool called HCE. The application, Reverse Time Migration, is an important code for subsalt imaging. It is also a highly demanding code both in computationally as in its memory requirements. The complexity of this code makes it challenging to implement it using a HLS methodology instead of HDL. We study the achieved performance and compare it with hand-written HDL and also with software based execution. The resulting design, when implemented on the Altera Stratix IV EP4SGX230 and EP4SGX530 devices achieves 11.2 and 22 GFLOPS respectively. On these devices, the design was capable of achieving up to 4.2× and 7.9× improvement, respectively, over a general purpose processor core (Intel i7).

field programmable logic and applications | 2012

PPMC: Hardware scheduling and memory management support for multi accelerators

Tassadaq Hussain; Miquel Pericàs; Nacho Navarro; Eduard Ayguadé

A generic multi-accelerator system comprises a microprocessor unit that schedules the accelerators along with the necessary data movements. The system, having the processor as control unit, encounters multiple delays (memory and task management) which degrade the overall system performance. This performance degradation demands an efficient memory manager and high speed scheduler, which feeds prearranged data to the appropriate accelerator. In this work we propose the integration of an efficient scheduler and an intelligent memory manger into an existing core known as PPMC (Programmable Pattern based Memory Controller), such that data movement and computational tasks can be handled proficiently. Consequently, the modified PPMC system improves performance by managing data movements and address generation in hardware and scheduling accelerators without the intervention of a control processor nor an operating system. The PPMC system is evaluated with six memory intensive accelerators: Laplacian solver, FIR, FFT, Thresholding, Matrix Multiplication and 3D-Stencil. This modified PPMC system is implemented and tested on a Xilinx ML505 evaluation FPGA board. The performance of the system is compared with a microprocessor based system that has been integrated with the Xilkernel operating system. Results show that the modified PPMC based multi-accelerator system consumes 50% less hardware resources, 32% less on-chip power and achieves approximately a 27× speed-up compared to the MicroBlaze-based system.

applied reconfigurable computing | 2014

Stand-alone memory controller for graphics system

Tassadaq Hussain; Oscar Palomar; Osman S. Unsal; Adrián Cristal; Eduard Ayguadé; Mateo Valero; Amna Haider

There has been a dramatic increase in the complexity of graphics applications in System-on-Chip (SoC) with a corresponding increase in performance requirements. Various powerful and expensive platforms to support graphical applications appeared recently. All these platforms require a high performance core that manages and schedules the high speed data of graphics peripherals (camera, display, etc.) and an efficient on chip scheduler. In this article we design and propose a SoC based Programmable Graphics Controller (PGC) that handles graphics peripherals efficiently. The data access patterns are described in the program memory; the PGC reads them, generates transactions and manages both bus and connected peripherals without the support of a master core. The proposed system is highly reliable in terms of cost, performance and power. The PGC based system is implemented and tested on a Xilinx ML505 FPGA board. The performance of the PGC is compared with the Microblaze processor based graphic system. When compared with the baseline system, the results show that the PGC captures video at 2x of higher frame rate and achieves 3.4x to 7.4x of speedup while processing images. PGC consumes 30% less hardware resources and 22% less on-chip power than the baseline system.

International Journal of Circuits and Architecture Design | 2014

PGC: a pattern-based graphics controller

Tassadaq Hussain; Amna Haider

In last decade graphics system have shown a great impact in our lives not only as a commodity itself but also for specialised use. Various powerful and expensive platforms to support graphical applications appeared in recent years. All these platforms require a high performance core that manages and schedules the high speed data of graphics peripherals. In this article, we design and propose a pattern-based graphics controller (PGC) that handles graphics peripherals efficiently. The proposed system is highly reliable in terms of cost, performance and power. The PGC-based system is implemented and tested on a Xilinx ML505 FPGA board. When compared with the baseline systems, the results show that the PGC captures video at 2.5×, 1.78× and 5× of higher frame rate and achieves 1.8× to 14.6× of speedup while executing different image processing applications. PGC consumes 16% to 28% of less on-chip power than the baseline systems.

Journal of Signal Processing Systems | 2018

Memory Controller for Vector Processor

Tassadaq Hussain; Oscar Palomar; Osman S. Unsal; Adrián Cristal; Eduard Ayguadé

To manage power and memory wall affects, the HPC industry supports FPGA reconfigurable accelerators and vector processing cores for data-intensive scientific applications. FPGA based vector accelerators are used to increase the performance of high-performance application kernels. Adding more vector lanes does not affect the performance, if the processor/memory performance gap dominates. In addition if on/off-chip communication time becomes more critical than computation time, causes performance degradation. The system generates multiple delays due to application’s irregular data arrangement and complex scheduling scheme. Therefore, just like generic scalar processors, all sets of vector machine – vector supercomputers to vector microprocessors – are required to have data management and access units that improve the on/off-chip bandwidth and hide main memory latency. In this work, we propose an Advanced Programmable Vector Memory Controller (PVMC), which boosts noncontiguous vector data accesses by integrating descriptors of memory patterns, a specialized on-chip memory, a memory manager in hardware, and multiple DRAM controllers. We implemented and validated the proposed system on an Altera DE4 FPGA board. The PVMC is also integrated with ARM Cortex-A9 processor on Xilinx Zynq All-Programmable System on Chip architecture. We compare the performance of a system with vector and scalar processors without PVMC. When compared with a baseline vector system, the results show that the PVMC system transfers data sets up to 1.40x to 2.12x faster, achieves between 2.01x to 4.53x of speedup for 10 applications and consumes 2.56 to 4.04 times less energy.

field programmable logic and applications | 2014

MAPC: Memory access pattern based controller

Tassadaq Hussain; Oscar Palomar; Osman S. Unsal; Adrián Cristal; Eduard Ayguadé; Mateo Valero

Traditionally, system designers have attempted to improve system performance by scheduling the processing cores and by exploring different memory system configurations and there is comparatively less work done scheduling the accesses at the memory system level and exploring data accesses on the memory systems. In this paper, we propose a memory access pattern based controller (MAPC). MAPC organizes data accesses in descriptors, prioritizes them with respect to the number and size of transfer requests. When compared to the baseline multicore system, the MAPC based system achieves between 2.41× to 5.34× of speedup for different applications, consumes 28% less hardware resources and 13% less dynamic power.

parallel computing | 2015

AMC: Advanced Multi-accelerator Controller

Tassadaq Hussain; Amna Haider; Shakaib A. Gursal; Eduard Ayguadé

The rapid advancement, use of diverse architectural features and introduction of High Level Synthesis (HLS) tools in FPGA technology have enhanced the capacity of data-level parallelism on a chip. A generic FPGA based HLS multi-accelerator system requires a microprocessor (master core) that manages memory and schedules accelerators. In a real environment, such HLS multi-accelerator systems do not give a perfect performance due to memory bandwidth issues. Thus, a system demands a memory manager and a scheduler that improves performance by managing and scheduling the multi-accelerator’s memory access patterns efficiently. In this article, we propose the integration of an intelligent memory system and efficient scheduler in the HLS-based multi-accelerator environment called Advanced Multi-accelerator Controller (AMC). The AMC system is evaluated with memory intensive accelerators, High Performance Computing (HPC) applications and implemented and tested on a Xilinx Virtex-5 ML505 evaluation FPGA board. The performance of the system is compared against the microprocessor-based systems that have been integrated with the operating system. Results show that the AMC based HLS multi-accelerator system achieves 10.4x and 7x of speedup compared to the MicroBlaze and Intel Core based HLS multi-accelerator systems.

application-specific systems, architectures, and processors | 2014

PVMC: Programmable Vector Memory Controller

Tassadaq Hussain; Oscar Palomar; Osman S. Unsal; Adrian Cristal; Eduard Ayguadé; Mateo Valero

In this work, we propose a Programmable Vector Memory Controller (PVMC), which boosts noncontiguous vector data accesses by integrating descriptors of memory patterns, a specialized local memory, a memory manager in hardware, and multiple DRAM controllers. We implemented and validated the proposed system on an Altera DE4 FPGA board. We compare the performance of our proposal with a vector system without PVMC as well as a scalar only system. When compared with a baseline vector system, the results show that the PVMC system transfers data sets up to 2.2× to 14.9× faster, achieves between 2.16× to 3.18× of speedup for 5 applications and consumes 2.56 to 4.04 times less energy.

Explore More