Syed Mohammad Asad Hassan Jafri

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Syed Mohammad Asad Hassan Jafri is active.

Explore More

Publication

Featured researches published by Syed Mohammad Asad Hassan Jafri.

international symposium on quality electronic design | 2010

Design of a fault-tolerant coarse-grained

Syed Mohammad Asad Hassan Jafri; Stanislaw J. Piestrak; Olivier Sentieys; Sébastien Pillement

This paper considers the possibility of implementing low-cost hardware techniques which would allow to tolerate temporary faults in the datapaths of coarse-grained reconfigurable architectures (CGRAs). Our goal was to use less hardware overhead than commonly used duplication or triplication methods. The proposed technique relies on concurrent error detection by using residue code modulo 3 and re-execution of the last operation, once an error is detected. We have chosen the DART architecture as a vehicle to study the efficiency of this approach to protect its datapaths. Simulation results have confirmed hardware savings of the proposed approach over duplication.

field-programmable technology | 2011

Compact generic intermediate representation (CGIR) to enable late binding in coarse grained reconfigurable architectures

Syed Mohammad Asad Hassan Jafri; Ahmed Hemani; Kolin Paul; Juha Plosila; Hannu Tenhunen

In the era of platforms hosting multiple applications, where inter-application communication and concurrency patterns are arbitrary, static compile time decision making is neither optimal nor desirable. As a part of solving this problem, we present a novel method for compactly representing multiple configuration bitstreams of a single application, with varying parallelisms, as a unique, compact, and customizable representation, called CGIR. The representation thus stored is unraveled at runtime to configure the device with optimal (e.g. in terms of energy) implementation. Our goal was to provide optimal decision making capability to the runtime resource manager (RTM) without compromising the runtime behavior or the memory requirements of the system. The presence of multiple binaries enhance optimality by providing the RTM with multiple implementations to choose from. The CGIR ensures minimal increase in memory requirements with the addition of each binary. The low-cost unraveling of CGIR guarantees the runtime behavior. We have chosen the dynamically reconfigurable resource array (DRRA) as a vehicle to study the feasibility of our approach. Simulation results using 16 point decimation in time fast Fourier transform (FFT) has showed massive (up to 18% for 2 versions, 33% for 3 versions) memory savings compared to state of the art. Formal evaluation shows that the savings increase with the increase in the number of implementations stored.

international conference on embedded computer systems architectures modeling and simulation | 2013

Energy-aware-task-parallelism for efficient dynamic voltage, and frequency scaling, in CGRAs

Syed Mohammad Asad Hassan Jafri; Muhammad Adeel Tajammul; Ahmed Hemani; Kolin Paul; Juha Plosila; Hannu Tenhunen

Today, coarse grained reconfigurable architectures (CGRAs) host multiple applications, with arbitrary communication and computation patterns. Each application itself is composed of multiple tasks, spatially mapped to different parts of platform. Providing worst-case operating point to all applications leads to excessive energy and power consumption. To cater this problem, dynamic voltage and frequency scaling (DVFS) is a frequently used technique. DVFS allows to scale the voltage and/or frequency of the device, based on runtime constraints. Recent research suggests that the efficiency of DVFS can be significantly enhanced by combining dynamic parallelism with DVFS. The proposed methods exploit the speedup induced by parallelism to allow aggressive frequency and voltage scaling. These techniques, employ greedy algorithm, that blindly parallelizes a task whenever required resources are available. Therefore, it is likely to parallelize a task(s) even if it offers no speedup to the application, thereby undermining the effectiveness of parallelism. As a solution to this problem, we present energy aware task parallelism. Our solution relies on a resource allocation graphs and an autonomous parallelism, voltage, and frequency selection algorithm. Using resource allocation graph, as a guide, the autonomous parallelism, voltage, and frequency selection algorithm parallelizes a task only if its parallel version reduces overall application execution time. Simulation results, using representative applications (MPEG4, WLAN), show that our solution promises better resource utilization, compared to greedy algorithm. Synthesis results (using WLAN) confirm a significant reduction in energy (up to 36%), power (up to 28%), and configuration memory requirements (up to 36%), compared to state of the art.

international symposium on quality electronic design | 2013

Energy-aware coarse-grained reconfigurable architectures using dynamically reconfigurable isolation cells

Syed Mohammad Asad Hassan Jafri; Ozan Bag; Ahmed Hemani; Nasim Farahini; Kolin Paul; Juha Plosila; Hannu Tenhunen

This paper presents a self adaptive architecture to enhance the energy efficiency of coarse-grained reconfigurable architectures (CGRAs). Today, platforms host multiple applications, with arbitrary inter-application communication and concurrency patterns. Each application itself can have multiple versions (implementations with different degree of parallelism) and the optimal version can only be determined at runtime. For such scenarios, traditional worst case designs and compile time mapping decisions are neither optimal nor desirable. Existing solutions to this problem employ costly dedicated hardware to configure the operating point at runtime (using DVFS). As an alternative to dedicated hardware, we propose exploiting the reconfiguration features of modern CGRAs. Our solution relies on dynamically reconfigurable isolation cells (DRICs) and autonomous parallelism, voltage, and frequency selection algorithm (APVFS). The DRICs reduce the overheads of DVFS circuitry by configuring the existing resources as isolation cells. APVFS ensures high efficiency by dynamically selecting the parallelism, voltage and frequency trio, which consumes minimum power to meet the deadlines on available resources. Simulation results using representative applications (Matrix multiplication, FIR, and FFT) showed up to 23% and 51% reduction in power and energy, respectively, compared to traditional DVFS designs. Synthesis results have confirmed significant reduction in area overheads compared to state of the art DVFS methods.

ieee international symposium on parallel & distributed processing, workshops and phd forum | 2011

Compression Based Efficient and Agile Configuration Mechanism for Coarse Grained Reconfigurable Architectures

Syed Mohammad Asad Hassan Jafri; Ahmed Hemani; Kolin Paul; Juha Plosila; Hannu Tenhunen

This paper considers the possibility of speeding up the configuration by reducing the size of configware in coarse grained reconfigurable architectures (CGRAs). Our goal was to reduce the number of cycles and increase the configuration bandwidth. The proposed technique relies on multicasting and bit stream compression. The multicasting reduces the cycles by configuring the components performing identical functions simultaneously, in a single cycle, while the bit stream compression increases the configuration bandwidth. We have chosen the dynamically reconfigurable resource array (DRRA) architecture as a vehicle to study the efficiency of this approach. In our proposed method, the configuration bit stream is compressed offline and stored in a memory. If reconfiguration is required, the compressed bit stream is decompressed using an online decompresser and sent to DRRA. Simulation results using practical applications showed up to 78% and 22% decrease in configuration cycles for completely parallel and completely serial implementations, respectively. Synthesis results have confirmed nigligible overhead in terms of area (1.2 %) and timing.

digital systems design | 2013

Energy-Aware Fault-Tolerant CGRAs Addressing Application with Different Reliability Needs

Syed Mohammad Asad Hassan Jafri; Stanislaw J. Piestrak; Kolin Paul; Ahmed Hemani; Juha Plosila; Hannu Tenhunen

In this paper, we propose a polymorphic fault tolerant architecture that can be tailored to efficiently support the reliability needs of multiple applications at run-time. Today, coarse-grained reconfigurable architectures (CGRAs) host multiple applications with potentially different reliability needs. Providing platform-wide worst-case (maximum) protection to all the applications is neither optimal nor desirable. To reduce the fault-tolerance overhead, adaptive fault-tolerance strategies have been proposed. The proposed techniques access the reliability requirements of each application and adjust the fault-tolerance intensity (and hence overhead), accordingly. However, existing flexible reliability schemes only allow to shift between different levels of modular redundancy (duplication, triplication, etc.) and deal with only a single class of faults (e.g. soft errors). To complement these strategies, we propose energy-aware fault-tolerance that, in addition to modular redundancy, can also provide low cost, sub-modular (e.g. residue mod 3) redundancy, to cater both permanent and temporary faults. Our solution relies on an agent based control layer and a configurable fault-tolerance data path. The control layer identifies the application class and configures the data path to provide the needed reliability. Simulation results using a few selected algorithms (FFT, matrix multiplication, and FIR filter) showed that the proposed method provides flexible protection with energy overhead ranging from 3.125% to 107% for different reliability levels. Synthesis results have confirmed that the proposed architecture significantly reduces the area overhead for self-checking (59.1%) and fault tolerant (7.1%) versions, compared to the state of the art adaptive reliability techniques.

Microprocessors and Microsystems | 2014

Parallel distributed scalable runtime address generation scheme for a coarse grain reconfigurable computation and storage fabric

Nasim Farahini; Ahmed Hemani; Hassan Sohofi; Syed Mohammad Asad Hassan Jafri; Muhammad Adeel Tajammul; Kolin Paul

This paper presents a hardware based solution for a scalable runtime address generation scheme for DSP applications mapped to a parallel distributed coarse grain reconfigurable computation and storage fabric. The scheme can also deal with non-affine functions of multiple variables that typically correspond to multiple nested loops. The key innovation is the judicious use of two categories of address generation resources. The first category of resource is the low cost AGU that generates addresses for given address bounds for affine functions of up to two variables. Such low cost AGUs are distributed and associated with every read/write port in the distributed memory architecture. The second category of resource is relatively more complex but is also distributed but shared among a few storage units and is capable of handling more complex address generation requirements like dynamic computation of address bounds that are then used to configure the AGUs, transformation of non-affine functions to affine function by computing the affine factor outside the loop, etc. The runtime computation of the address constraints results in negligibly small overhead in latency, area and energy while it provides substantial reduction in program storage, reconfiguration agility and energy compared to the prevalent pre-computation of address constraints. The efficacy of the proposed method has been validated against the prevalent address generation schemes for a set of six realistic DSP functions. Compared to the pre-computation method, the proposed solution achieved 75% average code compaction and compared to the centralized runtime address generation scheme, the proposed solution achieved 32.7% average performance improvement.

application specific systems architectures and processors | 2013

Private configuration environments (PCE) for efficient reconfiguration, in CGRAs

Muhammad Adeel Tajammul; Syed Mohammad Asad Hassan Jafri; Ahmed Hemani; Juha Plosila; Hannu Tenhunen

In this paper, we propose a polymorphic configuration architecture, that can be tailored to efficiently support reconfiguration needs of the applications at runtime. Today, CGRAs host multiple applications, running simultaneously on a single platform. Novel CGRAs allow each application to exploit late binding and time sharing for enhancing the power and area efficiency. These features require frequent reconfigurations, making reconfiguration time a bottleneck for time critical applications. Existing solutions to this problem either employ powerful configuration architectures or hide configuration latency (using configuration caching). However, both these methods incur significant costs when designed for worst-case reconfiguration needs. As an alternative to worst-case dedicated configuration mechanism, we exploit reconfiguration to provide each application its private configuration environment (PCE). PCE relies on a morphable configuration infrastructure, a distributed memory sub-system, and a set of PCE controllers. The PCE controllers customize the morphable configuration infrastructure and reserve portion of the a distributed memory sub-system, to act as a context memory for each application, separately. Thereby, each application enjoys its own configuration environment which is optimal in terms of configuration speed, memory requirements and energy. Simulation results using representative applications (WLAN and Matrix Multiplication) showed that PCE offers up to 58% reduction in memory requirements, compared to dedicated, worst case configuration architecture. Synthesis results show that the morphable reconfiguration architecture incurs negligible overheads ( 3% area and 4% power compared of a single processing element).

digital systems design | 2012

Energy-Aware Fault-Tolerant Network-on-Chips for Addressing Multiple Traffic Classes

Syed Mohammad Asad Hassan Jafri; Liang Guang; Ahmed Hemani; Kolin Paul; Juha Plosila; Hannu Tenhunen

This paper presents an energy efficient architecture to provide on-demand fault tolerance to multiple traffic classes, running simultaneously on single network on chip (NoC) platform. Today, NoCs host multiple traffic classes with potentially different reliability needs. Providing platform-wide worst-case (maximum) protection to all the classes is neither optimal nor desirable. To reduce the overheads incurred by fault tolerance, various adaptive strategies have been proposed. The proposed techniques rely on individual packet fields and operating conditions to adjust the intensity and hence the overhead of fault tolerance. Presence of multiple traffic classes undermines the effectiveness of these methods. To complement the existing adaptive strategies, we propose on-demand fault tolerance, capable of providing required reliability, while significantly reducing the energy overhead. Our solution relies on a hierarchical agent based control layer and a reconfigurable fault tolerance data path. The control layer identifies the traffic class and directs the packet to the path providing the needed reliability. Simulation results using representative applications (matrix multiplication, FFT, wavefront, and HiperLAN) showed up to 95% decrease in energy consumption compared to traditional worst case methods. Synthesis results have confirmed a negligible additional overhead, for providing on-demand protection (up to 5.3% area), compared to the overall fault tolerance circuitry.

Microprocessors and Microsystems | 2014

Design of the coarse-grained reconfigurable architecture DART with on-line error detection

Syed Mohammad Asad Hassan Jafri; Stanislaw J. Piestrak; Olivier Sentieys; Sébastien Pillement

This paper presents the implementation of the coarse-grained reconfigurable architecture (CGRA) DART with on-line error detection intended for increasing fault-tolerance. Most parts of the data paths and of the local memory of DART are protected using residue code modulo 3, whereas only the logic unit is protected using duplication with comparison. These low-cost hardware techniques would allow to tolerate temporary faults (including so called soft errors caused by radiation), provided that some technique based on re-execution of the last operation is used. Synthesis results obtained for a 90nm CMOS technology have confirmed significant hardware and power consumption savings of the proposed approach over commonly used duplication with comparison. Introducing one extra pipeline stage in the self-checking version of the basic arithmetic blocks has allowed to significantly reduce the delay overhead compared to our previous design.

Explore More