Publication


Featured research published by Mohammed Alawad.


Field-Programmable Gate Arrays | 2014

Energy-efficient multiplier-less discrete convolver through probabilistic domain transformation

Mohammed Alawad; Yu Bai; Ronald F. DeMara; Mingjie Lin

Energy efficiency and algorithmic robustness are typically conflicting circuit characteristics, yet with CMOS technology scaling toward the 10-nm feature size, both become critical design metrics simultaneously for modern logic circuits. This paper proposes a novel computing scheme built on probabilistic domain transformation that targets both low-power operation and fault resilience. In this computing paradigm, algorithm inputs are first encoded probabilistically, translating the input values into a number of random samples. Subsequently, lightweight operations, such as simple additions, are performed on these random samples to generate new random variables. Finally, the resulting random samples are decoded probabilistically to give the final results.
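
As an illustration of the encode/operate/decode pipeline described above, the following minimal Python sketch (not the paper's circuit; the signals f and g and the sample count N are hypothetical choices) estimates a discrete convolution using only additions, exploiting the fact that the PDF of the sum of two independent random variables is the convolution of their PDFs:

```python
# Minimal sketch of the encode -> add -> decode pipeline: pairwise additions
# of random samples stand in for a multiplier-based convolution.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000  # number of random samples (hypothetical budget)

# Two discrete signals, normalized so they can be treated as PMFs.
f = np.array([1.0, 3.0, 2.0]); f /= f.sum()
g = np.array([2.0, 1.0, 1.0]); g /= g.sum()

# Encode: draw index-valued samples distributed according to each PMF.
xs = rng.choice(len(f), size=N, p=f)
ys = rng.choice(len(g), size=N, p=g)

# Operate: simple additions only, no multipliers.
sums = xs + ys

# Decode: the histogram of the sums estimates the convolution f * g
# of the normalized signals.
est = np.bincount(sums, minlength=len(f) + len(g) - 1) / N
print("stochastic estimate:", np.round(est, 3))
print("exact convolution:  ", np.round(np.convolve(f, g), 3))
```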


IEEE Transactions on Emerging Topics in Computing | 2016

Survey of Stochastic-based Computation Paradigms

Mohammed Alawad; Mingjie Lin

Effectively tackling the upcoming "zettabytes" data explosion requires a quantum leap in our computing power and energy efficiency. However, with Moore's law quickly dwindling, the physical limits of CMOS technology make it almost intractable to achieve high energy efficiency if the traditional "deterministic and precise" computing model continues to dominate. Worse yet, the upcoming data explosion mostly comprises statistics gleaned from uncertain, imperfect real-world environments. As such, the traditional computing approaches of first-principles modeling or explicit statistical modeling will very likely be ineffective at achieving flexibility, autonomy, and human interaction. The bottom line is clear: given where we are headed, the fundamental principle of modern computing, namely that deterministic logic circuits can flawlessly emulate propositional logic deduction governed by Boolean algebra, has to be reexamined, and transformative changes in the foundation of modern computing must be made. This paper surveys some of the most important work on stochastic-based computing. We specifically focus on four important research areas: 1) random number generation, 2) stochastic computing, 3) stochastic electronics, and 4) emerging device technology and its potential application in stochastic computing. All of these research works share two distinctive features. First, they embrace and exploit quantum-induced randomness as an invaluable information carrier rather than the "villain of correct computation" to be suppressed. Second, the theoretical foundation underpinning most of these works is based on neither the Boolean algebra of digital circuits nor the nonlinear amplification and filtering of analog circuits. Instead, it tightly brings together the algorithmic essence of computing and quantum-induced randomness through the powerful framework of stochastic-based computing transformation.
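
For readers new to the area, the following toy example shows the classic unipolar stochastic computing primitive that surveys of this field typically cover: multiplying two values in [0, 1] with a single AND gate per bit, given independent Bernoulli bitstreams. This is a textbook illustration, not code from the survey itself:

```python
# Unipolar stochastic multiplication: for independent bitstreams with
# P(a_i = 1) = a and P(b_i = 1) = b, the AND of the streams has mean a * b.
import numpy as np

rng = np.random.default_rng(1)
L = 1 << 16  # bitstream length; longer streams reduce estimator variance

def encode(p, rng):
    """Encode a value p in [0, 1] as a Bernoulli(p) bitstream of length L."""
    return rng.random(L) < p

a, b = 0.8, 0.3
sa, sb = encode(a, rng), encode(b, rng)

product = (sa & sb).mean()  # one AND gate per bit position
print(f"stochastic: {product:.4f}  exact: {a * b:.4f}")
```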


Field-Programmable Custom Computing Machines | 2015

FIR Filter Based on Stochastic Computing with Reconfigurable Digital Fabric

Mohammed Alawad; Mingjie Lin

FIR filtering is widely used in many important DSP applications to achieve filtering stability and the linear-phase property. This paper presents a hardware- and energy-efficient approach to implementing FIR filtering through reconfigurable stochastic computing. Specifically, we exploit a basic probabilistic principle of summing independent random variables to achieve approximate FIR filtering without costly multiplications. This allows our proposed FIR architecture to consume about 9 times and 4 times less power than conventional multiplier-based and DA-based designs, respectively. Additionally, when compared with the state-of-the-art systolic DA-based design, our design achieves about a 3x reduction in hardware usage.
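
The following sketch illustrates one way random sampling can replace the per-tap multiplies in y[n] = sum_k h[k] x[n-k], under the assumption of non-negative taps: draw a tap index at random with probability proportional to its coefficient, average the selected delayed inputs, and rescale once per output. The taps, signal, and sample budget here are hypothetical, not the paper's design:

```python
# Monte Carlo FIR: E[x_window[K]] with K ~ h/sum(h) equals the normalized
# weighted sum, so tap selection plus averaging replaces per-tap multiplies.
import numpy as np

rng = np.random.default_rng(2)
h = np.array([0.1, 0.2, 0.4, 0.2, 0.1])   # hypothetical low-pass taps
x = rng.standard_normal(64)
S = 4096  # samples per output; more samples -> lower estimator variance

p = h / h.sum()
y_est = np.empty(len(x) - len(h) + 1)
for n in range(len(y_est)):
    window = x[n:n + len(h)][::-1]            # delayed inputs, newest first
    taps = rng.choice(len(h), size=S, p=p)    # random tap selection
    y_est[n] = h.sum() * window[taps].mean()  # one rescale per output

y_ref = np.convolve(x, h, mode="valid")
print("max abs error:", np.max(np.abs(y_est - y_ref)))
```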


IEEE Dallas Circuits and Systems Conference (DCAS) | 2014

Energy-efficient imprecise reconfigurable computing through probabilistic domain transformation

Mohammed Alawad; Mingjie Lin

Many DSP applications naturally possess so-called "inherent error resilience," meaning that certain degrees of computational error do not noticeably impair their eventual quality of results. This phenomenon offers an interesting opportunity to significantly improve the overall energy efficiency of these DSP applications at the cost of minute degradations in computing accuracy. This work presents a probability-based methodology to implement high-performance DSP applications while achieving low power consumption. Departing from previously published approximate computing methods, our solution leverages a fundamental probability principle to implement a reconfigurable finite impulse response (FIR) digital filter specifically designed for FPGA-based image and video processors. Our method trades off performance and power consumption against the accuracy of the output results. To validate the proposed probabilistic architecture for discrete FIR filtering, we developed a 16-tap FIR filter on a Virtex-5 FPGA device (XC5VSX95T-1FF1136). Our prototype of the probabilistic reconfigurable FIR filter consumes 9 times less power than a multiplier-based FIR filter and dissipates 43.13 μJ of dynamic energy to filter a 256×256-pixel image. We believe that this new architecture can be exploited in real-time applications that require energy-efficient FIR filters, and that it can be realized with many other FPGA device families.
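
The accuracy/power trade-off described above can be illustrated with a standalone Monte Carlo toy (an assumed estimator, not the paper's filter): the standard error of a sample-mean estimate shrinks roughly as 1/sqrt(S), so a smaller sample budget trades a predictable accuracy loss for proportionally fewer switching events:

```python
# Empirical RMSE of a Bernoulli sample mean vs. theoretical
# sqrt(p(1-p)/S) ~ 0.49/sqrt(S) scaling for p = 0.4.
import numpy as np

rng = np.random.default_rng(3)
true_mean = 0.4
for S in (64, 256, 1024, 4096):
    trials = rng.random((1000, S)) < true_mean   # 1000 independent runs
    rmse = np.sqrt(((trials.mean(axis=1) - true_mean) ** 2).mean())
    print(f"S={S:5d}  RMSE={rmse:.4f}  theory={0.49/np.sqrt(S):.4f}")
```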


Field-Programmable Custom Computing Machines | 2013

Boosting Memory Performance of Many-Core FPGA Device through Dynamic Precedence Graph

Yu Bai; Abigail Fuentes; Michael Riera; Mohammed Alawad; Mingjie Lin

Emerging FPGA devices, integrated with abundant RAM blocks and high-performance processor cores, offer an unprecedented opportunity to effectively implement single-chip distributed logic-memory (DLM) architectures [1]. Being "memory-centric," the DLM architecture can significantly improve the overall performance and energy efficiency of many memory-intensive embedded applications, especially those that exhibit irregular array data access patterns at the algorithmic level. However, implementing a DLM architecture poses unique challenges to an FPGA designer in terms of 1) organizing and partitioning diverse on-chip memory resources, and 2) orchestrating effective data transmission between on-chip and off-chip memory. In this paper, we offer solutions to both of these challenges. Specifically, 1) we propose a stochastic memory partitioning scheme based on the well-known simulated annealing algorithm, which obtains memory partitioning solutions that promote parallelized memory accesses by exploring a large solution space; and 2) we augment the proposed DLM architecture with a reconfigurable hardware graph that dynamically computes precedence relationships between memory partitions, thus effectively exploiting algorithmic-level memory parallelism on a per-application basis. We evaluate the effectiveness of our approach (A3) against two other DLM architecture synthesis methods: an algorithm-centric reconfigurable computing architecture with a single monolithic memory (A1), and the heterogeneous distributed architecture synthesized according to [1] (A2). To make the comparison fair, the data path remains the same in all three architectures while the local memory architecture differs. For each of ten benchmark applications from SPEC2006 and MiBench [2], we break down the performance benefit of A3 into two parts: the portion due to stochastic local memory partitioning and the portion due to dynamic graph-based memory arbitration. All experiments were conducted with a Virtex-5 (XCV5LX155T-2) FPGA. On average, our experimental results show that the proposed A3 architecture outperforms A2 and A1 by 34% and 250%, respectively. Within the performance improvement of A3 over A2, more than 70% comes from the hardware graph-based memory scheduling.
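
The following toy sketch shows the general shape of simulated-annealing memory partitioning; the access trace, bank count, and cost function are hypothetical stand-ins, not the paper's tool flow:

```python
# Simulated annealing: assign arrays to on-chip banks so that accesses
# issued in the same cycle rarely collide on one bank.
import math, random

random.seed(4)
NUM_ARRAYS, NUM_BANKS = 8, 4
# Each trace entry lists the arrays touched in one cycle (hypothetical).
trace = [random.sample(range(NUM_ARRAYS), k=3) for _ in range(200)]

def conflicts(assign):
    """Count same-cycle accesses that map to an already-busy bank."""
    total = 0
    for cycle in trace:
        banks = [assign[a] for a in cycle]
        total += len(banks) - len(set(banks))
    return total

assign = [a % NUM_BANKS for a in range(NUM_ARRAYS)]  # initial placement
cost, temp = conflicts(assign), 50.0
while temp > 0.1:
    a = random.randrange(NUM_ARRAYS)
    old = assign[a]
    assign[a] = random.randrange(NUM_BANKS)  # propose a random move
    new_cost = conflicts(assign)
    # Accept improvements always; accept regressions with Boltzmann probability.
    if new_cost <= cost or random.random() < math.exp((cost - new_cost) / temp):
        cost = new_cost
    else:
        assign[a] = old                      # undo the rejected move
    temp *= 0.99                             # geometric cooling schedule
print("final bank assignment:", assign, "conflicts:", cost)
```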


International Symposium on Quality Electronic Design | 2017

Stochastic-based multi-stage streaming realization of deep convolutional neural network

Mohammed Alawad; Mingjie Lin

A large-scale convolutional neural network (CNN), conceptually mimicking the operational principle of visual perception in the human brain, has been widely applied to tackle many challenging computer vision and artificial intelligence applications. Unfortunately, despite its simple architecture, a typically sized CNN is well known to be computationally intensive. This work presents a novel stochastic-based and scalable hardware architecture and circuit design that computes a large-scale CNN with an FPGA. The key idea is to implement all key components of a deep learning CNN, including the multi-dimensional convolution, activation, and pooling layers, entirely in the probabilistic computing domain in order to achieve high computing robustness, high performance, and low hardware usage. Most importantly, through both theoretical analysis and FPGA hardware implementation, we demonstrate that a stochastic-based deep CNN can achieve superior hardware scalability compared with its conventional deterministic-based FPGA implementation by allowing a stream computing mode and adopting efficient random sample manipulations. Overall, being highly scalable and energy efficient, our stochastic-based convolutional neural network architecture is well suited for a modular vision engine aimed at real-time detection, recognition, and segmentation of mega-pixel images, especially for perception-based computing tasks that are inherently fault-tolerant while still requiring high energy efficiency.
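
At toy scale, a single probabilistic-domain 2-D convolution might look like the following sketch (hypothetical sizes and weight scaling; this mirrors the general idea, not the paper's streaming architecture): pixels and weights in [0, 1] become unipolar bitstreams, each product is a bitwise AND, and the decoded sample means are accumulated:

```python
# Stochastic 2-D convolution at toy scale: AND gates replace multipliers.
import numpy as np

rng = np.random.default_rng(5)
L = 1 << 14                                # bitstream length
img = rng.random((6, 6))                   # toy image, values in [0, 1]
ker = rng.random((3, 3)); ker /= ker.size  # hypothetical scaling into [0, 1]

def conv_stochastic(img, ker):
    out = np.zeros((img.shape[0] - 2, img.shape[1] - 2))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    a = rng.random(L) < img[i + di, j + dj]  # encode pixel
                    b = rng.random(L) < ker[di, dj]          # encode weight
                    acc += (a & b).mean()                    # AND = multiply
            out[i, j] = acc
    return out

ref = sum(ker[di, dj] * img[di:di + 4, dj:dj + 4]
          for di in range(3) for dj in range(3))
print("max abs error vs. exact convolution:",
      np.max(np.abs(conv_stochastic(img, ker) - ref)))
```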


Field-Programmable Gate Arrays | 2017

Stochastic-Based Multi-stage Streaming Realization of a Deep Convolutional Neural Network (Abstract Only)

Mohammed Alawad; Mingjie Lin

A large-scale convolutional neural network (CNN), conceptually mimicking the operational principle of visual perception in the human brain, has been widely applied to tackle many challenging computer vision and artificial intelligence applications. Unfortunately, despite its simple architecture, a typically sized CNN is well known to be computationally intensive. This work presents a novel stochastic-based and scalable hardware architecture and circuit design that computes a large-scale CNN with an FPGA. The key idea is to implement all key components of a deep learning CNN, including the multi-dimensional convolution, activation, and pooling layers, entirely in the probabilistic computing domain in order to achieve high computing robustness, high performance, and low hardware usage. Most importantly, through both theoretical analysis and FPGA hardware implementation, we demonstrate that a stochastic-based deep CNN can achieve superior hardware scalability compared with its conventional deterministic-based FPGA implementation by allowing a stream computing mode and adopting efficient random sample manipulations. Overall, being highly scalable and energy efficient, our stochastic-based convolutional neural network architecture is well suited for a modular vision engine aimed at real-time detection, recognition, and segmentation of mega-pixel images, especially for perception-based computing tasks that are inherently fault-tolerant while still requiring high energy efficiency.


ACM Journal on Emerging Technologies in Computing Systems | 2017

Sketching Computation with Stochastic Processing Engines

Mohammed Alawad; Mingjie Lin

This article explores how to leverage stochastic principles to gracefully exploit partial computation results, hence achieving quality-scalable embedded computing. Our work is inspired by the concept of incremental sketching frequently found in artistic rendering, where the drawing procedure consists of a series of steps, each gradually improving the quality of the result. The essence of our approach is to first encode input signals as probability density functions (PDFs), then perform stochastic computing operations on all signals in the probabilistic domain, and finally decode output signals by estimating the PDF of the resulting random samples. Although numerous approximate computing schemes exist, such as inaccurate adders and multipliers that reduce bit width or weaken the logic circuit design, none of them can seamlessly improve computing accuracy incrementally without making changes to the computing hardware at runtime. Furthermore, in conventional embedded computing, a sudden shortage of computing resources, such as premature termination, often means a complete computing failure and totally unusable results. Our sketching computing scheme can readily trade off the quality of results against computing effort without modifying its circuit design. To validate our proposed architecture design, we have implemented a proof-of-concept computation sketching engine based on a probabilistic convolver using a Virtex-6 FPGA device. Using three widely deployed image processing applications (image correspondence, image sharpening, and edge detection), we have demonstrated that important embedded computing applications can indeed be "sketched" in a graceful manner using roughly one third the hardware and one fifth the energy of the traditional multiplier-based computing method.
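
The "anytime" behavior described above can be sketched with a running Monte Carlo estimate (an assumed setup, not the paper's engine): every prefix of the sample stream yields a usable, progressively better result, so early termination degrades quality gracefully instead of failing outright:

```python
# Incremental "sketching": a running histogram of sampled sums estimates a
# convolution, and any checkpoint along the way is already a usable answer.
import numpy as np

rng = np.random.default_rng(6)
f = np.array([0.2, 0.5, 0.3])
g = np.array([0.6, 0.1, 0.3])
exact = np.convolve(f, g)

counts = np.zeros(len(exact))
for n in range(1, 20001):
    s = rng.choice(3, p=f) + rng.choice(3, p=g)  # one more random sample
    counts[s] += 1
    if n in (100, 1000, 20000):                  # "stop any time" checkpoints
        err = np.abs(counts / n - exact).max()
        print(f"after {n:5d} samples, max abs error = {err:.4f}")
```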


IEEE Computer Society Annual Symposium on VLSI | 2016

Stochastic-Based Convolutional Networks with Reconfigurable Logic Fabric

Mohammed Alawad; Mingjie Lin

A convolutional neural network (CNN), well known to be computationally intensive, is a fundamental algorithmic building block in many computer vision and artificial intelligence applications that follow the deep learning principle. This work presents a novel stochastic-based and scalable hardware architecture and circuit design that computes a convolutional neural network with an FPGA. The key idea is to implement a multidimensional convolution accelerator that leverages the widely used convolution theorem. Our approach has two advantages. First, it can achieve significantly lower algorithmic complexity for any given accuracy requirement; this computing complexity, compared with that of conventional multiplier-based and FFT-based architectures, represents a significant performance improvement. Second, the proposed stochastic-based architecture is highly fault-tolerant because the information to be processed is encoded with a large ensemble of random samples; local perturbations of its computing accuracy are dissipated globally and thus become inconsequential to the final overall results. Overall, being highly scalable and energy efficient, our stochastic-based convolutional neural network architecture is well suited for a modular vision engine aimed at real-time detection, recognition, and segmentation of mega-pixel images, especially for perception-based computing tasks that are inherently fault-tolerant. We also present a performance comparison between FPGA implementations that use deterministic-based and stochastic-based architectures.
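
The convolution theorem the accelerator leverages can be checked at toy scale in a few lines; the stochastic architecture itself is not reproduced here:

```python
# Convolution theorem: pointwise multiplication in the frequency domain
# equals linear convolution in the signal domain (with zero-padding).
import numpy as np

rng = np.random.default_rng(7)
x, h = rng.random(16), rng.random(5)
n = len(x) + len(h) - 1                      # length of the linear convolution
via_fft = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)
direct = np.convolve(x, h)
print("max abs difference:", np.max(np.abs(via_fft - direct)))
```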


Field-Programmable Gate Arrays | 2016

Stochastic-Based Convolutional Networks with Reconfigurable Logic Fabric (Abstract Only)

Mohammed Alawad; Mingjie Lin

A large-scale convolutional neural network (CNN), well known to be computationally intensive, is a fundamental algorithmic building block in many computer vision and artificial intelligence applications that follow the deep learning principle. This work presents a novel stochastic-based and scalable hardware architecture and circuit design that computes a convolutional neural network with an FPGA. The key idea is to implement a multi-dimensional convolution accelerator that leverages the widely used convolution theorem. Our approach has two advantages. First, it can achieve significantly lower algorithmic complexity for any given accuracy requirement; this computing complexity, compared with that of conventional multiplier-based and FFT-based architectures, represents a significant performance improvement. Second, the proposed stochastic-based architecture is highly fault-tolerant because the information to be processed is encoded with a large ensemble of random samples; local perturbations of its computing accuracy are dissipated globally and thus become inconsequential to the final overall results. Overall, being highly scalable and energy efficient, our stochastic-based convolutional neural network architecture is well suited for a modular vision engine aimed at real-time detection, recognition, and segmentation of mega-pixel images, especially for perception-based computing tasks that are inherently fault-tolerant. We also present a performance comparison between FPGA implementations that use deterministic-based and stochastic-based architectures.

Collaboration


Dive into Mohammed Alawad's collaborations.

Top Co-Authors

Mingjie Lin, University of Central Florida
Yu Bai, California State University
Ronald F. DeMara, University of Central Florida
Michael Riera, University of Central Florida
Abigail Fuentes, University of Central Florida
Bai Yu, University of Central Florida