
Publication


Featured research published by Mohammed Shoaib.


Design Automation Conference | 2015

Scalable-effort classifiers for energy-efficient machine learning

Swagath Venkataramani; Anand Raghunathan; Jie Liu; Mohammed Shoaib

Supervised machine-learning algorithms are used to solve classification problems across the entire spectrum of computing platforms, from data centers to wearable devices, and place significant demand on their computational capabilities. In this paper, we propose scalable-effort classifiers, a new approach to optimizing the energy efficiency of supervised machine-learning classifiers. We observe that the inherent classification difficulty varies widely across inputs in real-world datasets; only a small fraction of the inputs truly require the full computational effort of the classifier, while the large majority can be classified correctly with very low effort. Yet, state-of-the-art classification algorithms expend equal effort on all inputs, irrespective of their difficulty. To address this inefficiency, we introduce the concept of scalable-effort classifiers: classifiers that dynamically adjust their computational effort depending on the difficulty of the input data, while maintaining the same level of accuracy. Scalable-effort classifiers are constructed by utilizing a chain of classifiers with increasing levels of complexity (and accuracy). Scalable-effort execution is achieved by modulating the number of stages used for classifying a given input. Every stage in the chain contains an ensemble of biased classifiers, where each biased classifier is trained to detect a single class more accurately. The degree of consensus between the biased classifiers' outputs is used to decide whether classification can be terminated at the current stage. Our methodology thus allows us to transform any given classification algorithm into a scalable-effort chain. We build scalable-effort versions of 8 popular recognition applications using 3 different classification algorithms. Our experiments demonstrate that scalable-effort classifiers yield a 2.79x reduction in average operations per input, which translates to 2.3x and 1.5x improvements in energy for hardware and software implementations, respectively.
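
As a rough illustration of the chain described in this abstract, the sketch below builds a two-stage scalable-effort cascade from off-the-shelf scikit-learn models: each stage is an ensemble of one-vs-rest (biased) classifiers, and an input exits early when the winning class dominates by a margin. The stage models, the margin-based consensus rule, and the digits dataset are illustrative assumptions, not the paper's exact construction or training procedure.

```python
# Illustrative sketch of a scalable-effort classifier chain (not the authors'
# exact construction): stages of increasing complexity, each an ensemble of
# one-vs-rest "biased" classifiers, with consensus-based early termination.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC


class ScalableEffortChain:
    def __init__(self, stages, margin=1.0):
        # stages: list of one-vs-rest ensembles, ordered cheap -> expensive
        self.stages = stages
        self.margin = margin  # required consensus margin for early exit

    def fit(self, X, y):
        for stage in self.stages:
            stage.fit(X, y)
        return self

    def predict(self, X):
        preds = np.empty(len(X), dtype=int)
        effort = np.zeros(len(X), dtype=int)  # stages used per input
        for i, x in enumerate(X):
            x = x.reshape(1, -1)
            for s, stage in enumerate(self.stages):
                scores = stage.decision_function(x).ravel()
                top2 = np.sort(scores)[-2:]
                effort[i] = s + 1
                # Consensus: the winning biased classifier clearly dominates,
                # or we have reached the final (most accurate) stage.
                if top2[1] - top2[0] >= self.margin or s == len(self.stages) - 1:
                    preds[i] = int(np.argmax(scores))
                    break
        return preds, effort


if __name__ == "__main__":
    X, y = load_digits(return_X_y=True)
    chain = ScalableEffortChain([
        OneVsRestClassifier(LogisticRegression(max_iter=1000)),  # cheap stage
        OneVsRestClassifier(SVC(kernel="rbf", gamma="scale")),   # expensive stage
    ]).fit(X[:1200], y[:1200])
    preds, effort = chain.predict(X[1200:])
    print("accuracy:", np.mean(preds == y[1200:]), "avg stages used:", effort.mean())
```

The average number of stages used per input is the quantity that translates into the paper's operations-per-input (and hence energy) savings.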


Custom Integrated Circuits Conference | 2012

A compressed-domain processor for seizure detection to simultaneously reduce computation and communication energy

Mohammed Shoaib; Niraj K. Jha; Naveen Verma

In low-power sensing systems, communication constraints play a critical role; e.g., biomedical devices often acquire physiological signals from distributed sources and/or wireless implants. Compressive sensing enables sub-Nyquist sampling for low-energy data reduction on such nodes. The reconstruction cost, however, is severe, typically pushing signal analysis to a base station. We present a seizure-detection processor that directly analyzes compressively-sensed electroencephalograms (EEGs) on the sensor node. In addition to alleviating communication costs and circumventing reconstruction costs, it leads to computational energy savings due to the reduced number of input samples. This provides an effective knob for system power management and enables scaling of energy and application-level performance. For compression factors of 2-24×, the energy to extract signal features (over 18 channels) is 7.13-0.11 μJ, and the detector's performance for sensitivity, latency, and specificity is 96-80%, 4.7-17.8 sec, and 0.15-0.79 false-alarms/hr., respectively (compared to baseline performance of 96%, 4.6 sec, and 0.15 false-alarms/hr.).


IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 2015

Signal Processing With Direct Computations on Compressively Sensed Data

Mohammed Shoaib; Niraj K. Jha; Naveen Verma

Sparsity is a characteristic of signals that potentially allows information to be represented efficiently. We present an approach that enables efficient representations based on sparsity to be utilized throughout a signal processing system, with the aim of reducing the energy and/or resources required for computation, communication, and storage. The representation we focus on is compressive sensing. Its benefit is that compression is achieved with minimal computational cost through the use of random projections; however, a key drawback is that reconstruction is expensive. We focus on inference frameworks for signal analysis. We show that reconstruction can be avoided entirely by transforming signal processing operations (e.g., wavelet transforms, finite impulse response filters, etc.) such that they can be applied directly to the compressed representations. We present a methodology and a mathematical framework that achieve this goal and also enable significant computational-energy savings through operations over fewer input samples. This enables explicit energy-versus-accuracy tradeoffs that are under the control of the designer. We demonstrate the approach through two case studies. First, we consider a system for neural prosthesis that extracts wavelet features directly from compressively sensed spikes. Through simulations, we show that spike sorting can be achieved with 54× fewer samples, providing an accuracy of 98.63% in spike count, 98.56% in firing-rate estimation, and 96.51% in determining the coefficient of variation; this compares with a baseline Nyquist-domain detector with corresponding performance of 98.97%, 99.69%, and 97.09%, respectively. Second, we consider a system for detecting epileptic seizures by extracting spectral-energy features directly from compressively sensed electroencephalograms (EEG). Through simulations of the end-to-end algorithm, we show that detection can be achieved with 21× fewer samples, providing a sensitivity of 94.43%, false alarm rate of 0.1543/h, and latency of 4.70 s; this compares with a baseline Nyquist-domain detector with corresponding performance of 96.03%, 0.1471/h, and 4.59 s, respectively.
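
To make the reconstruction-free idea concrete, the toy sketch below compares a linear feature operator applied to Nyquist-rate samples against the same operator folded into the compressed domain, using the fact that suitably scaled random projections approximately preserve inner products. The sinusoid-template features, the particular operator construction (F Φᵀ), and the compression factors swept are assumptions made for illustration; they are not the paper's exact mathematical framework.

```python
# Minimal sketch of reconstruction-free, compressed-domain feature extraction:
# a linear feature operator F is replaced by a precomputed operator F @ Phi.T
# acting directly on the m measurements y = Phi @ x, so features are computed
# from m samples instead of n. Illustrative construction only.
import numpy as np

rng = np.random.default_rng(0)
n = 512                                             # Nyquist-rate samples per window

# Toy EEG-like window: two oscillations plus a little noise.
t = np.arange(n)
x = np.sin(2 * np.pi * 6 * t / n) + 0.6 * np.sin(2 * np.pi * 20 * t / n)
x += 0.1 * rng.standard_normal(n)

# Linear feature operator F: correlations with a small bank of sinusoid
# templates (a stand-in for the wavelet / spectral-energy features above).
freqs = [4, 6, 12, 20, 30]
F = np.array([np.sin(2 * np.pi * f * t / n) for f in freqs]) / np.sqrt(n / 2)

features_nyquist = F @ x                            # baseline: needs all n samples

for compression in (4, 8, 16):
    m = n // compression                            # number of compressed measurements
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)  # scaled so E[Phi.T @ Phi] = I
    y = Phi @ x                                     # all the sensor node keeps
    F_cd = F @ Phi.T                                # precomputed compressed-domain operator
    features_cd = F_cd @ y                          # features from m samples, no reconstruction
    err = np.linalg.norm(features_cd - features_nyquist) / np.linalg.norm(features_nyquist)
    print(f"{compression:>2}x compression: relative feature error = {err:.2f}")
```

The sweep over compression factors mirrors the energy-versus-accuracy knob discussed in the abstract: fewer measurements mean fewer operations per feature but a larger approximation error.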


IEEE Transactions on Circuits and Systems | 2014

A 0.6–107 µW Energy-Scalable Processor for Directly Analyzing Compressively-Sensed EEG

Mohammed Shoaib; Kyong Ho Lee; Niraj K. Jha; Naveen Verma

Compressive sensing has been used to overcome communication constraints (energy and bandwidth) in low-power sensors. In this work, we present a seizure-detection processor that directly uses compressively-sensed electroencephalograms (EEGs) for embedded signal analysis. In addition to addressing communication, this has two advantages for local computation. First, with compressive sensing, reconstruction costs are typically severe, precluding embedded analysis; directly analyzing the compressed signals circumvents reconstruction costs, enabling embedded analysis within applications. Second, compared to Nyquist-sampled signals, the use of compressed representations reduces the computational energy of signal analysis due to the reduced number of signal samples. We describe an algorithmic formulation as well as a hardware architecture that enables two strong power-management knobs, wherein application-level performance can scale with computational energy. The two knobs are parameterized as follows: 1) ξ, which quantifies the amount of data compression, and 2) ν, which determines the approximation error within the proposed compressed-domain processing algorithm. For ξ and ν in the range 2-24×, the energy to extract signal features (over 18 channels) is 70.8-1.3 nJ, and the detector's performance for sensitivity, latency, and specificity is 96-91%, 4.7-5.3 sec., and 0.17-0.30 false-alarms/hr., respectively (compared to a baseline performance of 96%, 4.6 sec., and 0.15 false-alarms/hr.).


Design Automation Conference | 2011

A low-energy computation platform for data-driven biomedical monitoring algorithms

Mohammed Shoaib; Niraj K. Jha; Naveen Verma

A key challenge in closed-loop chronic biomedical systems is the ability to detect complex physiological states from patient signals within a constrained power budget. Data-driven machine-learning techniques are major enablers for the modeling and interpretation of such states. Their computational energy, however, scales with the complexity of the required models. In this paper, we propose a low-energy, biomedical computation platform optimized through the use of an accelerator for data-driven classification. The accelerator retains selective flexibility through hardware reconfiguration and exploits voltage scaling and parallelism to operate at a sub-threshold minimum-energy point. Using cardiac arrhythmia detection algorithms with patient data from the MIT-BIH database, classification is achieved in 2.96 µJ (at Vdd = 0.4 V), over four orders of magnitude smaller than that on a low-power general-purpose processor. The energy of feature extraction is 148 µJ while retaining flexibility for a range of possible biomarkers.


Design, Automation, and Test in Europe | 2012

Enabling advanced inference on sensor nodes through direct use of compressively-sensed signals

Mohammed Shoaib; Niraj K. Jha; Naveen Verma

Nowadays, sensor networks are being used to monitor increasingly complex physical systems, necessitating advanced signal analysis capabilities as well as the ability to handle large amounts of network data. For the first time, we present a methodology to enable advanced decision support on a low-power sensor node through the direct use of compressively-sensed signals in a supervised-learning framework; such signals provide a highly efficient means of representing data in the network, and their direct use overcomes the need for energy-intensive signal reconstruction. Sensor networks for advanced patient monitoring are representative of the complexities involved. We demonstrate our technique on a patient-specific seizure detection algorithm based on electroencephalograph (EEG) sensing. Using data from 21 patients in the CHB-MIT database, our approach demonstrates an overall detection sensitivity, latency, and false alarm rate of 94.70%, 5.83 seconds, and 0.199 per hour, respectively, while achieving data compression by a factor of 10x. This compares well with the state-of-the-art baseline detector, whose corresponding results are 96.02%, 4.59 seconds, and 0.145 per hour, respectively.
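
A toy end-to-end version of this idea is sketched below: a fixed random projection compresses each signal window, and a supervised classifier is trained and evaluated directly on the compressed measurements, with no reconstruction step. The synthetic "rhythmic burst vs. background" windows, the RBF-kernel SVM, and the 10x compression factor are stand-ins for the CHB-MIT EEG data and the paper's detector.

```python
# Toy illustration (synthetic windows, not CHB-MIT EEG): train a detector
# directly on compressively sensed signals, skipping reconstruction entirely.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, compression = 256, 10                        # window length, 10x compression
m = n // compression
Phi = rng.standard_normal((m, n)) / np.sqrt(m)  # fixed sensing matrix on the node

def make_window(is_event):
    # Background: broadband noise; "event": an added rhythmic burst (random phase).
    x = 0.5 * rng.standard_normal(n)
    if is_event:
        t = np.arange(n)
        x += 1.5 * np.sin(2 * np.pi * 8 * t / n + rng.uniform(0, 2 * np.pi))
    return x

labels = rng.integers(0, 2, size=2000)
X_nyquist = np.array([make_window(s) for s in labels])
X_compressed = X_nyquist @ Phi.T                # the m samples the node would transmit

Xtr, Xte, ytr, yte = train_test_split(X_compressed, labels, random_state=0)
clf = SVC(kernel="rbf", gamma="scale").fit(Xtr, ytr)
print("detection accuracy on compressed measurements:", clf.score(Xte, yte))
```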


International Conference on e-Health Networking, Applications and Services | 2011

Digital pacer detection in diagnostic grade ECG

Mohammed Shoaib; Harinath Garudadri

Pulses from a cardiac pacemaker appear as extremely narrow and low-amplitude spikes in an ECG. These are misinterpreted as R-peaks by QRS detectors, leading to faulty analysis in subsequent algorithms that rely on beat segmentation. Detecting the pacer pulses thus necessitates sampling the ECG signal at high data rates of 4–16 kHz. In a wireless body sensor network, transmission of this high-bandwidth data to a processing gateway for pacer detection is extremely power-intensive. In this paper, we describe a compressed sensing approach that enables reliable detection of AAMI/EC11-specified pacer pulses using ECG data rates of 50–100 sps, an order of magnitude smaller than those used in typical detection algorithms in the literature.


Wearable and Implantable Body Sensor Networks | 2015

Rate-adaptive compressed-sensing and sparsity variance of biomedical signals

Vahid Behravan; Neil E. Glover; Rutger Farry; Patrick Chiang; Mohammed Shoaib

Biomedical signals exhibit substantial variance in their sparsity, preventing conventional a priori open-loop setting of the compressed sensing (CS) compression factor. In this work, we propose, analyze, and experimentally verify a rate-adaptive compressed-sensing system in which the compression factor is modified automatically, based on the sparsity of the input signal. Experimental results based on an embedded sensor platform show a 16.2% improvement in power consumption for the proposed rate-adaptive CS versus traditional CS with a fixed compression factor. We also demonstrate the potential to improve this number to 24% through the use of an ultra-low-power processor in our embedded system.
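
The control loop can be pictured as in the sketch below: estimate how sparse each incoming window is, then pick the number of compressed measurements accordingly, so dense windows get a conservative compression factor and sparse windows a more aggressive one. The spectral sparsity proxy, the threshold table, and the example windows are illustrative assumptions rather than the paper's calibration.

```python
# Schematic of rate-adaptive compressed sensing: measure each window's
# sparsity and adapt the compression factor before projecting.
import numpy as np

rng = np.random.default_rng(0)
n = 256
t = np.arange(n)

def sparsity_fraction(x, keep=0.95):
    """Fraction of spectral coefficients needed to retain `keep` of the energy."""
    energies = np.sort(np.abs(np.fft.rfft(x)) ** 2)[::-1]
    cum = np.cumsum(energies) / energies.sum()
    return (np.searchsorted(cum, keep) + 1) / len(energies)

def pick_compression(frac):
    # Sparser window -> fewer measurements (higher compression factor).
    if frac < 0.05:
        return 16
    if frac < 0.15:
        return 8
    return 4

def sense(x, compression):
    m = len(x) // compression
    Phi = rng.standard_normal((m, len(x))) / np.sqrt(m)
    return Phi @ x                      # the measurements actually transmitted

windows = {
    "very sparse (single tone)": np.sin(2 * np.pi * 5 * t / n),
    "moderately sparse (8 tones)": sum(np.sin(2 * np.pi * f * t / n)
                                       for f in (5, 11, 17, 23, 31, 40, 47, 60)),
    "dense (broadband noise)": rng.standard_normal(n),
}
for name, x in windows.items():
    frac = sparsity_fraction(x)
    cf = pick_compression(frac)
    y = sense(x, cf)
    print(f"{name:30s} sparsity={frac:.2f} -> {cf}x compression, {len(y)} measurements")
```

Fewer transmitted measurements for sparse windows is what produces the power savings reported in the abstract; the closed loop simply prevents dense windows from being under-sampled.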


IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 2013

Algorithm-Driven Architectural Design Space Exploration of Domain-Specific Medical-Sensor Processors

Mohammed Shoaib; Niraj K. Jha; Naveen Verma

Data-driven machine-learning techniques enable the modeling and interpretation of complex physiological signals. The energy consumption of these techniques, however, can be excessive, due to the complexity of the models required. In this paper, we study the tradeoffs and limitations imposed by the energy consumption of high-order detection models implemented in devices designed for intelligent biomedical sensing. Based on the flexibility and efficiency needs at various processing stages in data-driven biomedical algorithms, we explore options for hardware specialization through architectures based on custom instruction and coprocessor computations. We identify the limitations in the former, and propose a coprocessor-based platform that exploits parallelism in computation as well as voltage scaling to operate at a subthreshold minimum-energy point. We present results from post-layout simulation of cardiac arrhythmia detection with patient data from the MIT-BIH database. After wavelet-based feature extraction, which consumes 12.28 μJ, we demonstrate classification computations in the 12.00-120.05 μJ range using 10000-100000 support vectors. This represents 1170× lower energy than that of a low-power processor with custom instructions alone. After morphological feature extraction, which consumes 8.65 μJ of energy, the corresponding energy numbers are 10.24-24.51 μJ, which is 1548× smaller than one based on a custom-instruction design. Results correspond to Vdd=0.4 V and a data precision of 8 b.
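
The classification-energy trend reported above can be pictured with a short sketch of the SVM decision computation implied by the support-vector counts in the abstract: the per-decision work is one multiply-accumulate chain per support vector, so cost grows roughly linearly with model size. The RBF kernel, feature width, and random parameters below are illustrative assumptions, not the paper's trained detector.

```python
# Sketch of the SVM decision computation whose cost the energy numbers above
# track: work per classification scales linearly with the support-vector count.
import numpy as np

def rbf_svm_decision(x, support_vectors, alphas, bias, gamma):
    """Evaluate sum_i alpha_i * exp(-gamma * ||x - sv_i||^2) + bias."""
    d2 = np.sum((support_vectors - x) ** 2, axis=1)   # one distance per support vector
    return float(alphas @ np.exp(-gamma * d2) + bias)

rng = np.random.default_rng(0)
n_features = 16                                        # e.g., per-beat morphological features
for n_sv in (10_000, 50_000, 100_000):                 # model-complexity range from the abstract
    sv = rng.standard_normal((n_sv, n_features))
    alphas = rng.standard_normal(n_sv)
    x = rng.standard_normal(n_features)
    score = rbf_svm_decision(x, sv, alphas, bias=0.0, gamma=0.1)
    macs = n_sv * n_features                           # multiply-accumulates per decision
    print(f"{n_sv:>7} support vectors -> ~{macs:,} MACs per classification (score={score:+.2f})")
```

This linear growth in multiply-accumulates is what the coprocessor's parallelism and voltage scaling are designed to absorb.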


International Symposium on Performance Analysis of Systems and Software | 2016

X-Mem: A cross-platform and extensible memory characterization tool for the cloud

Mark Gottscho; Sriram Govindan; Bikash Sharma; Mohammed Shoaib; Puneet Gupta

Effective use of the memory hierarchy is crucial to cloud computing. Platform memory subsystems must be carefully provisioned and configured to minimize overall cost and energy for cloud providers. For cloud subscribers, the diversity of available platforms complicates comparisons and the optimization of performance. To address these needs, we present X-Mem, a new open-source software tool that characterizes the memory hierarchy for cloud computing.
