Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Adam Page is active.

Publication


Featured research published by Adam Page.


IEEE Transactions on Circuits and Systems II: Express Briefs | 2015

A Flexible Multichannel EEG Feature Extractor and Classifier for Seizure Detection

Adam Page; Chris Sagedy; Emily Smith; Nasrin Attaran; Tim Oates; Tinoosh Mohsenin

This brief presents a low-power, flexible, multichannel electroencephalography (EEG) feature extractor and classifier for personalized seizure detection. Various features and classifiers were explored with the goal of maximizing detection accuracy while minimizing power, area, and latency, and algorithmic and hardware optimizations were identified to further improve performance. The classifiers studied include k-nearest neighbor, support vector machines, naïve Bayes, and logistic regression (LR). All feature and classifier pairs were able to obtain F1 scores over 80% and onset sensitivity of 100% when tested on ten patients. A fully flexible hardware system was implemented that offers parameters for the number of EEG channels, the number of features, the classifier type, and various word-width resolutions. Five seizure detection processors with different classifiers were fully placed and routed on a Virtex-5 field-programmable gate array and compared. Five features per channel with LR proved to be the best solution for personalized seizure detection: LR had the best average F1 score of 91%, the smallest area and power footprint, and the lowest latency. An ASIC implementation of the same combination in 65-nm CMOS shows that the processor occupies 0.008 mm² and dissipates 19 nJ at 484 Hz.
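
As a rough illustration of the pipeline, here is a minimal software sketch in Python/NumPy. The five per-channel features (line length, mean absolute amplitude, variance, zero crossings, peak-to-peak) and the pre-trained weights w and b are assumptions for illustration; the brief's exact feature set is not enumerated above.

```python
import numpy as np

def extract_features(window):
    """window: (n_channels, n_samples) array for one EEG epoch."""
    feats = []
    for ch in window:
        feats += [
            np.sum(np.abs(np.diff(ch))),        # line length
            np.mean(np.abs(ch)),                # mean absolute amplitude
            np.var(ch),                         # variance
            np.sum(np.diff(np.sign(ch)) != 0),  # zero crossings
            np.ptp(ch),                         # peak-to-peak range
        ]
    return np.asarray(feats)

def lr_detect(features, w, b, threshold=0.5):
    """Logistic regression: sigmoid of a single dot product,
    which is why LR maps to such small, low-latency hardware."""
    p = 1.0 / (1.0 + np.exp(-(w @ features + b)))
    return p > threshold
```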


Biomedical Circuits and Systems Conference | 2015

A low power seizure detection processor based on direct use of compressively-sensed data and employing a deterministic random matrix

Ali Jafari; Adam Page; Chris Sagedy; Emily Smith; Tinoosh Mohsenin

This work presents a low-power multichannel seizure detection processor based on a compressive sensing (CS) algorithm. Direct use of compressively-sensed data is proposed for feature extraction and classification, reducing computational and data-transmission energy because far fewer input samples must be processed. To further reduce the system's power consumption, a deterministic random matrix (DRM) is proposed in place of a random-number-generator (LFSR) circuit for the compressive sensing stage. Simple features are used for feature extraction, and logistic regression (LR) is employed for classification. Three different architectures are implemented on a Virtex-5 FPGA and compared against each other. For compression rates of 2-16x, the energy consumption of the proposed 22-channel seizure detection processor, including CS, feature extractor, and classifier, is 2.36-0.38 μJ, and the detector's sensitivity and specificity are 80.7-78.8% and 85.3-83.5%, respectively. The proposed system with a 16x compression rate consumes 0.38 μJ, 6 times less than the system without compressive sensing.
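
A minimal sketch of the core idea, assuming NumPy and illustrative sizes: a fixed, seeded ±1 matrix stands in for the paper's deterministic random matrix, and downstream feature extraction and classification operate on the compressed measurements directly, with no reconstruction step.

```python
import numpy as np

n, cr = 256, 8                       # samples per window, compression rate
m = n // cr                          # compressed measurements per window
rng = np.random.default_rng(seed=1)  # fixed seed -> fully deterministic matrix
phi = rng.choice([-1.0, 1.0], size=(m, n))   # stand-in for the paper's DRM

def compress(window):
    """window: (n,) raw EEG samples -> (m,) compressed measurements."""
    return phi @ window

# Features and the logistic-regression classifier then run on the m
# compressed samples instead of the n raw ones, which is where the
# computation and transmission energy savings come from.
```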


ACM Journal on Emerging Technologies in Computing Systems | 2017

SPARCNet: A Hardware Accelerator for Efficient Deployment of Sparse Convolutional Networks

Adam Page; Ali Jafari; Colin Shea; Tinoosh Mohsenin

Deep neural networks have been shown to outperform prior state-of-the-art solutions that often relied heavily on hand-engineered feature extraction techniques coupled with simple classification algorithms. In particular, deep convolutional neural networks have been shown to dominate on several popular public benchmarks such as the ImageNet database. Unfortunately, the benefits of deep networks have yet to be fully exploited in embedded, resource-bound settings with strict power and area budgets. Graphics processing units (GPUs) have been shown to improve throughput and energy efficiency over central processing units (CPUs) due to their highly parallel architecture, yet they still impose a significant power burden. In a similar fashion, field-programmable gate arrays (FPGAs) can be used to improve performance while allowing more fine-grained control over the implementation to improve efficiency. To reduce power and area while still achieving the required throughput, classification-efficient network architectures are needed in addition to optimal deployment on efficient hardware. In this work, we target both of these objectives. For the first, we analyze simple, biologically inspired reduction strategies applied both before and after training. The central theme of these techniques is the introduction of sparsification to help dissolve away the dense connectivity often found at different levels in convolutional neural networks. The sparsification techniques include feature compression partition, structured filter pruning, and dynamic feature pruning. Additionally, we explore filter factorization and filter quantization approximation techniques to further reduce the complexity of convolutional layers. For the second, we propose SPARCNet, a hardware accelerator for efficient deployment of SPARse Convolutional NETworks. The accelerator enables deploying networks in such resource-bound settings by exploiting both the efficient forms of parallelism inherent in convolutional layers and the proposed sparsification and approximation techniques. To demonstrate both contributions, modern deep convolutional network architectures containing millions of parameters are explored on the CIFAR computer vision dataset. Using the reduction techniques, we demonstrate the ability to reduce computation and memory by 60% and 93%, with less than 0.03% impact on accuracy, compared to the best baseline network with 93.47% accuracy. The SPARCNet accelerator with different numbers of processing engines is implemented on a low-power Artix-7 FPGA platform, and the same networks are optimally implemented on a number of embedded commercial off-the-shelf platforms, including NVIDIA's CPU+GPU SoCs (TK1 and TX1) and the Intel Edison. Compared to NVIDIA's TK1 and TX1, the FPGA-based accelerator obtains 11.8x and 7.5x improvements in energy efficiency while maintaining a classification throughput of 72 images/s. When further compared to a number of recent FPGA-based accelerators, SPARCNet achieves up to a 15x improvement in energy efficiency while consuming less than 2 W of total board power at 100 MHz. In addition to improving efficiency, the accelerator has built-in support for the sparsification techniques and the ability to perform in-place rectified linear unit (ReLU) activation, max-pooling, and batch normalization.
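
One of the named reductions, structured filter pruning, can be sketched in a few lines of NumPy. The L1-norm ranking, the keep ratio, and the kernel shapes are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def prune_filters(weights, keep_ratio=0.5):
    """weights: (n_filters, in_ch, k, k) convolutional kernel bank.
    Drops whole filters with the smallest L1 norms, leaving the
    structured sparsity a hardware accelerator can exploit."""
    norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(keep_ratio * weights.shape[0]))
    keep = np.sort(np.argsort(norms)[-n_keep:])   # highest-norm filters survive
    return weights[keep], keep

bank = np.random.randn(64, 32, 3, 3)
pruned, kept_idx = prune_filters(bank, keep_ratio=0.25)
# The matching input channels of the *next* layer's kernels must be
# removed as well, so the savings compound through the network.
```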


Great Lakes Symposium on VLSI | 2016

Low-Power Manycore Accelerator for Personalized Biomedical Applications

Adam Page; Nasrin Attaran; Colin Shea; Houman Homayoun; Tinoosh Mohsenin

Wearable personal health monitoring systems can offer a cost-effective solution for human healthcare. These systems must provide highly accurate, secure, and fast processing and delivery of vast amounts of data. In addition, wearable biomedical devices used in inpatient, outpatient, and at-home e-Patient care must constantly monitor the patient's biomedical and physiological signals 24/7. These biomedical applications require sampling and processing multiple streams of physiological signals within a strict power and area footprint. The processing typically consists of feature extraction, data fusion, and classification stages that require a large number of digital signal processing and machine learning kernels. In response to these requirements, this paper proposes a low-power, domain-specific manycore accelerator named Power-Efficient Nano Clusters (PENC) to map and execute the kernels of these applications. Experimental results show that the manycore is able to reduce energy consumption by up to 80% and 14% for DSP and machine learning kernels, respectively, when optimally parallelized. The performance of the proposed PENC manycore, acting as a coprocessor to an Intel Atom processor, is compared with existing commercial off-the-shelf embedded processing platforms, including the Intel Atom, a Xilinx Artix-7 FPGA, and the NVIDIA TK1 ARM-A15 with GPU SoC. The results show that the PENC manycore architecture reduces energy by as much as 10x while outperforming all off-the-shelf embedded processing platforms across all studied machine learning classifiers.
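
The energy savings come from partitioning kernels across many small cores. As a loose software analogue (threads stand in for PENC cores; the kernel and sizes are assumptions for illustration), the data partitioning looks like this:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def kernel(window):
    # Stand-in for one DSP kernel, e.g. a signal-energy feature.
    return float(np.sum(window * window))

# One 1-second window per EEG channel, partitioned across "cores".
windows = [np.random.randn(256) for _ in range(22)]
with ThreadPoolExecutor(max_workers=4) as pool:
    energies = list(pool.map(kernel, windows))
```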


International Conference of the IEEE Engineering in Medicine and Biology Society | 2015

An ultra low power feature extraction and classification system for wearable seizure detection

Adam Page; Siddharth Pramod; Tim Oates; Tinoosh Mohsenin

In this paper we explore the use of a variety of machine learning algorithms for designing a reliable, low-power, multichannel EEG feature extractor and classifier for predicting seizures from electroencephalographic data (scalp EEG). Different machine learning classifiers, including k-nearest neighbor, support vector machines, naïve Bayes, logistic regression, and neural networks, are explored with the goal of maximizing detection accuracy while minimizing power, area, and latency. The input to each classifier is a 198-element feature vector containing 9 features for each of the 22 EEG channels, obtained over 1-second windows. All classifiers were able to obtain F1 scores over 80% and onset sensitivity of 100% when tested on 10 patients. Among the five classifiers explored, logistic regression (LR) proved to have the minimum hardware complexity while providing an average F1 score of 91%. Both ASIC and FPGA implementations of logistic regression are presented and show the smallest area, the lowest power consumption, and the lowest latency when compared to previous work.
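
A minimal sketch of the comparison loop, using scikit-learn stand-ins for the classifiers named above; the placeholder data, model settings, and cross-validation protocol are assumptions for illustration, not the paper's evaluation setup.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X = np.random.randn(500, 198)        # placeholder: 9 features x 22 channels
y = np.random.randint(0, 2, 500)     # placeholder seizure / non-seizure labels

models = {
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(),
    "NB": GaussianNB(),
    "LR": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    f1 = cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    print(f"{name}: F1 = {f1:.2f}")
```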


Field-Programmable Custom Computing Machines | 2016

FPGA-Based Reduction Techniques for Efficient Deep Neural Network Deployment

Adam Page; Tinoosh Mohsenin

Deep neural networks have been shown to outperform prior state-of-the-art solutions that often relied heavily on hand-engineered feature extraction techniques coupled with simple classification algorithms. In particular, deep max-pooling convolutional neural networks (MPCNN) have been shown to dominate on several popular public benchmarks. Unfortunately, the benefits of deep networks have yet to be exploited in embedded, resource-bound settings that have strict power and area budgets. GPUs have been shown to improve throughput and energy efficiency over CPUs due to their parallel architecture. In a similar fashion, FPGAs can improve performance while allowing finer control over the implementation. In order to meet power, area, and latency constraints, it is necessary to develop network reduction strategies in addition to an optimal mapping. This work looks at two specific reduction techniques: limited precision, in both fixed-point and floating-point formats, and weight-matrix truncation using singular value decomposition. An FPGA-based framework is also proposed and used to deploy the trained networks. To demonstrate, several public computer vision datasets, including MNIST, CIFAR-10, and SVHN, are fully implemented on a low-power Xilinx Artix-7 FPGA. Experimental results show that all networks are able to achieve a classification throughput of 16 img/sec and consume less than 700 mW when running at 200 MHz. In addition, the reduced networks are able, on average, to reduce power and area utilization by 37% and 44%, respectively, while incurring less than a 0.20% decrease in accuracy.
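
The SVD-based truncation can be sketched directly in NumPy: keep the top-r singular components of a fully connected layer's weight matrix so that y = A(Bx) replaces y = Wx. The shapes and rank here are illustrative assumptions.

```python
import numpy as np

def truncate_weights(W, rank):
    """Low-rank factorization of a dense layer's weight matrix."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]    # (out, rank), singular values folded in
    B = Vt[:rank, :]              # (rank, in)
    return A, B

W = np.random.randn(512, 1024)
A, B = truncate_weights(W, rank=64)
# Multiplies drop from 512*1024 to 64*(512 + 1024), at the cost of a
# small approximation error controlled by the discarded singular values.
```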


Application-Specific Systems, Architectures and Processors | 2013

An efficient & reconfigurable FPGA and ASIC implementation of a spectral Doppler ultrasound imaging system

Adam Page; Tinoosh Mohsenin

Pulsed-wave (PW) Doppler ultrasound is a common technique for making non-invasive velocity measurements of blood flow in humans. Most current PW Doppler ultrasound designs rely on fixed signal-processing hardware, greatly limiting their versatility. This paper presents a highly efficient and versatile FPGA-based PW spectral Doppler ultrasound system. The system is implemented on a Virtex-5 FPGA using Xilinx's ISE design suite. To measure the accuracy of the system, a similar design was implemented in MATLAB. Furthermore, the design was also implemented as a 65 nm CMOS ASIC for performance comparisons. The Virtex-5 design requires 1,159 of 17,280 slice resources and consumes 1.089 W of power when running at its maximum clock speed of 333 MHz. The ASIC design has an area of 0.573 mm² and consumes 41 mW of power at a maximum clock speed of 1 GHz.
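
At its core, the spectral Doppler display is a short-time Fourier transform of the slow-time (pulse-to-pulse) signal at the range gate. A minimal Python/SciPy sketch, where the PRF, window sizes, and synthetic echo are assumptions for illustration:

```python
import numpy as np
from scipy.signal import stft

prf = 8000.0                              # pulse repetition frequency, Hz
t = np.arange(4096) / prf                 # slow-time axis
iq = np.exp(2j * np.pi * 600.0 * t)       # fake echo with a 600 Hz Doppler shift

# Two-sided STFT since the demodulated signal is complex; each column
# of Z is one spectral line of the sonogram.
f, seg_t, Z = stft(iq, fs=prf, nperseg=128, noverlap=96,
                   return_onesided=False)
spectrogram_db = 20 * np.log10(np.abs(Z) + 1e-12)
# The Doppler frequency f_d maps to blood velocity via
# v = f_d * c / (2 * f0 * cos(theta)).
```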


Great Lakes Symposium on VLSI | 2018

SCALENet: A SCalable Low power AccELerator for Real-time Embedded Deep Neural Networks

Colin Shea; Adam Page; Tinoosh Mohsenin

As deep learning networks mature and improve classification performance, a significant challenge is their deployment in embedded settings. Modern network topologies, such as convolutional neural networks, can be very deep and impose considerable complexity that is often not feasible in resource-bound, real-time systems. Processing these networks requires high levels of parallelization, maximized data throughput, and support for different network types, all while minimizing power and resource consumption. In response to these requirements, this paper presents a low-power FPGA-based neural network accelerator named SCALENet: a SCalable Low-power AccELerator for real-time deep neural Networks. Key features include optimization for power with a coarse- and fine-grained scheduler, implementation flexibility with hardware-only or hardware/software co-design, and acceleration of both fully connected and convolutional layers. The experimental results evaluate SCALENet against two different neural network applications: image processing and biomedical seizure detection. The image-processing networks, trained on the CIFAR-10 and ImageNet datasets with eight different networks, are implemented on Arty A7 and ZedBoard FPGA platforms. The highest improvement came with the Inception network on the ImageNet dataset, with a 22x gain in throughput and a 13x decrease in energy consumption compared to the ARM processor implementation. We then implement SCALENet for time-series EEG seizure detection, using both a direct-convolution and an FFT-convolution method to show its design versatility, achieving a 99.7% reduction in execution time and a 97.9% improvement in energy consumption compared to the ARM. Finally, we demonstrate the ability to match or exceed the energy efficiency of NVIDIA GPUs when evaluated against the Jetson TK1 with its embedded GPU system-on-chip (SoC), with a 4x power savings within a power envelope of 2.07 W.
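
The two convolution paths mentioned above compute the same result by different routes, which a short SciPy sketch makes concrete (the map and kernel sizes are illustrative):

```python
import numpy as np
from scipy.signal import convolve2d, fftconvolve

x = np.random.randn(32, 32)   # one input feature map
k = np.random.randn(3, 3)     # one filter

direct = convolve2d(x, k, mode="same")   # sliding-window convolution
viafft = fftconvolve(x, k, mode="same")  # multiply in the frequency domain
assert np.allclose(direct, viafft)
# Direct costs O(N^2 * K^2) multiplies; the FFT path costs O(N^2 log N)
# and wins as the kernel grows, at the price of floating-point
# intermediates that are harder to quantize in hardware.
```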


Biomedical Circuits and Systems Conference | 2015

Live demonstration: Towards an ultra low power on-board processor for Tongue Drive System

Ali Jafari; N. Buswell; Adam Page; Tinoosh Mohsenin; Md. Nazmus Sahadat; Maysam Ghovanloo



Archive | 2014

Deep Belief Networks used on High Resolution Multichannel Electroencephalography Data for Seizure Detection

Jt Turner; Adam Page; Tinoosh Mohsenin; Tim Oates


Collaboration


Dive into Adam Page's collaborations.

Top Co-Authors

Ali Jafari
University of Maryland

Colin Shea
University of Maryland

Tim Oates
University of Maryland

Emily Smith
University of Maryland

Jt Turner
University of Maryland