
Publications


Featured research published by Daniel Bankman.


Asilomar Conference on Signals, Systems and Computers | 2015

Mixed-signal circuits for embedded machine-learning applications

Boris Murmann; Daniel Bankman; Elaina Chai; Daisuke Miyashita; Lita Yang

Machine learning algorithms are attractive solutions for a number of problems in data analytics and sensor signal classification. However, to enable the deployment of such algorithms in embedded hardware, significant progress must be made to reduce the large power dissipation of current GPU and FPGA-based implementations. Our work studies the trade-off between energy and accuracy in neural networks, and looks to incorporate mixed-signal design techniques to achieve low power dissipation in a semi-programmable ASIC implementation.
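As a rough illustration of the energy-accuracy trade-off studied here, the following numpy sketch sweeps weight bit-widths for a toy layer and reports output error against a simple quadratic energy-per-MAC model. The layer shapes, the uniform quantizer, and the energy model are illustrative assumptions, not figures from the paper.

```python
import numpy as np

def quantize(x, n_bits):
    """Uniform quantization of x to n_bits over its observed range."""
    lo, hi = x.min(), x.max()
    step = (hi - lo) / (2 ** n_bits - 1)
    return lo + np.round((x - lo) / step) * step

def mac_energy(n_bits):
    """Toy model: multiplier energy grows roughly quadratically
    with operand width (a first-order assumption)."""
    return float(n_bits) ** 2

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 8))     # hypothetical layer weights
x = rng.normal(size=(100, 16))   # hypothetical input batch
y_ref = x @ w                    # full-precision reference output

for n_bits in (2, 4, 6, 8):
    y_q = x @ quantize(w, n_bits)
    rel_mse = np.mean((y_q - y_ref) ** 2) / np.mean(y_ref ** 2)
    print(f"{n_bits} bits: rel. MSE {rel_mse:.1e}, "
          f"energy/MAC ~{mac_energy(n_bits):.0f} units")
```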


Asian Solid-State Circuits Conference | 2016

An 8-bit, 16 input, 3.2 pJ/op switched-capacitor dot product circuit in 28-nm FDSOI CMOS

Daniel Bankman; Boris Murmann

An 8-bit, 16 input, switched-capacitor dot product circuit in 28-nm FDSOI CMOS is presented. The design uses sixteen 8-bit passive charge redistribution digital-to-analog multipliers followed by an 8-bit SAR ADC. Measured energy per dot product operation is 3.2 pJ. When used to compute partial dot products for the hidden layer of a three-layer neural network, the design achieves a classification accuracy of 98.0% on the MNIST handwritten digit dataset.
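The following behavioral sketch mimics the signal chain described above: 8-bit digital inputs drive multiplying DACs, the shared capacitor array sums the charge, and a SAR ADC quantizes the result. It is a minimal model under assumed unsigned 8-bit codes and ideal components (no noise or mismatch), not the measured circuit.

```python
import numpy as np

def sc_dot_product(x, w, adc_bits=8):
    """Behavioral model: 16 charge-redistribution multiplying DACs form
    per-element products, the shared capacitor array sums (averages)
    the charge, and a SAR ADC quantizes the result to adc_bits."""
    assert x.shape == w.shape == (16,)
    analog = float(np.dot(x / 255.0, w / 255.0)) / len(x)  # in [0, 1]
    levels = 2 ** adc_bits - 1
    return int(np.clip(np.round(analog * levels), 0, levels))

rng = np.random.default_rng(1)
x = rng.integers(0, 256, size=16).astype(float)  # 8-bit activations
w = rng.integers(0, 256, size=16).astype(float)  # 8-bit weights
print(sc_dot_product(x, w))  # 8-bit output code
```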


Archive | 2019

Conclusions, Contributions, and Future Work

Bert Moons; Daniel Bankman; Marian Verhelst

This dissertation has focused on techniques to minimize the energy consumption of deep learning algorithms for embedded applications on battery-constrained wearable edge devices. Although state-of-the-art (SotA) in many typical machine-learning tasks, deep learning algorithms are also very costly in terms of energy consumption, due to the large number of computations they require and their huge model sizes. Because of this, deep learning applications on battery-constrained wearables have only been possible through wireless connections to a resourceful cloud. This setup has several drawbacks. First, there are privacy concerns: users must share their raw data (images, video, locations, and speech) with a remote system, and as most users are unwilling to share all of this, large-scale applications cannot yet be developed. Second, the cloud setup requires users to be connected all the time, which is infeasible given current cellular coverage. Furthermore, real-time applications require low-latency connections, which cannot be guaranteed using the current communication infrastructure. Finally, the wireless connection itself is very inefficient, requiring too much energy per transferred bit for real-time data transfer on energy-constrained platforms. All of these issues (privacy, latency/connectivity, and costly wireless connections) can be resolved by moving computation to the edge.


Archive | 2019

Circuit Techniques for Approximate Computing

Bert Moons; Daniel Bankman; Marian Verhelst

This chapter focuses on approximate computing (AC), a set of software- and primarily hardware-level techniques in which algorithm accuracy is traded for energy consumption by deliberately introducing acceptable errors into the computing process. It is hence a means of efficiently exploiting a neural network's fault tolerance to reduce its energy consumption, as first discussed at the system level in Chap. 3. Approximate computing techniques have become crucial for reducing energy in modern neural network acceleration, as computational and storage demands remain high and traditional methods in device engineering and architectural design fail to significantly reduce those costs. The first part of this chapter is a general introduction to common approximate computing techniques at several levels of the design hierarchy. The second part focuses on dynamic-voltage-accuracy-frequency-scaling (DVAFS), a third major contribution of this text: a dynamic arithmetic-precision scaling method at the circuit level that enables minimum-energy test-time FPNNs and QNNs, as discussed in Chap. 3. Chapter 5 discusses two physically implemented CNN chips that apply this DVAFS technique in real silicon. BinarEye, discussed in Chap. 6, can be used in DVAFS modes as well.
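A toy first-order model of the DVAFS idea: lowering arithmetic precision reduces the active switched capacitance, and the relaxed critical path lets the supply voltage scale down, compounding the saving via E ~ C*V^2. The linear voltage interpolation and the 0.6 V floor are assumptions for illustration, not the book's equations.

```python
def dvafs_energy_per_op(n_bits, full_bits=16, v_nom=1.0, v_min=0.6):
    """Relative energy of one reduced-precision operation under DVAFS.
    Active switched capacitance scales with the fraction of bits in
    use; the shorter critical path lets the supply voltage scale down
    (linear interpolation here is purely an illustrative assumption)."""
    c_rel = n_bits / full_bits
    v = v_min + (v_nom - v_min) * (n_bits / full_bits)
    return c_rel * v ** 2  # E ~ C * V^2, normalized to full precision

for n in (16, 8, 4, 2):
    print(f"{n:2d}-bit op: relative energy {dvafs_energy_per_op(n):.3f}")
```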


Archive | 2019

ENVISION: Energy-Scalable Sparse Convolutional Neural Network Processing

Bert Moons; Daniel Bankman; Marian Verhelst

This chapter focuses on Envision: two generations of energy-scalable sparse convolutional neural network processors. They achieve SotA energy efficiency by leveraging the three key CNN characteristics discussed in Chap. 3. (a) Inherent CNN parallelism is exploited through a highly parallelized processor architecture that minimizes internal memory bandwidth. (b) Inherent network sparsity in pruned networks and ReLU-activated feature maps is leveraged by compressing sparse IO data streams and skipping unnecessary computations. (c) The inherent fault tolerance of CNNs is exploited by making this architecture DVAS- or DVAFS-compatible in Envision V1 and V2, respectively, according to the theory discussed in Chap. 4. This capability allows minimizing energy consumption for any CNN, with any computational precision requirement up to 16b fixed-point. Through its energy scalability and high energy efficiency, Envision lends itself perfectly to hierarchical applications, as discussed in Chap. 2. It thereby enables CNN processing in always-on embedded applications.
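A software sketch of the zero-skipping idea in (b): multiplies guarded by a zero test on the post-ReLU activation are simply never issued. This mirrors the energy-saving mechanism conceptually; it is not the Envision RTL.

```python
import numpy as np

def sparse_mac(activations, weights):
    """Zero-skipping multiply-accumulate: products whose activation is
    zero (common after ReLU) are never issued, mirroring how a guarded
    datapath avoids switching energy. Returns the dot product and the
    number of multiplies actually performed."""
    acc, macs = 0.0, 0
    for a, w in zip(activations, weights):
        if a != 0.0:          # guard: skip work for sparse inputs
            acc += a * w
            macs += 1
    return acc, macs

rng = np.random.default_rng(2)
a = np.maximum(rng.normal(size=64), 0.0)  # ReLU output, roughly half zeros
w = rng.normal(size=64)
y, macs = sparse_mac(a, w)
print(f"dot = {y:.3f}, multiplies issued: {macs}/64")
```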


Archive | 2019

BINAREYE: Digital and Mixed-Signal Always-On Binary Neural Network Processing

Bert Moons; Daniel Bankman; Marian Verhelst

The Envision CNN processors discussed in Chap. 5 are efficient but not sufficient for always-on embedded inference. Both neural networks and ASICs can be further optimized for such specific applications. To this end, this chapter focuses on two prototypes of BinaryNets, neural networks with all weights and activations constrained to ±1. Both chips target always-on visual applications and can be used as visual wake-up sensors in a hierarchical vision application, as discussed in Chap. 2. The two chips are orthogonally optimized. The Mixed-Signal Binary Neural Net (MSBNN) accelerator is an implementation of the 256X architecture; it is fully optimized for energy efficiency by leveraging analog computations. The all-digital BinarEye, an implementation of the SX architecture, focuses on the system level; it is designed for flexibility, allowing it to trade off energy for network accuracy at run-time. This chapter discusses and compares both designs.
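With weights and activations constrained to ±1, a dot product reduces to an XNOR followed by a population count, which is what makes BinaryNet hardware so cheap. The sketch below shows this standard XNOR-popcount identity on bit-packed vectors; the packing convention is an assumption, and this is not the chips' actual implementation.

```python
def binary_dot(x_bits, w_bits, n):
    """Dot product of two {-1, +1} vectors packed as n-bit integers
    (bit = 1 encodes +1, bit = 0 encodes -1). XNOR marks positions
    where the operands agree; with p agreements out of n,
    dot = (+1)*p + (-1)*(n - p) = 2*p - n."""
    agree = ~(x_bits ^ w_bits) & ((1 << n) - 1)  # XNOR, masked to n bits
    p = bin(agree).count("1")                    # popcount
    return 2 * p - n

x = 0b1101  # packs [+1, -1, +1, +1] (LSB is element 0)
w = 0b1011  # packs [+1, +1, -1, +1]
print(binary_dot(x, w, 4))  # 1 - 1 - 1 + 1 = 0
```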


Archive | 2019

Embedded Deep Neural Networks

Bert Moons; Daniel Bankman; Marian Verhelst

Deep learning networks have recently emerged as the state-of-the-art classification algorithms in artificial intelligence, achieving super-human performance in a number of perceptive tasks in computer vision and automated speech recognition. Although these networks are extremely powerful, bringing their functionality to always-on embedded devices, and hence to wearable applications, is currently impossible because of their compute and memory requirements. First, this chapter introduces the basic concepts in machine learning and deep learning: network architectures and how to train them. Second, it lists the challenges associated with the large compute requirements of deep learning and outlines a vision to overcome them. Finally, it gives an overview of my contributions to the field and the general structure of the book.


Archive | 2019

Embedded Deep Learning: Algorithms, Architectures and Circuits for Always-on Neural Network Processing

Bert Moons; Daniel Bankman; Marian Verhelst

Chapter 1 Embedded Deep Neural Networks -- Chapter 2 Optimized Hierarchical Cascaded Processing -- Chapter 3 Hardware-Algorithm Co-optimizations -- Chapter 4 Circuit Techniques for Approximate Computing -- Chapter 5 ENVISION: Energy-Scalable Sparse Convolutional Neural Network Processing -- Chapter 6 BINAREYE: Digital and Mixed-signal Always-on Binary Neural Network Processing -- Chapter 7 Conclusions, contributions and future work.


Archive | 2019

Hardware-Algorithm Co-optimizations

Bert Moons; Daniel Bankman; Marian Verhelst

As discussed in Chap. 1, neural network-based applications are still too costly to be embedded in mobile and always-on devices. This chapter discusses hardware-aware, algorithm-level solutions to this problem. As an introduction to the topic, it gives an overview of existing work on hardware and neural network co-optimization. Two original contributions to hardware-algorithm optimization are then discussed and compared: network quantization at either test-time or train-time. The chapter ends with a methodology for designing minimum-energy quantized neural networks (networks trained for low-precision fixed-point operation), a second major contribution of this text.
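For the test-time quantization side of this co-optimization, the following minimal sketch rounds weights to a signed fixed-point format with saturation. The Q-format convention (sign counted in the integer bits) is an assumption for illustration.

```python
import numpy as np

def to_fixed_point(x, int_bits, frac_bits):
    """Quantize to signed fixed-point with saturation; int_bits counts
    the sign bit, so the format spans [-2**(int_bits-1),
    2**(int_bits-1) - 2**-frac_bits] in steps of 2**-frac_bits."""
    scale = 2.0 ** frac_bits
    lo = -(2 ** (int_bits + frac_bits - 1))
    hi = 2 ** (int_bits + frac_bits - 1) - 1
    return np.clip(np.round(x * scale), lo, hi) / scale

w = np.array([0.73, -1.20, 0.05, 3.99])
print(to_fixed_point(w, int_bits=2, frac_bits=2))  # [0.75 -1.25 0. 1.75]
```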


design automation conference | 2018

TRIG: hardware accelerator for inference-based applications and experimental demonstration using carbon nanotube FETs

Gage Hills; Daniel Bankman; Bert Moons; Lita Yang; Jake Hillard; Alex Kahng; Rebecca S. Park; Marian Verhelst; Boris Murmann; Max M. Shulaker; H.-S. Philip Wong; Subhasish Mitra

The energy-efficiency demands of future abundant-data applications, e.g., those which use inference-based techniques to classify large amounts of data, exceed the capabilities of today's digital systems. Field-effect transistors (FETs) built using nanotechnologies, such as carbon nanotubes (CNTs), can improve energy efficiency significantly. However, carbon nanotube FETs (CNFETs) are subject to process variations inherent to CNTs: variations in CNT type (semiconducting or metallic), CNT density, or CNT diameter, to name a few. These CNT variations can degrade CNFET benefits at advanced technology nodes. One path to overcoming CNT variations is to co-optimize CNT processing and CNFET circuit design; however, the required CNT process advancements have not been achieved experimentally. We present a new design approach, TRIG (Technique for Reducing errors using Iterative Gray code), to overcome process variations in hardware accelerators targeting inference-based applications that use serial matrix operations (serial: accumulated over at least 2 clock cycles). We demonstrate that TRIG can retain the major energy-efficiency benefits (quantified using energy-delay product, EDP) of CNFETs despite the CNT variations that exist in today's CNFET fabrication, without requiring further CNT processing improvements. As a case study, we analyze the effectiveness of TRIG for a binary neural network hardware accelerator that classifies images. Despite the CNT variations that exist today, TRIG can maintain 99% (90%) of the projected EDP benefits of CNFET digital circuits for a 90% (99%) image classification accuracy target. We also demonstrate experimentally fabricated CNFET circuits that compute scalar products (a common matrix operation, also called the dot product), with and without TRIG: TRIG reduces the mean difference between the expected (error-free) result and the experimentally computed result by 30× in the presence of CNT variations.
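To make the setting concrete, the sketch below shows a bit-serial dot product in which the result is accumulated over one clock cycle per weight bit-plane; this is the kind of serial matrix operation (accumulated over at least 2 cycles) that TRIG targets. The Gray-code error-reduction step itself is not modeled here.

```python
def bit_serial_dot(x, w, n_bits=4):
    """Bit-serial dot product: the result accumulates over n_bits clock
    cycles, one weight bit-plane per cycle, via shift-and-add."""
    acc = 0
    for bit in range(n_bits):  # one "clock cycle" per weight bit
        plane = sum(xi for xi, wi in zip(x, w) if (wi >> bit) & 1)
        acc += plane << bit
    return acc

x = [3, 1, 4, 1]  # unsigned activations
w = [5, 9, 2, 6]  # 4-bit unsigned weights
print(bit_serial_dot(x, w), sum(a * b for a, b in zip(x, w)))  # both 38
```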

Collaboration


Dive into Daniel Bankman's collaborations.

Top Co-Authors


Bert Moons

Katholieke Universiteit Leuven

Marian Verhelst

Katholieke Universiteit Leuven
