Bert Moons

Katholieke Universiteit Leuven

Publication

Featured research published by Bert Moons.

Symposium on VLSI Circuits | 2016

A 0.3–2.6 TOPS/W precision-scalable processor for real-time large-scale ConvNets

Bert Moons; Marian Verhelst

A low-power precision-scalable processor for ConvNets, or convolutional neural networks (CNNs), is implemented in a 40nm technology. Its 256 parallel processing units achieve a peak 102 GOPS running at 204 MHz. To minimize energy consumption while maintaining throughput, this work is the first to both exploit the sparsity of convolutions and implement dynamic precision scalability, enabling supply and energy scaling. The processor is fully C-programmable, consumes 25–288 mW at 204 MHz, and scales efficiency from 0.3 to 2.6 real TOPS/W. This system thereby outperforms the state of the art by up to 3.9× in energy efficiency.
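The abstract does not say how the sparsity of convolutions is exploited. A common approach, assumed here, is to guard (skip) any multiply-accumulate in which either operand is zero, since such a MAC contributes nothing to the result; zeros are frequent in activations after ReLU. The function name `guarded_dot` is hypothetical, and this Python sketch only counts skipped operations rather than modeling hardware energy.

```python
from typing import Sequence, Tuple


def guarded_dot(weights: Sequence[int], activations: Sequence[int]) -> Tuple[int, int]:
    """Integer dot product that skips any MAC with a zero operand.

    Returns (result, skipped), where skipped counts the guarded MACs.
    """
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have equal length")
    acc = 0
    skipped = 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            # A zero operand contributes nothing, so the multiply can be skipped.
            skipped += 1
            continue
        acc += w * a
    return acc, skipped


if __name__ == "__main__":
    # Activations after ReLU often contain many zeros.
    result, skipped = guarded_dot([1, 0, 3, 0], [2, 5, 0, 4])
    print(result, skipped)  # 2 3
```

In hardware, the skipped case would typically gate the multiplier so it does not toggle, which is where the energy saving would come from under this assumption.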

International Solid-State Circuits Conference | 2017

14.5 Envision: A 0.26-to-10TOPS/W subword-parallel dynamic-voltage-accuracy-frequency-scalable Convolutional Neural Network processor in 28nm FDSOI

Bert Moons; Roel Uytterhoeven; Wim Dehaene; Marian Verhelst

ConvNets, or Convolutional Neural Networks (CNNs), are state-of-the-art classification algorithms, achieving near-human performance in visual recognition [1]. New trends such as augmented reality demand always-on visual processing in wearable devices. Yet advanced ConvNets achieving high recognition rates are too expensive in terms of energy, as they require substantial data movement and billions of convolution computations. Today, state-of-the-art mobile GPUs and ConvNet accelerator ASICs [2][3] only demonstrate energy efficiencies of tens to several hundreds of GOPS/W, an order of magnitude below the requirements of always-on applications. This paper introduces the concept of hierarchical recognition processing, combined with the Envision platform: an energy-scalable ConvNet processor achieving efficiencies of up to 10 TOPS/W while maintaining recognition rate and throughput. Envision thereby enables always-on visual recognition in wearable devices.
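The abstract names hierarchical recognition processing without detailing it. One plausible reading, assumed here, is a cascade in which a cheap always-on stage screens every input and wakes an expensive ConvNet only when needed. The sketch models that with hypothetical callables (`detect`, `classify`) and hypothetical per-call energy costs in arbitrary units; the frames are stand-in scores, not images.

```python
from typing import Callable, List, Sequence, Tuple


def cascade(
    frames: Sequence[float],
    detect: Callable[[float], bool],
    classify: Callable[[float], str],
    e_detect: float,
    e_classify: float,
) -> Tuple[List[str], float]:
    """Run a cheap detector on every frame and the costly classifier only on hits.

    Returns the classifier's labels and the total energy in arbitrary units.
    """
    labels: List[str] = []
    energy = 0.0
    for frame in frames:
        energy += e_detect  # the cheap stage is always on
        if detect(frame):
            energy += e_classify  # the expensive stage runs only when woken up
            labels.append(classify(frame))
    return labels, energy


if __name__ == "__main__":
    frames = [0.1, 0.9, 0.2, 0.8]  # hypothetical detector scores
    labels, energy = cascade(
        frames,
        detect=lambda s: s > 0.5,
        classify=lambda s: "face" if s > 0.85 else "other",
        e_detect=1.0,
        e_classify=100.0,
    )
    print(labels, energy)  # ['face', 'other'] 204.0
```

Running the classifier on all four frames would cost 400 units in this toy model; the saving depends entirely on how rarely the first stage fires.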

IEEE Journal on Emerging and Selected Topics in Circuits and Systems | 2014

Energy-Efficiency and Accuracy of Stochastic Computing Circuits in Emerging Technologies

Bert Moons; Marian Verhelst

The continued scaling of feature sizes in integrated circuit technology leads to more uncertainty and unreliability in circuit behavior. Maintaining the paradigm of deterministic Boolean computing therefore becomes increasingly challenging. Stochastic computing (SC) processes digital data in the form of long pseudo-random bit-streams denoting probabilities and is therefore less vulnerable to uncertainty. When transient circuit variations are present, SC greatly outperforms classical binary implementations. Under these circumstances, it is impossible for binary systems to achieve arbitrarily low error rates, while SC can still trade off precision for energy by using longer bit-streams. This makes the technique a valuable alternative to binary logic in emerging technologies with high inherent transient uncertainty. This paper assesses the feasibility of multi-stage SC and discusses energy and accuracy considerations in SC design. First, the basics of SC circuit design are discussed. Second, we investigate three different sources of noise or uncertainty and assess their impact on SC accuracy. Third, we propose a methodological design strategy to evaluate the accuracy of general, multi-stage SC systems. The validity of this new approach is illustrated through the design of a 1D-DCT stochastic circuit, as part of a JPEG compression accelerator. Our analysis shows that multi-stage stochastic computing requires very long word lengths to achieve high accuracy, resulting in low energy efficiency. Exploiting stochastic computing's transient error tolerance in emerging technologies will thus have a high energy cost.
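A minimal sketch of the basic SC idea the abstract describes: a value in [0, 1] is encoded as a bit-stream whose fraction of ones equals that value, and multiplying two independent streams takes one AND gate per bit. This assumes unipolar encoding and uses Python's `random` module as a software stand-in for hardware random number generators; the function names are hypothetical.

```python
import random
from typing import List


def to_stream(p: float, n: int, rng: random.Random) -> List[int]:
    """Encode a probability p in [0, 1] as a unipolar bit-stream of n bits."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("unipolar SC values must lie in [0, 1]")
    return [1 if rng.random() < p else 0 for _ in range(n)]


def from_stream(stream: List[int]) -> float:
    """Decode a bit-stream back to a value: the fraction of ones."""
    return sum(stream) / len(stream)


def sc_multiply(a: float, b: float, n: int, seed: int = 0) -> float:
    """Estimate a * b by AND-ing two independent streams bit by bit."""
    rng = random.Random(seed)  # software stand-in for hardware RNGs
    sa = to_stream(a, n, rng)
    sb = to_stream(b, n, rng)
    return from_stream([x & y for x, y in zip(sa, sb)])


if __name__ == "__main__":
    exact = 0.5 * 0.75
    for n in (16, 256, 4096, 65536):
        est = sc_multiply(0.5, 0.75, n)
        print(f"n={n:6d}  estimate={est:.4f}  abs error={abs(est - exact):.4f}")
```

The printed error typically shrinks as the stream grows, but only slowly, which is the precision-for-length trade-off the abstract refers to.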

Workshop on Applications of Computer Vision | 2016

Energy-efficient ConvNets through approximate computing

Bert Moons; Bert De Brabandere; Luc Van Gool; Marian Verhelst

Recently, convolutional neural networks (ConvNets) have emerged as state-of-the-art classification and detection algorithms, achieving near-human performance in visual detection. However, ConvNet algorithms are typically very computation- and memory-intensive. To embed ConvNet-based classification into wearable platforms and embedded systems such as smartphones or ubiquitous electronics for the internet-of-things, their energy consumption must be reduced drastically. This paper proposes methods based on approximate computing to reduce energy consumption in state-of-the-art ConvNet accelerators. By combining techniques at both the system and circuit level, we reduce the energy of the system's arithmetic by up to 30× without losing classification accuracy and by more than 100× at 99% classification accuracy, compared to the commonly used 16-bit fixed-point number format.
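The baseline in this abstract is the common 16-bit fixed-point format. A minimal sketch of what lowering that bit width means numerically, assuming signed two's-complement fixed point with saturation and Python's round-half-to-even rounding; the `frac_bits` splits below are hypothetical choices, and the paper's own quantization scheme is not reproduced here.

```python
def quantize(x: float, total_bits: int, frac_bits: int) -> float:
    """Round x to the nearest value of a signed fixed-point format, saturating on overflow.

    total_bits includes the sign bit; frac_bits bits lie after the binary point.
    """
    scale = 1 << frac_bits
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    q = round(x * scale)           # round-half-to-even, as Python's round() does
    q = max(q_min, min(q_max, q))  # saturate instead of wrapping around
    return q / scale


if __name__ == "__main__":
    weights = [0.3, -0.71, 0.05, 1.2]  # hypothetical weights
    for bits, frac in ((16, 14), (8, 6), (4, 2)):
        err = max(abs(w - quantize(w, bits, frac)) for w in weights)
        print(f"{bits:2d}-bit, {frac} fractional bits: max abs error = {err:.5f}")
```

Fewer bits mean narrower multipliers and less data to move, at the cost of the larger rounding error the loop prints; whether a network tolerates that error is an empirical question per layer.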

International Symposium on Low Power Electronics and Design | 2015

DVAS: Dynamic Voltage Accuracy Scaling for increased energy-efficiency in approximate computing

Bert Moons; Marian Verhelst

A wide variety of existing and emerging applications in recognition, mining and synthesis, and machine-to-human interaction tolerate small errors or deviations in their computational results. Digital systems can exploit this error tolerance to increase their energy efficiency, which is crucial in high-performance wearable electronics and in emerging low-power systems for the internet-of-things. A dynamic energy-accuracy trade-off brings an extra degree of freedom for system-level power management. We introduce the concept of Dynamic Voltage Accuracy Scaling and illustrate its analogy to Dynamic Voltage Frequency Scaling. Dynamic Voltage Accuracy Scaling achieves higher energy gains at most output qualities than other approximate computing alternatives. This work further generalizes the Dynamic Voltage Accuracy Scaling concept to pipelined structures and quantifies its energy overhead. Shallow pipelined multipliers with two to four dynamic accuracy modes can be supported with limited (< 10–20%) overhead, resulting in significant energy savings of up to 90% or more for less than 2% mean error. DVAS is finally applied to a JPEG image processing application, demonstrating large system-level gains without noticeable impact on the user or application.
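The DVAS/DVFS analogy can be illustrated with the textbook first-order model of CMOS dynamic energy per operation, ignoring leakage. In DVFS, lowering the clock frequency creates timing slack that permits a lower supply voltage. In DVAS, as assumed here, reducing accuracy (for example by gating multiplier LSBs) lowers the switching activity α and shortens the critical path, so the supply voltage can be lowered at the same frequency:

```latex
E_{\text{op}} \approx \alpha\, C_L\, V_{DD}^{2}
\qquad\Rightarrow\qquad
\frac{E_{\text{op}}^{\text{DVAS}}}{E_{\text{op}}^{\text{full}}}
\approx \frac{\alpha'}{\alpha}\left(\frac{V_{DD}'}{V_{DD}}\right)^{2}
```

With purely illustrative, hypothetical values α'/α = 0.5 and V'_DD/V_DD = 0.8, the ratio is 0.5 × 0.64 = 0.32, showing how the two effects compound; the paper's measured figures are the ones quoted in the abstract.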

IEEE Journal of Solid-state Circuits | 2017

An Energy-Efficient Precision-Scalable ConvNet Processor in 40-nm CMOS

Bert Moons; Marian Verhelst

A precision-scalable processor for low-power ConvNets, or convolutional neural networks, is implemented in a 40-nm CMOS technology. To minimize energy consumption while maintaining throughput, this paper is the first to implement dynamic precision and energy scaling and to exploit the sparsity of convolutions in a dedicated processor architecture. The processor's 256 parallel processing units achieve a peak 102 GOPS running at 204 MHz and 1.1 V. It is fully C-programmable through a custom-generated compiler, consumes 25–287 mW at 204 MHz, and scales efficiency between 0.3 and 2.7 effective TOPS/W. It achieves 47 frames/s on the convolutional layers of the AlexNet benchmark while consuming only 76 mW. This system thereby outperforms the state of the art by up to five times in energy efficiency.
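The AlexNet figures in this abstract translate directly into energy per frame for the convolutional layers, dividing average power by frame rate:

```latex
E_{\text{frame}} = \frac{P}{\text{frame rate}}
= \frac{76\ \text{mW}}{47\ \text{frames/s}}
\approx 1.6\ \text{mJ/frame}
```

This covers only the convolutional layers, as the abstract states, and excludes the fully connected layers and any off-chip memory traffic not included in the quoted power.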

Design, Automation, and Test in Europe | 2017

DVAFS: Trading computational accuracy for energy through dynamic-voltage-accuracy-frequency-scaling

Bert Moons; Roel Uytterhoeven; Wim Dehaene; Marian Verhelst

Several applications in machine learning and machine-to-human interaction tolerate small deviations in their computations. Digital systems can exploit this fault tolerance to increase their energy efficiency, which is crucial in embedded applications. Hence, this paper introduces a new means of Approximate Computing: Dynamic-Voltage-Accuracy-Frequency-Scaling (DVAFS), a circuit-level technique enabling a dynamic trade-off between energy and computational accuracy that outperforms other Approximate Computing techniques. The usage and applicability of DVAFS are illustrated in the context of Deep Neural Networks, the current state of the art in advanced recognition. These networks are typically executed on CPUs or GPUs due to their high computational complexity, making their deployment on battery-constrained platforms possible only through wireless connections with the cloud. This work shows how deep learning can be brought to IoT devices by running every layer of the network at its optimal computational accuracy. Finally, we demonstrate a DVAFS processor for Convolutional Neural Networks, achieving efficiencies of multiple TOPS/W.
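The Envision processor listed above is described as subword-parallel. A minimal sketch of subword parallelism, assuming a software SIMD-within-a-register (SWAR) formulation with addition rather than the multipliers used in the processor: one wide word holds several narrow lanes, and carries are blocked at lane boundaries so a single wide operation performs several narrow ones. Under first-order reasoning, processing N subwords per cycle keeps throughput constant at 1/N of the clock frequency, leaving room to lower the supply voltage as well; this is an illustration of the principle, not the authors' circuit, and all function names are hypothetical.

```python
from typing import List


def pack(values: List[int], lane_bits: int) -> int:
    """Pack unsigned lane values (lane 0 in the least significant bits) into one word."""
    mask = (1 << lane_bits) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * lane_bits)
    return word


def unpack(word: int, lane_bits: int, lanes: int) -> List[int]:
    """Split a packed word back into its unsigned lane values."""
    mask = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & mask for i in range(lanes)]


def subword_add(a: int, b: int, lane_bits: int, lanes: int) -> int:
    """Add every lane modulo 2**lane_bits in one wide addition (SWAR)."""
    msbs = 0
    for i in range(lanes):
        msbs |= 1 << (i * lane_bits + lane_bits - 1)
    lows = ((1 << (lane_bits * lanes)) - 1) & ~msbs
    # Adding only the low bits of each lane can carry into that lane's MSB
    # position but never across into the next lane.
    partial = (a & lows) + (b & lows)
    # Fold each lane's own MSBs back in with XOR, dropping the lane carry-out.
    return partial ^ ((a ^ b) & msbs)


if __name__ == "__main__":
    # A 16-bit word used as 4 lanes of 4 bits: one addition, four results.
    s = subword_add(pack([1, 2, 3, 4], 4), pack([5, 6, 7, 8], 4), 4, 4)
    print(unpack(s, 4, 4))  # [6, 8, 10, 12]
```

The same 16-bit word could equally be split as 2 × 8 bits or used as 1 × 16 bits, which mirrors the idea of choosing precision per layer.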

IEEE Solid-state Circuits Magazine | 2017

Embedded Deep Neural Network Processing: Algorithmic and Processor Techniques Bring Deep Learning to IoT and Edge Devices

Marian Verhelst; Bert Moons

Deep learning has recently become immensely popular for image recognition, as well as for other recognition and pattern matching tasks in, e.g., speech processing, natural language processing, and so forth. The online evaluation of deep neural networks, however, comes with significant computational complexity, making it, until recently, feasible only on power-hungry server platforms in the cloud. In recent years, we see an emerging trend toward embedded processing of deep learning networks in edge devices: mobiles, wearables, and Internet of Things (IoT) nodes. This would enable us to analyze data locally in real time, which is not only favorable in terms of latency but also mitigates privacy issues. Yet evaluating the powerful but large deep neural networks with power budgets in the milliwatt or even microwatt range requires a significant improvement in processing energy efficiency.

International New Circuits and Systems Conference | 2014

Energy and accuracy in multi-stage stochastic computing

Bert Moons; Marian Verhelst

The continued scaling of feature sizes in integrated circuit technology leads to more uncertainty and unreliability in circuit behaviour. Maintaining the paradigm of deterministic Boolean computing therefore becomes increasingly challenging. Stochastic computing (SC) processes digital data in the form of pseudo-random bit-streams denoting probabilities and is therefore less vulnerable to uncertainty. When transient circuit variations are present, SC greatly outperforms classical binary implementations. Previous work has mainly focused on SC circuits with only a few stages. This paper assesses the feasibility of multistage SC. First, the reasons for decreasing accuracy in these types of circuits are discussed. Second, we introduce a straightforward method to evaluate the accuracy of general SC systems. Third, the validity of this new approach is illustrated through the design of a 1D-DCT stochastic circuit, as part of a JPEG compression accelerator. Last, we couple the results of our analysis to low-level energy considerations.
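Why accuracy in SC is expensive can be seen from a short derivation, assuming each bit of a stream encoding p is an independent Bernoulli draw (pseudo-random generators such as LFSRs can behave differently, so this is an idealization). The value is estimated as the fraction of ones over N bits, and its standard deviation shrinks only as 1/√N:

```latex
\hat{p} = \frac{1}{N}\sum_{i=1}^{N} x_i,\quad x_i \sim \mathrm{Bernoulli}(p)
\;\Rightarrow\;
\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{N}} \le \frac{1}{2\sqrt{N}}

\sigma_{\hat{p}} \le 2^{-b} \;\Leftarrow\; N \ge 2^{2b-2},
\qquad \text{e.g. } b = 8 \;\Rightarrow\; N \ge 2^{14} = 16384
```

Under this idealization, matching the resolution of an 8-bit binary word takes on the order of 2^14 bit-cycles per value, whereas a binary datapath handles all 8 bits at once; this does not reproduce the paper's own multi-stage accuracy analysis.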

Archive | 2019

Conclusions, Contributions, and Future Work

Bert Moons; Daniel Bankman; Marian Verhelst

This dissertation has focused on techniques to minimize the energy consumption of deep learning algorithms for embedded applications on battery-constrained wearable edge devices. Although they are the state of the art (SotA) in many typical machine-learning tasks, deep learning algorithms are also very costly in terms of energy consumption, due to the large number of computations they require and their huge model sizes. Because of this, deep learning applications on battery-constrained wearables have only been possible through wireless connections with a resourceful cloud. This setup has several drawbacks. First, there are privacy concerns. This setup requires users to share their raw data (images, video, locations, and speech) with a remote system. As most users are not willing to share all of this, large-scale applications cannot yet be developed. Second, the cloud setup requires users to be connected all the time, which is infeasible given current cellular coverage. Furthermore, real-time applications require low-latency connections, which cannot be guaranteed using the current communication infrastructure. Finally, this wireless connection is very inefficient, requiring too much energy per transferred bit for real-time data transfer on energy-constrained platforms. All these issues (privacy, latency and connectivity, and costly wireless connections) can be resolved by moving toward computing on the edge.