Publication


Featured research published by Lukas Cavigelli.


ieee computer society annual symposium on vlsi | 2016

YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights

Renzo Andri; Lukas Cavigelli; Davide Rossi; Luca Benini

Convolutional Neural Networks (CNNs) have revolutionized the world of image classification over the last few years, pushing computer vision close to and even beyond human accuracy. The computational effort of today's CNNs requires power-hungry parallel processors and GP-GPUs. Recent efforts in designing CNN Application-Specific Integrated Circuits (ASICs) and accelerators for System-on-Chip (SoC) integration have achieved very promising results. Unfortunately, even these highly optimized engines are still above the power envelope imposed by mobile and deeply embedded applications and face hard limitations caused by CNN weight I/O and storage. On the algorithmic side, highly competitive classification accuracy can be achieved by properly training CNNs with binary weights. This novel algorithmic approach brings major optimization opportunities in the arithmetic core, by removing the need for expensive multiplications, as well as in weight storage and I/O costs. In this work, we present a hardware accelerator optimized for BinaryConnect CNNs that achieves 1510 GOp/s on a core area of only 1.33 MGE with a power dissipation of 153 mW in UMC 65 nm technology at 1.2 V. Our accelerator outperforms the state of the art in terms of ASIC energy efficiency as well as area efficiency, with 61.2 TOp/s/W and 1135 GOp/s/MGE, respectively.
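The central optimization described in this abstract is that binary (+1/-1) weights turn every multiply-accumulate into a conditional add or subtract. A minimal NumPy sketch of that idea (purely illustrative, not the YodaNN datapath; function and variable names are made up):

```python
import numpy as np

def binary_weight_dot(activations, binary_weights):
    """Dot product with weights constrained to {+1, -1}.

    With binary weights, each multiply-accumulate reduces to adding or
    subtracting the activation, so no multiplier is needed.
    """
    acc = 0.0
    for a, w in zip(activations, binary_weights):
        acc += a if w > 0 else -a
    return acc

# Train-time weights are real-valued; binarization keeps only the sign.
real_weights = np.array([0.7, -0.2, 0.05, -1.3])
bin_weights = np.sign(real_weights)          # {+1, -1}
acts = np.array([1.0, 2.0, 3.0, 4.0])

print(binary_weight_dot(acts, bin_weights))  # 1 - 2 + 3 - 4 = -2.0
```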


design automation conference | 2015

Accelerating real-time embedded scene labeling with convolutional networks

Lukas Cavigelli; Michele Magno; Luca Benini

Today there is a clear trend towards deploying advanced computer vision (CV) systems in a growing number of application scenarios with strong real-time and power constraints. Brain-inspired algorithms capable of achieving record-breaking results, combined with embedded vision systems, are the best candidates for the future of CV and video systems due to their flexibility and high accuracy in the area of image understanding. In this paper, we present an optimized convolutional network implementation suitable for real-time scene labeling on embedded platforms. We show that our algorithm can achieve up to 96 GOp/s, running on the Nvidia Tegra K1 embedded SoC. We present experimental results, compare them to the state of the art, and demonstrate that for scene labeling our approach achieves a 1.5x improvement in throughput compared to a modern desktop CPU, at a power budget of only 11 W.
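Throughput figures such as 96 GOp/s conventionally count the multiplications and additions of the convolutional layers. A rough back-of-the-envelope helper, with layer dimensions chosen purely for illustration:

```python
def conv_layer_ops(h_out, w_out, c_in, c_out, k):
    """Count operations of one convolution layer.

    Each output pixel of each output channel needs k*k*c_in multiplies
    and roughly as many additions, so we count 2 ops per MAC.
    """
    macs = h_out * w_out * c_out * k * k * c_in
    return 2 * macs

# Illustrative layer: 320x240 output, 32 -> 64 channels, 7x7 kernels.
ops = conv_layer_ops(320, 240, 32, 64, 7)
runtime_s = ops / 96e9      # at 96 GOp/s, as reported in the abstract
print(f"{ops / 1e9:.1f} GOp, ~{runtime_s * 1e3:.1f} ms per layer at 96 GOp/s")
```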


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2018

YodaNN: An Architecture for Ultralow Power Binary-Weight CNN Acceleration

Renzo Andri; Lukas Cavigelli; Davide Rossi; Luca Benini

Convolutional neural networks (CNNs) have revolutionized the world of computer vision over the last few years, pushing image classification beyond human accuracy. The computational effort of today's CNNs requires power-hungry parallel processors or GP-GPUs. Recent developments in CNN accelerators for system-on-chip integration have reduced energy consumption significantly. Unfortunately, even these highly optimized devices are above the power envelope imposed by mobile and deeply embedded applications and face hard limitations caused by CNN weight I/O and storage. This prevents the adoption of CNNs in future ultralow power Internet of Things end-nodes for near-sensor analytics. Recent algorithmic and theoretical advancements enable competitive classification accuracy even when limiting CNNs to binary (+1/−1) weights during training. These new findings bring major optimization opportunities in the arithmetic core by removing the need for expensive multiplications, as well as reducing I/O bandwidth and storage. In this paper, we present an accelerator optimized for binary-weight CNNs that achieves 1.5 TOp/s at 1.2 V on a core area of only 1.33 million gate equivalents (MGE) or 1.9 mm² and with a power dissipation of 895 μW in UMC 65-nm technology at 0.6 V. Our accelerator significantly outperforms the state of the art in terms of energy and area efficiency, achieving 61.2 TOp/s/W at 0.6 V and 1.1 TOp/s/MGE at 1.2 V, respectively.
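As a sketch of how the quoted area-efficiency figure follows from the numbers in the abstract (energy efficiency is defined analogously as throughput per watt):

```latex
\eta_{\mathrm{area}} = \frac{\text{throughput}}{\text{core area}}
  = \frac{1.5\ \mathrm{TOp/s}}{1.33\ \mathrm{MGE}}
  \approx 1.1\ \mathrm{TOp/s/MGE},
\qquad
\eta_{\mathrm{energy}} = \frac{\text{throughput}}{\text{power}}
  \quad \bigl[\mathrm{TOp/s/W}\bigr]
```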


IEEE Transactions on Circuits and Systems for Video Technology | 2017

Origami: A 803-GOp/s/W Convolutional Network Accelerator

Lukas Cavigelli; Luca Benini

An ever-increasing number of computer vision and image/video processing challenges are being approached using deep convolutional neural networks, obtaining state-of-the-art results in object recognition and detection, semantic segmentation, action recognition, optical flow, and super-resolution. Hardware acceleration of these algorithms is essential to adopt these improvements in embedded and mobile computer vision systems. We present a new architecture, design, and implementation, as well as the first reported silicon measurements of such an accelerator, outperforming previous work in terms of power, area, and I/O efficiency. The manufactured device provides up to 196 GOp/s on 3.09 mm² of silicon in UMC 65-nm technology and can achieve a power efficiency of 803 GOp/s/W. The massively reduced bandwidth requirements make it the first architecture scalable to TOp/s performance.
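Power efficiency in GOp/s/W is the reciprocal of energy per operation; a quick, illustrative conversion of the 803 GOp/s/W figure (helper name is made up):

```python
def energy_per_op_pj(gops_per_watt):
    """Convert a GOp/s/W efficiency figure into picojoules per operation."""
    ops_per_joule = gops_per_watt * 1e9   # 1 W = 1 J/s
    return 1e12 / ops_per_joule           # pJ per operation

print(f"{energy_per_op_pj(803):.2f} pJ/Op")  # ~1.25 pJ per operation
```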


international joint conference on neural networks | 2017

CAS-CNN: A deep convolutional neural network for image compression artifact suppression

Lukas Cavigelli; Pascal Alexander Hager; Luca Benini

Lossy image compression algorithms are pervasively used to reduce the size of images transmitted over the web and recorded on data storage media. However, we pay for their high compression rate with visual artifacts degrading the user experience. Deep convolutional neural networks have become a widespread tool to address high-level computer vision tasks very successfully. Recently, they have found their way into the areas of low-level computer vision and image processing to solve regression problems mostly with relatively shallow networks. We present a novel 12-layer deep convolutional network for image compression artifact suppression with hierarchical skip connections and a multi-scale loss function. We achieve a boost of up to 1.79 dB in PSNR over ordinary JPEG and an improvement of up to 0.36 dB over the best previous ConvNet result. We show that a network trained for a specific quality factor (QF) is resilient to the QF used to compress the input image: a single network trained for QF 60 provides a PSNR gain of more than 1.5 dB over the wide QF range from 40 to 76.
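The reported gains are expressed in PSNR, which is derived from the mean squared error with respect to the uncompressed reference image. A minimal sketch of how such a gain would be measured (NumPy only; the image arrays below are random placeholders, not real JPEG data):

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Gain of the artifact-suppressed output over plain JPEG decoding.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, (64, 64), dtype=np.uint8)
jpeg_decoded = np.clip(original + rng.normal(0, 8, (64, 64)), 0, 255)
network_output = np.clip(original + rng.normal(0, 5, (64, 64)), 0, 255)

gain_db = psnr(original, network_output) - psnr(original, jpeg_decoded)
print(f"PSNR gain: {gain_db:.2f} dB")
```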


european signal processing conference | 2017

Deep structured features for semantic segmentation

Michael Tschannen; Lukas Cavigelli; Fabian Mentzer; Thomas Wiatowski; Luca Benini

We propose a highly structured neural network architecture for semantic segmentation with an extremely small model size, suitable for low-power embedded and mobile platforms. Specifically, our architecture combines i) a Haar wavelet-based tree-like convolutional neural network (CNN), ii) a random layer realizing a radial basis function kernel approximation, and iii) a linear classifier. While stages i) and ii) are completely pre-specified, only the linear classifier is learned from data. We apply the proposed architecture to outdoor scene and aerial image semantic segmentation and show that the accuracy of our architecture is competitive with conventional pixel classification CNNs. Furthermore, we demonstrate that the proposed architecture is data efficient in the sense of matching the accuracy of pixel classification CNNs when trained on a much smaller data set.
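Stage ii) above, a random layer realizing an RBF kernel approximation, is commonly implemented with random Fourier features. The sketch below illustrates that generic construction together with a linear classifier trained on top (an illustration of the idea, not the authors' exact configuration; the data is a random placeholder):

```python
import numpy as np

def random_fourier_features(x, n_features=256, gamma=1.0, seed=0):
    """Map inputs so that dot products approximate an RBF kernel.

    z(x)^T z(y) ~ exp(-gamma * ||x - y||^2)  (random Fourier features)
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    w = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(x @ w + b)

# Only the final linear classifier is trained on the fixed random features
# (a least-squares fit serves as a stand-in here).
x = np.random.default_rng(1).normal(size=(100, 8))   # placeholder descriptors
y = (x[:, 0] + x[:, 1] > 0).astype(float)            # placeholder labels
z = random_fourier_features(x)
weights, *_ = np.linalg.lstsq(z, y, rcond=None)
print("train accuracy:", np.mean((z @ weights > 0.5) == (y > 0.5)))
```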


arXiv: Computer Vision and Pattern Recognition | 2016

Computationally Efficient Target Classification in Multispectral Image Data with Deep Neural Networks

Lukas Cavigelli; Dominic Bernath; Michele Magno; Luca Benini



arXiv: Computer Vision and Pattern Recognition | 2018

XNORBIN: A 95 TOp/s/W hardware accelerator for binary convolutional neural networks

Andrawes Al Bahou; Geethan Karunaratne; Renzo Andri; Lukas Cavigelli; Luca Benini

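No abstract is shown for this entry. As the title indicates, fully binary CNNs constrain both weights and activations to two values, so a dot product can be evaluated with an XNOR followed by a popcount; the sketch below illustrates that arithmetic in general terms (the standard XNOR-Net-style formulation, assumed here rather than taken from the XNORBIN paper itself):

```python
def xnor_popcount_dot(a_bits, w_bits, n):
    """Dot product of two {-1, +1} vectors packed as bitmasks (1 bit per element).

    With +1 encoded as 1 and -1 as 0, the product of two elements is +1
    exactly when their bits match, so: dot = 2 * popcount(XNOR) - n.
    """
    mask = (1 << n) - 1
    matches = bin((~(a_bits ^ w_bits)) & mask).count("1")  # popcount of XNOR
    return 2 * matches - n

# a = [+1, -1, +1, +1] -> 0b1011, w = [+1, +1, -1, +1] -> 0b1101
# (first vector element stored in the most significant bit)
print(xnor_popcount_dot(0b1011, 0b1101, 4))  # (+1)+(-1)+(-1)+(+1) = 0
```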


ieee sensors applications symposium | 2017

StoneNode: A low-power sensor device for induced rockfall experiments

Pascal A. Niklaus; Thomas Birchler; Tim Aebi; Michael Schaffner; Lukas Cavigelli; Andrin Caviezel; Michele Magno; Luca Benini



international conference on distributed smart cameras | 2017

CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data

Lukas Cavigelli; Philippe Degen; Luca Benini

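No abstract is shown for this entry. The title refers to change-based inference, i.e., re-evaluating a convolutional layer only where the video frame actually changed with respect to the previous one. The sketch below illustrates that idea with a single-channel 3x3 convolution and a simple per-pixel change threshold (an assumption for illustration, not the paper's implementation):

```python
import numpy as np

def change_based_conv(prev_in, new_in, prev_out, kernel, threshold=0.05):
    """3x3 convolution (single channel, zero padding) updated only where
    the input changed by more than `threshold` since the previous frame."""
    changed_in = np.abs(new_in - prev_in) > threshold
    # Output pixels affected by a changed input pixel lie in its 3x3 neighbourhood.
    affected = np.zeros_like(changed_in)
    for y, x in zip(*np.nonzero(changed_in)):
        affected[max(0, y - 1):y + 2, max(0, x - 1):x + 2] = True

    padded = np.pad(new_in, 1)
    out = prev_out.copy()
    for y, x in zip(*np.nonzero(affected)):
        out[y, x] = np.sum(padded[y:y + 3, x:x + 3] * kernel)
    return out

# Usage: bootstrap with a full pass (threshold=-1 marks every pixel as changed),
# then update incrementally when a single pixel changes.
rng = np.random.default_rng(0)
frame0 = rng.random((32, 32))
frame1 = frame0.copy()
frame1[10, 10] += 1.0
k = rng.random((3, 3))
out0 = change_based_conv(np.zeros_like(frame0), frame0, np.zeros_like(frame0), k, -1)
out1 = change_based_conv(frame0, frame1, out0, k)
```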
