Pascal Alexander Hager

Publication

Featured research published by Pascal Alexander Hager.

design, automation, and test in europe | 2015

Tackling the bottleneck of delay tables in 3D ultrasound imaging

A. Ibrahim; Pascal Alexander Hager; Andrea Bartolini; Federico Angiolini; Marcel Arditi; Luca Benini; G. De Micheli

3D ultrasound imaging is quickly becoming a reference technique for high-quality, accurate, expressive diagnostic medical imaging. Unfortunately, its computation requirements are huge and, today, demand expensive, power-hungry, bulky processing resources. A key bottleneck is the receive beamforming operation, which requires the application of many permutations of fine-grained delays among the digitized received echoes. To apply these delays in the digital domain, in principle large tables (billions of coefficients) are needed, and the access bandwidth to these tables can reach multiple TB/s, meaning that their storage both on-chip and off-chip is impractical. However, smarter implementations of the delay generation function, including forgoing the tables altogether, are possible. In this paper we explore efficient strategies to compute the delay function that controls the reconstruction of the image, and present a feasibility analysis for an FPGA platform.
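To make the contrast between a stored delay table and on-the-fly delay generation concrete, here is a minimal Python sketch. It is not the method from the paper: it uses the textbook two-way delay for delay-and-sum beamforming, and the speed of sound (1540 m/s), the sampling rate, the transmit origin and the function names are all illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 1540.0  # m/s, typical soft-tissue value (assumption)


def two_way_delay_samples(focal_point, element, origin=(0.0, 0.0, 0.0), fs=32e6):
    """Delay in samples for one (focal point, element) pair.

    Transmit path: origin -> focal point; receive path: focal point -> element.
    Positions are (x, y, z) tuples in metres; fs is an assumed sampling rate in Hz.
    """
    tx = math.dist(origin, focal_point)
    rx = math.dist(focal_point, element)
    return round((tx + rx) / SPEED_OF_SOUND * fs)


def table_entries(n_focal_points, n_elements):
    """Number of coefficients a full precomputed delay table would hold."""
    return n_focal_points * n_elements


if __name__ == "__main__":
    # Hypothetical sizes: one million focal points, 10,000 receive elements.
    print(table_entries(1_000_000, 10_000))  # 10 billion coefficients
    print(two_way_delay_samples((0.0, 0.0, 0.0154), (0.0, 0.0, 0.0)))
```

With these hypothetical sizes the table would already hold ten billion coefficients, whereas the function recomputes any single entry from geometry alone.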

international symposium on neural networks | 2017

CAS-CNN: A deep convolutional neural network for image compression artifact suppression

Lukas Cavigelli; Pascal Alexander Hager; Luca Benini

Lossy image compression algorithms are pervasively used to reduce the size of images transmitted over the web and recorded on data storage media. However, we pay for their high compression rate with visual artifacts degrading the user experience. Deep convolutional neural networks have become a widespread tool to address high-level computer vision tasks very successfully. Recently, they have found their way into the areas of low-level computer vision and image processing to solve regression problems mostly with relatively shallow networks. We present a novel 12-layer deep convolutional network for image compression artifact suppression with hierarchical skip connections and a multi-scale loss function. We achieve a boost of up to 1.79 dB in PSNR over ordinary JPEG and an improvement of up to 0.36 dB over the best previous ConvNet result. We show that a network trained for a specific quality factor (QF) is resilient to the QF used to compress the input image — a single network trained for QF 60 provides a PSNR gain of more than 1.5 dB over the wide QF range from 40 to 76.
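The gains above are reported in PSNR. As a reminder of what that metric measures, the following sketch implements the standard definition, PSNR = 10 log10(peak² / MSE), for 8-bit images given as flat lists of pixel values. The function name and the flat-list representation are assumptions made for illustration; this is not code from the paper.

```python
import math


def psnr(reference, distorted, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(distorted):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    ref = [10, 20, 30, 40]
    noisy = [11, 19, 31, 39]  # every pixel is off by one, so MSE = 1
    print(psnr(ref, noisy))
```

Because the scale is logarithmic, a gain of 1.79 dB corresponds to the mean squared error dropping by a factor of about 10^(0.179) ≈ 1.51.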

biomedical circuits and systems conference | 2014

Assessing the area/power/performance tradeoffs for an integrated fully-digital, large-scale 3D-ultrasound beamformer

Pascal Alexander Hager; Pirmin Vogel; Andrea Bartolini; Luca Benini

High-frame-rate and high-resolution 3D medical ultrasound imaging imposes high requirements on the involved processing hardware. Several thousands of analog signals need to be processed in many steps to obtain a final image. Fully digital beamforming makes it possible to achieve high image quality coupled with extreme flexibility. Unfortunately, digital beamforming imposes staggering requirements on main memory bandwidth caused by the loading of off-chip stored beamforming delays. In this paper we present the first fully-digital integrated beamformer that is able to compute 269.3 M focal points (FP) per second from 10 000 receive channels, and which does not require off-chip main memory. This is enabled by our novel delay approximation circuit, which exploits temporal correlation between subsequent computations and thereby allows the beamforming delays to be computed online. To estimate the area and power requirements, the complete system was designed and the beamformer core was evaluated for a 130 nm CMOS technology. The estimated complexity per channel is 37.2 kGE and the corresponding power dissipation was estimated at 48 mW.

IEEE Transactions on Very Large Scale Integration Systems | 2016

Ekho: A 30.3W, 10k-Channel Fully Digital Integrated 3-D Beamformer for Medical Ultrasound Imaging Achieving 298M Focal Points per Second

Pascal Alexander Hager; Andrea Bartolini; Luca Benini

3-D medical ultrasound imaging enables new diagnostic possibilities and modalities. In a computational process called beamforming, a 3-D volume is reconstructed from several thousands of analog signals. Today's systems rely on massive analog preprocessing to reduce the computational burden of the subsequent digital processing system. In this paper, we present a configurable beamformer (BF) architecture, which demonstrates for the first time that it is possible to implement the entire 3-D delay and sum beamforming fully digitally and on one single chip, without requiring off-chip memories. We present a presilicon implementation of a single-chip BF in an advanced 28-nm silicon-on-insulator technology. The BF targets a fully sampled 10k element 8-MHz bandwidth transducer head and is able to produce 298.1M focal points (FPs) per second, enough to produce a high-resolution volume with 16.3 MFP at 15 Hz. All delays are computed online and on-chip to eliminate the power-hungry external memories for delay storage. The final design (register-transfer-level and floorplan) has a complexity of 342M gate equivalents requiring 1.68 cm² of area. The core power is estimated to be 30.3 W, resulting in an unprecedented power efficiency of 98.4G beamforming operations per watt.
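The headline figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that "10k" means exactly 10,000 channels and that one beamforming operation is one channel's contribution to one focal point. Both are interpretations rather than statements from the abstract, and the constant and function names are made up for illustration.

```python
FOCAL_POINTS_PER_S = 298.1e6   # from the abstract
CHANNELS = 10_000              # "10k element" read as exactly 10,000 (assumption)
CORE_POWER_W = 30.3            # from the abstract
VOLUME_FOCAL_POINTS = 16.3e6   # from the abstract


def ops_per_second_per_watt():
    """Channel contributions summed per second, per watt of core power."""
    return FOCAL_POINTS_PER_S * CHANNELS / CORE_POWER_W


def max_volume_rate_hz():
    """Volumes per second the stated throughput could sustain."""
    return FOCAL_POINTS_PER_S / VOLUME_FOCAL_POINTS


if __name__ == "__main__":
    print(round(ops_per_second_per_watt() / 1e9, 1))  # in units of 1e9 per watt
    print(round(max_volume_rate_hz(), 1))
```

Under these assumptions the efficiency comes out at about 98.4 × 10⁹ operations per second per watt, matching the quoted 98.4G, and the throughput is enough for roughly 18 volumes per second, which leaves headroom over the stated 15 Hz.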

international symposium on circuits and systems | 2017

LightProbe: A 64-channel programmable ultrasound transducer head with an integrated front-end and a 26.4 Gb/s optical link

Pascal Alexander Hager; Christoph Risser; Peter-Karl Weber; Luca Benini

Medical ultrasound processing features two main components: a transducer head to generate the ultrasound wave and acquire the reflected signals, and a processing system that generates the final image. The connection between these two components is established using digital communication over a USB link for smaller mobile systems, whereas large stationary systems operating with 4–16x more channels use analog signals over micro-coaxial cables to avoid link rates of 16–100 Gb/s. In this paper, we present LightProbe, a programmable ultrasound transducer head with an integrated 64-channel frontend, operating on an estimated 12 W worst-case power budget. LightProbe is the first transducer head equipped with a 26.4 Gb/s optical link. Moreover, it features an FPGA that can be configured to pre-process the data on the transducer head and allows a flexible, inexpensive, light digital optical link that is immune to interference and can be tailored to fit a variety of devices, from small mobile devices all the way to large stationary devices with high throughput requirements.

embedded systems for real time multimedia | 2016

Mobile Ultrasound Imaging on Heterogeneous Multi-Core Platforms

Andreas Kurth; Andreas Tretter; Pascal Alexander Hager; Sergio J. Sanabria; Orcun Goksel; Lothar Thiele; Luca Benini

Ultrasound imaging is one of the most important medical diagnostic methods. The bulkiness of state-of-the-art high-quality ultrasound devices, however, drastically limits their usability in important application scenarios. In this paper, we show how a portable medical ultrasound device can be built using many-core technology and programmable logic, combining low power consumption with high flexibility. We discuss a typical ultrasound image reconstruction algorithm and how it can be parallelized using a pipelined design that efficiently partitions the workload among heterogeneous processing elements. A special focus lies on the limited memory resources and data bandwidth between components. To tackle both problems, we use floating window buffers and approximate computations, and we minimize lookup table sizes using on-the-fly calculations. We evaluate the design on the Adapteva Parallella platform, which contains a power-efficient 16-core Epiphany coprocessor and a Zynq SoC including a dual-core ARM A9 processor and programmable logic. Experimental results show that parallel beamforming of 128 input channels to a 288x128 pixel ultrasound image can be achieved on the Parallella at a rate of 5.3 frames per second consuming only 2 W of dynamic power.

ifip ieee international conference on very large scale integration | 2013

A Complete Real-Time Feature Extraction and Matching System Based on Semantic Kernels Binarized

Michael Schaffner; Pascal Alexander Hager; Lukas Cavigelli; Zhou Fang; Pierre Greisen; Frank K. Gürkaynak; Aljoscha Smolic; Hubert Kaeslin; Luca Benini

Feature extraction and matching is an important step in many current image and video processing algorithms. In this work, we designed and implemented an efficient feature extraction and matching system for sparse point correspondence search in stereo video. Our system is based on the recently proposed Semantic Kernels Binarized (SKB) algorithm, which showed superior performance with respect to other algorithms in our evaluation. The feature extraction stage has been prototyped in 180 nm technology, and the complete system, with two feature extraction pipelines (left and right view) and the matching unit, has been implemented on a Stratix IV FPGA, where it delivers a performance of up to 42 frames per second on 720p video. Especially due to the high throughput of up to 25 k matched descriptors per frame, our system compares favourably with recent hardware implementations of similar algorithms.

design, automation, and test in europe | 2017

A scan-chain based state retention methodology for IoT processors operating on intermittent energy

Pascal Alexander Hager; Hamed Fatemi; José Pineda de Gyvez; Luca Benini

Future IoT systems are tightly constrained by cost and size and will often be operated from an energy harvester's output. Since these batteryless systems operate on intermittent energy, they have to be able to retain their state during power outages in order to guarantee computation progress. Due to the lack of large energy buffers, the state needs to be saved quickly using residual energy only. In related work, the state is retained in place by replacing all flip-flops with state-retentive flip-flops (SRFFs), which are powered by auxiliary supplies for retention or incorporate non-volatile memory cells. However, these SRFFs increase the power consumption during active operation, impairing the overall system's efficiency. In this paper, we present a scan-chain based state retention approach, where the state is moved to memory using only 4.5 pJ/b. Since our approach does not introduce any power overhead, this energy cost pays off after an on-time of just 100 µs compared to state-of-the-art in-place solutions. Moreover, compared to a software mechanism, our approach requires 6.6x less energy to move the state and is 5.8x faster.

ifip ieee international conference on very large scale integration | 2013

A real-time 720p feature extraction core based on Semantic Kernels Binarized

Michael Schaffner; Pascal Alexander Hager; Lukas Cavigelli; Pierre Greisen; Frank K. Gürkaynak; Hubert Kaeslin

Several image processing applications rely on a sparse set of correspondence points between stereo images to discern a sparse but robust depth structure of the scene. There exist several methods to extract and match correspondences, but they are all computationally intensive and require significant memory bandwidth. In this paper, we describe an efficient ASIC core that is able to detect up to 25 k interest points in real time on a 720p video stream using the recently proposed Semantic Kernels Binarized (SKB) algorithm. To keep the memory bandwidth low, an optimized method to calculate the filter responses in the interest point detection stage has been devised. Instead of the 2D integral image we use a local 1D integral image combined with an incremental updating scheme to calculate the box filters. The ASIC core is manufactured in 180 nm technology and has a complexity of 254 kGE. It runs at 100 MHz, has a power dissipation of 184 mW and is the central processing block for a larger FPGA based stereo vision system that calculates a sparse depth map by locating corresponding interest points between left and right images in real time.
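The idea of replacing a 2D integral image with a 1D integral image plus incremental updates can be illustrated in software. The sketch below is a generic reading of that idea, not the scheme implemented in the ASIC: each row gets its own 1D prefix sum for the horizontal extent of the box, and the vertical extent is maintained by adding the incoming row and subtracting the outgoing one. The function name and list-of-lists image format are assumptions.

```python
def box_filter_sums(image, w, h):
    """Return the sum over every w x h box of `image` (valid positions only)."""
    rows, cols = len(image), len(image[0])

    # Horizontal pass: a 1D integral image (prefix sum) per row gives the
    # sum of any w consecutive pixels with a single subtraction.
    row_sums = []
    for row in image:
        prefix = [0]
        for value in row:
            prefix.append(prefix[-1] + value)
        row_sums.append([prefix[x + w] - prefix[x] for x in range(cols - w + 1)])

    # Vertical pass: keep a running sum over h rows and update it
    # incrementally as the window slides down by one row.
    acc = [sum(row_sums[y][x] for y in range(h)) for x in range(cols - w + 1)]
    out = [acc[:]]
    for y in range(h, rows):
        acc = [a + new - old
               for a, new, old in zip(acc, row_sums[y], row_sums[y - h])]
        out.append(acc[:])
    return out


if __name__ == "__main__":
    img = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
    print(box_filter_sums(img, 2, 2))  # [[12, 16], [24, 28]]
```

Only the last h rows of horizontal sums are needed at any time, which is the property that keeps the working set small compared with a full-frame 2D integral image; the sketch stores all rows for clarity.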

international ultrasonics symposium | 2017

UltraLight: An ultrafast imaging platform based on a digital 64-channel ultrasound probe

Pascal Alexander Hager; Daniel Speicher; Christian Degel; Luca Benini

Digital ultrasound probes include the entire analog frontend in their enclosure and are equipped with a standard digital link. This makes it possible to build very cost-effective ultrasound systems, as the probes can simply be connected to a commodity device, such as a desktop PC, tablet or smartphone, running an ultrasound imaging application. Up to now, digital probes have been mainly demonstrated for low-end ultrasound applications and are currently limited to a small number of frontend channels (typically 16). In addition, the available bandwidth at the digital interface (less than 10 Gb/s) limits these devices to basic imaging modalities. In this work, we present an imaging platform built with a digital 64-channel ultrasound probe that supports ultrafast imaging. Our digital probe, called LightProbe, utilizes a 64-element phased array without multiplexing and incorporates a 64-channel 100 Vpp TX/RX stage providing a sample rate of up to 32.5 MS/s at 12 bit. The probe features an optical link interface achieving 25 Gb/s on a standard fiber cable. A Xilinx Artix 7 FPGA is integrated in the probe to manage the optical interface and to provide a high degree of configurability. To the best of our knowledge, this is the first digital probe capable of compounded plane wave imaging. We capture plane waves with peak and average rates of 4.9 kHz and 2 kHz respectively, with a peak link load of 15.36 Gb/s, while consuming just 9.25 W.
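The relationship between the frontend parameters and the link rate can be checked with simple arithmetic. The sketch below computes the raw sample payload only, ignoring any framing or encoding overhead on the link, which is an assumption; the function name is illustrative.

```python
def raw_sample_rate_bps(channels, bits_per_sample, samples_per_s):
    """Raw payload in bits per second when all channels stream every sample."""
    return channels * bits_per_sample * samples_per_s


if __name__ == "__main__":
    # Figures from the abstract: 64 channels, 12 bit, up to 32.5 MS/s.
    rate = raw_sample_rate_bps(64, 12, 32.5e6)
    print(rate / 1e9)  # 24.96
```

Streaming all 64 channels at the maximum sample rate therefore needs about 24.96 Gb/s of raw payload, which is consistent with the stated 25 Gb/s link.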
