Hubert Kaeslin
ETH Zurich
Publication
Featured research published by Hubert Kaeslin.
custom integrated circuits conference | 1994
Reto Zimmermann; Andreas Curiger; H. Bonnenberg; Hubert Kaeslin; Norbert Felber; Wolfgang Fichtner
A VLSI implementation of the International Data Encryption Algorithm is presented. Security considerations led to novel system concepts in chip design including protection of sensitive information and on-line failure detection capabilities. BIST was instrumental for reconciling contradicting requirements of VLSI testability and cryptographic security. The VLSI chip implements data encryption and decryption in a single hardware unit. All important standardized modes of operation of block ciphers, such as ECB, CBC, CFB, OFB, and MAC, are supported. In addition, new modes are proposed and implemented to fully exploit the algorithm's inherent parallelism. With a system clock frequency of 25 MHz the device permits a data conversion rate of more than 177 Mb/s. Therefore, the chip can be applied to on-line encryption in high-speed networking protocols like ATM or FDDI.
international conference on asic | 1999
J. Muttersbach; Thomas Villiger; Hubert Kaeslin; Norbert Felber; Wolfgang Fichtner
A novel methodology for realizing Globally-Asynchronous Locally-Synchronous (GALS) architectures is reported. We developed a library of predesigned modules that facilitate the assembly of independently clocked modules to on-chip systems. The components of this library establish high-performance data exchange channels which are instrumental in constructing flexible architectures. The validity of our concept is proven by applying it to an ASIC design with real-world complexity.
asia pacific conference on circuits and systems | 2008
Peter Luethi; Christoph Studer; Sebastian Duetsch; Eugen Zgraggen; Hubert Kaeslin; Norbert Felber; Wolfgang Fichtner
The QR decomposition (QRD) is an important prerequisite for many different detection algorithms in multiple-input multiple-output (MIMO) wireless communication systems. This paper presents an optimized fixed-point VLSI implementation of the modified Gram-Schmidt (MGS) QRD algorithm that incorporates regularization and additional sorting of the MIMO channel matrix. Integrated in 0.18 µm CMOS technology, the proposed VLSI architecture processes up to 1.56 million complex-valued 4×4-dimensional matrices per second. The implementation results of this work are extensively compared to the Givens rotation (GR)-based QRD implementation of Luethi et al., ISCAS 2007. In order to ensure a fair comparison, both QRD circuits have been integrated in the same IC manufacturing technology, with equal functionality, and the same numeric precision. The comparison of the implementation results clearly showed superiority of the GR-based VLSI solution in terms of area, processing cycles, and throughput.
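As a rough software illustration of the modified Gram-Schmidt recurrence named in this abstract, the following floating-point sketch factors a complex matrix into Q and R. It is not the paper's fixed-point architecture: regularization and sorting of the channel matrix are omitted, the function name `mgs_qr` is hypothetical, and the input is assumed to have full column rank.

```python
def mgs_qr(A):
    """Modified Gram-Schmidt QR of a complex matrix given as a list of rows.

    Returns (Q, R) with A = Q * R, orthonormal columns in Q and an
    upper-triangular R. Assumes A has full column rank (no zero norms).
    """
    m, n = len(A), len(A[0])
    Q = [[complex(A[i][j]) for j in range(n)] for i in range(m)]
    R = [[0j] * n for _ in range(n)]
    for k in range(n):
        # Normalize column k.
        norm = sum(abs(Q[i][k]) ** 2 for i in range(m)) ** 0.5
        R[k][k] = norm
        for i in range(m):
            Q[i][k] /= norm
        # Remove the component along q_k from every remaining column.
        # Updating the working columns immediately is what distinguishes
        # the modified from the classical Gram-Schmidt procedure.
        for j in range(k + 1, n):
            r = sum(Q[i][k].conjugate() * Q[i][j] for i in range(m))
            R[k][j] = r
            for i in range(m):
                Q[i][j] -= r * Q[i][k]
    return Q, R
```

For a 4×4 channel matrix, `Q, R = mgs_qr(H)` yields the factors a MIMO detector would then consume.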
midwest symposium on circuits and systems | 2003
Michael Kuhn; Stephan Moser; Oliver Isler; Frank K. Gürkaynak; Andreas Burg; Norbert Felber; Hubert Kaeslin; Wolfgang Fichtner
This paper presents a fast and area-efficient implementation of a real-time stereo vision algorithm for spatial depth mapping. The design combines two well-known area-based approaches to stereo matching and includes an occlusion detection method. Hardware efficiency is achieved by storing only partial images on-chip, avoiding full-sized frame buffers. A low-latency dataflow-oriented structure makes it possible to process 256×192 pixel input streams with a rate in excess of 50 frames per second, amounting to more than 54 million pixel × disparity measurements per second (PDS) (for a 25-pixel disparity range), or roughly 18 GOPS. The design has been integrated in a 0.25 µm standard CMOS technology and occupies an area of less than 3 mm².
IEEE Journal of Solid-state Circuits | 1991
Andreas Curiger; H. Bonnenberg; Hubert Kaeslin
The authors describe VLSI architectures for multiplication modulo p, where p is a Fermat prime. With increasing p, ROM-based table lookup methods become unattractive for integration due to excessive memory requirements. Three novel methods are discussed and compared to ROM implementations with regard to their speed and complexity characteristics. The first method is based on an (n+1)×(n+1)-bit array multiplier, the second on modulo p carry-save addition, and the third on modulo (p-1) carry-save addition using a bit-pair recoding scheme. All allow very high throughputs in pipelined implementations. While the first is very convenient for CAD (computer-aided design) environments providing a pipelined multiplier macrocell, the latter two are well-suited to full-custom implementation.
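The arithmetic behind multiplication modulo a Fermat prime p = 2^n + 1 can be sketched in software: because 2^n ≡ -1 (mod p), the upper part of a product can be subtracted from its lower n bits instead of performing a division. The sketch below only illustrates that identity; it does not reproduce any of the three hardware methods in the paper, and the name `mul_mod_fermat` is hypothetical.

```python
def mul_mod_fermat(a, b, n):
    """Return (a * b) mod (2**n + 1) for operands in the range 0..2**n.

    Uses 2**n == -1 (mod p): writing a*b = hi * 2**n + lo gives
    a*b == lo - hi (mod p), so one subtraction and one conditional
    correction replace the division.
    """
    p = (1 << n) + 1
    prod = a * b
    lo = prod & ((1 << n) - 1)   # lower n bits of the product
    hi = prod >> n               # remaining upper bits, at most 2**n
    r = lo - hi
    if r < 0:                    # single correction step suffices
        r += p
    return r
```

With n = 16 this gives multiplication modulo 65537, for example `mul_mod_fermat(12345, 54321, 16)`.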
IEEE Journal on Emerging and Selected Topics in Circuits and Systems | 2012
Patrick Maechler; Christoph Studer; David E. Bellasi; Arian Maleki; Andreas Burg; Norbert Felber; Hubert Kaeslin; Richard G. Baraniuk
Sparse signal recovery finds use in a variety of practical applications, such as signal and image restoration and the recovery of signals acquired by compressive sensing. In this paper, we present two generic very-large-scale integration (VLSI) architectures that implement the approximate message passing (AMP) algorithm for sparse signal recovery. The first architecture, referred to as AMP-M, employs parallel multiply-accumulate units and is suitable for recovery problems based on unstructured (e.g., random) matrices. The second architecture, referred to as AMP-T, takes advantage of fast linear transforms, which arise in many real-world applications. To demonstrate the effectiveness of both architectures, we present corresponding VLSI and field-programmable gate array implementation results for an audio restoration application. We show that AMP-T is superior to AMP-M with respect to silicon area, throughput, and power consumption, whereas AMP-M offers more flexibility.
european solid-state circuits conference | 2004
N. Pramstaller; Frank K. Gürkaynak; Simon Haene; Hubert Kaeslin; Norbert Felber; Wolfgang Fichtner
Differential power analysis (DPA) implies measuring the supply current of a cipher-circuit in an attempt to uncover part of a cipher-key. Cryptographic security gets compromised if the current waveforms so obtained correlate with those from a hypothetical power model of the circuit. Such correlations can be minimized by masking datapath operations with random bits in a reversible way. We analyze such countermeasures and discuss how they perform and how well they lend themselves to being incorporated into dedicated hardware implementations of the advanced encryption standard (AES) block cipher. Our favorite masking scheme entails a performance penalty of some 40-50%. We also present a VLSI design that can serve for practical experiments with DPA.
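To make the idea of reversible masking with random bits concrete, here is a minimal sketch of Boolean (XOR) masking applied to two GF(2)-linear AES-style byte operations: round-key addition and multiplication by 2 in GF(2^8). It is an illustration under stated assumptions, not the masking scheme the authors evaluate; the helper names are hypothetical, and the nonlinear S-box, which needs a more elaborate treatment, is left out.

```python
import secrets


def xtime(b):
    """Multiply a byte by 2 in GF(2^8) with the AES polynomial 0x11B."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def fresh_mask():
    """Draw a new random 8-bit mask (one per operation, never reused)."""
    return secrets.randbits(8)


def masked_linear_step(x, key, mask):
    """Compute xtime(x ^ key) without ever forming that value unmasked.

    Because XOR and xtime are linear over GF(2),
    xtime((x ^ mask) ^ key) == xtime(x ^ key) ^ xtime(mask),
    so the mask can be tracked separately and removed at the end.
    """
    masked = xtime((x ^ mask) ^ key)
    mask_out = xtime(mask)
    return masked, mask_out


def unmask(masked, mask_out):
    """Remove the tracked mask to recover the true result."""
    return masked ^ mask_out
```

A caller would draw `m = fresh_mask()`, run `masked_linear_step(x, key, m)`, and call `unmask` only once the result leaves the protected datapath.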
ieee international symposium on asynchronous circuits and systems | 2006
Frank K. Gürkaynak; Stephan Oetiker; Hubert Kaeslin; Norbert Felber; Wolfgang Fichtner
The Integrated Systems Laboratory (IIS) of ETH Zurich (Swiss Federal Institute of Technology) has been active in globally-asynchronous locally-synchronous (GALS) research since 1998. During this time, a number of GALS circuits have been fabricated and tested successfully on silicon. From a hardware designer's point of view, this article summarizes the evolution from proof of concept designs over multi-point interconnects to applications that specifically take advantage of GALS operation to improve cryptographic security. In spite of the fact that they fail to address numerous idiosyncrasies of GALS (such as good partitioning into synchronous islands, port controller design, pausable clock generators, design for test, etc.), hierarchical design flows have been found to form a workable basis. What mainly prevents GALS from gaining wider acceptance is the initial effort required to come up with a design flow that is efficient and dependable.
IEEE Journal of Solid-state Circuits | 1989
M. Biver; Hubert Kaeslin; C. Tommasini
An area-efficient in-place computation scheme for updating path metrics in solid-state Viterbi decoders is proposed. The permutation of items in memory, resulting as a by-product from in-place updating, is formally shown to be cyclic address rotation, which can be compensated for with almost no extra hardware.
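The cyclic address rotation can be seen in a small software model. The sketch below assumes a trellis with 2^v states in which state s moves to (2s + bit) mod 2^v, so that states j and j + 2^(v-1) are the two predecessors of states 2j and 2j + 1. Under that assumption, writing each butterfly's results back into the two locations it just read leaves the metric of state s at address ror(s, t) after t steps, where ror is a v-bit right rotation. The function names and the `branch(prev, nxt)` callback are hypothetical and not taken from the paper.

```python
def ror(x, k, v):
    """Rotate the v-bit word x right by k positions."""
    k %= v
    mask = (1 << v) - 1
    return ((x >> k) | (x << (v - k))) & mask


def acs_reference(old, v, branch):
    """Add-compare-select step using a second buffer (no in-place update)."""
    half = 1 << (v - 1)
    new = [0] * (1 << v)
    for j in range(half):
        for nxt in (2 * j, 2 * j + 1):
            new[nxt] = min(old[j] + branch(j, nxt),
                           old[j + half] + branch(j + half, nxt))
    return new


def acs_in_place(mem, step, v, branch):
    """Same step in a single buffer; `step` counts the updates done so far.

    Before the call, the metric of state s sits at address ror(s, step);
    afterwards it sits at ror(s, step + 1).
    """
    half = 1 << (v - 1)
    for j in range(half):
        a0 = ror(j, step, v)          # where state j currently lives
        a1 = ror(j + half, step, v)   # where state j + half currently lives
        m0, m1 = mem[a0], mem[a1]
        # New metrics for states 2j and 2j + 1 overwrite the two old ones.
        mem[a0] = min(m0 + branch(j, 2 * j), m1 + branch(j + half, 2 * j))
        mem[a1] = min(m0 + branch(j, 2 * j + 1),
                      m1 + branch(j + half, 2 * j + 1))
```

Reading state s after t steps then only requires rotating its address, `mem[ror(s, t, v)]`, which is why the compensation is cheap in this model.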
international conference on electronics, circuits, and systems | 2012
Lin Bai; Patrick Maechler; Michael Muehlberghuber; Hubert Kaeslin
Compressed sensing allows sparse signals sampled at sub-Nyquist rates to be reconstructed. However, reconstruction of the original signal requires high computational effort, even for problems of moderate size. Especially for applications with real-time requirements, software realizations are not fast enough. We therefore present generic high-speed FPGA implementations of two fast reconstruction algorithms: orthogonal matching pursuit (OMP) and approximate message passing (AMP). Our implementations also support less sparse signals, which makes them suitable for, e.g., image reconstruction. The two implementations are optimized for highly parallel processing on FPGAs and have similar hardware structures, which allows comparisons in terms of resource usage and performance.