Publication


Featured research published by Theo Kluter.


IEEE Journal of Solid-State Circuits | 2008

A 128×128 Single-Photon Image Sensor With Column-Level 10-Bit Time-to-Digital Converter Array

Cristiano Niclass; Claudio Favi; Theo Kluter; Marek Gersbach; Edoardo Charbon

An imager for time-resolved optical sensing was fabricated in CMOS technology. The sensor comprises an array of 128×128 single-photon pixels, a bank of 32 time-to-digital converters, and a 7.68 Gbps readout system. Thanks to the outstanding timing precision of single-photon avalanche diodes and the optimized measurement circuitry, a typical resolution of 97 ps was achieved within a range of 100 ns. To the best of our knowledge, this imager is the first fully integrated system for photon time-of-arrival evaluation. Applications include 3-D imaging, optical rangefinding, fast fluorescence lifetime imaging, imaging of extremely fast phenomena, and, more generally, imaging based on time-correlated single-photon counting. When operated as an optical rangefinder, this design has enabled us to reconstruct 3-D scenes with millimetric precision under extremely low signal exposure. A laser source was used to illuminate the scene up to 3.75 m with an average power of 1 mW, a field of view of 5°, and under 150 lux of constant background light. Accurate distance measurements were repeatedly achieved with a short integration time of 50 ms, even when signal photon count rates as low as a few hundred photons per second were available.


IEEE Journal of Solid-State Circuits | 2009

Single-Photon Synchronous Detection

Cristiano Niclass; Claudio Favi; Theo Kluter; Frédéric Monnier; Edoardo Charbon

Phase and intensity of light are detected simultaneously using a fully digital imaging technique: single-photon synchronous detection. This approach has been theoretically and experimentally investigated in this paper. We designed a fully integrated camera implementing the new technique that was fabricated in a 0.35 μm CMOS technology. The camera demonstrator features a modulated light source, so as to independently capture the time-of-flight of the photons reflected by a target, thereby reconstructing a depth map of the scene. The camera also enables image enhancement of 2D scenes when used in passive mode, where differential maps of the reflection patterns are the basis for advanced image processing algorithms. Extensive testing has shown the suitability of the technique and confirmed phase accuracy predictions. Experimental results showed that the proposed rangefinder method is effective. Distance measurement performance was characterized with a maximum nonlinearity error lower than 12 cm within a range of a few meters. In the same range, the maximum repeatability error was 3.8 cm.


International Solid-State Circuits Conference | 2008

A 128×128 Single-Photon Imager with on-Chip Column-Level 10b Time-to-Digital Converter Array Capable of 97ps Resolution

Cristiano Niclass; Claudio Favi; Theo Kluter; Marek Gersbach; Edoardo Charbon

We present an array of 128×128 highly miniaturized single-photon avalanche diode (SPAD) pixels with a bank of 32 time-to-digital converters (TDCs) on chip. A decoder selects a 128-pixel row. Every group of 4 pixels in the row shares a TDC based on an event-driven mechanism. As a result, row-wise parallel acquisition is obtained with a low number of TDCs. Because of the outstanding timing precision of SPADs and an optimized TDC design, a typical resolution of 97 ps is achieved within a range of 100 ns (10 b) at a maximum rate of 10 MS/s per TDC. The TDC bank exhibits a DNL of 0.08 LSB and an INL of 1.89 LSB.
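As a rough illustration of the figures quoted in this abstract, the sketch below converts a 10-bit TDC code into a time interval and a one-way target distance. It assumes that the 100 ns range is divided evenly over all 1024 codes (about 97.7 ps per code, in line with the quoted 97 ps) and that the target distance is half the round-trip optical path. The constant and function names are hypothetical and are not taken from the paper.

```python
# Speed of light in vacuum, metres per second (exact SI value).
C_M_PER_S = 299_792_458.0

TDC_BITS = 10          # converter resolution quoted in the abstract
TDC_RANGE_S = 100e-9   # full-scale range quoted in the abstract

# Assumption: the full-scale range is split evenly over all 2**10 codes.
LSB_S = TDC_RANGE_S / (1 << TDC_BITS)


def code_to_time(code: int) -> float:
    """Convert a raw TDC code to a time interval in seconds."""
    if not 0 <= code < (1 << TDC_BITS):
        raise ValueError("code out of range for a 10-bit TDC")
    return code * LSB_S


def code_to_distance(code: int) -> float:
    """Convert a raw TDC code to a one-way target distance in metres.

    Assumes the measured interval is the round-trip time of flight,
    so the distance is half the optical path length.
    """
    return C_M_PER_S * code_to_time(code) / 2.0


if __name__ == "__main__":
    print(f"time per code:     {LSB_S * 1e12:.2f} ps")
    print(f"distance per code: {code_to_distance(1) * 1e3:.2f} mm")
    print(f"code 256:          {code_to_distance(256):.3f} m")
```

Under these assumptions one code step corresponds to roughly 1.5 cm of distance, so finer distance precision would have to come from combining many photon arrival times rather than from a single conversion.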


Symposium on Computer Arithmetic | 2005

Efficient mapping of addition recurrence algorithms in CMOS

Bart R. Zeydel; Theo Kluter; Vojin G. Oklobdzija

Efficient adder design requires proper selection of a recurrence algorithm and its realization. Each of the algorithms (Weinberger's, Ling's, and Doran's) was analyzed for its flexibility in representation and suitability for realization in CMOS. We describe general techniques for developing efficient realizations based on CMOS technology constraints when using Ling's algorithm. From these techniques we propose two high-performance realizations that achieve a 1 FO4 delay improvement at the same energy and a 50% energy reduction at the same delay compared with existing Ling and Weinberger designs.


Design Automation Conference | 2009

Way Stealing: cache-assisted automatic instruction set extensions

Theo Kluter; Philip Brisk; Paolo Ienne; Edoardo Charbon

This paper introduces way stealing, a simple architectural modification to a cache-based processor to increase data bandwidth to and from application-specific instruction set extensions (ISEs). Way stealing provides more bandwidth to the ISE logic than the register file alone and does not require expensive coherence protocols, as it does not add memory elements to the processor. When enhanced with way stealing, ISE identification flows detect more opportunities for acceleration than prior methods; consequently, way stealing can accelerate applications by up to 3.7×, whilst reducing the memory sub-system energy consumption by up to 67%, despite data-cache related restrictions.


International Conference on Hardware/Software Codesign and System Synthesis | 2008

Speculative DMA for architecturally visible storage in instruction set extensions

Theo Kluter; Philip Brisk; Paolo Ienne; Edoardo Charbon

Instruction set extensions (ISEs) can accelerate embedded processor performance. Many algorithms for ISE generation have shown good potential; some of them have recently been expanded to include Architecturally Visible Storage (AVS) - compiler-controlled memories, similar to scratchpads, that are accessible only to ISEs. To achieve a speedup using AVS, Direct Memory Access (DMA) transfers are required to move data from the main memory to the AVS; unfortunately, this creates coherence problems between the AVS and the cache, which previous methods for ISEs with AVS failed to address; additionally, these methods need to leave many conservative DMA transfers in place, whose execution significantly limits the achievable speedup. This paper presents a memory coherence scheme for ISEs with AVS, which can ensure execution correctness and memory consistency with minimal area overhead. We also present a method that speculatively removes redundant DMA transfers. Cycle-accurate experimental results were obtained using an FPGA-emulation platform. These results show that the application-specific instruction-set extended processors with speculative DMA-enhanced AVS gain significantly over previous techniques, despite the overhead of the coherence mechanism.


European Solid-State Circuits Conference | 2008

Single-photon synchronous detection

Cristiano Niclass; Claudio Favi; Theo Kluter; Frédéric Monnier; Edoardo Charbon

A novel imaging technique is proposed for fully digital detection of phase and intensity of light. A fully integrated camera implementing the new technique was fabricated in a 0.35 μm CMOS technology. When coupled to a modulated light source, the camera can be used to accurately and rapidly reconstruct a 3D scene by evaluating the time-of-flight of the light reflected by a target. In passive mode, it allows building differential phase maps of reflection patterns for image enhancement purposes. Tests show the suitability of the technique and confirm phase accuracy predictions.


High Performance Embedded Architectures and Compilers | 2010

Virtual ways: efficient coherence for architecturally visible storage in automatic instruction set extensions

Theo Kluter; Samuel Burri; Philip Brisk; Edoardo Charbon; Paolo Ienne

Customizable processors augmented with application-specific Instruction Set Extensions (ISEs) have begun to gain traction in recent years. The most effective ISEs include Architecturally Visible Storage (AVS), compiler-controlled memories accessible exclusively to the ISEs. Unfortunately, the usage of AVS memories creates a coherence problem with the data cache. A multiprocessor coherence protocol can solve the problem; however, this is an expensive solution when applied in a uniprocessor context. Instead, we can solve the problem by modifying the cache controller so that the AVS memories function as extra ways of the cache with respect to coherence, but are not generally accessible as extra ways for use under normal software execution. This solution, which we call Virtual Ways, is less costly than a hardware coherence protocol and eliminates coherence messages from the system bus, which improves energy consumption. Moreover, eliminating these messages makes Virtual Ways significantly more robust to performance degradation when there is a significant disparity in clock frequency between the processor and main memory.


Symposium on Application Specific Processors | 2009

Introducing control-flow inclusion to support pipelining in custom instruction set extensions

Marcela Zuluaga; Theo Kluter; Philip Brisk; Nigel P. Topham; Paolo Ienne

Multi-cycle instruction set extensions (ISEs) can be pipelined in order to increase their throughput; however, typical program traces seldom contain consecutive calls to the same ISE that would allow this temporal parallelism. Often, there are intermittent calls to branch instructions, at a minimum, that prevent the pipelined execution of subsequent calls to the same ISE within a loop. What is needed are ISEs that cover an entire loop body, which can create a stream of repeated calls to the same ISE during program execution; this, in turn, permits the use of hardware pipelining. To address this concern, we introduce a new type of ISE that borrows ideas from zero-overhead loop instructions to permit pipelined execution of loops. To further expose instruction-level parallelism, the ISE supports loops whose bodies form hyperblocks, which are regions of program control flow that have multiple exits (including loop iterations and break points within loops). These ISEs broaden the scope of instruction-level parallelism and obtain higher speedups compared to traditional ISEs, primarily through pipelining, the exploitation of spatial parallelism, and a reduction in the overhead of control flow statements and branches.


International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation | 2012

Counting stream registers: an efficient and effective snoop filter architecture

Aanjhan Ranganathan; Ali Galip Bayrak; Theo Kluter; Philip Brisk; Edoardo Charbon; Paolo Ienne

We introduce a counting stream register snoop filter, which improves the performance of existing snoop filters based on stream registers. Over time, this class of snoop filters loses the ability to filter memory addresses that have been loaded, and then evicted, from the caches that are filtered; they include cache wrap detection logic, which resets the filter whenever the contents of the cache have been completely replaced. The counting stream register snoop filter introduced here replaces the cache wrap detection logic with a direct-mapped update unit and augments each stream register with a counter, which acts as a validity checker; loading new data into the cache increments the counter, while replacements, snoopy invalidations, and evictions decrement it. A cache wrap is detected whenever the counter reaches zero. Our experimental evaluation shows that the counting stream register snoop filter architecture improves the accuracy compared to traditional stream register snoop filters for representative embedded workloads.
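The counter mechanism described in this abstract can be sketched as a small behavioural model. This is a simplified illustration, not the authors' design: the base/mask representation of a stream register is assumed from the general description of stream-register snoop filters, the direct-mapped update unit is omitted, and the class and method names are hypothetical.

```python
class CountingStreamRegister:
    """Toy model of a single stream register with a validity counter."""

    def __init__(self, addr_bits: int = 32) -> None:
        self.full_mask = (1 << addr_bits) - 1
        self.base = 0
        self.mask = self.full_mask  # a set bit means "must match base"
        self.count = 0              # cache lines currently tracked

    def on_load(self, line_addr: int) -> None:
        """A line tracked by this register is loaded into the cache."""
        if self.count == 0:
            # Register is empty: start a fresh, exact stream.
            self.base = line_addr
            self.mask = self.full_mask
        else:
            # Widen the stream: stop comparing bits that differ from base.
            self.mask &= ~(self.base ^ line_addr) & self.full_mask
        self.count += 1

    def on_remove(self) -> None:
        """A tracked line leaves the cache (replacement, snoopy
        invalidation, or eviction)."""
        if self.count == 0:
            raise RuntimeError("no tracked lines left to remove")
        self.count -= 1
        # When count reaches zero the register holds no valid lines,
        # which plays the role of cache wrap detection in this model.

    def may_contain(self, snoop_addr: int) -> bool:
        """Return False only if the snooped line is certainly not cached."""
        if self.count == 0:
            return False
        return (snoop_addr & self.mask) == (self.base & self.mask)


if __name__ == "__main__":
    reg = CountingStreamRegister()
    reg.on_load(0x1000)
    reg.on_load(0x1040)
    print(reg.may_contain(0x1040), reg.may_contain(0x2000))
    reg.on_remove()
    reg.on_remove()
    print(reg.may_contain(0x1000))
```

In this model the filter regains full precision as soon as the counter returns to zero, because the next load re-initialises the base and mask instead of widening an already stale stream.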

Collaboration


Top Co-Authors

Edoardo Charbon
École Polytechnique Fédérale de Lausanne

Paolo Ienne
École Polytechnique Fédérale de Lausanne

Philip Brisk
University of California

Claudio Favi
École Normale Supérieure

Marek Gersbach
École Polytechnique Fédérale de Lausanne

Ali Galip Bayrak
École Polytechnique Fédérale de Lausanne

Samuel Burri
École Polytechnique Fédérale de Lausanne