David Kammler | Researchain

Publication

Featured research published by David Kammler.

cryptographic hardware and embedded systems | 2009

Designing an ASIP for Cryptographic Pairings over Barreto-Naehrig Curves

David Kammler; Diandian Zhang; Peter Schwabe; Hanno Scharwaechter; Markus Langenberg; Dominik Auras; Gerd Ascheid; Rudolf Mathar

This paper presents a design-space exploration of an application-specific instruction-set processor (ASIP) for the computation of various cryptographic pairings over Barreto-Naehrig curves (BN curves). Cryptographic pairings are based on elliptic curves over finite fields, in the case of BN curves a field $\mathbb{F}_p$ of large prime order $p$. Efficient arithmetic in these fields is crucial for fast computation of pairings. Moreover, computation of cryptographic pairings is much more complex than elliptic-curve cryptography (ECC) in general. Therefore, we facilitate programming of the proposed ASIP by providing a C compiler. In order to speed up $\mathbb{F}_p$ arithmetic, a RISC core is extended with additional scalable functional units. Because the resulting speedup can be limited by the memory throughput, utilization of multiple data-memory banks is proposed. The presented design needs 15.8 ms for the computation of the Optimal-Ate pairing over a 256-bit BN curve at 338 MHz implemented with a 130 nm standard cell library. The processor core consumes 97 kGates making it suitable for the use in embedded systems.

IEEE Transactions on Circuits and Systems II: Express Briefs | 2007

Application-Specific Instruction-Set Processor for Retinex-Like Image and Video Processing

Sergio Saponara; Luca Fanucci; Stefano Marsi; Giovanni Ramponi; David Kammler; Ernst Martin Witte

This brief presents an application-specific instruction-set processor (ASIP) for real-time Retinex image and video filtering. Design optimizations are addressed at algorithmic and architectural levels, the latter including a dedicated memory structure, an adapted pipeline, bypasses, a custom address generator and special looping structures. Synthesized in CMOS technology, the ASIP stands for its better energy-flexibility tradeoff versus reference ASIC and digital signal processing Retinex implementations.

IEEE Transactions on Very Large Scale Integration Systems | 2008

A Design Flow for Architecture Exploration and Implementation of Partially Reconfigurable Processors

Kingshuk Karuri; Anupam Chattopadhyay; Xiaolin Chen; David Kammler; Ling Hao; Rainer Leupers; Heinrich Meyr; Gerd Ascheid

During the last years, the growing application complexity, design, and mask costs have compelled embedded system designers to increasingly consider partially reconfigurable application-specific instruction set processors (rASIPs) which combine a programmable base processor with a reconfigurable fabric. Although such processors promise to deliver excellent balance between performance and flexibility, their design remains a challenging task. The key to the successful design of a rASIP is combined architecture exploration of all the three major components: the programmable core, the reconfigurable fabric, and the interfaces between these two. This work presents a design flow that supports fast architecture exploration for rASIPs. The design flow is centered around a unified description of an entire rASIP in an architecture description language (ADL). This ADL description facilitates consistent modeling and exploration of all three components of a rASIP through automatic generation of the software tools (compiler tool chain and instruction set simulator) and the RTL hardware model. The generated software tools and the RTL model can be used either for final implementation of the rASIP or can serve as a preoptimized starting point for implementation that can be hand optimized afterward. The design flow is further enhanced by a number of automatic application analysis tools, including a fine-grained application profiler, an instruction set extension (ISE) generator, and a data path mapper for coarse grained reconfigurable architectures (CGRAs). We present some case studies on embedded benchmarks to show how the design space exploration process helps to efficiently design an application domain specific rASIP.

secure software integration and reliability improvement | 2009

A Fast and Flexible Platform for Fault Injection and Evaluation in Verilog-Based Simulations

David Kammler; Junqing Guan; Gerd Ascheid; Rainer Leupers; Heinrich Meyr

This paper presents a complete framework for Verilog-based fault injection and evaluation. In contrast to existing approaches, the proposed solution is the first one based on the Verilog programming interface (VPI). Due to standardization of the VPI, the framework is, in contrast to simulator command based techniques, independent from the used simulator. Additionally, it does not require recompilation for different fault injection experiments like techniques modifying the Verilog code for fault injection. The feasibility of the VPI-based approach is shown in a case study.

personal, indoor and mobile radio communications | 2009

Combining orthogonalized partial metrics: Efficient enumeration for soft-input sphere decoder

Chun-Hao Liao; I-Wei Lai; Konstantinos Nikitopoulos; Filippo Borlenghi; David Kammler; Martin Witte; Dan Zhang; Tzi-Dar Chiueh; Gerd Ascheid; Heinrich Meyr

Using the Schnorr-Euchner (SE) order for soft-input sphere decoders is inefficient for implementation, because it requires exhaustive calculation and sorting of partial metrics of all constellation points. Instead, low-complexity methods can be applied by separating the partial metric into channel information and a priori information and solely enumerating based on one of them. With such an orthogonalization, this paper presents an algorithm that effectively combines these two enumerations to deliver an order close to the SE one. Mathematical analyses and simulation results demonstrate that this is the first algorithm allowing for a low-complexity implementation with optimal error rate performance for any number of iterations.

rapid system prototyping | 2005

Optimization techniques for ADL-driven RTL processor synthesis

Oliver Schliebusch; Anupam Chattopadhyay; Ernst Martin Witte; David Kammler; Gerd Ascheid; Rainer Leupers; Heinrich Meyr

Nowadays, architecture description languages (ADLs) are becoming popular for speeding up the development of complex SoC design, by performing design space exploration at a higher level of abstraction. This increase in the abstraction level traditionally comes at the cost of low performance of the final application specific instruction-set processor (ASIP) implementation, which is generated automatically from the ADL. There is a pressing need for novel optimization techniques for high level synthesis from ADLs, to compensate for this loss of performance. Two important aspects of these optimizations are the efficient usage of available structural information in the high level architecture descriptions and prudent pruning of overhead, introduced by mapping from ADL to register transfer level (RTL). In this paper, we present two high level optimization techniques, path sharing and decision minimization. These optimization techniques are shown to be of lower complexity, by at least two orders, compared to similar optimization during gate-level synthesis. The optimizations are tested for a RISC architecture, a VLIW architecture and two industrial embedded processors, Motorola M68HC11 and Infineon ICORE. The results indicate a significant improvement in overall performance.

design, automation, and test in europe | 2007

Design space exploration of partially re-configurable embedded processors

Anupam Chattopadhyay; W. Ahmed; K. Karari; David Kammler; Rainer Leupers; Gerd Ascheid; Heinrich Meyr

asia and south pacific design automation conference | 2005

A framework for automated and optimized ASIP implementation supporting multiple hardware description languages

Oliver Schliebusch; Anupam Chattopadhyay; David Kammler; Gerd Ascheid; Rainer Leupers; Heinrich Meyr; Tim Kogel

military communications conference | 2009

Efficient and portable SDR waveform development: The Nucleus concept

Venkatesh Ramakrishnan; Ernst Martin Witte; Torsten Kempf; David Kammler; Gerd Ascheid; Rainer Leupers; Heinrich Meyr; Marc Adrat; Markus Antweiler

field-programmable custom computing machines | 2012

FLEXDET: Flexible, Efficient Multi-Mode MIMO Detection Using Reconfigurable ASIP

Xiaolin Chen; Andreas Minwegen; Yahia Hassan; David Kammler; Shuai Li; Torsten Kempf; Anupam Chattopadhyay; Gerd Ascheid
