David Kammler
RWTH Aachen University
Publications
Featured research published by David Kammler.
Cryptographic Hardware and Embedded Systems | 2009
David Kammler; Diandian Zhang; Peter Schwabe; Hanno Scharwaechter; Markus Langenberg; Dominik Auras; Gerd Ascheid; Rudolf Mathar
This paper presents a design-space exploration of an application-specific instruction-set processor (ASIP) for the computation of various cryptographic pairings over Barreto-Naehrig curves (BN curves). Cryptographic pairings are based on elliptic curves over finite fields (in the case of BN curves, a field $\mathbb{F}_p$ of large prime order $p$). Efficient arithmetic in these fields is crucial for fast computation of pairings. Moreover, computing cryptographic pairings is much more complex than elliptic-curve cryptography (ECC) in general. Therefore, we facilitate programming of the proposed ASIP by providing a C compiler. To speed up $\mathbb{F}_p$ arithmetic, a RISC core is extended with additional scalable functional units. Because the resulting speedup can be limited by memory throughput, the use of multiple data-memory banks is proposed. The presented design needs 15.8 ms to compute the Optimal-Ate pairing over a 256-bit BN curve at 338 MHz when implemented with a 130 nm standard-cell library. The processor core requires 97 kGates, making it suitable for use in embedded systems.
IEEE Transactions on Circuits and Systems II: Express Briefs | 2007
Sergio Saponara; Luca Fanucci; Stefano Marsi; Giovanni Ramponi; David Kammler; Ernst Martin Witte
This brief presents an application-specific instruction-set processor (ASIP) for real-time Retinex image and video filtering. Design optimizations are addressed at the algorithmic and architectural levels, the latter including a dedicated memory structure, an adapted pipeline, bypasses, a custom address generator, and special looping structures. Synthesized in CMOS technology, the ASIP stands out for its better energy-flexibility trade-off compared with reference ASIC and digital signal processor (DSP) Retinex implementations.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 2008
Kingshuk Karuri; Anupam Chattopadhyay; Xiaolin Chen; David Kammler; Ling Hao; Rainer Leupers; Heinrich Meyr; Gerd Ascheid
In recent years, growing application complexity and rising design and mask costs have compelled embedded system designers to increasingly consider partially reconfigurable application-specific instruction-set processors (rASIPs), which combine a programmable base processor with a reconfigurable fabric. Although such processors promise an excellent balance between performance and flexibility, their design remains a challenging task. The key to the successful design of an rASIP is combined architecture exploration of all three major components: the programmable core, the reconfigurable fabric, and the interfaces between them. This work presents a design flow that supports fast architecture exploration for rASIPs. The design flow is centered on a unified description of an entire rASIP in an architecture description language (ADL). This ADL description enables consistent modeling and exploration of all three components of an rASIP through automatic generation of the software tools (compiler tool chain and instruction-set simulator) and the RTL hardware model. The generated software tools and RTL model can be used either for the final implementation of the rASIP or as a pre-optimized starting point that can be hand-optimized afterward. The design flow is further enhanced by a number of automatic application analysis tools, including a fine-grained application profiler, an instruction-set extension (ISE) generator, and a data-path mapper for coarse-grained reconfigurable architectures (CGRAs). We present case studies on embedded benchmarks to show how the design space exploration process helps to efficiently design an application-domain-specific rASIP.
Secure Software Integration and Reliability Improvement | 2009
David Kammler; Junqing Guan; Gerd Ascheid; Rainer Leupers; Heinrich Meyr
This paper presents a complete framework for Verilog-based fault injection and evaluation. In contrast to existing approaches, the proposed solution is the first based on the Verilog Procedural Interface (VPI). Because the VPI is standardized, the framework is independent of the simulator used, unlike techniques based on simulator commands. Additionally, unlike techniques that modify the Verilog code for fault injection, it does not require recompilation for different fault-injection experiments. The feasibility of the VPI-based approach is shown in a case study.
Personal, Indoor and Mobile Radio Communications | 2009
Chun-Hao Liao; I-Wei Lai; Konstantinos Nikitopoulos; Filippo Borlenghi; David Kammler; Martin Witte; Dan Zhang; Tzi-Dar Chiueh; Gerd Ascheid; Heinrich Meyr
Using the Schnorr-Euchner (SE) order for soft-input sphere decoders is inefficient to implement, because it requires exhaustive calculation and sorting of the partial metrics of all constellation points. Instead, low-complexity methods can be applied by separating the partial metric into channel information and a priori information and enumerating based on only one of them. Building on such an orthogonalization, this paper presents an algorithm that effectively combines the two enumerations to deliver an order close to the SE order. Mathematical analyses and simulation results demonstrate that this is the first algorithm that allows a low-complexity implementation with optimal error-rate performance for any number of iterations.
Rapid System Prototyping | 2005
Oliver Schliebusch; Anupam Chattopadhyay; Ernst Martin Witte; David Kammler; Gerd Ascheid; Rainer Leupers; Heinrich Meyr
Architecture description languages (ADLs) are becoming popular for speeding up the development of complex SoC designs by performing design space exploration at a higher level of abstraction. Traditionally, this increase in abstraction comes at the cost of lower performance in the final application-specific instruction-set processor (ASIP) implementation, which is generated automatically from the ADL. Novel optimization techniques for high-level synthesis from ADLs are therefore urgently needed to compensate for this loss of performance. Two important aspects of such optimizations are the efficient use of the structural information available in high-level architecture descriptions and the careful pruning of the overhead introduced by mapping from ADL to register transfer level (RTL). In this paper, we present two high-level optimization techniques: path sharing and decision minimization. These techniques are shown to be at least two orders of magnitude lower in complexity than similar optimizations during gate-level synthesis. The optimizations are tested on a RISC architecture, a VLIW architecture, and two industrial embedded processors, the Motorola M68HC11 and the Infineon ICORE. The results indicate a significant improvement in overall performance.
Design, Automation, and Test in Europe | 2007
Anupam Chattopadhyay; W. Ahmed; K. Karuri; David Kammler; Rainer Leupers; Gerd Ascheid; Heinrich Meyr
Asia and South Pacific Design Automation Conference | 2005
Oliver Schliebusch; Anupam Chattopadhyay; David Kammler; Gerd Ascheid; Rainer Leupers; Heinrich Meyr; Tim Kogel
Military Communications Conference | 2009
Venkatesh Ramakrishnan; Ernst Martin Witte; Torsten Kempf; David Kammler; Gerd Ascheid; Rainer Leupers; Heinrich Meyr; Marc Adrat; Markus Antweiler
Field-Programmable Custom Computing Machines | 2012
Xiaolin Chen; Andreas Minwegen; Yahia Hassan; David Kammler; Shuai Li; Torsten Kempf; Anupam Chattopadhyay; Gerd Ascheid