Andreas Becher

University of Erlangen-Nuremberg

Publication

Featured research published by Andreas Becher.

field programmable logic and applications | 2014

Energy-aware SQL query acceleration through FPGA-based dynamic partial reconfiguration

Andreas Becher; Florian Bauer; Daniel Ziener; Jürgen Teich

In this paper, we propose an approach for energy-aware FPGA-based query acceleration for databases on embedded devices. After an incoming query has been analyzed, a query-specific hardware accelerator is generated on the fly and loaded onto the FPGA for subsequent query execution using partial dynamic reconfiguration. For each SQL query operation, a pre-synthesized partial bitstream implementation exists in a module library. This library includes modules for all major SQL operations, such as restrictions and aggregations, as well as more complex operations such as joins and sorts. Implemented on the embedded low-energy system-on-chip (SoC) platform Xilinx Zynq, this flexible FPGA-based query accelerator achieves SQL query processing speeds comparable to those of high-end database servers, but at much lower energy consumption. Experimental results indicate that the proposed architecture may reduce the consumed energy to just 5% of the energy needed by an in-memory database system running on an x86-based server at equal throughput for the respective benchmarks.

adaptive hardware and systems | 2015

Reliability of space-grade vs. COTS SRAM-based FPGA in N-modular redundancy

Robért Glein; Florian Rittner; Andreas Becher; Daniel Ziener; Jürgen Frickel; Jürgen Teich; Albert Heuberger

In this paper, we evaluate the suitability of different SRAM-based FPGAs for harsh radiation environments (e.g., space). In particular, we compare the space-grade, radiation-hardened-by-design Virtex-5QV (XQR5VFX130) with the commercial off-the-shelf (COTS) Kintex-7 (XC7K325T) from Xilinx. The advantages of the latter device are 2.5 times the resources of the space-grade FPGA, faster switching times, lower power consumption, and support for modern design tools. We focus on resource consumption as well as on reliability as a function of single event upset (SEU) rates for a geostationary earth orbit satellite application, the Heinrich Hertz satellite mission. For this mission, we compare different modular redundancy schemes with different voter structures for the qualification of a digital communication receiver. A major drawback of the Kintex-7 is its susceptibility to current-step single event latchups, which pose a risk for space missions. If an external voter cannot be used, we suggest triple modular redundancy (TMR) with a single voter at the end; in this configuration, the Virtex-5QV is about as reliable as the Kintex-7 in an N-modular redundancy configuration with an external, highly reliable voter.
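The voter structures compared above rest on majority voting. As an illustration only, the textbook model below computes a bitwise 2-of-3 majority vote and the reliability of N-modular redundancy, under the simplifying assumptions of independent module failures and a single voter with its own reliability. It does not model SEU rates, latchups, or the mission-specific analysis of the paper, and the function names are hypothetical.

```python
import math


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant module outputs (TMR voter)."""
    return (a & b) | (a & c) | (b & c)


def nmr_reliability(r_module: float, n: int, r_voter: float = 1.0) -> float:
    """Probability that an N-modular redundant system delivers a correct result.

    Assumes n independent modules, each correct with probability r_module, and a
    single voter that is correct with probability r_voter. The system is correct
    if a strict majority (n // 2 + 1 or more) of the modules and the voter are correct.
    """
    k_min = n // 2 + 1
    p_majority = sum(
        math.comb(n, k) * r_module**k * (1 - r_module) ** (n - k)
        for k in range(k_min, n + 1)
    )
    return r_voter * p_majority


# A single flipped bit (bit 2) in the third module is masked by the voter.
print(bin(majority_vote(0b1010, 0b1010, 0b1110)))  # 0b1010
# For TMR with a perfect voter this reduces to the closed form 3R^2 - 2R^3.
print(nmr_reliability(0.9, 3))
```

Because the single voter sits in series with all modules, its reliability caps the reliability of the whole system, which is one reason the placement and quality of the voter matter in such comparisons.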

field-programmable technology | 2015

A co-design approach for accelerated SQL query processing via FPGA-based data filtering

Andreas Becher; Daniel Ziener; Klaus Meyer-Wegener; Jürgen Teich

In this paper, we present a novel co-designed architecture for high-throughput database query processing. It consists of a highly configurable FPGA-based filter chain with support for arithmetic operations and an alignment unit, which feeds the filtered data directly and in a cache-optimized way to embedded processors that are responsible for joining tables and post-processing. The high-throughput interfaces and parallelism of FPGA implementations are thus combined to provide reduced, cache-aligned data for optimized processor access. As a key component, we introduce a new, highly configurable Bloom filter cascade that relieves the processor of time-consuming hash-value computation and significantly reduces the data for hash joins. We show that this approach may reduce the amount of data to be processed by the processors in typical data-warehouse applications by several orders of magnitude. The proposed co-design has been implemented on the embedded low-energy system-on-chip (SoC) platform Xilinx Zynq. Performance results for standard benchmarks show up to 10× higher throughput compared with a full-featured x86-based processor, at only a fraction of its energy consumption.
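A Bloom filter cascade can be pictured in software as a sequence of membership pre-tests, one per joined dimension table: a fact row reaches the processor only if every filter reports that its join key may be present. The sketch below is an illustration under stated assumptions (the SHA-256-based hash construction, the filter sizes, and all names are hypothetical), not the configurable hardware cascade of the paper.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, a tunable rate of false positives."""

    def __init__(self, n_bits: int, n_hashes: int) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, key):
        data = repr(key).encode()
        for i in range(self.n_hashes):
            # Derive several hash functions by prefixing a seed byte (assumes n_hashes < 256).
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def add(self, key) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key) -> bool:
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(key))


def build_filter(keys, n_bits=1024, n_hashes=3) -> BloomFilter:
    """Build a filter over the join keys of one (hypothetical) dimension table."""
    bf = BloomFilter(n_bits, n_hashes)
    for k in keys:
        bf.add(k)
    return bf


def cascade_filter(rows, stages):
    """Pass a fact row only if every (column, filter) stage may contain its join key."""
    for row in rows:
        if all(bf.might_contain(row[col]) for col, bf in stages):
            yield row


customers = build_filter([1, 2, 3])
dates = build_filter([20240101, 20240102])
facts = [
    {"cust": 1, "date": 20240101, "amount": 10},
    {"cust": 7, "date": 20240101, "amount": 5},
    {"cust": 2, "date": 20231231, "amount": 8},
]
candidates = list(cascade_filter(facts, [("cust", customers), ("date", dates)]))
print(candidates)
```

Because Bloom filters produce no false negatives, every row that actually joins survives the cascade; any false positive is discarded later by the exact hash join on the processor.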

field programmable gate arrays | 2016

FPGA-Based Dynamically Reconfigurable SQL Query Processing

Daniel Ziener; Florian Bauer; Andreas Becher; Christopher Dennl; Klaus Meyer-Wegener; Ute Schurfeld; Jürgen Teich; Jörg Stephan Vogt; Helmut H. Weber

In this article, we propose an FPGA-based SQL query processing approach that exploits the partial dynamic reconfiguration capabilities of modern FPGAs. After an incoming query has been analyzed, a query-specific hardware processing unit is generated on the fly and loaded onto the FPGA for immediate query execution. For each query, a specialized hardware accelerator pipeline is composed and configured on the FPGA from a set of presynthesized hardware modules. These partially reconfigurable hardware modules are gathered in a library covering all major SQL operations, such as restrictions and aggregations, as well as more complex operations such as joins and sorts. Moreover, this holistic hardware query processing approach supports different data processing strategies, including row-wise as well as column-wise processing, in order to optimize data communication and processing. This article gives an overview of the proposed query processing methodology and the corresponding module library. Additionally, we introduce a performance analysis that estimates the processing time of a query for different processing strategies and for different communication and processing architecture configurations. With the help of this performance analysis, architectural bottlenecks may be exposed and future optimized architectures, beyond the two prototypes presented here, may be determined.
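Composing a query-specific pipeline from a library of pre-built operator modules resembles chaining stream transformers in software. The sketch below uses Python generators as stand-ins for the presynthesized partial bitstreams; the operator names, the plan format, and `MODULE_LIBRARY` are hypothetical, and neither query parsing nor reconfiguration is modeled.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, int]
Stage = Callable[[Iterable[Row]], Iterator[Row]]

_COMPARE = {
    "<": lambda a, b: a < b,
    "=": lambda a, b: a == b,
    ">": lambda a, b: a > b,
}


def restrict(column: str, op: str, value: int) -> Stage:
    """Restriction module: keep rows that satisfy `column op value`."""
    cmp = _COMPARE[op]
    return lambda rows: (r for r in rows if cmp(r[column], value))


def project(*columns: str) -> Stage:
    """Projection module: keep only the listed columns."""
    return lambda rows: ({c: r[c] for c in columns} for r in rows)


def aggregate_sum(column: str) -> Stage:
    """Aggregation module: emit a single row holding the column sum."""
    return lambda rows: iter([{"sum_" + column: sum(r[column] for r in rows)}])


# Hypothetical "module library": operator name -> module factory.
MODULE_LIBRARY = {"restrict": restrict, "project": project, "sum": aggregate_sum}


def compose(plan: List[Tuple[str, tuple]]) -> Callable[[Iterable[Row]], List[Row]]:
    """Chain the library modules named in the plan into one streaming pipeline."""
    stages = [MODULE_LIBRARY[name](*args) for name, args in plan]

    def pipeline(rows: Iterable[Row]) -> List[Row]:
        for stage in stages:
            rows = stage(rows)
        return list(rows)

    return pipeline


# SELECT SUM(price) FROM t WHERE qty > 2
plan = [("restrict", ("qty", ">", 2)), ("project", ("price",)), ("sum", ("price",))]
table = [{"qty": 1, "price": 10}, {"qty": 3, "price": 20}, {"qty": 5, "price": 7}]
print(compose(plan)(table))  # [{'sum_price': 27}]
```

Each stage consumes the previous stage's output stream, much as data flows through the chained hardware modules of the accelerator pipeline; the row-wise processing shown here is only one of the strategies the article discusses.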

field programmable custom computing machines | 2016

A LUT-Based Approximate Adder

Andreas Becher; Jorge Echavarria; Daniel Ziener; Stefan Wildermann; Jürgen Teich

In this paper, we propose a novel approximate adder structure for LUT-based FPGA technology. Compared with a full-featured, accurate carry-ripple adder, the longest path is significantly shortened, which enables a higher clock frequency. By using the proposed adder structure, the throughput of an FPGA-based implementation can be significantly increased. At the same time, the resulting average error can be reduced compared with similar approaches designed for ASIC implementations.
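To make approximate addition concrete, the sketch below implements the lower-part OR adder (LOA), a well-known approximate adder from the ASIC literature, and measures its mean error distance exhaustively. It is a generic stand-in chosen for illustration, not the LUT-based structure proposed in the paper. In LOA, the carry chain spans only the exact upper part, which is what shortens the critical path.

```python
def loa_add(a: int, b: int, approx_bits: int) -> int:
    """Lower-part OR adder (LOA) for unsigned integers.

    The lowest `approx_bits` bits are approximated by a bitwise OR (no carry chain);
    the upper bits are added exactly, with a carry-in guessed from the AND of the
    most significant bits of the two lower parts.
    """
    low = (a | b) & ((1 << approx_bits) - 1)
    carry = 0
    if approx_bits > 0:
        top = approx_bits - 1
        carry = (a >> top) & (b >> top) & 1
    high = (a >> approx_bits) + (b >> approx_bits) + carry
    return (high << approx_bits) | low


def mean_error_distance(width: int, approx_bits: int) -> float:
    """Average |approximate - exact| over all pairs of `width`-bit unsigned inputs."""
    n = 1 << width
    total = sum(
        abs(loa_add(a, b, approx_bits) - (a + b)) for a in range(n) for b in range(n)
    )
    return total / (n * n)


print(loa_add(5, 3, 2))  # 7 (exact sum: 8)
for k in range(4):
    print(k, mean_error_distance(8, k))
```

Increasing `approx_bits` shortens the carry chain further but typically raises the average error, which is the trade-off between clock frequency and accuracy described above.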

data management on new hardware | 2018

Optimistic regular expression matching on FPGAs for near-data processing

Andreas Becher; Stefan Wildermann; Jürgen Teich

Regular expressions (regex) are the main means of searching for specific patterns in the vast amounts of stored textual information. Consequently, various hardware accelerator designs have been proposed that enable memory-bound regex processing. In these designs, the regular expression to be evaluated is translated into a non-deterministic (NFA) or deterministic finite automaton (DFA), which is then mapped onto the hardware. The available hardware resources determine the maximum size (in terms of the number of states and transitions) of the supported automata. However, regular expressions may be arbitrarily complex. As a remedy, we propose optimistic regular expression evaluation, which prunes the DFA of a regex so that it fits the available hardware resources. Consequently, we obtain a DFA with not only matching states but also an uncertain state. Texts marked as uncertain have to be re-evaluated in software. This is particularly suited to near-data processing, where the optimistic regex evaluation is performed near the data source, thus reducing the overall amount of data to be transmitted to the requesting host. A prototype is implemented within Google's RE2, so that all regular expressions supported by RE2 are covered by the proposed design. Regular expression evaluation at up to 2.66 GByte/s was achieved on an FPGA-based Zynq SoC.
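The pruning idea can be sketched with a hand-built search DFA: states that do not fit a hypothetical resource budget are dropped, and every transition into a dropped state yields an uncertain verdict that is resolved by a software matcher. The BFS-order pruning policy, the state budget, and all names are assumptions for illustration; the prototype described above is built within RE2, and its pruning strategy may differ. Python's `re` module stands in for the software fallback.

```python
import re
from collections import deque

UNCERTAIN = "UNCERTAIN"
PATTERN = r"ab|xyz"

# Hand-built search DFA for PATTERN: state -> (explicit transitions, default target).
DFA = {
    "S": ({"a": "A", "x": "X"}, "S"),
    "A": ({"b": "M", "a": "A", "x": "X"}, "S"),
    "X": ({"y": "XY", "a": "A", "x": "X"}, "S"),
    "XY": ({"z": "M", "a": "A", "x": "X"}, "S"),
    "M": ({}, "M"),  # match found; absorbing
}
ACCEPTING = {"M"}


def prune(dfa, start, budget):
    """Keep the first `budget` states in BFS order; edges to dropped states go to UNCERTAIN."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        state = queue.popleft()
        order.append(state)
        explicit, default = dfa[state]
        for target in [*explicit.values(), default]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    keep = set(order[:budget])

    def redirect(target):
        return target if target in keep else UNCERTAIN

    return {s: ({c: redirect(t) for c, t in dfa[s][0].items()}, redirect(dfa[s][1])) for s in keep}


def run_pruned(pruned, start, text):
    """Return 'MATCH', 'NO_MATCH', or UNCERTAIN, as a resource-limited matcher would."""
    state = start
    for ch in text:
        explicit, default = pruned[state]
        state = explicit.get(ch, default)
        if state == UNCERTAIN:
            return UNCERTAIN
    return "MATCH" if state in ACCEPTING else "NO_MATCH"


def evaluate(pruned, text):
    """Optimistic evaluation with software re-evaluation of uncertain texts."""
    verdict = run_pruned(pruned, "S", text)
    if verdict == UNCERTAIN:
        return re.search(PATTERN, text) is not None
    return verdict == "MATCH"


pruned = prune(DFA, "S", budget=4)  # BFS order is S, A, X, M, XY, so "XY" is dropped
for text in ["zzab", "hello", "xyz", "xya"]:
    print(text, run_pruned(pruned, "S", text), evaluate(pruned, text))
```

Any run that stays within the kept states behaves exactly as in the full DFA, so definite verdicts are never wrong; only texts that touch the pruned part (here, texts in which `xy` appears before a match has been found) need to be re-evaluated in software.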

reconfigurable computing and fpgas | 2016

Hybrid energy-aware reconfiguration management on Xilinx Zynq SoCs

Andreas Becher; Jutta Pirkl; Achim Herrmann; Jürgen Teich; Stefan Wildermann

Partial reconfiguration is a common technique on FPGA platforms for loading hardware accelerators at runtime without interrupting the rest of the system. One crucial factor is the time needed for reconfiguration, as it affects usability, performance, and energy consumption. Furthermore, many systems have to share partial areas among multiple applications and users. In this paper, we introduce a novel open-source reconfiguration manager for Xilinx Zynq SoCs which a) allows partial area sharing and b) includes a hybrid reconfiguration approach that uses both the Processor Configuration Access Port (PCAP) and the Internal Configuration Access Port (ICAP) in order to minimize reconfiguration time and system energy consumption. We evaluate our design on an example use case and identify the sweet spots between energy consumption and the latency until an accelerator is available. With the hybrid approach, the full configuration after powering on the FPGA can be sped up by up to 64% compared with using the PCAP interface alone.

software and compilers for embedded systems | 2017

Self-Adaptive FPGA-Based Image Processing Filters Using Approximate Arithmetics

Jutta Pirkl; Andreas Becher; Jorge Echavarria; Jürgen Teich; Stefan Wildermann

Approximate computing aims at trading off computational accuracy against improvements in performance, resource utilization, and power consumption by exploiting the ability of many applications to tolerate a certain loss of quality. A key issue is that the impact of approximation depends on the input data as well as on user preferences and environmental conditions. In this context, we investigate the concept of self-adaptive image processing that can autonomously switch between 2D convolution filter operators of different degrees of accuracy by means of partial reconfiguration on Field-Programmable Gate Arrays (FPGAs). Experimental evaluation shows that the dynamic system exploits a given error tolerance better than any static approximation technique, owing to its responsiveness to changes in the input data. Additionally, it provides a user control knob for selecting the desired output quality at runtime via a threshold on the quality metric.
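The input dependence that motivates self-adaptation can be shown with a small software model. The sketch below emulates operators of different accuracy by truncating low-order pixel bits before a 3x3 convolution and picks, per input, the most aggressive level whose mean absolute error stays within a user threshold. The truncation scheme, the error metric, and the use of the exact result as reference are assumptions for illustration; a practical system would likely need a cheaper quality estimate, and on the FPGA each level would correspond to a separately reconfigured operator.

```python
def convolve3x3(img, kernel, drop_bits=0):
    """Valid-region 3x3 convolution; pixel LSBs are truncated to emulate a cheaper operator."""
    mask = ~((1 << drop_bits) - 1)
    h, w = len(img), len(img[0])
    return [
        [
            sum(
                (img[y + dy][x + dx] & mask) * kernel[dy][dx]
                for dy in range(3)
                for dx in range(3)
            )
            for x in range(w - 2)
        ]
        for y in range(h - 2)
    ]


def mean_abs_error(ref, approx):
    diffs = [abs(r - a) for row_r, row_a in zip(ref, approx) for r, a in zip(row_r, row_a)]
    return sum(diffs) / len(diffs)


def select_level(img, kernel, threshold, levels=range(8)):
    """Pick the most aggressive truncation level whose error stays within the threshold."""
    exact = convolve3x3(img, kernel)
    ok = [k for k in levels if mean_abs_error(exact, convolve3x3(img, kernel, k)) <= threshold]
    return max(ok)


BOX = [[1] * 3 for _ in range(3)]
flat = [[128] * 5 for _ in range(5)]    # low-order bits are all zero
bright = [[255] * 5 for _ in range(5)]  # low-order bits are all one
print(select_level(flat, BOX, threshold=10))    # 7
print(select_level(bright, BOX, threshold=10))  # 1
```

Under the same threshold, the flat image tolerates maximal truncation at zero error, whereas the bright image tolerates only one dropped bit, so the best static choice differs from input to input.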

reconfigurable computing and fpgas | 2016

ReOrder: Runtime datapath generation for high-throughput multi-stream processing

Andreas Becher; Stefan Wildermann; Moritz Mühlenthaler; Jürgen Teich

Modern FPGA-based SoCs that tightly couple a CPU and programmable logic enable on-demand hardware acceleration of stream processing by exploiting the available high input and output throughput and the reconfigurability of both software and hardware. In this paper, we present the concept and implementation of a hardware unit called ReOrder that serves as a converter for multiple parallel streams of data read from and written to an accelerator. Our technique and programmable design allow flexible data access and connect different stream processing accelerators independently of the host data layout. To achieve high accelerator throughput, an optimized datapath must be determined according to the accelerator's internal schedule of input and output data. We address an online setting in which either the data layout (e.g., in modern database systems) or the accelerator's operational mode changes dynamically. Therefore, an algorithm is required that can be used at runtime to maintain an optimized datapath configuration. We propose an efficient heuristic algorithm and a corresponding FPGA design that can translate arbitrary (multi-source) data layouts of the connected host system to generate any specified data stream of the accelerator at runtime within milliseconds.
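At its core, converting host data layouts into an accelerator's input stream is a gather: for every word the accelerator expects, the datapath must know which source buffer and offset to read. The sketch below computes such a gather map for row-wise buffers from two hypothetical sources; all names are illustrative, and it does not implement the paper's heuristic for optimizing the hardware datapath.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def build_gather_map(
    layouts: Dict[str, Sequence[str]], schedule: Sequence[Tuple[str, str]]
) -> List[Tuple[str, int]]:
    """For one record, map each accelerator input slot to (source buffer, word offset)."""
    offsets = {src: {field: i for i, field in enumerate(fields)} for src, fields in layouts.items()}
    return [(src, offsets[src][field]) for src, field in schedule]


def reorder_stream(buffers, layouts, schedule, n_records) -> Iterator[int]:
    """Emit words in the order the accelerator consumes them, reading row-wise host buffers."""
    gather = build_gather_map(layouts, schedule)
    stride = {src: len(fields) for src, fields in layouts.items()}
    for r in range(n_records):
        for src, off in gather:
            yield buffers[src][r * stride[src] + off]


layouts = {"orders": ["id", "qty", "price"], "rates": ["id", "tax"]}
buffers = {"orders": [1, 2, 10, 2, 5, 20], "rates": [1, 7, 2, 9]}
schedule = [("orders", "price"), ("rates", "tax"), ("orders", "qty")]
print(list(reorder_stream(buffers, layouts, schedule, n_records=2)))  # [10, 7, 2, 20, 9, 5]
```

When the host layout or the accelerator's mode changes, only the gather map has to be recomputed; finding a hardware datapath configuration that sustains high throughput is the harder problem that the paper's heuristic addresses.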

field-programmable technology | 2016

FAU: Fast and error-optimized approximate adder units on LUT-Based FPGAs

Jorge Echavarria; Stefan Wildermann; Andreas Becher; Jürgen Teich; Daniel Ziener

During the design of embedded systems, many design decisions have to be made to trade off conflicting objectives such as cost, performance, and power. Approximate computing allows each of these objectives to be optimized at the expense of accuracy. That is, a computation is allowed to produce an error as long as the error is small enough to maintain feasible operation of the system or to guarantee a certain accuracy of the results. In this paper, we propose a new technique for approximate addition optimized for LUT-based FPGAs with segmented carry chains. Our optimized adder structure is able to a) best exploit artifacts of LUT-based FPGAs such as unused inputs, b) provide a smaller average error than previously proposed approximate adder structures, and c) achieve a shorter critical path delay than the dedicated accurate logic in modern FPGAs. We present a novel stochastic error calculus that can also take non-uniform input distributions into account, as well as a detailed comparison of approximate adder structures proposed in the literature with our novel LUT-based approximate arithmetic structure.
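The stochastic error calculus itself is analytical; as a brute-force cross-check for small bit widths, the expected error of an approximate adder under any input distribution can be computed exactly by enumerating all input pairs weighted by their probabilities. The sketch assumes independent inputs and uses the lower-part OR adder (LOA), a generic approximate adder from the literature, as a stand-in for the FAU structure; the geometric distribution is an arbitrary example of non-uniform inputs.

```python
from fractions import Fraction


def loa_add(a: int, b: int, approx_bits: int) -> int:
    """Lower-part OR adder (stand-in): OR the low bits, add the high bits exactly."""
    low = (a | b) & ((1 << approx_bits) - 1)
    top = approx_bits - 1
    carry = (a >> top) & (b >> top) & 1 if approx_bits > 0 else 0
    return (((a >> approx_bits) + (b >> approx_bits) + carry) << approx_bits) | low


def expected_error(width, approx_bits, pmf_a, pmf_b):
    """Exact E[|approx - exact|] for independent inputs with the given probability mass functions."""
    n = 1 << width
    return sum(
        pmf_a(a) * pmf_b(b) * abs(loa_add(a, b, approx_bits) - (a + b))
        for a in range(n)
        for b in range(n)
    )


def uniform_pmf(width):
    n = 1 << width
    return lambda x: Fraction(1, n)


def geometric_pmf(width, ratio):
    """Non-uniform example: P(x) proportional to ratio**x, so small values are more likely."""
    weights = [Fraction(ratio) ** x for x in range(1 << width)]
    total = sum(weights)
    return lambda x: weights[x] / total


for name, pmf in [("uniform", uniform_pmf(6)), ("geometric", geometric_pmf(6, Fraction(1, 2)))]:
    print(name, float(expected_error(6, 3, pmf, pmf)))
```

Comparing the two printed values illustrates how the expected error of the same adder depends on the input distribution, which is why an error model that accounts for non-uniform inputs is useful.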
