Rainer G. Spallek | Researchain

Publication

Featured research published by Rainer G. Spallek.

digital systems design | 2007

Secure, Real-Time and Multi-Threaded General-Purpose Embedded Java Microarchitecture

M. Zabel; T.B. Preusser; P. Reichel; Rainer G. Spallek

This paper presents a novel implementation of an embedded Java microarchitecture for secure, real-time, and multi-threaded applications. A general-purpose platform is established through the support of modern features of object-oriented languages, such as exception handling, automatic garbage collection and interface types. New techniques have been implemented for specific real-time issues, such as an integrated stack and thread management for fast context switching, concurrent garbage collection for real-time threads and autonomous control flows through preemptive round-robin scheduling.

application specific systems architectures and processors | 2011

Next-generation massively parallel short-read mapping on FPGAs

Oliver Knodel; Thomas B. Preusser; Rainer G. Spallek

The mapping of DNA sequences to huge genome databases is an essential analysis task in modern molecular biology. Having linearized reference genomes available, the alignment of short DNA reads obtained from the sequencing of an individual genome against such a database provides a powerful diagnostic and analysis tool. In essence, this task amounts to a simple string search tolerating a certain number of mismatches to account for the diversity of individuals. The complexity of this process arises from the sheer size of the reference genome. It is further amplified by current next-generation sequencing technologies, which produce a huge number of increasingly short reads. These short reads severely hurt established alignment heuristics like BLAST. This paper proposes an FPGA-based custom computation, which performs the alignment of short DNA reads in a timely manner through the use of tremendous concurrency at reasonable cost. The special measures to achieve an extremely efficient and compact mapping of the computation to a Xilinx FPGA architecture are described. The presented approach also surpasses all software heuristics in the quality of its results. It guarantees to find all alignment locations of a read in the database while also allowing a freely adjustable character mismatch threshold. In contrast, advanced fast alignment heuristics like Bowtie and Maq can only tolerate small mismatch maximums, with a quick deterioration of the probability of detecting existing valid alignments. The performance comparison with these widely used software tools also demonstrates that the proposed FPGA computation achieves its guaranteed exact results in very competitive time.
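The "simple string search tolerating a certain number of mismatches" described in this abstract can be illustrated with a minimal software sketch. It is a sequential model of the search problem only, not the paper's FPGA design; the function name `find_alignments` and the sample sequences are hypothetical.

```python
def find_alignments(reference: str, read: str, max_mismatches: int) -> list[tuple[int, int]]:
    """Exhaustively report (position, mismatches) for every window of the
    reference that matches the read within the mismatch threshold."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        mismatches = 0
        for ref_base, read_base in zip(reference[pos:pos + len(read)], read):
            if ref_base != read_base:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # threshold exceeded, abandon this window
        else:
            # Loop finished without exceeding the threshold: valid alignment.
            hits.append((pos, mismatches))
    return hits


# Hypothetical toy data: a 12-base reference and a 4-base read.
alignments = find_alignments("ACGTACGTTACG", "ACGA", max_mismatches=1)
print(alignments)
```

Because every window is tested, no valid alignment can be missed, which is the guarantee the abstract contrasts with heuristic tools. The cost is one comparison pass per reference position, which is the work a massively parallel implementation would spread across many matchers.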

java technologies for real-time and embedded systems | 2007

Bump-pointer method caching for embedded Java processors

Thomas B. Preußer; Martin Zabel; Rainer G. Spallek

Caching of complete methods has been suggested to simplify the determination of the worst-case execution time (WCET) in the presence of a memory hierarchy [9]. While this previous approach limits possible cache misses to method invocations and returns, it still assumes a conventional blocked organization of the cache memory. This paper proposes and evaluates a new approach organizing the cached methods within a linked list while tag matching is limited to a sliding window of at most three methods over this linked list. The main advantages of this approach are the avoidance of low block utilization by small methods through bump-pointer space allocation and a further simplification of the WCET analysis by an easy miss prediction based solely on call stack information available locally.
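The block-utilization argument in this abstract can be made concrete with a small arithmetic sketch. It only compares the memory footprint of a blocked organization with that of bump-pointer allocation; it does not model the linked list or the sliding tag window. The method sizes and the block size are hypothetical values chosen for illustration.

```python
def blocked_footprint(method_sizes: list[int], block_size: int) -> int:
    """Words occupied when every method is rounded up to whole cache blocks."""
    return sum(-(-size // block_size) * block_size for size in method_sizes)


def bump_pointer_footprint(method_sizes: list[int]) -> int:
    """Words occupied when methods are placed back to back by advancing
    a single allocation pointer (bump-pointer allocation)."""
    pointer = 0
    for size in method_sizes:
        pointer += size  # next method starts right after the previous one
    return pointer


# Hypothetical method sizes in words, with an assumed block size of 64 words.
sizes = [12, 40, 7, 130]
blocked = blocked_footprint(sizes, block_size=64)
bumped = bump_pointer_footprint(sizes)
print(blocked, bumped)
```

In this toy case the three small methods each claim a full block under the blocked scheme, while bump-pointer allocation occupies exactly the sum of the method sizes.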

field-programmable custom computing machines | 2012

Short-Read Mapping by a Systolic Custom FPGA Computation

Thomas B. Preußer; Oliver Knodel; Rainer G. Spallek

The mapping of reads, i.e. short DNA base pair strings, to large genome databases has become a critical operation for genetic analysis and diagnosis. Although this mapping operation is a simple string search tolerant of some character mismatches, it is nevertheless extremely challenging due to the tremendous size of the searched genome databases. It is the heavy use of search heuristics such as BLAST, Maq and Bowtie that makes the economic deployment of read mappers possible. While these heuristics achieve feasible computation times, they also sacrifice the accuracy of the mapping results, which is itself of high value for reliable diagnostics. The traditional software implementations are unable to exploit the tremendous parallelism that is available in the mapping of thousands and millions of reads. Merely a handful of concurrent control flows, and thus searches, can be performed efficiently on contemporary multicores. Even GPU assistance only enables a few dozen parallel searches. This paper proposes a systolic custom computation on FPGA, which implements the read mapping on a massively parallel architecture. It implements a true search and guarantees to find all read mappings under a configurable threshold of base pair mismatches. The highly regular design from compact string matchers enables the implementation of thousands of parallel search engines on a single FPGA device. The presented mapper platform combines highest computational performance with excellent result accuracy. Its performance is more than twice as high as that of a recently published comparable FPGA mapper. Even when implemented on a contemporary mid-size FPGA, it meets the search speed of software heuristics, which only detect little more than half of the valid read mappings. The mapper easily scales to large FPGA devices, which can, thus, implement accurate high-performance volume mappers. Accurate mapping is made available in application domains that could only afford fuzzy heuristics until now.

field-programmable logic and applications | 2009

Mapping basic prefix computations to fast carry-chain structures

Thomas B. Preusser; Rainer G. Spallek

Carry chains are a standard feature of modern FPGA architectures. They enable compact, regular and yet very fast implementations of binary word addition, even outpacing sophisticated parallel prefix networks for bit widths far beyond 100. Although they are equally suited for other simple prefix computations, their employment in the implementation of such user functions is hindered by unportable low-level and vendor- or even device-specific means to implement the desired mapping. This paper names suitable example applications, identifies the class of prefix computations generically mappable to carry chains and presents a universal procedure to achieve this mapping by a transformation building upon the well-supported binary word addition. Synthesis results are presented to illustrate both the gain in speed and the significant reduction of area through the employment of this approach.
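The idea of expressing a simple prefix computation through binary word addition can be sketched with two textbook cases, a wide OR and a wide AND read off the carry-out of an adder. This is a software model under stated assumptions and not necessarily the transformation procedure of the paper; the helper names `carry_out`, `wide_or` and `wide_and` are hypothetical.

```python
def carry_out(a: int, b: int, width: int, carry_in: int = 0) -> int:
    """Carry-out bit of a width-bit addition; a and b must fit in width bits."""
    return ((a + b + carry_in) >> width) & 1


def wide_or(x: int, width: int) -> int:
    """OR over all bits of x: adding the all-ones word overflows iff x != 0."""
    all_ones = (1 << width) - 1
    return carry_out(x, all_ones, width)


def wide_and(x: int, width: int) -> int:
    """AND over all bits of x: adding 1 via carry-in overflows iff x is all ones."""
    return carry_out(x, 0, width, carry_in=1)


# Exhaustive check against the direct definitions for an assumed 4-bit width.
for value in range(16):
    assert wide_or(value, 4) == int(value != 0)
    assert wide_and(value, 4) == int(value == 0b1111)
```

In both cases the result appears as the final carry of an ordinary addition, so a synthesis tool that already maps `+` to the carry chain would place the reduction there without any vendor-specific primitives.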

field programmable logic and applications | 1998

Increasing Microprocessor Performance with Tightly-Coupled Reconfigurable Logic Arrays

Sergej Sawitzki; Achim Gratz; Rainer G. Spallek

Conventional approaches to increase the performance of microprocessors often do not provide the performance boost one has hoped for due to diminishing returns. We propose the extension of a conventional hardwired microprocessor with a reconfigurable logic array, integrating both conventional and reconfigurable logic on the same die. Simulations have shown that even a comparatively simple and compact extension allows performance gains of 2–4 times over conventional RISC processors of comparable complexity, making this approach especially interesting for embedded microprocessors.

Archive | 2008

Architecture of Computing Systems – ARCS 2008

Uwe Brinkschulte; Theo Ungerer; Christian Hochberger; Rainer G. Spallek

Invited Program.- Keynote: Grand Challenges of Computer Engineering.- Keynote: The Impact of Operating Systems on Modern CPU Designs (and Vice Versa).- I Hardware Design.- System Level Simulation of Autonomic SoCs with TAPES.- Topology-Aware Replica Placement in Fault-Tolerant Embedded Networks.- Design of Gate Array Circuits Using Evolutionary Algorithms.- II Pervasive Computing.- Direct Backtracking: An Advanced Adaptation Algorithm for Pervasive Applications.- Intelligent Vehicle Handling: Steering and Body Postures While Cornering.- III Network Processors and Memory Management.- A Hardware Packet Re-Sequencer Unit for Network Processors.- Self-aware Memory: Managing Distributed Memory in an Autonomous Multi-master Environment.- IV Reconfigurable Hardware.- Dynamic Reconfiguration of FlexRay Schedules for Response Time Reduction in Asynchronous Fault-Tolerant Networks.- Synthesis of Multi-dimensional High-Speed FIFOs for Out-of-Order Communication.- A Novel Routing Architecture for Field-Programmable Gate-Arrays.- V Real-Time Architectures.- A Predictable Simultaneous Multithreading Scheme for Hard Real-Time.- Soft Real-Time Scheduling on SMT Processors with Explicit Resource Allocation.- A Hardware/Software Codesign of a Co-processor for Real-Time Hyperelliptic Curve Cryptography on a Spartan3 FPGA.- VI Organic Computing.- A Reference Architecture for Self-organizing Service-Oriented Computing.- Towards Self-organising Smart Camera Systems.- Using Organic Computing to Control Bunching Effects.- VII Computer Architecture.- A Generic Network Interface Architecture for a Networked Processor Array (NePA).- Constructing Optimal XOR-Functions to Minimize Cache Conflict Misses.- Potentials of Branch Predictors: From Entropy Viewpoints.

field-programmable logic and applications | 2010

Enhancing FPGA Device Capabilities by the Automatic Logic Mapping to Additive Carry Chains

Thomas B. Preusser; Rainer G. Spallek

This paper presents an approach to the automatic mapping of arbitrary combinational circuits to the arithmetic carry-chain structures widely available in modern FPGAs. This capability is highly valuable as it enables the utilization of these fast special-purpose structures for general-purpose logic. The described approach is both automatic and generally applicable to all carry-chain architectures designed for binary addition. It, thus, lifts severe constraints left by previous works. It helps to reduce the pressure on the general-purpose routing resources and accelerates critical logic paths. The proposed mapping is further shown to enhance the logic capability of a logic block containing a k-input lookup table (k-LUT) to implement many (k+1)-input functions on common FPGA architectures. This, in particular, also applies to all logic functions with a non-inverting path as introduced by Anderson and Wang. This makes their envisioned gains in logic density achievable even on current devices without requiring their architectural extension. The benefits of the carry-chain mapping are experimentally evaluated on the basis of combinational MCNC benchmarks. It is shown how carry chains can be recovered within a functional standard mapping and that a device mapping aware of carry chains achieves a reduction of the combinational delay of about 20 percent.

international parallel and distributed processing symposium | 2002

Improving code efficiency for reconfigurable VLIW processors

Steffen Köhler; Jens Braunes; Sergej Sawitzki; Rainer G. Spallek

High code efficiency (operations per instruction) combined with a high degree of instruction level parallelism can rarely be obtained by hardwired microprocessor designs for a broad application domain. The implementation of reconfigurable execution units is a promising way to enhance code efficiency and microprocessor performance. However, the unit reconfiguration process introduces an additional dimension to the code generation phase, which complicates scheduling and may lead to code deficiencies if resource conflicts occur. This paper discusses code generation issues for a runtime-reconfigurable VLIW processor model, which combines fixed and flexible functional units (FU) in one template. Reconfigurable units (RFU) can be adapted to the application demands, exploiting more coarse-grain parallelism than common instruction-level FUs. A case study illustrates the extraction of conditions for reconfigurable instructions and proves scheduling possibilities for a set of common DSP benchmark algorithms. The software environment described includes a retargetable, parallelizing C compiler based on the SUIF compiler kit and a simulator, which can be used for identifying application-specific SIMD-instruction candidates and for evaluating the runtime behavior of the created object code.

java technologies for real-time and embedded systems | 2010

Application requirements and efficiency of embedded Java bytecode multi-cores

Martin Zabel; Rainer G. Spallek

This paper introduces a new Java Bytecode Multi-Core System-on-a-Chip architecture which scales well in chip area and performance. In particular, the area efficiency is greater than 1 (about 120%), demonstrating that the speed-up gained exceeds the additional hardware costs. Based on the evaluation of four different applications, the cores are connected to the shared heap by a full-duplex bus with pipelined transactions. Each multi-threaded real-time-capable core is equipped with local on-chip memory for the Java operand stack and a method cache to further reduce the memory bandwidth requirements. As opposed to related projects, synchronization is supported on a per-object basis (independent locks) instead of a single global lock. Application threads are distributed automatically using a round-robin scheme. The multi-port memory manager includes an exact and fully concurrent garbage collector for automatic memory management. The design can be synthesized for a variable number of parallel cores and shows a linear increase in chip space. Speed-up and area efficiency are measured for the same four different applications and are compared to related projects.
