Martin Zabel
Dresden University of Technology
Publications
Featured research published by Martin Zabel.
Java Technologies for Real-Time and Embedded Systems | 2007
Thomas B. Preußer; Martin Zabel; Rainer G. Spallek
Caching of complete methods has been suggested to simplify the determination of the worst-case execution time (WCET) in the presence of a memory hierarchy [9]. While this previous approach limits possible cache misses to method invocations and returns, it still assumes a conventional blocked organization of the cache memory. This paper proposes and evaluates a new approach organizing the cached methods within a linked list while tag matching is limited to a sliding window of at most three methods over this linked list. The main advantages of this approach are the avoidance of low block utilization by small methods through bump-pointer space allocation and a further simplification of the WCET analysis by an easy miss prediction based solely on call stack information available locally.
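A rough behavioral model (not the authors' hardware) conveys the scheme: methods are allocated bump-pointer style in a linked list, and a lookup only tag-matches the sliding window of the three most recently loaded methods. All identifiers and sizes below are illustrative.

```python
class SlidingWindowMethodCache:
    """Toy model of a linked-list method cache: whole methods are
    allocated bump-pointer style, and tag matching is limited to a
    sliding window of at most three entries."""
    WINDOW = 3

    def __init__(self, capacity_words):
        self.capacity = capacity_words
        self.used = 0
        self.methods = []  # ordered (method_id, size_words) entries

    def lookup(self, method_id):
        # tag matching only inspects the sliding window
        return any(mid == method_id for mid, _ in self.methods[-self.WINDOW:])

    def access(self, method_id, size_words):
        if self.lookup(method_id):
            return "hit"
        # miss: reclaim the oldest methods until the new one fits
        while self.used + size_words > self.capacity and self.methods:
            _, evicted = self.methods.pop(0)
            self.used -= evicted
        self.methods.append((method_id, size_words))
        self.used += size_words
        return "miss"
```

Because hits are only possible within the window, whether an invocation hits can be decided without global cache state, which mirrors the paper's point that miss prediction for WCET analysis can rely on local call-stack information alone.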
Java Technologies for Real-Time and Embedded Systems | 2010
Martin Zabel; Rainer G. Spallek
This paper introduces a new Java bytecode multi-core system-on-a-chip architecture which scales well in chip area and performance. In particular, the area efficiency is greater than 1 (about 120%), based on the evaluation of four different applications, demonstrating that the gained speed-up exceeds the additional hardware cost. The cores are connected to the shared heap by a full-duplex bus with pipelined transactions. Each multi-threaded, real-time-capable core is equipped with local on-chip memory for the Java operand stack and with a method cache to further reduce the memory bandwidth requirements. As opposed to related projects, synchronization is supported on a per-object basis (independent locks) instead of a single global lock. Application threads are distributed automatically using a round-robin scheme. The multi-port memory manager includes an exact and fully concurrent garbage collector for automatic memory management. The design can be synthesized for a variable number of parallel cores and shows a linear increase in chip area. Speed-up and area efficiency are measured for the same four applications and compared to related projects.
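The automatic thread distribution can be pictured with a minimal round-robin sketch; the core count and thread IDs are illustrative, and the real assignment happens in hardware:

```python
def distribute_round_robin(thread_ids, num_cores):
    """Assign application threads to cores in round-robin order,
    as in the scheme described above (behavioral sketch only)."""
    assignment = {core: [] for core in range(num_cores)}
    for i, tid in enumerate(thread_ids):
        assignment[i % num_cores].append(tid)
    return assignment
```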
Reconfigurable Computing and FPGAs | 2005
Martin Zabel; Steffen Köhler; M. Zimmerling; Thomas B. Preußer; Rainer G. Spallek
This work introduces a new digital signal processor (DSP) architecture concept, which provides increased instruction-level parallelism (ILP), flexibility and scalability compared to state-of-the-art DSPs. The concept can be characterized as an enhanced RISC microprocessor with a tightly coupled reconfigurable ALU array, a vector load/store unit and a control flow manipulation unit. These units implement coarse-grain reconfigurable structures by means of switchable contexts. In contrast to previous work, context activation is performed event-driven, according to the instruction pointer of the RISC microprocessor. The synchronous operation of the context-controlled functional units enables an ILP comparable to complex VLIW/SIMD processors, without introducing additional instruction overhead. The reconfigurable units can be adapted to the application demands, exploiting parallelism at a coarser grain than common instruction-level functional units. To evaluate the concept, we present a parametrizable template model of the DSP architecture based on a standard ARM7 RISC microprocessor. The DSP model includes an architecture description based on our own ADL/simulation environment and a VHDL RTL model for the purpose of FPGA prototype evaluation. Further, we show detailed quantitative performance and utilization evaluation results related to the ALU array geometry, memory transfer bandwidth and the number of configuration contexts. First experiments executing DSP algorithms have indicated that the proposed architecture can exploit more of the potential application parallelism at a reasonable hardware cost compared to conventional digital signal processors.
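Event-driven context activation can be modeled as a lookup keyed by the RISC core's instruction pointer; the trigger addresses and context names below are purely illustrative, not the architecture's actual encoding:

```python
# Hypothetical trigger table: when the instruction pointer reaches one
# of these addresses, the corresponding configuration context becomes
# active in the reconfigurable units.
CONTEXT_TRIGGERS = {
    0x0100: "fir_filter_ctx",
    0x0240: "fft_ctx",
}

def active_context(ip, current_ctx):
    """Return the context after executing at address ip: a trigger
    address switches contexts, otherwise the current one stays active."""
    return CONTEXT_TRIGGERS.get(ip, current_ctx)
```

The point of the event-driven scheme is visible in the model: no extra instructions are spent on reconfiguration, since the switch piggybacks on the normal control flow.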
Symposium on Computer Arithmetic | 2011
Thomas B. Preußer; Martin Zabel; Rainer G. Spallek
This work describes the carry-compact addition (CCA), a novel addition scheme that allows the acceleration of carry-chain computations on contemporary FPGA devices. While based on concepts known from carry-lookahead addition and from parallel prefix adders, the CCA adapts them to the FPGA as an implementation environment. FPGAs typically provide carry-chain structures to accelerate the simple ripple-carry addition (RCA). Rather than contrasting this scheme with the hierarchical addition approaches favored in hard-core VLSI designs, the CCA combines the benefits of both and uses hierarchical structures to shorten the critical path, which is still left on a core carry chain. In contrast to previous studies examining the asymptotically superior parallel prefix adders on FPGAs, the CCA is shown to outperform the standard RCA already for operand widths starting at 50 bits. Wider adders, such as those used in extended-precision floating-point units and in cryptographic applications, benefit from even greater speedups. The concrete mapping of the CCA as achieved for current Xilinx and Altera architectures is described and shown to be very favorable, yielding a high speedup for a very modest investment of additional LUT resources.
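The underlying idea, computing per-block generate/propagate signals so that carries can skip ahead of a plain ripple chain, can be illustrated with a small software model. This shows the classic block-carry principle the CCA builds on, not the exact CCA mapping to FPGA carry chains:

```python
def block_carry_add(a, b, width=64, block=4):
    """Add two width-bit numbers block by block.  Each block's carry-out
    is derived from block generate/propagate signals -- the same signals
    a hierarchical adder uses to shorten the carry path."""
    mask = (1 << block) - 1
    carry, result = 0, 0
    for i in range(0, width, block):
        av = (a >> i) & mask
        bv = (b >> i) & mask
        result |= ((av + bv + carry) & mask) << i
        generate = (av + bv) > mask    # block produces a carry by itself
        propagate = (av + bv) == mask  # block forwards an incoming carry
        carry = 1 if generate or (propagate and carry) else 0
    return result & ((1 << width) - 1)
```

In hardware, the generate/propagate signals of all blocks can be evaluated in parallel, so the carry path only traverses one decision per block instead of one per bit.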
Java Technologies for Real-Time and Embedded Systems | 2007
Thomas B. Preußer; Martin Zabel; Rainer G. Spallek
This paper describes an approach that enables the fast constant-time and memory-efficient runtime handling of interface data types as found in several object-oriented programming languages like Java. It extends an idea presented by League et al. [22] to attach an itable to a class object to obtain an interface object. A practical implementation of this approach based on an automated rather than a manual type conversion is presented. Its practicability in the context of Java is evaluated through an adaptation of the SableVM [15]. Several measures for its improvement have been derived and implemented. The adoption of the resulting technique for the implementation of interface method dispatches within SHAP [26, 32], a small-footprint embedded implementation of a Java bytecode processor, is described. This realization currently also contains a tradeoff that compromises some generality of the support for interface typecasts while ensuring both a small memory demand and a fast constant-time interface method dispatch. The loss of generality is shown to have minimal practical impact under the measures taken before.
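The core idea, pairing an instance with an itable looked up once at the typecast so that every later interface call is a constant-time indexed load, can be sketched as follows. The class and helper names are hypothetical, not SHAP's or SableVM's actual layout:

```python
class Klass:
    """A class object carrying one itable per implemented interface:
    a list of method implementations indexed by interface slot."""
    def __init__(self, name, itables):
        self.name = name
        self.itables = itables  # interface name -> [impl, impl, ...]

def cast_to_interface(obj, klass, interface):
    # the (possibly expensive) itable lookup happens once, at the cast
    return (obj, klass.itables[interface])

def invoke_interface(iface_ref, slot, *args):
    # dispatch is a constant-time indexed load plus an indirect call
    obj, itable = iface_ref
    return itable[slot](obj, *args)
```

A call site then never searches for the right itable; it just indexes the one bound at the cast, which is what makes the dispatch time independent of the number of implemented interfaces.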
Signal Processing Systems | 2005
Thomas B. Preußer; Martin Zabel; Rainer G. Spallek
This paper explores the analogies between carry propagation in binary adders and token passing in arbiter implementations. This analysis identifies a common design space, thus decreasing design costs and time through efficient re-use across application domains. The immediate utilization of available carry-propagation networks is outlined and justified. This, for instance, enables designers to choose directly from a large pool of well-studied parallel prefix networks. While these solutions are, due to their regularity, favorable for VLSI ASIC designs, they usually do not synthesize well on FPGAs. Extending the analogy between carry propagation and token passing to this domain, the appropriate utilization of the carry chains commonly available on FPGAs is demonstrated to yield small and fast arbiters.
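The analogy becomes concrete in the well-known trick of building a fixed-priority arbiter from a subtractor, whose borrow/carry chain plays the role of the passing token. Below is a behavioral sketch; on an FPGA the same expression maps onto the dedicated carry chain:

```python
def priority_arbiter(request):
    """Grant the lowest-indexed pending request: request & (-request)
    isolates the least significant set bit via a two's-complement
    subtraction -- i.e. via a carry-propagation network."""
    return request & -request

def round_robin_arbiter(request, pointer, width):
    """Rotating-priority variant: rotate so that bit 'pointer' becomes
    the highest-priority position, arbitrate, rotate the grant back."""
    mask = (1 << width) - 1
    rot = ((request >> pointer) | (request << (width - pointer))) & mask
    grant_rot = rot & -rot
    return ((grant_rot << pointer) | (grant_rot >> (width - pointer))) & mask
```

The carry (borrow) generated by the subtraction ripples past every granted-denied position exactly like a token, which is why prefix networks designed for adders transfer directly to arbiters.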
Field-Programmable Logic and Applications | 2004
Steffen Köhler; Jens Braunes; Thomas B. Preußer; Martin Zabel; Rainer G. Spallek
This work introduces a new concept of enhancing a RISC microprocessor with a tightly coupled reconfigurable ALU array, a vector load/store unit and a control flow manipulation unit. These units implement coarse-grain reconfigurable structures by means of switchable contexts. Context activation is performed event-driven, according to the instruction pointer of the RISC microprocessor. The synchronous operation of the context-controlled functional units enables instruction-level parallelism (ILP) comparable to complex VLIW processors, without introducing instruction overhead. The reconfigurable units can be adapted to the application demands, exploiting parallelism at a coarser grain than common instruction-level functional units. To evaluate the concept, a standard ARM RISC microprocessor was chosen to be tightly coupled to these reconfigurable units. Architecture description and simulation were performed using RECAST, a reconfiguration-enabled architecture description language and simulation tool-set. The software environment also includes a retargetable, parallelizing C compiler based on the SUIF compiler kit. First experiments executing DSP algorithms have indicated that the proposed architecture can exploit more of the potential application parallelism than conventional VLIW processors.
Reconfigurable Computing and FPGAs | 2016
Thomas B. Preußer; Martin Zabel; Patrick Lehmann; Rainer G. Spallek
Standard libraries and frameworks significantly boost productivity and performance as they enable the re-use of optimized solutions for standard tasks. Hardware designs are often unnecessarily complex because a) a rich RTL library of standard solutions is missing and b) designs must often sacrifice portable and readable behavioral descriptions so as to meet timing and area constraints on the targeted device. The PoC Library addresses these issues. First of all, it provides abstracted solutions for standard tasks. These include single- and dual-port memory components as well as higher-level data structures such as FIFOs, stacks and deques built on top of them. The library further comprises cross-clock triggers, arithmetic and algorithmic cores, such as for wide addition and sorting, as well as communication stack implementations. Each implementation is encapsulated by a stable interface that is independent from the specific target platform. Nonetheless, device-specific optimizations are available through specialized implementations, which are selected internally whenever this is beneficial or necessitated by the vendor flow. The provided modules are highly parametrizable to fit the application needs and to enable design space exploration. An extensive set of utility functions and frequently used data types benefits the conciseness of both library and user code. Finally, PoC enables the continuous verification of its IP cores by automated testbenches. This verification flow is only one part of a flow infrastructure that also supports the generation of re-usable netlists so as to speed up the integration of more complex cores into an application design. The flow infrastructure is implemented in Python and supports various simulation backends, synthesis tool chains and operating systems.
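The flavor of such a parametrizable component can be shown with a behavioral FIFO model; the method and parameter names below are illustrative, while PoC's actual cores are VHDL entities with corresponding generics:

```python
from collections import deque

class Fifo:
    """Behavioral model of a FIFO parametrized by depth and data
    width, behind a stable put/got interface."""
    def __init__(self, min_depth, data_bits):
        self.depth = min_depth
        self.mask = (1 << data_bits) - 1
        self.words = deque()

    @property
    def full(self):
        return len(self.words) >= self.depth

    @property
    def valid(self):
        return bool(self.words)

    def put(self, data):
        """Enqueue one word; returns False when the FIFO is full."""
        if self.full:
            return False
        self.words.append(data & self.mask)
        return True

    def got(self):
        """Dequeue and return the oldest word (None when empty)."""
        return self.words.popleft() if self.words else None
```

Keeping depth and data width as parameters of a single stable interface is what lets a library swap in a device-specific implementation (e.g. block-RAM versus LUT-RAM storage) without touching user code.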
Southern Conference on Programmable Logic | 2014
Oliver Knodel; Martin Zabel; Patrick Lehmann; Rainer G. Spallek
The future of hardware development lies in massively parallel hardware architectures as used in embedded as well as high-performance systems, for instance streaming-based, real-time and database applications. Especially field-programmable gate arrays provide a platform for the rapid development of integrated circuits and the accompanying software. For reasons of energy efficiency, it is increasingly important to tailor hardware directly to the application. As such systems are very complex, the training of engineers has to start early. Furthermore, the usual curricula in computer science and electrical engineering teach only basic skills. In this paper we present lectures and especially practical FPGA design courses for bachelor and master students. We introduce a selection of individual projects realized by students in practical courses. With examples from final bachelor projects and master theses we demonstrate the quality of education and its integration into current research. We describe possible improvements to the labs, such as automated test benches and a remote FPGA laboratory for advanced courses.
Field-Programmable Logic and Applications | 2014
Oliver Knodel; Martin Zabel; Patrick Lehmann; Rainer G. Spallek
The future of hardware development lies in massively parallel hardware architectures as used in embedded as well as high-performance systems, for instance streaming-based, real-time and database applications. Especially field-programmable gate arrays provide a platform for the rapid development of integrated circuits and the accompanying software. For reasons of energy efficiency, it is increasingly important to tailor hardware directly to the application. As such systems are very complex, the training of engineers has to start early. Furthermore, the usual curricula in computer science and electrical engineering teach only basic skills. Our approach does not start with specialized courses in master's or diploma programs, but much earlier, by motivating primary-school children for technical disciplines. Girls and boys are addressed equally. In school, children are made familiar with programmable circuits and thus motivated to study computer science with a specialization in computer engineering. In this paper we present our lectures and practical courses for bachelor and master students. With examples from final bachelor and master projects we demonstrate the quality of our education and its integration into current research.