Heiko Falk
Hamburg University of Technology
Publications
Featured research published by Heiko Falk.
design automation conference | 2009
Heiko Falk; Jan C. Kleinsorge
Caches are notorious for their unpredictability. It is difficult or even impossible to predict whether a memory access will result in a definite cache hit or miss. This unpredictability is highly undesirable, especially when designing real-time systems where the worst-case execution time (WCET) is one of the key metrics. Scratchpad memories (SPMs) have proven to be a fully predictable alternative to caches. In contrast to caches, however, SPMs require dedicated compiler support. This paper presents an optimal static SPM allocation algorithm for program code. It minimizes WCETs by placing the most beneficial parts of a program's code in an SPM. Our results underline the effectiveness of the proposed techniques. For a total of 73 realistic benchmarks, we reduced WCETs by 7.4% up to 40% on average. Additionally, the run times of our ILP-based SPM allocator are negligible.
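As a rough illustration of the ILP-based approach, the sketch below solves a much simplified variant of the allocation problem: each basic block gets a fixed size and an independent WCET gain, and a knapsack-style ILP picks the most beneficial blocks that fit into the scratchpad. The PuLP library, the block names and all numbers are assumptions of this sketch, not taken from the paper; the actual allocator additionally models how the worst-case path shifts as code is moved, which this simplification ignores.

```python
# Knapsack-style ILP sketch of static SPM allocation for program code.
# Hypothetical data: each basic block has a code size (bytes) and an estimated
# WCET gain (cycles) if it is moved from flash to the scratchpad.
import pulp

blocks = {             # block name: (size in bytes, WCET gain in cycles)
    "main_loop":   (256, 1200),
    "isr_handler": (128,  900),
    "init":        (512,   40),
    "fir_filter":  (384, 1500),
}
SPM_CAPACITY = 1024    # bytes of scratchpad available for code

prob = pulp.LpProblem("spm_allocation", pulp.LpMaximize)
x = {b: pulp.LpVariable(f"x_{b}", cat="Binary") for b in blocks}

# Objective: maximize the accumulated WCET gain of the selected blocks.
prob += pulp.lpSum(blocks[b][1] * x[b] for b in blocks)
# Constraint: the selected blocks must fit into the scratchpad.
prob += pulp.lpSum(blocks[b][0] * x[b] for b in blocks) <= SPM_CAPACITY

prob.solve(pulp.PULP_CBC_CMD(msg=False))
selected = [b for b in blocks if x[b].value() > 0.5]
print("Blocks placed in SPM:", selected)
```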
euromicro conference on real-time systems | 2011
Timon Kelter; Heiko Falk; Peter Marwedel; Sudipta Chattopadhyay; Abhik Roychoudhury
In the domain of real-time systems, the analysis of the timing behavior of programs is crucial for guaranteeing the schedulability and thus the safety of a system. Static analyses of the WCET (Worst-Case Execution Time) have proven to be a key element for timing analysis, as they provide safe upper bounds on a program's execution time. For single-core systems, industrial-strength WCET analyzers are already available, but up to now, only first proposals have been made for analyzing the WCET in multicore systems, where the different cores may interfere during accesses to shared resources. Important examples of this are shared buses, which connect the cores to a shared main memory. The time to gain access to the shared bus may vary significantly, depending on the bus arbitration protocol used and the access timings. In this paper, we propose a new technique for analyzing the duration of accesses to shared buses. We implemented a prototype tool which uses the new analysis and tested it on a set of real-world benchmarks. Results demonstrate that our analysis achieves the same precision as the best existing approach while drastically outperforming it in terms of analysis time.
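For intuition only, the sketch below computes a coarse upper bound on the duration of a single access under a TDMA bus arbiter; slot lengths, core IDs and the access length are invented parameters, and the paper's analysis is considerably tighter because it tracks the offsets at which accesses can actually occur.

```python
# Coarse upper bound on the duration of one bus access under TDMA arbitration.
def tdma_worst_case_access(slots, core_id, access_len):
    """slots: list of (owner core, slot length in cycles) forming one TDMA
    period; each core is assumed to own exactly one slot in this sketch."""
    period = sum(length for _, length in slots)
    own = [length for owner, length in slots if owner == core_id]
    if len(own) != 1:
        raise ValueError("sketch assumes exactly one slot per core")
    slot_len = own[0]
    if access_len > slot_len:
        raise ValueError("access does not fit into the core's slot")
    # Worst case: the request arrives just too late to start in the own slot
    # and has to wait almost a full period for the next one.
    worst_wait = period - slot_len + access_len
    return worst_wait + access_len

# Four cores with 10-cycle slots, a 4-cycle access issued by core 1:
print(tdma_worst_case_access([(0, 10), (1, 10), (2, 10), (3, 10)], 1, 4))  # 38
```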
symposium on code generation and optimization | 2009
Paul Lokuciejewski; Daniel Cordes; Heiko Falk; Peter Marwedel
A static loop analysis is a program analysis computing loop iteration counts. This information is crucial for different fields of application. In the domain of compilers, knowledge about loop iterations can be exploited for aggressive loop optimizations like Loop Unrolling. A loop analyzer also provides static information about code execution frequencies, which can assist feedback-directed optimizations. Another prominent application is static worst-case execution time (WCET) analysis, which relies on a safe approximation of loop iteration counts. In this paper, we propose a framework for static loop analysis based on Abstract Interpretation, a theory of sound approximation of program semantics. To accelerate the analysis, we preprocess the analyzed code using Program Slicing, a technique that removes statements irrelevant for the loop analysis. In addition, we introduce a novel polytope-based loop evaluation that further reduces the analysis time significantly. The efficiency of our loop analyzer is evaluated on a large number of benchmarks. Results show that 99% of the considered loops could be successfully analyzed in an acceptable amount of time. This study indicates that our methodology is well suited for real-world problems.
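The benefit of a closed-form, polytope-style evaluation can be illustrated on a single affine loop: instead of abstractly iterating the loop body, the iteration count is computed directly from the bounds. The helper below is a hypothetical, one-dimensional simplification of that idea; the evaluation in the paper handles far more general iteration spaces.

```python
import math

def affine_loop_iterations(lb, ub, step):
    """Iteration count of `for (i = lb; i < ub; i += step)` for step > 0,
    evaluated in closed form instead of iterating the loop abstractly."""
    if step <= 0:
        raise ValueError("only positive strides handled in this sketch")
    return max(0, math.ceil((ub - lb) / step))

# Example: for (i = 2; i < 100; i += 7) runs ceil(98 / 7) = 14 times.
print(affine_loop_iterations(2, 100, 7))   # -> 14
```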
ACM Transactions on Embedded Computing Systems | 2014
Philip Axer; Rolf Ernst; Heiko Falk; Alain Girault; Daniel Grund; Nan Guan; Bengt Jonsson; Peter Marwedel; Jan Reineke; Christine Rochange; Maurice Sebastian; Reinhard von Hanxleden; Reinhard Wilhelm; Wang Yi
A large class of embedded systems is distinguished from general-purpose computing systems by the need to satisfy strict requirements on timing, often under constraints on available resources. Predictable system design is concerned with the challenge of building systems for which timing requirements can be guaranteed a priori. Perhaps paradoxically, this problem has been made more difficult by the introduction of performance-enhancing architectural elements, such as caches, pipelines, and multithreading, which introduce a large degree of uncertainty and make guarantees harder to provide. The intention of this article is to summarize the current state of the art in research on how to build predictable yet performant systems. We suggest precise definitions for the concept of “predictability” and present predictability concerns at different abstraction levels in embedded system design. First, we consider the timing predictability of processor instruction sets. Thereafter, we consider how programming languages can be equipped with predictable timing semantics, covering both a language-based approach using the synchronous programming paradigm and an environment that provides timing semantics for a mainstream programming language (in this case C). We present techniques for achieving timing predictability on multicores. Finally, we discuss how to handle predictability at the level of networked embedded systems, where randomly occurring errors must be considered.
international conference on hardware/software codesign and system synthesis | 2007
Heiko Falk; Sascha Plazar; Henrik Theiling
Caches are notorious for their unpredictability. It is difficult or even impossible to predict whether a memory access results in a definite cache hit or miss. This unpredictability is highly undesirable for real-time systems. The Worst-Case Execution Time (WCET) of the software running on an embedded processor is one of the most important metrics during real-time system design. The WCET depends to a large extent on the total amount of time spent on memory accesses. In the presence of caches, WCET analysis must always assume a memory access to be a cache miss if it cannot be guaranteed to be a hit. Hence, WCETs for cached systems are imprecise due to the overestimation caused by the caches. Modern caches can be controlled by software. The software can load parts of its code or data into the cache and lock the cache afterwards. Cache locking prevents the cache's contents from being flushed by deactivating the replacement. A locked cache is highly predictable and leads to very precise WCET estimates, because the uncertainty caused by the replacement strategy is eliminated completely. This paper presents techniques exploring the lockdown of instruction caches at compile time to minimize WCETs. In contrast to the current state of the art in the area of cache locking, our techniques explicitly take the worst-case execution path into account during each step of the optimization procedure. This way, we can ensure that exactly those parts of the code are locked in the I-cache that lead to the highest WCET reduction. The results demonstrate that WCET reductions from 54% up to 73% can be achieved with an acceptable amount of CPU time required for the optimization and WCET analyses themselves.
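A greedy variant of such a WCET-driven locking scheme might look as follows. This is a sketch, not necessarily the algorithm used in the paper; `estimate_wcet` stands in for an external static WCET analyzer and is purely hypothetical. The point it mirrors from the abstract is that the WCET is re-evaluated after every locking decision, because the worst-case path can shift.

```python
# Greedy sketch of WCET-driven I-cache locking: in every round, lock the code
# region that shrinks the current WCET estimate the most, until nothing fits
# or no region improves the bound any further.
def lock_icache_greedy(regions, cache_size, estimate_wcet):
    """regions: dict name -> size in bytes.
    estimate_wcet(locked): WCET estimate (cycles) with `locked` regions locked."""
    locked, free = set(), cache_size
    current = estimate_wcet(locked)
    while True:
        best, best_wcet = None, current
        for name, size in regions.items():
            if name in locked or size > free:
                continue
            wcet = estimate_wcet(locked | {name})
            if wcet < best_wcet:
                best, best_wcet = name, wcet
        if best is None:          # no remaining region improves the WCET
            return locked, current
        locked.add(best)
        free -= regions[best]
        current = best_wcet       # the worst-case path may have shifted
```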
real time technology and applications symposium | 2012
Sudipta Chattopadhyay; Chong Lee Kee; Abhik Roychoudhury; Timon Kelter; Peter Marwedel; Heiko Falk
With the advent of multi-core architectures, worst-case execution time (WCET) analysis has become an increasingly difficult problem. In this paper, we propose a unified WCET analysis framework for multi-core processors featuring both a shared cache and a shared bus. Compared to previous work, ours differs by modeling the interaction of the shared cache and the shared bus with other basic micro-architectural components (e.g. pipeline and branch predictor). In addition, our framework does not assume a timing-anomaly-free multi-core architecture for computing the WCET. A detailed experimental evaluation suggests that we can obtain reasonably tight WCET estimates for a wide range of benchmark programs.
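The consequence of not assuming a timing-anomaly-free architecture can be shown on a single access whose cache behavior is unknown: both the hit and the miss case must be explored, and only the final bounds may be compared. The latencies and the `finish_time` pipeline model in the sketch below are invented for illustration and are not part of the paper's framework.

```python
# With timing anomalies, a cache hit can lead to a larger overall bound than a
# miss, so both outcomes have to be explored instead of taking the locally
# worse latency.
HIT_LATENCY, MISS_LATENCY = 1, 10

def bound_with_unknown_access(classification, finish_time):
    """finish_time(latency) -> completion time of the remaining pipeline,
    given the latency of the access under consideration."""
    if classification == "always_hit":
        return finish_time(HIT_LATENCY)
    if classification == "always_miss":
        return finish_time(MISS_LATENCY)
    # "unknown": enumerate both cases and take the maximum of the *final*
    # bounds, not the maximum of the local latencies.
    return max(finish_time(HIT_LATENCY), finish_time(MISS_LATENCY))
```

On an anomaly-free architecture, the "unknown" case could simply be treated as a miss; avoiding that assumption is exactly what forces the case split above.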
Real-time Systems | 2010
Heiko Falk; Paul Lokuciejewski
The current practice of designing software for real-time systems is tedious. There is almost no tool support that assists the designer in automatically deriving safe bounds on the worst-case execution time (WCET) of a system during code generation and in systematically optimizing code to reduce the WCET. This article presents concepts and infrastructures for WCET-aware code generation as well as optimization techniques for WCET reduction. Together, they help to obtain code explicitly optimized for its worst-case timing, to automate large parts of the real-time software design flow, and to reduce the costs of a real-time system by allowing the use of tailored hardware.
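At its core, WCET-aware optimization means that every transformation is judged by the WCET analyzer rather than by average-case heuristics. The driver loop below sketches this idea with placeholder interfaces (`analyze_wcet` and the transformation callables); it is not the interface of any concrete compiler.

```python
# Sketch of a WCET-aware optimization driver: every candidate transformation
# is kept only if a subsequent WCET analysis confirms an improvement.
def wcet_aware_optimize(ir, transformations, analyze_wcet):
    best_ir, best_wcet = ir, analyze_wcet(ir)
    for transform in transformations:
        candidate = transform(best_ir)
        wcet = analyze_wcet(candidate)
        if wcet < best_wcet:              # keep only provable improvements
            best_ir, best_wcet = candidate, wcet
    return best_ir, best_wcet
```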
embedded systems for real-time multimedia | 2006
Heiko Falk; Paul Lokuciejewski; Henrik Theiling
This paper presents techniques to integrate worst-case execution time (WCET) data into a compiler. A tight integration of WCET analysis into compilers is strongly desired, but only a few ad-hoc approaches have been reported so far. Previous work mainly used self-written WCET estimators with limited functionality and precision during compilation. A very tight integration of a high-quality WCET analyzer into a compiler had not yet been achieved. This work is the first to present such a tight coupling between a compiler and the WCET analyzer aiT. This is done by automatically translating the assembly-like contents of the compiler's low-level intermediate representation (LLIR) to aiT's exchange format CRL2. Additionally, the results produced by aiT are automatically collected and re-imported into the compiler infrastructure. The work described in this paper is smoothly integrated into a C compiler for the Infineon TriCore processor. It opens up new possibilities for the design of WCET-aware optimizations in the future. The concepts for extending the compiler structure are kept very general, so they are not limited to WCET information. Rather, it is also possible to use our concepts for multi-objective optimization of, e.g., best-case execution time (BCET) or energy dissipation.
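The data flow of such a coupling can be sketched generically: export the relevant low-level program representation, invoke the external analyzer, and read the results back into the compiler. All names and the JSON exchange format below are placeholders; the real implementation translates LLIR to CRL2 and imports aiT's results, whose actual formats and interfaces are not reproduced here.

```python
# Data-flow sketch of a compiler/WCET-analyzer coupling with invented formats.
import json, subprocess, tempfile, os

def run_external_wcet_analysis(basic_blocks, analyzer_cmd):
    """basic_blocks: list of dicts with 'name' and 'instructions'.
    analyzer_cmd: command line of a (hypothetical) WCET analyzer that reads
    a JSON file and writes per-block cycle counts to stdout as JSON."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump({"blocks": basic_blocks}, f)
        export_path = f.name
    try:
        out = subprocess.run(analyzer_cmd + [export_path],
                             capture_output=True, text=True, check=True)
        per_block_wcet = json.loads(out.stdout)   # {"block name": cycles}
    finally:
        os.unlink(export_path)
    return per_block_wcet   # to be attached back to the compiler's IR
```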
ACM Transactions on Embedded Computing Systems | 2014
Sudipta Chattopadhyay; Lee Kee Chong; Abhik Roychoudhury; Timon Kelter; Peter Marwedel; Heiko Falk
With the advent of multi-core architectures, worst-case execution time (WCET) analysis has become an increasingly difficult problem. In this paper, we propose a unified WCET analysis framework for multi-core processors featuring both a shared cache and a shared bus. Compared to previous work, ours differs by modeling the interaction of the shared cache and the shared bus with other basic micro-architectural components (e.g. pipeline and branch predictor). In addition, our framework does not assume a timing-anomaly-free multi-core architecture for computing the WCET. A detailed experimental evaluation suggests that we can obtain reasonably tight WCET estimates for a wide range of benchmark programs.
euromicro conference on real-time systems | 2008
Paul Lokuciejewski; Heiko Falk; Peter Marwedel
Procedure Positioning is a well-known compiler optimization aiming at improving the instruction cache behavior. Mapping procedures that call each other frequently to contiguous memory locations avoids overlapping cache lines and thus decreases the number of cache conflict misses. In the standard literature, these positioning techniques are guided by execution profile data and focus on improved average-case performance. We present two novel positioning optimizations driven by worst-case execution time (WCET) information to effectively minimize a program's worst-case behavior. WCET reductions of 10% on average are achieved. Moreover, a combination of positioning and the WCET-driven Procedure Cloning optimization proposed in [14] is presented, improving the WCET by 36% on average.
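A simplified, Pettis-Hansen-style chain merge can illustrate the positioning idea, with the difference that in the WCET-driven setting the edge weights would come from worst-case rather than average-case profiling data. The call graph, procedure names and weights below are invented for this sketch.

```python
# Greedy sketch of procedure positioning: repeatedly merge the two chains of
# procedures connected by the heaviest remaining call edge, so that callers
# and callees relevant for the worst case end up adjacent in memory.
def position_procedures(procedures, call_edges):
    """call_edges: dict (caller, callee) -> call frequency on the worst-case path."""
    chains = {p: [p] for p in procedures}           # every procedure starts alone
    for (a, b), _ in sorted(call_edges.items(), key=lambda e: -e[1]):
        ca, cb = chains[a], chains[b]
        if ca is cb:
            continue                                # already placed together
        merged = ca + cb                            # lay the two chains back to back
        for p in merged:
            chains[p] = merged
    layout, seen = [], set()                        # distinct chains -> memory layout
    for p in procedures:
        if p not in seen:
            layout.extend(chains[p])
            seen.update(chains[p])
    return layout

print(position_procedures(
    ["main", "fft", "filter", "log"],
    {("main", "fft"): 120, ("fft", "filter"): 300, ("main", "log"): 5}))
# -> ['main', 'fft', 'filter', 'log']
```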