Florian Brandner | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Florian Brandner is active.

Explore More

Publication

Featured researches published by Florian Brandner.

conference on object-oriented programming systems, languages, and applications | 2010

SPUR: a trace-based JIT compiler for CIL

Michael Bebenita; Florian Brandner; Manuel Fähndrich; Francesco Logozzo; Wolfram Schulte; Nikolai Tillmann; Herman Venter

Tracing just-in-time compilers (TJITs) determine frequently executed traces (hot paths and loops) in running programs and focus their optimization effort by emitting optimized machine code specialized to these traces. Prior work has established this strategy to be especially beneficial for dynamic languages such as JavaScript, where the TJIT interfaces with the interpreter and produces machine code from the JavaScript trace. This direct coupling with a JavaScript interpreter makes it difficult to harness the power of a TJIT for other components that are not written in JavaScript, e.g., the DOM implementation or the layout engine inside a browser. Furthermore, if a TJIT is tied to a particular high-level language interpreter, it is difficult to reuse it for other input languages as the optimizations are likely targeted at specific idioms of the source language. To address these issues, we designed and implemented a TJIT for Microsofts Common Intermediate Language CIL (the target language of C#, VisualBasic, F#, and many other languages). Working on CIL enables TJIT optimizations for any program compiled to this platform. In addition, to validate that the performance gains of a TJIT for JavaScript do not depend on specific idioms of JavaScript that are lost in the translation to CIL, we provide a performance evaluation of our JavaScript runtime which translates JavaScript to CIL and then runs on top of our CIL TJIT.

networks on chips | 2012

A Statically Scheduled Time-Division-Multiplexed Network-on-Chip for Real-Time Systems

Martin Schoeberl; Florian Brandner; Jens Sparsø; Evangelia Kasapaki

This paper explores the design of a circuit-switched network-on-chip (NoC) based on time-division-multiplexing (TDM) for use in hard real-time systems. Previous work has primarily considered application-specific systems. The work presented here targets general-purpose hardware platforms. We consider a system with IP-cores, where the TDM-NoC must provide directed virtual circuits - all with the same bandwidth - between all nodes. This may not be a frequent scenario, but a general platform should provide this capability, and it is an interesting point in the design space to study. The paper presents an FPGA-friendly hardware design, which is simple, fast, and consumes minimal resources. Furthermore, an algorithm to find minimum-period schedules for all-to-all virtual circuits on top of typical physical NoC topologies like 2D-mesh, torus, bidirectional torus, tree, and fat-tree is presented. The static schedule makes the NoC time-predictable and enables worst-case execution time analysis of communicating real-time tasks.

design, automation, and test in europe | 2011

Towards a Time-predictable Dual-Issue Microprocessor: The Patmos Approach

Martin Schoeberl; Pascal Schleuniger; Wolfgang Puffitsch; Florian Brandner; Christian W. Probst; Sven Karlsson; Tommy Thorn

Current processors are optimized for average case performance, often leading to a high worst-case execution time (WCET). Many architectural features that increase the average case performance are hard to be modeled for the WCET analysis. In this paper we present Patmos, a processor optimized for low WCET bounds rather than high average case performance. Patmos is a dual- issue, statically scheduled RISC processor. The instruction cache is organized as a method cache and the data cache is organized as a split cache in order to simplify the cache WCET analysis. To fill the dual-issue pipeline with enough useful instructions, Patmos relies on a customized compiler. The compiler also plays a central role in optimizing the application for the WCET instead of average case performance.

acm symposium on applied computing | 2010

RTTM: real-time transactional memory

Martin Schoeberl; Florian Brandner; Jan Vitek

Hardware transactional memory is a promising synchronization technology for chip-multiprocessors. It simplifies programming of concurrent applications and allows for higher concurrency than lock based synchronization. Standard transactional memory is optimized for average case throughput, but for real-time systems we are interested in worst-case execution times. We propose real-time transactional memory (RTTM) as a time-predictable synchronization solution for chip-multiprocessors in real-time systems. We define the hardware for time-predictable transactions and provide a bound for the maximum transaction retries. The proposed RTTM is evaluated with a simulation of a Java chip-multiprocessor.

international symposium on object/component/service-oriented real-time distributed computing | 2013

A time-predictable stack cache

Sahar Abbaspour; Florian Brandner; Martin Schoeberl

Real-time systems need time-predictable architectures to support static worst-case execution time (WCET) analysis. One architectural feature, the data cache, is hard to analyze when different data areas (e.g., heap allocated and stack allocated data) share the same cache. This sharing leads to less precise results of the cache analysis part of the WCET analysis. Splitting the data cache for different data areas enables composable data cache analysis. The WCET analysis tool can analyze the accesses to these different data areas independently. In this paper we present the design and implementation of a cache for stack allocated data. Our port of the LLVM C++ compiler supports the management of the stack cache. The combination of stack cache instructions and the hardware implementation of the stack cache is a further step towards time-predictable architectures.

compilers, architecture, and synthesis for embedded systems | 2007

Compiler generation from structural architecture descriptions

Florian Brandner; Dietmar Ebner; Andreas Krall

With increasing complexity of modern embedded systems, the availability of highly optimizing compilers becomes more and more important. At the same time, application specific instruction-set processors (ASIPs) are used to fine-tune hardware platforms to the intended application, demanding the availability of retargetable components throughout thewhole tool chain. A very promising approach is to model the target architecture using a dedicated description language that is rich enough to generate hardware components and the required tool chain, e.g., assembler, linker, simulator, and compiler. In this work we present a new structural architecture description language (ADL) that is used to derive the architecture dependent components of a compiler backend - most notably an instruction selector based on tree pattern matching. We combine our backend with gcc, thereby opening up the way for a large number of readily available high level optimizations. Experimental results show that the automatically derived code generator is competitive in comparison to a handcrafted compiler backend.

languages, compilers, and tools for embedded systems | 2008

Generalized instruction selection using SSA -graphs

Dietmar Ebner; Florian Brandner; Bernhard Scholz; Andreas Krall; Peter Wiedermann; Albrecht Kadlec

Instruction selection is a well-studied compiler phase that translates the compilers intermediate representation of programs to a sequence of target-dependent machine instructions optimizing for various compiler objectives (e.g. speed and space). Most existing instruction selection techniques are limited to the scope of a single statement or a basic block and cannot cope with irregular instruction sets that are frequently found in embedded systems. We consider an optimal technique for instruction selection that uses Static Single Assignment (SSA) graphs as an intermediate representation of programs and employs the Partitioned Boolean Quadratic Problem (PBQP) for finding an optimal instruction selection. While existing approaches are limited to instruction patterns that can be expressed in a simple tree structure, we consider complex patterns producing multiple results at the same time including pre/post increment addressing modes, div-mod instructions, and SIMD extensions frequently found in embedded systems. Although both instruction selection on SSA-graphs and PBQP are known to be NP-complete, the problem can be solved efficiently - even for very large instances. Our approach has been implemented in LLVM for an embedded ARMv5 architecture. Extensive experiments show speedups of up to 57% on typical DSP kernels and up to 10% on SPECINT 2000 and MiBench benchmarks. All of the test programs could be compiled within less than half a minute using a heuristic PBQP solver that solves 99.83% of all instances optimally.

real-time networks and systems | 2013

Static analysis of worst-case stack cache behavior

Alexander Jordan; Florian Brandner; Martin Schoeberl

Utilizing a stack cache in a real-time system can aid predictability by avoiding interference that heap memory traffic causes on the data cache. While loads and stores are guaranteed cache hits, explicit operations are responsible for managing the stack cache. The behavior of these operations can be analyzed statically. We present algorithms that derive worst-case bounds on the latency-inducing operations of the stack cache. Their results can be used by a static WCET tool. By breaking the analysis down into subproblems that solve intra-procedural data-flow analysis and path searches on the call-graph, the worst-case bounds can be efficiently yet precisely determined. Our evaluation using the MiBench benchmark suite shows that only 37% and 21% of potential stack cache operations actually store to and load from memory, respectively. Analysis times are modest, on average running between 0.46s and 1.30s per benchmark, depending on the size of the stack cache.

languages compilers and tools for embedded systems | 2006

Effective compiler generation by architecture description

Stefan Farfeleder; Andreas Krall; Edwin Steiner; Florian Brandner

Embedded systems have an extremely short time to market and therefore require easily retargetable compilers. Architecture description languages (ADLs) provide a single concise architecture specification for the generation of hardware, instruction set simulators and compilers. In this article, we present an ADL for compiler generation. From a specification, we can derive an optimized tree pattern matching instruction selector, a register allocator and an instruction scheduler. Compared to a hand-crafted back end, the generated compiler produces smaller and faster code.The ADL is rich enough that other tools, such as assemblers, linkers, simulators and documentation, can all be obtained from a single specification.

Proceedings of the 20th International Conference on Real-Time and Network Systems | 2012

Static routing in symmetric real-time network-on-chips

Florian Brandner; Martin Schoeberl

With the rising number of cores on a single chip the question on how to organize the communication among those cores becomes more and more relevant. A common solution is to use a network-on-chip (NoC) that provides communication bandwidth, routing, and arbitration among the cores. The use of NoCs in real-time systems is problematic, since the shared network and all cores connected to it have to be analyzed to derive time bounds of real-time tasks. We propose to use a statically scheduled, time-division-multiplexed NoC design that allows a decoupled analysis of individual real-time tasks. Our network provides virtual circuits between all cores. These virtual circuits are implemented by delivering messages periodically on a static, fixed routing schedule. Since the routing does not change, it can be pre-computed offline. This work focuses on the computation of routing schedules for symmetric NoC topologies, e.g., torus and hyper-cube. Due to the symmetry, the all-to-all communication can be modeled via simplified communication patterns that are concurrently processed by all routers. The scheduling problem is solved by a heuristic that tries to maximize the overlap of active patterns. Our experiments show that, for larger networks, our heuristic yields schedule lengths that are only 15% to 20% longer than theoretical lower bounds.

Explore More