Benny Akesson
Czech Technical University in Prague
Publications
Featured research published by Benny Akesson.
embedded and real-time computing systems and applications | 2008
Benny Akesson; Liesbeth Steffens; Eelke Strooisma; Kees Goossens
The convergence of application domains in new systems-on-chip (SoC) results in systems with many applications with a mix of soft and hard real-time requirements. To reduce cost, resources, such as memories and interconnect, are shared between applications. However, resource sharing introduces interference between the sharing applications, making it difficult to satisfy their real-time requirements. Existing arbiters do not efficiently satisfy the requirements of applications in SoCs, as they either couple rate or allocation granularity to latency, or cannot run at high speeds in hardware with a low-cost implementation. The contribution of this paper is an arbiter called credit-controlled static-priority (CCSP), consisting of a rate regulator and a static-priority scheduler. The rate regulator isolates applications by regulating the amount of provided service in a way that decouples allocation granularity and latency. The static-priority scheduler decouples latency and rate, such that low latency can be provided to any application, regardless of the allocated rate. We show that CCSP belongs to the class of latency-rate servers and guarantees the allocated rate within a maximum latency, as required by hard real-time applications. We present a hardware implementation of the arbiter in the context of a DDR2 SDRAM controller. An instance with six ports running at 200 MHz requires an area of 0.0223 mm² in a 90 nm CMOS process.
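The two-stage structure described in this abstract, a rate regulator feeding a static-priority scheduler, can be sketched in a few lines of C. This is a simplified illustration only; the credit accounting, the eligibility threshold, and all identifiers below are assumptions chosen for clarity, not the exact CCSP bookkeeping from the paper.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PORTS 6          /* matches the six-port instance mentioned above */
#define CREDIT_SCALE 1024    /* fixed-point scale for fractional rates */

/* Per-port accounting state of the (simplified) rate regulator. */
typedef struct {
    uint32_t rate;        /* allocated rate as a fraction of capacity * CREDIT_SCALE */
    int64_t  credits;     /* accumulated service credits (fixed point) */
    int      priority;    /* static priority, lower value = higher priority */
    bool     request;     /* port has a pending request this cycle */
} port_t;

/* One arbitration decision per service cycle:
 * 1. The rate regulator replenishes credits according to the allocated rate
 *    and marks a port eligible only if it has enough credits for one service unit.
 * 2. The static-priority scheduler picks the highest-priority eligible port,
 *    which is then charged one service unit. */
int ccsp_arbitrate(port_t ports[NUM_PORTS])
{
    int winner = -1;

    for (int i = 0; i < NUM_PORTS; i++) {
        ports[i].credits += ports[i].rate;                  /* replenish */
        bool eligible = ports[i].request &&
                        ports[i].credits >= CREDIT_SCALE;   /* within allocation */
        if (eligible &&
            (winner < 0 || ports[i].priority < ports[winner].priority))
            winner = i;
    }

    if (winner >= 0)
        ports[winner].credits -= CREDIT_SCALE;              /* charge one unit */

    return winner;  /* -1 if no eligible requestor this cycle */
}
```

In this sketch a port never receives more than its allocated rate in the long run, while a high-priority port that stays within its allocation sees low latency regardless of how small that allocation is, which is the decoupling of latency and rate the abstract refers to.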
design, automation, and test in europe | 2011
Benny Akesson; Kees Goossens
Designing multi-processor systems-on-chips becomes increasingly complex, as more applications with real-time requirements execute in parallel. System resources, such as memories, are shared between applications to reduce cost, causing their timing behavior to become inter-dependent. Using conventional simulation-based verification, this requires all concurrently executing applications to be verified together, resulting in a rapidly increasing verification complexity. Predictable and composable systems have been proposed to address this problem. Predictable systems provide bounds on performance, enabling formal analysis to be used as an alternative to simulation. Composable systems isolate applications, enabling them to be verified independently. Predictable and composable systems are built from predictable and composable resources. This paper presents three general techniques to implement and model predictable and composable resources, and demonstrates their applicability in the context of a memory controller. The architecture of the memory controller is general and supports both SRAM and DDR2/DDR3 SDRAM and a wide range of arbiters, making it suitable for many predictable and composable systems. The modeling approach is based on a shared-resource abstraction that covers any combination of supported memory and arbiter and enables system-level performance analysis with a variety of well-known frameworks, such as network calculus or data-flow analysis.
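The shared-resource abstraction mentioned above corresponds to the latency-rate (LR) server model, which characterizes the combination of a memory and an arbiter by a service latency Θ and an allocated rate ρ. The following LaTeX fragment sketches the standard LR guarantee; the concrete values of Θ and ρ depend on the memory and arbiter configuration and are not specified here.

```latex
% Minimum service guaranteed by a latency-rate server with
% service latency \Theta and allocated rate \rho:
\beta(t) = \rho \cdot \max(0,\; t - \Theta)

% For a request of size s arriving at time t_a at the start of a busy period,
% this abstraction yields the finishing-time bound
t_f \le t_a + \Theta + \frac{s}{\rho}
```

Bounds of this form are what allow the same memory-controller model to be plugged into system-level frameworks such as network calculus or data-flow analysis.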
embedded software | 2015
Martin Schoeberl; Sahar Abbaspour; Benny Akesson; Neil C. Audsley; Raffaele Capasso; Jamie Garside; Kees Goossens; Sven Goossens; Scott Hansen; Reinhold Heckmann; Stefan Hepp; Benedikt Huber; Alexander Jordan; Evangelia Kasapaki; Jens Knoop; Yonghui Li; Daniel Prokesch; Wolfgang Puffitsch; Peter P. Puschner; André Rocha; Cláudio Silva; Jens Sparsø; Alessandro Tocchi
Real-time systems need time-predictable platforms to allow static analysis of the worst-case execution time (WCET). Standard multi-core processors are optimized for the average case and are hardly analyzable. Within the T-CREST project we propose novel solutions for time-predictable multi-core architectures that are optimized for the WCET instead of the average-case execution time. The resulting time-predictable resources (processors, interconnect, memory arbiter, and memory controller) and tools (compiler, WCET analysis) are designed to ease WCET analysis and to optimize WCET performance. Compared to other processors, the WCET performance is outstanding. The T-CREST platform is evaluated with two industrial use cases. An application from the avionic domain demonstrates that tasks executing on different cores do not interfere with respect to their WCET. A signal processing application from the railway domain shows that the WCET can be reduced for computation-intensive tasks when distributing the tasks on several cores and using the network-on-chip for communication. With three cores the WCET is improved by a factor of 1.8 and with 15 cores by a factor of 5.7. The T-CREST project is the result of a collaborative research and development project executed by eight partners from academia and industry. The European Commission funded T-CREST.
design, automation, and test in europe | 2013
Sven Goossens; Benny Akesson; Kees Goossens
Complex Systems-on-Chips (SoC) are mixed time-criticality systems that have to support firm real-time (FRT) and soft real-time (SRT) applications running in parallel. This is challenging for critical SoC components, such as memory controllers. Existing memory controllers focus on either firm real-time or soft real-time applications. FRT controllers use a close-page policy that maximizes worst-case performance and ignore opportunities to exploit locality, since it cannot be guaranteed. Conversely, SRT controllers try to reduce latency and consequently processor stalling by speculating on locality. They often use an open-page policy that sacrifices guaranteed performance, but is beneficial in the average case. This paper proposes a conservative open-page policy that improves average-case performance of a FRT controller in terms of bandwidth and latency without sacrificing real-time guarantees. As a result, the memory controller efficiently handles both FRT and SRT applications. The policy keeps pages open as long as possible without sacrificing guarantees and captures locality in this window. Experimental results show that on average 70% of the locality is captured for applications in the CHStone benchmark, reducing the execution time by 17% compared to a close-page policy. The effectiveness of the policy is also evaluated in a multi-application use-case, and we show that the overall average-case performance improves if there is at least one FRT or SRT application that exploits locality.
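A rough sketch of how such a conservative open-page decision could look is given below. The window computation, the field names, and the hit/miss classification are assumptions made for illustration; the paper's actual policy is defined in terms of its memory access patterns.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative per-bank state for a conservative open-page policy.
 * The idea (simplified): a page is left open only during a window in which
 * closing it would still meet the timing of the worst-case close-page
 * schedule, so guarantees are unaffected while row hits inside the window
 * are served faster. Field names and the deadline computation are
 * assumptions for this sketch, not the paper's exact mechanism. */
typedef struct {
    bool     page_open;
    uint32_t open_row;
    uint64_t close_deadline;  /* last cycle at which a precharge must start */
} bank_state_t;

typedef enum { ACCESS_OPEN_HIT, ACCESS_CLOSED } access_type_t;

access_type_t classify_access(const bank_state_t *bank,
                              uint32_t row, uint64_t now)
{
    /* A row hit is only exploited if it arrives before the close deadline;
     * otherwise the bank has already been precharged (as with close-page)
     * and the worst-case behavior is unchanged. */
    if (bank->page_open && bank->open_row == row && now <= bank->close_deadline)
        return ACCESS_OPEN_HIT;
    return ACCESS_CLOSED;
}
```

Requests that hit an open row inside the window skip the activate, which is where the reported locality capture and execution-time reduction come from; requests outside the window see exactly the close-page worst case.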
Circuits and Systems | 2011
Benny Akesson; Anca Mariana Molnos; Andreas Hansson; Jude Ambrose Angelo; Kees Goossens
System-on-chip (SoC) design gets increasingly complex, as a growing number of applications are integrated in modern systems. Some of these applications have real-time requirements, such as a minimum throughput or a maximum latency. To reduce cost, system resources are shared between applications, making their timing behavior inter-dependent. Real-time requirements must hence be verified for all possible combinations of concurrently executing applications, which is not feasible with commonly used simulation-based techniques. This chapter addresses this problem using two complexity-reducing concepts: composability and predictability. Applications in a composable system are completely isolated and cannot affect each other's behavior, enabling them to be verified independently. Predictable systems, on the other hand, provide lower bounds on performance, allowing applications to be verified using formal performance analysis. Five techniques to achieve composability and/or predictability in SoC resources are presented, and we explain their implementation for processors, interconnect, and memories in our platform.
digital systems design | 2011
Karthik Chandrasekar; Benny Akesson; Kees Goossens
Power modeling and estimation has become one of the most defining aspects of designing modern embedded systems. In this context, DDR SDRAM memories contribute significantly to system power consumption, but lack accurate and generic power models. The most popular SDRAM power model, provided by Micron, is found to be inaccurate or insufficient for several reasons. First, it does not consider the power consumed when transitioning to power-down and self-refresh modes. Second, it employs the minimal timing constraints between commands from the SDRAM datasheets and not the actual duration between the commands as issued by an SDRAM memory controller. Finally, without adaptations, it can only be applied to a memory controller that employs a close-page policy and accesses a single SDRAM bank at a time. These critical issues with Micron's power model impact the accuracy and validity of the power values it reports, and resolving them forms the focus of our work. In this paper, we propose an improved SDRAM power model that estimates power consumption during the state transitions to power-saving states, employs an SDRAM command trace to obtain the actual timings between the issued commands, and is generic and applicable to all DDRx SDRAMs, all memory controller policies, and all degrees of bank interleaving. We quantitatively compare the proposed model against the unmodified Micron model on power and energy for DDR3-800. We show differences of up to 60% in energy savings for the precharge power-down mode for a power-down duration of 14 cycles, and up to 80% for the self-refresh mode for a self-refresh duration of 560 cycles.
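The second improvement, using the actual command spacing from a trace rather than the minimal datasheet timings, can be illustrated with a small energy-accumulation loop. The state set, current values, and function names below are simplified placeholders, not the proposed model itself.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative energy accumulation from an SDRAM command trace, using the
 * actual time spent between commands rather than minimum datasheet timings.
 * The states and per-state currents are simplified placeholders; the real
 * model additionally accounts for power-down/self-refresh entry and exit
 * transitions and arbitrary degrees of bank interleaving. */
typedef enum { ST_ACTIVE, ST_PRECHARGED, ST_POWER_DOWN, ST_SELF_REFRESH } dram_state_t;

typedef struct {
    uint64_t     cycle;   /* cycle at which the command was issued */
    dram_state_t state;   /* DRAM state entered by this command */
} trace_entry_t;

double trace_energy(const trace_entry_t *trace, size_t n,
                    const double current_mA[4],  /* per-state current draw */
                    double vdd, double clk_period_s)
{
    double energy_J = 0.0;
    for (size_t i = 0; i + 1 < n; i++) {
        /* Duration comes from the trace, i.e. the controller's actual
         * command spacing, not the minimal timing constraint. */
        uint64_t cycles = trace[i + 1].cycle - trace[i].cycle;
        double   t_s    = (double)cycles * clk_period_s;
        energy_J += (current_mA[trace[i].state] / 1000.0) * vdd * t_s;
    }
    return energy_J;
}
```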
design, automation, and test in europe | 2014
Karthik Chandrasekar; Sven Goossens; Christian Weis; Martijn Koedam; Benny Akesson; Norbert Wehn; Kees Goossens
Manufacturing-time process (P) variations and run-time voltage (V) and temperature (T) variations can severely affect a DRAM's performance. To counter these effects, DRAM vendors provide substantial design-time PVT timing margins to guarantee correct DRAM functionality under worst-case operating conditions. Unfortunately, with technology scaling these timing margins have become large and very pessimistic for a majority of the manufactured DRAMs. While run-time variations are specific to operating conditions and their margins are therefore difficult to optimize, process variations are manufacturing-time effects, and excessive process margins can be reduced at run time, on a per-device basis, if properly identified. In this paper, we propose a generic post-manufacturing performance characterization methodology for DRAMs that identifies this excess in process margins for any given DRAM device at run time, while retaining the requisite margins for voltage (noise) and temperature variations. By doing so, the methodology ascertains the actual impact of process variations on the particular DRAM device and optimizes its access latencies (timings), thereby improving its overall performance. We evaluate this methodology on 48 DDR3 devices (from 12 DIMMs) and verify the derived timings under worst-case operating conditions, showing up to 33.3% and 25.9% reduction in DRAM read and write latencies, respectively.
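The general idea of per-device characterization, tightening a timing parameter only as far as the device demonstrably tolerates, can be sketched as a simple search loop. The test-harness interface and parameter names are hypothetical; the actual methodology characterizes several interdependent JEDEC timings and explicitly retains the voltage and temperature margins.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stress-test hook: returns true if the device still operates
 * correctly with the candidate timing (in cycles). */
typedef bool (*stress_test_fn)(uint32_t timing_cycles);

/* Tighten one timing parameter from the datasheet value downwards and keep
 * the smallest value that still passes the stress test. The datasheet value
 * is the fallback, so the result is never more aggressive than verified. */
uint32_t characterize_timing(stress_test_fn test,
                             uint32_t datasheet_cycles, uint32_t min_cycles)
{
    uint32_t best = datasheet_cycles;
    for (uint32_t t = datasheet_cycles; t >= min_cycles && t > 0; t--) {
        if (!test(t))
            break;      /* first failing value: stop tightening */
        best = t;       /* still passes: keep the tighter timing */
    }
    return best;
}
```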
international conference on hardware/software codesign and system synthesis | 2013
Sven Goossens; Jasper Kuijsten; Benny Akesson; Kees Goossens
Verifying real-time requirements of applications is increasingly complex on modern Systems-on-Chips (SoCs). More applications are integrated into one system due to power, area and cost constraints. Resource sharing makes their timing behavior interdependent, and as a result the verification complexity increases exponentially with the number of applications. Predictable and composable virtual platforms solve this problem by enabling verification in isolation, but designing SoC resources suitable to host such platforms is challenging. This paper focuses on a reconfigurable SDRAM controller for predictable and composable virtual platforms. The main contributions are: 1) A run-time reconfigurable SDRAM controller architecture, which allows trade-offs between guaranteed bandwidth, response time and power. 2) A methodology for offering composable service to memory clients, by means of composable memory patterns. 3) A reconfigurable Time-Division Multiplexing (TDM) arbiter and an associated reconfiguration protocol. The TDM slot allocations can be changed at run time, while the predictable and composable performance guarantees offered to active memory clients are unaffected by the reconfiguration. The SDRAM controller has been implemented as a TLM-level SystemC model, and in synthesizable VHDL for use on an FPGA.
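The third contribution, a TDM arbiter whose slot allocation can change at run time without disturbing active clients, can be sketched as follows. The frame size, field names, and the rule of activating a pending table only at a frame boundary are illustrative assumptions, not the exact reconfiguration protocol of the paper.

```c
#include <stdint.h>
#include <string.h>

#define TDM_SLOTS 16   /* illustrative frame size */

/* Sketch of a TDM arbiter with a run-time swappable slot table. To keep the
 * service of active clients composable, a new table only takes effect at a
 * frame boundary, so every client still receives its allocated slots of the
 * frame that is currently in progress. */
typedef struct {
    uint8_t  table[TDM_SLOTS];    /* client id per slot */
    uint8_t  pending[TDM_SLOTS];  /* table waiting to be activated */
    int      pending_valid;
    uint32_t slot;                /* current slot index */
} tdm_arbiter_t;

void tdm_request_reconfig(tdm_arbiter_t *a, const uint8_t new_table[TDM_SLOTS])
{
    memcpy(a->pending, new_table, TDM_SLOTS);
    a->pending_valid = 1;
}

/* Returns the client that owns the current slot and advances the frame;
 * a pending table is installed only when the frame wraps around. */
uint8_t tdm_next_slot(tdm_arbiter_t *a)
{
    uint8_t client = a->table[a->slot];
    a->slot = (a->slot + 1) % TDM_SLOTS;
    if (a->slot == 0 && a->pending_valid) {   /* frame boundary */
        memcpy(a->table, a->pending, TDM_SLOTS);
        a->pending_valid = 0;
    }
    return client;
}
```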
euromicro conference on real-time systems | 2014
Yonghui Li; Benny Akesson; Kees Goossens
Memory controller design is challenging as real-time embedded systems feature an increasing diversity of real-time and non-real-time applications with variable transaction sizes. To satisfy the requirements of the applications, tight bounds on the worst-case execution time (WCET) of memory transactions must be provided to real-time applications, while the lowest possible average execution time must be given to the rest. Existing real-time memory controllers cannot efficiently achieve this goal as they either bound the WCET by sacrificing the average execution time, or are not scalable to directly support variable transaction sizes, or both. In this paper, we propose to use dynamic command scheduling, which is capable of efficiently dealing with transactions with variable sizes. The three main contributions of this paper are: 1) a back-end architecture for a real-time memory controller with a dynamic command scheduling algorithm, 2) a formalization of the timings of the memory transactions for the proposed architecture and algorithm, and 3) two techniques to bound the WCET of transactions with both fixed and variable sizes, respectively. We experimentally evaluate the proposed memory controller and compare both the worst-case and average-case execution times of transactions to a state-of-the-art semi-static approach. The results demonstrate that dynamic command scheduling outperforms the semi-static approach by 33.4% in the average case and performs at least equally well in the worst case. We also show the WCET is tight for transactions with fixed and variable sizes, respectively.
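Dynamic command scheduling, as opposed to replaying precomputed command patterns, essentially means issuing each SDRAM command at the earliest cycle its timing constraints allow. The toy fragment below models only two constraints and uses invented names; a real back-end tracks the full JEDEC constraint set across banks and ranks.

```c
#include <stdint.h>

/* Per-bank record of when the most recent ACT and PRE commands were issued.
 * Only tRCD (ACT to RD/WR) and tRP (PRE to ACT) are modeled here. */
typedef struct {
    uint64_t last_act;   /* issue cycle of the last ACT to this bank */
    uint64_t last_pre;   /* issue cycle of the last PRE to this bank */
} bank_timing_t;

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Earliest cycle at which a read (or write) may be issued to this bank. */
uint64_t earliest_read(const bank_timing_t *b, uint64_t now, uint32_t tRCD)
{
    return max_u64(now, b->last_act + tRCD);
}

/* Earliest cycle at which a new activate may be issued to this bank. */
uint64_t earliest_activate(const bank_timing_t *b, uint64_t now, uint32_t tRP)
{
    return max_u64(now, b->last_pre + tRP);
}
```

Scheduling each command as soon as such constraints allow is what lets the controller adapt to variable transaction sizes at run time, while the paper's WCET bounds are obtained by analyzing the worst-case outcome of exactly this kind of earliest-issue rule.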
design, automation, and test in europe | 2013
Hardik Shah; Alois Knoll; Benny Akesson
The transition towards multi-processor systems with shared resources is challenging for real-time systems, since resource interference between concurrent applications must be bounded using timing analysis. There are two common approaches to this problem: 1) Detailed analysis that models the particular resource and arbiter cycle-accurately to achieve tight bounds. 2) Using temporal abstractions, such as latency-rate (LR) servers, to enable unified analysis for different resources and arbiters using well-known timing analysis frameworks. However, the use of abstraction typically implies reducing the tightness of analysis that may result in over-dimensioned systems, although this pessimism has not been properly investigated. This paper compares the two approaches in terms of worst-case execution time (WCET) of applications sharing an SDRAM memory under Credit-Controlled Static-Priority (CCSP) arbitration. The three main contributions are: 1) A detailed interference analysis of the SDRAM memory and CCSP arbiter. 2) Based on the detailed analysis, two optimizations are proposed to the LR analysis that increase the tightness of its interference bounds. 3) An experimental comparison of the two approaches that quantifies their impact on the WCET of applications from the CHStone benchmark.