Angeliki Kritikakou
University of Rennes
Publications
Featured research published by Angeliki Kritikakou.
real-time networks and systems | 2014
Angeliki Kritikakou; Christine Rochange; Madeleine Faugere; Claire Pagetti; Matthieu Roy; Sylvain Girbal; Daniel Gracia Pérez
When integrating mixed-critical systems on a multi/many-core platform, one challenge is to ensure predictability for high-criticality tasks and increased utilization for low-criticality tasks. In this paper, we address this problem when several high-criticality tasks with different deadlines, periods and offsets are concurrently executed on the system. We propose a distributed run-time WCET controller that works as follows: (1) locally, each critical task regularly checks whether the interferences due to the low-criticality tasks can still be tolerated, and otherwise decides their suspension; (2) globally, a master suspends and restarts the low-criticality tasks based on the requests received from the critical tasks. Our approach has been implemented as a software controller on a real multicore COTS system, yielding significant gains.
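A minimal sketch of the local decision such a controller performs, with invented numbers and hypothetical bookkeeping (a cycle counter, the remaining WCET in isolation, and a safety margin; none of these names come from the paper):

```c
#include <stdio.h>
#include <stdint.h>

static uint64_t current_time = 0;   /* cycles since the task was released */
static uint64_t wcet_left    = 80;  /* worst-case cycles left, in isolation */

static void request_low_crit_suspension(void) {
    puts("-> asking the master to suspend low-criticality tasks");
}

/* Called at regular check points inserted in the critical task: if
 * finishing in isolation would only barely fit before the deadline,
 * interferences can no longer be tolerated. */
static void wcet_check_point(uint64_t deadline, uint64_t margin) {
    uint64_t slack = deadline - current_time;
    if (wcet_left + margin >= slack)
        request_low_crit_suspension();
    else
        puts("   interferences still tolerable, keep sharing the cores");
}

int main(void) {
    const uint64_t deadline = 200, margin = 10;
    for (int step = 0; step < 4; ++step) {
        current_time += 48;   /* time passes, slowed down by interference */
        wcet_left    -= 15;   /* actual progress is faster than the WCET */
        wcet_check_point(deadline, margin);
    }
    return 0;
}
```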
euromicro conference on real-time systems | 2014
Angeliki Kritikakou; Claire Pagetti; Olivier Baldellon; Matthieu Roy; Christine Rochange
Although multi/many-core platforms enable the parallel execution of tasks, the sharing of resources may lead to long WCETs that fail to meet the real-time constraints of the system. A safe solution is then to execute the most critical tasks in isolation, followed by the execution of the remaining tasks. To improve system performance, we propose an approach where a critical task can run in parallel with less critical tasks as long as the real-time constraints are met. When no further interferences can be tolerated, the proposed run-time control suspends the low-criticality tasks until the termination of the critical task. In this paper, we describe the design and prove the correctness of our approach. To do so, a graph grammar is defined to formally model the critical task as a set of control flow graphs, on which a safe partial WCET analysis is applied and used at run-time to control the safe execution of the critical task.
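The partial WCET analysis rests on knowing, at each point of the critical task's control flow graph, the worst-case time remaining to the exit. A toy backward pass over a hand-made CFG illustrates how such remaining WCETs can be computed; the graph, block costs, and ordering below are invented for the example:

```c
#include <stdio.h>
#include <stdint.h>

#define N 5
/* Toy CFG, indexed so that every successor has a higher index. */
static const uint64_t cost[N] = { 10, 30, 25, 15, 5 };
/* succ[i][j] = index of the j-th successor of block i, -1 = none. */
static const int succ[N][2] = { {1,2}, {3,-1}, {3,-1}, {4,-1}, {-1,-1} };

int main(void) {
    uint64_t rwcet[N] = {0};
    /* Backward pass: rwcet(n) = cost(n) + max over successors. */
    for (int i = N - 1; i >= 0; --i) {
        uint64_t worst_succ = 0;
        for (int j = 0; j < 2; ++j) {
            int s = succ[i][j];
            if (s >= 0 && rwcet[s] > worst_succ)
                worst_succ = rwcet[s];
        }
        rwcet[i] = cost[i] + worst_succ;
    }
    for (int i = 0; i < N; ++i)
        printf("block %d: remaining WCET = %llu\n", i,
               (unsigned long long)rwcet[i]);
    return 0;
}
```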
The Journal of Supercomputing | 2015
Vasilios I. Kelefouras; Angeliki Kritikakou; Elissavet Papadima; Constantinos E. Goutis
In this paper, a new methodology for computing dense matrix-vector multiplication, for both embedded processors (without SIMD unit) and general-purpose processors (single- and multi-core, with SIMD unit), is presented. This methodology achieves higher execution speed than the state-of-the-art ATLAS library (speedups from 1.2 up to 1.45). This is achieved by fully exploiting the combination of software parameters (e.g., data reuse) and hardware parameters (e.g., data cache associativity), which are considered simultaneously as one problem and not separately, giving a smaller search space and high-quality solutions. The proposed methodology produces a different schedule for different values of (i) the number of levels of data cache; (ii) the data cache sizes; (iii) the data cache associativities; (iv) the data cache and main memory latencies; (v) the data array layout of the matrix; and (vi) the number of cores.
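To illustrate the flavor of a schedule parameterized by cache size, here is a sketch of a column-tiled matrix-vector multiply where the tile is derived from an assumed L1 size; the tiling formula is illustrative only, not the paper's methodology:

```c
#include <stdio.h>
#include <stdlib.h>

#define L1_BYTES 32768   /* assumed L1 data cache size */

static void mvm_blocked(int n, int m, const double *A,
                        const double *x, double *y)
{
    /* Spend roughly half of L1 on the reused slice of x. */
    int tile = (L1_BYTES / 2) / (int)sizeof(double);
    if (tile > m) tile = m;

    for (int jj = 0; jj < m; jj += tile)        /* column tiles: reuse x */
        for (int i = 0; i < n; ++i) {
            double acc = (jj == 0) ? 0.0 : y[i];
            int jend = jj + tile < m ? jj + tile : m;
            for (int j = jj; j < jend; ++j)
                acc += A[(size_t)i * m + j] * x[j];
            y[i] = acc;
        }
}

int main(void) {
    enum { N = 4, M = 3 };
    double A[N * M], x[M], y[N];
    for (int i = 0; i < N * M; ++i) A[i] = i + 1;
    for (int j = 0; j < M; ++j) x[j] = 1.0;
    mvm_blocked(N, M, A, x, y);
    for (int i = 0; i < N; ++i) printf("y[%d] = %g\n", i, y[i]);
    return 0;
}
```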
design, automation, and test in europe | 2017
Steven Derrien; Isabelle Puaut; Panayiotis Alefragis; Marcus Bednara; Harald Bucher; Clément David; Yann Debray; Umut Durak; Imen Fassi; Christian Ferdinand; Damien Hardy; Angeliki Kritikakou; Gerard K. Rauwerda; Simon Reder; Martin Sicks; Timo Stripf; Kim Sunesen; Timon D. ter Braak; Nikolaos S. Voros; Jürgen Becker
Parallel architectures are nowadays not confined to the domain of high-performance computing; they are also increasingly used in embedded time-critical systems. The ARGO H2020 project provides a programming paradigm and associated tool flow to exploit the full potential of such architectures in terms of development productivity, time-to-market, exploitation of the platform's computing power, and guaranteed real-time performance. In this paper we give an overview of the objectives of ARGO and explore the challenges introduced by our approach.
international conference on computer design | 2017
Lei Mo; Angeliki Kritikakou; Olivier Sentieys
Multicore architectures are now widely used in energy-constrained real-time systems, such as energy-harvesting wireless sensor networks. To take advantage of these multicores, there is a strong need to balance system energy, performance and Quality-of-Service (QoS). The Imprecise Computation (IC) model splits a task into a mandatory and an optional part, allowing QoS to be traded off. The problem of mapping, i.e., allocating and scheduling, IC tasks onto a set of processors to maximize system QoS under real-time and energy constraints can be formulated as a Mixed Integer Linear Programming (MILP) problem. However, state-of-the-art solving techniques either incur high computational complexity or only achieve feasible (suboptimal) solutions. In this paper, we develop an effective decomposition-based approach that achieves an optimal solution while reducing computational complexity. It decomposes the original problem into two smaller, easier-to-solve problems: a master problem for IC-task allocation and a slave problem for IC-task scheduling. We also provide a comprehensive optimality analysis of the proposed method. Through simulations, we validate and demonstrate the performance of the proposed method, which yields an average 55% QoS improvement with respect to published techniques.
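As a heavily reduced illustration of the master/slave split (not the paper's MILP formulation), the sketch below lets a master enumerate allocations of three IC tasks onto two processors, while a slave greedily extends optional parts under a per-processor time budget and reports the resulting QoS; all task parameters are invented:

```c
#include <stdio.h>

#define T 3                               /* number of IC tasks */
static const int mand[T] = { 4, 3, 5 };   /* mandatory execution times */
static const int opt [T] = { 3, 4, 2 };   /* maximum optional times */
static const int BUDGET  = 9;             /* time budget per processor */

/* Slave: for a fixed allocation, extend optional parts while time allows. */
static int slave_qos(const int alloc[T]) {
    int load[2] = { 0, 0 }, qos = 0;
    for (int i = 0; i < T; ++i) load[alloc[i]] += mand[i];
    if (load[0] > BUDGET || load[1] > BUDGET) return -1;   /* infeasible */
    for (int i = 0; i < T; ++i) {
        int room = BUDGET - load[alloc[i]];
        int run  = room < opt[i] ? room : opt[i];
        load[alloc[i]] += run;
        qos += run;                       /* QoS = executed optional time */
    }
    return qos;
}

int main(void) {
    int best = -1, best_alloc = 0;
    for (int a = 0; a < (1 << T); ++a) {  /* master: enumerate allocations */
        int alloc[T];
        for (int i = 0; i < T; ++i) alloc[i] = (a >> i) & 1;
        int q = slave_qos(alloc);
        if (q > best) { best = q; best_alloc = a; }
    }
    printf("best QoS = %d with allocation mask %d\n", best, best_alloc);
    return 0;
}
```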
ieee computer society annual symposium on vlsi | 2017
Rafail Psiakis; Angeliki Kritikakou; Olivier Sentieys
Critical applications require reliable processors that combine performance with low cost and energy consumption. Very Long Instruction Word (VLIW) processors have inherent resource redundancy that is not constantly used, due to the application's fluctuating Instruction Level Parallelism (ILP). Reliability through idle-slot utilization is explored either at compile time, increasing code size and storage requirements, or at run time only inside the current instruction bundle, adding unnecessary time slots and degrading performance. To address this issue, we propose a technique that explores the idle slots inside and across original and replicated instruction bundles, reclaiming the idle slots more efficiently and creating a compact schedule. To achieve this, a dependency analysis is applied at run time. The execution of both original and replicated instructions is allowed on any adequate functional unit, providing higher flexibility in instruction scheduling. The proposed technique achieves up to 26% reduction in performance degradation over existing approaches.
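A toy model of the idea, with invented bundles: each instruction's replica is placed in the first idle slot at or after its own bundle, within or across bundles; the paper's run-time dependency analysis is omitted here:

```c
#include <stdio.h>

#define BUNDLES 3
#define WIDTH   4
#define NOP     0

static int code[BUNDLES][WIDTH] = {   /* nonzero = instruction id */
    { 1, 2, NOP, NOP },
    { 3, NOP, NOP, NOP },
    { 4, 5, NOP, NOP },
};

/* Place a replica of instruction `id` no earlier than bundle `b`. */
static int place_replica(int id, int b) {
    for (; b < BUNDLES; ++b)
        for (int s = 0; s < WIDTH; ++s)
            if (code[b][s] == NOP) { code[b][s] = -id; return b; }
    return -1;   /* would need an extra bundle: performance degradation */
}

int main(void) {
    for (int b = 0; b < BUNDLES; ++b)
        for (int s = 0; s < WIDTH; ++s)
            if (code[b][s] > NOP)
                place_replica(code[b][s], b);
    for (int b = 0; b < BUNDLES; ++b) {
        for (int s = 0; s < WIDTH; ++s)
            printf("%3d ", code[b][s]);   /* negative id = replica */
        putchar('\n');
    }
    return 0;
}
```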
Computer Languages, Systems & Structures | 2015
Vasilios I. Kelefouras; Angeliki Kritikakou; Constantinos E. Goutis
It is well known that today's compilers and state-of-the-art libraries have three major drawbacks. First, the compiler sub-problems are optimized separately; this is not efficient because optimizing the sub-problems separately gives a different schedule for each one, and these schedules cannot coexist, as refining one causes the degradation of another. Second, they take into account only part of the specific algorithm's information. Third, they take into account only a few hardware architecture parameters. These approaches cannot give an optimal solution. In this paper, a new methodology/pre-compiler is introduced which speeds up loop kernels by overcoming the above problems. This methodology solves four of the major scheduling sub-problems together as one problem and not separately: finding the schedules with the minimum numbers of (i) L1 data cache accesses, (ii) L2 data cache accesses, (iii) main memory data accesses, and (iv) addressing instructions. First, the exploration space (possible solutions) is found according to the algorithm's information, e.g., array subscripts. Then, the exploration space is decreased by orders of magnitude by applying constraint propagation to the software and hardware parameters. We take the C code and the memory architecture parameters as input and automatically produce a new, faster C code; this code cannot be obtained by applying the existing compiler transformations to the original code. The proposed methodology has been evaluated for five well-known algorithms on both general-purpose and embedded processors; it is compared with the gcc and clang compilers and with iterative compilation. Highlights: (i) for the first time, the software optimization problem is addressed theoretically; (ii) the software structure and the hardware architecture parameters are exploited together; (iii) the major scheduling sub-problems are solved together as one problem and not separately.
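The constraint-propagation step can be pictured on a toy case: candidate tile sizes for a kernel are pruned by closed-form hardware constraints (the working set must fit in an assumed L1, and the arrays must fit in the associativity ways) instead of being tested empirically; the constraints and numbers below are invented for illustration:

```c
#include <stdio.h>

#define L1_BYTES 32768   /* assumed L1 data cache size */
#define L1_WAYS  8       /* assumed L1 associativity */
#define ELEM     8       /* sizeof(double) */

int main(void) {
    int kept = 0, total = 0;
    for (int t = 8; t <= 1024; t *= 2) {   /* candidate tile sizes */
        ++total;
        long ws = (long)t * t * ELEM       /* tile of matrix A */
                + 2L * t * ELEM;           /* tiles of x and y */
        int ways = 1 + 2;                  /* one way for A, one each for x, y */
        if (ws <= L1_BYTES && ways <= L1_WAYS) {
            printf("tile %4d kept (working set %ld bytes)\n", t, ws);
            ++kept;
        }
    }
    printf("%d of %d candidates survive propagation\n", kept, total);
    return 0;
}
```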
international new circuits and systems conference | 2017
Rafail Psiakis; Angeliki Kritikakou; Olivier Sentieys
Error occurrence in embedded systems has significantly increased. Although inherent resource redundancy exists in processors such as Very Long Instruction Word (VLIW) processors, it is not always used, due to the low Instruction Level Parallelism (ILP) of applications. Existing approaches exploit these additional resources to provide fault tolerance. When permanent and soft errors coexist, spare units have to be used, or the executed program has to be modified through self-repair or by using several stored versions. However, these solutions introduce high area overhead for the additional resources, time overhead for the execution of the repair algorithm, and storage overhead for the multiple versions. To address these limitations, a hardware mechanism is proposed which, at run time, replicates the instructions and schedules them in the idle slots, considering the resource constraints. If a resource becomes faulty, the proposed approach efficiently rebinds both the original and replicated instructions during execution. In this way, the area overhead is reduced, as no spare resources are used, while no time or storage overhead is required. Results show up to 49% performance gain over existing techniques.
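The rebinding step can be sketched as follows, with an invented bundle and fault map: an instruction bound to a faulty unit is moved at issue time to a healthy idle slot of the same bundle, or reported as spilled to a later bundle (the actual hardware mechanism is more involved):

```c
#include <stdio.h>
#include <stdbool.h>

#define WIDTH 4
#define NOP   0

static bool faulty[WIDTH] = { false, false, true, false };  /* FU2 broken */

/* Rebind one bundle in place; returns the spilled instruction id (0 if none). */
static int rebind(int bundle[WIDTH]) {
    for (int s = 0; s < WIDTH; ++s) {
        if (bundle[s] != NOP && faulty[s]) {
            int id = bundle[s];
            bundle[s] = NOP;
            for (int d = 0; d < WIDTH; ++d)
                if (bundle[d] == NOP && !faulty[d]) {
                    bundle[d] = id;   /* moved to a healthy idle slot */
                    id = 0;
                    break;
                }
            if (id) return id;        /* no healthy idle slot left */
        }
    }
    return 0;
}

int main(void) {
    int bundle[WIDTH] = { 7, NOP, 9, NOP };   /* instr 9 sits on faulty FU2 */
    int spill = rebind(bundle);
    for (int s = 0; s < WIDTH; ++s) printf("%2d ", bundle[s]);
    printf("| spilled: %d\n", spill);
    return 0;
}
```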
ACM Transactions on Design Automation of Electronic Systems | 2017
Angeliki Kritikakou; Thibaut Marty; Matthieu Roy
In real-time mixed-critical systems, Worst-Case Execution Time (WCET) analysis is required to guarantee that timing constraints are respected—at least for high-criticality tasks. However, the WCET is pessimistic compared to the real execution time, especially for multicore platforms. As WCET computation considers the worst-case scenario, whenever a high-criticality task accesses a shared resource on a multicore platform, it is assumed that all cores use that resource concurrently. This pessimism in WCET computation leads to dramatic underutilization of the platform resources, or even to missed timing constraints. In order to increase resource utilization while preserving real-time guarantees for high-criticality tasks, previous works proposed a runtime control system to monitor and decide when the interferences from low-criticality tasks can no longer be tolerated. However, in these initial approaches, the points where the controller is executed were statically predefined. In this work, we propose a dynamic runtime control which adapts its observations to online temporal properties, further increasing the dynamism of the approach and mitigating the unnecessary overhead incurred by existing static approaches. Our dynamic adaptive approach allows the ongoing execution of tasks to be controlled based on runtime information, and further increases the gains in resource utilization compared with static approaches.
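A toy rendering of the adaptive idea, with invented numbers and an invented step rule: the controller derives the next observation point from the current slack, so checks become sparse while the margin is comfortable and a suspension is triggered once it shrinks:

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t deadline = 400, t = 0, wcet_left = 220;
    while (t < deadline && wcet_left > 0) {
        uint64_t slack = deadline - t - wcet_left;
        if (slack < 20) {
            printf("t=%llu: suspend low-criticality tasks\n",
                   (unsigned long long)t);
            break;
        }
        /* Next check after half the slack: fewer checks when margin is big. */
        uint64_t step = slack / 2;
        printf("t=%llu: slack %llu, next check in %llu\n",
               (unsigned long long)t, (unsigned long long)slack,
               (unsigned long long)step);
        t += step;
        wcet_left -= step / 3;   /* progress slower than time (interference) */
    }
    return 0;
}
```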
ACM Transactions on Design Automation of Electronic Systems | 2016
Angeliki Kritikakou; Francky Catthoor; Vasilios I. Kelefouras; Constantinos E. Goutis
The size required to store an array is crucial for an embedded system, as it affects the memory size, the energy per memory access, and the overall system cost. Existing techniques for finding the minimum number of resources required to store an array are less efficient for codes with large loops and irregularly occurring memory accesses. They have to approximate the accessed parts of the array, leading to overestimation of the required resources; otherwise, their exploration time increases with the number of distinct accessed parts of the array. We propose a methodology to compute the minimum resources required for storing an array that keeps the exploration time low and provides a near-optimal result for both regularly and irregularly occurring memory accesses and overlapping writes and reads.
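The underlying question can be illustrated on a tiny trace: the minimum buffer size for an array equals the maximum number of simultaneously live elements, found here by scanning writes and last reads (the trace is invented; the paper's contribution is doing this efficiently for large, irregular access patterns):

```c
#include <stdio.h>

typedef struct { char op; int elem; } access_t;  /* 'W' write, 'R' last read */

int main(void) {
    /* Invented access trace of a 6-element array. */
    access_t trace[] = { {'W',0},{'W',1},{'R',0},{'W',2},{'W',3},
                         {'R',1},{'R',2},{'W',4},{'R',3},{'W',5},
                         {'R',4},{'R',5} };
    int live = 0, max_live = 0;
    for (int i = 0; i < (int)(sizeof trace / sizeof trace[0]); ++i) {
        live += (trace[i].op == 'W') ? 1 : -1;   /* a last read frees a slot */
        if (live > max_live) max_live = live;
    }
    printf("array can be stored in %d slots instead of 6\n", max_live);
    return 0;
}
```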