Worst-Case Execution Time Calculation for Query-Based Monitors by Witness Generation

Abstract

Runtime monitoring plays a key role in the assurance of modern intelligent cyber-physical systems, which are frequently data-intensive and safety-critical. While graph queries can serve as an expressive yet formally precise specification language to capture the safety properties of interest, there are no timeliness guarantees for such auto-generated runtime monitoring programs, which prevents their use in a real-time setting. The main challenge is that the worst-case execution time (WCET) bounds provided by current static WCET computation methods for such programs can only provide very conservative and impractical estimations, which would result in wasteful resource allocation or inadequate scheduling of monitors. This paper presents a WCET analysis method for data-driven monitoring programs derived from graph queries. The method incorporates results obtained from low-level timing analysis into the objective function of a modern graph solver. This allows the systematic generation of input graph models up to a specified size (referred to as witness models) for which the monitor is expected to take the most time to complete. Hence the estimated execution time of the monitors on these graphs can be considered as safe WCET. Moreover, in case the runtime graph model outgrows the size that was used to determine WCET at design time, our approach provides a fast but more conservative recomputation of safe execution time bounds on-the-fly using runtime model statistics. The benefit is that such on-line WCET estimation is still comparable to the one which solely relies on traditional approaches. Finally, we perform experiments with query-based programs executed in a real-time platform over a set of generated models to investigate the relationship between execution times and their estimates, and we compare WCETs obtained with the different approaches.



MÁRTON BÚR,

McGill University, Canada

KRISTÓF MARUSSY,

Budapest University of Technology and Economics, Hungary

BRETT H. MEYER,

McGill University, Canada

DÁNIEL VARRÓ,

McGill University, Canada and Budapest University of Technology and Economics, Hungary

CCS Concepts: • Computer systems organization → Real-time system specification; • Software and its engineering → Automated static analysis; Model-driven software engineering.

Additional Key Words and Phrases: real-time systems, worst-case execution time analysis, graph queries, model generation

ACM Reference Format:

Márton Búr, Kristóf Marussy, Brett H. Meyer, and Dániel Varró. 2021. Worst-Case Execution Time Calculation for Query-Based Monitors by Witness Generation. J. ACM 37, 4, Article 111 (August 2021), 29 pages. https://doi.org/10.1145/1122445.1122456

Authors’ addresses: Márton Búr, [email protected], McGill University, 3480 Rue University, Montreal, Quebec, H3A 2K6, Canada; Kristóf Marussy, [email protected], Budapest University of Technology and Economics, Magyar tudósok körútja 2, Budapest, 1117, Hungary; Brett H. Meyer, [email protected], McGill University, Canada; Dániel Varró, [email protected], McGill University, Canada, Budapest University of Technology and Economics, Hungary.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

© 2021 Copyright held by the owner/author(s).

Manuscript submitted to ACM

Runtime monitoring has become a key technique in the assurance of safety-critical and intelligent cyber-physical systems (CPS) such as autonomous vehicles [38] (e.g., self-driving cars, drones) where traditional upfront design time verification is problematic due to the dynamically changing environment and the data-intensive nature of the system. Runtime monitoring programs execute as part of the system to analyze events and execution traces [5] in order to detect potentially critical situations. Since this requires formal precision to capture safety requirements, logic-based formalisms (e.g., propositional logic, temporal logic) are frequently used to specify sequences that violate a requirement. Furthermore, monitoring programs can be automatically synthesized from such specifications that are ready to be used in traditional hard real-time systems without compromising task schedulability and real-time properties of the existing program [25, 39].

Unfortunately, existing runtime monitoring approaches used in safety-critical applications have some major limitations, which are increasingly problematic for the new generation of data-intensive, intelligent, and self-adaptive, yet safety-critical CPSs. First, the expressiveness of the specification language is considered moderate [24], which hinders both description and comprehension of complex rules by engineers. Moreover, safety-critical programs typically use statically allocated data with bounded input sizes and they conservatively avoid many programming language constructs.

Recent advances in runtime monitoring aim to overcome these limitations by (1) offering high-level and expressive query-based [10] or rule-based [24] formalisms to capture the properties to be monitored, and (2) using runtime graph models as an in-memory knowledge base which capture dynamic changes in the system or its environment on a high level of abstraction [10, 23].
As a key conceptual benefit, such approaches enable the development of data-driven safety monitors (instead of event-driven ones) where aggregated changes triggered by complex sequences of atomic events can be detected directly over an evolving data model. Such automatically synthesized monitoring programs use complex data structures and complicated control flow.

When a safety-critical program is executed in a hard real-time environment, both correct and timely execution is essential, otherwise an error or a deadline miss can lead to catastrophic consequences [40]. For this reason, correctness is typically addressed by rigorous testing or formal verification while worst-case execution time (WCET) analysis is employed to compute a safe upper bound of maximum required execution time for a function.

While recent research has investigated data-driven runtime monitors for intelligent and safety-critical CPSs in a distributed environment [9, 18, 23], and various testing approaches have been proposed [1, 42], the timeliness aspect of the problem has been neglected. In fact, only very few initial ideas are available [11, 48].

In order to obtain practical WCET bounds for data-driven runtime monitors that are safe and tight, major challenges in timing analysis need to be tackled. First, detailed information is needed about the executable binary and execution platform, including precise memory, pipeline, and cache descriptions. Moreover, determining the WCET for a generic program traces back to solving the halting problem, thus existing WCET computation approaches prohibit the use of certain programming constructs. Furthermore, programs using dynamic memory allocation are not supported by current analysis tools because the non-deterministic features of the allocation process would make safe WCET estimations highly pessimistic and thus impractical [26].
Finally, the domain-specific relationships among data entities pose several nontrivial restrictions on the program flow facts, i.e., it is common that the longest program path provided by a static analysis is infeasible. This additional information largely helps to reduce safe WCET bounds; however, there is no generally applicable method to efficiently exploit information about data flow during timing analysis.


Contributions.

This paper aims to address WCET estimation in the very challenging setting of data-driven runtime monitoring programs. Such monitors can be automatically derived from high-level graph query specifications and are evaluated over dynamically changing graph models as data structures used in real-time systems. In particular, the paper makes the following contributions.

(1) We adapt data-driven runtime monitoring programs derived from high-level graph query specifications [9] to real-time platforms.
(2) We provide an algorithm for static analysis for monitoring programs to estimate execution time on a given runtime model snapshot.
(3) By using an existing graph solver, we derive witness models which exhibit theoretical WCET bounds for graph query programs up to a predefined model size.
(4) We adapt a symbolic WCET analysis approach [4] for graph query programs and exploit model statistics to provide a coarse but still practical estimate for runtime models that are larger than the ones covered by static estimations.
(5) We perform an extensive experimental assessment of query evaluation times over a variety of graph models executed on a real-time platform, and we compare the results with the different WCET estimates.

Data-driven runtime monitoring programs derived from graph queries differ substantially from traditional safety-critical programs by using complex graph data structures and complex algorithms. To the best of our knowledge, this is the first approach to provide precise WCET analysis of query-based runtime monitors by exploiting state-of-the-art model generation techniques and model statistics. This enables the use of such programs in a real-time context by providing practical yet conservative WCET estimates that can be dynamically maintained and recomputed at runtime to reallocate time slots upon significant changes in the underlying graph model.

The outline of the paper is as follows.
In Section 2, we present a smart railway system as a motivating example for query-based monitoring which is followed by the conceptual foundations of runtime graph models, graph queries, and model generation. Section 3 provides an overview of query-based runtime monitors. Section 4 describes the adaptation of graph queries to real-time systems. Performance evaluation of embedded query programs and worst-case execution time computation for the case study are discussed in Section 5. Related works are introduced in Section 6, while Section 7 concludes the paper.

This section overviews the core concepts of data-driven runtime monitors and revisits the foundations of domain-specific graph model generation. High-level rule-based specification languages [9, 18, 24] have been recently proposed to specify safety criteria for runtime monitors using either an event stream [24] or a runtime graph model (models@run.time) [9, 18] as underlying knowledge representation. Moreover, recent results in model generation allow systematic generation of domain-specific models that fulfill a given set of well-formedness criteria [42].

We illustrate high-level distributed runtime models and query-based data-driven monitors in the context of the open source Model-Based Demonstrator for Smart and Safe Cyber-Physical Systems (MoDeS3) [51] educational platform, which showcases various challenges of modern intelligent but safety-critical CPS applications. Figure 1 presents a self-contained excerpt of the demonstrator that is a model railway system with an added layer of safety to prevent trains from collision and derailment using runtime safety monitors. The railway track is equipped with several sensors and actuators indicated with black triangles in the lower part of Figure 1. Train shunt detectors are capable of sensing trains when they move on to a particular segment of the track, while turnout equipment allows reading and setting the direction of the associated turnout.

Fig. 1. Runtime monitoring by graph queries

The system is managed by a distributed monitoring service running on a network of heterogeneous computing units, such as Arduinos, Raspberry Pis, BeagleBone Blacks, etc. Relevant runtime information gained from sensor reads (e.g., the occupancy of a segment, or the status of a turnout) is uniformly captured in an in-memory runtime graph model, which is also deployed on the platform. Safety monitors are formally captured as graph queries (which are frequently used for checking design time consistency constraints in design tools of various embedded systems). Alerts from the monitoring services may trigger control commands of actuators (e.g., to change turnout direction) to guarantee safe operation. The monitoring and control programs are running in a real-time setting on the computing units.

While the MoDeS3 platform can demonstrate various challenges of CPSs, this paper exclusively focuses on the real-time aspect of the runtime monitoring service which is deployed to some embedded devices with limited resources (memory, CPU, etc.). In particular, our aim is to compute tight WCET estimations for complex runtime monitoring programs based on graph queries evaluated over a runtime graph model running on a single computing unit. We accomplish this by building atop existing symbolic WCET analysis methods and advanced graph model generation techniques.

The models@run.time paradigm [7] places models at the center of contemporary cyber-physical and self-adaptive systems in order to efficiently capture runtime information about the system and its environment. In this work, we rely on (typed and directed) graph models used as a knowledge base at runtime for the monitored system. Such graphs are dynamically changing in-memory data structures which encode domain-specific instance models typed over a domain metamodel. A runtime (instance) model captures a snapshot of the underlying system in operation [7, 45].

Relevant changes in the system are reflected in the runtime model (in an event-driven or time-triggered way) and operations executed on the runtime model (e.g., setting values of controllable attributes of objects or updating links between objects) are reflected in the system itself (e.g., by executing scripts or calling services).


The core concepts (classes) in a domain and the relations (references) between those concepts are often captured in a metamodel. In this paper, we formally capture metamodels by a logic signature and instance models as logic structures following [34].

Definition 2.1. A metamodel is formally represented as a logic signature $\Sigma = \{\mathsf{C}_1, \dots, \mathsf{C}_l, \mathsf{R}_1, \dots, \mathsf{R}_m\}$, where $\{\mathsf{C}_i\}_{i=1}^{l}$ are unary class symbols and $\{\mathsf{R}_j\}_{j=1}^{m}$ are binary relation symbols.

The definition of metamodel can include binary attribute symbols such as in [10]. However, their handling is analogous to binary relation symbols, thus their discussion is excluded from here.

Definition 2.2. An instance model over a metamodel $\Sigma$ is a logic structure $M = \langle \mathcal{O}_M, \mathcal{I}_M \rangle$ where $\mathcal{O}_M$ is a finite set of domain objects in $M$ and $\mathcal{I}_M$ provides interpretations for the class and reference predicates in $\Sigma$ such that $\mathcal{I}_M(\mathsf{C}_i) \subseteq \mathcal{O}_M$ is the set of objects of type $\mathsf{C}_i$ for each $\mathsf{C}_i \in \Sigma$, and $\mathcal{I}_M(\mathsf{R}_j) \subseteq \mathcal{O}_M \times \mathcal{O}_M$ is the set of relation links of type $\mathsf{R}_j$ for each $\mathsf{R}_j \in \Sigma$.

To provide a condensed characterization of instance models, we will collect various model statistics at runtime. For simplicity, we will restrict our attention to the type distributions (number of objects of each type).

Definition 2.3. The model statistics for an instance model $M$ is a function $\mathit{stats}_M : \{\mathsf{C}_1, \dots, \mathsf{C}_l\} \to \mathbb{N}$ which denotes the number of objects of type $\mathsf{C}_i$, i.e., $\mathit{stats}_M(\mathsf{C}_i) = |\mathcal{I}_M(\mathsf{C}_i)|$.

Example 2.4.

An excerpt of the MoDeS3 metamodel with metamodel constraints is shown in Figure 2(a) using the Eclipse Modeling Framework (EMF) notation [47]. A model has exactly one Modes3ModelRoot that contains all other objects within the model (as indicated by the containment references). One domain concept is Train. Class Segment represents a section of the railway track with the connectedTo reference which describes what other segments it is linked to (up to two). Moreover, each train maintains a location reference to a segment to describe its current position. Likewise, instances of the Segment class also maintain a reference occupiedBy to express if they are currently occupied by a train. Moreover, a specialized Segment is a Turnout that can change its connections between straight and divergent segments. In this example, $\{\mathsf{Modes3ModelRoot}, \mathsf{Train}, \mathsf{Segment}, \mathsf{Turnout}\} \subset \Sigma$ are unary class predicates, while $\{\mathsf{Location}, \mathsf{OccupiedBy}, \mathsf{ConnectedTo}, \mathsf{Straight}, \mathsf{Divergent}, \mathsf{Trains}, \mathsf{Segments}, \mathsf{Turnouts}\} \subset \Sigma$ are binary reference predicates.

Figure 2(b) shows a graphical presentation of an instance model, i.e., a snapshot of the MoDeS3 runtime model with the following model statistics. The graph has a total of 12 objects. There are 9 Segment instances with their respective connectedTo references and two of them are also instances of Turnout. Each of the two turnouts is capable of switching between two segments (as labeled in Figure 2(b)). Additionally, there are three different trains on the track, each with a location reference to a segment.

The formal definitions of metamodel and instance model enable the formulation of first-order logic (FOL) predicates, which can be evaluated as graph queries over the logic structure of an instance model.

Informally, base predicates check either for equality or for the existence of certain objects and references of a respective type (predicate) in the underlying runtime model. Then complex predicates are derived by traditional FOL connectives (e.g., not, exists, forall, and, or).

[Figure 2(a): Metamodel with metamodel constraints. Classes: Modes3ModelRoot (id : uint_32), Segment (id : uint_32), Turnout, Train (id : uint_32, speed : double = 0.0). References: straight [1], divergent [1], connectedTo [0..2], location [1], occupiedBy [0..1], trains [0..*], segments [0..*], turnouts [0..*].]

[Figure 2(b): Runtime model snapshot with the following model statistics: 7× Segment, 2× Turnout, 3× Train, and the maximum out-degree of links of type connectedTo is 2.]

Fig. 2. The MoDeS3 metamodel and instance model
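The model statistics of Definition 2.3 can be illustrated on the snapshot of Figure 2(b) with a minimal C sketch. The enum `ClassId`, the table `is_subtype_of`, and the function `model_stats` are hypothetical names introduced only for this illustration; the sketch assumes that each object stores its direct type and that a Turnout also counts as a Segment, which is how 7 plain segments and 2 turnouts yield the 9 Segment instances mentioned in Example 2.4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags for the class symbols of the MoDeS3 excerpt. */
typedef enum { TYPE_SEGMENT, TYPE_TURNOUT, TYPE_TRAIN, TYPE_COUNT } ClassId;

/* is_subtype_of[d][c] is 1 iff an object of direct type d is an instance of c.
   Assumption: Turnout specializes Segment, as in Figure 2(a). */
static const int is_subtype_of[TYPE_COUNT][TYPE_COUNT] = {
    /* direct SEGMENT */ {1, 0, 0},
    /* direct TURNOUT */ {1, 1, 0},
    /* direct TRAIN   */ {0, 0, 1},
};

/* Computes stats_M(C) = |I_M(C)| for every class C from the direct types. */
static void model_stats(const ClassId *direct_types, size_t n,
                        uint32_t stats[TYPE_COUNT]) {
    for (int c = 0; c < TYPE_COUNT; ++c) {
        stats[c] = 0;
    }
    for (size_t i = 0; i < n; ++i) {
        for (int c = 0; c < TYPE_COUNT; ++c) {
            if (is_subtype_of[direct_types[i]][c]) {
                stats[c]++;
            }
        }
    }
}

/* Direct types of the 12 objects of the snapshot: 7 segments, 2 turnouts, 3 trains. */
static const ClassId snapshot[12] = {
    TYPE_SEGMENT, TYPE_SEGMENT, TYPE_SEGMENT, TYPE_SEGMENT,
    TYPE_SEGMENT, TYPE_SEGMENT, TYPE_SEGMENT,
    TYPE_TURNOUT, TYPE_TURNOUT,
    TYPE_TRAIN, TYPE_TRAIN, TYPE_TRAIN,
};
```

Calling `model_stats(snapshot, 12, stats)` fills `stats` with the type distribution of the snapshot. Both loops have bounds that are known once the model size is fixed, which is the kind of structure a timing analysis can exploit.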

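Definition 2.5. A first-order logic predicate (or query) $\varphi$, where $v_1, \dots, v_n$ denote free variables (not appearing in any quantifiers) of $\varphi$, can be evaluated over an instance model $M$ along a variable binding $Z : \{v_1, \dots, v_n\} \to \mathcal{O}_M$ (denoted as $\llbracket \varphi \rrbracket^M_Z$) to return either true (1) or false (0) as follows:

\begin{align*}
\llbracket 1 \rrbracket^M_Z &:= 1 &
\llbracket 0 \rrbracket^M_Z &:= 0 \\
\llbracket \mathsf{C}_i(v) \rrbracket^M_Z &:= Z(v) \in \mathcal{I}_M(\mathsf{C}_i) &
\llbracket \mathsf{R}_j(v, v') \rrbracket^M_Z &:= \langle Z(v), Z(v') \rangle \in \mathcal{I}_M(\mathsf{R}_j) \\
\llbracket v = v' \rrbracket^M_Z &:= Z(v) = Z(v') &
\llbracket \neg\varphi \rrbracket^M_Z &:= 1 - \llbracket \varphi \rrbracket^M_Z \\
\llbracket \exists v : \varphi \rrbracket^M_Z &:= \bigvee_{x \in \mathcal{O}_M} \llbracket \varphi \rrbracket^M_{Z, v \mapsto x} &
\llbracket \forall v : \varphi \rrbracket^M_Z &:= \bigwedge_{x \in \mathcal{O}_M} \llbracket \varphi \rrbracket^M_{Z, v \mapsto x} \\
\llbracket \varphi_1 \vee \varphi_2 \rrbracket^M_Z &:= \llbracket \varphi_1 \rrbracket^M_Z \vee \llbracket \varphi_2 \rrbracket^M_Z &
\llbracket \varphi_1 \wedge \varphi_2 \rrbracket^M_Z &:= \llbracket \varphi_1 \rrbracket^M_Z \wedge \llbracket \varphi_2 \rrbracket^M_Z
\end{align*}

Definition 2.6. Predicate / query evaluation aims to find a variable binding $Z : \{v_1, \dots, v_n\} \to \mathcal{O}_M$ for a predicate $\varphi$ that maps all free variables of the predicate to objects of $M$ such that the predicate evaluates to true, i.e., $\llbracket \varphi \rrbracket^M_Z = 1$.

Definition 2.7. The match set of a query predicate $\varphi$ with free variables $v_1, \dots, v_n$ is the set $\mathit{Matches}(M, \varphi) = \{Z : \{v_1, \dots, v_n\} \to \mathcal{O}_M \mid \llbracket \varphi \rrbracket^M_Z = 1\}$. One element in this set is called a match, while $|\mathit{Matches}(M, \varphi)|$ denotes the size of the match set.

Note that in our context, a match of a query will typically represent a violation of a well-formedness constraint of the domain or a hazardous situation with respect to a safety property.

A domain metamodel is frequently complemented in practice with additional metamodel and well-formedness constraints to restrict the possible relationships between domain concepts.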

Metamodel constraints can be captured by FOL predicates and categorized as follows [34].

i. A type hierarchy constraint defines a type system by supertype relations. For each object $o$, there shall be a single class $\mathsf{C}$ such that, for any class $\mathsf{C}'$, object $o$ is an instance of $\mathsf{C}'$ iff $\mathsf{C}'$ is a supertype of $\mathsf{C}$.
ii. A type compliance constraint restricts the classes $\mathsf{C}_1$ and $\mathsf{C}_2$ of objects at the ends of a reference $\mathsf{R}$.
iii. A multiplicity constraint may place lower bounds on the number of references adjacent to an object $o$.
iv. An inverse relation constraint prescribes that references $\mathsf{R}$ and $\mathsf{R}'$ always occur in pairs.
v. A containment hierarchy constraint ensures that models are arranged in a strict tree hierarchy via the containment references starting from a root object.

Example 2.8.

The formula $\varphi_{LM}$ illustrates a constraint of the metamodel for location multiplicity (LM). This formula evaluates to 1 for an object passed as a parameter if it is of type Train and it does not have exactly one segment as its location (i.e., it has zero or more than one). A query can be defined with the following formula: $\varphi_{LM}(t) = \mathsf{Train}(t) \wedge (\neg(\exists s_1, s_2 : \mathsf{Location}(t, s_1) \wedge \mathsf{Location}(t, s_2) \rightarrow s_1 = s_2) \vee \neg(\exists s : \mathsf{Location}(t, s)))$.

In this work, we are interested in the timing analysis of monitors that take well-formed models as their input. Such additional well-formedness constraints of the domain can also be captured by FOL predicates [42, 43]. When a constraint is formalized as a FOL predicate, it captures erroneous model fragments. As such, we expect that the respective FOL predicate has empty match sets in a model. Formally, if a well-formedness constraint is captured by FOL predicate $\varphi$, then $\mathit{Matches}(M, \varphi) = \emptyset$ for all well-formed instance models $M$.

Example 2.9.

The metamodel constraints of the MoDeS3 domain shown in Figure 2(a) allow the creation of models that have no real-life counterparts. For example, metamodel constraints allow the creation of a Turnout that has two connectedTo references to two distinct Segments, but none of these Segments are the continuation of the Turnout in the straight or divergent directions. Such a turnout in an instance model would represent a physically impossible situation, therefore we exclude such cases from our analysis by introducing a well-formedness constraint captured by the following FOL predicate: $\varphi_{SD}(t) = \exists s_1, s_2 : \mathsf{ConnectedTo}(t, s_1) \wedge \mathsf{ConnectedTo}(t, s_2) \wedge \neg(s_1 = s_2) \wedge \neg(\mathsf{Straight}(t, s_1) \vee \mathsf{Straight}(t, s_2) \vee \mathsf{Divergent}(t, s_1) \vee \mathsf{Divergent}(t, s_2))$.

Graph queries have been proposed to specify safety properties for runtime monitoring in [9] on a high level of abstraction by focusing on structural dependencies between system entities. Informally, a graph query captures a potentially unsafe situation that may occur at runtime. In this work, we use FOL predicates to define graph queries over instance models. Similarly, the OCL standard has been used in [30] for similar purposes.

Graph queries of runtime monitors are evaluated over a runtime model which reflects the current state of the monitored system, e.g., data received from different sensors, the services allocated to computing units, or the health information of computing infrastructure. In accordance with the models@run.time paradigm [7, 45], observable changes of the real system get updated, either periodically with a certain frequency, or in an event-driven way upon certain triggers.

Monitors formalized as FOL predicates capture potentially unsafe cases represented in a runtime model. If a runtime monitoring goal is captured by the FOL predicate $\varphi$, then the elements of $\mathit{Matches}(M, \varphi)$ contain all objects from $M$ where immediate action is required to avoid failures.

Classical event-based runtime monitors rely on some temporal logic formalism to detect sequences of events occurring in the system at different points in time, while the underlying data model used in such monitors is restricted to atomic propositions.
On the other hand, data-driven runtime monitors defined by graph queries can check structural properties of a runtime model that represents a snapshot of the underlying system. In other words, they focus on the data available on the underlying system at a given point of time (rather than detecting the sequence of events that evolved the system into the particular state).

As such, event-based and data-driven monitors are complementary techniques. While graph queries can be extended to express temporal behavior [17], our current work is restricted to (structural) safety properties where the violation of a property is expressible by graph queries.
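The match-set semantics used throughout this section can be made concrete with a small C sketch that evaluates the location multiplicity check of Example 2.8 following its prose reading (a Train with zero or more than one location). The types `Link` and `Model` and the functions `phi_lm` and `matches_lm` are hypothetical names for this illustration only; the sketch assumes that objects are identified by indices and that the Location relation is stored as a list of pairs without duplicates.

```c
#include <stddef.h>
#include <stdint.h>

/* One tuple <src, dst> of a binary relation (assumed representation). */
typedef struct { uint16_t src, dst; } Link;

typedef struct {
    const uint8_t *is_train;  /* is_train[o] != 0 iff object o is in I_M(Train) */
    size_t object_count;      /* |O_M| */
    const Link *location;     /* I_M(Location), no duplicate pairs */
    size_t location_count;
} Model;

/* Returns 1 iff t is a Train that does not have exactly one location. */
static int phi_lm(const Model *m, uint16_t t) {
    if (!m->is_train[t]) {
        return 0;
    }
    size_t n = 0;
    for (size_t i = 0; i < m->location_count; ++i) {
        if (m->location[i].src == t) {
            ++n;
        }
    }
    return n != 1;
}

/* Stores up to cap matches in out and returns |Matches(M, phi_LM)|. */
static size_t matches_lm(const Model *m, uint16_t *out, size_t cap) {
    size_t found = 0;
    for (size_t o = 0; o < m->object_count; ++o) {
        if (phi_lm(m, (uint16_t)o)) {
            if (found < cap) {
                out[found] = (uint16_t)o;
            }
            ++found;
        }
    }
    return found;
}
```

For a well-formed model, `matches_lm` returns 0, which corresponds to the requirement that the match set of an error predicate is empty. A runtime monitor has the same shape, except that a non-empty match set triggers a reaction instead of indicating an ill-formed model.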

$\varphi_{MT}(mt, t) = \exists\, loc : \mathsf{OccupiedBy}(loc, t) \wedge \mathsf{Turnout}(mt) \wedge \mathsf{Straight}(mt, loc) \wedge \neg\mathsf{ConnectedTo}(loc, mt)$

(a) Graph query as logic predicate

[Graphical query misalignedTurnout(mt, t): nodes loc : Segment, mt : Turnout, t : Train; edges straight, occupiedBy, and a negated (NEG) connectedTo edge.]

(b) Graphical query presentation

    pattern misalignedTurnout(mt, t) {
        Segment.occupiedBy(loc, t);
        Turnout(mt);
        Turnout.straight(mt, loc);
        neg find connected(loc, mt);
    }

    private pattern connected(s1, s2) {
        Segment.connectedTo(s1, s2);
    }

(c) Description of a query and its subquery in VQL

Fig. 3. Monitoring goal formulated as a graph query $\varphi_{MT}$ for misalignedTurnout

Example 2.10.

On a railway track, a misaligned turnout (MT) refers to a state where a turnout is set to a direction that differs from the direction of an incoming train. Trains passing through such misaligned turnouts can damage the railway equipment and can lead to derailment [36]. Query $\varphi_{MT}$ shown in Figure 3(a) captures a (simplified) hazardous case and identifies violating situations. The query returns pairs of trains t and turnouts mt where the train is located on a segment loc that is the straight continuation of the turnout, but the turnout is currently not connected to this segment. Any match of this query highlights a train and a turnout where immediate action (stop the train or switch the direction of the turnout) is required. Figure 3(b) shows the same graph query in graphical presentation (used in modeling tools). Figure 3(c) shows the textual description of $\varphi_{MT}$ using VIATRA Query Language (VQL) [6], which is a graph query language often used in CPS design tools. The expressiveness of the VQL converges to first-order logic with transitive closure, thus it provides a rich language for capturing a variety of complex structural conditions and dependencies between various entities in a graph model.

Automated synthesis of domain-specific graph models has been actively researched in the field of model-based software engineering [8, 20, 42, 44]. Hereby, we revisit some core concepts.

A model generation task takes the following four required inputs:

• A metamodel $\Sigma = \{\mathsf{C}_1, \dots, \mathsf{C}_l, \mathsf{R}_1, \dots, \mathsf{R}_m\}$ with class and reference predicate symbols.
• A theory of constraints $\mathcal{T} = \{\varphi_1, \dots, \varphi_n\}$ expressed as FOL (error) predicates.
• Type scopes $\mathcal{S} : \{\mathsf{C}_1, \dots, \mathsf{C}_k\} \to IV_{\mathbb{N}}$, where $IV_{\mathbb{N}}$ is the set of natural number intervals. Type scopes specify the minimum and the maximum number of instances of objects by type, i.e., if $\mathcal{S}(\mathsf{C}_i) = [L_i, U_i]$, then solution models must contain at least $L_i$ and at most $U_i$ instances of the class $\mathsf{C}_i \in \Sigma$.
• An objective function which is a linear function that assigns a real number to a model based on the number of matches for selected predicates and assigned weights. Formally, a linear objective function is $f(M) = \sum_{i=1}^{n} |\mathit{Matches}(M, \psi_i)| \cdot w_i$, where $\psi_i$ are predicates and $w_i \in \mathbb{Z}$ are weights.

Definition 2.11. An instance model $M$ satisfies theory $\mathcal{T}$ and the type scopes $\mathcal{S}$, written as $\mathcal{T}, \mathcal{S} \models M$, if

• no constraints are violated, i.e., for all error predicates $\varphi \in \mathcal{T}$: $\mathit{Matches}(M, \varphi) = \emptyset$, and
• the number of model elements of a specific type satisfies the type scope (written as $\mathcal{S} \models \mathit{stats}_M$); formally, for all class symbols $\mathsf{C}_i \in \Sigma$, $\mathit{stats}_M(\mathsf{C}_i) \in \mathcal{S}(\mathsf{C}_i)$.

Definition 2.12. The solution of the model generation task is a set of models that are instances of the input metamodel, satisfy all constraints, and respect the provided type scopes: $\mathit{solutions}(\Sigma, \mathcal{T}, \mathcal{S}) = \{M \mid M \text{ is an instance of the metamodel } \Sigma \text{ and } \mathcal{T}, \mathcal{S} \models M\}$.

Definition 2.13. The optimal solutions of the model generation are solution models that maximize the value of the linear objective function: $\mathit{optimal}(\Sigma, \mathcal{T}, \mathcal{S}, f) = \{M \in \mathit{solutions}(\Sigma, \mathcal{T}, \mathcal{S}) \mid \forall M' \in \mathit{solutions}(\Sigma, \mathcal{T}, \mathcal{S}) : f(M') \le f(M)\}$.

Our work relies on the model generator presented in [42] which was proved to be complete and sound in [49]. Informally, it is able to derive all instance models in a domain (up to a designated size defined by the scopes) which satisfy the constraints.
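The linear objective function and Definition 2.13 can be illustrated with a brute-force C sketch. It is not how the graph solver of [42] operates; it only restates the definitions over a small, explicitly given set of candidate solutions. The functions `objective` and `argmax_objective`, and the idea of passing precomputed match counts as a row-major array, are assumptions made for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* f(M) = sum over i of |Matches(M, psi_i)| * w_i for one candidate model,
   given its n precomputed match counts. */
static int64_t objective(const uint32_t *match_counts,
                         const int32_t *weights, size_t n) {
    int64_t f = 0;
    for (size_t i = 0; i < n; ++i) {
        f += (int64_t)match_counts[i] * weights[i];
    }
    return f;
}

/* Returns the index of the first candidate that maximizes f.
   counts holds `candidates` rows of n match counts each (row-major).
   Precondition: candidates >= 1. */
static size_t argmax_objective(const uint32_t *counts, size_t candidates,
                               size_t n, const int32_t *weights) {
    size_t best = 0;
    int64_t best_value = objective(counts, weights, n);
    for (size_t c = 1; c < candidates; ++c) {
        int64_t value = objective(counts + c * n, weights, n);
        if (value > best_value) {
            best = c;
            best_value = value;
        }
    }
    return best;
}
```

When the weights are chosen to reflect execution time costs, the maximizing candidate plays the role of a witness model: the input for which the weighted number of matches, and hence the estimated work, is largest within the given scope.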

The implementation of runtime models and graph queries requires extra attention to facilitate the timing analysis of query-based monitoring programs. This section overviews our assumptions and requirements about the practical application of the theoretical background introduced in Section 2.

For data-driven monitors, the structure of the underlying graph model directly impacts the performance of query evaluation. Since an embedded device may have limited available CPU and memory resources, a lightweight data structure is needed to efficiently capture runtime graph models. While the in-depth discussion of such a graph data structure is out of scope for this paper, we make the following assumptions about the supported operations of the underlying graph:

• Dynamic element creation and deletion. The runtime model serves as the knowledge base about the underlying system and its environment. For this reason, it needs to accommodate graph models without a theoretical a priori upper bound for model size. Based on [26], one way to support this is to allocate the maximum amount of memory that is physically possible to be used for storing the graph. However, only the allocated memory is determined at compile time; the type (and distribution) of objects stored in the graph is runtime information.

• Maintenance of model statistics. As objects in the graph model are created and deleted, high-level model statistics [50] such as the number of instances of each type (i.e., class and reference) in the model should be maintained continuously to allow real-time access to them.

• Indexing of objects by type using unique identifiers. As query evaluation typically starts by iterating over all elements of a given type or accessing specific objects, it necessitates efficient object access, e.g., by maintaining a real-time index for memory resident data [16].

• Navigability along edges. Many steps in query evaluation navigate along the edges (references) of selected objects to find further appropriate variable substitutions for unbound query variables. A simple way to support this feature is, e.g., maintaining direct pointers in the objects to reachable objects.
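The four assumptions above can be illustrated with a minimal C sketch. It is not the data structure used in the paper: the names `Graph`, `SegmentNode`, `TrainNode`, the `graph_*` functions and the capacities are hypothetical, and using the slot index as the identifier is a simplification chosen only to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SEGMENTS 8 /* hypothetical capacities fixed at compile time */
#define MAX_TRAINS 4

typedef struct SegmentNode SegmentNode;

typedef struct TrainNode {
    int in_use;
    SegmentNode *location;   /* edge Train -> Segment as a direct pointer */
} TrainNode;

struct SegmentNode {
    int in_use;
    TrainNode *train;        /* edge Segment -> Train as a direct pointer */
};

typedef struct {
    SegmentNode segments[MAX_SEGMENTS];  /* slot index doubles as identifier */
    TrainNode trains[MAX_TRAINS];
    uint16_t segment_count;              /* model statistics, kept up to date */
    uint16_t train_count;
} Graph;

/* Creates a segment in the first free slot; returns NULL when storage is full. */
SegmentNode *graph_add_segment(Graph *g) {
    for (size_t i = 0; i < MAX_SEGMENTS; i++) {
        if (!g->segments[i].in_use) {
            g->segments[i].in_use = 1;
            g->segments[i].train = NULL;
            g->segment_count++;
            return &g->segments[i];
        }
    }
    return NULL;
}

/* Creates a train located on `location` and links both directions. */
TrainNode *graph_add_train(Graph *g, SegmentNode *location) {
    for (size_t i = 0; i < MAX_TRAINS; i++) {
        if (!g->trains[i].in_use) {
            g->trains[i].in_use = 1;
            g->trains[i].location = location;
            if (location != NULL) location->train = &g->trains[i];
            g->train_count++;
            return &g->trains[i];
        }
    }
    return NULL;
}

/* Deletes a train, clearing the back reference and updating the statistics. */
void graph_remove_train(Graph *g, TrainNode *t) {
    if (t == NULL || !t->in_use) return;
    if (t->location != NULL && t->location->train == t) t->location->train = NULL;
    t->in_use = 0;
    t->location = NULL;
    g->train_count--;
}

/* Constant-time lookup by identifier; NULL if the slot is unused or out of range. */
TrainNode *graph_train_by_id(Graph *g, uint16_t id) {
    return (id < MAX_TRAINS && g->trains[id].in_use) ? &g->trains[id] : NULL;
}
```

All storage is preallocated, every loop is bounded by a compile-time constant, and the counters are updated in the same operation that changes the graph, so the statistics are always available in constant time.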

Manuscript submitted to ACM

    typedef struct {
      uint16_t segment_id;
      Train *train;
      Segment *connected_to[2];
      uint8_t connected_to_count;
    } Segment;

    typedef struct {
      uint16_t train_id;
      double speed;
      Segment *location;
    } Train;

Listing 1. Classes of the MoDeS3 domain

    typedef union {
      Segment segment;
      Train train;
    } Object;
    struct Modes3ModelRoot {
      Object objects[SEGMENTS + TRAINS];
      uint16_t object_count;
      uint16_t segments[SEGMENTS];
      uint16_t segment_count;
      uint16_t trains[TRAINS];
      uint16_t train_count;
    } runtime_model;

Listing 2. Generic graph object and model root

Fig. 4. Example implementation of a generic graph data structure with Segment and Turnout domain classes

Example 3.1.

Listing 1 shows a possible C implementation of data structures for the Segment and Train classes present in the metamodel depicted in Figure 2(a). Lines 2, 9 and 10 are fields created from the respective attributes. For each type, a globally unique id attribute that encodes the type of the object is mandatory for indexing and model manipulation. Furthermore, in this example, we implement references (i.e., links in the graph model) as pointers (line 11) or pointer arrays with sizes (lines 4 and 5). Representing links between objects with pointers is highly efficient from a performance viewpoint.

Listing 2 shows how a graph model container Modes3ModelRoot (lines 5-12) can allocate static memory for generic graph objects represented by the union type Object (lines 1-4) in C. Runtime model statistics are captured by the counters in lines 7, 9, and 11. The maximum memory used by the graph is preallocated in line 6 by the objects array, whose length is the sum of the maximum expected number of trains (denoted by the constant TRAINS) and the maximum expected number of segments (SEGMENTS). At the same time, the arrays segments (line 8) and trains (line 10) keep track of the indexes of the respective objects within the objects array. These two latter data structures facilitate model updates: the id attribute of a given model object is used to index these arrays (the unique identifier encodes the type of the object, i.e., which array should be indexed) and to obtain the position of the object in the objects array.

Graph query evaluation (aka graph pattern matching) is the process of finding all matches of a query over a specific model [50]. When query evaluation is initiated, the initial empty variable binding is gradually extended to retrieve all matches from the entire model.

Various query evaluation strategies exist in the literature [21]. Our runtime monitoring framework uses a local search-based query evaluation strategy to find matches of monitoring queries based on [50]. To obtain efficient performance at runtime, query evaluation is guided by a search plan [50], which maps each constraint in the query to a single pair of ⟨Step number, Operation type⟩. In this tuple, the first value specifies the order in which query evaluation should attempt to satisfy the respective constraint. The second value can be either extend or check, depending on the current binding of constraint variables (to objects in the runtime model) while the constraint is enforced:

• An extend operation evaluates a constraint with at least one free variable. Execution of such operations requires iterating over all potential variable substitutions and selecting the ones for which the constraint evaluates to 1.

• A check operation evaluates a constraint with only bound variables. Execution of such operations determines if the constraint evaluates to 1 over the actual variable binding.

Constructing effective search plans for graph queries is a complex challenge. It is outside the scope of the current paper and has been extensively studied before (see, e.g., [28, 50] for possible solutions). However, we present pseudocode that generates embedded query code from a search plan in Algorithm 1. The CompileSearchPlan function is parameterized with a search plan and a given search step index. Line 2 returns a code snippet to register a match if the provided index is beyond the index of the final search step. Otherwise, the search step is extracted (line 3) and the variable matcherCode to hold the generated code is initialized to an empty string (line 4). Then, if the current search step is an extend, it iterates over all free variables (line 6) and generates a series of nested for loops to bind these to the respective candidate model objects selected by the constraint in the step (lines 7-8). Otherwise, the current step is a check (line 10) and inserts an if condition (lines 11-12). Finally, in line 14, the generation continues recursively, appending the code generated from the subsequent steps to the result. The query code for the entire search plan sp can be generated by calling CompileSearchPlan(sp, 1).

Algorithm 1: Code generation from search plans

    Function CompileSearchPlan(sp, idx) is
      if idx > sp.size() then return code for storing a match;
      step = sp[idx]
      matcherCode = ""
      if step is extend then
        for uv ∈ step.getFreeVariables() do
          matcherCode += AddLoopFor(uv,
                                    step.getConstraintFor(uv))
        end
      else if step is check then
        matcherCode += AddIfFor(step.getAllVariables(),
                                step.getConstraint())
      end
      return matcherCode + CompileSearchPlan(sp, idx + 1)
    end

Table 1. A possible search plan for query misalignedTurnout where free variables are underlined

    Step  Constraint               Bound variables  Operation
    1     Turnout(mt)              (none)           extend
    2     Straight(mt, loc)        mt               extend
    3     ¬ConnectedTo(loc, mt)    mt, loc          check
    4     OccupiedBy(loc, t)       mt, loc          extend

Example 3.2.

Table 1 shows a possible search plan for the $\varphi_{MT}$ query. Each row represents a search operation. The first column is the assigned operation number (or step). The second column (Constraint) shows which constraint is enforced by the given step and the third column shows the variables that are already bound by the previous operations when the current operation begins execution. The fourth column shows the search operation type (check or extend), which is based on the variable bindings prior to the execution of the search operation: if the constraint parameters are all bound, then it is a check, otherwise, it is an extend.

Data-driven monitors aim to find matches of graph queries over the entire runtime graph model using a local search-based query evaluation strategy. When such graph queries are used in a real-time system, they need to retrieve all matches of a query in the model by a deadline. This is carried out by using a depth-first search graph traversal algorithm derived from the search plan of the query. This keeps the memory footprint of the algorithm constant, thus only the graph data may change over time as the model evolves.

The operations of the query search plan are translated to structured imperative code:

• Each extend operation is either a single assignment to a variable or a for loop iterating over a set of candidate variable bindings, depending on the multiplicity of the respective navigation edge (reference constraint).

• Each check operation is mapped to an if statement that checks whether the current variable binding satisfies a given condition created from the query constraint.
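The recursive scheme of Algorithm 1 and the translation rules above can be sketched in C as follows. This is an illustration under simplifying assumptions rather than the generator used in the paper: the types `Step` and `StepKind`, the functions `append` and `compile_search_plan`, and the emitted `store_match()` call are hypothetical; each extend step yields exactly one loop; the loop header or condition text is supplied ready-made instead of being derived from a constraint; and step indices are 0-based, unlike in Algorithm 1.

```c
#include <stdio.h>
#include <string.h>

typedef enum { STEP_EXTEND, STEP_CHECK } StepKind;

/* One search-plan step. `header` holds the loop header or if condition
 * that a real generator would derive from the query constraint. */
typedef struct {
    StepKind kind;
    const char *header;
} Step;

/* Appends `text` to the string in `out` (capacity `cap` bytes).
 * Returns 0 on success and -1 if the buffer is too small. */
static int append(char *out, size_t cap, const char *text) {
    size_t used = strlen(out);
    size_t len = strlen(text);
    if (used + len + 1 > cap) return -1;
    memcpy(out + used, text, len + 1);
    return 0;
}

/* Emits code for steps idx..n-1 into `out`: a for loop per extend step,
 * an if statement per check step, and the match registration innermost. */
int compile_search_plan(const Step *sp, size_t n, size_t idx,
                        char *out, size_t cap) {
    if (idx >= n) {
        return append(out, cap, "store_match();\n");
    }
    const Step *step = &sp[idx];
    const char *keyword = (step->kind == STEP_EXTEND) ? "for (" : "if (";
    if (append(out, cap, keyword) != 0 ||
        append(out, cap, step->header) != 0 ||
        append(out, cap, ") {\n") != 0) {
        return -1;
    }
    /* The remaining steps are nested inside the block opened above. */
    if (compile_search_plan(sp, n, idx + 1, out, cap) != 0) return -1;
    return append(out, cap, "}\n");
}
```

Calling `compile_search_plan(plan, n, 0, buf, sizeof buf)` on an empty string buffer produces the nested loop and if structure seen in generated matchers such as Listing 3, with the recursion depth equal to the number of search steps.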

Example 3.3.

Listing 3 shows the C code generated from the query specification of misalignedTurnout. Assuming that a global variable model points to the root of the entire graph model including its up-to-date model statistics, calling the function mt_matcher with a pointer to the result set structure results will compute and store all matches over the model in results.

In the example, the initially bound variables are assumed to be empty, as indicated in Line 2 (L2 for short) with NULL values, because we aim to find all matches in the entire model. In L3, the size of the result set is initialized to 0. The for loop in L5 represents step 1 from the search plan (see Table 1) and iterates over all turnouts in the model, binding the variable vars.mt to all possible objects in L6. Lines 8 and 9 together represent search step 2. In L8, vars.loc is assigned the segment referred to by vars.mt via the link straight. If such a segment exists (L9), execution continues with the third search operation, which is mapped to L11–L14.

The generated code for ¬ConnectedTo (line 3 in Table 1) checks (as a negative condition) if the vars.loc->connected_to array holds a pointer to the turnout vars.mt. The execution only continues if no such reference exists, i.e., ¬ConnectedTo = 1. Then, the train occupying vars.loc is assigned to vars.t. If such a train exists, a match is found and registered by assigning the corresponding variable values to parameter variables in a new match (L19 and L20) and increasing the counter of found matches match_cntr. The execution concludes with saving the number of matches (L25).

When engineering safety-critical software, cyclomatic complexity (CC) is frequently used as a metric to estimate the complexity of the code [40]. As a general recommendation, code with high CC is traditionally avoided in a safety-critical system as it requires extra effort to test and maintain. However, the derived imperative source code of data-driven monitoring programs is inherently complex even for small queries, which is largely attributed to the declarative nature of query specifications. For example, the CC of Listing 3 is 5, which already indicates substantial complexity. A key contribution of the current paper is to provide novel rigorous timing analysis for data-driven monitors in order to enable their use in a safety-critical context.

Estimating the WCET of query-based monitors is a highly complex task which involves multiple classic challenges of timing analysis.
First, query programs that take a snapshot of a runtime graph model as their input need to be analyzed. The runtime model of the system is a continuously changing data structure that captures the most up-to-date knowledge of the underlying running system. As such, programs with changing memory demands need to be analyzed, which is considered a major challenge in the domain of WCET analysis [26]. While the available physical memory of the execution platform sets a de facto upper limit for model size, existing approaches can only provide very coarse estimates based on this parameter, which is often impractical.

Another major challenge is that query execution time is heavily data-dependent, i.e., the same control flow of a query program may have substantially different runtimes based upon the structural characteristics of the underlying graph model.

    void mt_matcher(MTMatchSet *results) {
      MTVars vars = { .mt = NULL, .loc = NULL, .t = NULL };
      int match_cntr = 0;
      // Constraint: Turnout(mt)
      for (int i0 = 0; i0 < model->turnout_cnt; i0++) {
        vars.mt = model->nodes[model->turnout_ids[i0]];
        // Constraint: Straight(mt, loc)
        vars.loc = vars.mt->straight;
        if (vars.loc != NULL) {
          // Constraint: ¬Connected(mt, loc)
          int is_connected = 0;
          is_connected |= vars.loc->connected_to[0] == vars.mt;
          is_connected |= vars.loc->connected_to[1] == vars.mt;
          if (is_connected == 0) {
            // Constraint: OccupiedBy(loc, t)
            vars.t = vars.loc->train;
            if (vars.t != NULL) {
              // Register match
              results->matches[match_cntr].mt = vars.mt;
              results->matches[match_cntr++].t = vars.t;
            }
          }
        }
      }
      results->size = match_cntr;
    }

Listing 3. Source code generated for query misalignedTurnout

Fig. 5. CFG of mt_matcher

Fig. 6. Classification of query input models and model updates from the perspective of WCET analysis

Existing static timing analysis approaches fail to provide safe and practical WCET bounds because the longest execution path is often infeasible. Assuming some constraints on model size (e.g., capped by available memory) and some general restrictions on model scope (e.g., there are more segments than trains in any real model), a key open challenge is how to provide a model where the execution time of a particular query program will likely be maximal, or at least sufficiently high to provide a safe WCET estimate.

To tackle this challenge, we introduce the concept of witness models, illustrated in Figure 6. Figure 6 sketches the model space of runtime graph models (represented with dots), i.e., the set of all monitor inputs. Possible changes made to a model at runtime (depicted as arrows) result in a new model. In order to make a practical WCET estimate for query programs, we make some explicit assumptions about realistic (and consistent) models captured in the form of a model scope.

A witness model of a model scope (red dots in Figure 6) is a consistent model which maximizes the WCET estimate for all models within the given scope. It can serve as representative data to calculate WCET for any model within the scope. If a certain change brings the model outside the given model scope, then the WCET estimate obtained using the witness model from the scope may no longer be safe on the model. In such a case, either the witness model of the new scope

Existing WCET analysis methods. Static WCET analysis is typically divided into two major phases: flow analysis and low-level analysis. Flow analysis aims at reconstructing the program flow and deriving control flow graphs (CFGs), while low-level analysis aims at computing hardware-specific timing parameters of basic program block (BB) executions. Note that our current work primarily focuses on the flow analysis of WCET estimation, while low-level WCET analysis is out of scope.

A common flow analysis approach for static WCET computation is the implicit path enumeration technique (IPET) [33]. This method analyzes the control flow of the program to compute a sequence of instructions that yields the longest possible execution. IPET is based on solving an integer linear programming (ILP) problem constructed from the program CFG and flow facts.

The IPET method requires complex computations to solve the underlying ILP problem. As such, it is applicable for design time WCET computation for real-time systems, but not for WCET recomputation at runtime. However, the symbolic method proposed in [4] is capable of providing parametric WCET formulae which are cheap to recompute in case the program flow facts change.
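For orientation, the ILP solved by IPET can be written in the following generic textbook form. The symbols $x_i$ (execution count of basic block $\mathit{BB}_i$), $y_e$ (traversal count of CFG edge $e$), $b_j$ (bound of loop $j$), and the edge sets $\mathrm{in}(i)$, $\mathrm{out}(i)$, $\mathrm{back}(j)$, $\mathrm{entry}(j)$ are notation assumed here for illustration and are not taken from the paper; the virtual edges entering and leaving the program are assumed to be traversed exactly once.

```latex
\begin{align*}
\text{maximize} \quad & \sum_{i} T(\mathit{BB}_i) \cdot x_i \\
\text{subject to} \quad
  & x_i = \sum_{e \in \mathrm{in}(i)} y_e = \sum_{e \in \mathrm{out}(i)} y_e
    && \text{for every basic block } \mathit{BB}_i, \\
  & \sum_{e \in \mathrm{back}(j)} y_e \le b_j \cdot \sum_{e \in \mathrm{entry}(j)} y_e
    && \text{for every loop } j, \\
  & x_i, y_e \in \mathbb{N}.
\end{align*}
```

The loop bounds $b_j$ are the flow facts; for query programs they depend on the input model, which is why a fixed bound chosen at design time tends to be very conservative.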

Our approach addresses the WCET estimation challenge for graph query programs by providing two complementary safe WCET estimation methods: a static one (used at design time) and an on-line one (usable at runtime). On the one hand, our static WCET estimation can provide upfront WCET bounds for models within the model scope by synthesizing and exploiting witness models. This estimation achieves tighter WCET bounds (compared to other existing methods) by (1) precisely incorporating data flow information during WCET estimation and by (2) excluding unrealistic models that would often yield high WCET estimates. However, when a runtime model falls outside all model scopes considered at design time, no static WCET estimates are available for the program execution. Therefore, we provide an on-line technique to rapidly recalculate WCET at runtime by adapting parametric WCET formulae [4, 13] and exploiting some aggregated model statistics.

The top part of Figure 7 presents the high-level workflow with design time tasks of obtaining two complementary WCET estimates. The static WCET estimation relies on objective-guided generation of witness models where the objective function is derived from the monitoring goal and the low-level timing properties of the monitoring program. The process starts with the synthesis and compilation of the monitoring program (marked with A in Figure 7), and it is followed by classic WCET analysis, e.g., using the IPET method (B). Subsequently, the results of the WCET analysis are used to compute a model generation objective (C) which drives the model generation process (E). This objective is to maximize the function that computes the execution time of the query program over a given instance model.

In parallel with these activities, constraints are derived for generating well-formed models in the given domain (D). Combining the results from activities C and D, the model generation step (E) uses a graph solver to systematically generate the model that maximizes the objective function, i.e., provides a safe and tight estimation of the longest run time of the query program. As a result, the workflow not only computes a safe, static WCET value, but also generates a witness model where the estimated query program run time is used as the WCET estimation.

Fig. 7. Workflow of WCET estimation for query-based monitors

The proposed on-line WCET estimation, also shown in Figure 7, starts with obtaining the source code and executable for the query program (A), then performing static WCET analysis (B). Using the results of (B), a parametric WCET formula is derived (F) using the algorithm described in [4]. While this formula is obtained at design time, the exact WCET bounds are obtained at runtime once the relevant underlying runtime model statistics are known.

The bottom part of Figure 7 shows the general outline of a program which employs real-time graph queries. Once updates to the runtime model are completed (G), the parametric WCET formula (computed at design time in F) is instantiated and a safe, on-line WCET bound is obtained (H). Evaluating the formula requires minimal computational effort, thus it can be repeatedly recomputed during program execution. The on-line and static WCET bounds are then simply compared, and because they are both safe estimates, the lower value is selected as the WCET estimate (I). If the underlying runtime model lies outside each model scope for which static WCET estimation is available or violates well-formedness constraints, the on-line WCET estimation is selected automatically. This value is then used as the required time window when scheduling tasks (J), and after completing query evaluation and the rest of the tasks, program execution will eventually continue processing model updates (G).

To characterize the data-dependent execution time of graph query programs, we derive an upper bound function $f_q$ assigning approximate run times of the query q to a model $M$. Formally, $\mathit{RT}_q(M) \le f_q(M)$, where $\mathit{RT}_q(M)$ is the execution time of the query program q on instance model $M$.

For each basic block $\mathit{BB}_i$ of the CFG of the query program q, we construct a graph predicate $\psi_{\mathit{BB}_i}$. The free variables $v_1, \dots, v_k$ of $\psi_{\mathit{BB}_i}$ correspond to the program variables within the program scope when $\mathit{BB}_i$ is executed. When the query program runs on an input model $M$, each execution of $\mathit{BB}_i$ corresponds to a match $Z : \{v_1, \dots, v_k\} \to O_M$ of $\psi_{\mathit{BB}_i}$ ($Z \in \mathit{Matches}(\psi_{\mathit{BB}_i}, M)$), where $Z(v_j)$ is the model object referenced by the variable $v_j$ upon the execution of $\mathit{BB}_i$.

To achieve this, we set $\psi_{\mathit{BB}_i}$ to the conjunction of extend and check constraints in effect on the variables in the program scope. Extend operations (evaluated by loops) introduce new free variables, while check operations (evaluated by if conditions) only restrict the possible binding of existing variables. As we have shown in Section 3.2, search-based query plans translate to a series of nested loops and if conditions. Thus, $\psi_{\mathit{BB}_i}$ is the conjunction of extend constraints associated with loops and check constraints associated with if blocks that enclose $\mathit{BB}_i$. For enclosing else blocks, the negation of the check condition is taken instead.

Basic blocks $\mathit{BB}^*_j$ of loop headers require special attention, since a loop variable $v_k$ can be uninitialized or it may have a value from the previous iteration of the loop. Hence, in addition to the predicate $\psi_{\mathit{BB}^*_j}$ with free variables $v_1, \dots, v_{k-1}, v_k$, we also introduce a predicate $\psi'_{\mathit{BB}^*_j}$ with free variables $v_1, \dots, v_{k-1}$ to represent the first execution with $v_k$ still uninitialized.

Definition 4.1.

The upper bound for the execution time of q on a model $M$ can be written using the aforementioned graph predicates by summing up the worst-case execution times of basic blocks weighted by the number of times each basic block is executed as follows:

$$f_q(M) = \underbrace{\sum_{i \in D} \Big( T(\mathit{BB}_i) \cdot |\mathit{Matches}(\psi_{\mathit{BB}_i}, M)| \Big)}_{\text{total of all BB executions except loop headers}} + \underbrace{\sum_{j \in L} \Big[ T(\mathit{BB}^*_j) \cdot \Big( |\mathit{Matches}(\psi'_{\mathit{BB}^*_j}, M)| + |\mathit{Matches}(\psi_{\mathit{BB}^*_j}, M)| \Big) \Big]}_{\text{total of initial and repeated loop header executions}}, \tag{1}$$

where
• $T$ is a function that returns the WCET of a basic block in the CFG;
• $D$ is the set of the indices of basic blocks that are not loop headers; and
• $L$ is the set of indices of loops.

The function $f_q$ is a linear function of the match counts of the graph predicates as defined in Section 2.4. Therefore it is not only an upper bound for the execution time of q on a given model $M$, but may also serve as an objective function in a model generation problem. In Figure 7, the function $f_q$ is defined in activity C and used in activity E.

Example 4.2.

We illustrate the execution time estimation method using the query misalignedTurnout. To construct the graph predicates $\psi_{\mathit{BB}_i}$ and $\psi'_{\mathit{BB}^*_j}$ for the query program in Listing 3, we have to inspect the query plan in Table 1, its traceability to the generated code (shown as comments in Listing 3), and the CFG of the code (Figure 5). By tracing each basic block to the code lines and to the query constraints, we may obtain

• $\psi_{\mathit{BB}_1} = \top$,
• $\psi_{\mathit{BB}_2} = \mathit{Turnout}(mt)$,
• $\psi_{\mathit{BB}_3} = \top$,
• $\psi_{\mathit{BB}_4} = \mathit{Turnout}(mt)$,
• $\psi_{\mathit{BB}_5} = \mathit{Turnout}(mt) \wedge \mathit{Straight}(mt, loc)$,
• $\psi_{\mathit{BB}_6} = \mathit{Turnout}(mt) \wedge \mathit{Straight}(mt, loc) \wedge \neg \mathit{Connected}(mt, loc)$,
• $\psi_{\mathit{BB}_7} = \mathit{Turnout}(mt) \wedge \mathit{Straight}(mt, loc) \wedge \neg \mathit{Connected}(mt, loc) \wedge \mathit{OccupiedBy}(loc, t)$,
• $\psi_{\mathit{BB}_8} = \mathit{Turnout}(mt)$.

Since $\mathit{BB}_2$ is a loop header, we also have $\psi'_{\mathit{BB}_2} = \top$ for the execution of $\mathit{BB}_2$ with the variable mt yet uninitialized. Therefore we can write the upper bound of the execution time on a model $M$ as

$$f_{mt}(M) = \sum_{i=1}^{8} \Big( T(\mathit{BB}_i) \cdot |\mathit{Matches}(\psi_{\mathit{BB}_i}, M)| \Big) + T(\mathit{BB}_2) \cdot |\mathit{Matches}(\psi'_{\mathit{BB}_2}, M)|.$$

In the static WCET analysis step, we compute an upper bound of the execution time of a model query program given a set of constraints (defining the space of well-formed models) and the scope of the analysis at design time (see activity E in Figure 7).

Given a theory $\mathcal{T}$ and type scopes $\mathcal{S}$, we derive the WCET estimate of a query program q over the set of models satisfying $\mathcal{T}$ and $\mathcal{S}$ by maximizing the upper bound function $f_q$.

Fig. 8. Illustrating model generation problems for witness models: (a) witness model $M^*$ for query misalignedTurnout with 12 objects, satisfying theory $\mathcal{T}$ and model scope $\mathcal{S}$; (b) instance model $M'$ with 12 objects, satisfying theory $\mathcal{T}$ but exceeding the model scope $\mathcal{S}$ (more Trains and Turnouts). The figure also shows the multiplicity constraint $\varphi_{ct\text{-}outDeg} = \mathit{ConnectedTo}(t, s_1) \wedge \mathit{ConnectedTo}(t, s_2) \wedge \mathit{ConnectedTo}(t, s_3) \wedge \neg(s_1 = s_2 \vee s_1 = s_3 \vee s_2 = s_3) \in \mathcal{T}$ and the type scopes $\mathcal{S}(\mathit{Segment})$, $\mathcal{S}(\mathit{Turnout})$, and $\mathcal{S}(\mathit{Train})$.
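Once the match counts of the block predicates are known for a model, the upper bound of Definition 4.1 is a plain weighted sum. The following C sketch illustrates this; the `BlockTiming` type and the `upper_bound` function are hypothetical names, and the per-block WCETs and match counts are assumed to be given (they would come from low-level analysis and from query evaluation or the graph solver, respectively).

```c
#include <stddef.h>
#include <stdint.h>

/* Timing data for one basic block BB_i (all values are assumed inputs). */
typedef struct {
    uint32_t wcet_cycles;     /* T(BB_i) */
    uint32_t matches;         /* |Matches(psi_BB_i, M)| */
    uint32_t initial_matches; /* |Matches(psi'_BB_i, M)|; 0 unless BB_i is a loop header */
} BlockTiming;

/* Sums T(BB_i) times the number of executions of BB_i over all blocks.
 * For loop headers, the executions include the initial entry (psi')
 * as well as the repeated ones (psi). */
uint64_t upper_bound(const BlockTiming *blocks, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t executions =
            (uint64_t)blocks[i].matches + blocks[i].initial_matches;
        total += (uint64_t)blocks[i].wcet_cycles * executions;
    }
    return total;
}
```

With made-up values, an entry block (10 cycles, executed once), a loop header (4 cycles, three repeated executions plus one initial one) and a loop body block (5 cycles, three executions) give 10 + 16 + 15 = 41 cycles. Because the sum is linear in the match counts, the same expression can serve as a solver objective.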

Definition 4.3.

This yields a static WCET estimate $\mathit{WCET}^s_q(\mathcal{T}, \mathcal{S})$, as the computation of this estimate necessitates the use of a model generator at design time. Formally, we have

$$\mathit{WCET}^s_q(\mathcal{T}, \mathcal{S}) = f_q(M^*), \quad \text{where } M^* \in \mathit{optimal}(\Sigma, \mathcal{T}, \mathcal{S}, f_q), \tag{2}$$

where $M^*$ is a witness model of the maximum value of $f_q$. Therefore, $\mathit{RT}_q(M) \le \mathit{WCET}^s_q(\mathcal{T}, \mathcal{S})$ holds for all instance models $\mathcal{T}, \mathcal{S} \models M$.

We include the witness model $M^*$ (with the used theory $\mathcal{T}$ and model scope $\mathcal{S}$) for illustration in Figure 8(a), which maximizes $f_{mt}$ and yields the $\mathit{WCET}^s_{mt}$ estimate for the query program misalignedTurnout from Listing 3. The theory $\mathcal{T}$ used in the generation process contained the multiplicity constraint $\varphi_{ct\text{-}outDeg}$ that caps the out-degree of connectedTo references at 2.

The witness model $M^*$ can be inspected to study the extreme execution time of the query program q and may aid in further query optimization. However, $M^*$ is not necessarily an input where the actual program WCET is exhibited: it may be the case that $\mathit{RT}_q(M^*) < \mathit{RT}_q(M_{worst})$ for some other model $\mathcal{T}, \mathcal{S} \models M_{worst}$, even though we still have $\mathit{RT}_q(M_{worst}) \le f_q(M_{worst}) \le f_q(M^*) = \mathit{WCET}^s_q(\mathcal{T}, \mathcal{S})$. In any case, our static WCET estimate can serve as a safe bound for execution time.

Gradual refinement of the theory $\mathcal{T}$ and the scopes $\mathcal{S}$ can aid the designer in query program analysis. In particular, if the estimated $\mathit{WCET}^s_q(\mathcal{T}, \mathcal{S})$ is too high, we may extend the set of constraints to $\mathcal{T}' \supsetneq \mathcal{T}$ to more precisely specify the space of well-formed models. Alternatively, if it is not feasible to further extend the theory of well-formedness constraints $\mathcal{T}$ and thus restrict the set of well-formed models, we may opt for constraining the model scope $\mathcal{S}$. This property is summarized in Proposition 4.1.

Proposition 4.1.

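For a query program q, theories $\mathcal{T}$, $\mathcal{T}'$, and model scopes $\mathcal{S}$, $\mathcal{S}'$, the following inequality holds (see proof sketch in Appendix A):

$$\mathit{WCET}^s_q(\mathcal{T}', \mathcal{S}') \le \mathit{WCET}^s_q(\mathcal{T}, \mathcal{S}) \quad \text{if } \mathcal{T}' \supseteq \mathcal{T} \text{ and } \forall C_i \in \Sigma : \mathcal{S}'(C_i) \subseteq \mathcal{S}(C_i). \tag{3}$$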
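The effect of narrowing the scope can be illustrated with a brute-force stand-in for the model generator. In the C sketch below, the finite candidate list, the types `Range` and `Candidate`, and the function `static_wcet` are hypothetical; only a single class (Turnout) is scoped, and each candidate carries a precomputed value of $f_q$. The actual approach relies on a graph solver rather than enumeration.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t lo, hi; } Range;  /* type scope of one class */

typedef struct {
    uint16_t turnouts;  /* number of Turnout instances in the candidate */
    uint64_t f;         /* precomputed upper bound f_q(M) for the candidate */
} Candidate;

/* Returns the maximum of f over the candidates admitted by the scope,
 * or 0 if no candidate lies within the scope. */
uint64_t static_wcet(const Candidate *c, size_t n, Range turnout_scope) {
    uint64_t best = 0;
    for (size_t i = 0; i < n; i++) {
        int admitted = c[i].turnouts >= turnout_scope.lo &&
                       c[i].turnouts <= turnout_scope.hi;
        if (admitted && c[i].f > best) best = c[i].f;
    }
    return best;
}
```

Since a narrower scope admits a subset of the candidates, the maximum taken over it cannot exceed the maximum taken over the wider scope, which mirrors inequality (3).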

The primary goal of on-line WCET estimation computed at runtime is to serve as a fallback to cover cases where the underlying runtime model lies outside the model scope used for computing static WCET bounds or violates well-formedness constraints.

Our idea is to exploit model statistics collected at runtime, such as (1) the number of nodes that are instances of a certain class or (2) the maximum out-degree of a node w.r.t. a given reference type. As discussed in Section 3.1, these model statistics can be collected and maintained as part of the updates to the runtime model. As such, the current values of model statistics can be used as flow facts for loop bounds when instantiating a WCET formula of a specific query. The resulting WCET value can be used to reallocate execution time slots and reschedule tasks on-the-fly [15].

In Section 3.2, we showed how search-based query plans can be translated to a series of nested loops and if conditions. Thus, the CFG of such a program has several cycles. We leverage the algorithm presented in [4] that takes a program CFG and outputs a formula where the parameters are loop bounds, i.e., how many times a cycle in the CFG is executed (see activity F in Figure 7).

Definition 4.4.

A parametric WCET estimation formula for a graph query program q used to derive WCET bounds at runtime can be defined as follows:

$$\mathit{WCET}^o_q(\mathit{stats}) = \sum_{i \in D_0} T(\mathit{BB}_i) + \sum_{j \in L_0} \Big( T(\mathit{BB}^*_j) + l_j(\mathit{stats}) \cdot T(\mathit{Loop}_j, \mathit{stats}) \Big) \tag{4}$$

$$T(\mathit{Loop}_j, \mathit{stats}) = \sum_{k \in D_j} T(\mathit{BB}_k) + \sum_{m \in L_j} \Big( T(\mathit{BB}^*_m) + l_m(\mathit{stats}) \cdot T(\mathit{Loop}_m, \mathit{stats}) \Big) \tag{5}$$

In these formulae
• $\mathit{stats}$ is the model statistics which corresponds to the model scope of a given concrete model;
• $l_k$ returns the loop bound of the $k$-th loop for a model statistics ($l_k(\mathit{stats}) \in \mathbb{N}$);
• $T$ is a function that returns the WCET of a basic block or loop in the CFG;
• $D_0$ is the set of BB indices that are not contained in any loops but are part of the longest program execution path in the CFG of the query program q;
• $D_j$ ($j > 0$) is the set of BB indices contained directly in $\mathit{Loop}_j$ (i.e., not part of other loops) that are part of the longest path within the loop;
• $L_0$ is the set of loop indices that are not contained in any loop in the CFG of q;
• $L_j$ ($j > 0$) is the set of loop indices contained directly in $\mathit{Loop}_j$; and
• the BB for the loop header of $\mathit{Loop}_j$ is denoted with $\mathit{BB}^*_j$.

Once $\mathit{WCET}^o_q$ is formulated, it is easy to instantiate because a multiplication is done for each parameter and then the timing values are summed up. This computation is simple enough to quickly obtain a new WCET estimate when model statistics are available (activity H in Figure 7).

Example 4.5.

Figure 5 shows the CFG built from the mt_matcher function with its corresponding BBs. The lines corresponding to BBs in Listing 3 are shown next to the nodes of the CFG. The WCET formula for the mt_matcher function built from the CFG shown in Figure 5 is the following:

$$\mathit{WCET}^o_{mt}(\mathit{stats}) = T(\mathit{BB}_1) + T(\mathit{BB}_3) + \sum_{j \in \{2\}} \Big( T(\mathit{BB}^*_j) + l_j(\mathit{stats}) \cdot T(\mathit{Loop}_j) \Big) = T(\mathit{BB}_1) + T(\mathit{BB}_2) + T(\mathit{BB}_3) + l_2(\mathit{stats}) \cdot \Big( T(\mathit{BB}_2) + \sum_{k=4}^{8} T(\mathit{BB}_k) \Big)$$

Here $T(\mathit{BB}_i)$ is the WCET of a basic block $\mathit{BB}_i$ and the value of $l_2(\mathit{stats})$ is the flow fact for the loop bound, which is in this case the number of turnouts in a given model.

To illustrate the impact of model statistics, we compare the model $M^*$ presented in Figure 8(a) with model statistics $\mathit{stats}_{M^*}$ with the one obtained from a synthetic but still well-formed runtime model snapshot $M'$ shown in Figure 8(b) with model statistics $\mathit{stats}_{M'}$. Both models have a total of 12 nodes but their model statistics (i.e., the number of instances of each class) are different: $\mathit{stats}_{M^*}(\mathit{Turnout}) = 3$, while for the other model $\mathit{stats}_{M'}(\mathit{Turnout}) = 4$. For this reason, the query program mt_matcher takes longer to complete when evaluated over $M'$. The query plan starts with iterating over all turnout nodes, so the $\mathit{WCET}^o_{mt}$ parameter is $l_2(\mathit{stats}_{M^*}) = 3$ for $M^*$, while $l_2(\mathit{stats}_{M'}) = 4$ for $M'$.

We propose a hybrid estimation method to leverage both the static and the on-line estimates. For models satisfying the type scopes $\mathcal{S}$ taken into account when calculating the static estimate, the lower of the two estimates is taken (see activity I in Figure 7). For models outside of $\mathcal{S}$, we fall back to the on-line estimates.

Definition 4.6.

The hybrid WCET estimate of a query q over a well-formed runtime model with statistics $\mathit{stats}$ is formally defined by the function

$$\mathit{WCET}^h_q(\mathit{stats}) = \begin{cases} \min \{ \mathit{WCET}^s_q(\mathcal{T}, \mathcal{S}), \mathit{WCET}^o_q(\mathit{stats}) \}, & \text{if } \mathcal{S} \models \mathit{stats}, \\ \mathit{WCET}^o_q(\mathit{stats}), & \text{otherwise,} \end{cases} \tag{6}$$

where $\mathcal{T}$, $\mathcal{S}$, and the value of $\mathit{WCET}^s_q(\mathcal{T}, \mathcal{S})$ are provided ahead of time.

Computing $\mathit{WCET}^h_q$ only requires the type scope check $\mathcal{S} \models \mathit{stats}$ and the computation of the minimum in addition to the evaluation of the $\mathit{WCET}^o_q$ estimate, both of which can be done in constant time. Thus, there is no significant overhead compared to the $\mathit{WCET}^o_q$ estimate. We may avoid checking whether the current state of the runtime model satisfies $\mathcal{T}$, since, by assumption, $\mathcal{T}$ is chosen such that all possible runtime models are well-formed.

The static WCET estimate of a query program for some particular scope of models $\mathcal{S}$ may not be tighter than the on-line WCET computed for a model $M$ in the scope. It may be the case that $\mathit{WCET}^o_q(\mathit{stats}_M) \le \mathit{WCET}^s_q(\mathcal{T}, \mathcal{S})$ even if $\mathcal{T}, \mathcal{S} \models M$, especially when $\mathit{stats}_M$ is much smaller than the $\mathit{stats}_{M^*}$ belonging to the witness model $M^*$ providing the static estimate. We observed this to be the case in two out of four experiments which use a realistic runtime model (see Section 5.3).

However, for any fixed (well-formed) model $M$, $\mathit{WCET}^s_q$ computed by $f_q(M)$ is always at least as tight as $\mathit{WCET}^o_q$. Compared to the $\mathit{WCET}^o_q$ estimate, $\mathit{WCET}^s_q$ may take into account the theory $\mathcal{T}$ in addition to the statistics $\mathit{stats}_M$, and $f_q$ also has access to the whole model $M$. This claim is also confirmed by our experiments (see Section 5.3) and formalized in Proposition 4.2.
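The runtime side of the hybrid scheme is cheap to evaluate, as the following C sketch suggests. It instantiates a parametric formula for a program with a single, non-nested loop (as in the mt_matcher example) and then applies the case distinction of Definition 4.6. All names (`Stats`, `Scope`, `wcet_online`, `scope_admits`, `wcet_hybrid`) are hypothetical, and the timing values passed in are assumed to come from design-time analysis.

```c
#include <stdint.h>

typedef struct { uint16_t lo, hi; } Range;
typedef struct { uint16_t segments, turnouts, trains; } Stats;  /* runtime model statistics */
typedef struct { Range segments, turnouts, trains; } Scope;      /* design-time type scopes */

/* On-line estimate for one non-nested loop:
 * t_outside   - summed WCET of the blocks outside the loop,
 * t_header    - WCET of the loop header (its initial execution),
 * t_iteration - WCET of one iteration (header plus loop body),
 * loop_bound  - flow fact taken from the statistics, e.g. the turnout count. */
uint64_t wcet_online(uint64_t t_outside, uint64_t t_header,
                     uint64_t t_iteration, uint16_t loop_bound) {
    return t_outside + t_header + (uint64_t)loop_bound * t_iteration;
}

static int in_range(uint16_t v, Range r) { return v >= r.lo && v <= r.hi; }

/* Constant-time check that the statistics lie within the type scopes. */
int scope_admits(const Scope *s, const Stats *st) {
    return in_range(st->segments, s->segments) &&
           in_range(st->turnouts, s->turnouts) &&
           in_range(st->trains, s->trains);
}

/* Hybrid estimate: the lower of the two bounds inside the scope,
 * the on-line bound otherwise. */
uint64_t wcet_hybrid(uint64_t wcet_static, uint64_t wcet_on,
                     const Scope *s, const Stats *st) {
    if (scope_admits(s, st) && wcet_static < wcet_on) return wcet_static;
    return wcet_on;
}
```

With made-up timings of 20, 5 and 50 cycles, a model with 3 turnouts yields an on-line bound of 175 cycles and one with 4 turnouts yields 225 cycles; the static bound is only considered for statistics within the scope.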

Proposition 4.2.

The following inequality holds between execution times and their estimates:

\[
\mathit{RT}_q(M) \le f_q(M) \le \mathit{WCET}^s_q(\mathcal{T}, \widehat{\mathit{stats}}_M) \le \mathit{WCET}^o_q(\mathit{stats}_M), \tag{7}
\]

where $\widehat{\mathit{stats}}_M(C_i) = [\mathit{stats}_M(C_i), \mathit{stats}_M(C_i)]$ is the scope corresponding exactly to the model statistics $\mathit{stats}_M$, i.e., the scope where the lower and upper bounds are both equal to the number of elements in the model statistics. See the proof sketch in Appendix A.

We conducted experiments to answer the following research questions related to the WCET of query programs:

RQ1 How do measured query execution times over witness models relate to query execution times over random models?
RQ2 How do static WCET estimates differ from real query execution times over witness models?
RQ3 How do static and on-line WCET estimates compare when applied to graph query programs?
RQ4 How does query program complexity influence the error of computed WCET bounds by various WCET estimation methods?

To address these research questions, we use graph queries from the domain of the MoDeS3 CPS demonstrator [51]. This demonstrator uses high-level runtime monitoring rules captured as graph queries, and showcases synthesized monitoring programs executing these queries over the runtime graph model of the underlying running system. Our experiments focus only on query evaluation, and updates to the runtime model are out of scope for the current paper. Therefore, we ran the query programs on various snapshots of runtime graph models. We evaluated the following queries adopted from [9]:

• Close trains ( ct ): The headway distance needs to be respected on the track, and this query highlights locations where two trains are only one free segment away from each other.
• End of siding ( eos ): This query finds trains that are dangerously close (one segment distance) to an end of the track.
• Misaligned turnout ( mt ): This is the query introduced in the running example of Section 2.3.
• Train locations ( tl ): A simple query to find pairs of trains and segments that describe the locations of each train.

The calculation of query search plans is out of scope of the current paper, but they were created and optimized based on the typical model statistics of runtime model snapshots in the MoDeS3 system. Although search plans were shown to be highly efficient if they are updated as the properties of the underlying model change [50], the ones calculated for the realistic model were used throughout the entire evaluation. For example, the search plan presented in Table 1 is the one used by the program executing the query Misaligned turnout.

A query program takes a graph as input and computes the query results over this graph. We informally expect that the time needed to compute the complete query result set heavily depends on the structure of the input graph $M$, which, in our case, is characterized by the function $\mathit{stats}_M$.
In the following, we describe how we obtained a variety of models to assess the impact of models with different characteristics on query evaluation times.

First, to obtain one realistic model, we manually captured a detailed runtime model snapshot of MoDeS3 that is similar to the one presented in Figure 2(b) and counts a total of 25 objects. Then, using the metamodel in the MoDeS3 case study, we generated four witness models for a given model scope such that a chosen query is estimated to have the longest possible execution time. For all of these models, we employed the same model scope inspired by the railway domain: up to one fifth of the objects can be Trains and up to one fifth of the objects can be Turnouts. The rest of the objects are Segments; we capped the maximum number of objects at 25. The resulting models are syntactically valid and they can represent a realistic railway system thanks to the domain-specific model scope and well-formedness constraints.

The second set of models contained a total of 50 randomly generated graphs which do not necessarily represent realistic railway configurations. We used an open-source EMF random model generator for this purpose. Nevertheless, these models still respect the same model scope that was used for generating realistic models (i.e., they all contain up to a total of 25 objects, with up to five Turnouts and five Trains) but may violate other constraints of the MoDeS3 domain.

The query programs were executed on a microcontroller that had no other tasks (e.g., interrupts) running. Because these programs take tens of microseconds to execute, we applied the following measurement setup to obtain an estimate of the average program run times:

1. First, we created an infinite loop and observed the loop execute 𝑛 times under a fixed time duration 𝑇 meas (in our case 𝑇 meas = 𝑠 ).
2. Then, we added query execution to the loop and repeated it 100 times over the same graph, and counted 𝑚 loop executions under 𝑇 meas .
3. Finally, we used the formula 𝑇 query = ( 𝑚 − 𝑛 ) · 𝜇𝑠 to get the average run times of query evaluations on given input models.

Repeated measurements of the average program run times obtained this way show negligible variation (in the order of 0 . 𝜇𝑠 ) for a given input graph, which is unsurprising since it is the only task running in the system.

The bare-metal query programs were compiled with the compile flags recommended by OTAWA (this includes using the GCC compiler for ARM, version 7.2.1 20170904 (release), with the -O0 and -g3 flags) and they were executed on the Infineon Relax Lite Kit-V1 board. This board has an XMC4500 F100-K1024 microcontroller driven by a 120 MHz system clock. This microcontroller is considered to be a mature industrial microcontroller and has an ARM Cortex-M4 core. For the present evaluation, the available 4 KB instruction cache was not used: initial evaluation results with the cache enabled showed that the WCET estimations provided by OTAWA using the available hardware platform model were lower than the measured execution times, i.e., the estimated WCETs were not safe. Not using the instruction cache on the device and removing the cache analysis step from the OTAWA script yielded credible WCET estimations. Besides, the device does not have any data cache. The embedded code used for the experiments as well as compiler and other configurations are available online.
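The arithmetic behind such a loop-counting setup can be sketched as follows. This is a minimal sketch under stated assumptions rather than the formula used in the experiments: it assumes that one iteration of the empty loop costs 𝑇 meas / 𝑛, that one iteration of the loop with queries costs 𝑇 meas / 𝑚, and that each of the latter iterations evaluates the query a fixed number of times. The function name `average_query_time_us` and the example counts are hypothetical.

```python
def average_query_time_us(n: int, m: int, t_meas_s: float,
                          repetitions: int = 100) -> float:
    """Average time of one query evaluation in microseconds.

    n: iterations of the empty loop completed within t_meas_s seconds
    m: iterations completed within t_meas_s seconds when every iteration
       additionally evaluates the query `repetitions` times
    """
    if n <= 0 or m <= 0 or repetitions <= 0 or t_meas_s <= 0:
        raise ValueError("counts, repetitions and duration must be positive")
    empty_iteration_s = t_meas_s / n   # loop overhead per iteration
    loaded_iteration_s = t_meas_s / m  # loop overhead plus query evaluations
    # Subtract the loop overhead, split the rest across the repetitions,
    # and convert seconds to microseconds.
    return (loaded_iteration_s - empty_iteration_s) / repetitions * 1e6


# Hypothetical counts, chosen only to illustrate the arithmetic.
example_us = average_query_time_us(n=1_000_000, m=1_000, t_meas_s=1.0)
```

Subtracting the empty-loop cost removes the loop overhead from the result, and repeating the query within each iteration amortizes the timer resolution over many evaluations.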
As a baseline for our comparison, we computed a (static) WCET for the query programs with the IPET method [14] using the OTAWA framework [3] ( owcet tool version V1.2.0).

To compute $\mathit{WCET}^o$ of query programs, OTAWA derived the program CFG and determined BB execution times from the compiled binary and the hardware platform model. Then, this information was used to compute parametric WCET formulae. Due to the lack of available tool support, we used a semi-automated WCET formulae computation by applying the algorithm described by Ballabriga et al. [4]. The platform model of the microcontroller and the OTAWA script were taken from a public repository of an external research group specialized in the analysis of embedded systems.

Footnotes: EMF random model generator: https://github.com/atlanmod/mondo-atlzoo-benchmark ; embedded code and configurations: https://imbur.github.io/cps-query/

Fig. 9. Query execution times on fully random models and realistic models. (a) Measured query execution times (microseconds) over random models (box), witness model (red dot), realistic model (blue dot). (b) Cross-comparison of measured query run times over realistic models (microseconds), by query and model variant.

The measured query run times over the set of 50 randomly generated models are captured by the boxes in Figure 9(a). Each query was evaluated and timed on all of the random models. Additionally, the respective average query execution time over each witness model (generated by the graph solver specifically for a given query) is also added to this figure for comparison (red dots).

The heatmap in Figure 9(b) presents the obtained query run times for each query over all witness models generated by the solver (e.g., Witness model for close trains represents the model that maximizes the estimated run time of the query Close trains) and the realistic model taken from the MoDeS3 system (MoDeS3 snapshot model). The diagonal in this figure shows the measured execution times over models dedicated to maximizing the WCET estimation of the corresponding query.

Footnote: platform model and OTAWA script repository: https://github.com/uastw-sat/ARMv7t-WCET-Analysis

Findings for RQ1. The first observation from Figure 9(a) is that execution times on random models do not exceed the execution time measured on the model generated to maximize the expected execution time of a query. For the almost trivial query Train locations, it was possible to achieve the longest measured execution time (4.25 μs) using randomly generated models, but not for the other three, more complex queries. The biggest relative difference was for Misaligned turnout, where the maximum execution time measured for a random model was 6.44 μs and the execution time for the model provided by the graph solver was 7.83 μs, a 22% increase. This finding supports the usefulness of deriving witness models by graph generation which maximize the estimated WCET.

Another important observation is that execution time is highly sensitive to the structure and statistics of the runtime model. For example, the query Misaligned turnout evaluated over its witness model takes 1.70× longer to complete compared to the average execution time over random models, while it takes 2.67× longer compared to the shortest measured execution time over random models. In this case, query execution starts with objects of type Turnout; several random models have fewer than the maximum number of Turnout objects allowed by the model scope.

Moreover, we also compare the query execution times over the realistic MoDeS3 runtime model; see the bottom row of Figure 9(b). Here, the results (blue dots in Figure 9(a)) fall within the range of execution times measured over random models. In fact, the relative deviation from the mean is at most 17%.

Finally, we notice that query trainLocations, which is a trivial query, exhibits the same execution time for each witness model. This query simply iterates over Trains, and the witness models generated for all other queries contain the same number of Train objects.
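As a quick check of the figure quoted for Misaligned turnout, the relative increase follows directly from the two measured maxima stated above (longest run over a random model versus the run over the witness model):

\[
\frac{7.83\,\mu s - 6.44\,\mu s}{6.44\,\mu s} = \frac{1.39}{6.44} \approx 0.216 \approx 22\%.
\]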

Findings for RQ2. As a baseline for comparison, we use the execution times presented in the diagonal of the heatmap in Figure 9(b) (i.e., the maximum measured execution times for a query) to approximate the tightness of the static WCET estimate. Since $\mathit{WCET}^s(M^*)$ provides a safe overestimation, the actual WCET of the query program cannot exceed this value and is at least the longest measured execution time. We have the following findings:

• $\mathit{WCET}^s_{ct}(M^*_{ct})$ = . μs: up to 32% overestimation of actual WCET
• $\mathit{WCET}^s_{eos}(M^*_{eos})$ = . μs: up to 28% overestimation of actual WCET
• $\mathit{WCET}^s_{mt}(M^*_{mt})$ = . μs: up to 47% overestimation of actual WCET
• $\mathit{WCET}^s_{tl}(M^*_{tl})$ = . μs: up to 40% overestimation of actual WCET

The static WCET estimates exceed the measured run times over the respective witness models by 28%–47%. Hence, for each of the selected queries, the overestimation of the actual WCET over well-formed models is at most the corresponding percentage. These numbers show promising first results produced by the approach given that no former WCET estimation methods target this class of data-driven programs.

Summary. Query execution times show a great variation for different models with the same number of objects. As such, the impact of the graph structure on query execution time dominates the impact of sheer model size. For each query, the longest execution times were measured over the auto-generated witness models of the query. Our static WCET estimation technique overestimates the actual WCET by a maximum of 28%–47%.
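The "up to" qualifier in these figures can be made explicit with a short derivation. Here $\mathit{RT}^{\max}_q$ denotes the longest measured execution time of query $q$ and $\mathit{WCET}_q$ its unknown actual WCET; both symbols are shorthand introduced only for this illustration. Assuming that the static estimate is safe and that the percentages are taken relative to the measured time:

\[
\mathit{RT}^{\max}_q \le \mathit{WCET}_q \le \mathit{WCET}^s_q(M^*_q)
\quad\Longrightarrow\quad
\frac{\mathit{WCET}^s_q(M^*_q) - \mathit{WCET}_q}{\mathit{WCET}_q}
\;\le\;
\frac{\mathit{WCET}^s_q(M^*_q) - \mathit{RT}^{\max}_q}{\mathit{RT}^{\max}_q}.
\]

The implication holds because, for a fixed estimate, the relative overestimation decreases as the denominator grows. The right-hand side is computable from measurements alone, so it bounds the true overestimation from above without requiring knowledge of the actual WCET.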

First, query programs were analyzed using the Eclipse plugin for OTAWA to visualize the CFG and obtain the worst-case timing properties of BBs. Additionally, we computed a safe WCET estimation for each query using the IPET plugin for OTAWA, with values shown in column IPET in Table 2. Since the real model size and metrics are not known at design time, the static WCET estimates by IPET are applicable only to a given model scope. Once the model scope was selected, loop bounds were provided based on the maximum number of object instances allowed by the model scope.

Second, we incorporated the estimated worst-case BB times into the objective function used in the model generation step. The final maximum values provided by the solver are shown in the Witness generator column. To see the potential benefit of solver-based static WCET computation, the difference from the IPET-based WCET is shown in parentheses.

Finally, due to the lack of mature tool support, we semi-automatically derived parametric WCET formulae [4] for the four queries from the CFG and BB timing properties computed by OTAWA. Then, we instantiated these formulae with the model statistics of the generated witness models (column Witness model) and included the difference from the baseline IPET estimate. Similarly, we used the statistics of the realistic MoDeS3 runtime model to compare the on-line WCET estimate with the one produced by the IPET method. These results are shown in the column Realistic model.

Table 2. Comparison of safe WCET estimates over well-formed models with the same model scope (up to five trains, up to five turnouts, and up to a total of 25 objects)

                |    | Static estimate ( WCET s )     | On-line estimate ( WCET o )
Query           | CC | IPET     | Witness generator   | Witness model   | Realistic model
----------------+----+----------+---------------------+-----------------+----------------
Close trains    |    | μs       | μs (-15%)           | 35.67 μs (+3%)  | 28.68 μs (-17%)
End of siding   | 10 | 24.92 μs | μs (-21%)           | 25.58 μs (+1%)  | 20.61 μs (-17%)
Misaligned t.   |    | μs       | μs (+1%)            | 11.58 μs (+1%)  | 11.58 μs (+1%)
Train locations |    | μs       | μs (+3%)            | 5.96 μs (+3%)   | 4.91 μs (-15%)

Findings for RQ3. Static WCET estimations provided by the graph solver for the Misaligned turnout and Train locations queries are higher than the ones obtained by the IPET method by a very thin margin (1% and 3% more, respectively). However, for the other two queries, Close trains and End of siding, the provided estimates are significantly lower than the results produced by IPET (15% and 21% less). The explanation is that the methods synthesized from these latter two queries have higher program complexity, thus the precise count of BB executions performed by the model generator outweighs the benefits of the IPET method, which relies on condensed model statistics provided as flow facts. Overall, this shows the strength of the proposed solver-based WCET estimation method.

The parametric (on-line) WCET formulae of monitor executions provide slightly higher estimates for each query investigated on witness models when compared to (static) IPET estimates. However, the rapidly recomputable formula provides 15%–17% better estimates in three out of the four cases over the realistic MoDeS3 snapshot model. In the case of Misaligned turnout, it overestimates the time provided by IPET by only 1%. The reason behind these differences is that the runtime model statistics for the MoDeS3 snapshot model have one train less than the maximum number allowed by the model scope, which is a key factor in the formulae of Close trains, End of siding, and Train locations, while the formula for Misaligned turnout does not depend on this number.

Findings for RQ4. On the one hand, static WCET estimates provided using auto-generated witness models have increased precision for more complex queries (complexity measured as cyclomatic complexity, column CC in Table 2). A possible explanation for this lies in how the solver evaluates the objective function (i.e., computes the WCET estimate) over a given graph. Rather than merely relying on program loop bounds, the solver is able to better estimate the exact number of executions of individual BBs while evaluating a query over a given graph.

On the other hand, the computational complexity of queries does not impact the precision of on-line WCET estimations over witness models compared to the estimation obtained by IPET. This result shows the scalability of the on-line WCET computation approach w.r.t. program complexity.

Summary. Static WCET bounds computed by a graph solver can provide significant improvements for complex query-based monitoring programs compared to results obtained from traditional methods. Such static estimates complemented with on-line WCET estimates based on runtime model statistics provide safe and tight execution time bounds. The improvement of WCET bounds of inspected query programs is 13% on average when compared to values obtained by IPET.


Computed WCET values presented in this section are reasonable w.r.t. the longest measured execution times. However, the platform model of the microprocessor may not be completely accurate, which can result in imprecise WCET. Furthermore, the evaluation platform only ran the monitoring programs; no other tasks were running on the same device. Finally, the algorithm for obtaining the parametric WCET formula [4] supports contextual information for refining BB timings (e.g., the effect of the processor pipeline), but our formulae did not use this. For this reason, the formulae we used might provide less tight estimates, but the computed WCET bounds are still safe.

In addition to the hardware-specific considerations, evaluation of the WCET estimation techniques using additional case studies with query-based runtime monitors from different domains could further improve the confidence in the evaluation results.

Numerous static and probabilistic WCET analysis methods exist, and surveys [32, 52] provide an extensive summary of their capabilities, while Abella et al. [2] focus on the comparison of the most common WCET estimation methods for programs in real-time systems. Based on the categorization of approaches in this latter work, our approach is a high-level static deterministic timing analysis which provides safe execution bounds for embedded programs executing complex graph queries.

Graph models and queries have often been used in the UML-based design of real-time systems [12, 22, 31]. However, in this work, these techniques are applied at runtime for monitoring purposes in real-time systems, for which only a few related papers exist. We provide an overview of related work below that focuses on topics relevant to query-based monitors.

Dynamic memory allocation in embedded systems.

A well-known challenge in programs with dynamic memory needs is the ability to precisely predict the behavior of the memory allocator [53]. In general, allocators do not provide guarantees about the memory addresses reserved at runtime. This makes low-level WCET analysis problematic because there is no information about which cache sets the newly allocated memory will belong to. As a result, every time dynamically allocated memory is accessed, the analyzer needs to assume that the entire contents of the cache are invalidated. A further issue with using dynamic memory allocation is that the allocator itself uses some internal data structure for tracking in-use memory blocks. Thus, whenever an allocation is initiated, the access to these internal data structures pollutes the cache.

A solution in [53] to the nondeterminism of memory allocators is to use deterministic ones instead [27, 35]. Such deterministic allocators are able to provide guarantees regarding the placement of the allocated memory blocks and they serve allocations in $O(1)$ time, but the memory tends to be more fragmented compared to traditional allocators, which may result in poor memory utilization.

Another approach to circumvent the limitations of dynamic allocation is to compute the memory usage of the application a priori [26]. The idea is to allocate in advance the memory that the program will need at runtime. This method optimizes the reserved memory size by reusing some data structures multiple times for different purposes at runtime. In this case, however, detailed information is required about the memory needs of the program, which is not always available.

Hard real-time monitors in embedded systems.

The concept of predictable monitoring was introduced in [54], where static scheduling techniques were used to show that a monitor fits its allocated timeframe. However, this work assumes that the execution times of the monitoring programs are known, which is not true in the case of our work. The Temporal Rover [19] opts for generating monitoring code from temporal logic with low overhead. The verification of properties is done mostly on a remote host; only basic sub-formulae evaluation is done on a device and results are communicated to the host. Synchronous component execution and observable program states are the main assumptions made in [39] to support sampling-based monitoring of real-time systems. This work follows a standard model of hard real-time scheduling, where monitors are a collection of recurring tasks that obtain inputs and compute output in an a priori bounded amount of time, which is different from our presented dynamically changing time windows for query-based monitors.

Real-time database queries.

In real-time databases [37], access to data has strict time constraints. The work in [29]presents a data sampling-based statistical method to evaluate aggregate queries in a database. There is a trade-offbetween time available for query execution and the precision of the estimate. Such estimations would not be acceptablein a monitoring setting where precise query results are expected. The real-time object-oriented database RODAIN [46],which targets telecommunication applications, does not support hard real-time transaction (i.e., query) types, because itis considered too costly for the target domain. However, our objective is exactly to provide such guarantees over graphmodels to support hard real-time applications.

WCET of graph model-based computations.

One of the few related works that investigates real-time properties ofgraph-based techniques is [48]. This work evaluates the applicability of story diagrams to recognize hazardous situationsin real-time systems. Story diagrams are especially suitable for event-driven data transformations, which differs fromour data-driven approach. Another important assumption in this work is the limited model size on which story diagramsare applied. This differs significantly from our setup, because we allow the model not to be bounded at design time.

In this paper, we presented a method to provide safe and practical WCET bounds for runtime monitoring programs derived from graph queries to enable their use in real-time systems. On the one hand, we provide a static WCET estimate by exploiting automatically generated witness models (using an advanced graph solver) which maximize the execution time taken by the query-based monitor to complete. On the other hand, we combine state-of-the-art parametric WCET computation with runtime graph statistics, which allows on-line WCET recomputation at runtime upon relevant model changes so that time slots can be reallocated according to a tighter bound.

We carried out an extensive evaluation of our approach on an industry-grade hardware platform using a variety of graph models as inputs for query programs, and assessed the tightness of computed WCET using three different algorithms. We constructed witness models that maximize the estimated execution times of queries, as well as random graph models, as inputs for graph query programs in an attempt to showcase high execution times. While we have no formal guarantee that worst-case timing behavior is exhibited on witness models as inputs, in all our experiments, the longest execution times were always measured on such witness models.

Our results showed that our proposed WCET estimation approach often provides tighter estimates than the classic IPET method. Moreover, the actual structure of the underlying graph has a major impact on query execution time (we experienced a 3.08× multiplier between shortest and longest measured execution in an extreme case) and it is a more important factor than the sheer model size.

Since our current technique relies on fixed worst-case timings for basic program blocks to yield a safe but conservative estimate, our future work aims to improve on tightness by incorporating context-dependent basic block execution times. Moreover, the assessment of benefits of the on-line computed WCET in scheduling is subject to future investigations.

As a part of a long-term future research agenda, the presented WCET estimation approach could be extended to a distributed setting. In addition to the characteristics of the graph model, query evaluation costs need to take into account network latency and data allocation.

REFERENCES

[1] Raja Ben Abdessalem, Annibale Panichella, Shiva Nejati, Lionel C. Briand, and Thomas Stifter. 2018. Testing Autonomous Cars for Feature Interaction Failures Using Many-Objective Search. In . 143–154.
[2] Jaume Abella et al. 2015. WCET analysis methods: Pitfalls and challenges on their trustworthiness. (2015), 39–48.
[3] Clément Ballabriga, Hugues Cassé, Christine Rochange, and Pascal Sainrat. 2010. OTAWA: An Open Toolbox for Adaptive WCET Analysis. In LNCS. Vol. 6399.
[4] Clément Ballabriga, Julien Forget, and Giuseppe Lipari. 2017. Symbolic WCET computation. ACM Trans. Embedded Comput. Syst. 17, 2 (2017).
[5] Ezio Bartocci et al. 2018. Specification-Based Monitoring of Cyber-Physical Systems: A Survey on Theory, Tools and Applications. In Lectures on Runtime Verification. 135–175.
[6] Gábor Bergmann, Zoltán Ujhelyi, István Ráth, and Dániel Varró. 2011. A Graph Query Language for EMF Models. In Theory and Practice of Model Transformations - 4th International Conference. 167–182.
[7] Gordon S. Blair, Nelly Bencomo, and Robert B. France. 2009. Models@run.time. IEEE Computer 42, 10 (2009), 22–27.
[8] Erwan Brottier, Franck Fleurey, Jim Steel, Benoit Baudry, and Yves Le Traon. 2006. Metamodel-based test generation for model transformations: an algorithm and a tool. In . 85–94.
[9] Márton Búr, Gábor Szilágyi, András Vörös, and Dániel Varró. 2018. Distributed graph queries for runtime monitoring of cyber-physical systems. In LNCS. Vol. 10802. 111–128.
[10] Márton Búr, Gábor Szilágyi, András Vörös, and Dániel Varró. 2019. Distributed graph queries over models@run.time for runtime monitoring of cyber-physical systems. In Int. J. Software Tools Technol. Trans.
[11] [...] ACM/IEEE 22nd International Conference on Model Driven Engineering Languages and Systems. 233–238.
[12] Sven Burmester, Holger Giese, Martin Hirsch, and Daniela Schilling. 2004. Incremental design and formal verification with UML/RT in the FUJABA real-time tool suite. In Proceedings of the International Workshop on Specification and Validation of UML Models for Real Time and Embedded Systems, SVERTS2004, Satellite Event of the 7th International Conference on the Unified Modeling Language, UML. Citeseer.
[13] Stefan Bygde, Andreas Ermedahl, and Björn Lisper. 2011. An efficient algorithm for parametric WCET calculation. J. Syst. Archit. 57, 6 (2011), 614–624.
[14] Hugues Cassé and Pascal Sainrat. 2006. OTAWA, a framework for experimenting WCET computations. January (2006), 1–8.
[15] H. Chetto, M. Silly, and T. Bouchentouf. 1990. Dynamic scheduling of real-time tasks under precedence constraints. Real-Time Systems 2, 3 (1990), 181–194.
[16] Kong-Rim Choi and Kyung-Chang Kim. 1996. T*-tree: a main memory database index structure for real time applications. In . 81–88.
[17] István Dávid, István Ráth, and Dániel Varró. 2018. Foundations for Streaming Model Transformations by Complex Event Processing. Softw. Syst. Model. 17, 1 (2018), 135–162.
[18] Wei Dou, Domenico Bianculli, and Lionel Briand. 2018. Model-Driven Trace Diagnostics for Pattern-Based Temporal Specifications. In . 278–288.
[19] Doron Drusinsky. 2000. The temporal rover and the ATG rover. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
[20] [...] Proceedings. 2004 First International Workshop on Model, Design and Validation, 2004.
[21] [...] AAAI FS
[22] [...] Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering (2003), 38–47. https://doi.org/10.1145/940071.940078
[23] Thomas Hartmann, François Fouquet, Assaad Moawad, Romain Rouvoy, and Yves Le Traon. 2019. GREYCAT: Efficient what-if analytics for data in motion at scale. Information Systems 83 (2019), 101–117.
[24] Klaus Havelund. 2015. Rule-based runtime verification revisited. Int. J. Software Tools Technol. Trans. 17, 2 (2015), 143–170.
[25] Klaus Havelund and Grigore Rosu. 2002. Synthesizing Monitors for Safety Properties. In LNCS. Vol. 2280. 342–356.
[26] Jörg Herter and Jan Reineke. 2009. Making Dynamic Memory Allocation Static to Support WCET Analysis. In .
[27] Jörg Herter, Jan Reineke, and Reinhard Wilhelm. 2008. CAMA: Cache-aware memory allocation for WCET analysis. In Work-In-Progress Session of the 20th Euromicro Conference on Real-Time Systems.
[28] Ákos Horváth, Gergely Varró, and Dániel Varró. 2007. Generic search plans for matching advanced graph patterns. Electronic Communications of the EASST
[29] [...] SIGMOD Rec. (1989), 10.
[30] Muhammad Zohaib Iqbal, Shaukat Ali, Tao Yue, and Lionel Briand. 2015. Applying UML/MARTE on industrial projects: challenges, experiences, and guidelines. Softw. Syst. Model. 14, 4 (2015), 1367–1385.
[31] Jan Jürjens. 2003. Developing safety-critical systems with UML. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
[32] [...] Programming and Computer Software 42, 1 (2016), 41–48.
[33] Y.-T.S. Li and Sharad Malik. 1997. Performance analysis of embedded software using implicit path enumeration. IEEE T. Comput. Aid. D. 16, 12 (1997), 1477–1487.
[34] Kristóf Marussy, Oszkár Semeráth, and Dániel Varró. 2018. Incremental view model synchronization using partial models. Proceedings - 21st ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, MODELS 2018 (2018), 323–333.
[35] Miguel Masmano, Ismael Ripoll, Alfons Crespo, and Jorge Real. 2004. TLSF: A new dynamic memory allocator for real-time systems. In . 79–88.
[36] Joseph D McDonald and Francis T Durso. 2015. A behavioral intervention for reducing postcompletion errors in a safety-critical system. Human Factors 57, 6 (2015), 917–929.
[37] Gultekin Ozsoyoglu and Richard T Snodgrass. 1995. Temporal and real-time databases: A survey. IEEE Trans. Knowl. Data Eng. 7, 4 (1995).
[38] Christian Pek, Stefanie Manzinger, Markus Koschi, and Matthias Althoff. 2020. Using online verification to prevent autonomous vehicles from causing accidents. Nature Machine Intelligence 2, 9 (sep 2020), 518–528. https://doi.org/10.1038/s42256-020-0225-y
[39] Lee Pike, Alwyn Goodloe, Robin Morisset, and Sebastian Niller. 2010. Copilot: A Hard Real-Time Runtime Monitor. In LNCS. Vol. 6418. 345–359.
[40] Leanna Rierson. 2017. Developing Safety-Critical Software. CRC Press. 22–27 pages.
[41] Oszkár Semeráth, Rebeka Farkas, Gábor Bergmann, and Dániel Varró. 2020. Diversity of graph models and graph generators in mutation testing. International Journal on Software Tools for Technology Transfer 22, 1 (2020), 57–78.
[42] Oszkár Semeráth, András Szabolcs Nagy, and Dániel Varró. 2018. A graph solver for the automated generation of consistent domain-specific models. In . 969–980.
[43] Oszkár Semeráth and Dániel Varró. 2017. Graph Constraint Evaluation over Partial Models by Constraint Rewriting. In ICMT 2017. 138–154.
[44] Oszkár Semeráth, András Vörös, and Dániel Varró. 2016. Iterative and incremental model generation by logic solvers. In International Conference on Fundamental Approaches to Software Engineering. Springer, 87–103.
[45] Michael Szvetits and Uwe Zdun. 2013. Systematic literature review of the objectives, techniques, kinds, and architectures of models at runtime. Software & Systems Modeling 15, 1 (2013), 31–69.
[46] Juha Taina and Kimmo Raatikainen. 1996. Rodain: A real-time object-oriented database system for telecommunications. International Conference on Information and Knowledge Management, Proceedings Part F129290 (1996), 10–14.
[47] The Eclipse Project [n.d.]. Eclipse Modeling Framework. The Eclipse Project. eclipse.org/emf.
[48] Matthias Tichy, Holger Giese, and Andreas Seibel. 2006. Story diagrams in real-time software. In Proc. of the 4th International Fujaba Days.
[49] Dániel Varró, Oszkár Semeráth, Gábor Szárnyas, and Ákos Horváth. 2018. Towards the Automated Generation of Consistent, Diverse, Scalable and Realistic Graph Models. Number 10800.
[50] Gergely Varró, Frederik Deckwerth, Martin Wieber, and Andy Schürr. 2015. An algorithm for generating model-sensitive search plans for pattern matching on EMF models. Software & Systems Modeling (2015), 597–621.
[51] András Vörös et al. 2018. MoDeS3: Model-Based Demonstrator for Smart and Safe Cyber-Physical Systems. In NASA Formal Methods. 460–467.
[52] Reinhard Wilhelm et al. 2008. The Determination of Worst-Case Execution Times: Overview of the Methods and Survey of Tools. ACM Trans. Embedded Comput. Syst. 7, 3 (2008), 36:1–36:53.
[53] Reinhard Wilhelm et al. 2010. Static Timing Analysis for Hard Real-Time Systems. In

Verification, Model Checking, and Abstract Interpretation .[54] Haitao Zhu, Matthew B. Dwyer, and Steve Goddard. 2009. Predictable runtime monitoring.

Proceedings - Euromicro Conference on Real-Time Systems orst-Case Execution Time Calculation for Query-Based Monitors by Witness Generation 29

A PROOF SKETCHES

Proposition 4.1.

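For a query program $q$, theories $\mathcal{T}$, $\mathcal{T}'$, and model scopes $\mathcal{S}$, $\mathcal{S}'$, the following holds: if $\mathcal{T}' \supseteq \mathcal{T}$ and $\forall C_i \in \Sigma : \mathcal{S}'(C_i) \subseteq \mathcal{S}(C_i)$, then $\mathrm{WCET}^s_q(\mathcal{T}', \mathcal{S}') \le \mathrm{WCET}^s_q(\mathcal{T}, \mathcal{S})$.

Proof. (Sketch.) Let $\mathcal{M} = \{ M : \mathcal{T}, \mathcal{S} \models M \}$ be the set of well-formed (WF) models in the model scope. It is sufficient to show that $\mathcal{M}' = \{ M : \mathcal{T}', \mathcal{S}' \models M \} \subseteq \mathcal{M}$, since the witness model $M^* \in \mathcal{M}$ provides the longest estimated execution over $\mathcal{M}$, and hence over any subset of it. We consider two cases, (1) $\mathcal{T}' \supsetneq \mathcal{T}$, $\mathcal{S}' = \mathcal{S}$ and (2) $\mathcal{T}' = \mathcal{T}$, $\mathcal{S}' \subsetneq \mathcal{S}$; the general case follows by applying them in sequence.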

1. Assume $\mathcal{T}' \supsetneq \mathcal{T}$, $\mathcal{S}' = \mathcal{S}$, i.e., at least one additional WF constraint is added to the theory of WF constraints, but the scope remains the same. Adding a new WF constraint cannot invalidate existing constraints, i.e., $\forall M : \mathcal{T}', \mathcal{S} \models M \rightarrow \mathcal{T}, \mathcal{S} \models M$.

2. Assume $\mathcal{T}' = \mathcal{T}$, $\mathcal{S}' \subsetneq \mathcal{S}$, i.e., there is at least one $C_i \in \Sigma$ such that $\mathcal{S}'(C_i) \subsetneq \mathcal{S}(C_i)$, and the theory of WF constraints remains the same. For the witness model $\mathcal{T}, \mathcal{S} \models M^*$, it holds that $\mathrm{stats}_{M^*}(C_i) \in \mathcal{S}(C_i)$. If $\mathrm{stats}_{M^*}(C_i) \notin \mathcal{S}'(C_i)$, then the witness model $\mathcal{T}, \mathcal{S}' \models M^{**}$ cannot yield a higher WCET estimate than $M^*$; otherwise, $M^{**}$ would have been selected as the optimal solution for $\mathcal{T}, \mathcal{S}$. □

Proposition 4.2.

The following inequality holds between execution times and their estimates: $\mathrm{RT}_q(M) \le f_q(M) \le \mathrm{WCET}^s_q(\mathcal{T}, \widetilde{\mathrm{stats}}_M) \le \mathrm{WCET}^o_q(\mathrm{stats}_M)$, where $\widetilde{\mathrm{stats}}_M(C_i) = [\mathrm{stats}_M(C_i), \mathrm{stats}_M(C_i)]$ is the scope corresponding exactly to the model statistics $\mathrm{stats}_M$.

Proof. (Sketch.) We show that the following three inequalities hold:

1. $\mathrm{RT}_q(M) \le f_q(M)$
2. $f_q(M) \le \mathrm{WCET}^s_q(\mathcal{T}, \widetilde{\mathrm{stats}}_M)$
3. $\mathrm{WCET}^s_q(\mathcal{T}, \widetilde{\mathrm{stats}}_M) \le \mathrm{WCET}^o_q(\mathrm{stats}_M)$

1. The function $f_q$ precisely counts the BB executions of the query program $q$ over model $M$, and multiplies each count by the execution time of the corresponding BB. Furthermore, we use the longest possible estimated execution times of BBs when defining $f_q$. Therefore, $\mathrm{RT}_q(M) \le f_q(M)$ holds.

2. By definition, $\mathrm{WCET}^s_q$ is the value of $f_q$ for the witness model $M^*$, which maximizes the value returned by this function. This means that for any model with the same statistics as $M$, $f_q(M) \le \mathrm{WCET}^s_q(\mathcal{T}, \widetilde{\mathrm{stats}}_M)$ holds.

3. The formula which defines $\mathrm{WCET}^o_q(\mathrm{stats}_M)$ sums BB execution times based on program control flow, and $\mathrm{stats}_M$ provides the flow facts for setting maximum loop bounds. These flow facts inherently overestimate execution counts whenever the actual number of loop repetitions varies, since the maximum number of repetitions is assumed for every iteration. In contrast, $\mathrm{WCET}^s_q(\mathcal{T}, \widetilde{\mathrm{stats}}_M)$ precisely counts how many times a BB is executed when the query is evaluated over the witness model. This proves that the inequality $\mathrm{WCET}^s_q(\mathcal{T}, \widetilde{\mathrm{stats}}_M) \le \mathrm{WCET}^o_q(\mathrm{stats}_M)$ holds. □
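Both propositions can be checked by brute force on a toy example. The following Python sketch is only an illustration under stated assumptions and is not the graph solver or the timing analysis used in the paper: the query (enumerating all paths s -> t -> u), the basic-block costs in `COST`, the loop-bound formula in `wcet_o`, and the two WF constraints are all hypothetical. Measured run time $\mathrm{RT}_q(M)$ is not modeled. Witness models are found by exhaustive enumeration, which is feasible only for very small scopes.

```python
from itertools import combinations

# Hypothetical worst-case costs (in cycles) of the four basic blocks (BBs).
COST = {"init": 5, "outer": 3, "middle": 4, "inner": 7}

def f_q(n, edges):
    """Exact BB counts of a toy query enumerating paths s -> t -> u,
    weighted by the worst-case BB costs."""
    out = {v: [t for (s, t) in edges if s == v] for v in range(n)}
    middle = sum(len(out[s]) for s in range(n))
    inner = sum(len(out[t]) for s in range(n) for t in out[s])
    return (COST["init"] + n * COST["outer"]
            + middle * COST["middle"] + inner * COST["inner"])

def models(scope):
    """Enumerate all directed graphs without self-loops inside the scope."""
    (n_lo, n_hi), (e_lo, e_hi) = scope["Node"], scope["Edge"]
    for n in range(n_lo, n_hi + 1):
        pairs = [(s, t) for s in range(n) for t in range(n) if s != t]
        for m in range(e_lo, min(e_hi, len(pairs)) + 1):
            for edges in combinations(pairs, m):
                yield n, frozenset(edges)

def wcet_s(theory, scope):
    """Maximum of f_q over all well-formed models in the scope (the witness)."""
    return max((f_q(n, e) for n, e in models(scope)
                if all(c(n, e) for c in theory)), default=0)

def wcet_o(n, m):
    """Loop-bound estimate: every loop is assumed to run its maximum count."""
    return COST["init"] + n * (COST["outer"]
                               + m * (COST["middle"] + m * COST["inner"]))

# Two hypothetical WF constraints.
def max_out_degree(k):
    return lambda n, edges: all(
        sum(1 for (s, _) in edges if s == v) <= k for v in range(n))

def no_two_cycles(n, edges):
    return all((t, s) not in edges for (s, t) in edges)

T = [max_out_degree(2)]
T_prime = T + [no_two_cycles]                 # T' is a superset of T
S = {"Node": (1, 4), "Edge": (0, 5)}
S_prime = {"Node": (1, 3), "Edge": (0, 4)}    # S' is contained in S

# Proposition 4.1: a stronger theory and a narrower scope cannot raise the bound.
assert wcet_s(T_prime, S_prime) <= wcet_s(T, S)

# Proposition 4.2 (without RT): f_q(M) <= WCET^s <= WCET^o for exact statistics.
M = (3, frozenset({(0, 1), (1, 2), (0, 2)}))
exact = {"Node": (3, 3), "Edge": (3, 3)}
assert f_q(*M) <= wcet_s(T, exact) <= wcet_o(3, 3)
print(f_q(*M), wcet_s(T, exact), wcet_o(3, 3))
```

In this toy setting, the gap between `wcet_s` and `wcet_o` comes from the nested loop bounds: `wcet_o` charges the maximum repetition count to every iteration, while `wcet_s` only counts executions that some well-formed model can actually trigger.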