Who is to Blame? Runtime Verification of Distributed Objects with Active Monitors

Abstract

Since distributed software systems are ubiquitous, their correct functioning is crucially important. Static verification is possible in principle, but requires high expertise and effort which is not feasible in many eco-systems. Runtime verification can serve as a lean alternative, where monitoring mechanisms are automatically generated from property specifications, to check compliance at runtime. This paper contributes a practical solution for powerful and flexible runtime verification of distributed, object-oriented applications, via a combination of the runtime verification tool Larva and the active object framework ProActive. Even if Larva supports in itself only the generation of local, sequential monitors, we empower Larva for distributed monitoring by connecting monitors with active objects, turning them into active, communicating monitors. We discuss how this allows for a variety of monitoring architectures. Further, we show how property specifications, and thereby the generated monitors, provide a model that splits the blame between the local object and its environment. While Larva itself focuses on monitoring of control-oriented properties, we use the Larva front-end StaRVOOrS to also capture data-oriented (pre/post) properties in the distributed monitoring. We demonstrate this approach to distributed runtime verification with a case study, a distributed key/value store.


D. Ancona and G. Pace (Eds.): Verification of Objects at RunTime EXecution 2018 (VORTEX 2018), EPTCS 302, 2019, pp. 32–46, doi:10.4204/EPTCS.302.3

© W. Ahrendt, L. Henrio & W. Oortwijn. This work is licensed under the Creative Commons Attribution License.


Wolfgang Ahrendt

Chalmers University of Technology, Gothenburg, Sweden

Ludovic Henrio

Univ Lyon, EnsL, UCBL, CNRS, Inria, LIP, Lyon, France

Wytse Oortwijn

University of Twente, Enschede, the Netherlands


The days of stand-alone software applications are largely over. Cloud solutions and mobile applications are prominent instances of a general development towards ever more distributed computing. Distributed software is already ubiquitous, and will only grow from here. At the same time, the overwhelming combinatorial complexity of possible interactions and interleavings makes distributed software systems particularly prone to unforeseen, unintended behaviour of varying criticality. This makes validation efforts even more important than in the stand-alone case. Distributed computational scenarios pose enormous challenges to analysis and verification, however. There exist many approaches in the literature, partly supported by tools. But in general, sufficiently powerful static verification approaches tend to be very heavy.

There is a recent trend towards more lightweight formal methods, which are easier to exploit but give limited guarantees. One of them is runtime verification, which combines full precision of the execution model (even including the real deployment environment) with full automation. On the other hand, it only ever judges the observed runs, and cannot judge alternative and future runs. Another challenge in runtime verification is the computational overhead of monitoring the running system, which can be prohibitive in certain settings.

One aim of this paper is to provide practical solutions for flexible runtime verification of distributed, object-oriented applications. Contemporary runtime verification approaches allow users to specify properties on a high level, and hide the details of how the actual monitoring is performed. The active object design pattern allows users to program distributed nodes by writing seemingly sequential code, and hides the details of how the proper communication and coordination between different machines is performed. By combining these two principles, we achieve high-level monitor descriptions and high-level distributed programming at once.
Another aim is to allow for a variety of monitoring architectures in a natural manner, such that the user can tailor the monitoring architecture to the characteristics of the monitored application and of the underlying network. We achieve this not only by monitoring active object applications, but also by using the active object paradigm in the monitors themselves. Another contribution is the integration of blame-shifting into the monitoring, in the spirit of assume-guarantee reasoning. The specification of a node states for every failure whether it is blamed on the monitored node or on its environment. The implementation of the node has to ensure the absence of any failure that is blamed on the node, under the assumption that no failure occurs which is blamed on the node's environment. This supports the localisation of failure while limiting the communication load in the monitoring.

Concretely, we made a connection between the runtime verification tool Larva [11] and the active object framework ProActive [1]. Larva (Logical Automata for Runtime Verification and Analysis) is a tool for monitoring the execution and verifying at runtime the correct behaviour of programs, but it is only adapted to a sequential setting. It could be used to monitor a distributed application but only from a centralised point of view [10], limiting the scalability of the approach. In this work we investigate the usage of Larva to perform distributed monitoring of applications. Larva generates a set of monitors for several entities of the observed system and we use ProActive to coordinate the different monitors. ProActive is an active object library that integrates well with Larva because it has no specific syntax for parallelism and distribution: coordination between active objects is performed automatically by the middleware while the programmer only writes standard (sequential) Java code. This way standard Larva monitors can be generated and ProActive coordinates them in a natural manner. Further, while Larva itself focuses on monitoring of control-oriented properties, we use the StaRVOOrS (Static and Runtime Verification of Object-Oriented Software) front-end for Larva [8], to also support the monitoring of data-oriented (pre/post) properties.

The paper is structured as follows. Section 2 gives prerequisites on Larva and ProActive. Section 3 discusses the connection of Larva with ProActive to enable distributed monitoring. In particular, our solution allows Larva monitors to be active objects and enables a variety of monitoring configurations. The notion of blame-shifting and the monitoring of data-oriented properties are also discussed. Section 4 applies our approach to a case study: a distributed key/value store. Section 5 compares our work to related research and Section 6 concludes.

Rather than writing our tooling from scratch, we combined the Larva runtime verifier with the ProActive programming platform for writing distributed Java applications. This section introduces both Larva and ProActive and outlines some of their aspects that are relevant for the remainder of the paper.

    public class Bank {
        public boolean login(String code) { ... }
        public void logout(String code) { ... }
        public void withdraw(String code, double m) { ... }
    }

(a) The Java implementation of a small banking system.

[Automaton with the states out (initial), in and bad. Transition labels: login | ¬\result | (self-loop on out); login | \result | ; login | true | ; logout | true | ; withdraw | true | ; logout | true | ; get | true | .]

(b) The DATE property that is used for runtime verification.

Figure 1: An example of runtime verification with Larva.

Larva

Larva [11] is a runtime verification platform for verifying control-flow properties of Java programs, written in an automata-based specification language called DATE (Dynamic Automata with Timers and Events). Larva generates runtime monitors as Java code out of the automata descriptions of the input properties, and links these monitors to the Java system by using AspectJ.

We illustrate the use of Larva with a small example, a banking system with access control, where users may log in, log out, and withdraw money. Figure 1 sketches a Bank class (1a), and the DATE property (1b) that users must first log in before being able to log out or withdraw, and logged-in users should not log in again.

The DATE specification in Figure 1b consists of three states: (i) the initial state out, describing the logged-out state of the program; (ii) the normal state in that models the logged-in program state; and (iii) the bad state bad that models the erroneous state resulting from runtime violations. Moreover, DATE transitions are labelled by a triple of the form e | c | a, where: e is an event that is connected to method invocations or executions in the program and triggers the transition; c is a condition that must hold for the transition to be taken; and a is an action, a code snippet that is executed when the transition is taken. In general, the condition and action components of transitions may access or update global variables. In fact, since the Larva compiler translates DATE specifications to Java, the c and a components may contain arbitrary Java snippets.

The transitions in Figure 1b use three different events, namely login, logout and withdraw, which correspond to invoking the corresponding methods in the implementation. The \result placeholder in the login transitions is bound to the return value of login, so the self-loop in out is taken if the login in the program was unsuccessful. The other transitions have true as their condition; these transitions are taken unconditionally when triggered. Furthermore, none of the transitions have an associated action, so the action components are left blank.

Larva automatically generates monitors from DATE property specifications (via AspectJ). We therefore often identify, in the discussion, a DATE property with the monitor generated from it.

ProActive programming platform

ProActive [1, 6] is a Java library for building concurrent and distributed software that implements the active objects [5] design pattern. The active objects paradigm, largely inspired by Actors [19], simplifies distributed programming by abstracting concurrency and locality from the programmer. Similar to actors, the primitive units of computation are objects that have their own thread of control. Active objects may have private state and public methods, and threads may communicate by calling the methods on remote objects. Method invocations are decoupled from method execution, which simplifies the design of distributed systems. By invoking a method, the calling thread pushes a request message into the queue of the callee thread and receives a future in return; the callee will eventually process the request by executing the method. This asynchronous construction allows threads to continue working while waiting for remote calls to finish.

The ProActive platform implements the active objects paradigm for Java. In particular, ProActive allows Java objects to be registered as "active", which exposes the public methods of these objects to other active objects, which may be hosted on different JVMs, possibly located on different machines. Under the hood, when activating a Java object, ProActive spawns a worker thread for the object and constructs a proxy that handles all (network) communication the object gets involved in (method calls on active objects are handled via RMI). However, ProActive hides all these technical details from its users; handling active objects has the same look-and-feel as handling ordinary Java objects.

    Bank b = (Bank) PAActiveObject.newActive(Bank.class.getName(), null);
    Account a = b.createAccount(userName);
    b.deposit(a, ...);
    Balance m = b.getBalance(a);
    int i = m.getValue();

Figure 2: A simple ProActive example.

Figure 2 shows a code excerpt illustrating a simple use of ProActive. The first line creates an active object b, thus all subsequent uses of object b are active object requests that will be handled asynchronously by the bank object.
Each of these calls returns a future, stored locally in a and m. The code shown in the figure does not use the future stored in a locally, but it transmits the future to the bank. The invoker does not need to wait for the future to be resolved before sending it, but the bank will probably synchronise on the account creation. It is worth noting that communication between active objects is FIFO, ensuring that all the operations will be handled by the bank in the order of the program. The balance object m receives a future at line 4, and the method invocation at line 5 will block until the balance is obtained (returned by the bank object).

The support for runtime monitoring is very limited in ProActive. Entry points are provided to intercept inter-object communications. They are used for example in an execution visualizer, but any precise monitoring of runtime properties has to be done by hand in ProActive (prior to this work).
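The request queue and future mechanism described in this section can be imitated in plain Java. The following is only a minimal sketch under stated assumptions: it uses java.util.concurrent instead of the ProActive API, and the class ActiveBankSketch with its methods is a hypothetical stand-in, not part of the paper's code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for an active object: one worker thread, one FIFO request queue.
// This uses only java.util.concurrent; it is NOT the ProActive API.
class ActiveBankSketch {
    // A single-threaded executor runs submitted requests one at a time, in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private int balance = 0; // private state, touched only by the worker thread

    // Invocation: enqueue the request and hand back a future immediately.
    CompletableFuture<Void> deposit(int amount) {
        return CompletableFuture.runAsync(() -> balance += amount, worker);
    }

    CompletableFuture<Integer> getBalance() {
        return CompletableFuture.supplyAsync(() -> balance, worker);
    }

    void shutdown() {
        worker.shutdown();
    }

    // Two deposits followed by a balance request, all issued without waiting.
    static int demo() {
        ActiveBankSketch bank = new ActiveBankSketch();
        bank.deposit(30);                                 // returns at once
        bank.deposit(12);                                 // queued behind the first deposit
        CompletableFuture<Integer> m = bank.getBalance(); // queued behind both deposits
        int i = m.join();                                 // blocks until the worker has replied
        bank.shutdown();
        return i;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 42
    }
}
```

Because the worker handles requests in FIFO order, the balance request is served only after both deposits, which mirrors the ordering guarantee stated above.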

In the following, we describe how we employ a runtime verification framework for sequential applications (Larva) to generate distributed monitors for distributed (ProActive) objects. We discuss how distributed monitoring is achieved by letting Larva distinguish between method invocations and method executions on active objects. This section also explains how different monitoring configurations are realised, including orchestrated, centralised and choreography-based monitoring, by letting distributed Larva monitors communicate using active objects. Finally, a notion of blame-shifting is discussed that is inspired by static assume-guarantee (AG) style reasoning.

    public class Server {
      public static void main(String[] args) {
        Bank b = (Bank) PAActiveObject.newActive(Bank.class.getName(), null);
        PAActiveObject.registerByName(b, args[...]);
      }
    }

    public class Client {
      private static void startInterface(Bank b) { ... }
      public static void main(String[] args) {
        Bank b = (Bank) PAActiveObject.lookupActive(
            Bank.class.getName(), args[...]);
        Client.startInterface(b);
      }
    }

Figure 3: A distributed implementation of the Banking example, using ProActive.

Running example.

Throughout this section, we explain and discuss the combination of Larva and ProActive with a distributed version of the banking example discussed in Section 2.1, implemented using ProActive. An excerpt of the implementation is given in Figure 3, where we reuse the Bank class from Figure 1. In fact, this implementation consists of two separate programs, namely: (i) a server that creates and hosts an active object for the bank, and (ii) a client that connects to this active object in order to log in, log out, or withdraw money. The client and the server are intended to be instantiated on different JVMs. The only active object, i.e. the only remotely accessible object, is the bank.

By running the server program, ProActive constructs a proxy for the active object in the API call in line 3, and assigns a worker thread to this active object. Any call to a method on b is actually translated to a message sent to this worker, but these details are neatly hidden by ProActive. The API call at line 4 exposes the new active object b to the outside world, by assigning a network name to it, specified as the argument args[...] in the console. Note that, after executing line 4, the thread associated with the active object b keeps on running and is ready to process incoming method invocations.

Another machine might run the Client program and connect to the active object hosted by the server (identified by the network name given as an input argument, args[...]). The client program will connect to the specified active object by the API call in line 11 and line 12. Observe that the objects b returned in line 3 and line 11 by ProActive are typed as Bank objects: one may use these as ordinary Java objects, even if the object b returned in line 11 is actually a stub, that is, a proxy generated by ProActive that translates all method invocations on b to network messages (actually to RMI communication) to the JVM that physically hosts the bank active object (here the JVM that runs the Server program). After connecting to the Bank active object, the client may display some user interface via startInterface on line 13, to allow user interaction with the bank.

Larva

Even though Larva is not designed for runtime verification of distributed programs, ProActive and Larva make a good match nonetheless, because:

1. Although each JVM may host several active objects, each active object only has a single worker thread. Therefore, operations on active objects are essentially resolved sequentially. Note that asynchronous communication with futures does not break support for runtime monitoring with Larva.

2. The ProActive layer hides all details regarding (network) communication between active objects. These details are captured in stub and proxy classes, and those do not break the AspectJ bindings of Larva.

Additionally, many modern distributed programming architectures and protocols can naturally be modelled and constructed with the active objects design pattern, including web and cloud services.

(a) Runtime monitor for the client. Transitions into the state myfault:
    login↑ | code ∈ C |
    logout↑ | code ∉ C |
    withdraw↑ | code ∉ C |
Other transitions:
    login↑ | \result ∧ code ∉ C | C.add(code)
    logout↑ | \result ∧ code ∈ C | C.remove(code)

(b) Runtime monitor for the server. Transition into the state myfault:
    login↓ | \result ∧ code ∈ S |
Transitions into the state not myfault:
    logout↓ | code ∉ S |
    withdraw↓ | code ∉ S |
Other transitions:
    login↓ | \result ∧ code ∉ S | S.add(code)
    logout↓ | code ∈ S | S.remove(code)

Figure 4: Runtime monitors for the distributed banking example. Edges with multiple labels abbreviate multiple edges, each with a single label.

To monitor the distributed banking implementation in Figure 3 with Larva, separate runtime monitors should be assigned to the client and server program, as they are run on different JVMs. Figure 4 shows the automata representations of the two Larva monitors. Both Larva monitors check interaction with the Bank active object, but from different perspectives: Figure 4a monitors from the client's perspective, whereas Figure 4b monitors from the server's perspective. In both cases, the triggers login, logout and withdraw correspond to their eponymous methods in the Bank class.

These runtime monitors again express the property that clients must first log in before being able to log out or perform a withdrawal. However, since there are now two different parties involved (a client and a server) we choose, in this example, to monitor the property from two different perspectives. In the remainder of this section, we discuss the different states and transitions used in Figure 4, together with their underlying principles.

Call- and execution triggers.

The DATE specification format allows the distinction between method calls and method executions when specifying triggers. In Figure 4, the superscript ↑ indicates that the transition should be taken upon calling the corresponding method in the program, whereas ↓ indicates a transition triggered upon starting the method execution. This distinction between call and execution events is supported by Larva in the sequential setting, but becomes particularly useful when monitoring distributed objects.

To give an example, when a client invokes a login method on a server active object, the client itself does not execute the method login. Instead, the invocation becomes a network call to the server. A login↓ trigger would therefore not be meaningful in the monitor of the client. Moreover, within the server, the login method is executed but not invoked. Therefore, a login↑ trigger is meaningless in the monitor of the server. Generally speaking, a remote call to a method m causes two events in different contexts, m↑ in the context of the caller, and m↓ in the context of the callee.

To conclude, the call and execution triggers of DATE/Larva match well the events in distributed caller-callee scenarios.
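The split of one remote call into a call event at the caller and an execution event at the callee can be made concrete with a small sketch. It assumes plain Java threads in place of separate JVMs; the class TriggerSketch and the event strings "call:..." and "exec:..." are hypothetical and merely stand in for m↑ and m↓.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration: one remote call yields a call event on the caller side
// and an execution event on the callee side. Not Larva or ProActive code.
class TriggerSketch {
    final List<String> callerEvents = new ArrayList<>(); // what a client-side monitor could observe
    final List<String> calleeEvents = new ArrayList<>(); // what a server-side monitor could observe
    private final ExecutorService serverThread = Executors.newSingleThreadExecutor();

    // Stub used by the caller: records the call event, then forwards the request.
    CompletableFuture<Boolean> login(String code) {
        callerEvents.add("call:login(" + code + ")");
        return CompletableFuture.supplyAsync(() -> executeLogin(code), serverThread);
    }

    // Method body run by the callee's own thread: records the execution event.
    private boolean executeLogin(String code) {
        calleeEvents.add("exec:login(" + code + ")");
        return !code.isEmpty();
    }

    static TriggerSketch demo() {
        TriggerSketch t = new TriggerSketch();
        t.login("alice").join(); // wait until the callee has executed the method
        t.serverThread.shutdown();
        return t;
    }
}
```

The caller never records an execution event and the callee never records a call event, which is why each monitor only needs triggers of one kind.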

Monitor variables.

Both runtime monitors in Figure 4 maintain a monitor-local variable, namely a list C or S of the codes of logged-in users. Initially these lists are empty. Some transitions update C or S via their action component by adding or removing codes. By using state in this way, runtime monitoring can be performed in scenarios where multiple clients log in multiple users. Moreover, the formal parameter code is bound to the matching actual parameter given to the methods of Bank, and the placeholder \result captures the return value of the method corresponding to the transition's trigger.

Clarifying the transitions, client monitors go to a bad state when they: (i) perform a login attempt with a code that is already logged in by the same client, or (ii) perform a logout with a code that is not logged in, or (iii) attempt to withdraw money on an account that is not logged in.

The server distinguishes between two kinds of 'end' states. The server's monitor goes to the 'my fault' state when it performs an invalid operation, i.e., upon a successful login for a user if the monitor knows from its state that the user is already logged in (i.e., its code is already in C or S). On the other hand, the monitor goes to the 'not my fault' state if it detects that a client violates its intended protocol. We use these two different end states for assume-guarantee inspired blame-shifting, see Section 3.3.

The runtime monitoring setup that we considered so far is purely distributed, as illustrated in Figure 5a. Each participating JVM (JVM_i) has its own Larva monitor (M_i), and these monitors do not communicate with each other. However, there are many more monitoring configurations proposed in the literature [14, 9], some of which rely on monitor communication. Indeed, while independent monitors are by nature extremely efficient, inter-monitor communication is sometimes necessary to establish correctness. We explain below how to enable monitor interactions in our context and what interaction patterns we envision for monitors.

[Three panels, each showing JVM_1 ... JVM_n with their monitors M_1 ... M_n: (a) Independent; (b) Communicating; (c) Orchestrated, with an additional State component.]

Figure 5: Different monitoring configurations: 5a depicts purely independent monitors, 5b shows monitoring with communication, and 5c depicts orchestrated monitoring, where some centralised component State maintains the global state.

[(a) Client monitor snippet: a transition into bad labelled − | − | r.notify(). (b) Server monitor snippet: a transition labelled notify↓ | − | −.]

Figure 6: Synchronisation of monitors using active objects. Here r is an active object with a public method notify(), that is instantiated by the server monitor and called by the client monitor (for notifying the server monitor).

Distributed monitor communication.

Recall that Larva monitors are translated to Java code (to be bound to the monitored implementation using AspectJ), and that the transitions in the automata may contain arbitrary snippets of Java. We exploit this to realise monitor communication, as shown in Figure 5b, by instantiating active objects inside the Larva monitors and having them connect to each other. Runtime monitors may call the public methods on the active objects of remote monitors and thereby influence their state. Moreover, since active objects are used as ordinary Java objects, the Larva monitors may define triggers on the public methods of their active objects. By doing this, a transition can be taken when a remote monitor invokes a public method on a local active object, thus implementing monitor synchronisation.

Figure 6 illustrates the integration of synchronisation between independent distributed Larva monitors using ProActive (this illustration might for example extend the setup from Figure 4). The server monitor may itself instantiate and host a new active object r (with a public method notify), and the client monitor may connect to this active object, as illustrated in Figure 3. Then, when the client monitor transitions to some bad state for example, it may call r.notify() in the action component of the transition (6a). The object r in the client monitor is only a proxy object generated by ProActive, so this method invocation targets the monitor of the server. Since notify is an ordinary Java method, the server's monitor may have a trigger for its execution (i.e. notify↓), so that the monitor can take a transition when the client invokes notify (6b).

Orchestrated monitoring.

By allowing monitors to communicate with each other, we can also realise elaborate monitoring configurations like orchestrated monitoring, illustrated in Figure 5c. Instead of having the monitors communicate with each other directly, they communicate with a centralised active object (called State in the figure). This scheme is the best way to implement a globally shared memory. Indeed, since active objects do not share data, this organisation lets a dedicated active object store the global state, together with getters and setters that the distributed monitors can invoke. In addition, since the centralised component is an active object, and therefore plain Java, it may have its own Larva monitor and verify invariants on the global state.

These different configurations open up a design space of distributed monitoring, as they may be combined in many ways. For example, one may consider orchestrated monitoring as in 5c, but still allow Larva monitors to communicate with each other, as in 5b. Moreover, if a single orchestrator (as shown in 5c) does not provide enough scalability, one may consider a hierarchical structure of orchestrators, each coordinating a subset of the distributed monitors.
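A centralised State component with getters and setters might look as follows. This is a sketch under assumptions: the class GlobalStateSketch, its method names and the choice of a set of logged-in codes as global state are all hypothetical, and a single-threaded executor again stands in for an active object.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical central component for orchestrated monitoring (plain Java, not ProActive).
class GlobalStateSketch {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Set<String> loggedIn = new HashSet<>(); // global state, worker thread only

    // "Setter" callable by any monitor; the result tells whether the code was new.
    CompletableFuture<Boolean> addLoggedIn(String code) {
        return CompletableFuture.supplyAsync(() -> loggedIn.add(code), worker);
    }

    // "Getter" callable by any monitor.
    CompletableFuture<Boolean> isLoggedIn(String code) {
        return CompletableFuture.supplyAsync(() -> loggedIn.contains(code), worker);
    }

    void shutdown() {
        worker.shutdown();
    }

    static boolean[] demo() {
        GlobalStateSketch state = new GlobalStateSketch();
        boolean first = state.addLoggedIn("alice").join();  // one monitor reports a login
        boolean second = state.addLoggedIn("alice").join(); // another monitor reports the same code
        boolean known = state.isLoggedIn("alice").join();
        state.shutdown();
        return new boolean[] { first, second, known };
    }
}
```

Since every request passes through one worker, the monitors need no locking of their own; the price is that every check involves a round trip to the central component.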

In some scenarios, independent distributed monitoring is insufficient to establish global correctness, while monitor communication is too heavyweight performance-wise. The distributed hash table case study in Section 4 falls into this category, since communication-based monitoring would break its scalability, while independent monitoring alone is not enough to establish correctness globally.

For these scenarios we propose a notion of blame-shifting that is inspired by assume-guarantee (AG) reasoning [12]. The idea is that, instead of having runtime monitors rely on network communication, they could instead rely on assumptions from the environment, while giving certain commitments to the environment in return. This system of assumptions and commitments must be consistent: a monitor may only assume properties from the environment that are guaranteed/committed by the monitors of the environment. (Otherwise, the insights from the runtime verification would be limited.)

This implementation of AG reasoning constitutes a 'my fault/not my fault' blame system. A 'not my fault' occurs when a monitor observes interaction with remote objects that violates its assumptions on the environment. A 'my fault' occurs when a monitor either observes interaction with a remote object that violates its commitment to the environment, or if it violates a local sub-property (possibly leading to interaction that violates its commitment to the environment). Moreover, if the assumptions and commitments of interacting distributed objects are in sync, then a 'not my fault' in some objects will be mirrored by a 'my fault' in one or several other objects. In conclusion, AG style monitoring can be used to detect and locate distributed runtime violations, without having to resort to centralised monitoring or other configurations that require communication.
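The mirroring of a 'not my fault' verdict by a 'my fault' verdict can be sketched with hand-written counterparts of the two monitors in Figure 4. The code below is an illustration only: the class and method names are hypothetical, and real Larva monitors are generated from DATE specifications rather than written this way.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical hand-written counterparts of the two monitors in Figure 4.
class BlameSketch {
    enum Verdict { OK, MY_FAULT, NOT_MY_FAULT }

    // Client side: reacts to the calls the client issues (call events).
    static class ClientMonitor {
        private final Set<String> c = new HashSet<>(); // codes this client has logged in
        Verdict verdict = Verdict.OK;

        void loginCalled(String code, boolean result) {
            if (verdict != Verdict.OK) return;                 // end states are sinks
            if (c.contains(code)) verdict = Verdict.MY_FAULT;  // repeated login
            else if (result) c.add(code);
        }

        void logoutCalled(String code) {
            if (verdict != Verdict.OK) return;
            if (!c.remove(code)) verdict = Verdict.MY_FAULT;   // logout without login
        }

        void withdrawCalled(String code) {
            if (verdict != Verdict.OK) return;
            if (!c.contains(code)) verdict = Verdict.MY_FAULT; // withdraw without login
        }
    }

    // Server side: reacts to the methods the server executes (execution events).
    static class ServerMonitor {
        private final Set<String> s = new HashSet<>(); // codes the server considers logged in
        Verdict verdict = Verdict.OK;

        void loginExecuted(String code, boolean result) {
            if (verdict != Verdict.OK || !result) return;
            if (!s.add(code)) verdict = Verdict.MY_FAULT;          // accepted a second login
        }

        void logoutExecuted(String code) {
            if (verdict != Verdict.OK) return;
            if (!s.remove(code)) verdict = Verdict.NOT_MY_FAULT;   // client broke the protocol
        }

        void withdrawExecuted(String code) {
            if (verdict != Verdict.OK) return;
            if (!s.contains(code)) verdict = Verdict.NOT_MY_FAULT; // client broke the protocol
        }
    }

    // A client withdraws without logging in; the two monitors never exchange messages.
    static Verdict[] demo() {
        ClientMonitor client = new ClientMonitor();
        ServerMonitor server = new ServerMonitor();
        client.withdrawCalled("alice");   // the call, as observed at the client
        server.withdrawExecuted("alice"); // the same request, as observed at the server
        return new Verdict[] { client.verdict, server.verdict };
    }
}
```

In this run the client monitor ends in MY_FAULT and the server monitor in NOT_MY_FAULT, although neither monitor has seen the other's state.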

Blame-shifting by example.

An example of blame-shifting was already presented in Figure 4. Here, the server monitor distinguishes between violations from the environment and local violations via two different end states, named 'not my fault' and 'my fault'. The client monitor does not have a 'not my fault' state in this example, as the intended protocol only really restricts the client (e.g. the client must not withdraw before logging in). Also notice that the assumptions of the server are consistent with the commitments of the client; if the server's monitor goes to the 'not my fault' state, it means that a client performed a logout or a withdrawal without being logged in, and therefore goes to its 'my fault' state.

Although the two runtime monitors described in Figure 4 do not communicate, the server can still determine whether clients violated their protocol. This concept of blame-shifting is demonstrated further in the case study (Section 4), in particular in a system consisting of more than two objects.

Global consistency.

At the moment we do not have a mechanised way of checking whether the assumptions made by Larva monitors are consistent with the commitments of the monitors of the environment. This currently has to be established manually. We are investigating ways to make this check mechanical, as a static verification task. This would result in a combination of static global consistency checking and local runtime verification.

Distributed systems typically consist of nodes that perform local computations and distribute the computed results over the network, via protocols. Runtime verification of distributed systems should therefore cover both the computation and the distribution of data. Even though Larva allows DATE specifications to use data to a limited extent, it is mostly focused on control-oriented properties.

In order to extend our approach to the monitoring of data-oriented properties, we define ppDATE, an extension of DATE with Hoare triples (pp stands for pre/post-condition), as supported by the StaRVOOrS [2, 8] tool (STAtic and Runtime Verification of Object-ORiented Software). Hoare triples are of the standard form {P} m {Q}, with P and Q assertions expressed in first-order logic and m the header of a Java method. In ppDATE, Hoare triples are state dependent: each state in the runtime monitor automaton contains its own set of Hoare triples that have to hold in that state. StaRVOOrS translates ppDATE to DATE and uses Larva thereafter. Section 4 illustrates how Hoare triples can make the specification and monitoring of distributed applications easier.
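To make the notion of state-dependent Hoare triples more concrete, the following Java sketch keeps one set of triples per automaton state and checks only the triples attached to the current state. It is an illustration of the idea only, not the code that StaRVOOrS or Larva generate; the names `HoareTriple`, `StateDependentChecker`, the state names and the `withdraw` example are all assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Predicate;

/** A Hoare triple {P} m {Q}: P ranges over the argument, Q over argument and result. */
final class HoareTriple<A, R> {
    final String method;
    final Predicate<A> pre;
    final BiPredicate<A, R> post;

    HoareTriple(String method, Predicate<A> pre, BiPredicate<A, R> post) {
        this.method = method;
        this.pre = pre;
        this.post = post;
    }
}

/** Keeps one set of triples per automaton state and checks only those of the current state. */
final class StateDependentChecker<A, R> {
    private final Map<String, List<HoareTriple<A, R>>> triplesByState = new HashMap<>();
    private String current;

    StateDependentChecker(String initialState) { this.current = initialState; }

    void attach(String state, HoareTriple<A, R> triple) {
        triplesByState.computeIfAbsent(state, s -> new ArrayList<>()).add(triple);
    }

    void moveTo(String state) { this.current = state; }

    /**
     * Called after method m returned. The argument is assumed to be unchanged by the call,
     * so P can be evaluated here; a real monitor would evaluate P at method entry.
     */
    boolean holds(String m, A arg, R result) {
        List<HoareTriple<A, R>> active =
                triplesByState.getOrDefault(current, Collections.<HoareTriple<A, R>>emptyList());
        for (HoareTriple<A, R> t : active) {
            if (t.method.equals(m) && t.pre.test(arg) && !t.post.test(arg, result)) {
                return false;
            }
        }
        return true;
    }
}

final class PpDateDemo {
    public static void main(String[] args) {
        StateDependentChecker<Integer, Integer> checker = new StateDependentChecker<>("loggedOut");
        // While logged out: {true} withdraw {\result == 0}
        checker.attach("loggedOut",
                new HoareTriple<Integer, Integer>("withdraw", a -> true, (a, r) -> r == 0));
        // While logged in: {amount > 0} withdraw {0 <= \result <= amount}
        checker.attach("loggedIn",
                new HoareTriple<Integer, Integer>("withdraw", a -> a > 0, (a, r) -> r >= 0 && r <= a));

        System.out.println(checker.holds("withdraw", 50, 50));
        checker.moveTo("loggedIn");
        System.out.println(checker.holds("withdraw", 50, 50));
    }
}
```

The same call (`withdraw` with argument 50 returning 50) violates the triple of the `loggedOut` state but satisfies the triple of the `loggedIn` state, which is the point of attaching triples to states rather than to methods alone.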

This section demonstrates our verification approach on a case study: verifying the correctness of a CAN [21] (a Content-Addressable Network) implemented with active objects in the ProActive platform [20]. In the sequel we refer to this case study as ActiveCAN. The main motivations for considering the ActiveCAN case study are: (i) the case study is external, rather than a toy example that we constructed ourselves (an evolved version has also been used in a large-scale application, see http://play.ow2.org); (ii) the property to be runtime verified is a combination of both data- and control-oriented properties; and (iii) by nature the example consists of many identical peers, which highlights the meaning of the distinction ‘my fault’ vs. ‘not my fault’, where ‘not my fault’ means either another peer or the rest of the system.

(StaRVOOrS also supports monitor optimisation through (partial) static verification results, a feature which we do not yet use in the distributed setting.)

Application description.

A CAN is a distributed infrastructure that provides hash table-like functionality. The basic operations of a CAN are: insert(k, v) for inserting a value v at key k, and lookup(k) for looking up the possible value stored at key k. The originality is that keys are tuples. CANs consist of many peers that are connected via a network, and each peer owns a fragment of the entire key space, i.e. a hyperrectangle. When a new peer joins the network, the key space in the CAN is split, allowing the joined peer to receive ownership of the new fragment. Moreover, when performing an insert or lookup on a peer, the operation may either be handled locally if the key is owned by the local peer, or otherwise be relayed to a remote peer. For relaying operations to remote peers, the CAN infrastructure contains a routing protocol that is scalable.
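The key-space ownership described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the ActiveCAN code: the class name `Zone`, the use of `double` coordinates, the half-open intervals, and the choice to split at the midpoint of one dimension are all hypothetical.

```java
/** Axis-aligned hyperrectangle [lo[i], hi[i]) per dimension; a hypothetical zone representation. */
final class Zone {
    private final double[] lo;
    private final double[] hi;

    Zone(double[] lo, double[] hi) {
        if (lo.length != hi.length) throw new IllegalArgumentException("dimension mismatch");
        this.lo = lo.clone();
        this.hi = hi.clone();
    }

    /** True iff the key tuple lies inside this zone, i.e. the owning peer handles it locally. */
    boolean contains(double[] key) {
        if (key.length != lo.length) throw new IllegalArgumentException("dimension mismatch");
        for (int i = 0; i < lo.length; i++) {
            if (key[i] < lo[i] || key[i] >= hi[i]) return false;
        }
        return true;
    }

    /** Splits this zone at the midpoint of dimension dim; returns {lower part, upper part}. */
    Zone[] split(int dim) {
        double mid = (lo[dim] + hi[dim]) / 2.0;
        double[] lowerHi = hi.clone();
        lowerHi[dim] = mid;
        double[] upperLo = lo.clone();
        upperLo[dim] = mid;
        return new Zone[] { new Zone(lo, lowerHi), new Zone(upperLo, hi) };
    }
}

final class ZoneDemo {
    public static void main(String[] args) {
        Zone whole = new Zone(new double[] {0.0, 0.0}, new double[] {1.0, 1.0});
        Zone[] parts = whole.split(0);          // a joining peer would receive one of the parts
        double[] key = {0.75, 0.25};
        System.out.println(parts[0].contains(key));   // not owned by the lower part: relay
        System.out.println(parts[1].contains(key));   // owned by the upper part: handle locally
    }
}
```

A peer would evaluate `contains` on its own zone to decide between local handling and relaying; the half-open intervals ensure that, after a split, every key belongs to exactly one of the two parts.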

The ActiveCAN consists of a set of active objects, each responsible for the storage of data indexed in one hyperrectangle. Each peer knows the zone it manages, but also the zones of the neighbouring peers, in order to route messages to the right target. The ActiveCAN peers provide three operations:

• join(p), to add one new peer p to the network. The joined peer will split its hyperrectangle into two parts and delegate one to the new peer.
• insert(k, v), to insert a new value v at a given key k, where k is a tuple.
• lookup(k), to fetch the value previously stored at a given tuple k.

Communication between neighbouring peers is performed by method calls on active objects, and each request is transmitted this way to the adequate target. Finally, lookup relies on futures to return the fetched value.

Verified properties.

Essentially, the following high-level properties are addressed:

1. The behaviour of the hash table is consistent, meaning that: (i) after calling insert(k, v) the data element v is stored somewhere in the network at key k, (ii) if lookup(k) returns a positive result, then k is mapped somewhere in the network, (iii) if lookup(k) returns a negative result, then k is not mapped in the network.
2. There are no cycles in the routing protocol. In particular, we verify that: (i) any lookup and insert operation is either handled locally by a peer, or the peer has at least one neighbour to defer the operation to, and (ii) if a lookup or insert needs to be remotely resolved, the operation is deferred to a remote peer that is closer to the target peer (hence there are no cycles in the routing protocol).

Property (1) above is data-oriented, whereas (2) is control-oriented, and both are necessary to establish correctness of the overall infrastructure. The case study thereby highlights the necessity for verification techniques to include data and control aspects, and motivates the usage of the StaRVOOrS front-end to Larva. The distributed setting is particularly interesting here because peers typically perform some computation over data and distribute requests and results via some routing protocol. Larva generates monitors checking these properties, reporting when any of them is violated while using the hash table.

To verify property (1), for each peer, the runtime monitor maintains a list K of keys to be mapped in the key/value store. For each key, it also maintains a boolean flag that indicates whether the key is stored locally or remotely. Therefore, K contains all keys stored locally by the peer, but also records the keys of remote lookup and insert operations that were routed through the peer.

The runtime monitors use blame-shifting in the style of AG reasoning to determine whether a lookup or insert is handled correctly. In more detail, when a locally resolved lookup behaves incorrectly, the runtime monitor of the local peer is able to detect this: it is classified as ‘my fault’. When a remotely resolved lookup behaves incorrectly (e.g. stating that the key is unmapped while the runtime monitor knows from K that the key has already been stored), then the local peer can determine that another peer has violated the property: it is classified as ‘not my fault’. Since the remotely resolved lookup has been deferred to a neighbouring peer, that peer is closer to figuring out which peer actually violated the property. In particular, due to the consistency of the AG-style assumptions and commitments, the neighbouring peer’s monitor must also be in a ‘not my fault’ or ‘my fault’ state. This constitutes a chain of blame, ending in the misbehaving peer (i.e. the peer that is in the ‘my fault’ state).

When a lookup or insert operation needs to be deferred to a remote node, the implementation calls the helper method getZoneClosestTo(k), which returns the neighbouring zone that is closest to the destination peer. The property of loop freedom in the routing protocol is captured by the following Hoare triple.

$$\{\,\mathit{true}\,\}\ \ \mathtt{getZoneClosestTo}(k)\ \ \{\,\backslash\mathit{result} \neq \mathit{null} \ \wedge\ \backslash\mathit{result}.\mathit{distance}(k) < \mathit{localzone}.\mathit{distance}(k)\,\}$$

Every zone z has a distance function z.distance(k) that calculates the Euclidean distance between the center of z and the specified key k. The variable localzone refers to the zone of the callee peer. This Hoare triple is checked every time the getZoneClosestTo method is called in the implementation. This method returns an object of type Zone, namely the zone of the neighbour closest to the destination peer.
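As an illustration of what checking this triple amounts to at runtime, the Java sketch below evaluates the postcondition on the result of a routing step. It is not the monitor code produced by StaRVOOrS and Larva; `CenteredZone` is a hypothetical stand-in for the Zone type (reduced to a center point), and the shown `getZoneClosestTo` is an assumed implementation under test that takes the neighbour list as an explicit parameter.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical zone reduced to its center point; the real Zone type is not shown in this section. */
final class CenteredZone {
    private final double[] center;

    CenteredZone(double... center) { this.center = center.clone(); }

    /** Euclidean distance between the center of this zone and key k. */
    double distance(double[] k) {
        double sum = 0.0;
        for (int i = 0; i < center.length; i++) {
            double d = center[i] - k[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}

final class RoutingCheck {
    /** Assumed implementation under test: the neighbour zone nearest to k, or null if there is none. */
    static CenteredZone getZoneClosestTo(List<CenteredZone> neighbours, double[] k) {
        CenteredZone best = null;
        for (CenteredZone z : neighbours) {
            if (best == null || z.distance(k) < best.distance(k)) best = z;
        }
        return best;
    }

    /** Postcondition of the triple: result is non-null and strictly closer to k than the local zone. */
    static boolean postconditionHolds(CenteredZone result, CenteredZone localzone, double[] k) {
        return result != null && result.distance(k) < localzone.distance(k);
    }

    public static void main(String[] args) {
        CenteredZone local = new CenteredZone(0.0, 0.0);
        double[] key = {3.0, 0.0};
        List<CenteredZone> neighbours =
                Arrays.asList(new CenteredZone(1.0, 0.0), new CenteredZone(0.0, 1.0));
        CenteredZone next = getZoneClosestTo(neighbours, key);
        System.out.println(postconditionHolds(next, local, key));
    }
}
```

Because the postcondition requires a strict decrease in distance to the key at every deferral, a violation is detected locally at the peer whose routing step fails to make progress, without any global view of the route.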

Availability.

The source code of the ActiveCAN case study, together with verification instructions, is available online at https://github.com/utwente-fmt/RV2018-ActiveCAN. Our experiments did not reveal any runtime violation of properties (1) and (2).

Even if the area of runtime verification is much more elaborate in the stand-alone setting (this includes the analysis of single nodes, only, running in a distributed environment), runtime verification of distributed systems is a very active area of research. The recent article [16] provides an excellent overview of scenarios and characteristics of this sub-field, and discusses the existing approaches against that background. Here, we only include works with sufficient overlap in aim, method, or application area. Generally, our discussion of configurations in Section 3.2 is consistent with the monitor organisation categories in [16]. However, compared to the analysis there, the Larva methodology for specification by nature splits the monitoring into properties specified and verified for each object, which naturally extends to a truly distributed monitoring approach. A recent work uses distributed monitors to control the executions of Erlang actors [15]: the behaviour of the program is expressed as a choreography, and the monitors control the execution so that, in case of failure, the application can roll back to a safe state and then run along a different path. Compared to these choreography-based approaches, we provide the monitoring of a reactive system that does not have a service choreography specification. For example, we are able to monitor a peer-to-peer system that has a more concurrent behaviour than choreographies. In such a system it makes more sense to specify the correctness of each entity based on assumptions on the environment than from a global perspective, and this naturally provides distributed monitors. It is, however, more difficult to state properties of the global system in our approach than from a choreographed perspective. To the best of our knowledge, no other runtime verification approach is able to verify at runtime a reactive system made of distributed objects.

The Elarva runtime verifier [10] is an adaptation of Larva for monitoring concurrent Erlang programs. In particular, Elarva adapts Larva by translating the object-oriented monitoring constructs to a process-oriented setting, and gives an asynchronous interpretation to Larva’s monitoring semantics. However, Elarva only supports centralised monitoring and is unable to monitor across multiple machines. Our implementation allows both centralised and decentralised monitoring, and allows distributed monitors to communicate and coordinate using active objects, as described in Section 3. Another notable difference is that we did not change Larva’s synchronous monitoring semantics, and instead leave it to ProActive to ‘hide’ the asynchronous nature of the underlying communication (not only from the programmer but also from Larva).

There are several tools for static verification of active objects, relying on model-checking [4, 22], static analysis [3], behavioural types [17], or deductive techniques [13], but the support for monitoring and runtime verification of active object systems is quite weak. Basic monitoring tools exist for actors and active object systems. For example, Akka [23] provides an interesting hierarchy of actors for monitoring failures, both ABS and ProActive feature a tool for viewing or debugging active object execution (see e.g. Section 3.3 of [18]), and a monitoring framework has been developed for Erlang [7]. In existing platforms, runtime verification of functional correctness for active object applications must be done by hand. We believe that we can fit our approach to the infrastructures proposed in actor monitoring systems by generating monitors specific to an actor monitoring framework, like [7]; however, our current approach has the crucial advantage of minimising the changes to the monitor generation of Larva.

The choice of locations of the monitors is quite an important issue because communication across locations is usually expensive and information-sensitive. A good discussion about this choice is presented in [14], where a theoretical framework is presented for comparing those choices. Such a discussion is complementary to the solution presented in this system paper, which shows how different monitoring architectures can be naturally realised for a distributed variant of Java.

We are not aware of other work which adapts the assumption-guarantee paradigm, known from compositional static verification, to the runtime verification setting.

This article shows the importance of two key challenges in monitoring of distributed applications: distributed monitoring by independent active monitors, and verification of properties mixing data-oriented and control-oriented aspects.

From a technical point of view, this article provides two contributions addressing these two challenges. First, it presents an effective distributed monitoring mechanism. This is realised thanks to active monitors that are generated by our framework. Our runtime verification environment combines monitor generation of the Larva framework with an active object middleware, ProActive. This way, the standard Java code generated by Larva generates distributed active monitors at runtime to detect the violation of safety properties in a distributed monitoring fashion. The verification setup integrates assumption-guarantee-style blame-shifting to efficiently localise runtime failures while limiting communication between monitors. Second, the Hoare triple extension to Larva provides the programmer with a better abstraction for specifying properties mixing data-oriented and control-oriented aspects. This is particularly relevant in connection with the active object paradigm, where a task is the execution of a method.

The approach has been illustrated by monitoring a distributed peer-to-peer system implementing a key-value store. This example and the properties we were able to monitor illustrate well the contributions of this work: it is by nature distributed, and needs a distributed monitoring infrastructure; the application mixes control-oriented and data-oriented properties both in the routing of messages and in the storage/retrieval of data; and the blame-shifting allows the cause of an error to be traced along the routing path (even if this error tracing is currently done on the meta-level and not yet automated).

Our approach is by nature distributed. In particular, we do not favour verification of global properties and prefer to focus on a local rely-guarantee approach. The counterpart is that our framework is not particularly adapted to reason at a global level, and in particular an object is unable to distinguish if an [...]

[...] StaRVOOrS approach [2] to the distributed setting, to combine static verification with our distributed runtime verification method. Just like in StaRVOOrS, this has great potential to decrease the runtime overhead and increase the scalability.

Acknowledgements.

The authors would like to thank Gordon Pace and Gerardo Schneider for fruitful discussions in the course of this work, and Mauricio Chimento for implementing some adaptations in the StaRVOOrS tool. This work is partially supported by the NWO TOP 612.001.403 project VerDi, and by the COST Action IC1402 Runtime Verification beyond Monitoring.

References

[1] ProActive Middleware. Available at https://github.com/scale-proactive.
[2] Wolfgang Ahrendt, Jesús Mauricio Chimento, Gordon J. Pace & Gerardo Schneider (2017): Verifying data- and control-oriented properties combining static and runtime verification: theory and tools. Formal Methods in System Design [...]
[3] [...] SACO: Static Analyzer for Concurrent Objects. In Erika Ábrahám & Klaus Havelund, editors: Proc. 20th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), LNCS [...]
[4] [...] Behavioural semantics for asynchronous components. Journal of Logical and Algebraic Methods in Programming 89, pp. 1–40, doi:10.1016/j.jlamp.2017.02.003. Available at [...].
[5] Frank De Boer, Vlad Serbanescu, Reiner Hähnle, Ludovic Henrio, Justine Rochas, Crystal Chang Din, Einar Broch Johnsen, Marjan Sirjani, Ehsan Khamespanah, Kiko Fernandez-Reyes & Albert Mingkun Yang (2017): A Survey of Active Object Languages. ACM Computing Surveys [...]
[6] [...] A Theory of Distributed Objects. Springer, doi:10.1007/b138812.
[7] Ian Cassar & Adrian Francalanza (2016): On Implementing a Monitor-Oriented Programming Framework for Actor Systems. In Erika Ábrahám & Marieke Huisman, editors: Integrated Formal Methods, Springer, pp. 176–192, doi:10.1007/978-3-319-33693-0_12.
[8] Jesús Mauricio Chimento, Wolfgang Ahrendt, Gordon J. Pace & Gerardo Schneider (2015): StaRVOOrS: A Tool for Combined Static and Runtime Verification of Java. In Ezio Bartocci & Rupak Majumdar, editors: Runtime Verification, Lecture Notes in Computer Science [...]
[9] [...] Organising LTL monitors over distributed systems with a global clock. Formal Methods in System Design [...]
[10] Christian Colombo, Adrian Francalanza & Rudolph Gatt (2012): Elarva: A Monitoring Tool for Erlang. In Sarfraz Khurshid & Koushik Sen, editors: Runtime Verification, Springer, pp. 370–374, doi:10.1007/BFb0053381.
[11] Christian Colombo, Gordon J. Pace & Gerardo Schneider (2009): LARVA - Safer Monitoring of Real-Time Java Programs (Tool Paper). In: Seventh IEEE International Conference on Software Engineering and Formal Methods (SEFM), IEEE Computer Society, pp. 33–37, doi:10.1109/SEFM.2009.13.
[12] W.P. de Roever, U. Hanneman, J. Hooiman, Y. Lakhneche, Mannes Poel, Jakob Zwiers & F. de Boer (2001): Concurrency Verification: Introduction to Compositional and Noncompositional Methods. Cambridge Tracts in Theoretical Computer Science, Cambridge University Press. Imported from HMI.
[13] Crystal Chang Din, S. Lizeth Tapia Tarifa, Reiner Hähnle & Einar Broch Johnsen (2015): History-Based Specification and Verification of Scalable Concurrent and Distributed Systems. In Michael Butler, Sylvain Conchon & Fatiha Zaïdi, editors: International Conference on Formal Engineering Methods (ICFEM), LNCS [...]
[14] [...] Distributed System Contract Monitoring. The Journal of Logic and Algebraic Programming [...]. Formal Languages and Analysis of Contract-Oriented Software (FLACOS’11).
[15] Adrian Francalanza, Claudio Antares Mezzina & Emilio Tuosto (2018): Reversible Choreographies via Monitoring in Erlang. In Silvia Bonomi & Etienne Rivière, editors: Distributed Applications and Interoperable Systems, Springer, pp. 75–92, doi:10.1016/j.jlamp.2017.11.002.
[16] Adrian Francalanza, Jorge A. Pérez & César Sánchez (2018): Runtime Verification for Decentralised and Distributed Systems, pp. 176–210. Springer, doi:10.1007/978-3-319-75632-5_6.
[17] Ludovic Henrio, Cosimo Laneve & Vincenzo Mastandrea (2017): Analysis of Synchronisations in Stateful Active Objects, pp. 195–210. Springer, doi:10.1007/978-3-319-66845-1_13.
[18] Ludovic Henrio & Justine Rochas (2017): Multiactive objects and their applications. Logical Methods in Computer Science Volume 13, Issue 4, doi:10.23638/LMCS-13(4:12)2017. Available at http://lmcs.episciences.org/4079.
[19] Carl Hewitt, Peter Bishop & Richard Steiger (1973): A Universal Modular ACTOR Formalism for Artificial Intelligence. In Nils J. Nilsson, editor: Proceedings of the 3rd International Joint Conference on Artificial Intelligence, IJCAI’73, W. Kaufmann, pp. 235–245.
[20] Laurent Pellegrino, Fabrice Huet, Françoise Baude & Amjad Alshabani (2013): A Distributed Publish/Subscribe System for RDF Data. In Abdelkader Hameurlain, Wenny Rahayu & David Taniar, editors: Data Management in Cloud, Grid and P2P Systems, Springer, pp. 39–50, doi:10.1145/964723.383071.
[21] S. Ratnasamy, P. Francis, M. Handley, R. Karp & S. Shenker (2001): A Scalable Content-Addressable Network. In: SIGCOMM, ACM, pp. 161–172, doi:10.1145/383059.383072.
[22] Marjan Sirjani, Ali Movaghar, Amin Shali & Frank S. de Boer (2004): Modeling and Verification of Reactive Systems using Rebeca. Fundamenta Informaticae