Adaptable and Verifiable BDI Reasoning
Peter Stringer, Rafael C. Cardoso, Xiaowei Huang, Louise A. Dennis
University of Liverpool, Liverpool L69 3BX, United Kingdom
{peter.stringer, rafael.cardoso, xiaowei.huang, l.a.dennis}@liverpool.ac.uk

Long-term autonomy requires autonomous systems to adapt as their capabilities no longer perform as expected. To achieve this, a system must first be capable of detecting such changes. In this position paper, we describe a system architecture for Belief-Desire-Intention (BDI) autonomous agents capable of adapting to changes in a dynamic environment and outline the required research. Specifically, we describe an agent-maintained self-model with accompanying theories of durative actions and learning new action descriptions in BDI systems.

* This work has been supported by the University of Liverpool's School of EEE & CS in support of the EPSRC "Robotics and AI for Nuclear" (EP/R026084/1) and "Future AI and Robotics for Space" (EP/R026092/1) Hubs.
Long-term autonomy requires autonomous systems to adapt as their capabilities no longer perform as expected. To achieve this, a system must first be capable of detecting such changes. Creating and maintaining a system ontology is a comprehensive solution for this; an agent-maintained formal self-model will take the role of this system ontology. It would act as a repository of information about all the processes and functionality of the autonomous system, forming a systematic approach for detecting action failures.

Our work will focus on Belief-Desire-Intention (BDI) [25] programming languages as they are well known for their use in developing intelligent agents [1, 6, 16, 21]. Agents that are capable of controlling an array of cyber-physical autonomous systems such as autonomous vehicles, spacecraft and robot arms have been programmed using BDI agents (e.g., Mars Rover [16], Earth-orbiting satellites [6] and robotic arms for nuclear waste-processing [1]). Coupled with their use of plans and actions, BDI languages offer an appropriate platform to build upon for the development of an adaptable autonomous system.

The agent-maintained self-model includes action descriptions, consisting of pre- and post-conditions of all known actions/capabilities. An action's pre-conditions are the environment conditions that must exist for an action to be executed, whilst post-conditions are defined as the expected changes in the environment made directly by a completed action. These action descriptions are based on the Planning Domain Definition Language (PDDL) [22], commonly used in classical automated planning. The complete availability of current system information will provide the ability to monitor the status of actions, presenting the opportunity to detect failure. We use action life-cycles based on a theory of durative actions for BDI systems [10] to detect persistent abnormal behaviour from action executions that could denote hardware degradation or other long-term causes of failure such as exposure to radiation or extreme temperature. Once a failure has been detected, we can use machine learning methods to update the action description in the self-model. Then, we can repair or replace the actions in any existing plans by using an automated planner to patch these plans. The resulting plans can then be verified to ensure the system's safety properties are intact.
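As a minimal illustration of the kind of action description the self-model would store, the following sketch renders a PDDL-style pre-/post-condition pair as a plain data structure. The Java rendering and all names are ours, chosen purely for illustration; the self-model is not committed to this representation.

```java
import java.util.List;

// A PDDL-inspired action description as it might be held in the self-model:
// the action is applicable when its pre-conditions hold in the environment,
// and a completed execution is expected to bring about its post-conditions.
public record ActionDescription(
        String name,                  // e.g. "move"
        List<String> preConditions,   // e.g. "at(rover, wpA)", "route_clear(wpA, wpB)"
        List<String> postConditions   // e.g. "at(rover, wpB)"
) {}
```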
This is a position paper that outlines a program of research. The overarching aim of this research is to create a framework for the verification of autonomous systems that are capable of learning new behaviour descriptions and integrating them into existing BDI plans, using the framework as a route to certification. In this paper, we discuss the current ability of BDI systems in adaptable reasoning, largely focusing on actions. We also consider research in Artificial Intelligence (AI) planning on modelling actions and the methods and implications of introducing machine learning for replacing action descriptions. Our main contribution is the initial design of a system architecture for BDI autonomous agents capable of adapting to changes in a dynamic environment, consolidating the agent-maintained self-model with the theory of durative actions and learning new action descriptions into a cohesive and adaptable BDI system. It should be noted that our work relies upon assumptions that are discussed further in the relevant sections of this paper.
Intelligent agent systems conforming to the BDI software model largely follow the principles proposed in Bratman's Intention, Plans, and Practical Reason [4], which was originally intended for modelling practical reasoning in human psychology. The use of BDI agents is particularly suitable for high-level management and control tasks in complex dynamic environments [26], which justifies their implementation in many practical applications. Since Georgeff and Lansky's Procedural Reasoning System (PRS) emerged in 1987 [14], a wide range of BDI programming languages have been developed [21], each building upon the reach of PRS with a multitude of extensions for different applications.
Typically, BDI languages use plans provided by a programmer at compile time, and the language selects an appropriate plan to react to a given situation. Some BDI languages model interaction with the external environment either as an action (e.g., Jason [3]) or as a capability (e.g., GOAL [19]). We view capabilities as actions with explicit pre- and post-conditions.

Actions and capabilities can appear in the bodies of plans. The body of a plan is generally a sequence of actions/capabilities, belief updates and subgoal manipulations (e.g., adopting or dropping goals). Plans are selected by means-end reasoning in order to achieve the agent's goals. Plans may have additional components as well as the plan body. For instance, they generally have a guard which must hold before the plan can be applied. Once a plan is selected for execution it is transformed into an intention, which represents a sequence of steps to be performed as part of executing the plan.

We intend to extend the GWENDOLEN agent programming language [7] for our research. We have chosen to use GWENDOLEN as it is a BDI agent programming language capable of producing verifiable agents. It is integrated into the MCAPL (Model-Checking Agent Programming Languages) framework [8]. Using MCAPL, agents can be programmed in GWENDOLEN and then verified using the AJPF (Agent Java Pathfinder) model-checker [11]. Actions in GWENDOLEN are generally implemented in a Java-based environment; at runtime they are requested and executed by agents. Whilst actions in GWENDOLEN can exhibit characteristics of duration, this is implemented using a 'wait for' construct which temporarily suspends an intention when encountered. When the predicate that is being waited for is believed, the intention becomes unsuspended. This is the extent to which actions in GWENDOLEN are treated as having significant durations, and it is largely typical of the treatment of actions in BDI languages.
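To make the plan structure described above concrete, here is a schematic rendering of a plan with a triggering goal, a guard over the agent's beliefs, and a body of steps. This is deliberately not GWENDOLEN syntax; all names are illustrative, and beliefs are simplified to a set of ground atoms.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// A schematic BDI plan: applicable when its guard holds over the current
// beliefs; its body interleaves actions, belief updates and subgoals.
public record Plan(
        String triggerGoal,            // e.g. "!deliver(sample, wpC)"
        Predicate<Set<String>> guard,  // e.g. beliefs -> beliefs.contains("at(rover, wpA)")
        List<String> body              // e.g. "move(wpA, wpB)", "+holding(sample)", "!analyse(sample)"
) {
    // Means-end reasoning would select, from all plans matching a goal,
    // one whose guard currently holds; that plan then becomes an intention.
    public boolean applicable(Set<String> beliefs) {
        return guard.test(beliefs);
    }
}
```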
BDI languages are increasingly being used for developing agents for physical systems where actions could take considerable time to complete [10]. Currently, most BDI languages suspend an agent entirely until an action completes, or implement actions in such a way that an agent may start a process but then must be programmed to explicitly track the progress of the action in some way.

Introducing an explicit notion of duration to actions will allow us to create principled mechanisms to let an agent continue operating once an action is started, meaning the agent is available to monitor the status of actions in progress. Harland et al. [16] introduced an abstract theory of goal life-cycles, whereby every goal pursued by the agent moves through a series of states: Pending to Active; Active to either Suspended or Aborted or a Successful end state; and so on. Dennis and Fisher [10] extended the formal semantics provided by Harland et al. to show how the behaviour of durative actions could integrate into these life-cycles. They advocate associating actions not only with pre- and post-conditions containing durations but also with explicit success, failure and abort conditions (an abort is used if the action is ongoing but needs to be stopped), and suggest goals be suspended while an action is executing, with the action's behaviour then monitored for the occurrence of its success, failure or abort conditions. When one of these occurs, the goal moves to the Active or Pending (where re-planning may be required) part of its life-cycle as appropriate. Adding these additional states to actions should not increase the cost of model checking, as they do not introduce new branches into the search space; they only add information to each state, which makes no significant difference.

Brahms [28], a multi-agent modelling environment, is an example of an agent approach that implements durative actions. The Brahms equivalent of actions, activities, have duration. Brahms has a formal semantics provided by Stocker et al. [29], although these semantics are primarily concerned with the effect of activity duration on simulation, with no mechanism for monitoring the behaviour of an activity during its execution. Whilst the concept of durative actions has been adequately explored in these examples, there has not been a formal implementation that focuses on monitoring individual actions for failure.
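A minimal sketch of how such monitoring hooks might look in code, assuming beliefs are represented as a simple set of ground atoms. All class and method names here are our own illustrations, not the semantics of [10].

```java
import java.util.Set;
import java.util.function.Predicate;

// While the owning goal is suspended, the agent polls a running durative
// action for its explicit success, failure and abort conditions.
public final class DurativeActionMonitor {
    public enum Outcome { RUNNING, SUCCEEDED, FAILED, ABORTED }

    private final Predicate<Set<String>> success; // e.g. "at(rover, wpB)" believed
    private final Predicate<Set<String>> failure; // e.g. deadline or energy budget exceeded
    private final Predicate<Set<String>> abort;   // e.g. stop requested while still ongoing

    public DurativeActionMonitor(Predicate<Set<String>> success,
                                 Predicate<Set<String>> failure,
                                 Predicate<Set<String>> abort) {
        this.success = success;
        this.failure = failure;
        this.abort = abort;
    }

    // Called once per reasoning cycle; on a non-RUNNING outcome the goal
    // returns to Active or Pending (where re-planning may be required).
    public Outcome poll(Set<String> beliefs) {
        if (abort.test(beliefs))   return Outcome.ABORTED;
        if (failure.test(beliefs)) return Outcome.FAILED;
        if (success.test(beliefs)) return Outcome.SUCCEEDED;
        return Outcome.RUNNING;
    }
}
```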
The idea of monitoring an action's life-cycle exists in the current literature [10, 16, 17]. A range of states can be attributed to an action, which can subsequently be traced for irregularities or consistent errors, providing a basis for determining failure. If we assume that the performance of actions may degrade, then we also need to introduce the concept of an action life-cycle in which an action is introduced into the system as Functional, may move into a Suspect state if it is failing, and finally becomes Deprecated following repeated failures.

Cardoso et al. [5] assume a framework along these lines and build upon it to outline a mechanism that allows reconfiguration of the agent's plans in order to continue functioning as intended if some action has become Deprecated. However, this assumed ability to detect persistent failures does not yet exist. Our proposed framework should allow us to detect persistent abnormal behaviour from action executions for use with Cardoso et al.'s reconfiguration mechanism.
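The action life-cycle sketched above could be tracked with very little machinery. The following sketch is one possibility; the deprecation threshold is an illustrative parameter of our own, not anything fixed by [5].

```java
// Tracks an action's health across executions: Functional until a failure
// makes it Suspect; repeated failures make it Deprecated (a terminal state
// here, since a Deprecated action is to be replaced rather than reused).
public final class ActionHealth {
    public enum State { FUNCTIONAL, SUSPECT, DEPRECATED }

    private State state = State.FUNCTIONAL;
    private int consecutiveFailures = 0;
    private final int deprecationThreshold; // illustrative, e.g. 3

    public ActionHealth(int deprecationThreshold) {
        this.deprecationThreshold = deprecationThreshold;
    }

    public State recordOutcome(boolean succeeded) {
        if (state == State.DEPRECATED) return state;   // stays deprecated
        if (succeeded) {
            consecutiveFailures = 0;                   // an isolated failure is anomalous
            state = State.FUNCTIONAL;
        } else {
            consecutiveFailures++;
            state = consecutiveFailures >= deprecationThreshold
                    ? State.DEPRECATED : State.SUSPECT;
        }
        return state;
    }
}
```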
AI planning seeks to automate reasoning about plans, using a formal description of the domain, all possible actions available in the domain, an initial state of the problem, and a goal condition to produce a plan consisting of the actions that will achieve the goal condition when executed [18]. The formal description of the domain and the problem can be considered a model of the environment, the accuracy of which is fundamental to producing viable plans of reasonable quality. Significant advances have been made in the modelling of actions [12, 15, 33, 34] in automated planning, supporting actions that can have variable duration, conditions and effects.

Actions in BDI systems are typically designed without a specified duration and are defined before the execution of the program. As previously mentioned, BDI systems do not have a de facto theory of durative actions. Additionally, there is no theory for learning new action descriptions. With an extension of action theory for BDI systems (covering these two areas) paired with the self-model concept, actions could adapt to change. However, learning a new action may not always be the best solution for a failing action. Cardoso et al. [5] have developed a method for reasoning about replacing malfunctioning actions with alternate existing actions to achieve the same desired goal, reusing the domain entities and predicates that are already available.

In situations where a new action description is required at runtime, there are already suitable learning methods that could be adapted and incorporated into the framework [23, 24], enabling the discovery of new entities and predicates in the domain. In [23], the Qualitative Learner of Action and Perception (QLAP) is introduced. When deployed in an unknown, continuous and dynamic environment, QLAP constructs a hierarchical structure of possible actions in an environment based upon the consequences of actions that have happened before. Work in [24] explores the use of machine learning and probabilistic planning in complex environments to cope with unexpected outcomes. A learning algorithm is used to determine an action model with the greatest likelihood of attaining the perceived action effects of another, different set of actions. We have to acknowledge the risk that system properties could be violated during machine learning, although this could be remedied by using a Safe Learning [13] approach. Moreover, the introduction of machine learning presents great difficulty for verification, as the algorithms cannot (currently) be directly verified [31]. As a consequence, it should be noted that the proposed system could be unsuitable for scenarios where learning from failure is not safe (e.g., autonomous drones), where it would be safest to execute a controlled stop of the system rather than attempting a recovery.
The initial objective of this research is to formally define the concept of a self-model: an agent-maintained ontology for the autonomous system. We intend to use PDDL (Planning Domain Definition Language) as a starting point for creating a self-model. PDDL is a formalism for AI planning which is intended to "express the 'physics' of a domain" [22]. More specifically, we intend to use features introduced in PDDL2.1 [12] as our starting point. PDDL2.1 is an extension to PDDL for expressing temporal planning domains. The self-model concept will build on this by enabling agents to access and maintain a domain description, adding the capability of learning new action descriptions and allowing action life-cycles to be monitored. As shown in Figure 1, the self-model is centrally linked to the other system components, as they are required to contribute to keeping the self-model accurate. It is important to note that the self-model's domain description is not assumed to be modelled soundly and completely, yet it is assumed that all reports and updates received by the system are correct.

Our implementation will be developed for GWENDOLEN [9]. The GWENDOLEN agent programming language follows the BDI software model. As part of the MCAPL framework, GWENDOLEN interfaces with the Java Pathfinder (JPF) model-checker [32]. Our intention is to implement self-models and the theory of action life-cycles [10] in GWENDOLEN, integrate this with the existing work on plan reconfigurability [5], and use GWENDOLEN's support for verification to verify the adapted system against requirements. We propose representing actions in our self-model with explicit pre- and post-conditions and either explicit success, fail and abort conditions or ones that can be inferred from the pre- and post-conditions. We will then adapt the GWENDOLEN goal life-cycle as suggested in [10] to handle durative actions in a principled fashion.
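Continuing the earlier sketches (ActionDescription and ActionHealth), the self-model itself might be little more than a repository that the monitoring, planning and learning components read from and write to. Again, this is a sketch under our own naming assumptions, not a commitment of the proposed design.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// An agent-maintained self-model: action descriptions plus per-action
// life-cycle state, updated as failures are detected and new descriptions
// are learnt or reconfigured.
public final class SelfModel {
    private final Map<String, ActionDescription> descriptions = new ConcurrentHashMap<>();
    private final Map<String, ActionHealth> health = new ConcurrentHashMap<>();

    public void register(ActionDescription d, int deprecationThreshold) {
        descriptions.put(d.name(), d);
        health.put(d.name(), new ActionHealth(deprecationThreshold));
    }

    public Optional<ActionDescription> describe(String actionName) {
        return Optional.ofNullable(descriptions.get(actionName));
    }

    // Records an execution outcome for a registered action and returns its
    // resulting life-cycle state (Functional, Suspect or Deprecated).
    public ActionHealth.State observe(String actionName, boolean succeeded) {
        return health.get(actionName).recordOutcome(succeeded);
    }

    // Invoked when a learnt or reconfigured description replaces a failing
    // one; the replacement starts a fresh life-cycle as Functional.
    public void replace(String actionName, ActionDescription learnt, int deprecationThreshold) {
        descriptions.put(actionName, learnt);
        health.put(actionName, new ActionHealth(deprecationThreshold));
    }
}
```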
Figure 1: Diagram of Action Failure and Recovery Mechanisms. Arrows represent data flow and dotted lines are for readability when a line goes through a component.

When an action changes, requiring plans to be modified, it is assumed that the agent must be verified again in order to preserve the safety properties of the system as a whole. However, if a new action is learnt in place of a failing action (fully or partially achieving the failing action's post-conditions), the whole system may not require reverification. We aim to further study this process in order to identify the conditions where such reverification would not be necessary.

In Algorithm 1, we propose a primitive method for action failure monitoring. It is assumed that the action status used in the algorithm is asserted as a belief by the system. We start monitoring an action once it has been executed, retrieving some preliminary information about the action: an identifier and the current status (lines 2-3). If the action is currently Pending, Suspended or Aborting, this status is returned (lines 4-5). If not, the action's expected post-conditions are retrieved from the tuple (line 6). Whilst an action's state is Active, we continue checking it for failure by comparing the perceived post-conditions with those that are expected of that action (lines 7-11). If at any point during monitoring these conditions do not match, the action's state becomes Failed. If an action is not working as expected, the action can be re-attempted, or suspended and replaced in the self-model. The replacement action may be selected from an existing action in the self-model itself. Alternatively, using machine-learning methods, a new action can be learnt to replace the failing action using current knowledge of the available capabilities. Finally, a method for reconfiguring the BDI plan, such as in [5], is called (lines 12-15).
Algorithm 1: Action Failure Monitoring

 1: Function monitor(⟨action_identifier, action_status, action_post_conditions⟩)
 2:   ActionID ← action_identifier
 3:   Status ← action_status
 4:   if Status ∉ {Active, Failed} then
 5:     return Status
 6:   ExpectedPostCond ← action_post_conditions
 7:   while Status = Active do
 8:     ActualPostCond ← getPostConditions(ActionID)
 9:     if ActualPostCond = ExpectedPostCond then
10:       Status ← Active
11:       monitor(ActionID, Status, ActualPostCond)
12:     else
13:       Status ← Failed
14:       reconfigure(ActionID)
15:       return Status
16:   return Active

To illustrate how the self-model would complement Cardoso et al.'s work on reconfigurability [5], we use the same scenario: a Mars rover's faulty movement capability. Figure 1 shows our proposed mechanisms for action failure and recovery embedded into a system architecture including a BDI system, an AI planner and Cardoso et al.'s [5] reconfigurability framework. The dotted line arrows crossing the self-model represent incoming information from a component, such as an action's state. The system architecture in the diagram relies upon a simplification of the successful, fault-free execution of actions that would normally occur in a BDI system. In the case of the rover, these fault-free actions can be represented by the high-level task of movement between waypoints. Whilst mostly successful, these actions are susceptible to failure.

Consider the task of moving from a waypoint A to another waypoint B, in order to collect a rock sample to analyse at another waypoint C. Using the monitoring method for failure detection in Algorithm 1, a failed movement action between any of these waypoints could be found. Given the dynamic environment that the rover operates in, it is plausible that previously clear and usable routes could become blocked at any time. A failure can be flagged when an action is exceeding a predetermined time or energy threshold described in the action post-conditions. Once failure has been detected and confirmed, we can update the self-model to show that the action description has been deprecated and no longer affords its post-conditions. The rover now attempts to reconfigure the current plan to resolve the failure, using an AI planner to search for a replacement (e.g., by finding a different route) before attempting to learn a completely new action description. In both cases, the time and energy consumption required to accomplish the original post-conditions is updated in the reconfigured/new action description. If it is found that the reconfigured plan is now too time or energy intensive, the latter method of learning a new action description is invoked. If at any point the failing action is found to achieve all post-conditions but does not perform the action within the time or energy threshold (e.g., the rover now navigates around a blockage and arrives at the correct waypoint but takes longer to do so), this can be managed by learning new action descriptions with an updated time and/or energy threshold.

The action may not be deprecated if the failure is considered anomalous; for instance, the action normally succeeds and only fails on one isolated occasion. If the action description is not deprecated, the failing action can simply be re-attempted.
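Tying the earlier sketches together, a toy driver for this rover scenario might exercise the failure-and-replacement path as follows. Everything here is illustrative: the thresholds, the "within_time" atoms and the patched description are our own assumptions, not outputs of the actual planner or learner.

```java
import java.util.List;

// A toy driver: the "move" action fails repeatedly (e.g. a blocked route),
// becomes Deprecated, and is replaced by a patched description with a
// relaxed time threshold; affected plans would then be reverified.
public final class RoverExample {
    public static void main(String[] args) {
        SelfModel model = new SelfModel();
        model.register(new ActionDescription(
                "move",
                List.of("at(rover, wpA)", "route_clear(wpA, wpB)"),
                List.of("at(rover, wpB)", "within_time(90)")), 3);

        // Three consecutive failed executions deprecate the action.
        model.observe("move", false);
        model.observe("move", false);
        ActionHealth.State s = model.observe("move", false);
        System.out.println("move is now: " + s); // DEPRECATED

        // A planner or learner supplies a patched description (detour route,
        // longer expected duration); the action restarts as Functional.
        model.replace("move", new ActionDescription(
                "move",
                List.of("at(rover, wpA)"),
                List.of("at(rover, wpB)", "within_time(140)")), 3);
    }
}
```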
The work in [5] describes a reconfigurability framework that is capable of replacing faulty action descriptions, based on formal definitions of action descriptions, plans, and plan replacement. The implementation uses an AI planner to search for viable action replacements. We plan on extending their approach by adding the concept of a self-model, durative actions, and failure detection. Furthermore, we also envision adding a learning component to the framework in order to be able to cope with dynamic environment events that require new action descriptions to be formulated at runtime.

Troquard and Vieu's work on a logic of agency in [30] considers the modelling of actions with durations, although a different approach is taken: actions are given duration using continuations from STIT (Seeing To It That) logic. In BDI systems, the focus of handling plan failure is the effect that failure has on goals [2, 27]. This is a reasonable focus considering the central role that goals have in agent-oriented programming. Consequently, action failure recovery has not been explored as an option for managing plan failure.
In this position paper we have described a system architecture for BDI autonomous agents capable of adapting to changes in a dynamic environment. We also introduced the idea of an agent-maintained self-model with durative actions and learning new action descriptions. Our proposed system aims to resolve the following: develop the concept of a self-model; produce and develop a method to detect the failure of an action performed by a BDI agent; develop a theory of durative actions for BDI languages; adapt existing systems to allow new actions to be learnt and used in place of failing ones whilst preserving safety properties; and, finally, integrate all of this into the existing GWENDOLEN infrastructure.

To illustrate the applicability of the discussed mechanisms, a practical example of how a Mars rover could make use of the framework was provided. Future work includes defining the learning component to be able to handle dynamic environment events that require the creation of new action descriptions at runtime, a formal definition of the self-model with an outline of the concepts it includes, the implementation of the system architecture, and the evaluation of the approach.

A number of questions and challenges have been identified whilst outlining this program of research. Firstly, it has been noted that the term 'persistent failure' is subjective and should be accompanied by a formal and precise specification to avoid ambiguity. Secondly, considerations for the steps taken after reconfiguration and the learning process require further work (e.g., what happens to failing actions in the model after reconfiguring?). Finally, the proposed learning strategy has produced many challenges which will be considered once implementation has reached this stage. Notably, we will consider how the learning method can ensure valid solutions, how planning time could be minimised, and how an action's state could influence the learning strategy. These challenges will serve as guidance for future work.
References

[1] Jonathan M. Aitken, Affan Shaukat, Elisa Cucco, Louise A. Dennis, Sandor M. Veres, Yang Gao, Michael Fisher, Jeffrey A. Kuo, Thomas Robinson & Paul E. Mort (2017): Autonomous Nuclear Waste Management. IEEE Intelligent Systems. In Press.

[2] Rafael H. Bordini & Jomi Fred Hübner (2010): Semantics for the Jason Variant of AgentSpeak (Plan Failure and some Internal Actions). In: ECAI, pp. 635–640, doi:10.3233/978-1-60750-606-5-635.

[3] Rafael H. Bordini, Michael Wooldridge & Jomi Fred Hübner (2007): Programming Multi-Agent Systems in AgentSpeak using Jason. John Wiley & Sons, doi:10.1002/9780470061848.

[4] Michael Bratman (1987): Intention, Plans, and Practical Reason. Harvard University Press, Cambridge, MA.

[5] Rafael C. Cardoso, Louise A. Dennis & Michael Fisher (2019): Plan Library Reconfigurability in BDI Agents. In: Proc. of the 7th International Workshop on Engineering Multi-Agent Systems (EMAS).

[6] Louise Dennis, Michael Fisher, Alexei Lisitsa, Nicholas Lincoln & Sandor Veres (2010): Satellite control using rational agent programming. IEEE Intelligent Systems.

[7] Louise A. Dennis (2017): Gwendolen Semantics: 2017. Technical Report ULCS-17-001, University of Liverpool, Department of Computer Science.

[8] Louise A. Dennis (2018): The MCAPL Framework including the Agent Infrastructure Layer and Agent Java Pathfinder. The Journal of Open Source Software.

[9] Louise A. Dennis & Berndt Farwer (2008): Gwendolen: A BDI Language for Verifiable Agents. In: Proceedings of the AISB 2008 Symposium on Logic and the Simulation of Interaction and Reasoning, Society for the Study of Artificial Intelligence and Simulation of Behaviour, pp. 16–23.

[10] Louise A. Dennis & Michael Fisher (2014): Actions with Durations and Failures in BDI Languages. In: ECAI, pp. 995–996, doi:10.3233/978-1-61499-419-0-995.

[11] Louise A. Dennis, Michael Fisher, Matthew P. Webster & Rafael H. Bordini (2012): Model checking agent programming languages. Automated Software Engineering, doi:10.1007/s10515-011-0088-x.

[12] Maria Fox & Derek Long (2003): PDDL2.1: An extension to PDDL for expressing temporal planning domains. Journal of Artificial Intelligence Research 20, pp. 61–124, doi:10.1613/jair.1129.

[13] Javier García & Fernando Fernández (2015): A Comprehensive Survey on Safe Reinforcement Learning. Journal of Machine Learning Research.

[14] Michael P. Georgeff & Amy L. Lansky (1987): Reactive reasoning and planning. In: AAAI, pp. 677–682.

[15] Malik Ghallab, Dana Nau & Paolo Traverso (2016): Automated Planning and Acting. Cambridge University Press.

[16] James Harland, David N. Morley, John Thangarajah & Neil Yorke-Smith (2014): An operational semantics for the goal life-cycle in BDI agents. Autonomous Agents and Multi-Agent Systems.

[17] James Harland, David N. Morley, John Thangarajah & Neil Yorke-Smith (2017): Aborting, suspending, and resuming goals and plans in BDI agents. Autonomous Agents and Multi-Agent Systems, doi:10.1007/s10458-015-9322-4.

[18] Patrik Haslum, Nir Lipovetzky, Daniele Magazzeni & Christian Muise (2019): An Introduction to the Planning Domain Definition Language. Synthesis Lectures on Artificial Intelligence and Machine Learning.

[19] Koen V. Hindriks, Frank S. de Boer, Wiebe van der Hoek & John-Jules Ch. Meyer (2000): Agent Programming with Declarative Goals. In: Proceedings of the 7th International Workshop on Agent Theories and Architectures, Springer, pp. 228–243, doi:10.1007/3-540-44631-1_16.

[20] Xiaowei Huang, Marta Kwiatkowska, Sen Wang & Min Wu (2017): Safety verification of deep neural networks. In: International Conference on Computer Aided Verification, Springer, pp. 3–29, doi:10.1007/978-3-319-63387-9_1.

[21] Viviana Mascardi, Daniela Demergasso & Davide Ancona (2005): Languages for Programming BDI-style Agents: an Overview. In: WOA, pp. 9–15.

[22] D. McDermott, M. Ghallab, A. Howe, C. Knoblock, A. Ram, M. Veloso, D. Weld & D. Wilkins (1998): PDDL - The Planning Domain Definition Language. Technical Report TR-98-003, Yale Center for Computational Vision and Control.

[23] Jonathan Mugan & Benjamin Kuipers (2011): Autonomous learning of high-level states and actions in continuous environments. IEEE Transactions on Autonomous Mental Development.

[24] Hanna M. Pasula, Luke S. Zettlemoyer & Leslie Pack Kaelbling (2007): Learning symbolic models of stochastic domains. Journal of Artificial Intelligence Research 29, pp. 309–352, doi:10.1613/jair.2113.

[25] Anand S. Rao & Michael P. Georgeff (1992): An abstract architecture for rational agents. In: KR, pp. 439–449.

[26] Anand S. Rao & Michael P. Georgeff (1995): BDI agents: from theory to practice. In: ICMAS, pp. 312–319.

[27] Sebastian Sardina & Lin Padgham (2011): A BDI agent programming language with failure handling, declarative goals, and planning. Autonomous Agents and Multi-Agent Systems.

[28] Maarten Sierhuis, William J. Clancey & Ron J. J. van Hoof (2007): Brahms: a multi-agent modelling environment for simulating work processes and practices. International Journal of Simulation and Process Modelling, doi:10.1504/IJSPM.2007.015238.

[29] Richard Stocker, Maarten Sierhuis, Louise Dennis, Clare Dixon & Michael Fisher (2011): A Formal Semantics for Brahms. In: International Workshop on Computational Logic in Multi-Agent Systems, Springer, pp. 259–274, doi:10.1007/978-3-642-22359-4_18.

[30] Nicolas Troquard & Laure Vieu (2006): Towards a Logic of Agency and Actions with Duration. Frontiers in Artificial Intelligence and Applications.

[31] Perry van Wesel & Alwyn E. Goodloe (2017): Challenges in the verification of reinforcement learning algorithms. Technical Report NASA/TM-2017-219628, NASA Langley Research Center.

[32] Willem Visser, Klaus Havelund, Guillaume Brat, SeungJoon Park & Flavio Lerda (2003): Model Checking Programs. Automated Software Engineering.

[33] Mausam & Daniel S. Weld (2008): Planning with durative actions in stochastic domains. Journal of Artificial Intelligence Research 31, pp. 33–82, doi:10.1613/jair.2269.

[34] Håkan L. S. Younes & Reid G. Simmons (2004): Solving generalized semi-Markov decision processes using continuous phase-type distributions. In: AAAI.