Limited Lookahead Policies for the Control of Discrete-Event Systems: A Tutorial

Abstract

Some problems in discrete-event systems (DES) model large, time-varying state spaces with complex legal languages. To address these problems, Chung et al. introduced limited lookahead policies (LLP) to provide online supervisory control for discrete-event systems. This seminal paper, along with an addendum of technical results, provided the field with a series of very important and powerful results, but in a notationally- and conceptually-dense manner. In this tutorial, we present Chung et al.'s problem formulation for online control and unravel the formal definitions and proofs from their original work with the aim of making the ideas behind limited lookahead accessible to all DES researchers. Finally, we introduce the Air Traffic Control problem as an example of an online control problem and demonstrate the synthesis of LLP supervisors.



Richard Hugh Moulton, Anthony J. Marasco, and Karen Rudie

Department of Electrical and Computer Engineering, Queen's University, Kingston ON, Canada
School of Computing, Queen's University, Kingston ON, Canada
Department of Electrical and Computer Engineering, Royal Military College of Canada, Kingston ON, Canada
Karen Rudie: Department of Electrical and Computer Engineering and Ingenuity Labs Research Institute, Queen's University, Kingston ON, Canada

June

The main problem in control theory is providing a control signal to a process to ensure that only desirable behaviour is produced. In discrete event systems (DES) this is done through supervisory control: enabling or disabling events that cause the process to transition from one state to another. Although DES theory provides supervisors that are correct by construction, there are real-world applications whose characteristics make constructing such supervisors infeasible. These applications include processes with a very large state space, processes that are time-varying, and processes whose desirable behaviours are difficult to specify.

Chung, Lin and Lafortune's paper "Limited Lookahead Policies in Supervisory Control of Discrete Event Systems" (Chung et al., b) addressed these challenges by providing a theoretical base for the development of DES supervisors that perform control in an online manner rather than computing a monolithic control policy offline. This work inspired a number of follow-on works and was recognized with the George S. Axelby Outstanding Paper Award for the IEEE Transactions on Automatic Control.
Nonetheless, we believe that wider use of these limited lookahead policy (LLP) supervisors has been prevented by the conceptually dense material that must be mastered.

arXiv [eess.SY], Jun

In this tutorial we present Chung et al.'s problem formulation, supervisor characterization and formal proofs of correctness in a way that emphasizes the underlying ideas. Our aim is to make the theoretical results in "Limited Lookahead Policies in Supervisory Control of Discrete Event Systems" usable for researchers and practitioners alike. We begin by introducing the

Air Traffic Control (ATC) problem as an illustrative example of the online control problem we use throughout the tutorial (Section ). Next we present Chung et al.'s seminal results (Sections through ) and demonstrate how these results are used to synthesize LLP supervisors (Section ). Finally, we provide an overview of works that directly extend or apply ideas from the original LLP supervisor (Sections and ).

We will use the ATC problem as a concrete example throughout this paper to illustrate the theoretical concepts we are discussing. This problem requires the formulation of a supervisor to control air traffic in the vicinity of an airfield. It is useful because it is conceptually easy to understand and because some of its characteristics make it challenging for traditional DES approaches.

Figure: The Air Traffic Control problem. The airspace to be controlled is shown: the airfield is in the centre square; aircraft can arrive in any of the four corners, but can only leave from the top left and bottom right corners; and once an aircraft is in the airspace it must travel in a clockwise manner around the airfield. The supervisor that we synthesize must ensure that aircraft can take off from and land at this airfield with minimal restrictions while guaranteeing flight safety.

As a discrete event system model, the ATC problem uses the alphabet $\Sigma = \Sigma_c \,\dot{\cup}\, \Sigma_{uc}$. The controllable events, denoted by $\Sigma_c$, represent aircraft movements into a new section of airspace, as well as take-off ($\delta_i$), landing ($\eta_i$), entering the airfield's vicinity ($\rho^j_i$), and leaving the airfield's vicinity ($\chi^j_i$). The uncontrollable events, denoted by $\Sigma_{uc}$, represent notifications of a pending arrival or take-off and are distinguished by a tilde. All events are subscripted with the index of the plane performing the action.

$$\Sigma_c = \{\epsilon, \alpha_i, \beta_i, \gamma_i, \theta_i, \kappa_i, \lambda_i, \mu_i, \nu_i, \eta_i, \delta_i, \rho^j_i, \chi^j_i\}_{i \in \mathbb{N}^+}$$

$$\Sigma_{uc} = \{\tilde{\rho}^j_i, \tilde{\delta}_i\}_{i \in \mathbb{N}^+}$$

Here the superscript $j$ ranges over the four arrival corners for $\rho$ and $\tilde{\rho}$ and over the two departure corners for $\chi$.

In order to simplify the problem, we assume that all events are observable, although subsequent work allowed for unobserved events in the plant (see Section ).

The plant representing the uncontrolled airspace can be modelled as the synchronous product of multiple subplants. This will allow us to reason about each component of the plant separately, without having to consider the effects of the whole plant.

First, there is a permanent subplant that represents the supervisors for adjacent spaces, which is itself a synchronous product between subplants for departures and arrivals (Figure ).

Figure: Supervisors for adjacent spaces; the first index in each state represents the departures controller and the second index represents the arrivals controller.
The initial state is the only marked state in G and it represents the state where there are no aircraft waiting for permission to enter the airspace or take off.

Second, there is a subplant for each aircraft that is airborne or ready to enter the airspace. These subplants will need to be added to or removed from the overall plant as aircraft arrive and depart from the airspace. An example of an aircraft subplant is shown in Figure ; these subplants are denoted $G_i$ where $i$ is the aircraft's index.

Figure: A subplant representing an aircraft departing towards $\chi$. The initial state is before the aircraft has requested to depart and the marked state is when the aircraft has successfully departed the airspace.

As in Chung et al.'s original paper, we assume that the language generated by the plant is equal to the prefix-closure of the plant's marked language: $L(G) = \overline{L_m(G)}$ (Chung et al., b).

Both the large state space of the problem and the plant's dynamic nature will make it difficult to write down the language generated by the plant, but we will also need to consider how we will describe the legal language, $K$. As in the original paper, we consider the legal language to be $L_m(G)$-closed, that is $K = \overline{K} \cap L_m(G)$ (Chung et al., b).

At a high level, the supervisor must ensure that aircraft can take off from and land at this airfield with minimal restrictions and while guaranteeing flight safety. In plain English, the specifications captured by $K$ are:
• Two aircraft can't occupy the same section of airspace;
• An arriving aircraft can't be delayed for more than one controllable action;
• A departing aircraft can't be delayed for more than ten controllable actions; and
• Aircraft departing through the same gate must be separated by at least five actions.

It is clear from its formulation that the ATC problem meets all three of Chung et al.'s criteria for problems requiring LLP control.
First, there is an explosion in the size of the state space. If we consider the case of five aircraft airborne at once, the synchronous product of the subplants representing the adjacent supervisors and each aircraft has on the order of 6 × states, i.e. even with a grossly simplified problem we have an impractical number of states.

Second, the plant is dynamic. Its composition changes every time an aircraft enters (or exits) the airspace. In the DES model this results in a subplant being added to (or dropped from) the overall plant's synchronous product. This dynamic nature would require that an offline supervisor be recomputed for every new aircraft.

Third, the legal behaviour is difficult to capture as a regular language. This results from the plant's dynamic and modular natures: although event legality can be easily determined for any given point in time, determining this for all possible futures requires the supervisor to keep track of all possible legal languages.

All of the characteristics of DES problems that make LLP supervisors necessary also make the analysis of the problems themselves somewhat difficult. We begin, therefore, by defining and illustrating some DES operations that will prove useful in our reasoning about the problem as well as the supervisor's own operations (all definitions from Chung et al., b).

Definition (Active Set). The active set in a language $L(G)$ after the trace $s$ is denoted $\Sigma_{L(G)}(s)$ and defined as

$$\Sigma_{L(G)}(s) := \{\sigma \in \Sigma \mid s\sigma \in L(G)\}.$$

The active set allows the supervisor to reason about what actions are possible in the plant from any given state.

Example (Active Set). At any point in time where there are no aircraft airborne, the active set for the ATC problem is restricted to the alphabet of uncontrollable events. This is because the controllable events are only applicable when there is an aircraft in the airspace to be directed.
$$\Sigma_{L(G)}(\epsilon) = \Sigma_{uc}$$

Definition (Post-language). The post-language of a language $L$ after the trace $s$ is denoted $L/s$ and defined as

$$L/s := \{t \in \Sigma^* \mid st \in L\}.$$

The post-language operation provides the "lookahead" portion of LLP by allowing the future language of the plant to be determined on the basis of the history of the system. Since the possibility of some events is dependent on the state of the plant, this kind of operation cannot be done ahead of time and must be computed on-the-fly.

Example (Post-language). If the aircraft from Figure is the only aircraft airborne and $s = \tilde{\delta}\,\delta\,\nu\,\alpha\,\beta\,\gamma$ occurs, then the post-language contains strings such as

$$L(G)/s = \{\theta,\ \theta\chi,\ \tilde{\rho},\ \tilde{\rho}\theta,\ \tilde{\rho}\theta\rho,\ \tilde{\rho}\theta\rho\chi,\ \tilde{\rho}\theta\rho\chi\alpha,\ \tilde{\rho}\theta\rho\chi\alpha\beta,\ \tilde{\rho}\theta\rho\chi\alpha\beta\gamma,\ \tilde{\rho}\theta\rho\chi\alpha\beta\gamma\eta,\ \tilde{\rho},\ \tilde{\rho}\rho,\ \tilde{\rho}\rho\theta,\ \tilde{\rho}\rho\theta\gamma,\ \tilde{\rho}\rho\theta\gamma\chi,\ \tilde{\rho}\rho\theta\gamma\chi\eta,\ \ldots\}$$

Definition (Truncation). The truncation of a language $L$ to the strings of length at most $N$ is denoted $L|_N$ and defined as

$$L|_N := \{t \in L \mid |t| \leq N\}$$

where $|t|$ is the length of the string $t$.

The truncation operation provides the "limited" portion of LLP by allowing the language of the plant to be trimmed to a tractable set of strings. As seen from the post-language example, it can be difficult to write strings out in full, let alone enumerate all possible strings. Instead, truncation restricts the language to those of its members whose length is no more than $N$.

Example (Truncation). Extending the post-language example, a truncation of the post-language $L(G)/s$ contains strings such as

$$\{\theta,\ \theta\chi,\ \tilde{\rho},\ \tilde{\rho}\theta,\ \tilde{\rho}\theta\rho,\ \tilde{\rho},\ \tilde{\rho}\rho,\ \tilde{\rho}\rho\theta,\ \ldots\}$$

With these useful definitions introduced, we now turn our attention to formalizing the ideas behind LLP supervisors. An LLP supervisor (Figure ) is a general supervisory framework and can be modelled in many different ways; nevertheless, we formalize the operations in terms of automata, languages and strings as done by the original authors.
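The active set, post-language and truncation operations are easy to experiment with on small finite languages. The following is a minimal Python sketch, assuming that a language is represented as a set of tuples of event names. The function names and the toy events `a`, `b` and `u` are illustrative assumptions; they come neither from Chung et al. nor from the ATC model.

```python
from typing import Set, Tuple

Trace = Tuple[str, ...]
Language = Set[Trace]


def prefix_closure(lang: Language) -> Language:
    """Return every prefix of every trace in lang (including the empty trace)."""
    return {t[:i] for t in lang for i in range(len(t) + 1)}


def active_set(lang: Language, s: Trace) -> Set[str]:
    """Events sigma such that the trace s followed by sigma is in lang."""
    return {t[-1] for t in lang if len(t) == len(s) + 1 and t[:-1] == s}


def post_language(lang: Language, s: Trace) -> Language:
    """Post-language lang/s: all continuations t such that s followed by t is in lang."""
    return {t[len(s):] for t in lang if t[:len(s)] == s}


def truncate(lang: Language, n: int) -> Language:
    """Truncation of lang to its members of length at most n."""
    return {t for t in lang if len(t) <= n}


# Hypothetical toy plant language (prefix-closed), not the ATC plant.
TOY_PLANT = prefix_closure({("a", "b"), ("a", "u", "b")})

if __name__ == "__main__":
    print(active_set(TOY_PLANT, ("a",)))
    print(post_language(TOY_PLANT, ("a",)))
    print(truncate(post_language(TOY_PLANT, ("a",)), 1))
```

Representing traces as tuples keeps the three operations one-liners, at the cost of enumerating the language explicitly; this is only practical for toy examples, which is precisely why truncation matters for realistic plants.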
Figure: An LLP supervisor projects the plant's behaviour for the next $N$ steps (adapted from Chung et al., b).

The LLP control scheme (Algorithm ) aligns with traditional DES approaches: the biggest departure is in the third step, where the controller must make assumptions about plant behaviour that is neither definitely legal nor definitely illegal. In practice the controller does this by adopting an attitude towards behaviour of uncertain legality: the controller can optimistically believe that all uncertain strings can be steered towards legality or the controller can conservatively restrict the plant to behaviour that is known to be legal.

Algorithm: LLP Supervisor
while events occur in the plant do
    Project the plant's behaviour over the next $N$ events: $L(G)/s|_N$ and $L_m(G)/s|_N$
    Determine legality of all strings: $K/s|_N$ and $\overline{K}/s|_N$
    Label pending strings according to attitude:
        conservative: $f^N_a(s) = K/s|_{N-1}$
        optimistic: $f^N_a(s) = K/s|_N \cup (\overline{K}/s|_N \setminus \overline{K}/s|_{N-1})$
    Find the modified tree's supremal controllable sublanguage: $f^N(s) = [f^N_a(s)]^{\uparrow/s|_N}$
    Output control action: $\gamma^N(s) = f^N(s)|_1 \cup \Sigma_{uc} \cap \Sigma_{L(G)}(s)$

Algorithm is adapted from Chung et al., b. The LLP control scheme is a five-step process, with the supervisor performing the following calculations after each event: predict the possible future behaviour of the plant for the next $N$ steps; eliminate any traces the controller knows to be illegal; decide how to label the pending traces; calculate the pruned tree's supremal controllable sublanguage; and produce the control action.
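One pass through the loop body can be sketched for finite languages as follows. This is an illustrative Python sketch rather than Chung et al.'s implementation. It assumes that languages are explicit sets of event tuples and that the plant language is prefix-closed. The supremal controllable sublanguage is computed by repeatedly discarding strings that pass through a prefix from which an uncontrollable event in the lookahead tree leaves the candidate language, and the one-step truncation of $f^N(s)$ is read as the set of first events of the surviving strings. All function names and the toy events are assumptions.

```python
from typing import Set, Tuple

Trace = Tuple[str, ...]
Language = Set[Trace]


def prefix_closure(lang: Language) -> Language:
    return {t[:i] for t in lang for i in range(len(t) + 1)}


def post(lang: Language, s: Trace) -> Language:
    return {t[len(s):] for t in lang if t[:len(s)] == s}


def trunc(lang: Language, n: int) -> Language:
    return {t for t in lang if len(t) <= n}


def sup_controllable(target: Language, tree: Language, unc: Set[str]) -> Language:
    """Largest subset of target whose prefix closure is closed under the
    uncontrollable events that are possible in the (finite) lookahead tree."""
    current = set(target)
    while True:
        closed = prefix_closure(current)
        # Prefixes from which an uncontrollable event escapes the candidate.
        bad = {p for p in closed for u in unc
               if p + (u,) in tree and p + (u,) not in closed}
        survivors = {t for t in current
                     if not any(t[:i] in bad for i in range(len(t) + 1))}
        if survivors == current:
            return current
        current = survivors


def llp_control_action(plant: Language, legal: Language, s: Trace, n: int,
                       attitude: str, unc: Set[str]) -> Set[str]:
    """Events enabled after trace s by an n-step lookahead supervisor."""
    tree = trunc(post(plant, s), n)                    # project the plant
    k_post = post(legal, s)                            # legal marked strings
    kbar_post = post(prefix_closure(legal), s)         # their prefixes
    if attitude == "cons":                             # pending treated as illegal
        f_a = trunc(k_post, n - 1)
    elif attitude == "optm":                           # pending treated as legal
        f_a = trunc(k_post, n) | (trunc(kbar_post, n) - trunc(kbar_post, n - 1))
    else:
        raise ValueError("attitude must be 'cons' or 'optm'")
    f = sup_controllable(f_a, tree, unc)
    first_events = {t[0] for t in f if t}
    active = {t[0] for t in post(plant, s) if len(t) == 1}
    return (first_events | unc) & active


# Hypothetical toy problem: 'a' can be followed by the uncontrollable,
# illegal event 'u'; 'b' is always safe.
PLANT = prefix_closure({("a", "u"), ("b",)})
LEGAL = {(), ("a",), ("b",)}
UNC = {"u"}

if __name__ == "__main__":
    for attitude, n in [("cons", 2), ("optm", 1), ("cons", 1)]:
        print(attitude, n, llp_control_action(PLANT, LEGAL, (), n, attitude, UNC))
```

In this toy problem a two-step conservative window sees that `a` leads uncontrollably to the illegal string `a u` and enables only `b`. A one-step optimistic window cannot see `u` and enables both `a` and `b`, while a one-step conservative window treats both pending strings as illegal and enables nothing.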
For the ATC problem this algorithm leads the LLP supervisor through the following determinations: whether aircraft might arrive and/or take off and where all aircraft in the airspace can move to; which of these behaviours respect the specifications and which do not; taking either a conservative or optimistic attitude towards behaviours that are not clearly legal or illegal; calculating the supremal controllable sublanguage of this pruned tree; and then outputting the control action for all aircraft under its control.

In order to be convinced that this formalization of an LLP supervisor leads to correct behaviour we must answer three questions (Chung et al., b):

1. What happens if the supervisor says that there is no policy that can keep the plant's behaviour legal?
From a purely practical point of view, we are very interested in knowing how our supervisor will act when its calculations tell it that illegal behaviour cannot be prevented.

2. Is there a reason to prefer one attitude over the other? How differently do the optimistic and conservative attitudes behave?
The two proposed attitudes seem to be suited to different scenarios. If this is true we would like to know how we can expect each attitude to behave and in which scenarios we should pick one attitude over the other.

3. Is it worth using an LLP supervisor instead of simply calculating an offline supervisor in the traditional manner?
Given the weight of the literature showing that traditional supervisors are correct-by-construction and provably correct, we would like to know if we are trading away any of these guarantees when we decide to use an LLP supervisor.

We will return to these three questions throughout the following presentation of Chung et al.'s technical results.

In answering the first question we get right to the heart of what distinguishes LLP supervisors.
In contrast with a traditional offline supervisor, the characteristic and defining error of the LLP supervisor is the run-time error.

Definition (Run-time error). If $s \in L(G, \gamma^N)$ and $f^N(s) = \emptyset$, then we say that there is a run-time error happening in $L(G, \gamma^N)$ at trace $s$.

A run-time error exists when the LLP supervisor does not have any guaranteed safe action, in the sense that there is no trajectory from the current state of the system to a legal marked state that isn't at risk of being driven uncontrollably into illegal behaviour. The answer to the first question is therefore that if the supervisor says that no policy can prevent illegal behaviour in the plant then something has gone wrong and the supervisor should have prevented the plant from getting into this state in the first place.

A special case of the run-time error is when it occurs at the beginning of the plant's behaviour. This is called a starting error and it indicates that no supervisor can guarantee safe behaviour in the plant.

Definition (Starting error). If a run-time error occurs for $s = \epsilon$, we call this a starting error.

With this in mind, we should begin by asking whether the ATC problem has a starting error. If so, then it is not suitable for supervisory control and we must either reshape the problem or make use of other techniques.

The question remains whether a valid supervisor can be synthesized for the ATC problem with either the optimistic or conservative attitude. This is the question that we will focus on next, beginning with Chung et al.'s definition of supervisor validity (Chung et al., b).

Definition (Valid supervisor). An LLP supervisor with control

policy $\gamma^N$ is called valid if $L(G, \gamma^N) = \overline{K^\uparrow}$.

It is worth noting that the terminology of an LLP supervisor's validity is potentially confusing given what it expresses. Clearly, a valid supervisor avoids blocking states as well as states which lead uncontrollably to illegal strings. The definition is stronger than this, however, and also requires that the supervisor be minimally restrictive.

A valid supervisor for the ATC problem would therefore enforce flight safety, but only by restricting aircraft when necessary. The takeaway is that a valid LLP supervisor necessarily balances correctness with permissiveness.

Chung et al. presented two possible attitudes that LLP supervisors could take towards pending strings in their $N$-step lookahead windows: optimistic and conservative (Chung et al., b). We will consider the size of the language produced by each attitude, the effect of larger windows on these languages and whether these attitudes result in valid supervisors.

To begin, we state a result that will prove useful in our analysis. Put plainly, it states that if an attitude is at least as permissive for every possible string as another attitude, then the language allowed by the former attitude will be at least as big as the language allowed by the latter. This is stated more formally in the following lemma and is proved by induction (Chung et al., b).

Lemma. If $\gamma_i(s) \supseteq \gamma_j(s)$ for all $s \in \Sigma^*$, then $L(G, \gamma_i) \supseteq L(G, \gamma_j)$.

An LLP supervisor with the optimistic attitude marks all pending

strings as legal, which means that it will presume that it can steer any trace to a string in $K$ until there is proof that it cannot.

This attitude will naturally allow every string in $K^\uparrow$ to occur, since there will never be reason to believe that they will uncontrollably lead to an illegal state. It might also allow additional strings which do lead uncontrollably to illegal strings, however, since the information required to avoid these strings may lie outside the lookahead window.

Example (Uncontrollable illegality). In the ATC problem, an optimistic supervisor could allow an aircraft in both the top-left and top-middle sections of airspace (Figure ).

This trace isn't illegal itself and is the prefix of a marked string, but it could uncontrollably lead to an illegal string: the event $\tilde{\rho}_i$ would require two controlled actions to avoid having two aircraft in the same section of airspace, but that would contravene the specification that only one controlled event can happen before $\rho_i$.

Figure: Although strings that lead to this state are prefixes of legal, marked strings, the plant can lead uncontrollably to illegal behaviour if the event $\tilde{\rho}_i$ occurs.

The language produced by the optimistic attitude

The optimistic attitude prioritizes maximal permissiveness over preventing illegal behaviour. The size of the language it produces, $L(G, \gamma^N_{optm})$, is provably bounded between the prefix-closure of the supremal controllable sublanguage of $K$ and the infimal controllable superlanguage of $K$ (Chung et al., b).

Theorem. $\overline{K^\uparrow} \subseteq L(G, \gamma^N_{optm}) \subseteq K^\downarrow$

Intuitively, then, we hope that if we increase the size of the lookahead window then the LLP supervisor will have more of the information it needs to avoid illegal strings and that $L(G, \gamma^N_{optm}) = \overline{K^\uparrow}$ for some large enough $N$. Chung et al. formalized this intuition in the following theorem, which states that increasing the size of the lookahead window will never increase the size of the language allowed by an optimistic supervisor (Chung et al., b).

Theorem. $L(G, \gamma^N_{optm}) \supseteq L(G, \gamma^{N+1}_{optm})$

Run-time errors with the optimistic attitude

An LLP supervisor with the optimistic attitude will never prohibit a string that is contained in $K^\uparrow$, so we know that it will be minimally restrictive. We do have to verify, however, whether an illegal string might be allowed to occur.

Theorem. If a supervisor with the optimistic attitude is valid, then there will be no run-time errors in $L(G, \gamma^N_{optm})$.

Proof.

Chung et al.'s proof of this theorem is by contradiction (Chung et al., b) and is broken down into two cases: when the current state's active set contains at least one uncontrollable action and, conversely, when the current state's active set is entirely controllable.

1. Consider when $\Sigma_u \cap \Sigma_{L(G)}(s) \neq \emptyset$.
Because we assume the supervisor is valid and that a run-time error has occurred, we have that $f^N_{optm}(s) = K^\uparrow/s|_1 = \emptyset$. But $\gamma^N_{optm}(s)$ must include the uncontrollable events that can occur in the plant, which makes it non-empty and a strict superset of $K^\uparrow/s|_1$. This contradicts the definition of a valid supervisor, therefore $\gamma^N_{optm}$ is not valid.

2. Consider when $\Sigma_u \cap \Sigma_{L(G)}(s) = \emptyset$.
Because we assume that a run-time error has occurred we have that $s \notin K$, otherwise the supervisor could permit $\epsilon$ to occur. If $\gamma^N_{optm}(s) = \emptyset$ and $s \notin K$ then our supervisor $\gamma^N_{optm}$ is blocking at $s$, which means that $s$ is not a prefix of any string in $K$. From this it follows that $L(G, \gamma^N_{optm}) \neq \overline{K^\uparrow}$ and therefore $\gamma^N_{optm}$ is not valid.

For both cases, if a run-time error occurs then the supervisor was not valid, which is a contradiction. Therefore we conclude that if a supervisor with the optimistic attitude is valid, then there will be no run-time errors in $L(G, \gamma^N_{optm})$.

This result is intuitive because by definition a valid supervisor is one that produces the prefix-closure of the supremal controllable sublanguage of the legal language. Unfortunately, the converse of this theorem is not true. Chung et al. give a counterexample to show this, with the key being the potential for an unlimited number of uncontrollable events occurring (Chung et al., b).

An LLP supervisor with the conservative attitude marks all pending

strings as illegal, which means that it presumes a trace leads uncontrollably to an illegal string unless there is proof that it does not.

This attitude will never permit a string that is not part of the plant's supremal controllable sublanguage, but it may also prevent strings in $K^\uparrow$ from occurring, since it may not be clear that their prefixes can be controlled all the way to a marked string.

Example (Overly restrictive). In the ATC problem, this means that the supervisor could take more actions than necessary in order to ensure the legality of the produced strings.

For example, consider a supervisor responsible for controlling the configuration shown in Figure . If this supervisor is notified that an aircraft is ready to depart, i.e. the event $\tilde{\delta}_i$ occurs, then the supervisor might unnecessarily delay that aircraft's departure until all airborne aircraft have departed the airspace.

The language produced by the conservative attitude

The conservative attitude prioritizes the prevention of illegal behaviour over maximal permissiveness. The size of the language it produces, $L(G, \gamma^N_{cons})$, is provably bounded such that it is no larger than the prefix-closure of the supremal controllable sublanguage of $K$: as long as $K^\uparrow \neq \emptyset$, a conservative supervisor composed with the plant is guaranteed to produce a subset of $\overline{K^\uparrow}$ (Chung et al., b).

Figure: Having received notification that an aircraft is ready to take off, the conservative attitude may lead a supervisor to prevent this until all the airborne aircraft have departed the airspace.

Theorem. $K^\uparrow \neq \emptyset \iff L(G, \gamma^N_{cons}) \subseteq \overline{K^\uparrow}$

We would like to think, then, that by increasing the size of the lookahead window the LLP supervisor will be able to see that more strings in $K^\uparrow$ can be permitted and we will have $L(G, \gamma^N_{cons}) = \overline{K^\uparrow}$ for some large enough $N$. Chung et al. formalize that increasing the lookahead window's size will never decrease the size of the language allowed by a conservative supervisor (Chung et al., b).

Theorem. $L(G, \gamma^{N+1}_{cons}) \supseteq L(G, \gamma^N_{cons})$

Proof.

Our proof follows Chung et al.'s from their addendum of technical results (Chung et al., a). Due to the Lemma, the result in the theorem will follow if we show $f^N_{cons} \subseteq f^{N+1}_{cons}\ \forall s \in \Sigma^*$.

By the definition of the $f^N_u$ block (Algorithm , Line ) we have

$$f^N_{cons} := (K/s|_{N-1})^{\uparrow/s|_N} = (K/s|_{N-1})^{\uparrow/s|_{N+1}}.$$

By the definition of truncation we have $(K/s|_{N-1}) \subseteq (K/s|_N)$.

Since the post-language and supremal controllable sublanguage operations don't alter inclusion relationships,

$$(K/s|_{N-1}) \subseteq (K/s|_N) \implies (K/s|_{N-1})^{\uparrow/s|_{N+1}} \subseteq (K/s|_N)^{\uparrow/s|_{N+1}} \implies f^N_{cons} \subseteq f^{N+1}_{cons}$$

Run-time errors with the conservative attitude

From the preceding results we can see that as long as the plant's supremal controllable sublanguage is nonempty then the language produced by a supervisor with the conservative attitude will be nonempty as well.

Theorem. If there is no starting error in $L(G, \gamma^N_{cons})$, then there will be no run-time error in $L(G, \gamma^N_{cons})$.

Proof.

Chung et al.'s proof proceeds by induction on the length of the trace $s$ (Chung et al., a).

The base case is $s = \epsilon$, where the length of $s$ is 0. There is no run-time error at $s$ because we have assumed that there is no starting error in $L(G, \gamma^N_{cons})$.

Our hypothesis is that no run-time error has occurred for $s$, a trace of length $i$, and we will show that this implies there is no run-time error for $s\sigma$, a trace of length $i + 1$ where $\sigma \in \gamma^N_{cons}(s)$.

Since there was no run-time error at $s$ we know that $f^N_{cons}(s) \neq \emptyset$; as a minimum $\sigma \in f^N_{cons}(s)$ since the conservative attitude allowed it to occur. With this in mind, as well as the monotonicity result for the conservative attitude, if we consider the post-language $f^N_{cons}(s)/\sigma$ then we have

$$\{\epsilon\} \subseteq f^N_{cons}(s)/\sigma = f^{N-1}_{cons}(s\sigma) \subseteq f^N_{cons}(s\sigma)$$

which means $f^N_{cons}(s\sigma) \neq \emptyset$ and no run-time error occurs at $s\sigma$.

Given its emphasis on enforcing legal behaviour, it makes sense that as long as the plant begins in a legal state, a supervisor with the conservative attitude will never allow the plant to enter an illegal state.

Having described the optimistic and conservative attitudes, we can ask ourselves how these attitudes relate to each other, i.e. Chung et al.'s second question. We first consider the differences in the control actions they produce for one step ahead in the plant, formulated as Chung et al.'s first comparison result (Chung et al., b).

Theorem (Comparing One Step Ahead). $f^M_{cons}(s)|_1 \subseteq f^N_{optm}(s)|_1 \quad \forall s \in \Sigma^*,\ \forall N, M \in \mathbb{N}$

Although this result is obvious when $N = M$, it is less clear that it should be true otherwise. With our previous results, however, the proofs are straightforward.

Proof.

Consider three cases.

1. $N = M$. As stated above, this follows directly from the attitudes' respective definitions.

2. $N < M$. We have
$$f^M_{cons}(s)|_1 \subseteq f^M_{optm}(s)|_1 \subseteq f^N_{optm}(s)|_1$$
because increasing the lookahead window size does not increase the size of the language permitted by the optimistic attitude.

3. $N > M$. Similarly, we have
$$f^M_{cons}(s)|_1 \subseteq f^N_{cons}(s)|_1 \subseteq f^N_{optm}(s)|_1$$
because increasing the lookahead window size does not decrease the size of the language permitted by the conservative attitude.

This is a powerful relationship between the two attitudes and, with the Lemma, it extends directly to comparing the languages produced by the two attitudes, formalized in Chung et al.'s second comparison result (Chung et al., b).

Theorem (Comparing Complete Languages). $L(G, \gamma^M_{cons}) \subseteq L(G, \gamma^N_{optm}) \quad \forall N, M \in \mathbb{N}$

The key takeaways from this theorem are that the choice of attitude has a definite effect on $L(G, \gamma^N)$ and that, unintuitively, this effect is independent of the size of lookahead window used. The power of this result is that we can be sure that the conservative attitude will never produce behaviour that is more permissive than the behaviour produced by the optimistic attitude, regardless of their respective lookahead window sizes.

We will now address Chung et al.'s third question: do we give up any guarantees or abilities when we forgo an offline supervisor and use an LLP supervisor instead?

The preceding results allow us to note that the sequences of languages $\{L(G, \gamma^N_{cons})\}_{N \in \mathbb{N}}$ and $\{L(G, \gamma^N_{optm})\}_{N \in \mathbb{N}}$ are both bounded and monotone, and we can therefore conclude that their limits, $L(G, \gamma^\infty)$, exist. As a final answer for the second question we would like to know if these limits are the same language and, to answer the third question, are these limits the same as the language produced by an offline controller?
Formally, we are asking whether or not it is true that:

$$\lim_{N \to \infty} L(G, \gamma^N) = \overline{K^\uparrow}$$

Unfortunately, this equation is not true in the general case, even under the assumption that $L_m(G)$ and $K$ are regular languages (Chung et al., b). All is not lost, however, since we can place conditions on the problem and on the value of $N$ that do guarantee that $L(G, \gamma^N) = \overline{K^\uparrow}$. This can be done whether or not $K$ is prefix-closed.

K is prefix-closed

We start with the easier of the two cases, where the supervisor never has to worry about blocking, because the proofs are correspondingly simpler to follow.

To facilitate the analysis, Chung et al. introduce the measure $N_u(L)$, which is the length of the longest finite subtrace of uncontrollable events that occurs in language $L$ (Chung et al., b).

$$N_u(L) := \begin{cases} \max\{|s| \mid s \in \Sigma_{uc}^* \wedge \exists u, v \in \Sigma^*\ (usv \in L)\} & \text{if it exists} \\ \text{undefined} & \text{otherwise} \end{cases}$$

The optimistic attitude

Although we know that the absence of run-time errors does not imply that an optimistic supervisor is valid, we can make this claim if $K = \overline{K}$. Chung et al. proved that if $N$ can be made large enough to ensure no run-time errors in $L(G, \gamma^N_{optm})$ then it also ensures that $\gamma^N_{optm}$ is a valid supervisor (Chung et al., b).

Theorem (Conditions on N – Case I, Optimistic Attitude). If $K = \overline{K}$ then

$$(N \geq N_u(L(G)) + 1) \vee (N \geq N_u(K) + 2) \implies L(G, \gamma^N_{optm}) = \overline{K^\uparrow}.$$

Of note, for the first condition in the theorem's antecedent we need only to look one event further than the longest finite uncontrollable subtrace that can be generated by the plant. As shown in Figure , if $t$ is the longest finite subtrace of uncontrollable events in $L(G)$, then an LLP supervisor with the optimistic attitude only needs to look $|t| + 1$ events ahead to decide whether to allow $t$ to occur.

By contrast, in the second condition we must look two events further because the plant may be capable of generating longer uncontrollable subtraces than are included in the legal language. Consider in Figure the case where $t$ is the longest finite subtrace of uncontrollable events in $K$. Since it is still possible that the plant could produce additional uncontrollable events after $t$, the optimistic LLP supervisor must look $|t| + 2$ events ahead in case $t$ occurs.

Figure: Given that the trace $s$ has occurred in the plant, how big a lookahead window is needed to decide whether $\sigma_c$ should be enabled or disabled?

The conservative attitude

Taking a different approach for the conservative attitude, we can show that if $L(G, \gamma^N_{cons})$ has no starting error then the conservative supervisor is valid. The key is the language (Chung et al., b)

$$K^N_{pruned} = K \setminus \big(K / \Sigma_u^{N-}\big)\Sigma^{*}$$

where, if there is no starting error in $L(G, \gamma^N_{cons})$ and if $K \cap \Sigma_u^{(N-)} = \emptyset$, then

$$L(G, \gamma^N_{cons}) = \big(K^N_{pruned}\big)^{\uparrow}.$$

Chung et al. prove that if the plant begins in a legal state and $N$ can be made large enough to see the longest finite uncontrollable subtrace in $K$ (i.e. if $K^N_{pruned} = K$), then it also ensures that $\gamma^N_{cons}$ is a valid supervisor (Chung et al., b).

Theorem (Conditions on N – Case I, Conservative Attitude). If $K = \overline{K}$ and there is no starting error in $L(G, \gamma^N_{cons})$ then

$$N \geq N_u(K) + \implies L(G, \gamma^N_{cons}) = K^{\uparrow}.$$

Because $K \subseteq L(G)$ we can use $N_u(L(G))$ to approximate $N_u(K)$. This transforms the theorem into $N \geq N_u(L(G)) + \implies L(G, \gamma^N_{cons}) = K^{\uparrow}$ and is useful since it may be more practical to calculate $L(G)$ than $K$ for some problems (Chung et al., b).

It is notable that, unlike for the optimistic attitude, it is not sufficient that $N \geq N_u(L(G)) + 1$ to ensure that $\gamma^N_{cons}$ is a valid supervisor. This is because if the longest uncontrollable subtrace in $L(G)$ appears in a string in $K$, then it is possible that $K \cap \Sigma_u^{N-} \neq \emptyset$ and therefore $K^N_{pruned} \neq K$.

This is illustrated in the figure by having a conservative LLP supervisor look $|t| + 1$ events ahead, where $t$ is the longest uncontrollable subtrace in $L(G)$. Even though the supervisor can see to the end of the subtrace in the plant's behaviour, it cannot see whether any uncontrollable actions are possible after $t$ and will therefore conservatively prevent $\sigma_c$ from occurring.

Characterizing the ATC Problem

The two theorems for Case I make it clear that, at a minimum, a problem only admits valid LLP supervisors if the plant cannot generate an infinitely long subtrace of uncontrollable actions. This is true of the ATC problem by construction, since the supervisors for adjacent spaces have been restricted from putting our supervisor in an uncontrollably illegal state. The longest uncontrollable subtrace that can occur in the plant is of length two, e.g. $\tilde{\delta}_i \tilde{\rho}_j$, and after this subtrace occurs, the plant's active set is guaranteed to be restricted to controllable actions.

K is not prefix-closed

The case where $K \neq \overline{K}$ is more difficult because our LLP supervisor must avoid blocking states as well as illegal states. Unfortunately, this is the relevant case for many real-world problems, including the ATC problem.

To analyze this case, Chung et al. introduce the idea of a "frontier" of legal states beyond which the plant can uncontrollably end up in an illegal state. This leads to the definition of two sets of traces: legal, marked, controllable traces; and uncontrollably crossing traces (Chung et al., b).

Definition (Legal, marked, controllable traces). The set of legal, marked, controllable traces is denoted $K_{mc}$ and is defined as

$$K_{mc} := \{\, s \in K \cap L_c(G) \,\}$$

where the language $L_c(G)$ contains all the traces that can be produced by the plant after which the active set contains only controllable actions:

$$L_c(G) := \{\, s \in L(G) \mid s\sigma \notin L(G)\ \ \forall \sigma \in \Sigma_{uc} \,\}.$$

Definition (Uncontrollably crossing traces). The set of uncontrollably crossing traces is denoted $K_{f\bar{c}}$ and contains all the traces that lead from $\overline{K}$ to $L(G) \setminus \overline{K}$ on account of an uncontrollable event:

$$K_{f\bar{c}} := \Big( \big( L(G) \setminus \overline{K} \big) / \Sigma_{uc} \Big) \cap \overline{K}.$$

The optimistic attitude
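To make the two definitions concrete, the following sketch evaluates them on small finite languages. It is illustrative only: traces are encoded as tuples of event names, the plant language is assumed to be a finite, prefix-closed set, and the placement of the prefix closure in the crossing-trace set (legal prefixes that an uncontrollable event carries outside the closure of $K$) is a reading of the definition rather than something the code can confirm. All function names are hypothetical.

```python
from typing import Set, Tuple

Trace = Tuple[str, ...]


def prefix_closure(language: Set[Trace]) -> Set[Trace]:
    """All prefixes of all traces, including the empty trace."""
    return {trace[:i] for trace in language for i in range(len(trace) + 1)}


def controllable_only(plant: Set[Trace], uncontrollable: Set[str]) -> Set[Trace]:
    """L_c(G): plant traces after which no uncontrollable event can occur."""
    return {s for s in plant if not any(s + (e,) in plant for e in uncontrollable)}


def legal_marked_controllable(plant: Set[Trace], legal: Set[Trace],
                              uncontrollable: Set[str]) -> Set[Trace]:
    """K_mc: legal (marked) traces that also belong to L_c(G)."""
    return legal & controllable_only(plant, uncontrollable)


def uncontrollably_crossing(plant: Set[Trace], legal: Set[Trace],
                            uncontrollable: Set[str]) -> Set[Trace]:
    """K_fc: prefixes of legal traces that one uncontrollable event makes illegal."""
    closed = prefix_closure(legal)
    return {
        s for s in closed
        if any(s + (e,) in plant and s + (e,) not in closed for e in uncontrollable)
    }


# Toy example (made up): events a, b are controllable, u is uncontrollable.
plant = prefix_closure({("a", "b"), ("a", "u"), ("b", "u")})
legal = {("a", "b"), ("b", "u")}
sigma_uc = {"u"}
```

In this toy example the trace `a` is legal so far, but the plant can follow it with `u` and leave the legal behaviour, so `a` is the only uncontrollably crossing trace.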

The challenge for an optimistic supervisor is that it may allow traces to occur that lead to run-time errors. The key, then, is to ensure that the supervisor knows which strings in $K_{mc}$ are prefixes of strings in $K_{f\bar{c}}$, i.e. the supervisor must know which legal strings must be disallowed because they may uncontrollably lead the plant into illegal behaviour.

This is achieved by defining the measure $N_{mcf\bar{c}}$, which is the length of the longest subtrace $t$ that can occur after a legal, marked, controllable string $s$ such that $st$ is the only illegal string in $st/s$ (Chung et al., b):

$$N_{mcf\bar{c}} := \begin{cases} \max\{\, |t| : \exists s \in K_{mc} \cup \{\epsilon\}\ \big( st \in K_{f\bar{c}} \wedge \forall\, \epsilon < v < t\ ( sv \notin K_{f\bar{c}} \cup K_{mc} ) \big) \,\} & \text{if it exists} \\ \text{undefined} & \text{otherwise} \end{cases}$$

Chung et al. then prove that an LLP supervisor with the optimistic attitude is valid as long as $N$ is greater than $N_{mcf\bar{c}}$ (Chung et al., b).

In the figure, consider the case where $s\sigma_c$ is a legal string and $t$ is the longest subtrace that can occur such that $s\sigma_c t$ is the only illegal string in the plant's post-language after $s\sigma_c$. If the window is smaller than the bound in the theorem below, the supervisor cannot see the whole of $s\sigma_c t$ and will optimistically allow the subtrace $t$ to occur. If the window meets the bound, the supervisor will disable $\sigma_c$ and prevent the subtrace $t$ from occurring.

Theorem (Conditions on N – Case II, Optimistic Attitude). If $K^{\uparrow} \neq \emptyset$ then

$$N \geq N_{mcf\bar{c}} + \implies L(G, \gamma^N_{optm}) = K^{\uparrow}.$$

The conservative attitude

In a similar fashion, the challenge for a conservative supervisor is that it may disallow traces that controllably lead to legal, marked states. The answer here is to ensure that the supervisor can see definitively whether every branch leads to a string in $K_{mc}$ or to an illegal string.

The measure to accomplish this is $N_{mcmc}$, which is defined as the length of the longest string $t$ that can occur after a legal, marked, controllable string $s$ such that $st$ is itself the only legal, marked and controllable string in $st/s$ (Chung et al., b):

$$N_{mcmc} := \begin{cases} \max\{\, |t| : \exists s \in K_{mc} \cup \{\epsilon\}\ \big( st \in K_{mc} \wedge \forall\, \epsilon < v < t\ ( sv \notin K_{mc} ) \big) \,\} & \text{if it exists} \\ \text{undefined} & \text{otherwise} \end{cases}$$

This measure is used to formulate the guarantee in the theorem below, namely that if $N$ is greater than $N_{mcmc}$ then a supervisor with the conservative attitude will have enough information to be a valid supervisor (Chung et al., b).

The reasoning that supports $N_{mcmc}$ as a measure is similar to the reasoning in support of $N_{mcf\bar{c}}$. In the figure, consider the case where $s\sigma_c$ is a legal string and $t$ is the longest subtrace that can occur such that $s\sigma_c t$ is the only legal string in the plant's post-language after $s\sigma_c$. If $N < N_{mcmc} + 1$, the supervisor cannot see the whole of $s\sigma_c t$ and will conservatively prevent the subtrace $t$ from occurring. If $N \geq N_{mcmc} + 1$, on the other hand, the supervisor will have enough information to allow the event $\sigma_c$ and it will not needlessly restrict the legal string $s\sigma_c t$ from occurring.

Theorem (Conditions on N – Case II, Conservative Attitude). If $K = K_{mc}$ and there is no starting error in $L(G, \gamma^N_{cons})$ then

$$N \geq N_{mcmc} + 1 \implies L(G, \gamma^N_{cons}) = K^{\uparrow}.$$

Characterizing the ATC problem
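Characterizing a problem under Case II amounts to evaluating the two measures, so a brute-force sketch over finite sets of traces may be useful. It follows the set-builder definitions literally, which means the empty suffix $t = \epsilon$ is admitted (an assumption, since the text does not say whether it should be excluded), and it returns `None` where the measure is undefined. The tuple encoding and function names are illustrative, not the authors'.

```python
from typing import Optional, Set, Tuple

Trace = Tuple[str, ...]


def longest_gap(starts: Set[Trace], targets: Set[Trace],
                blockers: Set[Trace]) -> Optional[int]:
    """max |t| with s in starts ∪ {ε}, st in targets and no strict
    intermediate prefix sv (ε < v < t) in blockers; None if no such t."""
    best: Optional[int] = None
    for s in set(starts) | {()}:
        for st in targets:
            if st[:len(s)] != s:
                continue  # st does not extend s
            # Strict intermediate prefixes are those with len(s) < i < len(st).
            if any(st[:i] in blockers for i in range(len(s) + 1, len(st))):
                continue
            gap = len(st) - len(s)
            best = gap if best is None else max(best, gap)
    return best


def n_mcfc(k_mc: Set[Trace], k_fc: Set[Trace]) -> Optional[int]:
    """Measure used for the optimistic attitude."""
    return longest_gap(k_mc, k_fc, k_mc | k_fc)


def n_mcmc(k_mc: Set[Trace]) -> Optional[int]:
    """Measure used for the conservative attitude."""
    return longest_gap(k_mc, k_mc, k_mc)


# Toy sets (made up for illustration).
k_mc = {("a", "b"), ("a", "b", "c", "d")}
k_fc = {("a", "b", "c")}
```

With an empty set of legal, marked, controllable traces the conservative measure has no candidate strings at all, which mirrors the situation in which the measure is undefined.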

We recognize that the ATC problem's legal language, $K$, is not prefix-closed and that therefore Case II applies. To begin, we must check if the ATC problem meets the conditions for finding a value of $N$ where $L(G, \gamma^N) = K^{\uparrow}$.

Does $K^{\uparrow} \neq \emptyset$?

Unfortunately, no; the supremal controllable sublanguage of the ATC problem is empty. Without synthesizing the whole legal language $K$, we can consider the following example.

Example (Showing that $K^{\uparrow} = \emptyset$). Our supervisor receives a notification that an aircraft will arrive from $\rho$ whenever possible; these notifications are uncontrolled events, since they are the result of other supervisors, and therefore cannot be disabled. But if the trace $s = \tilde{\rho}\,\rho\,\tilde{\rho}\,\alpha\,\rho\,\tilde{\rho}$ occurs, then we will have aircraft in the top-left and top-middle sections of airspace and a third that must arrive in the top-left section after a maximum of one controllable action.

As the example shows, this cannot be done while respecting the specification that two aircraft cannot occupy the same section of airspace. The only controllable action that can occur after $s$ is $\beta$, but then both $\alpha$ and $\rho$ need to occur at the same time to prevent an illegal string.

Figure: The scenario in the example can be made to occur uncontrollably by the supervisor of the neighbouring airspace.

With this, we conclude that the ATC problem as formulated does not admit a supervisor that can guarantee all the requirements are met, let alone such an LLP supervisor. If we make adjustments to the plant, however, we can ensure that adjacent supervisors do not force our supervisor into an illegal state. We formulate the following additional specifications and incorporate them into our plant:

• Departing aircraft must be separated by ten actions;
• Aircraft arriving through the same gate must be separated by five actions; and
• Only five aircraft can be in the airspace at a given time.

The result is a new plant that has on the order of 10 states but ensures that $K^{\uparrow} \neq \emptyset$.

We have seen that, although in the general case $L(G, \gamma^{\infty}) \neq K^{\uparrow}$, it is possible for some problems to choose a finite value for $N$ such that an LLP supervisor allows behaviour in the plant that is identical to that allowed by an offline supervisor.

To start, we must ensure some simple characteristics for the problem, namely that $K^{\uparrow} \neq \emptyset$ and that there is no starting error in $L(G, \gamma^N)$. The key is to calculate the appropriate measure, $N_u(K)$, $N_u(L(G))$, $N_{mcf\bar{c}}$ or $N_{mcmc}$, and to use this to set a lower bound on the size of the lookahead window. We have seen that these measures are undefined for some problems, however, which forces us to conclude that these problems do not admit a valid LLP supervisor with the given attitude.

Given that our application is safety-critical, it is tempting to think that our LLP supervisor should implement the conservative attitude. The two theorems for Case II make it clear, however, that both the optimistic attitude and the conservative attitude can be used by a valid supervisor and that, in the limit, these result in the same language.

In the first case, the theorem for the optimistic attitude bounds $N$ from below in terms of $N_{mcf\bar{c}}$, which we calculate in the following example.

Example ($N_{mcf\bar{c}}$ for the ATC problem).

$s = \epsilon$

$t = \tilde{\rho}\,\rho\,\kappa\,\lambda\,\mu\,\nu\,\tilde{\rho}\,\alpha\,\rho\,\kappa\,\lambda\,\mu\,\nu\,\tilde{\rho}\,\beta\,\rho\,\kappa\,\lambda\,\mu\,\alpha\,\tilde{\rho}\,\rho\,\gamma\,\kappa\,\lambda\,\tilde{\rho}\,\nu\,\rho$

We are required to take $s = \epsilon$ because $K \cap L_c(G) = \emptyset$ for the ATC problem, which implies that $K_{mc} = \emptyset$.

The string $t$ is the longest string that takes us from $s$ to an uncontrollably crossing trace; this is achieved by having four aircraft arrive from the gate with the furthest possible travel time and array themselves in a "cross pattern" (see the figure).
Now any action except for $\eta$ runs into the problem that an uncontrollable notification regarding a fifth arriving aircraft can force us uncontrollably into illegal behaviour.

Based on our string $t$, we have $N_{mcf\bar{c}} = |t| = 28$, and an optimistic LLP supervisor for the ATC problem should therefore have a lookahead window at least as large as the bound that the Case II theorem for the optimistic attitude derives from this value.

Figure: The cross pattern.

For the conservative attitude, the corresponding theorem is stated in terms of $N_{mcmc}$. However, $K_{mc} = \emptyset$ for the ATC problem, which means that $N_{mcmc}$ is undefined. This reflects that at any point in the system, before we can reach a marked state, we may uncontrollably have another aircraft turn up in the airspace. Borrowing terminology from other control theory domains, the ATC problem exhibits orbital stability and not asymptotic stability.

Interestingly, this means that we cannot find a valid conservative LLP supervisor with the current problem formulation. An obvious way to adjust the problem would be to mark all states representing safe behaviour throughout the plant in order to allow for the calculation of $N_{mcmc}$.

Example ($N_{mcmc}$ for the modified ATC problem).

$s = \epsilon$

$t = \tilde{\rho}\,\rho\,\kappa\,\lambda\,\mu\,\nu\,\tilde{\rho}\,\alpha\,\rho\,\kappa\,\lambda\,\mu\,\nu\,\tilde{\rho}\,\beta\,\rho\,\kappa\,\lambda\,\mu\,\alpha\,\tilde{\rho}\,\rho\,\gamma\,\kappa\,\nu\,\beta\,\lambda\,\mu\,\alpha\,\tilde{\rho}\,\nu\,\rho$

Although $K_{mc} \neq \emptyset$ for the modified ATC problem, $s = \epsilon$ still gives us the trace $t$ of maximum length.

The string $t$ is the longest string that takes us from $s$ to a string in $K_{mc}$; this is achieved by having four aircraft arrive and take as many actions as possible before a fifth aircraft arrives and prevents any uncontrollable actions from happening.

Based on $t$, we have $N_{mcmc} = |t| = 32$ and can therefore say that a conservative LLP supervisor for the modified ATC problem should have a lookahead window of size $N \geq N_{mcmc} + 1 = 33$.

Of particular note in the last two sections is that, although the optimistic and conservative attitudes can both be used to produce $L(G, \gamma^N) = K^{\uparrow}$, the particular value of $N$ required for each may be different. This results from the fact that the two attitudes need different kinds of information to produce valid supervisors; depending on the structure of the plant, it may take more steps into the future for a given attitude to see what it needs to see.

In addition to the material presented in their seminal work, the authors also published four notable extensions of LLP supervisors (Chung et al., a; and Chung et al., b). These works introduced a technique to efficiently recompute the lookahead tree at every step, a more nuanced attitude towards pending strings, the ability to incorporate state information, and the ability to deal with partially observed plants.

The original authors presented a method for efficiently calculating the lookahead tree in a recursive manner, addressing a practical aspect of implementing the LLP control scheme (Chung et al.).

Algorithm: Recursive calculation for an LLP supervisor
Input: An N-step lookahead tree where X_t(j) is the set of states at the j-th level of the tree that was calculated after the t-th event, with j = 1, ..., N and t = 0, 1, 2, ....
    for all states in X_t(N) do
        if conservative attitude then
            The cost of this state is ∞.
        else if optimistic attitude then
            if x is an illegal state then
                The cost of this state is ∞.
            else
                The cost of this state is 0.
    for each layer, j, recursively back through the lookahead tree do
        for all states in X_t(j) do
            if conservative attitude and this state's cost was 0 in the last computed tree then
                The cost of this state is 0.
            else if optimistic attitude and this state's cost was ∞ in the last computed tree then
                The cost of this state is ∞.
            else if this state is marked, or this state is transient and there is a controlled action that leads to a state with cost 0; and no uncontrollable action leads to a state with cost ∞ then
                The cost of this state is 0.
            else
                The cost of this state is ∞.
        if the costs of all states in this layer are 0 then
            Set the cost of all states in preceding layers to 0 and terminate.
        else if the costs of all states in this layer match their costs in the last computed tree then
            Set the cost of all states in preceding layers to their costs in the last computed tree.
    if the cost of the initial state is 0 then
        return the least restrictive control policy.
    else
        return run-time error.

Algorithm adapted from Chung et al. To begin, the states in the N-th layer of the tree have not been seen and are labelled according to the supervisor's attitude. Finally, if the cost of the initial state is 0, then the least restrictive control policy can be found using the "cost-to-go" values; if the cost of the initial state is ∞, then a run-time error has occurred.

This recursive calculation is based on converting the LLP control problem to a 0/∞ optimal control problem.
In this problem, the cost of a policy $g(s)$ after trace $s$ has occurred in the plant is 0 if $s$ is legal, if $g(s)$ permits all uncontrollable actions that can occur in the plant after the trace $s$, and if $g(s)$ is non-empty whenever $s$ is a prefix of a marked trace but not marked itself. All other policy/trace combinations are assigned a cost of $\infty$. This setup allows a dynamic programming approach to be used and makes it simple to apply when costs other than 0 and $\infty$ are desired.

Chung et al. also extended their work by introducing variable lookahead policies (VLP) and the undecided cost, $U$ (Chung et al.). This cost is added to the optimal control framework that was introduced for recursively computing the lookahead tree, and is defined so that the following ordering exists amongst costs (Chung et al.):

$$0 < U < \infty.$$

The power of the "undecided" cost is that pending strings can be assessed truthfully as opposed to arbitrarily labelling them based on supervisor attitude. This cost and its place in the ordering of costs reflect that an undecided string is not as desirable as a string with cost 0, but it is certainly more desirable than a string with cost $\infty$.

Additional refinements presented included allowing the size of the lookahead window to be unbounded and again making intelligent reuse of previously computed trees unless the underlying plant has varied in the meantime. The authors presented experimental results from simulations and concluded that VLP supervisors expand significantly less of the lookahead tree than traditional LLP supervisors, that even a bounded VLP supervisor reduces the uncertainty in control policies, and that the unbounded VLP supervisor is the best supervisor on average (Chung et al.).

Algorithm: VLP supervisor labels pending string, s
Input: A pending string s that leads from the plant's current state to another state in the N-step lookahead tree.
    if |s| = N then
        if s is not a prefix of a legal string then
            Cost-to-go of s is ∞
        else
            Cost-to-go of s is U
    else if s leads to a marked, controllable state then
        Cost-to-go of s is 0
    else if s is not a prefix of a legal string then
        Cost-to-go of s is ∞
    else
        if an uncontrollable event can occur then
            Cost-to-go of s is the maximum cost-to-go that occurs if an uncontrollable event follows s
        else
            Cost-to-go of s is the minimum cost-to-go that occurs if a controllable event follows s

Algorithm adapted from Chung et al. Similar to the calculations in the recursive LLP algorithm, a cost is assigned directly to s if it leads to the edge of the lookahead tree. If the string leads to a state earlier on in the lookahead tree, though, then the costs-to-go of future states are used to determine the cost-to-go of the current state.
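The labelling rule for pending strings can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the plant is a hypothetical dictionary-encoded automaton, legality is decided per state ("not a prefix of a legal string" is approximated by membership in an `illegal` set), and an unmarked dead end is given infinite cost. The `resolve` helper shows one way to read the relationship with the two LLP attitudes, which replace the undecided cost by a fixed label.

```python
from enum import IntEnum
from typing import Dict, Set

Plant = Dict[str, Dict[str, str]]  # hypothetical encoding: state -> {event: next state}


class Cost(IntEnum):
    ZERO = 0
    UNDECIDED = 1  # stands for U; only the ordering 0 < U < infinity matters
    INFINITE = 2


def cost_to_go(plant: Plant, state: str, horizon: int, uncontrollable: Set[str],
               illegal: Set[str], marked: Set[str], depth: int = 0) -> Cost:
    """Cost-to-go of the string that reached `state` after `depth` events."""
    if depth == horizon:
        # Edge of the window: the string is pending unless it is already illegal.
        return Cost.INFINITE if state in illegal else Cost.UNDECIDED
    events = plant.get(state, {})
    forced = [target for event, target in events.items() if event in uncontrollable]
    if state in marked and state not in illegal and not forced:
        return Cost.ZERO  # marked state with only controllable events enabled
    if state in illegal:
        return Cost.INFINITE

    def below(target: str) -> Cost:
        return cost_to_go(plant, target, horizon, uncontrollable, illegal, marked, depth + 1)

    if forced:
        # Uncontrollable events cannot be disabled, so assume the worst of them.
        return max(below(target) for target in forced)
    # Only controllable events remain, so the supervisor may pick the best one.
    # Assumption: an unmarked state with no events at all counts as infinite.
    return min((below(target) for target in events.values()), default=Cost.INFINITE)


def resolve(cost: Cost, attitude: str) -> Cost:
    """Map U to a fixed label, as a conservative or optimistic LLP would."""
    if cost is not Cost.UNDECIDED:
        return cost
    return Cost.INFINITE if attitude == "conservative" else Cost.ZERO


# Toy plant (made up): a, b controllable; u uncontrollable; x3 illegal; x4 marked.
plant = {"x0": {"a": "x1", "b": "x2"}, "x1": {"u": "x3"}, "x2": {"a": "x4"}}
sigma_uc, illegal, marked = {"u"}, {"x3"}, {"x4"}
```

In the toy plant a window of two events leaves the initial state undecided, because the marked state lies on the edge of the window, whereas a window of three events settles its cost at 0.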

Work to date had assumed that little to no information was available regarding the underlying plant. Although this allows easy generalization, it also ignores useful information (Hadj-Alouane et al.).

To address this, Hadj-Alouane et al. presented variable lookahead policies with state information (VLP-S), which allows reduced computation whenever states appear in multiple locations in the tree (Hadj-Alouane et al.). The VLP-S supervisor constructs the control policy for the current state by allowing all uncontrollable events that can occur in the plant and then any controllable actions that lead to states with a cost-to-go of 0.

Algorithm: Online computation of control policy by VLP-S
Input: Initial state x and underlying plant G
    Add all uncontrollable events that can occur in state x to the control policy
    if we don't know the cost-to-go for x then
        Calculate the cost-to-go of x using the VLP-S cost-to-go algorithm
    if the cost-to-go of x is not ∞ then
        for all controllable actions σ that can occur in state x do
            if we don't know the cost-to-go for δ(x, σ) then
                Calculate the cost-to-go of δ(x, σ) using the VLP-S cost-to-go algorithm
            if the cost-to-go of δ(x, σ) is 0 then
                Add σ to the control policy
    return the control policy

Algorithm adapted from Hadj-Alouane et al. To calculate a state's cost-to-go, the lookahead window is first constructed by recursively expanding the plant G from the current state x. This can be done with customizable stopping conditions to ensure that the procedure halts and that states are only expanded if necessary. Once this is done, all of the illegal states, blocking states, and states that lead uncontrollably to these are assigned the cost-to-go ∞. Any remaining states are assigned the cost-to-go 0, which allows the cost-to-go of the initial state to be returned.
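The policy computation and the cost-to-go labelling it relies on can be sketched together. The following is an illustration under simplifying assumptions rather than the method of Hadj-Alouane et al.: the subplant is expanded exhaustively (no stopping conditions), the plant is a hypothetical dictionary-encoded automaton with state-based legality, and a state counts as blocking when it cannot reach a marked state through states that still have finite cost. The `known` dictionary plays the role of the reusable state information.

```python
from typing import Dict, Set

INF = float("inf")
Plant = Dict[str, Dict[str, str]]  # hypothetical encoding: state -> {event: next state}


def reachable(plant: Plant, start: str) -> Set[str]:
    """States reachable from `start` (exhaustive expansion of the subplant)."""
    seen, stack = {start}, [start]
    while stack:
        for target in plant.get(stack.pop(), {}).values():
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def costs_from(plant: Plant, start: str, uncontrollable: Set[str],
               illegal: Set[str], marked: Set[str]) -> Dict[str, float]:
    """Cost-to-go (0 or INF) of every state in the subplant rooted at `start`."""
    states = reachable(plant, start)
    bad = states & illegal
    changed = True
    while changed:
        changed = False
        # States that can be dragged uncontrollably into a bad state are bad.
        for x in states - bad:
            if any(e in uncontrollable and t in bad for e, t in plant.get(x, {}).items()):
                bad.add(x)
                changed = True
        # Blocking states: good states that cannot reach a marked good state.
        good = states - bad
        live = good & marked
        grew = True
        while grew:
            grew = False
            for x in good - live:
                if any(t in live for t in plant.get(x, {}).values()):
                    live.add(x)
                    grew = True
        if good - live:
            bad |= good - live
            changed = True
    return {x: (INF if x in bad else 0) for x in states}


def control_policy(plant: Plant, x: str, uncontrollable: Set[str], illegal: Set[str],
                   marked: Set[str], known: Dict[str, float]) -> Set[str]:
    """Events enabled at state x; `known` caches costs across calls."""
    events = plant.get(x, {})
    policy = {e for e in events if e in uncontrollable}
    if x not in known:
        known.update(costs_from(plant, x, uncontrollable, illegal, marked))
    if known[x] != INF:
        for event, target in events.items():
            if event in uncontrollable:
                continue
            if target not in known:
                known.update(costs_from(plant, target, uncontrollable, illegal, marked))
            if known[target] == 0:
                policy.add(event)
    return policy


# Toy plant (made up): a, b controllable; u uncontrollable; x3 illegal; x4 marked.
plant = {"x0": {"a": "x1", "b": "x2"}, "x1": {"u": "x3"}, "x2": {"a": "x4"}}
sigma_uc, illegal, marked = {"u"}, {"x3"}, {"x4"}
cache: Dict[str, float] = {}
```

After the first call at `x0`, the cache already holds the cost of every state reachable from it, so later calls at those states skip the expansion entirely; this is the saving that state information is meant to provide.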

Algorithm: Calculating the cost-to-go of a state with VLP-S
Input: State x and underlying plant G
    Recursively expand the subplant of G from the state x
    Assign all illegal states the cost-to-go ∞
    do
        Assign all states that lead uncontrollably to a state with cost-to-go ∞ the same cost-to-go
        Assign all blocking states the cost-to-go ∞
    while any states were assigned a cost in the loop body
    Assign all uncosted states the cost-to-go 0
    return the cost-to-go for state x

Algorithm adapted from Hadj-Alouane et al. In the online policy computation, the steps that test whether a cost-to-go is already known are of particular note: the supervisor checks to see if the cost-to-go of a given state can be reused from a previous computation, which allows previously computed costs-to-go to be reused in later iterations.

The last direct extension that we discuss is Hadj-Alouane et al.'s work to enable variable lookahead policies to consider plants under partial observation (VLP-PO). The authors present this as an extension of their previous work, VLP-S, to the specific case where the legal language is prefix-closed (Hadj-Alouane et al.).

Since DES problems with partial observation do not have supremal controllable and observable sublanguages, these policies instead generate a maximal controllable and observable sublanguage of the legal language on the basis of an ordering of the controllable events that is supplied as an argument (Hadj-Alouane et al.). Varying this ordering can lead to a different maximal sublanguage; the authors highlight that if this ordering is done with care then a desirable maximal sublanguage can be obtained, depending on the domain-specific meaning of "desirable."

Only minimal changes are necessary: first, the set of states that the plant could be in is computed based on the last observable event; then, the control action is computed in the manner of VLP-S. These computations are adapted to the partially observed nature of the plant and are made with respect to a global ordering provided for all events. Finally, the set of states that the plant could be in until the next observable event is computed, which will be an argument the next time that VLP-PO is called.

The paper also presented a distributed version of VLP-PO and showed that it produced the same policies as the sequential VLP-PO. Both versions were demonstrated on a resource allocation problem, which allowed the authors to estimate that the distributed version would require a tenth of the run-time required by the sequential VLP-PO.

Many other authors have contributed to the literature on limited lookahead policies, building on Chung et al.'s broad conceptual base. We organize these into four categories (see the table) and cover them in the following sections.

Different Underlying Plants

• Modifying an optimal controller to provide control for a partially observed system.
• Decentralized and modular plants.
• Probabilistic DES.
• Hybrid systems.
• Finite State Machines with Variables.
• Partially observed Petri Nets with forbidden states.
• Fault-tolerant supervisors.

Different Online Supervisors

• No explicit calculation of the supremal controllable sublanguage.
• Near-optimal control of dynamic DES.
• Enacting robust control.
• Learning optimal LLP control using reinforcement learning.

Different Calculations

• Extending rather than truncating traces.
• Estimating the lookahead tree's state-space size.

Applications

• Robots: navigation and task allocation.
• Software: fault detection and enforcing concurrency restraints.

Table: An overview of the literature investigating and applying LLP supervisors.

Heymann and Lin noted that synthesizing supervisors for partially observed plants is an NP-hard problem and presented a process for modifying an optimal controller of the original, fully-observed system in an online fashion to provide control for a partially observed system (Heymann and Lin). This process runs in $O(n)$ time, where $n$ is the number of states contained in the plant composed with the legal specification. This was improved upon by Ushio, whose method always produces maximal controllable and observable sublanguages (Ushio).

For complex plants resulting from $n$ disjoint subsystems, Minhas and Wonham showed how modular calculations could be used to determine conditions on the size of the lookahead window (Minhas and Wonham). Online decentralized supervisors can be synthesized for these plants even if their total structure is unknown beforehand (Dai and Lin).

The underlying plant might also be probabilistic. Winacott and Rudie compared the performance of an LLP supervisory control approach named Recursive Utility Based Limited Lookahead (RUBLL) in this setting against the optimal solution derived using Markov Chain analysis. They concluded that RUBLL's performance was consistent with the optimal solution but was more readily applied to scenarios where exhaustive approaches are infeasible (Winacott and Rudie).

For hybrid systems, such as a switching circuit, Dupuis and Fan compared an LLP supervisor's performance with that of an evolved finite state controller. They concluded that the LLP supervisor was better adapted to systems with slow dynamics and a vector field that flows towards the target, although their study considered only one lookahead depth for the LLP supervisor (Dupuis and Fan).

Finally, for partially observed Petri Nets, Ru et al. synthesized supervisors that allow control to be effected for any finite set of forbidden states (Ru et al.).

Gu et al. propose a method for online synthesis of DES supervisors that divides the strings in a supervisor's lookahead tree into three types. Type I strings are those that include forbidden states and/or forbidden event strings; Type II strings include neither forbidden states nor forbidden event strings; and Type III strings include forbidden event strings when the past behaviour of the plant is considered. Strings of Types I and III are illegal, while those of Type II need to be tested for legality. The authors propose such a test based on the controllability of the related trace and claim that their method "realizes optimal supervisory control without calculating the supremal controllable sublanguage" (Gu Tianlong et al.).

An optimal control framework is natural to consider if the supervisor must maximize the reward gained instead of guaranteeing legal plant behaviour. Grigorov and Rudie proposed a method to effect (near-)optimal control of dynamic DES, considering aspects such as system reliability and normalizing for risk. (A dynamic DES is one where the structure of the underlying plant is itself time-varying.) A key takeaway was that looking too far into the future and not looking far enough can both cause problems when planning (Grigorov and Rudie).

Zhao et al. applied limited lookahead to deterministic finite state machines with variables and showed how they could enforce safety for a power grid in an online manner (Zhao et al.).

Boroomand and Hashtrudi-Zad demonstrated that their Robust Limited Lookahead (RLL) supervisor could enact nonblocking robust control by taking a conservative attitude towards pending strings in the lookahead tree. They also showed that RLL resulted in maximally permissive policies if the lookahead window was of size $N_{nnf}$, the length of the longest neighbouring nonblocking frontier trace (Boroomand and Hashtrudi-Zad).

In order to allow for supervisors to adapt as they enact control, Umemoto and Yamasaki showed that a supervisor could learn an optimal LLP by using reinforcement learning. Specifically, for a non-stationary environment the LLP supervisor is able to learn the state transition probabilities, expected rewards and expected costs of enacting control. The authors conducted simulations that showed how different learning rates trade convergence rate for asymptotic performance (Umemoto and Yamasaki).

For situations where uncontrolled faults can occur in the plant, Dai et al. considered whether limited lookahead could be used to achieve fault-tolerance in supervisors. Their proposed method contained three parts: a learned nominal supervisor; a learned fault detector; and a post-fault supervisor. Necessary and sufficient conditions are presented for the existence of a satisfactory post-fault supervisor (Dai et al.).

Kumar et al. approached the idea of LLP from a notion of extending rather than truncating, as Chung et al. and most other surveyed works had done. The result of this approach is an extension-based Limited Lookahead (ELL) supervisor, which estimates the plant behaviour by adding all finite-length event sequences to the N-step projection of plant behaviour. The key benefit of this approach is that it obviates the need for the supervisor to take an attitude by ensuring that no pending strings occur, while requiring the same order of computation as the traditional LLP supervisor (Kumar et al.).

Addressing a topic of practical concern, Winacott et al. presented and analysed a method for estimating the state-space size for an LLP supervisor's lookahead tree. This method is based on the underlying plant's adjacency matrix and depends on estimating a parameter $\tau$ that captures the state space's size (Winacott et al.).

Besides theoretical developments, there have also been a number of works applying LLP theory to specific problem domains.

For robotics, Kobayashi and Ushio applied LLP to a Petri net model of mobile robots navigating through a building (Kobayashi and Ushio), while Tsalatsanis et al. used LLP to dynamically allocate tasks to a team of robots (Tsalatsanis et al.).

For software applications, Zhao et al. used limited lookahead DES to actively monitor software in order to predict faults and avoid them before they occurred (Zhao et al.). Auer et al. considered the problem of multithreaded applications and applied limited lookahead to guarantee that concurrency restraints were enforced (Auer et al.).

10 CONCLUSION

In this tutorial we introduced the basic terminology required when discussing LLP supervisors. Although the underlying ideas for LLP supervisors are intuitive, we presented Chung et al.'s formal proofs to rigorously confirm their correctness. We demonstrated that both the optimistic and conservative attitudes can be used to produce valid supervisors, as well as the conditions under which this is the case.

The ATC problem illustrated Chung et al.'s ideas throughout the tutorial, including: characterizing the underlying problem, recognizing when the problem did not allow for LLP control, and highlighting that the required window size may vary based on the supervisor's attitude. The ATC problem also illuminated some possible routes for developing the theory of LLP supervision, including the idea that some plants may exhibit a kind of orbital stability. In such cases it would be useful to bound the window size $N$ which guarantees that the supervisor can prevent illegal behaviour and ensure that each of its subplants reaches its respective marked states in finite time, without requiring that the plant as a whole reaches its marked state.

Finally, we reviewed extensions to Chung et al.'s original paper, many by the original authors, as well as applications of this theory to a wide variety of specific problems and problem classes. We believe that this tutorial will enable other DES practitioners to make use of LLP supervisors when appropriate in their own work.

ACKNOWLEDGMENTS

The authors acknowledge that Queen's University is situated on traditional Anishinaabe and Haudenosaunee Territory.

All DFA figures were produced using the Integrated Discrete Event Systems (IDES) software (Rudie). IDES is available under an AGPL open source license at https://github.com/krudie/IDES.

This research was supported by the Natural Sciences and Engineering Research Council of Canada as well as the Faculty of Engineering and Applied Science at Queen's University.

REFERENCES

Anthony Auer, Juergen Dingel, and Karen Rudie. Concurrency control generation for dynamic threads using discrete-event systems. Science of Computer Programming.

Farzam Boroomand and Shahin Hashtrudi-Zad. A Limited Lookahead Policy in Robust Nonblocking Supervisory Control of Discrete Event Systems. In Proceedings of the American Control Conference. IEEE, Jun.

Sheng-Luen Chung, Stéphane Lafortune, and Feng Lin. Addendum to "Limited Lookahead Policies in Supervisory Control of Discrete Event Systems": Proofs of Technical Results (Report No. CGR- - ). Technical report, College of Engineering, University of Michigan, Ann Arbor, Michigan, a.

Sheng-Luen Chung, Stéphane Lafortune, and Feng Lin. Limited lookahead policies in supervisory control of discrete event systems. IEEE Transactions on Automatic Control, b.

Sheng-Luen Chung, Stéphane Lafortune, and Feng Lin. Recursive computation of limited lookahead supervisory controls for discrete event systems. Discrete Event Dynamic Systems: Theory and Applications, Mar.

Sheng-Luen Chung, Stéphane Lafortune, and Feng Lin. Supervisory control using variable lookahead policies. Discrete Event Dynamic Systems: Theory and Applications.

Jin Dai and Hai Lin. A learning-based synthesis approach to decentralized supervisory control of discrete event systems with unknown plants. Control Theory and Technology.

Jin Dai, Ali Karimoddini, and Hai Lin. Achieving fault-tolerance and safety of discrete-event systems through learning. Proceedings of the American Control Conference, July.

Jean François Dupuis and Zhun Fan. Comparing an evolved finite state controller for hybrid system to a lookahead design. IEEE Congress on Evolutionary Computation.

Lenko Grigorov and Karen Rudie. Near-Optimal Online Control of Dynamic Discrete-Event Systems. Discrete Event Dynamic Systems, Dec.

Gu Tianlong, Gao Jinchang, and Zhou Chunhui. On-line synthesis of supervisors for discrete events in automated manufacturing systems. In Proceedings of the IEEE International Conference on Industrial Technology (ICIT), Shanghai, China. IEEE.

Nejib Ben Hadj-Alouane, Stéphane Lafortune, and Feng Lin. Variable lookahead supervisory control with state information. IEEE Transactions on Automatic Control.

Nejib Ben Hadj-Alouane, Stéphane Lafortune, and Feng Lin. Centralized and distributed algorithms for on-line synthesis of maximal control policies under partial observation. Discrete Event Dynamic Systems, Oct.

Michael Heymann and Feng Lin. On-line control of partially observed discrete event systems. Discrete Event Dynamic Systems: Theory and Applications, Jul.

Keigo Kobayashi and Toshimitsu Ushio. An application of LLP supervisory control with Petri net models in mobile robots. In IEEE International Conference on Systems, Man and Cybernetics, Nashville, USA. IEEE.

Ratnesh Kumar, Hok M. Cheung, and Steven I. Marcus. Extension based limited lookahead supervision of discrete event systems. Automatica, Nov.

Rajinderjeet Minhas and W.M. Wonham. Online supervision of discrete event systems. In Proceedings of the American Control Conference, Denver, USA. IEEE.

Yu Ru, Maria Paola Cabasino, Alessandro Giua, and Christoforos N. Hadjicostis. Supervisor synthesis for discrete event systems under partial observation and arbitrary forbidden state specifications. Discrete Event Dynamic Systems: Theory and Applications.

Karen Rudie. The Integrated Discrete-Event Systems Tool. International Workshop on Discrete Event Systems.

Athanasios Tsalatsanis, Ali Yalcin, and Kimon P. Valavanis. Dynamic task allocation in cooperative robot teams. Robotica.

Hijiri Umemoto and Tatsushi Yamasaki. Optimal LLP Supervisor for Discrete Event Systems Based on Reinforcement Learning. In IEEE International Conference on Systems, Man, and Cybernetics, Kowloon, China, Oct. IEEE.

Toshimitsu Ushio. On-Line Control of Discrete Event Systems with a Maximally Controllable and Observable Sublanguage. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences.

Creag Winacott and Karen Rudie. Limited lookahead supervisory control of probabilistic discrete-event systems. In Annual Allerton Conference on Communication, Control, and Computing, Monticello, USA. IEEE.

Creag Winacott, Behnam Behinaein, and Karen Rudie. Methods for the estimation of the size of lookahead tree state-space. Discrete Event Dynamic Systems, Jun.

Changzhi Zhao, Wei Dong, and Zhichang Qi. Active Monitoring for Control Systems under Anticipatory Semantics. In International Conference on Quality Software, Zhangjiajie, China, Jul. IEEE.

Junhui Zhao, Yi Liang Chen, Zhong Chen, Feng Lin, Caisheng Wang, and Hongwei Zhang. Modeling and control of discrete event systems using finite state machines with variables and their applications in power grids. Systems and Control Letters.