A String Diagrammatic Axiomatisation of Finite-State Automata
arXiv preprint [cs.FL]
Robin Piedeleu and Fabio Zanasi
University College London, UK, {r.piedeleu, f.zanasi}@ucl.ac.uk
Abstract.
We develop a fully diagrammatic approach to the theory of finite-state automata, based on reinterpreting their usual state-transition graphical representation as a two-dimensional syntax of string diagrams. In this setting, we are able to provide a sound and complete equational theory for language equivalence, with two notable features. First, the proposed axiomatisation is finite, a result which is provably impossible to obtain for the one-dimensional syntax of regular expressions. Second, the Kleene star is a derived concept, as it can be decomposed into more primitive algebraic blocks.
Keywords: string diagrams · finite-state automata · symmetric monoidal category · complete axiomatisation
Finite-state automata are one of the most studied structures in theoretical computer science, with an illustrious history and roots reaching far beyond, in the work of biologists, psychologists, engineers and mathematicians. Kleene [25] introduced regular expressions to give finite-state automata an algebraic presentation, motivated by the study of (biological) neural networks [32]. They are the terms freely generated by the following grammar:

\[ e, f ::= e + f \mid ef \mid e^{*} \mid 0 \mid 1 \mid a \in A \tag{1} \]

Equational properties of regular expressions were studied by Conway [14] who introduced the term Kleene algebra: this is an idempotent semiring with an operation (−)∗ for iteration, called the (Kleene) star. The equational theory of Kleene algebra is now well-understood, and multiple complete axiomatisations, both for language and relational models, have been given. Crucially, Kleene algebra is not finitely-based: no finite equational theory can appropriately capture the behaviour of the star [35]. Instead, there are purely equational infinitary axiomatisations [28,4] and Kozen’s finitary implicational theory [26].

Since then, much research has been devoted to extending Kleene algebra with additional operations, in order to capture richer patterns of behaviour, useful in program verification. Examples include conditional branching (Kleene algebra with tests [27], and its recent guarded version [37]), concurrent computation (CKA [19,23]), and specification of message-passing behaviour in networks (NetKAT [1]).
The meta-theory of the formalisms above essentially rests on the same three ingredients: (1) given an operational model (e.g., finite-state automata), (2) devise a syntax (regular expressions) that is sufficiently expressive to capture the class of behaviours of the operational model (regular languages), and (3) find a complete axiomatisation (Kleene algebra) for the given semantics.

In this paper, we open up a direct path from (1) to (3). Instead of thinking of automata as a combinatorial model, we formalise them as a bona-fide (two-dimensional) syntax, using the well-established mathematical theory of string diagrams and monoidal categories [36]. This approach lets us axiomatise the behaviour of automata directly, freeing us from the necessity of compressing them down to a one-dimensional notation like regular expressions.

This perspective not only sheds new light on a venerable topic, but has significant consequences. First, as our most important contribution, we are able to provide a finite and purely equational axiomatisation of finite-state automata, up to language equivalence. Intriguingly, this does not contradict the impossibility of finding a finite basis for Kleene algebra, as the algebraic setting is different: our result gives a finite presentation as a symmetric monoidal category, while the impossibility result prevents any such presentation from existing as an algebraic theory (in the standard sense). In other words, there is no finite axiomatisation based on terms (tree-like structures), but we demonstrate that there is one based on string diagrams (graph-like structures).

Secondly, embracing the two-dimensional nature of automata guarantees a strong form of compositionality that the one-dimensional syntax of regular expressions does not have. In the string diagrammatic setting, automata may have multiple inputs and outputs and, as a result, can be decomposed into subcomponents that retain a meaningful interpretation.
For example, if we split the automaton below left, the resulting components are still valid string diagrams within our syntax, below right:

[Diagram (2): an automaton with a- and b-labelled transitions (left) and the string diagrams obtained by splitting it (right)]

In line with the compositional approach, it is significant that the Kleene star can be decomposed into more elementary building blocks (which come together to form a feedback loop):

[Diagram (3): the action of e∗ (left) and its decomposition as a feedback loop around the action of e (right)]

This property opens up interesting possibilities when studying extensions of Kleene algebra within the same approach. We elaborate further on this in the Discussion (Section 6).
Finally, we believe our proof of completeness is of independent interest, as it relies on a fully diagrammatic reformulation of Brzozowski’s minimisation procedure [12]. In the string diagrammatic setting, the symmetries of the equational theory give this procedure a particularly elegant and simple form. Because all of the axioms involved in the determinisation procedure come with a dual, a co-determinisation procedure can be defined immediately by simply reversing the former. This reduces the proof of completeness to determinisation.

We should also note that this is not the first time that automata and regular languages are recast into a categorical mould. The iteration theories [6] of Bloom and Ésik, the sharing graphs [17] of Hasegawa and the network algebras [39] of Stefanescu are all categorical frameworks designed to reason about iteration or recursion, and have found fruitful applications in this domain. They are based on a notion of parameterised fixed-point operation which defines a categorical trace in the sense of [22]. While our proposal bears resemblance to (and is inspired by) this prior work, it goes beyond it in one fundamental aspect: it is the first to give a finite complete axiomatisation of automata up to language equivalence.

A second difference is methodological: our syntax (see (4) below) does not feature any primitive for iteration or recursion. In particular, the Kleene star is a derived concept, in the sense that it is decomposable into more elementary operations (3). Categorically, our starting point is a compact-closed rather than a traced category.

We elaborate further on the relation between our paper and existing work in the Discussion (Section 6).
Outline.
Section 2 lays out the diagrammatic syntax and its semantics. Section 3 introduces the equational theory that we will prove complete. In Section 4 we explain the precise link between our syntax and the traditional formalisms of regular expressions and finite-state automata. We also show how a simple change to the syntax captures context-free languages. Section 5 is dedicated to the proof of completeness. We rely on a normal form argument, which implements Brzozowski’s minimisation algorithm equationally and whose main ingredient is a determinisation procedure for diagrams. Omitted proofs can be found in the Appendix.
Syntax.
Let us fix an alphabet Σ of letters a ∈ Σ. We call Aut_Σ the symmetric strict monoidal category freely generated by the following objects and morphisms:
– three generating objects, a red ◮ (‘action’), a black ◮ (‘right’) and a black ◭ (‘left’), with their identity morphisms depicted respectively as
[Diagram (4): three wires, one red and two black with opposite directions]
– the following generating morphisms, depicted as string diagrams [36]:
[Diagram (5): the red and black generators, including one generator a for each a ∈ Σ]

Freely generating Aut_Σ from these data (usually called a symmetric monoidal theory [42,11]) means that morphisms of Aut_Σ will be the string diagrams obtained by pasting together (by sequential composition and monoidal product in Aut_Σ) the basic components in (4)-(5), and then quotienting by the laws of symmetric monoidal categories. For instance, (3) is a morphism of Aut_Σ of type ◮ → ◮, and [diagram] is one of type ◮◮ ◮ → ◮.
Semantics.
We first define the semantics for string diagrams simply as a function, and then discuss how to extend it to a functor from Aut_Σ to another category. Our interpretation maps generating morphisms to relations between regular expressions and languages over Σ (each generator is named in words below, e, f range over RegExp and L, K, K_1, K_2 over subsets of Σ∗):

\[
\begin{aligned}
\llbracket \text{red wire} \rrbracket &= \{ (e, e) \mid e \in \mathsf{RegExp} \} \\
\llbracket \text{star} \rrbracket &= \{ (e, e^{*}) \mid e \in \mathsf{RegExp} \} \\
\llbracket \text{red copy} \rrbracket &= \{ (e, (e, e)) \mid e \in \mathsf{RegExp} \} \\
\llbracket \text{red delete} \rrbracket &= \{ (e, \bullet) \mid e \in \mathsf{RegExp} \} \\
\llbracket \text{product} \rrbracket &= \{ ((e, f), ef) \mid e, f \in \mathsf{RegExp} \} \\
\llbracket 1 \rrbracket &= \{ (\bullet, 1) \} \\
\llbracket a \rrbracket &= \{ (\bullet, a) \} \\
\llbracket \text{sum} \rrbracket &= \{ ((e, f), e + f) \mid e, f \in \mathsf{RegExp} \} \\
\llbracket 0 \rrbracket &= \{ (\bullet, 0) \} \\
\llbracket \text{black copy} \rrbracket &= \{ (L, (K_1, K_2)) \mid L \subseteq K_i,\ i = 1, 2 \} \\
\llbracket \text{black delete} \rrbracket &= \{ (L, \bullet) \mid L \subseteq \Sigma^{*} \} \\
\llbracket \text{bent wire} \rrbracket &= \{ (\bullet, (L, K)) \mid L \subseteq K \} \\
\llbracket \text{right wire} \rrbracket &= \{ (L, K) \mid L \subseteq K \} \\
\llbracket \text{left wire} \rrbracket &= \{ (L, K) \mid K \subseteq L \} \\
\llbracket \text{action} \rrbracket &= \{ ((e, L), K) \mid \llbracket e \rrbracket_R L \subseteq K \}
\end{aligned}
\tag{6}
\]

and the converse relations for the mirror black generators. In (6), the semantics ⟦e⟧_R ⊆ Σ∗ of a regular expression e ∈ RegExp is defined inductively on e (see (1)), in the standard way:

\[
\begin{aligned}
\llbracket e + f \rrbracket_R &= \llbracket e \rrbracket_R \cup \llbracket f \rrbracket_R &
\llbracket ef \rrbracket_R &= \{ vw \mid v \in \llbracket e \rrbracket_R,\ w \in \llbracket f \rrbracket_R \} \\
\llbracket 1 \rrbracket_R &= \{ \varepsilon \} &
\llbracket 0 \rrbracket_R &= \emptyset \\
\llbracket a \rrbracket_R &= \{ a \} &
\llbracket e^{*} \rrbracket_R &= \bigcup_{n \in \mathbb{N}} \llbracket e \rrbracket_R^{n}
\end{aligned}
\]

where ⟦e⟧_R^{n+1} := ⟦e⟧_R · ⟦e⟧_R^n and ⟦e⟧_R^0 = {ε}. The semantics highlights the different roles played by red and black generators. In a nutshell, red generators stand for regular expressions (the sum, 0, the product, 1, the Kleene star, and the letters a of Σ), and black generators for operations on the set of languages (copy, delete, and the bent wires, which feed back outputs into inputs, in a way made more precise later). These two perspectives, which are usually merged, are kept distinct in our approach and only allowed to communicate via the action generator, which represents the product action of regular expressions (the red wire) on languages.

In order for this mapping to be functorial from Aut_Σ, we now introduce a suitable target semantic category. Interestingly, this will not be the category Rel of sets and relations: indeed, the identity morphisms on the two black wires are not interpreted as identities of Rel. Instead, the semantic domain will be the category Prof_B of Boolean(-enriched) profunctors [15] (also called in the literature relational profunctors [20] or weakening relations [33]).
Definition 1.
Given two preorders (X, ≤_X) and (Y, ≤_Y), a Boolean profunctor R : X → Y is a relation R ⊆ X × Y such that if (x, y) ∈ R and x′ ≤_X x, y ≤_Y y′ then (x′, y′) ∈ R. Preorders and Boolean profunctors form a symmetric monoidal category Prof_B with composition given by relational composition, where the identity for an object (X, ≤_X) is the order relation ≤_X itself, and where the monoidal product is the usual product of preorders.

The rich features of our diagrammatic language are reflected in the profunctor interpretation. Indeed, the order relation is built into the two black wires. The two possible directions represent the identities on the ordered set of languages and the same set with the reversed order, respectively. The additional red wire represents the set RegExp of regular expressions, with equality as the associated order relation. It is clear that all monochromatic generators satisfy the condition of Definition 1. Similarly, the action generator is a Boolean profunctor: if ((e, L), K) is such that ⟦e⟧_R L ⊆ K and L′ ⊆ L, K ⊆ K′ then we have ⟦e⟧_R L′ ⊆ ⟦e⟧_R L ⊆ K ⊆ K′ by monotonicity of the product of languages. We can conclude the following.

Proposition 1.
The semantics ⟦·⟧ defines a symmetric monoidal functor of type Aut_Σ → Prof_B.

In particular, because Aut_Σ is free, we can unambiguously assign meaning to any composite diagram from the semantics of its components using composition and the monoidal product in Prof_B. Writing c ; d for the sequential composite of c and d, and c_1 ⊗ c_2 for the monoidal product of c_1 and c_2:

\[
\begin{aligned}
\llbracket c \mathbin{;} d \rrbracket &= \{ (L, K) \mid \exists M.\ (L, M) \in \llbracket c \rrbracket,\ (M, K) \in \llbracket d \rrbracket \} \\
\llbracket c_1 \otimes c_2 \rrbracket &= \{ ((L_1, L_2), (K_1, K_2)) \mid (L_i, K_i) \in \llbracket c_i \rrbracket,\ i = 1, 2 \}
\end{aligned}
\]
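To make Definition 1 concrete, the following Python sketch checks the profunctor condition on a small finite preorder and confirms that the order relation acts as the identity for relational composition. It is an illustration under our own assumptions only: the helper names (`powerset`, `is_profunctor`, `compose`) and the toy universe {u, v}, whose subsets stand in for languages ordered by inclusion, do not appear in the paper.

```python
from itertools import combinations

def powerset(xs):
    """All subsets of xs, as frozensets (our stand-in for 'languages')."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def is_profunctor(rel, X, leq_X, Y, leq_Y):
    """Definition 1: (x, y) in rel, x2 <= x and y <= y2 imply (x2, y2) in rel."""
    return all(
        (x2, y2) in rel
        for (x, y) in rel
        for x2 in X if leq_X(x2, x)
        for y2 in Y if leq_Y(y, y2)
    )

def compose(r, s):
    """Relational composition: first r, then s."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}

# Hypothetical toy preorder: subsets of {u, v} ordered by inclusion.
P = powerset({"u", "v"})
subset = lambda a, b: a <= b

order = {(a, b) for a in P for b in P if a <= b}     # identity of Prof_B
equality = {(a, a) for a in P}                       # identity of Rel
contains_u = {(a, b) for (a, b) in order if "u" in b}  # another profunctor

print(is_profunctor(order, P, subset, P, subset))      # True
print(is_profunctor(equality, P, subset, P, subset))   # False
print(compose(order, contains_u) == contains_u)        # True
```

The second check shows why Rel is not the right target: the equality relation on an ordered set is not closed under weakening, whereas the order relation is, and composing with it leaves any Boolean profunctor unchanged.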
Example 1.
We include here a worked out example to show how to compute the behaviour of a composite diagram which, as we will see, represents the action by concatenation of the regular language a∗.

\[
\llbracket \text{[diagram: feedback loop around the action of } a \text{]} \rrbracket = \{ (L, K) \mid \exists M, N, O \text{ s.t. } L, O \subseteq N,\ \llbracket a \rrbracket_R M \subseteq O,\ N \subseteq M, K \}
\]

where O is a variable name assigned to the top wire of the feedback loop, M to the output wire of the action node, and N is the name assigned to the wire joining [generator] to [generator]. Since ⟦a⟧_R = {a} we continue with

\[
\begin{aligned}
&= \{ (L, K) \mid \exists M, N, O \text{ s.t. } L, O \subseteq N,\ aM \subseteq O,\ N \subseteq M, K \} \\
&= \{ (L, K) \mid \exists M, O \text{ s.t. } aM \subseteq O,\ L, O \subseteq M,\ L, O \subseteq K \} \\
&= \{ (L, K) \mid \exists M \text{ s.t. } aM \subseteq M,\ L \subseteq M,\ L, M \subseteq K \} \\
&= \{ (L, K) \mid \exists M \text{ s.t. } L \cup aM \subseteq M,\ L, M \subseteq K \} \\
&= \{ (L, K) \mid \exists M \text{ s.t. } a^{*}L \subseteq M,\ L, M \subseteq K \} \\
&= \{ (L, K) \mid a^{*}L \subseteq K \}
\end{aligned}
\]

where the penultimate step is an application of Arden’s lemma [2]: a∗L is the least solution of the language inequality L ∪ aX ⊆ X.

Footnote: Note that we can always consider any set with equality as a poset and that, therefore, Rel is a subcategory of Prof_B, but not vice-versa, for the simple reason that the identity relation of an arbitrary poset in Prof_B is not mapped to the identity relation in Rel.

In Figure 1 we introduce =KAA, the (finite) equational theory of Kleene Action Algebra, on Aut_Σ. It will be later shown to be complete for the given semantics. We explain some salient features of =KAA below.
– (A1)-(A2) relate the two bent-wire generators, allowing us to bend and straighten wires at will. This makes the full subcategory of Aut_Σ on the black objects ◮ and ◭, modulo (A1)-(A2), a compact closed category [24]. (A3) allows us to eliminate isolated loops. Note that the whole category is not compact closed because the red object ◮ does not have a dual.
– The B block states that black copy and delete form a cocommutative comonoid (B1)-(B3), while their mirror images form a commutative monoid (B4)-(B6). Moreover, these four generators together form an idempotent bimonoid (B7)-(B11). (B12) allows us to eliminate trivial feedback loops.
– The C block axiomatises the behaviour of the action of regular expressions on languages. These laws mimic the usual definition of the action of a semiring on a set, except for (C5) which is novel and captures the interaction with the Kleene star. Here lies a distinctive feature of our theory: the behaviour of the Kleene star is derived from its decomposition as the feedback loop on the right of (C5).
– The D block forces the action to be a comonoid ((D1)-(D2)) and monoid ((D3)-(D4)) homomorphism.
– The E block axiomatises the purely red fragment. Remarkably, these axioms do not describe any of the actual Kleene algebra structure: they just state that red copy and delete form a cocommutative comonoid ((E1)-(E3)) and that all other red generators are comonoid homomorphisms ((E4)-(E15)). This means that the red fragment is actually the free (cartesian) algebraic theory (cf. [42,11]) on the generators for sum, 0, product, 1, star and the letters a (a ∈ Σ), where the remaining two generators act as copy and discard of variables.

Fig. 1. Equational theory =KAA of Kleene action algebra. [Figure: axioms (A1)-(A3), (B1)-(B12), (C1)-(C5), (D1)-(D4), (E1), (E2l), (E2r), (E3)-(E15), each an equation between string diagrams]

Let =KAA be the smallest equational theory containing all equations in Fig. 1. Their soundness for the chosen semantics is not difficult to show and, for space reasons, we omit the proof. We now state our completeness result, whose proof will be discussed in Section 5.
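The appeal to Arden’s lemma in Example 1, which is also the semantic content of the feedback-loop reading of the star in (C5), can be checked mechanically on words of bounded length. The sketch below is an illustration only: the tuple encoding of regular expressions, the function names and the length bound n are our own assumptions, and truncating languages to words of length at most n only approximates the semantics ⟦·⟧_R.

```python
def concat(A, B, n):
    """Concatenation of two languages, keeping words of length <= n."""
    return {v + w for v in A for w in B if len(v + w) <= n}

def lang(e, n):
    """Words of length <= n in the language of e.

    Regular expressions are nested tuples (our own encoding):
    ("0",), ("1",), ("sym", a), ("+", e, f), (".", e, f), ("*", e).
    """
    tag = e[0]
    if tag == "0":
        return set()
    if tag == "1":
        return {""}
    if tag == "sym":
        return {e[1]} if n >= 1 else set()
    if tag == "+":
        return lang(e[1], n) | lang(e[2], n)
    if tag == ".":
        return concat(lang(e[1], n), lang(e[2], n), n)
    if tag == "*":
        base, result, frontier = lang(e[1], n), {""}, {""}
        while frontier:  # add longer and longer powers until nothing new fits
            frontier = concat(base, frontier, n) - result
            result |= frontier
        return result
    raise ValueError(f"unknown constructor {tag!r}")

def least_solution(A, L, n):
    """Least X (up to length n) with L ∪ A·X ⊆ X, by iteration from ∅."""
    X = set()
    while True:
        new = {w for w in L if len(w) <= n} | concat(A, X, n)
        if new == X:
            return X
        X = new

a = ("sym", "a")
L = {"", "b"}
fixpoint = least_solution(lang(a, 4), L, 4)
print(fixpoint == concat(lang(("*", a), 4), L, 4))   # True
```

Iterating from the empty set reaches the least solution because each pass only adds words forced by the inequality; up to the bound, the result coincides with a∗L, as the lemma predicts.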
Theorem 1 (Completeness).
For morphisms d, e in Aut_Σ, d =KAA e if and only if ⟦d⟧ = ⟦e⟧.
Remark 1.
Some axiomatisations of Kleene algebra make use of a partial order between terms, which can be defined from the idempotent monoid structure: f ≤ e iff e + f = e. At the semantic level, it corresponds to inclusion of languages. Similarly, using the idempotent bimonoid structure of our equational theory, we can define a partial order on ◮ → ◮ diagrams: f ≤ e iff [diagram built from e and f] = e. This partial order structure can also be extended to all morphisms ◮^n → ◮^m by using the vertical composition of n copies of [generator] and m copies of [generator] instead.
Remark 2.
There are no specific equations relating the atomic actions a (a ∈ Σ). This is because, as we study finite-state automata, we are interested in the free monoid Σ∗ over Σ. However, nothing would prevent us from modelling other structures. For instance, free commutative monoids (powers of N), whose rational subsets correspond to semilinear sets [14, Chapter 11], would be of particular interest.

A major appeal of our approach is that both regular expressions and automata can be uniformly represented in the graphical language of string diagrams, and the translation of one into the other becomes an equational derivation in =KAA. In fact, we will see there is a close resemblance between automata and the shape of the string diagrams interpreting them, the main difference being that string diagrams are composable structures.

In this section we describe how regular expressions (resp. automata) can be encoded as string diagrams, such that their semantics corresponds in a precise way to the languages that they describe (resp. recognise).

To define these encodings, it is convenient to introduce the following syntactic sugar. For any regular expression e, one may always construct a ‘red’ string diagram e : 0 → ◮ such that ⟦e⟧ = {(•, e)}. We will write [action labelled e] for its composite with the action, as defined below left, with the particular case of a letter a ∈ Σ on the right:

[Diagram (7): definition of the e-labelled action (left) and of the a-labelled action (right)]

In a sense, regular expressions are already part of the graphical syntax, as the red generators. However, these alone are meaningless, since their image under the semantics is simply the free term algebra RegExp (see (8)). They acquire meaning as they act on the black wire, whose semantics is the set of languages over Σ. Using this action, we can inductively define an encoding ⟨−⟩ of regular expressions into string diagrams of Aut_Σ, as follows:

⟨e + f⟩ = [diagram in e, f] =KAA [diagram in e, f] by (C4)
⟨0⟩ = [diagram] =KAA [diagram] by (C3)
⟨ef⟩ = [diagram in e, f] =KAA [diagram in e, f] by (C1)
⟨1⟩ = [diagram] =KAA [diagram] by (C2)
⟨e∗⟩ = [diagram in e] =KAA [diagram in e] by (C5)
⟨a⟩ = [a-labelled action] (8)

For example,

⟨ab(a + ab)∗⟩ = [diagram] =KAA [diagram] (9)

As expected, the translation preserves the language semantics of regular expressions in a sense that the following proposition makes precise.
Proposition 2.
For any regular expression e, ⟦⟨e⟩⟧ = {(L, K) | ⟦e⟧_R L ⊆ K}.

Example (9) suggests that the string diagram ⟨e⟩ corresponding to a regular expression e looks a lot like a nondeterministic finite-state automaton (NFA) for e. In fact, the translation ⟨−⟩ can be seen as the diagrammatic counterpart of Thompson’s construction [40] that builds an NFA from a given regular expression.

Instead of starting from regular expressions, one may translate NFAs into string diagrams directly. There are at least two ways to do that. The first is to encode an NFA as the diagrammatic counterpart of its transition relation. The second is to translate directly its combinatorial representation as a graph into the diagrammatic syntax.
Encoding the transition relation.
This is a simple variant of the translation of matrices over semirings that has appeared in several places in the literature [29,42]. Let A be an NFA with set of states Q, initial state q_0 ∈ Q, accepting states F ⊆ Q and transition relation δ ⊆ Q × Σ × Q. We can represent δ as a string diagram d with |Q| incoming wires on the left and |Q| outgoing wires on the right. The j-th port on the left of d is connected to the i-th port on the right through an a-labelled action whenever (q_i, a, q_j) ∈ δ. To accommodate nondeterminism, when the same two ports are connected by several different letters of Σ, we join these using [generator] and [generator]. When (q_i, ε, q_j) ∈ δ, the two ports are simply connected via a plain identity wire. If (q_i, a, q_j) ∉ δ for every a, the two corresponding ports are disconnected.

For example, the transition relation of an NFA with three states and δ = {(q, a, q), (q, b, q), (q, a, q), (q, a, q)} [state indices lost in extraction] (disregarding the initial and accepting states for the moment) is depicted on the right. Conversely, given such a diagram, we can recover δ by collecting Σ-weighted paths from left to right ports.

[Diagram: d, with actions labelled a, b, a, a]

To deal with the initial state, we add an additional incoming wire connected to the right port corresponding to the initial state of the automaton. Similarly, for accepting states we add an additional outgoing wire, connected to the left ports corresponding to each accepting state, via [generator] if there is more than one. Finally, we trace out the |Q| wires of the diagrammatic transition relation to obtain the associated string diagram.
In other words, for an NFA with initial state q_0, set of accepting states F and transition relation δ, we obtain the string diagram on the right, where d is the diagrammatic counterpart of δ as defined above, e is the injection of a single wire as the first amongst |Q| wires, and f deletes all wires that are not associated to states in F with [generator], and applies [generator] to merge them into a single outgoing wire.

[Diagram: d with its |Q| wires traced out, together with e and f]

For example, if A with δ as above has initial state q and accepting state {q} [state indices lost in extraction], we get the diagram below left; instead, if all states are accepting, we obtain the diagram below right:

[Two diagrams, each with actions labelled a, b, a, a]
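The roles of d, e and f in this encoding can be mimicked with Boolean matrices and vectors, in the spirit of the matrices over semirings mentioned above. The following sketch is ours and makes several assumptions: states are numbered from 0, ε-transitions are not handled, and the concrete δ is a hypothetical three-state example, since the state indices of the example in the text cannot be recovered.

```python
def letter_matrix(n, delta, a):
    """Boolean n x n matrix of the a-labelled transitions (one slice of d)."""
    return [[(i, a, j) in delta for j in range(n)] for i in range(n)]

def step(vector, matrix):
    """Boolean vector-matrix product: states reachable after one more letter."""
    n = len(vector)
    return [any(vector[i] and matrix[i][j] for i in range(n)) for j in range(n)]

def accepts(n, delta, initial, accepting, word):
    """Run the word through the transition matrices between e and f."""
    vector = [i == initial for i in range(n)]      # e: inject the initial state
    for a in word:
        vector = step(vector, letter_matrix(n, delta, a))
    return any(vector[j] for j in accepting)       # f: keep accepting states

# Hypothetical transition relation with three states (indices are ours).
DELTA = {(0, "a", 1), (1, "b", 2), (2, "a", 1), (2, "a", 2)}

print(accepts(3, DELTA, 0, {2}, "ab"))   # True
print(accepts(3, DELTA, 0, {2}, "a"))    # False
```

Iterating `step` over the letters of a word plays the part of going around the traced wires once per letter; making all states accepting amounts to passing `{0, 1, 2}` as the last-but-one argument.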
The correctness of this simple translation is justified by a semantic correspondence between the language recognised by a given NFA A and the denotation of the corresponding string diagram.
Proposition 3.
Given an NFA A which recognises the language L, let d_A be its associated string diagram, constructed as above. Then ⟦d_A⟧ = {(K, K′) | LK ⊆ K′}.
From graphs to string diagrams.
The second way of translating automata into string diagrams mimics more directly the combinatorial representation of automata. The idea (which should be sufficiently intuitive not to need to be made formal here) is, for each state, to use [generator] to represent incoming edges, and [generator] to represent outgoing edges. As above, labels a ∈ Σ will be modelled using a-labelled actions. For example, the graph and the associated string diagram corresponding to the NFA above are

[Diagram (10): the automaton as a graph (left) and the associated string diagram (right)]

Note that the initial state of the automaton corresponds to the left interface of the string diagram, and the accepting state to the right interface. As before, when there are multiple accepting states, they all connect to a single right interface, via [generator]. For example, if we make all states accepting in the automaton above, we get the following diagrammatic representation:

[Diagram: the automaton with all states accepting (left) and the associated string diagram (right)]

The previous discussion shows how NFAs can be seen as string diagrams of type ◮ → ◮. The converse is also true: we now show how to extract an automaton from any string diagram d : ◮ → ◮, such that the language the automaton recognises matches the denotation of d.

In order to phrase this correspondence formally, we need to introduce some terminology. We call left-to-right those string diagrams whose domain and codomain contain only ◮, i.e. their type is of the form ◮^n → ◮^m. The idea is that, in any such string diagram, the n left interfaces act as inputs of the computation, and the m right interfaces act as outputs. For instance, (10) is a left-to-right diagram ◮ → ◮.

A string diagram d is atomic if the only red generators occurring in d are of the form a. By unfolding all red components e in any left-to-right diagram, using axioms (C1)-(C5), we can prove the following statement.
Proposition 4.
Any left-to-right diagram is =KAA-equivalent to an atomic one.
For instance, the string diagram on the left of (9) is =KAA-equivalent to the atomic one on the right.

We call block of a certain subset of generators a vertical composite of these generators followed by some permutations of the wires.
Definition 2.
A matrix-diagram is a left-to-right diagram that factors as a block of [generators], followed by a block of a-labelled actions (a ∈ Σ) and finally, a block of [generators].

To each matrix-diagram d we can associate a unique transition relation δ by gathering paths from each input to each output: (q_i, a, q_j) ∈ δ if there is an a-labelled action joining the i-th input to the j-th output. A transition relation is ε-free if it does not contain the empty word. It is deterministic if it is ε-free and, for each i and each a ∈ Σ, there is at most one j such that (q_i, a, q_j) ∈ δ. We will apply these terms to matrix-diagrams and the associated transition relation interchangeably.

[Diagram: the matrix-diagram of Section 4.2, with actions labelled a, b, a, a]

The example of Section 4.2 above, with the three blocks highlighted, is a matrix-diagram. It is ε-free but not deterministic since there are two a-labelled transitions starting from the third input.

Given a matrix-diagram d : ◮^{l+n} → ◮^{p+m}, we will write d_ij, with i = l, n and j = p, m, for the subdiagrams corresponding to the appropriate submatrices.
Definition 3.
For any left-to-right diagram d : ◮^n → ◮^m, a representation is a matrix-diagram d̂ : ◮^{l+n} → ◮^{l+m}, such that d = [diagram built from d̂ and l additional wires] and d̂_ll, d̂_nl are ε-free. It is a deterministic representation if moreover d̂_ll is deterministic.

For example, given the string diagram below on the left, the one on the right is a representation for it, whose highlighted matrix-diagram is the same as above.

[diagram] =KAA [diagram] (11)

We will refer to the associated matrix-diagram d̂ as the transition matrix of a given representation. From a ◮ → ◮ diagram with a representation d̂ : ◮^{l+1} → ◮^{l+1}, we can construct an NFA from its transition matrix d̂ as follows:
– its state set is Q = {q_1, ..., q_l}, i.e., there is one state for each wire of d̂_ll;
– its transition relation is built from d̂_ll as described above;
– its initial states Q_0 are those q_i for which there exists an index j such that the ij-th coefficient of d̂_l is non-zero (and therefore ε);
– its final states F are those q_j for which there exists an index i such that the ij-th coefficient of d̂_l is non-zero (and therefore ε).

The construction above is the inverse of that of Section 4.2. Moreover the connection between the constructed automaton and the original string diagram is summarised in the following statement, which is a straightforward corollary of Proposition 3.
Proposition 5.
For a diagram d : ◮ → ◮ with a representation d̂, let A_d̂ be the associated automaton, constructed as above. Then L̂ is the language recognised by A_d̂ iff ⟦d⟧ = {(K, K′) | L̂K ⊆ K′}.

The following proposition states that we can extract a representation from any string diagram. Combined with Proposition 5 it can also be read as a Kleene theorem for our syntax.
Proposition 6 (Kleene’s Theorem for Aut_Σ).
Any left-to-right diagram has a representation.
We established a correspondence between ◮ → ◮ diagrams and automata. What about arbitrary left-to-right diagrams ◮^n → ◮^m? Their semantics is fully characterised by a single regular language for each pair of input and output ports (see Corollary 1 below). As a result, the semantics of a given ◮^n → ◮^m diagram is fully characterised by an m × n array of languages.

It is worth pointing out how a simple modification of the diagrammatic syntax takes us one notch up the Chomsky hierarchy, leaving the realm of regular languages for that of context-free grammars and languages.

Our diagrammatic language allows us to specify systems of language equations of the form aX ⊆ Y. In this context, feedback loops can be interpreted as fixed-points, in systems in which a variable may appear both on the left and on the right of equations. For example, the automaton below left, and its corresponding string diagram, below right, translate to the system of equations at the center:

[automaton] ↦ [system: ε ⊆ X, aX ⊆ X, bX ⊆ X, aX ⊆ X, aX ⊆ X, variable indices lost in extraction] ↤ [string diagram] (12)

This translation can be obtained by simply labelling each state with a variable and adding one inequality of the form aX_i ⊆ X_j for each a-transition from state i to state j. The system we obtain corresponds very closely to the ⟦−⟧-semantics of the associated string diagram.

The distinction between red and black wires can be understood as a type discipline that only allows linear uses of the product of languages. It is legitimate and enlightening to ask what would happen if we forgot about red wires and interpreted the action directly as the product. We would replace the action by a new generator with semantics ⟦new generator⟧ = {((M, L), K) | ML ⊆ K}. This would allow us to specify systems of language equations with unrestricted uses of the product on the left of inclusions, e.g. UVW ⊆ X. Equations of this form are similar to the production rules (e.g. X → UVW) of context-free grammars and it is well-known that the least solutions of this class of systems are precisely context-free languages [14, Chapter 10]. For example we could encode the Dyck language X → XX | (X) | ε of properly matched parentheses as the least solution of the system ε ⊆ X, (X) ⊆ X, XX ⊆ X, which gives the diagram on the right.

[Diagram: string diagram for the Dyck language, with actions labelled ")" and "("]

This section is devoted to proving our completeness result, Theorem 1. We use a normal form argument: more specifically we mimic automata-theoretic results to rewrite every string diagram to a normal form corresponding to a minimal deterministic finite automaton (DFA). We achieve it by implementing Brzozowski’s algorithm [12] through diagrammatic equational reasoning. The proof proceeds in three distinct steps.
1. We first show (Section 5.1) how to determinise (the representation of) a diagram: this step consists in eliminating all subdiagrams that correspond to nondeterministic transitions in the associated automaton.
2. We use the previous step to implement a minimisation procedure (Section 5.2) from which we obtain a minimal representation for a given diagram: this is a representation whose associated automaton is minimal (it has the fewest states) amongst DFAs that recognise the same language. To do this, we show how the four steps of Brzozowski’s minimisation algorithm (reverse; determinise; reverse; determinise) translate into diagrammatic equational reasoning. Note that the first three steps taken together simply amount to applying in reverse the determinisation procedure we have already devised. That this is possible will be a consequence of the symmetry of =KAA.
3. Finally, from the uniqueness of minimal DFAs, any two diagrams that have the same denotation are both equal to the same minimal representation and we can derive completeness of =KAA.

We will now write equations in =KAA simply as = to simplify notation and say that diagrams c and d are equal when c =KAA d.

First, we use the symmetries of the equational theory to make simplifying assumptions about the diagrams we need to consider for the completeness proof.
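For readers less familiar with the automata-theoretic side, the four steps of Brzozowski’s algorithm recalled in step 2 can be sketched on ordinary NFAs as follows. This is a plain set-based illustration, not the diagrammatic procedure of the paper: the tuple layout `(states, alphabet, delta, initials, finals)`, the function names and the example automaton (words over {a, b} ending in ab) are all our own assumptions, and ε-transitions are not supported.

```python
def reverse(nfa):
    """Flip every transition and swap initial and final states."""
    states, alphabet, delta, initials, finals = nfa
    return (states, alphabet, {(q, a, p) for (p, a, q) in delta}, finals, initials)

def determinise(nfa):
    """Subset construction, restricted to the reachable subsets."""
    states, alphabet, delta, initials, finals = nfa
    start = frozenset(initials)
    seen, todo, new_delta = {start}, [start], set()
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = frozenset(q for (p, b, q) in delta if p in S and b == a)
            new_delta.add((S, a, T))
            if T not in seen:
                seen.add(T)
                todo.append(T)
    new_finals = {S for S in seen if S & frozenset(finals)}
    return (seen, alphabet, new_delta, {start}, new_finals)

def brzozowski(nfa):
    """reverse; determinise; reverse; determinise."""
    return determinise(reverse(determinise(reverse(nfa))))

def accepts(nfa, word):
    """Generic NFA simulation, used here only to compare languages."""
    _, _, delta, initials, finals = nfa
    current = set(initials)
    for a in word:
        current = {q for (p, b, q) in delta if p in current and b == a}
    return bool(current & set(finals))

# Hypothetical NFA for the words over {a, b} that end in "ab".
NFA = ({0, 1, 2}, "ab",
       {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2)},
       {0}, {2})

minimal = brzozowski(NFA)
print(len(minimal[0]))   # 3
```

As in the text, the first three steps amount to a co-determinisation, so only `determinise` carries real content; `reverse` is a purely syntactic flip, which is what the symmetry of =KAA provides on the diagrammatic side.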
A few simplifying assumptions.
Without loss of generality, the proof we give is restricted to string diagrams with no red ◮ in their domain or codomain. This is simply a matter of convenience: the same proof would work for more general diagrams, that may contain the red ◮ in their (co)domain, at the cost of significantly cluttering diagrams. Henceforth, one can simply think of the labels for the action x as uniquely identifying one open red wire in a diagram. With this convention, two or more occurrences of the same x in a diagram can be seen as connected to the same red wire on the left, via the red copy generator. The completeness of =KAA restricted to the monochromatic red fragment is a consequence of [11, Theorem 6.1].

Arbitrary objects in Aut_Σ are lists of the three generating objects. We have already motivated focusing on string diagrams with no open red wires so that the objects we care about are lists of the black objects ◮ and ◭. The following proposition implies that, without loss of generality, for the proof of completeness we can restrict further to left-to-right diagrams (Section 4.2).
Proposition 7.
There is a natural bijection between sets of string diagrams of the form A B A B ↔ A B A B where A i , B i represent lists of ◮ and ◭ . Proposition 7 tell us that we can always bend the incoming wires to the left andoutgoing wires to the right before applying some equations, and recover theoriginal orientation of the wires by bending them into their original place later.
In diagrammatic terms, a nondeterministic transition of the automaton associated to (a representation of) a given diagram corresponds to a subdiagram of the form

[Diagram: two a-transitions out of the same wire.]

for some a ∈ Σ. Clearly, using the definition of the a-action in (7) and the axiom (D1), we have

[Diagram equation (13): the two a-transitions are merged into a single a-transition.]

which will prove to be the engine of our determinisation procedure, along with the fact that any red expression can be copied and deleted. The next two theorems generalise the ability to copy and delete to arbitrary left-to-right diagrams.

Theorem 2.
For any left-to-right diagram d : ◮^m → ◮^n, we have

[Diagram equations (cpy), (del), (co-cpy) and (co-del) for d.]

For d : ◮^m → ◮^n, let d_ij be the string diagram of type ◮ → ◮ obtained by composing every input with except the i-th one, and every output with except the j-th one. Theorem 2 implies that string diagrams are fully characterised by their ◮ → ◮ subdiagrams.

Corollary 1.
Given d, e : ◮^m → ◮^n, d =_KAA e iff d_ij =_KAA e_ij, for all 1 ≤ i ≤ m and 1 ≤ j ≤ n.

Thus, we can restrict our focus further to left-to-right ◮ → ◮ diagrams, without loss of generality. We are now able to devise a determinisation procedure for representations of diagrams, which we illustrate below on a simple example.

Proposition 8 (Determinisation).
Any diagram ◮ → ◮ has a deterministic representation.

Example 2. [Diagram: step-by-step determinisation of a sample diagram with actions a, b and c, using (D1) and (cpy), shown next to the corresponding automata.]

Dealing with useless states. Notice that our deterministic form is partial and that the determinisation procedure disregards useless states, i.e., parts of a string diagram that are not on a path from the input to the output wire. None of these contribute to the semantics of the diagram, and they can be safely eliminated using Theorem 2. Conversely, if one prefers a total deterministic form (one in which the transition relation contains each letter of Σ not just at most once out of each state, but precisely once), one can use Theorem 2 (del)-(co-del) to introduce an additional garbage state (corresponding to the empty set in the classical determinisation by subset construction), disconnected from the output, as a target for all undefined transitions. Rather than providing a tedious formal construction, let us illustrate this point on the preceding example: there is only one transition out of the initial state, but we can add b- and c-transitions to a new state that does not lead anywhere, giving a total deterministic automaton, as follows.

[Diagram: the preceding example completed with a garbage state, using (B2), (del) and (B9), shown next to the resulting total automaton.]

As explained above, our proof of completeness is a diagrammatic reformulation of Brzozowski's algorithm, which proceeds in four steps: reverse, determinise, reverse, determinise. We already know how to determinise a given diagram. The other three steps are simply a matter of looking at string diagrams differently and showing that all the equations that we needed to determinise them can be performed in reverse.

We say that a matrix-diagram is co-deterministic if the converse of its associated transition relation is deterministic.
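The passage from a partial to a total deterministic form described above has a simple classical analogue. The sketch below, in Python, completes a partial transition function with a garbage state. The dictionary encoding, the function name `totalise` and the example automaton are assumptions made for illustration; the example is not the automaton of Example 2.

```python
def totalise(states, alphabet, delta, sink="garbage"):
    """Complete a partial DFA: route every undefined transition to a fresh
    sink state that loops on every letter and is never accepting.

    delta maps (state, letter) to a single successor state.
    """
    if sink in states:
        raise ValueError("the sink name must be fresh")
    total = dict(delta)
    new_states = set(states)
    missing = [(q, a) for q in states for a in alphabet if (q, a) not in delta]
    if missing:
        new_states.add(sink)
        for key in missing:
            total[key] = sink
        for a in alphabet:
            total[(sink, a)] = sink  # the garbage state leads nowhere else
    return new_states, total

# Hypothetical partial DFA over {a, b, c}: only an a-transition leaves q0.
partial = {
    ("q0", "a"): "q1",
    ("q1", "a"): "q1",
    ("q1", "b"): "q1",
    ("q1", "c"): "q1",
}
total_states, total_delta = totalise({"q0", "q1"}, {"a", "b", "c"}, partial)
```

Since the sink is not accepting and cannot reach any accepting state, adding it does not change the recognised language, which is the classical reading of the garbage state being disconnected from the output.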
Proof (Proof of Theorem 1 (Completeness)).
We have a procedure to show that, if ⟦d⟧ = ⟦e⟧, then there exists a string diagram f in normal form such that d = f = e. This normal form is the diagrammatic counterpart of the minimal automaton associated to d and e. In our setting, it is the deterministic representation equal to d and e with the smallest number of states. This is unique because we can obtain from it the corresponding minimal automaton, which is well known to be unique. First, given any string diagram, we can obtain a representation for it by Proposition 6. Then we obtain a minimal representation by splitting Brzozowski's algorithm in two steps.
1. Reverse; determinise; reverse.
A close look at the determinisation procedure (proof of Proposition 2 and Proposition 8 in Appendix) shows that, at each step, the required laws all hold in reverse. For example, we can replace every instance of (cpy) with (co-cpy). We can thus define, in a completely analogous manner, a co-determinisation procedure which takes care of the first three steps of Brzozowski's algorithm, and obtain a co-deterministic representation for the given diagram.
2. Determinise.
Finally, from a direct application of Proposition 8, we can obtain a deterministic representation for the co-deterministic representation of the previous step. The result is the desired minimal representation and normal form.
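The last step of the argument relies on minimal DFAs being unique up to renaming of states. A hedged Python sketch of how one might compare two DFAs up to such renaming is given below: each automaton is relabelled in breadth-first discovery order and the results are compared. The encoding `(alphabet, delta, start, finals)` and the function names are assumptions for this illustration, and the comparison decides language equivalence only if both inputs are already minimal and have no useless states.

```python
from collections import deque

def canonical_form(alphabet, delta, start, finals):
    """Relabel the reachable part of a (partial) DFA by BFS discovery order.

    delta maps (state, letter) to a single successor state.
    """
    letters = sorted(alphabet)
    index = {start: 0}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for a in letters:
            if (q, a) in delta and delta[(q, a)] not in index:
                index[delta[(q, a)]] = len(index)
                queue.append(delta[(q, a)])
    transitions = frozenset(
        (index[q], a, index[delta[(q, a)]])
        for q in index for a in letters if (q, a) in delta
    )
    accepting = frozenset(index[q] for q in finals if q in index)
    return len(index), transitions, accepting

def same_up_to_renaming(dfa1, dfa2):
    """True when the reachable parts of the two DFAs are isomorphic."""
    return canonical_form(*dfa1) == canonical_form(*dfa2)

# Two presentations of the minimal partial DFA for a+, and one for a*.
plus_1 = ({"a"}, {("x", "a"): "y", ("y", "a"): "y"}, "x", {"y"})
plus_2 = ({"a"}, {(0, "a"): 1, (1, "a"): 1}, 0, {1})
star_a = ({"a"}, {(0, "a"): 0}, 0, {0})
```

Combined with a minimisation procedure, this gives the classical decision procedure that the diagrammatic normal form mirrors: minimise both automata, then compare the results up to renaming.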
In this paper, we have given a fully diagrammatic treatment of finite-state automata, with a finite equational theory that axiomatises them up to language equivalence. We have seen that this allows us to decompose the regular operations of Kleene algebra, like the star, into more primitive components, resulting in greater modularity. In this section, we compare our contributions with related work, and outline directions for future research.

Traditionally, computer scientists have used syntax or railroad diagrams to visualise regular expressions and, more generally, context-free grammars [41]. These diagrams closely resemble our syntax but have remained mostly informal and usually restricted to a single input and output. More recently, Hinze has treated the single input-output case rigorously as a pedagogical tool to teach the correspondence between finite-state automata and regular expressions [18]. He did not, however, study their equational properties.

Bloom and Ésik's iteration theories provide a general categorical setting in which to study the equational properties of iteration for a broad range of structures that appear in the semantics of programming languages [6]. They are cartesian categories equipped with a parameterised fixed-point operation which is closely related to the trace operation that we have used to represent the Kleene star. However, the monoidal category of interest in this paper is compact-closed (only the full subcategory over ◮ and ◭, to be precise), a property that is incompatible with the existence of categorical products (any category that has both collapses to a preorder [31]). Nevertheless, the subcategory of left-to-right diagrams (Section 4.2) is an iteration theory and, in particular, a matrix iteration theory [5], a structure that Bloom and Ésik have used to give an (infinitary) axiomatisation of regular languages [4].

Similarly, Stefanescu's work on network algebra provides a unified algebraic treatment of various types of networks, including finite-state automata [39]. In general, network algebras are traced monoidal categories where the product is not necessarily cartesian, and therefore more general than iteration theories. In both settings however, the trace is a global operation that cannot be decomposed further into simpler components. In our work, on the other hand, the trace can be defined from the compact-closed structure, as was depicted in (3).

Note that the compact closed subcategory in this paper can be recovered from the traced monoidal category of left-to-right diagrams, via the Int construction [22]. Therefore, as far as mathematical expressiveness is concerned, the two approaches are equivalent. However, from a methodological point of view, taking the compact closed structure as primitive allows for improved compositionality, as example (2) in the introduction illustrates. Furthermore, the compact closed structure can be finitely presented relative to the theory of symmetric monoidal categories, whereas the trace operation cannot. This matters greatly in this paper, where finding a finite axiomatisation is our main concern.

Finally, the idea of treating regular expressions as a free structure acting on a second algebraic structure also appeared in Pratt's treatment of dynamic algebra, which axiomatises the propositional fragment of dynamic modal logic [34].
Like our formalism, and contrary to Kleene algebras, the variety of dynamic algebras is finitely-based. But they assume more structure: there the second algebraic structure is a Boolean algebra.

In all the formalisms we have mentioned, the difficulty typically lies in capturing the behaviour of iteration, whether as the star in Kleene algebra [26,4], or a trace operator [6] in iteration theory and network algebra [39]. The axioms should be coercive enough to force it to be the least fixed-point of the language map L ↦ {ε} ∪ LK. In Kozen's axiomatisation of Kleene algebra [26] for example, this is through (a) the axiom 1 + ee* ≤ e* (star is a fixpoint) and (b) the Horn clause f + ex ≤ x ⇒ e*f ≤ x (star is the least fixpoint). In our work, (a) is a consequence of the unfolding of the star into a feedback loop and can be derived from the other axioms. (b) is more subtle, but can be seen as a consequence of the (D1)-(D4) axioms. These allow us to (co)copy and (co)delete arbitrary diagrams (Theorem 2) and we conjecture that this is what forces the star to be a single definite value: not just any fixed-point, but the least one. Making this statement precise is the subject of future work.

The difficulty in capturing the behaviour of fixed-points is also the reason why we decided to work with an additional red wire, to encode the action of regular expressions on the set of languages: without it, global (co)copying and (co)deleting (Theorem 2) cannot be reduced to the local (D1)-(D4) axioms. There is another route, which leads to an infinitary axiomatisation: we could dispense with the red generators altogether and take the actions a (for a ∈ Σ) as primitive instead, with global axioms to (co)copy and (co)delete arbitrary diagrams. This would pave the way for a reformulation of our work in the context of iteration (matrix) theories, where the ability to (co)copy and (co)delete arbitrary expressions is already built-in. We leave this for future work.

There is an intriguing parallelism between our case study and the positive fragment of relation algebra (also known as allegories [16]). Indeed, allegories, like Kleene algebra, do not admit a finite axiomatisation [16]. However, this result holds for the standard algebraic theories. It has been shown recently that a structure equivalent to allegories can be given a finite axiomatisation when formulated in terms of string diagrams in monoidal categories [9]. It seems that the greater generality of the monoidal setting (algebraic theories correspond precisely to the particular case of cartesian monoidal categories [11]) allows for simpler axiomatisations in some specific cases. In the future we would like to understand whether this phenomenon, of which we now have two instances, can be understood in a general context.

Lastly, extensions of Kleene Algebra, such as Concurrent Kleene Algebra (CKA) [19,23] and NetKAT [1], are increasingly relevant in current research. Enhancing our theory =_KAA to encompass these extensions seems a promising research direction, for two main reasons. First, the two-dimensional nature of string diagrams has proved particularly suitable to reason about concurrency (see e.g. [7,38]), and more generally about resource exchange between processes (see e.g. [10,13,21,3,8]). Second, when trying to transfer the good meta-theoretical properties of Kleene Algebra (like completeness and decidability) to extensions such as CKA and NetKAT, the cleanest way to proceed is usually in a modular fashion. The interaction between the new operators of the extension and the Kleene star usually represents the greatest challenge to this methodology. Now, in =_KAA, the Kleene star is decomposable into simpler components (see (3)) and there is only one specific axiom (C5) governing its behaviour. We believe this is a particularly favourable starting point to modularise a meta-theoretic study of CKA and NetKAT with string diagrams, taking advantage of the results we presented in this paper for finite-state automata.
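The least fixed-point reading of the star discussed in this section can be checked on small examples. The following Python sketch computes, by Kleene iteration, the least language X with B ∪ AX ⊆ X, truncated to words of bounded length so that the iteration terminates. The function names and the truncation parameter `max_len` are assumptions made for this illustration; they do not come from the paper. Arden's lemma predicts that the result is A*B (up to the length bound).

```python
def concat(A, B, max_len):
    """Concatenation of two finite languages, keeping words of length <= max_len."""
    return {u + v for u in A for v in B if len(u) + len(v) <= max_len}

def least_solution(A, B, max_len):
    """Least X with B ∪ AX ⊆ X, restricted to words of length <= max_len.

    Iterates X := B ∪ AX from the empty language until it stabilises.
    """
    base = {w for w in B if len(w) <= max_len}
    X = set()
    while True:
        nxt = base | concat(A, X, max_len)
        if nxt == X:
            return X
        X = nxt

def star(A, max_len):
    """Truncated Kleene star, as the least solution of {ε} ∪ AX ⊆ X."""
    return least_solution(A, {""}, max_len)

# Bounded check of Arden's lemma on a small, hypothetical example.
A, B, n = {"a", "bb"}, {"b"}, 4
arden_holds = least_solution(A, B, n) == concat(star(A, n), B, n)
```

The truncation is harmless here because concatenation only ever lengthens words, so words of length at most `max_len` in the true least solution are built from pieces that are themselves within the bound.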
References
1. Anderson, C.J., Foster, N., Guha, A., Jeannin, J.B., Kozen, D., Schlesinger, C., Walker, D.: NetKAT: semantic foundations for networks. ACM SIGPLAN Notices (1), 113–126 (2014)
2. Arden, D.N.: Delayed-logic and finite-state machines. In: 2nd Annual Symposium on Switching Circuit Theory and Logical Design (SWCT 1961). pp. 133–151. IEEE (1961)
3. Baez, J.C., Fong, B.: A compositional framework for passive linear networks. Theory & Applications of Categories (2018)
4. Bloom, S.L., Ésik, Z.: Equational axioms for regular sets. Mathematical structures in computer science (1), 1–24 (1993)
5. Bloom, S.L., Ésik, Z.: Matrix and matricial iteration theories. Journal of Computer and System Sciences (3), 381–439 (1993)
6. Bloom, S.L., Ésik, Z., Zsuzsa, B.: Iteration theories: The equational logic of iterative processes. Springer (1993)
7. Bonchi, F., Holland, J., Piedeleu, R., Sobociński, P., Zanasi, F.: Diagrammatic algebra: from linear to concurrent systems. In: Proceedings of the 46th Annual ACM SIGPLAN Symposium on Principles of Programming Languages (POPL) (2019)
8. Bonchi, F., Piedeleu, R., Sobociński, P., Zanasi, F.: Graphical affine algebra. In: Proceedings of the 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS) (2019)
9. Bonchi, F., Seeber, J., Sobociński, P.: Graphical conjunctive queries. In: 27th Annual EACSL Conference Computer Science Logic, (CSL). vol. 119 (2018)
10. Bonchi, F., Sobociński, P., Zanasi, F.: The calculus of signal flow diagrams I: linear relations on streams. Inf. Comput., 2–29 (2017)
11. Bonchi, F., Sobociński, P., Zanasi, F.: Deconstructing Lawvere with distributive laws. Journal of logical and algebraic methods in programming, 128–146 (2018)
12. Brzozowski, J.A.: Canonical regular expressions and minimal state graphs for definite events. Mathematical theory of Automata (6), 529–561 (1962)
13. Coecke, B., Kissinger, A.: Picturing Quantum Processes - A first course in Quantum Theory and Diagrammatic Reasoning. Cambridge University Press (2017)
14. Conway, J.H.: Regular algebra and finite machines. Courier Corporation (2012)
15. Fong, B., Spivak, D.I.: Seven sketches in compositionality: An invitation to applied category theory. arXiv:1803.05316 (2018)
16. Freyd, P.J., Scedrov, A.: Categories, allegories. Elsevier (1990)
17. Hasegawa, M.: Recursion from cyclic sharing: Traced monoidal categories and models of cyclic lambda calculi. pp. 196–213. Springer Verlag (1997)
18. Hinze, R.: Self-certifying railroad diagrams. In: International Conference on Mathematics of Program Construction (MPC). pp. 103–137. Springer (2019)
19. Hoare, C., Möller, B., Struth, G., Wehrman, I.: Concurrent Kleene algebra. In: Proceedings of the 20th International Conference on Concurrency Theory (CONCUR). pp. 399–414. Springer (2009)
20. Hyland, M., Schalk, A.: Glueing and orthogonality for models of linear logic. Theoretical Computer Science (1-2), 183–231 (2003)
21. Jacobs, B., Kissinger, A., Zanasi, F.: Causal inference by string diagram surgery. In: Proceedings of the 22nd International Conference on Foundations of Software Science and Computation Structures (FOSSACS). pp. 313–329. Springer (2019)
22. Joyal, A., Street, R., Verity, D.: Traced monoidal categories. In: Mathematical Proceedings of the Cambridge Philosophical Society. vol. 119, pp. 447–468. Cambridge University Press (1996)
23. Kappé, T., Brunet, P., Silva, A., Zanasi, F.: Concurrent Kleene algebra: Free model and completeness. In: Proceedings of the 27th European Symposium on Programming (ESOP) (2018)
24. Kelly, G.M., Laplaza, M.L.: Coherence for compact closed categories. Journal of Pure and Applied Algebra, 193–213 (1980)
25. Kleene, S.C.: Representation of events in nerve nets and finite automata. Tech. rep., RAND PROJECT AIR FORCE SANTA MONICA CA (1951)
26. Kozen, D.: A completeness theorem for Kleene algebras and the algebra of regular events. Information and Computation (2), 366–390 (1994)
27. Kozen, D.: Kleene algebra with tests. ACM Transactions on Programming Languages and Systems (TOPLAS) (3), 427–443 (1997)
28. Krob, D.: Complete systems of B-rational identities. Theoretical Computer Science (2), 207–343 (1991)
29. Lack, S.: Composing PROPs. Theory and Application of Categories (9), 147–163 (2004)
30. Lafont, Y.: Equational reasoning with 2-dimensional diagrams. In: Comon, H., Jouannaud, J.P. (eds.) Term Rewriting, Lecture Notes in Computer Science, vol. 909, pp. 170–195. Springer Berlin Heidelberg (1995)
31. Lambek, J., Scott, P.J.: Introduction to higher-order categorical logic, vol. 7. Cambridge University Press (1988)
32. McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics (4), 115–133 (1943)
33. Moshier, A.M.: Coherence for categories of posets with applications. Topology, Algebra and Categories in Logic (TACL) p. 214 (2015)
34. Pratt, V.: Dynamic algebras as a well-behaved fragment of relation algebras. In: International Conference on Algebraic Logic and Universal Algebra in Computer Science. pp. 77–110. Springer (1988)
35. Redko, V.N.: On defining relations for the algebra of regular events. Ukrainskii Matematicheskii Zhurnal, 120–126 (1964)
36. Selinger, P.: A survey of graphical languages for monoidal categories. Springer Lecture Notes in Physics (813), 289–355 (2011)
37. Smolka, S., Foster, N., Hsu, J., Kappé, T., Kozen, D., Silva, A.: Guarded Kleene algebra with tests: verification of uninterpreted programs in nearly linear time. Proceedings of the ACM on Programming Languages (POPL), 1–28 (2019)
38. Sobociński, P., Montanari, U., Melgratti, H., Bruni, R.: Connector algebras for C/E and P/T nets' interactions. Logical Methods in Computer Science (2013)
39. Stefanescu, G.: Network Algebra. Discrete Mathematics and Theoretical Computer Science, Springer London (2000)
40. Thompson, K.: Programming techniques: Regular expression search algorithm. Communications of the ACM (6), 419–422 (1968)
41. Wirth, N.: The programming language Pascal. Acta informatica (1), 35–63 (1971)
42. Zanasi, F.: Interacting Hopf Algebras: the theory of linear systems. Ph.D. thesis, Ecole Normale Supérieure de Lyon (2015)

A Proofs
A.1 Encoding regular expressions and automata
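We write ";" for relational composition, from left to right: $R ; S = \{(x,z) \mid \exists y,\ (x,y) \in R,\ (y,z) \in S\}$.

Proof (Proof of Proposition 2). By induction on the structure of regular expressions. The proposition holds by definition on the generators: $\llbracket \langle a \rangle \rrbracket = \{(L,K) \mid aL \subseteq K\}$. There are three inductive cases to consider. Assume that $e$ and $f$ satisfy the proposition.

– For the $ef$ case, $\llbracket \langle ef \rangle \rrbracket = \llbracket \langle e \rangle \rrbracket ; \llbracket \langle f \rangle \rrbracket = \{(L,K) \mid \llbracket e \rrbracket_R L \subseteq K\} ; \{(L,K) \mid \llbracket f \rrbracket_R L \subseteq K\}$. Hence, by monotonicity of the product, we have
\begin{align*}
\llbracket \langle ef \rangle \rrbracket &= \{(L,K) \mid \llbracket e \rrbracket_R \llbracket f \rrbracket_R L \subseteq K\} \\
&= \{(L,K) \mid \llbracket ef \rrbracket_R L \subseteq K\}.
\end{align*}

– For the case of $e + f$ we have
\begin{align*}
\llbracket \langle e + f \rangle \rrbracket &= \{(L,K) \mid \exists K_1, K_2 \subseteq K,\ L_1, L_2 \text{ s.t. } L \subseteq L_1, L_2,\ \llbracket e \rrbracket_R L_1 \subseteq K_1,\ \llbracket f \rrbracket_R L_2 \subseteq K_2\} \\
&= \{(L,K) \mid \exists L_1, L_2 \text{ s.t. } L \subseteq L_1, L_2,\ \llbracket e \rrbracket_R L_1 \subseteq K,\ \llbracket f \rrbracket_R L_2 \subseteq K\} \\
&= \{(L,K) \mid \exists L_1, L_2 \text{ s.t. } L \subseteq L_1, L_2,\ \llbracket e \rrbracket_R L_1 \cup \llbracket f \rrbracket_R L_2 \subseteq K\} \\
&= \{(L,K) \mid \llbracket e \rrbracket_R L \cup \llbracket f \rrbracket_R L \subseteq K\} \\
&= \{(L,K) \mid (\llbracket e \rrbracket_R \cup \llbracket f \rrbracket_R) L \subseteq K\} \\
&= \{(L,K) \mid \llbracket e + f \rrbracket_R L \subseteq K\}.
\end{align*}

– Finally, for $e^*$,
\begin{align*}
\llbracket \langle e^* \rangle \rrbracket &= \{(L,K) \mid \exists M, N \text{ s.t. } M, L \subseteq N,\ \llbracket e \rrbracket_R N \subseteq M,\ N \subseteq K\} \\
&= \{(L,K) \mid \exists N \text{ s.t. } \llbracket e \rrbracket_R N \subseteq N,\ L \subseteq N \subseteq K\} \\
&= \{(L,K) \mid \exists N \text{ s.t. } L \cup \llbracket e \rrbracket_R N \subseteq N,\ L \subseteq N \subseteq K\} \\
&= \{(L,K) \mid \exists N \text{ s.t. } \llbracket e \rrbracket_R^* L \subseteq N,\ L \subseteq N \subseteq K\} \\
&= \{(L,K) \mid \exists N \text{ s.t. } \llbracket e^* \rrbracket_R L \subseteq N,\ L \subseteq N \subseteq K\} \\
&= \{(L,K) \mid \llbracket e^* \rrbracket_R L \subseteq K\}
\end{align*}
where the fourth equation is a consequence of Arden's lemma [2]: $A^* B$ is the smallest solution (for $X$) of the language equation $B \cup AX \subseteq X$.

Proof (Proof of Proposition 3).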
This is the diagrammatic counterpart of the representation of automata as matrices of regular expressions given in [26, Definition 12].

We write $K$ for a vector of languages $(K_1, \dots, K_{|Q|})$ and, for a square matrix of languages $A$, let $AK$ be the language vector resulting from applying $A$ to $K$ in the obvious way. By [26, Theorem 11], square language matrices form a Kleene algebra, with the composition of matrices as product, component-wise union as sum and the star defined as in [26, Lemma 10]. We also write $K \subseteq K'$ if the inclusions all hold component-wise. Furthermore, Arden's lemma holds in this slightly more general setting: the least solution of the language-matrix equation $B \cup AX \subseteq X$ is $X = A^* B$.

Now, for a given automaton $A$ we construct the diagram below as explained in the main text:

[Diagram: e, followed by d in a feedback loop over |Q| wires, followed by f.]

with $d$ the diagram encoding the transition relation of $A$, $e$ the diagram encoding its initial state, and $f$ the diagram encoding its set of final states. Let $D$ be the language matrix obtained from $A$ by letting $D_{ij} = \{a\}$ if $(q_i, a, q_j)$ is in the transition relation of $A$. First, writing $\llbracket d^{\circlearrowleft} \rrbracket$ for the semantics of the subdiagram consisting of $d$ and its feedback loop, we have
\begin{align*}
\llbracket d^{\circlearrowleft} \rrbracket &= \{(K, K') \mid \exists M, N,\ M, K \subseteq N,\ DN \subseteq M,\ N \subseteq K'\} \\
&= \{(K, K') \mid \exists N,\ DN \subseteq N,\ K \subseteq N \subseteq K'\} \\
&= \{(K, K') \mid \exists N,\ K \cup DN \subseteq N,\ K \subseteq N \subseteq K'\} \\
&= \{(K, K') \mid \exists N,\ D^* K \subseteq N,\ K \subseteq N \subseteq K'\} \\
&= \{(K, K') \mid D^* K \subseteq K'\}
\end{align*}
where the penultimate step holds by the matrix Arden's lemma. Then, $\llbracket e \rrbracket$ and $\llbracket f \rrbracket$ pick out the component languages of $D^*$ that correspond to the initial states of $A$ and some final state, and take their union. Thus, the semantics of the whole diagram is
\[
\llbracket e \rrbracket ; \{(K, K') \mid D^* K \subseteq K'\} ; \llbracket f \rrbracket = \{(K, K') \mid LK \subseteq K'\}
\]
where $L$ is the language accepted by the original automaton.

Lemma 1.
Every left-to-right diagram that does not contain the action generator is equal to one that factors as a block of comonoid generators, followed by a block of monoid generators.

Proof. The equational theory =_KAA restricted to these four generators coincides with the equational theory of relations between finite sets. The proof of this fact, and a normal form that factorises as in the statement of the Lemma, can be found in [30, Section 4].
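To make the factorisation in Lemma 1 concrete on the semantic side, the Python sketch below splits a relation between finite sets into a copy/delete layer followed by a merge/unit layer, with one intermediate wire per related pair. The encoding (lists of wire indices) and the names `factor` and `recompose` are assumptions made for this illustration; they are not the construction of [30].

```python
def factor(relation, m, n):
    """Split R ⊆ {0..m-1} × {0..n-1} into two layers joined by one wire per pair.

    copies[i] lists the wires leaving input i (empty list: the input is deleted).
    merges[j] lists the wires entering output j (empty list: the output is a unit).
    """
    wires = sorted(relation)
    copies = [[k for k, (i, _) in enumerate(wires) if i == src] for src in range(m)]
    merges = [[k for k, (_, j) in enumerate(wires) if j == tgt] for tgt in range(n)]
    return copies, merges

def recompose(copies, merges):
    """Recover the relation by following each wire from its input to its output."""
    owner = {k: i for i, ks in enumerate(copies) for k in ks}
    return {(owner[k], j) for j, ks in enumerate(merges) for k in ks}

# Hypothetical relation from 3 inputs to 2 outputs; input 1 is related to nothing.
R = {(0, 0), (0, 1), (2, 1)}
copies, merges = factor(R, 3, 2)
```

Here `copies` plays the role of the first block (each input is copied once per pair it belongs to, or deleted), and `merges` the role of the second block; the wire indices stand in for the intermediate permutation.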
Proof (Proof of Proposition 6).
First, we claim that we can always find c containing no action a such that

[Diagram equation (14): d, with m inputs and n outputs, is equal to a diagram built from c and x with a trace over l wires.]

where x : ◮^l → ◮^l is simply a vertical composition of l different a, a ∈ Σ.

To prove this claim, we reason by structural induction on Aut_Σ. For the base case, if d is a, we have

[Diagram equation (15), using (A1).]

and every morphism that does not contain x is trivially in the right form, with the trace taken over the 0 object.

There are two inductive cases to consider:
– d is given by the sequential composition of two morphisms of the appropriate form (using the induction hypothesis). Then

[Diagram equations (16)-(17).]

– d is given as the monoidal product of two morphisms of the appropriate form. Then

[Diagram equations (18)-(19).]

In

[Diagram equation (20), of the same form as (14).]

since c contains no action nodes, by Lemma 1, it is equal to a matrix diagram with only entries 0 or ε (i.e. a relation). In other words, we can assume that c factorises as a first layer of comonoid generators, followed by a layer of permutations and a third layer of vertical compositions of monoid generators. Now, the action nodes in the trace distribute over the monoid multiplication by (D3), so that we can push them inside c. The resulting matrix diagram d′ is such that d′_ll is ε-free, as needed.

A.2 Completeness
Proof (Proof of Proposition 7).
This proposition holds in any compact-closed category and relies on the ability to bend wires using the cup and the cap. Explicitly, given a diagram of the first form, we can obtain one of the second as follows:

[Diagram (21): the wires A_1, B_1, A_2, B_2 are bent around the diagram.]

The inverse mapping is symmetric. That they are inverse transformations follows immediately from (A1)-(A2).

Lemma 2. For any e, we have

[Diagram equations: e can be copied and deleted.]

Proof. By structural induction. It also follows from the more general case of [42, Theorem 2.42].

The class of left-to-right diagrams can be characterised inductively. We call trace the canonical feedback operation induced by the cup and the cap. Given a left-to-right diagram d : 1 + n → 1 + m, its trace is defined as

[Diagram (22): d with a feedback loop from its first output to its first input.]

Let Rat be the set of diagrams of Aut_Σ containing , , , , , and all red generators, which is closed under the operations of vertical composition, horizontal composition, and the trace. Clearly any diagram of Rat is a left-to-right diagram. The converse is also true, up to =_KAA, as a corollary of Lemma 6, which proves a stronger result about the form of left-to-right diagrams.
Proposition 9.
Every left-to-right diagram is equal to one in Rat.

We can define the diagrammatic counterpart of matrices whose coefficients are regular expressions.

Definition 4. A generalised matrix-diagram is a left-to-right diagram that factors as a block of comonoid generators, followed by a block of e (e ∈ RegExp) and finally, a block of monoid generators.

Just like for matrices, we call the regular expressions d_ij that appear in a generalised matrix-diagram d : ◮^n → ◮^m the coefficients of d and index them by pairs of numbers in {1, . . . , n} × {1, . . . , m} in the usual way. The following proposition shows that left-to-right diagrams are as expressive as matrices of regular expressions.
Proposition 10.
Any left-to-right diagram is equal to a generalised matrix diagram.

Proof. We reason by structural induction, using the inductive characterisation of left-to-right diagrams of Proposition 9. The base cases are those of , , e (e ∈ RegExp), , and , which are all generalised matrix diagrams, by definition.

By Proposition 9, there are three inductive cases to consider.

1. Let d : ◮^n → ◮^m and c : ◮^m → ◮^l be two generalised matrix-diagrams. Consider their horizontal composite. By applying the bimonoid laws (B6)-(B8) as many times as needed, we can commute the block of monoid generators of d past the block of comonoid generators of c. Then, in the same way, we can apply (D3)-(D4) to commute the block of monoid generators of d past the block of e of c. As a result, their horizontal composite is equal to a diagram that factors as a block of comonoid generators, followed by two blocks of e (e ∈ RegExp) and finally, a block of monoid generators. Now, we may have two types of subdiagrams to eliminate.
– If the diagram contains two coefficients e and e′ in sequence, then we can turn this into a single coefficient as follows: [Diagram equations merging e and e′ into a single coefficient.]
– If the diagram contains two coefficients e_1 and e_2 in parallel, between a copy and a merge, then we can turn this into a single coefficient as follows: [Diagram equations merging e_1 and e_2 into the single coefficient e_1 + e_2.]
In this way we merge the two blocks of regular expression coefficients into a single block, as needed to obtain a generalised matrix-diagram.
2. The case of vertical composition is immediate: if d_1 : ◮^{n_1} → ◮^{m_1} and d_2 : ◮^{n_2} → ◮^{m_2} are two generalised matrix-diagrams then so is their vertical composite.
3. The remaining case is that of the trace. Suppose d : ◮^{1+n} → ◮^{1+m} is a generalised matrix-diagram. Then, there exists a regular expression d_11 and generalised matrix-subdiagrams c_1m, c_n1, and c_nm such that

[Diagram: d decomposed into d_11, c_1m, c_n1 and c_nm.]

Then

[Diagram derivation: the trace of d is rewritten using (A1)-(A2) and (C5), so that the traced coefficient d_11 becomes d_11^*.]

Then we can use the inductive case of composition above (1.) to obtain a generalised matrix diagram from the horizontal composite of c_n1, d_11^* and c_1m, and therefore for the whole diagram, thus concluding the proof.

Proof (Proof of Theorem 2).
By Proposition 10, any d as in the statement of the theorem is equal to a generalised matrix-diagram. These are made up of consecutive blocks of , , e, and , . Each equation in the statement holds for all of these components. For example, (cpy) holds for by (B7), for by (B8), and for the e block by (D1) in conjunction with Lemma 2 to copy e.

Thus, a simple structural induction on the form of generalised matrix-diagrams suffices to prove the theorem.

Lemma 3. For any deterministic matrix-diagram d, there exists a deterministic matrix-diagram d′ such that

[Diagram equation relating d and d′.]

Proof. This is a straightforward induction on the structure of matrix-diagrams: they are formed of a layer of followed by a layer of action nodes x and finally a layer of . We can use the bimonoid axioms (B7)-(B9) to push the past the -layer and the ability to merge two a using the co-copying axiom (D3), to push past the x-layer.

The next lemma performs the key step in removing nondeterminism. In more transparent language, it asserts that if we have a diagram that corresponds to a deterministic automaton, and we identify two inputs to two of its states, we can get rid of the potential nondeterminism that we have introduced, using equational reasoning.

Lemma 4.
For a matrix-diagram d : ◮^{l+1} → ◮^{l+1} with d_ll and d_l deterministic, there exists a matrix-diagram d′ : ◮^{l′+1} → ◮^{l′+1} with d′_l′l′ and d′_l′ deterministic such that

[Diagram equation relating d and d′.]

Proof. The idea is to identify the largest equivalent (i.e., that give the same language) subdiagrams, starting from any of the branches of , and pull them through to merge them. Thinking of the diagram as an automaton, this amounts to identifying the intersection of the languages recognised by the two states that merges, pulling them through this generator, and therefore creating a new state that recognises the intersection.

Following this idea, we take the largest submatrix-diagram c of d such that

[Diagram equation (23): d_ll expressed in terms of e and two copies of c.]

for some deterministic e, and l = l_1 + l_2 + l_3. Note that if there is no such subdiagram, we are done, since merging the two states does not introduce nondeterminism. Otherwise we proceed as follows. First, replacing (23) in the context of the statement, we obtain

[Diagram derivation, using (B1), ending with a diagram built from e′ and two copies of c.]

where e′ is the dashed box in the previous diagram. Note that we have not introduced more nondeterminism since, by construction, e′_ll and e′_l are deterministic. Indeed, otherwise, c would not be the largest subdiagram satisfying (23) and we could add any additional nondeterministic transition in e′ to it.

We now focus on transforming the following subdiagram, which we isolate for clarity. First, we can split c into two submatrix-diagrams c_1 and c_2 such that

[Diagram derivation: the two copies of c are split and merged, using Theorem 2 (cpy), (B1), the definition of c^*, Theorem 2 (co-cpy) and (A1)-(A2).]

In context, this gives

[Diagram: d expressed in terms of e′ and the merged copies of c.]

Now, we can use Lemma 3 to absorb the two occurrences of into e′. We get e′′ such that

[Diagram: d expressed in terms of e′′ and the merged copies of c.]

Now, the dashed box in the diagram above is the d′ required by the statement of the lemma: indeed, any further nondeterministic transition would mean that c as chosen above was not the largest subdiagram satisfying the conditions of (23).

Proof (Proof of Proposition 8 (Determinisation)).
First, by Proposition 6, we can obtain a representation for any given left-to-right diagram. Thus, we only need to prove that, for any matrix-diagram d : ◮^{l+1} → ◮^{l+1} with d_ll ε-free, there exists l′ ∈ N and a matrix-diagram d′ : ◮^{l′+1} → ◮^{l′+1}, with d′_l′l′ deterministic, such that

[Diagram equation relating d and d′.]

We proceed by induction on the number of nondeterministic transitions in d. Recall that, in diagrammatic terms, a nondeterministic transition of the associated automaton corresponds to a subdiagram of the form, for some a ∈ Σ:

[Diagram (24): two a-transitions out of the same wire.]

If there are none, there is nothing to do. Assuming that the statement of the theorem holds for any matrix-diagram containing n nondeterministic transitions, let d be a matrix diagram with n + 1 nondeterministic transitions. Then there is a diagram d^(1) with

[Diagram equation, using (D1): d expressed in terms of d^(1) and the chosen a-transitions.]

and l^(1) = l − 3, for k the arity of the nondeterministic transition we picked. The second equation is an immediate application of Theorem 2. Then, by tracing out, we get

[Diagram equations (25)-(26).]

Let d^(2) be the subdiagram in the dashed box above and let l^(2) = l^(1) + 1. We now have

[Diagram equation (27).]

Note that, by construction, d^(2) contains n nondeterministic transitions. We can therefore apply the induction hypothesis to determinise the following subdiagram:

[Diagram: d^(2) with l^(2) traced wires.]

From this, we obtain d^(3) deterministic such that

[Diagram equations, using (A1)-(A2).]

Applying Lemma 3 twice, we obtain d^(4) such that

[Diagram equation.]

and we are now able to apply Lemma 4 to eliminate the copying node , obtaining d′′