A Fast Graph Program for Computing Minimum Spanning Trees
B. Hoffmann and M. Minas (Eds.): Eleventh International Workshop on Graph Computation Models (GCM 2020), EPTCS 330, 2020, pp. 163–180, doi:10.4204/EPTCS.330.10. © B. Courtehoute & D. Plump. This work is licensed under the Creative Commons Attribution License.
Brian Courtehoute and Detlef Plump
Department of Computer Science, University of York, York, UK {bc956,detlef.plump}@york.ac.uk
When using graph transformation rules to implement graph algorithms, a challenge is to match the efficiency of programs in conventional languages. To help overcome that challenge, the graph programming language GP 2 features rooted rules which, under mild conditions, can match in constant time on bounded degree graphs. In this paper, we present an efficient GP 2 program for computing minimum spanning trees. We provide empirical performance results as evidence for the program's subquadratic complexity on bounded degree graphs. This is achieved using depth-first search as well as rooted graph transformation. The program is based on Boruvka's algorithm for minimum spanning trees. Our performance results show that the program's time complexity is consistent with that of classical implementations of Boruvka's algorithm, namely O(m log n), where m is the number of edges and n the number of nodes.

GP 2 is an experimental rule-based graph programming language with simple semantics to facilitate formal reasoning. It has been shown that every computable function on graphs can be expressed as a GP 2 program [16].

A challenge in rule-based graph programming is reaching the time efficiency of conventional programs due to the cost of graph matching. In general, finding a match for a graph L in a graph G takes size(G)^size(L) time, when in practice we often want to do it in constant time.

Other programming languages based on graph transformation rules include AGG [17], GReAT [1], GROOVE [12], GrGen.Net [14], Henshin [2] and PORGY [11], but we are not aware that any of them are able to match the time complexity of subquadratic graph algorithms.

GP 2 allows graph matching to be sped up by using rooted graph transformation rules, which were first introduced by Bak and Plump [4].
This enables nodes in the host graph declared as roots to be accessed in constant time, making matching locally around those roots in constant time possible for connected graphs of bounded degree.

In previous work, we developed GP 2 programs that match the time complexity of their conventional counterparts on connected graphs of bounded degree. The first such program produces a 2-colouring, and was shown to match the measured execution times of a tailor-made 2-colouring C program [5]. More GP 2 programs that run in linear time on connected graphs of bounded degree include tree recognition, binary DAG recognition, and topological sorting [8].

Here we continue this work by presenting an efficient GP 2 program computing a minimum spanning tree of a connected graph. Remember that a spanning tree of an undirected connected graph G with weighted edges is a subgraph that contains all nodes of G and is a tree. A minimum spanning tree (MST) of G is a spanning tree such that the sum of all edge weights is minimum. For example, Figure 1 shows a graph and its minimum spanning tree.
Figure 1: A weighted graph and its minimum spanning tree

MSTs are useful for building networks between a set of nodes while minimising the cost. Such networks include communication, transport, piping, and computer networks. They also provide time-efficient approximations to hard problems such as the travelling salesperson problem or the Steiner tree problem [20].

Classical algorithms for finding MSTs given by Prim, Kruskal, and Boruvka all run in time O(m log n), where m is the number of edges and n is the number of nodes [6]. However, to reach this time bound, the algorithms of Prim and Kruskal need data structures such as binary heaps or union-find data structures. In contrast, Boruvka's algorithm can be implemented efficiently without fancy data structures. Hence we choose to implement this algorithm in GP 2.

In Section 3, we give the GP 2 program mst-boruvka, which is based on depth-first search and rooted graph transformation. In Section 4 we give execution time measurements as evidence that on bounded degree graphs, the program's complexity is consistent with the O(m log n) time bound of implementations of Boruvka's algorithm in conventional languages.

This paper is a revised and extended version of [10].

This section briefly introduces GP 2, a graph transformation language, first defined in [15]. Up-to-date versions of the syntax and semantics of GP 2 can be found in [3]. The language is implemented by a compiler generating C code [5, 9].
GP 2 programs transform input graphs into output graphs, where graphs are directed and may contain parallel edges and loops. Both nodes and edges are labelled with lists consisting of integers and character strings. This includes the special case of items labelled with the empty list, which may be considered as "unlabelled".

The principal programming construct in GP 2 consists of conditional graph transformation rules labelled with expressions. For example, the rule min_s in Figure 8 has three formal parameters of type list, two of type int, a left-hand graph and a right-hand graph which are specified graphically, and a textual condition starting with the keyword where. The small numbers attached to nodes are identifiers; all other text in the graphs consists of labels. Parameters are typed. In this paper we need the most general type list, which represents lists with arbitrary values, and int, which represents integers.

Besides carrying expressions, nodes and edges can be marked red, green or blue. In addition, nodes can be marked grey and edges can be dashed. For example, rule root_current in Figure 4 contains red and unmarked nodes and a red edge. Marks are convenient, among other things, to record visited items.

Rules operate on host graphs, which are labelled with constant values (lists containing integers and character strings). Formally, the application of a rule to a host graph is defined as a two-stage process in which first the rule is instantiated by replacing all variables with values of the same type, and evaluating all expressions. This yields a standard rule (without expressions) in the so-called double-pushout approach with relabelling [13]. In the second stage, the instantiated rule is applied to the host graph by constructing two suitable pushouts.
We refer to [3] for details and only give an equivalent operational description of rule application.

Applying a rule L ⇒ R to a host graph G works roughly as follows: (1) Replace the variables in L and R with constant values and evaluate the expressions in L and R, to obtain an instantiated rule ˆL ⇒ ˆR. (2) Choose a subgraph S of G isomorphic to ˆL such that the dangling condition and the rule's application condition are satisfied (see below). (3) Replace S with ˆR as follows: numbered nodes stay in place (possibly relabelled), edges and unnumbered nodes of ˆL are deleted, and edges and unnumbered nodes of ˆR are inserted.

In this construction, the dangling condition requires that nodes in S corresponding to unnumbered nodes of ˆL (which should be deleted) must not be incident with edges outside S. The rule's application condition is evaluated after variables have been replaced with the corresponding values of ˆL, and node identifiers of L with the corresponding identifiers of S. For example, the condition i < j of rule min_s in Figure 8 requires that the integer label of one matched edge in S is smaller than that of the other, where the nodes of S correspond to the numbered nodes of L.

A program consists of declarations of conditional rules and procedures, and exactly one declaration of a main command sequence, which is a distinct procedure named Main. Procedures must be non-recursive; they can be seen as macros. We describe GP 2's main control constructs.

The call of a rule set {r1, ..., rn} non-deterministically applies one of the rules whose left-hand graph matches a subgraph of the host graph such that the dangling condition and the rule's application condition are satisfied. The call fails if none of the rules is applicable to the host graph.

The command if C then P else Q is executed on a host graph G by first executing C on a copy of G.
If this results in a graph, P is executed on the original graph G; otherwise, if C fails, Q is executed on G. The command try C then P else Q has a similar effect, except that P is executed on the result of C's execution. If then P or else Q are omitted, no additional command is executed in the missing cases.

The loop command P! executes the body P repeatedly until it fails. When this is the case, P! terminates with the graph on which the body was entered for the last time. The break command inside a loop terminates that loop and transfers control to the command following the loop.

In general, the execution of a program on a host graph may result in different graphs, fail, or diverge. The operational semantics of GP 2 defines a semantic function which maps each host graph to the set of all possible outcomes. See, for example, [16].

The bottleneck for efficiently implementing algorithms in a language based on graph transformation rules is the cost of graph matching. In general, matching the left-hand graph L of a rule within a host graph G requires time polynomial in the size of G [4, 5]. As a consequence, linear-time graph algorithms in imperative languages may be slowed down to polynomial time when they are recast as rule-based programs.

To speed up matching, GP 2 supports rooted graph transformation, where graphs in rules and host graphs are equipped with so-called root nodes. Roots in rules must match roots in the host graph, so that matches are restricted to the neighbourhood of the host graph's roots. We draw root nodes using double circles. For example, in the rule root_current of Figure 4, the double-circled nodes are roots.

Rooted graph matching can be implemented to run in constant time under mild conditions, provided there are upper bounds on the maximal node degree and the number of roots in host graphs [4].
In this section, we take a look at Boruvka's algorithm and its implementation in GP 2. We go through an example execution of the program mst-boruvka in Subsection 3.1 in order to give an intuitive understanding of the program and how it relates to the algorithm. Subsections 3.2, 3.3, 3.4, 3.5, and 3.6 contain the program itself and its description.

Prim's, Kruskal's, and Boruvka's algorithms for computing MSTs can all be implemented to run in O(m log n) time, where m is the number of edges, and n the number of nodes. However, Prim's algorithm needs binary heaps to achieve it, and Kruskal's algorithm the union-find data structure [6]. The advantage of Boruvka's algorithm is that it does not need fancy data structures to reach that time complexity bound [20]. GP 2 has no predefined data structures except for the host graph that it transforms. Any additional data structures need to be encoded in the host graph itself, which can make a program tricky to read. Hence we choose to implement Boruvka's algorithm in GP 2.

Algorithm 1 shows pseudocode for Boruvka's algorithm. Although it cannot be translated directly into GP 2, it is a suitable starting point for the development of a GP 2 program.

Algorithm 1
Boruvka’s MST algorithm on an input graph G Preprocess : initialise the spanning forest F to be the nodes of G while F consists of more than one tree do for each tree T in F do FindEdge : select a minimum weight edge between T and G − T , prioritising already selectededges if they are minimum end for GrowForest : add the selected edges to F end while The idea of Boruvka’s algorithm is to initialise a forest as the nodes of the input graph without anyedges, and to grow that forest by adding minimum-weight edges from between its connected componentsuntil it becomes a minimum spanning tree of the input graph.As illustrated in Figure 2, the input of mst-boruvka is a connected graph with unmarked nodes andedges. Nodes are unlabelled, and edges have integer labels. In the output, the subgraph induced by theblue edges are a minimum spanning tree of the input. The additional root with label 1 is an auxiliaryconstruct used in the execution of the program (which could be removed in constant time). . Courtehoute & D. Plump ⇒ ∗ mst-boruvka 1 Figure 2: Example input and output of mst-boruvka
Throughout the execution of mst-boruvka, the graph induced by the blue edges is a subgraph of the minimum spanning tree highlighted in the output. We shall call this forest F, and its connected components its trees. Let us explore how mst-boruvka executes using the example in Figure 3, and compare it to the pseudocode in Algorithm 1. The Main procedure of mst-boruvka is depicted in Figure 4.

[Figure 3 shows the intermediate graphs produced by Preprocess, TreesLoop!, GrowForest, and Rewind! over two rounds of the main loop.]
Figure 3: Example execution of mst-boruvka
The procedure Preprocess initialises the forest F to be just the nodes of the input (see line 1 of the pseudocode). It also sets up a linked list of red edges and red nodes that helps the program loop over the trees of F efficiently. Each tree of F is represented by exactly one of its nodes being an entry in the linked list. Additionally, there is a pointer in the form of an unmarked root node with an outgoing red edge towards the "current" node in the linked list. The pointer also stores the number of trees the forest has in order to efficiently check whether only one tree is left, terminating the main loop (see line 2 of the pseudocode).

The loop TreesLoop! moves the pointer through the nodes of the linked list, effectively looping over the trees of F (see line 3 of the pseudocode). On each tree T, the procedure FindEdge is called, which selects a minimum weight edge between T and its complement in the host graph by marking it green (see line 4 of the pseudocode). If there is already an adjacent green edge with minimum weight, no new edge is selected since that could introduce a cycle into F. To ensure that only one node of each tree is part of the list, the current tree gets marked for deletion from the list using a red loop under certain conditions. Subsection 3.6 elaborates on this.

The procedure GrowForest adds the selected edges to F by turning green edges into blue ones (see line 6 of the pseudocode).

The loop Rewind! serves to maintain the linked list. It moves the pointer back to the beginning of the list. On the way, it removes nodes that have been marked for deletion with a red loop. It also decrements the pointer's label each time it encounters such a node, since that node's tree has been merged with another tree.

mst-boruvka
The program mst-boruvka is depicted in Figure 4. Most of it has been explained by the example execution in Subsection 3.1. Let us now examine the loop TreesLoop!.

Main = Preprocess; Loop!
Loop = if one_tree then break else Body
Body = TreesLoop!; GrowForest; Rewind!
TreesLoop = root_current; TraverseTree; MarkForDeletion; CleanUp;
            try next_tree else break
TraverseTree = ColourBlue; FindEdge
CleanUp = ColourRed; unroot_red!

[Rule diagrams for one_tree, root_current, next_tree, and unroot_red omitted.]

Figure 4: The GP 2 program mst-boruvka
The purpose of the loop TreesLoop! is to find a minimum weight edge from each tree to its complement and mark it green. It initialises by rooting the node the pointer points to. Then that node's tree is marked blue with the procedure ColourBlue so it can easily be distinguished from the rest of the graph. FindEdge then finds the minimum edge from the tree to its complement. The procedure MarkForDeletion marks the tree for deletion if it will be merged with another one. The procedure ColourRed makes the nodes of the tree red again. The command unroot_red! unroots any red roots. The rule next_tree then moves the pointer to the next entry in the linked list.
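The list bookkeeping that TreesLoop! and Rewind! share can be pictured with a small Python analogy (our own sketch; GP 2 encodes all of this in the host graph itself, and `select_edge` is an invented stand-in for the combined effect of FindEdge and MarkForDeletion).

```python
def trees_loop(reps, select_edge):
    """One round over the list of tree representatives. `select_edge(rep)`
    returns True if the tree keeps its list entry (it selected an edge that
    another tree had already selected), and False if the representative is
    marked for deletion because its tree merges into another one."""
    survivors = []
    count = len(reps)              # the pointer's label: number of trees
    for rep in reps:               # root_current; ...; next_tree
        if select_edge(rep):
            survivors.append(rep)  # stays an entry in the linked list
        else:
            count -= 1             # red loop added; Rewind! removes the entry
    return survivors, count
```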
Preprocess
The procedure Preprocess, depicted in Figure 5, uses depth-first search (DFS) to construct the linked list and the pointer. An example of its input and output can be seen in Figure 3.

The rule pre_init initialises some node of the input to be the starting point of the DFS, and constructs the pointer. Since initially each node is its own tree, the pointer's label will count the number of nodes encountered during the DFS. Red nodes are considered to be discovered by the DFS, and unmarked nodes undiscovered.

The rules pre_forward1 and pre_forward2 are called non-deterministically. They both move the red root to an adjacent unmarked node. The rules contain bidirectional edges (without arrowheads) that can be matched in either orientation. Such a rule is a shorthand for a non-deterministic call of copies of the same rule whose bidirectional edges have been replaced with directed edges in all possible combinations of orientation. The dashed edge serves as a way to keep track of the path the DFS has taken, which is backtracked by the rule pre_back. The backtracking enables the "forward" rules to find new undiscovered nodes again.

The rules pre_forward1 and pre_forward2 also increment the counter and construct the linked list of red edges. The reason we need both rules is to cover both cases of whether the newest entry of the list is also the current red root or not.

Preprocess = pre_init; PreLoop!; unroot_red
PreLoop = PreForward!; try pre_back else break
PreForward = {pre_forward1, pre_forward2}

[Rule diagrams for pre_init, pre_back, pre_forward1, and pre_forward2 omitted.]

Figure 5: The procedure Preprocess
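For intuition only, the bookkeeping performed by Preprocess can be mimicked in a few lines of Python (our own sketch; the function name and data representation are invented): a DFS that records every discovered node, mirroring the linked list of initial one-node trees, and counts them, mirroring the pointer's label.

```python
def preprocess(adj):
    """Depth-first search over an adjacency map (node -> set of neighbours).
    Returns the discovery order of the nodes (the 'linked list' of initial
    one-node trees) and their count (the pointer's label)."""
    start = next(iter(adj))        # pre_init: pick some node to start from
    visited = {start}              # red nodes: discovered by the DFS
    order = [start]
    path = [start]                 # the dashed path used for backtracking
    while path:
        current = path[-1]
        fresh = [v for v in adj[current] if v not in visited]
        if fresh:                  # pre_forward1/2: move to a fresh node
            visited.add(fresh[0])
            order.append(fresh[0])
            path.append(fresh[0])
        else:                      # pre_back: retreat along the dashed path
            path.pop()
    return order, len(order)
```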
FindEdge
The procedure FindEdge, depicted in Figures 7 and 8, serves to find a minimum-weight edge between the current tree (blue nodes) and the rest of the graph (red nodes) using DFS, and to mark it green. If among said minimum edges is an already selected (green) one, it stays selected, and no additional edge is selected for the current tree. If this were not the case, the selected edges could form a cycle (on a 3-cycle whose edges have equal weight, for instance), causing the output MST not to be a tree.

Let us examine the example execution of FindEdge in Figure 6. It is part of the transition from the fifth to the sixth graph labelled TreesLoop! in the example execution of mst-boruvka in Figure 3. We start with a graph where the current tree has blue nodes to distinguish it from the rest of the graph. This was done using the procedure ColourBlue, which is always called before FindEdge, as defined in the procedure TraverseTree in Figure 4. The nodes of the tree are turned grey, but are still distinguishable from the rest of the graph, which has red nodes.

Figure 6: Example execution of FindEdge

The procedure FindEdge starts by turning the blue root grey, and creating a green root which serves as a flag indicating whether the minimum edge has been initialised yet. The flag is 1 if initialisation has already happened, and 0 otherwise.

We enter the loop FindLoop! and apply find_forward to move the root along in the current tree in a depth-first fashion. The flag is not yet set to 1, so we call MinSetup to initialise the minimum edge using min_init1. The rule min_init2 exists in case the grey root's only incident edge has already been selected (marked green) when the procedure FindEdge was applied to a different tree. (Compared to [10], we added several "min" rules to cover all cases.) An edge selected by both this tree and another tree is represented with a label that is a list consisting of the edge weight followed by a 0. The currently selected minimum edge of the current tree is represented by a green edge incident to a grey as well as a red root.

We then enter the procedure Success, which minimises the weight among the unmarked edges incident to the current grey root using the procedure
MinWithS (which only calls rules that minimise edges incident to the current grey root), and then applies set_flag to indicate that the initialisation of the minimum edge is complete.

Next, the rule find_back moves the grey root back through the tree in depth-first fashion. We then enter the next iteration of FindLoop!. The rule find_forward cannot be applied, so we continue with the loop Minimise! since the flag has already been set.

The purpose of the loop Minimise! is to find an edge incident to the grey root with a smaller weight than the currently selected edge. There are 14 different cases we have to distinguish with the rules that update the minimum edge. They can be seen as combinations of the presence or absence of four flags s, t, n, and p in the rule names. The flag s is present if the new and previous minimum edge share their "source", i.e. the incident grey node in the current tree. The flag t is present if the new and previous minimum edge share their "target", i.e. the incident red node outside of the current tree. The flag n denotes that the new minimum edge is also a selected minimum edge of a different tree from a previous call of FindEdge. The presence of flag p indicates that the previous minimum edge has already been selected for a different tree. These edges are denoted by a 0 being appended to their label. They need to be distinguished since their green mark needs to be preserved in order for the program to work correctly.

The "min" rules with both the s and t flags, i.e. the ones minimising parallel edges, are a special case. We omit the cases that involve previously selected edges (flags n or p) since such an edge would have already been minimised over its parallel edges by previous applications of min1_st and min2_st. We use two rules with directed edges labelled j instead of one rule with a bidirectional edge labelled j because parallel bidirectional edges are disallowed by GP 2. This is because, if the parallel bidirectional edges are indistinguishable in the left-hand side of a rule, the result of the rule application is not necessarily unique up to isomorphism, since it could leave the host graph with an edge in one of two possible directions.

In order to prioritise edges that have already been selected for different trees, we call the rules of the procedure MinWithN first. They consist of the rules with flag n. We can then call the rest of the rules
with the procedure MinWithoutN.

FindEdge = find_init; create_flag; FindLoop!; destroy_flag
FindLoop = find_forward!; if flag then Minimise! else (try MinSetup);
           try find_back else break
MinSetup = try min_init2 then Success else (try min_init1 then Success)
Success = MinWithS!; set_flag
Minimise = try MinWithN else MinWithoutN
MinWithS = {min_s, min_sn, min_sp, min_snp, min1_st, min2_st}
MinWithN = {min_n, min_np, min_sn, min_snp, min_tn, min_tnp}
MinWithoutN = {min, min_p, min_s, min_sp, min_t, min_tp, min1_st, min2_st}

[Rule diagrams for find_init, min_init1, min_init2, create_flag, set_flag, flag, destroy_flag, find_forward, find_back, min1_st, and min2_st omitted.]

Figure 7: The procedure FindEdge

[Rule diagrams for the remaining "min" rules omitted, each with a where-condition comparing the edge weights i and j.]

Figure 8: The procedure FindEdge (continued)

Note that min_sn, min_tn, and min_n (i.e. "min" rules with n but not p) are the only rules that can be applied if the weights are equal. This is because they are the only ones selecting a previously selected edge, which we prioritise. Making the other rules applicable on equal weights can lead to non-termination. In our example, min is applied.

Finally, the DFS terminates and the rule destroy_flag deletes the temporary flag needed for this procedure. The flag could have been implemented as an additional list entry of the unmarked root, but was chosen to be its own green root for the sake of semantic clarity.

GrowForest

The procedure GrowForest, depicted in Figure 9, serves to turn the edges selected by FindEdge (green mark) into edges of the forest (blue mark), thus merging some of the trees. Graphs 6 and 7 in the example execution of mst-boruvka in Figure 3 exemplify input and output of GrowForest.

GrowForest = grow_init; GrowLoop!; GrowClean!; unroot_red
GrowLoop = GrowTree!;
try next_root else break
GrowTree = down!; add_edge!; try up else break
GrowClean = try ColourRed; try previous_root else break

[Rule diagrams for grow_init, down, add_edge, up, next_root, and previous_root omitted.]

Figure 9: The procedure GrowForest

The procedure GrowForest traverses the graph by iterating through the list of trees, and conducting a DFS on each tree. next_root helps iterate through the list in the direction opposite to the orientation of the red edges.

The rules down and up play the roles of forward and back in a DFS. They use blue edges to ensure only the current tree is traversed. add_edge! is called right before up to turn all green edges adjacent to the grey root blue. After the up rule is applied to a grey root, it is not visited again by the DFS, ensuring the new blue edges will not be traversed. Future DFSs will also not traverse these edges since one of their adjacent nodes is grey.

The loop GrowClean! iterates through the list of trees in the direction opposite to GrowLoop! and calls ColourRed on each tree to mark the nodes red again.

The program mst-boruvka calls several procedures to maintain the list data structure or to prepare the graph for the next step. Now we describe these procedures: ColourBlue in Figure 10, ColourRed in Figure 11, MarkForDeletion in Figure 12, and Rewind in Figure 13.
ColourBlue = blue_init; BlueLoop!
BlueLoop = blue_forward!; try blue_back else break

[Rule diagrams for blue_init, blue_forward, and blue_back omitted.]

Figure 10: The procedure ColourBlue

ColourRed = red_init; RedLoop!
RedLoop = red_forward!; try red_back else break

[Rule diagrams for red_init, red_forward, and red_back omitted.]

Figure 11: The procedure ColourRed

The procedure ColourBlue uses DFS to turn the nodes of a tree from red to blue, and the procedure ColourRed to turn the nodes of a tree from grey to red.

MarkForDeletion = try clean else Mark; unroot_red
Mark = if red_loop then skip else add_loop

[Rule diagrams for clean, red_loop, and add_loop omitted.]

Figure 12: The procedure MarkForDeletion

The procedure MarkForDeletion determines whether the current tree needs to be removed from the list of trees or not. This needs to be done when the current tree is being merged with another tree in the procedure GrowForest. However, in a set of trees that are merged into one tree, exactly one of them needs to be kept as an entry in the list. This is done by exploiting the fact that exactly one of the green edges used to merge that set of trees must have been selected by two different trees. If none of the edges fulfilled that condition, the merging would introduce a cycle. If multiple edges fulfilled it, the trees would be merged into a forest, and not a single tree. Hence the trees that select a previously selected edge are kept as entries in the list. The rule clean easily detects these edges since their label is a list of the edge weight followed by a 0.
Rewind = try remove_mid else RemoveEnd
RemoveEnd = try {remove_top, remove_bottom} else keep

[Rule diagrams for remove_mid, remove_top, remove_bottom, and keep omitted.]

Figure 13: The procedure Rewind

The procedure Rewind returns the pointer to the beginning of the list of trees. On the way, it removes list entries marked for deletion with a red loop, and updates the pointer's label, which represents the number of trees in the list.

On the graph classes we tested, time measurements as illustrated in Figure 14 show subquadratic growth on square grids and fixed degree wheels, and polynomial growth on unbounded degree wheels. The execution time of the program mst-boruvka has been measured on square grids, fixed degree wheels, and unbounded degree wheels. The kth square grid is a k × k grid graph as depicted in Figure 3. Figure 2 depicts a wheel graph with 8 spokes. The kth fixed degree wheel is a wheel graph with 16 spokes, each of which consists of a path graph with k edges. The kth unbounded degree wheel is a wheel graph with k spokes.

The edge weights of the input graphs are randomly generated integers between 1 and 1000. The number of nodes of the square grids and fixed degree wheels ranges up to over 100000, and that of the unbounded degree wheels to almost 35000. For each graph of a given size, the execution time depicted with shapes is the average execution time of mst-boruvka on copies of that graph with at least 20 random weight distributions. The bars around those data points show the range between the minimum and maximum measured execution time for that graph. The extent of that range can be attributed to differing random weight distributions used for each time measurement. With a fixed weight distribution, that range is much smaller.

Figure 14a shows that mst-boruvka is subquadratic and close to linear on fixed degree wheels and square grids.
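One simple way to judge whether such measurements are consistent with O(m log n) rather than quadratic growth is to estimate the growth exponent from a log–log fit of the timings. The sketch below is our own illustration (with synthetic stand-in data; the paper's actual timing figures are in the linked repository): times growing like n log n yield a fitted exponent only slightly above 1, while quadratic times yield an exponent of 2.

```python
import math

def growth_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size): roughly 1 for
    linear growth, about 1.1 for n log n, and 2 for quadratic growth."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic stand-in for timings that actually grow like n log n:
sizes = [1000, 5000, 10000, 50000, 100000]
times = [n * math.log(n) for n in sizes]
```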
We expect the time complexity to be O(m log n), where m is the number of edges and n the number of nodes, akin to that of standard implementations of Boruvka's MST algorithm [20]. However, proving it is left as future work. Note that, in order to reach this time complexity in GP 2, the use of root nodes is necessary.

[Figure 14: Execution times of mst-boruvka. Panel (a): bounded degree graphs (square grids and fixed degree wheels); panel (b): wheels (unbounded degree wheels and fixed degree wheels).]

In Figure 14b, mst-boruvka is seen to be of an order worse than m log n on unbounded degree wheels. In fact, we conjecture it to be quadratic. GP 2 programs that are non-destructive in that they preserve the input graph seem to require at least quadratic time on unbounded degree input graphs. For example, consider MinWithN! seen in Figures 7 and 8. In each case, it has to match a root (say u) and an adjacent non-root (say v) as long as possible. Assume that in the host graph, a root that is a valid match for u has a linear number of adjacent nodes, all of which are a valid match for v. Assume that the first time MinWithN is called, the node with the highest edge weight is matched as v. The program only needs to check one node since every node is a valid match. Then assume that the second time MinWithN is called, the node with the next highest edge weight is matched. In the worst case, two nodes have to be checked for a valid match. Summing up the numbers of nodes that are checked if we continue this pattern, we get a sum of consecutive integers with a linear number of terms, which is quadratic.
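Written out, if the root has on the order of cn neighbours (for some constant c > 0) and the k-th call has to check up to k of them, the total matching work is a triangular sum:

```latex
\sum_{k=1}^{cn} k \;=\; \frac{cn\,(cn+1)}{2} \;\in\; \Theta(n^2).
```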
Hence the quadratic time complexity.

Furthermore, procedures that are based on depth-first search and preserve their input, such as the procedure ColourBlue in Figure 10, have quadratic time complexity on unbounded degree graphs. The rule blue_back looks for a dashed edge around the blue root. Assume the blue root has a degree that is linear in the number of edges of the graph class the input graph belongs to. Only one of its adjacent edges can be dashed, since the dashed edges form a path from the blue root to the origin of the DFS. Since there is only one valid match for the dashed edge, the rule application takes linear time. Every node of the input has to play the role of the blue root in blue_back at some point.

The execution time on square grids is slower than that on fixed degree wheels by a constant factor. This is likely due to the fact that a large part of fixed degree wheels consists of path graphs, in which separate trees often share a minimum edge. So MinWithN is applied more often in fixed degree wheels than in square grids. Hence more rules (those of MinWithN) generally have to be called in square grids.

The time measurements were taken on a Lenovo Thinkpad T460 (2.4 GHz Intel Core i5, 16 GB RAM) running Manjaro Linux, using the Python 3.8.3 time module. Exact figures of the time measurements can be found at https://github.com/BrianCourtehoute/BrianCourtehoute.github.io/tree/master/PaperFiles/2020-06/Timings, and the source program at https://github.com/BrianCourtehoute/BrianCourtehoute.github.io/blob/master/PaperFiles/2020-06/Code/mst-boruvka.gp2.

This paper features an implementation of Boruvka's algorithm for computing minimum spanning trees in the rule-based graph programming language GP 2. We have presented empirical evidence that its time complexity matches the O(m log n) bound of Boruvka's algorithm implemented in conventional programming languages, where m is the number of edges and n the number of nodes.
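The cost of a single blue_back application can be illustrated in the same spirit. The sketch below (our own names, a plain array standing in for the incident edges of the blue root) counts how many incident edges a matcher that tries them one by one inspects before hitting the unique dashed edge; in the worst case this is the full degree.

```c
/* The blue root has 'degree' incident edges, exactly one of which
 * (at position 'pos') is dashed. A matcher trying incident edges one
 * by one performs pos+1 checks -- linear in the degree when the
 * dashed edge happens to be tried last. */
long checks_to_find_dashed(long degree, long pos) {
    long checks = 0;
    for (long i = 0; i < degree; i++) {
        checks++;
        if (i == pos) break;      /* found the unique dashed edge */
    }
    return checks;
}
```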
Furthermore, we have given empirical evidence for the program's non-linear time complexity on a graph class of unbounded degree.

The program is longer than its equivalent in conventional languages. The C implementation of Boruvka's algorithm presented by Sedgewick [18, 19], shown in Appendix A, has 62 lines of code (not counting lines consisting only of brackets). A textual representation of the GP 2 program has 330 lines using a similar counting method. This is because, as an experimental language with a small syntax, GP 2 has no built-in data structures yet. Every data structure has to be implemented in the host graph itself. The procedures Preprocess and Rewind, for instance, only serve the purpose of creating and maintaining a list of trees.

Alternatively, one could omit counting the rule definitions in the GP 2 program, since rules are the basic operations of GP 2, and definitions of basic operations are not included in the line count of the C program. In that case, we count 30 lines in the GP 2 program, which is much more comparable to the length of the C code.

Due to the large number of procedures and rules, it can be rather challenging to understand how the program mst-boruvka operates while reading it (there seems to be a trade-off between efficiency and readability). This goes against the GP 2 design philosophy of facilitating formal reasoning, since proving soundness is not obvious. Indeed, giving a correctness proof in a formal proof system as in [21] would be a major undertaking. Hence the proof we plan to provide will not be in a formal proof system.

For the immediate future, we plan to write a longer version of this paper in which we prove that O(m log n) is indeed a time bound of mst-boruvka on bounded degree graphs, and that the program indeed produces a minimum spanning tree of its input.

Another goal is to find more graph algorithms that can be implemented in GP 2 to reach their classical time bounds.
We also plan to expand our technique for giving time measurements of a program's execution by comparing the timings of a GP 2 program to those of C code, and by using randomly generated inputs. Since the latter can produce wildly different timings, however, it is not obvious how to do this in a sensible manner.

Finally, there is the open problem of how to create GP 2 programs that are efficient on unbounded degree graphs. There are GP 2 reduction programs in as yet unpublished work [7] that are efficient on arbitrary inputs. They operate by repeatedly removing nodes and edges from the host graph. This is a sensible approach for recognising whether a graph belongs to a certain graph class. However, when the purpose of a program is to produce a structure based on the input, such as a minimum spanning tree or a topological sorting, this approach is not viable. It is not yet clear how to make these non-destructive programs efficient on arbitrary inputs.

References

[1] Aditya Agrawal, Gabor Karsai, Sandeep Neema, Feng Shi & Attila Vizhanyo (2006): The design of a language for model transformations. Software and System Modeling.

[2] Thorsten Arendt, Enrico Biermann, Stefan Jurack, Christian Krause & Gabriele Taentzer (2010): Henshin: Advanced Concepts and Tools for In-Place EMF Model Transformations. In: Model Driven Engineering Languages and Systems (MODELS 2010), Lecture Notes in Computer Science, Springer.

[3] Christopher Bak (2015): GP 2: Efficient Implementation of a Graph Programming Language. Ph.D. thesis, Department of Computer Science, University of York. Available at http://etheses.whiterose.ac.uk/12586/.

[4] Christopher Bak & Detlef Plump (2012): Rooted Graph Programs. In: Proc. International Workshop on Graph Based Tools (GraBaTs 2012), Electronic Communications of the EASST.

[5] Christopher Bak & Detlef Plump (2016): Compiling Graph Programs to C. In: Proc. International Conference on Graph Transformation (ICGT 2016), LNCS, Springer.

[6] Cüneyt F. Bazlamaçcı & Khalil S. Hindi (2001): Minimum-weight spanning tree algorithms: A survey and empirical study. Computers & Operations Research.

[7] Fast Rule-Based Graph Programs.
Work in progress.

[8] Graham Campbell, Brian Courtehoute & Detlef Plump (2019): Linear-Time Graph Algorithms in GP 2. In: Proceedings 8th Conference on Algebra and Coalgebra in Computer Science (CALCO 2019), Leibniz International Proceedings in Informatics (LIPIcs), Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, pp. 16:1–16:23, doi:10.4230/LIPIcs.CALCO.2019.16.

[9] Graham Campbell, Jack Romö & Detlef Plump (2020): The Improved GP 2 Compiler. ArXiv e-prints arXiv:2010.03993. Available at https://arxiv.org/abs/2010.03993. 11 pages.

[10] Brian Courtehoute & Detlef Plump (2020): A Fast Graph Program for Computing Minimum Spanning Trees. In: Proc. 11th International Workshop on Graph Computation Models (GCM 2020), pp. 165–183. Pre-proceedings version of this paper.

[11] Maribel Fernández, Hélène Kirchner, Ian Mackie & Bruno Pinaud (2014): Visual Modelling of Complex Systems: Towards an Abstract Machine for PORGY. In: Proc. Computability in Europe (CiE 2014), Lecture Notes in Computer Science, Springer.

[12] Amir Hossein Ghamarian, Maarten de Mol, Arend Rensink, Eduardo Zambon & Maria Zimakova (2012): Modelling and analysis using GROOVE. International Journal on Software Tools for Technology Transfer.

[13] Annegret Habel & Detlef Plump (2002): Relabelling in Graph Transformation. In: Proc. International Conference on Graph Transformation (ICGT 2002), Lecture Notes in Computer Science, Springer.

[14] Edgar Jakumeit, Sebastian Buchwald & Moritz Kroll (2010): GrGen.NET - The expressive, convenient and fast graph rewrite system. International Journal on Software Tools for Technology Transfer.

[15] Detlef Plump (2012): The Design of GP 2. In: Proc. Workshop on Reduction Strategies in Rewriting and Programming (WRS 2011), Electronic Proceedings in Theoretical Computer Science 82, pp. 1–16, doi:10.4204/EPTCS.82.1.

[16] Detlef Plump (2017): From Imperative to Rule-based Graph Programs. Journal of Logical and Algebraic Methods in Programming 88, pp. 154–173, doi:10.1016/j.jlamp.2016.12.001.

[17] Olga Runge, Claudia Ermel & Gabriele Taentzer (2012): AGG 2.0 - New Features for Specifying and Analyzing Algebraic Graph Transformations. In: Proc. Applications of Graph Transformations with Industrial Relevance (AGTIVE 2011), Lecture Notes in Computer Science, Springer.

[18] Robert Sedgewick (1998): Algorithms in C: Parts 1-4, Fundamentals, Data Structures, Sorting, and Searching, 3rd edition. Addison-Wesley.

[19] Robert Sedgewick (2001): Algorithms in C, Part 5: Graph Algorithms, 3rd edition. Addison-Wesley.

[20] Steven S. Skiena (2008): The Algorithm Design Manual, 2nd edition. Springer, doi:10.1007/978-1-84800-070-4.

[21] Gia Wulandari & Detlef Plump (2020): Verifying Graph Programs with First-Order Logic. In: Graph Computation Models (GCM 2020), Revised Selected Papers, Electronic Proceedings in Theoretical Computer Science. This volume.

A Boruvka's Algorithm in C

In this appendix, we include Sedgewick's C implementation of Boruvka's algorithm for the sake of comparison [18, 19].

Listing 1 contains the program computing a minimum spanning tree [19], which uses Union-Find as well as an adjacency list implementation of weighted graphs.

    Edge nn[maxV], a[maxE];

    void GRAPHmstE(Graph G, Edge mst[])
    { int h, i, j, k, v, w, N;
      Edge e;
      int E = GRAPHedges(a, G);
      for (UFinit(G->V); E != 0; E = N)
      { for (k = 0; k < G->V; k++)
          nn[k] = EDGE(G->V, G->V, maxWT);
        for (h = 0, N = 0; h < E; h++)
        { i = find(a[h].v); j = find(a[h].w);
          if (i == j) continue;
          if (a[h].wt < nn[i].wt) nn[i] = a[h];
          if (a[h].wt < nn[j].wt) nn[j] = a[h];
          a[N++] = a[h];
        }
        for (k = 0; k < G->V; k++)
        { e = nn[k]; v = e.v; w = e.w;
          if ((v != G->V) && !UFfind(v, w))
          { UFunion(v, w); mst[k] = e; }
        }
      }
    }

Listing 1: Boruvka's Algorithm

Listing 2 shows the interface of Union-Find [18].

    void UFinit(int);
    int UFfind(int, int);
    void UFunion(int, int);

Listing 2: Union-Find ADT interface

Union-Find itself is shown in Listing 3 [18].
    static int *id, *sz;

    void UFinit(int N)
    { int i;
      id = malloc(N*sizeof(int));
      sz = malloc(N*sizeof(int));
      for (i = 0; i < N; i++)
      { id[i] = i; sz[i] = 1; }
    }

    int find(int x)
    { int i = x;
      while (i != id[i]) i = id[i];
      return i;
    }

    int UFfind(int p, int q)
    { return (find(p) == find(q)); }

    void UFunion(int p, int q)
    { int i = find(p), j = find(q);
      if (i == j) return;
      if (sz[i] < sz[j])
      { id[i] = j; sz[j] += sz[i]; }
      else
      { id[j] = i; sz[i] += sz[j]; }
    }

Listing 3: Union-Find ADT implementation

Listing 4 contains the implementation of weighted graphs using adjacency lists [19].

    typedef struct node *link;
    struct node { int v; double wt; link next; };
    struct graph { int V; int E; link *adj; };

    link NEW(int v, double wt, link next)
    { link x = malloc(sizeof *x);
      x->v = v; x->wt = wt; x->next = next;
      return x;
    }

    Graph GRAPHinit(int V)
    { int i;
      Graph G = malloc(sizeof *G);
      G->adj = malloc(V*sizeof(link));
      G->V = V; G->E = 0;
      for (i = 0; i < V; i++) G->adj[i] = NULL;
      return G;
    }

    void GRAPHinsertE(Graph G, Edge e)
    { link t; int v = e.v, w = e.w;
      if (v == w) return;
      G->adj[v] = NEW(w, e.wt, G->adj[v]);
      G->adj[w] = NEW(v, e.wt, G->adj[w]);
      G->E++;
    }

Listing 4: Weighted graph implementation using adjacency lists
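To make the control flow of Listing 1 easy to experiment with, here is a compact, self-contained variant of it. The edge-list representation, the fixed-size arrays, and all names are ours, not Sedgewick's; it combines the Boruvka passes with weighted quick-union and returns the total weight of a minimum spanning tree.

```c
#include <assert.h>

enum { NV = 5, MAXWT = 1000000 };   /* vertex count and weight sentinel */

struct edge { int v, w, wt; };

static int id[NV], sz[NV];

static int uf_find(int x) { while (x != id[x]) x = id[x]; return x; }

static void uf_union(int p, int q) {
    int i = uf_find(p), j = uf_find(q);
    if (i == j) return;
    if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
    else               { id[j] = i; sz[i] += sz[j]; }
}

/* Total MST weight via the same passes as Listing 1: each round finds,
 * for every current tree, its cheapest outgoing edge, then merges. */
int boruvka_weight(struct edge a[], int E) {
    struct edge nn[NV];
    int total = 0;
    for (int k = 0; k < NV; k++) { id[k] = k; sz[k] = 1; }
    int N;
    for (; E != 0; E = N) {
        for (int k = 0; k < NV; k++)
            nn[k] = (struct edge){NV, NV, MAXWT};   /* sentinel edge */
        N = 0;
        for (int h = 0; h < E; h++) {
            int i = uf_find(a[h].v), j = uf_find(a[h].w);
            if (i == j) continue;           /* both ends in one tree */
            if (a[h].wt < nn[i].wt) nn[i] = a[h];
            if (a[h].wt < nn[j].wt) nn[j] = a[h];
            a[N++] = a[h];                  /* keep cross-tree edges */
        }
        for (int k = 0; k < NV; k++) {
            struct edge e = nn[k];
            if (e.v != NV && uf_find(e.v) != uf_find(e.w)) {
                uf_union(e.v, e.w);
                total += e.wt;              /* e joins the MST */
            }
        }
    }
    return total;
}
```

Storing edge values (not indices) in `nn` matters: the same loop compacts the array `a` in place, so indices into `a` would be invalidated mid-pass.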