Connected Components in Undirected Set–Based Graphs. Applications in Object–Oriented Model Manipulation

Abstract

This work introduces a novel algorithm for finding the connected components of a graph where the vertices and edges are grouped into sets defining a Set–Based Graph. Under certain restrictions on those sets, the algorithm has the remarkable property of achieving a computational cost that is constant with respect to the number of vertices and edges. These restrictions are related to the possibility of representing the sets of vertices by intension and the sets of edges using a particular type of maps. While these restrictions may be strong in a general context, they are usually satisfied in the problem of transforming connections into equations in object oriented models, which is the main application of the proposed algorithm. Besides describing the new algorithm and studying its computational cost, the work describes its prototype implementation and shows its application in different examples.



Ernesto Kofman a,b,∗, Denise Marzorati a, Joaquín Fernández b
a FCEIA-UNR, Argentina
b CIFASIS-CONICET, Argentina


Keywords: Large Scale Models, Connected Components, Set–Based Graphs, Modelica

1. Introduction

Finding the connected components of an undirected graph is a classic problem of Graph Theory that is employed in several application domains. Simple algorithms that solve this problem in time linear in the number of vertices have been known for several decades [1]. Parallel algorithms that solve the problem in logarithmic time have also been known for a long time [2].

One particular problem that requires finding the connected components of a graph is that of flattening the equations of object oriented models [3], which is part of the first stage of the compilation process. There, different sub-models are related by connectors, and the connections must be replaced by equations where the sum of all connected variables of a certain type must be zero. While solving the problem in linear time may be affordable in several situations, there are models that result from the coupling of thousands of small sub-models, where the cost can become prohibitive. Moreover, even if the problem is solved in a reasonable amount of time, the resulting system of equations can be so large that it is intractable for the subsequent stages of the compilation process.

Fortunately, large models often contain several repetitive connections that result from using for statements, a fact that can be exploited to reduce the computational cost of the different compilation stages [4, 5, 6, 7, 8, 9, 10, 11, 12]. However, exploiting the presence of repetitive or regular structures at each stage requires that the previous stages have kept a compact representation. While there are some experimental implementations that in some particular cases can keep a compact representation during the whole compilation process [7], there is not yet a general solution.

Regarding the flattening stage, a general solution would require finding the sets of connected connectors, which may be part of multidimensional arrays, without actually expanding those arrays into individual connectors. This problem is equivalent to finding the connected components of an undirected graph while keeping some sets of vertices and edges grouped together, which constitutes the main goal of the present work.

The problem of manipulating large graphs by grouping vertices and edges into sets to produce compact systems of equations was recently proposed with the introduction of Set–Based Graphs [13]. There, a compact solution for the problems of maximum matching and of finding strongly connected components in directed graphs for equation sorting was proposed and implemented as part of the prototype ModelicaCC compiler [7].

In this work, we use the same tool (Set–Based Graphs) and propose a general algorithm for finding connected components in undirected graphs. We show that, under certain assumptions, the computational cost of the algorithm becomes independent of the size of the sets of vertices and edges (i.e., the algorithm has a constant computational cost with respect to the number of vertices and edges). In consequence, the cost of generating the set of equations in the flattening stage is independent of the size of the arrays of connectors.

Besides introducing and analyzing the algorithm, we also describe a prototype implementation in GNU Octave [14]. In addition, we analyze three examples (including a multidimensional one) showing the efficiency of the novel procedure.

The paper is organized as follows. After this introduction we briefly present a problem that motivates the work. Then, Section 2 introduces some concepts and previous works that are used as the basis of the main results, presented in Section 3. The prototype implementation of the algorithm is described in Section 4 and its usage for flattening connections is discussed in Section 5. Finally, Section 6 introduces some examples and Section 7 concludes the article.

∗ Corresponding author. Email address: [email protected] (Ernesto Kofman).
Preprint submitted to Applied Mathematics and Computation, August 11, 2020 (arXiv, cs.DS).

This work was motivated by a problem that appears in Modelica compilers. Modelica models can be represented by the coupling of several sub-models, where the coupling is usually made using connectors. That way, the equations representing the structure of the circuit of Figure 1 can be represented by the piece of code in Listing 1.

Figure 1: RC network

Listing 1: Modelica connections

    connect(S.p, R[1].p);
    connect(S.n, G.p);
    for i in 1:N-1 loop
      connect(R[i].n, R[i+1].p);
    end for;
    for i in 1:N loop
      connect(C[i].p, R[i].n);
      connect(C[i].n, G.p);
    end for;

The connectors (S.p, S.n, etc.) have two types of variables: effort variables, which are equal to each other after being connected, and flow variables, whose sum is zero for all connected connectors. Thus, the resulting equations for the structure of Listing 1 would be those of Listing 2.

Listing 2: Modelica connections

    S.p.effort = R[1].p.effort;
    S.p.flow + R[1].p.flow = 0;
    S.n.effort = G.p.effort;
    S.n.flow + G.p.flow + sum(C.n.flow) = 0;
    for i in 1:N-1 loop
      R[i].n.effort = R[i+1].p.effort;
      C[i].p.effort = R[i].n.effort;
      R[i].n.flow + R[i+1].p.flow + C[i].p.flow = 0;
    end for;
    C[N].p.effort = R[N].n.effort;
    R[N].n.flow + C[N].p.flow = 0;

The translation from connections to equations requires finding connected components in a graph where the vertices are the connectors (S.p, S.n, etc.) and the edges are defined by the presence of connect statements between the corresponding connectors.

Modelica compilers solve this problem by first expanding the for statements and the arrays of connectors, and then finding the connected components and producing the equations, as part of a process known as flattening. The result of this process in a model like that of Listing 1 is a large piece of code without the for statements of Listing 2. In addition, the cost of producing that code is at least linear in the size of the arrays involved (N in the above example).

When N is large (starting from 10 or 10 ) the computational costs become huge, and the length of the code produced may become intractable for the successive stages of the compilation process. Thus, we expect that the algorithms developed in this work provide a general solution for this problem, as well as for other problems that require a compact and efficient connected components analysis in the presence of repetitive or regular structures.
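The expand-then-flatten approach described above can be sketched in a few lines of Python. This is only an illustration of the conventional procedure, not the code of any Modelica compiler: the function names (`rc_connections`, `connected_components`, `flatten`) and the string encoding of connectors are assumptions made for the example, and a union-find structure is used as one common way of finding the components.

```python
from collections import defaultdict


def rc_connections(n):
    """Expand the connect statements of Listing 1 into explicit connector pairs."""
    edges = [("S.p", "R[1].p"), ("S.n", "G.p")]
    for i in range(1, n):
        edges.append((f"R[{i}].n", f"R[{i + 1}].p"))
    for i in range(1, n + 1):
        edges.append((f"C[{i}].p", f"R[{i}].n"))
        edges.append((f"C[{i}].n", "G.p"))
    return edges


def connected_components(edges):
    """Group connectors with a union-find structure (work grows with len(edges))."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    groups = defaultdict(list)
    for x in list(parent):
        groups[find(x)].append(x)
    return [sorted(group) for group in groups.values()]


def flatten(edges):
    """Emit effort equalities and one flow balance per connected component."""
    equations = []
    for comp in connected_components(edges):
        first = comp[0]
        equations += [f"{first}.effort = {c}.effort;" for c in comp[1:]]
        equations.append(" + ".join(f"{c}.flow" for c in comp) + " = 0;")
    return equations
```

In this sketch every connector contributes exactly one equation, so the RC network produces 4N + 3 equations and both the running time and the size of the output grow with N, which is the behaviour the rest of the paper aims to avoid.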

2. Background

In this section we present some previous results and tools that are used throughout the rest of the paper.

In an effort to unify the different modeling languages used by the different modeling and simulation tools, a consortium of software companies and research groups proposed an open unified object oriented modeling language called Modelica [3, 15], which in the last two decades was progressively adopted by different modeling and simulation tools.

Modelica allows the representation of continuous time, discrete time, discrete event and hybrid systems. Elementary Modelica models are described by sets of differential and algebraic equations that can be combined with algorithms specifying discrete evolutions. These elementary models can be connected to other models to compose more complex models, facilitating the construction of multi–domain models.

Modelica models can be built and simulated using different software tools. OpenModelica [16] is the most complete open source package, while Dymola [17] and Wolfram System Modeler are the most used commercial tools. There are also some prototype tools oriented to different problems, such as JModelica [18] (for optimization problems) and ModelicaCC.

The simulation of Modelica models requires a previous compilation that transforms the object oriented model description into a piece of code (usually in C language) containing a set of ordinary differential equations (ODE) or differential algebraic equations (DAE) that can be solved by an appropriate ODE or DAE solver. The compilation process is usually divided into several stages: flattening, alias removal, index reduction, equation sorting, and final code generation.

All Modelica compilers by default expand the arrays and unroll the for loops in the first step of the compilation process. In consequence, in the presence of large arrays, the computational cost of the compilation and the length of the produced code can become huge, and the tools are unable to simulate systems with more than about 10 state variables. While there are some experimental implementations that avoid expanding and unrolling [7, 19], there is not yet a general solution.

Finding the connected components of an undirected graph is a simple problem for which there are hundreds of algorithms.
Linear time algorithms have been known for a long time [1], and there are also several parallel algorithms that can reduce the cost to logarithmic time. Among them, we shall briefly describe that of [2], which has certain features in common with the algorithm that constitutes the main result of this work.

This algorithm represents the connected components using a vector $D$ of length $n$ (the number of vertices in the graph) such that $D(i)$ contains the smallest numbered vertex in the connected component to which $i$ belongs. A version of this procedure is described in Algorithm 1, where we consider that a graph $G = (V, E)$ is given with a set of vertices $V = \{1, 2, \dots, n\}$ and a set of edges $E = \{e_1, \dots, e_m\}$ with $e_k = \{i, j\}$ where $i, j \in V$.

Algorithm 1 Connected Components of [2]
 1: function Connect($V$, $E$)    ▷ All the steps are performed in parallel for all $i \in V$
 2:   $D(i) \leftarrow i$ for all $i \in V$
 3:   for it $= 1 : \log(n)$ do
 4:     $C(i) \leftarrow \min_j \left( D(j) \mid \{C(i), D(j)\} \in E \wedge D(j) \neq D(i) \right)$, if none then $D(i)$, for all $i \in V$
 5:     $C(i) \leftarrow \min_j \left( C(j) \mid D(j) = i \wedge C(j) \neq i \right)$, if none then $D(i)$, for all $i \in V$
 6:     $D(i) \leftarrow C(i)$ for all $i \in V$
 7:     for it $= 1 : \log(n)$ do
 8:       $C(i) \leftarrow C(C(i))$ for all $i \in V$
 9:     end for
10:     $D(i) \leftarrow \min(C(i), D(C(i)))$ for all $i \in V$
11:   end for
12:   return $D$
13: end function

The details and the explanation of this algorithm are given in [2]. The algorithm we shall develop uses a very similar idea to represent the connected components (with a more general notion of vertex numbering). We shall also use an auxiliary vector like $C(i)$, with a similar idea for merging the components as in step 4, and we shall apply the map to itself as in step 8 until all the members of a component point to the same root vertex.

The algorithms presented in this work are based on the use of Set–Based Graphs (SB–Graphs), first defined in [13]. SB–Graphs are regular graphs in which the vertices and edges are grouped in sets, which sometimes allows a compact representation. We introduce next the main definitions.

Definition 1 (Set–Vertex). A Set–Vertex is a set of vertices $V = \{v_1, v_2, \dots, v_n\}$.

Definition 2 (Set–Edge). Given two Set–Vertices, $V^a$ and $V^b$, with $V^a \cap V^b = \emptyset$, a Set–Edge connecting $V^a$ and $V^b$ is a set of non repeated edges $E[\{V^a, V^b\}] = \{e_1, e_2, \dots, e_n\}$ where each edge is a set of two vertices $e_i = \{v^a_k \in V^a, v^b_l \in V^b\}$.

Definition 3 (Set–Based Graph). A Set–Based Graph is a pair $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ where
• $\mathcal{V} = \{V^1, \dots, V^n\}$ is a set of disjoint set–vertices (i.e., $i \neq j \implies V^i \cap V^j = \emptyset$).
• $\mathcal{E} = \{E^1, \dots, E^m\}$ is a set of set–edges connecting set–vertices of $\mathcal{V}$, i.e., $E^i = E[\{V^a, V^b\}]$ with $V^a \in \mathcal{V}$ and $V^b \in \mathcal{V}$. In addition, given two set–edges $E^i, E^j \in \mathcal{E}$ with $i \neq j$, such that $E^i = E[\{V^a, V^b\}]$ and $E^j = E[\{V^c, V^d\}]$, then $V^a \cup V^b \cup V^c \cup V^d \neq V^a \cup V^b$. That is, two different set–edges in $\mathcal{E}$ cannot connect the same set–vertices.

As in regular graphs, we can define bipartite Set–Based Graphs and directed Set–Based Graphs. An algorithm for matching in bipartite Set–Based Graphs and an algorithm for finding the strongly connected components of a directed Set–Based Graph were recently presented in [13].

An SB–Graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ defines an equivalent regular graph $G = (V, E)$ where $V = \bigcup_{V^i \in \mathcal{V}} V^i$ and $E = \bigcup_{E^i \in \mathcal{E}} E^i$. Thus, an SB–Graph contains the same information as a regular graph. However, SB–Graphs can have a compact representation of that information provided that every set–edge and every set–vertex is defined by intension.
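To make the relation between an SB–Graph and its equivalent regular graph concrete, the following Python sketch stores the RC network of Listing 1 by intension and expands it only on request. The encoding is an assumption made for illustration: vertices are `(name, index)` tuples, a set–vertex is described only by its size, and the hypothetical `SetEdge` class keeps an edge count plus two functions from edge index to vertex.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Vertex = Tuple[str, int]  # (set-vertex name, index inside the set-vertex)


@dataclass(frozen=True)
class SetEdge:
    """A set-edge stored by intension: an edge count plus two vertex maps."""
    size: int
    left: Callable[[int], Vertex]
    right: Callable[[int], Vertex]

    def expand(self):
        """Enumerate the individual edges (cost proportional to self.size)."""
        return [frozenset({self.left(k), self.right(k)})
                for k in range(1, self.size + 1)]


def rc_sb_graph(n):
    """SB-Graph of Listing 1: 7 set-vertices and 5 set-edges for any n."""
    set_vertices = {"S.p": 1, "S.n": 1, "G.p": 1,
                    "R.p": n, "R.n": n, "C.p": n, "C.n": n}
    set_edges = [
        SetEdge(1, lambda k: ("S.p", 1), lambda k: ("R.p", 1)),
        SetEdge(1, lambda k: ("S.n", 1), lambda k: ("G.p", 1)),
        SetEdge(n - 1, lambda k: ("R.n", k), lambda k: ("R.p", k + 1)),
        SetEdge(n, lambda k: ("C.p", k), lambda k: ("R.n", k)),
        SetEdge(n, lambda k: ("C.n", k), lambda k: ("G.p", 1)),
    ]
    return set_vertices, set_edges


def equivalent_regular_graph(set_vertices, set_edges):
    """Union of all set-vertices and all set-edges, as in the text above."""
    vertices = {(name, i)
                for name, size in set_vertices.items()
                for i in range(1, size + 1)}
    edges = {e for set_edge in set_edges for e in set_edge.expand()}
    return vertices, edges
```

The description returned by `rc_sb_graph` has the same size for every `n`, while the expanded graph has 4n + 3 vertices and 3n + 1 edges.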

3. Main Results

This section introduces the main result of the article. We first introduce a simple but inefficient algorithm for finding the connected components of regular graphs. Then we show that this algorithm, in the context of Set–Based Graphs, can be implemented using compact operations on sets and maps, leading to computational costs that, under certain circumstances, become independent of the number of vertices and edges.

We first present an algorithm for computing the connected components in a regular graph $G = (V, E)$. The proposed algorithm finds a collection of connected components represented in a way similar to that of Algorithm 1. In particular:

• We assume that there exists a total ordering between all individual vertices (they could be represented by integer numbers, by arrays of integer numbers, etc.).
• Each connected component is represented by one of its vertices $v_k \in V$, which is the smallest vertex of the connected component.
• There is a map $D_{map} : V \to V$ such that $D_{map}(v_r) = v_k$ implies that the vertex $v_r \in V$ is part of the connected component represented by $v_k$.
• Since the representative $D_{map}(v_r)$ is the minimum vertex of the connected component, $D_{map}(v_r) \le v_r$ for all $v_r \in V$.

Making use of this representation, Algorithm 2 finds the connected components, represented by $D_{map}$, of an arbitrary graph $G = (V, E)$.

Algorithm 2 Connected Components
 1: function Connect($V$, $E$)
 2:   $D_{map} \leftarrow$ Identity map $: V \to V$    ▷ All vertices are initially disconnected
 3:   $I_{old} \leftarrow \emptyset$    ▷ Previous image set of $D_{map}$
 4:   while $I_{old} \neq \mathrm{Image}(D_{map})$ do
 5:     $C_{map} \leftarrow D_{map}$    ▷ New map of connected components
 6:     for all $v_r \in \mathrm{Image}(D_{map})$ do    ▷ Component represented by $v_r$
 7:       if $\exists \{v_r, v_s\} \in E$ then    ▷ $v_r$ is not an isolated vertex
 8:         $v_k \leftarrow \min(D_{map}(v_b) : (\{v_a, v_b\} \in E \wedge D_{map}(v_a) = v_r))$    ▷ Minimum component connected to the component represented by $v_r$
 9:         if $v_k < v_r$ then
10:           $C_{map}(v_r) \leftarrow v_k$    ▷ Connect components represented by $v_r$ and $v_k$
11:           $C_{map}(v_a) \leftarrow C_{map} \circ C_{map}(v_a) = C_{map}(v_r) = v_k$ for all $v_a : C_{map}(v_a) = v_r$    ▷ All components represented by $v_r$ are now represented by $v_k$
12:         end if
13:       end if
14:     end for
15:     $I_{old} \leftarrow \mathrm{Image}(D_{map})$    ▷ Image of the previously connected components
16:     $D_{map} \leftarrow C_{map}$    ▷ New map of connected components
17:   end while
18:   return $D_{map}$
19: end function

The algorithm works as follows. It starts by assuming that all vertices are disconnected, so each one represents its own connected component. Then, it iterates until the image of $D_{map}$ becomes constant, meaning that no further components can be connected.

During each iteration a new map $C_{map}$ is computed by adding connections between components. For each component represented by $v_r$, the algorithm takes into account all the edges connecting vertices of this component. Among all these edges, it takes the one that connects to a vertex $v_b$ with the least representative $v_k = D_{map}(v_b)$ (it could happen that $v_k = v_r$ if there is no connection from the component represented by $v_r$ to any component represented by a smaller vertex). Then, if the representative $v_k$ is smaller than $v_r$, the algorithm connects both components by making $C_{map}(v_r) = v_k$.
In that case, it also reconnects all the vertices that were connected to $v_r$ so that they are now connected to $v_k$.

Although it could be easily proved that the procedure is correct, it is possibly one of the least efficient algorithms one can imagine for finding connected components in a graph. Its computational cost appears to grow at least quadratically with the number of vertices and edges. However, we shall see next that in the context of Set–Based Graphs this algorithm can be implemented in such a way that the costs become independent of the size of the different sets involved.

A key feature of the algorithm above that allows this simplification is that in each iteration $C_{map}$ is computed as a function of the complete map $D_{map}$ and vice versa. That way, both maps can be entirely computed from each other in simple steps.

The goal of using Set–Based Graphs is to exploit the presence of repeating regular structures along the graph, representing the different sets by intension. While the definitions of SB–Graphs do not explicitly establish this, we propose next a simple way of representing the set–edges that allows the intensive treatment of the graph.

Let $E^h$ be a set–edge connecting $V^i$ and $V^j$. We shall characterize this set–edge using two maps that relate the individual edges $e^h_k \in E^h$ with the vertices they connect, $v^i_r = \mathrm{map}_{h,i}(e^h_k)$ and $v^j_s = \mathrm{map}_{h,j}(e^h_k)$. That is, the set–edge is compactly defined as

$$E^h = \bigcup_k \left\{ v^i_r = \mathrm{map}_{h,i}(e^h_k),\; v^j_s = \mathrm{map}_{h,j}(e^h_k) \right\}.$$

Thus, provided that there is a compact expression for these maps and that the set–vertices are represented by intension, the complete SB–Graph has a compact representation.

Using this representation of an SB–Graph, the previous algorithm can be reformulated as proposed in Algorithm 3.
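As a rough illustration of Algorithm 2 on an explicit (non set–based) graph, the steps can be sketched in Python as follows. The function name `connect`, the use of dictionaries for $D_{map}$ and $C_{map}$, and the ascending order in which representatives are visited are choices made for this sketch; vertices are assumed to be mutually comparable values such as integers.

```python
def connect(vertices, edges):
    """Sketch of Algorithm 2: d_map[v] becomes the smallest vertex of v's component."""
    d_map = {v: v for v in vertices}      # every vertex starts as its own component
    image_old = None                      # previous image set of d_map
    while image_old != set(d_map.values()):
        c_map = dict(d_map)               # new map of connected components
        for v_r in sorted(set(d_map.values())):
            # Representatives reached through one edge from the component of v_r
            # (edges are undirected, so both orientations are inspected).
            reached = [d_map[b] for a, b in edges if d_map[a] == v_r]
            reached += [d_map[a] for a, b in edges if d_map[b] == v_r]
            if not reached:
                continue                  # no edge touches this component
            v_k = min(reached)
            if v_k < v_r:
                # Re-point everything currently represented by v_r to v_k.
                for v_a in vertices:
                    if c_map[v_a] == v_r:
                        c_map[v_a] = v_k
        image_old = set(d_map.values())
        d_map = c_map
    return d_map
```

For example, `connect([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])` maps vertices 1 to 3 to the representative 1 and vertices 4 and 5 to the representative 4. Every pass scans all edges for every representative, which is the at least quadratic behaviour mentioned in the text.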

Algorithm 3 Connected Components with SB–Graphs
 1: function ConnectSBG($\mathcal{V}$, $\mathcal{E}$)
 2:   $V \leftarrow \bigcup_{V^i \in \mathcal{V}} V^i$    ▷ Set of all vertices
 3:   $(E_1, E_2) \leftarrow \mathrm{edgeMaps}(\mathcal{E})$    ▷ Left and right maps from edges to vertices
 4:   $D_{map} \leftarrow$ Identity map $: V \to V$    ▷ All vertices are initially disconnected
 5:   $I_{old} \leftarrow \emptyset$    ▷ Previous image set of $D_{map}$
 6:   while $I_{old} \neq \mathrm{Image}(D_{map})$ do
 7:     $ER_1 \leftarrow D_{map} \circ E_1$    ▷ Left map from edges to connected components
 8:     $ER_2 \leftarrow D_{map} \circ E_2$    ▷ Right map from edges to connected components
 9:     $C_1 \leftarrow \mathrm{minAdjMap}(ER_1, ER_2)$    ▷ Map from components to least components via $E_1$
10:     $C_2 \leftarrow \mathrm{minAdjMap}(ER_2, ER_1)$    ▷ Map from components to least components via $E_2$
11:     $C_{map} \leftarrow \min(D_{map}, C_1, C_2)$    ▷ Map from components to least components
12:     $I_{old} \leftarrow \mathrm{Image}(D_{map})$    ▷ Image of the previously connected components
13:     $D_{map} \leftarrow (C_{map})^\infty$    ▷ New map of connected components
14:   end while
15:   return $D_{map}$
16: end function

In this new algorithm, we made use of the following functions and notation:

• Function edgeMaps($\mathcal{E}$) returns two maps: a map of left connections $E_1 : E \to V$ and a map of right connections $E_2 : E \to V$, defined as follows. For each set–edge $E^h \in \mathcal{E}$ connecting set–vertices $V^i$, $V^j$, the maps $E_1$, $E_2$ satisfy

$$E_1(e^h_k) = \mathrm{map}_{h,i}(e^h_k) \quad \forall e^h_k \in E^h, \qquad E_2(e^h_k) = \mathrm{map}_{h,j}(e^h_k) \quad \forall e^h_k \in E^h.$$

  Notice that for each set–edge there are two possible definitions of $E_1$ and $E_2$, according to which one is associated with $i$ and which one with $j$ (the set–edges are non–directed).
• Function minAdjMap($\mathrm{map}_1$, $\mathrm{map}_2$) computes a map $\mathrm{map}_3$ such that

$$\mathrm{map}_3(v) = \min(\mathrm{map}_2(e) : \mathrm{map}_1(e) = v) \qquad (1)$$

  In the context of this algorithm, $v$ is a representative vertex and $e$ is an edge. Thus, among all edges such that $\mathrm{map}_1(e) = v$, the function takes the one for which $\mathrm{map}_2(e)$ is minimum and defines $\mathrm{map}_3(v) = \mathrm{map}_2(e)$. That way, $\mathrm{map}_3(v)$ is the least representative vertex connected by an edge to a vertex represented by $v$.
  In the algorithm, the function is invoked twice, with the arguments swapped, in order to find the least representative connected to a component via both maps.
• The notation $(C_{map})^\infty$ denotes the result of applying $C_{map}$ to itself until arriving at a fixed point.

The algorithm is almost identical to the previous one, except that the iteration of $C_{map}$ on itself (step 11 in Algorithm 2) is now performed at the end of the cycle. The convergence of this new iteration is ensured by the fact that $C_{map}$ is always less than or equal to the identity map and that its domain ($V$) is finite.

We shall see in the next section that, under certain assumptions on the definition of the maps, all the steps involved in this new algorithm can be computed by intension (including the infinite iteration of $C_{map}$ on itself).
Then, the computational cost of each iteration of the algorithm (steps 6–14) becomes independent of the size of the sets.

Regarding the number of iterations that are actually needed until all components are connected, the following result establishes an upper bound.

Lemma 1. The number of iterations required to find all connected components is at most $2\log_2(N)$, where $N$ is the number of edges in the largest connected component.

Proof. Suppose that after a certain number of iterations $k$, a component represented by $v_r$ contains one or more connections to other components represented by $v_{s_1}$, $v_{s_2}$, etc. Suppose also that during the next iteration the component represented by $v_r$ is not connected to any of those components.

If that occurs, it is because $v_r < v_{s_i}$ (otherwise it would be connected to the component represented by the minimum $v_{s_i}$). In addition, the components represented by $v_{s_i}$ will be connected in that iteration to some components represented by $v_{t_j} < v_r$ (otherwise, they would be connected to the component represented by $v_r$). Then, in the following iteration, $v_r$ will have connections to components represented by $v_{t_j} < v_r$ and it will be connected to the least $v_{t_j}$.

Thus, every component containing connections to other components is always connected after a maximum of two iterations. This means that after two iterations the number of different components that will be part of the same connected component is reduced at least by half, and they will be reduced to a single component after at most $2\log_2(N)$ iterations.

This lemma tells us that the number of iterations (and so the computational cost) of the algorithm may actually depend on the size of the sets. However, in several cases it does not:

1. When the structure is such that each connected component can only have a bounded number of vertices (independently of the size of the set–vertices).
2. When the latter condition is not satisfied by some connected components, but each connected component can be split into two components: the first one verifying the previous condition and the second one having all its vertices disconnected among themselves but connected to some vertices of the first component.
3. When the second component of the previous case does have connections among its vertices, but the connections follow an order: a connection $(v_{r_1} - v_{r_2} - v_{r_3} - \dots - v_{r_p})$ implies that $v_{r_1} < v_{r_2} < v_{r_3} < \dots < v_{r_p}$.

The independence of the computational costs from the size of the sets in the first case is ensured by Lemma 1.

In the second case, the fact that the large set of edges has only connections to the small set of edges implies that in at most two iterations the edges of the large set will be connected to the edges of the small set (the reason for that can be found in the proof of Lemma 1). After that, the number of components is reduced to a quantity that is independent of the size of the sets, and so is the number of additional iterations.

In the third case, each connection of the form $v_{r_1} - v_{r_2} - v_{r_3} - \dots - v_{r_p}$ with $v_{r_1} < v_{r_2} < v_{r_3} < \dots < v_{r_p}$ causes all the components to get connected in a single iteration of the algorithm (unless they are first connected to the small set of components). Then, in either situation, the case reduces to the situation analyzed in the previous case.

In conclusion, the only situation in which a large number of iterations would be required is the presence of a large connected component resulting from a large non–ordered set of connections. Yet, that would only be possible when the maps that define the set–edges have some irregular definition.
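The data flow of Algorithm 3 and the iteration counts discussed above can be explored with a small Python sketch in which every map is an explicit dictionary. The helper names (`compose`, `min_adj_map`, `map_inf`, `connect_sbg`) are hypothetical stand-ins that only mirror the roles of the operations in the algorithm. Because the dictionaries enumerate every vertex and edge, this version does not achieve the constant cost of an intensional implementation; it only shows what each step computes and how many outer iterations are needed.

```python
def compose(outer, inner):
    """Return the map outer ∘ inner for maps stored as dictionaries."""
    return {x: outer[y] for x, y in inner.items()}


def min_adj_map(map1, map2):
    """Eq. (1): result[v] = min(map2[e]) over every e with map1[e] == v."""
    result = {}
    for e, v in map1.items():
        result[v] = min(result.get(v, map2[e]), map2[e])
    return result


def map_inf(c_map):
    """Compose c_map with itself until a fixed point is reached."""
    current = dict(c_map)
    while True:
        squared = compose(current, current)
        if squared == current:
            return current
        current = squared


def connect_sbg(vertices, left, right):
    """Algorithm 3 on explicit maps; left/right send edge ids to vertices.

    Returns the final component map and the number of outer iterations.
    """
    d_map = {v: v for v in vertices}
    image_old, iterations = None, 0
    while image_old != set(d_map.values()):
        iterations += 1
        er1, er2 = compose(d_map, left), compose(d_map, right)
        c1, c2 = min_adj_map(er1, er2), min_adj_map(er2, er1)
        # Vertices that are not representatives keep their current component.
        c_map = {v: min(d_map[v], c1.get(v, v), c2.get(v, v)) for v in vertices}
        image_old = set(d_map.values())
        d_map = map_inf(c_map)
    return d_map, iterations
```

With an ordered chain 1 – 2 – … – n, given as `left = {k: k}` and `right = {k: k + 1}` for k = 1, …, n − 1, this sketch merges everything in the first pass and uses a second pass only to detect convergence, whatever the value of n. This matches the third case listed after Lemma 1.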

4. Implementation

Algorithm 3 was implemented in a prototype Octave library for Set–Based Graphs. The library defines four basic classes, Interval, Set, Map, and SBGraph, and different operations involving them. We describe next their main features.

A unidimensional interval is represented by three natural numbers: Interval.start, Interval.step, and Interval.end. For instance, the sequence [3, 5, 7, ..., 199] is represented by start=3, step=2, and end=199 (we shall simply denote it by [3 : 2 : 199]).

A general interval of dimension $d$ is represented by three arrays of length $d$: Interval.start(1 : d), Interval.step(1 : d), and Interval.end(1 : d). For instance, the sequence
[(1; 1), (1; 2), ..., (1; 100), (4; 1), (4; 2), ..., (4; 100), ..., (1000; 1), (1000; 2), ..., (1000; 100)]
is represented by start(1) = 1, step(1) = 1, end(1) = 100, start(2) = 1, step(2) = 3, end(2) = 1000. We shall denote it by [1 : 1 : 100] × [1 : 3 : 1000].

On these intervals we defined some basic functions and operations used by the higher level class that defines sets.

A set is defined as an array of disjoint intervals of the same dimension. That is, Set.Interval(1) contains the first interval, Set.Interval(2) contains the second interval, etc. For instance, the set $S = \{2, 4, 6, \dots, 100\} \cup \{101, 102, \dots, 200\}$ is represented by an array of two intervals, [2 : 2 : 100] and [101 : 1 : 200], and we shall denote it as $S = \{[2 : 2 : 100]\} \cup \{[101 : 1 : 200]\}$.

On the set class, we defined some functions and operators, including the basic operations setUnion, setIntersection, and setMinus. All the operations are computed by intension using only the start, step and end values of the underlying intervals, and the result is another set represented by intervals. That way, the cost of the operations is independent of the size of the intervals involved.

A one dimensional linear map is defined by two rational numbers: linearMap.gain (which cannot be negative) and linearMap.offset. Similarly, a general $d$–dimensional linear map is defined by two arrays of length $d$: linearMap.gain(1 : d) and linearMap.offset(1 : d).

A Map is then defined by an array of disjoint sets Map.domain(1 : M) and an array of linear maps Map.linearMap(1 : M), where all the sets and linear maps have the same dimension. For instance, a map like

$$i = \begin{cases} j + 3 & \text{for } j \in \texttt{Map.domain(1)} \\ 100 & \text{for } j \in \{101, 103, \dots, 199\} \\ j/2 & \text{for } j \in \{102, 104, \dots, 200\} \end{cases}$$

is defined by
• Map.domain(1) = { }, Map.linearMap(1).gain = 1, Map.linearMap(1).offset = 3
• Map.domain(2) = {101 : 2 : 199}, Map.linearMap(2).gain = 0, Map.linearMap(2).offset = 100
• Map.domain(3) = {102 : 2 : 200}, Map.linearMap(3).gain = 1/2, Map.linearMap(3).offset = 0

A restriction in the definition of a map is that every domain and its corresponding linear map must be such that the resulting image in each dimension is composed of natural numbers. Thus, when a gain is not an integer number, the corresponding domain and offset cannot be arbitrary. Otherwise, if the gain is an integer, the offset must be an integer too.

On these maps we also implemented several functions and operators. Among them, we mention the following ones:

• imageMap computes the set that is the image of a given set through a given map. Similarly, preImageMap computes the preimage set.
• compMaps computes the new map that results from composing two maps ($\mathrm{map}_3 = \mathrm{map}_1 \circ \mathrm{map}_2$).
• minMap computes the minimum map between two maps, i.e., $\mathrm{map}_3(v) = \min(\mathrm{map}_1(v), \mathrm{map}_2(v))$, which can be equal to $\mathrm{map}_1$ in some subdomain and equal to $\mathrm{map}_2$ in the remaining subdomain. This function requires establishing an ordering between the elements. For one dimensional sets the ordering is that of the natural numbers. For higher dimensional sets, the order between two elements is established at the first dimension in which they differ. That is, we say that $v < w$ if $v_1 < w_1$, or $v_1 = w_1 \wedge v_2 < w_2$, etc.
• minAdjMap: Given two maps $\mathrm{map}_1$ and $\mathrm{map}_2$ with the same domain, this function computes a new map $\mathrm{map}_3$ according to Eq. (1). The computation of the new function is based on the following observations:
  – If $\mathrm{map}_1$ is bijective, then $\mathrm{map}_3$ can be computed as $\mathrm{map}_2 \circ \mathrm{map}_1^{-1}$.
  – If $\mathrm{map}_1$ is constant, then $\mathrm{map}_3$ can be computed as $\mathrm{map}_3(v) = \min(\mathrm{map}_2(e))$ for all $e$ in the domain of the maps.
  Then, the function is implemented by computing on each sub-domain and on each dimension of $\mathrm{map}_1$ according to the previous observations.
• mapInf: Consider a map $\mathrm{map}_1$ with the following restrictions:
  – All its linear maps have gains (in all the dimensions) that can only take the values 1 and 0.
  – If a gain is 1, the corresponding offset cannot be positive.
  On this map, this function computes a new map $\mathrm{map}_2$ that is the result of composing $\mathrm{map}_1$ with itself until reaching convergence. The computations are performed without actually iterating on $\mathrm{map}_1$. Instead, it computes the fixed points of the iteration and the maps to those fixed points. The implementation is based on the following observations:
  – A domain where the map has gain 1 and offset 0 remains unchanged after each iteration.
  – If all domains have gain 0, then the iteration converges after at most $N$ steps, where $N$ is the number of domains.
  – If a domain has gain 1 and offset -1, then after some iterations of the map it will take a value outside the domain ( interval.start − interval.start −
  – If a domain has gain 1 and offset -2, we shall have two arrival points after leaving the domain. So we can split the interval into two intervals with gain 0 and different offsets. For larger negative offset values the idea is the same.

4.4. Set–Based Graphs

Set–Based Graphs are represented by an array of sets SBG.setVertex(1 : n) containing the set–vertices and an array of set–edges SBG.setEdge(1 : m).

Every set–edge contains two integer numbers, SE.index1 and SE.index2, and two maps, SE.map1 and SE.map2, with identical domain. The integer numbers represent the positions of the set–vertices that are connected by the set–edge, and the maps represent the connections between individual vertices. For instance, a set–edge with index1=3 and index2=5 connects the set–vertices SBG.setVertex(3) and SBG.setVertex(5). Then, given $h \in$ SE.map1.domain, the $h$–th edge of the set–edge connects the vertices SE.map1(h) and SE.map2(h).

On this class, we implemented the function connectComp that computes the connected components of a given SB–Graph. This function returns a map $D_{map}$ as explained in Section 3.

While Algorithm 2 is general, the implementation described above imposes the following restrictions on the set–based graphs:

1. Every individual vertex is represented by an array of natural numbers of dimension $d$.
2. Every set–vertex is a union of a finite number of intervals of dimension $d$. Every interval in each dimension is defined by three natural numbers: start, step, and end.
3. The maps that define the set–edges, $\mathrm{map}_{h,i} : \mathbb{N}^d \to \mathbb{N}^d$, are piecewise linear. Each map has a finite number of domains, each with a corresponding linear affine function. In every domain, the function acting in each dimension is characterized by two rational numbers: the gain and the offset.
4. The implementation of the mapInf function imposes a further restriction on the maps: in a given domain and dimension, if $\mathrm{map}_{h,i}$ and $\mathrm{map}_{h,j}$ both have nonzero gains, then the gains must be the same. Otherwise, function minAdjMap might return a map with some gain that is not 1 or 0 and, if that map turns out to be less than the identity, then mapInf cannot be applied.

The last restriction can be easily avoided with a more general implementation of mapInf considering gains different from 1 and 0.
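The idea of computing the infinite iteration of a map by intension, as mapInf does for gains 1 and 0, can be illustrated with a one dimensional Python sketch. It is not the Octave implementation: the `Piece` class and the function names are hypothetical, the domain is assumed to be a single interval with step 1, and every value below the start of the interval is assumed to be already a fixed point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    """One domain [start : step : end] mapped through v -> gain * v + offset."""
    start: int
    step: int
    end: int
    gain: int
    offset: int


def map_inf_shift(start, end, shift):
    """Fixed-point pieces of v -> v - shift on the interval [start : 1 : end].

    The work is proportional to `shift`, not to the length of the interval.
    """
    pieces = []
    for j in range(min(shift, end - start + 1)):
        # Vertices start+j, start+j+shift, ... all end up at start - shift + j.
        last = start + j + ((end - start - j) // shift) * shift
        pieces.append(Piece(start + j, shift, last, gain=0, offset=start - shift + j))
    return pieces


def apply_pieces(pieces, v):
    """Evaluate the piecewise map at v; values outside every domain are fixed."""
    for p in pieces:
        if p.start <= v <= p.end and (v - p.start) % p.step == 0:
            return p.gain * v + p.offset
    return v


def brute_force(start, end, shift, v):
    """Reference result obtained by actually iterating the map."""
    while start <= v <= end:
        v -= shift
    return v
```

For an offset of -2 the sketch returns two pieces with gain 0 and different offsets, in line with the observation about two arrival points, and it does so even for an interval such as [10 : 1 : 10**9] without visiting its elements.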

5. Application to Connection Flattening

In this section we analyze the use of the proposed algorithm in the context of replacing connections by equations in object oriented models.

The original motivation of this work was that of automatically obtaining a code like that of Listing 2 given a set of connections like those of Listing 1. For that goal, we propose the following procedure:

1. Build an SB-Graph:
   • Associate a set-vertex to each array of connectors. For the example of Listing 1 the set-vertices are S.p, S.n, G.p, R[1:N].p, R[1:N].n, C[1:N].p, and C[1:N].n.
   • Associate a set-edge to each set of connections between every pair of set-vertices. In the example some set-edges would be
     – E[S.p, R.p], characterized by maps $map_1(e) = S.p$ and $map_2(e) = R[1].p$.
     – E[R.n, R.p], characterized by maps $map_1(e_i) = R[i].n$ and $map_2(e_i) = R[i+1].p$ for $i = 1, \ldots, N-1$.
     – E[C.n, G.p], characterized by maps $map_1(e_i) = C[i].n$ and $map_2(e_i) = G.p$ for $i = 1, \ldots, N$.
2. Find the connected components using Algorithm 3.
3. Given the map $D_{map}$ representing the sets of connected components, write the corresponding equations.

The last step first splits the domain and image of $D_{map}$ into atomic sets, i.e., sets containing a single interval. That way, the sets can be traversed in the resulting code using for statements. Then, the procedure uses the facts that the image of $D_{map}$ contains the representatives of the connected components and that the preimage of each atomic set of the image contains the corresponding connected components. Since the preimage is also split into atomic sets, it can also be traversed using for statements in the resulting code. Then, once the code for traversing the connected components is written, it is simple to add the appropriate code for the effort and flow variables.

The restrictions described in Sec. 4.5 about the implementation and the conditions enumerated after Lemma 1 establish the circumstances under which the algorithm effectively achieves a constant cost with respect to the number of vertices and edges. While these conditions may be quite restrictive in general, in the context of replacing connections by equations in object oriented models they are almost invariably satisfied:

• The connectors in a model are always instantiated as scalars or arrays with different dimensions. We can represent all of them using arrays of vertices with the maximum dimension found. That way the first two restrictions of Sec. 4.5 are always satisfied.
• The third restriction is satisfied provided that:
  – In the presence of nested for loop statements, the intervals of the iterators are independent of each other. That is, we cannot write for i in 1:N loop; for j in 1:i loop since in that case the domain of the maps defining the set-edges would not be an interval.
  – The connections have linear affine operations with each index. That is, we can only have expressions like connect(v[a*i+b, c*j+d], w[e*i+f, g*j+h]) where i and j are the nested iterators and a, b, c, d, e, f, g, h are rational constants.
• The fourth restriction is satisfied provided that a and e in the previous item are different only if one of them is zero (and the same for c and g).

Regarding the conditions listed after Lemma 1 under which the algorithm performs a limited number of iterations, they are automatically satisfied under the assumption that the maps are piecewise linear, since in that case any large set of connected connectors will keep a strict ordering.

6. Examples and Results

We introduce three examples where we applied the presented algorithm using the Octave implementation described in Section 4. In all cases, the experiments were run on a laptop with an Intel i3 core processor running Ubuntu OS.

We consider first the example of Listing 1 with N = 1000. The vertices S.p, S.n, and G.p are represented by numbers 1, 2, and 3, respectively. The vertices R[1:1000].p are represented by numbers 1001 to 2000, and vertices R[1:1000].n by numbers 2001 to 3000. Similarly, C[1:1000].p are represented by numbers 3001 to 4000, and C[1:1000].n are represented by numbers 4001 to 5000.

Using Algorithm 3, the map $D_{map}$ results as follows:

$$
D_{map}(v) =
\begin{cases}
v & \text{if } v \in [1:1:1] \\
v & \text{if } v \in [2:1:2] \\
2 & \text{if } v \in [3:1:3] \\
1 & \text{if } v \in [1001:1:1001] \\
v & \text{if } v \in [1002:1:2000] \\
v - 999 & \text{if } v \in [2001:1:2999] \\
v & \text{if } v \in [3000:1:3000] \\
v - 1999 & \text{if } v \in [3001:1:3999] \\
v - 1000 & \text{if } v \in [4000:1:4000] \\
2 & \text{if } v \in [4001:1:5000]
\end{cases}
$$

which can be easily verified to be correct. The representatives of the connected components are S.p (represented by number 1), S.n (represented by number 2), R[2:1000].p (represented by numbers 1002 to 2000), and R[1000].n (represented by number 3000).

Octave reports 2.42 seconds to compute the connected components. The algorithm finishes after only one iteration. In order to check that the computation time was independent of N we repeated the calculations for N = 10 and N = 1,000,000, and the three cases took almost exactly the same time. It is worth mentioning that Octave is an interpreter, so the time of 2.42 seconds would be noticeably reduced in a compiled implementation.

We also implemented in Octave a simple automatic code generator for connected components. In this example, the generated code is shown in Listing 3.

Listing 3: Generated Equations

    for i in {[1001:1:1001]}
      effort(i) = effort(1)
    end
    for i in {[1:1:1]}
      flow(i) + flow(i+1000) = 0
    end
    for i in {[3:1:3]}
      effort(i) = effort(2)
    end
    for i in {[4001:1:5000]}
      effort(i) = effort(2)
    end
    for i in {[2:1:2]}
      flow(i) + flow(i+1) + sum(flow(i1), for i1 in [4001:1:5000]) = 0
    end
    for i in {[2001:1:2999]}
      effort(i) = effort(i-999)
    end
    for i in {[3001:1:3999]}
      effort(i) = effort(i-1999)
    end
    for i in {[1002:1:2000]}
      flow(i) + flow(i+999) + flow(i+1999) = 0
    end
    for i in {[4000:1:4000]}
      effort(i) = effort(3000)
    end
    for i in {[3000:1:3000]}
      flow(i) + flow(i+1000) = 0
    end

For the same system of Figure 1, we changed the connections as follows:

Listing 4: Modelica connections

    connect(S.p, R[1].p);
    connect(S.n, G.p);
    connect(C[1].n, G.p);
    for i in 1:N-1 loop
      connect(R[i].n, R[i+1].p);
      connect(C[i+1].n, C[i].n); // recursive connection
    end for;
    for i in 1:N loop
      connect(C[i].p, R[i].n);
    end for;

In this case, the algorithm finds the following map of connected components:

$$
D_{map}(v) =
\begin{cases}
v & \text{if } v \in [1:1:1] \\
v & \text{if } v \in [2:1:2] \\
2 & \text{if } v \in [3:1:3] \\
1 & \text{if } v \in [1001:1:1001] \\
v & \text{if } v \in [1002:1:2000] \\
v - 999 & \text{if } v \in [2001:1:2999] \\
v & \text{if } v \in [3000:1:3000] \\
v - 1999 & \text{if } v \in [3001:1:3999] \\
v - 1000 & \text{if } v \in [4000:1:4000] \\
2 & \text{if } v \in [4001:1:5000] \text{ (returned as three sub-intervals)}
\end{cases}
$$

The map is exactly the same as before, but it is now more partitioned in the domain [4001:5000]. The presence of the recursive connection on C.n is solved in a single step by the computation of the mapInf function.

The time taken by the algorithm in this case is 3.28 seconds (reported by Octave), and, as before, the algorithm finishes after completing one iteration. The larger time can be explained by the fact that the number of maps is larger than before.
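The result for Listing 4 can be cross-checked by brute force at N = 1000. The sketch below expands the connections of Listing 4 into individual edges, computes the components with a plain union-find, and compares them with a piecewise map transcribed from the effort equations of Listing 3. Keeping the smallest vertex number as representative is an assumption that happens to agree with the representatives reported in the text; all function names are illustrative and do not belong to the Octave library.

```python
N = 1000
S_P, S_N, G_P = 1, 2, 3          # numbering used in the first example


def R_p(i): return 1000 + i
def R_n(i): return 2000 + i
def C_p(i): return 3000 + i
def C_n(i): return 4000 + i


def listing4_edges(n):
    """Individual connections of Listing 4, one pair per connect statement."""
    edges = [(S_P, R_p(1)), (S_N, G_P), (C_n(1), G_P)]
    for i in range(1, n):
        edges.append((R_n(i), R_p(i + 1)))
        edges.append((C_n(i + 1), C_n(i)))   # recursive connection
    for i in range(1, n + 1):
        edges.append((C_p(i), R_n(i)))
    return edges


def connected_components(vertices, edges):
    """Union-find in which every root is the smallest vertex of its set."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]    # path halving
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)
    return {v: find(v) for v in vertices}


def d_map(v):
    """Piecewise map read off the effort equations of Listing 3 (N = 1000)."""
    if v in (1, 2):
        return v
    if v == 3:
        return 2
    if v == 1001:
        return 1
    if 1002 <= v <= 2000 or v == 3000:
        return v
    if 2001 <= v <= 2999:
        return v - 999
    if 3001 <= v <= 3999:
        return v - 1999
    if v == 4000:
        return 3000
    if 4001 <= v <= 5000:
        return 2
    raise ValueError(f"vertex {v} is not part of the example")


vertices = [1, 2, 3] + list(range(1001, 5001))
brute = connected_components(vertices, listing4_edges(N))
mismatches = [v for v in vertices if brute[v] != d_map(v)]
print(len(mismatches), len(set(brute.values())))
```

Unlike the set-based algorithm, this check visits every vertex and edge, so its cost grows with N; it only serves to validate the map on a concrete instance.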

This example consists of a 2D network formed by N × M cells with 4 connectors each (left, right, up and down connectors), a ground component with one connector and a source component with two connectors. The network is connected as shown in Figure 2 and expressed in Listing 5.

Figure 2: 2D Network

Listing 5: Modelica connections

    for i in 1:N-1, j in 1:M-1 loop
      connect(Cell[i,j].r, Cell[i,j+1].l);
      connect(Cell[i,j].d, Cell[i+1,j].u);
    end for;
    for i in 1:N loop
      connect(Cell[i,M].r, Cell[i,1].l);
    end for;
    for j in 1:M loop
      connect(Cell[1,j].u, S.p);
      connect(Cell[N,j].d, S.n);
    end for;

In this case, each vertex is represented by two numbers: S.p, S.n, and G.p are [1, 1], [2, 2], and [3, 3], and Cell[i, j].left is represented by [N + i, M + j]. Similarly, Cell[i, j].right, Cell[i, j].up, and Cell[i, j].down are represented by [2N + i, 2M + j], [3N + i, 3M + j], and [4N + i, 4M + j] respectively.

Taking N = 1000 and M = 100, for instance, the algorithm finds the following map of connected components:

$$
D_{map}(v) =
\begin{cases}
[2; 2] & \text{if } v \in [3:1:3] \times [3:1:3] \\
v & \text{if } v \in [1:1:1] \times [1:1:1] \\
v + [-1000; -199] & \text{if } v \in [2001:1:3000] \times [300:1:300] \\
v + [-999; -100] & \text{if } v \in [4001:1:4999] \times [401:1:500] \\
v + [-1000; -99] & \text{if } v \in [2001:1:3000] \times [201:1:299] \\
v & \text{if } v \in [2:1:2] \times [2:1:2] \\
[2; 2] & \text{if } v \in [5000:1:5000] \times [401:1:500] \\
[1; 1] & \text{if } v \in [3001:1:3001] \times [301:1:400] \\
v & \text{if } v \in [1001:1:2000] \times [101:1:101] \\
v & \text{if } v \in [3002:1:4000] \times [301:1:400] \\
v & \text{if } v \in [1001:1:2000] \times [102:1:200]
\end{cases}
$$

which can also be verified to be correct. The time reported by Octave is 4.14 seconds and it is again independent of N and M. The code produced is listed below.

Listing 6: Generated Equations for 2D Network

    for i,j in {[3001:1:3001] x [301:1:400]}
      effort(i,j) = effort(1,1)
    end
    for i,j in {[1:1:1] x [1:1:1]}
      flow(i,j) + sum(flow(i+3000, j1), for j1 in [301:1:400]) = 0
    end
    for i,j in {[3:1:3] x [3:1:3]}
      effort(i,j) = effort(2,2)
    end
    for i,j in {[5000:1:5000] x [401:1:500]}
      effort(i,j) = effort(2,2)
    end
    for i,j in {[2:1:2] x [2:1:2]}
      flow(i,j) + flow(i+1,j+1) + sum(flow(i+4998, j1), for j1 in [401:1:500]) = 0
    end
    for i,j in {[2001:1:3000] x [300:1:300]}
      effort(i,j) = effort(i-1000,101)
    end
    for i,j in {[1001:1:2000] x [101:1:101]}
      flow(i,j) + flow(i+1000, j+199) = 0
    end
    for i,j in {[2001:1:3000] x [201:1:299]}
      effort(i,j) = effort(i-1000,j-99)
    end
    for i,j in {[1001:1:2000] x [102:1:200]}
      flow(i,j) + flow(i+1000, j+99) = 0
    end
    for i,j in {[4001:1:4999] x [401:1:500]}
      effort(i,j) = effort(i-999,j-100)
    end
    for i,j in {[3002:1:4000] x [301:1:400]}
      flow(i,j) + flow(i+999,j+100) = 0
    end
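The generated loops follow a regular pattern that is easiest to see in the one-dimensional case of Listing 3. The following sketch, which is not the Octave code generator, rebuilds those loops from a list of pieces of the map. The tuple layout `(start, end, gain, offset)`, the restriction to gains 0 and 1, the use of constants for single-element pieces, and the function names are all assumptions made for illustration.

```python
# Each tuple (start, end, gain, offset) states that
# D_map(v) = gain*v + offset for v in [start:1:end], with gain in {0, 1}.
# The values are transcribed from the loops of Listing 3.
PIECES = [
    (1, 1, 1, 0), (2, 2, 1, 0), (3, 3, 0, 2), (1001, 1001, 0, 1),
    (1002, 2000, 1, 0), (2001, 2999, 1, -999), (3000, 3000, 1, 0),
    (3001, 3999, 1, -1999), (4000, 4000, 0, 3000), (4001, 5000, 0, 2),
]


def effort_equation(start, end, gain, offset):
    """Non-representative piece: the effort equals that of the representative."""
    rhs = f"effort({offset})" if gain == 0 else f"effort(i{offset:+d})"
    return f"for i in {{[{start}:1:{end}]}}\n  effort(i) = {rhs}\nend"


def flow_equation(rep, pieces):
    """Representative interval: flows over its preimage add up to zero."""
    rs, re_ = rep
    terms = ["flow(i)"]
    for start, end, gain, offset in pieces:
        if gain == 1 and offset != 0 and (start + offset, end + offset) == rep:
            # Shifted piece: one connector per representative.
            terms.append(f"flow(i{-offset:+d})")
        elif gain == 0 and rs == re_ == offset:
            if start == end:
                # Single connector mapped to a single representative.
                terms.append(f"flow(i{start - offset:+d})")
            else:
                # Whole interval mapped to a single representative.
                terms.append(f"sum(flow(i1), for i1 in [{start}:1:{end}])")
    body = " + ".join(terms) + " = 0"
    return f"for i in {{[{rs}:1:{re_}]}}\n  {body}\nend"


def generate(pieces):
    blocks = []
    for start, end, gain, offset in pieces:
        if gain == 1 and offset == 0:        # identity piece: representatives
            blocks.append(flow_equation((start, end), pieces))
        else:
            blocks.append(effort_equation(start, end, gain, offset))
    return "\n".join(blocks)


print(generate(PIECES))
```

Under these assumptions the output contains the same ten loops as Listing 3, up to ordering and whitespace. The number of emitted loops depends on the number of pieces of the map and not on N, which is what keeps the generated code compact.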

7. Conclusions and Future Research

We presented a novel algorithm for finding connected components in undirected graphs that, under certain regularity assumptions, has constant computational costs with the number of vertices and edges. This is achieved using the concept of Set-Based Graphs and, to the best of our knowledge, constitutes the first algorithm of this type.

We also described a prototype implementation of the algorithm and its application to connection flattening in object oriented models, a field in which the regularity assumptions are very commonly satisfied. In addition, we demonstrated the usefulness and the functionality of the algorithm through three examples of large scale graphs, including a two-dimensional case.

We believe this work opens several future lines of work and research. The implementation itself is a simple prototype in a high level interpreted language, so we are currently working on implementing the algorithm in the ModelicaCC compiler [7] in C++. In addition, we are also working on developing more algorithms of this type (using SB-Graphs with maps) for other problems related to Modelica compilation: finding maximum matching in bipartite graphs and strongly connected components (directed graphs). These problems were already solved using SB-Graphs in [13] but the solution was quite complicated and not as general as the one found here using maps for representing set-edges. Another related problem that we are trying to solve using SB-Graphs is that of producing the code for computing the sparse Jacobian matrix in large systems of differential algebraic equations.

Besides these new problems, there are several issues related to the algorithm presented here that should be taken into account in the future. Among them, it would be important to establish some bounds on the cost of every step of the algorithm with respect to the number of different linear maps that are used to describe each map. In addition, we need to find less restrictive conditions under which the algorithm actually has a constant cost with respect to the size of the sets.

Another important goal is that of implementing these algorithms in a more robust and complete Modelica compiler such as OpenModelica [16].

Finally, we believe that this algorithm can be effectively applied in other fields beyond object oriented models. Any problem leading to analysis on a large graph containing some regular connections is in principle a good candidate to be solved using SB-Graphs.

The Octave library containing the algorithm, the functions and the examples presented in this article can be downloaded from .

Funding

This work was partially funded by grant PICT–2017 2436 (ANPCYT).