Fast and Compact Planar Embeddings
Leo Ferres a, José Fuentes-Sepúlveda b,c, Travis Gagie d,c,∗, Meng He e, Gonzalo Navarro b,c

a Faculty of Engineering, Universidad del Desarrollo & Telefónica I+D, Santiago, Chile
b Department of Computer Science, University of Chile, Santiago, Chile
c Center of Biotechnology and Bioengineering, University of Chile, Santiago, Chile
d School of Computer Science and Telecommunications, Diego Portales University, Santiago, Chile
e Faculty of Computer Science, Dalhousie University, Halifax, Canada
Abstract
There are many representations of planar graphs, but few are as elegant as Turán's (1984): it is simple and practical, uses only 4 bits per edge, can handle self-loops and multi-edges, and can store any specified embedding. Its main disadvantage has been that "it does not allow efficient searching" (Jacobson, 1989). In this paper we show how to add a sublinear number of bits to Turán's representation such that it supports fast navigation while retaining simplicity. As a consequence of the inherited simplicity, we offer the first efficient parallel construction of a compact encoding of a planar graph embedding. Our experimental results show that the resulting representation uses about 6 bits per edge in practice, supports basic navigation operations within a few microseconds, and can be built sequentially at a rate below 1 microsecond per edge, featuring a linear speedup with a parallel efficiency around 50% for large datasets.
Keywords:
Planar embedding, Compact data structures, Parallel construction
1. Introduction
The rate at which we store data is increasing even faster than the speed and capacity of computing hardware. Thus, if we want to use efficiently what we store, we need to represent it in better ways. The surge in the number and complexity of the maps we want to have available on mobile devices is particularly pronounced and has resulted in a bewildering number of ways to store planar graphs. Each of these representations has its disadvantages, however: some do
A previous version of this paper appeared in the 15th Algorithms and Data Structures Symposium (WADS 2017) [1].

∗ Corresponding author
Email addresses: [email protected] (Leo Ferres), [email protected] (José Fuentes-Sepúlveda), [email protected] (Travis Gagie), [email protected] (Meng He), [email protected] (Gonzalo Navarro)
Preprint submitted to Computational Geometry, February 20, 2018.

not support fast navigation, some are large, some cannot represent multi-edges or certain embeddings, and some are costly to build. In this paper we introduce a compact representation of planar graph embeddings that addresses all these issues, and demonstrate its practicality.

More concretely, as described in Section 2, a planar embedding with n nodes and m edges can be represented in m log 12 ≈ 3.58m bits [2], a bound that has been matched, with o(m)-bit redundancy, by a structure that in addition supports efficient navigation [3]. That structure is, however, complex and no implementation has been attempted. The much simpler representation of Turán [4] uses 4m bits, which is still close to the lower bound, but it does not support navigation. The other existing representations require more than 4m bits for general planar embeddings, some restrict the embeddings where they apply, and most have complicated construction algorithms. The majority of these constructions cannot be parallelized, and the others require O(m log m) work.

Our contribution in this paper is threefold:

1. We show how to add o(m) bits to Turán's representation such that it supports fast navigation. We can list the edges incident to any vertex in clockwise or counter-clockwise order using constant time per edge, including starting the enumeration at any desired neighbour. As a consequence, we can also list the nodes on a face in constant time per node. We can also find a vertex's degree in time O(f(m)) for any f(m) ∈ ω(1), and determine whether two vertices are neighbours in O(f(m)) time for any f(m) ∈ ω(log m).

2. We give a parallel algorithm that builds our data structure from any spanning tree of the planar embedding, in O(m) work and O(log m) span (O(log² m) span to support the neighbour query).
This is the first linear-work practical parallel algorithm for building compact representations of planar graphs.

3. We implement and experimentally evaluate the space, query, and construction performance of our representation. In practice, our structure uses less than 6m bits, performs navigation operations within a few microseconds, and can be built sequentially at a rate below 1 microsecond per edge. The parallel algorithm scales linearly, with an efficiency around 50% for large datasets, with up to 24 processors.

Summarizing, we offer the first simple compact representation of planar embeddings, which is easy to program, uses little space, and is efficiently built and navigated. Our structure is thousands of times faster than the classical one when compression makes our representation fit in main memory. We leave the code publicly available at …

Turán chooses an arbitrary spanning tree of the graph, roots it at a vertex on the outer face and traverses it, writing its balanced-parentheses representation as he goes and interleaving that sequence with another one, over a different binary alphabet, consisting of an occurrence of one character for the first time he sees each edge not in the tree and an occurrence of the other character for the second time he sees that edge. These two sequences can be written as three sequences over {0, 1}: one of length 2n − 2 encoding the balanced parentheses; one of length 2m − 2n + 2 encoding the interleaved sequence; and one of length 2m indicating how they are interleaved. Our extension of this representation is based on the observation that the interleaved sequence encodes the balanced-parentheses representation of the complementary spanning tree of the dual of the graph.
By adding a sublinear number of bits to each balanced-parentheses representation, we can support fast navigation in the trees, and by storing the sequence indicating the interleaving as a bitvector with support for the operations rank and select [5], we can support fast navigation in the graph.

Section 2 surveys the related work on compact representations of planar embeddings. Section 3 describes bitvectors and the balanced-parentheses representation of trees, which are the building blocks of our extension of Turán's representation; we also describe the model of parallelism we use in our construction algorithms. In Section 4 we prove the observation mentioned above on Turán's interleaved sequence. In Section 5 we describe our data structure and how we implement queries. Section 6 describes our parallel construction algorithm and discusses some implementation issues. In Section 7 we describe our experiments on space, query and construction performance, and discuss the results. Finally, in Section 8 we present our conclusions and future work directions.
2. Related work
Tutte [2] showed that representing a specified embedding of a connected planar multi-graph with n vertices and m edges requires m log 12 ≈ 3.58m bits in the worst case. Turán [4] gave a very simple representation that uses 4m bits. Jacobson [5] argued that this representation "does not allow fast searching" and (introducing techniques that we will apply to Turán's representation) proposed one that instead uses O(m) bits and supports fast navigation, based on book embeddings [6]. Munro and Raman [7] estimated that Jacobson's representation uses 64n bits and proposed one using 2m + 8n + o(m) bits that retains fast navigation, still based on the same book embeddings (but this does not handle self-loops). Keeler and Westbrook [8] also noted that "the constant factor in [Jacobson's] space bound is relatively large" and gave a representation that uses m log 12 + O(1) bits for planar graphs (not embeddings), as well as for planar embeddings containing either no self-loops or no vertices with degree 1; however, they again gave up fast navigation. Chiang, Lin and Lu [9], improving previous work by Chuang et al. [10], gave a representation (without allowing self-loops) that uses 2m + 3n + o(m) bits with fast navigation, based on so-called orderly spanning trees. However, although all planar graphs can be represented with orderly spanning trees, some planar embeddings cannot. For simple planar embeddings (i.e., no self-loops nor multiple edges, thus m ≤ 3n), their space decreases to 2n + 2m + o(m) ≤ 4m + o(m) on connected graphs. Barbay et al. [11] gave a data structure that uses O(m) bits to represent simple planar graphs with fast navigation, based on realizers of planar triangulations [12]. Still, their constant is relatively large, 18n + o(m). Finally, Blelloch and Farzan [3], extending the work of Blandford et al.
[13], matched for the first time Tutte's lower bound on general planar embeddings, with a structure that uses m log 12 + o(m) bits and supports fast navigation. Their structure is based on small vertex separators [14]. They can also represent any planar graph within its lower-bound space plus a sublinear redundancy, even when the exact lower bound is unknown for general planar graphs [15]. While Blelloch and Farzan closed the problem in theoretical terms, their representation is complicated and has not been implemented. Other authors [16, 17, 18, 19, 20] have considered special kinds of planar graphs, notably tri-connected planar graphs and triangulations. We refer the reader to Munro and Nicholson's [21] and Navarro's [22, Chapter 9] recent surveys for further discussion of compact data structures for graphs.

Most of the navigable representations we have mentioned require complicated construction algorithms, generally defying a parallel implementation. It is not known how to compute a book embedding [6] in parallel, which is necessary to build the representations of Jacobson and of Munro and Raman. There are also no parallel algorithms to build orderly spanning trees [9], necessary for the representation of Chiang et al. Its predecessor [10] uses instead a triangulation and a canonical ordering; for the latter there is only a CREW construction running in O(log² n) time with n processors [23]. As for the vertex separators [14] required to build the representations of Blandford et al. and of Blelloch and Farzan, Kao et al. [24] designed a linear-work, logarithmic-span algorithm for computing a cycle separator of a planar graph. However, the constructions of these representations of planar embeddings decompose the input graph by repeatedly computing separators until each piece is sufficiently small. This increases the total work to O(n log n) even if this optimal parallel algorithm is used.
The best linear-work parallel algorithms [25] for building the realizers [12] used in the construction of Barbay et al.'s representation have O(log n) span in the expected case, but O(log n log log n) deterministic span.
3. Preliminaries
A bitvector is a binary string that supports the queries rank and select in addition to random access, where rank_b(i) returns the number of bits set to b in the prefix of length i of the string and select_b(j) returns the position of the jth bit set to b. For convenience, we define select_b(0) = 0. It is possible to represent a bitvector of length ℓ in ℓ + o(ℓ) bits and support random access, rank and select in constant time [5, 26, 27]. Furthermore, if the bitvector has k 1s, it can be represented in log (ℓ choose k) + o(ℓ) bits [28], which is ℓH(k/ℓ) + o(ℓ) = k log(ℓ/k) + O(k) + o(ℓ), with H(x) = −x log x − (1 − x) log(1 − x). All our logarithms are to the base 2 unless otherwise stated.

By adding some further operations on the bitvectors, we can represent an ordered tree or forest of t vertices using 2t + o(t) bits and support natural navigation queries in constant time. One of the most popular such representations is a string of balanced parentheses: we traverse each tree from left to right, writing an opening parenthesis when we first visit a vertex (starting at the root) and a closing parenthesis when we leave it for the last time (or, in the case of the root, when we finish the traversal). We can encode the string of parentheses as a bitvector of length 2t, with 0s encoding opening parentheses and 1s encoding closing parentheses. By adding o(t) further bits, we can support in constant time, among others, the following queries used by our solution [7, 29, 30]:

• match(i) locates the position of the parenthesis matching the ith parenthesis in the bitvector (i.e., finds the other parenthesis referring to the same node);

• parent(v) returns the parent of v, or 0 if v is the root of its tree. Nodes v and parent(v) are represented by their pre-order rank in the traversal.

As we focus on practical parallel algorithms, we describe and analyze our construction using the
Dynamic Multithreading (DyM) Model [31] (we nevertheless express our final results in terms of the PRAM model as well). In the DyM model, a multithreaded computation is modelled as a directed acyclic graph (DAG) where vertices are instructions and an edge (u, v) represents precedence between instructions u and v. The model is based on two parameters of the multithreaded computation: its work T_1 and its span T_∞. The work is the running time on a single thread, that is, the number of nodes (i.e., instructions) in the DAG, assuming each instruction takes constant time. The span is the length of the longest path in the DAG, that is, the intrinsically sequential part of the computation. The time T_p needed to execute the computation on p threads then has complexity Θ(T_1/p + T_∞), which can be reached with a greedy scheduler. The improvement of a multithreaded computation using p threads is called speedup, T_1/T_p. The upper bound on the achievable speedup, T_1/T_∞, is called parallelism. Finally, the efficiency is defined as T_1/(p·T_p) and can be interpreted as the percentage of improvement achieved by using p cores, or how close we are to the linear speedup. In the DyM model, the workload of the threads is balanced by using the work-stealing algorithm [32].

To describe parallel algorithms in the DyM model, we augment sequential pseudocode with three keywords. The spawn keyword, followed by a procedure call, indicates that the procedure should run in its own thread and may thus be executed in parallel to the thread that spawned it. The sync keyword indicates that the current thread must wait for the termination of all threads it has spawned. Finally, parfor is "syntactic sugar" for spawning one thread per iteration in a for loop, thereby allowing these iterations to run in parallel, followed by a sync operation that waits for all iterations to complete.
In practice, the parfor keyword is implemented by halving the range of loop iterations, spawning one half and using the current procedure to process the other half recursively, until reaching one iteration per range. After that, the iterations are executed in parallel. Therefore, this implementation adds an overhead bounded above by the logarithm of the number of loop iterations. We include such overheads in our complexities.
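As a concrete illustration, the recursive halving just described can be sketched in a few lines. This is a toy Python version using threads; the helper name parfor and the example are ours, not the paper's implementation, and CPython's global interpreter lock means the sketch only mirrors the spawn/sync structure rather than achieving real parallel speedup.

```python
import threading

def parfor(lo, hi, body):
    # Apply body(i) for every i in [lo, hi): spawn one half of the range
    # in a new thread, recurse on the other half, then sync.
    if hi - lo == 1:
        body(lo)
    elif hi - lo > 1:
        mid = (lo + hi) // 2
        t = threading.Thread(target=parfor, args=(lo, mid, body))
        t.start()              # spawn
        parfor(mid, hi, body)  # current thread processes the other half
        t.join()               # sync

# Example: square the elements of an array "in parallel".
data = list(range(10))
out = [0] * len(data)
parfor(0, len(data), lambda i: out.__setitem__(i, data[i] * data[i]))
```

The spawning recursion has depth logarithmic in the number of iterations, which is exactly the overhead accounted for above.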
4. Spanning trees of planar graphs
It is well known [33, 34, 35] that for any spanning tree T of a connected planar graph G, the edges dual to those not in T form a spanning tree T* of the dual of G, with T and T* interdigitating; see Figure 1 for an illustration (including multi-edges and a self-loop). If we choose T as the spanning tree of G for Turán's representation, then we store a 0 and a 1, in that order, for each edge in T*. We now show that these bits encode a traversal of T*.

Lemma 1.
Consider any planar embedding of a planar graph G, any spanning tree T of G and the complementary spanning tree T* of the dual of G. If we perform a depth-first traversal of T starting from any vertex on the outer face of G and always process the edges incident to the vertex v we are visiting in counter-clockwise order (starting from the edge immediately after the one to v's parent or, if v is the root of T, from immediately after any incidence of the outer face), then each edge not in T corresponds to the next edge we cross in a depth-first traversal of T*.

Proof.
Suppose the traversal of T* starts at the vertex of the dual of G corresponding to the outer face of G. We now prove by induction that the vertex we are visiting in T* always corresponds to the face of G incident to the vertex we are visiting in T and to the previous and next edges in counter-clockwise order.

Our claim is true before we process any edges, since we order the edges starting from an incidence of the outer face to the root of T. Assume it is still true after we have processed i < 2m edges, and that at this time we are visiting v in T and v* in T*. First suppose that the (i + 1)th edge (v, w) we process is in T. We note that w ≠ v, since otherwise (v, w) could not be in T. We cross from v to w in T, which is also incident to the face corresponding to v*. Now (v, w) is the previous edge — considering their counter-clockwise order at w, starting from (v, w) — and the next edge (which is (v, w) again if w has degree 1) is also incident to v*. This is illustrated on the left side of Figure 2. In fact, the next edge is the one after (v, w) in a clockwise traversal of the edges incident to the face corresponding to v*.

Now suppose (v, w) is not in T and let w* be the vertex in T* corresponding to the face on the opposite side of (v, w), which is also incident to v. We note that w* ≠ v*, since otherwise (v, w) would have to be in T. We cross from v* to w* in T*. Now (v, w) is the previous edge — this time still considering their counter-clockwise order at v — and the next edge (which may be (v, w) again if it is a self-loop) is also incident to w*. This is illustrated on the right side of Figure 2. In fact, the next edge is the one that follows (v, w) in a clockwise traversal of the edges incident to the face corresponding to w*.

Since our claim remains true in both cases after we have processed i + 1 edges, by induction it is always true.
In other words, whenever we should
Figure 1: Top left: A planar embedding of a planar graph G, with a spanning tree T of G shown in red and the complementary spanning tree T* of the dual of G shown in blue with dashed lines. Bottom left: The two spanning trees, with T rooted at the vertex on the outer face and T* rooted at the vertex A corresponding to the outer face. Right: The list of edges we process while traversing T, starting at 1 and processing edges in counter-clockwise order, with the edges in T shown in red and the ones in G − T shown in black; the edges of T* corresponding to the edges in G − T are shown in blue.

process next an edge e in G that is not in T, we are visiting in T* one of the vertices corresponding to the faces incident to e (i.e., one of the endpoints of the edge in the dual of G that corresponds to e). Since we process each edge in G twice, once at each of its endpoints or twice at its unique endpoint if it is a self-loop, it follows that the list of edges we process that are not in T corresponds to the list of edges we cross in a traversal of T*. □

Figure 2:
Left: If we process an edge (v, w) in T, then we move to w in our traversal of T and the next edge, (w, x) in this case, is also incident to the vertex v* we are visiting in our traversal of T*. Right: If (v, w) is not in T, then in T* we move from v* to the vertex w* corresponding to the face on the opposite side of (v, w) in G. The next edge, (v, y) in this case, is also incident to w*.

We process the edges in counter-clockwise order so that the traversals of T and T* are from left to right and from right to left, respectively; processing them in clockwise order would reverse those directions. For example, for the embedding in Figure 1, if we start the traversal of the red tree T at vertex 1 and start processing the edges at (1, …
5. Data structure
Our extension of Turán's representation of a planar embedding of a connected planar graph G with n vertices and m edges consists of the following components, which take 4m + o(m) bits:

• a bitvector A[1..2m] in which A[i] = 1 if and only if the ith edge we process in the traversal of T described in Lemma 1 is in T;

• a bitvector B[1..2(n − 1)] in which B[i] = 0 if and only if the ith time we process an edge in T during the traversal is the first time we process that edge;

• a bitvector B*[1..2(m − n + 1)] in which B*[i] = 0 if and only if the ith time we process an edge not in T during the traversal is the first time we process that edge.

Notice B encodes the balanced-parentheses representation of T, except that it lacks the leading 0 and trailing 1 encoding the parentheses for the root. By Lemma 1, B* encodes the balanced-parentheses representation of a traversal of the spanning tree T* of the dual of G complementary to T (the right-to-left traversal of T*, in fact), except that it also lacks the leading 0 and trailing 1 encoding the parentheses for the root. Therefore, since B and B* encode forests, we can support match and parent with them.

To build A, B and B* given the embedding of G and T, we traverse T as in Lemma 1. Whenever we process an edge, if it is in T then we append a 1 to A and append the edge to a list L; otherwise, we append a 0 to A and append the edge to another list L*. When we have finished the traversal, we replace each edge in L or L* by a 0 if it is the first occurrence of that edge in that list, and by a 1 if it is the second occurrence; this turns L and L* into B and B*, respectively. For the example shown in Figure 1, L and L* eventually contain the edges shown in the columns labelled T and G − T, respectively, in the table on the right side of the figure, and

A[1..28] = 0110110101110010110100010100
B[1..14] = 00101100110011
B*[1..14] = 01001001110101.

We identify each vertex v in G by its pre-order rank in our traversal of T. We say that, while we visit v, we process all the edges that lead from v to other nodes w. Note that each edge (v, w) is processed twice, while visiting v and while visiting w, but these correspond to two distinct positions in our traversal. Consider the following queries:

first(v): return i such that the first edge we process while visiting v is the ith we process during our traversal;

last(v): return i such that the last edge we process while visiting v is the ith we process during our traversal;

next(i): return j such that if we are visiting v when we process the ith edge during our traversal, then the next edge we process when visiting v, in counter-clockwise order, is the one we process jth;

prev(i): return j such that if we are visiting v when we process the ith edge during our traversal, then the previous edge we processed when visiting v, in counter-clockwise order, is the one we process jth;

mate(i): return j such that we process the same edge ith and jth during our traversal;

vertex(i): return the vertex v such that we are visiting v when we process the ith edge during our traversal.

With these it is straightforward to reenact our traversal of T and recover the embedding of G. For example, with the following queries we can list the edges incident to the root of T in Figure 1 and determine whether they are in T:

first(1) = 1    mate(1) = 4     vertex(4) = 3    A[1] = 0
next(1) = 2     mate(2) = 10    vertex(10) = 2   A[2] = 1
next(2) = 11    mate(11) = 17   vertex(17) = 5   A[11] = 1
next(11) = 18   mate(18) = 26   vertex(26) = 7   A[18] = 1.

To see why we can recover the embedding from the traversal, consider that if we have already correctly embedded the first i edges processed in the traversal, then we can embed the (i + 1)th correctly given its endpoints and its rank in the counter-clockwise order at those vertices.
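The construction of A, B and B* just described can be sketched as follows. This is a toy Python version, with all names ours; the traversal is supplied as a list of (edge identifier, in-tree flag) occurrences, as produced by the traversal of Lemma 1.

```python
def build_turan_bitvectors(traversal):
    """Given the 2m edge occurrences of the traversal, as (edge_id, in_tree)
    pairs, return the bitvectors A, B and B* as strings of '0'/'1'."""
    A = []
    L, Lstar = [], []          # occurrence lists of tree / non-tree edges
    for edge_id, in_tree in traversal:
        if in_tree:
            A.append('1')
            L.append(edge_id)
        else:
            A.append('0')
            Lstar.append(edge_id)

    def occurrences_to_bits(occ):
        # 0 for the first occurrence of an edge, 1 for the second.
        seen, bits = set(), []
        for e in occ:
            bits.append('1' if e in seen else '0')
            seen.add(e)
        return ''.join(bits)

    return ''.join(A), occurrences_to_bits(L), occurrences_to_bits(Lstar)

# One possible traversal of a toy graph: the path 1-2-3 with tree edges
# t1 = (1,2) and t2 = (2,3), plus the non-tree edge x = (1,3).
A, B, Bstar = build_turan_bitvectors(
    [('t1', True), ('t2', True), ('x', False),
     ('t2', True), ('t1', True), ('x', False)])
```

Here A = "110110", B = "0011" and B* = "01", matching the rule that each list entry becomes a 0 on its first occurrence and a 1 on its second.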
Queries last and prev are superfluous for this task, but they allow traversing the neighbours of a node in clockwise order.

5.1. Implementing the basic queries

We now explain our constant-time implementations of first, next, prev, mate and vertex.

Query first. If m = 0 then first(v) is undefined, which we indicate by returning 0. Otherwise, we first process an edge at v immediately after first arriving at v. Since we identify v with its pre-order rank in our traversal of T and B lacks the opening parenthesis for the root, while first arriving at any vertex v other than the root we write the (v − 1)th 0 in B and, thus, the B.select_0(v − 1)th 1 in A. If v is the root then first(v) = 1 and so, since select_x(0) = 0, this case is also handled by the formula below:

    first(v) = A.select_1(B.select_0(v − 1)) + 1    if m ≥ 1,
               0                                    otherwise.

In our example, first(5) = A.select_1(B.select_0(4)) + 1 = A.select_1(7) + 1 = 12, and indeed the twelfth edge we process, (5, 6), is the first one we process at vertex 5.

Query last. The logic of last is similar to that of first; we must locate the closing parenthesis that represents v in T:

    last(v) = A.select_1(B.match(B.select_0(v − 1)))    if m ≥ 1,
              0                                         otherwise.
Query next. If the ith edge we process is the last edge we process at a vertex v, then next(i) is undefined, which we again indicate by returning 0. This is the case when i = 2m, or A[i] = 1 and B[A.rank_1(i)] = 1. Otherwise, if the ith edge we process is not in T, then A[i] = 0, and we process the next edge at v one time step later. Finally, if the ith edge e we process is in T and is not the last one we process at v, then we next process an edge at v immediately after returning to v by processing e again at time mate(i). This is the case when A[i] = 1 and B[A.rank_1(i)] = 0. In other words,

    next(i) = i + 1        if i < 2m and A[i] = 0,
              mate(i) + 1  if i < 2m and A[i] = 1 and B[A.rank_1(i)] = 0,
              0            otherwise.

In our example, since A[12] = 1 and B[A.rank_1(12)] = B[8] = 0, and the twelfth edge we process, (5, 6), is also the fifteenth edge we process, next(12) = mate(12) + 1 = 16, and indeed the second edge we process at vertex 5 is (5, 7).

Query prev. The logic for prev is similar to that of next; we only need to consider that, once we move one position backwards, we might arrive at a closing parenthesis. The formula follows:

    prev(i) = i − 1        if i > 1 and A[i − 1] = 0,
              mate(i − 1)  if i > 1 and A[i − 1] = 1 and B[A.rank_1(i − 1)] = 1,
              0            otherwise.

Query mate. To implement mate(i), we check A[i] to determine whether we wrote a bit in B or in B* while processing the ith edge, and use rank on A to find that bit in the corresponding sequence. We then use match to find the bit encoding the matching parenthesis, and finally use select on A to find where we wrote in A that matching bit. Therefore,

    mate(i) = A.select_0(B*.match(A.rank_0(i)))  if A[i] = 0,
              A.select_1(B.match(A.rank_1(i)))   otherwise.

To compute mate(12) for our example, since A[12] = 1,

    mate(12) = A.select_1(B.match(A.rank_1(12)))
             = A.select_1(B.match(8))
             = A.select_1(9)
             = 15.

Query vertex. Suppose the ith edge e we process is not in T and we process it at vertex v. If the preceding time we processed an edge in T was the first time we processed that edge, we then wrote a 0 in B, encoding the opening parenthesis for v; otherwise, we then wrote a 1 in B, encoding the closing parenthesis for one of v's children. Now suppose e is in T. If this is the first time we process e, we move to the other endpoint w of e — which is a child of v — and write a 0 in B, encoding the opening parenthesis for w. If it is the second time we process e, then we write a 1 in B, encoding the closing parenthesis for v itself. Therefore,

    vertex(i) = B.rank_0(A.rank_1(i)) + 1                     if A[i] = 0 and B[A.rank_1(i)] = 0,
                B.parent(B.rank_0(B.match(A.rank_1(i)))) + 1  if A[i] = 0 and B[A.rank_1(i)] = 1,
                B.parent(B.rank_0(A.rank_1(i))) + 1           if A[i] = 1 and B[A.rank_1(i)] = 0,
                B.rank_0(B.match(A.rank_1(i))) + 1            otherwise.

In our example, since A[16] = 0 and B[A.rank_1(16)] = B[9] = 1,

    vertex(16) = B.parent(B.rank_0(B.match(A.rank_1(16)))) + 1
               = B.parent(B.rank_0(B.match(9))) + 1
               = B.parent(B.rank_0(8)) + 1
               = B.parent(5) + 1
               = 5,

and indeed we process the sixteenth edge, (5, 7), while visiting 5.

We remind the reader that since B lacks parentheses for the root of T, B.parent(5) refers to the parent of the fifth vertex in a pre-order traversal of T not including the root, i.e., to the parent, vertex 5, of vertex 6. Adding 1 includes the root in the traversal, so the final answer correctly refers to vertex 5. The lack of parentheses for the root also means that, e.g., B.parent(4) refers to the parent of vertex 5 and returns 0, because vertex 5 is the root of its own tree in the forest encoded by B, which excludes vertex 1. Adding 1 to that 0 also correctly turns the final value into 1, the pre-order rank of the root. Of course, we have the option of prepending and appending bits to A, B and B* to represent the roots of T and T*, but that slightly confuses the relationship between the positions of the bits and the time steps at which we process edges.

We also note that, if we do not require that node identifiers be precisely pre-order ranks in T, then we can use the positions of their 0s in B as their identifiers. This removes the need for using B.rank and B.select in all the formulas that convert between node identifiers and positions in T.

We can define more complex queries on top of the basic ones. For example, we give the pseudocode of three queries: degree(v) returns the number of neighbours of vertex v; listing(v) returns the list of neighbours of vertex v, in counter-clockwise order; face(e) returns the list of vertices, in clockwise order, of one of the faces to which the edge e belongs. We also support the other order (clockwise or counter-clockwise, or the other face to which e belongs) by using last and prev instead of first and next.

Function degree
  Input: node v
  d = 0
  edg = first(v)
  while edg ≠ 0 do
    edg = next(edg)
    d = d + 1
  return d

Function listing
  Input: node v
  edg = first(v)
  while edg ≠ 0 do
    mt = mate(edg)
    output vertex(mt)
    edg = next(edg)

Function face
  Input: edge e
  edg = e, fst = true
  while edg ≠ e or fst do
    fst = false
    mt = mate(edg)
    output vertex(mt)
    edg = next(mt)

Queries listing(v) and face(e) are implemented in optimal time, that is, O(1) per returned element.
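The formulas above can be checked with a naive, non-compact implementation of rank, select, match and parent over the running example's bitvectors. The scan-based helpers below are ours and take linear time, whereas the compact structures of Section 3 answer the same operations in constant time; everything else mirrors the formulas in the text.

```python
A  = "0110110101110010110100010100"  # 2m = 28 bits
B  = "00101100110011"                # 2(n - 1) = 14 bits
Bs = "01001001110101"                # B*, 2(m - n + 1) = 14 bits

def rank(bv, b, i):                  # number of b's in bv[1..i] (1-indexed)
    return bv[:i].count(b)

def select(bv, b, j):                # position of the jth b; select_b(0) = 0
    pos = -1
    for _ in range(j):
        pos = bv.index(b, pos + 1)
    return pos + 1

def match(bv, i):                    # matching parenthesis; 0 = open, 1 = close
    if bv[i - 1] == '0':             # opening parenthesis: scan right
        d = 0
        for j in range(i, len(bv) + 1):
            d += 1 if bv[j - 1] == '0' else -1
            if d == 0:
                return j
    d = 0                            # closing parenthesis: scan left
    for j in range(i, 0, -1):
        d += 1 if bv[j - 1] == '1' else -1
        if d == 0:
            return j

def parent(bv, v):                   # parent of the vth forest node, 0 if a root
    c = 0
    for q in range(select(bv, '0', v) - 1, 0, -1):
        if bv[q - 1] == '1':
            c += 1
        elif c == 0:
            return rank(bv, '0', q)  # nearest enclosing opening parenthesis
        else:
            c -= 1
    return 0

def first(v):
    return select(A, '1', select(B, '0', v - 1)) + 1

def mate(i):
    if A[i - 1] == '0':
        return select(A, '0', match(Bs, rank(A, '0', i)))
    return select(A, '1', match(B, rank(A, '1', i)))

def next_(i):                        # "next", renamed to avoid the builtin
    if i < len(A) and A[i - 1] == '0':
        return i + 1
    if i < len(A) and A[i - 1] == '1' and B[rank(A, '1', i) - 1] == '0':
        return mate(i) + 1
    return 0

def vertex(i):
    r = rank(A, '1', i)
    if A[i - 1] == '0':
        if B[r - 1] == '0':
            return rank(B, '0', r) + 1
        return parent(B, rank(B, '0', match(B, r))) + 1
    if B[r - 1] == '0':
        return parent(B, rank(B, '0', r)) + 1
    return rank(B, '0', match(B, r)) + 1

def degree(v):                       # as in the pseudocode above
    d, edg = 0, first(v)
    while edg != 0:
        edg, d = next_(edg), d + 1
    return d

def listing(v):
    out, edg = [], first(v)
    while edg != 0:
        out.append(vertex(mate(edg)))
        edg = next_(edg)
    return out
```

On the running example this reproduces the values computed in the text, e.g. first(5) = 12, mate(12) = 15, next(12) = 16 and vertex(16) = 5, and listing(1) starts with the neighbours 3, 2, 5, 7 of the root.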
Instead, degree(v) requires time O(degree(v)). We can also determine neighbour(u, v), that is, whether two vertices u and v are neighbours, by listing the neighbours of each in interleaved form, in time O(min(degree(u), degree(v))). These times are not so satisfactory compared with the O(1) achieved by other representations [7, 9] to compute neighbour(u, v) and degree(v).

For degree(v), we can get arbitrarily close to constant time by adding o(m) further bits to our representation; that is, we can solve the query in time O(f(m)) for any given function f(m) ∈ ω(1). To do this, we store a bitvector D[1..n] marking with 1s the (at most) m/f(m) = o(m) vertices with degree at least f(m), which takes nH(m/(nf(m))) + o(n) = O((m/f(m)) log(nf(m)/m)) + o(n) = o(m) bits by using a sparse bitvector representation [28] (recall that G is connected, so m ≥ n − 1). We also store a bitvector E where we append, for each vertex v with D[v] = 1, in increasing order of v, degree(v) − 1 0s followed by a 1. Since E has m/f(m) 1s, it can also be stored as a sparse bitvector using O((m/f(m)) log f(m)) + o(m) = o(m) bits. Therefore, if D[v] = 1, its degree is obtained in constant time with select_1(E, r) − select_1(E, r − 1), where r = rank_1(D, v). If, instead, D[v] = 0, then we know that degree(v) < f(m) and thus we apply the procedure that sequentially counts the neighbours, in time O(f(m)).

We can use a similar idea, albeit more complex, to answer neighbour(u, v) queries in time O(f(m)), for any f(m) ∈ ω(log m). We consider the graph induced by the O(m/f(m)) = o(m/log m) nodes with degree f(m) or higher and eliminate multi-edges and self-loops. The resulting graph G′ is simple and still planar, so it has average degree less than 6 and thus o(m/log m) edges. We represent G′ in classical adjacency-list form, with the nodes inside each list sorted by increasing node identifier. This requires o(m) bits in total.
To solve neighbour(u, v) in G′, we can use binary search for v in the list of u in time O(log m) = o(f(m)). To answer neighbour(u, v) on G, we check whether either u or v is low-degree (assuming we mark low-degree nodes in a bitvector D′ analogous to D) and, if so, list its neighbours in O(f(m)) time. If not, we translate nodes u and v to their corresponding nodes u′ = rank(D′, u) and v′ = rank(D′, v) in G′ and query G′ in time o(f(m)). The following theorem summarizes the results of this section. Theorem 1.
We can store a given planar embedding of a connected planar graph G with m edges in 4m + o(m) bits such that later, given a vertex v, we can list the edges incident to v in clockwise or counter-clockwise order, even if we are given a particular starting edge incident to v, using constant time per edge. We can also traverse the edges limiting a face in constant time per edge. Further, we can find a vertex's degree in O(f(m)) time for any given function f(m) ∈ ω(1), and determine whether two vertices are neighbours in O(f(m)) time for any given function f(m) ∈ ω(log m).

5.3. Reducing space on simple planar graphs

Chiang et al. [9] use 2m + 3n + o(m) bits to represent planar graphs without self-loops, which can be more than the 4m + o(m) bits used in our representation. However, if G is simple (i.e., has no loops nor multiple edges), their representation requires only 2m + 2n + o(m) ≤ 4m + o(m) bits. We remind the reader that this representation can handle any simple planar graph, but does not always respect the given embedding, so they cannot represent arbitrary embeddings. We show that, if there are no self-loops, our representation can use less than 4m + o(m) bits, by exploiting some redundancy in our representation and without changing the main scheme. Assume we represent a single sequence S[1..2m] over an alphabet of four symbols, Σ = { (, ), [, ] }, that replaces A, B, and B∗. That is, the parentheses are the 0s and 1s in B, the brackets are the 0s and 1s in B∗, and A indicates whether the symbols are parentheses or brackets. In our running example, the sequence is S[1..2m] = [ ( ( ] ) ( [ ) [ ) ( ( ] [ ) [ ) ( ] ( ] ] [ ) ] ) [ ]. The zeroth-order entropy of S is defined as H(S) = Σ_{c∈Σ} (m_c/|S|) log(|S|/m_c), where c occurs m_c times in S. The kth-order entropy, for any k > 0, is defined as H_k(S) = Σ_{C∈Σ^k} (|S_C|/|S|) H(S_C), where S_C is the string formed by the symbols that follow the context C in S (assume S is circular for simplicity, so that S[1] follows S[2m]). Ferragina and Venturini [36] show how to store a string S within |S| H_k(S) + o(|S| log |Σ|) bits, for any k = o(log_{|Σ|} |S|), so that any substring of length O(log |S|) can be extracted in constant time. We use their result to store S in 2mH(S) + o(m) bits. Instead of a structure on parentheses on bitvector B and another on bitvector B∗, we build both parentheses structures on top of sequence S. Both are similar to the original o(m)-bit structure of Navarro and Sadakane [30], only that the structure built to navigate parentheses ignores the bracket symbols, and vice versa (a similar arrangement is described by Navarro [22, pp. 311–315]). The only changes are that each symbol uses 2 bits instead of 1, that there are two symbols that do not change the "excess" count (number of opening minus closing parentheses up to some position), and that in order to extract a chunk of Θ(log m) symbols, we use the extraction method of Ferragina and Venturini [36]. A rank/select functionality on A is also easily provided on top of S, by using the same o(m)-bit structures [26, 27] and interpreting both parentheses as 1s and both brackets as 0s. Therefore, with o(m) further bits, we provide the necessary functionality on top of the 2mH(S) bits needed to encode S. This entropy is at most 2 bits per symbol (and thus 4m bits in total) for general planar embeddings, but if there are no self-loops, then the substring "[ ]" cannot appear in S (other longer strings cannot appear either, but we would need a higher-order model to capture them).
An upper bound on the first-order entropy when this substring is forbidden is obtained by noticing that only 3 symbols, instead of 4, can follow an opening bracket; therefore we can encode S using n log 4 + n log 4 + (m − n) log 3 + (m − n) log 4 = m log 12 + n log(4/3) ≈ 3.58m + 0.42n bits. This is still 4m in the worst case, where n is close to m. To obtain a nontrivial bound in terms of m, we calculate the exact first-order entropy of S when the substring "[ ]" is forbidden. Let us use the names op = (, cp = ), ob = [, and cb = ]. Let us call x_y the number of symbols y following a symbol x in S; for example, op_ob is the number of opening brackets following opening parentheses, that is, the number of occurrences of the substring "( [" in S. It must then hold that Σ op_* = Σ cp_* = n and Σ ob_* = Σ cb_* = m − n. It also holds that Σ *_op = Σ *_cp = n and Σ *_ob = Σ *_cb = m − n. The system of restrictions must be satisfied while maximizing

2mH_1(S) = n H(op_*) + n H(cp_*) + (m − n) H(ob_*) + (m − n) H(cb_*),

where H(x_1, ..., x_k) = Σ_i (x_i/x) log(x/x_i) and x = Σ_i x_i. Forbidding self-loops implies the additional restriction ob_cb = 0. We solve the optimization problem with a combination of algebraic and numeric computation, using Maple and C, up to 4 significant digits. We find that the entropy is maximized at a value slightly below the 3.58m upper bound derived above; it is attained for a specific proportion between m and n and specific values of each x_y. Therefore, the resulting space with no self-loops, using the described compressed representation, can be bounded by slightly below 3.58 bits per edge, plus o(m) bits. Simple graphs have no self-loops and no multiple edges, but this second restriction translates into longer forbidden substrings, whose effect is harder to analyze. We remind the reader that the representation of Keeler and Westbrook [8], on the other hand, achieves m log 12 ≈ 3.58m bits when no self-loops (or, alternatively, no degree-one nodes) are permitted, yet it does not support queries. When neither self-loops nor degree-one nodes are permitted, they reach 3m bits. In this case, both "[ ]" and "( )" are forbidden strings. While we have not been able to compute the exact first-order entropy in this case, it must be at most n log 3 + n log 4 + (m − n) log 3 + (m − n) log 4 = m log 12 ≈ 3.58m, which is obtained by using log 4 bits to encode the symbol that follows a closing bracket or parenthesis, and log 3 bits to encode the symbol that follows an opening bracket or parenthesis. We note that these space improvements can also be applied on top of the representation of Chiang et al. [9] since, when encoding a simple graph, the difference between both representations is that they use a particular spanning tree (which may also force a particular embedding).

Our representation can be easily extended to unconnected planar graphs, because our parentheses representations can immediately be extended to handle forests instead of just individual trees. To handle an unconnected planar graph, we first find all the connected components of the graph and then compute an arbitrary spanning tree for each connected component. Then, we construct the binary sequences: the sequence B will represent the forest of spanning trees, concatenating all the balanced-parentheses representations; the sequence B∗ will represent the complementary spanning tree of the dual of the graph. Finally, sequence A indicates the interleaving of the sequences B and B∗. We visit the connected components in arbitrary order. If the graph has k > 1 connected components, we connect them with fake edges; the length of A becomes 2(m + k − 1), the length of B grows accordingly, and a bitvector K over the tree edges marks which of them are fake. Since K contains k − 1 1s, it can be stored in k log(n/k) + O(k) bits [37]. Since n ≤ m + k, the space of the whole structure can be written in terms of m and k as 4m + k log(m/k) + O(k) + o(m) bits. The k log(m/k) + O(k) or k log(n/k) + O(k) bits used to describe the embedding are asymptotically optimal: consider a chain of t triangles (delimited with m = 2t + 1 edges) and k − 1 isolated vertices (thus k connected components in total); we must represent all the ways to distribute the k − 1 components into t bins. This requires log C(k + t − 2, k − 1) = k log(t/k) + O(k) bits with any encoding. This is also k log(n/k) + O(k) bits, since this graph has n = 2t + k nodes. We use K to avoid listing fake edges in any of the traversal operations. The fake edges increase the degree of a node by a constant factor only: a node may have one fake edge per face it participates in, which at most doubles its degree. Further, a node on the frontier of its component may have two extra fake edges threading it with other connected components. Therefore, the time complexity of the navigation operations is not affected. The fake edges may, in addition, be useful for a more ambitious face operation that takes into account the actual embedding, where a face is surrounded by a sequence of edges but is also limited by the frontier edges of the connected components it has inside. To find all those edges, we also traverse the fake edges in the face traversal, yet without listing them. The fake edges will lead us to the other connected components that are contained in and/or surround the face we are listing.
6. Parallel construction
In this section we discuss the parallel construction of our extension of Turán's representation. Since the representation is based on spanning trees and tree traversals, we can borrow ideas from well-known parallel algorithms, such as the parallel Euler tour traversal or the parallel computation of spanning trees. We assume that a tree T is represented with adjacency lists. Such a representation consists of an array of nodes V_T[1..n] and an array of (directed) edges E_T[1..2(n − 1)]. Each node v ∈ V_T stores two indices in E_T, v.first and v.last, delimiting the adjacency list of v, which starts with v's parent edge (except for the root) and is sorted counter-clockwise around v. The number of children of v is then v.last − v.first (plus 1 for the root). Each edge e ∈ E_T has three fields: e.src and e.tgt are the positions in V_T of its source and target vertices, and e.mat is the position in E_T of the mate edge e′ of e, where e′.src = e.tgt and e′.tgt = e.src. Our representation of graphs is similar, with the exception that the concept of parent of a vertex is not valid in graphs; therefore the first edge in the adjacency list of a vertex v cannot be interpreted as v's parent edge. We will first assume that the input consists of a connected planar graph embedding G = (V_G, E_G) and a spanning tree T = (V_T, E_T) of G, together with an array C that stores the number of edges of G \ T between any two consecutive edges of T, in counter-clockwise order. In Section 6.3 we will explain how to obtain T and C in parallel. With the spanning tree, we construct the bitvectors A, B, and B∗ by performing an Euler tour over T. During the tour, by writing a 0 for each forward (parent to child) edge and a 1 for each backward (child to parent) edge, we obtain the bitvector B. By reading in C the number of edges of G \ T between two consecutive edges of T, representing these edges with 0s and the edges of T with 1s, we obtain the bitvector A.
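The tour that produces B can be sketched sequentially as a DFS that emits a 0 when an edge is first traversed (parent to child) and a 1 on the way back; the child/nchild array layout below is illustrative only, and the parallel algorithm replaces this recursion by list ranking.

```c
#include <assert.h>
#include <string.h>

#define MAXN 16
static int nchild[MAXN];        /* number of children of each node        */
static int child[MAXN][MAXN];   /* children in counter-clockwise order    */

/* Writes 0 for each forward edge and 1 for each backward edge. */
static int dfs(int v, char *B, int pos) {
    for (int i = 0; i < nchild[v]; i++) {
        B[pos++] = '0';                       /* opening parenthesis */
        pos = dfs(child[v][i], B, pos);
        B[pos++] = '1';                       /* closing parenthesis */
    }
    return pos;
}

/* B gets 2(n-1) bits: as in the text, no parentheses for the root. */
int build_B(int root, char *B) {
    int len = dfs(root, B, 0);
    B[len] = '\0';
    return len;
}

/* Example: root 0 with children 1 and 2; vertex 1 has child 3. */
static char B_ex[2 * MAXN];
const char *example_B(void) {
    memset(nchild, 0, sizeof nchild);
    nchild[0] = 2; child[0][0] = 1; child[0][1] = 2;
    nchild[1] = 1; child[1][0] = 3;
    build_B(0, B_ex);
    return B_ex;
}
```

For the example tree this produces B = 001101: the subtree of vertex 1 contributes the nested pair, and the leaf 2 contributes the final 01.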
Finally, by using the previous Euler tour and the array C, we can obtain the bitvector B∗ by finding out which is the first (0) and which is the second (1) occurrence of each edge. Algorithm 1 gives the detailed pseudocode. It works in the following steps:

1. In lines 1–4, it initializes the output bitvectors (A and B∗ are set to 0s) and creates an auxiliary array LE that is used to store the traversal of the tree following the Euler tour. Each entry of LE represents one traversed edge of T and stores four fields: value is 0 or 1 depending on whether the edge is a forward or a backward edge, respectively; succ is the index in LE of the next edge in the Euler tour; rankA is the rank of the edge in A; and rankB is the rank of the edge in B.

2. In lines 5–19, the algorithm traverses T to create the Euler tour. For each edge e_j ∈ E_T, rankA is set to C[E_T[j].mat] + 1 and rankB to 1 (lines 6–7). Those ranks will be used later to compute the final positions of the edges in A, B, and B∗. For each forward edge, a 0 is written in the corresponding value field and the succ field is connected to the next edge in the Euler tour. For backward edges the procedure is similar. Note that all the edges in the adjacency list of a node of T are forward edges, except (for non-root nodes) the first one, which is the parent edge.

3. Line 20 computes the final ranks in A and B using a parallel list-ranking algorithm that adds up the weights from the beginning of the list to each element. The weights are stored in the fields rankA and rankB of LE. We use the list-ranking algorithm of Helman and JáJá [38].

4. Bitvectors A and B are written in lines 21–23. Since initially all the elements of A are 0s, it is enough to set to 1 all the elements in the rankA

Algorithm 1: Parallel compact planar embedding algorithm.
Input: A planar graph embedding G = (V_G, E_G), a spanning tree T = (V_T, E_T) of G, an array C of size |E_T|, and the starting vertex init.
Output: Bitvectors A, B and B∗ induced by G and T.

1   A = a bitvector of length |E_G| initialized with 0s
2   B = a bitvector of length |E_T|
3   B∗ = a bitvector of length |E_G| − |E_T| + 2 initialized with 0s
4   LE = an array of length |E_T|
5   parfor j = 1 to |E_T| do
6       LE[j].rankA = C[E_T[j].mat] + 1
7       LE[j].rankB = 1
8       if E_T[j].src = init or E_T[j].src.first ≠ j then    // forward edge
9           LE[j].value = 0                                  // opening parenthesis
10          if E_T[j].tgt.first = E_T[j].tgt.last then       // target is a leaf
11              LE[j].succ = E_T[j].mat
12          else                                             // target has children
13              LE[j].succ = E_T[j].tgt.first + 1
14      else                                                 // backward edge
15          LE[j].value = 1                                  // closing parenthesis
16          if E_T[j].mat = E_T[j].tgt.last then    // j was the last edge of target, return
17              LE[j].succ = E_T[j].tgt.first
18          else                                    // continue with next edge from target
19              LE[j].succ = E_T[j].mat + 1
20  parallelListRanking(LE)
21  parfor j = 1 to |E_T| do
22      A[LE[j].rankA] = 1
23      B[LE[j].rankB] = LE[j].value
24  D_pos, D_edge = arrays of length |E_G| and |E_G| − |E_T| + 2, respectively
25  parfor j = 1 to |E_T| do
26      p = LE[j].rankA − LE[j].rankB
27      base = ref(E_T[j].mat)
28      delta = p − base − C[E_T[j].mat]
29      parfor k = base + 1 to base + C[E_T[j].mat] do
30          D_pos[k] = k + delta
31          D_edge[k + delta] = k
32  parfor j = 1 to |E_G| − |E_T| + 2 do
33      mat = E_G[D_edge[j]].mat
34      if j > D_pos[mat] then
35          B∗[j] = 1
36  createRankSelect(A), createBP(B), createBP(B∗)

fields. For B, the algorithm copies the content of field value at position rankB, for all the elements of LE.

5. The algorithm now computes the position of each edge of G \ T in B∗. That information is implicit in the fields rankA and rankB of LE (line 26), once the list ranking of step 3 is carried out. For each edge e ∈ E_T, the algorithm computes the positions in B∗ of the edges of G \ T that follow, in counter-clockwise order, the mate edge of e (lines 27–31).
The algorithm uses two auxiliary arrays, D_pos and D_edge. Let edge E_G[j] belong to G \ T. Then D_pos[j] stores the position of the edge in B∗. The array D_edge is the inverse of D_pos: D_edge[i] is the position in E_G of the i-th edge of B∗. This step uses the function ref(e), which maps the position e of an edge in E_T to its position in E_G. This is naturally returned by the spanning tree construction, which gives the identity in G of the edges selected for T.

6. In lines 32–35, the algorithm computes whether the edges stored in D_pos are forward or backward edges. For each edge e in G \ T, it compares the positions in B∗ of e and its mate. If the position of e is greater, then e is a backward edge and, therefore, is represented with a 1.

7. Finally, in line 36 the structures to support the operations rank, select, match, and parent are constructed. For the bitvector A, the parallel algorithm of Labeit et al. [39] (createRankSelect) is used. For B and B∗, the parallel algorithm of Ferres et al. [40] for balanced parenthesis sequences (createBP) is used.

We have omitted some implementation details for simplicity. For example, the pseudocode uses parfor throughout, whereas the implementation uses the threads in a more controlled manner. Line 29, in particular, is more efficiently done in sequential form. We have also omitted some space optimizations, such as the reuse of some fields instead of allocating new arrays.

Analysis.
Step 1 initializes the arrays, which requires T_1 = O(m) work and T_∞ = O(log m) span (due to the overhead of the implicit parfor). In step 2, the algorithm traverses the edges of T, performing an independent computation on each edge. Therefore, with the overhead of the parfor loop, we obtain T_1 = O(n) work and T_∞ = O(log n) span. Step 3 uses a parallel list-ranking algorithm [38] over n elements, which has complexities T_1 = O(n) and T_∞ = O(log n). Step 4 assigns the values to A and B independently for each entry, thus we have again T_1 = O(n) and T_∞ = O(log n). In step 5, the algorithm traverses all the edges in G \ T. Since the loop in line 29 is also processed in parallel, we obtain T_1 = O(m − n) and T_∞ = O(log(m − n)). Similarly to step 4, in step 6 the algorithm sets the entries of bitvector B∗, which can be done independently for each entry, obtaining T_1 = O(m − n) and T_∞ = O(log(m − n)). Finally, step 7 builds the rank/select structures in T_1 = O(m) and T_∞ = O(log m) [39]. The structures supporting match and parent over balanced parentheses are constructed in T_1 = O(m) and T_∞ = O(log m) [40]. In addition to the size of the compact data structure, our algorithm uses O(m log m) bits for the arrays LE, D_pos and D_edge. As said, the constant is kept low in practice by reusing fields. Notice that the memory consumption is independent of the number of threads. Before discussing how to construct the structures that speed up degree(v) and neighbour(u, v) queries, let us discuss the parallel construction of the sparse bitvector of Raman et al. [28]. Let ℓ be the length of the sparse bitvector. Their representation divides the bitvector into blocks of length b = (log ℓ)/
2. The i-th block is described as a pair (c_i, o_i), where c_i corresponds to the number of 1s inside the block, also known as the class of the block, and o_i corresponds to its offset, an identifier among all the different blocks sharing the same class. Thus, the bitvector is represented as two arrays, C[1..⌈ℓ/b⌉] and O[1..⌈ℓ/b⌉], where C[i] = c_i and O[i] = o_i. We can compute in parallel each entry of the arrays C and O independently, using linear time on each block [22, Sec. 4.1]. Thus, we have O(ℓ) work and O(log(ℓ/b) + b) = O(log ℓ) span. In order to reduce the space consumption of the arrays C and O, their entries are packed into the bits of consecutive machine words. Notice that the size of the elements of C is fixed, ⌈log(b + 1)⌉ bits, whereas the size of those of O, ⌈log o_i⌉ bits, is variable. To pack the entries of O in parallel, we need to compute an array P[1..⌈ℓ/b⌉] pointing to the starting position of each element of O. Array P is computed with a parallel prefix sum over the values ⌈log o_i⌉. This takes linear work and logarithmic span [39], and then we can write each value o_i to its packed position in parallel. The array P is retained to provide efficient access to O. To reduce its space to o(ℓ) bits, only the entries of the form P[i · log ℓ] are stored in absolute form, whereas the others are stored as differences from the preceding multiple of log ℓ, using O(log log ℓ) bits. This space reduction is easily computed in parallel within the same time bounds. Once the data structures C, O, and P, using ℓH + o(ℓ) bits, are built, we can access in constant time any chunk of O(log ℓ) bits of the bitvector by using tables [28].
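Since the (c_i, o_i) pairs depend only on their own block, they can be computed independently, which is what makes the parallel construction easy. Below is a sketch of one block's encoding, taking the offset as the rank of the block among the b-bit words of its class in most-significant-bit-first lexicographic order; this is one of several possible offset conventions, not necessarily the one used in the implementation.

```c
#include <assert.h>
#include <stdint.h>

/* C(n, k) for the small values needed here; the division is exact at
 * every step of the loop. */
static uint64_t binom(int n, int k) {
    if (k < 0 || k > n) return 0;
    uint64_t r = 1;
    for (int i = 1; i <= k; i++) r = r * (n - k + i) / i;
    return r;
}

/* Class of a b-bit block: its number of 1s. */
int block_class(uint32_t block, int b) {
    int c = 0;
    for (int i = 0; i < b; i++) c += (block >> i) & 1;
    return c;
}

/* Offset: how many b-bit words of the same class precede the block in
 * lexicographic order (standard combinatorial ranking). */
uint64_t block_offset(uint32_t block, int b) {
    int ones = block_class(block, b);
    uint64_t off = 0;
    for (int i = b - 1; i >= 0; i--)
        if ((block >> i) & 1) {
            off += binom(i, ones);  /* smaller words put a 0 at bit i */
            ones--;
        }
    return off;
}
```

A block of class c needs only ⌈log C(b, c)⌉ bits for its offset, which is where the compression comes from: blocks that are almost all 0s or all 1s have tiny offsets.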
Therefore, we can provide rank and select functionality by building the classical o(ℓ)-bit data structures on top of the bitvector, in parallel [39]. In total, we use O(ℓ) work and O(log ℓ) span. The structures to support degree(v) can then be constructed in parallel as follows. First, we construct the bitvector D by checking all the vertices with degree at least f(m). Remember that the degree of a vertex v can be computed in constant time with v.last − v.first. Since the degree of each vertex can be obtained independently, we can do this in parallel with O(m) work and O(log m) span. Then, we construct the bitvector E by writing in unary the degree of each high-degree vertex. To do that, we perform a parallel prefix sum over all the degrees of the high-degree vertices. The prefix sum returns the positions where we have to write a 1 in E. Thus, we construct E with O(m) work and O(log m) span. Finally, we construct the compact representations of D and E in O(m) work and O(log m) span, using the sparse bitvectors of Raman et al. [28]. For the neighbour(u, v) query, we must contract the original graph G into a smaller graph G′ = (V′, E′), induced by all the vertices with degree at least f(m). To build G′ efficiently in parallel we proceed as follows. We first compute D′[1..n] similarly to D. We then fill two arrays X[1..n] and Y[1..m], so that X[i] = D′[i], and Y[j] = 1 if D′[E_G[j].src] = 1 and D′[E_G[j].tgt] = 1, and Y[j] = 0 otherwise. Next, we perform a parallel prefix sum over X, so that X[i] is the name of node i in G′ (if D′[i] = 1).
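The renaming step above can be sketched sequentially as follows, with running sums standing in for the parallel prefix sums; the arrays and field names are illustrative only.

```c
#include <assert.h>

/* Sequential sketch of the contraction into G': X renames the high-degree
 * vertices (prefix sum over D'), and the running counter jp compacts the
 * edges whose two endpoints are high-degree (prefix sum over Y).
 * Assumes n <= 64 for the stack-allocated array. */
typedef struct { int src, tgt; } Edge;

int build_gprime(const int *Dp, int n, const Edge *E, int m, Edge *Ep) {
    int X[64];
    int sum = 0;
    for (int i = 0; i < n; i++) {   /* prefix sum over D' */
        sum += Dp[i];
        X[i] = sum;                 /* name of node i in G', if Dp[i] = 1 */
    }
    int jp = 0;                     /* running prefix sum over Y */
    for (int j = 0; j < m; j++)
        if (Dp[E[j].src] && Dp[E[j].tgt]) {
            Ep[jp].src = X[E[j].src];
            Ep[jp].tgt = X[E[j].tgt];
            jp++;
        }
    return jp;                      /* number of edges of G' */
}

/* Example: vertices 0, 2 and 4 are high-degree. */
static const int Dp_ex[5] = {1, 0, 1, 0, 1};
static const Edge E_ex2[4] = {{0, 2}, {0, 1}, {2, 4}, {3, 4}};
static Edge Ep_ex[4];
int example_gprime(void) { return build_gprime(Dp_ex, 5, E_ex2, 4, Ep_ex); }
```

In the parallel version each loop becomes a prefix sum plus an independent parfor write, keeping O(m) work and O(log m) span.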
We also perform a parallel prefix sum on Y, so as to write contiguously in an array E′ the mapped edge targets, E′[j′] = X[E_G[j].tgt] for those entries j where Y[j] = 1, where j′ = Σ_{k=1}^{j} Y[k]. For each such edge, we also check if it is the first with this X[E_G[j].src] value and, if so, we record that j′ is the start of the adjacency list of node X[E_G[j].src], in an array V′[X[E_G[j].src]] = j′. Thus V′ and E′ are an adjacency-list representation of G′, built with O(m) work and O(log m) span. Instead of sorting the adjacency lists, however, we build a wavelet tree representation of E′ [39]. This supports the operation rank generalized to sequences, and therefore we use the fact that high-degree nodes u and v of G are connected if and only if X[v] is mentioned in the adjacency list of X[u], that is, E′.rank_{X[v]}(V′[X[u] + 1] − 1) − E′.rank_{X[v]}(V′[X[u]] − 1) >
0. The generalized rank operation takes time O(log |V′|) and the wavelet tree is built with O(|E′|) = o(m/log m) work and O(log² m) span. Lemma 2.
Given a connected planar graph embedding G with m edges and a spanning tree of G, we can compute in parallel a compact representation of G, using 4m + o(m) bits and supporting the navigational operations described in Section 5, in O(m) work and O(log m) span (O(log² m) span if the operation neighbour is supported), using O(m log m) bits of additional memory.

6.3. Parallel computation of spanning trees

In this section we discuss the parallel computation of the spanning tree T = (V_T, E_T) and the array C used in Section 6.1. Generating a rooted (or directed) spanning tree turns out to be a difficult problem to parallelize. Even if it seems to be easier on planar embeddings, we do not know of good worst-case results in the DyM model. We discuss practical solutions later. Such a spanning tree algorithm returns an array of parent references, one per vertex. With this array of references, we can construct the corresponding adjacency-list representation of the spanning tree. To do that, we mark with a 1 each edge of E_G that belongs to E_T and with a 0 the rest of the edges. Using a parallel prefix sum over E_G, we compute the positions in E_T of all the marked edges of E_G. The first and last fields of each node of the spanning tree are computed similarly. As a byproduct of the computation of E_T, we can compute the array C, which stores the number of edges of G \ T between two consecutive edges of T, in counter-clockwise order. This can be done by using the marks on the edges, counting the number of 0s between two consecutive 1s. Note that the starting vertex for the spanning tree must be on the outer face of G, to meet the description of the compact data structure for planar embeddings. Overall, we require T_1 = O(m) and T_∞ = O(log m) once the spanning tree is built, which is the complexity of the variants of the parallel prefix sum algorithm we employ. By combining these results with Lemma 2, we obtain the main result on construction. Theorem 2.
The compact representation introduced in Theorem 1 of a connected planar graph embedding G with m edges can be constructed under the Dynamic Multithreaded parallel model with O(m + spw) work and O(log m + sps) span (O(log² m + sps) span if the operation neighbour is supported), where spw and sps are the work and span, respectively, of any rooted spanning tree algorithm on planar embeddings.

In practice. The generation of a spanning tree is also difficult to parallelize in practice. Bader and Cong [41] mention that "the spanning tree problem is notoriously hard for any parallel implementation to achieve reasonable speedup", and propose an algorithm that is shown to perform well in practice. This is the one we use in our implementation. Their algorithm works as follows. Given a starting vertex of the graph G with n vertices and m edges, the algorithm sequentially computes a spanning tree of size O(p), called the stub spanning tree, where p is the number of available threads. Then, the leaves of the stub spanning tree are evenly assigned to the p threads as starting vertices. Each thread traverses G from its starting vertices, constructing spanning trees with a DFS traversal using a stack. For each vertex, a reference to its parent is assigned. Since a vertex can be visited by several threads, the assignment of the parent of a vertex may generate a race condition. However, since the parent assigned by any thread already belongs to a spanning tree, any assignment will generate a correct tree. Thus, the race condition is benign. Once a thread has no more vertices on its stack, it tries to steal vertices from the stack of another thread by using the work-stealing algorithm.
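The per-thread traversal can be sketched as an explicit-stack DFS that records a parent reference for every vertex it reaches; the CSR-style graph layout (off/adj) below is illustrative. In the parallel algorithm several threads run this loop from different starting vertices and share the parent array, with the benign race described above.

```c
#include <assert.h>

/* One thread's traversal in the Bader-Cong scheme (sequential sketch).
 * off[v]..off[v+1]-1 delimit the neighbours of v in adj.  parent[u] < 0
 * means "not yet visited"; in the parallel version this check and the
 * assignment race benignly with the other threads. */
#define NV 8
int spanning_tree(int n, const int *off, const int *adj,
                  int start, int *parent) {
    int stack[NV], top = 0, visited = 0;
    for (int i = 0; i < n; i++) parent[i] = -1;
    parent[start] = start;
    stack[top++] = start;
    while (top > 0) {
        int v = stack[--top];
        visited++;
        for (int k = off[v]; k < off[v + 1]; k++) {
            int u = adj[k];
            if (parent[u] < 0) {      /* claim u as a child of v */
                parent[u] = v;
                stack[top++] = u;
            }
        }
    }
    return visited;                   /* number of vertices reached */
}

/* Example: a 4-cycle 0-1-2-3-0 in CSR form. */
static const int off_ex[5] = {0, 2, 4, 6, 8};
static const int adj_ex[8] = {1, 3, 0, 2, 1, 3, 2, 0};
static int par_ex[4];
```

Whichever thread writes parent[u] first wins, and since every written parent already hangs from some tree connected to the stub, the union of the results is always a valid spanning tree.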
Since the spanning trees generated by all the threads are connected to the stub spanning tree, the union of all the spanning trees generates a spanning tree of G. They analyze their algorithm in expectation on random graphs, obtaining O(m/p) time when p ≪ m, but general random graphs have a very small diameter. The diameter seems to be a lower bound for the span of their algorithm, and this is Θ(n^{1/4}) on random planar graphs [42]. Also, their best possible time is O(√m), achieved when using p = √m processors. Despite its analysis, the algorithm of Bader and Cong has a good practical behavior and its implementation is simple. To handle unconnected planar graphs, we can first use the algorithm of Shun et al. [43], which finds the connected components within O(n) work and polylogarithmic span with high probability, and is shown to perform well in practice.

PRAM model.
We can also analyze our algorithm under the PRAM model. Algorithm 1 is easily translated into the EREW model, reaching O(m/log m) processors and O(log m) time, dominated by the parallel list ranking of line 20, the expansion from n to m processors in line 29, and the construction of the succinct structures in line 36. The construction in Section 6.2, of the structures that speed up degree and neighbour queries, is also easily carried out in the EREW model within those bounds, except for the sorting of the edges of G′. This can be done in O(log m) time with O(m) processors in the EREW model [44], and in O(log² m) time with O(m/log m) processors in the CREW model [45]. The postprocessing we have described in this section, once the spanning tree is built, also runs in O(log m) time with O(m/log m) EREW processors. The most costly part of the process is likely to be the construction of the spanning tree. The best PRAM results we know of are O(log m log* m) time and O(m) processors in the EREW model [46], O(log m) time and O(m/log m) processors in the arbitrary CRCW model [47], and O(log m) time and O(m) processors in the same model [48].

Theorem 3. The compact representation introduced in Theorem 1 of a connected planar graph embedding G with m edges can be constructed under the PRAM EREW model with O(m) processors and O(log m log* m) time, and under the PRAM arbitrary CRCW model with O(m/log m) processors and O(log m) time, or O(m) processors and O(log m) time.
7. Experiments
We implemented the data structure construction and queries in C, and compiled them using GCC 5.4. For the parallel construction we used the Cilk Plus extension, an implementation of the DyM model. We build only the basic structures, excluding those that speed up the operations degree and neighbour. The code and data needed to replicate our results are available at .
The experiments were carried out on a NUMA machine with two NUMA nodes. Each NUMA node includes a 14-core Intel® Xeon® CPU (E5-2695) processor clocked at 2.3GHz. The machine runs Linux 4.4.0-83-generic, in 64-bit mode. The machine has per-core L1 and L2 caches of sizes 64KB and 256KB, respectively, and a per-processor shared L3 cache of 35MB, with 768GB of DDR3 RAM memory (384GB per NUMA node), clocked at 1867MHz. Hyperthreading was enabled, giving a total of 28 logical cores per NUMA node.
Our experiments ran on real and artificial datasets with different numbers of nodes. The datasets are shown in Table 1. For the artificial datasets we generated points (x, y) with the function rnorm of R. The real dataset, wc, corresponds to the coordinates of 2,243,467 unique cities in the world. From those real or generated points, we obtained a Delaunay triangulation using Triangle, a software for the generation of meshes and triangulations. Finally, we generated planar embeddings from the Delaunay triangulations with the Edge Addition Planarity Suite. The minimum and maximum degrees in the dataset wc were 3 and 36, respectively. For the rest of the datasets, the minimum degree was 3 and the maximum degree was 16.

The rnorm function generates random numbers with normal distribution, given a mean and a standard deviation. In our case, the x and y components were generated using mean 0 and standard deviation 10000. For more information about the rnorm function, visit https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Normal.html
The dataset containing the coordinates was created by MaxMind, available from . The original dataset contains over 3 million cities; we kept the 2,243,467 cities with unique coordinates to build our dataset wc.
Available at . Our triangulations were generated using the options -cezCBVPNE.
Available at https://github.com/graph-algorithms/edge-addition-planarity-suite . Our embeddings were generated using the options -s -q -p.

Dataset    Vertices (n)    Edges (m)
1 wc       2,243,467       6,730,395
2 pe5M     5,000,000       14,999,983
3 pe10M    10,000,000      29,999,979
4 pe15M    15,000,000      44,999,983
5 pe20M    20,000,000      59,999,975
6 pe25M    25,000,000      74,999,979

Table 1: Datasets used in our experiments.
There are no other implemented compact representations of planar embeddings. In this subsection we aim to show that representations designed for other kinds of graphs are indeed not competitive for this graph family. We compare our compact representation with four solutions designed to compress Web graphs, social networks and planar graphs [49, 50, 51, 52], and with one parallel framework for processing general graphs in compressed form [53]. The three solutions for Web graphs and social networks require reordering the vertices of the graph. The solution of Apostolico and Drovandi [49] (AD) enumerates the vertices through a BFS traversal of the graph. The reordering induces two useful properties: locality (a vertex with index i has neighbours with indexes close to i) and similarity (vertices with similar indexes have similar adjacency lists). Thus, the vertices and their adjacency lists are compressed following the ordering induced by the BFS traversal. The solution of Boldi et al. [50] (BRSV) reorders the nodes with a clustering algorithm called Layered Label Propagation (LLP), used in combination with the WebGraph framework [54]. Brisaboa et al. [51] proposed the k²-tree structure for graph compression. The k²-tree is a compact tree representation of the adjacency matrix of a graph. The structure exploits the clustering of the edges in the adjacency matrix, representing large empty areas of the matrix efficiently. The clustering depends on the ordering of the vertices of the graph; in our comparison, we used the k²-tree structure combined with the BFS traversal of [49], as suggested by Hernández and Navarro [55]. Blandford et al. [52] proposed a compact representation based on graph separators (GS). To construct the compact representation, the vertices of the graph must be renumbered. The new numbering is computed recursively, decomposing the graph by the computation of graph separators; the sequence of computed separators generates the new numbering, after which adjacent vertices tend to be close in the numbering. The representation takes advantage of that and reorders the adjacency list of each vertex, storing the difference between consecutive neighbours. Finally, the adjacency lists are encoded space-efficiently. In our experiments, we use the child-flipping heuristic [52] to compute the numbering of the vertices and the snip code to encode the adjacency lists, which was the best among the choices we tested.

Dataset   Plain   Ligra+   BRSV    AD     k²-tree   GS      Pemb
wc        74.67   52.50    14.57   14.73  16.40     14.88   6.00
pe5M      74.67   52.99    14.97   14.14  15.33     15.12   5.93
pe10M     74.67   53.15    15.03   14.33  14.73     15.12   5.93
pe15M     74.67   53.20    15.04   14.38  14.38     15.12   5.72
pe20M     74.67   53.24    15.07   14.43  14.15     15.12   5.93
pe25M     74.67   53.32    15.11   14.50  13.96     15.14   5.80

Table 2: Bits per edge (bpe) of the plain representation, alternative compressed graph representations, and ours.

Shun et al. [53] introduced
Ligra+, a lightweight graph processing framework for shared-memory multicore machines. In Ligra+, the graph is stored in compressed form by compressing the adjacency list of each vertex: each list is sorted in increasing order and the consecutive differences are then run-length encoded. Finally, we also consider a plain representation (Plain) composed of an array of length 2m, representing the concatenation of the adjacency lists, and an array of length n, representing the beginning of the adjacency list of each vertex.

Table 2 shows the bits per edge (bpe) of all the representations, where our solution is called Pemb, for planar embedding. In the table, we consider four bytes for each vertex and edge in the plain representation, equivalent to an integer in common programming languages. Our compact representation achieves the best results, using less than half the space of its closest competitor. Note that the other solutions, despite using widely different techniques, obtain very similar results, around 15 bpe. This suggests that exploiting planarity is the key to obtaining a drastic reduction in space. Our results, of at most 6 bpe, are in accordance with the 4m + o(m) bits of Theorem 1. Notice that, due to the reordering needed by the other representations, they are not suitable for representing a particular planar embedding.
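The Plain baseline just described can be sketched as follows (the type and function names are illustrative; the arithmetic in the final comment reproduces the 74.67 bpe figure of Table 2):

```c
#include <assert.h>

/* Plain baseline: the concatenated counter-clockwise adjacency lists in
 * 'adj' (length 2m), delimited per vertex by 'offset' (length n+1). */
typedef struct {
    int  n;               /* number of vertices */
    long m;               /* number of edges */
    const long *offset;   /* offset[v] .. offset[v+1]-1 index into adj */
    const int  *adj;      /* length 2m */
} PlainGraph;

/* degree and neighbour listing are immediate from the offsets */
static long pg_degree(const PlainGraph *g, int v) {
    return g->offset[v + 1] - g->offset[v];
}

static const int *pg_neighbours(const PlainGraph *g, int v) {
    return g->adj + g->offset[v];
}

/* Space check: 4-byte cells give (2m + n) * 32 / m bits per edge; in a
 * triangulation m is about 3n, hence (7/3) * 32 = 74.67 bpe (Table 2). */
```
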
We test the time to carry out the three basic queries introduced in Section 5: degree, listing and face. Additionally, we test a more complex operation: a depth-first search traversal, dfs, starting from an arbitrary vertex and using a stack. We solve degree by sequentially traversing the edges, as we have not built the extra data structures to speed up this query. Observe that, given an adjacency-list representation, answering degree and listing queries is straightforward. We measured the time of queries degree and listing 10 times per vertex, face 10 times per edge, and dfs 10 times for 30 random vertices. Table 3 shows the median time per query, both for the plain representation and for our compact representation.

(We can get closer to 4 bpe by sparsifying the sublinear-size structures used to query bitvectors and parentheses, thus trading space for query time.)

         Plain                               Pemb
         degree   listing  face   dfs (s)    degree   listing  face   dfs (s)
wc       0.01     0.12     0.35   0.51       …        …        …      …
pe5M     0.02     0.14     0.51   1.39       …        …        …      …
pe10M    0.03     0.14     0.60   2.65       …        …        …      …
pe15M    0.03     0.15     0.62   4.51       …        …        …      …
pe20M    0.03     0.15     0.64   5.66       …        …        …      …
pe25M    0.03     0.15     0.64   7.46       …        …        …      …
lim25M   9.31 ms  … ms     … ms   -          2.04 µs  … µs     … µs   … s

Table 3: Median times of degree, listing and face queries, and the DFS traversal. All the values are in microseconds (µs), except the dfs columns and the lim25M row, which explicitly indicate µs, ms or s (seconds).

The plain representation answers degree and listing queries 200 and 150 times faster than the compact representation, respectively. This result was expected, since the plain representation we use already has the list of neighbours in counter-clockwise order. For the face query and the dfs traversal, the adjacency-list representation is only 16 and 26 times faster, respectively.

This slowdown is the price of a representation that uses about 13 times less space, that is, it could hold graphs 13 times larger in main memory. To illustrate the effect of holding the compressed graph representation in main memory versus having to handle it on disk, we replicated the experiments on a machine with artificially limited memory. For these new experiments we used the pe25M dataset, whose plain representation requires 668MB, whereas its compact representation needs only 52MB. The machine was set to use at most 600MB of RAM, just slightly less than necessary to hold the whole input representation. The results are shown in the last row of Table 3 (lim25M). For the degree query, the compact representation is around 4,500 times faster than the plain representation; for the listing query, the difference is around 5,000 times; for the face query, the compact representation is around 2,400 times faster. We aborted the experiment on dfs for the adjacency-list representation after two hours; a projection of the other results suggests that more than a day would have been needed. Thus, the compact representation pays off when it is the key to holding the graph in main memory.

The computer tested is an Intel® Core™ i7-7500U CPU, with four physical cores running at 2.70GHz. The computer runs Linux 4.8.0-53-generic, in 64-bit mode. This machine has per-core L1 and L2 caches of sizes 64KB and 256KB, respectively, and a shared L3 cache of 4MB, with 8GB of DDR4 RAM. To reduce the size of the available physical memory, we set the mem parameter of the Linux kernel to mem=600MB.

Table 4: Running times of the parallel construction algorithm in seconds.
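For reference, the dfs operation over the plain adjacency-list representation is a standard stack-based traversal. A minimal sketch (function and parameter names are ours):

```c
#include <assert.h>
#include <stdlib.h>

/* Stack-based DFS over the plain representation: 'offset' (length n+1)
 * delimits the neighbours of each vertex inside 'adj' (length 2m).
 * Returns the number of vertices reached from s. */
static long dfs_count(int n, const long *offset, const int *adj, int s) {
    char *seen  = calloc((size_t)n, 1);
    /* a vertex may be pushed once per incident edge, plus the start */
    long *stack = malloc(((size_t)offset[n] + 1) * sizeof *stack);
    long top = 0, visited = 0;
    stack[top++] = s;
    while (top > 0) {
        long v = stack[--top];
        if (seen[v]) continue;            /* may have been pushed twice */
        seen[v] = 1;
        visited++;
        for (long i = offset[v]; i < offset[v + 1]; i++)
            if (!seen[adj[i]])
                stack[top++] = adj[i];
    }
    free(stack);
    free(seen);
    return visited;
}
```
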
We now evaluate the performance of our parallel construction. In our implementation of the parallel spanning tree algorithm of Bader and Cong [41], to limit the worst case, we included a threshold of O(m/p) elements in the stack size of each thread. Each time a thread holds more nodes than the threshold, it creates a new parallel thread with half of its stack. Additionally, we also return, for each node, the reference to its parent; this yields better performance than forcing the first edge of each node to lead to its parent.

We also implemented a sequential algorithm called seq, which corresponds to a sequential DFS algorithm to build the spanning tree, followed by the serialization of the parallel algorithm. To serialize a parallel algorithm in the DyM model, we replaced each parfor keyword with the for keyword and deleted the spawn and sync keywords. Each data point is the median of 15 measurements.

Table 4 shows the running times obtained in our experiments, and Figure 3 shows the speedups compared with the seq algorithm. On average, the seq algorithm took about 82% of the time of the parallel algorithm running with 1 thread. With p ≥ 2, the parallel algorithm achieves better times than the seq algorithm. We observe an almost linear speedup up to p = 24, with an efficiency of at least 40% for the smaller datasets and almost 50% for the bigger ones. With p = 28 the speedup drops, due to the topology of our machine: up to 24 cores, all the threads run within a single NUMA node.

Figure 3: Speedup of the parallel algorithm.
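The serialization of the seq baseline, replacing each parallel for with a plain for and deleting the spawn and sync keywords, corresponds to Cilk's serial elision, which can be reproduced with macros when the Cilk runtime is absent. A sketch (the toy loop is illustrative):

```c
#include <assert.h>

/* Serial elision sketch: with the Cilk Plus runtime, cilk_for runs the
 * iterations in parallel; without it, the macros below degrade the same
 * source to the sequential code used by the seq baseline. */
#if defined(__cilk)
#include <cilk/cilk.h>
#else
#define cilk_for for
#define cilk_spawn
#define cilk_sync
#endif

/* toy data-parallel loop; identical source for parallel and seq runs */
static void fill_squares(long *out, long n) {
    cilk_for (long i = 0; i < n; i++)
        out[i] = i * i;
}
```
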
With p ≥ 28, both NUMA nodes are used, which implies higher communication costs: communication within a NUMA node is cheaper than communication across NUMA nodes [56]. In particular, the case p = 28 also uses both NUMA nodes, since at least one core of our machine was devoted to OS processes. For p = 56, the wc dataset exhibits an efficiency of only 24%, as it is the smallest one; for the bigger datasets, the lowest efficiency is 32%.

The running times and speedups reported in Table 4 and Figure 3 include the construction of bitvectors and balanced parentheses sequences, to support rank, select, parent, and match operations. To measure the efficiency of our algorithm without the influence of the construction of those additional data structures, we repeated all the construction experiments excluding them. In the new experiments, we observed that the speedup increases on average by 2.7% for p ≤ 24 and by 3.2% for p ≥ 28, reaching a maximum speedup of 18.8, compared to the values reported in Figure 3.

Table 5 shows the running times for different edge densities of the dataset pe25M, and Figure 4 shows the corresponding speedups compared with the algorithm seq. The different densities are generated by deleting x million edges from the dataset pe25M, with x ∈ {5, 10, 15, 20, 25, 30}. If several components are generated, we reconnect them by restoring one edge between two components and then choosing new edges to be deleted. Thus, we report results for 45 to 75 million edges, where the densest case corresponds to the original dataset pe25M. We observe a decrease in the running time for all values of p, according to the decrease in the number of edges. With respect to the original dataset, the rest show a greater decrease in the running time for increasing values of p, reaching speedups of up to 19.5. In the case of datasets with the same number of edges (see columns pe15M and pe20M in Table 4, and the columns with matching edge counts in Table 5), the datasets with a higher number of vertices show higher running times.

Table 5: Running times of the parallel construction algorithm varying the edge density for the dataset pe25M. The running times are measured in seconds.

Comparing Figures 3 and 4, we observe that our algorithm scales similarly for triangulated and non-triangulated graphs.

Figure 4: Speedup of the parallel algorithm varying the edge density for the dataset pe25M.

Figure 5 shows the memory consumption of our algorithm. Specifically, the figure shows, for each dataset, the space used by its adjacency-list representation (inputGraph), the peak consumption of our construction (peakMem) in addition to the input and the output, the space of its plain representation (plainGraph), and the size of its compact representation (compGraph). The plain representation, consisting of an array of edges of length 2m and an array of vertices of length n, is enough to navigate a graph, but for the construction we need more information about the embedding of the input graph. This richer adjacency-list representation is what we call inputGraph. To measure the peak consumption, we use malloc_count, which monitors the memory allocated and released with malloc and free, respectively, and reports the peak usage. The observed peak consumption equals the size of the arrays LE, D_pos and D_edge. Compared with the space consumption of the input adjacency-list representation, our implementation uses 73% extra space. The final compact representation uses about 8% of the plain representation, as we have seen.

Timo Bingmann. malloc_count - Tools for runtime memory usage analysis and profiling. URL: https://panthema.net/2013/malloc_count/. Last accessed: August 08, 2017.

Figure 5: Memory consumption of the parallel algorithm and the final compact structure.
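malloc_count obtains the peak by intercepting the allocator itself; the same statistic can be gathered with explicit counting wrappers, sketched here (the wrapper names are ours, not malloc_count's API):

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal peak-memory tracker: route allocations through counting
 * wrappers that maintain the current total and its maximum. */
static size_t mem_current = 0, mem_peak = 0;

static void *count_malloc(size_t size) {
    /* store the size in front of the block so count_free can subtract it */
    size_t *p = malloc(size + sizeof(size_t));
    if (!p) return NULL;
    *p = size;
    mem_current += size;
    if (mem_current > mem_peak) mem_peak = mem_current;
    return p + 1;
}

static void count_free(void *ptr) {
    if (!ptr) return;
    size_t *p = (size_t *)ptr - 1;
    mem_current -= *p;
    free(p);
}
```
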
8. Conclusions and future work
Turán's representation of planar embeddings [4] is much simpler than the known alternatives and encodes any planar embedding of m edges in just 4m bits, close to the lower bound of 3.58m bits. In this paper we have shown how to add o(m) bits to this encoding in order to support fast navigation and queries of the graph, in constant time for the most fundamental operations. While there are asymptotically optimal representations [3], the simplicity of Turán's encoding enabled us to introduce the first actual implementation of such a compact data structure, where the basic navigation operations are solved within microseconds. Further, the structure can be built at a rate of about one microsecond per edge, and the construction can be parallelized with linear speedup and an efficiency near 50%. Our parallel construction algorithm has linear work and logarithmic span on the dynamic multithreaded model once a spanning tree of the embedding is computed.

One intriguing question concerns the queries we do not support in constant time. Some previous representations [7, 9, 3] can compute the degree of a node in O(1) time, whereas we can achieve any superconstant time. Similarly, they can answer neighbour queries in O(1) time, whereas our structure needs superlogarithmic time. The representation closest to ours [9] uses the same technique of two types of parentheses, but the arrangement of the parentheses follows a so-called orderly spanning tree. While much more complex to build and unable to represent some embeddings, such a spanning tree induces a certain regularity in the representation of the edges leaving each node, which allows determining in constant time the number of such edges, and whether two nodes are connected. It is an interesting question whether we can find a simpler arrangement that retains those properties.

Another future research line is how to make our data structure dynamic. We can use a scheme inspired by Munro et al. [57].
Suppose we store our static data structure and a dynamic buffer that contains information about edges that have been added or deleted. If we want to know if an edge is present, we check our static data structure and then check the buffer to see if its status has changed. Once the buffer becomes too large, e.g. more than m/log^ε m bits, we rebuild our static structure. Even when updates arrive sequentially, there are some issues to consider, such as how to quickly report the neighbours of a node that originally had many edges but has had most of them deleted (perhaps by moving all the information about a node into the buffer when half its incident edges have been updated), and how to detect if the graph has become non-planar. There are more issues when the updates can be made in parallel, since then we may need locks for nodes, and finding a practical design becomes challenging.

Finally, we believe we can generalize our data structure to store efficiently graphs that are almost planar, using for example generalizations of the technique of Fischer and Peters [58] to store graphs that are almost trees. Of course, it is NP-hard to find the maximum planar subgraph of an arbitrary graph [59], but there have been recent advances in approximating it, and in practice bridges and tunnels, for example, might already be identified anyway.

Acknowledgments
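The buffered scheme sketched above can be made concrete as follows (the names, the function-pointer interface and the linear-scan buffer are illustrative only; a real design would index the buffer rather than scan it):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Buffered-update sketch: edges changed since the last rebuild live in a
 * small log; a query asks the static structure first and lets the newest
 * log entry override the answer. */
typedef struct { int u, v; bool present; } EdgeUpdate;

typedef struct {
    bool (*static_has_edge)(int u, int v);  /* the static structure */
    const EdgeUpdate *buf;                  /* pending updates, oldest first */
    size_t used;                            /* rebuild when this grows past
                                               roughly m / log^e m entries */
} DynGraph;

static bool dyn_has_edge(const DynGraph *g, int u, int v) {
    bool ans = g->static_has_edge(u, v);
    for (size_t i = 0; i < g->used; i++)    /* later entries win */
        if ((g->buf[i].u == u && g->buf[i].v == v) ||
            (g->buf[i].u == v && g->buf[i].v == u))
            ans = g->buf[i].present;
    return ans;
}

/* toy static structure for the usage example: the single edge (0,1) */
static bool toy_static(int u, int v) {
    return (u == 0 && v == 1) || (u == 1 && v == 0);
}
```
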
The first author received funding from CORFO 13CEE2-21592 (2013-21592-1-INNOVA PRODUCCION2013-21592-1). The second author received funding from Conicyt Fondecyt grant 3170534. The second, third and fifth authors received travel funding from EU grant H2020-MSCA-RISE-2015 BIRDS GA No. 690941, and funding from Basal Funds FB0001, Conicyt, Chile. The third author received funding from Academy of Finland grant 268324. The fourth author received funding from NSERC of Canada. The fifth author received funding from Millennium Nucleus Information and Coordination in Networks, ICM/FIC RC130003. Early parts of this work were done while the third author was at the University of Helsinki and while the third and fifth authors were visiting the University of A Coruña.

Many thanks to Jérémy Barbay, Luca Castelli Aleardi, Guojing Cong, Arash Farzan, Cecilia Hernández, Ian Munro, Pat Nicholson, Romeo Rizzi and Julian Shun for fruitful discussions. We thank Susana Ladra and Guy Blelloch for sharing their k²-tree and graph separators code with us. We also thank Telefonica I+D, in particular Pablo García, for sharing their computing equipment with us. The third author is grateful to the late David Gregory for his course on graph theory.

References

[1] L. Ferres, J. Fuentes, T. Gagie, M. He, G. Navarro, Fast and compact planar embeddings, in: Proceedings of the 15th International Symposium on Algorithms and Data Structures (WADS), Springer International Publishing, 2017, pp. 385–396.
[2] W. T. Tutte, A census of planar maps, Canadian Journal of Mathematics 15 (1963) 249–271.
[3] G. E. Blelloch, A. Farzan, Succinct representations of separable graphs, in: Proceedings of the 21st Annual Conference on Combinatorial Pattern Matching (CPM), Springer-Verlag, 2010, pp. 138–150.
[4] G. Turán, On the succinct representation of graphs, Discrete Applied Mathematics 8 (3) (1984) 289–294.
[5] G.
Jacobson, Space-efficient static trees and graphs, in: Proceedings of the 30th Annual Symposium on Foundations of Computer Science (FOCS), IEEE Computer Society, 1989, pp. 549–554.
[6] M. Yannakakis, Embedding planar graphs in four pages, Journal of Computer and System Sciences 38 (1) (1989) 36–67.
[7] J. I. Munro, V. Raman, Succinct representation of balanced parentheses and static trees, SIAM Journal on Computing 31 (3) (2001) 762–776.
[8] K. Keeler, J. Westbrook, Short encodings of planar graphs and maps, Discrete Applied Mathematics 58 (1995) 239–252.
[9] Y.-T. Chiang, C.-C. Lin, H.-I. Lu, Orderly spanning trees with applications, SIAM Journal on Computing 34 (2005) 924–945.
[10] R. C.-N. Chuang, A. Garg, X. He, M.-Y. Kao, H.-I. Lu, Compact encodings of planar graphs via canonical orderings and multiple parentheses, in: Proceedings of the 25th International Colloquium on Automata, Languages and Programming (ICALP), LNCS 1443, 1998, pp. 118–129.
[11] J. Barbay, L. C. Aleardi, M. He, J. I. Munro, Succinct representation of labeled graphs, Algorithmica 62 (2012) 224–257.
[12] W. Schnyder, Embedding planar graphs on the grid, in: Proceedings of the 1st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Society for Industrial and Applied Mathematics, 1990, pp. 138–148.
[13] D. K. Blandford, G. E. Blelloch, I. A. Kash, Compact representations of separable graphs, in: Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Society for Industrial and Applied Mathematics, 2003, pp. 679–688.
[14] R. J. Lipton, R. E. Tarjan, A separator theorem for planar graphs, SIAM Journal on Applied Mathematics 36 (1979) 177–189.
[15] N. Bonichon, C. Gavoille, N. Hanusse, D. Poulalhon, G. Schaeffer, Planar graphs, via well-orderly maps and trees, Graphs and Combinatorics 22 (2) (2006) 185–202.
[16] X. He, M. Y. Kao, H.-I. Lu, A fast general methodology for information-theoretically optimal encodings of graphs, SIAM Journal on Computing 30 (2000) 838–846.
[17] L. C.
Aleardi, O. Devillers, G. Schaeffer, Succinct representation of triangulations with a boundary, in: Proceedings of the 9th International Conference on Algorithms and Data Structures (WADS), Springer-Verlag, 2005, pp. 134–145.
[18] L. Castelli Aleardi, O. Devillers, G. Schaeffer, Succinct representations of planar maps, Theoretical Computer Science 408 (2-3) (2008) 174–187.
[19] E. Fusy, G. Schaeffer, D. Poulalhon, Dissections, orientations, and trees with applications to optimal mesh encoding and random sampling, ACM Transactions on Algorithms 4 (2) (2008) 19:1–19:48.
[20] K. Yamanaka, S.-I. Nakano, A compact encoding of plane triangulations with efficient query supports, Information Processing Letters 110 (18-19) (2010) 803–809.
[21] J. I. Munro, P. K. Nicholson, Compressed representations of graphs, in: Encyclopedia of Algorithms, Springer, 2016, pp. 382–386.
[22] G. Navarro, Compact Data Structures: A Practical Approach, Cambridge University Press, 2016.
[23] X. He, M.-Y. Kao, Parallel construction of canonical ordering and convex drawing of triconnected planar graphs, in: Proceedings of the 4th International Symposium on Algorithms and Computation (ISAAC), 1993, pp. 303–312.
[24] M. Kao, S. Teng, K. Toyama, An optimal parallel algorithm for planar cycle separators, Algorithmica 14 (1995) 398–408.
[25] M. Kao, M. Fürer, X. He, B. Raghavachari, Optimal parallel algorithms for straight-line grid embeddings of planar graphs, SIAM Journal on Discrete Mathematics 7 (4) (1994) 632–646.
[26] D. R. Clark, Compact PAT trees, Ph.D. thesis, University of Waterloo, Canada (1996).
[27] J. I. Munro, Tables, in: Proceedings of the 16th Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS), LNCS 1180, 1996, pp. 37–42.
[28] R. Raman, V. Raman, S. R. Satti, Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets, ACM Transactions on Algorithms 3 (4).
[29] R. F. Geary, N. Rahman, R. Raman, V.
Raman, A simple optimal representation for balanced parentheses, Theoretical Computer Science 368 (3) (2006) 231–246.
[30] G. Navarro, K. Sadakane, Fully functional static and dynamic succinct trees, ACM Transactions on Algorithms 10 (3) (2014) 16:1–16:39.
[31] T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein, Multithreaded algorithms, in: Introduction to Algorithms, 3rd Edition, The MIT Press, 2009, pp. 772–812.
[32] R. D. Blumofe, C. E. Leiserson, Scheduling multithreaded computations by work stealing, Journal of the ACM 46 (5) (1999) 720–748.
[33] N. Biggs, Spanning trees of dual graphs, Journal of Combinatorial Theory, Series B 11 (2) (1971) 127–131.
[34] D. Eppstein, Dynamic generators of topologically embedded graphs, in: Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Society for Industrial and Applied Mathematics, 2003, pp. 599–608.
[35] T. R. Riley, W. P. Thurston, The absence of efficient dual pairs of spanning trees in planar graphs, Electronic Journal of Combinatorics 13 (1).
[36] P. Ferragina, R. Venturini, A simple storage scheme for strings achieving entropy bounds, Theoretical Computer Science 371 (1) (2007) 115–121.
[37] D. Okanohara, K. Sadakane, Practical entropy-compressed rank/select dictionary, in: Proceedings of the 9th Workshop on Algorithm Engineering and Experiments (ALENEX), 2007, pp. 60–70.
[38] D. R. Helman, J. JáJá, Prefix computations on symmetric multiprocessors, Journal of Parallel and Distributed Computing 61 (2001) 265–278.
[39] J. Labeit, J. Shun, G. E. Blelloch, Parallel lightweight wavelet tree, suffix array and FM-index construction, Journal of Discrete Algorithms 43 (2017) 2–17.
[40] J. Fuentes-Sepúlveda, L. Ferres, M. He, N. Zeh, Parallel construction of succinct trees, Theoretical Computer Science. To appear.
[41] D. A. Bader, G. Cong, A fast, parallel spanning tree algorithm for symmetric multiprocessors (SMPs), Journal of Parallel and Distributed Computing 65 (2005) 994–1006.
[42] G. Chapuy, E. Fusy, O.
Giménez, M. Noy, On the diameter of random planar graphs, Combinatorics, Probability & Computing 24 (1) (2015) 145–178.
[43] J. Shun, L. Dhulipala, G. Blelloch, A simple and practical linear-work parallel algorithm for connectivity, in: Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 2014, pp. 143–153.
[44] R. Cole, Parallel merge sort, SIAM Journal on Computing 17 (4) (1988) 770–785.
[45] G. Bilardi, A. Nicolau, Adaptive bitonic sorting: An optimal parallel algorithm for shared-memory machines, SIAM Journal on Computing 18 (2) (1989) 216–228.
[46] G. E. Shannon, A linear-processor algorithm for depth-first search in planar graphs, Information Processing Letters 29 (3) (1988) 119–123.
[47] M. Kao, S. Teng, K. Toyama, An optimal parallel algorithm for planar cycle separators, Algorithmica 14 (5) (1995) 398–408.
[48] T. Hagerup, Planar depth-first search in O(log n) parallel time, SIAM Journal on Computing 19 (4) (1990) 678–704.
[49] A. Apostolico, G. Drovandi, Graph compression by BFS, Algorithms 2 (3) (2009) 1031–1044.
[50] P. Boldi, M. Rosa, M. Santini, S. Vigna, Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks, in: Proceedings of the 20th International Conference on World Wide Web (WWW), ACM, 2011, pp. 587–596.
[51] N. Brisaboa, S. Ladra, G. Navarro, Compact representation of web graphs with extended functionality, Information Systems 39 (1) (2014) 152–174.
[52] D. K. Blandford, G. E. Blelloch, I. A. Kash, Compact representations of separable graphs, in: Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2003, pp. 679–688.
[53] J. Shun, L. Dhulipala, G. E. Blelloch, Smaller and faster: Parallel processing of compressed graphs with Ligra+, in: Proceedings of the 25th Data Compression Conference (DCC), 2015, pp. 403–412.
[54] P. Boldi, S.
Vigna, The WebGraph framework I: Compression techniques, in: Proceedings of the 13th International Conference on World Wide Web (WWW), ACM, 2004, pp. 595–602.
[55] C. Hernández, G. Navarro, Compressed representations for web and social graphs, Knowledge and Information Systems 40 (2) (2014) 279–313.
[56] U. Drepper, What every programmer should know about memory (2007). URL http://people.redhat.com/drepper/cpumemory.pdf