Graph Drawing via Gradient Descent, $(GD)^2$
Reyan Ahmed, Felice De Luca, Sabin Devkota, Stephen Kobourov, Mingwei Li
Department of Computer Science, University of Arizona, USA
Abstract.
Readability criteria, such as distance or neighborhood preservation, are often used to optimize node-link representations of graphs to enable the comprehension of the underlying data. With few exceptions, graph drawing algorithms typically optimize one such criterion, usually at the expense of others. We propose a layout approach, Graph Drawing via Gradient Descent, $(GD)^2$, that can handle multiple readability criteria. $(GD)^2$ can optimize any criterion that can be described by a smooth function. If the criterion cannot be captured by a smooth function, a non-smooth function for the criterion is combined with another smooth function, or auto-differentiation tools are used for the optimization. Our approach is flexible and can be used to optimize several criteria that have already been considered earlier (e.g., obtaining ideal edge lengths, stress, neighborhood preservation) as well as other criteria which have not yet been explicitly optimized in such fashion (e.g., vertex resolution, angular resolution, aspect ratio). We provide quantitative and qualitative evidence of the effectiveness of $(GD)^2$ with experimental data and a functional prototype: http://hdc.cs.arizona.edu/~mwli/graph-drawing/.

Graphs represent relationships between entities and visualization of this information is relevant in many domains. Several criteria have been proposed to evaluate the readability of graph drawings, including the number of edge crossings, distance preservation, and neighborhood preservation. Such criteria evaluate different aspects of the drawing and different layout algorithms optimize different criteria. It is challenging to optimize multiple readability criteria at once and there are few approaches that can support this. Examples of approaches that can handle a small number of related criteria include the stress majorization framework of Wang et al. [35], which optimizes distance preservation via stress as well as ideal edge length preservation.
The Stress Plus X (SPX) framework of Devkota et al. [13] can minimize the number of crossings, or maximize the minimum angle of edge crossings. While these frameworks can handle a limited set of related criteria, it is not clear how to extend them to arbitrary optimization goals. The reason for this limitation is that these frameworks are dependent on a particular mathematical formulation. For example, the SPX framework was designed for crossing minimization, which can be easily modified to handle crossing angle maximization (by adding a cosine factor to the optimization function). This “trick” can be applied only to a limited set of criteria but not the majority of other criteria that are incompatible with the basic formulation.

Fig. 1: Three $(GD)^2$ layouts of the dodecahedron: (a) optimizing the number of crossings, (b) optimizing uniform edge lengths, and (c) optimizing stress.

In this paper, we propose a general approach, Graph Drawing via Gradient Descent, $(GD)^2$, that can optimize a large set of drawing criteria, provided that the corresponding metrics that evaluate the criteria are smooth functions. If the function is not smooth, $(GD)^2$ either combines it with another smooth function and partially optimizes based on the desired criterion, or uses modern auto-differentiation tools to optimize. As a result, the proposed $(GD)^2$ framework is simple: it only requires a function that captures a desired drawing criterion. To demonstrate the flexibility of the approach, we consider an initial set of nine criteria: minimizing stress, maximizing vertex resolution, obtaining ideal edge lengths, maximizing neighborhood preservation, maximizing crossing angle, optimizing total angular resolution, minimizing aspect ratio, optimizing the Gabriel graph property, and minimizing edge crossings. A functional prototype is available on http://hdc.cs.arizona.edu/~mwli/graph-drawing/.
This is an interactive system that allows vertices to be moved manually. Combinations of criteria can be optimized by selecting a weight for each; see Figure 1.

Many criteria associated with the readability of graph drawings have been proposed [36]. Most graph layout algorithms are designed to (explicitly or implicitly) optimize a single criterion. For instance, a classic layout criterion is stress minimization [25], where stress is defined by $\sum_{i<j} w_{ij} (\|X_i - X_j\| - d_{ij})^2$ [...]

Fig. 2: The $(GD)^2$ framework: Given a graph and a set of criteria (with weights), formulate an objective function based on the selected set of criteria and weights. Then compute the quality (value) of the objective function of the current layout of the graph. Next, generate the gradient (analytically or automatically). Using the gradient information, update the coordinates of the layout. Finally, update the objective function based on the layout via regular or stochastic gradient descent. This process is repeated for a fixed number of iterations.

[...] nodes are related by inequalities in the form of $x_i \geq x_j + gap$ for a node pair $(i, j)$. These kinds of constraints are known as hard constraints and are different from the soft constraints in our $(GD)^2$ framework.

The $(GD)^2$ Framework

The $(GD)^2$ framework is a general optimization approach to generate a layout with any desired set of aesthetic metrics, provided that they can be expressed by a smooth function. The basic principles underlying this framework are simple. The first step is to select a set of layout readability criteria and loss functions that measure them. Then we define the function to optimize as a linear combination of the loss functions for each individual criterion. Finally, we iterate the gradient descent steps, from which we obtain a slightly better drawing at each iteration. Figure 2 depicts the framework of $(GD)^2$: Given any graph $G$ and readability criterion $Q$, we find a loss function $L_{Q,G}$ which maps the current layout $X$ (i.e., an $n \times 2$ matrix of node coordinates) to a real number. Given such a loss function $L_{Q,G}$ of $X$, where a lower value is always desirable, at each iteration a slightly better layout can be found by taking a small ($\epsilon$) step along the (negative) gradient direction: $X^{(new)} = X - \epsilon \cdot \nabla_X L_{Q,G}$.

To optimize multiple quality measures simultaneously, we take a weighted sum of their loss functions and update the layout by the gradient of the sum.

There are different kinds of gradient descent algorithms. The standard method considers all vertices, computes the gradient of the objective function, and updates vertex coordinates based on the gradient. For some objectives, we need to consider all the vertices in every step. For example, the basic stress formulation [25] falls in this category. On the other hand, there are some problems where the objective can be optimized using only a subset of vertices. For example, consider stress minimization again. If we select a set of vertices randomly and minimize the stress of the induced graph, the stress of the whole graph is also minimized [37]. This type of gradient descent is called stochastic gradient descent. However, not all objective functions are smooth and we cannot compute the gradient of a non-smooth function. In that scenario, we can compute the subgradient, and update the layout based on the subgradient. Hence, as long as the function is continuously defined on a connected component in the domain, we can apply the subgradient descent algorithm. In Table 3, we give a list of loss functions we used to optimize 9 graph drawing properties with gradient descent variants. In Section 4, we specify the loss functions we used in detail.

When a function is not defined in a connected domain, we can introduce a surrogate loss function to ‘connect the pieces’. For example, when optimizing neighborhood preservation we maximize the Jaccard similarity between graph neighbors and nearest neighbors in graph layout.
However, Jaccard similarity is only defined between two binary vectors. To solve this problem we extend Jaccard similarity to all real vectors by its Lovász extension [5] and apply that to optimize neighborhood preservation. An essential part of gradient descent based algorithms is to compute the gradient/subgradient of the objective function. In practice, it is not always necessary to write down the gradient analytically as it can be computed automatically via automatic differentiation [22]. Deep learning packages such as Tensorflow [1] and PyTorch [28] apply automatic differentiation to compute the gradient of complicated functions.

When optimizing multiple criteria simultaneously, we combine them via a weighted sum. However, choosing a proper weight for each criterion can be tricky. Consider, for example, maximizing crossing angles and minimizing stress simultaneously with a fixed pair of weights. At the very early stage, the initial drawing may have many crossings and stress minimization often removes most of the early crossings. As a result, maximizing crossing angles in the early stages can be harmful as it moves nodes in directions that contradict those that come from stress minimization. Therefore, a well-tailored weight scheduling is needed for a successful outcome. Continuing with the same example, a better outcome can be achieved by first optimizing stress until it converges, and later adding weights for the crossing angle maximization. To explore different ways of scheduling, we provide an interface that allows manual tuning of the weights.

We implemented the $(GD)^2$ framework in JavaScript. In particular we used the automatic differentiation tools in tensorflow.js [34] and the drawing library d3.js [6]. The prototype is available at http://hdc.cs.arizona.edu/~mwli/graph-drawing/.
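The update rule, the weighted sum of losses, and the idea of weight scheduling can be illustrated with a minimal sketch. It is written in Python rather than the JavaScript of the prototype, and it is not the authors' implementation: central finite differences stand in for automatic differentiation, the stress weights $w_{ij} = d_{ij}^{-2}$ are an assumed (commonly used) choice, and all function names, the step size, the iteration count, and the schedule values are hypothetical.

```python
import math
import random


def graph_distances(n, edges):
    """All-pairs graph-theoretic distances d_ij via BFS (assumes a connected graph)."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    dist = []
    for source in range(n):
        d = [math.inf] * n
        d[source] = 0
        queue = [source]
        for u in queue:  # the list grows while we iterate, acting as a FIFO queue
            for v in adj[u]:
                if d[v] == math.inf:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


def stress_loss(X, d):
    """Stress with weights w_ij = d_ij^-2 (an assumption of this sketch)."""
    n = len(X)
    return sum((math.dist(X[i], X[j]) - d[i][j]) ** 2 / d[i][j] ** 2
               for i in range(n) for j in range(i + 1, n))


def vertex_resolution_loss(X, r):
    """Sum of ReLU(1 - ||X_i - X_j|| / (r * d_max)) over ordered pairs i != j."""
    n = len(X)
    d_max = max(math.dist(X[i], X[j]) for i in range(n) for j in range(n))
    return sum(max(0.0, 1.0 - math.dist(X[i], X[j]) / (r * d_max))
               for i in range(n) for j in range(n) if i != j)


def numerical_gradient(loss, X, h=1e-5):
    """Central finite differences; a stand-in for automatic differentiation."""
    grad = [[0.0, 0.0] for _ in X]
    for i in range(len(X)):
        for c in range(2):
            old = X[i][c]
            X[i][c] = old + h
            up = loss(X)
            X[i][c] = old - h
            down = loss(X)
            X[i][c] = old  # restore the coordinate
            grad[i][c] = (up - down) / (2 * h)
    return grad


def gradient_descent(X, losses, schedule, steps=300, eps=0.02):
    """Repeat X <- X - eps * gradient of the weighted sum; weights may vary per step."""
    X = [list(p) for p in X]
    for t in range(steps):
        weights = schedule(t)

        def total(Y):
            return sum(w * f(Y) for w, f in zip(weights, losses) if w != 0)

        grad = numerical_gradient(total, X)
        for p, g in zip(X, grad):
            p[0] -= eps * g[0]
            p[1] -= eps * g[1]
    return X


def demo(seed=0):
    """Lay out an 8-cycle from a random start; returns (stress before, stress after)."""
    n = 8
    edges = [(i, (i + 1) % n) for i in range(n)]
    d = graph_distances(n, edges)
    rng = random.Random(seed)
    X0 = [[rng.random(), rng.random()] for _ in range(n)]
    losses = [lambda Y: stress_loss(Y, d),
              lambda Y: vertex_resolution_loss(Y, 1 / math.sqrt(n))]
    # Stress only for the first 200 steps, then add vertex resolution with a small weight.
    schedule = lambda t: (1.0, 0.0) if t < 200 else (1.0, 0.1)
    X1 = gradient_descent(X0, losses, schedule)
    return stress_loss(X0, d), stress_loss(X1, d)


if __name__ == "__main__":
    before, after = demo()
    print(f"stress before: {before:.3f}, after: {after:.3f}")
```

The schedule function is where the "stress first, other criteria later" strategy described above would be encoded; replacing it with a constant tuple gives the fixed-weight variant.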
In this section we specify the aesthetic goals, definitions, quality measures and loss functions for each of the 9 graph drawing properties we optimized: stress, vertex resolution, edge uniformity, neighborhood preservation, crossing angle, aspect ratio, total angular resolution, Gabriel graph property, and crossing number. In the following discussion, since only one (arbitrary) graph is considered, we omit the subscript $G$ in our definitions of loss function $L_{Q,G}$ and write $L_Q$ for short. Other standard graph notation is summarized in Table 1.

Notation                              Description
$G$                                   Graph
$V$                                   The set of nodes in $G$, indexed by $i$, $j$ or $k$
$E$                                   The set of edges in $G$, indexed by a pair of nodes $(i, j)$ in $V$
$n = |V|$                             Number of nodes in $G$
$|E|$                                 Number of edges in $G$
$Adj_{n \times n}$ and $A_{i,j}$      Adjacency matrix of $G$ and its $(i, j)$-th entry
$D_{n \times n}$ and $d_{ij}$         Graph-theoretic distances between pairs of nodes and the $(i, j)$-th entry
$X_{n \times 2}$                      Coordinates of the nodes in the drawing; row $X_i$ is the position of node $i$
$\|X_i - X_j\|$                       The Euclidean distance between nodes $i$ and $j$ in the drawing
$\theta_i$                            $i$-th crossing angle
$\varphi_{ijk}$                       Angle between incident edges $(i, j)$ and $(j, k)$

Table 1: Graph notation used in this paper.

We use stress minimization to draw a graph such that the Euclidean distance between pairs of nodes is proportional to their graph theoretic distance. Following the ordinary definition of stress [25], we minimize

$$L_{ST} = \sum_{i<j} w_{ij} \left( \|X_i - X_j\| - d_{ij} \right)^2 \qquad (1)$$

[...]

$$L_{NP} = LHL(\hat{K}, Adj) \qquad (3)$$

where $LHL$ is given by Berman et al. [5], and $\hat{K}$ denotes the $k$-nearest neighbor prediction:

$$\hat{K}_{i,j} = \begin{cases} -\left( \|X_i - X_j\| - \dfrac{d_{i,\pi_k} + d_{i,\pi_{k+1}}}{2} \right) & \text{if } i \neq j \\ 0 & \text{if } i = j \end{cases} \qquad (4)$$

where $d_{i,\pi_k}$ is the Euclidean distance between node $i$ and its $k$-th nearest neighbor and $Adj$ denotes the adjacency matrix. Note that $\hat{K}_{i,j}$ is positive if $j$ is a k-NN of $i$, otherwise it is negative, as is required by $LHL$ [5].

Reducing the number of edge crossings is one of the classic optimization goals in graph drawing, known to affect readability [29]. Following Shabbeer et al.
[32], we employ an expectation-maximization (EM)-like algorithm to minimize the number of crossings. Two edges do not cross if and only if there exists a line that separates their extreme points. With this in mind, we want to separate every pair of edges (the M step) and use the decision boundaries to guide the movement of nodes in the drawing (the E step). Formally, given any two edges $e_1 = (i, j)$, $e_2 = (k, l)$ that do not share any nodes (i.e., $i$, $j$, $k$ and $l$ are all distinct), they do not intersect in a drawing (where nodes are drawn at $X_i = (x_i, y_i)$, a row vector) if and only if there exists a decision boundary $w = w_{(e_1,e_2)}$ (a 2-by-1 column vector) together with a bias $b = b_{(e_1,e_2)}$ (a scalar) such that:

$$L_{CN,(e_1,e_2)} = \sum_{\alpha \in \{i,j,k,l\}} ReLU\left(1 - t_\alpha \cdot (X_\alpha w + b)\right) = 0.$$

Here we use $(e_1, e_2)$ to denote the subgraph of $G$ which only has two edges $e_1$ and $e_2$, $t_i = t_j = 1$ and $t_k = t_l = -1$. The loss reaches its minimum at 0 when the SVM classifier $f_{w,b}: x \mapsto xw + b$ predicts node $i$ and $j$ to be at least 1 and node $k$ and $l$ to be at most $-1$. The total loss for the crossing number is therefore the sum over all possible pairs of edges. Similar to (soft) margin SVM, we add a term $|w_{(e_1,e_2)}|^2$ to maximize the margin of the decision boundary:

$$L_{CN} = \sum_{\substack{e_1=(i,j),\ e_2=(k,l) \in E \\ i, j, k \text{ and } l \text{ all distinct}}} L_{CN,(e_1,e_2)} + |w_{(e_1,e_2)}|^2.$$

For the E and M steps, we used the same loss function $L_{CN}$ to update the boundaries $w_{(e_1,e_2)}$, $b_{(e_1,e_2)}$ and node positions $X$:

$$w^{(new)} = w - \epsilon \nabla_w L_{CN} \qquad \text{(M step 1)}$$
$$b^{(new)} = b - \epsilon \nabla_b L_{CN} \qquad \text{(M step 2)}$$
$$X^{(new)} = X - \epsilon \nabla_X L_{CN}(X; w^{(new)}, b^{(new)}) \qquad \text{(E step)}$$

To evaluate the quality we simply count the number of crossings.

When edge crossings are unavoidable, the graph drawing can still be easier to read when edges cross at angles close to 90 degrees [36]. Heuristics such as those by Demel et al. [11] and Bekos et al.
[4] have been proposed and have been successful in graph drawing challenges [12]. We use an approach similar to the force-directed algorithm given by Eades et al. [19] and minimize the squared cosine of crossing angles:

$$L_{CAM} = \sum_{\substack{\text{all crossed edge pairs} \\ (i,j),\ (k,l) \in E}} \left( \frac{\langle X_i - X_j, X_k - X_l \rangle}{|X_i - X_j| \cdot |X_k - X_l|} \right)^2.$$

We evaluate quality by measuring the worst (normalized) absolute discrepancy between each crossing angle $\theta$ and the target crossing angle (i.e., 90 degrees): $Q_{CAM} = \max_\theta |\theta - \frac{\pi}{2}| / \frac{\pi}{2}$.

Good use of drawing area is often measured by the aspect ratio [15] of the bounding box of the drawing, with 1 : 1 as the optimum. We consider multiple rotations of the current drawing and optimize their bounding boxes simultaneously. Let $AR = \min_\theta \frac{\min(w_\theta, h_\theta)}{\max(w_\theta, h_\theta)}$, where $w_\theta$ and $h_\theta$ denote the width and height of the bounding box when the drawing is rotated by an angle $\theta$. A naive approach to optimize aspect ratio, which scales the $x$ and $y$ coordinates of the drawing by certain factors, may worsen other criteria we wish to optimize and is therefore not suitable for our purposes. To make aspect ratio differentiable and compatible with other objectives, we approximate aspect ratio based on 4 (soft) boundaries (top, bottom, left and right) of the drawing. Next, we turn this approximation and the target (1 : 1) into a loss function using cross entropy loss. We minimize

$$L_{AR} = \sum_{\theta \in \{\frac{2\pi k}{N},\ \text{for } k = 0, \cdots, (N-1)\}} crossEntropy\left( \left[ \frac{w_\theta}{w_\theta + h_\theta}, \frac{h_\theta}{w_\theta + h_\theta} \right], [0.5, 0.5] \right) \qquad (5)$$

where $N$ is the number of rotations sampled (e.g., $N = 7$), and $w_\theta$, $h_\theta$ are the (approximate) width and height of the bounding box when rotating the drawing around its center by an angle $\theta$.
For any given $\theta$-rotated drawing, $w_\theta$ is defined to be the difference between the current (soft) right and left boundaries, $w_\theta = \text{right} - \text{left} = \langle \text{softmax}(x_\theta), x_\theta \rangle - \langle \text{softmax}(-x_\theta), x_\theta \rangle$, where $x_\theta$ is a collection of the $x$ coordinates of all nodes in the $\theta$-rotated drawing, and softmax returns a vector of weights $\text{softmax}(x) = (\ldots, w_k, \ldots)$ with $w_k = \frac{e^{x_k}}{\sum_i e^{x_i}}$. Note that the approximate right boundary is a weighted sum of the $x$ coordinates of all nodes and it is designed to be close to the $x$ coordinate of the right-most node, while keeping other nodes involved. Optimizing aspect ratio with the softened boundaries will stretch all nodes instead of moving only the extreme points. Similarly, $h_\theta = \text{top} - \text{bottom} = \langle \text{softmax}(y_\theta), y_\theta \rangle - \langle \text{softmax}(-y_\theta), y_\theta \rangle$. Finally, we evaluate the drawing quality by measuring the worst aspect ratio on a finite set of rotations. The quality score ranges from 0 to 1 (where 1 is optimal): $Q_{AR} = \min_{\theta \in \{\frac{2\pi k}{N},\ \text{for } k = 0, \cdots, (N-1)\}} \frac{\min(w_\theta, h_\theta)}{\max(w_\theta, h_\theta)}$.

Distributing edges adjacent to a node makes it easier to perceive the information presented in a node-link diagram [24]. Angular resolution [3], defined as the minimum angle between incident edges, is one way to quantify this goal. Formally, $ANR = \min_{j \in V} \min_{(i,j),(j,k) \in E} \varphi_{ijk}$, where $\varphi_{ijk}$ is the angle formed between edges $(i, j)$ and $(j, k)$. Note that for any given graph, an upper bound of this quantity is $\frac{2\pi}{d_{max}}$ where $d_{max}$ is the maximum degree of nodes in the graph. Therefore in the evaluation, we will use this upper bound to normalize our quality measure to $[0, 1]$: $Q_{ANR} = \frac{ANR}{2\pi / d_{max}}$.
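As an illustration of the angular resolution quality measure just defined, the following Python sketch computes $ANR$ and the normalized score $Q_{ANR}$ for a straight-line drawing. It is an assumption-laden example rather than the prototype's code: the function names are hypothetical, the graph is assumed to be simple with no two nodes at the same position, and the minimum angle at each node is found by sorting the directions of its incident edges, which relies on the fact that the smallest angle between any two incident edges is always attained by two angularly consecutive ones.

```python
import math


def angular_resolution(X, edges):
    """Smallest angle (radians) between two edges incident to the same node."""
    neighbors = {}
    for i, j in edges:
        neighbors.setdefault(i, []).append(j)
        neighbors.setdefault(j, []).append(i)
    best = math.inf
    for j, nbrs in neighbors.items():
        if len(nbrs) < 2:
            continue  # a node of degree 1 has no pair of incident edges
        # Direction of each incident edge as seen from node j, sorted around the node.
        angles = sorted(math.atan2(X[i][1] - X[j][1], X[i][0] - X[j][0]) for i in nbrs)
        # Consecutive gaps, including the wrap-around gap from the last back to the first.
        for a, b in zip(angles, angles[1:] + [angles[0] + 2 * math.pi]):
            best = min(best, b - a)
    return best


def angular_resolution_quality(X, edges):
    """Q_ANR = ANR / (2*pi / d_max), where d_max is the maximum node degree."""
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    d_max = max(degree.values())
    return angular_resolution(X, edges) / (2 * math.pi / d_max)


if __name__ == "__main__":
    # A star with four leaves placed at right angles around the center.
    X = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
    edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
    print(angular_resolution_quality(X, edges))
```

For the star in the usage example the four edges are spread evenly, so the score reaches the upper bound; bending a two-edge path to a right angle at its middle node gives half of the bound.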
To achieve a better drawing quality via gradient descent, we define the angular energy of an angle $\varphi$ to be $e^{-s \cdot \varphi}$, where $s$ is a constant controlling the sensitivity of angular energy with respect to the angle (by default $s = 1$), and minimize the total angular energy over all incident edges:

$$L_{ANR} = \sum_{(i,j),(j,k) \in E} e^{-s \cdot \varphi_{ijk}} \qquad (6)$$

Good vertex resolution is associated with the ability to distinguish different vertices by preventing nodes from occluding each other. Vertex resolution is typically defined as the minimum Euclidean distance between two vertices in the drawing [9, 31]. However, in order to align with the units in other objectives such as stress, we normalize the minimum Euclidean distance with respect to a reference value. Hence we define the vertex resolution to be the ratio between the shortest and longest distances between pairs of nodes in the drawing, $VR = \frac{\min_{i \neq j} \|X_i - X_j\|}{d_{max}}$, where $d_{max} = \max_{k,l} \|X_k - X_l\|$. To achieve a certain target resolution $r \in [0, 1]$ by minimizing a loss function, we minimize

$$L_{VR} = \sum_{i,j \in V,\ i \neq j} ReLU\left(1 - \frac{\|X_i - X_j\|}{r \cdot d_{max}}\right) \qquad (7)$$

In practice, we set the target resolution to be $r = \frac{1}{\sqrt{|V|}}$, where $|V|$ is the number of vertices in the graph. In this way, an optimal drawing will distribute nodes uniformly in the drawing area. The purpose of the ReLU is to output zero when its argument is negative, since in that case the constraint is already satisfied. In the evaluation, we report, as a quality measure, the ratio between the actual and target resolution and cap its value between 0 (worst) and 1 (best).

$$Q_{VR} = \min\left(1.0,\ \frac{\min_{i,j} \|X_i - X_j\|}{r \cdot d_{max}}\right) \qquad (8)$$

A graph is a Gabriel graph if it can be drawn in such a way that any disk formed by using an edge in the graph as its diameter contains no other nodes. Not all graphs are Gabriel graphs, but drawing a graph so that as many of these edge-based disks as possible are empty of other nodes has been associated with good readability [18]. This property can be enforced by a repulsive force around the midpoints of edges. Formally, we establish a repulsive field with radius $r_{ij}$ equal to half of the edge length, around the midpoint $c_{ij}$ of each edge $(i, j) \in E$, and we minimize the total potential energy:

$$L_{GA} = \sum_{(i,j) \in E,\ k \in V \setminus \{i,j\}} ReLU\left(r_{ij} - |X_k - c_{ij}|\right) \qquad (9)$$

where $c_{ij} = \frac{X_i + X_j}{2}$ and $r_{ij} = \frac{|X_i - X_j|}{2}$. We use the (normalized) minimum distance from nodes to centers to characterize the quality of a drawing with respect to Gabriel graph property: $Q_{GA} = \min_{(i,j) \in E,\ k \in V} \frac{|X_k - c_{ij}|}{r_{ij}}$.

In this section, we describe the experiment we conducted on 10 graphs to assess the effectiveness and limitations of our approach. The graphs used are depicted in Figure 3 along with information about each graph. The graphs have been chosen to represent a variety of graph classes such as trees, cycles, grids, bipartite graphs, cubic graphs, and symmetric graphs.

In our experiment we compare $(GD)^2$ with neato [20] and sfdp [20], which are classical implementations of a stress-minimization layout and a scalable force-directed layout, respectively. In particular, we focus on 9 readability criteria: stress (ST), vertex resolution (VR), ideal edge lengths (IL), neighbor preservation (NP), crossing angle (CA), angular resolution (ANR), aspect ratio (AR), Gabriel graph properties (GG), and crossings (CR).
We provide the values of the nine criteria corresponding to the 10 graphs for the layouts computed by neato, sfdp, random, and 3 runs of $(GD)^2$ initialized with neato, sfdp, and random layouts in Table 2. The best result is shown in bold; green cells indicate an improvement and yellow cells a tie with respect to the initial values (scores for different criteria obtained using neato, sfdp, and random initialization). From the experimental results we see that $(GD)^2$ improves the random layout in 90% of the tests. $(GD)^2$ also improves or ties initial layouts from neato and sfdp, but the improvements are not as strong or as frequent, most notably for the CR, NP, and CA criteria.

In this experiment, we focused on optimizing a single metric. In some applications, it is desirable to optimize multiple criteria. We can use a similar technique, i.e., take a weighted sum of the metrics and optimize the sum of scores. In the prototype (http://hdc.cs.arizona.edu/~mwli/graph-drawing/), there is a slider for each criterion, making it possible to combine different criteria.

Although $(GD)^2$ is a flexible framework that can optimize a wide range of criteria, it cannot handle the class of constraints where the node coordinates are related by some inequalities, i.e., the framework does not support hard constraints. Similarly, this framework does not naturally support shape-based drawing constraints such as those in [16, 17, 35]. $(GD)^2$ takes under a minute for the small graphs considered in this paper. We have not experimented with larger graphs as the implementation has not been optimized for speed.

Fig. 3: Drawings from different algorithms: neato, sfdp and $(GD)^2$ with stress (ST), aspect ratio (AR), crossing angle maximization (CAM) and angular resolution (ANR) optimization on a set of 10 graphs. Edge color is determined by the discrepancy between actual and ideal edge length (here all ideal edge lengths are 1); informally, short edges are red and long edges are blue.
[Figure panels omitted. Columns: random, neato, sfdp, GD2_ST, GD2_AR, GD2_CAM, GD2_ANR. Rows: cycle (|V|=10, |E|=10); bipartite (|V|=10, |E|=25); cube (|V|=8, |E|=12); symmetric (|V|=20, |E|=21); block (|V|=25, |E|=55); dodecahedron (|V|=20, |E|=30); tree (|V|=15, |E|=14); grid (|V|=25, |E|=40); spx_teaser (|V|=128, |E|=256); complete (|V|=20, |E|=190).]

We introduced the graph drawing framework $(GD)^2$ and showed how this approach can be used to optimize different graph drawing criteria and combinations thereof. The framework is flexible and natural directions for future work include adding further drawing criteria and better ways to combine them. To compute the layout of large graphs, a multi-level algorithmic model might be needed. It would also be useful to have a way to compute appropriate weights for the different criteria.

Acknowledgments

This work was supported in part by NSF grants CCF-1740858, CCF-1712119, and DMS-1839274.

Table 2: The values of the nine criteria corresponding to the 10 graphs for the layouts computed by neato, sfdp, random, and 3 runs of $(GD)^2$ initialized with neato, sfdp, and random layouts. Bold values are the best. Green cells show an improvement, yellow cells show a tie, with respect to the initial values.
[Table body omitted: the cell values are not recoverable from the source. Each criterion block has one row per graph and the columns neato, sfdp, rnd, $(GD)^2_n$, $(GD)^2_s$, $(GD)^2_r$.]

References

1. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al.: Tensorflow: A system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI'16). pp. 265–283 (2016)
2. Ábrego, B.M., Fernández-Merchant, S., Salazar, G.: The rectilinear crossing number of $K_n$: Closing in (or are we?). Thirty Essays on Geometric Graph Theory (2012)
3. Argyriou, E.N., Bekos, M.A., Symvonis, A.: Maximizing the total resolution of graphs. In: Proceedings of the 18th International Conference on Graph Drawing. pp. 62–67. Springer (2011)
4. Bekos, M.A., Förster, H., Geckeler, C., Holländer, L., Kaufmann, M., Spallek, A.M., Splett, J.: A heuristic approach towards drawings of graphs with high crossing resolution. In: Proceedings of the 26th International Symposium on Graph Drawing and Network Visualization. pp. 271–285. Springer (2018)
5. Berman, M., Rannen Triki, A., Blaschko, M.B.: The Lovász-Softmax loss: a tractable surrogate for the optimization of the intersection-over-union measure in neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp.
4413–4421 (2018)
6. Bostock, M., Ogievetsky, V., Heer, J.: D3: Data-driven documents. IEEE transactions on visualization and computer graphics (12), 2301–2309 (2011)
7. Buchheim, C., Chimani, M., Gutwenger, C., Jünger, M., Mutzel, P.: Crossings and planarization. Handbook of Graph Drawing and Visualization pp. 43–85 (2013)
8. Chen, K.T., Dwyer, T., Marriott, K., Bach, B.: Doughnets: Visualising networks using torus wrapping. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. pp. 1–11 (2020)
9. Chrobak, M., Goodrich, M.T., Tamassia, R.: Convex drawings of graphs in two and three dimensions. In: Proceedings of the 12th annual symposium on Computational geometry. pp. 319–328 (1996)
10. Davidson, R., Harel, D.: Drawing graphs nicely using simulated annealing. ACM Transactions on Graphics (TOG) (4), 301–331 (1996)
11. Demel, A., Dürrschnabel, D., Mchedlidze, T., Radermacher, M., Wulf, L.: A greedy heuristic for crossing-angle maximization. In: Proceedings of the 26th International Symposium on Graph Drawing and Network Visualization. pp. 286–299. Springer (2018)
12. Devanny, W., Kindermann, P., Löffler, M., Rutter, I.: Graph drawing contest report. In: Proceedings of the 25th International Symposium on Graph Drawing and Network Visualization. pp. 575–582. Springer (2017)
13. Devkota, S., Ahmed, R., De Luca, F., Isaacs, K.E., Kobourov, S.: Stress-plus-x (spx) graph layout. In: Proceedings of the 27th International Symposium on Graph Drawing and Network Visualization. pp. 291–304. Springer (2019)
14. Didimo, W., Liotta, G.: The crossing-angle resolution in graph drawing. Thirty Essays on Geometric Graph Theory (2014)
15. Duncan, C.A., Goodrich, M.T., Kobourov, S.G.: Balanced aspect ratio trees and their use for drawing very large graphs. In: Proceedings of the 6th International Symposium on Graph Drawing. pp. 111–124. Springer (1998)
16. Dwyer, T.: Scalable, versatile and simple constrained graph layout. Comput. Graph. Forum, 991–998 (2009)
17. [...], 821–8 (2006)
18. Eades, P., Hong, S.H., Klein, K., Nguyen, A.: Shape-based quality metrics for large graph visualization. In: Proceedings of the 23rd International Conference on Graph Drawing and Network Visualization. pp. 502–514. Springer (2015)
19. Eades, P., Huang, W., Hong, S.H.: A force-directed method for large crossing angle graph drawing. arXiv preprint arXiv:1012.4559 (2010)
20. Ellson, J., Gansner, E., Koutsofios, L., North, S.C., Woodhull, G.: Graphviz - open source graph drawing tools. In: Proceedings of the 9th International Symposium on Graph Drawing. pp. 483–484. Springer (2001)
21. Gansner, E.R., Koren, Y., North, S.: Graph drawing by stress majorization. In: International Symposium on Graph Drawing. pp. 239–250. Springer (2004)
22. Griewank, A., Walther, A.: Evaluating derivatives: principles and techniques of algorithmic differentiation, vol. 105. SIAM (2008)
23. Huang, W., Eades, P., Hong, S.H.: Larger crossing angles make graphs easier to read. Journal of Visual Languages & Computing (4), 452–465 (2014)
24. Huang, W., Eades, P., Hong, S.H., Lin, C.C.: Improving multiple aesthetics produces better graph drawings. Journal of Visual Languages & Computing (4), 262–272 (2013)
25. Kamada, T., Kawai, S.: An algorithm for drawing general undirected graphs. Information Processing Letters (1), 7–15 (1989)
26. Kruiger, J.F., Rauber, P.E., Martins, R.M., Kerren, A., Kobourov, S., Telea, A.C.: Graph layouts by t-sne. Comput. Graph. Forum (3), 283–294 (2017)
27. Kruskal, J.B.: Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika (1), 1–27 (1964)
28. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: Pytorch: An imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems. pp. 8024–8035 (2019)
29. Purchase, H.: Which aesthetic has the greatest effect on human understanding? In: Proceedings of the 5th International Symposium on Graph Drawing. pp. 248–261. Springer (1997)
30. Radermacher, M., Reichard, K., Rutter, I., Wagner, D.: A geometric heuristic for rectilinear crossing minimization. In: The 20th Workshop on Algorithm Engineering and Experiments. pp. 129–138 (2018)
31. Schulz, A.: Drawing 3-polytopes with good vertex resolution. J. Graph Algorithms Appl. (1), 33–52 (2011)
32. Shabbeer, A., Ozcaglar, C., Gonzalez, M., Bennett, K.P.: Optimal embedding of heterogeneous graph data with edge crossing constraints. In: NIPS Workshop on Challenges of Data Visualization (2010)
33. Shepard, R.N.: The analysis of proximities: multidimensional scaling with an unknown distance function. Psychometrika (2), 125–140 (1962)
34. Smilkov, D., Thorat, N., Assogba, Y., Nicholson, C., Kreeger, N., Yu, P., Cai, S., Nielsen, E., Soegel, D., Bileschi, S., Terry, M., Yuan, A., Zhang, K., Gupta, S., Sirajuddin, S., Sculley, D., Monga, R., Corrado, G., Viegas, F., Wattenberg, M.M.: Tensorflow.js: Machine learning for the web and beyond. In: Proceedings of Machine Learning and Systems 2019, pp. 309–321 (2019)
35. Wang, Y., Wang, Y., Sun, Y., Zhu, L., Lu, K., Fu, C.W., Sedlmair, M., Deussen, O., Chen, B.: Revisiting stress majorization as a unified framework for interactive constrained graph visualization. IEEE transactions on visualization and computer graphics (1), 489–499 (2017)
36. Ware, C., Purchase, H., Colpoys, L., McGill, M.: Cognitive measurements of graph aesthetics. Information visualization (2), 103–110 (2002)
37. Zheng, J.X., Pawar, S., Goodman, D.F.: Graph drawing by stochastic gradient descent. IEEE transactions on visualization and computer graphics (9), 2738–2748 (2018)

The following table summarizes the objective functions used to optimize the nine drawing criteria via different optimization methods.

Columns: Property; Gradient Descent; Subgradient Descent; Stochastic Gradient Descent
Stress: $\sum_{i<j} w_{ij} (\|X_i - X_j\| - d_{ij})^2$ [...]
Neighborhood preservation: Lovász hinge [5] between the neighborhood prediction (Eq. 4) and the adjacency matrix $Adj$
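To make the neighborhood prediction of Eq. 4 concrete, here is a small Python sketch. It assumes the reading of Eq. 4 in which the threshold is the average of the distances from node $i$ to its $k$-th and $(k+1)$-th nearest neighbors, which is the reading consistent with the stated sign property (positive exactly for the $k$ nearest neighbors). The function name is hypothetical, a single $k$ is used for all nodes for simplicity, and the Lovász hinge itself is not implemented here.

```python
import math


def knn_prediction(X, k):
    """K_hat[i][j] > 0 iff node j is among the k nearest neighbors of node i.

    Assumes 1 <= k <= len(X) - 2 and that the k-th and (k+1)-th nearest
    neighbor distances of every node differ.
    """
    n = len(X)
    K_hat = [[0.0] * n for _ in range(n)]  # the diagonal (i == j) stays 0
    for i in range(n):
        # Distances from node i to every other node, nearest first.
        nearest = sorted(math.dist(X[i], X[j]) for j in range(n) if j != i)
        # Midpoint between the k-th and (k+1)-th nearest neighbor distances.
        threshold = (nearest[k - 1] + nearest[k]) / 2
        for j in range(n):
            if j != i:
                K_hat[i][j] = -(math.dist(X[i], X[j]) - threshold)
    return K_hat


if __name__ == "__main__":
    # Four nodes on a line; with k = 1 each node predicts only its closest node.
    X = [(0, 0), (1, 0), (3, 0), (7, 0)]
    for row in knn_prediction(X, 1):
        print(row)
```

Because the prediction is a continuous function of the node positions, it can be compared against the binary adjacency matrix by a surrogate such as the Lovász hinge, whereas a hard 0/1 nearest-neighbor indicator would give no useful gradient.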