Structural Analysis of Sparse Neural Networks
Julian Stier and Michael Granitzer
University of Passau, Innstraße 41, 94032 Passau, [email protected]
Abstract.
Sparse Neural Networks regained attention due to their potential mathematical and computational advantages. We give motivation to study Artificial Neural Networks (ANNs) from a network science perspective, provide a technique to embed arbitrary Directed Acyclic Graphs into ANNs and report study results on predicting the performance of image classifiers based on the structural properties of the networks' underlying graph. Results could further progress neuroevolution and add explanations for the success of distinct architectures from a structural perspective.
Keywords: artificial neural networks, sparse network structures, small-world neural networks, scale-free, architecture performance estimation
Introduction

Artificial Neural Networks (ANNs) are highly successful machine learning models and achieve human performance in various domains such as image processing and speech synthesis. The choice of architecture is crucial for their success, but ANN architectures, seen as network structures, have not been extensively studied from a network science perspective yet. The motivation for studying ANNs from a network science perspective is manifold:

First of all, ANNs are inspired by biological neural networks (notably the human brain), which have scale-free properties and "are shown to be small-world networks" [25]. This is contradictory to most successful models in various machine learning problem domains. However, with respect to their required neurons, those successful models have been shown to be redundant by an order of magnitude. For example, Han et al. claim "on the ImageNet dataset, our method reduced the number of parameters of AlexNet by a factor of 9×" and "VGG-16 can be reduced by 13×" [8] through pruning. Thus, biology and structural searches such as pruning suggest developing sparse neural network structures [24].

Secondly, a look at the history of ANNs shows that, after major breakthroughs in the early 90s, the research community already tried to find characteristic network structures by e.g. constructive and destructive approaches. However, network structures had only been studied in graph theory by then, and the major remaining architectures from this research period are notably Convolutional Neural Networks (CNNs) [16] and Long Short-Term Memory Networks (LSTMs) [12]. New breakthroughs in the early 21st century led to recent trends of new architectures such as Highway Networks [23] and Residual Networks [9]. This suggests that further insights could be gained from revisiting and analysing the topology of sparse neural networks, particularly from a network science perspective as done in Mocanu et al. [19], who also "argue that ANNs, too, should not have fully-connected layers".

Last but not least, many researchers reported various advantages of sparse structures over unjustified densely stacked layers. Glorot et al. argued that sparsity is one of the factors for the success of Rectified Linear Units (ReLU) as activation functions because "using a rectifying non-linearity gives rise to real zeros of activations and thus truly sparse representations" [7]. The optimization process with ReLUs might not only find good representations but also better structures through sparser connections. Sparser connections can also decrease the number of computations and thus time and energy consumption. So not only does biological motivation exist; mathematical and computational advantages have also been reported for sparse neural network structures.

Looking at the structural design and training of ANNs as a search problem, one can argue to apply structural regularisation when postulating distinct structural properties. CNNs exploit sparsely connected neurons due to spatial relationships of the input features, which in conjunction with weight-sharing leads to very successful models. LSTMs overcome analytical issues in training by postulating structures with slightly changed training behaviour. We argue that new insights can be gained from studying ANNs from a network science perspective.

Sparse Neural Networks (SNNs), which we define as networks not being fully connected between layers, form another important, yet not well-understood structural regularisation.
We mention three major approaches to study the influence of structural regularisation on properties of SNNs: 1) studying fixed sparse structures with empirical success or selected analytical advantages, 2) studying sparse structures obtained in search heuristics and 3) studying sparse structures with common graph-theoretical properties.

The first approach studies selected models with fixed structure analytically or within empirical works. Exemplarily, LSTMs arose from analytical work [11] on issues in the domain of time-dependent problems and have also been studied empirically since then. In larger network structures, LSTMs and CNNs provide fixed components which have proven to be successful due to analytical research and much experience in empirical work. These insights provide foundations to construct larger networks in a bottom-up fashion.

On a larger scale, sparse network structures can be approached by automatic construction, pruning, evolutionary techniques or even as a result of training regularisation or the choice of activation function. Despite the common motivation to find explanations for resulting sparse structures, many of those approaches indeed obtain sparse structures but fail to find an explanation for them.

Studying real-world networks has been addressed by the field of Network Science. However, as of now, Network Science has hardly been used to study the structural properties of ANNs and to understand the regularisation effects introduced by sparse network structures.

This article reports results on embedding sparse graph structures with characteristic properties into feed-forward networks and gives first insights into our study on network properties of sparse ANNs.
Our Contributions comprise
– a technique to embed Directed Acyclic Graphs into Artificial Neural Networks (ANNs),
– a comparison of Random Graph Generators as generators for structures of ANNs,
– a performance estimator for ANNs based on structural properties of the networks' underlying graph.

Related Work
A lot of related work in fields such as network science, graph theory, and neuroevolution gives motivation to study Sparse Structured Neural Networks (SNNs). From a network science perspective, SNNs are Directed Acyclic Graphs (DAGs) in the case of non-recurrent and Directed Graphs in the case of Recurrent Neural Networks. Directed acyclic and cyclic graphs are built, studied and characterized in graph theory and network science. Notably, already Erdős "aimed to show [..] that the evolution of a random graph shows very clear-cut features" [6]. Graphs with distinct degree distributions have been characterised as small-world and scale-free networks with the works of Watts & Strogatz [25] and Barabási & Albert [1]. Both phenomena are "not merely a curiosity of social networks nor an artefact of an idealized model – it is probably generic for many large, sparse networks found in nature" [25]. However, from a biological perspective, the question of how the human brain is organised still remains open: According to Hilgetag et al. "a reasonable guess is that the large-scale neuronal networks of the brain are arranged as globally sparse hierarchical modular networks" [10].

Concerning the combination of ANNs and Network Science, Mocanu et al. claim that "ANNs perform perfectly well with sparsely-connected layers" and introduce a training procedure SET, which induces sparsity by magnitude-based pruning and randomly adding connections within the training phase [19]. While their method is driven by the same motivation and inspiration, it is conceptually fundamentally different from our idea of finding characteristic properties of SNNs.

The work of Bourely et al. can be considered very close to our idea of creating SNNs before the training phase. They "propose Sparse Neural Network architectures that are based on random or structured bipartite graph topologies" [4] but do not seem to base their construction on existing work in Network Science when transforming their randomly generated structures into ANNs.

There exists a vast amount of articles on automatic methods which yield sparsity. Besides well-established regularisation methods (e.g. L1-regularisation), new structural regularisation methods can be found: Srinivas et al. "introduce additional gate variables to perform parameter selection" [22] and Louizos et al. "propose a practical method for L0 norm regularisation" in which they "prune the network during training by encouraging weights to become exactly zero" [18]. Besides regularisation one can also achieve SNNs through pruning, construction and evolutionary strategies. All of those domains achieve successes to some extent but seem to fail in providing explanations for why certain found architectures succeed and others do not.

The idea of predicting the model performance can already be found in a similar way in the works of Klein et al., Domhan et al. and Baker et al. Klein et al. exploit "information in automatic hyperparameter optimization by means of a probabilistic model of learning curves across hyperparameter settings" [15]. They compare the idea with a human expert assessing the course of the learning curve of a model. Domhan et al. also "mimic the early termination of bad runs using a probabilistic model that extrapolates the performance from the first part of a learning curve" [5]. Both are concerned with predicting the performance based on learning curves, not on structural network properties.

Baker et al. are closest to our method by "predicting the final performance of partially trained model configurations using features based on network architectures, hyperparameters, and time-series validation performance data" [2]. They report good R² values (e.g. 0.969 for Cifar10 with MetaQNN CNNs) for predicting the performance and did so on slightly more complex datasets such as Cifar10, TinyImageNet and Penn Treebank. However, our motivation is to focus only on architecture parameters and include more characteristic properties of the network graph than only "including total number of weights and number of layers", which is independent of other hyperparameters and saves conducting an expensive training phase.

In the following chapters we give an introduction to Network Science, illustrate how Directed Acyclic Graphs are embedded into ANNs, provide details on generated graphs and their properties and visualize results from predicting the performance of built ANNs only with features based on structural properties of the underlying graph.

Network Science

A graph G = (V, E) is defined by its naturally ordered vertices V and edges E ⊂ V × V. The number of edges containing a vertex v ∈ V determines the degree of v. The "distribution function P(k) [..] gives the probability that a randomly selected node has exactly k edges" [1]. It can be used as a first characteristic to differentiate between certain types of graphs.
For a "random graph [it] is a Poisson distribution with a peak at P(⟨k⟩)", with ⟨k⟩ being "the average degree of the network" [1]. Graphs with degree distributions following a power-law tail P(k) ∼ k^(−γ) are called scale-free graphs. Graphs with "relatively small characteristic path lengths" are called small-world graphs [25].

Graphs can be generated by Random Graph Generators (RGG). Various RGGs can yield very distinct statistical properties. The most prominent model to generate a random graph is the Erdős-Rényi/Gilbert model (ERG-model), which connects a given number of vertices randomly. While this ERG-model yields a Poisson distribution for the degree distribution, other models have been developed to close the gap of generating large graphs with distinct characteristics found in nature. Notably, the first RGGs producing such graphs have been the Watts-Strogatz model [25] and the Barabási-Albert model [1].
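As an illustration of these generators (not the exact configuration used later for our dataset), both models are available in the networkx library; the parameter values below are arbitrary:

```python
# Sketch: generate a small-world and a scale-free graph and inspect their
# empirical degree distributions P(k). Parameters (n, k, p, m) are
# illustrative only, not the ones used for the graphs10k dataset.
import networkx as nx
from collections import Counter

n = 200
ws = nx.watts_strogatz_graph(n, k=6, p=0.1)  # small-world model
ba = nx.barabasi_albert_graph(n, m=3)        # scale-free model

def degree_distribution(g):
    """Empirical P(k): fraction of vertices with exactly k edges."""
    counts = Counter(d for _, d in g.degree())
    return {k: c / g.number_of_nodes() for k, c in sorted(counts.items())}

print(degree_distribution(ws))  # peaked around <k> = 6 (Poisson-like)
print(degree_distribution(ba))  # heavy tail, P(k) ~ k^(-gamma)
```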
Embedding Directed Acyclic Graphs into ANNs

After obtaining a graph with desired properties from a Random Graph Generator, this graph is embedded into a Sparse Neural Network with the following steps (a code sketch follows after Figure 1):
1. Make the graph directed (if not directed, yet),
2. compute a layer indexing for all vertices,
3. embed the layered vertices between an input and output layer meeting the requirements of the dataset.

The first step conducts a transformation into a Directed Acyclic Graph following an approach in Barak et al. [3], who "consider the class A of random acyclic directed graphs which are obtained from random graphs by directing all the edges from higher to lower indexed vertices". Given a graph and a natural ordering of its vertices, a DAG is obtained by directing all edges from higher to lower ordered vertices. This can be easily computed by setting the upper (or lower) triangle of the adjacency matrix to zero.

Next, a layer indexing is computed to assign vertices to different layers within a feed-forward network. Vertices in a common layer can be represented in a unified layer vector. The indexing function ind_l : V → ℕ is recursively defined by v ↦ max({ind_l(s) | (s, v) ∈ E} ∪ {−1}) + 1, i.e. one plus the maximum index over all predecessors of v, which then defines a set of layers L and a family of neurons indexed by I_l for each layer l ∈ L.

Finally, the layered DAG can be embedded into an ANN for a given task, e.g. a classification task such as MNIST with 784 input and 10 output neurons. All neurons of the input layer are connected to the neurons representing the vertices of the DAG with in-degree equal to zero (layer index zero). Each vertex then gets represented as a neuron and is connected according to the DAG. The build process finishes by connecting all neurons of the last layer of the DAG with the neurons of the output layer. Following this approach, each resulting ANN has at least two fully connected layers whose sizes depend on the number of DAG vertices with in-degree and out-degree equal to zero, respectively.

Figure 1 visualizes the steps from a randomly generated graph to a DAG and its embedding into an ANN classifier.
Fig. 1: (a) Sketched graph from a Random Graph Generator (RGG). (b) Transformed Directed Acyclic Graph (DAG) from the RGG. (c) Embedded DAG between the input and output layers of an Artificial Neural Network classifier. The figure shows the random (possibly undirected) graph, a directed transformation of it and the final embedding of the sparse structure within a Neural Network classifier. In this example the classifier would be used for a problem with four input and two output neurons.
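The three steps can be sketched in Python with networkx and PyTorch. This is a minimal sketch under our own naming (to_dag, layer_indexing and DAGClassifier are not from the paper), assuming integer vertex labels as the natural ordering; it realises the sparse connectivity with a fixed 0/1 mask and, as a simplification, connects all DAG neurons to the output layer instead of only the sinks.

```python
# Sketch of the embedding pipeline (our naming, not the paper's code).
# Assumes integer vertex labels 0..n-1 as the natural ordering.
import networkx as nx
import torch
import torch.nn as nn

def to_dag(g: nx.Graph) -> nx.DiGraph:
    """Step 1: direct every edge from the higher- to the lower-ordered
    vertex, i.e. zero out one triangle of the adjacency matrix."""
    dag = nx.DiGraph()
    dag.add_nodes_from(g.nodes())
    dag.add_edges_from((max(u, v), min(u, v)) for u, v in g.edges())
    return dag

def layer_indexing(dag: nx.DiGraph) -> dict:
    """Step 2: ind(v) = max({ind(s) | s predecessor of v} u {-1}) + 1."""
    ind = {}
    for v in nx.topological_sort(dag):
        ind[v] = max((ind[s] for s in dag.predecessors(v)), default=-1) + 1
    return ind

class DAGClassifier(nn.Module):
    """Step 3: one neuron per DAG vertex; the sparse connectivity is
    enforced by a fixed 0/1 mask on a single weight matrix."""
    def __init__(self, dag: nx.DiGraph, n_in: int = 784, n_out: int = 10):
        super().__init__()
        self.ind = layer_indexing(dag)
        self.order = list(nx.topological_sort(dag))
        pos = {v: i for i, v in enumerate(self.order)}
        n = len(self.order)
        mask = torch.zeros(n, n_in + n)
        for i, v in enumerate(self.order):
            if dag.in_degree(v) == 0:      # source: fully connected to input
                mask[i, :n_in] = 1.0
            for s in dag.predecessors(v):  # connections given by DAG edges
                mask[i, n_in + pos[s]] = 1.0
        self.register_buffer("mask", mask)
        self.weight = nn.Parameter(0.01 * torch.randn(n, n_in + n))
        self.bias = nn.Parameter(torch.zeros(n))
        # Simplification: the paper connects only the last DAG layer
        # (sinks) to the output layer; here all neurons are connected.
        self.out = nn.Linear(n, n_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        acts = torch.zeros(x.size(0), len(self.order), device=x.device)
        w = self.weight * self.mask        # masked = sparse connectivity
        for layer in sorted(set(self.ind.values())):
            idx = [i for i, v in enumerate(self.order) if self.ind[v] == layer]
            inp = torch.cat([x, acts], dim=1)
            acts = acts.clone()            # avoid in-place autograd issues
            acts[:, idx] = torch.relu(inp @ w[idx].t() + self.bias[idx])
        return self.out(acts)
```

For large graphs the single masked weight matrix is wasteful; grouping the neurons of each layer into dense blocks with per-block masks is a more memory-friendly alternative.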
Experiments

In order to analyse the impact of the different structural elements, we conducted a supervised experiment predicting network performances from the structural properties only. We then analysed the most important features involved in the decision along with the prediction quality.

For this experiment we created an artificial dataset graphs10k based on two Random Graph Generators (RGG). Each graph in the dataset is transformed into a Sparse Neural Network (SNN), implemented in PyTorch [20]. The resulting model is then trained and evaluated on MNIST [17]. Based on the obtained evaluation measures, three estimator models – Ordinary Linear Regression, Support Vector Machine and Random Forest – are trained by splitting graphs10k into a training and test set. The estimator models give the opportunity to discuss the influence of structural properties on the SNN performances. A sketch of this pipeline follows below.
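In spirit (heavily abbreviated, and with placeholder generator parameters rather than the ones used for graphs10k), the pipeline might look as follows; train(), evaluate() and structural_features() stand in for standard implementations:

```python
# Sketch of the experiment pipeline: generate graphs, embed each into an
# SNN via the embedding sketch above, train/evaluate on MNIST, and record
# structural features together with the achieved accuracy.
import random
import networkx as nx

records = []
for _ in range(10_000):
    n = random.randint(50, 500)
    if random.random() < 0.5:
        g = nx.watts_strogatz_graph(n, k=random.randint(2, 10), p=random.random())
    else:
        g = nx.barabasi_albert_graph(n, m=random.randint(1, 8))
    model = DAGClassifier(to_dag(g))  # 784 inputs, 10 outputs for MNIST
    # train(model, mnist_train)
    # acc = evaluate(model, mnist_test)
    # records.append({**structural_features(g), "accuracy_test": acc})
```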
graphs10k dataset

To investigate which structural properties influence the performance of an Artificial Neural Network most, we created a dataset of graphs generated by Watts-Strogatz and Barabási-Albert models. The dataset comprises 10,000 graphs, randomly generated with between 50 and 500 vertices. An exemplary property distribution is shown in Figure 2.
Fig. 2: (a) Frequency distribution of the number of edges: an exemplary frequency distribution of graphs10k. With a uniformly distributed number of vertices, very different frequency distributions occur for other structural properties. (b) Jointplot of eccentricity variance vs. test accuracy (pearsonr = 0.42, p ≈ 0): correlation between the variance in eccentricity and accuracy shown in a joint plot. The axes contain the distributions of each feature. The test accuracy distribution (on the top axis) shows three peaks at around 0.2, 0.35 and 0.94, with only few graphs between the second and third peak.

The dataset contains 5018 Barabási-Albert graphs and 4982 Watts-Strogatz graphs. The graphs have between 97 and 4,365 edges and on average 1,399.17 edges with a standard deviation of 954.12. In estimating the model performances the numbers of source and sink vertices are very prominent. Source vertices are those with no incoming edges, sink vertices those with no outgoing edges. On average the graphs have 79.75 source vertices with a standard deviation of 80.69 and 9.56 sink vertices with a standard deviation of 17.53.

Distributions within the graphs are reduced to four properties, namely minimum, arithmetic mean, maximum and standard deviation or variance. For the degree distribution, a vertex has on average 10.36 connected edges and this arithmetic mean has a standard deviation of 4.47. The variance of the degree distributions is on average 174.46 with a standard deviation of 259.08.

(Note that according to Karrer et al. "we do not at present know of any way to sample uniformly from the unordered ensemble" in the context of sampling from possible orderings of random acyclic graphs [14]. Similarly: "Bayesian networks are composed of directed acyclic graphs, and it is very hard to represent the space of such graphs. Consequently, it is not easy to guarantee that a given method actually produces a uniform distribution in that space." [13])

After each graph was embedded in an SNN, its accuracy values on the test and validation set were obtained and added to the dataset. More detailed statistics on other properties are given in Table 2.

With the presented embedding technique it could be found that graphs based on the Barabási-Albert model could not achieve performances comparable to graphs based on the Watts-Strogatz model. Only 2618 of the Barabási-Albert models achieved over 0.3 in accuracy (less than 50%). Excluding the Barabási-Albert model did not change the estimation results below, therefore statistics are reported for all graphs combined. In future, the dataset will be enhanced to comprise more RGGs.

The test accuracy has three major peaks at around 0.15, 0.3 and 0.94 as can be seen in the frequency distribution on the x-axis of Figure 2b, which depicts a joint plot of the test accuracy and the variance of a graph's eccentricity distribution.

Performance estimation on graphs10k

Ordinary Linear Regression (OLS), Support Vector Machine (SVM) and Random Forest (RF) are used to estimate model performance values given their underlying structural properties.

The dataset was split into a train and test set with a train set size ratio of 0.7. In the case of considering the whole dataset of 10,000 graphs the estimator models are trained on 7,000 data points. The train-test split was repeated 20 times with different constants used as initialization seeds for the random state.

Different feature sets are considered to assess the influence of single features. The set Ω denotes all possible features as listed in Table 1; Ω_np contains all features except for the number of vertices, number of edges, number of source vertices and the number of sink vertices. These four features directly indicate the number of trainable parameters in the model (features with no direct indication of the number of parameters). Ω_op contains only those four properties. Features with variance information only are considered in Ω_var, namely the variances of the degree, eccentricity, neighborhood, path length, closeness and edge betweenness distributions.
Compared to other properties of the distributions – such as the average, minimum or maximum – features with variance information have shown in experiments to have some influence on the network performance.

A manually selected set Ω_small contains a reduced set of selected features, namely number_source_vertices, number_sink_vertices, degree_distribution_var, density, neighborhood_var, path_length_var, closeness_std, edge_betweenness_std and eccentricity_var. Table 1 provides an overview of the used feature sets and the resulting feature importances for the RF estimator. (The eccentricity of a vertex in a connected graph is the maximum distance to any other vertex.)
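Most of these features can be computed directly with networkx; the following sketch (our naming, covering a subset of Ω_small) reduces distribution-valued properties to variance or standard deviation:

```python
# Sketch: compute a subset of the structural features (our naming) with
# networkx. Source/sink counts refer to the DAG obtained via to_dag from
# the embedding sketch; eccentricity requires a connected graph.
import networkx as nx
import numpy as np

def structural_features(g: nx.Graph) -> dict:
    dag = to_dag(g)
    degrees = np.array([d for _, d in g.degree()])
    ecc = np.array(list(nx.eccentricity(g).values()))
    closeness = np.array(list(nx.closeness_centrality(g).values()))
    betweenness = np.array(list(nx.edge_betweenness_centrality(g).values()))
    return {
        "number_source_vertices": sum(1 for v in dag if dag.in_degree(v) == 0),
        "number_sink_vertices": sum(1 for v in dag if dag.out_degree(v) == 0),
        "degree_distribution_var": degrees.var(),
        "density": nx.density(g),
        "eccentricity_var": ecc.var(),
        "closeness_std": closeness.std(),
        "edge_betweenness_std": betweenness.std(),
    }
```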
OLS achieved an R² score of 0.8631 on average for feature set Ω over 20 repetitions with a standard deviation of 0.0042. Pearson's r for OLS is on average 0.9291. The SVM with RBF kernel achieved on average 0.9356 with 0.0018 in standard deviation. RF achieved an R² score of 0.9714 on average with a standard deviation of 0.0009. None of the other considered estimators reaches the RF estimator in predicting the model performances.

The feature importances of the RF estimator are calculated with sklearn [21], where "each feature importance is computed as the (normalized) total reduction of the criterion brought by that feature" (taken from http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html which references Breiman, Friedman, Classification and regression trees, 1984). They are listed in Table 1 for each used feature in scope of the used feature set. The number of sink vertices clearly has the most influence on the performance of an MNIST classifier.

Table 1: Structural properties for the SNNs' underlying graphs from the graphs10k dataset and their influence as features for the Random Forest model under different feature sets (R² scores of the Random Forest, OLS and SVM estimators with linear, RBF and polynomial kernels across the feature sets Ω, Ω_np, Ω_op, Ω_var, Ω_small and Ω_min, together with the RF feature importance of each structural property per feature set).
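In spirit, the estimator comparison corresponds to the following scikit-learn sketch, assuming a pandas DataFrame built from the records of the pipeline sketch, with the structural features as columns and the measured accuracy_test as the target (names are ours):

```python
# Sketch: fit the three estimators on structural features and report R^2
# over 20 repeated train/test splits, as described above.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor

df = pd.DataFrame(records)              # records from the pipeline sketch
X = df.drop(columns=["accuracy_test"]).values
y = df["accuracy_test"].values

scores = {"OLS": [], "SVM": [], "RF": []}
for seed in range(20):                  # 20 repeated splits, ratio 0.7
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=0.7, random_state=seed)
    for name, est in [("OLS", LinearRegression()),
                      ("SVM", SVR(kernel="rbf")),
                      ("RF", RandomForestRegressor())]:
        est.fit(X_tr, y_tr)
        scores[name].append(est.score(X_te, y_te))  # R^2 on the test set

print({k: (np.mean(v), np.std(v)) for k, v in scores.items()})

# Feature importances of a Random Forest fitted on the last training set
rf = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
features = df.drop(columns=["accuracy_test"]).columns
print(sorted(zip(features, rf.feature_importances_), key=lambda t: -t[1])[:3])
```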
Features which do not directly indicate the number of parameters can be seen in the second column for Ω_np, and the three most important ones are highlighted in boldface. The three most important features are the variances of the degree, eccentricity and neighborhood distributions of the graph. Considering the six variance features together with the number of source and sink vertices (Ω_small) still yields a high R² value for the RF estimator, as does the variance-only feature set Ω_var. This gives an indication that, e.g. in the case of eccentricity, a higher diversity leads to better performance. Higher variances in those distributions imply different path lengths through the network, which can be found in architectures such as Residual Networks.

Conclusion

This work presented motivations and approaches to study Artificial Neural Networks (ANNs) from a network science perspective. Directed Acyclic Graphs with characteristic properties from network science are obtained by Random Graph Generators (RGGs) and embedded into ANNs. A dataset of 10,000 graphs with Watts-Strogatz and Barabási-Albert models as RGGs was created to base experiments on. ANN models were built with the presented structural embedding technique and trained, and performance values such as the accuracy on a validation set were obtained.

With both structural properties and resulting performance values, three estimator models, namely an Ordinary Linear Regression, a Support Vector Machine and a Random Forest (RF) model, are built. The Random Forest model is most successful in predicting ANN model performances based on the structural properties. The influence of the structural properties (features of the RF) on predicting ANN model performances is measured.

Clearly, the most important features are the numbers of vertices and edges, directly determining the number of trainable parameters. There is, however, indication that the variances of the distributions of properties such as the eccentricity, vertex degrees and path lengths of networks have influence on highly performant models. This insight goes along with successful models such as Residual Networks and Highway Networks, which introduce more variance and thus lead to a more ensemble-like behaviour.

More such characteristic graph properties will help explain differences between various architectures. Understanding the interaction of graph properties could also be exploited in the form of structural regularization – designing architectures or searching for them (e.g. in evolutionary approaches) could be driven by network-scientific knowledge.

In future work, insights into the influence of structural properties on model performance will be hardened by more complex problem domains which presumably have more potential of revealing stronger differences. The approach with structural properties will be extended to recurrent architectures to reduce the gap between DAGs used in this work and directed cyclic networks, as they are found in nature. The performance prediction will also be integrated into neuroevolutionary methods, most likely leading to a performance boost by early stopping and improved regularized search.
References
1. Albert, R., Barabási, A.L.: Statistical mechanics of complex networks. Reviews of Modern Physics 74(1), 47 (2002)
2. Baker, B., Gupta, O., Raskar, R., Naik, N.: Accelerating neural architecture search using performance prediction (2018)
3. Barak, A.B., Erdős, P.: On the maximal number of strongly independent vertices in a random acyclic directed graph. SIAM Journal on Algebraic Discrete Methods 5(4), 508–514 (1984)
4. Bourely, A., Boueri, J.P., Choromonski, K.: Sparse neural networks topologies. arXiv preprint arXiv:1706.05683 (2017)
5. Domhan, T., Springenberg, J.T., Hutter, F.: Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. In: IJCAI. vol. 15, pp. 3460–3468 (2015)
6. Erdős, P., Rényi, A.: On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 5, 17–61 (1960)
7. Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. pp. 315–323 (2011)
8. Han, S., Pool, J., Tran, J., Dally, W.: Learning both weights and connections for efficient neural network. In: Advances in Neural Information Processing Systems. pp. 1135–1143 (2015)
9. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778 (2016)
10. Hilgetag, C.C., Goulas, A.: Is the brain really a small-world network? Brain Structure and Function 221(4), 2361–2366 (2016)
11. Hochreiter, S.: Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Technische Universität München (1991)
12. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735–1780 (1997)
13. Ide, J.S., Cozman, F.G.: Random generation of Bayesian networks. In: Brazilian Symposium on Artificial Intelligence. pp. 366–376. Springer (2002)
14. Karrer, B., Newman, M.E.: Random graph models for directed acyclic networks. Physical Review E 80(4), 046110 (2009)
15. Klein, A., Falkner, S., Springenberg, J.T., Hutter, F.: Learning curve prediction with Bayesian neural networks (2016)
16. LeCun, Y., Boser, B.E., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W.E., Jackel, L.D.: Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems. pp. 396–404 (1990)
17. LeCun, Y., Cortes, C., Burges, C.J.: The MNIST database of handwritten digits (1998)
18. Louizos, C., Welling, M., Kingma, D.P.: Learning sparse neural networks through L0 regularization. In: International Conference on Learning Representations (2018)
19. Mocanu, D.C., Mocanu, E., Stone, P., Nguyen, P.H., Gibescu, M., Liotta, A.: Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science. Nature Communications 9, 2383 (2018)
20. Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch. In: NIPS Autodiff Workshop (2017)
21. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)
22. Srinivas, S., Subramanya, A., Babu, R.V.: Training sparse neural networks. In: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on. pp. 455–462. IEEE (2017)
23. Srivastava, R.K., Greff, K., Schmidhuber, J.: Highway networks. arXiv preprint arXiv:1505.00387 (2015)
24. Stier, J., Gianini, G., Granitzer, M., Ziegler, K.: Analysing neural network topologies: a game theoretic approach. Procedia Computer Science 126, 234–243 (2018)
25. Watts, D.J., Strogatz, S.H.: Collective dynamics of 'small-world' networks. Nature 393(6684), 440–442 (1998)

Fig. 3: (a) Correlation between test accuracy and number of vertices of the underlying graph (pearsonr = 0.055, p = 3.6e-08). (b) Correlation between test accuracy and number of edges of the underlying graph (pearsonr = −0.1, p = 1.1e-25). Jointplots of accuracy_test vs. number_vertices and number_edges for graphs10k.

Fig. 4: (a) Correlation between test accuracy and density of the underlying graph (pearsonr = −0.15, p = 2.6e-54). (b) Correlation between test accuracy and path length variance of the underlying graph (pearsonr = 0.32, p = 6.3e-240). Jointplots of accuracy_test vs. density and path_length_var for graphs10k.
Property                  Min     Mean     Max     Std
number vertices           50      269.66   490     129.56
number edges              97      1399.17  4365    954.12
number source vertices    1       79.7501  306     80.6862
number sink vertices      1       9.5550   125     17.5298
diameter                  3       8.72     54      4.86
density                   0.0041  0.0277   0.1653  0.0256
degree distribution mean  3.88    10.36    –       4.47

Table 2: Considered properties and their statistical distributions across the graphs10k graphs.