Filtering Statistics on Networks

Abstract

We explored the statistics of filtering of simple patterns on a number of deterministic and random graphs as a tractable simple example of information processing in complex systems. In this problem, multiple inputs map to the same output, and the statistics of filtering is represented by the distribution of this degeneracy. For a few simple filter patterns on a ring we obtained an exact solution of the problem and described numerically more difficult filter setups. For each of the filter patterns and networks we found a few numbers essentially describing the statistics of filtering and compared them for different networks. Our results for networks with diverse architectures appear to be essentially determined by two factors: whether the graph's structure is deterministic or random, and the vertex degree. We find that filtering in random graphs produces much richer statistics than in deterministic graphs. This statistical richness is reduced by increasing the graph's degree.


G. J. Baxter, R. A. da Costa *, S. N. Dorogovtsev and J. F. F. Mendes
Department of Physics, University of Aveiro & I3N, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
* Correspondence: [email protected]
Version August 17, 2020 submitted to Entropy


Keywords: filtering; information; degeneracy; entropy; relevance; resolution; complexity; complex networks

1. Introduction

Filtering is the processing of an input signal to produce an output signal according to some rule, based on the content of the input. The filter does not add information, with the number of possible outputs being less than (or at most equal to) the number of possible inputs. Thus, outputs are degenerate: multiple inputs map to the same output. Even very simple filters can produce a complex distribution of degeneracies [1]. This characteristic, of a nontrivial mapping of a configuration space to a smaller set of final configurations, also appears in sampling, compression and more general information processing [2,3], and in numerous complex systems, including the basins of attraction of local minima in spin glasses, and deep learning neural networks [4–6]. Understanding the statistics of degeneracies can give important insight into these systems. In a previous work [1], we showed that a simple filtering problem produces analogous behaviour of the degeneracy distribution to these more complex systems, and that one can obtain exact results up to large system sizes that are simply not accessible in more complex problems.

Numerous studies have shown that the heterogeneous structure of interactions between elements of a complex system, usually represented as a complex network, can have a profound effect on the properties of the system [7]. Here we examine a simple filtering process on a network. The input consists of the binary states of nodes in a given network. The filter outputs a 1 for every instance of a particular pattern of states on a node and its immediate neighbours, and a 0 when the pattern is absent. This generalises the filtering problem examined in Ref. [1] for binary inputs in a cyclical string (ring). The process applied to a small graph is represented in Figure 1. We studied this problem on a variety of degree-regular graphs. We show that one may find the exact degeneracy distribution corresponding to the complete set of all possible inputs, up to relatively large system sizes, for any given graph. Just as in our previous study on rings, we show that the principal characteristics of the degeneracy distribution are described asymptotically by three key numbers. These numbers may be obtained exactly by simple arguments.

Figure 1. Application of different filters (SR and WR) to a set of zeros and ones placed on a graph. Each node of the input and output graphs is in one of two states, namely 0 (open circles) or 1 (closed circles). In the SR filter, an output node is one only when the corresponding input node is one and all its neighbours are zero. In the WR filter, an output node is one when the corresponding input node is one and one or more of its neighbours are zero.

This problem serves as a tractable simple model to explore information processing in complex systems. In a graph, the connections between nodes create complex interactions between the filter output at each node. We show that the degeneracy distribution correctly captures this complexity. In particular, the entropy of the degeneracy distribution, called the relevance [8], is lower in deterministically constructed graphs, and higher in random graphs. We show that relevance is maximum when the graph degree takes its smallest value greater than two. We compared two different filters, and found that the stronger filter (detecting less easily satisfied conditions) is more informative, because it is more sensitive to the state of neighboring nodes. Interestingly, as Figure 6 demonstrates, our results for regular graphs of diverse architectures essentially depend only on the vertex degree.

2. Results

For orientation, we begin by studying nodes located on a ring. The input is a set of $N$ strings of zeroes and ones $\{x_i\}$, $x_i = 0, 1$, of length $n$, assuming the periodic condition $x_1 = x_{n+1}$. We consider the complete set of all possible unique inputs. Its size $N$ is determined by the size $n$ of inputs, $N = 2^n$.

The filter works as follows: every instance of a specific pattern in the input (a short sequence of ones and zeroes) is marked by a one in the corresponding position in the output. All other positions are marked with zeroes. Multiple inputs correspond to the same output, creating a distribution of degeneracies of the outputs. We illustrate the results from a simple example filter pattern in Figure 2 (a) and (b). We observe complex degeneracy distributions reminiscent of those observed in, for example, Ref. [9].

The filter pattern may be arbitrary, but for illustrative purposes we will consider in particular a family of filters consisting of a string of ones with zeroes at either end: 010, 0110, 01110, etc. The length of the filter, $w$, can be used as a crude control parameter to observe the effects on resolution and relevance (see below). For convenience, we use the notation $1^l$ to indicate a chain of $l$ ones. Thus the filter of length $w$ is $01^{w-2}0$. In principle, for each of the $2^n$ possible inputs we can obtain, one by one, an output numerically. In practice, we use a more efficient algorithm described in Ref. [1]. Other types of filter patterns on a ring may be analyzed using the same methods.

Figure 2. Degeneracy distribution (a) and cumulative degeneracy distribution (b) for the filter 010 on a ring ($n = 36$), and for its generalization on a torus, which is a 1 with four neighboring 0's [panels (c) and (d)].

We obtained the distribution $N(d)$ for the full spectrum of degeneracies $d$ for a variety of filters. The degeneracies $d_i$, $i = 1, \ldots, D$, form a discrete spectrum of values where $d_D$ is the largest degeneracy, and $d_1 = 1$. A few examples of the degeneracy distributions and cumulative degeneracy distributions are shown in Figure 2. Here $N_{\rm cum}(d_i) \equiv \sum_{j=i}^{D} N(d_j)$. In particular, the total number of outputs is given by $M(n) = N_{\rm cum}(d_1)$. The cumulative degeneracy distribution is broad, but decays more rapidly than a power law.

The tail of the cumulative distribution has a notably complex structure resembling a staircase, with steep jumps between steps. The heights of these jumps are especially large in the region of high degeneracies. Similar structures may be observed in real systems, see for example Figure 3 of Ref. [9]. As shown in Ref. [1], when the number of ones in the output is few, and some or all of them are separated by large gaps, such outputs have very similar but not exactly equal degeneracies for finite $n$. These closely located degeneracies lead to the staircase structure observed in the cumulative distribution.

Let us consider the evolution of the degeneracy distribution (and cumulative distribution) with input size $n$. The largest degeneracy $d_D(n)$ corresponds to the output with all zeroes, and for large $n$ grows as $d_D(n) \cong z_d^n$, where the value of $z_d$ depends on the specific filter. Naturally $N(d_D, n) = 1$. The number of outputs with degeneracy 1 behaves as $N(1, n) \cong z_a^n$. Meanwhile the total number of outputs, $M(n)$, is asymptotically $M(n) \cong z_g^n$. Together, these three key constants, $z_d$, $z_g$ and $z_a$, delimit the asymptotic behaviour of the degeneracy distribution [1]. We list these numbers for a selection of short filter patterns on a ring in Table 1.

Rather surprisingly, one may obtain these asymptotic behaviours, and exact expressions for the constants $z_g$, $z_d$ and $z_a$, through simple arguments. Each output consists of isolated ones separated by strings of zeroes of various lengths. By careful consideration of how valid outputs for a larger $n$ can be constructed by adding specific segments to shorter outputs, one may construct recursive relations for the key quantities $M(n)$, $d_D(n)$ and $N(1, n)$, whose asymptotics are given by $z_g$, $z_d$ and $z_a$. To demonstrate this, we focus on the particular family of filter patterns consisting of a chain of ones with a zero at each end. The shortest such pattern is 010. Each member of this set may be indexed by the length of the filter, $w \geq 3$. The filter pattern length $w$ determines the minimum number of zeroes, $w - 2$, between each one. We give the derivation of $z_d$, $z_g$ and $z_a$ for any $w$ in Section 4.2 below.

2.1.2. Effect of filter length

In analogy with complex systems, we can consider each filter pattern as sampling the hidden state of a complex system [1]. The length of these filters acts as a crude control parameter of our sampling. Intuitively, we expect shorter filters to be more informative. The resolution of a sampling process, defined as the entropy of a sample,

$$H[y] = -\frac{1}{N}\sum_{i=1}^{N}\log\left(\frac{d_i}{N}\right) = -\sum_{d}\frac{d\,N(d,n)}{N}\log\left(\frac{d}{N}\right), \qquad (1)$$

is a measure of the ability to distinguish, at the output, between different input states [8]. In the first sum, $d_i$ is the degeneracy of the output produced by input $i$. The resolution takes its maximum value when there is a different output for each input. However in this case all outputs are distinct, and so these filters are not informative about the system being sampled. As shown in Ref. [8], the informativeness of a sample is captured by a different entropy measure, the relevance, defined as

$$H[d] = -\sum_{d}\frac{d\,N(d,n)}{N}\log\left(\frac{d\,N(d,n)}{N}\right). \qquad (2)$$

Results for a variety of short filter patterns are given in Table 1. The family of filters composed of a string of ones with a zero at each end, 010, 0110, 01110, etc., are indicated in boldface in the Table. As can be seen in the Table, the relevance is greater for shorter filters, but is actually zero for the shortest possible filters 0 and 1. The filter pattern 1 trivially reproduces the input, while 0 produces its inverse, and all outputs have degeneracy one. Within the family of filters $01^{w-2}0$, the relevance is maximised for $w = 3$.

For the filter 01, the number of outputs with degeneracy one, $N(1, n)$, is either 0 (when $n$ is odd) or 2 (when $n$ is even), so $z_a = 1$. This is because the only outputs that have degeneracy one are periodic sequences of alternating 0's and 1's: there are two of these sequences when $n$ is even, and none when $n$ is odd. The maximum degeneracy $d_D(n)$ for this pattern grows by an integer factor of 4 for an increment in $n$ of 5. In fact it can be written explicitly,

$$d_D(n > 11) = \left[\frac{3}{4^{4/5}}\right]^{-\mathrm{mod}(n,-5)} 4^{n/5}, \qquad (3)$$

where the coefficient of $4^{n/5}$ equals $(3/4^{4/5})^k$ with $k = 0, 4, 3, 2, 1$ for $\mathrm{mod}(n, 5) = 0, 1, 2, 3$, and 4, respectively. As a result, the number $z_d$, which gives the asymptotic behaviour of the maximum degeneracy $d_D$, is equal to $4^{1/5}$.

As can be seen in Figure 5 (c), the degeneracy distribution of the filter 01 does not have the characteristic shape, and the broad tailed cumulative distribution, seen in other filters. The filter pattern 00 already produces more complexity, see Figure 5 (a). The degeneracy distribution and the cumulative distribution already have the shape and complexity seen in longer filters [1]. Curiously, $N(1, n) = d_D(n) + i^n + (-i)^n$ (where $i$ is the imaginary unit), where the last two terms give $0, 2, 0, -2, 0, 2, 0, -2, 0, \ldots$ for $n = 3, 4, 5, 6, \ldots$. This means that $z_d = z_a$ for this filter.
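The quantities discussed in this section can be reproduced for small rings by direct enumeration. The following Python fragment is only a minimal brute-force sketch, not the more efficient algorithm of Ref. [1]. It assumes that each occurrence of the pattern is marked at its first position (an arbitrary convention that does not change any degeneracy), and the function names are illustrative. It tallies $N(d)$ over all $2^n$ inputs and then evaluates the resolution and relevance of Eqs. (1) and (2).

```python
from collections import Counter
from math import log


def ring_filter(x, pattern):
    """Output 1 at every cyclic position where `pattern` starts in `x`."""
    n, w = len(x), len(pattern)
    return tuple(
        int(all(x[(i + k) % n] == pattern[k] for k in range(w)))
        for i in range(n)
    )


def degeneracy_distribution(n, pattern):
    """Return {d: N(d)} by enumerating all 2**n inputs on a ring of size n."""
    degeneracy = Counter()  # output configuration -> number of inputs mapped to it
    for code in range(2 ** n):
        x = tuple((code >> i) & 1 for i in range(n))
        degeneracy[ring_filter(x, pattern)] += 1
    return Counter(degeneracy.values())


def resolution_and_relevance(dist, n):
    """Entropies H[y] and H[d] (in nats) from the distribution {d: N(d)}."""
    total = 2 ** n
    h_y = -sum(d * nd / total * log(d / total) for d, nd in dist.items())
    h_d = -sum(d * nd / total * log(d * nd / total) for d, nd in dist.items())
    return h_y, h_d


n = 10
dist = degeneracy_distribution(n, (0, 1, 0))
M = sum(dist.values())   # total number of distinct outputs M(n)
d_max = max(dist)        # largest degeneracy d_D(n)
N1 = dist.get(1, 0)      # number of outputs with degeneracy one
h_y, h_d = resolution_and_relevance(dist, n)
print(M, d_max, N1, h_y / n, h_d / n)
```

Because the cost grows as $n\,2^n$, this sketch is practical only for small $n$; the $n$th roots of `M`, `d_max` and `N1` then give rough finite-size estimates of $z_g$, $z_d$ and $z_a$.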

Table 1. Values of the numbers $z_g$, $z_d$, and $z_a$ for different filters. Note that we also included filter patterns consisting of all zeroes. For each filter we also give the relevance per node $H[d]/n$ (in nats) calculated from the degeneracy distribution and the resolution per node $H[y]/n$. For the sake of comparison, the standard entropy of the inputs of this size is $H/n = \ln 2 \approx 0.693$. We also give the total number of degeneracies $D$ for each pattern. Inputs of size $n = 36$ were used except for filters 00 and 10, for which $n = 34$, and 000 for which $n = 35$. Values for $D$ for these three filters were extrapolated to $n = 36$ for comparison with other results.

pattern    $z_g$    $z_d$    $z_a$    $H[d]/n$    $H[y]/n$    $D$

The largest degeneracies behave as $d_D(n) \cong z_d^n$ for large $n$. The number $z_d$ quickly approaches 2 as the filter pattern length increases. Since $N = 2^n$, this means that almost all inputs concentrate in a few outputs, and in the limit, in a single state, i.e. all outputs are the same and the filter patterns are not informative. For the shortest filter patterns, the value of $z_d$ falls rapidly, while the relevance increases, indicating a transition to informative sampling. On the contrary, $z_g$, which gives the total number of outputs $M(n)$, increases with decreasing filter length, as shorter filters have more possible outputs. Taken together, these results indicate that the maximally informative sampling for a given family of filters is the shortest pattern having length greater than 1. This behavior is analogous to the transition observed in more complex problems (see for example [10]).

Note that one may also consider filters constructed as logical combinations of more than one pattern. For example, there are 3 kinds of 'OR' filters of size 2. The filter 01 OR 10 detects when the next digit differs from the current one. Every output then has degeneracy 2, and the number of outputs is $M(n) = 2^{n-1}$, so $z_d = 1$ and $z_g = 2$. There are no outputs of degeneracy one, $N(1, n) = 0$. The filter 00 OR 11 detects when the next digit is the same as the current one. The same reasoning as for the filter 01 OR 10 applies here: we can reconstruct the input completely from the output if we know a single digit of the input. Finally, 11 OR 10 (which is the same as 11 OR 01, 00 OR 10 and 00 OR 01) is equivalent to the filter 1 of length 1.
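The behaviour of the two-site 'OR' filters can be checked by direct enumeration on a small ring. The sketch below is illustrative only: the function name and the choice $n = 8$ are assumptions, and each pair $(x_i, x_{i+1})$ is marked at position $i$.

```python
from collections import Counter


def or_filter_degeneracies(n, patterns):
    """Map each output of a two-site 'OR' filter on a ring to its degeneracy."""
    counts = Counter()
    for code in range(2 ** n):
        x = [(code >> i) & 1 for i in range(n)]
        # Output 1 where the pair (x_i, x_{i+1}) matches any of the patterns.
        y = tuple(int((x[i], x[(i + 1) % n]) in patterns) for i in range(n))
        counts[y] += 1
    return counts


n = 8
differ = or_filter_degeneracies(n, {(0, 1), (1, 0)})  # 01 OR 10
same = or_filter_degeneracies(n, {(0, 0), (1, 1)})    # 00 OR 11
copy = or_filter_degeneracies(n, {(1, 1), (1, 0)})    # 11 OR 10

for name, result in (("01 OR 10", differ), ("00 OR 11", same), ("11 OR 10", copy)):
    print(name, "outputs:", len(result), "degeneracies:", sorted(set(result.values())))
```

For the first two filters every input and its bitwise complement give the same output, which is why each output is reached by exactly two inputs.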

The process described in the previous Section may be generalised to an arbitrary graph as follows. The input consists of the binary status for each node in the graph. We filter for a particular condition of the state of a node and of its immediate neighbors. If the state of the node and its neighbours matches the filter pattern, the output for that node is 1, otherwise it is 0. We consider two examples. Firstly, we set the output to 1 if the selected node has state 1 and all of its neighbours have state 0 (we refer to this filter as the strong rule, or SR). This filter applied on a ring is equivalent to the pattern 010 discussed in the previous Section. Secondly, we apply a less selective filter, outputting 1 if a node is in state 1 and any of its neighbours has state 0 (we call this filter the weak rule, or WR). We illustrate the application of these two filter patterns to a small graph in Figure 1.

These filters were applied to several families of degree-regular graphs. These were chosen to have a variety of structures and to vary in the degree of randomness in their construction, while being of comparable size and degree. We considered the following families of graphs. Small world graphs: these graphs are created by placing all nodes in a ring, and adding shortcuts between nodes to reach the desired degree. The locations of shortcuts were chosen either at random (we use the code SW(q) for these graphs, where $q$ is the graph degree) or in a deterministic way (SWB(q)). Random regular graphs (RRG). Tori, which are two dimensional square lattices with cyclic boundary conditions. Cages: these are graphs defined by two numbers, the degree $q$ and the shortest cycle length $g$. A $(q,g)$-cage is the graph fulfilling these properties while having the smallest possible number $n$ of nodes [11]. For each family of graphs we considered different sizes, up to at least $n = 30$, and where possible, degrees up to $q = 5$. Finally we investigated the second and third Apollonian networks (Apollonian 2 and 3), which are the only graphs here that are not degree regular.
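The two rules can be written compactly for a graph stored as adjacency lists. The fragment below is a minimal sketch under that assumption; the function name `apply_rule` and the 6-node ring used as a test graph are hypothetical choices made only for illustration.

```python
def apply_rule(x, neighbours, rule):
    """Apply the strong ('SR') or weak ('WR') rule to the node states x.

    SR: output 1 if the node is 1 and all of its neighbours are 0.
    WR: output 1 if the node is 1 and at least one neighbour is 0.
    """
    test = all if rule == "SR" else any
    return tuple(
        int(x[v] == 1 and test(x[u] == 0 for u in neighbours[v]))
        for v in range(len(x))
    )


# Hypothetical example graph: a ring of 6 nodes given as adjacency lists.
n = 6
ring = [((v - 1) % n, (v + 1) % n) for v in range(n)]
x = (0, 1, 0, 1, 1, 0)

print("input:", x)
print("SR   :", apply_rule(x, ring, "SR"))
print("WR   :", apply_rule(x, ring, "WR"))
```

On this ring the SR output marks only the isolated 1 at node 1, as the pattern 010 would, while the WR output reproduces the input because every 1 has a neighbouring 0.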

Figure 3. Degeneracy distributions $N(d)$ (left) and cumulative degeneracy distributions $N_{\rm cum}(d)$ (right) for outputs of the SR filter on random regular graphs with $n = 30$ of degree 2 (a,b), 3 (c,d) and 4 (e,f).

Figure 4. Degeneracy distributions $N(d)$ and cumulative degeneracy distributions $N_{\rm cum}(d)$ for outputs of the SR filter on selected deterministic graphs of degree 2 (a,b), 3 (c,d) and 4 (e,f): a ring with $n = 30$, the (3,8)-cage and a torus, respectively.

We give some examples of the resulting degeneracy distributions and cumulative degeneracy distributions, for the SR filter, in Figure 3, for random graphs, and Figure 4, for deterministically generated graphs. Note that the distributions for random graphs correspond to a single realization of the graph. We see that there is a dramatic difference in the distribution for random graphs between degree two and degree three. The degree two random regular graph necessarily consists of one or several closed rings, and the distribution is little different from that shown in Figure 2 (a). For degree three, there is a great deal of randomness in the formation of the graph, and this is reflected in the degeneracy distribution, which becomes much more dense, having a fine structure not observed in deterministic graphs. For higher degrees, the distribution becomes less broad, and as we will discuss below, this corresponds to a reducing relevance with increasing degree.

We have not included examples of the distributions for the "small world" graphs. The deterministic small world graphs, SWB(q), produce distributions almost indistinguishable from those for other deterministic graphs of the same degree, while the random small world graphs, SW(q), generate degeneracy distributions very similar to those found for random regular graphs. For completeness, we give the degeneracy distributions and cumulative distributions for the same graphs using the WR filter in Figures A1 and A2 in Appendix A.

Figure 5. Degeneracy distributions $N(d)$ and cumulative degeneracy distributions $N_{\rm cum}(d)$ for the filters 00 (a,b) and 10 (c,d) on a ring with $n = 30$, and for the SR (e,f) and WR (g,h) filters on an Apollonian network with $n = 16$.

We plot some examples of some less typical degeneracy distributions in Figure 5. These are the 00 and 10 filters applied on a ring, and the SR filter applied to Apollonian networks (which are not degree regular).

Figure 6. Dependence of key observables related to the degeneracy distribution on graph degree $q$, for the graph families RRG, SW, SWB, cages and torus under the SR and WR filters. (a) The relevance entropy $H[d]$ scaled by system size $n$. (b) Resolution $H[y]$ scaled by system size $n$. (c) Total number of degeneracies $D(n)$ for $n = 30$. (d) The $n$th root of the largest degeneracy $d_D(n)$, which tends to $z_d$. (e) The $n$th root of the number of outputs $M(n)$, tending to $z_g$. (f) The $n$th root of the number of outputs of degeneracy one, $N(1, n)$, tending to $z_a$.

In Figure 6 we represent various quantities of interest as a function of graph degree, for the different graph families studied. We see that there is a clear separation in results between the two filters.

The weak filter (WR) detects when a node has state 1 while having at least one immediate neighbor with state 0. This neighbour condition is more easily satisfied the larger the number of neighbours $q$. Thus for large $q$, the number of possible outputs $M(n)$ for the WR filter approaches the number of possible inputs, $2^n$. We see in panel (e) that indeed the $n$th root of $M(n)$, which tends to $z_g$ for large $n$, approaches 2 for large $q$. By the same token, most outputs have a degeneracy of one, so the number of outputs of degeneracy one, $N(1, n)$, also approaches $2^n$ ($z_a$ approaching 2) for large $q$ [panel (f)], while the largest degeneracy $d_D(n)$ (whose asymptotic behaviour is given by $z_d$) grows only slowly with $n$ [panel (d)]. The resolution $H[y]$ measures how well the filter distinguishes different inputs, and as we see in panel (b) of Figure 6, and in agreement with the above observations, the resolution for the weak filter is high. The maximum possible value of $H[y]$ is $n \ln 2$, corresponding to a value of $H[y]/n = \ln 2 \approx 0.693$.

The correct measure of how informative a sample of the observable variables of a complex system is about the underlying system is the relevance [8], $H[d]$. Such sampling is represented in our problem as the filtering process, and the interactions of the system by the graph structure. The importance of the relevance is confirmed by our results, as shown in Figure 6 (a). A higher relevance is measured in graphs having some randomness in their structure, while deterministic and regular graphs have lower relevance. This is particularly true for the strong filter SR, which produces a significantly higher relevance for random regular graphs (RRG) and rings with random shortcuts (SW), compared with rings with deterministic shortcuts (SWB) and cages. The effect for the WR filter is much less pronounced.

The highest relevance occurs at degree $q = 3$. The explanation for this is clear. As shown above, and in [1], smaller filters generally produce higher relevance, as there are more outputs than for larger filters, except in the extreme limit of perfect reproduction of the input (maximum resolution). Thus we would expect lower values of $q$, which correspond to smaller SR filters, to have higher relevance. Meanwhile, and opposing this trend, graphs of degree $q = 2$ are necessarily rings, so $q = 3$ is the smallest value of $q$ for which the graph is non deterministic. This echoes our finding for filters on rings, for which the maximum relevance is found for the shortest filter which doesn't trivially reproduce the input [1]. We show the degeneracy distributions for $q = 2$, 3 and 4 in Figure 3. The largest degeneracy, $d_D$, on the other hand, does become very large. In the limit of large $q$, a large fraction of possible inputs give the same single output (all zeroes). In Figure 6 (c), the behaviour of the number of degeneracies, $D(n)$, noticeably mirrors that of the relevance, $H[d]$. Note that data points for random graphs are averaged over several realisations of the graph.

In Table 2 we list the key degeneracy distribution statistics for the SR filter, for all families of graphs studied. Corresponding results for the WR filter may be found in Table A1. In addition to representing the data highlighted in Figure 6 in quantitative form, these tables demonstrate the size effects with exponentially rapid convergence to the infinite $n$ limit. In this work, we are mainly interested in regular graphs (graphs where nodes have a uniform degree), because we can better isolate the effects of varying the graph's degree. Nevertheless, for the sake of completeness, we also present results for a few examples of non-regular graphs, namely Apollonian networks. In Tables 2 and A1, each group of rows delimited by horizontal lines represents a different class of graphs. The four classes at the top of the tables, namely Apollonian networks, cage graphs, square lattices with periodic boundary conditions (torus), and rings with deterministic shortcuts, are deterministic graphs, while the two remaining classes represent random models, namely random regular graphs, and rings with random shortcuts. The numbers presented for the random models result from averaging over 10 realizations sampled uniformly at random.

We include results for graphs of several sizes for each type of graph. This allows one to see the convergence of values with increasing $n$. Within the set of consecutive rows of each class, the graphs are ordered by ascending degree, then by ascending number of nodes. The exception to this organization is the two first rows, which are for the non-regular Apollonian networks. All of these numbers, as well as the number of degeneracies $D$, for $n = 30$ are plotted in Figure 6.

Table 2. Important values for the degeneracy distribution resulting from applying the strong rule (SR) filter to various graphs. The numbers $\sqrt[n]{M(n)}$, $\sqrt[n]{d_D(n)}$ and $\sqrt[n]{N(1,n)}$ approximate $z_g$, $z_d$ and $z_a$ respectively. We also give the relevance per node $H[d]/n$ and the resolution per node $H[y]/n$. Numbers for RRG($q$) and SW($q$) were obtained by averaging over 10 random realizations.

graph           n    $\sqrt[n]{M(n)}$   $\sqrt[n]{d_D(n)}$   $\sqrt[n]{N(1,n)}$   $H[d]/n$   $H[y]/n$
Apollonian 2    7    1.47236            1.94420              1.40854              0.08504    0.12919
Apollonian 3    16   1.52380            1.94596              1.49013              0.08148    0.13005
(3,5)-cage      10   1.54199            1.88916              1.42694              0.10463    0.22185
(3,6)-cage      14   1.54904            1.88549              1.46952              0.09741    0.22302
(3,7)-cage      24   1.54516            1.88688              1.42191              0.12412    0.22268
(3,8)-cage      30   1.54618            1.88722              1.44630              0.08763    0.22254
(4,5)-cage      19   1.48991            1.94458              1.37494              0.08094    0.13458
(4,6)-cage      26   1.50129            1.94386              1.44997              0.05243    0.13497
(5,5)-cage 1    30   1.44928            1.97192              1.34932              0.04164    0.07890
(5,5)-cage 2    30   1.44984            1.97191              1.35558              0.04602    0.07891
(5,5)-cage 3    30   1.44954            1.97192              1.35543              0.04201    0.07890
(5,5)-cage 4    30   1.44964            1.97191              1.35280              0.05264    0.07891
torus 3 × …

For fully connected graphs, both the strong and the weak rules produce trivial output and degeneracy distributions. Using the strong rule, for an output node $y_i$ to be 1, we must have $x_i = 1$ and $x_{j \neq i} = 0$. So, when there is a 1 in the output string, we have $y_i = x_i$ for all $i$. There are $n$ of these outputs, and their degeneracy is 1. Since there can be no more than a single 1 in the output string, the only other possible output is a string of $n$ zeros, which has degeneracy $2^n - n$. In this case there are only two degeneracies in the degeneracy distribution, $d_1 = 1$ and $d_2 = 2^n - n$, and their frequencies are $N(d_1, n) = n$ and $N(d_2, n) = 1$. Using the weak rule, for an output node $y_i$ to be 1, it is enough to have $x_i = 1$ and at least one $x_{j \neq i} = 0$. Therefore, when one or more of the inputs $x_i$ is 0 the output is equal to the input. The only situation in which the output does not match the input is for an input string of all 1's, in which case the output is a string of 0's. The weak rule also produces only two degeneracies, $d_1 = 1$ and $d_2 = 2$, with frequencies $N(d_1, n) = 2^n - 2$ and $N(d_2, n) = 1$.

The $(q, 3)$-cage graphs are fully connected graphs with $q + 1$ nodes, and the $(q, 4)$-cages are bipartite graphs with two fully connected layers of $q$ nodes each. Bipartite graphs with two fully connected layers of the same size also result in trivial degeneracy distributions in both the strong and weak rules. With the strong rule applied to such a bipartite graph, for an output $y_i$ to be 1 we must have all inputs in the opposite layer equal to 0. Conversely, when one input of one of the layers is $x_i = 1$, all outputs in the opposite layer are 0. Thus, when all the 1's of the input lie in a single layer, $y_i = x_i$, and when there are 1's in both layers of the input, the output is all 0's. In this case the degeneracy distribution also contains just two degeneracies, $d_1 = 1$ and $d_2 = 2^n - 2^{n/2+1} + 2$, with frequencies $N(d_1, n) = 2^{n/2+1} - 2$ and $N(d_2, n) = 1$, respectively (notice there are $n/2$ nodes in each layer). With the weak rule applied to symmetrical fully connected bipartite graphs, for an output $y_i$ to be 1 it is enough to have $x_i = 1$ and at least one 0 in the opposite layer. When both layers contain at least one 0, $y_i = x_i$. All inputs with at least one 0 in layer α and only 1's in layer β produce an output consisting of all 0's in layer α and all 1's in layer β. Finally, if the input contains no 0's in either layer, the output is $y_i = 0$ for all $i$. Therefore, we have $d_1 = 1$, $d_2 = 2$, and $d_3 = 2^{n/2} - 1$, with frequencies $N(d_1, n) = 2^n - 2^{n/2+1}$, $N(d_2, n) = 1$, and $N(d_3, n) = 2$ (one output for each choice of the layer containing only 1's), respectively.

From the trivial degeneracy distribution of these examples of graphs, i.e., fully connected and bipartite fully connected, we see that the entropies approach trivial limits for large system sizes. Namely, for the strong rule, using Eqs. (1) and (2) for the output and degeneracy entropies, respectively, we see that in both types of graphs $H[y]$ and $H[d]$ both approach 0, since the distribution is dominated by a single degeneracy $d \cong 2^n$ with $N(d, n) = 1$. With the weak rule, the entropy $H[y]/n$ approaches $\ln 2 \approx 0.693$, while $H[d]$ approaches 0. In general, we expect that the entropies approach these limits as we increase the degree of the graphs generated by any model. Interestingly, this effect is already quite visible in Tables 2 and A1, when we compare the values of the entropy for different degrees within each class of graphs, even for degrees up to only 5.
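The degeneracy counts for fully connected and fully connected bipartite graphs are easy to confirm by enumeration. The following sketch assumes graphs given as adjacency lists and uses $n = 8$ purely for illustration; the helper name is not taken from the original work.

```python
from collections import Counter


def degeneracy_distribution(neighbours, rule):
    """Return {d: N(d)} for the SR or WR rule by enumerating all 2**n inputs."""
    n = len(neighbours)
    test = all if rule == "SR" else any
    outputs = Counter()  # output configuration -> number of inputs mapped to it
    for code in range(2 ** n):
        x = [(code >> v) & 1 for v in range(n)]
        y = tuple(
            int(x[v] == 1 and test(x[u] == 0 for u in neighbours[v]))
            for v in range(n)
        )
        outputs[y] += 1
    return dict(Counter(outputs.values()))


n = 8
half = n // 2
# Fully connected graph: every node is adjacent to all the others.
complete = [[u for u in range(n) if u != v] for v in range(n)]
# Two layers of n/2 nodes, each node adjacent to every node of the other layer.
bipartite = [[u for u in range(n) if (u < half) != (v < half)] for v in range(n)]

for name, graph in (("complete", complete), ("bipartite", bipartite)):
    for rule in ("SR", "WR"):
        print(name, rule, degeneracy_distribution(graph, rule))
```

In every case the products $d\,N(d)$ sum to $2^n$, which is a quick consistency check on any enumerated degeneracy distribution.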

3. Discussion

In Ref. [1] we introduced a simple ﬁltering problem which produces a rich and complex distributionof output degeneracies. The input is a cyclic sequence of zeroes and ones (a ring), and the process outputsa one in any position where a particular short pattern occurs, and a zero otherwise. The tractability of theproblem means that we are able to give the complete degeneracy distribution, for the set of all possibleinputs, up to relatively large system sizes.In this paper, we have extended this problem to consider general graphs. The input is a digit 1 or 0assigned to each node of the graph, and the output for each node is 1 if the state of the node and those ersion August 17, 2020 submitted to

Entropy

14 of 21 of its immediate neighbours match a given ﬁlter pattern, and 0 otherwise. We demonstrate this processby calculating the full degeneracy distributions for various degree regular graphs with 30 or more nodes,using two example ﬁlter patterns. The weak (WR) pattern registers a 1 if the corresponding node has state1 and at least one of its neighbours has state 0. The strong (SR) pattern only registers 1 if the node is instate 1 and all of its neighbours are in state 0. We found degeneracy distributions having similar form andfeatures to those seen in the simpler problem of ﬁltering on a ring. We showed that three key features ofthe degeneracy distribution: the largest degeneracy d D ( n ) , the number of distinct outputs M ( n ) and thenumber of outputs having degeneracy one, N ( n ) behave as z nd , z ng and z na , respectively, where the threenumbers z d , z g and z a take values from 1 to 2 depending on the graph and the ﬁlter. We ﬁnd precise valuesfor these three numbers for all the graphs studied.The two ﬁlter examples used give quite different results, and have different behaviour with respectto graph degree. The key results are summarised by our main ﬁgure, Figure 6. The weak rule ﬁlter, WR,is only weakly sensitive to the neighborhood of a node, and hence the structure of the graph. For largedegree, it almost always produces an output matching the input. Thus the WR ﬁlter produces large valuesfor the ouput entropy, called the resolution, and small values for the degeneracy entropy, the relevance.The strong rule ﬁlter, SR, on the other hand, imposes a condition on all the neighbours of thenode where the ﬁlter is applied. This produces a much larger relevance (which is a measure of theinformativeness of the ﬁltering process) in random graphs, but much lower resolution, as the numberof unique outputs is restricted. 
The relevance is largest for the smallest graph degree not equal to two. Deterministically constructed graphs do not demonstrate the same peak in relevance, underlining the importance of this measure for detecting complexity. For larger degree, the condition becomes more restrictive, so the number of outputs is reduced. The resolution decreases with increasing $q$, but so does the relevance. The reason that the $q =$ […] $q = 2$. The fact that the results are largely determined by degree indicates that it should be possible to write a mean field theory for the degeneracy distribution.

Similar complexity is observed in various complex systems, particularly with regard to information processing. In such systems, the degeneracy distribution has been shown to be an important observable of the system. The entropy of this distribution, called the relevance, was shown [8] to be the relevant measure of complexity, and we showed that our simple problem reproduces many of the important qualitative phenomena observed in such systems. The filtering problem is therefore a highly tractable problem illuminating some of the key features of information processing in more complex systems. The extension of this problem to arbitrary graphs makes the interactions between nodes more complex, and the analogy with the complex interactions of real complex systems more explicit.
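
To make the two filter rules concrete, the following minimal Python sketch applies the WR or SR rule to every input on a small graph and collects the degeneracy of each output. It is an illustration only, not the code used for the results reported here: the function names (`degeneracy_counts`, `summary`, `ring`) and the integer encoding of configurations (bit `v` holds the state of node `v`) are assumptions made for this example.

```python
def degeneracy_counts(neighbours, rule):
    """Return freq, where freq[y] is the number of inputs mapped to output y.

    neighbours[v] lists the neighbours of node v; rule is "WR" or "SR".
    Inputs and outputs are integers whose bit v is the state of node v.
    """
    if rule not in ("WR", "SR"):
        raise ValueError("rule must be 'WR' or 'SR'")
    n = len(neighbours)
    freq = [0] * (1 << n)              # one counter per possible output
    for x in range(1 << n):            # run through all 2^n inputs
        y = 0
        for v in range(n):
            if ((x >> v) & 1) == 0:
                continue               # both rules require the node to be in state 1
            zeros = [((x >> u) & 1) == 0 for u in neighbours[v]]
            # WR: at least one neighbour is 0; SR: all neighbours are 0
            fires = any(zeros) if rule == "WR" else all(zeros)
            if fires:
                y |= 1 << v
        freq[y] += 1
    return freq


def summary(freq):
    """Return (M, d_D, N): distinct outputs, largest degeneracy, outputs of degeneracy 1."""
    realised = [d for d in freq if d > 0]
    return len(realised), max(realised), sum(1 for d in realised if d == 1)


def ring(n):
    """Adjacency lists of a ring of n nodes (intended for n >= 3)."""
    return [[(v - 1) % n, (v + 1) % n] for v in range(n)]


if __name__ == "__main__":
    # On a ring the SR rule coincides with the 010 pattern filter.
    print(summary(degeneracy_counts(ring(4), "SR")))   # (7, 10, 6)
    # Complete graph on three nodes with the WR rule.
    print(summary(degeneracy_counts([[1, 2], [0, 2], [0, 1]], "WR")))
```

The run time grows as $2^n$, so this sketch is only practical for small graphs; for the graph sizes discussed in this paper a compiled implementation would be needed.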

4. Materials and Methods

The distributions shown in Figures 2-5, A1 and A2, and the numbers presented in Tables 1, 2 and A1 and plotted in Figure 6, were obtained by computational experiments that consider all $2^n$ configurations of the $n$ input binary variables $x_i$ individually. For a specified filter, or rule, we obtain the output variables $y_i$ corresponding to each input. From the frequency with which each output configuration appears, we build the degeneracy distribution.

For the sake of simplicity in the implementation of the computational experiments, we apply a basic indexing system to the output configurations. We start by initializing an array with $2^n$ positions populated with zeros, representing the frequency of observation of each output. Then, as we systematically run through all the possible inputs and calculate the corresponding outputs $\{y_i\}$, we increment by 1 the value in position $\sum_i y_i 2^i$ of the array, where $i = 0, 1, \ldots, n-1$. At the end of this process, each position of the array contains the frequency of its corresponding output. This method is memory intensive, and in some cases uses much more memory than strictly necessary, since most of the positions of the frequency array will remain unchanged after initialization (corresponding to non-realizable, or unobserved, outputs). It is relatively simple to develop methods that do not require so much memory; however, they would necessarily require more CPU resources and have a larger time complexity. Notice that our method's time complexity is linear in the number of input configurations, $2^n$. In the case of rings, a much more efficient algorithm may be used, as described in Ref. [1].

Here we show how the asymptotic behaviour of the degeneracy distribution may be obtained. We focus on the particular family of filter patterns consisting of a chain of 1s with a 0 at each end. The shortest such pattern is 010. Each member of this set may be indexed by the length of the filter, $w \geq 3$. Each output consists of isolated ones separated by strings of zeroes of various lengths. The filter pattern length $w$ determines the minimum number of zeroes, $w-2$, between each one.

For $w = 3$, chains of three or fewer zeroes in the output can only be produced in one way. Thus outputs containing only such chains of zeroes have degeneracy 1. Possible such output sequences can be built up out of three kinds of building blocks, 01, 001 and 0001, put together in a ring of length $n$. We can thus find the number of outputs of degeneracy 1, $N(n)$, by counting all possible ways of building a ring of length $n$ out of these blocks. We can do this recursively. For every configuration of length $n-2$, we can obtain a valid configuration of length $n$ by inserting the block 01 to the right, say, of a particular position $i$ in the ring. This gives all the configurations of length $n$ with the block 01 to the right of $i$. Doing the same with configurations of length $n-3$ gives all those with the block 001 to the right of $i$. Finally, repeating the procedure for configurations of length $n-4$ gives all those with the block 0001 to the right of $i$. Since every block must be 01, 001 or 0001, the union of these three sets is the full set of configurations of degeneracy 1 in rings of $n$ digits. Thus, we can write

$$N(n) = N(n-2) + N(n-3) + N(n-4). \tag{4}$$

Starting from the first few values

$$N(1) = 0, \quad N(2) = 2, \quad N(3) = 3, \quad N(4) = 6, \tag{5}$$

we could build up the sequence and find $N(n)$ for any $n$. However, it is not necessary to iterate through all values of $n$.

The explicit solution of the linear difference equation (4) can be written in terms of the roots, $z_i$, of the characteristic equation $z^4 = z^2 + z + 1$:

$$N(n) = z_1^n + z_2^n + z_3^n + z_4^n, \tag{6}$$

where the coefficients of the powers of the roots $z_i$, all equal to one, are found from the initial condition, Eq. (5). The root $z_1 \equiv z_a \approx 1.4656$ determines the large-$n$ asymptotics of $N(n)$.

For $w \geq 4$, it becomes possible for there to be chains of ones in the input that are shorter than that in the filter pattern. This means that only sequences of $w-2$ or $w-1$ zeroes in the output can be produced in a unique way, while sequences of $w$ or more zeroes in the output can be produced in more than one way. One may therefore extend an output of degeneracy 1 only by inserting blocks of length $w-1$ or $w$. Hence the recursion for $N(n)$ becomes

$$N(n) = N(n-w+1) + N(n-w). \tag{7}$$

The corresponding characteristic equation is

$$z^w = z + 1. \tag{8}$$

For large $n$, then,

$$N(n) \cong z_a^n, \tag{9}$$

where $z_a$ corresponds to the dominant solution of Eq. (8).

The total number of possible outputs may be derived in a similar way. The presence of a 1 at a given position in the output corresponds uniquely to $w$ fixed digits at the same position in the input. Any degeneracy therefore arises in the parts of the input corresponding to strings of zeroes in the output. The total number of possible outputs, $M(n)$, is then the number of ways of arranging isolated ones in a chain of length $n$, subject to this constraint. For every output of length $n-1$, we can create an output of length $n$ by inserting an additional 0. The same is not true for the digit 1, however. Any 1 in the output must be accompanied by a sequence of $w-2$ zeroes. An output of length $n$ can therefore also be created by inserting a 1 together with $w-2$ zeroes into any valid output of length $n-(w-1)$, in a position immediately following a sequence of $w-2$ zeroes. This gives

$$M(n) = M(n-1) + M(n-w+1),$$

with initial conditions $M(n = w) =$ […] and $M(n < w) = 1$. The elements of the sequence may be written in terms of the roots of the characteristic equation [12–14]

$$z^{w-1} = z^{w-2} + 1. \tag{10}$$

Then $z_g$ corresponds to the largest root of this equation. We list values for various filter lengths (as well as for some other filter patterns) in Table 1.

The entire degeneracy distribution may be built up by considering chains of zeroes of different lengths in the output, and the number of different possible corresponding sections of the input. Let an output with $m \geq 1$ ones contain $m$ strings of zeroes with lengths $\ell_1, \ell_2, \ldots, \ell_m$. Then the degeneracy of this output equals

$$d = \prod_{i=1}^{m} \tilde{d}(\ell_i). \tag{11}$$

Here $\tilde{d}(\ell)$ is the number of input strings of length $\ell$, having the first and last digits 0, that generate an output string of $\ell$ zeroes. This number plays an important role in our problem, similar to prime numbers in number theory, so we call the $\tilde{d}(\ell)$ prime degeneracies. Suppose that the output contains $\mu_\ell$ strings of zeroes of length $\ell$, $\ell = w-2, w-1, w, \ldots$, where

$$m + \sum_{\ell \geq w-2} \ell \mu_\ell = n. \tag{12}$$

Then Eq. (11) may be rewritten

$$d = \prod_{\ell \geq w-2} [\tilde{d}(\ell)]^{\mu_\ell} \tag{13}$$

for $m \geq 1$.
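
As a numerical cross-check of the ring results for $N(n)$, the short Python sketch below iterates the recursion of Eq. (4) for the 010 filter and compares the growth rate $\sqrt[n]{N(n)}$ with the dominant root of the characteristic equation $z^4 = z^2 + z + 1$. It also evaluates the dominant roots of Eqs. (8) and (10) for a chosen $w$. This is an illustrative sketch, not the authors' code: the helper names and the use of bisection on the interval $[1, 2]$ are assumptions of this example, and the starting values $N(1)=0$, $N(2)=2$, $N(3)=3$, $N(4)=6$ are those obtained by counting the arrangements of the blocks 01, 001 and 0001 on small rings directly.

```python
def n_unique(n_max):
    """N(n) for the 010 filter from N(n) = N(n-2) + N(n-3) + N(n-4)."""
    N = {1: 0, 2: 2, 3: 3, 4: 6}       # values for the smallest rings
    for n in range(5, n_max + 1):
        N[n] = N[n - 2] + N[n - 3] + N[n - 4]
    return N


def root_between(f, lo=1.0, hi=2.0, iters=200):
    """Bisection for a root of f in [lo, hi], assuming f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def z_a_general(w):
    """Dominant root of z^w = z + 1, Eq. (8), intended for w >= 4."""
    return root_between(lambda z: z**w - z - 1)


def z_g_general(w):
    """Largest root of z^(w-1) = z^(w-2) + 1, Eq. (10), for w >= 3."""
    return root_between(lambda z: z**(w - 1) - z**(w - 2) - 1)


if __name__ == "__main__":
    N = n_unique(60)
    z_a = root_between(lambda z: z**4 - z**2 - z - 1)   # w = 3 case
    print("z_a (w = 3):", z_a)
    print("N(60)^(1/60):", N[60] ** (1 / 60))
    print("z_g (w = 3):", z_g_general(3))               # the golden ratio
    print("z_a (w = 4):", z_a_general(4))
```

Because the other three roots of the quartic have modulus at most 1, the $n$-th root of $N(n)$ converges quickly to $z_a$, which is why a moderate $n$ such as 60 already reproduces the root to many digits.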

The prime degeneracies $\tilde{d}(\ell)$ can be obtained recursively by taking into account three points:

(i) Relevant input configurations of length $\ell$ are obtained by inserting 0 or 1 into each relevant configuration of length $\ell - 1$ […] beginning and/or ending with 01 […] zeroes, that cannot be obtained by inserting a single digit into relevant input strings of length $\ell - 1$ […] beginning with 01 […] between their first and second positions.

Following these rules, the degeneracy of a string of $\ell$ zeroes at the output, the prime degeneracy $\tilde{d}(\ell)$, can be written recursively as a linear difference equation:

$$\tilde{d}(\ell) = 2\tilde{d}(\ell-1) - \tilde{d}(\ell-w+1) + \tilde{d}(\ell-w) \tag{14}$$

with the initial condition $\tilde{d}(1) = \tilde{d}(2) = 1$, $\tilde{d}(\ell) = 2^{\ell-2}$ for $3 \leq \ell < w$ and $\tilde{d}(w) = 2^{w-2} - 1$. The solution of Eq. (14) may be explicitly expressed in terms of the complex roots of the characteristic equation

$$z^w = 2z^{w-1} - z + 1, \tag{15}$$

giving

$$\tilde{d}(\ell) = C_1 z_1^\ell + C_2 z_2^\ell + C_3 z_3^\ell + \ldots + C_w z_w^\ell. \tag{16}$$

The largest real root of Eq. (15), $z_1$, say, dominates for large $\ell$, and we identify it as $z_d$:

$$\tilde{d}(\ell) \cong C_1 z_d^\ell. \tag{17}$$

The case of the periodic output of length $n$ with all digits 0 has to be considered separately. Consider one digit of the input, at an arbitrary position. The number of input configurations where this digit is 0 and the resulting output has only zeroes is given by $\tilde{d}(n+1)$, because the periodicity of the input means that this digit 0 plays the role of both first and last digit of the configurations of a string of $n+1$ digits. If instead this digit is 1, the number of such configurations is $1 + \sum_{i \neq w-2} i\,\tilde{d}(n-i)$, where the sum over $i$ accounts for the configurations where the digit is in a group of $i$ consecutive ones whose length is not $w-2$, plus one configuration with all input digits equal to 1. Thus the degeneracy of the output with all zeroes is given by

$$d_D(n) = 1 + \tilde{d}(n+1) + \sum_{\substack{i=1 \\ i \neq w-2}}^{n-1} i\,\tilde{d}(n-i), \tag{18}$$

which is the largest possible degeneracy of an output of a given length. Applying the recursion relation for the prime degeneracies $\tilde{d}$, Eq. (14), to the terms on the right-hand side of Eq. (18), we find that the largest degeneracy $d_D(n)$ satisfies the same difference equation as Eq. (14), though with a different initial condition:

$$d_D(n) = 2d_D(n-1) - d_D(n-w+1) + d_D(n-w) \tag{19}$$

with the initial condition $d_D(n) = 2^n$ for $n < w$, and $d_D(w) = 2^w - w$. For large $n$, the solution is dominated by a single root,

$$d_D(n) \cong z_d^n. \tag{20}$$

Author Contributions:

All authors contributed equally and substantially to all parts and aspects of the work.

Funding:

This work was developed within the scope of the project i3N, UIDB/50025/2020 & UIDP/50025/2020, financed by national funds through the FCT/MEC. This work was also supported by National Funds through FCT, I.P. Project No. IF/00726/2015. R. A. d. C. acknowledges the FCT Grant No. CEECIND/04697/2017.

Conflicts of Interest:

The authors declare no conflict of interest.

Appendix A Further results for the weak rule filter

Here we plot degeneracy distributions and cumulative distributions, and tabulate measures, for the weak rule filter, WR, for comparison with those given for the strong rule, SR, in the main body of the text above.

[Figure A1: panels (a,b) ring, WR, n = 30; (c,d) (3,8)-cage, WR; (e,f) torus 6 × […]. Axes: N(d) versus d (left) and N_cum(d) versus d (right).]

Figure A1. Degeneracy distributions (left) and cumulative degeneracy distributions (right) for outputs of the WR filter on selected deterministic graphs of degree 2 (a,b), 3 (c,d) and 4 (e,f).

[Figure A2: panels RRG2 WR, n = 30; RRG3 WR, n = 30; RRG4 WR, n = 30. Axes: N(d) versus d and N_cum(d) versus d.]

Figure A2. Degeneracy distributions and cumulative degeneracy distributions for outputs of the WR filter on random regular graphs of degree 2 (a,b), 3 (c,d) and 4 (e,f).

Table A1. Important values for the degeneracy distribution resulting from applying the weak rule (WR) filter to various graphs. The numbers $\sqrt[n]{M(n)}$, $\sqrt[n]{d_D(n)}$ and $\sqrt[n]{N(n)}$ approximate $z_g$, $z_d$ and $z_a$ respectively. We also give the relevance per node $H[d]/n$ and the resolution per node $H[y]/n$. Numbers for RRG($q$) and SW($q$) were obtained by averaging over 10 random realizations.

| graph | $n$ | $\sqrt[n]{M(n)}$ | $\sqrt[n]{d_D(n)}$ | $\sqrt[n]{N(n)}$ | $H[d]/n$ | $H[y]/n$ |
|---|---|---|---|---|---|---|
| Apollonian 2 | 7 | 1.95461 | 1.21901 | 1.91660 | 0.11519 | 0.66045 |
| Apollonian 3 | 16 | 1.94788 | 1.43435 | 1.91189 | 0.09594 | 0.64711 |
| (3,5)-cage | 10 | 1.91202 | 1.21481 | 1.84295 | 0.13705 | 0.62974 |
| (3,6)-cage | 14 | 1.91394 | 1.34590 | 1.83757 | 0.12271 | 0.63149 |
| (3,7)-cage | 24 | 1.91348 | 1.25055 | 1.83511 | 0.11462 | 0.63259 |
| (3,8)-cage | 30 | 1.91330 | 1.34897 | 1.83337 | 0.10559 | 0.63275 |
| (4,5)-cage | 19 | 1.95248 | 1.21101 | 1.91027 | 0.08217 | 0.65878 |
| (4,6)-cage | 26 | 1.95322 | 1.37995 | 1.91085 | 0.07188 | 0.65902 |
| (5,5)-cage 1 | 30 | 1.97461 | 1.16392 | 1.95220 | 0.04494 | 0.67453 |
| (5,5)-cage 2 | 30 | 1.97461 | 1.18854 | 1.95219 | 0.04495 | 0.67453 |
| (5,5)-cage 3 | 30 | 1.97461 | 1.21540 | 1.95220 | 0.04496 | 0.67453 |
| (5,5)-cage 4 | 30 | 1.97461 | 1.17585 | 1.95220 | 0.04495 | 0.67453 |
| torus 3 × […] | […] | […] | […] | […] | […] | […] |

References

1. Baxter, G.; da Costa, R.; Dorogovtsev, S.; Mendes, J. Complex distributions emerging in filtering and compression. Physical Review X, 011074.
2. Song, J.; Marsili, M.; Jo, J. Emergence and relevance of criticality in deep learning. arXiv preprint arXiv:1710.11324.
3. Baek, S.K.; Bernhardsson, S.; Minnhagen, P. Zipf's law unzipped. New Journal of Physics, 043004.
4. Hartmann, A.K.; Weigt, M. Phase Transitions in Combinatorial Optimization Problems; Wiley-VCH, Weinheim, 2005.
5. Mézard, M.; Parisi, G.; Virasoro, M. Spin Glass Theory and Beyond: An Introduction to the Replica Method and Its Applications; World Scientific, Singapore, 1987.
6. Keskar, N.S.; Mudigere, D.; Nocedal, J.; Smelyanskiy, M.; Tang, P.T.P. On large-batch training for deep learning: Generalization gap and sharp minima. arXiv preprint arXiv:1609.04836.
7. Dorogovtsev, S.N.; Goltsev, A.V.; Mendes, J.F.F. Critical phenomena in complex networks. Rev. Mod. Phys., 1275. doi:10.1103/RevModPhys.80.1275.
8. Cubero, R.J.; Jo, J.; Marsili, M.; Roudi, Y.; Song, J. Minimally sufficient representations, maximally informative samples and Zipf's law. arXiv preprint arXiv:1808.00249.
9. Marsili, M.; Mastromatteo, I.; Roudi, Y. On sampling and modeling complex systems. J. Stat. Mech.: Theory and Experiment, P09003.
10. Cubero, R.; Marsili, M.; Roudi, Y. Minimum Description Length codes are critical. Entropy, 755.
11. Meringer, M. Fast generation of regular graphs and construction of cages. Journal of Graph Theory, 137.
12. Hoggatt Jr., V.E. Fibonacci and Lucas Numbers; Houghton Mifflin, Boston, MA, 1969.
13. Graham, R.L.; Knuth, D.E.; Patashnik, O.; Liu, S. Concrete Mathematics: A Foundation for Computer Science; Addison-Wesley Publishing Company, Reading, Massachusetts, 1994.
14. Koshy, T. Fibonacci and Lucas Numbers with Applications; John Wiley & Sons, Inc., Hoboken, New Jersey, 2019.