Random Graphs with Prescribed K-Core Sequences: A New Null Model for Network Analysis

Abstract

In the analysis of large-scale network data, a fundamental operation is the comparison of observed phenomena to the predictions provided by null models: when we find an interesting structure in a family of real networks, it is important to ask whether this structure is also likely to arise in random networks with similar characteristics to the real ones. A long-standing challenge in network analysis has been the relative scarcity of reasonable null models for networks; arguably the most common such model has been the configuration model, which starts with a graph G and produces a random graph with the same node degrees as G. This leads to a very weak form of null model, since fixing the node degrees does not preserve many of the crucial properties of the network, including the structure of its subgraphs. Guided by this challenge, we propose a new family of network null models that operate on the k-core decomposition. For a graph G, the k-core is its maximal subgraph of minimum degree k; and the core number of a node v in G is the largest k such that v belongs to the k-core of G. We provide the first efficient sampling algorithm to solve the following basic combinatorial problem: given a graph G, produce a random graph sampled nearly uniformly from among all graphs with the same sequence of core numbers as G. This opens the opportunity to compare observed networks G with random graphs that exhibit the same core numbers, a comparison that preserves aspects of the structure of G that are not captured by more local measures like the degree sequence. We illustrate the power of this core-based null model on some fundamental tasks in network analysis, including the enumeration of network motifs.


Katherine Van Koevering [email protected] University

Austin R. Benson [email protected] University

Jon Kleinberg [email protected] University

CCS CONCEPTS

• Theory of computation → Graph algorithms analysis.

KEYWORDS

k-core, motif, Markov chain

ACM Reference Format:

Katherine Van Koevering, Austin R. Benson, and Jon Kleinberg. 2021. Random Graphs with Prescribed K-Core Sequences: A New Null Model for Network Analysis. In Proceedings of the Web Conference 2021 (WWW ’21), April 19–23, 2021, Ljubljana, Slovenia.

ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3442381.3450001

This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.

WWW ’21, April 19–23, 2021, Ljubljana, Slovenia. © 2021 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License. ACM ISBN 978-1-4503-8312-7/21/04. https://doi.org/10.1145/3442381.3450001

Random graphs have long played a central role in the area of network analysis, and one of their crucial uses has been as null models: a way of producing families of synthetic graphs that match observed network data on specific basic properties. Armed with effective null models, we can take an observed network phenomenon and ask whether a random graph with similar characteristics would exhibit the same phenomenon or not.

This comparison to random-graph baselines is an essential strategy, but of course the challenge is to define what we mean by a random graph “with similar characteristics.” In these types of analyses, a widely used null model (arguably the ubiquitous default) is the configuration model: given an observed network $G$, it generates random graphs sampled uniformly at random from among all graphs with the same degree sequence as $G$. The configuration model has provided a powerful way of asserting that observed properties of real networks are not simply a consequence of the node degrees, in that they would be unlikely in a random graph with the same degree sequence [14, 30].

Despite the widespread use of the configuration model, it is well understood to be an extremely weak null model, particularly for any question involving local rather than global structure.
In particular, a random graph with a given degree sequence will typically have very little non-trivial local structure in the neighborhood of any given node $v$, and very little non-trivial community structure. Thus, real networks will almost always look very different from the predictions of a random draw from the configuration model on any question involving structures like local motifs or dense communities; and these are some of the main questions for which people seek out random graphs as baselines.

Given these limitations of the configuration model, researchers have sought other null models in which we sample uniformly or near-uniformly over different families of graphs defined by characteristics of a given real network. Stanton and Pinar, for example, show how to sample from graphs that match an observed network $G$ not just in its degree sequence but in the pairs of degrees $(d_i, d_j)$ arising from the edges $(i, j)$ of $G$ [38]. This increases the specificity of the null model, but it continues to lack non-trivial local or community structure. An interesting recent step toward null models designed to exhibit local structure was taken by Orsini et al. [31], who generalized and put into practice the $dK$-series hierarchy of random graph models [22], in which the lowest levels match the degree sequence or degree correlations and the higher levels (the 2.1-series and 2.5-series) also match statistics on triangles, such as the average clustering coefficient or the sequence of clustering coefficients. This approach comes with an obstacle, however: there are no practical algorithms for uniformly sampling from the higher levels that match more than just degrees and pairs of degrees. As a result, while these models constitute valuable heuristics, they are not designed to provide guarantees on near-uniform sampling from the associated family of graphs.

Thus, a basic question has remained: given an observed graph $G$, can we construct a null model by sampling from a family of graphs matching characteristics of $G$ in such a way that the resulting random samples exhibit non-trivially rich local structure and community structure?

The present work: A null model based on the $k$-core. In this paper, we provide a new approach to this question, by showing how to uniformly sample from graphs that match $G$ in its $k$-core properties. The resulting samples provide random-graph baselines with richer graph-theoretic structure than the configuration model, and we show that they can lead to potentially different conclusions when employed as null models.

To formulate our approach, we begin with some basic definitions. Given a graph $G$ and a number $k$, the $k$-core of $G$ is the (unique) maximal subgraph of $G$ in which every node has degree at least $k$; it can be found efficiently by iteratively deleting nodes of degree strictly less than $k$ in $G$. (For sufficiently large $k$, $G$ will have no subgraph of minimum degree $k$, and hence the $k$-core of $G$ for these large $k$ will be the empty graph.) Building on this definition, we say that the core-value $c_v$ of a node $v$ is the largest $k$ such that $v$ belongs to the $k$-core of $G$. A long line of work in network analysis has shown that the successive $k$-cores of $G$, for $k = 1, 2, 3, \ldots$, provide considerable information about the local structure of $G$, including the regions where it exhibits denser connectivity [4, 11, 19, 23, 35].
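The iterative-deletion definition of core-values can be made concrete in a few lines. The following is a minimal Python sketch, assuming nodes labelled 0 to n − 1 and an edge-list input (the function name `core_numbers` and this input format are our own illustrative choices, not the paper's): it repeatedly removes a node of minimum remaining degree and assigns it the largest such degree seen so far, which equals its core-value.

```python
def core_numbers(n, edges):
    """Core-value of each node of a simple undirected graph on nodes 0..n-1.

    Peels off a node of minimum remaining degree at each step; the core-value
    of a node is the largest such minimum degree seen up to its removal.
    This simple version runs in O(n^2); bucket-based variants are linear.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degree = [len(nbrs) for nbrs in adj]
    alive = set(range(n))
    core = [0] * n
    k = 0
    while alive:
        v = min(alive, key=lambda u: (degree[u], u))  # ties broken by index
        k = max(k, degree[v])                         # threshold never decreases
        core[v] = k
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                degree[w] -= 1
    return core


# A triangle {0, 1, 2} with a pendant node 3 attached to node 0.
core = core_numbers(4, [(0, 1), (1, 2), (0, 2), (0, 3)])
print(core)                        # [2, 2, 2, 1]
print(sorted(core, reverse=True))  # core-value sequence: [2, 2, 2, 1]
```

In the example the triangle forms the 2-core, so its nodes have core-value 2, while the pendant node only survives in the 1-core.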
This information is equivalently captured by the sequence of core-values $c_1 \ge c_2 \ge \cdots \ge c_n$ of the $n$ nodes of $G$.

Given this, we ask the following question: by analogy with the configuration model, which samples uniformly from all graphs matching the degree sequence of $G$, can we sample uniformly (or near-uniformly) from all graphs matching the sequence of core numbers of $G$? In principle this could be done by brute-force rejection sampling, which is impractical, so our goal is to develop efficient algorithms for generating such samples. This type of sampler would provide a genuinely new type of null model, by producing random graphs that match an observed $G$ on richer forms of structure than the degree sequence.

Sampling a random graph with a given core-value sequence.

We answer this question affirmatively, by providing a method for near-uniform sampling from graphs with a given core-value sequence. We provide an overview of our strategy here, and give details in the subsequent sections.

Our basic approach is to define a Markov chain whose state space is the set of all graphs with the given core-value sequence, and whose transitions are a set of graph transformations that preserve the core-values. The crux of the method, and the heart of our analysis, is the definition of a sufficiently rich set of local transformations such that sequences of these transformations, composed together, are able to transform a starting graph $G$ into any other graph with the same core-value sequence. Applying random transformations to an underlying graph thus produces a Markov chain on the set of all graphs of a given core-value sequence. Our results establish that the Markov chain is strongly connected; and by adding appropriate probabilities on the “identity transformation” that leaves the graph unchanged, we can also ensure that the chain is aperiodic and has a uniform stationary distribution. Thus, by generating random trajectories in this Markov chain, we can sample nearly uniformly from the set of all graphs with a given core-value sequence.

As part of the analysis of this sampling procedure, we solve a problem of combinatorial interest in its own right. When we generate our Markov chain based on a given graph $G$, then $G$ itself provides a starting state for traversing the chain. But if we start instead from a given core-value sequence $c_1 \ge c_2 \ge \cdots \ge c_n$, then we face the following fundamental question: is the state space associated with $(c_1, c_2, \ldots, c_n)$ non-empty? That is, do there exist any graphs with this core-value sequence? And if so, can we construct one?
For degree sequences in simple graphs without loops or parallel edges, the corresponding realizability question (characterizing whether there exists a simple graph with a given degree sequence) is the subject of a famous theorem of Erdős and Gallai [6, 13] and of the constructive Havel–Hakimi algorithm [16, 17]. We provide a corresponding constructive characterization for the realizability of core-value sequences in simple graphs, and this gives us a starting point in the Markov chain when provided with a core-value sequence as input.

Through computational experiments, we demonstrate some of the basic properties of the samples produced by this Markov chain, including how they differ systematically from the output of the configuration model. We then demonstrate our methods in the context of a motif-counting application; the question here is whether the frequencies of particular small subgraphs in a given graph $G$ are significantly higher, significantly lower, or indistinguishable from the abundance of these subgraphs in a random-graph baseline. We show that a comparison to random graphs matching the degree sequence of $G$ may potentially lead to different conclusions than the same comparison to random graphs matching the core-value sequence of $G$; this points to some of the value in having multiple null models based on different families of random graphs.

It is useful to note a few additional points about these results. First, there is a large collection of additional families of random graphs that have been studied extensively in network analysis, including stochastic block models, preferential attachment graphs, Kronecker graphs, and many others. It would be interesting to relate our family of random graphs with a given core-value sequence to these.
But there is also an important distinction to be drawn in how these families are generally used in practice: they are typically used as generative models specified by optimizing a constant number of parameters and then generating graphs whose size $n$ may be arbitrarily large. In contrast, our approach is more closely aligned with models, such as the configuration model and more recent approaches such as the $dK$-series, that are based on uniform or near-uniform sampling from a family of graphs obtained by matching a base graph $G$ on a number of parameters (such as degrees or core-values) that grows linearly with the number of nodes.

Finally, we also note the following important open question. While we prove that random walks in our Markov chain will converge to the uniform stationary distribution on graphs of a fixed core-value sequence, it is an open question whether this chain can be proven to be rapidly mixing. This question aligns in interesting ways with the fact that, despite recent progress, we still do not have a full understanding of the mixing properties of Markov chains on graphs with fixed degree sequences either [14]. The questions in this area are quite challenging, though computational evidence is consistent with the premise that these chains tend to mix well in practice [25, 38]. As in those cases, our computational experiments also suggest that random walks are sampling our state space effectively in practice, indicating the utility of our Markov-chain methods. Establishing provable bounds is thus a valuable and potentially quite challenging further question, and recent techniques in the theory of rapidly mixing Markov chains might be valuable here.

There are a large number of random graph models that are used for network analysis, and we refer to surveys by Sala et al.
[33] and Drobyshevskiy and Turdakov [12] for a more expansive discussion. The models most relevant to our paper are those that are employed as “null models,” where the goal is to sample uniformly from the set of all graphs satisfying a certain property and then evaluate how likely other properties are under the null. The configuration model, which samples uniformly from the set of graphs with a prescribed degree sequence, is broadly used [1–3, 14, 27, 28, 30]. There are several variants of the configuration model for dealing with simple graphs, self-loops, and multi-edges; these details and a host of applications are covered in depth in the survey by Fosdick et al. [14]. Furthermore, there are a number of configuration-type models for other relational data models such as hypergraphs [5] and simplicial complexes [40]. The Chung–Lu model is similar to the configuration model but samples from graphs whose expected degree sequence is the same as the one that is given [7–9].

The space of graphs with a fixed degree sequence is a special case of the more general $dK$-graphs, which specify degree correlation statistics for subgraphs of size $d$ [22] (the configuration model corresponds to $d = 1$). Stanton and Pinar [38] developed a uniform sampler for the $d = 2$ case, which generates graphs with a prescribed joint degree distribution. Further generalizations of the $dK$-graphs include those with prescribed degree correlations and clustering statistics [10, 15, 31]. All of these techniques rely on MCMC samplers, but those for the $d \ge 3$ cases or for these generalized $dK$-graphs do not guarantee uniform samples. We also use MCMC sampling, but we can guarantee that the stationary distribution is uniform over the space of graphs with a specified $k$-core sequence.

A major application of null models is the determination of important small subgraph patterns, often called network motifs [18, 24, 26, 34, 37].
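To make the motif comparison concrete, here is a minimal Python sketch of the usual workflow with a degree-preserving baseline: count a motif (here, triangles) in the observed graph, count it in randomized graphs with the same degree sequence, and report a z-score. As an assumption for illustration, the baseline is generated by double edge swaps, a common Markov-chain approach to the simple-graph configuration model (discussed in the survey by Fosdick et al. [14]); the function names are hypothetical, and this is not the experimental pipeline of this paper.

```python
import random
import statistics


def count_triangles(edges):
    """Number of triangles in a simple undirected graph given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is found once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def double_edge_swap(edges, n_swaps, rng):
    """Degree-preserving randomization: (a, b), (c, d) -> (a, d), (c, b).

    Proposals that would create a self-loop or a parallel edge are rejected.
    Requires at least two edges.
    """
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        x, y = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[x], edges[y]
        if rng.random() < 0.5:
            c, d = d, c  # consider both ways of rewiring the pair
        if len({a, b, c, d}) < 4:
            continue
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[x], edges[y] = (a, d), (c, b)
        done += 1
    return edges


def triangle_zscore(edges, n_samples=50, seed=0):
    """Compare the observed triangle count to a degree-preserving baseline."""
    rng = random.Random(seed)
    observed = count_triangles(edges)
    null = [count_triangles(double_edge_swap(edges, 10 * len(edges), rng))
            for _ in range(n_samples)]
    mean, sd = statistics.mean(null), statistics.pstdev(null)
    z = (observed - mean) / sd if sd > 0 else float("nan")
    return observed, mean, z


# Four 4-cliques joined in a ring by single edges.
cliques = [[4 * b + i for i in range(4)] for b in range(4)]
edges = [(u, v) for q in cliques for i, u in enumerate(q) for v in q[i + 1:]]
edges += [(cliques[b][0], cliques[(b + 1) % 4][1]) for b in range(4)]
observed, mean, z = triangle_zscore(edges)
print(observed)  # 16 (four triangles in each 4-clique)
print(round(mean, 2), round(z, 2))
```

A large positive z-score marks the motif as over-represented relative to the degree-based baseline; the point made in this paper is that swapping in a different null model, such as one matching core-values, can change such conclusions.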
In such motif applications, small subgraphs are counted in the real network and in the null model, and those appearing much more or much less often in the data than under the null are deemed interesting for study. We include a set of experiments that revisits network motifs to see which are significant under our $k$-core null model.

For generating a random graph with a given core-value sequence $\mathbf{c} = c_1 \ge c_2 \ge \cdots \ge c_n$, we will proceed as follows. First, we define the state space $S_{\mathbf{c}}$ to be the set of all graphs with core-value sequence equal to $\mathbf{c}$. In this section, as in the rest of the paper, all graphs are undirected and simple, with no self-loops or parallel edges.

We will define a set of moves that apply to a graph $G \in S_{\mathbf{c}}$; each move transforms $G$ into another graph $G' \in S_{\mathbf{c}}$ (where possibly $G' = G$). The moves are defined such that if there is a move from $G$ to $G'$, there is also one from $G'$ to $G$. This allows us to define an undirected graph $H_{\mathbf{c}}$ on the state space $S_{\mathbf{c}}$, in which $G$ and $G'$ are connected by an edge (or potentially by several parallel edges) if there is a move that transforms $G$ into $G'$.

Let $\Delta$ be the maximum number of legal moves out of any one state $G \in S_{\mathbf{c}}$. We now define a random walk with self-loops as follows: for a graph $G$ with $D \le \Delta$ legal moves out of it, the random walk remains at $G$ with probability $1 - D/(2\Delta)$, and with probability $D/(2\Delta)$ it chooses one of the $D$ legal moves out of $G$ uniformly at random.

Our main technical result is to show that for any two graphs $G_1, G_2 \in S_{\mathbf{c}}$, it is possible to apply a sequence of moves that transforms $G_1$ into $G_2$. This means that the undirected graph $H_{\mathbf{c}}$ we have defined is connected, and so the random walk we have defined converges from any starting point to a unique stationary distribution that (by the definition of the transition probabilities) is uniform on $S_{\mathbf{c}}$.
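The transition rule above can be sketched generically. In this minimal Python sketch (our illustration, with hypothetical names), `neighbours(state)` returns the list of states reachable by one legal move, with repeated entries standing for parallel edges of $H_{\mathbf{c}}$, and `max_moves` plays the role of $\Delta$. Each individual move is taken with probability $1/(2\Delta)$ in both directions, which is what makes the uniform distribution stationary. The toy example uses a five-state path, whose states have different numbers of moves, rather than an actual space of graphs.

```python
import random
from collections import Counter


def lazy_walk_step(state, neighbours, max_moves, rng):
    """One step of the lazy walk: stay with prob. 1 - D/(2*Delta), else move.

    neighbours(state) lists the results of all legal moves out of `state`
    (a result may repeat if several moves lead to the same state).
    max_moves is an upper bound Delta on len(neighbours(s)) over all states.
    """
    moves = neighbours(state)
    if len(moves) > max_moves:
        raise ValueError("max_moves must bound the number of legal moves")
    if rng.random() < len(moves) / (2 * max_moves):
        return rng.choice(moves)  # each move has probability 1 / (2 * Delta)
    return state


def path_neighbours(s, size=5):
    """Toy state space: a path 0 - 1 - ... - size-1 (end states have 1 move)."""
    return [t for t in (s - 1, s + 1) if 0 <= t < size]


rng = random.Random(42)
state, visits, steps = 0, Counter(), 200_000
for _ in range(steps):
    state = lazy_walk_step(state, path_neighbours, max_moves=2, rng=rng)
    visits[state] += 1
# Visit frequencies should all be close to 1/5 = 0.2.
print({s: round(visits[s] / steps, 3) for s in sorted(visits)})
```

Choosing uniformly among the $D$ moves without the $D/(2\Delta)$ scaling would instead favour states with few moves; the self-loop probability exactly compensates for differing $D$, and keeping it at least $1/2$ also rules out periodicity.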
We can therefore run the Markov chain from an arbitrary starting point, and the graph we have after $t$ steps will become arbitrarily close to a uniform graph with core-value sequence $\mathbf{c}$ as $t \to \infty$.

For the starting point, we can either use a given input graph, or we can start directly from a core-value sequence $\mathbf{c}$ and construct a graph that realizes this sequence, if one exists. We show first how to efficiently perform this latter operation, constructing a graph from a core-value sequence.

Given a sequence $\mathbf{c} = c_1 \ge c_2 \ge \cdots \ge c_n$, how can we efficiently determine whether there is a graph that has this as its core-value sequence, and construct such a graph if one exists? Erdős and Gallai solved the analogous problem for degree sequences [6, 13], and here we give an efficient algorithm for core-value sequences.

Since core-values are defined by degrees of subgraphs, it is useful to have some initial terminology for degree sequences as well. Recall that a graph is called $d$-regular if all of its node degrees are equal to $d$. We observe the following.

(2.1) If $d$ is an even number, there exist $d$-regular graphs on every number of nodes $n \ge d + 1$. If $d$ is an odd number, there exists a $d$-regular graph on $n \ge d + 1$ nodes if and only if $n$ is even.

Proof. There are many natural constructions; here is one that is easy to describe. We label the nodes $0, 1, \ldots, n - 1$ and interpret addition modulo $n$ (thus imagining the nodes organized in clockwise order). When $d$ is even, connect each node $i$ to the $d/2$ nodes on either side of it in this order: $i - d/2, i - d/2 + 1, \ldots, i + d/2$ (excluding $i$ itself). When $d$ is odd and $n$ is even, connect each node $i$ to the nodes $i - (d-1)/2, i - (d-1)/2 + 1, \ldots, i + (d-1)/2$ (again excluding $i$), as well as to the “antipodal” node in the clockwise order, $i + n/2$.

Finally, we note that in any graph, the sum of the degrees of all nodes must be an even number (since every edge is counted twice), and therefore when $d$ is odd, any $d$-regular graph must have an even number of nodes. □

It will be useful to be able to talk about “almost regular” graphs when $d$ is odd and $n$ is odd, so we say that a graph $G$ is $d$-uniform if (i) $d$ is even and $G$ is $d$-regular; or (ii) $d$ is odd, $G$ has an even number of nodes, and $G$ is $d$-regular; or (iii) $d$ is odd, $G$ has an odd number of nodes, and $G$ consists of a single node of degree $d + 1$ with all other nodes having degree $d$. By slightly extending the construction from the proof of (2.1) to handle case (iii) in this definition as well, we have

(2.2) For all $d$ and all $n \ge d + 1$, there exists a $d$-uniform graph on $n$ nodes.

We now consider the set of $c$-cores of $G$, for $c = 0, 1, 2, \ldots$, where again the $c$-core $\Gamma_c$ is the unique maximal subgraph of minimum degree at least $c$. (In cases where it is clear from context, we will sometimes use $\Gamma_c$ to denote the set of nodes in the $c$-core, as well as the subgraph itself.) The following construction procedure for the $c$-cores of $G$ will be useful in the proofs as well.

• We first define $\Gamma_0$ to be all of $G$.
• Having constructed $\Gamma_c$ for a given $c$, we then repeatedly delete any node of degree at most $c$ from $\Gamma_c$, updating the degrees as we go, until no more deletions are possible. (Note that while all nodes in $\Gamma_c$ have degree at least $c$ at the start of this deletion process, some degrees in $\Gamma_c$ might drop below $c$ in the middle of the process.) Once the deletions from $\Gamma_c$ have stopped, all of the remaining nodes have degree at least $c + 1$. Let $H$ be this subgraph of $G$. $H$ has minimum degree $c + 1$; and since no node deleted so far can belong to any subgraph of minimum degree $c + 1$, we see that $H$ is the unique maximal subgraph with this property. Thus $H = \Gamma_{c+1}$.
• We proceed in this way until we encounter a $c$ for which $\Gamma_c$ is empty; at that point, we define $c^* = c - 1$, and declare $\Gamma_{c^*}$ to be the top core of $G$.
• We will refer to the order in which the nodes were deleted from $G$ in this process as a core deletion order; note that there is some amount of freedom in choosing the order in which nodes are deleted, and all such orders constitute valid core deletion orders.

We first consider the case in which all core-values in an $n$-node graph $G$ are the same number $c$. Note that in this case, we must have $n \ge c + 1$, since each node must have at least $c$ neighbors. Conversely, as long as $n \ge c + 1$, we observe that a $c$-uniform graph on $n$ nodes has all core-values equal to $c$. Thus we have a first realization result for core-values, for the case where all values are the same.

(2.3) For a core-value sequence $\mathbf{c} = c_1 \ge \cdots \ge c_n$ where all $c_i = c$, there exists a graph with this core-value sequence $\mathbf{c}$ if and only if $n \ge c + 1$.

Now, we consider an arbitrary core-value sequence $\mathbf{c} = c_1 \ge \cdots \ge c_n$. As in (2.3), the highest $c_1 + 1$ values must be the same in order for node 1 to have a sufficient number of neighbors in the top core $\Gamma_{c_1}$. Thus, suppose $c_{c_1+1} = c_1$.

Now, suppose $|\Gamma_{c_1}| = n_1$, where $n_1 \ge c_1 + 1$. Let $H$ be a $c_1$-uniform graph on the nodes $1, 2, \ldots, n_1$. For each node $j > n_1$, we attach it to an arbitrary set of $c_j$ nodes in $H$, resulting in a graph $G$ on the nodes $1, 2, \ldots, n$. We now claim

(2.4) The graph $G$ has core-value sequence $\mathbf{c} = c_1 \ge \cdots \ge c_n$.

Proof. By construction, the $n_1$ nodes $i$ with $1 \le i \le n_1$ all have $c_i = c_1$; they all belong to $H$ and hence have core-value equal to $c_1$. For $j > n_1$, note that $j$ belongs to the subgraph induced on the nodes $\{1, 2, \ldots, j\}$; since the minimum degree in this subgraph is at least $c_j$, we have $j \in \Gamma_{c_j}$. But since the degree of $j$ is $c_j$, we also have $j \notin \Gamma_{c_j + 1}$, and hence the core-value of $j$ is $c_j$, as required. □

From (2.4) it follows that $G$ realizes the given core-value sequence $\mathbf{c}$. Since the only assumption on $\mathbf{c}$ was that $c_{c_1+1} = c_1$, we obtain a characterization of realizable core-value sequences, stated as (2.5).
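The construction behind (2.1), (2.2) and (2.4) translates directly into code. Below is a minimal Python sketch with hypothetical function names and 0-based node labels (node $i$ of the text is index $i - 1$). Two choices are ours rather than the paper's: the paper only says the construction of (2.1) can be slightly extended to case (iii) of $d$-uniformity, and here we do so by taking the $(d+1)$-regular circulant and deleting the matching $(1,2), (3,4), \ldots$; and each lower-core node $j$ is attached to the first $c_j$ top-core nodes, which is one instance of the “arbitrary set” in (2.4). A peeling routine is included only to check the result.

```python
def uniform_graph(d, n):
    """Edges of a d-uniform graph on nodes 0..n-1 (requires n >= d + 1).

    Cases (i)-(ii): the circulant construction from the proof of (2.1).
    Case (iii), d and n both odd (our own extension): build the (d+1)-regular
    circulant, then drop (1,2), (3,4), ..., (n-2, n-1), so node 0 keeps
    degree d + 1 and every other node has degree d.
    """
    if n < d + 1:
        raise ValueError("a d-uniform graph needs at least d + 1 nodes")
    edges = set()

    def add(u, v):
        edges.add((min(u, v), max(u, v)))

    odd_case = d % 2 == 1 and n % 2 == 1
    half = (d + 1) // 2 if odd_case else d // 2
    for i in range(n):
        for s in range(1, half + 1):  # `half` neighbours on each side of i
            add(i, (i + s) % n)
    if d % 2 == 1 and n % 2 == 0:
        for i in range(n // 2):
            add(i, i + n // 2)        # antipodal partner
    if odd_case:
        for i in range(1, n - 1, 2):
            edges.discard((i, i + 1))
    return edges


def realize_core_sequence(c):
    """Edge set on nodes 0..n-1 in which node i has core-value c[i].

    c must be non-increasing; returns None when no simple graph exists.
    """
    n = len(c)
    if any(c[i] < c[i + 1] for i in range(n - 1)):
        raise ValueError("core-value sequence must be non-increasing")
    if n == 0:
        return set()
    top = c[0]
    if n < top + 1 or c[top] != top:  # c_{c_1 + 1} = c_1 in 1-based indexing
        return None
    n1 = c.count(top)                  # size of the top core
    edges = uniform_graph(top, n1)
    for j in range(n1, n):             # attach j to c[j] top-core nodes
        edges |= {(t, j) for t in range(c[j])}
    return edges


def core_numbers(n, edges):
    """Core-value of each node 0..n-1, by repeatedly peeling a min-degree node."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deg = [len(a) for a in adj]
    alive, core, k = set(range(n)), [0] * n, 0
    while alive:
        v = min(alive, key=lambda u: (deg[u], u))
        k = max(k, deg[v])
        core[v] = k
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                deg[w] -= 1
    return core


c = [3, 3, 3, 3, 3, 2, 1, 0]
G = realize_core_sequence(c)
print(core_numbers(len(c), G) == c)      # True
print(realize_core_sequence([2, 2, 1]))  # None: only two nodes share the top value
```

The check `c[top] != top` is exactly the condition of (2.5) shifted to 0-based indexing, and the whole construction takes time linear in the number of edges it produces.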
(2.5) A sequence $\mathbf{c} = c_1 \ge \cdots \ge c_n$ is the core-value sequence of a simple graph if and only if $c_{c_1+1} = c_1$; and when this condition holds, there is an efficient algorithm to construct a graph with core-value sequence equal to $\mathbf{c}$.

In the previous subsection, we showed how to construct a single member of the state space $S_{\mathbf{c}}$ consisting of all graphs with core-value sequence $\mathbf{c} = c_1 \ge \cdots \ge c_n$. We now define a move set on this state space, providing ways of transforming a given graph in $S_{\mathbf{c}}$ into other graphs in $S_{\mathbf{c}}$. For each move that transforms a graph $G$ to $G'$, there will also be a move transforming $G'$ to $G$; thus, the graph $H_{\mathbf{c}}$ on $S_{\mathbf{c}}$, in which $G$ and $G'$ are adjacent when there is a move transforming one directly into the other, is an undirected graph.

Let $G$ be a graph with core-value sequence $\mathbf{c}$. We note that sorting the nodes in decreasing order of their indices, $n, n - 1, \ldots, 2, 1$, constitutes a core deletion order for $G$, and we will use this fact at certain points in the analysis.

The first set of moves is

• Move 1. Add and Delete. For any nodes $(i, j)$ not connected by an edge in $G$, we can add the edge $(i, j)$ provided that no core-values are affected. Similarly, for an edge $(i, j)$ of $G$, we can delete $(i, j)$ provided that no core-values are affected.

Given that we only add or delete edges when the core-values are unaffected, the resulting graph $G'$ is also in $S_{\mathbf{c}}$ by definition.

The remaining moves alter multiple edges at once, while preserving all core-values. The second set of moves is

• Move 2. Move Endpoint. Let $h, i, j$ be nodes of $G$ such that $c_j < \min(c_h, c_i)$, with $(h, j)$ an edge of $G$ and $(i, j)$ not an edge of $G$. We delete $(h, j)$ and insert $(i, j)$.

We claim

(2.6) If $G \in S_{\mathbf{c}}$ and we apply an instance of Move Endpoint involving nodes $h, i, j$, then the resulting graph $G'$ is also in $S_{\mathbf{c}}$.

Proof. Consider the core deletion order $n, n - 1, \ldots, 2, 1$ in $G$; we consider nodes in this same order in $G'$ and analyze their core-values.
Note that $j > \max(h, i)$ since $c_j < \min(c_h, c_i)$.

First, all nodes $j' > j$ have the same edges into $\{1, 2, \ldots, j' - 1\}$ in both $G$ and $G'$, so all of them will get the same core-value and can be deleted in the same order. Next, $j$ has the same number of edges into $\{1, 2, \ldots, j - 1\}$ in both $G$ and $G'$, so it can still be deleted when we encounter it in this order in $G'$, and it will get the same core-value as before. Finally, once $j$ is deleted, the subgraphs of $G$ and $G'$ induced on the set of nodes $\{1, 2, \ldots, j - 1\}$ are identical, and so the ordering $j - 1, j - 2, \ldots, 2, 1$ forms a core deletion order in both.

From this, it follows that the sequence of core-values is the same in $G$ and $G'$, and hence the Move Endpoint operation preserves the core-value sequence. □

The third set of moves is

• Move 3. Core Collapse and Core Expand. Let $h, i, j$ be nodes of $G$ with $c_h > c_i$ and $c_i = c_j$. If $(h, i)$ and $(h, j)$ are both edges of $G$ but $(i, j)$ is not, the Core Collapse operation deletes $(h, i)$ and $(h, j)$ and inserts $(i, j)$, provided that no core-values are affected. Analogously, if $(i, j)$ is an edge of $G$ but $(h, i)$ and $(h, j)$ are not, the Core Expand operation deletes $(i, j)$ and inserts $(h, i)$ and $(h, j)$, again provided that no core-values are affected.

We will also allow “half-move” versions of Core Collapse and Core Expand, again only in the case where no core-values are affected: in the half-move version of Core Collapse, only one of $(h, i)$ or $(h, j)$ is deleted; and in the half-move version of Core Expand, only one of $(h, i)$ or $(h, j)$ is inserted.

This concludes the description of the moves. We now analyze their global properties in the state space $S_{\mathbf{c}}$.
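To make the move set concrete, the following minimal Python sketch enumerates every instance of Moves 1 to 3 on a small graph, verifying core-values by brute-force recomputation. The function names are hypothetical, nodes are labelled 0 to n − 1, and Move Endpoint is not re-checked because (2.6) guarantees it. The precise preconditions used for the half-moves (a half Core Collapse deletes $(h, i)$ and inserts $(i, j)$; a half Core Expand does the reverse) are our reading of the description above, not a specification from the paper. Recomputing core-values for every candidate makes this suitable only for tiny graphs.

```python
from itertools import combinations, permutations


def core_numbers(n, edges):
    """Core-value of each node 0..n-1, by repeatedly peeling a min-degree node."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deg = [len(a) for a in adj]
    alive, core, k = set(range(n)), [0] * n, 0
    while alive:
        v = min(alive, key=lambda u: (deg[u], u))
        k = max(k, deg[v])
        core[v] = k
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                deg[w] -= 1
    return core


def edge(u, v):
    """Canonical (smaller, larger) form of an undirected edge."""
    return (u, v) if u < v else (v, u)


def legal_moves(n, edges):
    """All instances of Moves 1-3, each as a pair (removed_edges, added_edges)."""
    E = {edge(u, v) for u, v in edges}
    c = core_numbers(n, E)

    def preserves(removed, added):
        return core_numbers(n, (E - set(removed)) | set(added)) == c

    moves = []
    # Move 1: add a missing edge or delete an existing one.
    for e in combinations(range(n), 2):
        move = ((e,), ()) if e in E else ((), (e,))
        if preserves(*move):
            moves.append(move)
    # Move 2 (Move Endpoint): c_j < min(c_h, c_i), (h, j) present, (i, j) absent.
    for h, i, j in permutations(range(n), 3):
        if c[j] < min(c[h], c[i]) and edge(h, j) in E and edge(i, j) not in E:
            moves.append(((edge(h, j),), (edge(i, j),)))
    # Move 3: c_h > c_i == c_j; full and half Core Collapse / Core Expand.
    for h, i, j in permutations(range(n), 3):
        if not c[h] > c[i] == c[j]:
            continue
        hi, hj, ij = edge(h, i), edge(h, j), edge(i, j)
        candidates = []
        if i < j and hi in E and hj in E and ij not in E:
            candidates.append(((hi, hj), (ij,)))  # full Core Collapse
        if i < j and ij in E and hi not in E and hj not in E:
            candidates.append(((ij,), (hi, hj)))  # full Core Expand
        if hi in E and ij not in E:
            candidates.append(((hi,), (ij,)))     # half Core Collapse (our reading)
        if ij in E and hi not in E:
            candidates.append(((ij,), (hi,)))     # half Core Expand (our reading)
        moves.extend(m for m in candidates if preserves(*m))
    return moves


def apply_move(edges, move):
    """Return the edge set obtained by applying (removed, added) to `edges`."""
    removed, added = move
    return ({edge(u, v) for u, v in edges} - set(removed)) | set(added)


# Core-values [3, 3, 3, 3, 2, 2, 1]: a 4-clique on 0..3, nodes 4 and 5 joined
# to 0 and 1, and node 6 joined to 0.
G = {(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
     (0, 4), (1, 4), (0, 5), (1, 5), (0, 6)}
moves = legal_moves(7, G)
endpoint = (((0, 6),), ((1, 6),))                # Move Endpoint: h=0, i=1, j=6
print(endpoint in moves)                         # True
print(core_numbers(7, apply_move(G, endpoint)))  # [3, 3, 3, 3, 2, 2, 1]
```

Because the reverse of every enumerated move is enumerated from the resulting graph as well, the move graph built this way is undirected, as required of $H_{\mathbf{c}}$.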
Recall that our strategy is to use the set of moves specified in the previous subsection to define an undirected graph $H_{\mathbf{c}}$ on the state space $S_{\mathbf{c}}$ of all graphs with core-value sequence $\mathbf{c}$. We now show that $H_{\mathbf{c}}$ is connected; that is, for any graphs $G_1, G_2 \in S_{\mathbf{c}}$, there is a sequence of moves that transforms $G_1$ into $G_2$. If we then define a random walk on $H_{\mathbf{c}}$ with each edge out of a given state chosen uniformly, and self-loop probabilities at each state set as at the start of the section, the resulting process is connected and aperiodic, with a uniform stationary distribution that it converges to from any starting point.

It therefore remains only to establish the connectivity of $H_{\mathbf{c}}$. To do this, we consider two arbitrary graphs $G_1$ and $G_2$ in $S_{\mathbf{c}}$, and we describe a path connecting $G_1$ and $G_2$ in $H_{\mathbf{c}}$. In order to do this, it is useful to recall a small amount of terminology: the top core, as before, consists of the nodes with the highest core-value $c_1$. Suppose that there are $n_1$ such nodes; that is, $c_{n_1} = c_1$ and $c_{n_1+1} < c_1$. Let $V = \{1, 2, \ldots, n_1\}$ be the set of nodes in the top core. Finally, for simplicity of exposition, we will assume for most of this discussion that $c_1 > 2$. This condition applies to all the intended applications of our methods, since graphs with $c_1 \le 2$ are much simpler in structure than the networks we work with in general. Moreover, the assumption $c_1 > 2$ can be removed with additional work; at the end of the section we describe how to achieve analogous results for the remaining cases of $c_1 = 1$ and $c_1 = 2$.

We construct the path from $G_1$ to $G_2$ in a sequence of steps. Since all of our moves have analogues that perform them in the “reverse” direction, we can describe the construction of this path working simultaneously from both of its endpoints, $G_1$ and $G_2$.

Step 1: Linking all edges to the top core.

We first apply a sequence of moves to $G_1$ designed to produce a graph $G_1'$ that has the same core-value sequence $\mathbf{c}$, in which all edges have at least one end in the set $V$.

For a number $c$, we use $\Gamma_c$ as before to denote the $c$-core. We consider the nodes following the core deletion order $n, n - 1, \ldots, 2, 1$. When we get to a node $i$, it has degree $c_i$ by the definition of a core deletion order. If $c_i < c_1$, then we consider each of $i$'s incident edges $(i, j)$ in turn, and process this edge according to the following set of cases.

• If $c_j = c_1$, then we do not need to do anything, since the edge $(i, j)$ already has one end in the top core $V$.
• If $c_1 > c_j > c_i$, then we apply Move Endpoint to delete $(i, j)$ and replace it with an edge $(h, i)$ for any node $h \in V$ that is not currently a neighbor of $i$. Such a node $h$ must exist since $|V| \ge c_1 + 1$ while the degree of $i$ is $c_i < c_1$. By (2.6), all core-values are preserved by this operation.
• If $c_j = c_i$ and the degree of node $j$ is equal to $c_j$, then we apply the full version of the Core Expand operation, replacing the edge $(i, j)$ with two edges $(h, i)$ and $(h, j)$ to any node $h \in V$ that is not a neighbor of either. (By applying a sequence of Move Endpoint operations prior to this Core Expand operation, we can ensure that there is at least one node $h \in V$ that is not a neighbor of either $i$ or $j$.) We claim that $i$ and $j$ still have core-values equal to $c_i$ after this operation: their core-values are at least $c_i$ since the nodes in $\Gamma_{c_i}$ still have minimum degree $c_i$; and their core-values are at most $c_i$ since their degrees are equal to $c_i$. Since all other nodes have the same core-values before and after this operation, the core-value sequence of the graph has been preserved.
• If $c_j = c_i$ and the degree of node $j$ is greater than $c_j$, then we apply the half-move version of the Core Expand operation, replacing the edge $(i, j)$ with the single edge $(h, i)$ for any node $h \in V$ that is not a neighbor of $i$.
In this case too, we claim that $i$ and $j$ still have core-values equal to $c_i$ after this operation. As before, their core-values are at least $c_i$ since the nodes in $\Gamma_{c_i}$ still have minimum degree $c_i$. The core-value of $i$ is at most $c_i$ since its degree is $c_i$. The core-value of $j$ is at most $c_j$ since we have removed an edge incident to it, which cannot raise its core-value. Since all other nodes have the same core-values before and after the operation, the core-value sequence of the graph has been preserved in this case as well.

We apply this process to each edge incident to node $i$ in turn, and we proceed node by node through the core deletion sequence in this way. At the end of this procedure, we have the desired graph $G_1'$: it has the same core-value sequence $\mathbf{c}$, and all its edges have at least one end in the set $V_1$. We apply the same process to $G_2$ as well, producing a graph $G_2'$ that also has the property that every edge has at least one end in $V_1$.

WWW ’21, April 19–23, 2021, Ljubljana, Slovenia. Katherine Van Koevering, Austin R. Benson, and Jon Kleinberg.

Step 2: Converting the top core to a $c_1$-uniform graph.

Starting from $G_1'$, we next apply a sequence of moves so that the edges with at least one end outside of $V_1$ remain the same, but the subgraph induced on $V_1$ becomes $c_1$-uniform. Note that this will preserve the core-value sequence, since all nodes in $V_1$ will still have core-value equal to $c_1$.
Making the subgraph on $V_1$ into a $c_1$-uniform graph also uniquely determines its degree sequence, since the degree sequence of a $d$-uniform graph on $n$ nodes is uniquely determined by $d$ and $n$: it consists entirely of the value $d$ when at least one of $d$ or $n$ is even, and it consists of a single instance of $d + 1$ with all other values equal to $d$ when both $d$ and $n$ are odd.

To make the subgraph on $V_1$ $c_1$-uniform, it suffices to apply a sequence of moves resulting in the following property:

(∗) Either (i) all degrees in the subgraph induced on $V_1$ are equal to $c_1$, or (ii) one node in the subgraph on $V_1$ has degree $c_1 + 1$, and all others have degree $c_1$.

Extending the point in the previous paragraph, which of cases (i) or (ii) occurs is determined by $c_1$ and $n_1$: since the sum of the degrees of all nodes in the subgraph on $V_1$ must be even, we will be in case (i) when at least one of $c_1$ or $n_1$ is even, and otherwise we will be in case (ii).

To achieve property (∗) starting from $G_1'$, we first delete any edge that joins two nodes $i$ and $j$ in $V_1$ that both have degree strictly greater than $c_1$. Since $i$ and $j$ still belong to a subgraph of minimum degree $c_1$, their core-values are still at least $c_1$; and since the deletion of the edge cannot have increased their core-values, they are still at most $c_1$ as well. After this, we may assume that there are no edges joining two nodes in $V_1$ that both have degree strictly greater than $c_1$.

Next, consider any node $h$ in $V_1$ of degree at least $c_1 + 2$. By the transformations in the previous paragraph, all of its neighbors have degree equal to $c_1$. Let $S$ be this set of neighbors. Each node in $S$ has an edge to at most $c_1 - 1$ other nodes in $S$, and so there is at least one pair of nodes in $S$, say $i$ and $j$, that are not joined by an edge. We apply the following transformation: we first add the edge $(i, j)$, and then we delete the edges $(h, i)$ and $(h, j)$. After this sequence of three Add and Delete moves, the degrees of $i$ and $j$ remain the same, and the degree of $h$ has been reduced by two.
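
The degree-reduction move above can be checked mechanically on a small example. The Python sketch below is illustrative only: graphs are dicts of neighbour sets, `core_numbers` and `reduce_high_degree` are hypothetical helper names (not the authors' code), and the wheel $W_5$ (a hub joined to a 5-cycle, so $c_1 = 3$ and the whole graph is the top core) is a toy instance chosen for this illustration. The sketch recomputes all core numbers after every single Add or Delete, which is far slower than the argument in the text but makes the claim easy to test.

```python
from itertools import combinations


def core_numbers(adj):
    """Core number of every node, by repeatedly peeling a minimum-degree node."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    core, removed, k = {}, set(), 0
    while len(removed) < len(adj):
        v = min((u for u in adj if u not in removed), key=deg.__getitem__)
        k = max(k, deg[v])  # the core level reached so far never decreases
        core[v] = k
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                deg[w] -= 1
    return core


def reduce_high_degree(adj, h, c1):
    """Lower deg(h) by two: add (i, j), then delete (h, i) and (h, j).

    i and j are non-adjacent neighbours of h that have degree c1. Core numbers
    are re-checked after every single Add or Delete (hypothetical helper).
    """
    before = core_numbers(adj)
    nbrs = [v for v in sorted(adj[h]) if len(adj[v]) == c1]
    i, j = next((a, b) for a, b in combinations(nbrs, 2) if b not in adj[a])
    for add, (u, v) in ((True, (i, j)), (False, (h, i)), (False, (h, j))):
        if add:
            adj[u].add(v)
            adj[v].add(u)
        else:
            adj[u].discard(v)
            adj[v].discard(u)
        if core_numbers(adj) != before:
            raise RuntimeError(f"core numbers changed after touching edge {(u, v)}")
    return i, j


# Wheel W_5: hub 0 joined to the 5-cycle 1-2-3-4-5-1; every core number is 3.
wheel = {0: {1, 2, 3, 4, 5}}
for r in range(1, 6):
    wheel[r] = {0, r % 5 + 1, (r - 2) % 5 + 1}

print(reduce_high_degree(wheel, 0, 3))  # (1, 3)
```

Here $n_1 = 6$ is even, so by (∗) the top core should end up with every degree equal to 3, and in this run the hub's degree drops from 5 to 3 while all core numbers stay at 3 after each of the three moves.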
Since all three nodes $h$, $i$, $j$, as well as all other nodes of $V_1$, still have degree at least $c_1$, all core-values in $V_1$ remain $c_1$. The final thing we must verify is that in the middle of this sequence, after adding the edge $(i, j)$, we did not increase any core-values strictly above $c_1$, which would take our constructed path out of the state space $\mathcal{S}_{\mathbf{c}}$. To show this, suppose that after adding $(i, j)$ (thereby increasing the degrees of $i$ and $j$ to $c_1 + 1$), we delete all nodes outside $V_1$ and all nodes of degree at most $c_1$ in $V_1$. By the guarantee from the previous paragraph that there were no edges connecting two nodes of degree greater than $c_1$ in $V_1$, the resulting subgraph consists of a set of isolated nodes together with a triangle on $\{h, i, j\}$. By our assumption that $c_1 > 2$ (in fact, it is sufficient here that $c_1 > 1$), no node in this subgraph has degree greater than $c_1$, and hence the graph after the addition of the edge $(i, j)$ continues to have an empty $(c_1 + 1)$-core.

If we repeatedly apply the operation in the previous paragraph, we arrive at a point where the subgraph on $V_1$ only has nodes of degrees $c_1$ and $c_1 + 1$, and there are no edges between any of the nodes of degree $c_1 + 1$. Finally, we perform a sequence of moves to reduce the number of nodes of degree $c_1 + 1$ to at most one. Thus, suppose there are two nodes $h$ and $\ell$ that each have degree $c_1 + 1$. There are two cases to consider:

(i) If there is a node $i$ that is a neighbor of one of $h, \ell$ but not the other (say that $i$ is a neighbor of $h$ but not of $\ell$), then we add the edge $(i, \ell)$ and then delete the edge $(h, i)$. After doing this, $h$ has degree $c_1$ and $\ell$ has degree $c_1 + 2$; by applying the procedure in the previous paragraph, we can then reduce the degree of $\ell$ to $c_1$ while preserving all other node degrees. In this way, we have strictly reduced the number of nodes of degree $c_1 + 1$.

(ii) Suppose that the neighbor sets of $h$ and $\ell$ in $V_1$ are the same. Let $T$ be this set of common neighbors of $h$ and $\ell$.
We have $|T| = c_1 + 1$, each node in $T$ has degree $c_1$, and for each such node, two of its edges go to $h$ and $\ell$, so at most $c_1 - 2$ edges go to other nodes in $T$. Thus there is a pair of nodes in $T$, say $i$ and $j$, that are not joined by an edge. We add the edge $(i, j)$ and then delete the edges $(h, i)$ and $(j, \ell)$; as above, this preserves all core-values after each move, and it strictly reduces the number of nodes of degree $c_1 + 1$.

Since we can apply at least one of these two cases to strictly reduce the number of nodes of degree $c_1 + 1$ whenever the number of such nodes is at least two, we can iteratively perform this reduction until the number of nodes of degree $c_1 + 1$ is at most one.

We have therefore arrived at the desired outcome: a graph $G_1''$ that agrees with $G_1'$ on all edges not contained entirely in $V_1$, and whose subgraph on $V_1$ is $c_1$-uniform. We perform the same process on $G_2'$, arriving at a graph $G_2''$ whose subgraph on $V_1$ is also $c_1$-uniform.

Step 3: Transforming one $c_1$-uniform top core into another.

For a set of nodes $S$ in a graph $G$, let $G[S]$ denote the subgraph of $G$ induced on $S$. Since the subgraphs $G_1''[V_1]$ and $G_2''[V_1]$ are both $c_1$-uniform, their multisets of degrees are the same. If each contains a node of degree $c_1 + 1$, we choose an arbitrary bijection $\pi$ from $\{1, 2, \ldots, n_1\}$ to itself that maps the node of degree $c_1 + 1$ in $G_1''[V_1]$ to the node of degree $c_1 + 1$ in $G_2''[V_1]$. Henceforth we can take this bijection as implicit, and assume for simplicity that the node of degree $c_1 + 1$ (if any) is the same in $G_1''[V_1]$ and $G_2''[V_1]$. Since the degree sequences of $G_1''[V_1]$ and $G_2''[V_1]$ are the same, it is known via results on the switch chain [14] that we can transform one of these subgraphs into the other by a sequence of moves of the following form: find four nodes $\{h, i, j, \ell\}$ for which $(h, i)$ and $(j, \ell)$ are edges but $(h, j)$ and $(i, \ell)$ are not, and replace the edges $(h, i)$ and $(j, \ell)$ with $(h, j)$ and $(i, \ell)$.
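
For reference, a single switch (the double edge swap underlying the switch chain discussed in [14]) can be sketched in Python as below. The function name `switch` and the dict-of-sets graph representation are assumptions made for illustration. The point of the sketch is that the move leaves every degree unchanged, which is why switches can connect graphs that share a degree sequence.

```python
def switch(adj, h, i, j, l):
    """Switch move: replace edges (h, i), (j, l) with (h, j), (i, l)."""
    if len({h, i, j, l}) != 4:
        raise ValueError("the four nodes must be distinct")
    if i not in adj[h] or l not in adj[j]:
        raise ValueError("(h, i) and (j, l) must be edges")
    if j in adj[h] or l in adj[i]:
        raise ValueError("(h, j) and (i, l) must be non-edges")
    for u, v in ((h, i), (j, l)):  # remove the two old edges
        adj[u].discard(v)
        adj[v].discard(u)
    for u, v in ((h, j), (i, l)):  # add the two new edges
        adj[u].add(v)
        adj[v].add(u)


# Example: the 6-cycle 0-1-2-3-4-5-0.
cycle = {v: {(v - 1) % 6, (v + 1) % 6} for v in range(6)}
degrees_before = {v: len(nbrs) for v, nbrs in cycle.items()}
switch(cycle, 0, 1, 3, 4)
print({v: len(nbrs) for v, nbrs in cycle.items()} == degrees_before)  # True
```

Every node gains one edge for each edge it loses, so each switch keeps the graph inside the set of graphs with the given degree sequence.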
In our move set we do not have this switch operation available as a single move, but we can accomplish it by first adding the edges $(h, j)$ and $(i, \ell)$ and then deleting the edges $(h, i)$ and $(j, \ell)$. As before, we simply need to verify that in the middle of this sequence of two Add operations and two Delete operations, we do not cause any nodes to achieve a core-value greater than $c_1$. To establish this, suppose that after the two Add operations, we delete all nodes outside $V_1$ together with all nodes in $V_1$ of degree $c_1$. The only nodes remaining are the four nodes $\{h, i, j, \ell\}$ together with the node $m$ of degree $c_1 + 1$ (if there is one), and the edges $(h, i)$, $(j, \ell)$, $(h, j)$, and $(i, \ell)$, as well as any edges between $\{h, i, j, \ell\}$ and $m$. Since this 5-node subgraph is not the complete graph $K_5$ (it lacks the edges $(h, \ell)$ and $(i, j)$), it has an empty 4-core. By our assumption that $c_1 > 2$, this means that there is no subgraph of minimum degree $c_1 + 1$ after deleting all nodes of degree at most $c_1$, and hence no node acquires a core-value greater than $c_1$ via our sequence of moves.

By applying a sequence of these switch moves, each implemented as two Add moves followed by two Delete moves, we can thus produce a graph $G_o$ that agrees with $G_1''$ on all edges with at least one end outside $V_1$, and such that the subgraphs $G_o[V_1]$ and $G_2''[V_1]$ are isomorphic.

Step 4: Concatenating the Subpaths.

The graphs $G_o$ and $G_2''$ are almost the same: their induced subgraphs on $V_1$ are isomorphic, and for each node $j > n_1$, the node $j$ has degree $c_j$ in both, with all $c_j$ edges going to nodes in $V_1$. The ends of these $c_j$ edges from $j$ to $V_1$ might be different in $G_o$ and $G_2''$, but by applying a sequence of Move Endpoint operations, we can shift the endpoints of $j$'s edges to $V_1$ so that they become the same in the two graphs. Applying such operations to every $j > n_1$, we can thus transform $G_o$ into $G_2''$ by a sequence of Move Endpoint operations for the edges from each node $n_1 + 1, n_1 + 2, \ldots, n$ into $V_1$.

Finally, we can concatenate all the subpaths in $H_{\mathbf{c}}$ that we have defined using our set of moves. This concatenation provides the path from $G_1$ to $G_2$ in $H_{\mathbf{c}}$: it goes via the intermediate graphs $G_1, G_1', G_1'', G_o, G_2'', G_2', G_2$, and the path between each consecutive pair of graphs on this list uses the sequences of moves described in this subsection.

Recall from the beginning of this section that if $D(G)$ is the number of moves out of a graph $G \in \mathcal{S}_{\mathbf{c}}$, and $\Delta = \max_{G \in \mathcal{S}_{\mathbf{c}}} D(G)$, we define a uniform random walk on the graph $H_{\mathbf{c}}$ in which the self-loop probability at $G$ is $1 - D(G)/(2\Delta)$. We have thus established the following.

(2.7) The graph $H_{\mathbf{c}}$ defined by our move set on the collection of all graphs with core-value sequence $\mathbf{c}$ is connected. Moreover, the random walk on $H_{\mathbf{c}}$ based on the self-loop probabilities we have defined converges to a uniform stationary distribution from any starting point.

Handling the case $c_1 \le 2$.

As noted at the start of this subsection, the exposition has assumed that the highest core-value $c_1$ satisfies the (mild in practice) assumption $c_1 > 2$. We now show how, with additional work, we can remove this assumption and still achieve comparable results.

First, consider the case in which the highest core-value satisfies $c_1 = 2$.
The only place in the analysis where we use the assumption $c_1 > 2$ is in Step 3, where we use two Add moves followed by two Delete moves to simulate the single switch move that replaces two edges $(h, i)$ and $(j, \ell)$ with $(h, j)$ and $(i, \ell)$; there we need to ensure that no node increases its core-value. To handle the case $c_1 = 2$, we can thus simply enhance the Markov chain by including switch moves in the top core: when (i) the set of four nodes $\{h, i, j, \ell\}$ is a subset of the top core, (ii) $(h, i)$ and $(j, \ell)$ are edges, and (iii) $(h, j)$ and $(i, \ell)$ are not edges, we allow a single move that replaces the edges $(h, i)$ and $(j, \ell)$ with $(h, j)$ and $(i, \ell)$. This preserves all core-values even when $c_1 \le 2$. With this extra set of moves, we now have a graph $H'_{\mathbf{c}}$ with more edges than $H_{\mathbf{c}}$, and the analysis above shows that $H'_{\mathbf{c}}$ is connected when $c_1 = 2$. A random walk on $H'_{\mathbf{c}}$ is thus sufficient to generate random graphs with a given core-value sequence when the highest core-value is 2.

Finally, the case $c_1 = 1$ has a particularly simple structure: for some $k$, the core-value sequence has $k$ nodes with core-value 1 and $n - k$ nodes with core-value 0. Any $G$ with this core-value sequence has $n - k$ isolated nodes and $k$ nodes that form a union of trees, each of size at least 2. We can sample directly from this set of graphs, without recourse to the Markov chain developed here, by adapting an algorithm for generating uniform spanning trees [39]: we first sample from the size distribution of components and then sample spanning trees of complete graphs of the chosen sizes.

In the previous section, we established that the Markov chain defined by the random walk on $H_{\mathbf{c}}$ converges to a uniform stationary distribution from any starting point.
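
The reason the self-loop rule $1 - D(G)/(2\Delta)$ yields a uniform limit is that every move and its reverse are taken with the same probability $1/(2\Delta)$, so the transition matrix is symmetric; the self-loop probability at every state is also at least $1/2$, which rules out periodicity. The Python sketch below checks this with exact fractions on a hypothetical four-state graph standing in for $H_{\mathbf{c}}$ (a real $H_{\mathbf{c}}$ is far too large to enumerate this way); the helper names are our own.

```python
from fractions import Fraction


def lazy_walk(adj):
    """Transition probabilities: each move w.p. 1/(2*Delta), the rest on the self-loop."""
    delta = max(len(nbrs) for nbrs in adj.values())
    step = Fraction(1, 2 * delta)
    P = {u: {v: Fraction(0) for v in adj} for u in adj}
    for u, nbrs in adj.items():
        for v in nbrs:
            P[u][v] = step
        P[u][u] = 1 - len(nbrs) * step  # self-loop probability 1 - D(u)/(2*Delta)
    return P


def is_stationary(P, pi):
    """True when pi P == pi, checked exactly."""
    return all(sum(pi[u] * P[u][v] for u in P) == pi[v] for v in P)


# Hypothetical state graph: four states with 1, 3, 2 and 2 moves out of them.
H = {"a": {"b"}, "b": {"a", "c", "d"}, "c": {"b", "d"}, "d": {"b", "c"}}
P = lazy_walk(H)
uniform = {s: Fraction(1, len(H)) for s in H}
by_degree = {s: Fraction(len(H[s]), 8) for s in H}  # limit of the non-lazy simple walk
print(is_stationary(P, uniform), is_stationary(P, by_degree))  # True False
```

Without the self-loops, a simple random walk would favor states with many moves (the degree-proportional distribution above), which is exactly what the padding to $2\Delta$ corrects.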
We now discuss some of the computational considerations involved in running the Markov chain so as to be able to sample from it.

The basic setup for computationally running this Markov chain has two steps. First, a graph is input in the form of a SparseMatrix. The core numbers are then calculated, and an array of core values sorted from largest to smallest is created. The nodes are then renamed from 1 to $n$ such that each node name is distinct and refers to the index of the node's core-value in the core array. As a result, nodes with larger core-values have smaller names.

Second, we perform a number of transition steps. Each transition step is identical, except for the graph being processed. The transition step takes in several values: the graph, the core array, and an estimated upper bound on the highest degree of any node in the Markov chain (that is, of any state in $H_{\mathbf{c}}$). We then estimate an upper bound on how many possible transitions there are from this graph to other graphs. We do this by obtaining an upper bound on the number of possible transitions for each type of move; note that no two moves ever produce exactly the same resulting graph. This is done by proposing many moves, not all of which are necessarily possible. We sum these upper bounds to get an upper bound on the total number of transitions from this graph. If that upper bound is larger than our estimated upper bound on the largest degree in the Markov chain, then we double that estimate and start over.

Next, we randomly select a number between 0 and the estimated degree upper bound. If the number is larger than our possible-transition estimate, then we "self loop" and draw again. Otherwise, we choose a move type at random in proportion to its upper bound, and then select a random proposed move of that type. If it is not a possible move, we "self loop" and draw again. Otherwise, we apply the move and then call the transition function again. We use this rejection sampling method because calculating all possible moves is both memory intensive and time consuming.
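
A minimal sketch of one such rejection-based transition step follows, written in Python rather than in the authors' implementation. It makes simplifying assumptions that are not from the paper: a graph is a frozenset of edges on a fixed node set, there is a single hypothetical move type (toggle one node pair, i.e., an Add or a Delete) whose upper bound is the number of node pairs, and a proposal counts as possible only if all core numbers are unchanged. This toy move set is not the paper's move set and need not connect the whole state space; it only illustrates the doubling of the degree estimate and the two kinds of self-loop.

```python
import random
from itertools import combinations


def core_numbers(nodes, edges):
    """Core number of every node, by repeatedly peeling a minimum-degree node."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deg = {v: len(adj[v]) for v in nodes}
    core, removed, k = {}, set(), 0
    while len(removed) < len(nodes):
        v = min((u for u in nodes if u not in removed), key=deg.__getitem__)
        k = max(k, deg[v])
        core[v] = k
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                deg[w] -= 1
    return core


def transition_step(nodes, edges, target, delta_est, rng):
    """One lazy step (hypothetical sketch); returns (next_edges, delta_est)."""
    pairs = list(combinations(sorted(nodes), 2))  # proposals of the single move type
    upper = len(pairs)                            # upper bound on moves out of this state
    while upper > delta_est:                      # estimate too small: double it
        delta_est *= 2
    r = rng.randrange(delta_est)                  # uniform draw in [0, delta_est)
    if r >= upper:
        return edges, delta_est                   # self-loop
    proposal = edges ^ {pairs[r]}                 # toggle: Add if absent, Delete if present
    if core_numbers(nodes, proposal) != target:
        return edges, delta_est                   # impossible move: self-loop
    return proposal, delta_est


rng = random.Random(0)
nodes = [0, 1, 2, 3]
g = frozenset({(0, 1), (1, 2), (2, 3), (0, 3)})  # a 4-cycle: every core number is 2
target = core_numbers(nodes, g)
delta_est = 1
for _ in range(200):
    g, delta_est = transition_step(nodes, g, target, delta_est, rng)
```

While `delta_est` stays fixed, every accepted move is taken with probability `1/delta_est`, and toggling the same pair undoes it, so the walk is symmetric in the same way as the chain analyzed above.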
Having now established the basic method for generating random graphs with a given core-value sequence, we provide a set of computational experiments showing how it can serve as a null model for network analysis tasks, parallel to the ways in which the configuration model that fixes node degrees is used. We will see that in some cases, the conclusions from our core-value null model contrast fundamentally with the conclusions that would be reached using the configuration model.

[Figure 1 panels: Autonomous Systems, Protein Interactions, Lawyer Friendships, Power Grid. x-axis: number of triangles; y-axis: fraction of samples; series: configuration, k-core, data.]

Figure 1: Distribution of the number of triangles from 50 random samples of graphs with a $k$-core sequence given by a real-world graph dataset and 50 random samples of graphs with a degree sequence given by a real-world graph dataset. The $k$-core samples have more triangles, and often the number of triangles in the dataset is within the range of those observed in the random samples.

We begin with an application where it is natural to expect that the contrast between the configuration model and the core-value model might be apparent: the frequency of small subgraphs. When we are assessing the abundance of a particular subgraph in real network data, we may want to compare it to the frequency of this same subgraph in a randomized version of the network that preserves some invariant. The configuration model, by fixing only the node degrees, destroys most of the local structure, and hence can make particular small subgraphs seem highly frequent in the real network data. Intuitively, our core-value model can be viewed as preserving enough local structure to maintain the core decomposition; will this give a different view of the abundance of small subgraphs? We show here that, for the datasets we consider, it does.

We begin by considering perhaps the simplest family of small subgraphs: triangles on three nodes. After this, we move on to an analysis of small motifs more generally. In both cases, the core-value model leads to different conclusions than the configuration model in several important respects.

Triangle-based statistics

For our computational experiments here and in a number of the subsequent analyses, we use four graph datasets: an autonomous systems network [21], a protein structure network [24], a friendship network of lawyers working at the same firm [20], and a power grid [36]. For each dataset, we run our Markov-chain sampler for a number of steps equal to 100 times the number of edges in the graph, with the input $k$-core sequence given by the dataset. We repeat this 50 times to get 50 random graphs with the prescribed $k$-core sequence.

Code and data for all the results in this section may be obtained from the following link:

[Figure 2 panels: Autonomous Systems, Protein Interactions, Lawyer Friendships, Power Grid. y-axis: triangle degree; series: configuration, k-core, data.]

Figure 2: Triangle degree sequences (given by the number of triangles incident to a given node) from 50 random samples of graphs with a $k$-core sequence given by a real-world graph dataset and 50 random samples of graphs with a degree sequence given by a real-world graph dataset. The $k$-core samples tend to match the triangle degree sequence of the data more closely.

We then compare the statistics of the resulting graphs to the output of the configuration model. For this, we use 50 samples from a Markov-chain configuration model sampler for vertex-labeled simple graphs, using the double edge swap procedure described by Fosdick et al. [14].

As noted earlier in this section, one weakness of the configuration model is that it destroys local structure, and we observe this even on the small datasets considered here. Specifically, the total number of triangles in the configuration model samples is far below the number of triangles in the corresponding datasets (Figure 1). The random samples with the prescribed $k$-core sequence have more triangles than the configuration model samples. Moreover, the distribution of the total number of triangles straddles the number of triangles in the autonomous systems dataset. Thus, the observed number of triangles in this dataset is unsurprising given the $k$-core sequence. In other words, based on the number of triangles alone, we would not reject the null model of a graph sampled uniformly at random from the space of graphs with the given $k$-core sequence.

In addition to the total number of triangles, we also measure the triangle degree sequence in these random samples and compare it to the datasets (Figure 2). Here, the triangle degree of a node is the number of triangles in which it participates.
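
Triangle degrees, and hence triangle totals, are straightforward to compute from adjacency sets. The Python sketch below is a plain illustration with hypothetical helper names; it does work quadratic in each node's degree, so a production implementation for large graphs would likely use a more efficient method.

```python
from itertools import combinations


def triangle_degrees(adj):
    """Triangle degree of each node: the number of triangles that contain it."""
    tri = {v: 0 for v in adj}
    for v, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:  # a and b are adjacent, so {v, a, b} is a triangle
                tri[v] += 1
    return tri


def triangle_count(adj):
    """Each triangle is seen once from each of its three corners."""
    return sum(triangle_degrees(adj).values()) // 3


# Example: a triangle {0, 1, 2} with a pendant node 3 attached to node 0.
g = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(triangle_degrees(g), triangle_count(g))  # {0: 1, 1: 1, 2: 1, 3: 0} 1
```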
We see that the triangle degree sequences given by the $k$-core sequence null model more closely match those of the data.

Note that the Markov-chain approach is the standard strategy for generating fixed-degree graphs because we are trying to produce simple graphs; more basic direct approaches yield graphs with self-loops and parallel edges.

[Figure 3 panels: Autonomous Systems, Lawyer Friendships. x-axis: subgraph index 0 to 6; y-axis: SRP, from −0.6 to 0.6; series: configuration, k-core.]

Figure 3: Subgraph ratio profile (SRP) plots under the $k$-core and configuration null models for four-node subgraphs and triangles. The x-axis in each SRP plot is indexed by the seven subgraphs shown at the bottom.

Taken together, the results of this subsection provide evidence that our $k$-core-based null model offers a substantially different baseline than the configuration model. In particular, for the datasets considered here, the core-based null model produces random samples with a larger number of triangles, capturing some of the local structure in the graph. We will see in the next subsection that this same principle applies to motif analysis more generally.

Motif analysis

A longstanding application of null models for network analysis is the identification of important or unusual small subgraph patterns called network motifs [26]. The main idea is to count the number of occurrences of several small subgraphs in a given dataset as well as in several random samples from a null model. "Motifs" are then subgraphs that appear significantly more or less often than in the null. Historically, the null model employed has been the configuration model [14, 24, 26]. Here, we consider both the configuration model and our $k$-core-based model as null models.

In Figure 3 we consider the results of counting six different motifs consisting of six distinct (non-induced) subgraphs on four nodes each, as well as a motif consisting of the triangle, so that we can view the results of the previous subsection in this context as well. To decide whether a given subgraph appears significantly more or less frequently than in a random baseline, a canonical approach is to use the subgraph ratio profile (SRP), which essentially measures a normalized difference between the frequencies of the subgraph in the real network and in the random baseline. (We refer readers to Milo et al. [24] for the precise definition.) Thus, a positive SRP for a given subgraph indicates that the subgraph occurs more frequently in the real data than in a random baseline, while a negative SRP indicates that it occurs less frequently. Positive SRP values are therefore taken as evidence that the corresponding subgraph is a meaningfully abundant motif in the network data.

Viewed in this context, we see that the SRP can be defined using any random-graph model that fixes some aspect of the structure of the real network. While the configuration model that fixes degrees is the standard approach, we can also define SRP values using the core-value model and ask whether we arrive at similar conclusions.
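
As a concrete reference point, the Python sketch below implements the SRP in the form usually attributed to Milo et al.: for each subgraph $i$, $\Delta_i = (N^{\text{real}}_i - \langle N^{\text{rand}}_i \rangle)/(N^{\text{real}}_i + \langle N^{\text{rand}}_i \rangle + \varepsilon)$, and $\mathrm{SRP}_i = \Delta_i / (\sum_j \Delta_j^2)^{1/2}$. The default $\varepsilon = 4$ reflects our understanding of their setting and should be treated as an assumption; the function name is hypothetical, and the counts in the example are made up rather than taken from the datasets above. The same function applies unchanged whether the null samples come from the configuration model or from the core-value model.

```python
import math
from statistics import mean


def subgraph_ratio_profile(real_counts, null_counts, eps=4.0):
    """real_counts[i]: count of subgraph i in the data.
    null_counts[s][i]: count of subgraph i in null sample s.
    eps: small constant that damps ratios for rare subgraphs (assumed value)."""
    deltas = []
    for i, n_real in enumerate(real_counts):
        n_rand = mean(sample[i] for sample in null_counts)
        deltas.append((n_real - n_rand) / (n_real + n_rand + eps))
    norm = math.sqrt(sum(d * d for d in deltas))
    return [d / norm for d in deltas] if norm > 0 else deltas


# Made-up counts for two subgraphs: one over-represented, one under-represented.
print(subgraph_ratio_profile([10, 0], [[0, 10], [0, 10]]))  # about [0.707, -0.707]
```

Because the profile is normalized to unit length, it compares the relative over- and under-representation of the subgraphs rather than their absolute counts.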

[Figure 4 panels: Autonomous Systems, Protein Interactions, Lawyer Friendships, Power Grid. x-axis: number of edges; y-axis: fraction of samples; series: k-core, data.]

Figure 4: Distribution of the number of edges from 50 MCMC samples of graphs with a $k$-core sequence given by a real-world graph dataset. As expected, the number of edges in the random samples differs from that in the original data, but the difference is not drastic.

As we see in Figure 3, the SRP values based on the core-value model are in fact quite different for two of our datasets: the autonomous systems network and the social network of lawyers. In particular, many SRP values lie on opposite sides of zero under the two models, showing that a number of conclusions can change when we move to a core-based null model. Moreover, these changes generally go in the direction one would conjecture from the preservation of local structure: if the core-value model destroys less of the local structure in a network than the configuration model does, then we would expect lower (and potentially negative) SRP values, and this is what we see for many of the subgraphs in Figure 3. The results thus point to the crucial role of the choice of null model in interpreting these subgraph frequency questions, a type of issue that becomes feasible to investigate given an efficient way to generate null graphs with fixed core-value sequences.

To understand how the core-value model behaves in these types of applications, it is natural to explore some of its basic properties as well. Perhaps the most fundamental set of properties concerns basic counts of edges and degrees.

When sampling based on a $k$-core description given by a dataset, a major difference from the configuration model is that the number of edges in a random sample can differ from that in the dataset. For a simple example, consider a 4-cycle and the graph obtained by adding one chord to the 4-cycle: all nodes in both graphs have core-value 2, but the graphs have 4 and 5 edges, respectively.
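
The 4-cycle example can be verified with the standard peeling procedure for core numbers: repeatedly remove a node of minimum remaining degree, and assign each node the largest removal degree seen up to that point. The Python sketch below is a straightforward quadratic-time illustration with a hypothetical helper name, not the implementation behind the experiments; graphs are given as edge lists, so isolated nodes are omitted.

```python
def core_numbers(edges):
    """Core number of every non-isolated node, by min-degree peeling."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    core, removed, k = {}, set(), 0
    while len(removed) < len(adj):
        v = min((u for u in adj if u not in removed), key=deg.__getitem__)
        k = max(k, deg[v])  # the core level reached so far never decreases
        core[v] = k
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                deg[w] -= 1
    return core


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
chorded = cycle + [(0, 2)]
print(core_numbers(cycle) == core_numbers(chorded), len(cycle), len(chorded))  # True 4 5
```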
Here, we examine the distribution of the number of edges in random samples generated by our algorithm, where the core-value sequence is given by a real-world dataset.

For power grids and protein networks, there isn’t enough meaningful four-node structure to produce clear results using either baseline.

[Figure 5 panels: Autonomous Systems, Protein Interactions, Lawyer Friendships, Power Grid. y-axis: degree; series: k-core, data.]

Figure 5: Degree sequences from 50 MCMC samples of graphs with a $k$-core sequence given by a real-world graph dataset. The degree sequences of the random samples are similar (but not identical) to the degree sequences in the real-world data.

Table 1: Network assortativity $r$ with respect to several attributes in the Lawyers dataset. We list the z-score of the assortativity statistic with respect to 50 samples from the configuration and $k$-core-based models.

| Attribute | r | z-score (configuration) | z-score (k-core) |
|---|---|---|---|
| Status | 0.55 | 21.29 | 3.92 |
| Office Location | 0.21 | 5.53 | 8.72 |
| Gender | 0.12 | 2.50 | 0.31 |
| Law School | 0.05 | 1.80 | 0.79 |
| Type of Practice | 0.04 | 1.29 | 1.71 |

We use the same datasets and sampling procedure that we employed in the previous subsections. Figure 4 shows the number of edges in the resulting samples. We see that, for a given dataset, all of the random samples have a number of edges that is greater than or equal to the number in the original data. Thus, for these datasets, the number of edges over the space of graphs with the same $k$-core sequence appears to be concentrated above the observed number of edges. At the same time, though, the number of edges in the random samples is not drastically different.

We also compare the degree sequences of the random samples to those in the original data (Figure 5). The degree sequences largely resemble those in the original data, but are not exactly the same. Often, the samples from our algorithm produce graphs with a larger maximum degree than the empirical autonomous systems dataset.

As a final investigation, we consider whether or not attribute-based assortativity is preserved under the configuration and core-value null models. The lawyers dataset has several attributes on each node, and we measure the network assortativity $r$ [29] for status at the firm (partner or associate), office location, gender, law school, and type of practice (litigation or corporate).
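
For a categorical attribute, the network assortativity of [29] is usually computed from the matrix $e$ whose entry $e_{xy}$ is the fraction of edge ends joining a node with value $x$ to a node with value $y$: $r = (\sum_x e_{xx} - \sum_x a_x^2)/(1 - \sum_x a_x^2)$ with $a_x = \sum_y e_{xy}$. The Python sketch below computes $r$ this way for an undirected graph and adds a z-score of the kind reported in Table 1. The helper names are hypothetical, and using the sample standard deviation (`statistics.stdev`) rather than the population one is an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev


def attribute_assortativity(edges, attr):
    """Assortativity coefficient r for a categorical attribute on an undirected graph."""
    m = len(edges)
    e = defaultdict(float)  # e[(x, y)]: fraction of edge ends joining value x to value y
    for u, v in edges:
        e[(attr[u], attr[v])] += 0.5 / m
        e[(attr[v], attr[u])] += 0.5 / m
    cats = {x for x, _ in e} | {y for _, y in e}
    a = {x: sum(e[(x, y)] for y in cats) for x in cats}
    trace = sum(e[(x, x)] for x in cats)
    sum_sq = sum(a[x] ** 2 for x in cats)
    return (trace - sum_sq) / (1 - sum_sq)  # undefined if only one category occurs


def z_score(observed, null_values):
    """How many (sample) standard deviations the observed value lies above the null mean."""
    return (observed - mean(null_values)) / stdev(null_values)


attr = {0: "A", 1: "A", 2: "B", 3: "B"}
print(attribute_assortativity([(0, 1), (2, 3)], attr))  # 1.0: every edge within a group
print(attribute_assortativity([(0, 2), (1, 3)], attr))  # -1.0: every edge across groups
```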
Assortativity is positive for all of the attributes, i.e., there is a tendency for edges to appear between two nodes sharing the same attribute value (Table 1).

As a baseline, we measure the assortativity levels under 50 samples of the configuration model and of the core-value model and compute the same z-score as for the motif analysis. The assortativity scores are higher in the data than under both null models (all of the z-scores in Table 1 are positive). For example, office location assortativity is overwhelmingly significant under either null. This is unsurprising, as neither null model is designed to capture mesoscale modular, community, or cluster structure within the network, and several of the attributes are known to correspond to meaningful cluster structure [32].

At the same time, evaluating significance based on z-scores could, for some attributes, lead to different conclusions depending on the choice of null model and the desired significance level. For example, the gender assortativity in the network is 0.12, which is about 2.5 standard deviations above the mean with respect to the configuration model, but only 0.31 standard deviations above the mean with respect to the core-value model. Thus, gender assortativity may seem insignificant under the core-value null but significant under the configuration model null.

The $k$-core decomposition is a fundamental graph-theoretic concept that assigns each node $v$ a core-value equal to the largest $c$ such that $v$ belongs to a subgraph of $G$ of minimum degree $c$. Drawing on this concept, we have proposed a new family of random graphs that can serve as a class of null models in network analysis, obtained by randomly sampling from the set of all graphs with a given core-value sequence.
Our sampling method exploits the rich combinatorial structure of the $k$-core decomposition: we construct a novel Markov chain on the set of all graphs with a given core-value sequence, show that the state space is connected under its transitions, and establish that the chain can be used to generate near-uniform samples from this set of graphs.

The approach opens up a number of intriguing further directions of potential theoretical and empirical interest. One question noted earlier is to establish bounds on the mixing rate of the Markov chain we have defined. Such questions are in general quite challenging, since the mixing rate even of simpler chains remains open; we note that many of these chains have proved valuable for sampling even in the absence of provable guarantees. A second question, related to our solution of the realizability question for core-value sequences, is to study extremal questions over the set of graphs realizing a given core-value sequence; for example, what is the minimum or maximum number of edges that a graph with a given core-value sequence can have? Finally, in a more empirical direction and motivated by our findings on network motifs, it will be interesting to characterize the kinds of network properties for which the configuration model and our core-value model produce systematically different results. Such comparisons can begin to provide insight into the broader consequences of our choice of null models in network analysis.

ACKNOWLEDGMENTS

The authors thank Haobin Ni for his thoughtful insight. This research was supported in part by ARO Award W911NF19-1-0057, ARO MURI, NSF Award DMS-1830274, a Simons Investigator Award, a Vannevar Bush Faculty Fellowship, AFOSR grant FA9550-19-1-0183, and grants from JP Morgan Chase & Co. and the MacArthur Foundation.

REFERENCES

[1] Yael Artzy-Randrup and Lewi Stone. Generating uniformly distributed random networks. Physical Review E, 72(5):056708, 2005.
[2] Edward A. Bender and E. Rodney Canfield. The asymptotic number of labeled graphs with given degree sequences. Journal of Combinatorial Theory, Series A, 24(3):296–307, 1978.
[3] Béla Bollobás. A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. European Journal of Combinatorics, 1(4):311–316, 1980.
[4] Shai Carmi, Shlomo Havlin, Scott Kirkpatrick, Yuval Shavitt, and Eran Shir. A model of Internet topology using k-shell decomposition. Proceedings of the National Academy of Sciences, 104(27):11150–11154, 2007.
[5] Philip S. Chodrow. Configuration models of random hypergraphs. Journal of Complex Networks, 8(3):cnaa018, 2020.
[6] Sheshayya A. Choudum. A simple proof of the Erdős–Gallai theorem on graph sequences. Bulletin of the Australian Mathematical Society, 33(1):67–70, 1986.
[7] Fan Chung and Linyuan Lu. The average distances in random graphs with given expected degrees. Proceedings of the National Academy of Sciences, 99(25):15879–15882, 2002.
[8] Fan Chung and Linyuan Lu. Connected components in random graphs with given expected degree sequences. Annals of Combinatorics, 6(2):125–145, 2002.
[9] Fan Chung, Linyuan Lu, and Van Vu. The spectra of random graphs with given expected degrees. Internet Mathematics, 1(3):257–275, 2004.
[10] Pol Colomer-de Simón, M. Angeles Serrano, Mariano G. Beiró, J. Ignacio Alvarez-Hamelin, and Marián Boguñá. Deciphering the global organization of clustering in real complex networks. Scientific Reports, 3:2517, 2013.
[11] Sergey N. Dorogovtsev, Alexander V. Goltsev, and Jose Ferreira F. Mendes. K-core organization of complex networks. Physical Review Letters, 96(4):040601, 2006.
[12] Mikhail Drobyshevskiy and Denis Turdakov. Random graph modeling: A survey of the concepts. ACM Computing Surveys (CSUR), 52(6):1–36, 2019.
[13] Paul Erdös and Tibor Gallai. Graphs with given degrees of vertices. Mat. Lapok, 11:264–274, 1960.
[14] Bailey K. Fosdick, Daniel B. Larremore, Joel Nishimura, and Johan Ugander. Configuring random graph models with fixed degree sequences. SIAM Review, 60(2):315–355, 2018.
[15] Minas Gjoka, Maciej Kurant, and Athina Markopoulou. 2.5K-graphs: From sampling to generation. In , pages 1968–1976. IEEE, 2013.
[16] S. Louis Hakimi. On realizability of a set of integers as degrees of the vertices of a linear graph. I. Journal of the Society for Industrial and Applied Mathematics, 10(3):496–506, 1962.
[17] Václav Havel. A remark on the existence of finite graphs. Casopis Pest. Mat., 80:477–480, 1955.
[18] Lauri Kovanen, Márton Karsai, Kimmo Kaski, János Kertész, and Jari Saramäki. Temporal motifs in time-dependent networks. Journal of Statistical Mechanics: Theory and Experiment, 2011(11):P11005, 2011.
[19] Ricky Laishram, Ahmet Erdem Sariyüce, Tina Eliassi-Rad, Ali Pinar, and Sucheta Soundarajan. Measuring and improving the core resilience of networks. In Proceedings of the 2018 World Wide Web Conference, pages 609–618, 2018.
[20] Emmanuel Lazega. The Collegial Phenomenon: The Social Mechanisms of Cooperation among Peers in a Corporate Law Partnership. Oxford University Press on Demand, 2001.
[21] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In

Proceedings of theeleventh ACM SIGKDD international conference on Knowledge discovery in datamining , pages 177–187, 2005.[22] Priya Mahadevan, Dmitri Krioukov, Kevin Fall, and Amin Vahdat. Systematictopology analysis and generation using degree correlations.

ACM SIGCOMMComputer Communication Review , 36(4):135–146, 2006.[23] Fragkiskos D Malliaros, Christos Giatsidis, Apostolos N Papadopoulos, andMichalis Vazirgiannis. The core decomposition of networks: Theory, algorithmsand applications.

The VLDB Journal , 29(1):61–92, 2020.[24] Ron Milo, Shalev Itzkovitz, Nadav Kashtan, Reuven Levitt, Shai Shen-Orr, InbalAyzenshtat, Michal Sheffer, and Uri Alon. Superfamilies of evolved and designednetworks.

Science , 303(5663):1538–1542, 2004.[25] Ron Milo, Nadav Kashtan, Shalev Itzkovitz, Mark EJ Newman, and Uri Alon. Onthe uniform generation of random graphs with prescribed degree sequences. arXiv preprint cond-mat/0312028 , 2003.[26] Ron Milo, Shai Shen-Orr, Shalev Itzkovitz, Nadav Kashtan, Dmitri Chklovskii,and Uri Alon. Network motifs: simple building blocks of complex networks.

Science , 298(5594):824–827, 2002.[27] Michael Molloy and Bruce Reed. A critical point for random graphs with a givendegree sequence.

Random structures & algorithms , 6(2-3):161–180, 1995.[28] Michael Molloy and Bruce Reed. The size of the giant component of a randomgraph with a given degree sequence.

Combinatorics probability and computing ,7(3):295–305, 1998.[29] Mark EJ Newman. Mixing patterns in networks.

Physical review E , 67(2):026126,2003.[30] Mark EJ Newman, Steven H Strogatz, and Duncan J Watts. Random graphswith arbitrary degree distributions and their applications.

Physical review E ,64(2):026118, 2001.[31] Chiara Orsini, Marija M Dankulov, Pol Colomer-de Simón, Almerima Jamakovic,Priya Mahadevan, Amin Vahdat, Kevin E Bassler, Zoltán Toroczkai, MariánBoguná, Guido Caldarelli, et al. Quantifying randomness in real networks.

Naturecommunications , 6(1):1–10, 2015.[32] Leto Peel, Daniel B Larremore, and Aaron Clauset. The ground truth aboutmetadata and community detection in networks.

Science advances , 3(5):e1602548,2017.[33] Alessandra Sala, Lili Cao, Christo Wilson, Robert Zablit, Haitao Zheng, andZhao Ben Y. Measurement-calibrated graph models for social network exper-iments. In

Proceedings of the 19th international conference on World wide web ,pages 861–870, 2010.[34] Shai S Shen-Orr, Ron Milo, Shmoolik Mangan, and Uri Alon. Network motifsin the transcriptional regulation network of escherichia coli.

Nature genetics ,31(1):64–68, 2002.[35] Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos. Corescope: Graph miningusing k-core analysis—patterns, anomalies and algorithms. In , pages 469–478. IEEE, 2016.[36] Seung-Woo Son, Heetae Kim, David Olave-Rojas, and Eduardo Álvarez Miranda.Node information of chilean power grid with tap, Oct 2018.[37] Olaf Sporns and Rolf Kötter. Motifs in brain networks.

PLoS Biol , 2(11):e369,2004.[38] Isabelle Stanton and Ali Pinar. Constructing and sampling graphs with a pre-scribed joint degree distribution.

Journal of Experimental Algorithmics (JEA) ,17:3–1, 2012.[39] David Bruce Wilson. Generating random spanning trees more quickly than thecover time. In Gary L. Miller, editor,

Proceedings of the Twenty-Eighth AnnualACM Symposium on the Theory of Computing, Philadelphia, Pennsylvania, USA,May 22-24, 1996 , pages 296–303. ACM, 1996.[40] Jean-Gabriel Young, Giovanni Petri, Francesco Vaccarino, and Alice Patania.Construction of and efficient sampling from the simplicial configuration model.