Efficient Coding in the Economics of Human Brain Connectomics

Abstract

In systems neuroscience, most models posit that brain regions communicate information under constraints of efficiency. Yet, metabolic and information transfer efficiency across structural networks are not understood. In a large cohort of youth, we find metabolic costs associated with structural path strengths supporting information diffusion. Metabolism is balanced with the coupling of structures supporting diffusion and network modularity. To understand efficient network communication, we develop a theory specifying minimum rates of message diffusion that brain regions should transmit for an expected fidelity, and we test five predictions from the theory. We introduce compression efficiency, which quantifies differing trade-offs between lossy compression and communication fidelity in structural networks. Compression efficiency evolves with development, heightens when metabolic gradients guide diffusion, constrains network complexity, explains how rich-club hubs integrate information, and correlates with cortical areal scaling, myelination, and speed-accuracy trade-offs. Our findings elucidate how network structures and metabolic resources support efficient neural communication.



Dale Zhou, Christopher W. Lynn, Zaixu Cui, Rastko Ciric, Graham L. Baum, Tyler M. Moore, David R. Roalf, John A. Detre, Ruben C. Gur, Raquel E. Gur, Theodore D. Satterthwaite†, and Danielle S. Bassett†

Neuroscience Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Department of Physics & Astronomy, College of Arts and Sciences, University of Pennsylvania
Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania
Department of Bioengineering, Schools of Engineering and Medicine, Stanford University, Stanford, CA 94305
Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, USA
Penn-Children’s Hospital of Philadelphia Lifespan Brain Institute
Department of Neurology, Perelman School of Medicine, University of Pennsylvania
Department of Bioengineering, School of Engineering and Applied Sciences, University of Pennsylvania
Department of Electrical & Systems Engineering, School of Engineering and Applied Sciences, University of Pennsylvania
Santa Fe Institute, Santa Fe, NM 87501, USA

To whom correspondence should be addressed: [email protected]
† Co-senior authors

January 16, 2020

Abstract: 150 words
Main text: 5,466 words
8 figures
12 supplementary figures

arXiv [q-bio.NC]

Introduction

Darwin described the law of compensation as the concept that “to spend on one side, nature is forced to economise on the other side” [1]. In the economics of brain connectomics, natural selection optimizes network architecture for versatility, resilience, and efficiency under constraints of metabolism, materials, space, and time [2, 1, 3]. Networks, composed of nodes representing cortical regions and edges representing white matter tracts, strike evolutionary compromises between costs and adaptations [2, 1, 4, 5, 6, 7], whereby disruptions may contribute to the development of neuropsychiatric disorders [8, 9, 10, 11]. To understand how the brain efficiently balances resource constraints with pressures of information processing, models of information diffusion in brain networks are necessary. Such models have gained traction [12, 13, 14, 5, 15, 16], but it is unknown how a network of brain regions efficiently transmits messages to targets in the presence of countless alternative routes that are spatially embedded in diverse architectures of connectivity [3, 16, 17].

Novel brain network communication models are needed because the predominant theories of shortest path routing and diffusion have been criticized as infeasible or inefficient [3, 15, 16]. In shortest path routing, neural signals travel from source to target using either the fewest connections or the shortest spatial distance [16]. Shortest path routing assumes biologically infeasible global information of path length or greedy selection of distances. Diffusion models assume an inefficient process of random propagation from source to target. In contrast to these models, the efficient coding hypothesis proposes that the brain represents information in a metabolically economical or compressed form by taking advantage of redundancy in the structure of information [18, 2]. Coding efficiency characterizes low-dimensional neural representations and dynamics supporting cognition [19, 20, 21]. New models should therefore demonstrate metabolic and information transfer efficiency that predictably differ according to variation in brain network structure across the protracted development of structural connectivity [3, 12, 5, 22, 17, 7, 16].

We develop a brain network communication model of efficient coding by information diffusion (Figure 1A). We apply our model to 1,042 youth (aged 8-23 years) in the Philadelphia Neurodevelopmental Cohort who underwent diffusion tensor imaging (DTI) and arterial-spin labeling (ASL; see Supplementary Figure 1) [23]. To operationalize metabolic expenditure, we use ASL, which measures cerebral blood flow (CBF) and is correlated with glucose expenditure and ATP consumption (Figure 1B) [24]. We join work modeling efficient coding with rate-distortion theory [25], a branch of information theory that provides the mathematical foundations of lossy compression [26]. By assuming that the minimal amount of noise is achieved by signals that diffuse along shortest paths [12, 14, 16], we calculate the optimal rate of signal transmission to communicate between brain regions with an expected transmission fidelity in the capacity-limited structural network. Specifically, we define the expected signal distortion as the probability of not propagating along the shortest path. In developing the framework, we seek to understand how network structure and metabolic resources support and constrain the efficient transmission of information.

To evaluate the validity of our efficient diffusion model, we assess five published predictions of rate-distortion theory and information diffusion (Figure 1C) [27, 25]. As we will describe in detail, hypotheses of information diffusion models posit how network structure guides propagating signals in support of metabolic efficiency, transmission fidelity, and information integration [28, 12, 13, 29]. Hypotheses of rate-distortion theory posit that the trade-off between message fidelity and compression governs predictable differences in the efficiency of information broadcasting across networks [27, 25]. In evaluating the validity of our efficient diffusion model, we introduce compression efficiency, which quantifies how much structural networks prioritize lossy compression versus communication fidelity. To demonstrate the utility of our model, we use compression efficiency to test the hypothesis that diffusing information is integrated and broadcast by the brain’s highly connected regions or hubs [29]. Finally, we use compression efficiency to explain individual variation in the speed-accuracy trade-off of cognitive performance, and we contrast its explanatory power with that of competing measures [30]. Our model advances the current understanding of how efficiency, noise, and information integration are associated with metabolic resources and network architecture.
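The distortion defined in the Introduction, the probability that a diffusing signal does not propagate along the shortest path, can be illustrated on a toy network. The following is a minimal sketch, not the authors' implementation. It assumes that edge length is the inverse of edge weight, that an unbiased random walker moves with probability proportional to edge weight, and that the shortest-path probability is the product of transition probabilities along that path. The function names and the toy matrix `W_toy` are hypothetical.

```python
import heapq
import numpy as np


def shortest_path(W, src, dst):
    """Dijkstra's algorithm on edge lengths 1 / weight; returns the node sequence."""
    n = W.shape[0]
    dist = [float("inf")] * n
    prev = [None] * n
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u == dst:
            break
        for v in np.flatnonzero(W[u]):
            v = int(v)
            nd = d + 1.0 / W[u, v]  # assumption: stronger edges are "shorter"
            if nd < dist[v]:
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dist[dst] == float("inf"):
        raise ValueError("dst is not reachable from src")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def shortest_path_probability(W, src, dst):
    """Probability that one unbiased random walker follows the shortest path."""
    P = W / W.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    path = shortest_path(W, src, dst)
    return float(np.prod([P[a, b] for a, b in zip(path[:-1], path[1:])]))


# Hypothetical 4-node weighted network (symmetric, no self-loops).
W_toy = np.array([[0, 2, 1, 0],
                  [2, 0, 0, 2],
                  [1, 0, 0, 1],
                  [0, 2, 1, 0]], dtype=float)

path = shortest_path(W_toy, 0, 3)
p_sp = shortest_path_probability(W_toy, 0, 3)
distortion = 1.0 - p_sp  # distortion for a single walker
```

In this toy case the strong route 0-1-3 is the shortest path, and a single walker follows it with probability (2/3)(1/2) = 1/3, so the single-walker distortion is 2/3.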

We sought to distinguish how brain metabolism is associated with structural signatures of shortest path routing versus diffusion signaling. Although shortest path routing is hypothesized to reduce metabolic cost, existing evidence for this hypothesis remains sparse [28]. To quantify the extent to which a person’s brain is structured to support shortest path routing, we used the global efficiency, a commonly computed measure of the average shortest path strength between all pairs of brain regions. Intuitively, global efficiency represents the ease of information transfer by the strength of direct connections in a network. As an operationalization of metabolic running cost, we considered CBF, which is correlated with glucose consumption. To test the

Figure 1: The efficient diffusion model and associated hypotheses. (A) To model efficient coding, we apply rate-distortion theory, a branch of information theory that provides the foundations of lossy compression, to a previously proposed network measure called resource efficiency [12]. Resource efficiency models probabilistic diffusion by shortest paths, solving for the number of random walkers required for at least one to propagate by the shortest path given an expected probability. By assuming that the minimal amount of noise is achieved by signals that diffuse along shortest paths, we calculate the optimal information rate to communicate between brain regions with an expected transmission fidelity in the rate-limited (or capacity-limited) structural network. (B) Across the brain, regional CBF measured from 1,042 participants in our study correlates with the regional cerebral metabolic rate for glucose acquired from published maps of 33 healthy adults (Pearson’s correlation coefficient r = 0 . , df = 358, p_SPIN < . ), replicating prior findings and supporting the operationalization of metabolic expenditure using CBF [24]. We use CBF to investigate the relationship between metabolic demands and network organization supporting either shortest path routing or diffusion. (C) Importantly, our model generates five predictions from the rate-distortion theory and information diffusion literature [25, 27, 29]. First, information transfer should produce a characteristic rate-distortion gradient in empirical and artificial networks, where exponentially increasing information rates are required to minimize signal distortion. Second, information transfer efficiency should improve with manipulations of the system architecture designed to facilitate signal propagation, where information costs decrease when message diffusion is biased with regional differences in metabolic rates. Third, the information rate should vary as a function of the costs of error, with discounts when costs are low and premiums when costs are high. Fourth, brain network complexity should flexibly support communication regimes of varying costs and fidelity, where a high-fidelity regime predicts information rates that exponentially increase as the network grows more complex, and a low-fidelity regime predicts asymptotic information rates indicative of lossy compression. Fifth and finally, structural hubs should integrate incoming signals to efficiently broadcast information, where hubs (compared to other brain regions) have more compressed input rates and higher transmission rates for equivalent input-output fidelity.

Figure 2: Metabolic running costs support brain network architectures for diffusion. (A) Global efficiency increases with development while cerebral perfusion declines (F_global efficiency, age = 50, estimated df = 3 . , p < × − ; F_CBF, age = 69 . , estimated df = 3 . , p < × − ). Contrary to prior reports, the relationship between CBF and global efficiency is confounded by age (r = 0 . , df = 1039, p = 0 . ). Therefore, the claim of reduced metabolic cost associated with shortest path routing is weakened. (B) All brain regions are accessible to signals randomly walking across 5 edges or more. Bluer nodes represent greater normalized node strengths, reflecting the accessibility of the brain region to a random walker diffusing along the structural connectome with differing walk lengths. Shortest path routing predicts that only shortest path walks of length 5 or less are associated with CBF. Diffusion models predict that longer random walk paths are also associated with CBF. (C) Global CBF is negatively correlated with the strengths of structural paths with lengths of at least 7 when controlling for age, sex, age-by-sex interactions, degree, density, and in-scanner motion (t = − . to − . , estimated model df = 11 . , FDR-corrected p < . ). Metabolic expenditure is related to structure supporting signal diffusion, rather than shortest path routing. (D) Regional CBF is positively correlated with high-integrity connections comprising the structural scaffolds supporting differing walk lengths (ρ = 0 . to . , df = 358, FDR-corrected p < . ), controlling for age, sex, age-by-sex interaction, degree, density, and in-scanner motion. Asterisks denote statistical significance following correction for multiple comparisons. Together, these data suggest that metabolic cost is associated with a regional profile of white matter path strengths that support diffusing messages. Individuals with greater path strengths tend to have lower global metabolic expenditure, while brain regions with greater path strengths tend to have greater metabolic expenditure.

p-value reflecting significance is denoted p_SPIN (Method 7.7.6). We replicated the previously reported linear association with glucose consumption (Figure 1B; Pearson’s correlation coefficient r = 0 . , df = 358, p < . , p_SPIN < . ) (r = − . , df = 1039, p < . ) (F = 50, estimated df = 3 . , p < × − ), while CBF was negatively correlated with age (F = 69.22, estimated df = 3 . , p < × − ), and after controlling for age we do not find a significant relationship between global efficiency and CBF (r = 0 . , df = 1039, p = 0 . ) (t = − .59 to − .81, estimated model df = 11 . , p < . ) (ρ = 0.10 to 0 . , df = 358, FDR-corrected p < . ).

Communication between brain regions or modules requires reliable broadcasting of information with an expected fidelity. Although our data do not link metabolic expenditure to shortest path routing, communication of information diffusing along shorter paths should nevertheless confer advantages in speed and signal fidelity compared to longer paths. To test this hypothesis, we investigated whether brain metabolism is associated with network structures that support diffusion over shorter paths. Specifically, we assessed the association between CBF and path transitivity, a measure of the density of connections re-accessing shortest paths, thereby guiding diffusion along efficient pathways (Figure 3A). Prior reports have demonstrated that path transitivity in structural networks is positively correlated with fMRI BOLD functional connectivity [13], a finding that we replicate in our own data (Supplementary Figure 2). Path transitivity requires more connections and presumably incurs greater metabolic running costs associated with both the structural connections and increased functional connectivity [31]. When considering variation across individuals, we find that greater path transitivity is associated with greater CBF (Supplementary Figure 3A; t = 2.27, estimated model df = 11 . , p = 0.02; controlling for age, sex, age-by-sex interaction, degree, density, and in-scanner motion). This result suggests that brain networks may strike a compromise between metabolic cost and the signaling advantages of path transitivity. Next, we sought to assess whether the relationship between brain metabolism and path transitivity was moderated by development. We found that the interaction between path transitivity and age was positively associated with CBF (F = 24.6, estimated df = 3 . , p < × − ; Supplementary Figure 3B). Increased metabolic expenditure associated with greater path transitivity was prominent during adolescence, when global CBF tends to decrease [32].

We expanded our analysis of compromises between brain metabolism and network topology by considering multiple trade-offs. Specifically, we considered variations in metabolic cost, path transitivity, and modularity across individuals (Figure 3B). We found that the relationship between path transitivity and global

Figure 3: Trade-offs between modularity and diffusion architecture locally optimize metabolic running cost. (A) Greater path transitivity enhances the ability of diffusing signals to re-access the shortest path when the number of closed paths (triangles) returning to nodes on the path is high; path transitivity of the brain’s structural network is statistically associated with fMRI BOLD activity [13]. Modularity is a feature of brain organization across species whereby regions cluster into highly intraconnected communities; modularity is thought to confer efficient use of physical materials, segregate information representation, and support efficient information transfer among brain regions. (B) The 3-dimensional fitness landscape conveys adaptive trade-offs associated with brain metabolism in the context of evolutionary constraints on efficiency. (C) When we consider variation across individuals, we find that economical network architectures are adaptively balanced with metabolism. Metabolic cost is associated with the interaction of path transitivity and modularity (t = 2 . , estimated model df = 13 . , p = 0 . ), controlling for age, sex, age-by-sex interaction, degree, density, and in-scanner motion. The mean path transitivity and modularity across all individuals (red dotted lines projected from frequency histograms) approach a saddle point (critical point), defined as a point on the surface that is both a relative minimum and a maximum along different axes. The saddle point suggests that adaptive compromises in network architecture are constrained by dual objectives. Along one axis, the objective is minimizing metabolic expenditure by coupling modularity with path transitivity. Along the other axis, the objective is maximizing metabolic expenditure by decoupling modularity from path transitivity. This landscape suggests the existence of compromises which balance adaptations of functional flexibility and spatial efficiency with material and metabolic costs.

(t = 2.56, estimated model df = 13 . , p = 0 . )

To understand how the brain balances the transmission rate of diffusing signals and signal distortion across different network architectures, we propose a measure called compression efficiency, which synthesizes shortest path routing and diffusion (Figure 1A and 4A). We have thus far described how individual differences in metabolic running costs and brain architecture suggest that the brain communicates by diffusion. We formalize a rate-distortion model of efficient diffusion by assuming that the minimal amount of noise is achieved by signals that diffuse along shortest paths (Figure 1A and 4B). We define distortion as the probability of a diffusing signal not taking the shortest path. To understand how the brain balances information rate and distortion, we measure resource efficiency: the number of resources required for at least one resource to randomly walk along the shortest path to a target cortical region, with an expected probability (Figure 4C; Method 7.5.5). Just as rate-distortion theory predicts the minimum information rate needed to achieve a specified signal distortion transmitting through a capacity-limited channel, resource efficiency predicts the minimum number of resources needed to achieve a specified level of signal distortion resulting from diffusion across the structural connectome (Figure 4D). The information-theoretic trade-off between information rate and signal distortion is defined by individually different rate-distortion gradients (Figure 4E). As distortion increases, the information rate decays exponentially. By analogy with rate-distortion theory, here we consider the extent to which the brain’s structural connectome prioritizes compression versus fidelity. We refer to this trade-off as the compression efficiency (Figure 4E), and define it as the slope of the exponential rate-distortion gradient (Method 7.5.6).

Figure 4: Neurodevelopment places a premium on fidelity. (A) Rate-distortion theory is a mathematical framework that defines the required amount of information for an expected level of signal distortion during communication through capacity-limited channels. Loss functions, such as the mean squared error, are defined to map encoded and decoded signals. In a given channel, compressed messages demand lower information rates at the cost of fidelity. (B) In brain networks, we define the distortion function as the probability of a diffusing signal not taking the shortest path. This definition is reasonable because the temporal delay, signal mixing, and decay introduced by longer paths collectively increase distortion. (C) The Galton board depicts the problem of determining the minimum number of resources (diffusing messages) that the starting node should prepare to transmit in order for one resource to propagate by the specified path given an expected probability. Formalizing the shortest path distortion function allows analytical solutions to determine the corresponding minimum rate of information using the metric of resource efficiency. Recall that the resource efficiency solves for the number of randomly walking resources (or messages) required for at least one walker to propagate by the shortest path with an expected probability. (D) In applying our theory to the structural connectome, we observe a characteristic curve that is consistent with that predicted by the rate-distortion function. Just as an artificial pixelated image is governed by the same information-theoretic rules as a naturalistic image, the synthetic random networks exhibit the rate-distortion gradient also observed in brain networks, supporting the first prediction of rate-distortion theory. The blue curve depicts the loess fit of the mean rate-distortion function across all individuals. The black curve depicts the mean rate-distortion function for Erdős-Rényi random networks whose edges maintain the weights from empirical measurements. The resources required of brain connectomes and random networks differ by distortion (F = 10 × , df = 29120, p < × − ), highlighting the established graph theoretical notion that random networks increase shortest path accessibility but demand a heftier materials cost. (E) We measure the differing slopes of the semi-log plot of the rate-distortion gradients as differing compression efficiency. Then, we interpret variation across individuals in the language of rate-distortion theory. For example, consider two brain networks functioning at the same low level of distortion. The brain network with the flatter slope between resources and distortion has greater compression efficiency because the network architecture confers resource discounts. In comparison, the brain network with a steeper slope has reduced compression efficiency because the network architecture pays a premium for the same expected fidelity. (F) When we consider variation across individuals, we find that compression efficiency decreases with age and differs by sex (Supplementary Figure 4; F = 27 . , estimated df = 2 . , p < . ), suggesting that neurodevelopment places a premium on high-fidelity network communication.
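The resource count described in the Galton board panel has a simple closed form if walkers are assumed to be independent. If one walker takes the shortest path with probability π, then at least one of r walkers does so with probability 1 − (1 − π)^r. Setting this equal to 1 − D for distortion D gives r = log(D) / log(1 − π). The sketch below applies that formula to one node pair and fits the slope of the semi-log rate-distortion curve. The function names, the distortion grid, and the single-pair simplification are illustrative assumptions. The sign and scaling convention the authors use to turn the slope into compression efficiency (Method 7.5.6) is not reproduced here.

```python
import numpy as np


def resources_required(p_shortest, distortion):
    """Walkers needed so that at least one takes the shortest path
    with probability 1 - distortion (independent walkers assumed)."""
    # 1 - (1 - p)^r = 1 - D  =>  r = log(D) / log(1 - p)
    return np.log(distortion) / np.log1p(-p_shortest)


def semilog_slope(p_shortest, distortions):
    """Slope of log10(resources) against distortion (least-squares line)."""
    r = resources_required(p_shortest, distortions)
    slope, _intercept = np.polyfit(distortions, np.log10(r), 1)
    return float(slope)


# Hypothetical grid: distortion from 0.1% to 90%, i.e. shortest-path
# fidelity from 99.9% down to 10%.
distortions = np.linspace(0.001, 0.9, 30)
resources = resources_required(1.0 / 3.0, distortions)
slope = semilog_slope(1.0 / 3.0, distortions)
```

For example, with π = 0.5 and D = 0.25 the formula gives exactly two walkers. The count is left real-valued here; rounding up to an integer is a separate modeling choice.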

To evaluate the roles of resource efficiency and rate-distortion theory in the brain, we assess five previously published predictions of rate-distortion theory and information diffusion (Figure 1C) [25, 29]. The first prediction of rate-distortion theory is that communication systems should produce an information rate that is an exponential function of distortion. Moreover, artificial networks should be governed by the same information-theoretic rules as empirical networks. To test this prediction, we computed the resource efficiency of each individual, with the probability of diffusion along the shortest path ranging from 10% to 99.9% (Figure 4D). We designed artificial communication systems as Erdős-Rényi random networks (Method 7.6), which predominantly transfer information by short paths [33]. We observed an exponential gradient in individual brain networks and the Erdős-Rényi random networks, consistent with the first prediction of rate-distortion theory. Furthermore, the random networks, which are composed of more short connections than empirical brain networks, incurred a decreased resource cost compared to empirical brain networks (Figure 4D and Supplementary Figure 4; F = 10 × , df = 29120, p < × − ), consistent with the intuition that a greater prevalence of short connections in the random network translates to greater likelihood of shortest path diffusion [33]. Rate-distortion trade-offs vary as a function of age and sex, where individual differences in compression efficiency (Figure 4E) were negatively correlated with age (F = 27 . , df = 2 . , p < . ) (F = 6 × , df = 29120, all p-values corrected using the Holm-Bonferroni method for family-wise error rate; p < .05 at 60% distortion, p < .01 at 50% distortion, and p < .001 at distortion less than or equal to 40%). The differences arise from reduced resource requirements introduced by additional information from regional CBF (attract: t = 20 . , df = 1993 . , p < . ; t = 22 . , df = 1970 . , p < . ).

Figure 5: Metabolic chemotaxis supports efficient coding. (A) Chemotactic diffusion was modeled as a biased random walk with transition probabilities modified by regional CBF. A biased random walk attracting diffusing signals to regions of high CBF was modeled by constructing a biased structural connectivity matrix. The weight of an edge between region i and region j was given by the average, normalized CBF between pairs of brain regions multiplied by the weight of the structural connection. For biased random walks repelling diffusing signals from regions of high CBF, the weight of an edge was given by (1 minus the normalized inter-regional CBF) multiplied by the weight of the structural connection. (B) The second prediction of rate-distortion theory is that providing additional chemotactic information to the information processing system will reduce information cost. Specifically, we hypothesized that diffusion along metabolic gradients would increase compression efficiency. The alluvial diagram depicts our expected results: if metabolism acts as a medium supporting efficient coding, then we should observe reduced minimum resources in chemotactic diffusion compared to unbiased diffusion. (C) The variation of predicted minimum resources required for distortion levels less than 60% is explained by the interaction of the distortion level with the type of chemotactic diffusion (***: p < . ), supporting the second prediction of rate-distortion theory. According to the third prediction of rate-distortion theory, we should expect asymmetries about the rate-distortion gradient where there are asymmetric costs of error. Specifically, the cost of error should be greater during neural dynamics requiring high-fidelity communication compared to low-fidelity communication. The level of distortion and type of random walk fully explain variance in resource efficiency (R = 0 . , F = 6 × , df = 29120, all p-values corrected using the Holm-Bonferroni method for family-wise error rate; *: p < . , **: p < . , ***: p < . ). Across individual brain networks, the resources required for 0.1% distortion are greater than predicted by the rate-distortion gradient, reflecting a premium placed on very high fidelity signaling. At greater levels of distortion, the resources required are less than predicted by the rate-distortion gradient. The premium cost of high-fidelity communication compared to discounts of low-fidelity communication supports the third prediction of rate-distortion theory. (D) In agreement with our hypothesis that chemotaxis supports efficient coding, the number of required resources decreased in chemotactic diffusion compared to unbiased diffusion (t_unbiased,attract = 20 . , df = 1993 . ; t_unbiased,repulse = 22 . , df = 1970 . ; ***: p < . ).

Figure 6: Compression efficiency constrains brain network complexity. (A) High versus low fidelity communication regimes make different predictions regarding the minimum resources required for an expected distortion and system complexity. We define system complexity as the network size, or the number of functionally and cytoarchitectonically distinct modules. In the high fidelity regime, the brain should place a premium on high fidelity signaling by allocating a monotonically increasing information rate to achieve low distortion despite increasing system complexity. In the low fidelity regime, the brain should place a premium on lossy compression by asymptotically capping the information rate despite increasing system complexity. (B) Resources scale monotonically with network size, consistent with the prediction of a high fidelity regime. Individual brain networks were reparcellated using the Lausanne brain atlases with 83, 129, 234, 463, and 1,015 parcels, respectively. Together, the monotonically increasing resource demand highlights a trade-off between high fidelity communication and network complexity, indicating an additional evolutionary constraint on brain network size and complexity. (C) Shortest path complexity is the average number of nodes comprising path transitivity. Resources scale asymptotically with inter-individual differences in path transitivity, consistent with a low-fidelity regime for lossy compression and storage savings. The non-linear model outperforms a linear model (non-linear AIC = 7902, linear AIC = 7915; non-linear BIC = 7964, linear BIC = 7968). Lossy compression of a signal by differing densities of path transitivity is perhaps akin to image compression by pixel resolution. Reduced path transitivity, or pixel resolution, tends to result in more compression, while greater path transitivity supports greater fidelity.
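The CBF-biased random walk described in the Figure 5 caption can be sketched as a reweighting of the structural connectivity matrix. This is a minimal illustration, not the authors' code. The caption does not specify how CBF is normalized, so dividing by the maximum regional value is an assumption, as are the function names and the toy inputs.

```python
import numpy as np


def biased_connectivity(W, cbf, mode="attract"):
    """Reweight structural edges by the pairwise average of normalized CBF."""
    cbf = np.asarray(cbf, dtype=float)
    norm = cbf / cbf.max()  # assumption: normalize CBF to (0, 1] by its maximum
    pair = (norm[:, None] + norm[None, :]) / 2.0  # average CBF of regions i and j
    if mode == "attract":
        bias = pair          # walkers drawn toward high-CBF regions
    elif mode == "repel":
        bias = 1.0 - pair    # walkers pushed away from high-CBF regions
    else:
        raise ValueError("mode must be 'attract' or 'repel'")
    return np.asarray(W, dtype=float) * bias


def transition_matrix(W):
    """Row-normalize edge weights into random-walk transition probabilities."""
    row_sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, row_sums, out=np.zeros_like(W, dtype=float),
                     where=row_sums > 0)


# Hypothetical 3-region example: fully connected, unit weights.
W_toy = np.ones((3, 3)) - np.eye(3)
cbf_toy = [1.0, 2.0, 4.0]

P_attract = transition_matrix(biased_connectivity(W_toy, cbf_toy, "attract"))
P_repel = transition_matrix(biased_connectivity(W_toy, cbf_toy, "repel"))
```

In the toy example, a walker leaving region 0 prefers the high-CBF region 2 under attraction and the lower-CBF region 1 under repulsion. The resulting transition matrices can then be used in place of the unbiased one when computing shortest-path probabilities.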

In addition to high-fidelity communication, a flexible system of communication may also transfer information in a low-fidelity regime to restrict information rates in noisy environments. We sought to investigate the properties of network architecture that support lossy compression, consistent with predictions of a low-fidelity regime. If the shortest path represents the structure supporting highest fidelity, then we hypothesized that path transitivity (Figure 3A), as longer approximations of shortest paths, supports lossy compression and a low-fidelity regime. To remain consistent with the method of existing predictions [27], the complexity of the shortest path was defined as the number of nodes comprising the local detours re-accessing the shortest path in the measure of path transitivity. We found that the number of resources begins to plateau non-linearly as a function of shortest path complexity, consistent with a low-fidelity regime (Figure 6C). Model selection criteria support the non-linear form compared to a linear version of the same model (non-linear AIC = 7902, linear AIC = 7915; non-linear BIC = 7964, linear BIC = 7968). The non-linear fit of these data suggests that path transitivity supports neural communication that is tolerant to noise, consistent with the conception of path transitivity as local detours from the highest-fidelity shortest path.
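The comparison of linear and non-linear fits by AIC and BIC can be illustrated on synthetic data. The sketch below is only an illustration of the model-selection logic. It assumes Gaussian residuals, uses a quadratic polynomial as a stand-in for the non-linear model (the text does not state the functional form of the authors' model), and generates a hypothetical saturating curve with a fixed random seed. None of the numbers it produces correspond to the values reported above.

```python
import numpy as np


def gaussian_aic_bic(y, y_hat, n_coef):
    """AIC and BIC for a least-squares fit under Gaussian residuals."""
    n = y.size
    rss = float(np.sum((y - y_hat) ** 2))
    k = n_coef + 1  # regression coefficients plus the residual variance
    log_lik = -0.5 * n * (np.log(2.0 * np.pi * rss / n) + 1.0)
    aic = 2.0 * k - 2.0 * log_lik
    bic = k * np.log(n) - 2.0 * log_lik
    return aic, bic


# Synthetic, hypothetical data: resources that plateau with complexity.
rng = np.random.default_rng(0)
x = np.linspace(0.1, 10.0, 200)
y = 10.0 * (1.0 - np.exp(-x / 2.0)) + rng.normal(0.0, 0.3, x.size)

linear_fit = np.polyfit(x, y, 1)      # 2 coefficients
nonlinear_fit = np.polyfit(x, y, 2)   # 3 coefficients (quadratic stand-in)

aic_lin, bic_lin = gaussian_aic_bic(y, np.polyval(linear_fit, x), 2)
aic_non, bic_non = gaussian_aic_bic(y, np.polyval(nonlinear_fit, x), 3)
```

Lower AIC and BIC indicate the preferred model. Both criteria penalize the extra coefficient of the non-linear model, so it is preferred only when the improvement in fit outweighs that penalty, as it does for a plateauing curve.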

Motivated by our findings corroborating the validity of compression efficiency and indicating that neurodevelopment places a premium on fidelity (Figure 4F), we sought to understand the association between compression efficiency and evolutionary properties of cortical organization. Evolutionarily new connections may support higher-order and flexible information processing [4], emerging from disproportionate expansion of the association cortex. The association cortex also contains less cortical myelin than sensorimotor cortices [34]; myelin promotes efficient transmission and propagation speed while preserving communication fidelity [2]. To explore how compression efficiency relates to cortical areal expansion and myelination, we used published maps of cortical myelination (estimated from T1w/T2w MRI measures with histological validation [34]) and areal scaling (estimated as allometric scaling coefficients defined by the non-linear ratios of surface area change to total brain size change over development; Figure 7B). To study the compression efficiency of brain regions sending or receiving messages, we computed the send and receive compression efficiency of brain regions (Figure 7A; Method 7.5.5). We found that brain regions with greater sender compression efficiency tend to have greater myelin content (Figure 7C; r = 0 . , df = 358, p_{SPIN, Holm-Bonferroni} = 0 . ). Regions with lower receiver compression efficiency tended to disproportionately expand in relation to total brain size (r = − . , df = 358, p_{SPIN, Holm-Bonferroni} = 0 . ), as did regions with greater sender compression efficiency (r = 0 . , df = 358, p_{SPIN, Holm-Bonferroni} = 0 . ).

Figure 7: Adaptive advantages of compression efficiency align with patterns of cortical myelination and areal scaling. (A) Sender compression efficiency differs regionally across the cortex and describes the number of diffusing messages required to transmit information with specified signal fidelity (Supplementary Figure 8). Receiver compression efficiency describes the number of messages required to receive information with an expected signal fidelity. Regional values were averaged across individuals. (B) Brain regions differ in myelination and in non-linear spatial scaling ratios of surface area change to total brain size change over development. Regional values were obtained from published maps [34, 7]. (C) Cortical regions with greater levels of myelin tend to have greater sender compression efficiency (r = 0 . , p_{SPIN, Holm-Bonferroni} = 0 . ), consistent with myelin reducing conduction delay and promoting an efficient trade-off between signal rate and fidelity that reduces the transmission rate while preserving fidelity. Cortical areal scaling in neurodevelopment reflects patterns of evolutionary remodeling. Brain regions that have higher sender compression efficiency tend to disproportionately expand in relation to total brain size during neurodevelopment (r = 0 . , df = 358, p_{SPIN, Holm-Bonferroni} = 0 . ). We observed that cortical regions with the lowest receiver compression efficiency, placing a premium on information processing fidelity, tend to disproportionately expand in relation to whole brain growth (r = − . , df = 358, p_{SPIN, Holm-Bonferroni} = 0 . ). Positively scaling regions that prioritize compression-efficient broadcasting of messages arriving with high fidelity may reflect evolutionary expansion of brain regions with high information processing capacity, whereas negatively scaling regions that prioritize high-fidelity broadcasting of compressed messages may permit other modes of material, spatial, and metabolic cost efficiency.

The fifth and final hypothesis of our model posits that the structural hubs of the brain's highly interconnected rich club support information integration of diffusing signals [29]. To explain the hypothesized information integration roles of rich-club structural hubs, we investigated the compression efficiency of messages diffusing into and out of hub regions compared to that of other regions. To identify the rich-club hubs, we computed the normalized rich-club coefficient and identified 43 highly interconnected structural hubs (Figure 8A). Next, we computed the send and receive compression efficiency of rich-club hubs compared to all other regions.
In support of their hypothesized function, we found that the rich-club hubs tend to receive reduced rates of messages compared to other regions (Wilcoxon rank-sum test, W = 12829, p < . ). The corresponding comparison for messages sent by rich-club hubs (W = 64, p < . ) is shown in Figure 8B.

Figure 8: Compression efficiency explains the integrative role of rich-club hubs and individual differences in cognitive efficiency. (A) The rich club consists of highly interconnected structural hubs, thought to be the backbone of the brain connectome. Across participants, the rich club comprised 43 highly connected brain regions. (B) The receiver compression efficiency of rich-club hubs was greater than that of other brain regions (W = 12829, p < . ), suggesting that the organization of the structural network prioritizes integration and compression of information arriving at rich-club hubs. The sender compression efficiency of rich-club hubs was reduced compared to other brain regions, supporting the fidelity of information broadcasting (W = 64, p < . ). (C) Individuals having non-rich-club regions with decreased compression efficiency tended to exhibit greater efficiency (speed-to-accuracy trade-off) of complex reasoning (all p-values corrected using the Holm-Bonferroni method; t = − . , estimated model df = 10 . , p = 2 × − ), executive function (t = − . , estimated model df = 10 . , p = 0 . ), and social cognition (t = − . , estimated model df = 10 . , p = 0 . ), controlling for age, sex, age-by-sex interaction, degree, density, and in-scanner motion. Non-rich-club compression efficiency was not associated with memory efficiency (t = − . , estimated model df = 9 . , p = 0 . ). Individuals with less rich-club compression efficiency tended to exhibit greater efficiency of complex reasoning (p = 1 × − ), memory (p = 0 . ), social cognition (p = 0 . ), and executive function (p = 0 . ). Overall, individuals with brain structural networks prioritizing fidelity tended to perform with greater accuracy and/or speed in various cognitive functions. Asterisks depict significance following family-wise error correction. (D) Individual differences in complex reasoning efficiency were negatively associated with individual differences in compression efficiency (t = − . , estimated model df = 11 . , p < . ), controlling for global efficiency, age, sex, age-by-sex interaction, degree, density, and in-scanner motion.
See Supplementary Figure 9 for additional scatterplots of the correlation between cognitive efficiency and compression efficiency. For comparison to a commonly used metric of shortest-path information integration, we include global efficiency in the model, which was positively correlated with complex reasoning efficiency (t = 2 . , estimated model df = 11 . , p < . ). Global efficiency was not partially correlated with compression efficiency (r = − . , df = 997, p = 0 . ), controlling for age, sex, age-by-sex interaction, degree, density, and motion. Hence, reduced compression efficiency prioritizing fidelity explains individual differences in complex reasoning efficiency.

To narrow the expansive theoretical space of communication models, we investigated how principles of evolutionary efficiency constrain models of brain network communication [18, 2, 36, 37, 3, 12, 5, 38, 15, 16]. Specifically, we considered the brain structural connectome as a capacity-limited information channel performing lossy compression. We found that metabolic expenditure correlated with structural signatures indicative of diffusion models, but not shortest path routing [3]. In developing an efficient diffusion model of communication, we introduced the notion of compression efficiency, which describes the prioritization of either communication fidelity or lossy compression in structural networks. Five predictions of rate-distortion theory and information diffusion adapted from prior literature corroborated our findings, supporting the validity of compression efficiency [29, 27, 25]. Broadly, our work advances the study of brain network communication efficiency, information integration, and neural noise by reframing brain network communication as diffusing messages governed by rate-distortion and efficient coding theories.

Shortest path routing as a model of brain network communication was not supported by our data, consistent with the common acknowledgment that it is infeasible to expect a signal to have global knowledge of network structure to compute shortest paths [3, 5, 15, 16]. Rather, our observations agreed with the hypotheses that result from information diffusion along structural paths. We found that individuals whose brains are structured with high-integrity paths tended to have reduced metabolic cost, joining similar prior reports [34]. In our investigation of multiple trade-offs between network structure and metabolic cost, we discovered that brain metabolism reached a critical point as a function of path transitivity and modularity. Specifically, we observed a saddle point, where network structure was coupled or decoupled.
In the decoupled axis, brain networks organized with, for example, high modularity and low path transitivity tended to exhibit optimal metabolic savings. In the coupled axis, where increases in modularity are linked with increases in path transitivity, brain networks can achieve locally optimal metabolic savings around the average brain network, where deviations incur metabolic costs. Brain networks may reconfigure to place premiums on network structures thought to support functional versatility and resilience at the expense of cost efficiency, or vice versa [3, 16].

Turning from shortest-path based measures, such as global efficiency and betweenness centrality, our findings motivate future studies of information integration in structural brain networks that instead adopt metrics of the structural signatures and processes of information diffusion that emphasize neuroanatomically specific processes [39, 15, 16]. For example, we found that chemotactic diffusion along metabolic gradients supports efficient coding by enhancing compression efficiency, offering a potential biological medium for greedy navigation by shortest spatial distances [16]. Chemotactic attraction models the increased neural activity associated with metabolic expenditure [31], whereas repulsion models information bottlenecks redirecting flow away from congestion to less metabolically costly routes [14]. Furthermore, we found that compression efficiency was associated with cognitive efficiency above and beyond the contributions of global efficiency, the latter having been previously reported to explain variation in fluid intelligence [30]. Global efficiency remains a useful metric of local connection strength, pairwise wiring cost trade-offs, and shortest path structure accelerating diffusion [36, 3, 14], but falls short of explaining processes of information integration.

We offer an explanation of integrative processes arising from the connectivity of rich-club hubs.
By measuring asymmetric send-receive message diffusion [40, 41] and modeling transmission rate as a function of expected fidelity, we showed that hubs are compression-efficient receivers and high-fidelity senders. This finding adds to the understanding of hubs as sources and sinks for the early spreading of diffusing signals [14]. Rich-club hubs develop early and underpin information integration and broadcasting, possibly offsetting high metabolic, spatial, and material costs [29, 42, 9, 41]. However, an adaptationist explanation as such is challenging to falsify [38]. We introduced compression efficiency in the context of efficient coding to reframe explanations of evolutionary adaptation in terms of a heritable capacity to develop hubs under constraints of whole-brain network efficiency [38, 41]. Prior findings of greater metabolic costs in rich-club regions were exploratory [29] and conflict with research supporting the metabolic efficiency of the myelinated long-distance connections prevalent in the rich club [3, 34, 6]. Despite our well-powered analysis and replication of several other findings (Supplementary Figures 2, 6, and 10) [24, 32, 13, 22], we were unable to replicate observations of high metabolic costs in the rich club (Supplementary Figure 11). Although further investigation of the metabolic costs of rich-club hubs is warranted, our findings nevertheless reinforce a wealth of evidence emphasizing the importance of the development, resilience, and function of hubs in cognition and psychopathology [29, 9, 31, 14, 6, 10].

With the objective of efficiency, developmental processes may balance compression efficiency, cortical scaling, and myelination to adapt to differing environments. Cortical areal scaling highlights the problem of allocating limited materials, space, and metabolic resources to the disproportionate changes in surface area of brain regions in relation to total brain size [7].
Brain regions that prioritize high-fidelity broadcasting of compressed messages may save space, materials, and metabolic resources with decreased scaling in proportion to the growth of the whole brain. For example, we found this property of compression-efficient inputs and high-fidelity outputs in rich-club hubs, which appear in an adult configuration at birth [43]. In contrast, brain regions prioritizing compression-efficient broadcasting of high-fidelity messages tended to disproportionately expand, and brain networks prioritizing communication fidelity tended to support greater cognitive efficiency. These novel findings converge with theories positing that evolutionarily new connections support higher-order and flexible information processing [4, 34, 7], and that plastic white matter microarchitecture supports reasoning ability and speed [44]. Indeed, we found monotonically increasing information processing costs and capacity with greater network complexity. In addition to spatial scaling, developing brain networks may use myelination to modify connection strengths and efficiency [45, 34]. We found that brain regions with greater myelination tended to have greater sender compression efficiency, consistent with evidence that myelin promotes propagation speed and efficient transmission rates while preserving communication fidelity [2, 34]. The objective of efficient coding in brain networks can be achieved by balancing communication fidelity and lossy compression in developmentally plastic brain networks and rich-club hubs [37, 32, 42, 6, 46].

We suggest that compression efficiency may represent an information processing constraint on brain size and complexity. Brain systems viewed as information processors exhibit recurring compromises between information efficiency and other resource costs at the cellular [18, 2] and circuit levels of the brain [47, 19].
At the neuronal level, an optimal strategy for distributed coding is to reduce population size while distributing activity among a fraction of cells [18, 2]. Brain networks may reach a similar compromise through information processing constraints on the complexity (i.e., size) of the network and its modules, and by increasing the number of endogenously active components, such as in the default-mode system. Efficient coding predicts that bit rate varies as a function of the number and redundancy of synapses [2, 27]. Transmitting the same message across many parallel paths improves fidelity and increases bit rates, but information rate increases sublinearly with the number of paths because the system is highly redundant, incurring greater metabolic costs [2]. We similarly found that individuals with greater path transitivity (more redundant and lossy alternative paths to the shortest, direct paths) tended to require sublinearly increasing computational costs and tended to have greater global metabolic expenditure. Taken together, our efficient diffusion model addresses the notable absence of biologically plausible and efficient inter-regional brain network communication models [3, 16].

Our work admits several theoretical and methodological limitations. First, regionally aggregated brain signals are not discrete Markovian messages and do not have goals like reaching specific targets. As in recent work, our model introduced a deliberately simplified but useful abstraction of macro-scale brain network communication [14]. Second, although we used resource efficiency in light of prior methodological decisions and information theory benchmarks [12], compression efficiency can be implemented using alternative approaches (Supplementary Figure 12). Several methodological limitations should also be considered. The accurate reconstruction of white-matter pathways using DTI and tractography remains limited [48].
Moreover, non-invasive measurements of CBF with high sensitivity and spatial resolution remain challenging. We acquired images using an ASL sequence providing greater sensitivity and approximately four times higher spatial resolution than prior developmental studies of CBF [32]. Lastly, our data were cross-sectional, limiting the inferences that we could draw about neurodevelopmental processes.

In summary, our study advances understanding of the adaptive trade-offs in brain metabolism and architecture that support efficient diffusion processes. In addition to advancing the biophysical realism of information transmission, our information-theoretical model naturally admits future applications to measurements of entropy, which have provided insight into information flow of brain activity [47, 39]. Our model may be applied to study neural circuits of Bayesian integration in brain networks, as rate-distortion models of perception and cognition have been suggested as extensions of conventional Bayesian approaches [25]. Moreover, our work distinguishing a low- and a high-fidelity regime suggests that our framework could be used to investigate dual-system models of information processing bounded by resource and capacity limitations that characterize fast but error-prone versus slow but deliberate regimes [49]. In the complementary learning systems theory, we posit that the hippocampus acts as a hub in plastic cortical networks which pass, distort, and reconstruct compressed signals [50]. Compression efficiency of hippocampal and sensory pathways should predict the speed, accuracy, and efficient cognitive coding of high-dimensional visuospatial stimuli in sensorimotor learning [25, 19]. Such studies could illuminate how the representational structure of information drives the selective loss of redundant or core information in convolutional feedforward network models of sensorimotor information processing, where triangular structural motifs of path transitivity resemble feedforward loops [13, 19].
Lastly, our findings invite further development and application of well-studied information routing models and coding schemes to brain network communication [26, 51, 40]. The compression efficiency model is a useful starting point for the development of more sophisticated approaches to efficient systems-level information transfer, and is also a novel tool to test leading hypotheses of dysconnectivity [8], hubopathy [9, 10], disrupted information integration [52], and neural noise [53] in neuropsychiatric disorders.

D.Z. wrote the paper. C.W.L., T.D.S., and D.S.B. edited the paper. D.Z. developed the theory with input from C.W.L., T.D.S., and D.S.B. R.C., G.L.B., and Z.C. preprocessed the data. D.Z. performed the analysis with input from Z.C. T.M.M. performed data preprocessing and interpretation of statistical models. D.R. preprocessed the data. J.D. developed imaging acquisition methods. R.E.G. acquired funding for data collection and performed data collection. R.C.G. provided expertise in cognitive phenotyping. D.Z., T.D.S., and D.S.B. designed the study. D.S.B. and T.D.S. acquired funding to support theory development and data analysis, and contributed to theory and data interpretation.

Acknowledgments

We acknowledge helpful discussions with Richard Betzel, David Lydon-Staley, Lorenzo Caciagli, Adon Rosen, and Bart Larsen. The work was largely supported by the John D. and Catherine T. MacArthur Foundation, the ISI Foundation, the Paul G. Allen Family Foundation, the Alfred P. Sloan Foundation, the NSF CAREER award PHY-1554488, NIH R01MH113550, NIH R01MH112847, and NIH R21MH106799. Secondary support was also provided by the Army Research Office (Bassett-W911NF-14-1-0679, Grafton-W911NF-16-1-0474) and the Army Research Laboratory (W911NF-10-2-0022). The content is solely the responsibility of the authors and does not necessarily represent the official views of any of the funding agencies.

The authors declare that they have no competing interests.

As described in detail elsewhere [23], diffusion tensor imaging (DTI) and arterial-spin labeling (ASL) data were acquired for the Philadelphia Neurodevelopmental Cohort (PNC), a large community-based study of neurodevelopment. The subjects used in this paper are a subset of the 1,601 subjects who completed the cross-sectional imaging protocol. We excluded participants with health-related exclusionary criteria (n=154) and with scans that failed a rigorous quality assurance protocol for DTI (n=162) [54]. We further excluded subjects with incomplete or poor ASL and field map scans (n=60). Finally, participants with poor quality T1-weighted anatomical reconstructions (n=10) were removed from the sample. The final sample contained 1042 subjects (mean age=15.35, SD=3.38 years; 467 males, 575 females). Study procedures were approved by the Institutional Review Board of the Children's Hospital of Philadelphia and the University of Pennsylvania. All adult participants provided informed consent; all minors provided assent and their parent or guardian provided informed consent.

Cognitive Assessment

All participants were asked to complete the Penn Computerized Neurocognitive Battery (CNB). The battery consists of 14 tests adapted from tasks typically applied in functional neuroimaging, and which measure cognitive performance in four broad domains [23]. The domains included: (1) executive control (i.e., abstraction and flexibility, attention, and working memory), (2) episodic memory (i.e., verbal, facial, and spatial), (3) complex cognition (i.e., verbal reasoning, nonverbal reasoning, and spatial processing), (4) social cognition (i.e., emotion identification, emotion intensity differentiation, and age differentiation), and (5) sensorimotor and motor speed. Performance was operationalized as z-transformed accuracy and speed. The speed scores were multiplied by − z -scores. The efficiency scores were then z-transformed again, to achieve mean = 0 and SD = 1.

Neuroimaging acquisition and pre-processing were as previously described [23]. We depict the overall workflow of the neuroimaging and network extraction pipeline in Figure 1A.

As was previously described [22, 56], DTI data and all other MRI data were acquired on the same 3T Siemens Tim Trio whole-body scanner and 32-channel head coil at the Hospital of the University of Pennsylvania. DTI scans were obtained using a twice-refocused spin-echo (TRSE) single-shot EPI sequence (TR = 8100 ms, TE = 82 ms, FOV = 240 mm/240 mm; Matrix = RL: 128/AP: 128/Slices: 70, in-plane resolution (x & y) 1.875 mm; slice thickness = 2 mm, gap = 0; flip angle = 90°/180°/180°, volumes = 71, GRAPPA factor = 3, bandwidth = 2170 Hz/pixel, PE direction = AP). The sequence employs a four-lobed diffusion encoding gradient scheme combined with a 90-180-180 spin-echo sequence designed to minimize eddy current artifacts. The complete sequence consisted of 64 diffusion-weighted directions with b = 1000 s/mm² and 7 interspersed scans where b = 0 s/mm². Scan time was about 11 min. The imaging volume was prescribed in axial orientation covering the entire cerebrum with the topmost slice just superior to the apex of the brain [54].

Connectome construction

Cortical gray matter was parcellated according to the Glasser atlas [57], defining 360 brain regions as nodes for each subject's structural brain network, denoted as the weighted adjacency matrix $A$. To assess multiple spatial scales, cortical and subcortical gray matter was parcellated according to the Lausanne atlas [58]. Together, 89, 129, 234, 463, and 1015 dilated brain regions defined the nodes for each subject's structural brain network in the analyses of Figure 6.

DTI data were imported into DSI Studio software and the diffusion tensor was estimated at each voxel [59]. For deterministic tractography, whole-brain fiber tracking was implemented for each subject in DSI Studio using a modified fiber assessment by continuous tracking (FACT) algorithm with Euler interpolation, initiating 1,000,000 streamlines after removing all streamlines with length less than 10 mm or greater than 400 mm.
Fiber tracking was performed with an angular threshold of 45 degrees, a step size of 0.9375 mm, and a fractional anisotropy (FA) threshold determined empirically by Otsu's method, which optimizes the contrast between foreground and background [59]. FA was calculated along the path of each reconstructed streamline. For each subject, edges of the structural network were defined where at least one streamline connected a pair of nodes. Edge weights were defined by the average FA along streamlines connecting any pair of nodes.

CBF was quantified from control-label pairs using ASLtbx [60], as was previously described [32]. We consider $f$ as CBF, $\Delta M$ as the difference of the signal between the control and label acquisitions, $R_{1a}$ as the longitudinal relaxation rate of blood, $\tau$ as the labeling time, $\omega$ as the post-labeling delay time, $\alpha$ as the labeling efficiency, $\lambda$ as the blood/tissue water partition coefficient, and $M_0$ as the approximated control image intensity. Together, CBF $f$ can be calculated according to the equation:

$$ f = \frac{\Delta M \, \lambda \, R_{1a} \exp(\omega R_{1a})}{2 M_0 \, \alpha} \left[ 1 - \exp(-\tau R_{1a}) \right]^{-1}. \tag{1} $$

Because prior work has shown that the T1 relaxation time changes substantially in development and varies by sex, this parameter was set according to previously established methods, which enhance CBF estimation accuracy and reliability in pediatric populations [61, 62].

Brain Maps

As described previously [63], cortical myelin content was calculated by dividing the T1w image signal by the T2w image signal. Specifically, we define the myelin content $x$ in the following manner:

$$ \frac{\text{T1w}}{\text{T2w}} \approx \frac{x \cdot b}{(1/x) \cdot b} = x^2, \tag{2} $$

where $x$ is the myelin contrast in the T1w image, $1/x$ is the myelin contrast in the T2w image, and $b$ is the receive bias field in both T1w and T2w images. We used a published atlas generated by this method [34].
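The CBF calculation in Equation 1 can be sketched as a short function. This is a minimal illustration that reads the equation as f = ΔM·λ·R1a·exp(ω·R1a) / (2·M0·α·[1 − exp(−τ·R1a)]); the function name `cbf` and every numeric input below are hypothetical placeholders, not acquisition parameters from this study, and no unit conversion is applied.

```python
import math

def cbf(delta_m, m0, r1a, tau, omega, alpha, lam):
    """Evaluate Equation 1 for one voxel.

    delta_m: control-minus-label signal difference
    m0:      approximated control image intensity
    r1a:     longitudinal relaxation rate of blood (1/s)
    tau:     labeling time (s)
    omega:   post-labeling delay (s)
    alpha:   labeling efficiency
    lam:     blood/tissue water partition coefficient
    """
    numerator = delta_m * lam * r1a * math.exp(omega * r1a)
    denominator = 2.0 * m0 * alpha * (1.0 - math.exp(-tau * r1a))
    return numerator / denominator

# Illustrative inputs only (assumed values, not taken from the paper).
example = cbf(delta_m=10.0, m0=1000.0, r1a=0.6, tau=1.5,
              omega=1.2, alpha=0.85, lam=0.9)
print(example)
```

The estimate scales linearly with the signal difference ΔM, and a longer post-labeling delay inflates the correction factor exp(ω·R1a) that compensates for label decay in transit.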
As described previously [7], to estimate cortical areal scaling between the size of cortical regions and the total brain, regression coefficients $\beta$ were estimated for log(total cortical surface area) as a covariate predicting log(vertex area) using spline regression models that incorporated effects of age and sex on vertex area [64]. We used the following relational form:

$$ \log_{10}(\text{vertex area}) \sim s(\text{age},\ \text{by} = \text{sex}) + \beta \left[ \log_{10}(\text{total area}) \right]. \tag{3} $$

When $\beta$ is 1, the scaling between total brain size and brain regions is linear. When $\beta$ is greater or less than 1, scaling is non-linear, and the region disproportionately expands or contracts, respectively. We used the published atlas generated using the same data as in our study [7, 23].

In the context of the brain structural connectome, global efficiency represents the strength of the shortest paths between brain regions supporting efficient communication. In network neuroscience, global efficiency is commonly used as a metric of a brain network's capacity for shortest path routing [3, 12, 16]. We calculated the common global efficiency statistic [33], which is defined for a graph $G$ as:

$$ E_{\text{glob}}(G) = \frac{1}{N(N-1)} \sum_{i \neq j \in G} \frac{1}{d_{ij}}, \tag{4} $$

where $N$ is the number of nodes and $d_{ij}$ is the shortest distance between node $i$ and node $j$. Intuitively, a high $E$ value indicates greater potential capacity for global and parallel information exchange along shortest paths, and a low $E$ value indicates decreased capacity for such information exchange [33].

Beyond shortest paths between pairs of brain regions, we also sought to measure the strength of structural connections $S$ comprising the paths of multiple connections. As global efficiency measures the capacity of brain networks for shortest path routing, path strengths measure the capacity for diffusion signaling. Path strengths are apt for assessing the network capacity for diffusion because paths can be represented as random walks $p = (i, j, \ldots, k)$, where $p$ is a path and $i$, $j$, and $k$ are nodes in the path.
As in prior work [65], the strength of the weighted connections in a path visiting nodes $i_1, i_2, \ldots, i_l$, denoted $\omega(p)$, in the graph $G$ with adjacency matrix $A$ is defined as:

$$ \omega(p) = [A]_{i_1 i_2} [A]_{i_2 i_3} \cdots [A]_{i_{l-1} i_l}, \tag{5} $$

where the matrix products produce the strengths of all possible random walks according to the length of $p$, as depicted in the schematic Figure 1B. Then, for walks of length $n$, the strengths of the paths from node $i$ to node $j$ are defined as:

$$ [A^n]_{ij} = \sum_{p \in \mathcal{P}^n_{ij}} \omega(p), \tag{6} $$

where $\mathcal{P}^n_{ij}$ is the set of all walks from node $i$ to node $j$ with length $n$. When $n = 1$, the matrix exponent produces a matrix with elements equal to $d_{ij}$ from Equation 4, or the shortest distance between node $i$ and node $j$. Intuitively, a high path strength represents structural paths that consist of higher integrity connections measured by DTI, whereas a low path strength indicates paths consisting of low integrity connections. To compute node strengths, the values for each node were summed. An average value was also calculated across node strengths per individual participant.

Shortest paths confer advantages in speed and signal fidelity when messages are transmitted by diffusion. Therefore, we sought to measure a property of brain network architecture supporting diffusion by shortest paths. Local detours which first leave and then re-access the shortest path serve to support such diffusion, and the potential for such local detours can be estimated using a measure called path transitivity (see Figure 3A, left) [13]. Path transitivity was previously used to predict functional BOLD activation comparably to conventional distance or computational models of neural dynamics.
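The relationship between Equations 5 and 6 can be checked numerically: the entries of the n-th power of the adjacency matrix should equal the summed weight products over every walk of n steps. The sketch below does this in plain Python for a hypothetical three-node network; the helper names (`matmul`, `walk_strengths`, `brute_force_strength`) and the edge weights are illustrative assumptions, not part of the original pipeline.

```python
from itertools import product

def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def walk_strengths(adj, n):
    """Return A^n, whose (i, j) entry sums the strengths of all n-step walks."""
    result = adj
    for _ in range(n - 1):
        result = matmul(result, adj)
    return result

def brute_force_strength(adj, i, j, n):
    """Sum the product of edge weights over every n-step walk from i to j."""
    total = 0.0
    for middle in product(range(len(adj)), repeat=n - 1):
        nodes = (i, *middle, j)
        strength = 1.0
        for a, b in zip(nodes, nodes[1:]):
            strength *= adj[a][b]
        total += strength
    return total

# Hypothetical weighted, undirected network (weights stand in for mean FA).
A = [[0.0, 0.5, 0.2],
     [0.5, 0.0, 0.4],
     [0.2, 0.4, 0.0]]

A2 = walk_strengths(A, 2)
node_strengths = [sum(row) for row in A2]          # per-node sums
mean_strength = sum(node_strengths) / len(node_strengths)
print(A2[0][2], mean_strength)
```

The brute-force enumeration grows exponentially with walk length, which is why the matrix power is the practical route for networks of several hundred regions.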
To compute path transitivity, we first calculated the matching index for each pair of successive nodes $i$ and $j$ along the shortest path $\pi_{s \to t}$, with neighboring non-shortest-path nodes $k$, as:

$$ m_{ij} = \frac{\sum_{k \neq i,j} (w_{ik} + w_{jk}) \, \Theta(w_{ik}) \, \Theta(w_{jk})}{\sum_{k \neq j} w_{ik} + \sum_{k \neq i} w_{jk}}, \tag{7} $$

where $w$ is the connection weight, and $\Theta(w_{ik}) = 1$ if $w_{ik} > 0$, and 0 otherwise. Intuitively, the numerator is non-zero if and only if there are two locally detouring connections that make a closed triangle along the shortest path. If either of the two connections $w_{ik}$ or $w_{jk}$ does not exist, then the numerator is 0. With the denominator representing the strength of all cumulative connections of the shortest path nodes, the matching index fraction then represents the density of closed triangles (i.e., transitivity) around the shortest path.

Whereas the matching index is a pairwise measure of the density of locally returning detours, path transitivity generalizes the density across the shortest path. Using the computed matching index $m_{ij}$ for each pairwise connection in the set $\Omega$ of shortest path edges $\pi_{s \to t}$ from source node $s$ to target node $t$, we compute path transitivity $M$ as:

$$ M(\pi_{s \to t}) = \frac{2 \sum_{i \in \Omega} \sum_{j \in \Omega} m_{ij}}{|\Omega| \, (|\Omega| - 1)}, \tag{8} $$

where the numerator sums the matching index $m_{ij}$ for all edges in $\Omega$, the scale factor of 2 indicates an undirected graph, and the denominator counts all possible edges. Intuitively, a high path transitivity $M$ indicates that the shortest path is more densely encompassed by locally detouring triangular motifs. Low path transitivity indicates that the shortest path is surrounded by connections that deviate from the shortest path without an immediate avenue of return.

Modularity is a common architectural feature observed in neural systems across species. A single community contains brain regions that are more highly connected to each other than to brain regions located in other communities (see Figure 3A, right). Modularity of brain networks is spatially efficient, supports the development of executive function in youths, and supports flexibly adaptable functional activations according to distinct task demands [66, 67, 22, 68, 69].
To assess modularity, we applied a common community detection technique known as modularity maximization [70], in which we used a Louvain-like locally greedy algorithm [71] to maximize a modularity quality function for the adjacency matrix $A$. The modularity quality function is defined as:

$$ Q = \frac{1}{2\mu} \sum_{ij} \left( A_{ij} - \gamma P_{ij} \right) \delta(g_i, g_j), \tag{9} $$

where $\mu = \sum_{ij} A_{ij}$ denotes the total weight of $A$, $A_{ij}$ encodes the weight of an edge between node $i$ and node $j$ in the structural connectivity matrix, $P$ represents the expected strength of connections according to a specified null model [70], $\gamma$ is a structural resolution parameter that determines the size of modules, and $\delta$ is the Kronecker function, which is 1 if $g_i = g_j$ and zero otherwise. As in prior work, we set $\gamma$ to the default value of 1 [68]. Intuitively, a high $Q$ value indicates that the structural connectivity matrix contains communities, where nodes within a community are more densely connected to one another than expected under a null model. Modularity maximization is commonly used to detect community structure, and to quantitatively characterize that structure by assessing the strength and number of communities [22, 68, 70].

A signal that diffuses along the shortest path between brain regions confers advantages in speed, reliability, and fidelity [3, 72, 16]. Following prior work, we sought to compute the number of random walkers beginning at node $i$ that were required for at least one to travel along the shortest path to another node $j$ with probability $\eta$ [12, 72]. To begin, we consider the transition probability matrix $U$, defined as $U = WL^{-1}$, where each entry $W_{ij}$ of $W$ describes the weight of the directed edge from node $i$ to node $j$, and each entry $L_{ii}$ of the diagonal matrix $L$ is the strength of each node $i$, defined as $\sum_i W_{ij}$. Intuitively, each entry $U_{ij}$ of $U$ defines the probability of a random walker traveling from node $i$ to node $j$ in one step.
Next, to compute the probability that a random walker travels from node $i$ to node $j$ along the shortest path, we define a new matrix $U'(i)$ that is equivalent to $U$ but with the non-diagonal elements of row $i$ set to zero and $U_{ii} = 1$ as an absorbing state. Then, the probability of randomly walking from $i$ to $j$ along the shortest path is given by:

$$1 - \sum_{n=1}^{N} [U'(i)^H]_{in}, \tag{10}$$

where $H$ is the number of connections composing the shortest path from $i$ to $j$. Similarly, the probability $\eta$ of releasing $r$ random walkers at node $i$ and having at least one of them reach node $j$ along the shortest path is given by:

$$\eta = 1 - \left( \sum_{n=1}^{N} [U'(i)^H]_{in} \right)^{r}. \tag{11}$$

Setting the above probability to some set value $\eta$, we can then solve for the number of random walkers $r$ required to guarantee (with probability $\eta$) that at least one of them travels from $i$ to $j$ along the shortest path, denoted by:

$$r_{ij}(\eta) = \frac{\log(1 - \eta)}{\log \left( \sum_{n=1}^{N} [U'(i)^H]_{in} \right)}. \tag{12}$$

We refer to the number of random walkers $r_{ij}$ as resources. In our analyses, we calculate resources $r_{ij}$ over a range of values of $\eta$ for each participant. Finally, the resource efficiency of each participant's entire network is taken to be $1 / r_{ij}(\eta)$ averaged over all pairs of nodes $i$ and $j$. With the right stochastic matrix $U'(i)$, the resource efficiency of brain regions as message senders is $1 / r_{ij}(\eta)$ averaged over $i$, while the resource efficiency of brain regions as message receivers is $1 / r_{ji}(\eta)$ averaged over $j$.

Rate-distortion theory formalizes the study of information transfer as passing signals (messages) through a capacity-limited information channel. A signal $x$ is encoded as $\hat{x}$ with a level of distortion $D$ that depends on the information rate $R$. The greater the rate, the less the distortion.
The rate-distortion function $R(D)$ defines the minimum information rate required to transmit a signal corresponding to a level of signal distortion (see Figure 4A). Lossy compression arises from the choice of the distortion function $d(x, \hat{x})$, which implicitly determines the relevant and irrelevant features of a signal. With the true signal $x$ mapped to the compressed signal $\hat{x}$ described by $p(\hat{x} \mid x)$, the rate-distortion function is defined by minimizing the mutual information of the signal and compression over the expected distortion, defined as $\langle d(x, \hat{x}) \rangle_{p(x, \hat{x})} = \sum_{x \in X} \sum_{\hat{x} \in \hat{X}} p(x, \hat{x}) \, d(x, \hat{x})$:

$$R(D) \equiv \min_{d(x, \hat{x})} I(X, \hat{X}) = \sum_{x \in \Omega_X} \sum_{\hat{x} \in \Omega_{\hat{X}}} P(\hat{x} \mid x) P(x) \log \left( \frac{P(\hat{x} \mid x)}{P(\hat{x})} \right). \tag{13}$$

By minimizing the mutual information $I(X, \hat{X})$, we arrive at a probabilistic map from the signal to the compressed representation, where the information gain between the signal and compression is as small as possible (i.e., high fidelity) to favor the most compact representations.

Similar to the mathematical framework of rate-distortion theory, we sought to specify a distortion function reflecting communication over the brain's structural network. Prior work building models of perceptual and cognitive performance has inferred distortion functions through Bayesian inference of a loss function [73, 25]. For instance, the loss function could be the squared error denoting the residual values of the true signal minus the compression, $L = (\hat{x} - x)^2$ (Figure 4A). A neural rate-distortion theory has been theoretically developed [27], but remains empirically untested due in part to a lack of methodological tools at the level of brain systems. Moreover, it has been difficult to define a distortion function that incorporates both true signals $x$ and compressed signals $\hat{x}$, in part because measuring these signals in human brain networks remains challenging.
Here, we define an analogous framework of information transfer through capacity-limited channels in the structural network of the brain. Particularly, we build a distortion function from the simple intuition that the shortest path is the route that most reliably preserves signal fidelity, as depicted in Figure 4B.

Given that a random walker propagating from node $i$ along the shortest path to node $j$ retains the greatest signal fidelity, we define the distortion function of any signal $x$ from brain region $i$ to a compressed representation $\hat{x}$ decoded in brain region $j$ as:

$$d(x, \hat{x})_{ij} = (1 - \eta), \tag{14}$$

where $\eta$ denotes the probability that a walker gets from node $i$ to node $j$ along the shortest path. A signal with greater probability $\eta$ of propagating by the shortest path between brain region $i$ and brain region $j$ is at a lower risk of distortion (see Figure 4D). Intuitively, increased topological distance adds greater risk of signal distortion due to further transmission through capacity-limited channels (i.e., structural connections), temporal delay, and potential mixing with other signals. Given the measure of resources in Equation 12, we develop and test predictions of a novel definition of the rate $R(D)$; here, we define $R(D)$ as the resources $r_{ij}(\eta)$ required to achieve a tolerated level of distortion $d(x, \hat{x})_{ij}$:

$$R(D) \equiv r_{ij}(d(x, \hat{x})_{ij}), \tag{15}$$

as in Figure 4D. When the log of resources $\log(r_{ij})$ is plotted against our metric of distortion $D = d \in 1 - \eta_{ij}$, the exponential gradient is depicted linearly (see Figure 4E). Because prior work focused on 50% distortion during analyses, we required the slope to intersect the mean midpoint rate at 50% distortion [12]. In addition to the precedent offered by prior work, this requirement is also reasonable given that we sought to model both high and low distortions equitably.
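The relation between Equations 12, 14, and 15 can be sketched numerically. The snippet below is an illustration under stated assumptions rather than the analysis code: `failure_prob` stands for the summed term inside the logarithm of Equation 12 (the probability that a single walker does not take the shortest path), and its value of 0.5, the grid of nine distortion levels, and all function names are hypothetical. The slope is obtained by ordinary least squares and does not impose the constraint that the fit pass through the mean rate at 50% distortion.

```python
import numpy as np


def resources(failure_prob, eta):
    """Equation 12: walkers needed so that at least one takes the shortest
    path with probability eta, given the per-walker failure probability
    (the summed term inside the logarithm of Equation 12)."""
    return np.log(1.0 - eta) / np.log(failure_prob)


def rate(failure_prob, distortion):
    """Equations 14-15: resources at tolerated distortion D = 1 - eta."""
    return resources(failure_prob, 1.0 - distortion)


def compression_slope(failure_prob, distortions):
    """Least-squares slope of log(resources) against distortion."""
    log_r = np.log(rate(failure_prob, distortions))
    slope, _intercept = np.polyfit(distortions, log_r, 1)
    return slope


# Hypothetical per-walker failure probability and distortion grid.
D_grid = np.linspace(0.1, 0.9, 9)
r_grid = rate(0.5, D_grid)
slope_toy = compression_slope(0.5, D_grid)
```

In this toy setting the required resources fall as the tolerated distortion rises, so the fitted slope in semi-log space is negative.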
The slope denotes the minimum number of resources required to achieve a tolerated level of distortion, which we refer to as the compression efficiency (Figure 4E, bottom). A steeper slope (i.e., a more negative relation) reflects reduced compression efficiency, or prioritization of message fidelity. A flatter slope (i.e., a more positive relation) reflects increased compression efficiency, or prioritization of lossy compression. Individual variation in compression efficiency can be assessed by using the average resource efficiency across brain regions. When compression efficiency is computed for sets of brain regions by averaging across individuals, the slope can denote either messages sent from or arriving to a brain region by using the average resource efficiency over either all nodes $j$ or all nodes $i$, respectively.

Given the advantages of shortest path diffusion, we sought to assess how brain metabolism could support the reliability and fidelity of signaling. Chemotactic diffusion can be modeled as random walks over a structural connectivity matrix biased by regional CBF [74]. To model chemotactic diffusion of random walkers attracted to or repelled from brain regions of high CBF, we used analytical solutions to biased random walks. First, we defined the matrix $T$ of CBF-biased transition probabilities as:

$$T^{\alpha}_{ij} = \frac{\alpha_i A_{ij}}{\sum_k \alpha_k A_{kj}}, \tag{16}$$

where each element $T_{ij}$ defines the transition probability of a random walker traversing edges of the structural connectivity matrix $A$, which are multiplied by a bias term $\alpha$. For random walkers attracted to brain regions of high CBF, the bias term $\alpha$ was defined as the average CBF value for each pair of brain regions. For random walkers repelled by regions of high CBF, the bias term $\alpha$ was defined as 1 minus the average CBF value for each pair of brain regions.
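A minimal sketch of Equation 16 follows. It rests on two assumptions that the text does not settle: the bias $\alpha$ is treated as one value per brain region, as the subscripts $\alpha_i$ and $\alpha_k$ in Equation 16 suggest, and CBF is assumed to be scaled to the interval [0, 1] so that the repulsion bias (1 minus CBF) stays positive. The function name `biased_transition_matrix` and the toy values are hypothetical.

```python
import numpy as np


def biased_transition_matrix(A, alpha):
    """Equation 16 read literally: T[i, j] = alpha[i] * A[i, j] / sum_k alpha[k] * A[k, j]."""
    A = np.asarray(A, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    numerator = alpha[:, np.newaxis] * A      # alpha_i * A_ij
    denominator = alpha @ A                   # sum_k alpha_k * A_kj, one value per column j
    if np.any(denominator == 0):
        raise ValueError("each column needs a positive biased strength")
    return numerator / denominator[np.newaxis, :]


A_toy = np.array([[0.0, 2.0, 1.0],
                  [2.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])
cbf_toy = np.array([0.2, 0.5, 0.8])           # hypothetical CBF scaled to [0, 1]

T_attract = biased_transition_matrix(A_toy, cbf_toy)
T_repel = biased_transition_matrix(A_toy, 1.0 - cbf_toy)
T_uniform = biased_transition_matrix(A_toy, np.ones(3))
```

With a uniform bias the matrix reduces to the unbiased column-normalized form, while the attraction and repulsion biases shift probability toward or away from the region with the highest CBF.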
Hence, a random walker propagates over the brain's structural connections with transition probabilities $T_{ij}$ that reflect the integrity of structural connections and the average level of CBF between pairs of brain regions. We then substituted the $U$ matrix in the resources $r_{ij}(\eta)$ of Equation 12 with $T_{ij}$ in Equation 16 to compute the number of resources required for a biased random walker to propagate by the shortest path with a specified probability.

.5.8 Rich Club

Due to the importance of brain network hubs in the broadcasting of a signal [3, 75, 16], we sought to identify the set of high-degree brain regions in the rich club (see Figure 8A) [76]. To identify the subnetwork of rich club brain regions, we computed the weighted rich club coefficient $\phi_z(k)$ as:

$$\phi_z(k) = \frac{Z_{>k}}{\sum_{l=1}^{E_{>k}} z^{\mathrm{ranked}}_{l}}, \tag{17}$$

where $z^{\mathrm{ranked}}$ is a vector of ranked network weights, $k$ is the degree, $Z_{>k}$ is the set of edges connecting the group of nodes with degree greater than $k$, and $E_{>k}$ is the number of edges connecting the group of nodes with degree greater than $k$. Hence, the rich-club coefficient $\phi_z(k)$ is the ratio between the set of edge weights connected to nodes with degree greater than $k$ and the strongest $E_{>k}$ connections. The rich-club coefficient was normalized by comparison to the rich club coefficient of random networks [76]. Random networks were created by rewiring the edges of each individual's brain network while preserving the degree distribution. The rich-club coefficient for the randomized networks $\phi_{\mathrm{random}}(k)$ was computed using Equation 17. Then, the normalized rich-club coefficient $\phi_{\mathrm{norm}}(k)$ was calculated as follows:

$$\phi_{\mathrm{norm}}(k) = \frac{\phi(k)}{\phi_{\mathrm{random}}(k)}, \tag{18}$$

where $\phi_{\mathrm{norm}}(k) >$ [...] $\phi_{\mathrm{norm}}(k)$ using a 1-sample $t$-test at each level of $k$, with family-wise error correction for multiple tests over $k$. Each individual was assigned the value of their highest degree $> k$ rich club level and their nodes were ranked by rich club level.
Over the group of individuals, the nodal ranks were averaged and the top 12% of nodes were selected as the rich club, following prior work [77].

Random graphs are commonly used in network science to test the statistical significance of the role of some network topology against null models. We used randomly rewired graphs generated by shuffling each individual's empirical networks 20 times, as in prior work [78]. Furthermore, we generated Erdős-Rényi random networks for each individual brain network, where the presence or absence of an edge was generated by a uniform probability calculated as the density of edges existing in the corresponding brain network. Edge weights were randomly sampled from the edge weight distribution of the brain network. While the randomly rewired graphs retain empirical properties such as the degree and edge weight distributions of the individual brain networks, the Erdős-Rényi networks do not. Hence, the randomly rewired null network was used in all analyses where the degree distribution should be retained (e.g., normalized rich club coefficient), while the Erdős-Rényi network was used in analyses assessing the overall contribution of the brain network topology (e.g., compression efficiency).

Our tests using the randomly rewired network evaluate the null hypothesis that an apparent rich-club property of brain networks is a trivial result of topology characteristic of random networks with some empirical properties preserved, as in prior work [76, 75]. The alternative hypothesis is that the brain network has a rich-club organization beyond the level expected in the random networks. Our tests using the Erdős-Rényi network evaluate the null hypothesis that the rates in the rate-distortion function modeling information processing capacity in brain networks do not differ from the rates in the rate-distortion function of random networks.
The alternative hypothesis is that the rate of the brain network's rate-distortion function differs from that of random networks, consistent with the notion that Erdős-Rényi networks have a greater prevalence of shortest paths compared to brain networks. We additionally used the Erdős-Rényi network to assess the hypothesis of rate-distortion theory that synthetic networks should exhibit the same information processing trade-offs (the monotonic rate-distortion gradient) as empirical brain networks [25]. We selected Erdős-Rényi networks to assess these hypotheses for two reasons. First, Erdős-Rényi networks do not retain core architectures of brain networks, such as modularity, and therefore reflect an extreme synthetic network. Second, Erdős-Rényi networks are commonly used as a benchmark for assessing shortest path prevalence due to the prominence of uniformly distributed direct pairwise connections [39, 16]. In light of the central assumption that shortest paths represent the route of highest signal fidelity in our definition of distortion, we used Erdős-Rényi networks to verify our intuition that compression efficiency should be greater in the Erdős-Rényi network than in brain networks.
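The construction of the Erdős-Rényi null network described above can be sketched as follows. This is an illustration under assumptions, not the code used in the study: the network is treated as undirected, each node pair is connected independently with probability equal to the empirical edge density, and weights are drawn with replacement from the empirical nonzero weights. The function name `erdos_renyi_null`, the toy matrix, and the random seed are hypothetical.

```python
import numpy as np


def erdos_renyi_null(W, rng):
    """Density-matched random network with weights resampled from W."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    iu = np.triu_indices(n, k=1)              # upper triangle: each undirected pair once
    empirical = W[iu]
    weights = empirical[empirical > 0]        # empirical edge-weight distribution
    density = weights.size / empirical.size   # fraction of node pairs that are connected
    present = rng.random(empirical.size) < density
    sampled = rng.choice(weights, size=empirical.size, replace=True)
    null = np.zeros_like(W)
    null[iu] = np.where(present, sampled, 0.0)
    return null + null.T                      # symmetrize; diagonal stays zero


rng = np.random.default_rng(0)
W_emp = np.array([[0.0, 3.0, 0.0, 1.0],
                  [3.0, 0.0, 2.0, 0.0],
                  [0.0, 2.0, 0.0, 4.0],
                  [1.0, 0.0, 4.0, 0.0]])
W_null = erdos_renyi_null(W_emp, rng)
```

Because edges are placed independently, the degree distribution of the empirical network is not preserved, which is the property that distinguishes this null model from the randomly rewired graphs.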

To assess the covariation of our measurements across individuals and brain regions, we used generalized additive models (GAMs) with penalized splines. GAMs allow for statistically rigorous modeling of linear and non-linear effects while minimizing over-fitting [64]. Throughout, the potential for confounding effects was addressed in our model by including covariates for age, sex, age-by-sex interaction, network degree, network density, and in-scanner motion.

.7.1 Metabolic running costs associated with brain network architectures

We used penalized splines to estimate the nonlinear developmental patterns of global efficiency (Equation 4) and CBF, as in prior work [32, 22]. Then, we assessed the partial correlation between the residual variance (unexplained by covariates of age, sex, age-by-sex interaction, degree, density, and motion) of global efficiency and CBF. The final models can be written as:

Global efficiency ∼ spline(age) + sex + spline(age by sex) + degree + density + motion, (19)

CBF ∼ spline(age) + sex + spline(age by sex) + degree + density + motion, (20)

and

Residual(Global efficiency) ∼ residual(CBF). (21)

To evaluate the importance of age as a confound for the relationship between global efficiency and CBF, we also performed sensitivity analyses by removing selected covariates and re-assessing the model. In addition, for consistency with prior work [28], we performed the same analysis including covariates for gray matter volume and density.

To assess the relationship between CBF and the strength of structural paths supporting diffusion (Equation 6), we again used penalized splines. The final model can be written as:

CBF ∼ path strength + spline(age) + sex + spline(age by sex) + degree + density + motion. (22)

Assessments of path strengths were corrected for false discovery rate across the statistical tests performed over the discrete path lengths.

Next, we sought to evaluate the metabolic running cost of brain network properties, in line with calls for investigation of the economic landscape of resource-constrained trade-offs between hallmark brain network architectures such as modularity (Equation 9) and new measures of brain network organization [3]. Following our findings that CBF is associated with structural properties supporting diffusion, we investigated path transitivity (Equation 8). We continued to use penalized splines to model the non-linear patterns of CBF and brain properties of interest.
The final model can be written as:

CBF ∼ path transitivity + modularity + path transitivity by modularity + spline(age) + sex + spline(age by sex) + degree + density + motion. (23)

To visualize the landscape of CBF as a function of modularity and path transitivity, we plotted the GAM model response function. We described the distribution of modularity and path transitivity across individuals using frequency histograms.

To assess the possibility of distinct compression efficiency of brain networks compared to random networks, we calculated the resource efficiency (Equation 12) at 14 levels of distortion and performed an analysis of variance (ANOVA) test. The ANOVA model can be written as:

Resources ∼ distortion + type of network, (24)

where the type of the network is a categorical variable designating if the network was a brain network or a random network.

To compute compression efficiency per individual brain network, we used a polynomial regression function to find the best linear fit to the monotonic rate-distortion function according to the prediction of a linear rate-distortion gradient in semi-log space (log(resources) as a function of distortion). Next, we used a GAM model to assess the non-linear patterns of compression efficiency in development, which we can formally write as follows:

Compression efficiency ∼ spline(age) + sex + spline(age by sex) + degree + density + motion. (25)

To compute the compression efficiency of chemotactic diffusion, we modified the model of Equation 24 to instead calculate resource efficiency using the biased random walk matrices from Equation 16. The model can be written:

Resources ∼ distortion + type of random walk + distortion by type of random walk, (26)

where the type of random walk is a categorical variable designating unbiased random walks using the structural network, attraction-biased random walks using the structural network biased with CBF, and repulsion-biased random walks using the structural network biased with (1 minus CBF).
To assess the hypothesis that resources differ according to the type of random walk, we performed $t$-tests while controlling for family-wise error rate across multiple comparisons.

Next, we sought to test the predictions of a high or low fidelity communication regime. In a high fidelity regime, minimum resources given an expected distortion should increase monotonically as a function of network complexity. To assess whether the relationship between resources and network complexity (operationalized here as network size) is monotonic, we used a linear model written as:

Log(resources) ∼ log(network size). (27)

In a low fidelity regime, minimum resources given an expected distortion should plateau as a function of complexity. We hypothesized that path transitivity is a property of structural networks that supports lossy compression and storage savings. The complexity of the shortest path was defined as the number of nodes contributing to path transitivity (Equation 8). To assess whether the resources non-linearly plateau as a function of shortest path complexity, we used a GAM model written as follows:

Log(resources) ∼ spline(shortest path complexity). (28)

To explore how compression efficiency might relate to patterns of cortical myelination and areal scaling, we assessed the Spearman's correlation coefficient between myelination or scaling and send or receive compression efficiency. To further test correspondence between brain maps, we used a spatial permutation test, which generates a null distribution of randomly rotated brain maps that preserve the spatial covariance structure of the original data [79]. We refer to the $p$-value of this statistical test as $p_{SPIN}$. Finally, we applied the conservative Holm-Bonferroni correction for family-wise error across these tests.
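The Holm-Bonferroni step-down correction mentioned above can be sketched as follows. This is a generic illustration of the standard procedure, assuming a plain list of $p$-values; the function name `holm_bonferroni` and the example values are hypothetical and are not the $p_{SPIN}$ values from the analysis.

```python
import numpy as np


def holm_bonferroni(p_values, alpha=0.05):
    """Return Holm-adjusted p-values and reject decisions at level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                          # ascending p-values
    scaled = (m - np.arange(m)) * p[order]         # smallest p times m, next times m - 1, ...
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted              # restore the input order
    return adjusted, adjusted <= alpha


p_raw = [0.01, 0.04, 0.03, 0.005]                  # hypothetical p-values
p_adj, reject = holm_bonferroni(p_raw)
```

The running maximum enforces the step-down rule: once one ordered test fails to reach significance, no test with a larger raw $p$-value can be declared significant.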

Given the assumed integrative and broadcasting function of rich-club hubs, we sought to evaluate whether compression efficiency differed in rich-club hubs compared to other brain regions. We used the Wilcoxon rank-sum test to compare regional compression efficiency of either receiving or sending messages. Moreover, we assessed whether there was a difference between CBF in the rich-club hubs compared to other brain regions. Lastly, we tested the correlation of compression efficiency in the rich-club hubs and other brain regions with cognitive efficiency. To model non-linear patterns of cognitive efficiency, we used penalized splines controlling for potentially confounding covariates. The final model can be written as:

Cognitive efficiency ∼ compression efficiency + spline(age) + sex + spline(age by sex) + degree + density + motion. (29)

Given previous reports of a relationship between cognition and global efficiency (Equation 1), we determined that compression efficiency and global efficiency were not collinear and therefore conducted a sensitivity analysis including global efficiency as a covariate. The model was written as:

Cognitive efficiency ∼ compression efficiency + global efficiency + spline(age) + sex + spline(age by sex) + degree + density + motion. (30)

Recent work in neuroscience and other fields has identified a bias in citation practices such that papers from women and other minorities are under-cited relative to the number of such papers in the field [80, 81, 82, 83, 84, 85]. Here we sought to proactively consider choosing references that reflect the diversity of the field in thought, form of contribution, gender, and other factors. We used automatic classification of gender based on the first names of the first and last authors [80], with possible combinations including male/male, male/female, female/male, female/female.
Excluding self-citations to the senior authors of our current paper, the references contain 58.0% male/male, 8.7% male/female, 21.7% female/male, 7.2% female/female, and 4.3% unknown categorization. We look forward to future work that could help us to better understand how to support equitable practices in science.

References

[1] M. J. West-Eberhard, Developmental plasticity and evolution. Oxford University Press, 2003.
[2] S. B. Laughlin, “Energy as a constraint on the coding and processing of sensory information,” Current opinion in neurobiology, vol. 11, no. 4, pp. 475–480, 2001.
[3] E. Bullmore and O. Sporns, “The economy of brain network organization,” Nature Reviews Neuroscience, vol. 13, no. 5, p. 336, 2012.
[4] R. L. Buckner and F. M. Krienen, “The evolution of distributed association networks in the human brain,” Trends in cognitive sciences, vol. 17, no. 12, pp. 648–665, 2013.
[5] A. Avena-Koenigsberger, J. Goñi, R. Solé, and O. Sporns, “Network morphospace,” Journal of the Royal Society Interface, vol. 12, no. 103, p. 20140881, 2015.
[6] K. J. Whitaker, P. E. Vértes, R. Romero-Garcia, F. Váša, M. Moutoussis, G. Prabhu, N. Weiskopf, M. F. Callaghan, K. Wagstyl, T. Rittman, et al., “Adolescence is associated with genomically patterned consolidation of the hubs of the human brain connectome,” Proceedings of the National Academy of Sciences, vol. 113, no. 32, pp. 9105–9110, 2016.
[7] P. Reardon, J. Seidlitz, S. Vandekar, S. Liu, R. Patel, M. T. M. Park, A. Alexander-Bloch, L. S. Clasen, J. D. Blumenthal, F. M. Lalonde, et al., “Normative brain size variation and brain shape diversity in humans,” Science, vol. 360, no. 6394, pp. 1222–1227, 2018.
[8] A. Di Martino, D. A. Fair, C. Kelly, T. D. Satterthwaite, F. X. Castellanos, M. E. Thomason, R. C. Craddock, B. Luna, B. L. Leventhal, X.-N. Zuo, et al., “Unraveling the miswired connectome: a developmental perspective,” Neuron, vol. 83, no. 6, pp. 1335–1353, 2014.
[9] N. A. Crossley, A. Mechelli, J. Scott, F. Carletti, P. T. Fox, P. McGuire, and E. T. Bullmore, “The hubs of the human connectome are generally implicated in the anatomy of brain disorders,” Brain, vol. 137, no. 8, pp. 2382–2395, 2014.
[10] L. L. Gollo, J. A. Roberts, V. L. Cropley, M. A. Di Biase, C. Pantelis, A. Zalesky, and M. Breakspear, “Fragility and volatility of structural hubs in the human connectome,” Nature neuroscience, vol. 21, no. 8, p. 1107, 2018.
[11] A. N. Kaczkurkin, T. M. Moore, M. E. Calkins, R. Ciric, J. A. Detre, M. A. Elliott, E. B. Foa, A. G. de la Garza, D. R. Roalf, A. Rosen, et al., “Common and dissociable regional cerebral blood flow differences associate with dimensions of psychopathology across categorical diagnoses,” Molecular psychiatry, vol. 23, no. 10, p. 1981, 2018.
[12] J. Goni, A. Avena-Koenigsberger, N. V. de Mendizabal, M. P. van den Heuvel, R. F. Betzel, and O. Sporns, “Exploring the morphospace of communication efficiency in complex networks,” PLoS One, vol. 8, no. 3, p. e58070, 2013.
[13] J. Goñi, M. P. van den Heuvel, A. Avena-Koenigsberger, N. V. de Mendizabal, R. F. Betzel, A. Griffa, P. Hagmann, B. Corominas-Murtra, J.-P. Thiran, and O. Sporns, “Resting-brain functional connectivity predicted by analytic measures of network communication,” Proceedings of the National Academy of Sciences, vol. 111, no. 2, pp. 833–838, 2014.
[14] B. Mišić, R. F. Betzel, A. Nematzadeh, J. Goni, A. Griffa, P. Hagmann, A. Flammini, Y.-Y. Ahn, and O. Sporns, “Cooperative and competitive spreading dynamics on the human connectome,” Neuron, vol. 86, no. 6, pp. 1518–1529, 2015.
[15] A. Avena-Koenigsberger, B. Mišić, R. X. Hawkins, A. Griffa, P. Hagmann, J. Goñi, and O. Sporns, “Path ensembles and a tradeoff between communication efficiency and resilience in the human connectome,” Brain Structure and Function, vol. 222, no. 1, pp. 603–618, 2017.
[16] A. Avena-Koenigsberger, B. Misic, and O. Sporns, “Communication dynamics in complex brain networks,” Nature Reviews Neuroscience, vol. 19, no. 1, p. 17, 2018.
[17] J. Stiso and D. S. Bassett, “Spatial embedding imposes constraints on neuronal network architectures,” Trends in cognitive sciences, 2018.
[18] W. B. Levy and R. A. Baxter, “Energy efficient neural codes,” Neural computation, vol. 8, no. 3, pp. 531–543, 1996.
[19] E. Tang, M. G. Mattar, C. Giusti, D. M. Lydon-Staley, S. L. Thompson-Schill, and D. S. Bassett, “Effective learning is accompanied by high-dimensional and efficient representations of neural activity,” Nature neuroscience, vol. 22, no. 6, p. 1000, 2019.
[20] J. M. Shine, M. Breakspear, P. T. Bell, K. A. E. Martens, R. Shine, O. Koyejo, O. Sporns, and R. A. Poldrack, “Human cognition involves the dynamic integration of neural activity and neuromodulatory systems,” Nature neuroscience, vol. 22, no. 2, p. 289, 2019.
[21] M. L. Mack, A. R. Preston, and B. C. Love, “Ventromedial prefrontal cortex compression during concept learning,” Nature Communications, vol. 11, no. 1, pp. 1–11, 2020.
[22] G. L. Baum, R. Ciric, D. R. Roalf, R. F. Betzel, T. M. Moore, R. T. Shinohara, A. E. Kahn, S. N. Vandekar, P. E. Rupert, M. Quarmley, et al., “Modular segregation of structural brain networks supports the development of executive function in youth,” Current Biology, vol. 27, no. 11, pp. 1561–1572, 2017.
[23] T. D. Satterthwaite, M. A. Elliott, K. Ruparel, J. Loughead, K. Prabhakaran, M. E. Calkins, R. Hopson, C. Jackson, J. Keefe, M. Riley, et al., “Neuroimaging of the philadelphia neurodevelopmental cohort,” Neuroimage, vol. 86, pp. 544–553, 2014.
[24] R. C. Gur, J. D. Ragland, M. Reivich, J. H. Greenberg, A. Alavi, and R. E. Gur, “Regional differences in the coupling between resting cerebral blood flow and metabolism may indicate action preparedness as a default state,” Cerebral Cortex, vol. 19, no. 2, pp. 375–382, 2008.
[25] C. R. Sims, “Efficient coding explains the universal law of generalization in human perception,” Science, vol. 360, no. 6389, pp. 652–656, 2018.
[26] T. Berger, Rate distortion theory: a mathematical basis for data compression. Prentice-Hall, Englewood Cliffs, New Jersey, 1971.
[27] S. E. Marzen and S. DeDeo, “The evolution of lossy compression,” Journal of The Royal Society Interface, vol. 14, no. 130, p. 20170166, 2017.
[28] B. Várkuti, M. Cavusoglu, A. Kullik, B. Schiffler, R. Veit, Ö. Yilmaz, W. Rosenstiel, C. Braun, K. Uludag, N. Birbaumer, et al., “Quantifying the link between anatomical connectivity, gray matter volume and regional cerebral blood flow: an integrative mri study,” PLoS One, vol. 6, no. 4, p. e14801, 2011.
[29] M. P. van den Heuvel, R. S. Kahn, J. Goñi, and O. Sporns, “High-cost, high-capacity backbone for global brain communication,” Proceedings of the National Academy of Sciences, vol. 109, no. 28, pp. 11372–11377, 2012.
[30] M. P. Van Den Heuvel, C. J. Stam, R. S. Kahn, and H. E. H. Pol, “Efficiency of functional brain networks and intellectual performance,” Journal of Neuroscience, vol. 29, no. 23, pp. 7619–7624, 2009.
[31] X. Liang, Q. Zou, Y. He, and Y. Yang, “Coupling of functional connectivity and regional cerebral blood flow reveals a physiological basis for network hubs of the human brain,” Proceedings of the National Academy of Sciences, vol. 110, no. 5, pp. 1929–1934, 2013.
[32] T. D. Satterthwaite, R. T. Shinohara, D. H. Wolf, R. D. Hopson, M. A. Elliott, S. N. Vandekar, K. Ruparel, M. E. Calkins, D. R. Roalf, E. D. Gennatas, et al., “Impact of puberty on the evolution of cerebral perfusion during adolescence,” Proceedings of the National Academy of Sciences, vol. 111, no. 23, pp. 8643–8648, 2014.
[33] V. Latora and M. Marchiori, “Efficient behavior of small-world networks,” Physical review letters, vol. 87, no. 19, p. 198701, 2001.
[34] M. F. Glasser, M. S. Goyal, T. M. Preuss, M. E. Raichle, and D. C. Van Essen, “Trends and properties of human cerebral cortex: correlations with cortical myelin content,” Neuroimage, vol. 93, pp. 165–175, 2014.
[35] M. Bertolero, B. Yeo, and M. D’esposito, “The diverse club,” Nature communications, vol. 8, no. 1, p. 1277, 2017.
[36] S. Achard and E. Bullmore, “Efficiency and cost of economical brain functional networks,” PLoS computational biology, vol. 3, no. 2, p. e17, 2007.
[37] H. Johansen-Berg, “Behavioural relevance of variation in white matter microstructure,” Current opinion in neurology, vol. 23, no. 4, pp. 351–358, 2010.
[38] M. Rubinov, “Constraints and spandrels of interareal connectomes,” Nature communications, vol. 7, p. 13812, 2016.
[39] O. Sporns, “Network attributes for segregation and integration in the human brain,” Current opinion in neurobiology, vol. 23, no. 2, pp. 162–171, 2013.
[40] G. Hahn, A. Ponce-Alvarez, G. Deco, A. Aertsen, and A. Kumar, “Portraits of communication in neuronal networks,” Nature Reviews Neuroscience, vol. 20, no. 2, pp. 117–127, 2019.
[41] A. Avena-Koenigsberger, X. Yan, A. Kolchinsky, M. P. van den Heuvel, P. Hagmann, and O. Sporns, “A spectrum of routing strategies for brain networks,” PLOS Computational Biology, vol. 15, pp. 1–24, 03 2019.
[42] P. E. Vértes, A. Alexander-Bloch, and E. T. Bullmore, “Generative models of rich clubs in hebbian neuronal networks and large-scale human brain networks,” Philosophical Transactions of the Royal Society B: Biological Sciences, vol. 369, no. 1653, p. 20130531, 2014.
[43] S. Oldham and A. Fornito, “The development of brain network hubs,” Developmental cognitive neuroscience, 2018.
[44] E. Ferrer, K. J. Whitaker, J. S. Steele, C. T. Green, C. Wendelken, and S. A. Bunge, “White matter maturation supports the development of reasoning ability through its influence on processing speed,” Developmental Science, vol. 16, no. 6, pp. 941–951, 2013.
[45] J. Scholz, M. C. Klein, T. E. Behrens, and H. Johansen-Berg, “Training induces changes in white-matter architecture,” Nature neuroscience, vol. 12, no. 11, p. 1370, 2009.
[46] S. G. Vij, J. S. Nomi, D. R. Dajani, and L. Q. Uddin, “Evolution of spatial and temporal features of functional brain networks across the lifespan,” NeuroImage, vol. 173, pp. 498–508, 2018.
[47] T. Watanabe, S. Hirose, H. Wada, Y. Imai, T. Machida, I. Shirouzu, S. Konishi, Y. Miyashita, and N. Masuda, “A pairwise maximum entropy model accurately describes resting-state human brain networks,” Nature communications, vol. 4, p. 1370, 2013.
[48] A. Zalesky, A. Fornito, L. Cocchi, L. L. Gollo, M. P. van den Heuvel, and M. Breakspear, “Connectome sensitivity or specificity: which is more important?,” Neuroimage, vol. 142, pp. 407–420, 2016.
[49] F. Lieder and T. L. Griffiths, “Resource-rational analysis: understanding human cognition as the optimal use of limited computational resources,” Behavioral and Brain Sciences, pp. 1–85, 2019.
[50] A. C. Schapiro, N. B. Turk-Browne, M. M. Botvinick, and K. A. Norman, “Complementary learning systems within the hippocampus: a neural network modelling approach to reconciling episodic memory with statistical learning,” Philosophical Transactions of the Royal Society B: Biological Sciences, vol. 372, no. 1711, p. 20160049, 2017.
[51] V. Balasubramanian, D. Kimber, and M. J. B. Ii, “Metabolically efficient information processing,” Neural computation, vol. 13, no. 4, pp. 799–815, 2001.
[52] L. M. Hernandez, J. D. Rudie, S. A. Green, S. Bookheimer, and M. Dapretto, “Neural signatures of autism spectrum disorders: insights into brain network dynamics,” Neuropsychopharmacology, vol. 40, no. 1, p. 171, 2015.
[53] I. Dinstein, D. J. Heeger, L. Lorenzi, N. J. Minshew, R. Malach, and M. Behrmann, “Unreliable evoked responses in autism,” Neuron, vol. 75, no. 6, pp. 981–991, 2012.
[54] D. R. Roalf, M. Quarmley, M. A. Elliott, T. D. Satterthwaite, S. N. Vandekar, K. Ruparel, E. D. Gennatas, M. E. Calkins, T. M. Moore, R. Hopson, et al., “The impact of quality assurance assessment on diffusion tensor imaging outcomes in a large-scale population-based cohort,” Neuroimage, vol. 125, pp. 903–919, 2016.
[55] T. M. Moore, S. P. Reise, R. E. Gur, H. Hakonarson, and R. C. Gur, “Psychometric properties of the penn computerized neurocognitive battery,” Neuropsychology, vol. 29, no. 2, p. 235, 2015.
[56] E. Tang, C. Giusti, G. L. Baum, S. Gu, E. Pollock, A. E. Kahn, D. R. Roalf, T. M. Moore, K. Ruparel, R. C. Gur, et al., “Developmental increases in white matter network controllability support a growing diversity of brain dynamics,” Nature communications, vol. 8, no. 1, p. 1252, 2017.
[57] M. F. Glasser, T. S. Coalson, E. C. Robinson, C. D. Hacker, J. Harwell, E. Yacoub, K. Ugurbil, J. Andersson, C. F. Beckmann, M. Jenkinson, et al., “A multi-modal parcellation of human cerebral cortex,” Nature, vol. 536, no. 7615, p. 171, 2016.
[58] L. Cammoun, X. Gigandet, D. Meskaldji, J. P. Thiran, O. Sporns, K. Q. Do, P. Maeder, R. Meuli, and P. Hagmann, “Mapping the human connectome at multiple scales with diffusion spectrum mri,” Journal of neuroscience methods, vol. 203, no. 2, pp. 386–397, 2012.
[59] F.-C. Yeh, T. D. Verstynen, Y. Wang, J. C. Fernández-Miranda, and W.-Y. I. Tseng, “Deterministic diffusion fiber tracking improved by quantitative anisotropy,” PloS one, vol. 8, no. 11, p. e80713, 2013.
[60] Z. Wang, G. K. Aguirre, H. Rao, J. Wang, M. A. Fernández-Seara, A. R. Childress, and J. A. Detre, “Empirical optimization of asl data analysis using an asl data processing toolbox: Asltbx,” Magnetic resonance imaging, vol. 26, no. 2, pp. 261–269, 2008.
[61] W.-C. Wu, V. Jain, C. Li, M. Giannetta, H. Hurt, F. W. Wehrli, and D. J. Wang, “In vivo venous blood t1 measurement using inversion recovery true-fisp in children and adults,”

Magnetic resonance inmedicine , vol. 64, no. 4, pp. 1140–1147, 2010.[62] V. Jain, J. Duda, B. Avants, M. Giannetta, S. X. Xie, T. Roberts, J. A. Detre, H. Hurt, F. W. Wehrli,and D. J. Wang, “Longitudinal reproducibility and accuracy of pseudo-continuous arterial spin–labeledperfusion mr imaging in typically developing children,”

Radiology , vol. 263, no. 2, pp. 527–536, 2012.4463] M. F. Glasser and D. C. Van Essen, “Mapping human cortical areas in vivo based on myelin content asrevealed by t1-and t2-weighted mri,”

Journal of Neuroscience , vol. 31, no. 32, pp. 11597–11616, 2011.[64] S. N. Wood, “Stable and eﬃcient multiple smoothing parameter estimation for generalized additivemodels,”

Journal of the American Statistical Association , vol. 99, no. 467, pp. 673–686, 2004.[65] C. O. Becker, S. Pequito, G. J. Pappas, M. B. Miller, S. T. Grafton, D. S. Bassett, and V. M. Preciado,“Spectral mapping of brain functional connectivity from diﬀusion imaging,”

Scientiﬁc reports , vol. 8,no. 1, p. 1411, 2018.[66] O. Sporns and R. F. Betzel, “Modular brain networks,”

Annual review of psychology , vol. 67, pp. 613–640, 2016.[67] D. S. Bassett, D. L. Greenﬁeld, A. Meyer-Lindenberg, D. R. Weinberger, S. W. Moore, and E. T.Bullmore, “Eﬃcient physical embedding of topologically complex information processing networks inbrains and computer circuits,”

PLoS computational biology , vol. 6, no. 4, p. e1000748, 2010.[68] D. S. Bassett, N. F. Wymbs, M. A. Porter, P. J. Mucha, J. M. Carlson, and S. T. Grafton, “Dynamicreconﬁguration of human brain networks during learning,”

Proceedings of the National Academy ofSciences , vol. 108, no. 18, pp. 7641–7646, 2011.[69] M. A. Bertolero, B. T. Yeo, and M. D’Esposito, “The modular and integrative functional architecture ofthe human brain,”

Proceedings of the National Academy of Sciences , vol. 112, no. 49, pp. E6798–E6807,2015.[70] M. E. Newman, “Modularity and community structure in networks,”

Proceedings of the nationalacademy of sciences , vol. 103, no. 23, pp. 8577–8582, 2006.[71] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in largenetworks,”

Journal of statistical mechanics: theory and experiment , vol. 2008, no. 10, p. P10008, 2008.[72] “Chapter 7 - paths, diﬀusion, and navigation,” in

Fundamentals of Brain Network Analysis (A. Fornito,A. Zalesky, and E. T. Bullmore, eds.), pp. 207 – 255, San Diego: Academic Press, 2016.[73] C. R. Sims, “Rate–distortion theory and human perception,”

Cognition , vol. 152, pp. 181–198, 2016.[74] W. Alt, “Biased random walk models for chemotaxis and related diﬀusion approximations,”

Journal ofmathematical biology , vol. 9, no. 2, pp. 147–177, 1980.[75] M. P. van den Heuvel and O. Sporns, “Network hubs in the human brain,”

Trends in cognitive sciences ,vol. 17, no. 12, pp. 683–696, 2013.[76] V. Colizza, A. Flammini, M. A. Serrano, and A. Vespignani, “Detecting rich-club ordering in complexnetworks,”

Nature physics , vol. 2, no. 2, p. 110, 2006.[77] G. Collin, O. Sporns, R. C. Mandl, and M. P. van den Heuvel, “Structural and functional aspectsrelating to cost and beneﬁt of rich club organization in the human cerebral cortex,”

Cerebral cortex ,vol. 24, no. 9, pp. 2258–2267, 2013.[78] S. Maslov and K. Sneppen, “Speciﬁcity and stability in topology of protein networks,”

Science , vol. 296,no. 5569, pp. 910–913, 2002.[79] A. F. Alexander-Bloch, H. Shou, S. Liu, T. D. Satterthwaite, D. C. Glahn, R. T. Shinohara, S. N.Vandekar, and A. Raznahan, “On testing for spatial correspondence between maps of human brainstructure and function,”

Neuroimage , vol. 178, pp. 540–551, 2018.[80] J. D. Dworkin, K. A. Linn, E. G. Teich, P. Zurn, R. T. Shinohara, and D. S. Bassett, “The extent anddrivers of gender imbalance in neuroscience reference lists,” bioRxiv , 2020.4581] D. Maliniak, R. Powers, and B. F. Walter, “The gender citation gap in international relations,”

Inter-national Organization , vol. 67, no. 4, pp. 889–922, 2013.[82] N. Caplar, S. Tacchella, and S. Birrer, “Quantitative evaluation of gender bias in astronomical publica-tions from citation counts,”

Nature Astronomy , vol. 1, no. 6, p. 0141, 2017.[83] P. Chakravartty, R. Kuo, V. Grubbs, and C. McIlwain, “

Journal of Com-munication , vol. 68, no. 2, pp. 254–266, 2018.[84] Y. Thiem, K. F. Sealey, A. E. Ferrer, A. M. Trott, and R. Kennison, “Just Ideas? The Status andFuture of Publication Ethics in Philosophy: A White Paper,” tech. rep., 2018.[85] M. L. Dion, J. L. Sumner, and S. M. Mitchell, “Gendered citation patterns across political science andsocial science methodology ﬁelds,”