[PDF] Cerebral cortical communication overshadows computational energy-use, but these combine to predict synapse number

Abstract

Darwinian evolution tends to produce energy-efficient outcomes. On the other hand, energy limits computation, be it neural and probabilistic or digital and logical. Taking a particular energy-efficient viewpoint, we define neural computation and make use of an energy-constrained, computational function. This function can be optimized over a variable that is proportional to the number of synapses per neuron. This function also implies a specific distinction between ATP-consuming processes, especially computation \textit{per se} vs the communication processes including action potentials and transmitter release. Thus to apply this mathematical function requires an energy audit with a partitioning of energy consumption that differs from earlier work. The audit points out that, rather than the oft-quoted 20 watts of glucose available to the brain \cite{sokoloff1960metabolism,sawada2013synapse}, the fraction partitioned to cortical computation is only 0.1 watts of ATP. On the other hand at 3.5 watts, long-distance communication costs are 35-fold greater. Other novel quantifications include (i) a finding that the biological vs ideal values of neural computational efficiency differ by a factor of 10^8 and (ii) two predictions of N, the number of synaptic transmissions needed to fire a neuron (2500 vs 2000).

Full PDF

CCerebral cortical communication overshadowscomputational energy-use, but these combine topredict synapse number

William B Levy a,1 and Victoria G. Calvert b a Department of Neurosurgery and Department of Psychology; University of Virginia; Charlottesville, VA 22903; USA; b College of Arts and Sciences; University of Virginia;Charlottesville, VA 22903; USAThis manuscript was compiled on February 15, 2021

Darwinian evolution tends to produce energy-efﬁcient outcomes. Onthe other hand, energy limits computation, be it neural and prob-abilistic or digital and logical. Taking a particular energy-efﬁcientviewpoint, we deﬁne neural computation and make use of an energy-constrained, computational function. This function can be optimizedover a variable that is proportional to the number of synapses perneuron. This function also implies a speciﬁc distinction betweenATP-consuming processes, especially computation per se vs thecommunication processes including action potentials and transmit-ter release. Thus to apply this mathematical function requires an en-ergy audit with a partitioning of energy consumption that differs fromearlier work. The audit points out that, rather than the oft-quoted 20watts of glucose available to the brain (1, 2), the fraction partitionedto cortical computation is only 0.1 watts of ATP. On the other handat 3.5 watts, long-distance communication costs are 35-fold greater.Other novel quantiﬁcations include (i) a ﬁnding that the biological vsideal values of neural computational efﬁciency differ by a factor of and (ii) two predictions of N , the number of synaptic transmis-sions needed to ﬁre a neuron (2500 vs 2000). energy-eﬃcient | bits per joule | optimal computation | brain energyconsumption | neural computation T he purpose of the brain is to process information, butthat leaves us with the problem of ﬁnding appropriatedeﬁnitions of information processing. We assume that givenenough time and given a suﬃciently stable environment (e.g.,the common internals of the mammalian brain), then Nature’sconstructions approach an optimum. The problem is to ﬁndwhich function or combined set of functions are optimal whenwe incorporate empirical values into the function(s). Theinitial example in neuroscience is (3), which shows that infor-mation capacity is far from optimized, especially in comparisonto the optimal information per joule which is in much closeragreement with empirical values. Whenever we ﬁnd such anagreement between theory and experiment, we conclude thatthis optimization, or near optimization, is Nature’s perspective.Using this strategy, we and others seek quantiﬁed relationshipswith particular forms of information processing and requirethat these relationships are approximately optimal (3–9). Arecent theoretical development identiﬁes a candidate optimalcomputation at the level of single neurons (10). To applythis theory requires understanding certain neuronal energyexpenditures. Here the focus is on the energy budget of thehuman cerebral cortex and its primary neurons. The energyaudit here diﬀers from the premier earlier work (11) in twoways: the brain considered here is human not rodent, and theaudit here uses a partitioning motivated by the information-eﬃciency calculations rather than the classical partitions of cell biology and neuroscience (11). Importantly, our auditreveals greater joule-use by communication than by computa-tion. This observation in turn generates new insights into theoptimal synapse number. Speciﬁcally, the bits/J optimizedcomputation must provide a suﬃcient bits/sec to the axon andpresynaptic mechanism to justify the great expense of timelycommunication. Simply put from the optimization perspective,we assume evolution does not build a costly communicationsystem and then fails to supply it with an appropriate bits/secto justify its costs. The bits/J are optimized over N when N ≈ a priori N is assumed toequal the number of synapses per neuron times the successrate of synaptic transmission (an estimated 2500).To measure computation, and to partition out its cost, re-quires a suitable deﬁnition at the single neuron level. Ratherthan the generic deﬁnition ‘any signal transformation’ (5),or the neural-like ‘converting a multivariate signal to scalarsignal’, we conjecture a more detailed deﬁnition (10). To movetowards this deﬁnition, note two important brain functions:estimating what is present in the sensed world and predictingwhat will be present, including what will occur as the braincommands manipulations. Then, assume that such macro-scopic inferences arise by combining single neuron inferences.Such a neuron performs the same type of inference as themacroscopic problem; i.e., conjecture a neuron performingmicroscopic estimation (or prediction). Instead of sensing theworld, a neuron’s sensing is merely its capacitive chargingdue to recently active synapses. Using this sampling of total Signiﬁcance Statement

Engineers hold up the human brain as a low energy form ofcomputation. However from the simplest physical viewpoint, aneuron’s computation cost is remarkably larger than the bestpossible bits/J – off by a factor of . Here we explicate,in the context of energy consumption, a deﬁnition of neuralcomputation that is optimal given explicit constraints. The plau-sibility of this deﬁnition as Nature’s perspective is supportedby an energy-audit of the human brain. The audit itself re-quires certain novel perspectives and calculations revealingthat communication costs are 35-fold computational costs. WBL conceptualized the study, developed the theoretical aspects and their description. WBL andVC developed the energy-audit and its description.There are no competing interests E-mail: [email protected]

February 15, 2021 | vol. XXX | no. XX | a r X i v : . [ q - b i o . N C ] F e b MPAR NMDAR0.000.010.020.030.040.050.060.07W

Fig. 1.

Computation costs little compared to communication. Communication alone accounts for more ca. two thirds of the available 4.94 ATP-W (Table 1), with slightly moreconsumption due to WM than GM (big pie chart). Computation, the smallest consumer, is subpartitioned by the two ionotropic glutamate receptors (bargraph).

SynMod + includes astrocytic costs, process extension, process growth, axo- and dendro-plasmic transport of the membrane building blocks, and time-independent housekeeping costs(although this last contributor is a very small fraction). The small pie chart sub-partitions GM communication. See Results and Methods for details. WM communication includesits maintenance and myelination costs in addition to resting and action potentials. accumulated charge over a particular elapsed time, a neuronimplicitly estimates the value of its local latent variable, avariable deﬁned by evolution and developmental construction(10). Applying an optimization perspective, which includesimplicit Bayesian inference, a suﬃcient statistic, maximum-likelihood unbiasedness, as well as energy costs, (10) producesa quantiﬁed theory of single neuron computation. A resultof this theory is the deﬁnition of the optimal IPI probabilitydistribution. Motivating IPI coding is this fact: The use ofconstant amplitude signaling, e.g. action potentials, impliesthat all information can only be in IPIs. Therefore, no codecan outperform an IPI code, and it can only equal an IPI codein bit-rate if it is one-to-one with an IPI code. In neuroscience,an equivalent to IPI codes is the instantaneous rate code whereeach message is IP I − . In communication theory, a discreteform of IPI coding is called diﬀerential pulse position modula-tion (12); (13) explicitly introduced a continuous form of thiscoding as a neuron communication hypothesis, and it receivesfurther development in (14).The Results recall and further develop earlier work con-cerning a certain optimization that deﬁnes IPI probabilities(10). An energy audit is required to use these novel devel-opments. Combining the theory with the audit leads to twooutcomes: (i) the optimizing N serves as a consistency checkon the audit and (ii) future energy audits for individual celltypes will predict N for that cell type, a test of the theory.Specialized approximations here that are not present in ear-lier work (11) include the assumptions that (i) all neurons ofcortex are pyramidal neurons, (ii) pyramidal neurons are theinputs to pyramidal neurons, (iii) a neuron is under constantsynaptic bombardment, and (iv) a neuron’s capacitance mustbe charged 16 mV from reset potential to threshold to ﬁre.Following the audit, the reader is given a perspective thatmay be obvious to some, but it is rarely discussed and seem-ingly contradicts the engineering literature (but see (8)). Inparticular, a neuron is an incredibly ineﬃcient computationaldevice in comparison to a physical analog. It is not just a fewbits/J away from optimal, but oﬀ by a huge amount, a factorof 10 . The theory resolves the eﬃciency issue using a modiﬁedoptimization perspective. Activity dependent communicationand synaptic modiﬁcation costs force upward optimal com-putational costs. In turn, the bit-value of the computational energy expenditure is constrained to a central-limit like result:Every doubling of N can produce no more than 0.5 bits. Inaddition to (i) explaining the 10 excessive energy-use otherresults here include (ii) identifying the largest ‘noise’ sourcelimiting computation, which is the signal itself, and (iii) parti-tioning the relevant costs, which may help engineers redirectfocus towards computation and communication costs ratherthan the 20 W total brain consumption as their design goal. Results

Energy audit.

ATP use for computation and communication.

Microscopic energycosts are based on bottom up calculations for ATP consump-tion (11). 36,000 J/molATP, (15) implies watts. As derivedbelow, computation consumes ca. 0.1 ATP-watts/cortex or one two-hundredth of the nominal and oft quoted 20 watts thatwould be produced by complete oxidation of the glucose takenup by the brain (1). Fig 1 compares cortical communicationcosts to computational costs. Also appearing is an energy con-sumption labeled (

SynMod + ). What (16) calls ’housekeeping’is a hypothesis on its part; an alternative hypothesis is in-spired by and consistent with results from developing brain(17). This category seems to be dominated by costs consis-tent with synaptogenesis (e.g.,growth and process extensionvia actin polymerization and via new membrane incorpora-tion, membrane synthesis and its axo- and dendro-plasmictransport, and astrocytic costs); a small fraction of SynMod + is time-dependent ’maintenance’. Here SynMod + is calcu-lated by subtracting the bottom-up calculated communicationand computation ATP consumption from available ATP, atop-down empirical partitioning (Table 1).For some, the rather large cost of communication mightbe surprising but apparently is necessary for suitable signalvelocities and information rates (3, 18–21). Combining graymatter (GM) communication costs with the total white matter(WM) costs accounts for 71%, 3.52 W (Fig 2), of the total4.94 ATP-watts/cortex, compared to 2% for computation.Supposing that all WM costs are essentially communicationcosts (including oligodendrocyte/myelination costs), then theratio for communication vs computation is 35:1. et al. able 1. Rudimentary partitioning, glucose to ATP Brain/Region Watts Unoxidized Heat ATP-(weight) (complete (equivalent watts wattsoxidation) watts)whole brain (1495 g) 17.0 1.86 8.89 6.19cerebellum (154 g) 1.77 0.19 0.93 0.65other regions (118 g) 1.65 0.18 0.87 0.60forebrain cortex (1223 g): 4.94white (590 g) 5.07 0.56 2.66 1.85gray (633 g) 8.45 0.93 4.43 3.09

See Methods and SI Appendix (Tables) for details and citations

Computation costs in the human brain.

The energy needed to re-cover ion-gradients from the total excitatory synaptic current-ﬂows/IPI determines the cost of computation for that IPI.Various quantitative assumptions feeding into subsequent cal-culations are required (see Methods and SI Appendix), butnone are more important than the generic assumption thatthe average ﬁring-rate of each input to a neuron is the sameas the average ﬁring-rate out of that neuron (6). Via thisassumption, and assuming 10 synapses per neuron and a 75%failure rate, the aggregate eﬀects of inhibition, capacitance,and postsynaptic K + conductances are implicitly taken intoaccount. This aggregation is possible since increases of any ofthese parameters merely lead to smaller depolarizations persynaptic activation but cause little change in synaptic currentﬂow per excitatory synaptic event. Indeed, such attenuatingeﬀects are needed to make sense of several other variables. Aquick calculation helps illustrate this claim.An important starting point for computational energy costis the average number of excitatory synaptic activations to ﬁrea cortical neuron. Assume the neuron is a pyramidal neuronand that its excitatory inputs are other pyramidal neurons.Therefore, the mean ﬁring rate of this neuron is equal to themean ﬁring rate of each input. Thus, threshold will be thenumber of input synapses times quantal success rate (6); i.e.,ca. 10 · .

25 = 2500 = N because on average each inputﬁres once per IPI out. Even after accounting for quantalsynaptic failures, inhibition is required for consistency with2500 excitatory events propelling the 16 mV journey from resetto threshold. Activation of AMPARs and NMDARs providesan inﬂux of three Na + ’s for every two K + that ﬂow out. Withan average total AMPAR conductance of 200 pS, there are114.5 pS of Na + per synaptic activation (SA). Multiplyingthis conductance by the 110 mV driving force on Na + andby the 1.2 msec SA duration yields 15 . + inﬂux by 3 compensates for the 2 K + that ﬂowout for every 3 Na + that enter; thus, the net charge inﬂux is5.04 fC/SA. We assume that the voltage-activated, glutamate-primed NMDARs increases this net ﬂux by a factor of 1.5,yielding 7.56 fC/SA (see Methods and SI Appendix, Tables S3,S4, and S5 for details and ATP costs). Taking into account the2500 synaptic activations per IPI yields 18.9 pC/IPI. Using a750 pF value for a neuron’s capacitance, this amount of chargewould depolarize the membrane potential 25.2 mV rather thanthe desired 16 mV. Thus, the excitatory charge inﬂux mustbe opposed by inhibition and K + conductances to oﬀset thetotal 7.56 fC net positive inﬂux. Most simply, just assumea divisive inhibitory factor of 1.5. Then the numbers are all consistent, and the average depolarization is 6.4 µ V persynaptic activation. Because each net, accumulated chargerequires one ATP to return the three Na + ’s and 2 K + ’s, thecomputational cost of the 16 mV depolarization is 6 . · − J/neuron/spike. I. e., required computational power for eachneuron spike of cortex 6 . · − · . · = 0 .

10 W.

Communication costs.

As quantiﬁed in Methods (see also SIAppendix, Tables S3 and S5), the GM long-distance communi-cation cost of 1 .

67 W (Fig 1) includes the partitioned costs ofaxonal resting potential, APs, and presynaptic transmission(neurotransmitter recycling and packaging, vesicle recycling,and calcium extrusion). The neurotransmission costs assumea 1 Hz mean neuron ﬁring rate and a 75% failure rate. Nextusing (16), the calculation assumes one vesicle is releasedper non-failed AP. Diﬀering from (16) while closer to earlierwork (11), assume there is the same Ca-inﬂux with every AP(22). Further, also use a more recent measurement of Na + -K + overlapping current ﬂows of the AP, 2.38 (23). Of all thediﬃcult but inﬂuential estimates, none is more challengingand important than axonal surface area; see Methods. Firing Rate.

In regard to average ﬁring rate, we postulate anaverage value of one pulse per neuron per decision-makinginterval (DMI), which we assume as 1 sec. W a tt s Cortical Power Consumption vs. Firing Rate

WMComm + GMComm3.52 W @ SynMod ++ GMComm + Comp3.09W @ G M C o m m + C o m p G M C o m m Fig. 2.

Energy-use increases linearly with average ﬁring-rate, but for reasonablerates, computation (Comp) costs much less than communication (Comm). Comparingthe bottom (red) curve (GM communication costs) to the top (blue) curve (GM com-munication cost plus computational costs), illustrates how little computational costsincrease relative to communication costs. The y-intercept value is 1.09 W for restingpotential. The unﬁlled circle plotting

SynMod + +GMComm+Comp adds the 1.32W of GM SynMod + to the 1.77 W of GMComm+Comp @ As Fig 2 indicates, the combined WM and GM communi-cation cost at 1 Hz is 3.52 W. Computational costs are only avery small fraction of frequency-dependent costs. Calculationof

SynMod + is not possible and, as explicated in Discussion,we discredit the ouabain manipulation others (11, 24) use toestimate it. The value here is arrived at by diﬀerencing thecalculated and measured costs from the available energy (seeSI Appendix, Fig S1).Using a ﬁring-rate of 1 Hz and 1 . · neurons/cortex,a bottom up calculation for the excitatory postsynaptic ion-ﬂux per AP per cortex yields 0.10 W. The linear relationship Levy et al.

PNAS |

February 15, 2021 | vol. XXX | no. XX | etween ﬁring rate and energy consumption has a substantialbaseline energy consumption of 1.09 W (y-axis intercept).Apparently resting axon conductance (25) is required for aresting potential and stable behavior (26). In the case of thedendrite, computational costs are zero at zero ﬁring rate, atheoretical limit result which, as argued earlier, is a nonsensepractical situation. However, dendritic leak is assumed tobe essentially zero since we assume, perhaps controversially(cf. (11)), that a cortical neuron is under constant synapticbombardment and that all dendrosomatic conductances aredue to synaptic activation and voltage-activated channels.That is, a neuron resets after it ﬁres and immediately startsdepolarizing until hitting threshold.Computational costs are very sensitive to failure rates,which for this ﬁgure are ﬁxed at 75%, whereas communicationis only slightly sensitive to the synaptic failure rate (see belowfor more details). An energy-use partitioning based on glucose oxidation.

The oft-repeated brain energy consumption of 20 W is not simplythe cost of computation and communication, thus requiring anappropriate partitioning (see Table 1). The 17 W of glucosepotential energy from recent PET scan research (27) replacesSokoloﬀ’s 20 W from the 1950s. The PET scan research pro-duces regional per-gram values, and these values are scaled bythe regional masses (28), allowing regional estimates, Table S1.Arguably, 11% of the total glucose uptake is not oxidized (29),(some arteriovenous diﬀerences obtain a smaller percentage;see SI Appendix). After removing the 8.89 W for heating,there are only 6.19 ATP-W available to the whole brain. Theregional partitioning implies cerebral gray consumes 3.09 ATP-W, split between computation, communication, and

SynMod + .After direct calculation of communication and computationalcosts, the remaining GM energy is allocated to SynMod + . A specialized partitioning.

The ultimate calculation of bits/J re-quires a bipartite partition of action potential-driven costs:Those independent of N , A := E WMAP + E GMAxAP + E SynModGrow ≈ .

23 + 0 .

45 + 1 .

08 = 2 . J/sec/cortex vs those propor-tional to N , B := E COMP + E Pre + E NSynMod + E PreNaAP ≈ .

10 + 0 .

11 + 0 .

02 = 0 . J/sec/cortex. For thethree non- N contributers, WMAP is white matter actionpotential-dependent; GMAxAP is gray matter action potential-dependent; SynModGrow are all the functions that under-lie synaptogenesis and shedding as well as maintenance asdriven by ﬁring. For the four N -proportional functions,Comp is postsynaptic ionotropic activation; Pre is presynap-tic Ca and transmitter release functions; PreNaAP is par-tial presynaptic depolarization driven by the axonal AP; andNSynMod is N -proportional synaptic modiﬁcations includ-ing synaptic metabotropic activation. Note that E SynMod + = E SynModGrow + E NSynMod + E Plus ≈ .

08 + 0 .

11 + 0 .

12, where E Plus isthe time-dependent maintenance cost. Finally, to use A and B , they are rescaled to J/IPI/neuron (divide by number ofneurons ﬁring in one sec) and, additionally for B , a rescalingto dependence on synapse number (multiply by N ÷ E (Λ , T ) := ( A + N · B/ · E [ T ] ÷ n where E [ T ] isthe average IPI and n is the number of cortical neurons. A baseline for maximally efﬁcient computation.

A simplistic model relates physics to neuroscience.

For the sake ofcreating a baseline, initial comparison and for further under- standing of just what "computation" can mean, suppose aneuron’s computation is just its transformation of inputs tooutputs. Then, quantifying the information passed throughthis transformation (bits/sec) and dividing this informationrate by the power (W = J/sec) yields bits/J. This ratio isour eﬃciency measure. In neuroscience, it is generally agreedthat Shannon’s mutual information ( MI ) is applicable formeasuring bit-rate of neural information processing, neuraltransformations, or neural communication, e.g., (5, 6, 30–35).Speciﬁcally, using mutual information and an associated rateof excitatory postsynaptic currents of a single neuron pro-duces a comparison with the optimal bits/J for computationas developed through physical principles. To understand theanalogy with statistical mechanics, assume the only noise iswideband thermal noise, k T ≈ . · − J (Boltzmann’sconstant times absolute temperature, T = 310 K). The bits/Jratio can be optimized to ﬁnd the best possible energetic costof information, which is ( k T ln 2) − , the Landauer limit (36). Fig. 3.

Maxwell’s demon cycle is analogous to the neuron’s computational cycle.The initial state in the demon cycle is equivalent to the neuron at rest. The demonsensing fast molecules is analogous to the synaptic activations received by the neuron.Whereas the demon uses energy to set the memory and then opens the door for amolecule, the neuron stores charge on the membrane capacitance (C m ) and thenpulses out once this voltage reaches threshold. Simultaneous with such outputs, bothcycles then reset to their initial states and begin again. Both cycles involve energybeing stored and then released into the environment. The act of the demon openingthe door is ignored as an energy cost; likewise, the neuron’s computation does notinclude the cost of communication. Each q i is a sample and represents the chargeaccumulated on the plasma membrane when synapse i is activated. To give this derivation a neural-like ﬂavor, suppose a perfectintegrator with the total synaptic input building up on theneuron’s capacitance. Every so often the neuron signals thisvoltage and resets to its resting potential. Call the signal V sig ,and rather unlike a neuron, let it have mean value (restingpotential) of zero. That is, let it be normally distributed N (0 , σ sig = E [ V sig ]). The thermal noise voltage-ﬂuctuationis also a zero-centered normal distribution, N (0 , σ noise ). Ex-pressing this noise as energy on the membrane capacitance, C m σ noise = k T ⇒ σ noise = k T C m (37–39). Then using Shan-non’s result, e.g., theorem 10.1.1 as in (40), the nats per trans-mission are ln(1 + σ sig σ noise ) = ln(1 + C m E [ V sig ] k T ) (withnatural logarithms being used since we are performing a max-imization, thus nats = bits · ln 2). Converting to bits, andcalling this result the mutual information channel capacity, C MI = (2 ln 2) − ln(1 + C m E [ V sig ] k T ) .Next we need the energy cost, the average signalJ/transmission developed on the ﬁxed C m by the synap-tic activation, E := C m E [ V sig ]2 . Dividing the bits/sec et al. MI by the J/sec E yields the bits/J form of interest; C MI E = ( C m E [ V sig ] ln 2) − ln(1 + C m E [ V sig ] k T ). This ratiois recognized as the monotonically decreasing function ln(1+ x ) cx with x, c > . Therefore maximizing over E [ V sig ] butwith the restriction E [ V sig ] >

0, this is a limit result implyingan approach to zero bits/sec. That is, lim E [ V sig ] → C MI E = ( C m E [ V sig ] ln 2) − C m E [ V sig ] k T = ( k T ln 2) − ≈ . · bits/J.Two comments seem germane. First, physicists arrivedat this value decades ago in their vanquishing of Maxwell’sdemon and its unsettling ability to create usable energy fromrandomness (36). In their problem, the device (the demon) isnot obviously computational in the neural sense; the demonjust repeatedly (i) senses, (ii) stores, and (iii) operates a doorbased on the stored information, and then (iv) erases its storedinformation as it continues to separate fast molecules from theslower ones (41, 42): see Fig 3. Moreover, even after simplify-ing this cycle to steps (i), (ii) and (iv), physicists do see thedemon’s relevance to digital computation. Such a cycle is at theheart of modern computers where computation occurs throughrepetitive uses, or pairwise uses, of the read/write/erase cycles.For example, bit-shifting as it underlies multiplication andthe pairwise sensing and bit-setting (then resetting) of binary,Boolean logical operations reﬂect such cycles. Thus, as is wellknown from other arguments e.g., (36), (43), the limit-result ofphysics sets the energy-constraining bound on non-reversibledigital computation. Regarding (iii) it would seem that ifthe demon communicates and controls the door as slowly aspossible (i.e, the limit of time going to inﬁnity), there is noneed to assign an energy-cost to these functions.In spite of a non-surprising qualitative comparison, thereis a second insight. Compared to the estimates here of aneuron cycling from reset to ﬁring to reset, this physics resultis unimaginably more eﬃcient, not just ﬁve or ten times more,but 10 -fold more eﬃcient. Suppose that the computationalportion of a human cortical neuron has capacitance C m ≈ V rst = − .

066 V while ﬁringthreshold is V θ = − .

050 V. Then in the absence of inhibition,the excitatory synaptic energy needed to bring a neuron fromreset to threshold is C m ( V rst − V θ ) ≈ . · − J/spike.Assuming 4 bits/spike, the bits/J are 2 . · . Comparedto the optimal limit set by physics, this eﬃciency value is10 times less energy-eﬃcient, a seemingly horrendous energy-eﬃciency for a supposedly optimized system. The disagreement reorients our thinking.

In the context of under-standing neural computation via optimized energy-use, thishuge discrepancy might discourage any further comparisonwith thermal physics or the use of mutual information. It couldeven discourage the assumption that Nature microscopicallyoptimizes bits/J. But let us not give up so quickly. Note thatthe analogy between the four-step demon versus an abstractdescription of neural computation for one interpulse interval(IPI) is reasonable (see Fig 3). That is, (i) excitatory synapticevents are the analog of sensing, these successive events are (ii)stored as charge on the plasma membrane capacitance untilthreshold is reached, at which point (iii) a pulse-out occurs,and then (iv) the "memory" on this capacitor is reset and the cycle begins anew. Nevertheless, the analogy has its weakspots.The disharmony between the physical and biological per-spectives arises from the physical simpliﬁcations that time isirrelevant and that step (iii) is cost-free. While the physicalsimpliﬁcations ignore costs associated with step (iii), biologymust pay for communication at this stage. That is, physics onlylooks at each computational element as a solitary individual,performing but a single operation. There is no considerationthat each neuron participates in a large network or even thata logical gate must communicate its inference in a digital com-puter in a timely manner. Unlike idealized physics, Naturecannot aﬀord to ignore the energy requirements arising fromcommunication and time constraints that are fundamentalnetwork considerations (45) and fundamental to survival itself(especially time (20, 21)).According to the energy audit, the costs of communicationbetween neurons outweighs computational costs. Moreover,this relatively large communication expense further motivatesthe assumption of energy-eﬃcient IPI-codes (i.e., making alarge cost as small as possible is a sensible evolutionary priori-tization). Thus the output variable of computation is assumedto be the IPI, or equivalently, the spike generation that is thetime-mark of the IPIs endpoint.Furthermore, any large energy cost of communication sensi-bly constrains energy allocated to computation. Recalling ouroptimal limit with asymptotically zero bits/sec, it is unsustain-able for a neuron to communicate minuscule fractions of a bitwith each pulse out. To communicate the maximal bits/spikeat low bits/sec leads to extreme communication costs becauseevery halving of bits/sec requires at least a doubling of thenumber of neurons to maintain total bits/sec. Such increas-ing neuron numbers moves neurons farther away from eachother (see SI Appendix), requiring longer axons to reach theseother neurons and wider axons to avoid increased time delays,since such delays undermine the timely delivery of information(20, 21). This space problem arising from a larger numberof neurons is recognized as severely constraining brain evolu-tion and development as well as impacting energy-use (46–51).It is better for overall energy consumption and eﬃciency tocompute at a larger, computationally ineﬃcient bits/IPI thatwill feed the axons at some requisite bits/sec, keeping neuronnumber at some optimal level. To say it another way, a myopicbits/J optimization can lead to a nonsense result, such as zerobits/sec and asymptotically an inﬁnite number of neurons.

Nevertheless, assuming eﬃcient communication rates andtimely delivery that go hand-in-hand with the observed com-munication costs, there is still reason to expect that neuronalcomputation is as energy-eﬃcient as possible in supplying therequired bits/sec of information to the axons . The problemthen is to identify such a computation together with its bits/Jdependence and its inferred bits/sec.

A neurally relevant optimization.

How close is the optimized N bits/J to 2500?. The computations ofthis section combined with the earlier energy-audit imply aneﬃciency of ca. 1 . · bits/computational-J and less than7.5 bits/IPI for neurons completing their ﬁrst IPI. Comparingthe curves of Fig 4, the bits/J maximization that accountsfor all spike-dependent costs produces the agreeable result N ≈ Levy et al.

PNAS |

February 15, 2021 | vol. XXX | no. XX | omparison, the purely computational perspective of costs, Fig4b, indicates the exponentially increasing eﬃciency is reachedas a limit N →

0. 4a also indicates the optimization result isrobust around the optimizing N , changing little over a 7-foldrange; likewise, bits/IPI is robust. In sum for N = 2000,the neuron computational eﬃciency is inferior to the demonby ca. 10 but is optimal when other costs are considered.In fact, more detailed considerations below suggest slightlydowngrading bit-rate estimates.Using the notation Λ for the random variable (RV) of thetotal, unfailed input intensity (events/sec) to a neuron and ˆΛthe RV that is the neuron’s estimate, Fig 4A illustrates theconcave function being being maximized, I (Λ; T ) E (Λ ,T ) = I (Λ;ˆΛ) E (Λ ,T ) = log (ln( ˆ λ mx ˆ λ mn )) + log ( ( N + 1) N ) − log (2 πe )( A + N · B/ · E [ T ] ÷ n (1)where I (Λ; T ) and its equivalent I (Λ; ˆΛ) is the bits/IPI ofinformation gain (10). This gain arises from an additive neuroncommunicating its implicitly estimated latent variable’s valueˆΛ = ˆ λ as a ﬁrst-hitting time, T = t (i.e., RV T producingone particular realization, t ). The denominator, previouslyintroduced at the end of the energy audit, is the J/IPI perneuron as a function N . The ratio ˆ λ mx ˆ λ mn is essentially the ratioof the maximum rate of synaptic activation to the baselinerate. I (Λ; T ), derived in (10), requires Corollary 2 (below) forconversion to I(Λ; ˆΛ). Moreover, attending this result are therelated results, Lemma 2b and Corollary 1, which enhanceour understanding of the neuron’s computation. Just beforethese mathematical developments, we recall and interpret someresults of (10), one of which sheds light on the 10 discrepancywith the demon-result. Deeper insights into the deﬁned computation.

As noted in the in-troduction and developed in detail elsewhere (10), the neuron’scomputation is an estimation of its scalar latent variable Λ = λ . I (Λ; T ) = E Λ ,T [ p ( T | Λ) p ( T ) ] is the information gain for a Bayesianperforming estimation (52). Written this way, the relative en-tropy starts with the prior p ( t ) and via sampling, i.e., synapticactivations, implicitly arrives at a likelihood p ( t | λ ). The formof this conditional probability density is a maximum entropydevelopment, which is the best distribution in the sense ofmaximizing a gaming strategy (53). The maximum entropyconstraints are energy and unbiasedness. This likelihood alsocarries all of the information of sampling.Deﬁning (i) θ as threshold to ﬁre, (ii) E [ V syn ] as the averagesize of the synaptic event arriving at the initial segment, and(iii) E [ V syn ] as its second moment, from equations 12 and 6 of(10), p ( t | λ ) = θ √ πλt E [ V syn | λ ] exp(2 θE [ V syn | λ ] E [ V syn | λ ] − λtE [ V syn | λ ] E [ V syn | λ ] − θ λtE [ V syn | λ ] ) .While the only consistent marginal distribution we have yet todiscover is p ( λ ) = ( λ log( λ mx λ mn )) − with 0 < λ mn < λ < λ mx < ∞ , which is enough to infer the form of p ( t ) and of p ( λ | t ).Importantly, the IPI, t , is a suﬃcient statistic, which isinformation-equivalent to the likelihood p ( t | λ ) and so is thelatent RV estimate, ˆ λ = N ( N +1) t . The conditional meansquared error of the estimate is E [(ˆΛ − Λ) | λ ] = λ ( N +2)( N +1)

500 1000 1500 2000 2500 3000 35003.403.453.503.553.603.65 b i t s / J * a J * includes allspike - dependent energy ( × )

500 1000 1500 2000 2500 3000 350012345 Synaptic activations / neuron / IPI ( N ) b i t s / J b ( ) × J is computational energy × × Fig. 4.

Bits/J/neuron at optimal N . (a) The bits/J function, Eq (1) accounting for allspike-dependent energy-use, is concave and reaches a maximum when N is ca.2000. This efﬁciency decreases little more than 5% over a seven-fold range awayfrom this 2000. At this optimum there are 7.48 bits/spike. (b) The optimal N implies . · bits/computational-joule. (b) is calculated by changing Eq (1)’s denominatorto N · B ∗ E [ T ] ÷ (2500 · n ) instead of J ∗ := ( A + N · B/ ∗ E [ T ] ÷ n of Eq (1). as Corollary 1 here demonstrates. Thus we not only deﬁne aneuron’s computation, but can understand its performance asa statistical inference.Parsing Eq (1), the information rate increases at the rate ofca. log ( N ) while energy consumption increases in propor-tion to N . This disadvantageous ratio and the large optimizing N help explain the demon’s superior eﬃciency. Moreover, in-creasing the non-computational demands such that A ÷ B increases, leads to a larger optimal value of N and vice versa.Regardless, the corollaries of the next subsection clearly showthat the energy devoted to computation, or other N -dependentenergy consumers, restricts the precision of a neuron’s estima-tion and restricts the information a neuron generates whenthe neuron is required to be energy optimal. Mathematical derivations.

As an approximation of a result in(54), assume an empirical distribution of synaptic weights suchthat the second non-central moment is equal to twice themean squared (e.g., an exponential distribution). Note alsothat θ can be written as the product N , the average numberof synaptic increments, multiplied by the average synapticincrementing event E [ V syn | λ ] (with inhibition and capacitancetaken into account, (10)). That is, θ = N · E [ V syn | λ ]. Puttingthis assumption to work, we obtain a simpliﬁcation, and thereare two new corollaries based on the above p ( t | λ ). Lemma 1 . p ( t | λ ) = N (2 πλt ) − / exp( − λt − N λt + N ) . Proof : Start with p ( t | λ ) given earlier, substitute using θ = et al. · E [ V syn | λ ], and then note that E [ V syn | λ ] E [ V syn | λ ] = .At this point there is an instructive and eventuallysimplifying transform to create p (ˆ λ | λ ) from p ( t | λ ) Thetransform arises from the unbiased requirement, one of theconstraints producing the earlier optimization results (10).As a guess suppose the unbiased estimate is ˆ λ = N ( N +1) t orequivalently t = N ( N +1)ˆ λ , then use this relation to transform p ( t | λ ) to p (ˆ λ | λ ). Lemma 2a : p (ˆ λ | λ ) = √ N + 1(2 πλ ˆ λ ) − / exp( − λN N +1)ˆ λ − ˆ λ ( N +1)2 λ + N ). Lemma 2b : E [ ˆΛ | λ ] = λ = N N +1 · E [ T − | λ ] ;So ˆ λ = N ( N +1) t is indeed the desired unbiased estimate, whichhas a particular mean squared error. Corollary 1 : E [(ˆΛ − Λ) | λ ] = λ ( N +2)( N +1) . Proofs . See Methods.As the corollary shows, devoting more energy to computa-tion by increasing N reduces the error of the estimate. Specif-ically, the standard deviation decreases at the rate of 1 / √ N .Of course, computational costs increase in direct proportionto N .This corollary adds additional perspective to our deﬁnitionof a neuron’s computation as an estimate. Furthermore, thenew likelihood, p (ˆ λ | λ ), is particularly convenient for calculat-ing information rates, a calculation which requires one moreresult. That result is the marginal distribution of ˆΛ. Becausethe only known suﬃcient density (and arguably the simplest)is p ( λ ) = ( λ log( λ mx λ mn )) − , the estimate’s marginal density issimply approximated via Lemma 3. p (ˆ λ ) = R λmxλmn p ( λ ) p (ˆ λ | λ ) dλ ≈ (ˆ λ ln( λ mx λ mn )) − ,where the approximation arises by the near identity of theintegral to p ( λ ) assuming the range of λ and ˆ λ is the same.Moreover, the lack of ˆ λ bias for all conditioning values of λ hints that the approximation should be good. In fact usingMathematica at its default precision, numerical evaluationof R λmxλmn p ( λ ) p (ˆ λ | λ ) dλ indicates zero diﬀerence between thisintegral and (ˆ λ ln( λ mx λ mn )) − .The information rate per ﬁrst-IPI can now be evaluated. Corollary 2 . E T, Λ [log p ( T | Λ) p ( T ) ] = E ˆΛ , Λ [log p (ˆΛ | Λ) p (ˆΛ) ]= log (ln( ˆ λ mx ˆ λ mn )) + log ( ( N +1) πeN ) + E ˆΛ , Λ [log ( ˆΛΛ )] ≈ log (ln( λ mx λ mn )) + log ( ( N +1) πeN ) . Proof : E ˆΛ , Λ [log p (ˆΛ | Λ) p (ˆΛ) ] = h (ˆΛ) − h (ˆΛ | Λ) ≈ h (Λ) − h (ˆΛ | Λ).

Limitations on the information rate.

The bit/rate calculated aboveis arguably naive, even beyond the fact that we are assum-ing there is such a thing as an average neuron. First, underphysiological conditions, humans are constantly making deci-sions, including novel sensory acquisitions (e.g., a saccade andnew ﬁxation). Suppose that such a decision-making intervaland sensory reacquisition occurs every second. Then, manyneurons do not complete even their ﬁrst IPI. Such neuronsmake a much smaller information contribution, although stillpositive. To maintain average ﬁring rate, suppose half the time a neuron completes one IPI, one-quarter of the time twoIPIs, one-eighth of the time three IPIs, etc per decision-makinginterval. Thus half the time a neuron does not complete a ﬁrstIPI, one-quarter of the time a neuron completes a ﬁrst IPI butnot a second, etc. Each non-completed IPI has a bit value.Combining the contributions for complete and incomplete IPIsproduces a bit value of 5.1 bits/second for a 1 Hz neuron. SeeMethods for details.Shot-noise is potentially deleterious to bit rate as well.As a crude approximation of shot-noise aﬀecting the signal,suppose Shannon’s independent additive Gaussian channel; i.e,the mutual information is log σ signal + σ noise σ noise . In biophysicalsimulations, depending on synaptic input intensity, it takes 50to 250 Nav 1.6 activations to initiate an AP (44). Using thisrange as a Poisson noise and 2500 as the Poisson signal, thecapacity is much smaller than the rate of information gain,2.8 to 1.7 bits/sec. In fact simulations with this biophysicalmodel produce 3 bits/IPI (44). This value is probably anunderestimate by about one bit because the model did notcontain inhibition; without inhibition, synaptic excitation ratesare limited to less than 750 events to reach threshold vs the2500 here allowed by inhibition and dendrosomatic surfacearea. Discussion

The Results contribute to our understanding of computationin the brain from the perspective of Nature. Essentially, theResults analyze a deﬁned form of neural computation that is (i)based on postsynaptic activation and that is (ii) a probabilisticinference (10). From this deﬁned perspective, the correspond-ing bits/J is maximized as a function of N . This value of N is 2000, close enough to 2500 to substantiate the latter’s usein the audit. Likewise, it only changes the estimated synapsesper neuron from 10000 to 8000 given a 75% failure rate.As ﬁrst introduced into neuroscience in (3) and later em-phasized by (8), a certain class of bits/J optimizations canproceed if the denominator joule-term consists of two parts:a constant joule-consumption term added to a term in whichthe joule-consumption depends on the variable being opti-mized. Typically, this denominator consists of a constantenergy-consumption term plus a ﬁring-rate dependent term.However, here both denominator terms are dependent on themean ﬁring-rate (a constant); in addition, the second term isalso dependent on N . Thus there is the N -based optimizationof Fig 4a while in Fig 4b, the computational-bits/computation-J has no biologically meaningful maximization. Rather, 4bnotes the 1 . · computational-bits/computational-J atthe optimal N = 2000 from 4a, where all relevant costs areincluded.Another quantitative accomplishment is quantifying the10 discrepancy between the Demon’s optimal computationand a neuron’s optimal computation. From Fig 4b, we seethat if Nature selects for smaller N , the eﬃciency increasesexponentially. Indeed, a two-synapse neuron with 10,000-fold less surface area and a 1,000-fold decrease in the voltagebetween reset and threshold only misses the Demon’s valueby 10-fold. However, using such neurons leads to other, largercost increases if communication time is to remain constant.For a particular semi-quantitative analysis establishing thispoint, see SI Appendix. Levy et al.

PNAS |

February 15, 2021 | vol. XXX | no. XX | ne reason why the bits/J/spike of Eq. 1 (Fig 4a) increasesso slowly as N increase is that the synaptic inputs are assumedto be unclocked, asynchronous, and therefore approximatelyPoissonian (13). Unlike energy costs that grow in proportionto N , the slow information growth at the rate 2 − ln( N ) seemsunavoidable (5). Indeed for visual sensing (34) notes a similardiﬀerence in growth rates.Although the basis of our theory is IPI coding, this hy-pothesis has some relevance to optimization theories based onrate-coding, e.g., (5, 8). Speciﬁcally, each input to a neuron isa non-Poisson point process with an implicit rate. However,the union of these inputs is, to a good approximation, Poisson(13). This union of input lines creates the neuron’s latentRV Λ. Thus, each neuron is estimating the intensity of alocal-population rate-code over the time of each IPI. This mayexplain the similarity of bit-rate estimates between modelssince as calculated in the Results and Methods, the random-ness underlying this approximately Poisson signal is itself thelargest source of uncertainty (i.e, entropy). Finally, the rate-code approach (e.g., (5, 8)) might claim a greater generalityas it applies to many pulses whereas the current IPI theoryapplies with exactitude only to the ﬁrst IPI; the theory re-quires extensive work for application to later IPIs in cortexwhere neurons are receiving feedback during the generation oflater IPIs, which feedback can change the value of the neuron’sinitial input. Human and rodent energy audits.

The per-neuron values hereare relatively close to those obtained by Herculano-Houzel(28). Her value for the gray matter energy-use of humancortex is 1 . · − µ mol of glucose per neuron per minute,which converts to 2 . · − W/neuron in terms of ATP. Ourvalue is 1.94 · − W/neuron (Table S3). This small, 16%diﬀerence is not surprising since she uses the older glucosevalues of slightly more than 20 W per brain, and we use herregional brain weight values and cell counts.The top-down part of the audit can do no more than limitthe total ATP available among the deﬁned uses of ATP. Thisvalue is then subject to partitioning across speciﬁc, functionalconsumers.Staying as close to (11) as sensible, newer values are used(e.g., for conversion of glucose to ATP (15) and for the overlap-ping Na-K conductances of the AP (23)). Species diﬀerencesalso create unavoidable discrepancies, including average ﬁr-ing rate, the fraction of the time that the glutamate-primedNMDARs are voltage-activated, and, more importantly, thesurface area of rodent axons vs human axons. Other discrep-ancies arise from diﬀerences in partitioning of energy con-sumers. After removing WM costs, our partitioning of GMcreates three subdivisions: computation, communication, and

SynMod + . Although the partitioning of energy consumptionis at variance with (3) and (11), this is not a problem becausepartitioning is allowed to suit the question. On the otherhand, estimating the cost of SynMod + is problematic (seeouabain comments below). Moreover, the optimization hererequires a subpartitioning of SynMod + . The subcategoriesare partitioned as proportional to: time only, mean ﬁring-rate, and mean ﬁring-rate multiplied by N . Of relevance here,the cost of synaptic modiﬁcation, including metabotropic re-ceptor activation and postsynaptically activated kinases donot fall within the present deﬁnition of computation but areactivity-dependent SynMod + costs. An earlier human GM energy audit (24) comes to a diﬀerentconclusion than the one found here. Although our more con-temporary empirical values and more detailed analysis pointto many initial disagreements with this study, these initialdisagreements oﬀset each other so that (24) concludes the GMhas 3.36 ATP-W available, within 10% of our 3.09. On theother hand, there are two rather important disagreements: (i)the postsynaptic current associated with the average presy-naptic spike arrival and (ii) the total non-computational andnon-communication energy expenditures which go hand-in-hand with a discrepancy of resting potential costs. Regarding(i), the relative postsynaptic currents per spike diﬀer by nearly14-fold, and this diﬀerence arises from three sources. First,in (24) synaptic success rates are 2-fold greater than the rateused in (11) and here. Second, the number of synapses is 2.2-fold greater (we use newer values from normal tissue). Third,average synaptic conductance per presynaptic release is 3-foldgreater than the values here (again using newer values (54)).See Methods and SI Appendix for details.The other disagreement, (ii) arises because (24) concludesthat 50% of the ATP goes to processes independent of electricalpotentials (i.e., independent of computation plus communica-tion). The earlier work bases its values on ouabain studies.While there is no argument that ouabain poisons the Na-KATPase pump preventing it from metabolizing ATP, there isclear evidence that ouabain activates other functions known toincreases ATP consumption. Ouabain increases spontaneoustransmitter release (55) and depolarizes neurons. One mustassume until shown otherwise that these two eﬀects stimulatemany ATPases and ATP-consuming processes that would notnormally occur at rest. These include internal CaATPasesto handle Ca + SynMod + .Our ultimate problem with (24) are the two calculationsof N that it implies. Taking (24)’s values of failure rate andsynapse number implies that N = 8750 while applying (24)’scosts to the optimization of Eq. 1 produces N ≈ General relevance of Results.

Outside of neuroscience.

Because there is some interest e.g.,(57, 58) outside of neuroscience to reproduce neurally medi-ated cognition on a limited energy budget, the energy-audithere brings increased speciﬁcity to a comparison between theevolved biological vs the human engineered perspective. Inparticular, engineers often tout brain function as consumingenergy at what they consider a modest 20 W given the dif-ﬁculty they have in reproducing human cognition. Here weprovide a more precise set of comparisons. Our computationcan be compared to the job performed by the central process-ing unit. Communication has it’s two major forms deﬁnedhere, axonal costs and presynaptic functions, which must becompared to communication into and out of memories plusthe communication of clock pulses. Perhaps maintenance canbe compared to memory refresh costs. However, comparing et al. ower conversion loss by a computer to the heat generationof intermediary metabolism is challengeable since heating isfundamental to mammalian performance. A better compari-son might be between the cost of cooling a computer and thebiological heating cost. Inside neuroscience.

Although the primary goal of the energyaudit is an estimate of the cost of computation per se , theaudit also illuminates the relative energetic costs of variousneural functions. Notably for humans, the audit reveals thataxonal resting potential costs are greater than the ﬁring-ratecosts. This axonal rest expense is directly proportional tothe leak conductance and axonal surface area. Thus, of allthe parameters, these two might beneﬁt the most from betterempirical data. Regarding these large, leak-associated costs,two additional points seem relevant. First, regarding fMRIstudies that measure regional brain metabolism, the smallincreases of oxygen consumption over baseline consumption(59) are consistent with the high, continuous cost of axonalleak.Second, arguing from her data and data of other studies(28), Herculano-Houzel presents the intriguing hypothesis thataverage glucose consumption per cortical neuron per minuteis constant across mammalian species. Qualitatively, this ideais consistent with the increase in neuron numbers along withthe decrease of ﬁring rates found in humans vs rats. How-ever, it seems that the hypothesis can only be quantitativelycorrect if axonal leak-conductance in humans is much lowerthan in animals with smaller brains and presumably shorteraxons of smaller diameters. This topic deserves more detailedexploration.Hopefully the work here motivates further empirical work,especially using primates, to improve the energy-audit andthe calculations that ensue. Such empirical work includesbetter surface area measurements and a better idea about theNMDAR oﬀ-rate time constant. Finally, going beyond theaverage neuron, perhaps someday there will be energy-auditsfor the diﬀerent cell types of cortex.

Materials and Methods

Partitioning glucose by region and by metabolic fate.

This sectionexplains the top-down calculations of Table 1. The glucose-uptakevalues combine the regional uptakes, reported in terms of per 100g of tissue from Graham et al. (27) as copied into our Table S1along with the reported regional masses from Azevedo et al. (60).We choose this uptake study because of its use of the [ C]glucosetracer and its straightforward application to obtain regional netglucose uptakes. Multiplying regional masses by uptake values,and converting to appropriate units as in Table S1, yields the ﬁrst"Watts" column of Table 1. These glucose-watts are calculated using2.8 MJ/mol (61). The regional uptakes are combined to producethe brain total as illustrated in Fig S1.Following the ﬂow diagram of Fig S1, next we remove the non-oxidized glucose from regional and total uptakes. We use an oxygen-glucose index (OGI) value of 5.3 (out of 6 possible oxygen moleculesper one glucose molecule). We assume the OGI is constant acrossregions and that we can ignore other, non-CO carbons that enterand leave the brain. Thus, these simple glucose-watts are split intooxidized and non-oxidized as produced in Table 1 and illustrated inFig S1.As the energy source, the oxidized glucose is then partitioned intotwo diﬀerent metabolic fates: heating and ATP. Again we assumethis process is constant across regions and that the brain does notdiﬀer too much from other regions which have been studied in greater depth. The biological conversion is calculated using Nath’storsional mechanism, which yields 37 ATP molecules per moleculeof glucose and 36,000 J/mol of ATP at 37 ◦ C. Computation Costs.

Our "on average" neuron begins at its resetvoltage and then is driven to a threshold of -50 mV and then onceagain resets to its nominal resting potential of -66 mV. Betweenreset and threshold, the neuron is presumed to be under constantsynaptic bombardment with its membrane potential, V m , constantlychanging. To simplify calculations, we work with an approximatedaverage V m , V ave of -55 mV; this approximation assumes V m spendsmore time near threshold than reset. (Arguably the membranepotential near a synapse which is distant from the soma is a coupleof mVs more depolarized than the somatic membrane voltage, butthis is ignored.) To determine the cost of AMPAR computation, weuse the ion preference ratios calculated from the reversal potentialand use the total conductance to obtain a Na + conductance of114.5 pS per 200pS AMPAR synapse as seen in Table S4. (The ion-preference ratios used for the calculations in Table S4 are calculatedfrom the reported reversal potential value of -7 mV (62) and theindividual driving forces at this potential, − − ( −

7) = − mV forK + and 55 − ( −

7) = 62 mV for Na + .) Multiplying the conductanceby the diﬀerence between the Na + Nernst potential and the averagemembrane potential ( V Na,Nern − V ave ) yields a current of 12.5 pAper synapse. Multiplying this current by the SA duration convertsthe current to coulombs per synaptic activation, and dividing this byFaraday’s constant gives us the moles of Na + that have entered persynaptic activation. Since 1 ATP molecule is required to pump out 3Na + molecules, dividing by 3 and multiplying by the average neuronﬁring rate and success rate yields 1 . · − mols-ATP/synapse/sec.Multiplying by the total number of synapses (1 . · ) implies therate of energy consumption is 0.069 W for AMPAR computation.When NMDARs are taken into account, the total computationalcost is 0.10 W (assuming that NMDARs average conductance ishalf as much as AMPAR’s).Table S4 lists the excitatory ion-ﬂuxes mediated by AMPARsand NMDARs. The cost of the AMPAR ion ﬂuxes is straightforward.The cost of NMDARs ion ﬂuxes depends on the oﬀ-rate time constantas well as the average ﬁring rate. That is, if this oﬀ-rate timeconstant is as slow as 200 msec and the IPI between ﬁrings ofthe postsynaptic neuron is 500 msec or more (such as the 1 secinterval that comes from the 1.0 Hz frequency used in the followingcalculations), then most glutamate-primed NMDARs will not bevoltage activated. Thus, in contrast to the rat where the AMPARand NMDAR ﬂuxes are assumed to be equal, here we assume theion-ﬂuxes mediated by NMDARs are half that of the AMPARs andmultiply the AMPAR cost by 1.5 to obtain the ﬁnal values in TableS4.The spike-generator contributes both to computation and tocommunication; fortunately, its energetic cost is so small that it canbe ignored. Communication Costs.

Table S5 provides an overview of the commu-nication calculations, which are broken down into Resting PotentialCosts, Action Potential Costs, and Presynaptic Costs. The followingsections explain these calculations, working towards greater andgreater detail.In general, the results for communication costs are built onless-than-ideal measurements requiring large extrapolations. Forexample, there does not seem to be any usable primate data. Theproper way to determine surface area is with line-intersection counts,not point counts, and such counts require identiﬁcation of almostall structures. As the reader will note in the supplement, use ofmouse axon diameters produces much larger surface areas, thusraising communication costs and decreasing the energy available forcomputation and

SynMod + . Resting Potential Costs.

The cost of the resting potential itself issimply viewed as the result of unequal but opposing Na + and K + conductances. If other ions contribute, we just assume that theirenergetic costs eventually translate into Na + and K + gradients. Theaxonal resting conductance uses the recent result of 50 kΩ cm (25).With our surface area of 21 . · cm (includes axonal boutons, seeTable S6), this produces a total conductance of 436 S. The drivingvoltage for each ion is determined by subtracting the appropriate Levy et al.

PNAS |

February 15, 2021 | vol. XXX | no. XX | ernst potential from the assumed resting membrane potentialof -66 mV. Using Nernst potentials of +55 mV and -90 mV forNa + and K + resp., just assume currents are equal and opposite atequilibrium. Thus, conductance ratios derive from the equilibrium: −

24 mV · g K = −

121 mV · g Na ; implying g K = 5 . g Na ; furtherimplying g Na g Na + g K = . . The Na + -conductance times the drivingvoltage yields the Na + -current, 0 .

121 V · . ·

436 S = 8.73 A.Scaling by Faraday’s constant implies the total Na + inﬂux; thendivide by 3 to obtain mols of ATP required to pump out this inﬂux,3.02 · − molATP/s. Multiplying by 36,000 J/molATP yields 1.09W, the resting potential cost.Plasma membrane leak is a major energy expenditure, 22% ofATP-W here compared to 13% in (11). Here however, we emphasizethat this cost is 66% of gray matter communication costs. Thediﬀerences in percentages arise from diﬀerent interpretations of afunctioning neuron and of the meaning of certain measurements.Our distinction between the cost of reset diﬀers from their cost ofresting potentials: here resting cost is entirely axonal and essentiallycontinuous across time. Their resting cost is dendrosomaticallybased and deviates from our assumption that a neuron is underconstant synaptic bombardment. Action Potential Costs.

Action potential costs are calculated fromNa + pumping costs; see Table S5. The coulombs to charge a 110 mVaction potential for the non-bouton axon starts with the product ofthe total GM axonal capacitance, 14 . . · . · . .

61 amps. To account forthe neutralized currents observed by Hallerman et al. (23), multiplythis by 2.28, yielding 3.66 A.Bouton costs, although clearly part of an axon, are calculated sep-arate from the axon. As will be detailed later, our approximation ofsurface areas treats all presynaptic structures as bouton terminaux ,and rather than assume tapering for impedance matching purposes,presume an abrupt transition of diameters. Importantly, we assumethat a bouton mediates a calcium spike and that this spike onlyrequires a 0.02 V depolarization to be activated. Altogether, therate of Na + coulomb charging for boutons is 6 .

34 F · .

02 V · + and bouton charging determinesthe Na + to be pumped. Faraday’s constant converts coulombs persec to mols of charge per sec, yielding a Na + ﬂux of 3 . · − mol/sec. Dividing by 3 converts to ATP mol/sec; multiplying thisvalue by Nath’s 36,000 J/molATP yields the total action potentialcost of 0.47 W.To calculate bits/J requires WMAP costs. Assume that theoligodendrocytes (especially myelogenesis) are using energy solelyto support the AP. Then we approximate two-thirds of the 1.85 Wenergy goes to WMAP, 1 .

23 W.The action potential values here largely agree with (20), butthere are a number of important diﬀerences. They use an old, non-mammalian value for overlap. The neutralized current ﬂux of the APin mammals is 2.28 (23) at the initial segment, far from the multiplierof four they use. Furthermore, the plotted values in (20) Fig 7A arenot adjusted for overlap. This ﬁgure uses an axonal length of 1 µ m;therefore, for the axonal diameter plot-point of 0.5 µ m, the surfacearea is π/ · − m = 1 . · − cm . This implies a capacitanceof 1 . · − F. Then the total charge needed for 0.1 V polarizationis 1 . · − coulombs. Multiplying by the number of charges percoulomb yields 1 . · − · . · = 9 . · ≈ Na + , theplotted value of ﬁg 7A (20). Thus the neutralized Na + ﬂux wassomehow lost when the y-axis was labelled. With this understanding,our values only diﬀer from (20) because the calculations here usethe mammalian measured overlap of 2.28. Presynaptic AP Costs.

The presynaptic transmitter-associated costsare mostly based on the values of Attwell and Laughlin (11) andof Howarth et al. (16). The assumptions include an assumed 25%success rate of vesicular release for each cortical spike (1 . · spikes/sec under the 1 Hz and 1 . · synapses assumptions).However, in contrast to Howarth et al. (16), which uses a numbersupported by observations in calyx of Held (63) and in cell cultures(64), the observations of Stevens and Wang (22) in CA1 hippocampalpyramidal neurons indicate that the same calcium inﬂux occurs forboth synaptic successes and failures. Because adult hippocampalsynapses seem a better model of cerebral cortical synapses than calyx or tissue culture synapses, we use the hippocampal observations.Therefore, the 1 Hz ﬁring rate produces a Ca cost that is morethan 8-fold greater than the cost of vesicle release events (VRevents, Table S5). The Ca inﬂux per action potential is 1 . · Ca /vesicle, and assuming 1 ATP is required to pump out eachCa , the Ca cost is 1 . · ATPs/vesicle. Multiplying thisby 1 . · APs/sec for the gray matter, dividing by Avogadro’snumber, and ﬁnally multiplying by 36 kJ/molATP yields a totalpresynaptic Ca cost of 0.11 W.The cost per vesicle release is determined by adding the pack-aging and processing costs and then multiplying by the number ofglutamate molecules per vesicle as in (11) and (16). Adding thecost of membrane fusion and endocytosis yields a total of 5,740ATPs/vesicle (16). This value is multiplied by the VR events persecond and divided by Avogadro’s number to obtain 3 . · − ATPmol/sec. Converting to watts yields a presynaptic transmitterrelease cost of 0.01 W and a total presynaptic cost of 0.12 W forthe GM.

Synapse counts.

Both computation and communication costs de-pend on the number of cortical synapses. For the approach takenhere, computational costs scale in a one-to-one ratio to synapticcounts while communication costs scale proportionally, but with asmaller proportionality constant.The calculations use the Danish group’s synapse counts of 1 . · (65). The alternative to the numbers used here report an80% larger value (66); however, their human tissue comes fromnominally non-epileptic tissue from severely epileptic patients. Sincethe incredibly epileptic tissue is likely to stimulate the nearby non-epileptic tissue at abnormally high ﬁring rates, we ﬁnd the data’simport questionable. Estimation of Surface Areas from Mouse and Rabbit Data.

Herevolume-fraction data are used to estimate axon and presynapticsurface areas. As far as we know, there are two journal-published,quantitative EM studies of cerebral cortex that are suitable for ourpurposes: one in rabbit (67) and one in mouse (68). (Althoughstructural identiﬁcations do not neatly conform to our simplifyingcylindrical assumptions, we can still use their data to direct and tocheck our estimates.)Chklovski et al. (68) report a 36% volume-fraction for smallaxons, 15% for boutons, 11% for glia, 12% for other, and 27% fordendrites and spines as read from their graph in their Figure 3.They purposefully conducted their evaluations in tissue that lackedcell bodies and capillaries. Because cortical tissue does contain cellbodies and capillaries, this will produce a small error for the averagecortical tissue. More worrisome is the size of "other," half of whichcould be very small axons.The quantiﬁcation by Schmolke and Schleicher (67) examinesthe rabbit visual cortex. Their evaluation partitions cortex into twotypes of tissue: that with vertical dendritic bundling and that whichlacks dendritic bundling (they do not seem to report the relativefraction of the two types of cortex, but we assume the tissue withoutbundling dominates over most of cortex). For boutons and axonsrespectively, they report volume fraction values within bundles of17% and 20% and values between bundles of 26% and 29%.The 30% axonal volume fraction used in Table S6 is a compromisebetween the (68) value of 36% and the two values from (67). Theaverage of the within bundle and between bundle volume-fractionsfrom (67) is used for boutons. Speciﬁcally, the approximated hu-man volume fractions are (i) 22% boutons, (ii) 30% small axons,(iii) 11% glia, (iv) 5% neuronal somata, (v) 3% vasculature, (vi)29% dendrites, spineheads, and spine-stems, totaling 100%. (It isassumed that standard ﬁxation removes almost all of the physiolog-ical extracellular space and, naively, shrinkage/swelling has littlerelative eﬀect on these values.) The calculations are essentiallyunaﬀected by the two conﬂicting bouton volume fractions since thediﬀerence between the two possible calculations is negligible.Table S6 lists the critical values, the intermediate values for thecylindrical model to ﬁt the data, and ﬁnally the implications forthe relevant membrane capacitance.

Cylindrical model approximations for axons and boutons.

Axons : Bymaking a cylindrical assumption and assuming the average smallaxon’s diameter is 0 . µ m (radius = 0 . · − cm), a smallextrapolation of a cross-species result in the cerebellum (69), we et al. an estimate the total surface area of these unmyelinated axonsusing the 30% volume-fraction to calculate the length of an averageaxon, L ax . The total volume (cm ) occupied by all such axons is L ax · . · · π (0 . · − ) . Dividing this volume by the volumeof the GM (632 cm ) must equal the volume fraction, 0.3. Solvingyields L ax = 6 .

44 cm. Then net surface area is calculated usingthis length, the same diameter and number of neurons, 6 . · . · · π · . · − = 1 . · cm . For an independent calculationof axon length based on LM data, see SI Appendix. Boutons : The surface area estimates also treat boutons (Btn)as uniform cylinders of a diﬀerent diameter. Assume that corticalpresynaptic structures in humans are no bigger than in any othermammalian species. To determine bouton surface area, assume abouton diameter ( d pb ) 1 . µm and height ( h pb ) 1 . µm . Denote thetotal number of synapses in the gray matter as n gm (1 . · ). (Notethat the cylinder area of interest has only one base.) Then, withthe formulation A pb = n gm π ( d pb h pb + ( d pb ) ), the bouton surfacearea works out to A pb = 1 . · π (1 . µm · . µm +(0 . µm ) ) =6 . · cm . See Tables S6 and S7.We assume a bouton only accounts for one synapse. However,larger boutons can contact multiple, distinct postsynaptic neurons.Thus the small cylinders, as individual synapses, are an attemptto approximate such presynaptic conﬁgurations. See Table S8 formore details and for the eﬀect of overestimating areas. Oxidized vs. non-oxidized glucose.

Arteriovenous blood diﬀerencesindicate that insuﬃcient oxygen is consumed to oxidize all theglucose that is taken up by the brain. Supposing glucose is theonly energy-source, it takes six O ’s for complete oxidation. Thecalculations use an OGI value of 5.3 (70). Other values fromarteriovenous diﬀerences are found in the literature (71–73). Evenbefore these blood diﬀerences where observed, Raichle’s lab proposedas much as 20% of the glucose is not oxidized (29). Glucose to ATP based on Nath’s theory.

Table S2 oﬀers the reader achoice between Nath’s torsional conversion mechanism of glucoseto ATP (15, 74, 75) versus the conventional conversion to ATPbased on Mitchell’s chemiosmotic theory (76). According to Nath,the minimum number of ATP molecules produced per molecule ofglucose oxidized is 32, and this includes mitochondrial leak and slip(15). Nath’s calculations are based on free-energy values under phys-iological conditions. However, his calculations are recent while thestandard model has been taught for decades, although not withoutcontroversy (77). The standard textbook number for this conversionis 33 ATPs per molecule of glucose before accounting for mitochon-drial proton leak and slip. Since leak is often assumed to consume20% of the energy that might have gone to ATP production inoxidative phosphorylation (11, 78), the Mitchell conversion numberis reduced from 33 to 27 molecules of ATP (2 ATPs are producedby glycolysis and 2 by the Krebs cycle, so this 20% reduction onlyapplies to the ATP produced in the electron transport chain).

SynMod + . Here

SynMod + is not directly calculated. Rather it isresidual of the energy available after removing the above uses. Theassumed subpartitioning occurs as follows. Assume 10% of thisgoes to time proportional costs; assume the postsynaptic fraction,accounting for metabotropic activations, receptor modiﬁcation, andactin polymerization-depolymerization cycles, equals 0 .

134 W, whichis activity and synapse number dependent. The remainder, devotedto synaptogenesis and ﬁring rate dependent axonal and dendriticgrowth (e.g., membrane construction, protein insertion, axo- anddendro-plasmic transport) is just activity dependent.

Proofs.

The proof of lemma 2a is just a textbook change of variablefrom one density to another (79) where dt = N ( N +1)ˆ λ d ˆ λ ; to prove corollary 1 and the ﬁrst equality of lemma 2b , use to calculatethe appropriate conditional moments, which Mathematica obliges;to prove the second equality of , use lemma 1 to calculate theindicated conditional moment. Parameterizing the marginal prior p ( λ ) . As derived from ﬁrst princi-ples (10), the only known, consistent marginal prior of the latent RVis p ( λ ) = ( λ ln( λ mx λ mn )) − where the bounds of the range of this RV, and thus its normalizing constant, are the subject of empirical ob-servations and the required deﬁnition λ ∈ (0 < λ mn < λ mx < ∞ ).From the energy-audit, use the 1 Hz average ﬁring rate. Then E [Λ], the mean marginal total input ﬁring rate, is 10 /sec. Nowsuppose that the rate of spontaneous release is 1 Hz over these 10 synapses giving us λ mn = 1. With one unknown in one equation, E [Λ] = λ mx − λ mn ln( λmx ) = 10000, Mathematica produces λ mx ≈ Adjusting the bit-rate calculation for multiple IPIs per decision-mak-ing interval (DMI).

The 7.48 bits per IPI only applies to a neuron’sﬁrst IPI. Later spikes are worth considerably less using the currentsimplistic model of a ﬁxed threshold and no feedback. Moreover,while maintaining the average ﬁring rate, we might suppose that onlyhalf the time a neuron completes a ﬁrst IPI, half of these completea second IPI, and so. Thus the average number of spikes per DMIremains nearly one. With a ﬁxed threshold, the bit values of thelater spikes are quite small. The value of the second through fourthspikes are { log ( NN ) , log ( N N ) , log ( N N ) } which gives ca.0.35 bits. However complementing the completion of ﬁrst IPI is,half the time, the bit contribution of an uncompleted IPI, 0 . · . / .

35 + 1 ≈ . Shot-noise can effect bit-rate but not as much as the signal.

As mea-sured in the biophysical simulations (44), the most deleteriousdegradation of a neuron’s computation arises, not from thermalnoise or shot-noise (45), but from the neuron’s input signal itself.Here is a calculation consistent with this biophysical observation.Using stochastic NaV 1.2 and NaV 1.6 channels in a biophysicalmodel of a rat pyramidal neuron, it is possible to observe shot-noiseand to estimate the number of such channels that are activatedat threshold. With relatively slow depolarization, there are lessthan 250 channels on when threshold is reached, and this numberof channels seems to contribute less than 1.6 mV (see Fig 5 in (44)).Thus modeling channel activation as a Poisson process with rate250 and individual amplitudes of 6.4 µ V, Campbell’s theorem (80)produces the variance; this variance is less than 250 · (6 . · − ) =1 . · − . The same calculation for the input excitation yields avariance of 2500 · (6 . · − ) = 1 . · − , a 10 : 1 ratio. Numerically-based optimization calculations.

Optimizing the bits/Jequation uses Mathematica. Treat N , the average number of eventsper IPI, as a continuous variable. Then to optimize, take thederivative, dN , of the single neuron, single IPI bit/J formulation.Set the numerator of this derivative equal to zero and solve for N using Mathematica’s NSolve. ACKNOWLEDGMENTS.

The authors are grateful for commentsand suggestions of earlier versions provided by Costa Colbert, RobertBaxter, Sunil Nath, and David Attwell.

1. L Sokoloff, The metabolism of the central nervous system in vivo.

Handb. Physiol. Sect. I,Neurophysiol . , 1843–1864 (1960).2. J Sawada, D Modha, Synapse: Scalable energy-efﬁcient neurosynaptic computing in Appli-cation of Concurrency to System Design (ACSD), 2013 13th International Conference on,pages xiv–xv . (2013).3. WB Levy, RA Baxter, Energy efﬁcient neural codes.

Neural Comput . , 289–295 (1996).4. RM Alexander, Optima for animals . (Princeton University Press), (1996).5. V Balasubramanian, D Kimber, MJ Berry II, Metabolically efﬁcient information processing.

Neural computation , 799–815 (2001).6. WB Levy, RA Baxter, Energy-efﬁcient neuronal computation via quantal synaptic failures. J.Neurosci . , 4746–4755 (2002).7. P Sterling, S Laughlin, Principles of neural design . (MIT Press), (2015).8. V Balasubramanian, Heterogeneity and efﬁciency in the brain.

Proc. IEEE , 1346–1358(2015).9. JV Stone,

Principles of neural information theory: Computational neuroscience and metabolicefﬁciency . (Sebtel Press), (2018).10. WB Levy, T Berger, M Sungkar, Neural computation from ﬁrst principles: Using the maximumentropy method to obtain an optimal bits-per-joule.

IEEE Transactions on Mol. Biol. Multi-Scale Commun . , 154 –165 (2016).11. D Attwell, SB Laughlin, An energy budget for signaling in the grey matter of the brain. J.Cereb. Blood Flow Metab . , 1133–1145 (2001).12. AD Mayer, Ph.D. thesis (Graduate School of Arts and Sciences, University of Pennsylvania)(1959). Levy et al.

PNAS |

February 15, 2021 | vol. XXX | no. XX || vol. XXX | no. XX |

3. T Berger, WB Levy, A mathematical theory of energy efﬁcient neural computation and com-munication.

IEEE Transactions on Inf. Theory , 852 –874 (2010).14. M Sungkar, T Berger, WB Levy, Capacity achieving input distribution to the generalized in-verse gaussian neuron model in . (IEEE), pp. 860–869 (2017).15. S Nath, The thermodynamic efﬁciency of atp synthesis in oxidative phosphorylation. Biophys.chemistry , 69–74 (2016).16. C Howarth, P Gleeson, D Attwell, Updated energy budgets for neural computation in theneocortex and cerebellum.

J. Cereb. Blood Flow Metab . , 1222–1232 (2012).17. E Engl, R Jolivet, CN Hall, D Attwell, Non-signalling energy use in the developing rat brain. J.Cereb. Blood Flow & Metab . , 951–966 (2017).18. P Crotty, T Sangrey, WB Levy, The metabolic energy cost of action potential velocity. J.Neurophysiol . , 1237–1246 (2006).19. TD Sangrey, WO Friesen, WB Levy, Analysis of the optimal channel density of the squid giantaxon using a re-parameterized hodgkin-huxley model. J. neurophysiology , 2541–2550(2004).20. JA Perge, K Koch, R Miller, P Sterling, V Balasubramanian, How the optic nerve allocatesspace, energy capacity, and information. J. Neurosci . , 7917–7928 (2009).21. JA Perge, J Niven, E Mugnaini, V Balasubramanian, P Sterling, Why do axons differ in cal-iber? J. Neurosci . , 626–638 (2012).22. CF Stevens, Y Wang, Facilitation and depression at single central synapses. Neuron ,795–802 (1995).23. S Hallermann, CP De Kock, GJ Stuart, MH Kole, State and location dependence of actionpotential metabolic cost in cortical pyramidal neurons. Nat. neuroscience , 1007 (2012).24. P Lennie, The cost of cortical computation. Curr. biology , 493–497 (2003).25. M Raastad, The slow depolarization following individual spikes in thin, unmyelinated axons inmammalian cortex. Front. cellular neuroscience , 203 (2019).26. AA Faisal, SB Laughlin, Stochastic simulations on the reliability of action potential propaga-tion in thin axons. PLoS computational biology , e79 (2007).27. MM Graham, M Muzi, AM Spence, F O’Sullivan, , et al., The fdg lumped constant in normalhuman brain. The J. Nucl. Medicine , 1157–1166 (2002).28. S Herculano-Houzel, Scaling of brain metabolism with a ﬁxed energy budget per neuron:Implications for neuronal activity, plasticity and evolution. PLoS ONE , e17514 (2011).29. PT Fox, ME Raichle, MA Mintun, C Dence, Nonoxidative glucose consumption during focalphysiologic neural activity. Science , 462–464 (1988).30. W Bialek, F Rieke, R De Ruyter Van Steveninck, D Warland, Reading the neural code.

Sci-ence , 1854 –1857 (1991).31. SB Laughlin, RR de Ruyter van Steveninck, JC Anderson, The metabolic cost of neuralinformation.

Nat. Neurosci . , 36–41 (1998).32. PA Abshire, AG Andreou, Relating information capacity to a biophysical model for blowﬂyretina. IJCNN’99. Int. Jt. Conf. on Neural Networks. Proc . , 182–187 (1999).33. P Dayan, LF Abbott, Theoretical neuroscience:computational and mathematical modeling ofneural systems . (MIT press), First edition, (2001).34. J Niven, J Anderson, SB Laughlin, Fly photoreceptors demonstrate energy-information trade-offs in neural coding.

PLoS biology , e116 (2007).35. JJ Harris, R Jolivet, E Engl, D Attwell, Energy-efﬁcient information transfer by visual pathwaysynapses. Curr. Biol . , 3151–=3160 (2015).36. R Landauer, Irreversiblity and heat generation in the computing process. IBM journal researchdevelopment , 183 –191 (1961).37. D Middleton, An introduction to statistical communication theory . (McGraw-Hill), First edition,(1960).38. A Papoulis,

Probability,random variables, and stochastic processes . (McGraw-Hill, Inc), Thirdedition, (1991).39. R Sarpeshkar, T Delbruck, CA Mead, White noise in mos transistors and resistors.

IEEECircuits Devices Mag . , 23–29 (1993).40. TM Cover, JA Thomas, Elements of information theory . (John Wiley and Sons, Inc), Firstedition, (1991).41. H Leff, AF Rex,

Maxwell’s Demon 2 Entropy, Classical and Quantum Information, Computing .(CRC Press), (2002).42. JM Parrondo, JM Horowitz, T Sagawa, Thermodynamics of information.

Nat. physics ,131–139 (2015).43. CH Bennett, The thermodynamics of computation- a review. Int. J. Theor. Phys . , 905–940(1982).44. C Singh, WB Levy, A consensus layer v pyramidal neuron can sustain interpulse-intervalcoding. PloS one , e0180839 (2017).45. SB Laughlin, TJ Sejnowski, Communication in neuronal networks. Science , 1870–1874(2003).46. G Mitchison, Axonal trees and cortical architecture.

Trends neurosciences , 122–126(1992).47. DB Chklovskii, CF Stevens, Wiring optimization in the brain in Advances in Neural InformationProcessing Systems . pp. 103–107 (2000).48. K Zhang, TJ Sejnowski, A universal scaling law between gray matter and white matter ofcerebral cortex.

Proc. Natl. Acad. Sci . , 5621–5626 (2000).49. E Bullmore, O Sporns, The economy of brain network organization. Nat. Rev. Neurosci . ,336 (2012).50. J Karbowski, Cortical composition hierarchy driven by spine proportion economical maximiza-tion or wire volume minimization11. PLoS computational biology , 320–336 e1004532(2015).51. IE Wang, TR Clandinin, The inﬂuence of wiring economy on nervous system evolution. Curr.Biol . , R1101–R1108 (2016).52. DV Lindley, On a measure of the information provided by an experiment. The Annals Math.Stat . , 986–1005 (1956).53. P Grünwald, Strong entropy concentration, game theory, and algorithmic randomness. Int.Conf. on Comput. Learn. Theory (COLT ’01) , 320–336 (2001). 54. M Medalla, JI Luebke, Diversity of glutamatergic synaptic strength in lateral prefrontal versusprimary visual cortices in the rhesus monkey.

J. Neurosci . , 112–127 (2015).55. BF Baker, AC Crawford, A note on the mechanism by which inhibitors of the sodium pumpaccelerate spontaneous release of transmitter from motor nerve terminals. The J. physiology , 209–226 (1975).56. JW Deitmer, WR Schlue, Intracellular na+ and ca++ in leech retzius neurones during inhibi-tion of the na+/k+ pump.

Pﬂugers Arch . , 195–201 (1983).57. J Hasler, Special report: Can we copy the brain?-a road map for the artiﬁcial brain. IEEESpectr . , 46–50 (2017).58. CD Schuman, et al., A survey of neuromorphic computing and neural networks in hardware. arXiv preprint arXiv:1705.06963 , (2017).59. PT Fox, ME Raichle, Focal physiological uncoupling of cerebral blood ﬂow and oxidativemetabolism during somatosensory stimulation in human subjects. Proc. Natl. Acad. Sci . ,1140–114 (1986).60. FA Azevedo, et al., Equal numbers of neuronal and nonneuronal cells make the human brainan isometrically scaled-up primate brain. J. Comp. Neurol . , 532–541 (2009).61. DL Nelson, AL Lehninger, MM Cox, Lehninger principles of biochemistry . (Macmillan), Fourthedition, (2008).62. M Yoshimura, T Jessell, Amino acid-mediated epsps at primary afferent synapses with sub-stantia gelatinosa neurones in the rat spinal cord.

The J. physiology , 315–335 (1990).63. ID Forsythe, T Tsujimoto, M Barnes-Davies, MF Cuttle, T Takahashi, Inactivation of presy-naptic calcium current contributes to synaptic depression at a fast central synapse.

Neuron , 797–807 (1998).64. DL Brody, DT Yue, Release-independent short-term synaptic depression in cultured hip-pocampal neurons. J. Neurosci . , 2480–2494 (2000).65. B Pakkenberg, et al., Aging and the human neocortex. Exp. gerontology , 95–99 (2003).66. L Alonso-Nanclares, J González-Soriano, JR Rodriguez, J DeFelipe, Gender differences inhuman cortical synaptic density. Proc. Natl. Acad. Sci . , 14615–14619 (2008).67. C Schmolke, A Schleicher, Structural inhomogeneity in the neuropil of lamina ii/iii in rabbitvisual cortex. Exp. brain research , 39–47 (1989).68. DB Chklovskii, T Schikorski, CF Stevens, Wiring optimization in cortical circuits. Neuron ,341–347 (2002).69. KD Wyatt, P Tanapat, SSH Wang, Speed limits in the cerebellum: constraints from myelinatedand unmyelinated parallel ﬁbers. Eur. J. Neurosci . , 2285–2290 (2005).70. SN Vaishnavi, et al., Regional aerobic glycolysis in the human brain. Proc. Natl. Acad. Sci . , 17757–17762 (2010).71. J Wahren, K Ekberg, E Fernqvist-Forbes, S Nair, Brain substrate utilisation during acutehypoglycaemia. Diabetologia , 812–818 (1999).72. PJ Boyle, et al., Diminished brain glucose metabolism is a signiﬁcant determinant for fallingrates of systemic glucose utilization during sleep in normal humans. The J. clinical investiga-tion , 529–535 (1994).73. P Rasmussen, et al., Brain nonoxidative carbohydrate consumption is not explained by exportof an unknown carbon source: evaluation of the arterial and jugular venous metabolome. J.Cereb. Blood Flow Metab . , 1240–1246 (2010).74. S Nath, Beyond the chemiosmotic theory: analysis of key fundamental aspects of energycoupling in oxidative phosphorylation in the light of a torsional mechanism of energy transduc-tion and atp synthesis—invited review part 1. J. bioenergetics biomembranes , 293–300(2010).75. S Nath, Two-ion theory of energy coupling in atp synthesis rectiﬁes a fundamental ﬂaw in thegoverning equations of the chemiosmotic theory. Biophys. chemistry , 45–52 (2017).76. P Mitchell, Chemiosmotic coupling in oxidative and photosynthetic phosphorylation.

Biol. Rev . , 445–501 (1966).77. J Villadsen, J Nielsen, G Lidén, Thermodynamics of bioreactions in Bioreaction EngineeringPrinciples . (Springer), pp. 119–150 (2011).78. D Rolfe, GC Brown, Cellular energy utilization and molecular origin of standard metabolicrate in mammals.

Physiol. reviews , 731–758 (1997).79. AM Mood, FA Graybill, DC Boes, Introduction to the theory of statistics . (McGraw-Hill, Inc),Third edition, (1974).80. E Parzen,

Stochastic Processes . (Society for industrial and applied mathematics), (1999).81. M Overgaard, et al., Hypoxia and exercise provoke both lactate release and lactate oxidationby the human brain.

The FASEB J . , 3012–3020 (2012).82. SS Nath, S Nath, Energy transfer from adenosine triphosphate: quantitative analysis andmechanistic insights. The J. Phys. Chem. B , 1533–1537 (2009).83. B Mélanie, R Caroline, V Yann, R Damien, Allometry of mitochondrial efﬁciency is set bymetabolic intensity.

Proc. Royal Soc. B , 20191693 (2019).84. TP Vogels, R Kanaka, LF Abbott, Neural network dynamics.

Annu. Rev. Neurosci . , 357–376 (2005).85. B Sengupta, S Laughlin, J Niven, Balanced excitatory and inhibitory synaptic currents pro-mote efﬁcient coding and metabolic efﬁciency. PLoS computational biology , e1003263(2013).86. V Braitenberg, A Schüz, Cortex: statistics and geometry of neuronal connectivity: Secondthoroughly revised addition . (Springer Science and Business Media), (1998).87. E Meyer, J Cooper, Correlations between na+-k+ atpase activity and acetylcholine release inrat cortical synaptosomes.

J. neurochemistry , 467–475 (1981).88. MS Santos, R Rodriguez, AP Carvalho, Effect of depolarizing agents on the ca2+-independent and ca2+-dependent release of [3h] gaba from sheep brain synaptosomes. Biochem. pharmacology , 301–308 (1992).89. E Satoh, Y Nakazato, On the mechanism of ouabain-induced release of acetylcholine fromsynaptosomes. J. neurochemistry , 1038–1044 (1992).90. RS Lomeo, et al., Exocytotic release of [3 h]-acetylcholine by ouabain involves intracellularca 2+ stores in rat brain cortical slices. Cell. molecular neurobiology , 917–927 (2003).91. AB Parekh, The wellcome prize lecture: Store-operated ca2+ entry: Dynamic interplay be-tween endoplasmic reticulum, mitochondria and plasma membrane. The J. physiology ,333–348 (2003). et al.

2. AM Mata, MR Sepúlveda, Calcium pumps in the central nervous system.

Brain researchreviews , 398–405 (2005).93. A Verkhratsky", Physiology and pathophysiology of the calcium store in the endoplasmicreticulum of neurons. Physiol. reviews , 201–279 (2005).94. T Binzegger, RJ Douglas, KA Martin, An axonal perspective on cortical circuits in New as-pects of axonal structure and function . (Springer), pp. 117–139 (2010).

1. SUPPLEMENTARY APPENDIXThe added expense arising from using neurons that ap-proach Demon values in energy-efﬁciency of computa-tion

Consider the extreme construction which replaces the prototypical750 pF neuron analyzed here with a system of many small,computationally energy-eﬃcient neurons. The replacement systemis made of many miniature neurons; each miniature neuron hasjust two synapses. Such a miniature neuron computes near thedemon-inspired eﬃciency, kTln2, because the surface area, andtherefore the capacitance, can be reduced about ten-thousand fold(or at least down to the limit not much bigger than the size of amammalian cell nucleus); additionally, the voltage from reset tothreshold can be reduced 1000-fold. This gives us a nominal 10 savings in postsynaptic, computational energy, within 10 of kTln2.However, we must use a system constructed of such miniatureneurons to replace our conventional neuron because we assume thatconventional neural computation must use learned information thatrequires combining 2 = 8192 synapses (close to 10000 and powersof 2 are convenient here). To produce the needed combinationof signals in this system of miniatures, one needs 8191 miniatureneurons arranged in a hierarchy of 13 levels with 2 miniatureneurons at the bottom, with each succeeding level having half thenumber of neurons of its input level, until the combined signalsreach one miniature neuron at the top. This system requires twicethe original 2 synapses, i.e., 2 synapses (2 come into to thebottom level and then half again as much for each succeedinglevel). This extra 2 synapses has the regular presynaptic costswhich wipes out the original postsynaptic savings that we gotfrom the miniatures’ rescaled capacitance rescaled voltage-range.(Presynaptic costs will be a little less than 0.12 W; see Table S5).More to the point, this system of 8191 neurons (call it asurrogate neuron) takes up more space. Assuming synapses havea ﬁxed size; then these pre- and postsynaptic structures occupymore than 1/3 of the space of whole brain. Additionally thevolume of the 8191 miniature neurons must also include the cellbodies; assume a nucleus size of 4 µ m for each of these miniatures.Then together they occupy several fold more volume than all thedendrites of a regular neuron. And ﬁnally there is the volumeoccupied by the axons and presynaptic structures of the neuron atthe top of the surrogate neuron.This top-miniature neuron has an axon that replicates theconnectivity of a standard axon. So in this model it has 8192presynaptic structures. Clearly, the surrogate neuron is bigger thana regular neuron.Now assume that communication time between regular neuronshas evolved to ﬁt the ecological niche of a human 10 or more yearsago. Then brain computational speed depends on matching thisniche and cannot be allowed to vary. However due to the increasedvolume occupied by the surrogate neuron, the distance betweensurrogates is greater than the distance between regular neurons.Since each output axon from the top-miniature neuron is equivalentto a conventional neuron’s axon, this surrogate axon must travelover a larger distance than the equivalent, regular-neuron axon.Because communication time between surrogate neurons cannotbe allowed to increase from normal, the top-miniature-neuron’saxon must not only be longer but it also must be wider. Thus dueto this axon’s surface area increase, communication costs go up, andthey have gone up far beyond the savings in computational costs. Selecting, adjusting, and commenting on literature val-ues for glucose uptake

In the literature, there are various uncertainties and incompati-bilities which require compromises and approximations. The sub-cortical distinctions made by Azevedo et al. (60) and Graham etal. (27) are diﬀerent. That is, the purely anatomical study lumpstogether the striatum, thalamus, colliculus, hypothalamus, ponsand medulla while the reported [ C]glucose measurements onlyinclude the individual subcortical regions caudate, putamen andthalamus. Assuming these subcortical forebrain regions accountfor much of the weight of what Table 1 and Table S1 labels as"other regions", and assuming that the unmeasured brain regionsare not too diﬀerent in glucose uptake, we just use a single valueand multiply by the weights of the subcortical regions given byAzevedo et al. (60).The [ C]glucose uptake by the choroid plexus and brain capil-laries is assumed to be negligible. The ventricular weight is inferredfrom Azevedo et al. (60) to produce their total of 1510 g.Another approximation is required due to the non-uniformity ofbrain size. The representative brain mass values available are foran average male brain. Incompatibly, the regional glucose uptakevalues from Graham et al. (27) are averages from six females andfour males with no statements about sex diﬀerences. Due to this sexheterogeneity there will be large total brain weight heterogeneityand a lower average brain mass per subject than the 1510 g value.Thus some scaling or conversion is needed. To remain consistent,the Graham et al. regional uptake values are scaled by the Azevedoet al. regional brain weights. Table S1 details these calculations.As a result, instead of a total brain uptake rate of 6.48 µ mol/secthe summed regional uptake rates yields 6.05 µ mol/sec.The OGI number is problematic in terms of accuracy and re-producibility. One group with multiple publications on the topicreports an OGI value of slightly more than 5.2 (81). Other A-Vstudies favor a higher OGI (ca. 5.6, (71)), but several of thesestudies also favor decreasing the value of glucose uptake by virtueof the net eﬄux of non-CO carbons. Supposing the larger OGIis correct and supposing that we are allowed to ignore non-CO carbon eﬄux, such an increase in oxidized carbons could yield anadditional ca. 0.14 ATP-watt available to gray matter. Conversion of glucose to ATP

There are three problems with the textbook calculations and previ-ous brain calculations of glucose to ATP conversion: such calcula-tions are based on room temperature free-energies, the mitochondrialleak value is based on a maximal leak value from non-neuronal tis-sue, and the chemiosmotic hypothesis ignores slip in redox pumps.Slip would further reduce the chemiosmotic ATP values by 10%(Nath, personal communication). However, Nath’s novel mechanismdoes not require any ATP downgrades. (He identiﬁes the neutralform of succinic acid as mediating leak. This form can penetratethe cristal membrane as the dianionic form, creating slip whilethe succinate monoanion is the motive form. Thus, both leak andslip are accounted for (15).) Here, however, we do not attemptto account for the issue of slip in our chemiosmotic calculations.Thus, the choice oﬀered for ATP production in the gray matterin Table S2 is 3.09 W (Nath) or 2.61 W (Mitchell). Both of thesecalculations use the conversion factor of 36 kJ/molATP (15, 82)rather than the room-temperature value typically used.Although we consider Nath’s torsional mechanism to be a moreaccurate depiction of ATP production than the chemiosmotic mech-anism, we recognize that this newer mechanism is not directlyinformed by brain mitochondrial studies. That is, brain tissuehas diﬀerent uncoupling proteins which could potentially alter theamount of ATP produced per mole of glucose. Of course, diﬀerentspecies have diﬀerent issues concerning thermoregulation and thusmay diﬀer in amount of leak (see for example (83)). Such studiesargue that larger animals are more eﬃcient in regard to mitochon-drial production of ATP. Thus the conversion values of glucose toATP used for rats are plausibly lower than for humans.

Levy et al.

PNAS |

February 15, 2021 | vol. XXX | no. XX || vol. XXX | no. XX |

February 15, 2021 | vol. XXX | no. XX || vol. XXX | no. XX | ther contrasting estimates with earlier results in theliterature Postsynaptic ionotropic costs are different.

Our postsynaptic costsfor AMPAR activation are about half of Attwell and Laughlin’s(79 · ATP/action potential/neuron vs 134 · ATP/actionpotential/neuron) (11). The diﬀerence mostly arises from their useof an outlier value for synaptic conductance that is doubted by theauthors themselves as cited in Attwell and Laughlin. Biophysicalsimulations encourage us to discard the outlier value.The biophysical simulations in Singh and Levy (44), which usethe consensus layer 5 prefrontal pyramidal neuron found in a varietyof biophysical models, support the lower postsynaptic activationcosts used here. Such simulations use 200 pS AMPAR-conductancesper synapse and no NMDARs. Under these conditions, it typicallytakes 250 to 700 synaptic activations to ﬁre the neuron. Suchsimulations did not include inhibition. Upgrading the model toinclude inhibition, as is done here and as occurs in the severalarticles that consider balanced inhibition (e.g., (84, 85)), is requiredto be consistent with the estimate of 2500 synaptic activations peroutput spike. That is, inhibition is implicitly incorporated into ourcost calculations by virtue of increasing the number of synapticactivations needed to reach threshold.The same amount of excitatory synaptic activations (ca. 2500input activations per pulse out) can be true for both humans andrats. That is, although the dendritic surface area of a humancortical neuron is greater than a rat’s, a lower rate of inhibition inthe human can compensate so that the same amount of excitatorysynaptic activation is required to ﬁre either neuron.

Sensitivity to axon and presynaptic assumptions.

Gray matter com-munication costs are rather large and are directly a function ofaxonal and presynaptic surface areas. Therefore, it is worth reveal-ing the sensitivity of the calculations to the assumptions going intothese surface area calculations.

Axons : Fixing the volume-fraction at 30%, and varying axondiameter changes the values of surface area and axon length. TableS7 shows the relationship between these parameters as well as theiraﬀect on aspects of communication costs. Mouse data motivatesmuch narrower axons but not smaller than 0.25 µ m (26). As thesmallest possible diameter, we choose the 0 . µm mouse-inspireddiameter (86). Boutons : As noted before, the cylindrical assumption for bou-tons is crude. Table S8 illustrates the sensitivity of our bouton sizeassumptions and, by extension, our bouton capacitance values rela-tive to volume fractions. In these calculations, the diﬀerent volumefractions are a necessary result of varying the bouton dimensions.

Reconciling the cost of

SynMod + . First, it must be said that thecatchall partition labeled

SynMod + has never been properly mea-sured for an adult animal as far as we know. More to the point herehowever, is that our perspective on what SynMod + should includediﬀers somewhat from earlier work.What follows explains our rejection of previous estimations of theATP-use attributed to the catchall and unpartitoned " SynMod + "category of energy consumption. In the rodent audit, the energyconsumed by SynMod + is based on published research that: (i)mathematically diﬀerences ATP-use before and after inhibition ofthe Na-K ATPase pump and that (ii) implicitly but necessarilyassumes that removing the Na + gradient will not increase otherforms of ATP-use. However, the literature argues otherwise. Ingeneral, cardiac glycosides such as ouabain, will accelerate severalforms of ATP consumption.Ouabain causes depolarization because of leak conductance, andit also causes transmitter release (87–90) and in particular quantaltransmitter release ( ? ). Quantal transmitter release implies (i) theATP-consuming processes of vesicle recycling and re-loading, (ii)postsynaptic activation of GTP-consuming metabotropic receptors,and (iii) postsynaptic activation of calcium-conducting NMDARs,which will activate various postsynaptic, ATP-consuming kinases.Moreover, in addition to the vesicle recycling, metabotropic, andkinase costs, there are both pre- and postsynaptic calcium pump-ing costs. That is, poisoning the Na-K pump leads to increase of[Ca ] in as does the NMDAR activation. Raising levels of internalcalcium in turn activates one or more of at least three types of Ca -ATPases: those in the plasma membrane, those associatedwith sarco- and endoplasmic reticula, and a mitochondrial accumu-lator (91–93). Any other kind of poisoning experiment that causesdepolarization will produce similar increased demands for ATP.Finally, there are those (not Attwell et al.) that assume blockingaction potentials will remove communication costs, but our leakcalculations refute this idea. Comparison to a class of completely quantiﬁed axons.

Concerningaxon lengths, there is one LM study that provides data indicatingthe reasonable nature of the length estimates obtained above. Inparticular, there are axonal length data for cat L2/3 pyramidal cellsusing 30 completely stained axons. These data imply an axonallength of 4 cm per neuron (94), but as LM measurements, theywill incorporate terminal boutons. For comparison to the estimateshere, we combine axonal boutons and the axonal lengths withoutterminal boutons. Assume there are ten thousand boutons perneuron, each 1 . · − cm in length. Using a volume fraction forsmall axons of exactly 30%, our unadorned axon length is 6.44 cmper neuron. Adding the bouton lengths yields 7.44 cm per neuron.Thus we are predicting that LM quantiﬁcation of the average humanpyramidal neuron’s axon is about 86% longer than the cat axon.More details on the derivations and parametric sensitivities arefound in Supplement including Tables S6-S8. Probability and entropy approximations

Initial development of approximations.

There are two approximationsneeded to calculate a valued Lindley-Shannon information rate, h (ˆΛ) − h (ˆΛ | Λ). After an exact integration by Mathematica, theﬁrst approximation follows. Speciﬁcally, p (ˆ λ ) = R λ mx λ mn p ( λ ) p (ˆ λ | λ ) dλ =(ˆ λ ln( λ mx λ mn )) − · (erf (cid:18) Nλ mx − ( N +1)ˆ λ p N +1)ˆ λλ mx (cid:19) − erf (cid:18) Nλ mn +( N +1)ˆ λ p N +1)ˆ λλ mn (cid:19) + e N (cid:18) erf (cid:18) Nλ mn − ( N +1)ˆ λ p N +1) λ mn ˆ λ (cid:19) − erf (cid:18) Nλ mx +( N +1)ˆ λ p N +1) λ mx ˆ λ (cid:19)(cid:19) ) ≈ (ˆ λ ln( λ mx λ mn )) − = p ( λ ), and therefore, h (ˆΛ) ≈ h (Λ).Remarks: (i) Mathematica performs the exact integration whenone temporarily substitutes m for ( N + 1) and speciﬁes the ap-propriate assumptions. (ii) With N = 2500, and a naive use ofMathematica, p (ˆ λ ) approximations of p ( λ ) appears exact when run-ning at the default precision of Mathematica. The erf terms in p (ˆ λ )combine to a value of two, which is the exact value needed (sincethis value is multiplied by one-half).The second approximation concerns the conditional diﬀerentialentropies. Recall from Results, p (ˆ λ | λ ) = √ N + 1(2 πλ ˆ λ ) − / exp( − λN N +1)ˆ λ − ˆ λ ( N +1)2 λ + N ). Thus in bits and according to Mathematica, − h (ˆΛ | λ ) = (ln(2) − ( ln(2 πe ( N + 1) /N ) − ln( λ ) + exp(2 N ) √ N BesselK (1 , [ − , N ] √ π ). Before valuing this entropyto yield the second approximation, there is a simpliﬁcation thatobviates the need for a third approximation. When the expectation h (ˆΛ | Λ) = R p ( λ ) h (ˆΛ | Λ = λ ) dλ is taken, the term E [ln(Λ)] appears.However, this same term of opposite sign appears in the marginaldiﬀerential entropy h (Λ), so they combine to zero. Then with exp(2 · √ BesselK (1 , [ − , · √ π < − ≈

0, we have thesecond approximation, and this approximation improves as N increases. The information gain in continuous time (per sec) thenis h (ˆΛ) − h (ˆΛ | Λ) ≈ log (ln( ˆ λ mx ˆ λ mn )) + log (2 πe ( N + 1) /N ). et al. nergy Audit Overview Including Partitioning Fig. S1.

The partitioning of the energy available from glucose for the top-down esti-mates. Note: Arrows indicate our partitioning process, not ﬂow of energy/glucose. A:Proceeding from left to right, total brain glucose-uptake is calculated by summing theregional uptakes. From there, the total glucose is subdivided by metabolic fate. B: Webegin with the regional cortical gray matter glucose uptake and end with speciﬁc ATPconsumption. The ﬁrst level of partitioning combines regional rates of [ C]glucoseuptake with regional brain weights; at the second level, glucose is partitioned based onmetabolic fate (OxGM, the glucose that goes into cellular respiration, and NonOxGM,the glucose that is not oxidized); at the third level, energy from oxidized glucose ispartitioned between ATP production and heat generation, and the majority of glucose-energy goes to the latter. At the fourth level, ATP energy is partitioned betweencommunication, computation, and

SynMod + . GM-gray matter; WM-white matter;Crbllm-cerebellum; Other regions, see Table S1; NonOx-nonoxidized glucose; Ox-oxidized glucose; Comm-communication; Comp-computation; Other-includes synapticmodiﬁcation, growth/retraction, maintenance, etc.Levy et al. PNAS |

February 15, 2021 | vol. XXX | no. XX || vol. XXX | no. XX |

February 15, 2021 | vol. XXX | no. XX || vol. XXX | no. XX | able S1. Glucose Partitioning Mass (g) (cid:5)

Glucose uptake (cid:5)(cid:5)

Glucose uptake ? µmol/ g/min µmol/ region / secwhole brain 1495 – 6.05 ?? forebrain cortexcortical gray 633 28.6 3.02cortical white 590 18.4 1.81cerebellum 154 24.7 0.63other regions †

118 30 †† ‡ (cid:5) Regional masses from (60). (cid:5)(cid:5) values from (27) ? Individual regional values are calculated from ﬁrst two columns. ?? This is a sum of the regional values. See text for details. † Includes basal ganglia, thalamus, hypothalamus, etc. †† Uses avg. glucose uptake value of striatum and thalamus of (27) ‡ This range is based on the possible remaining mass using valuesfrom (60)

Table S2. Glucose Energy Partitioning to ATP-watts (cid:5)

Top-Down Calculations Watts (cid:5)

Non-oxidized (cid:5)(cid:5)

ATP- ATP-(complete (equivalent watts ? watts ?? oxidation) watts)whole brain (1495 g) 17.0 1.86 6.19 5.23cerebellum (154 g) 1.77 0.19 0.65 0.55other regions (118 g) † SynMod + ‡ ‡ gray: 1.6Hzcommunication 2.75 2.75 †† computation 0.17 0.17 †† other ATP demands 0.17 ‡ (cid:5) Watts based on glucose-uptake values from (27) and 2 . MJ/mol glucose (61) and 36 kJ/molAT P (15)(82); regional masses from (60). (cid:5)(cid:5)

Also assuming complete oxidation of glucose. See partitioning of glucosein earlier sections ? Using Nath’s torsional mechanism (15), (74) which incorporates mito-chondrial leak ?? Using chemiosmotic theory (76) which is then downgraded by standardmitochondrial leak value: 20% (78) † Including basal ganglia, thalamus, brainstem, etc. The missing massis ventricular. See Table S1 and the accompanying footnotes for moreinformation. †† Indicates bottom-up values exceed available energy if the top-downcalculations are accepted. ‡ Assuming that

SynMod + consumes the remaining gray matter ATP-watts ( SynMod + = gray matter ATP-watts - communication - compu-tation). Table S3. Bottom-Up Computation and Communication ? Gray Matter Computational Costs (cid:5)

AMPAR NMDAR † AMPAR + NMDAR0.069 W . · . ≈ . W 0.104 WGray Matter Communication CostsResting Potential Action Potentials Presynaptic Transmission1.09 W 0.47 W 0.12 W . · − J/NRN/AP . · − J/NRN/AP . · − J/NRN/APGray Matter Communication + ComputationGM Comm Total Comp Total Comm+Comp1.68 W 0.10 W 1.78 W . · − J/NRN/AP . · − J/NRN/AP . · − W/NRNOther Totals (cortex = WM + GM)WM ?? + GM Comm Comp + SynMod + (cid:5)(cid:5) Total GM + WM3.53 W/cortex 1.41 W 4.94 W/cortex . · − J/NRN/AP . · − J/NRN/AP . · − J/NRN/AP ? For more details on these calculations, see Methods (cid:5) . † Assumes NMDA-receptor activation contributes half-again the cost ofAMPA-receptor activations ?? White matter communications also includes white matter

SynMod + (cid:5)(cid:5) Gray matter

SynMod + includes synaptic modiﬁcation costs, e.g.,metabotropic transmitter eﬀects, axonal growth/retraction, receptor mod-iﬁcation/removal as well as the transport needed for such modiﬁcations et al. able S4. Computational costs arising from ionotropic glutamate synaptic activations Voltages V rev V m,ave V Na,Nern. V K,Nern − mV − mV +55 mV − mVFor a single synaspe, Average (Ave) AMPA-Receptor (AR) per synaptic activation (SA) ? G ave ( V Na + − V ave,m ) · G Na / SA Na + amps Na + coulombs/SA pS/SA mV · . pS/SA 12.5 pA/SA 15.1 fC/SAAMPAR failure rate-adjusted costs (cid:5) (All GM synapses = 1 . · ) Na + ﬂux Na + ﬂux ATP mol/sec ATP watts0.057 C/sec . · − mol/sec . · − mol/sec 0.069 WAMPAR + NMDAR Computational costper cerebral cortex per neuron per spike . † · . W ≈ . W . · − J ? SA duration 1 . (cid:5) . † Assumes NMDA-receptor activation contributes half-again the cost of AMPA-receptor activations

Levy et al.

PNAS |

February 15, 2021 | vol. XXX | no. XX || vol. XXX | no. XX |

February 15, 2021 | vol. XXX | no. XX || vol. XXX | no. XX | able S5. Gray Matter Communication (cid:5) Parameters. I (all conductances are at rest) V restm V Na,Nern V K,Nern G Na : G K − mV +55 mV − mV

121 : 24

Parameters. IICapacitivity Resistivity Conductivity Axon/Bouton Area . µF/cm ,

000 Ω cm . · − S/cm ≈ Total across all axons or boutonsArea Axon + Bouton G restaxon + btn G restNa C ax ; C btn . · cm S S . F ; 6 . FNa + rest-ﬂux and ATP to remove Na + ﬂux Na + ﬂux ATPs used . C/s . · − mol/s . · − mol/s Cost of axon + bouton rest potentials ( J/molAT P ) 1.09 W110mV ( V AP ) Axon Action Potential (AP); mV ( ∆ V Bt ) Bouton depolarizationAxon charging/sec (cid:5) Overlap-scaled ? Bouton charging/sec (cid:5)(cid:5) Na + mol/sec . amps . amps . amps . · − mol/secCost of action potentials ( . · − AT P mol/sec ) 0.47 WATP per vesicle released (VR); ATP per Ca spike AT P/vesicle Ca +2 removal . · ATPs . · ATPs (9 . · − mols) (1 . · − mols)Presynaptic AP costs ?? VR Events/s Ca events/s VR ATP use Ca ATP use . · /s . · /s . · − mol/s . · − mol/sAP Generated Presynaptic Cost 0.12 WTotal 1.68 W (cid:5) ? Multiplier from (23) ??

75% failure rate of TR but no failure of Ca entry et al. able S6. GM Communication Volume Fractions and Cylindrical Approximations (cid:5) Vol. Frac Volume (cid:5)(cid:5)

Diameter Height ? Length ?? Area pm † C m †† per bouton (total)Boutons 22 %

139 cm µ m 1.0 µ m - 6.61 · cm . FAxons

190 cm µ m - 9.66 · cm . · cm . FTotal 52 %

329 cm - - - . · cm . F (cid:5) Assuming 1 . · synapses per cortex (cid:5)(cid:5) Cortical gray matter volume ? Height is for a single cylindrical bouton ?? Length is total length of all small axons † Area of plasma membrane (pm) †† Membrane capacitance using 0 . µ F/cm based on 0 . µ F/cm of the lipid membrane plusactivation of two-thirds of the 0 . µ F/cm of Na + channel gating-charge. Table S7. Other axon parameter sets consistent with a 30% volume-fraction

Axon Diameter ( µm ) 0.28 0.4 0.5 0.6Axon Area ( cm ) 27 . · . · . · . · Total Area (cid:5) ( cm ) 33 . · . · . · . · Axon Length (cid:5)(cid:5) ( cm ) 3 . · . · . · . · RP cost (W) 1.67 1.28 1.09 0.96AP cost (W) 0.83 0.58 0.47 0.39 (cid:5)

Total area is the sum of axonal surface area and bouton surface area as inTable S6, 22% volume-fraction. (cid:5)(cid:5)

The axon length is the length of all axons extended by 1.0 µ m assumedbouton height and number. Table S8. Effect of Varying Bouton Dimensions

Diameter Height

Area pm Vol. Frac C m µm µm . · cm % 4 . F µm µm . · cm % 5 . F . µm . µm . · cm % 6 . F Levy et al.

PNAS |

February 15, 2021 | vol. XXX | no. XX || vol. XXX | no. XX |