Missing links as a source of seemingly variable constants in complex reaction networks
MMissing links as a source of seemingly variable constants in complex reaction networks
Zachary G. Nicolaou and Adilson E. Motter
1, 2 Department of Physics and Astronomy, Northwestern University, Evanston, Illinois 60208, USA Northwestern Institute on Complex Systems, Northwestern University, Evanston, Illinois 60208, USA
A major challenge in network science is to determine parameters governing complex networkdynamics from experimental observations and theoretical models. In complex chemical reactionnetworks, for example, such as those describing processes in internal combustion engines and powergenerators, rate constant estimates vary significantly across studies despite substantial experimentalefforts. Here, we examine the possibility that variability in measured constants can be largelyattributed to the impact of missing network information on parameter estimation. Through thenumerical simulation of measurements in incomplete chemical reaction networks, we show thatunaccountability of network links presumed unimportant (with local sensitivity amounting to lessthan two percent of that of a measured link) can create apparent rate constant variations as largeas one order of magnitude even if no experimental errors are present in the data. Furthermore,the correlation coefficient between the logarithmic deviation of the rate constant estimate andthe cumulative relative sensitivity of the neglected reactions was less than 0 . I. INTRODUCTION
In the study of real complex network systems, informa-tion about the system components is often incomplete,unreliable, or unknown. Recently proposed methods toinfer the likelihood of links in a network take advantage ofnode similarities or network structures [1–6]. Combinedwith statistical inference, such methods have demon-strated remarkable success in a variety of real-world net-works with heterogeneous errors [7]. Other approachesinclude observing links present in reconstructed networksafter perturbations [8], applying the inverse problem togenerative models for networks with predicted hyperbolicstructures [9], and inferring missing nodes from commu-nity detection algorithms [10]. Statistical measures ofcorrelation and causation have also been used to inferlinks from observations [11–14].In addition to the topology of their connections, linksand nodes often carry parameters that encode their prop-erties. In models of physical networks, such parametersmust be determined through measurements of observ-ables, which may be costly and limited. In some cases,while the topology of the network is known in principle,parameters such as link weights can only be estimatedfor a relatively small proportion of the network. Never-theless, if the network is derived from a knowledge-basedmodel (i.e., it is believed to include all relevant detailsand is physically well motivated), it is usually assumedthat individual parameters can be measured accuratelythrough targeted experiments.The key problem in such cases is not to determinewhich nodes and links are present, but rather, whichof them are most important to include in the model.“Weak” nodes and links may be systematically neglectedto reduce the number of parameters and derive minimalmodels, but it is unclear how these missing elements mayaffect the estimated parameters, and hence the predicteddynamics, of the parts of the network that are retained. That is, it remains largely unknown what impact miss-ing and deliberately omitted structural information mayhave on the relevant dynamical processes predicted ordescribed by a network model.Here, in contrast to previous studies focused on solelynetwork structure, we explore the impact of network in-completeness on the parameters governing the systemdynamics. To motivate this work, consider the simpleelectric network of resistors shown in Fig. 1(a). Supposean experimenter attempts to fit an incomplete model ofthe system (in which the red resistor is absent) to cur-rent measurements in order to determine the resistance R of the blue resistor. Since the missing resistor is rel-atively large, little current passes through it. From thenetwork perspective, the links in the network should begiven a weight proportional to their admittance (inverseresistance) and, intuitively, a model that neglects smallweighted links would still be expected to be good at pre-dicting the dynamics. The absence of the red resistor R (a) (b) RI +– V adj I +– V adj R e s t V FIG. 1. Impact of missing information in a simple electricnetwork. (a) Network of four resistors, where V = 1V isfixed and V adj differs in different experiments. The resistanceof the blue resistor ( R = 1Ω) is unknown to the experimenterand is determined through measurements of the currents I , while the red resistor ( R = 10Ω) is missing in the model.(b) Estimate of resistor value vs adjustable voltage, as givenby the least squares fit. The blue dashed line shows the actualvalue, and the black dotted line shows where the currentsbecome insensitive to the estimate. a r X i v : . [ c ond - m a t . d i s - nn ] O c t results in a systematic error due to the modeling approx-imation, and we assume for simplicity that the measure-ments are perfect and no additional errors are present.The currents drawn from the voltage sources are mea-sured, and the resistance R in the model is adjusteduntil the model’s predicted currents minimize the devi-ations from the measured currents, resulting in a leastsquares estimate (see Appendix A for details). Figure1(b) shows how this estimated resistance R est dependson the voltage value V adj that the experiment is con-ducted under. Crucially, two experimenters who operatewith different values of V adj can obtain substantially dif-ferent estimates for R est and may falsely conclude thatthe resistance changes with the experimental conditions.The apparent variability of a constant in this exampleis actually quite easy to understand. Essentially no cur-rent passes through the blue resistor when V adj ≈ V ,so the measured currents are not sensitive to changes inthe value of R (as quantified by local sensitivity in Ap-pendix A). Thus, the error introduced by the incompletemodel is amplified around the dotted line in Fig. 1(b),and therefore the measure of link importance based onthe admittance (link weight) does not adequately reflectthe impact of link removal on parameter estimation.While the errors induced by the estimation procedurein Fig. 1 are easy to understand in terms of local sensitiv-ities, subtler challenges arise in more complex networks,especially when the network dynamics are nonstation-ary. In the remainder of the paper, we show that forcomplex chemical reaction networks in particular, miss-ing information about the underlying network can leadto significant variations in rate constants estimated un-der different conditions even when the local sensitivitiesfor the neglected links are small compared to those forthe links being measured. Given that network models ofchemical reactions will necessarily be incomplete (sinceadditional short-lived radical species and rare reactionscould always be added to a given model), this variabilityposes a serious challenge to the efforts to reliably deter-mine rate constant through experiments.To illustrate this challenge, we focus on importantchemical reaction networks involved in natural gas com-bustion in Section II and ethylene pyrolysis in SectionIII, which have been carefully developed for decades, pri-marily through experiments employing shock tubes andlaminar flame speed measurements [15–19]. We comparethe results for these processes in Section IV, and find inparticular that local sensitivity analysis, which is a com-monly used methods for determining reaction importance[20–22], correlates weakly with the reactions impact onrate constant measurements in all cases. We conclude inSection V with a discussion of implications and directionsfor future research. II. NATURAL GAS COMBUSTION
We first consider the impact of unknown network reac-tions in the case of natural gas combustion. We model thechemical process with a coupled set of kinetic equations,which corresponds to a network of chemical species, el-ementary reactions, and physical parameters, including the rate constants that relate species concentrations tothe rate of reactions (see Appendix B). To emulate theeffects of missing information in models of real processes,we take a specific network (which we regard as the com-plete network) as our “ground truth” from which mea-surement values (regarded to be exact) are generated us-ing simulations. We employ the GRI 3.0 network [23]as our “complete” natural gas network, which is thensimulated using the open source software Cantera to in-tegrate a continuously stirred reactor with an accuratetime stepping method for stiff ordinary differential equa-tions [24, 25]. The GRI 3.0 network is a specificationof 53 chemical species and 325 elementary reactions andrate constants, which have been curated carefully fromthe body of experiments in the literature. Despite ex-tensive curation, this network is not reliable for generalpurposes across different conditions when the parametersare fixed, which our analysis will suggest to be due to in-completeness of the network model itself. But for thepurpose of this analysis, we regard this network as com-plete and evaluate our hypothesis by quantifying the im-pact of further removing information from the network.Specifically, starting with this network, we produce plau-sible models for the process by removing some reactionsfrom the complete network and use the resulting “in-complete” networks to determine parameter values frommeasurements.Figure 2(a) shows the evolution of the chemical com-positions in a mixture of natural gas and oxygen. Thereactants combine explosively as the reaction proceedsand ultimate produce water and a variety of other prod-ucts. During this process, the temperature and pressureof the gas increase sharply as the fuel ignites, as shown inFig. 2(b). This ignition is the result of a chain reactionprocess taking part in the complex network of specieswhich are connected by elementary reactions. The net-work structure of this process can be represented as a hy-pergraph, where nodes are chemical species and directedhyperedges are reactions, with weights proportional tothe reaction flux. This hypergraph can be visualized byits bipartite incidence graph, where the hyperedges arereplaced with reaction nodes, as shown in Fig. 2(c).A traditionally employed measure of reaction impor-tance in chemical reaction networks is the local sensitiv-ity, which quantifies changes in the concentration withchanges in the rate constant (see Appendix B). To assessthe reliability of this metric, we generate new incompletenetworks from the complete network by randomly remov-ing reactions that are deemed unimportant while retain-ing reactions presumed to be important, where reactionimportance is assessed on the basis of local sensitivities.Rate constants in these incomplete networks are then fitto simulated measurements from the complete networkto assess the impact of network incompleteness on pa-rameter estimation.For our measurement simulations, we consider two re-actions whose rate constants are to be determined,H + O −− (cid:42)(cid:41) −− O + OH , (I)CH + H −− (cid:42)(cid:41) −− CH + H , (II)taking all other rate constants at the fixed values given inthe natural gas network. Equation (I) is a chain branch- CH O H O10 - - - - M o l e F r a c t i on , X i ( a ) ( ms ) T e m pe r a t u r e ( K ) P r e ss u r e ( a t m ) ( b ) Nodes Links0.741.061.381.702.022.35 S pe c i e s T i m e ( m s ) R ea c t i on T i m e ( m s ) - - - - - M o l e F r a c t i on R a t e m o l m - m s - Others CC HC H C H C H C H C H CHCH OH C C H OH CO CH O CH CH CH OCH O CH COCO H H H OH O HC OHCOHO N OO OH ( c ) FIG. 2. Natural gas ignition in simulations. (a) Mole fractions vs time, with water, methane, and oxygen shown in green, blue,and orange, respectively, and all other species shown in gray. (b) Temperature (red line) and pressure (blue line) vs time, wherethe sharp rises at 1 . H , CH CHO, C H , HCCOH, HCNN, H CN,AR, HCNO, HOCN, HCN, CN, HNCO, NCO, NH , NH , NH, N O, N, HNO, NO , NNH, NO) are shown together in thenode labeled “Others” with their mean size (and color). An animation of the temporal evolution of this network is available inthe Supplemental Material [26]. ing reaction, which helps to populate the radical poolthat leads to ignition. Equation (II) is a propagationreaction, which helps to diversify this pool of radicals.We perform a numerical optimization to minimize anerror (cid:15) in order to fit the incomplete network to the mea-surements. The error we use quantifies the discrepancy inthe peak oxygen radical concentration and ignition time,as described in Appendix C. This particular quantity isemployed in order to emulate the data available in realexperiments, such as shock tube experiments based onoxygen radical concentration peaks [15]. Furthermore,the reactions in Eqs. (I) and (II) are known to be impor-tant for combustion modeling, and their local sensitivities S OI and S OII are among the largest of all the reactionsin the network during the ignition events, as shown inFig. 3(a).Incomplete networks are generated randomly by re-moving reactions with small oxygen sensitivity from thenetwork. In order to quantify how the sensitivity of theremoved reactions impacts the variability of the rate con-stant estimate, we retain from the N tot total reactions atunable number of reactions, N ret , with the largest O-sensitivities, which are not candidates for removal in thegeneration of the incomplete networks. Incomplete net-works are then generated by randomly selecting a tunablenumber of reactions N rem from the remaining N tot − N ret reactions and removing them from the network. Foreach value of N ret and N rem , multiple incomplete net-works are generated with differing random seeds, whichallows us to statistically study the effects of missing infor-mation. Rate constant estimates are obtained from fitsminimizing (cid:15) . The fit of one sample realization is shownin Fig. 3(b), but the quality of this fit varies significantlywith each randomly realized incomplete network for fixed N rem and N ret .Under any particular thermodynamic condition, mini-mizing the error results in a rate constant estimate k est for Eq. (I) or Eq. (II), but these estimates will varywith the thermodynamic conditions. For example, forone incomplete network generated with N ret = 40 and N rem = 40, the estimate for Eq. (I) varied consider-ably among the 27 thermodynamic conditions describedin Appendix C, with values ranging from 0 .
06 to 4 . k . Each esti-mate made in this fashion uses only a single observational Eq. ( I ) Eq. ( II ) - - Time ( ms ) S O ℓ CompleteNetwork IncompleteFit ( ms ) X O × (a) (b) FIG. 3. Simulated rate constant measurement for Eqs. (I)and (II). (a) Local sensitivity S O (cid:96) vs time during ignition forEq. (I) (green line), Eq. (II) (red line), and the other eightmost sensitive reactions (dashed lines). (b) Oxygen radicalmole fraction X O vs time during the ignition event corre-sponding to Experiment 14 (see Table C1 in Appendix C) forthe complete network and a fitted incomplete network with N ret = N rem = 40. The value of the rate constant for the fit-ted network provides an estimate for the rate constants k est . N r e m Eq. ( I ) Eq. ( II ) log ( k est / k ) N r e m S r / S m
10 40 70 100 130 160104070100130160 N ret N r e m
10 40 70 100 130 160 N ret log (ϵ)- - - - FIG. 4. Average values of | log ( k est /k ) | (top row), S r /S m (center row), and log ( (cid:15) ) (bottom row) as a function of N rem and N ret for Eq. (I) (left column) and Eq. (II) (right column).The averages follow similar trends, with magnitudes increas-ing for increasing N rem and decreasing N ret . data point, and so we may hope to eliminate the varia-tion in the estimate by combining many observations ina single fit to produce an estimate.To test if measurements of multiple data points willimprove the estimate, we performed 1000 rate constantestimates for Eqs. (I) and (II) for incomplete networksgenerated with N rem and N ret varying from 10 to 160 byincrements of 15 (in total, 2 . × estimates), whereeach estimate was obtained by minimizing the aggregatederror over all 27 thermodynamic conditions. Figure 4shows how the average results from these simulationsvary with N rem and N ret . For each estimate, we considerthe ratio of the sensitivities of the removed and measuredreactions, S r /S m , the magnitude of logarithmic deviationof the rate constant estimate (representing the orders-of-magnitude difference between the rate constant estimateand the actual rate constant), | log ( k est /k ) | , and thelogarithm of the error, log ( (cid:15) ). It is clear from theseresults that after increasing the amount of observationaldata in the rate constant estimates, there can still be sig-nificant variation in the estimates even when relativelyfew of the least sensitive reactions are missing from thenetwork and the fit appears adequate.Having incorporated multiple thermodynamic condi-tions in each estimate, it is interesting to study how theestimates change under yet different conditions. We per-formed a second set of rate constant measurements usingthe same set of incomplete networks described above un-der slightly different experimental conditions by dilutingthe gas with about ten percent argon. Argon, a neu-tral buffer which does not react chemically, is sometimesused in shock-tube experiments to adjust conditions, suchas pressure, under the assumption that its presence willnot alter the results. However, besides deviating againfrom the true rate constants, the same incomplete net-works also exhibit substantial differences between rateconstant measurements performed in air with and with-out argon dilution, as shown in Fig. 5. Among all the - - - - k estair + Ar k k e s t a i r k Eq. ( I ) - - k estair + Ar k Eq. ( II ) log ( S r / S m )- - - - FIG. 5. Rate constant estimate with argon dilution vs rateconstant estimate without argon dilution for (a) Eq. (I) and(b) Eq. (II). The same incomplete network is used in each pairof measurements, and the color code indicates the ratio of thesensitivities between the removed reactions and the measuredreaction. paired measurements, the Pearson correlation coefficientbetween rate constant estimates with no argon and withargon dilution was 0 .
93 for Eq. (I) and 0 .
78 for Eq. (II),while the fraction of variance unexplained by a linearregression between the two conditions was, respectively,14% and 39%. Furthermore, while networks with smallremoval sensitivities tend to produce rate constant esti-mates closer to the real value, the scatter in Fig. 5 isnot substantially smaller when the estimates are good orwhen the removal sensitivity is small. Thus, network in-completeness can clearly result in variable rate constantestimates under differing conditions even when the reac-tions being measured have large sensitivities comparedto the neglected reactions and the change in conditionsis seemingly innocuous.
III. ETHYLENE PYROLYSIS
To illustrate that these results generalize beyond com-bustion, we next consider a different complex reactionnetwork describing ethylene pyrolysis. Pyrolysis is a com-mon industrial chemical process involving the burning oforganic compounds in the absence of oxygen. In this pro-cess, larger organic molecules like C H (ethylene) breakdown into smaller molecules such as H and CH . We em-ploy a recent model of light hydrocarbon pyrolysis con-sisting of 84 species and 1698 reactions as our completepyrolysis network [19]. Figure 6 shows the evolution ofthe process and network structure in the pyrolysis case,as in Fig. 2.For ethylene pyrolysis, we consider rate constant mea-surements based on net yields of H and CH rather thanthe peak O concentration and ignition time used previ-ously. We allow the reaction to evolve until the H con-centration reaches its steady state with 27 different ther-modynamic conditions relevant to low pressure chemicalvapor deposition. We consider two reactions,H + C H −−→ H + C H , (III)C H + C H −−→ C H + C H , (IV)each of which has large sensitivity with respect to the H C H C H CH N M o l e F r a c t i on , X i ( a ) ( s ) T e m pe r a t u r e ( K ) P r e ss u r e ( a t m ) ( b ) Nodes Links1.23.04.96.88.710.5 S pe c i e s T i m e ( s ) R ea c t i on T i m e ( s ) - - - - - M o l e F r a c t i on R a t e m o l m - s - Others C H C H C H C H C H C H C H C H C H C H C H C H C H C H C H C H C H C H C H C H CH CH C H C H H H C H C H C H C H ( c ) FIG. 6. Counterpart of Fig. 2 for ethylene pyrolysis. The ethylene fuel breaks down in the absence of oxygen in a slower andmore temporally distributed process than in the case of combustion, where oxygen is present. H and CH net yields. The error (cid:15) was minimized asbefore to obtain the rate constant estimates. Appendix Ddescribes details for these pyrolysis simulations. Figures7-8 show counterparts to Fig. 3-4 in the case of pyrolysis.It is clear from these results that, as was the case formethane combustion, rate constant estimates may varyconsiderably for ethylene pyrolysis when relatively fewreactions are missing even if the missing reactions havesmall sensitivities and the fit errors are small. IV. COMBUSTION/PYROLYSISCOMPARISONS
The distributions of rate constant estimates for thecombustion and pyrolysis networks are broadly similar,as summarized by the histograms in Fig. 9(a)-(d). Therate constant estimates typically deviate much more thanthe error or the removal sensitivities in all cases, de-spite the differing measurement targets and reaction net-works. Table I shows the correlation coefficients betweenmeasurement quantities for each reaction in each pro-cess; there are moderate to weak correlations between
Eq. ( III ) Eq. ( IV ) - - Time ( s ) S O ℓ CompleteNetwork IncompleteFit ( ms ) X H × (a) (b) FIG. 7. Counterpart of Fig. 3 for pyrolysis reactions inEqs. (III)-(IV), with N rem = N ret = 200. Incomplete networkmodels are fitted to the observed yields of X oH and X oCH tosimulate measurements of the rate constants. all quantities. Thus, in both these processes, the localsensitivity metric correlates weakly with the rate con-stant estimates and is not a very reliable measure of linkimportance. Even when the removed reactions have lo-cal sensitivity totaling to much less than the sensitiv-ity of the measured links, the rate constant estimatescan vary considerably, as shown in Fig. 9(e)-(f). In thenetworks we generated, for example, the smallest val-ues of S r /S m leading to order-of-magnitude changes with | log ( k est /k ) | > .
005 for Eq. (I), 0 . .
019 for Eq. (III), and 0 . N r e m Eq. ( III ) Eq. ( IV ) log ( k est / k ) N r e m S r / S m
40 200 360 520 680 84040200360520680840 N ret N r e m
40 200 360 520 680 840 N ret log (ϵ)- - - - FIG. 8. Counterpart of Fig. 4 for ethylene pyrolysis. Therate constant estimate k est , removal sensitivity S r , and error (cid:15) follow the same general trends as for natural gas combustion,but estimates are generally better for Eq. (III) and worse forEq. (IV). P r opo r t i on (a) Eq. ( I ) (c) Eq. ( II ) Eq. ( I ) Eq. ( II ) S r / S m (e) N rem / N tot N ret / N tot log ( k est / k ) P r opo r t i on (c) Eq. ( III ) log ( k est / k ) (d) Eq. ( IV ) Eq. ( III ) Eq. ( IV ) log ( k est / k ) S r / S m (f) FIG. 9. Distributions of rate constant estimates for Eqs. (I)-(IV). (a-d) Histograms of the rate constant estimates for Eqs. (I)-(IV), with the red-green color blend indicating the proportion of removed and retained reactions for each set of histograms.(e-f) Relative sensitivity of the removed reactions vs rate constant estimates for Eqs. (I)-(IV), with all data divided into 25evenly-occupied bins in each axis and with error bars showing the 90% quantiles for each bin. sider the effect of varying the number of removed andretained reactions in the simulated measurements. Wedivide the plane spanned by N ret and N rem into an ac-ceptable region (in which more than 95% of the rate con-stant estimates from fits of incomplete networks deviatefrom the actual value by less than a factor of two) andan unacceptable region (in which this is not the case).Figure 10 compares the acceptable regions in the com-bustion and pyrolysis cases by normalizing the axes by N tot , the total number of reactions in each network. Inorder to achieve acceptable measurements with just 10%of reactions missing for all cases, the retained reactionsmust amount to more than 15% of the network. Whilesomewhat smaller N ret /N tot is adequate for acceptablemeasurements for Eqs. (I)-(III) in this case, the propor-tion of retained reactions required for acceptable mea-surements rises to about 15% in all cases when 50% ofthe reactions are missing.Interestingly, the natural gas network shows a strongerdependence on the proportion of removed reactions than x y (I) (II) (III) (IV) S r /S m log ( (cid:15) ) 0 .
48 0 .
49 0 .
59 0 . S r /S m | log ( k est /k ) | .
43 0 .
46 0 .
47 0 . | log ( k est /k ) | log ( (cid:15) ) 0 .
30 0 .
38 0 .
32 0 . x and y for the distribution of simulated rate constant mea-surements in the combustion reactions in Eqs. (I) and (II) andthe pyrolysis reactions in Eqs. (III) and (IV). the pyrolysis network, indicating that the pyrolysis net-work has a greater proportion of reactions that are unim-portant to the dynamics of the measurement. This mayresult from the more complex chain reaction involved inignition compared to the slower and more temporally dis-tributed pyrolysis process. N rem / N tot N r e t / N t o t CombustionAcceptableUnacceptableEq. ( I ) Eq. ( II ) ( a ) N rem / N tot PyrolysisAcceptableUnacceptableEq. ( III ) Eq. ( IV ) ( b ) FIG. 10. Acceptable and unacceptable regions of networkcompleteness for (a) the combustion network and (b) the py-rolysis network. The rate constant estimates are deemed ac-ceptable only if more than 95% of incomplete network fitsresult in rate constant estimates which differ from the actualrate constant by less than a factor of two. Thin lines show howthe boundary varies as the acceptable threshold is reduced to90%, 85%, 75%, and 70%.
V. DISCUSSION
Our results show that observed variations in rate con-stants of combustion and pyrolysis networks can belargely attributed to the use of incomplete network mod-els that are missing reactions that were presumed to benegligible. We demonstrated that even when the ne-glected reactions have small cumulative sensitivity com-pared to the sensitivity of the measured reaction, rateconstant estimates in incomplete networks can vary byone order of magnitude. Furthermore, rate constant es-timate deviations and local sensitivity analysis showedweak statistical correlations. While these results high-light significant challenges in constructing a reliable andtransferable kinetic model for these important chemicalprocesses, they also reveal that current observations ofvariability in rate constant estimates are not necessarilyan indication that a complete network model applica-ble across all conditions is unattainable. Yet, since mea-surements of individual rate constants using the currentincomplete network models cannot produce reliable re-sults, iteratively refining individual parameters throughtargeted experiments is unlikely to ever produce accu-rate estimates. It would therefore be desirable to ac-cumulate large data sets from measurements across allrelevant conditions in order to simultaneously estimateall rate constants in the model.Our results also have implications for other processesdescribed by complex chemical reaction networks, includ-ing atmospheric processes [27–29] as well as biochemi-cal processes, such as those mediated by metabolic net-works [30–32]. Compared to the combustion and pyrol-ysis networks considered here, parameter estimation inmetabolic networks in particular is still in its infancy andcould benefit significantly from more informed formula-tions at the outset. One promising direction for improve-ment is suggested by information-geometric and sloppymodeling approaches [33–35]. Using such methods, itmay be possible to identify measurement targets as al-gebraic combinations of rate constants that are more ro-bust to the variations caused by missing network links.Another promising information theoretic approach forpredicting network parameters from data would be toemploy maximum entropy models, whose degeneracy is-sues have been recently addressed [36]. Alternatively, itmay be possible to directly estimate reaction fluxes ratherthan species concentrations in order to disentangle reac-tions. In principle, molecular dynamics simulations canbe used to estimate fluxes even if they are experimentallyinaccessible, provided that the molecular force fields canbe modeled realistically [37, 38].Beyond chemical reaction networks, social networks[39–41] and infrastructure networks may also benefit fromour findings. In power grids [42, 43], for example, thenetwork components are assumed to be known but caninvolve uncertainties and the network itself may also vary,as not all components are necessarily operational at alltimes. Our results suggest that power-grid models maybecome unreliable even if the component parameters areaccurate unless they are updated in real time to ac-count for missing components. In other applications, ourresults also emphasize an important trade-off between model accuracy and parameter accuracy when dimensionreduction is implemented by network trimming. That is,trimmed networks may lead to accurate results underspecific conditions at the cost of requiring parameter val-ues that differ significantly from their actual values andthat may change substantially as conditions change. Weanticipate that the impact of missing information on pa-rameter estimation will ultimately depend on the classof networks under consideration, much in the same wayas optimal link detection varies across scientific domains[44].
ACKNOWLEDGMENTS
The authors thank Igor Mezi´c for insightful discus-sions. This work was supported by MURI Grant No.W911NF-14-1-0359 and ARO Grant No. W911NF-19-1-0383.
Appendix A: Electric circuit analysis
We first derive the measured currents for the circuitFig. 1 using Kirchhoff’s laws. The voltage across theblue resistor is known, so the current moving from left toright across the blue resistor is I = ( V adj − V ) /R . Sincethe black nodes are neither sources nor sinks of current,the current from top to bottom across the red resistormust be I + I = ( V adj − V ) /R . This same currentmust pass through the horizontal 1Ω resistor, so that I + I = V / (1Ω). It follows that V = V adj / ( R + 1Ω)and I = V adj / ( R + 1Ω) − ( V adj − V ) /R . Similarly, thecurrent across the vertical 1Ω resistor must be I − I = V / (1Ω), and thus I = V / (1Ω) + ( V adj − V ) /R .Next, we consider the model of the experimenter whoneglects the red resistor. According to this model, allthe current drawn through the adjustable voltage sourcetravels across the blue resistor, so that the estimated cur-rent is I (cid:48) = − ( V adj − V ) /R est , where R est is the estimateof the resistance of the blue resistor. Similarly, the exper-imenter estimates the current drawn from the left voltagesource to be I (cid:48) = V / (1Ω) + ( V adj − V ) /R est . The squareerror for the incomplete model, (cid:15) ≡ ( I (cid:48) − I ) +( I (cid:48) − I ) ,is thus (cid:15) = ∆ + (cid:18) V adj R + 1Ω + ∆ (cid:19) , (A1)where ∆ = ( V adj − V )(1 /R est − /R ). The least squareestimate for the resistance in Fig. 1(b) follows by min-imizing (cid:15) with respect to R est . The local sensitivity S ij ≡ ( R j /I i )( ∂I i /∂R j ) quantifies how the i th currentvaries with the j th resistor. It seems intuitive that theleast square estimate derived from the model that omits R can become unreliable when S i becomes large com-pared to S i . Indeed, for this circuit, the ratio S /S =( R + 1Ω) ( V adj − V ) /R R V adj approaches zero as V adj approaches V , where the estimate in Fig. 1(b) becomesunreliable. Exp. T (K) P (atm) φ t oig (s) X oO peak k est /k . . . × − . × − .
022 1300 0 . . . × − . × − .
013 1300 0 . . . × − . × − .
714 1300 1 . . . × − . × − .
035 1300 1 . . . × − . × − .
096 1300 1 . . . × − . × − .
277 1300 5 . . . × − . × − .
118 1300 5 . . . × − . × − .
309 1300 5 . . . × − . × − . . . . × − . × − . . . . × − . × − . . . . × − . × − . . . . × − . × − . . . . × − . × − . . . . × − . × − . . . . × − . × − . . . . × − . × − . . . . × − . × − . . . . × − . × − . . . . × − . × − . . . . × − . × − . . . . × − . × − . . . . × − . × − . . . . × − . × − . . . . × − . × − . . . . × − . × − . . . . × − . × − . T , pressure P , and the fuel toair ratio φ are varied, resulting in different observed ignition times t oig and peak O radical mole fraction X oO peak . An example ofthe estimated rate constant k est /k for fits of the corresponding observation for a network with 40 reactions removed is shownin the last column. Appendix B: Kinetic models and reaction sensitivity
The chemical reaction networks we consider are de-scribed by state variables corresponding to the molarfraction X A i of the i th species with name A i in an idealgas and thermodynamic variables including the tempera-ture T and pressure P , with volume held fixed and underadiabatic conditions. For simplicity, we consider well-mixed systems, so that there is no spatial dependenceon variables. Molar fractions vary because of elementarychemical reactions between species, such as the (cid:96) th re-action (cid:80) i ν i (cid:96) A i −−→ (cid:80) j η j (cid:96) A j , which is assumed tooccur at a rate governed by mass-action kinetics (i.e.,proportional to a rate constant κ (cid:96) times the product ofthe reactants molar concentrations raised to their sto-ichiometric coefficients). Each molar fraction increasesat a rate given by the reactions that produce it and de-creases at a rate given by the reactions that consume it,as d X A i / d t = (cid:80) (cid:96) κ (cid:96) ( η i(cid:96) − ν i(cid:96) ) (cid:81) j X ν j(cid:96) A j . The energy re-leased or consumed from each reaction (determined fromthermodynamic data about the species) is converted toheat, which enters an energy conservation equation gov-erning the evolution of the temperature, and the pres-sure and temperature are related through a constitutiverelation, which we take as the ideal gas law. Each rate constant has temperature dependence which is based ona modified Arrhenius equation κ = kT b exp( − E/RT )where k , b , and E are model parameters. Some three-body reactions also depend on pressure with a Troefalloff, and reversible reactions have rate constants de-rived from detailed balance. The local sensitivity, S i(cid:96) ≡ κ (cid:96) /X i × ∂X i /∂κ (cid:96) , quantifies how the state variable X i changes as the parameter κ (cid:96) varies [21]. Small changesin the rate constants for reactions with large sensitivitiesproduce large changes in the species molar fractions, indi-cating that sensitivity may be a useful metric for reactionimportance. Appendix C: Simulated combustion measurements
We consider 27 thermodynamic conditions for our mea-surement simulations that evenly span the range of ther-modynamic values over which the GRI network was de-signed to model ignition, as listed in Table C1. The initialconcentrations in the simulation are a mixture of air andmethane, with 8 parts N , 2 parts O , and φ parts fuelCH . For the measurements in air diluted with argon, anadditional 1 part Ar was added to the initial conditions.Experiments cannot generally observe the concentra-tion of all species or reaction fluxes during the course of Exp. T (K) P (atm) φ X oH X oCH k est /k .
01 0 . . × − . × − .
10 0 . . × − . × − .
00 0 . . × − . × − .
01 1 . . × − . × − .
10 1 . . × − . × − .
00 1 . . × − . × − .
01 2 . . × − . × − .
10 2 . . × − . × − .
00 2 . . × − . × − .
01 0 . . × − . × − .
10 0 . . × − . × − .
00 0 . . × − . × − .
01 1 . . × − . × − .
10 1 . . × − . × − .
00 1 . . × − . × − .
01 2 . . × − . × − .
10 2 . . × − . × − .
00 2 . . × − . × − .
01 0 . . × − . × − .
10 0 . . × − . × − .
00 0 . . × − . × − .
01 1 . . × − . × − .
10 1 . . × − . × − .
00 1 . . × − . × − .
01 2 . . × − . × − .
10 2 . . × − . × − .
00 2 . . × − . × − T , pressure P , and fuel purity φ are varied, resulting in different net yields X oH and X oCH . An example of the estimated rate constant k est /k for fits of thecorresponding observation for a network with 200 reactions removed is shown in the last column. ignition. Instead, specific targets are identified to mea-sure and fit. For this study, we consider the ignitiontime t ig and the peak in the oxygen radical mole fraction X O peak that occurs at ignition, where t ig is taken as thetime of the oxygen peak. For simplicity, we fix the valuesof b and E in the rate constants to their values in the orig-inal network and optimize only over the pre-exponentialfactor k . We optimize over the error quantity (cid:15) ≡ (cid:32) t oig − t fig t oig (cid:33) + (cid:32) X oO peak − X fO peak X oO peak (cid:33) , (C1)where the superscripts o and f indicate the observed andfitted values, respectively. When the observation frommultiple thermodynamic conditions are combined to pro-duce a rate constant estimate, the square errors are to-taled to give the aggregated error. The particular opti-mization target in Eq. (C1) is intended to emulate choicesmade in real combustion experiments [15].We use a Brent optimization algorithm, with a bracketfound from an initial interval of [ k / , k ], where k is the original network’s rate constant value. To simu-late dynamics, we used the open-source software Canterato numerically integrate the coupled ordinary differen-tial equations in the combustion and pyrolysis networks. The 2-norm (over the time indexes) of the local sensi-tivity S O (cid:96) (and S H (cid:96) in the case of pyrolysis below) wascalculated for all reactions and averaged over all exper-imental conditions to rank reactions by their sensitivity.The removal sensitivity S r is the total sensitivity of allreactions removed to generate the incomplete network,and the measurement sensitivity S m is the sensitivity ofthe reaction whose rate constant is being estimated. Appendix D: Simulated pyrolysis measurements
Table D1 shows the experimental conditions for thepyrolysis rate constant measurements. The fuel purity φ in this case indicates the ratio of ethylene to N in theinitial condition. These conditions were chosen for theirapplicability to low pressure chemical vapor deposition.For ethylene pyrolysis, we optimize over the measurementerror (cid:15) ≡ (cid:32) X oCH − X fCH X oCH (cid:33) + (cid:32) X oH − X fH X oH (cid:33) , (D1)where the yield concentrations are evaluated at a timethat the H concentration has attained its equilibrium0value. The particular optimization target in Eq. (D1)based on methane and hydrogen yields is intended to represent the kinds of limited data that are available inchemical vapor deposition experiments. [1] A. Clauset, C. Moore, and M. E. Newman, Nature ,98 (2008).[2] R. Guimer`a and M. Sales-Pardo, Proc. Natl. Acad. Sci.U. S. A. , 22073 (2009).[3] L. L¨u, and T. Zhou, Physica A , 184 (2013).[5] E. Y. Yu, D. B. Chen, and J. Y. Zhao. Sci. Rep. , 14469(2018).[6] R. Pech, D. Hao, Y. L. Lee, Y. Yuan, and T. Zhou. Phys-ica A , 121319 (2019).[7] T. P. Peixoto, Phys. Rev. X , 041011 (2018).[8] L. L¨u, L. Pan, T. Zhou, Y. C. Zhang, and H. E. Stanley,Proc. Natl. Acad. Sci. U. S. A. , 2325 (2015).[9] F. Papadopoulos, C. Psomas, and D. Krioukov, Perform.Eval. Rev. , 104 (2012).[10] D. Hric, T. P. Peixoto, and S. Fortunato. Phys. Rev. X , 031038 (2016).[11] X. Wu, W. Wang, and W. X. Zheng, Phys. Rev. E ,046106 (2012).[12] H. Liao, A. Zeng, and Y. C. Zhang, Physica A , 216(2015).[13] J. Casadiego, M. Nitzan, S. Hallerberg, and M. Timme,Nat. Commun. , 2192 (2017).[14] B. J. L¨unsmann, C. Kirst, and M. Timme, PLOS One , e0186624 (2017).[15] D. L. Baulch, C. T. Bowman, C. J. Cobos, R. A. Cox, Th.Just, J. A. Kerr, M. J. Pilling, D. Stocker, J. Troe, W.Tsang et al., J. Phys. Chem. Ref. Data , 757 (2005).[16] J. M. Simmie, Prog. Energy Combust. Sci. , 599(2003).[17] J. Li, Z. Zhao, A. Kazakov, and F. L. Dryer, Int. J. Chem.Kinet. , 566 (2004).[18] M. ´O Conaire, H. J. Curran, J. M. Simmie, W. J. Pitz,and C. K. Westbrook, Int. J. Chem. Kinet. , 603(2004).[19] E. Ranzi, A. Frassoldati, R. Grana, A. Cuoci, T. Far-avelli, A. Kelley, and C. Law, Prog. Energy Combust.Sci. , 468 (2012).[20] E. P. Dougherty and H. Rabitz, J. Chem. Phys. , 6571(1980).[21] T. Tur´anyi, J. Math. Chem. , 203 (1990).[22] A. S. Tomlin, Proc. Combust. Inst. , 159 (2013).[23] G. P. Smith, D. M. Golden, M. Frenklach, N. W. Mo-riarty, B. Eiteneer, M. Goldenberg, C. T. Bowman, R.K. Hanson, S. Song, W. C. Gardiner, Jr. et al., Gas Re-search Institute (1999), http://combustion.berkeley.edu/gri-mech/ .[24] D. G. Goodwin, H. K. Moffat, and R. L. Speth, Can-tera: An Object-Oriented Software Toolkit for Chemi-cal Kinetics, Thermodynamics, and Transport Processes , (2018). [25] Source code and data sets for the combustion and pyrol-ysis chemical kinetics are available at https://github.com/znicolaou/ratemeasure .[26] See Supplemental Material at http://link.aps.org/supplemental/10.1103/PhysRevResearch.2.043135 foranimations illustrating the evolution of the networks inFig. 2 and Fig. 6.[27] R. Atkinson, D. Baulch, R. Cox, J. Crowley, R. Hamp-son, R. Hynes, M. Jenkin, M. Rossi, J. Troe, and T.Wallington, Atmospheric Chem. Phys. , 4141 (2008).[28] P. E. A. Debiagi, G. Gentile, M. Pelucchi, A. Frassoldati,A. Cuoci, T. Faravelli, and E. Ranzi, Biomass Bioenergy , 60 (2016).[29] J. Runge, V. Petoukhov, J. F. Donges, J. Hlinka, N. Ja-jcay, M. Vejmelka, D. Hartman, N. Marwan, M. Paluˇs,and J. Kurths, Nat. Commun. , 8502 (2015).[30] A. Chakrabarti, L. Miskovic, K. C. Soh, and V. Hatzi-manikatis, Biotechnol. J. , 1043 (2013).[31] A. Khodayari, A. R. Zomorrodi, J. C. Liao, and C. D.Maranas, Metab. Eng. , 50 (2014).[32] H. Link, D. Christodoulou, and U. Sauer, Curr. Opin.Biotechnol. , 8 (2014).[33] M. K. Transtrum and P. Qiu, Phys. Rev. Lett. ,098701 (2014).[34] M. K. Transtrum, B. B. Machta, K. S. Brown, B. C.Daniels, C. R. Myers, and J. P. Sethna, J. Chem. Phys. , 010901 (2015).[35] A. White, M. Tolman, H. D. Thames, H. R. Withers, K.A. Mason, and M. K. Transtrum, PLOS Comput. Biol. , e1005227 (2016).[36] S. Horv´at, E. Czabarka, and Z. Toroczkai, Phys. Rev.Lett. , 158701 (2015).[37] M. D¨ontgen, M. D. Przybylski-Freund, L. C. Kr¨oger, W.A. Kopp, A. E. Ismail, and K. Leonhard, J. Chem. The-ory Comput. , 2517 (2015).[38] M. Alaghemandi, L. B. Newcomb, and J. R. Green. J.Phys. Chem. A , 1686 (2017).[39] S. Eubank, H. Guclu, V. S. A. Kumar, M. V. Marathe,A. Srinivasan, Z. Toroczkai, and N. Wang. Nature ,180 (2004).[40] D. Liben-Nowell and J. Kleinberg, J. Assoc. Inf. Sci.Technol. , 1019 (2007).[41] T. D. Jorgensen, K. J. Forney, J. A. Hall, and S. M. Giles,Soc. Networks , 26 (2018).[42] S. N. Backhaus and M. Chertkov, Phys. Today (5), 42(2013).[43] A. E. Motter, S. A. Myers, M. Anghel, and T. Nishikawa,Nat. Phys. , 191 (2013).[44] A. Ghasemian, H. Hosseinmardi, A. Galstyan, E. M.Airoldi, and A. Clauset, Proc. Natl. Acad. Sci. USA117