[PDF] A multifactorial evaluation framework for gene regulatory network reconstruction

Abstract

In the past years, many computational methods have been developed to infer the structure of gene regulatory networks from time-series data. However, the applicability and accuracy presumptions of such algorithms remain unclear due to experimental heterogeneity. This paper assesses the performance of recent and successful network inference strategies under a novel, multifactorial evaluation framework in order to highlight pragmatic tradeoffs in experimental design. The effects of data quantity and systems perturbations are addressed, thereby formulating guidelines for efficient resource management. Realistic data were generated from six widely used benchmark models of rhythmic and non-rhythmic gene regulatory systems with random perturbations mimicking the effect of gene knock-out or chemical treatments. Then, time-series data of increasing lengths were provided to five state-of-the-art network inference algorithms representing distinctive mathematical paradigms. The performances of such network reconstruction methodologies are uncovered under various experimental conditions. We report that the algorithms do not benefit equally from data increments. Furthermore, for rhythmic systems, it is more profitable for network inference strategies to be run on long time-series rather than short time-series with multiple perturbations. By contrast, for the non-rhythmic systems, increasing the number of perturbation experiments yielded better results than increasing the sampling frequency. We expect that future benchmark and algorithm design would integrate such multifactorial considerations to promote their widespread and conscientious usage.

Full PDF

AA multifactorial evaluation framework forgene regulatory network reconstruction (cid:63)

Laurent Mombaerts ∗ Atte Aalto ∗ Johan Markdahl ∗ Jorge Gon¸calves ∗∗ Luxembourg Centre for Systems Biomedicine, University ofLuxembourg, 6 avenue du Swing, 4367 Belvaux, Luxembourg (e-mail:[email protected], [email protected], [email protected],[email protected]).

Abstract:

In the past years, many computational methods have been developed to infer thestructure of gene regulatory networks from time-series data. However, the applicability andaccuracy presumptions of such algorithms remain unclear due to experimental heterogeneity.This paper assesses the performance of recent and successful network inference strategiesunder a novel, multifactorial evaluation framework in order to highlight pragmatic tradeoﬀsin experimental design. The eﬀects of data quantity and systems perturbations are addressed,thereby formulating guidelines for eﬃcient resource management.Realistic data were generated from six widely used benchmark models of rhythmic and non-rhythmic gene regulatory systems with random perturbations mimicking the eﬀect of geneknock-out or chemical treatments. Then, time-series data of increasing lengths were providedto ﬁve state-of-the-art network inference algorithms representing distinctive mathematicalparadigms. The performances of such network reconstruction methodologies are uncoveredunder various experimental conditions. We report that the algorithms do not beneﬁt equallyfrom data increments. Furthermore, for rhythmic systems, it is more proﬁtable for networkinference strategies to be run on long time-series rather than short time-series with multipleperturbations. By contrast, for the non-rhythmic systems, increasing the number of perturbationexperiments yielded better results than increasing the sampling frequency. We expect that futurebenchmark and algorithm design would integrate such multifactorial considerations to promotetheir widespread and conscientious usage.

Keywords:

Network Inference; Modelling; Dynamics and Control; Systems medicine1. INTRODUCTIONGenes in living organisms do not function in isolation,but may activate or suppress the activity of other genes,forming intricate networks of regulatory interactions. Theinvestigation of the struture of gene regulatory networks(GRN), which consists in unveiling the set of biochemicalinteractions that control gene expression, is a fundamentalstep in the understanding of disease mechanisms and thedevelopment of future drugs and therapies. Recovering thetopology of such networks from gene expression proﬁlesremains, however, a crucial challenge in systems biologydue to their vast complexity, intrinsic stochasticity, andlimited observability.In the last few years, many computational methods basedon a variety of mathematical paradigms have been devel-oped to reconstruct GRNs from time-series data, and theirperformances assessed on relevant mathematical models(Casadiego et al., 2017; Aderhold et al., 2017; Marbachet al., 2012). However, experimentalists are still facedwith diﬃcult questions: Which method to use with an (cid:63)

AA and JM are supported by University of Luxembourg internalresearh projects PPPD (both) and OptBioSys (AA). available dataset? On the other hand, with ﬁxed amountof resources, what kind of experiments to carry out toensure optimal use of resources? How much can be gainedby investing into a few more measurements?The purpose of this study is to provide guidelines forconscientious management of biological resources by un-veiling the performances of state-of-the-art network in-ference strategies under various experimental designs. Tothis end, the eﬀects of data quantity and multi-experimentavailability are assessed simultaneously on the accuracy ofthe topological reconstruction.Navigating experimental tradeoﬀs for GRN inference isnot an entirely new concern, although literature on thetopic is surprisingly scarce (Haque et al., 2019). Such amultifactorial approach has never been undertaken for sys-tematic evaluation of network reconstruction algorithms.Sefer et al. (2016) studied the tradeoﬀs between denseand replicate sampling strategies. Their results showedthat, under reasonable noise assumptions, gene expressionproﬁles reconstructed from dense sampling are more ac-curate than those resulting from replicate sampling. Geieret al. (2007) showed that at equivalent data size, short-time, gene knock-out experiments contain more informa- a r X i v : . [ q - b i o . M N ] J un ion about the GRN structure than single experiment,longer recordings of non-rhythmic systems. The GRN in-ference algorithms used in their study, however, are nolonger state-of-the-art. Markdahl et al. (2017) studied thecell cycle of Saccharomyces cerevisiae as a case-study toanalyze the eﬀect of temporal resolution on the qualityof the inferred network. The performance as a functionof time-series length resulting from a LASSO methodol-ogy (Tibshirani, 1996) resembled a sigmoid shape witha plateauing eﬀect at the end. Mombaerts et al. (2016)used a model of the circadian network of Arabidopsisthaliana, a rhythmic system with period of about 24 hours,to show that sampling from transient dynamics providesricher information for the identiﬁcation of the underlyingGRN topology. Muldoon et al. (2019) identiﬁed previouslyunrecognized factors that aﬀect inference outcomes, suchas stimulus-speciﬁc experimental design and network mo-tifs in the vicinity of a stimulus. Following the DREAM3competition, Marbach et al. (2010) investigated strengthsand weaknesses of algorithms in recognizing types of motifsthat appear in gene networks. Finally, it has also beenshown that, for mutual-information based techniques, theaccuracy reaches a saturation point after a speciﬁc datasize (Chaitankar et al., 2010). Algorithms based on corre-lation or mutual information are, however, excluded fromthis research as they cannot detect causality between genes(Marbach et al., 2012).Realistic time-series datasets were generated from onerhythmic and ﬁve non-rhythmic models of gene regulatorynetworks that have been widely used as benchmarks inthe literature (Aderhold et al., 2017; Prill et al., 2010).External interventions, i.e., gene deletion (knock-out) andchemical treatments, are explicitly simulated to providea comprehensive picture of the performance of each algo-rithm under a range of experimental conditions. Increas-ingly rich multi-experiment time series datasets are fed toﬁve network inference algorithms. Performance is assessedusing standard techniques for classiﬁcation algorithms,studying the area under the receiver operating charac-teristics (ROC) and the precision-recall (PR) curves. Weexpect such pragmatic considerations to contribute to thedevelopment of conscientious experimental designs andappropriate mathematical benchmarks.2. METHODS The use of in silico networks is preferable to randomgraphs, as they account for realistic structural propertiesof biological networks (Schaﬀter et al., 2011). The oscil-lating gene regulatory system used as a benchmark in thispaper is a model of circadian regulation of Arabidopsisthaliana, hereafter referred to as Millar 10 (Figure 1A)(Pokhilko et al., 2010). Conceptually, it is composed of 2interconnected feedback loops and an input pathway thatincorporates external light cues in order to synchronizethe plant to the surrounding light conditions. When sub-jected to another external regime, e.g., constant light orconstant darkness, the system displays transient dynamicsand reaches a new limit cycle. It is expected that measure-ments of the transient period contain more informationthan measurements of a limit cycle. This hypothesis was Fig. 1. Gene regulatory networks used as benchmarks.Blue pointed arrows and red blunt arrows representactivation and inhibition reactions respectively. A. Millar 10 Model (Rhythmic)

B.–F.

DREAM4 models1–5conﬁrmed by Mombaerts et al. (2016). To complementthese results, the performances of the network inferencestrategies are ﬁrst analyzed under such transient dynam-ics. For this purpose, the model has been simulated for240 hours in light/dark cycles to remove initial systemtransients and then switched to constant light regime.Time windows of 48 hours are then extracted from thelight/dark limit cycle, at the transition to constant lightand 48 hours after transition to constant light for furthercomparison, emulating the computational framework ofMombaerts et al. (2016). In the following analysis, timewindows starting from the transition to constant light andup to 3 days of observations (24-36-48-60-72 hours) areconsidered.The simulations are based on Langevin equations (stochas-tic diﬀerential equations) to model the intrinsic stochas-ticity in the dynamics of gene regulatory networks (ac-counting for molecular noise in both transcription andtranslation processes) (Gillespie, 2000). The intrinsic noiseis expected to have a signiﬁcant impact on the behavior ofthe system (Guerriero et al., 2011). Then, data are down-sampled to resemble realistic experimental design (every4 hours in the case of circadian experiments). Finally,the protein concentrations are not made available in theprovided datasets (only mRNA concentrations levels).The Millar 10 model has been simulated to reproduce geneknock-out experiments. Knock-out experiments are veryinformative, more than knock-down experiments, as theyprovide network response to individual and large perturba-tions (genes are deleted) (Marbach et al., 2012). Knock-outexperiments were simulated as in Aderhold et al. (2014),by replacing the transcription rates of the targeted genesby random noise drawn from a truncated normal distribu-tion to ensure non-negativity of the concentrations. Genesthat have been knocked out are, thefore, not inﬂuenced bytheir structural regulators anymore. The datasets in thenumerical experiments consist of a wildtype (WT) timeseries, and up to three randomly chosen knock-out time se-ries at a time. This selection has been randomized 6 timesto account for the uneven informative potential of diﬀerentgenes in the network. Furthermore, such simulations beingstochastic, each experiment was replicated 10 times toprovide a representative view of the performance of eachomputational method. In total, 950 = 10 · · · · · · · · Formally, the reconstruction of the structure of gene reg-ulatory networks corresponds to the identiﬁcation of theregulators π i for all given genes i ∈ { , .., n } (where n isthe amount of genes in the system), so that dy i ( t ) dt = f ( π i ( t )) − α i y i ( t ) (1)where y i ( t ) is the mRNA concentration of gene i at time t and the term α i y i ( t ) corresponds to the degradation rate of y i ( t ). The nonlinear function f represents the inﬂuence ofthe transcription factors of the parents on the target genes.In practice, however, the time course data of the regulators π i consist of gene expression levels as the protein levels aretypically not available to us. Most methods then considera simpliﬁed model (1) where gene expression values of theparent genes are used as proxies to transcription factors,and then π i ( t ) ⊂ { y ( t ) , ..., y n ( t ) } . The concatenation ofthe structural regulators π i of every gene, then, formsthe network underlying gene regulation, which is mostlysparse. The problem is typically undetermined, since thenumber of available time points is small compared to thenumber of possible solutions and since not all species areobservable. Moreover, regulatory interactions can either beadditive or multiplicative, while proteins might undergopost-translational modiﬁcations, adding another layer ofcomplexity. The diﬃculty for the development of inferencealgorithms is to decide upon the complexity of the strategywhile being aware of the inherent overﬁtting danger. We aim to highlight the advantages and optimal applicabilityranges for current state-of-the-art methodologies by esti-mating their performances on various experimental scenar-ios. While the results presented are by no means exhaustivein terms of strategies or experimental design scenarios,they support the eﬃcient use of biological resources.The network inference methods included in the comparisonrepresent the function f at various levels of complexity.They are All-to-all (ATA) (Mombaerts et al., 2019), Gaus-sian Process Dynamical Models (GPDM) (Aalto et al.,2018), dynamical GEne Network Inference with Ensembleof trees (dynGENIE3) (Huynh-Thu and Geurts, 2018),Algorithm for Revealing Network Interactions (ARNI)(Casadiego et al., 2017) and Improved Chemical ModelAveraging (iCheMA) (Aderhold et al., 2017). For details,we refer to publications presenting the methods.The All-to-all method is a parametric method based onﬁtting a linear model (of order one, by default, but higherorder models are also possible) i.e., f is linear in (1),between each pair of genes, one pair at a time. A ﬁtnessscore is computed for each pair, which is regarded asa conﬁdence level on the existence of the correspondingregulation. Here, the methodology presented by Mom-baerts et al. (2019) has been extended to deal with multi-experiment datasets. The datasets are merged together sothat the dynamics to be identiﬁed are identical through allexperimental conditions, assuming that the signal-to-noiseratios are similar in diﬀerent experiments. The ﬁtness scoreover multiple experiments is ﬁt = 1 − (cid:112)(cid:80) t ∈ T y t ) − ˆ y t | θ ))2+ ... + (cid:112)(cid:80) t ∈ Tn ( yn ( t ) − ˆ yn ( t | θ ))2 (cid:112)(cid:80) t ∈ T y t ) − ¯ y t ))2+ ... + (cid:112)(cid:80) t ∈ Tn ( yn ( t ) − ¯ yn ( t ))2 where n represents the amount of experimental conditions, T the sampling times of experiments, y the gene expressionlevel, ˆ y the modeled gene expression, and ¯ y is the averagevalue of the gene expression level.GPDM, a non-parametric method, models gene expressionas a nonlinear stochastic diﬀerential equation (cid:26) y j = x ( t j ) + v j dx = g ( x, θ ) dt + du where the dynamics function g is modeled as a Gaussianprocess with some hyperparameters θ , v j ’s correspond tomeasurement errors, and u is a Brownian motion. Thisdeﬁnes gene expression as a stochastic process whose re-alizations can be sampled using a Markov Chain Monte-Carlo (MCMC) strategy. Network inference is based onestimating the hyperparameters of the covariance functionof the GP. Multi-experiments are taken into account byassuming that all time series are produced by the samedynamics function g . Independent samplers are then con-structed for trajectories x corresponding to diﬀerent ex-periments. Performance of GPDM was recently comparedto the best performers of the DREAM4 challenge andconsistently shown superior in dealing with short timeseries data (Aalto et al., 2018).ARNI is a recently developed non-parametric methodused for the estimation of network topologies that per-formed well in network inference from a large collection ofshort time series. The derivatives are estimated explicitlythrough a diﬀerence approximation and the relationshipsbetween nodes in the network are estimated by solving nonlinear regression problem, with a user-selected li-brary of nonlinear basis functions, using a greedy approach(Casadiego et al., 2017). In the experiments, we usedpolynomial basis functions of degree at most 3.The semi-parametric method dynGENIE3 is an adapta-tion of the GENIE3 method for time series data. GENIE3was the best performer in the DREAM4 MultifactorialNetwork Inference challenge and the DREAM5 NetworkInference Challenge. The transcription function f in (1)is represented by an ensemble of regression trees which isestimated from the gene expression data and their deriva-tives, estimated using a diﬀerence approximation (Huynh-Thu and Geurts, 2018).Alternatively, iCheMA estimates derivatives from the databy ﬁtting a smooth Gaussian process to the time-series.Then, gene expression proﬁles are modeled using theMichaelis-Menten formula for mass-action kinetics: dy i ( t ) dt = (cid:88) u ∈ π i v u,i I u,i y u ( t ) + (1 − I u,i ) k u,i y u ( t ) + k u,i − α i y i ( t )where k u,i corresponds to the Michaelis-Menten param-eters and I u,i indicates whether the regulation is an ac-tivation ( I u,i = 1) or an inhibition ( I u,i = 0). Networkinference is then based on estimating the parameters usingan MCMC approach. iCheMA goes exhaustively throughall possible combinations of regulators (typically, up to 3 ata time), which makes it a computationally heavy algorithmthat does not scale easily to large systems.Table 1 summarizes the properties of the methods includedin the comparison. A method is deemed a continuous-time method if it is based on continuous trajectory-ﬁtting,or modeling from a continuous-time system. Methodsestimating derivatives from the data and then solvinginput-output regression are deemed discrete-time meth-ods. Combinatorial eﬀects mean dynamics of the form˙ y i = f ( y j , y k ) where f ( y j , y k ) cannot be represented as asum f ( y j , y k ) = g ( y j )+ h ( y k ). The table shows whether themethods explicitly take into account combinatorial eﬀects.The computational time is based on 48 hours recordings(13 datapoints) of the Millar 10 model. Performance rank-ing is based on average AUPREC values.3. RESULTSThe performances of each algorithm are here assessed interms of the resulting Area Under the Receiver Oper-ating Characteristic (AUROC) and the Area Under thePrecision-Recall (AUPREC). Precision-Recall curves allowTable 1. Properties of the diﬀerent methods. Method N o n li n e a r d y n a m i c s C o n t i nu o u s - t i m e C o m b i n a t o r i a l e ﬀ ec t s H i dd e nn o d e s C o m pu t a t i o n t i m e ( s ) P e r f o r m a n ce M ill a r / D R E A M All-to-all (cid:88) ( (cid:88) ) (cid:88) (cid:88) (cid:88) (cid:88) (cid:88) (cid:88) ( (cid:88) ) (cid:88) ( (cid:88) ) If higher-order dynamics are used. Discussed in the article, but not in the implementation.

Fig. 2. Evaluation of the eﬀects of dynamical transientson the rhythmic model (Millar 10). Statistical signif-icance between predictions from transient dynamicsand other conditions are indicated by star symbols.They are computed with the Mann-Whitney U-test.for a more accurate picture of algorithms performances forsparse GRNs and is commonly used for the comparison ofinference algorithms. Auto-regulatory interactions are notconsidered.Decomposing the time-series resulting from the rhythmicmodel into synchronized-desynchronized states showedthat, on average, the accuracy of the network reconstruc-tion is improved by considering transient dynamics (Fig-ure 2). While GPDM, ATA, and dynGENIE3 beneﬁt —to a varying degree— from the transition to the desyn-chronized state, change in performance of iCheMA wasnot statistically signiﬁcant and ARNI’s performance wasslightly impaired. It should be noted that a signiﬁcantincrease in accuracy is observed for the strategies that donot explicitly estimate derivatives.Figures 3 and 4 display the performance of each algorithmresulting respectively from the simulations with data fromthe Millar 10 model and the steady-state systems underseveral combinations of data types.On one hand, these graphs show that GPDM outperformsthe other approaches for almost every system and exper-imental conﬁguration considered. It is outperformed byATA in only one case with 24h wildype only data fromthe Millar 10 model. This illustrates the importance ofvarious experimental scenarios in benchmarking networkinference strategies and motivates the work undertakenin this paper. Interestingly, the simple pairwise low orderlinear modeling (ATA) seems to outperform dynGENIE3,iCheMA and ARNI in terms of AUPREC for every ob-servation length and system perturbation considered inthe rhythmic model. Only the AUROC values of the non-parametric approach ARNI exceed those of ATA for the 3mutations case, starting from 36 hours of observation.It is further interesting to notice that not all algorithmsbeneﬁt equally from data increments. The gain of accuracyresulting from increasing the amount of data in the rhyth-ig. 3. Area Under the ROC curve and the Precision-Recall curve resulting from the inference of the Mil-lar 10 model, for multiple combinations of observa-tion lengths and system perturbations. The bars aregrouped by the amount of additional recordings re-sulting from perturbations applied to the system (upto 3) and decomposed into data observation lengthsof [24-36-48-60-72] hours (from left to right).mic model is only mild for the linear modeling strategy andiCheMA while it is signiﬁcant for GPDM, dynGENIE3,and ARNI. In this regard, in average, GPDM beneﬁts fromthe largest increase in accuracy whereas dynGENIE3 andARNI compete at a slightly lower level for the experimen-tal conditions presented. A saturation eﬀect, however, canbe observed at AUPREC values of around 0.8 for GPDM,0.63 for the ATA, and 0.58 for ARNI.The analysis of the DREAM competition models deliv-ers a diﬀerent view on network reconstruction as notall nodes are stimulated in a given system perturbation.For those networks, the beneﬁt of additional system per-turbations is considerable as they allow investigation ofnovel, previously unstimulated segments of the network.In the experimental design cases presented, none of thealgorithms seemed to approach a saturation point for thedata combinations considered. While the GPDM succeedsin providing the best accuracy for the DREAM networksas well, dynGENIE3 ranks second, ARNI third, iCheMAfourth and ATA last. The reason why the linear modelingstrategy is surpassing dynGENIE3 and ARNI for the 1perturbation only case and does not improve for additionaldatasets is likely related to the partially stimulated natureof the whole dynamical system and has yet to be investi-gated.On the other hand, Figures 3 and 4 allow for proper visu-alization of experimental tradeoﬀs. Doubling the amountof datapoints by performing another experiment does notprovide the same level of information than doubling theamount of datapoints in a given experimental setup. Ta- Fig. 4. Area Under the ROC curve and the Precision-Recall curve resulting from the inference of theDREAM models, for multiple combination of datasampling rates and systems perturbations quantity.The bars are grouped by the amount of systemsperturbations (up to 4) and decomposed into dataquantity of [11-21-41] datapoints (from left to right).ble 2 summarizes the amount of datapoints in each ofthe presented experimental setups. While the cost of eachdatapoint might not be equivalent whether it originatesfrom a novel system perturbation or from longer recording,these tables provide insight on how to choose an appro-priate experimental scenario regarding the performancesof each of the algorithms presented in Figures 3 and 4.For instance, sequencing a gene in WT every 4 hoursduring 72 hours requires a similar amount of datapointsas the WT with 2 mutations for 24 hours or the WT with1 mutation for 36 hours. In this case, the experimentaldesign that provides the best results would be a singlerecording of 72 hours resulting in an AUPREC of 0.84using GPDM, compared to 0.75 or 0.7. Regardless of thealgorithm and assuming an equivalent cost per datapoint,it can be observed that, as a general rule of thumb and fortop performing strategies, it is often preferable to observethe rhythmic system for a longer amount of time. By con-trast, increasing the sampling frequency of the steady-statesystems only resulted in a marginal improvement in theaccuracy of network reconstruction. Surely, a lower boundon the sampling rate is required for a reliable constructionof those systems but it was not reached in this analysis.4. DISCUSSIONChoosing between diﬀerent experimental designs and net-work inference strategies depends on the research questionand the resource constraints. For this purpose, performinga complete cycle study involving multiple inference strate-gies and speciﬁc benchmarks, such as in (Madhamshetti-war et al., 2012), is not uncommon. However, while suchanalysis provides a comprehensive idea of the relevance ofable 2. Left: Numbers of measurements inthe Millar 10 experiments. Sampling rate isalways 4h. Right: Numbers of measurementsin the DREAM experiments. Window lengthis always 20.

Window (h) 24 36 48 60 72WT 7 10 13 16 19WT + 1 Mut 14 20 26 32 38WT + 2 Mut 21 30 39 48 57WT + 3 Mut 28 40 52 64 76 ∆ t the inferred network topology, it represents a signiﬁcantinvestment of both time and money, and is sometimes noteven possible.In this paper, the general eﬀects of data quantity andsystem perturbations on the accuracy of the GRN recon-struction were evaluated for one rhythmic model of generegulation and ﬁve steady-state models. Our contributionis threefold. We showed the relevance of multifactorialbenchmarks to assess the performances of network infer-ence strategies, the importance of an appropriate choiceof model complexity given data availability, and revealedpragmatic considerations for experimental designs. De-pending on the cost of performing more experiments orincreasing the amount of datapoints, one of those choicesshould be preferred.The algorithms considered here showed consistent perfor-mances across the 6 investigated networks. All networkinference strategies did not, however, beneﬁt equally fromthe increasing amount of data. Nevertheless, the fact thatthe parameter free, Gaussian process strategy GPDM hasbeen consistently outperforming all strategies presented isnoticeable and of further interest. In addition, by look-ing at the data expense and the resulting reconstructionaccuracy, it should be further noted that GRN inferencealgorithms should improve the way various time seriesexperiments originating from the same biological systemare taken into account.Marbach et al. (2012) showed that, on average, a com-bination of network inference strategies leads to the bestnetwork reconstruction. As such, we noticed that the or-der in which the links were inferred by each algorithm,and experiment, was diﬀerent. Further research, therefore,should learn the ranks, or conﬁdence levels, of each linkin the network reconstruction process and design a propercombination of the algorithms that optimizes their syn-ergy, depending on the experimental conditions.Multifactorial studies such as the one presented here re-quire a considerable amount of simulations. Some algo-rithms, such as those involving MCMC sampling, tookseveral days to run on a 24 core workstation. As such,a complete analysis of the experimental design space isnot possible but other decisional factors exist and requirefurther inspection. Among those, Sefer et al. (2016) showedthat denser sampling is preferable to additional replicates.Such strategy could be particularly proﬁtable for transientdata and to algorithms that explicitly estimate derivatives.Muldoon et al. (2019) used time series data originatingfrom the DREAM 4 challenge to show that using only halfof the perturbation data (without the recovery to steady-state) might be beneﬁcial to some algorithms. Further-more, some methods are able to incorporate information on external inputs, such as perturbations (with the targetsstill unknown), which increases the average performance.In addition, in practice, gene regulatory networks are oftenof bigger dimension which is not always accessible to themost computationally expensive algorithms.Finally, our study did not take into account prior knowl-edge of the system, which could potentially be iterativelyintegrated into each step of the network reconstruction.For example, a strategy such as the one presented by Ud-Dean and Gunawan (2015) actively optimizes the precisionof the predictions by proposing the next most informativeknock out. In such case, the aforementioned results wouldlikely understimate the resulting accuracy of the recon-struction. In fact, doubling the amount of data pointsby doubling the observation time or by performing anadditional experiment not only provides diﬀerent levels ofinformation, but can reveal diﬀerent parts of the networkas well. Such strategy might be necessary to cope with themost isolated genes. REFERENCESAalto, A., Viitasaari, L., Ilmonen, P., and Gon¸calves,J. (2018). Continuous time Gaussian process dy-namical models in gene regulatory network inference. ArXiv:1808.08161 .Aderhold, A., Husmeier, D., and Grzegorczyk, M. (2017).Approximate Bayesian inference in semi-mechanisticmodels.

Statistics and Computing , 27(4), 1003–1040.Aderhold, A., Husmeier, D., and Grzegorczyk, M. (2014).Statistical inference of regulatory networks for circadianregulation.

Statistical Applications in Genetics andMolecular Biology , 13(3), 227–273. doi:10.1515/sagmb-2013-0051.Casadiego, J., Nitzan, M., Hallerberg, S., and Timme,M. (2017). Model-free inference of direct networkinteractions from nonlinear collective dynamics.

NatureCommunications , 8(1), 2192.Chaitankar, V., Ghosh, P., Perkins, E.J., Gong, P., andZhang, C. (2010). Time lagged information theoreticapproaches to the reverse engineering of gene regulatorynetworks.

BMC Bioinformatics , 11(6), S19.Geier, F., Timmer, J., and Fleck, C. (2007). Reconstruct-ing gene-regulatory networks from time series, knock-out data, and prior knowledge.

BMC systems biology ,1(1), 11.Gillespie, D. (2000). The chemical Langevin equation.

TheJournal of Chemical Physics , 113(1), 297–306.Guerriero, M., Pokhilko, A., Fern´andez, A., Halliday, K.,Millar, A., and Hillston, J. (2011). Stochastic propertiesof the plant circadian clock.

Journal of the Royal SocietyInterface , 9(69), 744–756.Haque, S., Ahmad, J.S., Clark, N.M., Williams, C.M., andSozzani, R. (2019). Computational prediction of generegulatory networks in plant growth and development.

Current Opinion in Plant Biology , 47, 96 – 105. doi:https://doi.org/10.1016/j.pbi.2018.10.005. Growth anddevelopment.Huynh-Thu, V. and Geurts, P. (2018). dynGENIE3:dynamical GENIE3 for the inference of gene networksfrom time series expression data.

Nature ScientiﬁcReports , 8(1), 3384.Huynh-Thu, V.A. and Geurts, P. (2018). dynGENIE3:dynamical GENIE3 for the inference of gene networksrom time series expression data.

Scientiﬁc Reports ,8(1), 3384. doi:10.1038/s41598-018-21715-0.Madhamshettiwar, P.B., Maetschke, S.R., Davis, M.J.,Reverter, A., and Ragan, M.A. (2012). Gene regulatorynetwork inference: evaluation and application to ovariancancer allows the prioritization of drug targets.

GenomeMedicine , 4(5), 41.Marbach, D., Prill, R., Schaﬀter, T., Mattiussi, C., Flore-ano, D., and Stolovitzky, G. (2010). Revealing strengthsand weaknesses of methods for gene network inference.

PNAS , 107(14), 6286–6291.Marbach, D., Schaﬀter, T., Mattiussi, C., and Floreano, D.(2009). Generating realistic in silico gene networks forperformance assessment of reverse engineering methods.

Journal of Computational Biology , 16(2), 229–239.Marbach, D., Costello, J.C., K¨uﬀner, R., Vega, N.M.,Prill, R.J., Camacho, D.M., Allison, K.R., Kellis, M.,Collins, J.J., Aderhold, A., Stolovitzky, G., Bonneau,R., Chen, Y., Cordero, F., Crane, M., Dondelinger, F.,Drton, M., Esposito, R., Foygel, R., De La Fuente, A.,Gertheiss, J., Geurts, P., Greenﬁeld, A., Grzegorczyk,M., Haury, A.C., Holmes, B., Hothorn, T., Husmeier, D.,Huynh-Thu, V.A., Irrthum, A., Karlebach, G., L`ebre,S., De Leo, V., Madar, A., Mani, S., Mordelet, F.,Ostrer, H., Ouyang, Z., Pandya, R., Petri, T., Pinna,A., Poultney, C.S., Rezny, S., Ruskin, H.J., Saeys, Y.,Shamir, R., Sˆırbu, A., Song, M., Soranzo, N., Statnikov,A., Vega, N., Vera-Licona, P., Vert, J.P., Visconti, A.,Wang, H., Wehenkel, L., Windhager, L., Zhang, Y., andZimmer, R. (2012). Wisdom of crowds for robust genenetwork inference.

Nature Methods , 9(8), 796–804. doi:10.1038/nmeth.2016.Markdahl, J., Colombo, N., Thunberg, J., and Goncalves,J. (2017). Experimental design tradeoﬀs for gene regu-latory network inference: An in silico study of the yeastSaccharomyces cerevisiae cell cycle.

IEEE Conferenceon Decision and Control (CDC) , 423–428.Mombaerts, L., Carignano, A., Robertson, F., Hearn, T.,Junyang, J., Hayden, D., Rutterford, Z., Hotta, C., Hub-bard, K., Maria, M., Yuan, Y., Hannah, M., Goncalves,J., and Webb, A. (2019). Dynamical diﬀerential expres-sion (DyDE) reveals the period control mechanisms ofthe Arabidopsis circadian oscillator.

PLoS Computa-tional Biology , 15(1), e1006674.Mombaerts, L., Mauroy, A., and Goncalves, J. (2016). Op-timising time-series experimental design for modelling ofcircadian rhythms: the value of transient data.

IFAC-PapersOnLine , 49(26), 109–113.Muldoon, J., Yu, J., Fassia, M., and Bagheri, N.(2019). Network inference performance complex-ity: a consequence of topological, experimental, andalgorithmic determinants.

Bioinformatics . doi:10.1093/bioinformatics/btz105.Pokhilko, A., Hodge, S.K., Stratford, K., Knox, K., Ed-wards, K.D., Thomson, A.W., Mizuno, T., and Millar,A.J. (2010). Data assimilation constrains new connec-tions and components in a complex, eukaryotic circadianclock model.

Molecular Systems Biology , 6(1). doi:10.1038/msb.2010.69.Prill, R., Marbach, D., Saez-Rodriguez, J., Sorger, P.,Alexopoulos, L., Xue, X., Clarke, N., Altan-Bonnet,G., and Stolovitzky, G. (2010). Towards a rigorousassessment of systems biology models: the DREAM3 challenges.

PLoS ONE , 5(2), e9202.Schaﬀter, T., Marbach, D., and Floreano, D. (2011).GeneNetWeaver: In silico benchmark generation andperformance proﬁling of network inference meth-ods.

Bioinformatics , 27(16), 2263–2270. doi:10.1093/bioinformatics/btr373.Sefer, E., Kleyman, M., and Bar-Joseph, Z. (2016). Trade-oﬀs between dense and replicate sampling strategies forhigh-throughput time series experiments.

Cell Systems ,3, 35–42.Tibshirani, R. (1996). Regression shrinkage and selectionvia the Lasso.

Journal of the Royal Statistical Society,Ser. B , 58, 267–288.Ud-Dean, S.M. and Gunawan, R. (2015). Optimal designof gene knockout experiments for gene regulatory net-work inference.