bioRxiv | 2021

Countering reproducibility issues in mathematical models with software engineering techniques: A case study using a one-dimensional mathematical model of the atrioventricular node


Abstract


One might assume that in silico experiments in systems biology are less susceptible to reproducibility issues than their wet-lab counterparts, because they are free from natural biological variations and their environment can be fully controlled. However, recent studies show that only half of the published mathematical models of biological systems can be reproduced without substantial effort. In this article we examine the potential causes of failed or cumbersome reproductions in a case study of a one-dimensional mathematical model of the atrioventricular node, which took us four months to reproduce. The model demonstrates that even otherwise rigorous studies can be hard to reproduce due to missing information, errors in equations and parameters, a lack of available data files, non-executable code, missing or incomplete experiment protocols, and missing rationales behind equations. Many of these issues seem similar to problems that have been solved in software engineering using techniques such as unit testing, regression tests, continuous integration, version control, archival services, and a thorough modular design with extensive documentation. Applying these techniques, we reimplement the examined model using the modeling language Modelica. The resulting workflow is independent of the model and can be translated to SBML, CellML, and other languages. It guarantees methods reproducibility by executing automated tests in a virtual machine on a server that is physically separated from the development environment. Additionally, it facilitates results reproducibility, because the model is more understandable and because the complete model code, experiment protocols, and simulation data are published and can be accessed in the exact version that was used in this article.
We found the additional design and documentation effort well justified, even just considering the immediate benefits during development such as easier and faster debugging, increased understandability of equations, and a reduced requirement for looking up details from the literature.

Author summary

Reproducibility is one of the cornerstones of the scientific method. In order to draw reliable conclusions, an experiment must yield the same results when it is repeated using the same methods. However, biological systems are complex, which makes experiments cumbersome. It is therefore desirable to build a mathematical representation of the biological system, which captures its essential behavior in a set of variables and equations and allows for easier and faster experimentation. Unfortunately, recent studies have shown that half of the published mathematical models are not immediately reproducible due to missing information, mathematical errors, and incomplete documentation. These issues are similar to those faced in software engineering: A single missing file or a buggy line of code can render any kind of software useless. Software engineering has turned to rigorous software testing, automated development pipelines, and version control systems to overcome these challenges, but these techniques are not yet widely applied to mathematical modeling. In this paper we demonstrate their benefit for the reproducibility of a large mathematical model of the atrioventricular node. The software engineering solutions that we employ can be applied to any mathematical model and could therefore facilitate scientific progress by encouraging and simplifying model reuse.

DOI 10.1101/2021.02.19.431951
Language English
Journal bioRxiv

Full Text