Using Jupyter for reproducible scientific workflows
Marijan Beg, Juliette Taka, Thomas Kluyver, Alexander Konovalov, Min Ragan-Kelley, Nicolas M. Thiéry, Hans Fangohr
Marijan Beg
Faculty of Engineering and Physical Sciences, University of Southampton, University Road, SO17 1BJ Southampton, United Kingdom

Juliette Taka
Logilab, 104 Boulevard Auguste Blanqui, 75013 Paris, France

Thomas Kluyver
European XFEL, Holzkoppel 4, 22869 Schenefeld, Germany

Alexander Konovalov
School of Computer Science, University of St Andrews, Jack Cole Building, North Haugh, KY16 9SX St Andrews, United Kingdom

Min Ragan-Kelley
Simula Research Laboratory, Martin Linges vei 25, 1364 Fornebu, Norway

Nicolas M. Thiéry
Laboratoire de Recherche en Informatique, Université Paris-Saclay, CNRS, 91405 Orsay, France

Hans Fangohr
Max Planck Institute for the Structure and Dynamics of Matter, Luruper Chaussee 149, 22761 Hamburg, Germany; Center for Free-Electron Laser Science, Luruper Chaussee 149, 22761 Hamburg, Germany; European XFEL, Holzkoppel 4, 22869 Schenefeld, Germany; Faculty of Engineering and Physical Sciences, University of Southampton, University Road, SO17 1BJ Southampton, United Kingdom
Abstract — Literate computing has emerged as an important tool for computational studies and open science, with growing folklore of best practices. In this work, we report two case studies – one in computational magnetism and another in computational mathematics – where domain-specific software was exposed to the Jupyter environment. This enables high-level control of simulations and computation, interactive exploration of computational results, batch processing on HPC resources, and reproducible workflow documentation in Jupyter notebooks. In the first study, Ubermag drives existing computational micromagnetics software through a domain-specific language embedded in Python. In the second study, a dedicated Jupyter kernel interfaces with the GAP system for computational discrete algebra and its dedicated programming language. In light of these case studies, we discuss the benefits of this approach, including progress towards more reproducible and re-usable research results and outputs, notably through the use of infrastructure such as JupyterHub and Binder.

INTRODUCTION
[Published by the IEEE Computer Society. © 2021 IEEE. This is the author's version of an article that has been published in this journal; changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/MCSE.2021.3052101. Personal use is permitted; for any other purposes, permission must be obtained from the IEEE by emailing [email protected].]

Research usually results in a publication that presents and shares the obtained findings and conclusions. For a publication to be scientifically valid, it must present the methodology rigorously, so that readers can follow the "recipe" and reproduce the results. If this criterion is met, the publication is considered reproducible.
Reproducible publications are more easily re-usable and thus provide a significant opportunity to make (often tax-payer funded) research more impactful. However, the reproducibility of computational work is usually hindered not only by a lack of data or meta-data but also by a lack of details on the procedure and tools used:

1) The source code of the software used is not available.
2) Information on the computing environment, such as the hardware, operating system, supporting libraries, and (if required) code compilation details, is not revealed.
3) The exact procedure which led to the results reported in the publication is not shared. This should include the set of parameters used, the simulation and data analysis procedure, and any additional data cleaning, processing, and visualization. Ideally, these are shared as open-source code and analysis scripts used to perform the simulation and to read, analyze, and visualize the resulting data. This way, the entire process can be repeated by re-running simulation and/or analysis scripts. A human-readable document detailing the computational steps taken, despite being "better-than-nothing", is still insufficient to ensure reproducibility, and keeping a detailed log of all steps taken during a computational study is often impossible.

Reproducibility is a challenging question and spans a range of different topics. In this work, we focus on one of them. We describe the features and capabilities of the Jupyter environment that, in our view, make it a highly productive environment for computational science and mathematics, while facilitating reproducibility.

The topic of bitwise reproducibility is outside the scope of this work: even with the same hardware and same software, it may be difficult to reproduce computational results to be bitwise identical. This can originate from the non-associativity of floating-point operations combined with parallel execution, or from compiler optimizations.
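This floating-point effect is easy to demonstrate in a few lines of Python: regrouping the same sum, as a parallel reduction might, already changes the result in the least-significant bits.

```python
# The non-associativity of floating-point addition: regrouping a sum,
# as a parallel reduction order might, changes the last bits of the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False: the two groupings are bitwise different
```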
Achieving bitwise reproducibility is, in any case, not always required.

In the last decade, literate computing has emerged as an important tool for computational studies and open science, with an ever-growing set of best practices. In this paper, we review and expand some of these best practices in the context of two case studies: computational magnetism and mathematics. This is based on the experience of enabling and applying Jupyter environments in these fields as a part of the OpenDreamKit (https://opendreamkit.org/) project.

To be able to run computational studies from the Jupyter environment, it is necessary to either have the simulation and/or analysis code exposed to a general-purpose programming language supported by Jupyter, or have a dedicated Jupyter kernel for the computational libraries. Although the main topic of this work is the overview of features and capabilities of the Jupyter environment for reproducible workflows, we begin by discussing how a computational library can be exposed to Jupyter as a necessary prerequisite.
Prerequisite: Exposing computational libraries to the Jupyter environment
Computational studies often use existing computational (legacy) tools. These could be executables called from the command line or libraries that are used within a programming language. For the approach suggested here, these computational tools need to be accessible to scientists from a general-purpose programming language supported by Jupyter (such as Python). For some domains, such as pure mathematics research, there are domain-specific languages with enough power to be used directly as the programming language in notebooks (e.g., Singular and GAP). In other areas, exposing computational tools to a general-purpose programming language is the key to integrating them into researchers' custom code. A key benefit of making computational tools available in a general-purpose programming language is that the computation can be driven flexibly using the control structures provided by that language. For example, a simulation can conveniently be repeated with a range of parameters through a for-loop, rather than having to change a configuration file for each value and trigger execution of the simulation manually.

Making the computational capability accessible from a general-purpose programming language supported through a Jupyter kernel such as Python may be trivial – for example, if the required code is already a Python library.
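As a sketch of the for-loop point above, with a hypothetical run_simulation function standing in for the exposed tool, a parameter sweep becomes an ordinary loop instead of a series of hand-edited configuration files:

```python
# Hypothetical wrapper around the exposed computational tool; a real
# interface would call into the simulation code and return its results.
def run_simulation(edge_length):
    return {"L": edge_length, "energy": edge_length ** 2}  # placeholder model

# The host language's control structures drive the study directly:
edge_lengths = [8.0, 8.25, 8.5, 8.75, 9.0]
results = [run_simulation(L) for L in edge_lengths]
energies = [r["energy"] for r in results]
```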
When the computational functionality is locked into an executable, one can create an interface layer so that the functionality can be accessed via a Python function or class [1]: input parameters will then be translated into configuration files, the executable called, outputs retrieved, and finally, the results returned.

If the computational tool uses a programming language that Jupyter does not support, another possibility is to implement a Jupyter kernel for that language so that the computational library can be exposed to the Jupyter environment (as done for GAP and SageMath, for example).

Over time, scientific communities tend to accumulate functions and classes that are used repeatedly, and occasionally, through organic changes or a systematic restructuring of those computational capabilities, a domain-specific language is created, which is embedded in a general-purpose programming language such as Python. Depending on the design of this language, its existence and joint use by researchers of that domain can help to unify and improve computational tasks in the community, avoid duplication of work, and support transfer of knowledge and reproducibility. Examples of such domain-specific languages include Ubermag in magnetism, SageMath in pure mathematics, and the atomic simulation environment in chemistry [2].
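The interface layer around an executable, described above, can be sketched as follows. The key = value config syntax and the executable name are purely illustrative assumptions; a real wrapper would emit whatever format the legacy tool expects:

```python
import subprocess
import tempfile
from pathlib import Path

def make_config(params):
    # Translate keyword parameters into an illustrative key = value syntax.
    return "\n".join(f"{key} = {value}" for key, value in sorted(params.items()))

def run_tool(executable, **params):
    # Write the configuration file, call the executable on it, and
    # return its standard output for further processing in Python.
    with tempfile.TemporaryDirectory() as tmp:
        config = Path(tmp) / "input.cfg"
        config.write_text(make_config(params))
        result = subprocess.run([executable, str(config)],
                                capture_output=True, text=True, check=True)
    return result.stdout
```

A call such as run_tool("some-solver", L=8.5, state="vortex") then behaves like an ordinary Python function (the executable name is hypothetical); parsing the tool's output files back into Python objects would complete the layer.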
Features of the Jupyter research environment
Project Jupyter is a set of open-source software projects for interactive and exploratory computing emerging from IPython. The central component offered by Jupyter is the Jupyter Notebook – a web-based interactive computing platform. It allows users to create data- and code-driven narratives that combine live (re-executable) code, equations, narrative text, interactive dashboards, and other rich media. Jupyter Notebook documents provide a complete and executable record of a computation that can be shared with others in a way that has not been possible before [3]. Within the Jupyter Notebook, all libraries available in Python can be imported and combined flexibly. Other languages (such as Julia, R, Haskell, Bash, and many more) are supported through other Jupyter Notebook kernels. In this work, we suggest using a Jupyter research environment from which computational studies can be driven and conducted efficiently. In this section, we discuss the benefits of using the Jupyter environment for reproducible scientific workflows.
1. One study – one document
The notebook allows us to carry out an entire study within a single notebook and provides a complete and executable record of the process. It is possible to put the interpretation of the results into the same document, immediately below the graphical, tabular or text-based output that needs to be described. The "one study – one document" approach has immediate advantages:

• Scientists can be more efficient as they do not have to search for parts of the study (scripts, data files, plots) when trying to understand the data and authoring the associated paper.
• The study is more easily reproducible (see item 6, below).

However, putting all the code, data, and narrative into a single notebook could substantially affect the notebook's readability. Thus, it is necessary to decide which parts of the code should be in libraries and imported in the notebook.
2. Easily shareable
Jupyter notebooks can be converted to other file formats, such as HTML, LaTeX and PDF. This is useful because someone working on a notebook can share it with collaborators, supervisors, or management without asking them to install any additional software.
3. Interactive execution or as batch job
Using a Jupyter notebook often involves interactively editing it, executing cells, inspecting computed outputs, modifying commands, and re-executing, while understanding the computational research question. Once a useful processing sequence has been found, the researcher often wants to repeat that, potentially with different input data. For such scenarios, a notebook can be executed from the command line (using the nbconvert tool), treating the notebook like a script or a batch job. As the notebook executes in batch mode, it computes the output cells, including images and other multimedia, as if it were executed interactively, and the outputs are stored into the notebook file for later analysis and inspection. Execution of notebooks as a script is a convenient way to use the computational power of a high-performance computing facility where such notebook jobs can be submitted to the batch queue.

Where input data needs to be varied, two solutions are available: nbparameterise and papermill. With these tools, assignments in the first cells of a notebook can be modified before the notebook is executed as a script.
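The mechanism behind such tools can be sketched in a few lines: parse the source of the notebook's first code cell, rewrite the values of selected top-level assignments, and hand the modified source to the notebook executor. This is a simplified stand-in for what nbparameterise and papermill do, not their actual implementation (it needs Python 3.9+ for ast.unparse):

```python
import ast

def override_parameters(cell_source, **new_values):
    # Rewrite top-level `name = <value>` assignments in a code cell,
    # mimicking how parameterisation tools inject new inputs before
    # a notebook is executed in batch mode.
    tree = ast.parse(cell_source)
    for node in tree.body:
        if (isinstance(node, ast.Assign)
                and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and node.targets[0].id in new_values):
            node.value = ast.Constant(new_values[node.targets[0].id])
    return ast.unparse(tree)
```

For a first cell containing, say, L = 8.0 and steps = 100, calling override_parameters(source, L=9.5) returns the same cell source with L rebound to 9.5 and steps untouched.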
4. Static and interactive software documentation
Writing research software documentation is a particular challenge in academia. Small teams may not see the need to document their research code, as they can learn about it directly from one another. Jupyter notebooks offer an efficient method for creating documentation. The popular Sphinx documentation software can use Jupyter notebooks as the documentation source with the nbsphinx plugin, and create HTML and PDF documents. Demos and tutorials written in notebooks can complement reference documentation in Sphinx's default reStructuredText input format. Notebooks have several benefits for extended examples in documentation:

• It takes less time to create documentation as the author can type commands and explanations into the same document, and the outputs that the commands produce (text and images) appear immediately in the notebook.
• After changing the user interface or computational algorithms, re-executing the documentation notebooks will often show where the documentation needs changing.
• Tools like nbval can automatically re-execute the notebooks and raise test errors if the execution fails, or the computed outputs have changed. This means continuous integration can be used to check the documentation and warn developers if changes in the code affect the illustrated behaviour.
• Using Binder, the documentation notebook can be executed interactively by the user (see item 5, below).
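The core of such output checking can be illustrated with a much-simplified stand-in for nbval: re-execute a snippet, capture what it prints, and compare against the output stored in the document.

```python
import contextlib
import io

def check_snippet(source, stored_output):
    # Run the documentation snippet in a fresh namespace, capture its
    # printed output, and flag a mismatch with the stored output.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue() == stored_output

# A documentation example stays valid as long as its output is unchanged:
assert check_snippet("print(2 + 2)", "4\n")
```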
5. Executable interactive documents in the cloud (Binder)
The open-source Binder project [4] and Binder instances such as myBinder offer customized computational environments in the cloud on-demand, in which notebooks can be executed interactively. To use the free myBinder service, one needs to create a publicly readable git-repository containing Jupyter notebooks and a specification of the software required to execute these notebooks. This specification follows existing standards, such as a Python-style requirements.txt file, conda environment.yml file, or Dockerfile.

The myBinder service is invoked when a URL is requested containing the path to the GitHub repository. The myBinder service searches that repository for the software specification, creates a suitable container, adds a Jupyter server to the container, and exposes that server to the user. Figure 1 offers an artistic illustration of a typical scenario for using Binder in the research workflow. Other use cases include:

• Providing a computational environment for workshops or teaching purposes: participants are given the URL to invoke the service, and are presented a Jupyter session, in which they find the notebooks the presenter/teacher has prepared. No software installation (other than having a modern web browser) is required for participants.
• Providing interactive documentation: Given Binder-compatible specifications, documentation can be presented as an executable notebook through myBinder, allowing the person reading the documentation to interactively explore the software's behavior, for example by repeatedly modifying and executing commands provided as part of the documentation.
• Demonstrating and disseminating small computational studies: Jupyter notebooks can be used to document computational processes, for example, for dissemination or to demonstrate reproducibility, as we explain in item 6, below.

Figure 1. An artistic depiction of a scenario in which a researcher shares her computational workflow with others in the Jupyter environment, taking advantage of the Binder project. Licensed under "Creative Commons Attribution Share Alike 4.0 International" – Juliette Taka and Nicolas M. Thiéry. Publishing reproducible logbooks explainer comic strip. Zenodo. DOI: 10.5281/zenodo.4421040 (2018).

The related Voilà project can execute notebooks (for example on myBinder) and hide all code cells, making an interactive dashboard to display and explore data without the source code.
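The software specification mentioned above can be as small as a single requirements.txt file in the repository root; the package names and pinned versions below are illustrative only:

```
numpy==1.19.5
scipy==1.5.4
matplotlib==3.3.3
```

Binder's build system (repo2docker) installs these packages into the container image in which the notebooks are then served.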
6. Reproducible publication of results

Reproducibility of scientific results is a cornerstone of our interpretation of science: only results that can be reproduced are accepted as proven insight. We see an emerging trend that journals and research councils increasingly (and justifiably) ask for details on how published results can be reproduced, or at least expect authors to provide that information if a reader requests it.

It is often impossible to truly document an entire computational workflow, software requirements, hardware used, and other parameters within a conventional manuscript submission. The Jupyter-based research environment can help because it makes the process of publishing reproducible computational results easily achievable:

• The "one document – one study" model automatically records all parameters, processing commands, and outputs, demonstrating the process leading to the result obtained with that notebook. By sharing the notebooks in a public repository, a DOI can be assigned via Zenodo to preserve the repository's content permanently and make it citable.
• Notebooks that create central figures and statements of publications will likely need underlying libraries. To re-execute the notebook, we need a way to specify a computational environment containing these libraries, and Binder provides that possibility. Although specifying exact versions of underlying libraries is recommended, Binder does not guarantee that this would lead to the same computational environment at any point in the future, and therefore, it cannot entirely address the issue of so-called software collapse, where the underlying libraries and interfaces become deprecated, compilers and compiler optimization methods change, etc.
• By publishing the notebooks reproducing central results together with software environment specifications for Binder in an open repository, anyone with Internet access and a browser can inspect and re-execute these notebooks and thus reproduce the publication.

A key benefit of being able to reproduce a publication in this way is that the study can be modified and extended easily: reproducibility enables re-usability. This can provide efficiency gains for science overall as it allows scientists to focus on new insights rather than having to spend time re-creating known knowledge as a starting point of their new study.

Figure 2. An artistic illustration of a configurable JupyterHub where a lecturer provides a customized software environment to support their teaching. JupyterHub can be accessed and used through a web browser and does not require local installation of any software. Institutional computing and storage resources are used, and users have to authenticate themselves. Licensed under "Creative Commons Attribution Share Alike 4.0 International" – Juliette Taka and Nicolas M. Thiéry. On demand customizable Virtual Environments with JupyterHub explainer comic strip. Zenodo. DOI: 10.5281/zenodo.4432267 (2019).
7. Remote access to institutional compute resources – JupyterHub
The discussion above assumed that notebooks were running on the user's computer. The JupyterHub software allows institutional provision of Jupyter Notebook services. It allows users of an institution to authenticate with their organizational credentials and access a Jupyter environment running on the institution's infrastructure. Typically, any files and folders the user is allowed to access will also be made available to them through JupyterHub, including access to shared data and folders where they can save their notebooks.

The institution generally predefines the software environment in which the notebook server executes. However, the technology is available to use the same software specification as for Binder to create a customized computing environment on-demand. A vital point of the user experience is that only a web browser is required to access the JupyterHub and to carry out computational work using these resources remotely. Figure 2 shows an artistic illustration of the scenario where an instructor works with their institution to provide students with a customized software environment. Other use cases of JupyterHub installations include research facilities and universities providing access to their (high-performance) computing resources through Jupyter notebooks, where traditionally ssh or remote desktops may have been used.
8. Blending script and GUI-driven exploration methods
The IPyWidgets Jupyter extension provides selection menus, sliders, radio buttons, and other GUI-like graphical interaction widgets to Jupyter notebooks. The Notebook allows embedding such graphical widgets inside the notebook, and users can combine the usual scripted analysis with activation of such widgets where desired. They can be used, for example, to vary the input parameter values and explore a data set or computational results. Although less reproducible than typed commands, widgets can be useful for rapid feedback on different possibilities.
9. Potential disadvantages
Above we focused on the features and capabilities of the Jupyter research environment to support computational workflows in science. Here, we want to discuss some downsides that have come up either in our work or as feedback from users of Jupyter-based computational tools we developed.

(a) Undefined notebook state

The top-to-bottom arrangement of cells in a notebook implies that they should be executed in that order. One of the Jupyter Notebook's key features is that the code cells can be executed in an arbitrary order – the user can select (and modify) any cell and then execute it. This can be useful while exploring a data set or a property of computation, or even to debug the cell's code. The execution order used in a notebook is not stored when the notebook is saved. Therefore, it is critical to remember that, by executing cells out of order, we may create different results from when we execute all cells in order.

There is a practical solution to this. When the exploratory phase is completed, the best practice is to restart the kernel to ensure the notebook's state is forgotten, and then execute all cells from top to bottom. This ensures that the results in the notebook are obtained from running the cells in order, and this version of the notebook should be saved and shared.

(b) Opening Jupyter Notebook
Among the feedback we receive from some users who come across Jupyter notebooks for the first time is that the way a Jupyter server is started is "strange". Users who are not used to the command prompt may find it unusual to open an application that way, instead of "double-clicking".

(c) Rapid development of Jupyter ecosystem
Improvements to Project Jupyter and the surrounding software ecosystem appear at a rapid rate. For instance, for the issues described in items (a) and (b), contributions providing solutions have already emerged, and there is no space here to introduce more of the multitude of high-productivity tools that have been created. It is challenging to follow all the developments and find the most appropriate tool for a given task. Conferences such as JupyterCon help disseminate new contributions and help to avoid duplication of development efforts.

(d) Sustainability of myBinder.org
Since 2016 (and at the time of writing), a federation of Binder instances is operated as a service available on the world wide web at mybinder.org. The federation is operated by the Jupyter team, in collaboration with the Turing Institute and GESIS (Leibniz Institute for Social Sciences). Computing resources are sponsored by Google Cloud, OVHCloud, the Turing Institute, and GESIS. The federation serves approximately 25,000 Binder instances on a typical weekday, with the Google Cloud instance serving approximately 70% of this traffic. These sponsorships are mostly renewed annually and can result in members of the federation halting operation due to periods without funding. We hope that the sustainability of the Binder federation will improve if more financially-stable members join, for example, as a part of the European Open Science Cloud initiative.
Case studies
Computational magnetism
Computational magnetism complements theoretical and experimental methods to support research in magnetism. For example, it is used to develop sensors as well as data storage and information processing devices. It is used both in academia and industry to explain experimental observations, design experiments, improve device and product-designs virtually, and verify theoretical predictions.

The Object-Oriented MicroMagnetic Framework (OOMMF) [5] is a micromagnetic simulation tool, initially developed during the 1990s at the National Institute of Standards and Technology (NIST). It solves non-linear time-dependent partial differential equations using the finite-difference method. It is probably the most widely used and most trusted simulation tool in the computational magnetism community. It was written in C++, wrapped with Tcl, and driven through configuration files that follow the Tcl syntax. The typical computational workflow the user must follow to simulate a particular problem is to write a configuration file. After that, the user runs OOMMF by providing the configuration file to the OOMMF executable. When the OOMMF run is complete, results are saved in OOMMF-specific file formats. Finally, the user analyzes the result files.

One of the specific goals of a computational micromagnetic study is parameter-space exploration. More precisely, the user repeats the simulation for different values of input parameters by changing them in the configuration file. It is often difficult to automate this, and it is challenging for the user to keep a log of all steps performed in the entire micromagnetic study. Besides, postprocessing and analysis of results is performed outside OOMMF, using techniques and scripts that are mostly developed by the user, or carried out manually. Consequently, it is hard to track, record, and convey the exact simulation procedure.
Without this information, resulting publications are generally not reproducible.

To address this situation, we developed a Python interface to the OOMMF executable. This allows us to conduct computational magnetism simulations from within the Jupyter notebook to capitalize on the benefits of this environment. We developed a set of Python libraries we refer to as Ubermag, which expose the computational capabilities of OOMMF so that it can be controlled from Python. These Python libraries provide a domain-specific language to define a micromagnetic problem [1]. A micromagnetic model, defined using the domain-specific language, is not aware of the particular simulation tool that will perform the actual micromagnetic simulation, and it is only used to describe the model. When a simulation is required, the model is translated into the OOMMF configuration file, the OOMMF executable is called, and the output files are read. By exposing the micromagnetic simulation capabilities to Python and driving the research from Jupyter Notebook, we have available all the benefits of the Jupyter research environment.

To demonstrate the use of Ubermag, we use standard problem 3 as an example. Standard problem 3 is a standardized problem posed by the micromagnetic community to test, validate, and compare different simulation tools. It describes a magnetic cube of edge length L with two different magnetization states that can occur as local energy minima, called the flower state and the vortex state. The main question of standard problem 3 is: "For what edge length L have the flower state and the vortex state the same energy?"

In the conventional OOMMF workflow, it is necessary to run the micromagnetic simulations for different edge lengths and different initial magnetization states. After every simulation, the total energy is recorded and saved within a tab-separated data file.
Finally, one extracts the magnetic energy values from all the saved files and plots them as a function of edge length for both magnetization states. From the plot, an estimate of the energy crossing can be made.

By using our Python interface to OOMMF integrated into a Jupyter notebook, we can loop over different input parameters to obtain this crossing in a plot.

Copyright (c) 2021 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected]. This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/MCSE.2021.3052101

In [6]: L_array = np.linspace(8, 9, 5)
        vortex_energies, flower_energies = [], []
        for L in L_array:
            vortex = minimise_system_energy(L, m_init_vortex)
            flower = minimise_system_energy(L, m_init_flower)
            vortex_energies.append(vortex.table.data.tail(1)['E'][0])
            flower_energies.append(flower.table.data.tail(1)['E'][0])

        import matplotlib.pyplot as plt
        plt.figure(figsize=(6, 4))
        plt.plot(L_array, vortex_energies, 'o-', label='vortex')
        plt.plot(L_array, flower_energies, 'o-', label='flower')
        plt.xlabel(r'$L (l_{ex})$')
        plt.ylabel(r'$E$ (J)')
        plt.grid()
        plt.legend();

        L = 8.0, m_init_vortex   Running OOMMF ... (2.2 s)
        L = 8.0, m_init_flower   Running OOMMF ... (1.1 s)
        L = 8.25, m_init_vortex  Running OOMMF ... (1.8 s)
        L = 8.25, m_init_flower  Running OOMMF ... (1.1 s)
        L = 8.5, m_init_vortex   Running OOMMF ... (1.7 s)
        L = 8.5, m_init_flower   Running OOMMF ... (1.1 s)
        L = 8.75, m_init_vortex  Running OOMMF ... (1.5 s)
        L = 8.75, m_init_flower  Running OOMMF ... (1.1 s)
        L = 9.0, m_init_vortex   Running OOMMF ... (1.5 s)
        L = 9.0, m_init_flower   Running OOMMF ... (1.1 s)

In [7]: from scipy.optimize import bisect

        def energy_difference(L):
            vortex = minimise_system_energy(L, m_init_vortex)
            flower = minimise_system_energy(L, m_init_flower)
            return (vortex.table.data.tail(1)['E'][0]
                    - flower.table.data.tail(1)['E'][0])

        cross_section = bisect(energy_difference, 8.4, 8.6, xtol=...)

        L = 8.4, m_init_vortex     Running OOMMF ... (1.7 s)
        L = 8.4, m_init_flower     Running OOMMF ... (1.1 s)
        L = 8.6, m_init_vortex     Running OOMMF ... (1.6 s)
        L = 8.6, m_init_flower     Running OOMMF ... (1.1 s)
        L = 8.5, m_init_vortex     Running OOMMF ... (1.7 s)
        L = 8.5, m_init_flower     Running OOMMF ... (1.1 s)
        L = 8.45, m_init_vortex    Running OOMMF ... (1.6 s)
        L = 8.45, m_init_flower    Running OOMMF ... (1.1 s)
        L = 8.425, m_init_vortex   Running OOMMF ... (1.8 s)
        L = 8.425, m_init_flower   Running OOMMF ... (1.2 s)
        L = 8.4375, m_init_vortex  Running OOMMF ... (1.6 s)
        L = 8.4375, m_init_flower  Running OOMMF ... (1.2 s)
        The energy crossing occurs at L = 8.4375 * lex
Figure 3. Running computational magnetism simulations through Python in a Jupyter notebook allows the use of the Python scientific stack and results in a self-contained record combining narrative, code, and results.

Furthermore, we can make use of the Python scientific stack, in particular, a root-finding method such as bisect from scipy. A Jupyter notebook solving standard problem 3 can be found in the repository accompanying this work [M. Beg et al., Using Jupyter for reproducible scientific workflows. GitHub: https://github.com/marijanbeg/2021-paper-jupyter-reproducible-workflows, DOI: 10.5281/zenodo.4382225 (2021)]. We show the two most relevant code cells of the Jupyter notebook in Figure 3.

Ubermag and the Jupyter environment simplify the effort required to make computational magnetism publications reproducible. For each figure in the publication, one notebook can be provided (examples can be found in Refs. [6], [7]). Using Binder, the community can inspect and re-run all the calculations in the cloud, making the publication reproducible.
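The root-finding step can be illustrated with a self-contained sketch in which the OOMMF simulations are replaced by toy energy functions. The linear models, their coefficients, and the crossing point below are invented for this illustration only; the actual notebook uses scipy.optimize.bisect on the simulated energies, for which the minimal bisection routine here is a stand-in.

```python
def vortex_energy(L):
    # Hypothetical linear model of the vortex-state energy (arbitrary units).
    return 1.2 - 0.10 * L

def flower_energy(L):
    # Hypothetical linear model of the flower-state energy (arbitrary units).
    return 0.8 - 0.05 * L

def energy_difference(L):
    # Same structure as the cell in Figure 3: vortex energy minus flower energy.
    return vortex_energy(L) - flower_energy(L)

def bisect(f, a, b, xtol=1e-6):
    """Minimal bisection, standing in for scipy.optimize.bisect."""
    fa, fb = f(a), f(b)
    assert fa * fb <= 0, "the root must be bracketed by [a, b]"
    while b - a > xtol:
        m = 0.5 * (a + b)
        if fa * f(m) <= 0:
            b = m          # sign change in [a, m]: shrink from the right
        else:
            a, fa = m, f(m)  # sign change in [m, b]: shrink from the left
    return 0.5 * (a + b)

# The two toy energies cross at L = 8 for the models chosen above.
L_cross = bisect(energy_difference, 7.0, 9.0)
```

The point of the pattern is that once the simulation is callable as a Python function, any root finder from the scientific stack can drive it directly.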
Computational studies in mathematical research
Many of the leading open-source mathematical software systems (including GAP, LinBox, PARI/GP, OSCAR, SageMath, and Singular) have been made interoperable with the Jupyter ecosystem through bespoke or general-purpose kernels (C++, Python, Julia, . . . ). Focusing on one of these systems for the sake of concreteness, we illustrate how this supports sharing and publishing reproducible computational studies in mathematical research together with the underlying research code.

GAP is an open-source system for computational discrete algebra, with particular emphasis on computational group theory. It is used routinely by mathematicians in these fields and beyond to support teaching and research, notably through computational exploration. It provides a domain-specific language, also called GAP, and a runtime system with a command-line interface. It can also be used as a library by other systems such as SageMath or OSCAR.

GAP has been developed for decades by a community of researchers, teachers, and research software engineers. It has an established mechanism for user-contributed extensions, called packages, which may be submitted for redistribution with the system, and a formal refereeing process. The current release of GAP (4.11.0) includes 152 packages that serve different purposes, from providing data libraries and extending the system's infrastructure for testing and writing documentation, to adding new functionality and sharing research code that underpins its authors' publications.
The latter scenario may require specific expertise and motivation from a working mathematician who uses GAP, and not everyone will be able to invest the effort to share their code in this way. Furthermore, it is not always justifiable to organize supplementary code for a paper as a new GAP package. Instead, authors can combine Jupyter research environments with additional services and parts of the infrastructure for GAP packages to share reproducible computational studies while following good code-development practices from the start.

Let us illustrate this with the publication in Ref. [8], which presents a polynomial-time algorithm for solving a major problem in computational group theory that had remained open since 1999 [9]. An essential addition to the paper is the authors' GAP implementation of the algorithm. The authors published this implementation in a publicly hosted repository. At once, this ensures long-term archival through the Software Heritage project and, with a small additional step, makes the implementation citable through Zenodo. The repository contains an interactive narrative document, a Jupyter notebook using the GAP Jupyter kernel [10], combining text, mathematics, inputs, and outputs, which may even be viewed as a slideshow (one could, of course, have separate notebooks for different purposes).

Following best practices for organizing reproducible computational studies (see, e.g., Ref. [11]), the code is not written in the notebook itself but loaded from external source files. These are text files that can be easily managed with version control, reused from multiple Jupyter notebooks, and tested using the GAP automated testing setup. Also, the authors made the repository Binder-ready: any user (e.g., readers or referees of the paper) can run the notebook and reproduce its execution on Binder itself or, with additional expertise to install the required assets, on their own computing resources. To achieve this, the authors followed the template in Ref. [12], which also brings in continuous integration to automatically check the code against several past, current, and development releases of GAP, and to produce coverage reports on how thoroughly the tests exercise the code. It boils down to creating a tst directory with the test files and adapting the configuration files .travis.yml and .codecov.yml for the Travis CI and Codecov services, respectively.

Bringing Jupyter interfaces to command-line-based computational mathematics tools makes it possible to interface them with numerous JavaScript libraries, notably for visualization. For example, the GAP packages Francy and JupyterViz extend the GAP Jupyter kernel [10] with interactive widgets and plotting tools, which can be tried from their Binder-ready repositories.
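The practice described above, keeping research code in external source files that are both loaded from notebooks and exercised by automated tests, is language-agnostic. A minimal Python analogue is sketched below, using doctest purely as an illustration; the GAP setup uses tst files and GAP's own testing framework instead, and the function here is a hypothetical stand-in for the research code.

```python
from math import gcd

def additive_order(k, n):
    """Order of the element k in the cyclic group Z/nZ.

    In the workflow described above, such a function would live in an
    external source file imported by the notebook, with its examples
    doubling as automated tests.

    >>> additive_order(4, 10)
    5
    >>> additive_order(1, 7)
    7
    """
    return n // gcd(k, n)

if __name__ == "__main__":
    # The automated test gate: continuous integration fails if any
    # documented example stops reproducing.
    import doctest
    assert doctest.testmod().failed == 0
```

The same file can then be imported from any number of notebooks, while version control and continuous integration track it independently of the narrative documents.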
Conclusions
In this article, we discuss some of the challenges researchers in computational science and mathematics experience in their everyday work. We focus on making computational exploration and workflows more efficient, more reproducible, and reusable. We demonstrate the benefits of this approach with computational magnetism and computational mathematics use cases. We believe that Project Jupyter and its ecosystem, including JupyterHub and Binder, which allow no-installation browser-based use of notebooks and remote compute resources, can contribute significantly towards more efficient computational workflows, reproducibility, and reusability in science. These conclusions are part of a widespread trend among researchers in the computational community advocating the use of literate computing, for example using Jupyter, for enhancing reproducible research.
Acknowledgments
This work was financially supported by the Horizon 2020 European Research Projects OpenDreamKit (676541) and PaNOSC (823852), and the EPSRC Programme grant on Skyrmionics (EP/N032128/1).
REFERENCES
1. M. Beg, R. A. Pepper, and H. Fangohr, "User interfaces for computational science: A domain specific language for OOMMF embedded in Python", AIP Advances, vol. 7, no. 5, p. 056025, 2017.
2. A. H. Larsen, J. J. Mortensen, J. Blomqvist, I. E. Castelli, R. Christensen, M. Dułak, J. Friis, M. N. Groves, B. Hammer, C. Hargus, E. D. Hermes, P. C. Jennings, P. B. Jensen, J. Kermode, J. R. Kitchin, E. L. Kolsbjerg, J. Kubal, K. Kaasbjerg, S. Lysgaard, J. B. Maronsson, T. Maxson, T. Olsen, L. Pastewka, A. Peterson, C. Rostgaard, J. Schiøtz, O. Schütt, M. Strange, K. S. Thygesen, T. Vegge, L. Vilhelmsen, M. Walter, Z. Zeng, and K. W. Jacobsen, "The atomic simulation environment—a python library for working with atoms", Journal of Physics: Condensed Matter, vol. 29, no. 27, p. 273002, 2017.
3. T. Kluyver, B. Ragan-Kelley, F. Pérez, B. Granger, M. Bussonnier, J. Frederic, K. Kelley, J. Hamrick, J. Grout, S. Corlay, P. Ivanov, D. Avila, S. Abdalla, C. Willing, and Jupyter Development Team, "Jupyter Notebooks – a publishing format for reproducible computational workflows", Positioning and Power in Academic Publishing: Players, Agents and Agendas, pp. 87–90, 2016.
4. Project Jupyter, M. Bussonnier, J. Forde, J. Freeman, B. Granger, T. Head, C. Holdgraf, K. Kelley, G. Nalvarte, A. Osheroff, M. Pacer, Y. Panda, F. Perez, B. Ragan-Kelley, and C. Willing, "Binder 2.0 - Reproducible, interactive, shareable environments for science at scale", Proceedings of the 17th Python in Science Conference, pp. 113–120, 2018.
5. M. J. Donahue and D. G. Porter, "OOMMF User's Guide, Version 1.0", Interagency Report NISTIR 6376, National Institute of Standards and Technology, Gaithersburg, MD, 1999.
6. M. Beg, R. A. Pepper, D. Cortés-Ortuño, B. Atie, M.-A. Bisotti, G. Downing, T. Kluyver, O. Hovorka, and H. Fangohr, "Stable and manipulable Bloch point", Scientific Reports, vol. 9, no. 1, p. 7959, 2019. Code repository for reproducibility at https://github.com/marijanbeg/2019-paper-bloch-point-stability.
7. M. Albert, M. Beg, D. Chernyshenko, M.-A. Bisotti, R. L. Carey, H. Fangohr, and P. J. Metaxas, "Frequency-based nanoparticle sensing over large field ranges using the ferromagnetic resonances of a magnetic nanodisc", Nanotechnology, vol. 27, no. 45, p. 455502, 2016. Code repository for reproducibility at https://github.com/maxalbert/paper-supplement-nanoparticle-sensing.
8. A. Borovik and Ş. Yalçınkaya, "Adjoint representations of black box groups PSL(Fq)", J. Algebra, vol. 506, pp. 540–591, 2018.
9. L. Babai and R. Beals, "A polynomial-time theory of black box groups I", ser. London Mathematical Society Lecture Note Series, vol. 1, Cambridge University Press, pp. 30–64, 1999.
10. M. Pfeiffer and M. Martins, "JupyterKernel, Version 1.3", https://gap-packages.github.io/JupyterKernel/, Feb 2019, GAP package.
11. A. Rule, A. Birmingham, C. Zuniga, I. Altintas, S.-C. Huang, R. Knight, N. Moshiri, M. H. Nguyen, S. B. Rosenthal, F. Pérez, and P. W. Rose, "Ten simple rules for writing and sharing computational analyses in Jupyter notebooks", PLOS Computational Biology, vol. 15, no. 7, pp. 1–8, 2019.
12. A. Konovalov, "Template for publishing reproducible GAP experiments in Jupyter notebooks runnable on Binder", Feb 2020. [Online]. Available: https://doi.org/10.5281/zenodo.3662155