Reproducible and Portable Workflows for Scientific Computing and HPC in the Cloud
PETER VAILLANCOURT, Cornell University, USA
BENNETT WINEHOLT, Cornell University, USA
BRANDON BARKER, Cornell University, USA
PLATO DELIYANNIS∗, Cornell University, USA
JACKIE ZHENG∗, Cornell University, USA
AKSHAY SURESH, Cornell University, USA
ADAM BRAZIER, Cornell University, USA
RICH KNEPPER, Cornell University, USA
RICH WOLSKI, University of California, Santa Barbara, USA
The increasing availability of cloud computing services for science has changed the way scientific code can be developed, deployed, and run. Many modern scientific workflows are capable of running on cloud computing resources. Consequently, there is an increasing interest in the scientific computing community in methods, tools, and implementations that enable moving an application to the cloud, simplifying the process, and decreasing the time to meaningful scientific results. In this paper, we have applied the concepts of containerization for portability and multi-cloud automated deployment with industry-standard tools to three scientific workflows. We show how our implementations reduce the complexity of porting both the applications themselves and their deployment across private and public clouds. Each application has been packaged in a Docker container with its dependencies and the environment setup necessary for production runs. Terraform and Ansible have been used to automate the provisioning of compute resources and the deployment of each scientific application in a multi-VM cluster. Each application has been deployed on the AWS and Aristotle Cloud Federation platforms. Variation in data management constraints, multi-VM MPI communication, and embarrassingly parallel instance deployments were all explored and reported on. We thus present a sample of scientific workflows that can be simplified using these tools and our proposed implementation to deploy and run in a variety of cloud environments.

CCS Concepts: • Applied computing → Astronomy; Earth and atmospheric sciences; Environmental sciences; • Computing methodologies → Distributed computing methodologies; • General and reference → Evaluation; • Software and its engineering → Cloud computing; • Computer systems organization → Cloud computing.

Additional Key Words and Phrases: Cloud, Scientific Computing, HPC, Automated Deployment, Docker Containers, Terraform, Ansible, Multi-VM MPI
ACM Reference Format:
Peter Vaillancourt, Bennett Wineholt, Brandon Barker, Plato Deliyannis, Jackie Zheng, Akshay Suresh, Adam Brazier, Rich Knepper, and Rich Wolski. 2020. Reproducible and Portable Workflows for Scientific Computing and HPC in the Cloud. In Practice and Experience in Advanced Research Computing (PEARC '20), July 26–30, 2020, Portland, OR, USA. ACM, New York, NY, USA, 16 pages. https://doi.org/10.1145/3311790.3396659

∗REU Student at Cornell University Center for Advanced Computing

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

© 2020 Association for Computing Machinery.
Manuscript submitted to ACM
1 INTRODUCTION

Scientific computing applications often make use of large-scale, high-performance resources (computers, networking, and storage, such as those provided by XSEDE) to achieve "capability" results – new scientific results that are made possible by the capability of the resources. Because these resources are expensive to provision and maintain, they are often deployed in a bespoke configuration that requires highly optimized coding, data management, and access methodologies to ensure maximal utilization.

However, there are a number of scientific workloads that innovate through other forms of computational scientific exploration and discovery. Specifically, researchers investigating new algorithms or developing new computationally supported processes often require fast turn-around times (to support rapid prototyping), resource portability (to enable collaboration), and maximal developer productivity.

Cloud computing has evolved as a commercial approach to meeting these goals for consumer-facing services. Cloud applications often have life-cycles measured in days or weeks and are developed by a geographically distributed set of collaborating developers whose labor cost imposes a considerable budgetary limitation. Because cloud computing is optimized for web services, however, it has proven difficult to exploit for scientific workloads (even those driven by developer productivity rather than resource capability). In particular, the life-cycle of many scientific codes is long, creating a valuable legacy that cannot easily be supplanted by new development.

In this paper, we explore the use of cloud computing, Linux containers [9, 14, 28], and an automated deployment scheme as productivity-enhancing technologies for scientific applications that include or are based on legacy software. In particular, for many researchers, the ease of implementing and running software in multiple cloud environments becomes a key element of leveraging the flexibility and efficiency of the cloud computing paradigm.
As a result, portability and reproducibility of application installation, deployment, and decommissioning (i.e. the reproducibility of the software life-cycle) become critical.

Containerization software, which provides application software with a lightweight virtualized environment to run in, has recently become a popular strategy for deploying and running scientific software, for portability across different types of systems, and for ease of adoption by researchers [11]. Further, software containers coupled with partially or fully automated cloud deployment schemes offer intriguing benefits for a wide range of computational tasks in scientific research, in the form of robust, scalable, and portable software deployments that can be used from development through production [23].

This paper describes work successfully performed to encapsulate, deploy, and run three different existing scientific workflows – which are broadly representative of common computational science applications – in multiple clouds using automated containerized deployment. Our system automatically

• manages the myriad of different possible deployment options available from computing clouds,
• configures the cloud-hosted networking to support virtualized parallel application execution, and
• translates the legacy build and deployment mechanisms that accompany many applications (e.g. from a cluster or batch HPC environment) to the equivalent mechanisms in the cloud.

Note that this definition of "reproducibility" refers to the reproducibility of the software as a capability available to its user or users, and not to numerical reproducibility across heterogeneous hardware platforms.
We make use of Docker containerization technology to provide portability and reproducibility, and Terraform [21] and Ansible [22] to deploy, manage, and provision cloud resources automatically. In the following sections we describe each of the scientific workflows in detail, their data and computational requirements, the particular technical details of containerized implementation, how the choice of deployment context affected the implementation, and an evaluation of software runs performed, and we discuss the practical outcomes of the experience, including the benefits and disadvantages of this approach. Each of these workflows was run on Amazon Web Services (AWS) [2] and the Aristotle Cloud Federation [27].

The Aristotle Cloud Federation is an NSF-funded project between the Cornell University Center for Advanced Computing, the University at Buffalo Center for Computational Research, and the University of California Santa Barbara Department of Computer Science, with the goal of joining cloud computing resources at each of these institutions in order to develop a federated model for science users to easily access data, scale research problems using cloud computing techniques, and lessen the time to science of research teams. The federated model allows resources to be shared between the Aristotle member clouds, including individual data sets, access to specialized software, and access to site-specific resources. By leveraging the strengths of each of the member institutions, the overall cloud is able to provide larger scale and more resources than each of those institutions separately. The use cases described below are largely the result of collaborations between the Aristotle Science Team members and the Infrastructure group, which drove the requirements for containerized applications.
2 SCIENTIFIC WORKFLOWS

The following scientific workflows (selected from the Aristotle Cloud Federation Science Use Cases [5]) represent a broad range of scientific disciplines. Each case represents a user community that seeks the potential productivity gains offered by cloud computing. At the same time, these three examples cover some of the common challenges encountered when moving scientific code to the cloud. In Section 2.1, we discuss a message passing interface (MPI) application, called Lake_Problem_DPS, used in environmental science research, which typically utilizes multiple nodes with low amounts of MPI communication. Section 2.2 covers WRF, an application commonly used in HPC for the atmospheric sciences, which utilizes higher levels of MPI communication. Our final workflow, in Section 2.3, does not require MPI or even communication between nodes, but instead requires high data throughput for processing large radio astronomy datasets.
2.1 Lake_Problem_DPS
In environmental science, complex systems are studied computationally using the Many-Objective Robust Decision Making (MORDM) framework, which enables understanding when decisions must be made while these systems are changing [26]. There is a classic problem – called the shallow lake problem – where a town with a lake must make policy decisions about pollution that will impact the lake's water quality as well as the town's economy [10]. Julianne Quinn et al. demonstrated the Lake_Problem_DPS software [24], based on the MORDM framework, in solving this problem using Direct Policy Search (DPS) [43] and intertemporal open loop control [41].

The software was originally run on an HPC cluster, and utilizes low amounts of MPI communication throughout the run. There are no external input data requirements to verify functionality, so the Aristotle Cloud Federation Science Team was able to begin with a fork of the Lake_Problem_DPS software repository to containerize, deploy, run, and evaluate the software in the cloud environment [25]. To effect an automated cloud deployment, our team translated the legacy cluster submission scripts from PBS to Python, and added the environment initialization to the containerization step. In order to reproduce the results of Quinn et al., we ran the DPS and intertemporal optimization routines, performed a re-evaluation, and then generated the figures for comparison to those generated by an unmodified run.
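The shape of such a PBS-to-Python translation can be sketched as follows. This is an illustrative construction, not the project's actual scripts: the executable name, rank count, and hostfile path are placeholders.

```python
import shlex
import subprocess

def build_mpi_command(executable, ranks, args=(), hostfile=None):
    """Build an mpirun invocation equivalent to a legacy PBS directive
    such as `#PBS -l nodes=2:ppn=8` followed by `mpirun ./executable`."""
    cmd = ["mpirun", "-np", str(ranks)]
    if hostfile:  # list of cluster VMs, pushed out by Ansible at deploy time
        cmd += ["--hostfile", hostfile]
    cmd.append(executable)
    cmd += list(args)
    return cmd

def run(executable, ranks, args=(), hostfile=None):
    # Inside the container, this call replaces `qsub job.pbs` on the cluster.
    return subprocess.run(
        build_mpi_command(executable, ranks, args, hostfile), check=True)

# Example: 16 ranks, as a legacy "2 nodes x 8 cores" request would have been.
print(shlex.join(build_mpi_command("./lake_dps", 16, ["100"], "hosts.txt")))
```

Keeping the command construction separate from execution also makes the launcher easy to test without an MPI installation present.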
2.2 WRF
The weather forecasting community has historically valued large data sets for predictive power, and has utilized analyses of past similar situations both to infill data and to make future projections. The computational Weather Research and Forecasting (WRF) Model [46] is popular for weather simulation, with a long history of development and use by the National Center for Atmospheric Research (NCAR), contributors, and consumers. The software is widely used by a community of more than 48,000 researchers across 160 countries to produce a wide variety of results, ranging from contributions to real-time weather prediction, long-term climate simulations, and large-scale low-resolution idealized physics simulations, to small-scale high-resolution detailed physics simulations leveraging large quantities of observational grounding data as model inputs.
Numerical computation, data input, and data output can all grow large very quickly when simulating detailed physics at high grid resolutions or over long timescales. The communication of intermediate results at grid boundaries – necessary to advance simulation steps at sufficient accuracy – can also place a burden on network capacity. To achieve the desired modeling fidelity, WRF is thus commonly run on resources with an abundance of computational capacity, disk storage, and network throughput. Common technologies used to meet these needs include managed compute cluster resources with provided Fortran compilation guides and packages to facilitate efficient numerical simulation. Network communication is facilitated by MPI libraries, which may be optimized for low-latency use of specialized network hardware. Disk storage may be fulfilled by high-capacity Lustre distributed file system hosting.

The specific WRF model we chose requires parallel execution across compute resources to allow for faster and more detailed numerical grid simulation of weather properties, namely of interest for simulating wind speed near wind turbine farms at high spatial and temporal resolution. Useful simulation data for climate observations include wind speed and temperature, as well as dependent measures such as estimated wind turbine power production. In order to obtain these measures in a reasonable timeframe, it is necessary to leverage large computational resources to quickly and accurately simulate many numerical values over grids of varying density, with associated network communication at tile boundaries and a large demand for disk storage, both for tile boundary grounding conditions derived from data input and for intermediate result storage.
Consequently, this scientific workflow represents an example of resource-intensive HPC applications and the challenges they present to effective cloud deployment.

NCAR provides many public data sets, analysis tools, and regression suites suitable for confirming the validity of the numerical simulations produced by the model. We use these regression tests to validate the correctness of our cloud WRF executions.
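As a rough illustration of the boundary communication described above (not WRF code), the following sketch estimates per-step halo-exchange volume for a rectangular tile decomposition. The function and its parameters are our own illustrative construction.

```python
def halo_exchange_bytes(nx, ny, px, py, halo=1, bytes_per_cell=8, fields=1):
    """Estimate bytes moved between neighboring tiles each step when an
    nx-by-ny grid is decomposed into px-by-py rectangular tiles.

    Each internal tile boundary is exchanged in both directions, `halo`
    cells deep, for `fields` variables of `bytes_per_cell` bytes each.
    """
    tile_nx, tile_ny = nx // px, ny // py
    vertical_cuts = px - 1    # boundaries between horizontally adjacent tiles
    horizontal_cuts = py - 1  # boundaries between vertically adjacent tiles
    cells = (vertical_cuts * py * tile_ny
             + horizontal_cuts * px * tile_nx) * halo
    return 2 * cells * bytes_per_cell * fields  # x2: exchanged both ways

# Doubling grid resolution doubles boundary traffic per cut while compute
# per tile grows fourfold -- one reason scaling is network-sensitive.
print(halo_exchange_bytes(1000, 1000, 2, 2, halo=3, fields=10))
```

A single undecomposed grid (px = py = 1) correctly yields zero exchange volume, since there are no internal boundaries.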
A sample WRF version 4.0 run using an NCAR regression-test Docker build was run using 1.3 GB of novel weather Global Forecast System data published by NCEP. Geographic reference data for grid domain preprocessing and high-resolution physics totaled 30 GB. This sample was run inside a Docker container on a 4-virtual-CPU AWS instance. The observed runtime was 9 minutes 20 seconds, and should be scalable to moderately larger data sets of a similar nature. Similar past simulation runs executing private builds and data have leveraged Docker to execute for long time periods on Aristotle Cloud Federation and XSEDE Jetstream cloud resources.

2.3 FRB_pipeline
Fast Radio Bursts (FRBs) are astrophysical phenomena that occur as transient high-energy pulses or bursts in radio astronomy data. FRBs are expected to occur thousands of times per day, but confirmed detections of unique sources number below a hundred [13] since the first recorded detection in 2007 [31]. Since radio telescopes are on the earth's surface, radio astronomy data is plagued by large amounts of Radio Frequency Interference (RFI), which can block or distort signals, making transient signals like FRBs even harder to detect despite the large quantities of data available to search.
A standard data presentation in time-domain radio observations is the dynamic spectrum, a plot of intensity vs. time and frequency. Typically, the time of arrival of an astrophysical radio transient is later at lower frequencies due to dispersion by plasma in the interstellar medium. In particular, the arrival delay is proportional to the inverse square of frequency and to the dispersion measure (DM), a constant equal to the integral of electron density in the interstellar medium along the line of sight. Thus, in a dynamic spectrum, radio transients exhibit a characteristic quadratic shape. A number of existing transient search techniques are based on de-dispersing dynamic spectra using a bank of several plausible DMs, flattening into a time series, and performing matched filtering with the time series. These methods are well-tested and have reliably discovered new FRBs over the past decade.

Exploring new detection techniques that have the potential to offer advantages in accuracy or computational cost is also of great interest to astronomers. In the past three years, researchers have conducted successful transient searches by applying multi-layer convolutional neural networks to dynamic spectra [12][1]. The "Friends-Of-Friends" (FOF) algorithm is a straightforward and efficient way to locate radio transient candidates by identifying and characterizing clusters of high-signal pixels in the dynamic spectrum directly.
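The dispersion delay described above can be written with the standard formula used throughout pulsar and FRB astronomy; the sketch below uses the conventional dispersion constant of about 4.1488 × 10³ MHz² pc⁻¹ cm³ s, and the example DM is only roughly that of FRB 121102.

```python
K_DM = 4.1488e3  # MHz^2 pc^-1 cm^3 s, standard dispersion constant

def dispersion_delay_s(dm, f_lo_mhz, f_hi_mhz):
    """Extra arrival delay in seconds of the lower frequency relative to
    the higher one, for a source with dispersion measure `dm` (pc cm^-3).
    The delay scales as DM times the inverse square of frequency."""
    return K_DM * dm * (f_lo_mhz**-2 - f_hi_mhz**-2)

# A DM of ~557 pc cm^-3 swept across the 4-8 GHz Breakthrough Listen band:
print(round(dispersion_delay_s(557.0, 4000.0, 8000.0), 4))
```

At these high observing frequencies the sweep amounts to only about a tenth of a second, which is why the quadratic signature is subtle in such bands.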
For this scientific workflow, the software is called the FRB_pipeline [38] and was developed by the Aristotle Cloud Federation Science Team in collaboration with Cornell researchers established within the radio astronomy community. Thus, the software was designed with the computational needs of the science in mind, as well as the flexibility to expand as new methods of radio transient detection develop. The FRB_pipeline is a customizable scientific software package written in Python 3, designed to simplify the process of combing large datasets – from any of a variety of radio telescopes – to detect FRBs. The package enables flexible use of established methods to filter RFI, detect candidates, and determine the viability of candidates, as well as the availability of new methods, or even the addition of customized methods by the user. A commonly used package within radio astronomy, called PRESTO [42], is a dependency, and newer methods such as our FOF algorithm are included as well. The FOF algorithm proceeds as follows:

(1) Average the raw dynamic spectrum
(2) Compute the root mean square (RMS) background white noise, using an iterative method that discards outlier pixels until a convergence threshold is reached
(3) Mark each pixel with signal greater than a constant parameter m times the RMS background noise
(4) Group the high-signal pixels marked in (3) in close proximity – defined as being within a constant number of time bins and a constant number of frequency bins (parameters) – together to form clusters, keeping those with a total intensity higher than a given threshold m
(5) Compute the following metrics for each cluster:
  • N - number of pixels
  • Cluster signal-to-noise ratio (SNR) - mean pixel SNR × N
  • Signal Mean/Max - mean/maximum pixel intensity
  • Pixel SNR Mean/Max - signal mean/maximum divided by RMS background noise
  • Time Start/End Bin - beginning/end in time domain
  • Frequency Start/End Bin - beginning/end in frequency domain
  • Slope - orthogonal distance regression linear best-fit slope
  • DM - physical dispersion measure, from quadratic best fit with orthogonal distance regression
(6) Using either the linear fit or the quadratic fit, group the clusters, and extrapolate each cluster across the entire dynamic spectrum to form "superclusters"
(7) Output: a text file containing a list of candidates (clusters) and their metrics that can be sorted by any statistic, and a plot of each section of dynamic spectrum with clusters highlighted
(8) Plot the top candidates
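The averaging and clustering steps above can be sketched in a few lines of Python. This is an illustrative simplification, not the FRB_pipeline implementation: for example, the pipeline uses an iterative outlier-clipped RMS estimate, while the sketch uses a plain standard deviation.

```python
import numpy as np
from collections import deque

def decimate(spectrum, k_f=2, k_t=2):
    """Step (1), simplified: block-average k_f x k_t pixels, shrinking
    the data and suppressing pixel-to-pixel noise before clustering."""
    nf, nt = spectrum.shape
    nf, nt = nf - nf % k_f, nt - nt % k_t  # trim ragged edges
    return (spectrum[:nf, :nt]
            .reshape(nf // k_f, k_f, nt // k_t, k_t)
            .mean(axis=(1, 3)))

def friends_of_friends(spectrum, m=3.0, f_link=1, t_link=1):
    """Steps (2)-(4), simplified: threshold against the RMS background,
    then group nearby high-signal pixels into clusters."""
    marked = spectrum > m * spectrum.std()
    visited = np.zeros(spectrum.shape, dtype=bool)
    coords = list(zip(*np.nonzero(marked)))
    clusters = []
    for start in coords:
        if visited[start]:
            continue
        visited[start] = True
        queue, members = deque([start]), []
        while queue:
            fi, ti = queue.popleft()
            members.append((fi, ti))
            for fj, tj in coords:  # linear scan over marked pixels
                if (not visited[fj, tj] and abs(fj - fi) <= f_link
                        and abs(tj - ti) <= t_link):
                    visited[fj, tj] = True
                    queue.append((fj, tj))
        clusters.append(members)
    return clusters

# A bright 4x4 patch and a separate 2x2 patch -> two clusters after decimation.
spec = np.zeros((16, 16))
spec[4:8, 4:8] = 80.0
spec[12:14, 12:14] = 80.0
print(sorted(len(c) for c in friends_of_friends(decimate(spec))))
```

Metrics such as cluster SNR or the ODR best fits (step 5) would then be computed per cluster from the returned pixel coordinates.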
Smoothing and averaging reduce the variation in intensity between adjacent pixels, which is essential for FOF to work properly. Additionally, averaging reduces the size of the data by a significant factor (typically of order 100), vastly reducing the computation time of FOF and other search algorithms. However, smoothing does take significant computational time. The computational complexity of pure averaging is m × n × k, with n and m the number of time and frequency bins respectively in the dynamic spectrum, and k the number of pixels averaged together. If, for example, smoothing with a 2-D Gaussian filter followed by decimation is used, a computational complexity of nm × log(n) log(m) is achieved using fast FFT-based convolution. While smoothing has a significant cost, the savings made by running FOF on data whose size is multiple orders smaller than the raw data are essential. The FOF algorithm itself has a computational complexity of m × n, with n the number of time bins and m the number of frequency bins in the dynamic spectrum. Computing the RMS background noise and comparing each pixel's signal to the first threshold together account for the bulk of the computation time, while the two least-squares regressions computed for each cluster are relatively insignificant.

In contrast to techniques using de-dispersion and matched filtering, FOF is completely agnostic to signal shape. On one hand, this means that FOF will have no trouble identifying astrophysical signals with any
DMs; on the other hand, FOF is vulnerable to a high rate of false positives due to RFI. Additionally, because the DM of a given signal is unknown a priori, de-dispersion and matched filtering must be performed on a large set of trial values, which entails a computational complexity equal to the number of trial DMs times the n log(n) time of matched filtering. In many cases, FOF will be faster than these methods. For example, in the analysis that found FRB 121102, 5016 trial DMs between 0 and 2038 pc cm−3 (twice the expected maximum galactic DM) were used by Spitler et al. [44], while FOF is effective on the same dataset averaged to only 100 frequency bins.

The Breakthrough Listen (BL) project is a comprehensive search for extraterrestrial intelligence using radio and optical telescopes. The BL target list includes nearby stars and galaxies, as well as other peculiar astrophysical sources broadly termed "exotica" [29]. As part of the latter category, BL observed the first discovered repeating fast radio burst, FRB 121102, for 5 hours at 4–8 GHz using the Robert C. Byrd Green Bank Telescope. Using the GPU-optimized software
HEIMDALL [4] to perform dedispersion and matched filtering, Gajjar et al. [19] detected 21 FRBs within the first hour of observation. Zhang et al. [47] subsequently applied supervised machine learning to the same data to identify 93 pulses with a <2% false positive rate. The large number of FRB 121102 pulses detected in this BL observation, together with the completeness of FRB detections by Zhang et al., render these data ideal testbeds for evaluating the performance of our FOF algorithm.

Furthermore, the dataset containing the detected FRBs is publicly available [6][7], with sufficiently large file sizes to demonstrate the power and flexibility of the cloud to scale deployments to meet data processing needs. Since there are large amounts of data to process, with a variety of possible processing methods, this scientific workflow lends itself well to an embarrassingly parallel implementation. When deployed to a cluster, the software does not require communication such as MPI, but it does require careful data management.

3 IMPLEMENTATION

Due to the aforementioned differences in the scientific workflows, not all of the details of implementation are the same. However, the core steps of the process were the same, and we detail where they differed in the coming sections.
In general, for each workflow:

• A Docker container was built to enable portability and reproducibility
• Data was stored in locations easily accessible to cloud computing virtual machines (VMs)
• Compute VMs and communication networks were deployed with Terraform
• Secure shell (ssh) keys for communication between VMs were configured with Ansible
• Application containers and associated setup were deployed to multiple cloud VMs using Ansible
• Output data was staged on remote storage for user retrieval
• All compute resources were decommissioned using Terraform to curtail ongoing cost

All of these applications are non-interactive and batch-style, utilizing a single container image for each node (or VM). Therefore, they do not require much, or any, orchestration of interacting services. The data storage utilized was commonly simple object storage such as Amazon S3 [3], or an NFS server deployed on an additional cloud VM when a file system mount was needed by the application. These compute runs were small-scale (relative to large jobs on an HPC system) and intended as proof-of-concept for the applications. Therefore, no more than 3 maximum-size VMs were provisioned per cluster in the testing runs performed for this work. Other deployments have been made with up to 8 or more VMs, but a thorough study of maximum effective cluster size, or of the point of decreasing gains at scale due to communication overhead, is deferred to future work. Reproducibility was gauged by application completion on basic testing scenarios, and was not thoroughly evaluated with an eye to numerical precision errors nor the subtle kernel differences that Docker cannot eliminate, being bound to run on the VM kernel as deployed.
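The end-to-end sequence above can be sketched as a small driver. The directory, playbook, and inventory names here are hypothetical placeholders, not the project's actual files; the driver returns the commands rather than executing them, which keeps the provisioning order explicit and testable.

```python
import shlex

def deployment_plan(tf_dir, playbook, inventory):
    """Provision with Terraform, configure with Ansible, tear down with
    Terraform -- mirroring the per-workflow steps listed above."""
    return [
        ["terraform", f"-chdir={tf_dir}", "init"],
        ["terraform", f"-chdir={tf_dir}", "apply", "-auto-approve"],
        ["ansible-playbook", "-i", inventory, playbook],
        ["terraform", f"-chdir={tf_dir}", "destroy", "-auto-approve"],
    ]

for cmd in deployment_plan("infra/aws", "deploy_app.yml", "hosts.ini"):
    print(shlex.join(cmd))
```

Each command list can be handed to `subprocess.run(cmd, check=True)` in turn, so a failed apply stops the run before any science jobs are launched.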
Containers are OS-level virtualization: a single OS kernel can host multiple isolated environments. Containers are lighter weight than hypervisor-based VMs, wherein each VM runs a separate kernel, but come with the obvious restriction that all containers on the host must use the same OS kernel and the same version of the kernel. In many cases, this has been a more than acceptable trade-off. While container technology originated from Solaris Zones [40] and FreeBSD Jails [39], the prevalence of Linux on commodity cloud hardware has synergistically aided in creating a convergence of Linux containerization technology.

Docker has not only been responsible for the popularity of Linux containers in the industry, it has seemingly expanded the definition of a container. The colloquial definition now includes the ability to distribute and deploy applications with minimal configuration: i.e., everything is self-contained within the container. Singularity, another
Linux container technology, targeted HPC users by trading off security concerns: Singularity requires the user to grant an application container access to all of the user's files, rather than running a container virtualization service as a privileged user, as is the case with Docker. In so doing, many of the traditional notions of a container were further broken down, though Singularity has optional parameters to enable isolation [45].

Other container technologies exist for Linux as well, including that provided by a core service of most Linux distributions, systemd [18]. However, systemd-nspawn containers typically do not carry the more modern connotation of a container being a packaged application. But systemd-nspawn has been used by other technologies for this purpose, such as nix-containers [17], a container technology for the NixOS [8, 15, 16] Linux distribution that allows packages to be shared from the host's package store. While we have not yet used Nix containers in this work, we have containerized Nix within Docker, which affords its own advantages.

Nix provides a high degree of reproducibility due to package definitions being carefully check-summed for any sources of differences, e.g.: URL change of binary or source blobs used for the package, checksum differences in the binary or source blobs, version changes, configuration changes, semantic changes in the package definition (i.e. the Nix expression) – such as build or runtime configuration – or any such changes in the dependencies of the packages. Once a Nix expression is written, it can then be shared for use within other Nix projects, without the need to worry about how to integrate it into a container definition file. Additionally, by using Nix, the environment could easily be run bare-metal on NixOS or as a Nix container in the future.
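A minimal Nix expression of this kind might look roughly as follows. This is a sketch only: the nixpkgs revision and package set are illustrative placeholders, not the expressions used in this work, and the pin's checksum is deliberately left to be filled in.

```nix
# Illustrative sketch of a pinned development environment.
let
  pkgs = import (fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/20.03.tar.gz";
    # Pinning by checksum is what makes the environment reproducible:
    # sha256 = "..."; (fill in for the chosen nixpkgs revision)
  }) {};
in pkgs.mkShell {
  buildInputs = [ pkgs.python3 pkgs.openmpi pkgs.gcc ];
}
```

Because every input is content-addressed, any change to the pin, the package versions, or their dependencies produces a detectably different environment, which is the reproducibility property the paragraph above describes.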
For this work, specialized Docker containers were developed for the Lake_Problem_DPS and FRB_pipeline applications, while NCAR provides a publicly available Docker container for WRF that includes a regression test [35]. The Lake_Problem_DPS container makes full use of Nix within Docker to ensure reproducibility, even using a Nix expression to simplify the process of including proprietary software within the container without sharing the software in a public GitHub repository [25]. The FRB_pipeline container was based upon a container developed for the North American Nanohertz Observatory for Gravitational Waves (NANOGrav) [33][34], but updated for our work [37][36].
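A container definition for a batch-style pipeline of this kind might look roughly as follows. This is an illustrative sketch with placeholder file and script names, not one of the project's actual Dockerfiles.

```dockerfile
# Illustrative sketch only -- not the project's actual container definition.
FROM python:3.8-slim

# System packages needed to build scientific Python dependencies
# (an assumed minimal set).
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential gfortran \
    && rm -rf /var/lib/apt/lists/*

# Bake the pipeline and its pinned dependencies into the image so every
# deployment starts from an identical environment.
WORKDIR /opt/pipeline
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Batch-style entrypoint: run configuration is supplied at deploy time.
ENTRYPOINT ["python", "run_pipeline.py"]
```

Building the environment into the image, rather than configuring it at run time, is what lets Ansible treat each VM identically when the cluster is deployed.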
Terraform is an open source tool for infrastructure management and provisioning developed by HashiCorp. The common use case for Terraform is managing resources on multiple cloud infrastructure providers – such as AWS and Google Cloud – with minimal differences in scripts. Terraform is under active development and supports a variety of providers, including AWS, Google Cloud Platform (GCP) [30], Microsoft Azure [32], and OpenStack infrastructure providers, which include XSEDE Jetstream, the Aristotle Cloud Federation, and the Cornell CAC Red Cloud on-premise hosting environments. Terraform uses the HashiCorp Configuration Language (HCL) [20] to automate the deployment of various cloud resources among different cloud vendors. Terraform does this in a declarative manner, meaning that cloud resource states are written in a Terraform file, and Terraform attempts to create the declared resource or modify the resource into the declared state. It manages existing resources using metadata created from running a Terraform configuration file.

Ansible is an open source tool for software provisioning, configuration management, and automation. Ansible uses YAML to write configurations. Similar to Terraform, Ansible is declarative, though it can also perform operations procedurally. An Ansible YAML file declares the states of various Ansible modules, which are subsequently set up on the remote machine.
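A Terraform sketch of this kind of declarative resource description might look as follows (AWS shown; the region, AMI ID, instance type, and administrative address are placeholders, not the configuration used in this work):

```hcl
provider "aws" {
  region = "us-east-1"
}

resource "aws_security_group" "cluster" {
  name = "mpi-cluster"

  # ssh in from a single administrative address only
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["203.0.113.10/32"]
  }

  # unrestricted traffic between cluster members
  ingress {
    from_port = 0
    to_port   = 0
    protocol  = "-1"
    self      = true
  }

  # no restriction on outbound traffic
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "node" {
  count                  = 3
  ami                    = "ami-00000000000000000" # placeholder image ID
  instance_type          = "c5.2xlarge"
  vpc_security_group_ids = [aws_security_group.cluster.id]
}
```

Running `terraform apply` drives the cloud toward this declared state, and `terraform destroy` decommissions the same resources, which is how ongoing cost is curtailed after a run.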
Compared to using ad hoc scripts for configuration management, Ansible can achieve the same things scripts can on multiple machines or VMs in parallel, which is advantageous for cluster management.

We used Terraform for provisioning and infrastructure management, and Ansible for configuration management. Concretely, Terraform was used to create the VMs and networks, while Ansible was used to set up the Docker application containers, including the associated environments for the science workflows to run on the VMs, and to issue commands to initiate and control science runs. To create a cluster with communication, Terraform first sets up a single base VM with a custom network configuration. Under this network configuration, the VM can only receive ssh traffic from a specified IP and TCP connections from other VMs on the same network. There is no restriction on how the VM can send traffic. Next, Ansible imports the Docker containers. Afterwards, Terraform creates copies of the VM to form a cluster. Finally, Ansible sets up the VMs in parallel for OpenMPI communication.

It is important to note that while the Ansible script is portable across different cloud infrastructure providers, the Terraform script is not. Ansible requires only the IP addresses of the VMs, while Terraform resources are dependent on the cloud infrastructure providers. Generally, using Terraform on the various providers requires some form of credentials and slight modifications to the Terraform script. By using simple VM hosting with standard OS images rather than provider-specific services, our infrastructure-level scripting is easily portable, even though it requires some provisioning code translation to adapt to different underlying cloud providers, based on commonly available templates, including our own new public examples. After provisioning server and network resources, we configure them using both Ansible scripting and Docker container deployment.
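The Ansible side of this sequence can be sketched as a playbook. The module usage is standard Ansible, but the image name, file paths, and inventory group name are illustrative placeholders, not this project's actual scripts.

```yaml
# Illustrative playbook: pull the application container on every cluster
# node and distribute an OpenMPI hostfile listing all members.
- hosts: cluster
  become: true
  tasks:
    - name: Pull the application container on every node
      community.docker.docker_image:
        name: example/science-runner:latest
        source: pull

    - name: Push the OpenMPI hostfile listing all cluster members
      copy:
        dest: /opt/cluster/hosts.txt
        content: |
          {% for host in groups['cluster'] %}
          {{ hostvars[host].ansible_default_ipv4.address }} slots=4
          {% endfor %}
```

Because Ansible runs these tasks against all hosts in the inventory group in parallel, the same playbook serves a 3-VM test cluster or a larger deployment without modification.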
A simple way to think of the process is that Terraform creates a server we can access, Ansible installs the libraries we need (much like user shell commands issued over ssh), and Docker fetches a particular application bundle to be run.

For on-demand MPI clusters in particular, Ansible scripts perform some additional cluster-level configuration, such as pushing a list of cluster hosts to each member upon setup. The Docker images we created to run WRF on the on-demand MPI cluster, for example, similarly have cluster-level host configuration injected by a scripted build process to lower the burden of manual user setup on each cluster deployment. Ansible can then perform scripted cluster tests to verify the success of all deployed components and successful networked science code execution. Notably, the entire process can be written as code or templates and run from Terraform commands on one researcher workstation. These concepts are similar to functions provided by another popular technology for cluster deployment – Kubernetes – which specializes in deploying redundant web applications across multiple hosts and networks to provide high availability and uptime for diverse workloads. However, because MPI communication requires long-lived guaranteed hosting, and because we currently focus on a small set of related applications per cluster deployment, neither the additional user complexity of scripted Kubernetes setup nor the monetary and researcher-familiarity costs of hosted Kubernetes solutions (which sacrifice portability) are justified under the stated goals of this work.
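The scripted cluster test described above amounts to launching an MPI job inside the application container from a designated head node. A hand-run equivalent might look like the following sketch (the container name, hostfile path, process count, and binary are illustrative):

```shell
# From the head node, launch an MPI job inside the application
# container, fanning out over the cluster hosts listed in the
# hostfile that Ansible pushed to each member.
docker exec mpi-head \
    mpirun --hostfile /etc/mpi/hostfile \
           -np 16 \
           /opt/science/app_binary
```

The same pattern serves both the automated smoke tests and later manual production runs, so a user who inspects the test scripts also learns how to drive their own jobs.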
By using simple and standard technologies driven by a small set of scripts, users can easily create on-demand MPI clusters to achieve science results.

Default usage of the Terraform and Ansible examples will create network resources and servers and then configure them to perform useful work, but in some cases advanced recurring communication between server instances allows a new class of applications to be hosted for efficient computation. Multiple VMs can be organized into clusters running Message Passing Interface (MPI) applications to deliver the high-performance computations commonly associated with large managed hardware clusters. Although in this work we use cloud providers with commodity network and storage services as well as hardware-level hyperthreading or time sharing, options exist to pay a premium for specialized hardware appliances, dedicated hardware, and bare-metal native execution, with varying amounts of increased configuration complexity. Here we defer detailed cost, complexity, and performance analyses to future work and recount our experiences developing and deploying on-demand OpenMPI clusters with real scientific research applications on basic cloud provider offerings. As noted above, the technologies used – namely Terraform, Ansible, and Docker – are open source or publicly available, and were chosen to facilitate a smooth researcher user experience, which can be further aided by referencing our own published examples.

Our primary on-demand MPI application target for cluster deployment is the NCAR WRF Model, widely used for weather simulation. This application has intense compute, networking, and storage demands that lend themselves well to scaling different grid tiles of simulation time steps across multiple VMs, with periodic communication of intermediate results and grounding of grid tile boundary conditions against provided observational data.
(1) User checks out deployment code, configures their desired cloud provider credentials and tools, and sets the cluster size
(2) User installs Terraform and Ansible with provided commands to perform deployment
(3) User executes deployment, which creates a cluster of the desired size on the cloud provider and runs short tests
(4) User may use deployment script outputs to access the cluster for specific manual application runs
(5) User copies out result data and cleans up the cluster using the Terraform destroy command

Cluster deployment, including resource provisioning and server instance configuration, is entirely automated, triggered by the user calling a single Terraform command, reviewing the proposed changes, and choosing to execute.
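On the command line, this user-facing workflow reduces to the standard Terraform command cycle, run from the chosen cloud provider's template folder (a sketch; any flags and folder layout are deployment-specific):

```shell
# One-time setup: download the provider plugin for the template folder
terraform init

# Preview the resources that would be created, with their cost implications
terraform plan

# Create the cluster; Terraform shows the proposed changes and asks
# for approval before touching any billable resources
terraform apply

# After copying out result data, tear the cluster down
terraform destroy
```

Everything between `apply` and `destroy` — software installation, cluster wiring, and smoke tests — is handled by the Ansible steps Terraform triggers.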
In more detail, the cluster provisioning automates the following processes:

(1) Terraform reads the resources desired from the appropriate cloud provider template folder
(2) Terraform reads the provided cloud provider credentials
(3) Terraform plans the resources it must create, namely networks, security groups, and VM instances
(4) Terraform requests approval to create the resources, which will incur potential costs
(5) Terraform uses an underlying provisioning API to create resources, starting with networks
(6) Terraform creates a VM where cluster software packages will be installed
(7) Ansible waits for the VM to come up, then installs software packages needed by all cluster nodes, including Docker images with scientific application code
(8) Terraform tears down the VM to ensure a clean disk snapshot
(9) Terraform is notified of Ansible completion and takes a server image

Cluster creation then proceeds as follows:

(1) Terraform creates more VM instances as above, using the server image as a base copy
(2) Ansible gets the IPs of the created cluster VMs
(3) Ansible builds cluster-specific Docker images with cluster info, and builds in ssh configuration for later use with MPI
(4) Terraform creates an NFS server for the cluster (if needed by the application)
(5) Ansible confirms NFS mounts on each host (if needed by the application)

Finally, testing and application runs proceed as follows:

(1) Ansible is invoked upon completion of cluster node provisioning to start tests
(2) Tests are run over ssh inside of Docker on a designated head node and launch mpirun commands
(3) For specific user data processing jobs, the user uploads input data to the chosen cloud storage or cluster NFS server
(4) Ansible or manual ssh performs a fetch of the data needed, and then Ansible initiates scientific computation
(5) Upon completion of the application run, the user can fetch data using scp, the aws s3 command line, or other convenient data movement tools
(6) The Terraform destroy command can be used to remove all compute resources while leaving data and the server image, or to remove all resources entirely
(7) Should a user desire to persist the cluster for a later restart, cloud-provider-specific commands can pause/restart the compute resources

The public cloud provider on which we deployed full scientific workflow runs for this work is AWS, and we wrote and tested Terraform and Ansible scripts for these applications (as well as other scientific workflows) on GCP. Our choice of basic network and VM infrastructure provisioning with Terraform also allows us to support extensions to other public clouds, including Microsoft Azure and many more. Deployments to other platforms have been explored, but have not been fully automated at this time.
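The cluster-level configuration steps listed above — gathering the created VMs' IPs and pushing a host list to each member for MPI — can be sketched as a single Ansible templating task (the group name, template name, and destination path are illustrative):

```yaml
# Render an MPI hostfile from the IPs Terraform reported and
# distribute it to every cluster member so mpirun can reach all hosts.
- hosts: cluster
  become: yes
  tasks:
    - name: Distribute the cluster host list for MPI
      template:
        src: hostfile.j2          # illustrative Jinja2 template of member IPs
        dest: /etc/mpi/hostfile
```

Because the play targets the whole group, a change in cluster size only requires re-running the play, not editing each node.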
There is a dizzying array of available cloud provider services, with power and convenience accompanied in equal measure by the volume of user documentation and choices presented to a user. Toward the goal of enabling users to quickly deploy scientific software to run massively parallel or on a communicating cluster engaging many compute resources, we do not immediately burden users with undue choices, nor present the full complexity of the cloud provider offerings available. Sensible defaults are chosen for the applications presented; these finer details are exposed in the Terraform infrastructure provisioning descriptions and can be modified as users see fit, and as they desire to learn more powerful controls over the performance and cost of the deployed systems.

The largest differences between the cloud infrastructure providers are the names of the different resources described in the Terraform resource descriptions, written in the JSON-like HashiCorp Configuration Language (HCL). A GCP compute network is similar to an AWS virtual private cloud; both are used to designate the network the VMs are created on. To control the ingress and egress traffic of the VMs, we used the AWS security group, which is equivalent to the GCP compute firewall. To facilitate MPI communication for on-demand cluster creation, various security group port settings, in combination with instance-level Docker container settings and internal container process launch commands, were attempted in order to arrive at working configurations, which have been captured in the deployment, build, and run scripts we present. Cluster access is secured by appropriate default network controls and ssh access key configuration. To make copies of the base VM in AWS, we used the Terraform resource "AMI from instance" to create an image from the base VM and multiple VMs from the created image.
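The security-group policy described above — ssh only from a specified IP, unrestricted TCP between cluster members, unrestricted egress — might be declared as follows (the VPC reference, CIDR ranges, and names are illustrative):

```hcl
# Allow ssh from one administrative IP plus all TCP traffic between
# VMs on the cluster's own network; outbound traffic is unrestricted.
resource "aws_security_group" "mpi_cluster" {
  vpc_id = aws_vpc.cluster.id            # illustrative VPC reference

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["203.0.113.10/32"]    # researcher workstation (example IP)
  }

  ingress {
    from_port   = 0
    to_port     = 65535
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/16"]        # intra-cluster MPI traffic
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

On GCP, the equivalent rules would be expressed with `google_compute_firewall` resources attached to a compute network rather than a security group.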
In GCP, we first created a Google compute snapshot of the VM, which was then converted to a Google compute disk, which was in turn used to create a Google compute image. From there, multiple VMs can be created from the image. Individual cloud providers tend to host the VM server instances using industry-standard KVM or Xen hypervisors, but any hypervisor-derived differences in execution or performance are beyond the scope of this analysis, with reproducibility verified at the level of application numerical results.
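The two image-creation chains can be sketched side by side in HCL (resource arguments are abbreviated and the names, zone, and base-instance references are illustrative):

```hcl
# AWS: one resource creates an image directly from the configured base VM.
resource "aws_ami_from_instance" "cluster_image" {
  name               = "mpi-cluster-image"
  source_instance_id = aws_instance.cluster_base.id
}

# GCP: snapshot -> disk -> image chain achieves the same effect.
resource "google_compute_snapshot" "base" {
  name        = "cluster-base-snapshot"
  source_disk = google_compute_instance.cluster_base.name  # illustrative
  zone        = "us-central1-a"
}

resource "google_compute_disk" "base" {
  name     = "cluster-base-disk"
  snapshot = google_compute_snapshot.base.id
  zone     = "us-central1-a"
}

resource "google_compute_image" "cluster_image" {
  name        = "mpi-cluster-image"
  source_disk = google_compute_disk.base.id
}
```

The differing resource names and chain lengths are representative of the provider-specific translation work noted earlier; the overall structure of the deployment is unchanged.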
Aristotle Cloud Federation is a collection of on-campus infrastructure hosting resources at multiple universities that utilize the open source OpenStack infrastructure hosting services to provide storage, network, and computing services for users. In this work, we deploy on the OpenStack resources of Cornell University Red Cloud, using standard networks and VMs that are similarly available on other federation sites and OpenStack providers elsewhere.

The largest difference between public and private cloud providers is the Terraform provider used for infrastructure resource creation, which means different section names in the desired resource description rendered in HCL. In terms of architecture structure and resource creation decisions, the network descriptions are again the largest point of difference, but they similarly provide communication to and among the deployed VMs. Details of private network creation, and the many resources necessary to make the first reachable VM on the infrastructure target, are thus created automatically without further user involvement or navigation of novel linked webpage deployment instructions.
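Rendered for an OpenStack provider such as Red Cloud, the same base-VM declaration shown earlier for AWS changes only in resource type and argument names (image, flavor, and network names below are illustrative):

```hcl
# The base VM declared through the OpenStack Terraform provider;
# the structure mirrors the AWS version, only the vocabulary differs.
resource "openstack_compute_instance_v2" "cluster_base" {
  name        = "mpi-cluster-base"
  image_name  = "ubuntu-18.04"          # illustrative OS image
  flavor_name = "c4.m16"                # illustrative instance flavor
  key_pair    = "researcher-key"

  network {
    name = "cluster-private-net"        # created by the same Terraform run
  }
}
```

Swapping template folders like this, rather than rewriting deployment logic, is what keeps the Ansible layer fully portable across the public and private clouds we target.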
The primary benefit of this work is the ability to quickly and easily move a scientific computing application – including HPC applications which are communication-bound – with its dependencies and associated setup to any number of cloud infrastructures. This has several beneficial outcomes, a few disadvantages, and has produced many lessons learned. Our approach increases the accessibility of the cloud computing paradigm for scientific computing, can leverage multi-cloud deployments, adds portability and reproducibility naturally to the process of scientific software deployment in the cloud, and can simplify iterative software development through rapid deployment options.

Our implementation drastically eases or removes the infrastructure implementation required of scientists and researchers, both in understanding and in time to develop, freeing up precious time to target scientific results or performance within an application. For researchers who do not otherwise have access to large-scale computational resources, or who have access to the cloud but not the understanding of specific deployment contexts and tools needed to leverage the cloud effectively, our provided scripts can be employed to enable access more readily. Furthermore, research staff responsible for supporting researchers in scientific computing can apply this work to on-premise clouds to automate deployments, or to public cloud deployments to expedite researcher progress.

Since Terraform and Ansible are already designed to handle deployments in a variety of clouds, there is no added work to switch from one cloud to another beyond the cloud-specific details that must change, but would need to change regardless of deployment method. The same container can be used on any of the public cloud vendors and many private clouds.
Furthermore, the same container can be used on a personal computer for development, then deployed in multiple clouds, increasing the ability to push changes and rapidly deploy improvements or new scientific tests.

A disadvantage is that changing the implementation may require understanding some cloud-infrastructure-specific details (though it would regardless) and knowing enough Terraform or Ansible to make the changes. This is a smaller overall cost than complete manual deployment, but it is a notable downside if you are already familiar with, or targeting, a single cloud vendor and already understand other tools. A further disadvantage arises if a container does not already exist for your application, or if one of the cloud services you would like to leverage has not already been scripted; then time would need to be spent in development.

The experience of containerizing, automating the deployment of, and running the computations for these scientific applications has provided a wealth of lessons on both the difficulties and the simplifications available when moving scientific research from a variety of disciplines to the cloud, which we summarize here. The simplest deployment, especially if a Multi-VM setup is required, can be the best way to surface requirements that might otherwise have been overlooked due to the differences between the cloud computing paradigm and other compute resources. As with any new system, it pays to start with the simplest use case and build up incrementally to the full scale of the application. For deployment in the cloud, this means not only starting with a small-scale run, but also a small data input and output, a single VM (as far as possible), a minimal-size container, and so on.
Starting with large data sizes can add undue complexity to the process of deployment and computation, whether in choosing the appropriate VM size and type to handle the load, in large long-term storage costs while still developing, or in large egress charges on public cloud (though this is not an issue for some private campus clouds such as Aristotle Cloud Federation). Thus, it is also important to become familiar with the cost model of your chosen cloud provider(s) and determine cost requirements concurrently with cloud infrastructure requirements. After a successful run of a small deployment of an application, these requirements become clearer.

The importance of the choice of software tools cannot be overstated. While there is a plethora of tools – whether created by specific cloud vendors, industry partners, or otherwise – that can facilitate configuration, deployment, automation, and computation in the cloud, it is vital to select tools that are not only the best for the job, but that also enable the user to get scientific code running quickly. Across a variety of clouds, we have found that Terraform and Ansible provide rapid configuration, deployment, and management of compute resources for scientific workflows in a manner simplified for those familiar with scripting and similar tools. Applications using Python, bash scripting, or similar tools are convenient to run from Ansible, empowering the user to increase automation of the application runtime in the deployed environment with low effort.

Initial adoption of cloud computing for deployment of scientific and HPC workflows can require a large lead time to learn new technologies, develop containers that support software development and production work, comprehend how to translate requirements to cloud infrastructure options, and even learn the nuances of how particular cloud vendors operate.
The technologies we have presented in this paper can be very useful tools to reduce this lead time and deploy scientific runs more rapidly, while increasing the reproducibility and portability of the scientific workflows in the process. We have provided open source code and examples in the hope that others can leverage this work to increase their own understanding of the cloud as an infrastructure to support scientific computing, and to deploy new workflows to the benefit of the scientific community at large.

However, moving scientific research applications to the cloud has the potential to be a significant undertaking, and this is especially true for the aforementioned HPC applications which are communication-bound. For the researcher interested in moving an MPI application to the cloud, this approach may decrease the deployment time and add other benefits, but several other factors are also of concern in making the decision: cost, availability of compute resources, existence of a container template for the application (or the time and knowledge to develop one), and familiarity with the cloud and associated tools. A compelling case for cloud is when compute cycles are simply not available elsewhere, or when cloud resources (such as campus clouds) are more readily available than HPC-style resources. Conversely, the public cloud might be cost-prohibitive for very large-scale or large-output applications when one is able to secure time on a supercomputer. More work is needed to fully examine when it is advantageous to turn to the cloud as a resource for scientific computing and HPC applications.

Our plan for future work includes completing automation of these workflows on Microsoft Azure and Google Cloud, cost analysis of deployments, and bare-metal performance comparisons on HPC resources.
There are also more application areas to explore using this approach, including those that need specialized hardware (such as GPUs). Furthermore, while we have stated some reasons for selecting this approach over others for the deployment of these workflows, a broader analysis of other deployment methods (such as Kubernetes and vendor-specific strategies) is another next step.
ACKNOWLEDGMENTS
The authors would like to thank the following researchers who contributed to the development of this project: Jim Cordes, Shami Chatterjee, Julianne Quinn, Tristan Shepherd, Robert Wharton, and Marty Sullivan, as well as students supported by the CAC: Elizabeth Holzknecht and Shiva Lakshaman; and Cornell student Shen Wang for FRB_Pipeline contributions. The authors would also like to thank the anonymous referees for taking the time to review our work and provide feedback.

This work has been supported by the Cornell University Center for Advanced Computing and the Aristotle Cloud Federation project. This work is supported by the National Science Foundation under Grant Number OAC-1541215. Cloud credits supporting this research were provided via Amazon AWS Research Credits, the Google Research Cloud Program, and Microsoft Azure for Research.
REFERENCES
[1] Devansh Agarwal, Kshitij Aggarwal, Sarah Burke-Spolaor, Duncan R. Lorimer, and Nathaniel Garver-Daniels. 2019. Towards Deeper Neural Networks for Fast Radio Burst Detection. arXiv:1902.06343 [astro-ph.IM]
[2] Amazon Web Services Inc. 2020. Amazon Web Services (AWS). Retrieved October 23, 2019 from https://aws.amazon.com/
[3] Amazon Web Services Inc. 2020. Amazon Web Services (AWS) storage services. Retrieved October 23, 2019 from https://aws.amazon.com/products/storage/
[4] Andrew Jameson and Ben Barsdell. 2019. HEIMDALL: Transient Detection Pipeline. SourceForge. Retrieved February 13, 2020 from https://sourceforge.net/projects/heimdall-astro/
[5] Aristotle Cloud Federation. 2020. Aristotle Cloud Federation Science Use Cases. Retrieved February 16, 2020 from https://federatedcloud.org/science/index.php
[6] Berkeley SETI Research Center. 2018. Breakthrough Listen: 4-8 GHz Detections of FRB 121102. Retrieved January 28, 2020 from https://seti.berkeley.edu/frb121102/technical.html
[7] Berkeley SETI Research Center. 2018. Breakthrough Listen: Machine Learning Enables New Detections of FRB 121102. Retrieved January 28, 2020 from https://seti.berkeley.edu/frb-machine/technical.html
[8] Bruno Bzeznik, Oliver Henriot, Valentin Reis, Olivier Richard, and Laure Tavard. 2017. Nix As HPC Package Management System. In Proceedings of the Fourth International Workshop on HPC User Support Tools (Denver, CO, USA) (HUST'17). ACM, New York, NY, USA, Article 4, 6 pages. https://doi.org/10.1145/3152493.3152556
[9] Canonical Ltd. 2020. Linux Containers. Retrieved February 17, 2020 from https://linuxcontainers.org
[10] S. R. Carpenter, D. Ludwig, and W. A. Brock. 1999. Management of Eutrophication for Lakes Subject to Potentially Irreversible Change. Ecological Applications 9, 3 (1999), 751–771. https://doi.org/10.1890/1051-0761(1999)009[0751:MOEFLS]2.0.CO;2
[11] Ryan Chamberlain and Jennifer Schommer. 2014. Using Docker to Support Reproducible Research. DOI: https://doi.org/10.6084/m9.figshare
[12] The Astronomical Journal.
[13] Annual Review of Astronomy and Astrophysics 57, 1 (Aug 2019), 417–465. https://doi.org/10.1146/annurev-astro-091918-104501
[14] Docker Inc. 2020. Docker.
[15] Eelco Dolstra. 2006. The Purely Functional Software Deployment Model. Ph.D. Dissertation. Utrecht University.
[16] NixOS Foundation. 2020. Nix Package Manager. Retrieved October 22, 2019 from https://nixos.org/nix/
[17] NixOS Foundation. 2020. NixOS Manual. Retrieved February 06, 2020 from https://nixos.org/nixos/manual/
[18] Systemd-nspawn.
[19] The Astrophysical Journal.
[20] HashiCorp Inc. 2020. HashiCorp Configuration Language (HCL). Retrieved February 17, 2020 from https://github.com/hashicorp/hcl
[21] HashiCorp Inc. 2020. Terraform.
[22] Ansible.
[23] Asian Journal of Pharmaceutical and Clinical Research 10 (07 2017), 471. https://doi.org/10.22159/ajpcr.2017.v10s1.20519
[24] Julianne Quinn. 2019. Lake_Problem_DPS. Cornell University. Retrieved January 28, 2020 from https://github.com/julianneq/Lake_Problem_DPS
[25] Julianne Quinn and Peter Vaillancourt. 2019. Lake_Problem_DPS. Aristotle Cloud Federation. Retrieved January 28, 2020 from https://github.com/federatedcloud/Lake_Problem_DPS
[26] Joseph R. Kasprzyk, Shanthi Nataraj, Patrick M. Reed, and Robert J. Lempert. 2013. Many objective robust decision making for complex environmental systems undergoing change. Environmental Modelling & Software 42 (2013), 55–71. https://doi.org/10.1016/j.envsoft.2012.12.007
[27] Richard Knepper, Susan Mehringer, Adam Brazier, Brandon Barker, and Resa Reynolds. 2019. Red Cloud and Aristotle: Campus Clouds and Federations. In Proceedings of the Humans in the Loop: Enabling and Facilitating Research on Cloud Computing (Chicago, IL, USA) (HARC '19). Association for Computing Machinery, New York, NY, USA, Article 4, 6 pages. https://doi.org/10.1145/3355738.3355755
[28] Gregory M. Kurtzer, Vanessa Sochat, and Michael W. Bauer. 2017. Singularity: Scientific containers for mobility of compute. PLOS ONE 12, 5 (05 2017), 1–20. https://doi.org/10.1371/journal.pone.0177459
[29] Matthew Lebofsky, Steve Croft, Andrew P. V. Siemion, Danny C. Price, J. Emilio Enriquez, Howard Isaacson, David H. E. MacMahon, David Anderson, Bryan Brzycki, Jeff Cobb, Daniel Czech, David DeBoer, Julia DeMarines, Jamie Drew, Griffin Foster, Vishal Gajjar, Nectaria Gizani, Greg Hellbourg, Eric J. Korpela, Brian Lacki, Sofia Sheikh, Dan Werthimer, Pete Worden, Alex Yu, and Yunfan Gerry Zhang. 2019. The Breakthrough Listen Search for Intelligent Life: Public Data, Formats, Reduction, and Archiving. Publications of the Astronomical Society of the Pacific.
[30] Google LLC. Google Cloud. Retrieved October 23, 2019 from https://cloud.google.com/
[31] D. R. Lorimer, M. Bailes, M. A. McLaughlin, D. J. Narkevic, and F. Crawford. 2007. A Bright Millisecond Radio Burst of Extragalactic Origin. Science.
[32] Microsoft. Microsoft Azure Cloud Computing Service. Retrieved May 18, 2020 from https://azure.microsoft.com/en-us/
[33] NANOGrav. 2020. North American Nanohertz Observatory for Gravitational Waves (NANOGrav). Retrieved January 28, 2020 from http://nanograv.org/
[34] Nate Garver. 2019. nanopulsar Docker Container. NANOGrav. Retrieved January 28, 2020 from https://github.com/nanograv/nanopulsar
[35] National Center for Atmospheric Research (NCAR). 2019. WRF_DOCKER. Retrieved February 16, 2020 from https://github.com/NCAR/WRF_DOCKER/
[36] Peter Vaillancourt and Adam Brazier. 2019. modulation_index Docker container. Aristotle Cloud Federation. Retrieved January 28, 2020 from https://github.com/federatedcloud/modulation_index/tree/master/docker
[37] Peter Vaillancourt and Nate Garver. 2019. nanopulsar Docker container updated. Aristotle Cloud Federation. Retrieved January 28, 2020 from https://github.com/federatedcloud/nanopulsar
[38] Peter Vaillancourt, Plato Deliyannis, and Akshay Suresh. 2020. FRB_Pipeline. Aristotle Cloud Federation. Retrieved January 28, 2020 from https://github.com/federatedcloud/FRB_pipeline
[39] Poul-Henning Kamp and Robert N. M. Watson. 2000. Jails: Confining the omnipotent root.
[40] In Proceedings of the 18th USENIX Conference on System Administration (Atlanta, GA) (LISA '04). USENIX Association, USA, 241–254.
[41] Julianne D. Quinn, Patrick M. Reed, and Klaus Keller. 2017. Direct policy search for robust multi-objective management of deeply uncertain socio-ecological tipping points. Environmental Modelling & Software 92 (2017), 125–141. https://doi.org/10.1016/j.envsoft.2017.02.017
[42] Scott Mitchell Ransom. 2001. New search techniques for binary pulsars. Ph.D. Dissertation. Harvard University.
[43] Michael Rosenstein and Andrew Barto. 2001. Robot Weightlifting By Direct Policy Search.
[44] L. G. Spitler, J. M. Cordes, J. W. T. Hessels, D. R. Lorimer, M. A. McLaughlin, S. Chatterjee, F. Crawford, J. S. Deneva, V. M. Kaspi, R. S. Wharton, et al. 2014. Fast Radio Burst Discovered in the Arecibo Pulsar ALFA Survey. The Astrophysical Journal.
[45] Sylabs. 2020. Running Services - Singularity Container 3.0 Documentation. https://sylabs.io/guides/3.0/user-guide/running_services.html [Online; accessed 13-Jan-2020].
[46] WRF Release Committee. 2020. WRF.