Automated System Performance Testing at MongoDB
Henrik Ingo, [email protected], MongoDB Inc
David Daly, [email protected], MongoDB Inc
Figure 1: Timeseries of daily build results for YCSB load
ABSTRACT
Distributed Systems Infrastructure (DSI) is MongoDB's framework for running fully automated system performance tests in our Continuous Integration (CI) environment. To run in CI it needs to automate everything end-to-end: provisioning and deploying multi-node clusters, executing tests, tuning the system for repeatable results, and collecting and analyzing the results. Today DSI is MongoDB's most used and most useful performance testing tool. It runs almost 200 different benchmarks in daily CI, and we also use it for manual performance investigations. During the MongoDB 4.2 development cycle, DSI caught 63 performance regressions; as we can alert the responsible engineer in a timely fashion, all but one of the major regressions were fixed before the 4.2.0 release. DSI also caught 17 net new performance improvements. We open sourced DSI in March 2020.
CCS CONCEPTS
• Software and its engineering → Acceptance testing; • General and reference → Performance; Validation; • Information systems → Parallel and distributed DBMSs; • Computer systems organization → Cloud computing.
KEYWORDS
Databases, Distributed Databases, Testing, Performance, MongoDB, Python, Cloud
ACM Reference Format:
Henrik Ingo and David Daly. 2020. Automated System Performance Testing at MongoDB. ACM, New York, NY, USA, 6 pages. https://doi.org/.../...
1 INTRODUCTION

MongoDB created a dedicated performance testing team in 2013. The first approach was to build a team of systems engineers and performance experts that would manually benchmark new releases of MongoDB. This approach was found to be problematic: it was not scalable, and it also had issues with repeatability.

The focus then shifted to running system performance tests in a daily Continuous Integration (CI) build, just as is done for any
other testing. But where all our other tests can run on a single server, the system performance builds needed to deploy realistic MongoDB clusters in EC2. Initially this was accomplished by moving some shell scripts and Terraform [9] config files from the manual benchmarking into the CI configuration.

From these humble beginnings arose a software development project to design and rewrite the same system more properly in Python. The framework was called
Distributed Systems Infrastructure, or DSI for short.

In this paper, we present the architecture and design of this framework and results from using it to benchmark MongoDB. DSI is used in MongoDB to run hundreds of automated benchmarks per day. The benchmarks in CI are fully automated end-to-end: provisioning EC2 infrastructure and deploying the MongoDB cluster, orchestrating test execution, and collecting and analyzing results. It is also used by engineers to create new tests, reproduce results from CI, and for manual "explorative" benchmarking. Since 2018, DSI has been our most used and most useful performance testing tool. We open sourced it in March 2020 [1].
2 REQUIREMENTS

2.1 Design Goals
The focus for DSI was serving the more complex requirements of end-to-end system performance tests on real clusters, automating every step including provisioning of hardware, and generating consistent, repeatable results. More specifically, the high-level goals of the project were:

(1) Full end-to-end automation.
(2) Support both CI and manual benchmarking.
(3) Elastic, public cloud infrastructure.
(4) Everything configurable.
(5) All configuration is via a single, consistent, YAML based system.
(6) Clear modularity for executables and configuration.
(7) Diagnosability.
(8) Repeatability.

The main design goal was to move from a model where each new configuration required a new shell script, to a model where everything was driven by configuration files and no code changes would be needed to add a new MongoDB configuration or test.

Further, even for parts that had configuration files, it was becoming tedious to deal with so many of them. Terraform has JSON-like configuration files. MongoDB configuration files are in YAML. YCSB [10] configuration is a Java properties file. We wanted all configuration to be in one place and in a homogeneous format so that a user could easily understand the whole configuration.

We also wanted to avoid redundancy in configuration. When Terraform needs an SSH key to connect to the hosts, and when we later use SFTP to collect log files, both of these operations should use the same SSH key and it should exist in the configuration only once.

We wanted maximum flexibility. The framework needed to support any infrastructure, any possible MongoDB configuration option, and any cluster topology, including deploying more than a single cluster, and needed to be capable of executing any benchmark software.

The configuration also needed to be modular. It must be possible to execute the same YCSB test against different MongoDB configurations, and it must be possible to use the same MongoDB configuration on different hardware.

We also needed to be able to document and trace every bit of configuration. A common organizational setup is a separate team that develops and deploys the operating system images. If that team changes the image without the developers or performance experts noticing, and such changes lead to performance changes, then much time can be wasted as engineers futilely try to diagnose a change they do not have visibility into. We use vanilla OS images available in EC2. All operating system configuration is done by scripts that are part of DSI. This means that the entire configuration, and changes to it, can be reviewed from a single commit history.

While developing DSI, we learned that our system performance testing also had issues with repeatability of test results. So as a parallel project we learned to configure EC2, Linux, and the tests themselves to minimize system noise. We have reported on this work previously in [15]. DSI encapsulates the results of that work in its configuration. When MongoDB engineers use DSI for performance testing, they automatically deploy systems configured for minimal noise and maximum repeatability.
2.2 Elastic Cloud Infrastructure
In 2016 MongoDB was already heavily invested in using public cloud infrastructure for testing and CI. We had even developed our own CI system, Evergreen [13], to replace Jenkins [18] with a system where everything was built from the ground up to use elastic cloud resources. As commits tend to happen during daytime, the need for CI builds varies over the day. To minimize the turnaround time from commit to a completed CI build, we try to parallelize test execution as much as possible.

For system performance testing, the elasticity of cloud infrastructure is even more valuable. We want to test realistic clusters. A weekly build over a cluster with 3 shards has 16 servers. To test scaling to the largest instance sizes we have occasionally tested servers with 96 CPUs or 500 GB of RAM. To procure such servers as on-premise hardware would not be realistic.

At the start of the project, an open question was whether public cloud infrastructure could be relied on for performance testing at all. When comparing on-premise and cloud hardware, we found no difference in terms of repeatable performance [15].
2.3 Continuous Integration
The primary use case is to run daily CI builds on public cloud servers. This requires full end-to-end automation.

A challenge in diagnosing regressions and test errors is that by the time a human is looking at the results, the servers are no longer available. For example, we once had a failure where the storage engine aborted during a test. A possible reason for the abort could simply be that the disk was full. But we had no way to know whether the disk had been full.

At the end of the test, DSI therefore collects all the log files back to the control host before the test cluster is terminated. The results are then uploaded to S3. Around 2017 the MongoDB server had itself developed rather sophisticated metrics collection: Full Time Diagnostic Data Capture [3] collects over a thousand metrics, including system-level metrics like CPU, disk, network, and memory utilization. All we needed to do was to save those files. To augment these metrics, we also borrowed a script used by our support team [4] which collects additional information from the host, such as how full the disks are.

An important question when diagnosing test results is to verify exactly what configuration was used for the test. Therefore the DSI input configuration is added to the same archive file as the logs and test results. This configuration can also be used to reproduce the regression if required.
2.4 Manual Use
In addition to running benchmarks in Evergreen, DSI also needs to be usable on its own. This is important both for engineers developing DSI itself, and for MongoDB engineers who need to reproduce and diagnose performance issues assigned to them. This requirement may seem obvious, but at the start of the project, the shell script based system was so entangled with the Evergreen configuration file that it had become practically impossible to reproduce tests outside of Evergreen. It was faster to submit a new CI job than to try to execute the same test manually!

As these system-level tests are complex and have many parameters, commands that did support command line options could often stretch up to 3 lines. We felt this was poor usability, and in fact went against the goal of easy reproducibility of tests, as it was easy to miss some option. As a result, we decided to ban command line options completely and force all options into configuration files. This also ensured a full audit trail, since configuration files would be archived after each test, while command line options may not be recorded anywhere.

The Evergreen configuration for MongoDB system performance tests is stored in the MongoDB source code repository [2], while DSI itself is its own repository. This often led to situations where a change had to be committed partly to DSI and partly to MongoDB, and in the latter case often to 3 different stable branches too. This added both unnecessary complexity and stress. A high-priority goal was for the interface between the two repositories to be as minimal and stable as possible.
3 ARCHITECTURE

Based on experience from the existing system, we could identify eight separate modules that together constitute a full execution of one or more benchmarks.
Bootstrap.
As the system began to take shape, we realized that automating some repetitive setup tasks from the beginning made sense: copying the configuration files into place, finding EC2 credentials, and installing the correct version of Terraform. This evolved into the bootstrap module.

The bootstrap.yml configuration file acts as the interface between the MongoDB repository and the DSI repository. It declares the composition of configuration files wanted for all the other modules, for example: YCSB on a three-node replica set. Those other configuration files are stored inside the DSI repository. Thus, the details of each configuration are abstracted away from the MongoDB repository.
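For illustration, a bootstrap.yml selecting YCSB on a three-node replica set might look roughly like the following sketch (the keys name DSI modules as described in this section, but the exact values are hypothetical):

    # bootstrap.yml: each value names a canned configuration file in the
    # DSI repository to use for the corresponding module.
    infrastructure_provisioning: replica    # EC2 resources for three mongod hosts
    mongodb_setup: replica                  # deploy a three-node replica set
    workload_setup: ycsb                    # install Java and the YCSB client
    test_control: ycsb                      # the YCSB workloads to run

Changing a single value swaps in a different canned configuration, which is what keeps the MongoDB repository's view of DSI minimal and stable.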
Infrastructure provisioning.
Infrastructure provisioning uses Terraform to deploy EC2 resources. The DSI configuration, in YAML format, is significantly simpler than the Terraform files used behind the scenes. The module also ensures that if the deployment fails, the Terraform destroy command is called to clean up any half-deployed resources.
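A minimal sketch of this cleanup guarantee, using Terraform's standard apply and destroy subcommands (the function and file names here are ours, not DSI's):

    # provision.py: call "terraform apply"; on any failure, tear down
    # whatever was partially created before propagating the error.
    import subprocess

    def provision(terraform_dir):
        try:
            subprocess.run(
                ["terraform", "apply", "-auto-approve"],
                cwd=terraform_dir, check=True,
            )
        except subprocess.CalledProcessError:
            # Clean up half-deployed EC2 resources so nothing leaks.
            subprocess.run(
                ["terraform", "destroy", "-auto-approve"],
                cwd=terraform_dir, check=False,
            )
            raise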
System setup.
As of this writing, system setup has never been implemented as its own top-level executable. It remains a "remote script" executed by Terraform. One drawback is that it is therefore unaware of the rich DSI configuration files and essentially installs the same software and formats the same disks each time.
Workload setup.
This module installs dependencies for the specific test, such as Java for YCSB [10] and Linkbench [11]. An interesting question was whether workload setup should happen before or after MongoDB setup. Some tests need to provision a data directory, and this must be done before MongoDB starts. Other tests need to specify a shard key, and this must be done against a running MongoDB cluster. In the end we came to the conclusion that workload_setup runs first, and the test control module offers pre_task and pre_test hooks to cover the second use case, as sketched below.
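As an illustration of the second case, a pre_task hook in test_control.yml could shard a collection once the cluster is up. This is a sketch in the style of Listing 2; the hook key names are simplified and the script contents hypothetical:

    # test_control.yml fragment: before any test in the task runs,
    # execute a mongo shell script against the freshly deployed cluster.
    pre_task:
      - on_workload_client:
          exec_mongo_shell:
            connection_string: ${mongodb_setup.meta.hosts}
            script: |
              sh.enableSharding("ycsb");
              sh.shardCollection("ycsb.usertable", {_id: "hashed"});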
MongoDB setup.
MongoDB setup deploys one or more MongoDB clusters.
Test control.
The test control module supports several in-house and third-party benchmark tools. In addition to running the benchmark itself, it supports the aforementioned pre_task and pre_test hooks. After the test it collects a number of log files and diagnostics back to the control host.
Analysis.
Analysis of results comes in two parts. Our main interest is to determine whether benchmark results deviate from some recent history of results, and secondarily from past releases. We eventually concluded that this question is better answered by signal processing algorithms that look at our daily results holistically as a timeseries, rather than focusing on the single point in time that is the currently finished benchmark result. We have reported on this work in [12]. These algorithms are developed and executed separately from DSI.

Some static checks remain in DSI: does the MongoDB server log file contain errors or stack traces, are there core files on the cluster hosts, etc.? The scope of DSI is therefore everything that concerns a single system performance task execution. Anything that requires historical data as input is in the signal processing project.
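A sketch of what such static checks can look like, as our own simplification rather than DSI's actual rules: scan the collected server log for error-level lines and look for core dumps among the collected artifacts.

    # static_checks.py: simplified post-test checks over collected artifacts.
    from pathlib import Path

    def check_mongod_log(log_file):
        """Return log lines that look like errors or fatal assertions."""
        suspicious = []
        for line in Path(log_file).read_text(errors="replace").splitlines():
            if " E " in line or " F " in line or "assert" in line.lower():
                suspicious.append(line)
        return suspicious

    def find_core_files(host_dir):
        """Core dumps on a cluster host indicate a crashed process."""
        return sorted(Path(host_dir).glob("**/core*"))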
Infrastructure teardown.
Infrastructure teardown uses Terraform to release the cloud resources.

The modules should be executed in the above order. To allow for flexibility and modularity, it is however allowed to skip some. For example, a user who wants to test an existing MongoDB cluster could point test_control directly at it, as the sketch below illustrates.
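Conceptually, a full run is an ordered pipeline in which stages may be skipped. The following is a sketch under our own naming; in DSI each module is its own executable rather than a single runner:

    # pipeline.py: the eight modules in their execution order. A user who
    # already has a cluster could skip straight to test_control.
    MODULES = [
        "bootstrap",
        "infrastructure_provisioning",
        "system_setup",
        "workload_setup",
        "mongodb_setup",
        "test_control",
        "analysis",
        "infrastructure_teardown",
    ]

    def run_pipeline(skip=()):
        """Run modules in order, honoring an optional skip list."""
        for module in MODULES:
            if module in skip:
                continue
            print(f"running {module}")  # placeholder: invoke the module here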
4 IMPLEMENTATION

The analysis code was already written in Python. We ended up writing the other modules in Python as well. Maybe it was mostly a function of gravity, but we have found Python a rather suitable language for this task. The combination of being a real programming language and yet a scripting language that does not require compilation has afforded a lot of flexibility during the active development phase.
4.1 Configuration
The configuration files are the heart of DSI. Since we do not allow command line options at all, the configuration files are essentially the user interface. The docs/ directory in the source code [1] documents all the available configuration options. The configurations/ directory contains a growing selection of canned configurations, including all configurations used by daily CI builds.

As mentioned in the design goals (see 2.1), ideally we wanted all configuration in one big file, but modularity requires that users can mix and match different configurations. The actual implementation has one YAML configuration file for each module: infrastructure_provisioning.yml, workload_setup.yml, etc.

To ensure that all modules read the configuration in a consistent way, a library is used to access the configuration. The library reads all the YAML files into one big Python dictionary structure, so from the DSI developer's perspective it truly is as if all configuration was provided in a single big YAML file. The same library also provides various additional functionality. All defaults are centralized in a defaults.yml file, from which a default value is transparently returned when the actual config file does not specify a value. Since YAML anchors cannot be used between different YAML files, the library also provides a ${variable.reference} syntax to reference one configuration value from another part of the configuration.

Listings 1 and 2 show two simplified configuration files. The keys mongod_config_file and workload_config embed literal MongoDB and YCSB configuration files. When executing the benchmark, DSI will extract these into their own files, which are used as input to mongod and ycsb respectively. Since we do not know the IP address of an EC2 host ahead of time, the infrastructure_provisioning module stores that information in a special .out.yml file.
Listing 1: mongodb_setup.yml

    mongod_config_file:
      storage:
        engine: wiredTiger
      replication:
        replSetName: rs0
    topology:
      - cluster_type: replset
        id: rs0
        mongod:
          - public_ip: ${infrastructure_provisioning.out.mongod.0.public_ip}
          - public_ip: ${infrastructure_provisioning.out.mongod.1.public_ip}
          - public_ip: ${infrastructure_provisioning.out.mongod.2.public_ip}
    meta:
      hosts: ${mongodb_setup.topology.0.mongod.0.private_ip}:27017
      hostname: ${mongodb_setup.topology.0.mongod.0.private_ip}
      mongodb_url: mongodb://${mongodb_setup.meta.hosts}/test?replicaSet=rs0
      is_replset: true

Listing 2: test_control.yml

    run:
      - id: ycsb_load
        type: ycsb
        cmd: ./bin/ycsb load mongodb -s -P ../../workloadEvergreen -threads 8
        config_filename: workloadEvergreen
        workload_config: |
          mongodb.url=${mongodb_setup.meta.mongodb_url}
          recordcount=5000000
          workload=com.yahoo.ycsb.workloads.CoreWorkload
      - id: ycsb_100read
        type: ycsb
        cmd: ./bin/ycsb run mongodb -s -P ../../workloadEvergreen_100read -threads 32
        config_filename: workloadEvergreen_100read
        workload_config: |
          mongodb.url=${mongodb_setup.meta.mongodb_url}
          recordcount=5000000
          maxexecutiontime=240
          workload=com.yahoo.ycsb.workloads.CoreWorkload
          readproportion=1.0

The MongoDB topology configuration uses those IP addresses via references. The config library resolves these automatically.

The meta section facilitates the modular mixing of config files. Since the MongoDB URL depends on the type of MongoDB cluster, the URL to use is provided together with the MongoDB configuration. The YCSB configuration then uses the provided URL via reference.

The system also supports a variety of hooks for setup and post-processing. For example, a pre_cluster_start hook can be used to download and provision database files before starting MongoDB, and a pre_task hook might create indexes or shard collections before the test starts.

For a consistent experience, all configuration is in YAML. Terraform and the MongoDB Atlas API (Atlas is MongoDB's Database-as-a-Service product) expect JSON syntax as input; DSI transparently transforms the YAML configuration for those components to JSON.
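To make the mechanics concrete, the following is a minimal sketch of such a config library. It is our own simplification, assuming PyYAML; the real implementation also layers in defaults.yml and writes the embedded files out to disk:

    # config.py: merge per-module YAML files into one dictionary and
    # resolve ${a.b.0.c}-style references across module boundaries.
    import re
    import yaml  # PyYAML

    REF = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")

    def load_config(module_names):
        """Read one YAML file per module into a single dict keyed by module."""
        config = {}
        for name in module_names:
            with open(name + ".yml") as f:
                config[name] = yaml.safe_load(f)
        return config

    def lookup(config, dotted_path):
        """Walk a path like 'mongodb_setup.topology.0.mongod.0.public_ip'."""
        node = config
        for part in dotted_path.split("."):
            node = node[int(part)] if part.isdigit() else node[part]
        return node

    def resolve(config, value):
        """Recursively expand ${...} references found in a string value."""
        while isinstance(value, str) and REF.search(value):
            value = REF.sub(
                lambda m: str(resolve(config, lookup(config, m.group(1)))),
                value,
            )
        return value

With the listings above loaded, resolve(config, "${mongodb_setup.meta.mongodb_url}") would expand, step by step, into a full mongodb:// connection string pointing at the first replica set member.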
4.2 Terraform
We have been early adopters of Terraform [9]. Generally it has served us well, but some details have required workarounds along the way. Early on, Terraform had a hard limit on how many servers it could provision at once. As a workaround we needed to deploy our sharded cluster in two separate steps. Another problem was that the plugin that configures AWS Virtual Private Cloud (VPC) networks lacked dependency information. This caused server deployments to fail when they tried to use a VPC that did not exist yet.

Newer Terraform versions almost never supported the configuration files from the older versions. This caused us to upgrade only rarely, as an upgrade was often days of work.

Originally we reused clusters between Evergreen tasks. This was implemented because EC2 billing was hourly, and we wanted to avoid situations where we would have to pay double for an hour due to terminating and instantly re-deploying the same cluster. As EC2 moved to per-minute billing, we wanted to remove this code path, which is used in CI only and therefore hard to test and easy to forget. Yet doing this caused
InsufficientInstanceCapacity errors from EC2. It may have been related to some special EC2 configurations we use: either dedicated instances or placement groups. For whatever reason, it was not possible to terminate a cluster of N nodes and then expect to immediately start the same N nodes again. Recently we revisited this behavior, and it seems it is no longer an issue, so we were finally able to remove the re-use of clusters.
4.3 Human Challenges
Many challenges are human rather than technical.

DSI takes a laissez-faire attitude to all the configuration. DSI's role is pass-through in nature: YAML is turned into a Python dictionary structure, part of which is passed as input to mongod and ycsb. If there is an error in the configuration, then at this point mongod or ycsb will print an error and fail. Trying to duplicate such an error checking procedure in DSI would be futile: MongoDB supports hundreds of configuration options and new ones are added every year.

Yet software engineers are trained to validate user input. Teaching a software team not to do so is surprisingly hard. Early on, a module had been written in Go, a strongly typed language. Each time we wanted to pass a new option to the test, we had to add an attribute to the Go class that held the JSON input. This was a huge impediment until the Go code was replaced by Python.

A similar struggle has been around the issue of providing defaults from a centralized defaults.yml file. The instinct to always provide a default in the code — such as by sprinkling Python constants around the Python code — is strong among well educated software engineers. Accepting that there is a default, it's just provided by a library and therefore not visible in the lines of code being written at this moment, can be hard. Yet in the DSI architecture, providing default values in the code can be considered an error. Since different modules access the same configuration, it is possible (and encouraged) that two independent code paths access the same configuration option. If both of them specify default values in the Python code, there is a chance that they specify different defaults, which would be inconsistent and a bug.

A final observation around challenges of the human mind has been a recurring tendency to overestimate the importance of whatever small feature an engineer is currently working on. This recency bias causes engineers to add new options to bootstrap.yml — the top-level configuration file end users look at first — and to document the option on the first page of the user manual. But DSI has hundreds of configuration options, and putting all of them at the top level is not scalable. Moving options from the bootstrap.yml config file further "down" into other, more topical sections of the configuration remains a regular occupation for the authors.

Figure 2: Results for a YCSB test in the Evergreen CI environment.
5 RESULTS

Figure 2 shows the Evergreen result page of a task where DSI has executed the YCSB benchmark. The benchmark results are shown as a yellow dot on the graphs at the bottom. The graph as a whole is the timeseries of daily build results. Bolded lines are statistically significant changes highlighted by the signal processing algorithm, and the green diamonds link to associated Jira tickets. The investment in running performance tests daily in CI has paid off: a major regression has been found and fixed as part of the normal engineering process, instead of waiting until a release candidate is released and only finding issues then. The top right corner shows fail/pass status of the static checks done by the DSI analysis module.

As the development of DSI proceeded and its usability improved, the system performance benchmarking project in Evergreen has become the primary target for MongoDB engineers to write performance tests for new features. As of March 2020 we have almost 200 tests and 20 MongoDB configurations in the DSI repository, most of which run once per day. This is on top of the single-node microbenchmarks that were in use before DSI, and also in addition to the Google Benchmark C++ unit tests, which run several times per day.

The goal of developing a flexible system that could be used to test arbitrary MongoDB configurations was achieved. An early validation of the design was the addition of a test to measure the performance of the initial synchronization of data files when a new node is added to a replica set. The test required one of the MongoDB nodes to be deployed detached from the replica set and in an uninitialized state. This abnormal setup was possible with a configuration change only. Similar small victories, such as using arbiter nodes in a replica set, have followed since.

While we have focused our testing on a single platform (EC2 and Amazon Linux), the goal was for DSI to be able to support other hardware infrastructure as well. This was finally proven recently when we pointed a test_control.yml file at servers running in Azure. A similar triumph of modular design was to replace the deployment of a MongoDB cluster with an HTTP call to deploy an Atlas cluster instead. By replacing mongodb_setup entirely, DSI could be used to test database products other than MongoDB. Similarly, to add more infrastructure options, such as Kubernetes, we would add support for them in the infrastructure_provisioning module.

Reproducing regressions caught by the CI system is straightforward. An engineer can simply download the "DSI Artifacts" tar file from the Evergreen result page (see Figure 2), untar the package, cd into the directory, and execute the sequence of DSI commands (infrastructure_provisioning, ...) to reproduce the exact same steps as were executed in CI.

During the MongoDB 4.2 development cycle the system performance CI builds caught 63 regressions, far beyond the microbenchmarks project at 20. But more importantly, the MongoDB server engineers are now committed and able to reproduce and investigate the regressions from CI. As a result, all but one of the major performance regressions had been fixed before the stable release. In addition, we are also able to track net new improvements in performance, of which 17 were caught by DSI tests. The major improvements were chronicled by marketing in a blog post [19].
6 RELATED WORK

At the time we started developing DSI, we were not aware of any similar end-to-end tools for performance testing of distributed databases — and certainly not one supporting MongoDB.

At first sight, it still seems that not much has been published on the topic of automating system-level performance benchmarking. An illustrative example is the CockroachDB documentation, which includes a guide for benchmarking CockroachDB with TPC-C [5]. It directs the user to start by manually deploying a CockroachDB cluster: "Repeat steps 1 – 3 for the other 80 VMs for CockroachDB nodes".
In reality this is not how Cockroach Labs engineers test CockroachDB clusters; rather, the GitHub repository includes a DSI-like pair of tools, roachprod and roachtest [7].

Mikael Ronström's book "MySQL Cluster 7.5 inside and out" includes a whole chapter on bench_run.sh, which supports several benchmark clients used to benchmark MySQL NDB Cluster [17]. It requires the user to have downloaded the necessary software to the control host, and expects the servers to exist, but from this starting point automates deployment of the MySQL software and execution of the test.

Scylla Cluster Tests (SCT) [8] is an end-to-end framework very similar to DSI. It supports the use of a "test oracle", a reference database that executes the same tests and is assumed to always produce the correct response. The test oracle could be a previous stable release.

Rally is the benchmarking tool for ElasticSearch [6]. It is installable as a Python module via the pip tool, and comes with an extensive user manual. Similar to SCT, it supports benchmarking different ElasticSearch releases and comparing the results. Rally does not deploy any cloud resources; rather, it expects cluster hosts to already exist.

The above approach of comparing a new version in parallel to a "known good" release seems to be common. While DSI does not support that as a first-class feature, we should note that we do similar comparisons of the development branch results against recent stable releases. However, we consider these monthly and annual reviews a stopgap measure only. We feel that in a daily CI oriented workflow, focusing on comparing results against a recent history helps us catch regressions as they happen.

SAP HANA uses a more CI-oriented workflow for performance testing, very similar to ours [16]. At MongoDB we have recently moved to a process where merges to the master source code branch go through an automated CI gatekeeper. However, we have yet to add performance tests to this automated gatekeeping process; rather, engineers must submit their changes for performance testing on an opt-in basis. SAP is using performance tests as part of their gatekeeping step. SAP also reports resource consumption, such as CPU utilization, as part of their test results. They note that only a human evaluator can judge whether increased CPU utilization is a positive or negative change.

Outside of the database space, BenchFlow [14] provides a very DSI-like end-to-end automation for benchmarking web services.
7 CONCLUSION

Distributed Systems Infrastructure (DSI) is MongoDB's framework for running fully automated system performance tests in our CI environment. Automating deployment of real multi-node clusters, executing tests, tuning the system for repeatable results, and then collecting and analyzing the results is a hard problem, and it took us 3 attempts and 6 years to get it right. The open sourced DSI project is the result of those efforts.

Today DSI is MongoDB's most used and most useful performance testing tool. It runs almost 200 different benchmarks in daily CI, and we also use it for manual performance investigations. During the MongoDB 4.2 development cycle, DSI caught 63 regressions. As we can alert the responsible engineer in a timely fashion, all but one of the non-trivial regressions were fixed before the 4.2.0 release. We are also able to detect net new improvements, of which DSI caught 17.

DSI presents some novel design choices. For example, we have banned the use of command line options completely. By forcing all configuration into configuration files, we get an audit trail of each test execution, and engineers are also easily able to reproduce the exact same execution. Another uncommon choice is that we use vanilla images when deploying servers instead of building our own custom image. This means that all configuration code is in the DSI repository.
ACKNOWLEDGMENTS
Chung-Yen Chang and Rui Zhang were with us in the original Performance team that started work on a properly designed Python based version. Jim O'Leary, Ryan Timmons, and Julian Edwards were part of the team that wrote most of the code that is in DSI today. Several interns and new grads contributed major components while visiting the team: Shane Harvey created mongodb_setup.py and the SSH plumbing. Ryan Chipman created bootstrap.py, and Blake Oler the key feature to restart MongoDB between tests. Severyn Kozak, William Brown, Kathy Chen, and Pia Kochar worked on rules in analysis.py and test configurations. After the active development phase completed, maintainership of DSI has been with Max Hirschorn, Ryan Timmons, Robert Guo, and Raiden Worley, while we have taken the role of end users using DSI to investigate MongoDB performance. It was Ryan Timmons who never gave up the hope of one day open sourcing DSI, and he played an active role in facilitating merges of the remaining cleanup work to make that hope a reality.

Dan Pasette and Ian Whalen acted as project leads during transition phases between the above team compositions, and overall deserve thanks for not giving up on the project during a long but necessary rewrite. Cristopher Stauffer and April Schoffer have provided similar project oversight in recent years.

We thank Eoin Brazil and the anonymous reviewers for valuable feedback on this article, helping us to present our work more clearly.
REFERENCES
[12] David Daly, William Brown, Henrik Ingo, Jim O'Leary, and David Bradford. 2020. The Use of Change Point Detection to Identify Software Performance Regressions in a Continuous Integration System. In ICPE '20, April 20–24, 2020, Edmonton, AB, Canada. ACM. https://doi.org/10.1145/3358960.3375791
[13] Kyle Erf. 2016. Evergreen Continuous Integration: Why We Reinvented The Wheel. Blog post. https://engineering.mongodb.com/post/evergreen-continuous-integration-why-we-reinvented-the-wheel
[14] Vincenzo Ferme and Cesare Pautasso. 2017. Towards Holistic Continuous Software Performance Assessment. 159–164. https://doi.org/10.1145/3053600.3053636
[15] Henrik Ingo and David Daly. 2019. Reducing variability in performance tests on EC2: Setup and Key Results. Blog post. https://engineering.mongodb.com/post/reducing-variability-in-performance-tests-on-ec2-setup-and-key-results
[16] Kim-Thomas Rehmann, Changyun Seo, Dongwon Hwang, Binh Than Truong, Alexander Boehm, and Dong Hun Lee. 2016. Performance Monitoring in SAP HANA's Continuous Integration Process. SIGMETRICS Perform. Eval. Rev.
[17] Mikael Ronström. MySQL Cluster 7.5 inside and out. Books on Demand, Chapter 71: DBT2-0.37.50 Benchmark Scripts, 586–609. https://drive.google.com/file/d/1z72GCsHudw1X4Z4RZM869mxDViz0fxrB/view
[18] John Ferguson Smart. 2011. Jenkins: The Definitive Guide. O'Reilly Media.