Best Practices in Statistical Computing

Abstract

The world is becoming increasingly complex, both in terms of the rich sources of data we have access to and in terms of the statistical and computational methods we can use on those data. These factors create an ever-increasing risk of errors in our code and of findings that are sensitive to data preparation and to the execution of complex statistical and computing methods. The consequences of coding and data mistakes can be substantial. Openness (e.g., providing others with data and code) and transparency (e.g., requiring that data processing and code follow standards) are two key solutions to help alleviate concerns about replicability and errors. In this paper, we describe the key steps for implementing a code quality assurance (QA) process for researchers to follow to improve their coding practices throughout a project to assure the quality of the final data, code, analyses and ultimately the results. These steps include: (i) adherence to principles for code writing and style that follow best practices; (ii) clear written documentation that describes code, workflow and key analytic decisions; (iii) careful version control; (iv) good data management; and (v) regular testing and review. Following all these steps will greatly improve the ability of a study to assure results are accurate and reproducible. The responsibility for code QA falls not only on individual researchers but also on institutions, journals, and funding agencies.

Best Practices in Scientific Computing

Ricardo Sanchez
RAND Corporation, New Orleans, LA 70130

Beth Ann Griffin
RAND Corporation, Arlington, VA 22202

Daniel McCaffrey
Educational Testing Service, Princeton, NJ 08541

Author Footnote:

Ricardo Sanchez is a Research Software Engineer at RAND Corporation, New Orleans, LA 70130 (email: [email protected]); Beth Ann Griffin is a Senior Statistician at RAND Corporation, Arlington, VA 22202 (email: [email protected]); Daniel McCaffrey is a Principal Research Scientist at Educational Testing Service, Princeton, NJ 08541 (email: [email protected]).

Acknowledgments: The development of this manuscript was supported by funding from grant R01DA045049 from the National Institute on Drug Abuse.

Keywords: Data management, programming, code review, transparency, replicability

Introduction

The Lancet.

One involved the examination of the impact of hydroxychloroquine in COVID-19 patients and reported increased risks of in-hospital mortality for hospitalized patients. The paper was retracted after publication when an attempt at replication could not obtain the data and it was revealed that the study had used only a subset of the data for the analyses reported in the journal. Moreover, the subset was selected by the data vendor and not communicated to all the study researchers. The paper was released in the midst of a pandemic and potentially impacted care for critically ill patients. The second well-known case of a data error resulting in the retraction of a paper is the infamous vaccine study that erroneously linked the measles, mumps and rubella (MMR) vaccine to autism. The results of this study contributed to an ongoing and relatively widespread misbelief about the dangers of the vaccine, even though multiple papers have refuted any such connection. These examples appear to be just the tip of the iceberg. In fact, an analysis of 423 PubMed retraction notices revealed that 18.9% were the result of “analytic errors” related to code and data, and that such errors are increasing in frequency. Beyond data errors that can arise from poorly managed and documented data processing, there is a widely acknowledged and ongoing crisis in the scientific literature where published results fail to replicate.

The replication crisis in medicine is of particular concern for both patients and providers. As noted in Ioannidis (2005): “Of 49 highly cited original clinical research studies, 45 claimed that the intervention was effective. Of these, 7 (16%) were contradicted by subsequent studies, 7 others (16%) had found effects that were stronger than those of subsequent studies, 20 (44%) were replicated and 11 (24%) remained largely unchallenged.” The study noted (as expected) that highly cited nonrandomized studies are more likely to be contradicted (5 out of the 6 considered) than randomized controlled trials (9 out of 39). There are various theories as to why scientists are unable to reproduce the results of each other’s work. When surveyed, scientists suggest that the most likely causes are the pressure to publish and selective reporting. Others have blamed poor statistical practices, such as p-hacking. Another theory lays blame on the failure to share methods in sufficient detail for others to replicate the study or verify the results.

Openness is one proposed solution both to minimizing coding and analytic mistakes and to the replication problem. This means providing others with the data and code used in a scientific analysis. It stands to reason that having access to the source materials would aid the peer review process and attempts to independently reproduce results.

Nature, for example, requires that materials, data, code, and protocols be made available. When this is not possible, any restrictions or exclusions must be fully disclosed. The American Economic Association also requires that data and code be clearly documented and that the authors provide non-exclusive access to the data and code. However, some argue that openness does not go far enough. Another crucial element is transparency, which requires that data processing and code follow standards so that they can be used and understood by other analysts who attempt to review or replicate the work. Transparency also requires inclusion of additional information, such as detailed explanations of data workflows, with the data and code. A key to creating transparency, and to conducting data processing and analyses in a manner that mitigates the risk of error, is for analysts to adopt a documented and standardized approach that makes quality assurance (QA) a central focus of scientific computing. QA is not something that should only be tacked on at the end of a project. Instead, it needs to be a philosophy that begins before the first line of code is even written. A key element of it must be good documentation that flows through coding and work at every stage of the project. This means describing the data, explaining the methods, and justifying analytic decisions made along the way. The goal should be to provide sufficient detail such that anyone can understand and reproduce the research. Furthermore, the code should be well-written and organized such that it can be easily checked for errors. At the end of the process, the code should ideally be bug free. Overall, there should be a code QA process that underpins all the work related to code and execution of the statistical methods used in each study.
In this paper, we describe the key steps of a process for researchers to take to improve their coding practices throughout a project to assure the quality of the final data, code, analyses and ultimately the results. The keys of such code QA (see Figure 1) are as follows. First, a team must decide upon and adhere to principles for code writing and a coding style guide that follow best practices. Next, the team needs to create clear written documentation throughout the lifecycle of a project that describes code, workflow and key analytic decisions. Third, careful version control to track code changes should be employed. Fourth, the team also needs good data management processes and, last (but certainly not least), the project should test and carefully review code regularly. Following all these steps will greatly improve the ability of a study to assure results are accurate and reproducible. We will review each of these in detail in turn in the next sections before presenting a discussion of how they can be integrated into an overall practice of data and analysis quality, transparency, and sharing to support accurate and reproducible results.

Figure 1. Key strategies for code quality assurance (QA)

Reducing the risk of errors and producing code that is transparent places requirements on how code is written. To ensure the final code meets those requirements, code writing must follow a set of clearly defined principles and a style guide that adhere to best practices. These principles must be established, and the style guide selected, before the first line of code is written. Thus, choosing a set of principles for writing code and a coding style guide is the first step in assuring quality in code and results. For code QA, the key is writing code that is readable and well-organized. Principles for writing readable code are different from the principles of software engineering, which focus on writing code that is efficient in terms of performance. Both sets of principles are important; however, code that computes efficiently but is not clear and cannot be understood does not support transparency or reproducibility, nor does it minimize the risk of errors. Code QA should yield code that is both efficient and understandable by analysts who did not write it. Style guides are essential to the practice of writing understandable code. They lay out rules for naming functions and variables, indenting, and other formatting-related issues that make code neater and easier to follow. These rules can sometimes feel arbitrary, since the code will likely run whether you follow them or not, but they greatly improve readability. Adherence to a style guide will help collaborators working on the same code base and make it easier for external reviewers to understand the code. Style guides exist for most programming languages and may be modified to better suit the specific project or team. For example, one popular R style guide, by the well-known developer Hadley Wickham, has been copied and modified by Google for their own internal use. If a suitable guide cannot be found, the research team can write their own by creating a set of rules to follow as a team.
The key element is that the rules should be applied consistently throughout the project. In addition to the style guides, there are generally accepted best practices for writing “clean” code that fall somewhere between software engineering principles and style guides and focus on code organization. For example, a key element of clean code concerns the use of functions. It is generally preferable to use many small functions, each fewer than 100 lines of code, that fully encapsulate discrete and independent functionality rather than one large block of code. There are several benefits to this practice. First, it helps the programmer organize and think through the exact purpose of the code. Second, wrapping the code in a function makes it easier to reuse and makes top-level code, that which strings together several functions, easier to read. Lastly, it is easier to test and debug code that is broken up into individual components, a topic we will discuss in a later section.

Figure 2. Sample strategies for writing clean code
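The preference for small, single-purpose functions can be sketched as follows. This is only an illustration in Python; the function names and the toy data are hypothetical and do not come from any particular project.

```python
from statistics import mean


def drop_missing(values):
    """Return the values with missing entries (None) removed."""
    return [v for v in values if v is not None]


def center(values):
    """Subtract the sample mean from every value."""
    avg = mean(values)
    return [v - avg for v in values]


def prepare_scores(raw_scores):
    """Top-level step: clean the raw scores, then center them."""
    return center(drop_missing(raw_scores))


# The top-level call reads as a short description of the workflow.
print(prepare_scores([2.0, None, 4.0, 6.0]))  # [-2.0, 0.0, 2.0]
```

Each helper can be reused and tested on its own, and the top-level function reads as a summary of the processing steps.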

Failing to adhere to styling conventions and best practices contributes to code “smell,” a subjective and unquantifiable sense of badness. Although not a problem in and of itself, code smell can be an early warning indicator of poorly written or bad quality code. Elements that contribute to code smell include poor housekeeping and organization, and code clutter (see Figure 2). For example, blocks of duplicated code are highly discouraged, as changes to that code can sometimes fail to propagate to each of the copies. This occurs often enough that many modern programming environments will display a warning when they find repeated blocks of code, much like when Microsoft Word suggests spelling or grammar changes. Along these lines, “ghost” code, i.e., code that is commented out and never executed, should also be avoided. It creates ambiguity by making it difficult for someone reviewing the code to know whether the commented code contributed to the analysis or was used to generate results prior to being commented out. Because of this, ghost code should be removed or replaced by a control structure like an if- or switch-statement. Well-written code is easier to read, easier to develop and maintain, and easier to review. All this works to increase confidence that the code is functioning as intended and contributes to the overall goal of code QA.

The second part of code QA is to create written documentation that describes the project, its purpose and goals, and key analytic decisions. It should include a description of input data, generated outputs, and perhaps a few simple examples. Although they are not set in stone, the analytic methods and data workflow should be documented as well. The documentation should contain enough detail about the project that an external reviewer can readily understand the goals and methods used.
Documentation also includes a few housekeeping items such as specifying the software license and choosing a method for package and software version management. This ensures the proper setup for the code to run. Documentation can exist as files separate from the code or within the code itself, and both sources have critical roles in the QA process. An important source of documentation is the supporting README file (see Figure 3). This file is the first place someone will look before running or reviewing your project. It should give a brief overview, installation instructions, directions for running the code, and perhaps a few examples. It may also be helpful to maintain a document that tracks key analytic decisions along with a description of the intended analytical steps. This file can be stored alongside the README file. Alternatively, some code version systems, discussed later, include Wikis for this purpose. Lastly, there should be a LICENSE file that clearly states which software licensing rules apply to the code. This is especially important for projects that will be released open-source or commercialized. Licensing is a complicated subject and outside the scope of this paper.

Figure 3. Example README File

Another critical component of the documentation is the version number of all software packages and libraries used in the analysis. This is important as software changes over time and there are no guarantees that the function used today will return the same result in the future. Furthermore, software can change in ways that break code, say by adding a dependency that previously didn’t exist or by deprecating or removing a function. To guard against this, it is important to document the exact version number of the software used in the project. This could be maintained in a separate file or possibly in the README file. The code itself also should contain a good amount of documentation. Documentation is usually in the form of code comments (see Figure 4). This is English text included in the code files that is ignored by the computer or program running the code. It allows the programmer to add a description of the function of the file and is helpful to anyone trying to read and understand how the code works. Style guides offer suggestions for how detailed this documentation should be. For example, it is often considered best practice to include a large comment section at the front of a function declaration that briefly explains the purpose of the function and typically includes a description of the expected inputs and outputs.

Figure 4. Example of commented code

In-line comments are text that appears throughout the code and explains in words what is being done. Comments are written for people reading the code, which may include the original programmer who could be returning to the project after a long break. Comments should be used sparingly to clarify certain parts of the code that may not be obvious when reading the implementation. Good documentation goes beyond just explaining how the code functions and also explains the motivation for the actions being described. For example, when modifying data in a table, one could write:
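As a minimal sketch in Python (the function, the column name `income`, the cap value, and the stated rationale are all hypothetical and serve only to illustrate the point), the header comment describes the inputs and outputs, and the in-line comment records why the data are modified rather than restating what the code does:

```python
def cap_income(records, cap=250_000):
    """Top-code reported income at a fixed ceiling.

    Inputs:
        records: list of dicts, each with a numeric "income" key.
        cap: ceiling applied to every income value.

    Output:
        A new list of dicts; the input list is left unmodified.
    """
    capped = []
    for row in records:
        new_row = dict(row)
        # A comment that only restates the code adds little, e.g.
        #   "set income to the cap if it is larger than the cap".
        # A comment that records the motivation is more useful:
        # incomes above the cap are top-coded because a few extreme
        # values would otherwise dominate the regression estimates.
        new_row["income"] = min(row["income"], cap)
        capped.append(new_row)
    return capped


survey = [{"id": 1, "income": 50_000}, {"id": 2, "income": 900_000}]
print(cap_income(survey))
```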

As the code used for a project develops, it is important to have a mechanism and workflow that can track changes to the code. We do not advocate manual tracking. Rather, best practice calls for the use of an automated version tracking system, like Git. Version tracking systems maintain logs of all changes made to the code, creating a record of what was deleted or added. They also track additional information such as the date of the change and the name of the person who made the change. All modifications are traceable to the source and can be easily undone. This just scratches the surface of what a modern version control system can do. Much can also be said about the best way to track code changes. In fact, books have been written on the subject. Therefore, we will only cover the major points in this paper. First, versioning systems keep track of every change made, maintaining a historic record of the evolution of the code base. Many versioning systems have features that allow one to add comments alongside code changes, making it an effective way to understand the reason for the change. When working with multiple collaborators it will be desirable to use a distributed version tracking system. These systems store a copy of the code in a central location that is accessible by everyone working on the project. Individuals work on their local copy and push any changes they make to this server for everyone to see. To receive the latest changes and updates, each collaborator pulls from the central code base. Because collaborators may end up modifying the same section of code, the versioning system also has a way to resolve conflicts. Versioning systems are a very convenient way to collaborate on the same code. Once a versioning system has been chosen, it is highly recommended to develop a workflow.
For example, many versioning systems will allow the code to be effectively copied and modified separately from the main body of code. This is convenient for organizing and developing individual features or improvements to the code. It allows those changes to be done in isolation, without affecting other users, until those changes are ready to be merged, or reincorporated, into the main body of code. To help avoid potential problems and to assist with QA, there are several suggested workflows. One popular workflow is Git Flow, which proposes that all work, such as bug fixes, new features, and code improvements, be done in separate branches or copies of the code until completed and tested. At that point, the code is “merged” or reintegrated with the main branch. The advantage of this approach is that the main branch is in a perpetual “good” state, meaning it only contains code that has been tested and reviewed. This workflow works well for scientific computing, which is why we suggest it. However, there have been some critiques of the technique. For example, long-running feature branches can lead to integration problems and effectively reduce communication by limiting code sharing. One promising alternative is GitHub Flow. Seemingly inspired by Git Flow, it is a less rigorous process of branching and merging that works well in settings like scientific computing where there are typically fewer developers and less emphasis on the ability to roll back or track code version numbers.

Versioning systems track changes to files by doing a line-by-line comparison. As such, they do not handle binary file types like Microsoft Office or PDF documents well. However, good QA includes version control for both the code and its documentation. Hence, documentation should be written in flat text files, using markdown for styling, and its version should be managed alongside the code. Similarly, data should not be put in the versioning system. This is especially true if the data contains sensitive information and must be protected under a data use agreement. Intermediary data formats and final results can always be regenerated from the code and the original data source files. If data needs to be shared among collaborators, then an appropriate file sharing system should be used for the data. Typically, versioning also involves some sort of numbering system. These version numbers indicate the progression of the code and incorporate a system for separating minor and major revisions. For example, the first production-ready version of the code might be tagged 1.0.0. Small fixes to the code might cause an increment to the rightmost number, changing the version to 1.0.1. Minor changes would increment the middle number, changing the version to 1.1.0. Finally, major changes increment the leftmost number, changing the version to 2.0.0.
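The numbering rule just described can be written as a small function. This is a sketch in Python; the function name `bump_version` and the change labels "major", "minor", and "patch" are our own assumptions for illustration, not part of any versioning tool.

```python
def bump_version(version, change):
    """Return the next version string for a "major.minor.patch" version.

    change is "major", "minor", or "patch" (a small fix).
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        # Major change: increment the leftmost number, reset the others.
        return f"{major + 1}.0.0"
    if change == "minor":
        # Minor change: increment the middle number, reset the rightmost.
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        # Small fix: increment the rightmost number only.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump_version("1.0.0", "patch"))  # 1.0.1
print(bump_version("1.0.0", "minor"))  # 1.1.0
print(bump_version("1.1.0", "major"))  # 2.0.0
```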

Versioning systems do a great job of tracking each little change to the code. However, for a more holistic view, major changes and updates to the code should be noted in a text file that is a component of the documentation, called the CHANGELOG. This file tells the users of your code what you changed and how it affects their use of your code. This is a top-level file that exists alongside the README and is typically a first stop for anyone who wants to know what’s new or different. The CHANGELOG is typically updated when enough changes have been made that you will want to increase the version number.

A lot of the focus has been on code; however, proper data management is also an essential component of assuring results are accurate and reproducible. A key element of code QA is the idea that every piece of analysis can be tracked back to its origins in the raw data and that everything can be traced from the input to the code to the output. This can sometimes be tricky if there are multiple processing steps; for example, when data needs to be cleaned and merged or when multiple models are used in series or parallel to generate results. Whatever the workflow, the entire process ultimately depends on knowing exactly what data was used to generate the final results. Because the input data is an essential element, it is important that it is also well managed. Best practices include storing all raw data in a read-only format or location. This means the data cannot be modified directly by anyone on the team. Data cleaning, augmentation and transformation are performed on a copy of the data and produce intermediary files. As part of the analytic workflow, data preparation is also part of code QA, meaning it should follow all the practices outlined in this document. Hand editing or using GUI tools to modify data could introduce errors, and such edits are not easily reproducible. Therefore, this should be avoided and all changes to the data should be made via scripted code. As with analytic code, code used for data cleaning and preparation should produce the same result every time. It should also be managed following the QA processes described in Steps 1 to 3. A Digital Object Identifier (DOI) should be obtained for the raw data. This is accomplished by submitting your data to a DOI Registration Agency, typically a third-party provider, that validates the content of the supplied data and metadata. Journals and publications may be able to help in selecting an appropriate repository or may provide guidelines for archiving data.
The DOI is a unique, alphanumeric string that can be used to validate that the raw data is the same and has not been modified. In this way, others can be assured that they have access to the exact data used in the research, reducing the risk that results fail to reproduce due to data errors. Of course, special consideration should be given to datasets containing sensitive information, such as those with personally identifiable information (PII). Not all repositories have mechanisms for handling this type of data. Be sure to consult the FAQ or verify with the service before submitting your data.
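Within a project, one simple way to approximate these two ideas (read-only raw data, and a check that the raw data has not been modified) is to remove write permission from the raw file and to record a checksum of it. The sketch below is an assumption on our part rather than a procedure prescribed by any repository; the helper names and the file `raw_survey.csv` are hypothetical, and a checksum complements a DOI rather than replacing it.

```python
import hashlib
import os
import stat
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_read_only(path):
    """Remove write permission so the raw file is not edited in place."""
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)


with tempfile.TemporaryDirectory() as folder:
    # Hypothetical raw data file created only for this demonstration.
    raw_path = os.path.join(folder, "raw_survey.csv")
    with open(raw_path, "w", encoding="utf-8") as handle:
        handle.write("id,score\n1,3.5\n2,4.0\n")

    make_read_only(raw_path)
    recorded = sha256_of_file(raw_path)  # store this with the documentation

    # Later, before an analysis run, confirm the raw data is unchanged.
    print("raw data unchanged:", sha256_of_file(raw_path) == recorded)

    # Restore write permission so the temporary folder can be cleaned up.
    os.chmod(raw_path, stat.S_IRUSR | stat.S_IWUSR)
```

All cleaning and transformation code would then read from the protected file and write its output to separate intermediary files.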

Testing is an essential element of code QA. It is composed of two parts. The first part is unit testing, in which individual elements of the code are tested. These are usually small tests used to validate that a segment of code, for example a single function, returns the same result every time it is run (for stochastic methods, setting the seed of the random number generator will help with this). The goal is to create unit tests for as much of the code as possible. Programming tools can help compute metrics like code coverage, which indicates how much of the code is being tested. A good rule of thumb is to create a unit test for each bug after it is fixed. The second part is test automation and continuous integration. This means that the unit tests are run regularly, for example each night, on the code. Errors that occur during testing are logged and flagged for further investigation and possibly repair. Unit tests should also be run whenever a new change is introduced. For example, if a programmer has recently updated a key function, it is a good idea to run all the unit tests to make sure the change has not inadvertently broken another part of the code. The idea is to continuously monitor the health of the code base. Finally, a code QA plan should be developed that outlines the policies and procedures for validating the functionality of the code. This includes independent testing of the code to minimize the errors that inherently occur when developing code, and reviewing code to ensure it is functioning as described in the analysis plan and supporting documentation. Furthermore, reviews can result in code performance improvements and other optimizations. Thus, a code QA plan should include a provision for code review, where each line of code is seen by at least two sets of eyes. One way to accomplish this, though often not feasible, is pair programming. In pair programming, two programmers sit side-by-side as they develop the code.
Typically, one is at the helm, manning the keyboard, while the other is double checking their work, offering suggestions, or researching solutions. Another way to provide code review is through continuous QA, where code written by one programmer is reviewed by a different programmer prior to integration with the main code base. This ensures a second set of eyes has checked the code and provides a mechanism to continuously monitor the quality of the code base. This should not be confused with continuous integration, which was already discussed. Alternatively, a project can utilize an independent code review by someone outside the team (ideally as early as possible and more than just once) to check for errors and bugs and to ensure the code reproduces the findings from the study. If one of the end products of the research is a new methodology or tool to implement statistical analyses (e.g., a new R/Python package, SAS/Stata macro, or web/mobile application), additional beta-testing should be done to assess usability concerns. Specifically, a team should conduct a “heuristic evaluation of usability.” For heuristic evaluations, a sample of 6 users is enough to identify at least 85% of the problems associated with a website or newly developed analysis tool. Thus, we recommend that new statistical software undergo three waves of beta testing by a total of six researchers external to the team (two per wave of testing) and from a range of backgrounds. In addition, code developers may wish to provide the beta-testers with a tutorial or instruction on the tool’s use along with test data if necessary. However, each beta-tester should be encouraged to use an additional, independent dataset for their review. While vitally important, beta-testing of new software (particularly new R packages) is infrequently done and, as a result, it is not uncommon for developers of R packages to find initial users reporting back many challenges, errors, and bugs when using a new package.
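To make the unit-testing idea from this section concrete, the following is a minimal sketch in Python. The function `bootstrap_mean` and the test names are hypothetical; the tests are written as plain functions named `test_*`, a convention that test runners such as pytest can discover automatically, and they are also called directly here so the example runs without any extra tooling.

```python
import random


def bootstrap_mean(values, n_resamples, seed):
    """Average of bootstrap resample means; the seed makes it repeatable."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(values) for _ in values]
        means.append(sum(sample) / len(sample))
    return sum(means) / len(means)


def test_same_seed_gives_same_result():
    # With a fixed seed, a stochastic method returns the same value each run.
    data = [1.0, 2.0, 3.0, 4.0]
    first = bootstrap_mean(data, n_resamples=200, seed=42)
    second = bootstrap_mean(data, n_resamples=200, seed=42)
    assert first == second


def test_constant_data_returns_the_constant():
    # Every resample of constant data has the same mean as the data.
    assert bootstrap_mean([5.0, 5.0, 5.0], n_resamples=50, seed=1) == 5.0


def test_result_lies_within_data_range():
    data = [1.0, 2.0, 3.0, 4.0]
    result = bootstrap_mean(data, n_resamples=200, seed=7)
    assert min(data) <= result <= max(data)


# Run the tests directly; an automated runner would do this on a schedule
# or whenever a change is introduced.
for test in (
    test_same_seed_gives_same_result,
    test_constant_data_returns_the_constant,
    test_result_lies_within_data_range,
):
    test()
print("all tests passed")
```

Following the rule of thumb above, each fixed bug would add one more such test function that reproduces the original failure.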
Conclusion

Increasingly complex data and analyses fuel current research, and both only continue to grow in complexity. Data management, preparation, and analysis rely on computers, which must function appropriately to yield accurate results. Moreover, the code documents the analyses and allows others to replicate them. Failures of such reproducibility and errors in results have been documented in multiple research areas and can have significant negative impact. As such, there is a clear need for code used in our research and methods development to be fully transparent so our collaborators can easily find errors and external researchers and peers can clearly see the details of the analysis (both in order to replicate the work and to ensure it is error-free). These transparency requirements exist across all disciplines, yet adoption of best practices for code transparency is still not nearly as widespread as it should be, likely because there is a lack of clear guidance on the best practices for developing code in ways that ensure quality and transparency, and insufficient training for researchers and developers who create the code. This paper provides detailed steps for a code QA process that promotes openness (providing others with the data and code used in a scientific analysis) and transparency (requiring the inclusion of additional information, such as detailed explanations of data workflows). We outlined five steps for promoting code QA. Although these steps should be followed throughout the lifecycle of the project, it is important to remember that each requires some amount of planning. Before any code is written, guides should be agreed upon, documentation should begin, a version system should be selected, data management practices should be outlined, and a testing plan should be developed.
By starting early and applying these steps consistently, it is possible to create higher quality code that will be easier to validate (e.g., via peer review) and reproduce. It is not only the responsibility of research teams to implement code QA. Institutions and journals can greatly help by creating institutional practices and journal guidelines that include detailed code QA practices. Funding agencies should also require such QA processes and allow such work to be paid for as part of project budgets. Although corrective measures like “journals enforcing standards” and “more time checking notebooks” tend to be viewed less favorably by researchers, we believe such actions could help improve the quality of research. As noted in the introduction, a notable percentage of research is not following such practices (e.g., in an analysis of 423 PubMed retraction notices, almost 20% were attributed to “analytic errors” related to code and data). We imagine the percentage of teams not using code QA is much higher. Making data accessible and providing access to the code could help restore confidence in scientific computing, along with enforcement of standards and formalized testing and QA prior to publication. Transparency is key to addressing the biggest problems, like exposing p-hacking, and the smaller issues, like errors in data management or analysis. However, it is not the only tool we should look to for help. Thorough and complete documentation is also necessary. This ensures that the code will run, and run the same way, each time the analysis is repeated. Finally, the code QA process we described tacitly assumes all analysis will be programmed or scripted, i.e., the analysis steps exist as written instructions in text that are read and executed by a machine to yield results. The process excludes point-and-click analysis conducted with GUI-style tools.
Because these types of tools are more difficult to track, we suggest they should be avoided or that more work is needed to enable better tracking of their use from a code QA perspective. We appreciate that GUI tools sometimes are used to make advanced analysis more accessible to a wider audience of non-programmers. In such cases, careful documentation of the data processing and analysis steps (e.g., recording of macros generated and run in the background by the GUI, if possible) will help with QA.

References

1. Enserink, M. (2017). How to avoid the stigma of a retracted paper? Don’t call it a retraction. Science. https://doi.org/10.1126/science.aan693
2. Eggertson, L. (2010). Lancet retracts 12-year-old article linking autism to MMR vaccines. Canadian Medical Association Journal, 182(4), E199–E200. https://doi.org/10.1503/cmaj.109-3179
3. Casadevall, A., Steen, R. G., & Fang, F. C. (2014). Sources of error in the retracted scientific literature. The FASEB Journal, 28(9), 3847–3855. https://doi.org/10.1096/fj.14-256735
4. Palus, S. (2018). Make Research Reproducible. Scientific American, 319(4), 56–59. https://doi.org/10.1038/scientificamerican1018-5
5. Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716
6. Begley, C. G., & Ellis, L. M. (2012). Raise standards for preclinical cancer research. Nature.
7. Ioannidis, J. P. A. (2005). Contradicted and initially stronger effects in highly cited clinical research. JAMA, 294(2), 218. https://doi.org/10.1001/jama.294.2.218
8. Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533(7604), 452–454. https://doi.org/10.1038/533452a
9. Nuzzo, R. (2014). Scientific method: Statistical errors. Nature, 506(7487), 150–152. https://doi.org/10.1038/506150a
10. Boulton, G. (2016). International accord on open data. Nature, 530(7590), 281. https://doi.org/10.1038/530281c
11. Nature Research. (2020). Reporting standards and availability of data, materials, code and protocols. Springer Nature Limited.
12. American Economic Association. (2020). Data and Code Availability Policy.
13. Chen, X., Dallmeier-Tiessen, S., Dasler, R., Feger, S., Fokianos, P., Gonzalez, J. B., Hirvonsalo, H., Kousidis, D., Lavasa, A., Mele, S., Rodriguez, D. R., Šimko, T., Smith, T., Trisovic, A., Trzcinska, A., Tsanaktsidis, I., Zimmermann, M., Cranmer, K., Heinrich, L., … Neubert, S. (2018). Open is not enough. Nature Physics, 15(2), 113–119. https://doi.org/10.1038/s41567-018-0342-2
14. Google Style Guides. Google’s R Style Guide. No date listed.
15. Martin, R. C. (2009). Clean Code: A Handbook of Agile Software Craftsmanship. Pearson Education.
16. Fowler, M., Beck, K., Brant, J., Opdyke, W., & Roberts, D. (1999). Refactoring: Improving the Design of Existing Code (1st ed.). Addison-Wesley Professional.
17. Loeliger, J., & McCullough, M. (2012). Version Control with Git: Powerful Tools and Techniques for Collaborative Software Development. O'Reilly Media, Inc.
18. Driessen, V. (2020). A successful Git branching model. nvie.com: Thoughts and writings by Vincent Driessen. Hot Jar. https://nvie.com/posts/a-successful-git-branching-model/
19. Hilton, R. (2017). A Branching Strategy Simpler than GitFlow: Three-Flow.