Exchanging Best Practices and Tools for Supporting Computational and Data-Intensive Research: The Xpert Network
Parinaz Barakhshan
University of Delaware, Newark, Delaware, [email protected]
Rudolf Eigenmann
University of Delaware, Newark, Delaware, [email protected]
ABSTRACT
We present best practices and tools for professionals who support computational and data-intensive (CDI) research projects. The practices resulted from an initiative that brings together national projects and university teams that include individual or groups of such professionals. We focus particularly on practices that differ from those in a general software engineering context. The paper also describes the initiative – the Xpert Network – where participants exchange successes, challenges, and general information about their activities, leading to increased productivity, efficiency, and coordination in the ever-growing community of scientists that use computational and data-intensive research methods.
1 INTRODUCTION

The motivation of this paper, and its underlying project, was that there appears to be little exchange of information between the increasing number of support groups for computational and data-intensive (CDI) research, which are of critical importance, as laid out below. In particular, knowledge of best practices is usually acquired by "training on the job" and not communicated to other projects. What's more, many domain science projects, constrained by budgetary realities, include individual computational experts, who have little to no access to peers and need to self-train on such practices. The fact that the practices exhibit differences from those in a general software engineering context makes their compilation and dissemination especially relevant. Filling this gap is the key opportunity and contribution of the present paper. We describe initial outcomes of an initiative that brings together CDI support professionals, whom we refer to as CDI experts or, for short, Xperts. There are many related terms, such as research software engineers (RSEs), research facilitators, or research programmers. While one can argue about differences, in this paper we consider the terms essentially synonymous.
The importance of CDI research is well evidenced. A relentlessly growing amount of computing power is consumed to conduct research using computational experimentation. Nobel prizes have been awarded to CDI science breakthroughs [18]. Laszewski et al. have shown that CDI work correlates with higher scientific impact [89]. Apon et al. have found that universities investing in computational infrastructure supporting CDI research tend to increase in ranking [4].

Supporting such CDI research by Xperts is critical, allowing domain scientists to focus on their research tasks. Recognizing this need, several large projects funded by the National Science Foundation (NSF) [62], such as XSEDE [97], CyVerse [17], and the NSF Software Institutes, support the community through groups of Xperts. There are also a growing number of university IT departments, or research computing support groups, that include Xperts for boosting the productivity of researchers on their campuses. Furthermore, many large domain-science projects include computational experts for supporting their own applications. The effectiveness of such computational support is remarkable. A study by XSEDE's ECSS group [23] has revealed that time invested by computational experts can have a four-fold return in terms of the time a domain scientist would spend on the same tasks.
The Xpert Network [75] is an NSF-supported, community-serving initiative that offers
• Monthly online meetings (webinars and panels) to exchange best practices applied, tools used, and open problems faced in computational and data-intensive (CDI) research,
• Face-to-face meetings at major conferences, such as PEARC (Practice and Experience in Advanced Research Computing), ICS (International Conference on Supercomputing), and SC (Supercomputing), for in-depth discussions,
• A website (sites.udel.edu/xpert-cdi) with access to recordings and reports of past events, a calendar of upcoming events, and community announcements.
Under preparation are also
• A discussion platform for community communication around the clock, and
• An exchange program that supports participants visiting other participants.
The ultimate goal of the initiative is to advance science by increasing the productivity of researchers who use CDI methods for pushing research frontiers. To this end, several communities are invited to participate, including (i) researchers developing and using CDI applications, (ii) those who assist these researchers with expertise in CDI technology and methods (Xperts, Facilitators, RSEs), (iii) domain experts/scientists on university campuses and other research organizations, and (iv) developers of tools that support the creation and use of CDI applications. These groups are invited to:
• Share success stories about how Xperts have assisted domain researchers,
• Present open problems and challenges faced when supporting CDI research,
• Exchange best practices that are being applied in their organization or in their research,
• Present tools and report on the use of tools that have made a substantial difference in supporting the work of Xperts and domain researchers,
• Coordinate activities that are of mutual interest.
Two specific outcomes of the discussions are the creation of
• A best-practices guide for computational and data-intensive research, and
• A catalog of tools that support CDI research and engineering.
They will serve as training material for new Xperts joining CDI support teams, for instructional events, and for university curricula. The present paper can be seen as a starting point for such educational material. This material is related to the large volume of best practices for software engineering and scientific programming that exists, but differs in the following significant ways. While training material for general software engineering and scientific programming covers important skills that Xperts must have, we are looking for best practices specific to professionals that support domain science teams. We focus on information that is not commonly found in software engineering and scientific programming literature, or on advice that is unexpected. We also discuss skills that our participants have called out as especially relevant and those that need to be applied differently in a CDI context. The sources of this information are the Xpert Network webinars that we have conducted thus far and three in-person events: the ICS19 workshop and the PEARC19 and SC19 Birds-of-a-Feather sessions, as well as our past work on several interdisciplinary projects with domain scientists.

Another important distinction of our approach is the inclusion of tools. Best practices need to be supported by tools that follow the underlying methodologies. Good tools for scientific software development can boost the productivity of domain science research as much as the best practices themselves.

The remainder of the paper is organized as follows. Section 2 describes the best practices that have emerged in the Xpert Network activities so far. Section 3 focuses on tools that have proven their value in accelerating CDI research and the work of Xperts. Section 4 describes related efforts and the creation of synergy among them, followed by conclusions in Section 5.
2 BEST PRACTICES

This section identifies best practices for CDI research involving applications that are compute-intensive (spending most execution time on computational algorithms) and/or data-intensive (manipulating large volumes of data). Two broad categories of best practices for supporting such research have emerged in the Xpert Network activities: (i) best practices related to project collaborations and (ii) best practices for software development. As mentioned in the introduction, the second category is covered well in the software engineering literature and in organizations focused on training, such as Software Carpentry [80]. This paper considers only those software development practices that have been described by participants as different or particularly important for CDI research. Of particular interest are practices that are actionable; we will highlight the recommended actions in each practice. In some cases, the action lies in being aware of something. The following subsection will begin with this category.

Summary of Best Practices

Best Practices for Collaborations:
(1) Diversity of Xpert Backgrounds – Be aware of different backgrounds Xperts may bring into a team; configure training so that less-familiar best practices can be acquired as needed.
(2) Understanding the Academic Environment – Be aware of the academic reward system and activities to increase academic standing.
(3) Breadth of Xpert Skills Needed – Prepare for skills needed beyond your current expertise, by networking with other Xperts.
(4) Collaborative Assistance – Help propel new projects through short-term, close collaboration with domain scientists.
(5) Overcoming the Terminology Gap – Carefully identify and resolve terminology gaps. Keep vocabulary to the essentials. Explain using many examples.

Best Practices for Software Development:
(6) Developing a Project Plan – Devote substantial time to understanding the domain problem, turning possibly vague ideas into a feasible solution approach, and developing a project plan.
(7) Prioritize Functional Requirements – Carefully vet all requirements by the application's end users and prioritize aggressively.
(8) Issue Tracking – Track the origin as well as the implementation status of requirements and bug reports.
(9) Source Code Management – Make use of source code management and version control systems to track your software's evolution.
(10) Code Review – Xpert and domain scientist should review each other's newly written software.
(11) Software Testing – Define test cases that the application and its components must pass before you begin their implementation.
(12) Documentation – Document your project to ensure long-term success and reproducibility, and to obtain proper credit for your work.
(13) Continuous Integration – Integrate new software updates frequently into the application version seen by the end users in their end environments.
(14) Reproducibility – Enable reproducibility and transparency by capturing the data and software underlying scientific processes, using available software platforms.
(15) Parallelization – Write serial code first, then parallelize.

2.1 Diversity of Xpert Backgrounds

Be aware of different backgrounds Xperts may bring into a team; configure training so that less-familiar best practices can be acquired as needed.
Members of Xpert groups may come from domain sciences, from computer science backgrounds, or from environments where they learned to assist CDI science teams through training on the job. For example, working with Unix commands, parallel programming models, and version control may be second nature to members with a computer science background, whereas these very skills may be critical best practices to acquire for Xperts with a domain-science background. Vice versa, Xperts who have a domain degree are often well aware of the terminology gap discussed in Section 2.5, which facilitates the communication with new science collaborators [24]. Training for Xperts should accommodate these differences, allowing the trainees to focus on best practices they are not familiar with.
2.2 Understanding the Academic Environment

Be aware of the academic reward system and activities to increase academic standing.

An issue for Xperts that is essentially absent in a general software engineering context is the importance of understanding the academic environment. This issue is especially relevant for software engineers with a background in industry, where hierarchical organization and the overriding goal of creating a reliable product as rapidly as possible are the norm. Understanding the academic reward system and the many side activities that researchers may engage in to maintain the academic standing of the science team [60, 61] will influence project decisions. In fact, some of the best practices need to be understood from this viewpoint, such as the issues of documentation and testing, mentioned in Sections 2.12 and 2.11.
2.3 Breadth of Xpert Skills Needed

Prepare for skills needed beyond your current expertise, by networking with other Xperts.

Modern CDI applications draw from a broad range of technologies, such as computing paradigms, programming languages, architectures, and algorithms. What's more, this range keeps evolving. For example, a new application may demand expertise in machine-learning techniques. Individual Xperts and those in small teams, serving a large CDI research community, are unlikely to cover all the needed skills. Besides, they may need to perform tasks across the entire software life cycle and be involved in project management. It is important to be able to recognize such situations and seek external advice. Maintaining contacts with other Xpert teams, who may be consulted when needed, is highly advisable.
2.4 Collaborative Assistance

Help propel new projects through short-term, close collaboration with domain scientists.

A recommended form of interaction between Xperts and CDI domain scientists is collaborative assistance. For a period of one to several months, Xperts work side-by-side (physically or virtually) with the domain scientists whose project they support. During the collaboration, the Xpert takes care of computer-engineering issues that arise, while the domain researcher resolves the problems requiring application-science expertise. This form of joint work allows issues that fall into the competency of the collaborator to be addressed immediately, significantly reducing, or even avoiding, the need for domain researchers and Xperts to get trained in each other's knowledge and skills at the start of the project. Over the course of the joint work, the collaborators tend to pick up sufficient knowledge of each other's skills and terminology. In particular, the domain scientist becomes familiar with the software development processes and tools, allowing them to continue the work independently. Vice versa, the Xpert will have acquired sufficient knowledge of the domain problem, allowing them to effectively provide help remotely, even after moving on to other projects. This model has been pioneered by XSEDE's ECSS group and was applied successfully by other Xpert teams [24].
2.5 Overcoming the Terminology Gap

Carefully identify and resolve terminology gaps. Keep vocabulary to the essentials. Explain using many examples.

The gap between computer-science and domain-science lingo is an often-mentioned issue in Xpert Network discussions. The challenge can be big if the same term is used by both collaborators, but with different meanings. Participants reported significant confusion and even incorrect project executions due to this challenge. Awareness of the issue and patience in trying to understand the collaborators' viewpoints are critical. It was pointed out that the gap is wider than in a general software engineering context, when trying to understand a customer, as science terminology tends to use rich vocabularies. Keeping the vocabularies to the essentials and investing time to explain new terms is key to successful collaboration. Using many examples and frequent feedback from both sides will help bridge this gap.
2.6 Developing a Project Plan

Devote substantial time to understanding the domain problem, turning possibly vague ideas into a feasible solution approach, and developing a project plan.

Many Xpert Network participants highlighted the importance of the first contact with the supported domain scientists and the approach taken to understand problems and develop solutions. In addition to awareness of the terminology gap, the relevance of investing time in understanding the goals and the functional as well as non-functional requirements was emphasized. Showing patience in the process is critical. The domain researcher needs to be helped to transform an initially often vague idea into a concrete plan. Developing specific requirements for the computational application and the underlying system is essential. Devote sufficient time. One challenge for the Xpert is to introduce the researcher to the possibilities of the to-be-created or improved software implementation and the underlying system while introducing only a small number of new terms. Recommendations include the use of many concrete examples and an inverted pyramid approach. The latter describes a possible solution in initially very few, high-level terms, which are then successively refined in discussions.

The situation is again related to that of software engineers with their customers, two important differences being:
(1) the potentially large terminology gap, mentioned above, and
(2) the collaborative assistance situation, which often enables the inverted pyramid approach: as Xpert and domain researcher work side by side, refinements of high-level ideas can more easily happen over time, permitting the collaborators to invest their initial effort in defining the tip of the pyramid.
2.7 Prioritize Functional Requirements

Carefully vet all requirements by the application's end users and prioritize aggressively.

The dilemma of a large number of desirable features but only a short project duration can be big in scientific software. Xperts need to learn to tell the difference between essential and desirable requirements users may have. Essential features should be strictly prioritized. Scientific software is also special in that requirements can change rapidly, as new insights of a research project emerge. Frequent re-assessment of the requirements and priorities is needed [11].
2.8 Issue Tracking

Track the origin as well as the implementation status of requirements and bug reports.

Issue tracking is important in most mid-size and large-size software development. Two points make it a particular concern in scientific applications. The first is the belief that research projects – often of a three-year duration – will remain small and easy to oversee, obviating the need for issue tracking. The second is that developers (typically graduate students and postdoctoral researchers) tend to change often, losing important project memory. The reality is that, even within three years, remembering what feature was requested by whom and with what rationale can be difficult. After a "change of guard", the same will become close to impossible. What's more, successful science applications may turn out to have a long life. Hence, being able to trace a feature request, its implementation status, and the reasons for accepting/rejecting the request to its origin can be key. Furthermore, having a clear record of who made what request can be critical for re-assessing and re-prioritizing functional requirements, mentioned previously. Section 3.4 describes issue tracking tools, which maintain a list of tasks and sub-tasks to be performed, avoiding duplicated efforts and enabling collaborative work. The tools support requirement traceability, that is, the association of an application update with the requirement that motivated it [91].
2.9 Source Code Management

Make use of source code management and version control systems to track your software's evolution.

An Xpert with a software engineering background is well aware of the importance of software version control. At the same time, this importance needs to be advertised to domain experts, as pointed out strongly by the Xpert Network participants. Many successful science software applications had their origin in a "toy program" written by a graduate student. It got gradually expanded by several authors, caught the attention of a wide audience, and ended up becoming an important research tool. Without version tracking, the history (origin, authors, relationship of features, and specific extensions) may no longer be known, making it difficult to extend the software further and obtain needed documentation. Learning version control methods and tools will not only help overcome these difficulties; it will also increase the productivity of the software developer as soon as the application exceeds the boundaries of a small program. What's more, source code management tools greatly facilitate collaborative software development [9], enable software roll-back to a previous, well-defined state, and help developers start a new branch of the software. The latter can be important if, say, two graduate students want to add their own, separate feature sets. Section 3.3 will mention supporting tools.
2.10 Code Review

Xpert and domain scientist should review each other's newly written software.

Code review – a second person reading newly written software – is one of the best ways to catch errors and also to improve the code. Despite its effectiveness, it is often ignored unless strictly mandated, and hence warrants mentioning here. A number of additional reasons make this practice particularly useful for an Xpert–domain-scientist relationship:
• The two collaborators have different backgrounds, increasing the chance of detecting an issue that eluded the other.
• The review is effective in verifying that requirement and implementation match.
• It is a good way of transferring the knowledge of what the Xpert did to the domain scientist, who will continue the work after the collaboration completes.
• The discussions happening during code review provide excellent material for documenting the code and project.
Code review helps improve the science as well as the code itself. Research software can benefit from code reviews as much as industrial software [57].
2.11 Software Testing

Define test cases that the application and its components must pass before you begin their implementation.

Testing is another issue that is covered well in the software engineering literature. The topic has been mentioned often in our Xpert discussions, deserving a place in this list of best practices. Next to raising awareness of the need for and the benefit of testing, a key point is that testing must not be an afterthought; one must avoid thinking of test cases only when the software is close to completion. Instead, testing should be considered in the design phase. In the initial communication with the domain researcher, the Xpert needs to ask the question "what test cases should be passed by the tool or application, once it is completed?" Coming up with a thorough set of such cases will not only facilitate the later testing phase; it will also help clarify the specific capabilities that need to be implemented. The implementation of features that do not satisfy any test case can be postponed or even avoided altogether. Designing test cases can go hand in hand with functional prioritization (§ 2.7).

2.12 Documentation

Document your project to ensure long-term success and reproducibility, and to obtain proper credit for your work.
Software engineering teachings cover extensively the fact that well documented programs are essential for software maintenance and extensibility. Yet, academic software is notorious for its lack of documentation, as pointed out forcefully by Xpert Network participants. While raising awareness of the benefits is important, one also has to understand that academic processes and reward systems often do not support spending time on documentation (§ 2.2). Most academic research funding pays for showing principles and developing prototypes, not production-ready software. What's more, students are under pressure to graduate, whereby the functionality of the created software is more important than its documentation. If the software becomes successful, the original author often is no longer involved and thus has little incentive to retrofit a proper description. It is often the difficult task of the next generation of students in a team to accomplish these tasks. It helps, however, to understand these relationships. If one can identify the original creator, they may be willing to help, especially when offered proper credit or co-authorship on a forthcoming publication. Vice versa, properly documenting the original authorship will ensure that such credit can be given.

Reproducibility (§ 2.14) of scientific research is a concern that is currently gaining attention. Proper software documentation can be key to reproducibility [9].
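As an illustration of this practice (a hypothetical sketch, not material from the Xpert Network discussions; module, function, and author names are invented), a few lines of in-source documentation record purpose, authorship, and interface in a form that generators such as Doxygen or Sphinx can later extract:

```python
"""diffusion.py -- toy 1-D heat-diffusion step.

Recording purpose and authorship up front preserves the project
history that otherwise gets lost when student developers move on.

Authors: A. Student (original prototype), B. Postdoc (cleanup)
"""


def diffuse(u, alpha=0.1):
    """Return one explicit finite-difference diffusion step.

    Parameters
    ----------
    u : list[float]
        Temperature samples on a uniform 1-D grid.
    alpha : float
        Diffusion coefficient times dt/dx^2; should be <= 0.5
        for numerical stability.

    Returns
    -------
    list[float]
        Updated samples; the two boundary values are held fixed.
    """
    v = list(u)
    for i in range(1, len(u) - 1):
        v[i] = u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
    return v
```

Docstrings of this kind cost little during development but carry exactly the information (intent, units, authorship) that the next generation of students would otherwise have to reconstruct.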
2.13 Continuous Integration

Integrate new software updates frequently into the application version seen by the end users in their end environments.

Continuous Integration (CI) is generally good software engineering advice. It catches miscommunicated requirements early. CI also allows users to experience new features and provide feedback to the developer early and often. The process may be combined with automated builds and testing, ensuring that new functionality satisfies and continues to satisfy defined test cases. CI was mentioned as particularly relevant for the work of Xperts, as scientific software tends to be a moving target with frequent changes of requirements. Ensuring that new features are what the end user had in mind, conform to defined test cases, and do not break previous requirements is critical. Many teams find that this approach leads to significantly reduced integration problems and enables the development of cohesive software more rapidly.
2.14 Reproducibility

Enable reproducibility and transparency by capturing the data and software underlying scientific processes, using available software platforms.

While the reproducibility of research results is a key requirement in any domain of science, there is recent, increased focus on this issue in computation-based research. Reproducibility was highlighted as a significant concern in our Xpert Network discussions as well, pointing out that adding supporting software and data to a publication can increase its value significantly. This is of particular importance in large computational studies, where data analysis may play a central role in reaching the conclusions. Disclosing the data and software underlying the research methods will add transparency. Making use of software platforms, such as GitHub [35] and GitLab [37], and container environments, such as Docker [19], can dramatically reduce the cost of capturing and describing the computing environment used to produce the scientific results.
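A minimal sketch of environment capture (our illustration, not a tool discussed by the participants; the manifest filename is arbitrary) records the interpreter, operating system, and installed-package versions alongside published results. A container image captures the environment far more completely; a manifest like this is a lightweight complement:

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment(path="environment.json"):
    """Snapshot the software environment into a JSON manifest
    that can be archived next to the published results."""
    manifest = {
        "python": sys.version,
        "platform": platform.platform(),
        "packages": {
            dist.metadata["Name"]: dist.version
            for dist in metadata.distributions()
            if dist.metadata["Name"]  # skip broken metadata entries
        },
    }
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2, sort_keys=True)
    return manifest


manifest = capture_environment()
print(sorted(manifest))  # ['packages', 'platform', 'python']
```

Committing such a manifest with each set of results lets a later reader reconstruct, or at least diagnose, the environment that produced them.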
2.15 Parallelization

Write serial code first, then parallelize.

The question whether parallel code should be written directly, versus creating serial code first, is an open one [54]. The Xpert Network participants were clear, however, in recommending the latter for creating computational and data-intensive (CDI) applications. Among the arguments was that the benefit of the lesser complexity of getting the serial code correct first, combined with better tool support for serial code, outweighs the negatives. The primary negative is that certain serial algorithms are intrinsically hard to parallelize. This negative can be overcome, however, by keeping the issue in mind and selecting algorithms that are known to parallelize and scale well.
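The recommended workflow can be sketched in a small, self-contained Python example (a hypothetical illustration, not code from the Xpert Network material): the serial version is written and validated first, and the parallel version of the embarrassingly parallel map is then checked against the serial reference.

```python
from multiprocessing import Pool


def simulate(seed):
    """Stand-in for an expensive, independent experiment:
    a deterministic pseudo-random sequence keyed by its seed."""
    x = seed
    for _ in range(1000):
        x = (1103515245 * x + 12345) % 2**31  # classic LCG step
    return x % 100


def run_serial(seeds):
    # Step 1: get the serial version correct and tested first.
    return [simulate(s) for s in seeds]


def run_parallel(seeds, workers=4):
    # Step 2: parallelize only the independent map, and always
    # validate the result against the serial reference.
    with Pool(workers) as pool:
        return pool.map(simulate, seeds)


if __name__ == "__main__":
    seeds = list(range(8))
    assert run_parallel(seeds) == run_serial(seeds)
    print("parallel run matches serial reference")
```

Because each `simulate` call is independent, the parallel version is a drop-in replacement, and the serial run doubles as the correctness oracle, which is exactly what the serial-first recommendation buys.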
3 TOOLS FOR CDI APPLICATION DEVELOPMENT

This section summarizes discussions of CDI application development tools in the Xpert Network workshops, webinars, and position papers. The importance of such tools for increasing the productivity of both CDI domain scientists and those who assist them was pointed out repeatedly. Many of the best practices can be enhanced by supporting tools, for example, by identifying opportunities for optimization and parallelization, automating performance analysis and testing, or by supporting the workflow, such as issue tracking and version control. Listed below are some of these tools that have had an impact in developing CDI applications of Xpert Network participants [24].
3.1 Project Management

Many groups report good experiences using tools for managing project tasks. Use of such tools helps with visualizing the work, limits work-in-progress, helps teams establish order in their daily work, and maximizes efficiency. These tools also help facilitate communication between groups of collaborators, such as between domain researchers and computational experts.
Tools: Jira [8], Kanban boards [16]

Best practices supported by these tools: Collaborative Assistance (§ 2.4), Developing a Project Plan (§ 2.6), Prioritize Functional Requirements (§ 2.7), Issue Tracking (§ 2.8)

Summary of Tools for CDI Application Development:
(1) Project Management – Jira [8], Kanban boards [16]
(2) Documentation – Doxygen [20] (for C, C++, CSharp, D, Fortran, Java, Perl, PHP, Python), GhostDoc [33], Javadoc [49]
(3) Source Code Management – Git [34], GitHub [35], GitLab [37], Bitbucket [10], Mercurial [58]
(4) Issue Tracking – Jira [8], Trello [86], Github Boards [36], Asana [7]
(5) System Build – CMake [14], GNU Make [39]
(6) Compiler Reports and Diagnostics – Intel [48], GNU [40], PGI [64]; research compilers: Cetus [79], Rose [77]
(7) Debuggers – GDB (GNU Project debugger) [73], Arm DDT [5]
(8) Memory Debuggers – Valgrind [88], AddressSanitizer (ASan) [1]
(9) Performance Analysis – Intel Advisor [47], Arm Map [6], TAU [85], HPCToolkit [22], mpiP [66]
(10) Test Frameworks – ReFrame test framework [76]
(11) Containers – Docker [19], Singularity [84]
(12) Cloud-Based Development Environments – Eclipse Che [69], Amazon Cloud9 [3], Gitpod [38], Codespaces [15]
(13) Continuous Integration – Travis CI [13], GitLab [37], Jenkins [50]
(14) Profiling/Tracing – GNU Project Profiler (gprof) [70], TAU [83]
(15) User Interfaces to HPC Resources –
• Science Gateways [32]
• Open OnDemand [63]
• Rich desktop clients, such as the Eclipse Parallel Tools Platform (PTP) [45]
• Interactive applications, such as Jupyter [52], RStudio [78], JupyterHub [71], and JupyterLab [52]
3.2 Documentation

The challenge of scientific software documentation was mentioned in Section 2.12. Researchers tend to write comprehensive documentation only when absolutely demanded by a collaborator or external user of the code. Tools can help overcome this issue.
Tools: Tools that help in creating and automating software documentation are Doxygen [20] (for C, C++, CSharp, D, Fortran, Java, Perl, PHP, Python), GhostDoc [33] (for CSharp, Visual Basic, JavaScript), and Javadoc [49] (for Java). Among the tools that can be used for the sole purpose of publishing documentation, GitHub and GitHub Pages were mentioned.

Best practices supported by these tools: Documentation (§ 2.12)
3.3 Source Code Management and Version Control

Tools for source code management and version control were described as fundamental to the development of CDI applications. At the same time, participants reported that some domain teams used ad-hoc methods instead. The following tools are especially important where multiple developers are continuously involved in changing the source code.
Tools: Git [34], GitHub [35], GitLab [37], Bitbucket [10], Mercurial [58]

Best practices supported by these tools: Source Code Management and Version Control (§ 2.9)
3.4 Issue Tracking

Issue tracking systems manage and maintain lists of issues, recording implementation status and traceability to user requests. The tools are generally used in collaborative settings but can also be employed by individuals.
Tools: Jira [8], Trello [86], Github Boards [36], Asana [7]

Best practices supported by these tools: Issue Tracking (§ 2.8)
3.5 System Build

System build tools support the generation of executables and other translated files from the program's source code, as well as software packaging [51].
Tools: CMake [14], GNU Make [39]

Best practices supported by these tools: Continuous Integration (§ 2.13), Parallelization (§ 2.15)
3.6 Compiler Reports and Diagnostics

Besides their code-generation functionality, compilation tools can be crucial in providing reports about analyzed program characteristics and applied optimizations.
Tools: Intel [48], GCC (GNU Compiler Collection) [40], PGI [64], LLVM [56]. Research compilers: Cetus [79], Rose [77]

Best practices supported by these tools: Parallelization (§ 2.15)
3.7 Debuggers

The following are among the debugging tools mentioned. Attendees reported on the difficulty of finding convenient tools that assist in debugging parallel programs.
Tools: GDB (GNU Project debugger) [73], Arm DDT [5]

Best practices supported by these tools: Parallelization (§ 2.15)
3.8 Memory Debuggers

Memory debuggers have been useful for tasks such as identifying memory leaks and buffer overflows, often related to the allocation and deallocation of dynamic memory.
Tools: Valgrind [88], AddressSanitizer (ASan) [1]

Best practices supported by these tools: Parallelization (§ 2.15)
Performance analysis tools have been essential for understanding, diagnosing, visualizing, and resolving issues related to program performance and scalability.
Tools:
Intel Advisor [47], Arm MAP [6], TAU [85], HPCToolkit [22], and mpiP [66]
Best practices supported by these tools:
Parallelization (§ 2.15)
Testing frameworks provide guidelines and rules for creating and designing test cases. They can provide essential support in increasing the efficiency of software testing.
Tools:
ReFrame test framework [76]
Best practices supported by these tools:
Software Testing (§ 2.11)
Containerization helps preserve software environments and addresses reproducibility. Containers can capture the full software environment of a computational method and enable portability to different platforms. It was mentioned that, while containers do not solve the issue of long-term preservation of software (due to changing container versions, operating systems, and research infrastructures), they may deliver intermediate solutions [99].
Tools:
Docker [19] and Singularity [84]
Best practices supported by these tools:
Reproducibility (§ 2.14)
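A container recipe can be only a few lines; the sketch below is an illustrative Dockerfile (the script name and package versions are placeholders) that pins the environment of an analysis so collaborators can rebuild it on any platform with Docker, or convert it for Singularity:

```dockerfile
# Illustrative Dockerfile: pin the base image and the packages an
# analysis script needs, so every rebuild yields the same environment.
FROM python:3.9-slim
RUN pip install --no-cache-dir numpy==1.21.0 scipy==1.7.0
COPY analysis.py /app/analysis.py
WORKDIR /app
CMD ["python", "analysis.py"]
```

Pinning exact versions is the key design choice: an unpinned `pip install numpy` would quietly drift over time, defeating the reproducibility goal discussed above.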
These tools help with collaborative software development, documentation, debugging, and testing. They provide convenience through access via a browser, without the need for download and installation. Many open challenges were mentioned, however. They relate to the support for real-time co-development, co-debugging problems in the same session, and reproducibility. The cost of running cloud-based services, security, privacy, and the complexity of the application building and execution process were also mentioned as concerns [98].
Tools:
Eclipse Che [69], Amazon Cloud9 [3], Gitpod [38], and Codespaces [15]
Best practices supported by these tools:
Collaborative Assistance (§ 2.4), Documentation (§ 2.12), Software Testing (§ 2.11)
Cloud-based access to tools for continuous integration can significantly improve development productivity and reduce the amount of human involvement for routine setup and maintenance tasks. In this way, testing is triggered on cloud servers when commits are pushed to the source code repository [98].
Tools:
Travis CI [13], GitLab [37], and Jenkins [50]
Best practices supported by these tools:
Continuous Integration (§ 2.13)
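As a sketch of such a setup, a minimal GitLab CI configuration might look as follows (a config fragment; the job name, image, and commands are placeholders, not from any project discussed here):

```yaml
# .gitlab-ci.yml (illustrative): every push triggers this pipeline
# on a cloud runner, so no local test setup is needed.
stages:
  - test

run-tests:
  stage: test
  image: python:3.9-slim
  script:
    - pip install -r requirements.txt
    - pytest tests/
```

Because the pipeline runs on every commit, regressions surface within minutes of being pushed rather than weeks later.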
One of our surprise findings was that, even though profiling is a technology well known to software engineers, many domain scientists were insufficiently aware of it. After we introduced them to the basic concepts and features, they reported making significant progress in analyzing and improving their programs.
Tools:
Beyond the profiling functionality of basic tools, such as GPROF (GNU Project Profiler) [70], the TAU [83] environment offers a rich feature set for both profiling and program tracing to identify performance bottlenecks. TAU's automatic instrumentation capabilities support programs written in many languages and parallel programming models.
Best practices supported by these tools:
Parallelization (§ 2.15)
Accessing HPC machines through means other than command-line interfaces is key for broadening participation [2].
Tools:
Among the approaches are:
• Science Gateways [32] are domain-specific web portals, often with access to high-performance computing resources.
• Open OnDemand [63] provides comprehensive web interfaces with rich functionality for HPC and flexible customization for resource managers.
• Rich desktop clients, such as the Eclipse Parallel Tools Platform (PTP) [45], provide multi-language development environments. They include full support for multiple parallel paradigms, including MPI, OpenMP, and OpenACC, as well as synchronized projects for effective utilization of remote HPC resources. PTP also provides rich customization for resource managers.
• Interactive applications, such as Jupyter [52] and RStudio [78], and their web interfaces JupyterHub [71] and JupyterLab [52], provide remote access to HPC resources.
Best practices supported by these tools:
Parallelization (§ 2.15)
A general concern expressed with many of these tools was that improvements are needed to achieve both ease of use for end users and advanced functionality for power users [25].
A number of papers have recommended practices for scientificprogramming.
Wilson et al. [92, 93] have provided a set of "best practices" and also "good enough practices" for the scientific computing community, drawn from the experiences of the thousands of people who have taken part in Software Carpentry [31] and Data Carpentry [28] workshops, and from a variety of other guides.
Dubois [21] has described some of his experiences to help in writing scientific programs.
Heroux et al. [43] discuss practices used in the Trilinos [41] open-source software library project, some of which are close to practices advocated by the Agile software development community [55].
Naguib et al. [59] present an overview of two projects and describe the software engineering methods they applied to these computational science and engineering applications.
Wilson [31] talks about the "common core" of modern software development that is taught through Software Carpentry courses to help computational scientists meet the standards that experimental scientists have taken for granted [90].
Riley [53] advocates bringing knowledge of useful software engineering practices to HPC scientific code developers. While not prescribing specific practices, she emphasizes adopting practices that help productivity.
The Better Scientific Software (BSSw) [12] community is focused on software development and sustainability that lead to improved software for computational science and engineering (CSE) and related technical computing areas.
The Working Towards Sustainable Software Science: Practice and Experiences (WSSSPE) [94] community meets frequently to discuss the challenges of software for science.
The Software Engineering for Computational Science (SE4Science) [81] community is working to understand the differences, necessities, impacts, and barriers of applying general software engineering practices to research software, so that scientific communities can adopt good software engineering practices.
The UK Software Sustainability Institute [46] has worked on facilitating the advancement of software in research by cultivating more sustainable research software.
The IDEAS-ECP [44] community addresses issues of software productivity and sustainability in the Exascale Computing Project (ECP) [68].
The UCAR [87] Software Engineering Assembly (SEA) [82] is a community for software engineering professionals within UCAR that addresses the issue of effective software engineering throughout UCAR.
While these contributions provide important, general software engineering advice for science and engineering applications, the focus in this paper was on practices that are specific to computational Xperts (a.k.a. Research Facilitators or Research Software Engineers) who support domain scientists. We reported best practices discussed by participants of the Xpert Network activities, which represent both large projects and individuals involved in such support roles. We focused on the best practices related to project collaborations between Xperts and domain scientists, which is the essence of such projects, and on the best practices for software development that are particularly important or should be applied differently in CDI research.
It is worth mentioning that even for those practices that CDI research and general software engineering have in common, their priority and importance differ with the scope of a project, which depends on the environment (academic or industrial) it is designed for. While in an academic environment a working prototype may be enough to prove a point, industry aims for reliable, production-quality applications.
Many CDI-related tools are introduced and taught through platforms such as the Carpentries [26] and the HPC University [30]. The Carpentries include Software Carpentry [31], Data Carpentry [28], and HPC Carpentry [29]. While Data Carpentry aims to teach fundamental concepts, skills, and tools for working more effectively with data, Software Carpentry has been teaching researchers general computing skills and related tools. HPC Carpentry's search engine, on the other hand, searches the resources of
HPC University (HPCU) [30] and the
Computational Science Education Reference Desk (CSERD) [27] to help researchers with a broad range of topics in HPC and computational science, including tools.
Some of the tools that were brought up in the Xpert Network webinars and panels are mentioned below:
The
Extreme-scale Scientific Software Stack (E4S) [42] provides open-source software packages for developing, deploying, and running scientific applications on high-performance computing (HPC) platforms. E4S provides from-source builds and containers of a broad collection of HPC software packages.
TAU Performance System® [85] is a portable profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, UPC, Java, and Python.
XDMoD Tool [95], which stands for XD Metrics on Demand, provides a wide range of metrics pertaining to resource utilization and performance of high-performance computing (HPC) resources.
HPCToolkit [22] introduces an integrated suite of tools for measurement and analysis of program performance on computers ranging from multicore desktop systems to the nation's largest supercomputers. HPCToolkit supports measurement and analysis of serial codes, threaded codes (e.g., pthreads, OpenMP), MPI, and hybrid (MPI+threads) parallel codes.
While there is overlap with the tools presented in the above projects, this paper reported on tools that were discussed by participants of the Xpert Network activities as relevant to their work in supporting CDI domain science groups.
The Xpert Network is related to and collaborates with a number of activities that involve or facilitate idea exchanges among professionals supporting computational and data-intensive (CDI) research.
The
Research Software Engineer Association (US-RSE) [74] is an initiative to create synergy among research software engineer individuals and groups at US universities and other institutions. This is a recent initiative with aims that are similar to those of the Xpert Network, and the two efforts seek to combine forces.
The
CaRCC (Campus Research Computing Consortium) effort [67] also aims to engage a similar community as US-RSE, with the professionalization (professional development and advocacy) of those involved in campus research computing being a particular focus. CaRCC uses the term Research Facilitator, which is closely related to the terms Xpert and RSE.
The
Virtual Residency [61] program focuses on training Xperts / RSEs / Research Facilitators.
The
XSEDE (Extreme Science and Engineering Discovery Environment) [97] project includes two thrusts that support CDI application researchers. The
ECSS group (Extended Collaborative Support Service) [23] has a large number of computational and data experts who work collaboratively with domain scientists in improving their applications. The
Campus Champions [96] program engages participants at many university campuses to raise awareness of XSEDE services and help local users accelerate CDI research.
The
CyVerse project [17], supporting data-driven discovery, includes community support services similar to those of XSEDE's ECSS group.
Related support groups are also part of the NSF Software Institutes, including the
Science Gateway Community Institute [72] and the Molecular Sciences Software Institute [65].
The Xpert Network exchanges ideas with all of the above efforts. Best practices and tools identified in this paper may provide training material for the mentioned projects. In addition to presenting best practices, the agenda of the monthly Xpert Network webinars includes discussions of coordination among the many involved activities, with the overall goal of increasing efficiency in the science community.
This paper presented best practices and tools used by those who support domain researchers in creating, improving, and running computational and data-intensive (CDI) applications. We refer to these professionals as computational experts, research software engineers, research facilitators or, for short, Xperts. The information in this paper emerged from discussions and reports of the Xpert Network project, also presented in the paper. While both the best practices and the tools have overlap with those presented in general software engineering courses and literature, we have emphasized aspects in which the work of Xpert professionals differs. Furthermore, the paper discussed related projects that support CDI research and how to create synergy within this community.
ACKNOWLEDGMENTS
This work was supported in part by the National Science Foundation under Award No. OAC-1833846.
REFERENCES
[1] AddressSanitizer. 2021. AddressSanitizer. https://clang.llvm.org/docs/AddressSanitizer.html
[2] Jay Alameda. 2019. (Position Paper) Supporting Modern User Interfaces for High Performance Computing. https://sites.udel.edu/xpert-cdi/2019/08/06/ics-workshop/
[3] Amazon Web Services, Inc. 2020. AWS Cloud9. https://aws.amazon.com/cloud9/
[4] Amy Apon, Stanley Ahalt, Vijay Dantuluri, Constantin Gurdgiev, Moez Limayem, Linh Ngo, and Michael Stealey. 2010. High Performance Computing Instrumentation and Research Productivity in U.S. Universities. Journal of Information Technology Impact 10, 2 (Sept. 2010).
[5] Arm. 2021. Arm DDT.
[6] Arm. 2021. Arm MAP. https://developer.arm.com/tools-and-software/server-and-hpc/debug-and-profile/arm-forge/arm-map
[7] Asana. 2020. Asana: Make more time for the work that matters most. https://asana.com/
[8] Atlassian. 2020. Jira.
[10] Atlassian. 2020. Bitbucket. https://bitbucket.org/product
[11] Adam Brazier. 2015. Best Practices for Software Development in the Research Environment.
[12] BSSw. 2020. The Better Scientific Software (BSSw). https://bssw.io/
[13] Travis CI. 2020. Travis CI: Test and Deploy with Confidence.
[14] CMake. 2020. Build with CMake. https://cmake.org/
[15] Codespaces. 2020. Codespaces. https://github.com/features/codespaces
[16] Asana. 2020. Asana kanban boards. https://asana.com/uses/kanban-boards
[17] CyVerse. 2020. CyVerse: Transforming Science Through Data-Driven Discovery. https://cyverse.org/
[18] C. Day. 2012. Nobel prizes for computational science [The Last Word]. Computing in Science & Engineering 14, 06 (Nov. 2012), 88. https://doi.org/10.1109/MCSE.2012.123
[19] Docker. 2020. Docker: Debug your app, not your environment.
[20] Doxygen.
[21] Dubois. 1999. Computing in Science & Engineering 1, 1 (1999), 7–11.
[22] ECP. 2020. HPCToolkit. http://hpctoolkit.org/
[23] ECSS. 2020. XSEDE Extended Collaborative Support Service.
[24] The Xpert Network, Workshop on Best Practices and Tools for Computational and Data-Intensive Research. Report P-29. University of Delaware. https://cpb-us-w2.wpmucdn.com/sites.udel.edu/dist/6/8980/files/2019/08/ICS-workshop-report-RE4.pdf
[25] Rudolf Eigenmann, Bob Sinkovits, Ian A. Cosden, Henry Neeman, and Shelley Knuth. 2019. Xpert Network BoF Session at PEARC19 Conference. https://sites.udel.edu/xpert-cdi/2019/08/06/xpert-network-bof-session-at-pearc19-conference/
[26] Carpentries Foundation. 2020. The Carpentries. https://carpentries.org/
[27] CSERD Foundation. 2020. The Computational Science Education Reference Desk (CSERD). http://shodor.org/cserd/
[28] Data Carpentry Foundation. 2020. Data Carpentry: Building communities teaching universal data literacy. https://datacarpentry.org/
[29] HPC Carpentry Foundation. 2020. HPC Carpentry: Teaching basic skills for high-performance computing. https://hpc-carpentry.github.io/
[30] HPC University Foundation. 2020. HPC University (HPCU).
[31] Software Carpentry. 2020. Software Carpentry: Teaching basic lab skills for research computing. https://software-carpentry.org/
[32] Science Gateways. 2016. Science gateways. https://sciencegateways.org/
[33] GhostDoc. 2020. GhostDoc. https://submain.com/ghostdoc/
[34] Git. 2020. Git. https://git-scm.com/
[35] GitHub. 2020. GitHub. https://github.com/
[36] GitHub. 2020. GitHub project boards. https://help.github.com/en/github/managing-your-work-on-github/managing-project-boards
[37] GitLab. 2020. GitLab. https://about.gitlab.com/
[38] Gitpod. 2020. Gitpod.
[39] GNU Make.
[40] GCC, the GNU Compiler Collection. https://gcc.gnu.org/
[41] M. A. Heroux. 2009. Trilinos project Home Page. http://trilinos.sandia.gov
[42] Michael A. Heroux. 2020. E4S Project: The Extreme-scale Scientific Software Stack. https://e4s-project.github.io/
[43] Michael A. Heroux and James M. Willenbring. 2009. Barely sufficient software engineering: 10 practices to improve your CSE software. IEEE, 15–21.
[44] IDEAS-ECP. 2020. IDEAS-ECP. https://ideas-productivity.org/ideas-ecp/
[45] Eclipse Foundation, Inc. 2020. Eclipse PTP.
[46] UK Software Sustainability Institute.
[47] Intel® Advisor.
[48] Intel oneAPI Toolkits.
[49] Javadoc.
[50] Jenkins: Build great things at any scale. https://jenkins.io/
[51] Julien Jorge. 2018. An overview of build systems. https://medium.com/@julienjorge/an-overview-of-build-systems-mostly-for-c-projects-ac9931494444
[52] Jupyter. 2020. Jupyter. https://jupyter.org/
[53] Katherine Riley, Argonne National Laboratory. 2020. Good Scientific Process Requires Software Engineering Practices. https://extremecomputingtraining.anl.gov/files/2016/08/Riley_935aug8_GoodScientific.pdf
[54] David B. Kirk and Wen-mei W. Hwu. 2013. Programming Massively Parallel Processors: A Hands-on Approach, Second Edition.
[55] Agile Software Development LLC. 2009. Agile Software Development Home Page.
[56] The LLVM Compiler Infrastructure. https://llvm.org/
[57] I. Barrass and M. Alexandrakis. 2019. Poster: Code review tools beyond pull requests. In RSEConUK 2019.
[58] Mercurial. 2020. Mercurial, distributed source control management tool.
[59] Naguib et al. 2010. Procedia Computer Science 1, 1 (2010), 1505–1509.
[60] Henry Neeman, Hussein M. Al-Azzawi, Dana Brunson, William Burke, Dirk Colbry, Jeff T. Falgout, James W. Ferguson, Sandra Gesing, Joshua Gyllinsky, Christopher S. Simmons, et al. 2019. Cultivating the Cyberinfrastructure Workforce via an Intermediate/Advanced Virtual Residency Workshop. In Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (Learning) (Chicago, IL, USA) (PEARC '19). Association for Computing Machinery, New York, NY, USA, Article 79, 8 pages. https://doi.org/10.1145/3332186.3332204
[61] Henry Neeman, Marisa Brazil, and Dana Brunson. 2019. (Position Paper) The Virtual Residency: A Training Program for Research Computing Facilitators. https://sites.udel.edu/xpert-cdi/2019/08/06/ics-workshop/
[62] NSF. 2020. National Science Foundation.
[63] Open OnDemand. 2020. Open OnDemand: Supercomputing. Seamlessly. Open, Interactive HPC Via the Web. https://openondemand.org/
[64] PGI. 2020. PGI Compilers & Tools.
[66] mpiP. http://mpip.sourceforge.net/
[67] CaRCC Project. 2019. Campus Research Computing Consortium. https://carcc.org/
[68] ECP project. 2020. Exascale Computing Project (ECP).
[69] Eclipse Che.
[70] GNU gprof. https://ftp.gnu.org/old-gnu/Manuals/gprof-2.9.1/html_mono/gprof.html
[71] Jupyter project. 2020. JupyterHub. https://jupyter.org/hub
[72] SGCI Project. 2015. Science Gateways Community Institute. https://sciencegateways.org/
[73] The GNU Project. [n.d.]. GDB: The GNU Project Debugger.
[76] ReFrame. https://reframe-hpc.readthedocs.io/en/stable/
[77] Rose. 2021. ROSE Compiler: Program Analysis and Transformation. http://rosecompiler.org/
[78] RStudio. 2020. RStudio. https://rstudio.com/
[79] Rudolf Eigenmann, Samuel Midkiff, and Milind Kulkarni. 2017. Cetus: A Source-to-Source Compiler Infrastructure for C Programs. https://engineering.purdue.edu/Cetus/
[80] Mohamed Sayeed, Hansang Bae, Yili Zheng, Brian Armstrong, Rudolf Eigenmann, and Faisal Saied. 2008. Measuring High-Performance Computing With Real Applications. IEEE Computation in Science and Engineering 10, 4 (2008), 60–69.
[81] SE4Science. 2020. Software Engineering for Computational Science (SE4Science). https://se4science.org/workshops/
[82] SEA. 2020. Software Engineering Assembly (SEA). http://sea.ucar.edu
[83] Sameer Shende and Allen D. Malony. 2019. (Position Paper) TAU Performance System: A performance evaluation tool for CDI researchers. https://sites.udel.edu/xpert-cdi/2019/08/06/ics-workshop/
[84] Sylabs. 2020. Singularity: Simple, Fast, Secure. https://sylabs.io/
[85] TAU. 2020. TAU: Tuning and Analysis Utilities.
[86] Trello. 2020. Trello lets you work more collaboratively and get more done. https://trello.com/
[87] UCAR. 2020. University Corporation for Atmospheric Research (UCAR).
[88] Valgrind. Valgrind. https://valgrind.org/docs/manual/quick-start.html
[89] G. von Laszewski, F. Wang, G. C. Fox, D. L. Hart, T. R. Furlani, R. L. DeLeon, and S. M. Gallo. 2015. Peer Comparison of XSEDE and NCAR Publication Data. 531–532. https://doi.org/10.1109/CLUSTER.2015.98
[90] Greg Wilson. 2006. Software carpentry: getting scientists to write better code by making them more productive. Computing in Science & Engineering 8, 6 (2006), 66–69.
[91] Greg Wilson. 2013. Software Carpentry: Lessons Learned. arXiv:1307.5448 [cs.GL]
[92] Greg Wilson, Dhavide A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven H. D. Haddock, Kathryn D. Huff, Ian M. Mitchell, Mark D. Plumbley, et al. 2014. Best practices for scientific computing. PLoS Biology 12, 1 (2014).
[93] Greg Wilson, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K. Teal. 2017. Good enough practices in scientific computing. PLoS Computational Biology 13, 6 (2017).
[94] WSSSPE. 2020. Working Towards Sustainable Software Science: Practice and Experiences (WSSSPE). http://wssspe.researchcomputing.org.uk/
[95] XDMoD. 2020. XDMoD Tool. https://xdmod.osc.edu/
[96] XSEDE. 2020. Campus Champions.
[97] XSEDE. XSEDE – Extreme Science and Engineering Discovery Environment.