Building a Sustainable Structure for Research Software Engineering Activities
Jeremy Cohen∗, Daniel S. Katz†, Michelle Barker‡, Robert Haines§ and Neil Chue Hong¶
∗ Department of Computing, Imperial College London, London, UK
† NCSA, CS, ECE, iSchool, University of Illinois Urbana-Champaign, Urbana, IL, USA
‡ Australian Research Data Commons, Cairns, Australia
§ Research IT, University of Manchester, Manchester, UK
¶ Software Sustainability Institute, EPCC, University of Edinburgh, Edinburgh, UK
A shortened two-page version of this paper was published by, and is ©2018, IEEE, as part of the 9th International Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE6.1, http://wssspe.researchcomputing.org.uk/wssspe6-1/), available at https://doi.org/10.1109/eScience.2018.00015.
Abstract
The profile of research software engineering has been greatly enhanced by developments at institutions around the world to form groups and communities that can support effective, sustainable development of research software. We observe, however, that there is still a long way to go to build a clear understanding about what approaches provide the best support for research software developers in different contexts, and how such understanding can be used to suggest more formal structures, models or frameworks that can help to further support the growth of research software engineering. This paper sets out some preliminary thoughts and proposes an initial high-level model based on discussions between the authors around the concept of a set of pillars representing key activities and processes that form the core structure of a successful research software engineering offering.
Index Terms: research software, software sustainability, reproducible research, research software engineering
I. INTRODUCTION
While researchers, including academic faculty and staff, postdocs, students, and those working in industry, have been building software to support their research for many decades, their primary goal is generally their research outputs, not the software. There has, however, been significant growth in the number of individuals who are interested in the development of the software itself and the process of working with researchers to help design and build quality, sustainable software. This has led to the fairly recent emergence of the concept of Research Software Engineering. Developed out of discussions that took place at the UK Software Sustainability Institute’s [1] Collaborations Workshop in 2012 [2], [3], the concept builds on the fact that developing research software requires increasingly advanced skill-sets that must be built up over time by individuals who specialise in the process of writing code and the application of best practices to ensure its reliability and sustainability. Jiménez et al. [4] provide an example of four such best practices. While researchers can still teach themselves to code and build up a base of knowledge that enables them, for example, to start analysing their research data or developing user interfaces to support their users, advances in computing hardware and infrastructure, and vast increases in data volumes, raise a number of significant challenges in building research software. While the capabilities of modern computing hardware and new models of computation, such as remote cloud computing infrastructure or GPUs and FPGAs, present significant opportunities to researchers, they also present significant technical barriers.
To take advantage of improvements in technology, and the speed of change in the field, developers need a much more advanced set of knowledge, which takes longer to build and maintain, in order to ensure that they can support research requirements. This is especially true in the case of developers working independently.
Software teams sharing knowledge across a group of developers may offer a more manageable way to sustain expertise and to better support specialisation in particular areas or techniques. The additional technical complexity of larger projects and the time required to gain the necessary technical expertise mean that developing some code alongside one’s research is becoming increasingly difficult to do well for all but the smallest projects. This has led to a new class of individuals, Research Software Engineers (RSEs), who generally have a research background but have chosen to focus on the software development-related aspects of research. In addition to their knowledge of the research lifecycle, RSEs apply professional software engineering practices in a manner suited to the research environment, following best practices with a view to developing better quality, more sustainable and maintainable research software. The discussions that led to the concept of RSEs emerging observed the special nature of the roles that these individuals hold, but also their challenges in trying to find approaches for career structures and career progression that could make these roles sustainable [5].
Ensuring that these structures develop and that there is sustainability for RSEs is still very much work-in-progress. Nonetheless, the profile of research software engineering has been greatly enhanced by the activities occurring at institutions around the world to develop groups and communities that can support more effective, sustainable development of research software. This process was initiated in the UK with the establishment of the first research software engineering groups in various academic institutions, providing a central team of RSEs to undertake software development work for researchers within their local institution.
The approach of building institutional research software engineering groups can offer a team structure and scope for career progression, something that is much more challenging for the lone “researcher-developer” who is based within, or leading, a research group. These developments have been followed by the emergence of formal initiatives in the UK to champion and advocate for research software engineering: the UK RSE Association [6] in 2013, the first EPSRC RSE Fellowships in 2015 and the first RSE Conference in 2016. This has led to the global rise of research software engineering activities, including the first international workshop for leaders, from across the world, of such research software engineering groups and communities (e.g., NL-RSE [7] in the Netherlands, de-RSE [8] in Germany and RSE-AU [9] in Australia), which took place in London in 2018 [10]. This was aimed at people running (or setting up) RSE groups and communities around the world, with participants from Europe, North and South America, Africa and Australasia attending to share experiences and start collaborations. Representatives of the Moore-Sloan Data Science Environments, which are involved in the establishment of new research career structures for RSEs in the USA [11], also participated.
The UK Research Software Engineer Network (RSEN)’s 2017 State of the Nation Report [12] provides a background to the development of research software engineering, as well as a range of statistics about the RSE role and community.
To gain a better understanding about RSEs and answers to questions such as what they do, how they do it and how they view their role, a number of surveys of RSEs have now been carried out in various countries [13], for example, in Germany [14], Australia and New Zealand [15], and the US [16].
As research software engineering has grown as a concept, it has become clear that there are a number of activities that are common between different offerings at different institutions. It is also clear to us that research software engineering is, or at least should be, about a lot more than individuals writing research software. In this paper we set out the basis for a model that we are currently developing that defines a set of “pillars” which encapsulate the core activities that we feel are crucial in ensuring sustainable, long-term support for effective development of software for research. This paper is intended to stimulate further discussion around this area and to support the development of a further publication detailing the next iteration of work defining a complete multi-pillar model. In Section II we highlight the core activities that we see as underlying comprehensive research software engineering support, while Section III shows how these are brought together in our initial sustainable RSE framework model. Section IV presents initial conclusions and suggests future work.
II. CORE ELEMENTS OF RESEARCH SOFTWARE ENGINEERING
Based on our observations of the way research software engineering has developed over the past few years, we see a series of activities that we believe contribute to an institution (or group of institutions) being able to provide an RSE offering to its research community that is sustainable and manageable over the long-term. These activities can be combined into groups covering specific areas which we define as the pillars of research software engineering: the key structures around which a successful research software development capability can be built. These pillars are:
• software engineering,
• community,
• training, and
• policy.
Furthermore, it is our assertion that to be able to offer comprehensive research software support, activities from each of these pillars must be provided. We now provide an overview of the pillars, as highlighted in Figure 1:
A. Software Engineering
Software engineering encompasses the process of building software and the people who build it. The software aspects of research software development can involve any of a wide range of common languages, as well as sometimes including more obscure research-focused languages or Domain Specific Languages (DSLs). The process of building software in a research environment is, however, somewhat different to that which professional software engineers are likely to be familiar with, and can be incompatible with standard software development models or processes. For example, software in scientific research is generally developed to solve a specific research challenge, meaning that it is often built without thought for its longer-term, wider use or maintenance, as highlighted in the work of Morris and Segal [17]. Building research software also generally requires a lot more interaction with the clients (the researchers), requiring the developer(s) to have a much greater understanding of the task being undertaken and to be able to work productively with the researcher or research team to understand what they are trying to achieve and take an active role in developing the approach used to find a solution. It is for reasons such as these that the people who build research software are so vital to this element of the model. While they have the expertise to build software, they also have an understanding of the research community, the research lifecycle and the process of working effectively with researchers. This gives research software developers a unique and valuable skill-set.
[Figure: Research Software Engineering supported by four pillars labelled Software Engineering, Community, Policy and Training.]
Fig. 1. The four supporting pillars of Research Software Engineering in our preliminary model.
B. Community
Communities provide a forum through which RSEs can meet, network, and share ideas and technical knowledge. The value of this should not be underestimated, especially when working as part of a small development team or research group, or working independently as the only developer within a project or group of non-computational researchers. Even when working in a larger team, the diversity of thoughts, ideas, and perspectives that comes from bringing together individuals from different disciplines and technical backgrounds can be enormously helpful in developing new ideas and approaches to solving technical problems.
Community is a separate entity from software development within our model but it should support the better development of software and provide a means to develop collaborations, and to access support and advice on technical questions or ideas. In turn, the software development aspects of research software engineering should support the community, providing technical challenges for discussion, interesting projects and use cases that can be presented in technical talks, and forums for meeting other community members who can offer advice and support to help all members of the community achieve the best they can from their work.
C. Training
Training is vital in ensuring the long-term maintenance and growth of skills and in keeping up to date with the latest developments in the research software domain. It also supports the development of skills amongst researchers who will provide the base for the next generation of RSEs. While change happens rapidly in all technical arenas, this is especially noticeable in research. New technologies are emerging at a substantial rate, and new libraries and tools emerging from the open source community can gain rapid adoption.
We consider training to be a pillar in its own right because it encompasses a number of activities that are separate from the community element of research software engineering. It also has clear links to both the software and community elements of the model. Training can exist within a community context but it can also be separate and attract a different group of people to those who form an RSE community. Training can cover the development of basic skills such as those provided through the Software and Data Carpentry movements [18], [19]. It can also include specialist, domain-specific training such as that provided for High-Performance Computing topics through the PRACE community [20]. More specialised local training may be provided via a local RSE community and also through the running of technical seminars as part of a community’s events programme, providing the link between the community and training pillars. The link to software development comes through the technical benefits and associated improvements in productivity, software quality, and robustness that can be expected by enhancing the skills of researchers and software developers.
D. Policy
Policy advances are also critical to enabling the broader system changes required to increase understanding of software’s crucial role as enabling infrastructure for research, and to promote software as a principal component of cyberinfrastructure strategic planning. Cultural change in the research environment within which research software engineering work occurs is needed, at all levels, from departmental and institutional to international.
In addition to simply promoting research software engineering processes as valuable to researchers and Principal Investigators, there are a number of much more significant areas where achievement of more substantial change amongst institutions, funders, and the wider research community could offer important (and arguably essential) changes to the way that certain aspects of their processes are currently handled. These include:
• Recognising software outputs as first-order research assets and providing means to assign credit to them
• Providing better structured career pathways to help sustain research software engineering roles and to improve opportunities for career progression, alongside coherent approaches to training
• Incorporating RSEs within funding guidelines
• Providing researcher access to RSEs, which can include providing physical spaces for collaboration (see [21]).
These types of large-scale cultural changes can be brought about in a more structured manner through advocacy to key decision-makers by RSE community leaders, rather than in an ad hoc manner by individuals or small groups of developers.
Measurement is another key element of system change, providing evidence to decision-makers of the benefits of change, and analysing priority areas. The survey work on RSEs described above is already providing valuable confirmation of the key role of RSEs, and studies on the critical role of software in research add to the argument, such as the work of Nangia and Katz [22].
III. A PRELIMINARY MODEL FOR SUSTAINABLE RESEARCH SOFTWARE
This preliminary model is a work-in-progress and analysis is still ongoing to identify suitable solutions to a number of questions. The model described here is therefore intended to stimulate discussion and seek feedback with a view to developing a further paper presenting a more detailed, refined and more concrete description of the model.
The basis for the model is the pillars described in Section II. It is believed that these pillars:
1) encapsulate the wealth of processes, topics and activities that make up research software engineering, and
2) represent, in their naming, the core, top-level concepts that individuals can identify with as being of utmost importance in enabling research software engineering.
Other aspects that are required to complete the structure of the model are:
• Defining the processes that link or bring together the concepts represented by the pillars. This requires an understanding of the synergies between activities that fit within different pillars. For example, can links between training and communities offer greater combined benefits than just offering opportunities in one of the two areas?
• Identifying the profiles of the individuals that each pillar relates to or targets, e.g. where do researchers, academics, software engineers, RSEs, research data specialists, research managers, etc. fit within the model? How significant is this in ensuring its success?
• Identifying how generally applicable the current pillar definitions are. Do they apply differently in the context of individuals or groups? How robust are they in the light of possible structural changes in RSE communities? One possible way to look at this is in the context of the different levels of individual/team, research software and the wider field of research software that URSSI considers under their issues in the figure “Key factors for URSSI conceptualization” [23].
A. Outstanding issues: questions and queries
The points highlighted above need to be addressed in order to complete the initial structure of our model. However, this initial structure will then need refining. There are several more general questions that we feel will require further investigation/discussion as part of this model refinement process.
• Do the four pillars highlighted in Section II represent the complete picture? Are there any further pillars that should be defined?
• Are any of the topics covered by existing pillars of sufficient importance/significance that they should be promoted to form separate pillars in their own right?
• Is the naming of the pillars correct? So far, these have been determined amongst a fairly small group of individuals with extensive experience in the RSE community. However, others, either from outside the RSE community or from different scientific backgrounds, may feel the pillars could be named differently, perhaps to clarify their meanings, or to provide a different slant on the way they are viewed from different perspectives.
• Do groups or individuals relate differently to the pillars? Can we consider all individuals as being the same in the context of this question or are there differences in the way that individuals of different profiles, e.g. researchers or RSEs, identify with the pillars?
We hope to have the opportunity to investigate some of these issues with the wider research software community to gather thoughts, feedback and suggestions that can help to test, refine, and complete the preliminary model set out here.
IV. CONCLUSIONS AND FUTURE WORK
Research software engineering has come a long way in the past six or so years. Nonetheless, it is still in its infancy as a discipline and while many different groups have emerged and different approaches have been, or are being, tested, we have observed that there is still a lack of significant formal structures or models that can be used to explain how and why RSE works in different contexts and, most importantly, how it can be effectively sustained and grown to offer its benefits to a much wider range of researchers.
In this paper we have described work that is currently in progress to define a model that can offer one formalisation of a structure for research software engineering that brings together a full set of activities that we believe are necessary to provide a sustainable offering. Since the model is still in development, this paper seeks to gather feedback and thoughts on the perceived correctness of our proposed model and suggestions on how it might be improved. To this end, we have highlighted some of the specific questions that remain in developing the next iteration of the model and some more general points where we feel additional input is important.
Going forward, we intend to prepare a more detailed publication that defines our next iteration of the model, addressing the various issues raised here. As part of this ongoing work, we want to gather thoughts and feedback as part of a wider discussion on the ideas presented. There are two ways in which you can engage with this process: you can email the lead author, Jeremy Cohen, to express interest, or you can submit thoughts or questions to the rse-models repository [24] that has been set up to capture and collect such information.
ACKNOWLEDGEMENTS
JC acknowledges the support of the UK Engineering and Physical Sciences Research Council (EPSRC) through grant EP/R025460/1.
RH acknowledges the support of the University of Manchester for his UK and International RSE activities.
NCH acknowledges the support of the UK Engineering and Physical Sciences Research Council (EPSRC), Economic and Social Research Council (ESRC) and Biotechnology and Biological Sciences Research Council (BBSRC) through grant EP/N006410/1 for the Software Sustainability Institute.
REFERENCES
[3] […] Digital Research 2012, Oxford, UK, Sep. 2012, Presentation. [Online]. Available: http://purl.org/net/epubs/work/63787
[4] R. C. Jiménez et al., “Four simple recommendations to encourage best practices in research software,” F1000Research […]
[14] […] FORCE2017, Berlin, Germany, Oct. 2017, Presentation. [Online]. Available: https://presentations.copernicus.org/FORCE2017-30 presentation.pdf
[15] M. Sinha, “RSEs World Over: Now in Australia and New Zealand,” in RSE Conference 2017: International RSE Community Session, Manchester, UK, Sep. 2017, Presentation. [Online]. Available: https://drive.google.com/file/d/0BwK34Sv9sm73TUo3cjNZaFNWaTQ/view
[16] D. S. Katz, S. Gesing, O. Philippe, and S. Hettrick. (2018, Jun.) Results from a US survey about Research Software Engineers. http://urssi.us/blog/2018/06/21/results-from-a-us-survey-about-research-software-engineers/. Accessed 2018-07-04.
[17] C. Morris and J. Segal, “Some challenges facing scientific software developers: The case of molecular biology,” in […] 2017 IEEE 13th International Conference on e-Science (e-Science) […]