A continuous integration and web framework in support of the ATLAS Publication Process

Abstract

The ATLAS collaboration defines methods, establishes procedures, and organises advisory groups to manage the publication processes of scientific papers, conference papers, and public notes. All stages are managed through web systems, computing programs, and tools that are designed and developed by the collaboration. A framework called FENCE is integrated into the CERN GitLab software repository to automatically configure workspaces where each analysis can be documented by the analysis team and managed by the relevant coordinators. Continuous integration is used to guide the writers in applying consistent and correct formatting when preparing papers to be submitted to scientific journals. Additional software assures the correctness of other aspects of each paper, such as the lists of collaboration authors, funding agencies, and foundations. The framework and the workflow therein provide automatic and easy support to the researchers and facilitate each phase of the publication process, allowing authors to focus on the article contents. The framework and its integration with up-to-date and efficient tools have consequently provided a more professional and efficient automated work environment to the whole collaboration.

EUROPEAN ORGANISATION FOR NUCLEAR RESEARCH (CERN)

CERN-OPEN-2020-007

15th May 2020

The ATLAS Publication Process Supported by Continuous Integration and Web Framework

Juan Pedro Araque Espinosa (f), Gabriel Baldi Levcovitz (a), Riccardo-Maria Bianchi (i), Ian Brock (d), Tancredi Carli (b), Nuno Filipe Castro (f,g), Alessandra Ciocio (h), Maurizio Colautti (e), Ana Carolina Da Silva Menezes (a), Gabriel De Oliveira da Fonseca (a), Leandro Domingues Macedo Alves (e), Andreas Hoecker (b), Bruno Lange Ramos (a), Gabriela Lemos Lúcidi Pinhão (a,f), Carmen Maidantchik (a), Fairouz Malek (c), Robert McPherson (j), Gianluca Picco (e), Marcelo Texeira Dos Santos (a)

(a) Universidade Federal do Rio De Janeiro COPPE/EE/IF, Rio de Janeiro, Brazil
(b) CERN
(c) LPSC, Université Grenoble Alpes, CNRS/IN2P3, Grenoble, France
(d) Physikalisches Institut, Universität Bonn, Bonn, Germany
(e) Dipartimento Politecnico di Ingegneria e Architettura, Università di Udine, Udine, Italy
(f) Laboratório de Instrumentação e Física Experimental de Partículas, Lisbon, Portugal
(g) Departamento de Física, Escola de Ciências, Universidade do Minho, Braga, Portugal
(h) Lawrence Berkeley National Laboratory and University of California, Berkeley, USA
(i) Department of Physics and Astronomy, University of Pittsburgh, Pittsburgh PA, USA
(j) Department of Physics and Astronomy, University of Victoria, Victoria BC, Canada

The ATLAS Collaboration defines methods, establishes procedures, and organises advisory groups to manage the publication processes of scientific papers, conference papers, and public notes. All stages are managed through web systems, computing programs, and tools that are designed and developed by the Collaboration. The Phase 0 system was implemented using the FENCE framework and is integrated into the CERN GitLab software repository, to automatically configure workspaces where the analysis can be documented and used by the analysis team and managed by the conveners. Continuous integration is used to guide the writers in applying accurate formatting and valid statements when preparing papers to be submitted to scientific journals. Additional software assures the correctness of other aspects such as lists of collaboration authors, funding agencies, and foundations. The ATLAS Physics and Committees Office provides support to the researchers and facilitates each phase of the publication process, allowing authors to focus on the article's contents that describe the results of the ATLAS experiment.

1 Introduction

The ATLAS Physics and Committees Office (PO) is one of the ATLAS Collaboration's [1] executive committees. It is constituted by physicists and engineers performing tasks connected to the continuous support of committees and groups including the ATLAS Management, the Physics Coordinators, the Publication Committee, analysis group conveners, the Authorship Committee, the Speakers Committee, and many others. The PO also provides assistance to any member of the ATLAS collaboration, for example by facilitating membership, authorship, and paper submission to the arXiv and journals, and by reviewing talks and posters for national and regional meetings.

The PO supports the development of several tools including those used to manage physics analyses, prepare and submit papers, distribute detector performance documents, and track conference proceedings. It uses web-based systems to implement the metadata connected with analyses, version control for editing documents, and author lists. PO members are available to guide users in understanding the tools. The PO also assists with other daily tasks to lower the load on each member of the collaboration.

The ATLAS Collaboration has a dedicated organisational structure for work on detector maintenance and operation, data analysis, and scientific publication and outreach. Collaborative tools are needed to provide efficient communication among collaborators and straightforward interaction with the journals, the institutions, and the funding agencies.

This report is focused on the infrastructure for managing analyses and papers, especially its most recent developments, which were launched in Fall 2017. Due to the phasing out of the SVN system [2], a new system was built using the FENCE [3] framework, described in Section 3. This is now used to handle any analysis or document type, for internal use or for a large publication, as is described in Sections 2 and 4. The framework is used not only for ATLAS document handling but also for the organization of information about other entities including members, institutes, appointments, equipment, talks, and conferences. It is also used by the ALICE experiment to organize information on members, appointments, funding agencies, institutes, the author list, and shift bookings. The LHCb experiment uses the framework for members, appointments, and institutes management. ATLAS has very specific needs for each task, requiring integration with a single database. For this reason a flexible custom solution had to be developed.

The new system is based on Git and the associated CERN GitLab code repository hosting platform. Development of a special FENCE-GitLab integration has been necessary, as is detailed in Section 5. The ATLAS GitLab area for editing the documents and submitting the papers to the journals, PO-GitLab, is described in Section 6. A description of the main tools used to support the collaboration author list and the acknowledgements of funding agencies and foundations is given in Section 7. A more general description of the way the metadata are managed is presented in Section 8.

The ATLAS experiment supports a wide physics programme to explore the fundamental nature of matter. To do so, it makes use of the Large Hadron Collider (LHC), which collides protons at almost the speed of light and a centre-of-mass energy of 13 TeV. To carry out such a physics programme, physicists need software and graphical tools to analyse the data and compare them to theoretical models.

ATLAS is organised into several Physics (PHY) and Combined Performance (CP) working groups and subgroups. These groups are coordinated by conveners appointed by the collaboration for typically two years. Example names of PHY and CP groups include Top Quark (TOPQ), Standard Model (STDM), B-physics (BPHY), Higgs (HIGG), Electron/Gamma (EGAM), and Jet and EtMiss (JETM). Studies of system detectors (SYS) and activities such as software (SOFT) and data preparation (DAPR) are also organised hierarchically with subgroups and conveners.

Once an analysis is finished or an aspect of the detector performance has been studied in detail, some members of the analysis team prepare a publication. These writers are known as the editors. In fundamental research, as is the case with the research conducted at CERN, the publication of the results is a duty and is the usual way to show the results publicly and to report outcomes to the taxpayers and funding institutions.

ATLAS produces six different types of documents:

• PAPER: general publications in refereed journals, based on collision data analyses and detector projects;
• PUB notes: public documents classified as a note; they sometimes use only simulated data;
• PROC and CONF notes: conference proceedings and notes containing preliminary results, respectively, which are shown at conferences;
• INT: internal notes or technical documents;
• PLOT: plots that can be used along with the above-mentioned documents.

All ATLAS analyses are discussed and presented in the relevant working groups, which have the responsibility, together with the subgroups, to provide guidance, help, and/or resources to the analyses both in the early stages of an analysis and during its development. The working groups should also develop a coherent and realistic plan for the release of the results for a conference and/or journal publication. This is a necessary step before any paper draft can be planned or circulated. This procedure and the related steps (or phases) are described in an ATLAS internal document and are summarized below.

• For an INT, CONF, or PUB note, the procedure has two phases: Phase 0 and Phase 1. For a paper that is sent to a peer-reviewed journal, the procedure has four phases, which include, in addition to the two above, Phase 2 and Submission.

• An analysis or a document is started at Phase 0 of the Analysis FENCE interface (also called the Analysis FENCE page). The Analysis Team (AT) starts their analysis and begins writing drafts and supporting documents. The type of document could be PAPER, CONF, or PUB. Some important settings are established at the start of an Analysis, including the constitution of the AT, the appointments of the group and sub-group conveners in charge of oversight of the analysis, and the constitution of an Editorial Board (EdBoard). From the start of Phase 0, the team is assigned a dedicated GitLab space, a repository, with which to edit their documents. A GitLab repository with a skeleton INT note is created by default at Phase 0.

• The EdBoard reviews the complete analysis and ensures that any documentation or paper drafts are prepared according to ATLAS policies. Once it is satisfied, it signs off on the draft PAPER or CONF note before its distribution to the ATLAS collaboration for review. The EdBoard should verify that the analysis is worth publishing in the proposed form and consult with the Publication Committee (PubComm) chair if there are doubts. It should also establish with the editors and conveners whether the paper should be a letter or an article, and propose a journal. These steps and the validation workflow are performed during Phase 1 and Phase 2, related respectively to the first and second circulation of the draft document to the collaboration for their comments. During the circulation periods, authors can read and comment on the paper draft.

• The PubComm chair has the responsibility to assess the quality of the paper and ensure that the ATLAS guidelines and policies are followed. A Physics Approval Meeting is held after the first circulation, followed by a Physics Closure Meeting after the second circulation. After a sign-off of the revised draft following second circulation by the EdBoard, the draft goes to the Chair of the PubComm for a final sign-off.

• The ATLAS Spokesperson (SP) is ultimately responsible for the scientific quality of the results from the ATLAS Collaboration and makes a final review of each paper before the Submission. The final draft is signed off by the SP or his/her delegate.

• When the SP has signed off, the validation workflow at Phase 2 is finished. A message to the Physics Office Publications team (PO-Pub) is generated to inform them of a new document to submit. PO-Pub officers then proceed with the submission to the arXiv and the peer-reviewed journal. They are responsible for communication with the journal during all the steps (referee reports and proofs) through a dedicated Submission workflow. The submission is completed once the document is published online. The journal references are implemented at the last step of the workflow, which closes the procedure and makes available the references of the publication on the arXiv, public web pages, and the INSPIRE-HEP database [4].
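The phase sequences listed above can be summarised as a small lookup table. The following is only a sketch: the document types and phase names are taken from the text, while the constant `PHASES_BY_TYPE` and the function `nextPhase` are hypothetical names that do not come from FENCE.

```php
<?php
// Hypothetical lookup table: document type => ordered list of phases.
// Type and phase names follow the text; the layout is illustrative only.
const PHASES_BY_TYPE = [
    'INT'   => ['Phase 0', 'Phase 1'],
    'CONF'  => ['Phase 0', 'Phase 1'],
    'PUB'   => ['Phase 0', 'Phase 1'],
    'PAPER' => ['Phase 0', 'Phase 1', 'Phase 2', 'Submission'],
];

// Return the phase that follows $current for a document type,
// or null when $current is already the last phase.
function nextPhase(string $type, string $current): ?string
{
    if (!array_key_exists($type, PHASES_BY_TYPE)) {
        throw new InvalidArgumentException("Unknown document type: $type");
    }
    $phases = PHASES_BY_TYPE[$type];
    $index = array_search($current, $phases, true);
    if ($index === false) {
        throw new InvalidArgumentException("$type has no phase '$current'");
    }
    return $phases[$index + 1] ?? null;
}

echo nextPhase('PAPER', 'Phase 1'), PHP_EOL; // prints "Phase 2"
var_dump(nextPhase('CONF', 'Phase 1'));      // prints NULL
```

A table of this kind keeps the difference between notes and papers in data rather than in branching code, which is in the spirit of the configuration-driven approach described later in the report.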

CONF and PUB notes use only the Phase 1 workflow. The steps and validation workflow are implemented in systems developed using the FENCE framework, which is described in Section 3. The related web-based systems encompass all of the Phase 0, 1, 2, and Submission steps and are described in Section 4 with a focus on Phase 0. If necessary, at Phase 0, editors may request the creation of dedicated GitLab repositories appropriately configured through the FENCE-GitLab integration, as is described in Section 5. The metadata entered in any of the phases are exported to web sites to display the necessary information, including the Public Results pages that are explained in Section 8. Some of the metadata are also used internally by the collaboration to monitor the journal submission process or related activities. The GitLab Continuous Integration (CI) tools, which are explained in Section 6, allow validation of the document drafts and preparation of the appropriate ready-to-go tarball, a compressed set of files containing the full LaTeX [5] resources and files for the submission to the peer-reviewed journals.

For the category PAPER, a longer process is carried out by the PO-Pub officers. They check the author list and the acknowledgements. Author lists and acknowledgements are both handled and generated through the FENCE framework, described in Section 3, and their production is described in detail in Section 7.1. Before the final publication, and after the refereed review and acceptance by the journal, proofs are sent to the collaboration for a last check. While the editors proofread the content of the paper within a short period of time, usually two days, the PO-Pub officers check whether the authors and their affiliations have been appropriately handled by the journal, through comparison to the original files sent to them. This check is performed automatically using a tool called the Proof Checker, which is described in Section 7.3.

Figure 1: The ATLAS web-based FENCE systems

FENCE is an object-oriented PHP [6] framework designed for the development of web applications. It encompasses the concepts of encapsulation, data abstraction, polymorphism, and inheritance. FENCE uses an ORACLE database (DB) to store the data fetched and displayed in its interfaces. Although ORACLE is the default DB management system used, with some development effort one can instead use other relational database services such as MySQL and Microsoft SQL Server.

A class can be defined as a template that describes the behaviour that objects of its type support. FENCE assembles classes to build applications by making extensive use of configuration files, which are loaded into the engine at each request. It then generates the HTML response for the user's browser. The classes can be inherited by the systems that make use of the framework, and therefore the code can be reused, with similar features implemented from the predefined software components. As a consequence, the development process is accelerated and the maintenance cost is reduced.

The FENCE software development process encompasses software engineering methods such as requirements analysis, architecture, design, testing, deployment, and maintenance in order to guarantee the quality of the software. Requirements are gathered and documented prior to the solution design and, in this way, developers are able to propose broader solutions that can benefit the whole project. After any implementation, tests are performed to assure software correctness, robustness, extensibility, and re-usability.

Figure 1 presents the fifteen ATLAS web-based systems currently in production. These were developed using the FENCE framework, which facilitates their maintenance and enhancement. They can be divided into three categories: people, publications, and equipment. The people-related ones have features for managing personal information of the ATLAS members, including their contracts, appointments, affiliations, nominations, conferences, theses, and research activities. The systems related to publications automate the process of producing papers, conference and public notes, and weekly performance plots from collision data, for review. Those related to the equipment handle information about system detectors' design and interconnection.

3.1 FENCE main classes

The FENCE framework is composed of a library of helper classes that are extensible program-code templates for creating objects. Any new class can be coded and added to the framework, widening its scope, and can then be reused in different systems. One example is the Search class that provides methods to create search interfaces that allow data filtering through predefined search attributes. The SuperSearch class offers an advanced search interface, where the user can build logic queries with AND and OR operators. The inputs that are entered into a form can easily be added using classes such as TextInput, DateInput, and MemberInput, which provides a selection box with the list of all members of an experiment. The most important classes, developed to support the ATLAS publication process, are described in the following sections.

3.1.1 Workflow class

The FENCE Workflow represents any process involving states and actions triggered by a change from one state to another. It was used to implement the web system that supports the ATLAS publication process, which is organised in phases. Each phase is divided into several steps separated by actions. Each step can activate a number of tasks including the recording of metadata into the ATLAS database, triggering of an E-group creation, activating an update on GitLab [7], and sending automatic emails.

The Workflow class was developed based on the concept of Directed Cyclic Graphs (DCG) that encompasses the relation between objects. Objects are called nodes and the relations between them are called edges, implying a directional flow. To represent this concept, some classes were created. The abstract Graph class, whose corresponding code can be found in Appendix A.1, has methods that allow the addition and deletion of nodes and edges. The class that implements Graph is called MapperGraph. It stores nodes and edges inside a PHP data structure called SplObjectStorage that, for this implementation, can better manage objects than associative arrays. The use of this data structure allowed the development of very simple methods to retrieve neighbour nodes, or to retrieve an edge given an origin and a target node, which means retrieving a directional edge.

The Node class defines methods to set and get data related to one node. The Edge class defines similar methods, but related to an edge. An example of data that can be added to a node is an instance of the Action class, which has methods to set and get function callbacks, define their arguments, and access their outputs. More details about the Action class implementation can be found in Appendix A.2.

The behaviour of the Workflow is controlled by a JSON file, following the FENCE pattern described in Section 3.3. This file defines a workflow's steps, their order, and the actions that can be triggered at a given step. The Workflow class uses the MapperGraph, Node, Edge, and Action classes to build a graph and its elements.
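To make the graph idea more concrete, the following is a minimal sketch of a directed graph stored in nested SplObjectStorage instances. It is not the MapperGraph code from Appendix A.1: the class names `SketchNode` and `SketchGraph` and all method names are hypothetical, and, as a simplifying assumption, the callback is attached to the edge rather than to a node.

```php
<?php
// Illustrative stand-ins only; these are not the FENCE classes.
final class SketchNode
{
    public $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }
}

final class SketchGraph
{
    // Outer storage: origin node => inner storage (target node => edge data).
    private $edges;

    public function __construct()
    {
        $this->edges = new SplObjectStorage();
    }

    public function addNode(SketchNode $node): void
    {
        if (!$this->edges->offsetExists($node)) {
            $this->edges->offsetSet($node, new SplObjectStorage());
        }
    }

    // Add a directional edge; $data may hold anything, e.g. a callback.
    public function addEdge(SketchNode $from, SketchNode $to, $data = null): void
    {
        $this->addNode($from);
        $this->addNode($to);
        $this->edges->offsetGet($from)->offsetSet($to, $data);
    }

    // All nodes reachable from $from by a single edge.
    public function neighbours(SketchNode $from): array
    {
        $result = [];
        if ($this->edges->offsetExists($from)) {
            foreach ($this->edges->offsetGet($from) as $target) {
                $result[] = $target;
            }
        }
        return $result;
    }

    // Data stored on the edge from $from to $to, or null if there is none.
    public function edge(SketchNode $from, SketchNode $to)
    {
        if (!$this->edges->offsetExists($from)) {
            return null;
        }
        $targets = $this->edges->offsetGet($from);
        return $targets->offsetExists($to) ? $targets->offsetGet($to) : null;
    }
}

$phase0 = new SketchNode('Phase 0');
$phase1 = new SketchNode('Phase 1');
$graph = new SketchGraph();
$graph->addEdge($phase0, $phase1, function () {
    return 'notify conveners';
});
$action = $graph->edge($phase0, $phase1);
echo $graph->neighbours($phase0)[0]->name, ': ', $action(), PHP_EOL;
// prints "Phase 1: notify conveners"
```

Because SplObjectStorage is keyed by object identity, the nodes themselves serve as keys, which an associative array cannot do directly; this is one plausible reading of why the text prefers it.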

3.1.2 Messenger class

The Messenger class is used by the Workflow to send automatic emails and to allow users to edit email templates. The JSON file used by the Workflow defines email template names to be triggered by an action. These templates and their variables are stored in the database in two JSON files. The first one contains all the templates with variables to be substituted, and the second contains the variables' identifiers and the methods used to substitute them into the templates before sending the email. Using another class, DBJReader, the Messenger can read these JSON files from the database. It can then either get the templates and show them in the interface, so the users can edit them, or parse the variables and send an email. In the first case, the changes applied in the templates are saved in the database, but this time using the DBJWriter. In the second case, Messenger will substitute all the variables in the template and use the Mailer, designed to send automatic emails, to trigger the email to the correct recipient. A summary of this infrastructure is illustrated in Figure 2.
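The two-file arrangement (one JSON document with templates, one mapping variable identifiers to methods) can be sketched as follows. Everything here is an assumption made for illustration: the `{{name}}` placeholder syntax, the JSON contents, the class `SketchPublication`, and the function `renderTemplate` are hypothetical, and the JSON is read from strings rather than from a database.

```php
<?php
// Both JSON documents are invented for illustration; the actual FENCE
// templates and variable definitions are not shown in the text.
$templatesJson = '{"signoff": "Dear {{recipient}}, the draft {{ref_code}} was signed off."}';
$variablesJson = '{"recipient": "recipientName", "ref_code": "referenceCode"}';

// Hypothetical data source exposing one method per template variable.
final class SketchPublication
{
    public function recipientName(): string
    {
        return 'PO-Pub';
    }

    public function referenceCode(): string
    {
        return 'EXAMPLE-2020-01'; // made-up reference code
    }
}

// Look up a template by name and substitute every configured variable.
function renderTemplate(string $name, string $templatesJson, string $variablesJson, object $source): string
{
    $templates = json_decode($templatesJson, true);
    $variables = json_decode($variablesJson, true);
    if (!is_array($templates) || !is_array($variables) || !isset($templates[$name])) {
        throw new InvalidArgumentException("Unknown template: $name");
    }
    $replacements = [];
    foreach ($variables as $identifier => $method) {
        // Each identifier is resolved by calling the method configured for it.
        $replacements['{{' . $identifier . '}}'] = $source->$method();
    }
    return strtr($templates[$name], $replacements);
}

echo renderTemplate('signoff', $templatesJson, $variablesJson, new SketchPublication()), PHP_EOL;
// prints "Dear PO-Pub, the draft EXAMPLE-2020-01 was signed off."
```

Keeping the variable-to-method map separate from the templates means that users can edit the wording of an email without being able to change how its values are computed.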

Figure 2: Summary of the Messenger infrastructure. The two JSON files related to the Messenger are stored in the database. The class retrieves them using the DBJReader. Messenger can either use the Mailer to trigger emails or render email templates so users can edit them through a FENCE system interface, illustrated by the computer in the figure. After the user edit is complete, the JSON email templates are saved into the database using the DBJWriter.

3.1.3 EgroupManager class

The EgroupManager class is similar to the Messenger class, since it also gets a template from a JSON file and substitutes variables. The difference is that the templates are not related to emails, but to E-group configurations. It does not allow users to edit the templates from the interface since they contain many technical details.

The EgroupManager class uses another FENCE class, called JReader, to get the templates from the JSON file. This class was designed to parse JSON files and store them in an object. After getting the JSON templates, the EgroupManager parses them, substituting all the variables.

With the template parsed, the EgroupManager uses the FENCE EgroupSOAPHandler to communicate with the E-groups API. To do so, it first authenticates. Using the methods available in the SOAP Web Services, it can create, update, and delete E-groups.

3.1.4 User class

The User class supports access control of the interfaces.

The main purpose of the User class is to define an object that stores information concerning the connected ATLAS member (connected to the main CERN authentication server), including the CERN CCID (CERN Computing ID), first and last name, E-groups, and other attributes. It also defines specific methods to facilitate access control within the interface.

In Appendix A.3, there are two examples of the above-mentioned methods. These are used to check user authorisation: is_expert() checks if the user is a member of the E-group of FENCE team developers, which is composed of the project developers. The method Permission($permission) accepts a permission to be checked as an argument and verifies if it is in the user permissions inventory.

When an extension of User is created, extra methods are appended to User to provide specific Utils for a context, Utils being a FENCE class that contains useful public methods used by many other classes of the framework. Every system has its own User class extending the FENCE core User class. Systems may therefore have specific methods that are used to grant edit permissions and control user access.

Configuration files, described in Section 3.3, provide multiple properties that set access control and edit permissions. This is mainly achieved in two ways. General access control is set using CERN E-groups, or FENCE user groups, including experts, administrators, and many others. In this case, User verifies the clearance by comparing the user's actual E-groups and user groups to the required ones. The other way concerns edit permissions and uses specific roles. These roles are keys mapped to methods in User that check if the member is supposed to have edit permission on that specific field. An example is shown in Listing 1.

Listing 1: User example

    "pub_short_title": {
        "label": "Public short title",
        "sublabel": "Plain text, no LaTeX",
        "type": "textarea",
        "rules": {
            "maxlength": 1000,
            "character_not_allowed": ["\\", "$"]
        },
        "analysis_roles": [
            "GROUP_CONVENER",
            "SUBGROUP_CONVENER",
            "PROJECT_LEADER"
        ]
    }
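The way role keys such as those in Listing 1 might be resolved to check methods can be sketched as below. This is an assumption-laden illustration, not the FENCE User class: `SketchUser`, its role-to-method map, the method names, and the way publication lists are stored are all hypothetical, and roles without a mapped method are simply ignored.

```php
<?php
// Hypothetical user class: the role keys mirror Listing 1, while the
// role-to-method map and the stored publication lists are assumptions.
final class SketchUser
{
    // Role key from the configuration => name of the method that checks it.
    private $roleMethods = [
        'GROUP_CONVENER'    => 'isGroupConvener',
        'SUBGROUP_CONVENER' => 'isSubgroupConvener',
    ];

    private $groupPublications;
    private $subgroupPublications;

    public function __construct(array $groupPublications, array $subgroupPublications)
    {
        $this->groupPublications = $groupPublications;
        $this->subgroupPublications = $subgroupPublications;
    }

    public function isGroupConvener(int $publicationId): bool
    {
        return in_array($publicationId, $this->groupPublications, true);
    }

    public function isSubgroupConvener(int $publicationId): bool
    {
        return in_array($publicationId, $this->subgroupPublications, true);
    }

    // True when at least one role listed for the field grants edit permission.
    public function canEdit(array $fieldConfig, int $publicationId): bool
    {
        foreach ($fieldConfig['analysis_roles'] ?? [] as $role) {
            if (!isset($this->roleMethods[$role])) {
                continue; // no check method known for this role in the sketch
            }
            $method = $this->roleMethods[$role];
            if ($this->$method($publicationId)) {
                return true;
            }
        }
        return false;
    }
}

$field = json_decode(
    '{"label": "Public short title", "analysis_roles": ["GROUP_CONVENER", "SUBGROUP_CONVENER", "PROJECT_LEADER"]}',
    true
);
$user = new SketchUser([101], [202]);
var_dump($user->canEdit($field, 202)); // prints bool(true)
var_dump($user->canEdit($field, 303)); // prints bool(false)
```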

Taking the GROUP_CONVENER role as an example, the following method is used to grant permission to edit the public short title field, see Listing 2:

Listing 2: GROUP_CONVENER method example

    public function is_subgroup_convener($publication_id = null)
    {
        // Without a publication there is nothing to check.
        if ($publication_id === null) return false;
        foreach ($this->subgroup_convener_publications as $line => $row) {
            // Substring test of the id against the row's PUB_LIST string.
            if (strpos($row['PUB_LIST'], strval($publication_id)) !== FALSE) return true;
        }
        return false;
    }

3.2 MBF (Models, Builders, and Factories) infrastructure

Models, Builders, and Factories are all heavily used software design patterns. Their combined use is a particular feature of the FENCE framework. The main goal of these development standards is to create a wrapper to store complex objects and facilitate their construction in different contexts, working as an SQL query builder. For instance, it would be possible to pass an actual SQL query every time information from the database is needed. It is, however, much more convenient to just call a class that handles the queries and presents the needed object to the user. The desired behaviour described here is exactly how the MBF infrastructure works. In FENCE classes, objects are constructed simply by instantiating specific factory classes. For instance, in the example below, a member is constructed by instantiating the Member Factory, see Listing 3:

Listing 3: MBF method example

    public function buildMemberByID($memberId)
    {
        $DBManager = new DBManager;
        $order = [
            "first_name",
            "last_name",
            "email"
        ];
        $memberFactory = new MemberFactory($DBManager, $order);
        $member = $memberFactory->build($memberId);
        return $member;
    }

In this example, MemberFactory extends the core Factory, which handles the whole process that connects to the database and assembles objects. An array listing, in order, the properties to be built is passed as an argument to the instantiated factory. In the example of Listing 3, the member factory provides the first and last name as well as the email address of a member.

When a specific new object needs to be created, a group of three files is needed: the Factory, the Builder, and the Model files. The first one stores the inventory of factories a specific Factory connects to and sets which Builder it uses. The next stores the relation between the database structure and the Model, assigning table columns to its Setters. Finally, the Model is the class that is populated by the Builder and stores the information in structured objects that can be accessed through Getters.

From a perspective opposite to that of the paragraph above, Models are classes that serve as object-oriented representations of the information. They define several set and get methods that handle specific properties of the object. These models are used in Builders, where the actual query is set and database columns are associated with a model set method. Finally, a Factory calls its corresponding Builder and contains an inventory, which may be empty, of other Factories that are related to this object.
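The division of labour between the three roles can be sketched in a few lines. This is a simplified illustration, not the FENCE implementation: the class names are hypothetical, an in-memory array of rows stands in for the database and the DBManager, and the inventory of related factories is left out.

```php
<?php
// Model: a structured object populated through setters, read through getters.
final class SketchMember
{
    private $firstName;
    private $email;

    public function setFirstName(string $value): void { $this->firstName = $value; }
    public function setEmail(string $value): void { $this->email = $value; }
    public function getFirstName(): ?string { return $this->firstName; }
    public function getEmail(): ?string { return $this->email; }
}

// Builder: associates table columns with the model's setters.
final class SketchMemberBuilder
{
    private $columnSetters = [
        'FIRST_NAME' => 'setFirstName',
        'EMAIL'      => 'setEmail',
    ];

    // Populate only the properties named in $order, in that order.
    public function build(array $row, array $order): SketchMember
    {
        $member = new SketchMember();
        foreach ($order as $property) {
            $column = strtoupper($property);
            if (isset($this->columnSetters[$column], $row[$column])) {
                $setter = $this->columnSetters[$column];
                $member->$setter($row[$column]);
            }
        }
        return $member;
    }
}

// Factory: fetches the row (here from an array, an assumption standing in
// for a database query) and delegates the assembly to its builder.
final class SketchMemberFactory
{
    private $rows;
    private $order;

    public function __construct(array $rows, array $order)
    {
        $this->rows = $rows;
        $this->order = $order;
    }

    public function build(int $memberId): ?SketchMember
    {
        if (!isset($this->rows[$memberId])) {
            return null;
        }
        return (new SketchMemberBuilder())->build($this->rows[$memberId], $this->order);
    }
}

// Made-up row used only for the demonstration.
$rows = [7 => ['FIRST_NAME' => 'Ada', 'LAST_NAME' => 'Example', 'EMAIL' => 'ada@example.org']];
$factory = new SketchMemberFactory($rows, ['first_name', 'email']);
$member = $factory->build(7);
echo $member->getFirstName(), ' <', $member->getEmail(), '>', PHP_EOL;
// prints "Ada <ada@example.org>"
```

The caller only ever touches the factory and the model's getters, so the query and the column names stay in one place, which is the convenience the text describes.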

3.3 Configuration files in FENCE

The FENCE framework is based on configuration files that provide the necessary parameters and properties to build interfaces. The main goal of this infrastructure is to simplify many aspects of web system requirements. The configuration files are in JSON, a lightweight format for storing and transporting data, and since those can be transformed into structured objects, developers can easily define a group of properties within specific contexts. For instance, it is possible to set up which groups of users can have access to a certain interface. Another benefit of using configuration files is that major classes that have several arguments and environment parameters can be instantiated in a cleaner way, with just a configuration file path as argument. With that, developers feel encouraged to develop more generic and robust features, since they can be easily reused in the future.

Along with the configuration file concept, additional utilities were developed to guarantee the feasibility of this idea. One of these tools is the class JReader, which provides functionality for template variable substitution and JSON schema validation. Another one is the FENCE Content, which gets some default information from configuration files to handle common interface needs, such as access control, constants, and rendering outline formats.

Most of the time, when a new interface is created using FENCE, the class that generates the particular content of this page inherits from Content. At the same time, as is described in the Unified Modeling Language (UML) diagram in Figure 3, the Content has a configuration file path as argument. This configuration file path is passed to an instance of JReader constructed within Content. The JReader method parse_contents makes the corresponding configuration file content available to Content.

Figure 3: Content UML diagram describing its interaction with the JReader class.
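The interaction between a content class and a JSON reader can be sketched as follows. The sketch is hypothetical throughout: `SketchJsonReader`, `SketchContent`, the `{name}` placeholder syntax, and the configuration keys `title` and `access_groups` are invented for illustration, and the JSON schema validation mentioned in the text is omitted.

```php
<?php
// Hypothetical reader: decodes a JSON configuration file and substitutes
// {placeholder} variables in its string values.
final class SketchJsonReader
{
    public function parseContents(string $path, array $variables = []): array
    {
        $raw = file_get_contents($path);
        if ($raw === false) {
            throw new RuntimeException("Cannot read $path");
        }
        $config = json_decode($raw, true);
        if (!is_array($config)) {
            throw new RuntimeException("Invalid JSON in $path");
        }
        $pairs = [];
        foreach ($variables as $name => $value) {
            $pairs['{' . $name . '}'] = $value;
        }
        if ($pairs !== []) {
            array_walk_recursive($config, function (&$value) use ($pairs) {
                if (is_string($value)) {
                    $value = strtr($value, $pairs);
                }
            });
        }
        return $config;
    }
}

// Hypothetical base class: takes only a configuration path and builds
// its own reader, as the text describes for Content and JReader.
class SketchContent
{
    protected $config;

    public function __construct(string $configPath, array $variables = [])
    {
        $this->config = (new SketchJsonReader())->parseContents($configPath, $variables);
    }

    // Access is granted when the user belongs to at least one required group.
    public function isAccessibleBy(array $userGroups): bool
    {
        $required = $this->config['access_groups'] ?? [];
        return count(array_intersect($required, $userGroups)) > 0;
    }

    public function title(): string
    {
        return $this->config['title'] ?? '';
    }
}

// Write a throwaway configuration file for the demonstration.
$path = tempnam(sys_get_temp_dir(), 'cfg');
file_put_contents($path, '{"title": "Phase 0 of {ref_code}", "access_groups": ["experts"]}');
$page = new SketchContent($path, ['ref_code' => 'EXAMPLE-2020-01']);
echo $page->title(), PHP_EOL;                 // prints "Phase 0 of EXAMPLE-2020-01"
var_dump($page->isAccessibleBy(['experts'])); // prints bool(true)
unlink($path);
```

A page-specific class would extend `SketchContent` and add only its own rendering, inheriting the configuration loading and the access check.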

To automate the process of conception, evolution, review, and approval of publications described in Section 2, five web-based systems were developed using the FENCE framework: PAPERs, CONF notes, PUB notes, PLOTs, and Phase 0, see Figure 1. Together they are called the Analysis Web systems. The first four are described here briefly, while the last one is presented in detail.

The relationships among the five Analysis web-based systems are represented in Figure 4. The Phase 0 system has been implemented to support the publication process. The evolution of the process from the creation of a Phase 0 to the other publication systems (PAPER, CONF, and PUB) is described in Section 2. For the review and approval process of a publication set of plots, there is also the PLOTs system, which can be used during all phases whenever a new plot is sent for circulation to and review by the collaboration.

Figure 4: The relationship between the Analysis Web systems and their phases.

The PAPER system features functionalities for inserting, retrieving, editing, and deleting the properties of a paper in a database, through managing the activity flow of its three phases: Phase 1, Phase 2, and Submission. The CONF notes system incorporates notes that should be presented at a conference. The PUB notes system incorporates public notes that should be presented to the scientific community without being submitted to a journal or presented at a conference. The PLOTs system handles the plots that are used to present results in all other types of publications mentioned so far. Those three systems present functionalities for inserting, retrieving, editing, and deleting the properties of their entities in a database and also manage the workflow of each system's Phase 1.

The PAPER, CONF notes, PUB notes, and PLOTs systems are quite similar, differing only in the number of phases and the workflow/steps involved in each one. They also resemble each other in the actions related to each phase's steps, which can be: saving data in the database, sending automatic emails, or creating or updating E-groups.

The need for the Phase 0 system arose in 2017, when the ATLAS IT department downgraded the Apache Subversion (SVN) [2] version control system and encouraged its members and authors to use Git [8], because its decentralised nature is better adapted to the situation of the collaborators. The experiment started to use the repository platform GitLab [7] because of its continuous integration functionality, the possibility of storing repositories in private servers, and the provision of an API with many services.

The transition period triggered the need for a tool that can communicate with the GitLab API and create automatically configured Git repositories with each publication's unique metadata. To formalise the creation of repositories at the beginning of the publication writing process, the concept of Phase 0 emerged. It was recognized that this could also include the flow of tasks during the preliminary stage of the editorial process, when it is not yet known whether scientific content will materialise into a paper, conference note, or public note. So, in March 2017, development of the web-based Phase 0 system was launched.

The system provides functionalities to support and formalise the initial stages that may lead to a publication, before accessing the PAPER, CONF, and PUB notes production process.

Phase 0 can trigger different types of processes, including an Analysis workflow towards a PAPER or a CONF note, which gathers all the physics and combined performance analysis activities (PHY, CP). One can also skip the Analysis workflow towards a CONF/PAPER or a PUB note. This is allowed for a PUB note, which is usually a simulation work or an instrumental description. It is also allowed for a PAPER/CONF intended for an instrumental description purpose, or for a physics CONF note that should proceed as quickly as possible through internal review so that it may be used at a conference.

Phase 0 is the common stage for PAPER, CONF, and PUB note workflows, before Phase 1. It stores metadata divided into steps, e.g. meeting dates, comments, links, groups of people such as Analysis Contacts, target dates for analysis finalisation, editorial board members and meetings, and approval sign-off dates. As described in Section 2, each of those metadata should be filled in a specific order by users with the appropriate permissions and should trigger automatic emails or E-group updates all along the process.

Phase 0 repository

The first step of the Phase 0 system implementation was data modelling, to identify the system's entities with their attributes and relationships. A simplified version of this study is presented next.

The main entity of the system is a Publication, which has attributes such as title, reference code, and creation date. A publication is always related to a Group and, most of the time, a Subgroup, whose attributes are name and description.

Members of the ATLAS experiment are related to a publication by one or more Roles, such as Analysis Team or Editorial Board member. A Member has attributes such as first name, last name, and primary email address. The attributes of a Role are its name, type, start date, and end date.

A publication contains Phases (in this system, only Phase 0), whose attributes are the start date and the status. During Phase 0 steps, many Meetings take place, and their attributes are title, date, and comments.

Some external Contents are associated with Phase 0, such as notes containing supporting documentation for the publication and meeting minutes that are stored on the CERN document server. This entity has as its attributes the name of the content, its type, and its web address.

Phase 0 is also related to Deadlines by which people finish their activities. A Deadline has as attributes its type and its date.

Phase 0 main functionalities

The Phase 0 system has three main functions. The first is the insertion of a new publication, when members of the ATLAS experiment decide to publish the results of their work and need to define the principal data of the article or public note in order to start writing. The interface presents a web form with several fields that define the main information of the new publication. These include its title, reference code, groups, subgroups, and keywords. The second interface presents the search functionality. With this a user can search for publications by setting filters, and can produce reports from the results table. The third interface allows editing of the information about a publication, facilitates the monitoring and evolution of Phase 0, and enables the automatic creation of Git repositories.

Figure 5: The analysis submission functionality in the Phase 0 system. On the left is a summary of all steps needed to complete the data submission. On the right are the fields that belong to the first step.

The functionality to submit a new analysis can be seen in Figure 5. Through this, a member fills out a form in steps. The mandatory fields in each step are indicated by asterisks (*). Information on how to fill in each field is available through the 'i' icon next to the field name. At the end of all steps, there is a confirmation step where the user can verify whether all fields have been filled in correctly. If so, the form information can be stored in the database, which now gathers the information that defines an analysis, such as its title and reference code.

The advanced search functionality of the Phase 0 system, shown in Figure 6, allows a user to define criteria through three fields. The first defines a publication attribute, the second selects an operator, and the third allows a value to be entered. One or more search criteria can be selected and arranged by forming logical expressions using the AND and OR operators. Users can also configure the search results by setting the ordering of the records in ascending or descending order, grouping them by attributes, selecting the visible attributes, and saving those configurations for use in a future search. Search result reports can also be exported in CSV file format.

Finally, the publication details interface, the main interface of the system shown in Figure 7, presents metadata and allows editing of it. The interface also controls the workflow of Phase 0 activities, providing an overview of all its stages and highlighting the previous, current, and upcoming ones. A transition between Phase 0 steps triggers actions. The most common is storing data in the database. If allowed, a user has the option of saving the data to the repository and staying at the same step by pressing the 'Save' button, or saving the data and going to the next step by pressing the 'Proceed' button. When one moves forward in the workflow, the system triggers automatic messages that alert and provide instructions to the person responsible for the next step.

An example of a Phase 0 step that is part of a workflow is the Editorial Board "request meeting and formation data" step, which is illustrated in Figure 8. The group convener is responsible for adding the Editorial Board "request meeting" title, date, comments, and links. The Publication Committee Chair is responsible for appointing the Editorial Board members and filling in the date on which they are appointed. Once all this information is in the system, the Publication Committee Chair can proceed to the next Analysis workflow step. Subsequently the Editorial Board E-group is automatically created, including information for all its members, and an email is sent, informing them that they were appointed and should proceed to the next step of the Analysis workflow.

Figure 6: Advanced search functionality of the Phase 0 system. It presents fields to define search criteria to be added in the Logic Workspace area, forming logical expressions.

The Workflow, Messenger, EgroupManager, and User FENCE classes (mentioned in Section 3) and the MBF infrastructure made possible the development of the Phase 0 system workflow. They do not, however, include the GitLab integration, a key feature of the system, which is explained in detail in the next sections.

As was mentioned in Section 4, the FENCE Phase 0 system was designed and implemented to provide automatic creation of Git repositories, to simplify the analysis and the editing of any type of draft that supports the analysis. The Phase 0 functionalities include some features that trigger GitLab commands. The integration of the software framework and the collaborative repository platform is described below.

At any Phase 0 creation, Git repositories are created in GitLab under the atlas-physics-office group. Each leading Physics or Combined Performance group or System Detector/Activity effort is labelled as a category with four letters in the FENCE systems related to the analysis and the documentation creation. The full list of Physics and Combined Performance groups is shown in Table 1.

Figure 7: Phase 0 system main interface. On the right is a summary with the most important information about an activity. On the left are the steps corresponding to the Phase 0 activity flow.

For example, the leading Top Quark physics group is TOPQ while the Electron/Gamma Combined Performance group is EGAM. The identifier (ID) of a Phase 0 FENCE entry is therefore labelled ANA-GROUP-YEAR-NN, where GROUP can be, for example, TOPQ, HIGG, or EGAM, YEAR is the year the document was created, and NN is a two-digit counter. For instance, ANA-SUSY-2019-04 represents the fourth analysis FENCE entry created in the SUSY group in 2019.

An analysis may evolve into a PAPER, a CONF note, or a PUB note. The identifiers (IDs) of those documents are therefore GROUP-YEAR-NN, CONF-GROUP-YEAR-NN, or PUB-GROUP-YEAR-NN, respectively. This naming convention preserves backward compatibility with the different entries used for each type of document before Phase 0 creation.

In PO-GitLab, an effort has been made to make the document IDs more logical. They are labelled:
• ANA-GROUP-YEAR-NN-INTn for internal notes,
• ANA-GROUP-YEAR-NN-PAPER for a paper,
• ANA-GROUP-YEAR-NN-CONF for a CONF note, and
• ANA-GROUP-YEAR-NN-PUB for a PUB note.

Figure 8: Screenshot of the "Editorial Board request meeting and formation data" step in the FENCE Phase 0 system.
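The naming convention above is regular enough to be parsed and generated mechanically. The sketch below is a minimal illustration under stated assumptions: the function and class names are hypothetical, and the pattern assumes exactly four capital letters for the group, a four-digit year, and a two-digit counter, as the text describes.

```python
import re
from typing import NamedTuple

# ANA-GROUP-YEAR-NN: four-letter group, four-digit year, two-digit counter.
ANALYSIS_ID = re.compile(
    r"^ANA-(?P<group>[A-Z]{4})-(?P<year>\d{4})-(?P<counter>\d{2})$"
)


class AnalysisId(NamedTuple):
    group: str
    year: int
    counter: int

    def __str__(self) -> str:
        return f"ANA-{self.group}-{self.year}-{self.counter:02d}"


def parse_analysis_id(text: str) -> AnalysisId:
    """Split a Phase 0 reference code into its three components."""
    match = ANALYSIS_ID.match(text)
    if match is None:
        raise ValueError(f"not a Phase 0 analysis ID: {text!r}")
    return AnalysisId(match["group"], int(match["year"]), int(match["counter"]))


def repository_name(analysis: AnalysisId, doc_type: str, note_index: int = 1) -> str:
    """Build the PO-GitLab repository name for one document of an analysis."""
    if doc_type == "INT":
        return f"{analysis}-INT{note_index}"
    if doc_type in ("PAPER", "CONF", "PUB"):
        return f"{analysis}-{doc_type}"
    raise ValueError(f"unknown document type: {doc_type!r}")
```

With these helpers, parse_analysis_id("ANA-HIGG-2017-08") followed by repository_name(..., "INT", 2) yields ANA-HIGG-2017-08-INT2, matching the INTn pattern in the list above.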

For example, in the Higgs category, for a given Phase 0 analysis entry ANA-HIGG-2017-08, PO-GitLab will host ANA-HIGG-2017-08-INT1,2..n, ANA-HIGG-2017-08-PAPER, ANA-HIGG-2017-08-CONF, and ANA-HIGG-2017-08-PUB. Each repository is connected to the appropriate FENCE interface. This is illustrated in Figure 9, where the GitLab interface for the atlas-physics-office subgroups and repositories is shown. ANA-HIGG-2017-08, a subgroup of HIGG, contains for example one paper and one internal note repository, ANA-HIGG-2017-08-PAPER and ANA-HIGG-2017-08-INT1 respectively.

API

A set of classes was created with the original aim of making the use of the GitLab API easier across the FENCE systems. In practice, it is mostly used by the Analysis systems within the Analysis GitLab integration. Through the main class, called Gitlab, it is possible to handle all the basic operations offered by the API: create, get, and customise settings for projects, groups, and branches, handle commits, and carry out many other actions defined and explained in the GitLab REST API documentation [9].

Each API endpoint can be accessed by one of the following HTTP methods: GET, POST, DELETE, and PUT. The FENCE–GitLab class uses them through methods detailed in Appendix B.1. Each of those methods makes a call to execMethod (see Appendix B.2), which configures the endpoint using the PHP cURL methods [10] and executes one of the HTTP methods, returning the REST API answer. This can be a JSON file with metadata, or just a success or an error message.

Table 1: List of the Physics Activity leading groups and their acronyms. The WG and CP abbreviations indicate Working Group and Combined Performance, respectively.

Acronym   Group
BPHY      B-physics WG
EGAM      e/gamma CP
EXOT      Exotics WG
FTAG      Flavour tag CP
HDBS      Higgs & Diboson Searches WG
HIGG      Higgs WG
HION      Heavy Ions WG
IDTR      Inner Detector Tracking CP
JETM      Jet/Etmiss CP
MUON      Muon CP
PMGR      Physics Modelling Group
SIMU      Simulation
STAT      Statistics Committee
STDM      Standard Model WG
SUSY      SUSY WG
TAUP      Tau CP
TOPQ      Top WG
UPPH      Upgrade Physics

The metadata returned by execMethod are then used to populate the attributes of many classes representing GitLab elements, including Branch, File, Commit, Project, Group, Label, and Member. These can then be manipulated by any FENCE system.

An example is the creation of a paper repository. The createProject method (see Appendix B.3) is called with the project name (or an instance of the Project class) as the first argument and the project parameters (such as path, namespace, default branch, and description) as the second argument. The method calls the POST method mentioned above and stores the new repository metadata in a FENCE Project object, which can be used for further manipulations.
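The pattern described above, a thin wrapper per HTTP method that funnels into one function configuring and executing the request, can be sketched as follows. This is not the FENCE code, which is written in PHP and listed in Appendix B; it is a Python illustration using only the standard library. The host name is a placeholder, and build_request, exec_method, and create_project_request are hypothetical names. The endpoint POST /projects, its name, namespace_id, default_branch, and description attributes, and the PRIVATE-TOKEN header are part of the public GitLab REST API.

```python
import json
import urllib.request

# Placeholder host; a real deployment would point at its own GitLab instance.
API_ROOT = "https://gitlab.example.org/api/v4"
HTTP_METHODS = {"GET", "POST", "PUT", "DELETE"}


def build_request(method, endpoint, token, payload=None):
    """Configure (but do not send) one call to the GitLab REST API."""
    if method not in HTTP_METHODS:
        raise ValueError(f"unsupported HTTP method: {method}")
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(
        f"{API_ROOT}/{endpoint.lstrip('/')}", data=data, method=method
    )
    request.add_header("PRIVATE-TOKEN", token)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


def exec_method(request):
    """Send a prepared request and decode the JSON answer, if there is one."""
    with urllib.request.urlopen(request) as response:
        body = response.read().decode("utf-8")
    return json.loads(body) if body else None


def create_project_request(name, token, **params):
    """Prepare the POST that creates a repository with the given parameters."""
    return build_request("POST", "projects", token, {"name": name, **params})
```

Separating request construction from execution is a design choice of this sketch: it lets the configuration be inspected or tested without contacting a server, while exec_method plays the role the text assigns to execMethod.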

The first interaction between FENCE and GitLab happens when a Phase 0 entry is created. A group with its reference code is automatically formed, containing the first internal note repository. The content of this repository's first commit is obtained from a source repository, which is the package containing file templates called atlaslatex. FENCE is responsible for substituting all the necessary variables into all the file templates according to the metadata inserted when creating the entry in the system. After the commit, FENCE automatically de-protects the master branch, creates the protected PO-ready branch, and creates the PO-Publication label. The last step is to set the developer permission to the Analysis Team E-group using LDAP synchronisation.

Another FENCE and GitLab integration process is executed when Phase 0 is finished or is skipped, thus proceeding to PAPER, CONF note, or PUB note Phase 1. FENCE automatically creates an internal note repository, setting all the configuration elements that are needed. It is possible to append additional internal note repositories at any time. The creation and configuration of the repositories holding the document is done without any input from the editor's side, allowing for a streamlined process.

Figure 9: Screenshot of the substructure of a HIGG GitLab repository subgroup. The main group, atlas-physics-office, is shown at the top. The HIGG subgroup is selected, and its ANA-HIGG-2017-08 subgroup is expanded. A repository for the PAPER, and one for the Internal Note (INT1), are created under ANA-HIGG-2017-08.

Figure 10: A view of a paper's author list section. At first circulation in Phase 1, the "Create and push to GitLab" button generates the author list and triggers the push action on the GitLab repository. The button will change its label at Phase 2 and SUBMISSION to "Generate and push to GitLab".
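The template substitution that prepares a repository's first commit can be illustrated with a few lines of Python. This is a sketch under assumptions: the placeholder syntax <<NAME>> is invented for the example (the text does not say how variables are marked in the atlaslatex templates), and the function names are hypothetical. A dedicated marker is used rather than Python's string.Template because LaTeX sources already use the dollar sign for mathematics.

```python
import re

# Hypothetical placeholder syntax: <<TITLE>>, <<REF_CODE>>, ...
PLACEHOLDER = re.compile(r"<<([A-Z_]+)>>")


def fill_template(text, metadata):
    """Replace every placeholder in one template with its metadata value."""
    def replace(match):
        key = match.group(1)
        if key not in metadata:
            raise KeyError(f"no metadata value for placeholder {key}")
        # A function replacement is inserted literally, so LaTeX
        # backslashes in the value are not treated as regex escapes.
        return metadata[key]

    return PLACEHOLDER.sub(replace, text)


def first_commit_files(templates, metadata):
    """Fill all templates, returning {path: content} for the first commit."""
    return {path: fill_template(text, metadata) for path, text in templates.items()}
```

Raising an error on an unknown placeholder, instead of leaving it in place, means a repository is never created with a half-filled template.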

FENCE and GitLab also interact while handling the author list of a publication. Creating the author list at first circulation triggers a request, through the GitLab API, for the existence of the GitLab repository associated with the publication. Clicking on the button labelled "Create and push to GitLab" (see Figure 10) creates the author list according to its reference date in all the formats (including xml and tex). It then starts a dialogue between the two platforms, FENCE and GitLab, to push the files through the GitLab API. On first circulation, the files are added to GitLab, while on subsequent circulations, as they already exist, they are simply updated.
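The add-or-update decision described above maps naturally onto the GitLab repository files API, where POST creates a file and PUT updates an existing one at the same endpoint. The sketch below only plans the calls and does not send them; push_plan and file_endpoint are hypothetical names, and how FENCE actually detects existing files is not stated in the text.

```python
from urllib.parse import quote


def file_endpoint(project_id, file_path):
    """Endpoint of one repository file; the path must be URL-encoded."""
    return f"projects/{project_id}/repository/files/{quote(file_path, safe='')}"


def push_plan(generated_files, existing_paths):
    """Decide, per generated file, whether to create (POST) or update (PUT).

    generated_files: {path: content} produced for this circulation.
    existing_paths: set of paths already present in the repository.
    """
    plan = []
    for path in sorted(generated_files):
        method = "PUT" if path in existing_paths else "POST"
        plan.append((method, path))
    return plan
```

On first circulation existing_paths is empty and every file is created; on later circulations the same call yields updates for the files that are already in the repository.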

PO-GitLab and CI tools

The ATLAS Physics Office GitLab tools (PO-GitLab) simplify the publication process of ATLAS documents by using the features provided by the CERN GitLab platform.

The previous publication workflow involved a heavy email exchange between ATLAS editors and the Physics Office in order to ensure that ATLAS rules were being followed up to submission of the paper to the arXiv or the journal. This approach usually led to modifications being made by different parties (officers and editors), which were sometimes not properly implemented and which slowed the publication process down. Due to the uniform and repetitive nature of the tasks required to submit a publication, the implementation of an automatic tool was favoured.

Three main tasks are handled by PO-GitLab up to the final submission: the automatic creation of GitLab repositories (Git repositories centralised in the remote platform), the real-time verification of technical rules by the GitLab Continuous Integration (CI) tools, and the automatic processing of the document itself. These tasks are described in this section.

A centralised area controlled by the ATLAS Physics Office needed to be designed first. Control is the key, in order to allow the Physics Office to maintain the quality of the document being accepted for publication.

A basic structure is set in GitLab to store the groups related to an analysis. The main GitLab group is called atlas-physics-office, and this represents the root of the group hierarchy tree. Each of its subgroups corresponds to a leading group, for example HIGG, EXOT, SUSY, etc., as is mentioned in Section 5. In the case of the publication shown in Figure 7, a subgroup of GENR called ANA-GENR-2018-01 would be created. Inside ANA-GENR-2018-01 there would exist specific repositories for each type of analysis, designated ANA-GENR-2018-01-INT1, ANA-GENR-2018-01-PAPER, ANA-GENR-2018-01-PUB, and/or ANA-GENR-2018-01-CONF.

With this structure defined, it is possible to create documents automatically through FENCE, via the communication link between the framework and the GitLab API. This is explained in more detail in Section 5. This automation relies on file templates that have their variables substituted according to the requirements of the related publication. This way, all created repositories contain the default documents correctly formatted to start writing a PAPER, CONF, PUB, or Internal Note. The repository is also configured with a new protected branch named PO-ready, which means that only members with the role Maintainers are allowed to push and merge. This special branch is used to run the final submission pipeline when the document is ready and has been reviewed by the relevant parties. The master branch is used as the main work branch, unprotected at the time of the repository creation, allowing all editors to push new commits and interact with the repository.

GitLab CI tools are designed to automatically execute a set of tasks every time a new modification is introduced into the document (i.e. a new commit is pushed to the document repository). The approach of the Physics Office was to develop a Python package, PO-GitLab, that runs different jobs on a given document, each verifying a distinct aspect. Given the modularity of the system, new and more complex tasks can be added, ensuring scalability.

GitLab's CI is organised using pipelines. A pipeline is a set of jobs grouped in stages. All the jobs in the same stage are executed in parallel, while each stage is only executed after the previous one has completed. The dependencies among the jobs' executions can be configured in different ways according to the status. For example, it is possible in some cases to start the jobs of the next stage only if the previous ones have finished successfully, and in other cases to execute them only if the previous stage failed. Each time a new commit is pushed to the repository, a pipeline is triggered.

Different sets of checks are performed in each step of the publication process. For editors, all work done before the paper submission (detailed in Section 6.3) is monitored by the edit-pipelines, as shown in Figure 11. These pipelines are triggered by any push made from branches whose name does not start with PO-. The special branches using the PO- prefix are tracked by the submit-pipelines when a paper is considered ready for submission to the arXiv and the peer-review journal.
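The two rules described above, routing by branch prefix and stage-by-stage execution, can be modelled in a few lines. The following is a toy model for illustration only, not GitLab's scheduler or the PO-GitLab package: the function names are hypothetical, jobs are plain callables returning True on success, and jobs within a stage are run one after another here although GitLab runs them in parallel. It implements only the "continue if the previous stage succeeded" policy.

```python
def pipeline_for(ref_name):
    """Select the pipeline type from the branch or tag name."""
    return "submit-pipeline" if ref_name.startswith("PO-") else "edit-pipeline"


def run_pipeline(stages):
    """Run stages in order, stopping after the first stage with a failed job.

    stages: list of (stage_name, {job_name: callable returning bool}).
    Returns {stage_name: {job_name: passed}} for the stages that ran.
    """
    results = {}
    for stage_name, jobs in stages:
        # Every job of a stage runs, even if a sibling job fails.
        outcome = {name: bool(job()) for name, job in jobs.items()}
        results[stage_name] = outcome
        if not all(outcome.values()):
            break  # later stages are skipped
    return results
```

For instance, pipeline_for("PO-ready") selects the submit pipeline while pipeline_for("master") selects the edit pipeline, and a failure in a technical check prevents the build stage from starting.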

Figure 11: Screenshot of the edit-pipelines. These four stages do checks before the publication is ready for submission, with the first stage checking the version of the PO-GitLab package, the second stage running checks related to LaTeX formatting, the third one ensuring that the ATLAS rules are being followed, and the last stage testing if the document builds correctly.

Figure 11 presents an example of an edit-pipeline that consists of the following set of stages:
• Preparation: This consists of only one job that checks the current version of the PO-GitLab package.
• Technical checks: This stage includes checks related to LaTeX:
  – Figures exist: checks if all figures used in the document are present in the repository.
  – Files exist: checks if all the tex files included in the document are present.
  – Repeated commands: checks for repeated user-defined commands. It is not wise to use the same command for different purposes. This can present a problem when captions for figures and tables are being generated for the ATLAS public pages.
  – Repeated labels: checks for duplicate labels in all tex files.
  – Undefined references: checks for undefined references.
  – Unused labels: warns if a LaTeX label has been defined but not used. Although this is not a problem, it might point to an improper reference.
• ATLAS checks: These are checks related to ATLAS rules and style:
  – Bibliography: checks that the bibliography files are included.
  – Cover logo: checks that the proper logo is being used in the ATLAS template.
  – Figures labels: checks the ATLAS labels (e.g. 'ATLAS Internal') in the legends of figures depending on the type of document. Table 2 shows the labels that are allowed or not allowed in different document types.
  – Oversized figures: checks for figures larger than 2 MB.
  – Preprint ID: checks that the preprint ID is included in the document.
  – Template version: checks that the version of the ATLAS LaTeX template is the latest one available.
  – Title and Abstract: checks that no user-defined commands (i.e. non-LaTeX commands) are being used in the title and in the abstract.
• Build: This stage builds the document itself. As these pipelines are active on each commit, the pdf file of the document is not stored as an artifact. Instead, the pdf file can be generated by a manual job, indicated by a gear icon, which editors trigger by clicking the play button in the interface; this job produces and saves the document as an artifact for the user to download.
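Three of the technical checks listed above (repeated labels, undefined references, unused labels) reduce to comparing the set of defined labels with the set of referenced ones. The sketch below shows one way this could be done with regular expressions; it is not the PO-GitLab implementation. The function name is hypothetical, the list of reference commands is an assumption, and the sketch does not skip commented-out lines or verbatim environments.

```python
import re
from collections import Counter

LABEL = re.compile(r"\\label\{([^}]*)\}")
# Assumed set of referencing commands; a real check may cover more.
REF = re.compile(r"\\(?:ref|eqref|cref|Cref|autoref|pageref)\{([^}]*)\}")


def check_labels(tex_sources):
    """Report label problems across all tex files.

    tex_sources: {file_name: file_content}.
    """
    labels = Counter()
    used = set()
    for text in tex_sources.values():
        labels.update(LABEL.findall(text))
        for group in REF.findall(text):
            # Commands such as \cref accept a comma-separated list of keys.
            used.update(key.strip() for key in group.split(","))
    defined = set(labels)
    return {
        "repeated_labels": sorted(k for k, n in labels.items() if n > 1),
        "undefined_references": sorted(used - defined),
        "unused_labels": sorted(defined - used),
    }
```

In a pipeline job, non-empty repeated_labels or undefined_references would fail the job, while unused_labels would only produce a warning, in line with the description above.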

Table 2: Types of labels that are allowed and not allowed to be used in figure captions, depending on the document type.

Document type   Preliminary label   Internal label
PAPER           Not allowed         Not allowed
BOOK            Not allowed         Not allowed
CONF            Allowed             Not allowed
PUB             Allowed             Not allowed
NOTE            Allowed             Allowed
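Table 2 is effectively a lookup from document type to permitted labels, which makes the check easy to express as data. The sketch below is illustrative only: it assumes the label appears as plain text of the form "ATLAS Preliminary" or "ATLAS Internal" in whatever string is inspected, whereas the text does not say how the real check locates labels inside figures. The names ALLOWED_LABELS and forbidden_labels are hypothetical.

```python
import re

# Direct transcription of Table 2.
ALLOWED_LABELS = {
    "PAPER": set(),
    "BOOK": set(),
    "CONF": {"Preliminary"},
    "PUB": {"Preliminary"},
    "NOTE": {"Preliminary", "Internal"},
}

ATLAS_LABEL = re.compile(r"ATLAS\s+(Preliminary|Internal)")


def forbidden_labels(doc_type, text):
    """Return the ATLAS labels found in text that doc_type does not permit."""
    found = set(ATLAS_LABEL.findall(text))
    return sorted(found - ALLOWED_LABELS[doc_type])
```

Keeping the rules in a dictionary means that adding a document type or changing a policy is a one-line edit rather than a change to the checking logic.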

The CI also produces the required files for paper submission, using dedicated pipelines similar to the editing ones. These are called submit-pipelines. A protected Git branch, named PO-ready, is created by default at the time of the setup of the paper repository. When a paper is ready for submission, an editor creates a Merge Request from the master to the PO-ready branch. When this request is accepted by a Physics Office officer, the paper submission pipelines are triggered. In addition, any branch or tag created following the pattern PO-* triggers the paper submission pipelines. These pipelines run the previously described tests and then, at the build stage, flatten the LaTeX document with the following actions:
1. all the source files are merged into a single LaTeX source file;
2. all the comments in the LaTeX source file are removed;
3. all the figures are renamed following the convention required by the journals;
4. any directory structure is removed.
The various actions are shown in Figure 12.

Tarballs suitable for submission to the arXiv and journals are created using TeX Live 2016 and 2017, respectively. The two different versions are required by the journals because of differences in handling the bibliography and to avoid incompatibilities. The arXiv favours TeX Live 2016, while some APS journals, for example, require TeX Live 2017. The tarballs also contain files with plots and tables for the public web page. These tarballs are created as GitLab artifacts and can be downloaded by the corresponding editors and members of the Physics Office. In the submission tarballs, the auxiliary material (figures and tables not for submission) is not included.

Figure 12: Screenshot of submit-pipelines. From left to right, the jobs check the version of the CI tools and copy bib and sty files along with the flattened LaTeX document to a special folder. The routine then handles all figures and tables, renaming and labelling them according to the journal's specifications. At the final stage, the flattened document is updated with the newly named figures and tables. In the last two steps, the document is built, producing the bbl file needed for the journal and the tarballs for the public web pages.
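The flattening actions listed in the text above can be illustrated on in-memory sources. This is a simplified sketch, not the PO-GitLab code: all function names are hypothetical, the figure naming scheme fig_NN is invented (the journals' real conventions are not given in the text), only \input and \includegraphics are handled, and comments are stripped from each file before it is merged so that a commented-out \input is never expanded. A bare % is kept where a comment was removed, because in LaTeX a trailing % also suppresses the space produced by the line break.

```python
import posixpath
import re

INPUT = re.compile(r"\\input\{([^}]+)\}")
COMMENT = re.compile(r"(?<!\\)%.*")  # an unescaped % up to the end of the line
GRAPHICS = re.compile(r"(\\includegraphics(?:\[[^\]]*\])?)\{([^}]+)\}")


def strip_comments(text):
    """Remove comment text line by line, keeping a bare % as a marker."""
    return "\n".join(COMMENT.sub("%", line) for line in text.splitlines())


def merge_inputs(name, sources):
    """Recursively replace \\input{...} by the content of the named file."""
    def expand(match):
        target = match.group(1)
        if not target.endswith(".tex"):
            target += ".tex"
        return merge_inputs(target, sources)

    return INPUT.sub(expand, strip_comments(sources[name]))


def rename_figures(text):
    """Give figures flat, sequential names; return new text and the mapping."""
    mapping = {}

    def rename(match):
        old = match.group(2)
        if old not in mapping:
            extension = posixpath.splitext(old)[1]
            mapping[old] = f"fig_{len(mapping) + 1:02d}{extension}"
        return f"{match.group(1)}{{{mapping[old]}}}"

    return GRAPHICS.sub(rename, text), mapping


def flatten(main, sources):
    """Produce a single comment-free source and the figure renaming map."""
    return rename_figures(merge_inputs(main, sources))
```

The returned mapping is what a later step would use to copy each figure file to its new flat name, which also removes any directory structure from the submission.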

The author list, often written authorlist for convenience, is the inventory of qualified authors at a given date, which is called the reference date. Every paper has a related list of qualified authors with a reference date that corresponds to the creation date of that list at the PAPER Phase 1, just before the first circulation of the draft document to the collaboration. Qualified authors are active physicists contributing to the maintenance and operation of the experiment. Some of them are retired people applying their pre-data credits (obtained before the data-taking era); they are called signing-only authors. Between FENCE Phase 1 and Phase 2, some people may receive exceptional authorship because of their involvement in the analysis or the paper, even if they are not yet qualified as authors through the usual process. Therefore the author list is updated to include "exceptional" authors. The special cases are studied by the Authorship Committee and proposed for approval to the Spokesperson, who will agree or not with each exception after reviewing the proposal from the Authorship Committee.

This information is stored in the ATLAS database and managed by FENCE. Figure 13 shows the full list of members (active and retired), their affiliations, and the related metadata that are needed to generate the full report of members and institutes.

Figure 13: The FENCE author list generation interface. On the left, a list of institutions used for the affiliations; in the center of the screen, all the authors are listed (yellow ones are signing-only authors); at the top of the page, the interface allows the users to view the author list on a selected date (top left) or for a given paper (top center), by typing the ATLAS paper ID.

The acknowledgements are incorporated in a legal paragraph that the collaboration agrees to include in each paper to thank funding agencies for their financial support. They do not change very often, but a funding agency or a foundation may be added or removed at a given date. Therefore, similarly to the author list, the acknowledgement file is built for each paper at the reference date.

Both files, the author list and the acknowledgements, are built using the FENCE framework (see Figure 14) and are automatically pushed to the appropriate GitLab repository, using the FENCE–GitLab integration (Section 5.3). Their integration into the paper is straightforward at the time of submission to a journal. FENCE provides an elegant way to retrieve the required information from the database (see Section 3.2) and build all the files.

The author list is built by the FENCE framework into an xml file. This is composed of three main blocks:
• Header: stores the paper's main information (Appendix C.1)
• Institutes: the list of institutes and their InSPIRE-HEP references (Appendix C.2)
• Authors: the list of authors and their information, including names, initials, affiliations, and ORCID (Appendix C.3).
The xml file is used as the reference, since it contains all the information needed to build the other files. It is the first one to be generated. A backup version of the first release of the author list is stored.

The acknowledgement tex file is built using a standard template and is filled using the FENCE framework to retrieve the required information about the ATLAS funding agencies.

Figure 14: FENCE author list interface: this list contains all the author lists generated, with information about their paper's reference and updates. The first five entries in the list are GitLab projects; the others are stored in AFS.

The FENCE author list interface, Figure 14, shows the complete set of author lists created for every ATLAS paper that is being submitted or has been published since 2009. They are easily filtered using the SEARCH box. All the columns are self-explanatory; in the last column the drop-down menu gives access to the author list location, which can be distinguished by the icon. A download icon means the files are stored in AFS and can be downloaded. A GitLab icon means the paper and the files are located in a PO-GitLab repository. The author lists can be downloaded or displayed in GitLab in the following file formats:
• tex: used by the editors to include the author list into the draft publication;
• xml: a structured file containing all the author list information. It is used by both the arXiv and the journal as the main database of the paper;
• csv: a comma-separated values file used to export authorlist metadata;
• pdf: a view of the author list;
• cds: a simple text file with the author list information in the format author : institute.

Once the author list has been sent to the journal with the publication, a check is made to determine whether the publisher has correctly used the information provided at the paper production step. This check involves a comparison of the journal pdf file that was sent back to the ATLAS Collaboration for a proof review, to the original xml/tex file. This process used to be done by hand, requiring the officer to verify that each of the (∼ […] ∼ […] pdf file) of author lists and acknowledgements provided by the journal with the ATLAS data (xml) file. A report of this comparison, one for every version of the proof, is available to ATLAS PO-Pub officers who check the results.
The proof checker follows this process:
• retrieve the information from the xml file, containing the authors and their affiliations;
• extract the text from the journal's pdf file;
• parse the text from the pdf file, creating the target reference;
• compare the official reference obtained from the xml file with the target reference;
• create a report with the differences found between the original and the target reference;
• link the report to the main report page, see Figure 15.

Figure 15: ATLAS collaboration proofs main page. This web page includes all the information about an ATLAS author list: its ID, the reference number, the xml file link, the proof sent to the journal, and the proof checker report.
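The comparison step in the process above needs a way to pair each official entry with its most similar extracted entry, since positions in the two lists cannot be trusted. The proof checker uses the Levenshtein distance for this (Appendix D.1). The sketch below shows the idea in plain Python; it is not the ATLAS implementation. The function names, the normalisation of the distance into a similarity between 0 and 1, and the threshold value are all assumptions made for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def compare(reference_entries, target_entries, threshold=0.8):
    """Classify each reference entry as matched, close, or missing."""
    report = {"matched": [], "close": [], "missing": []}
    for entry in reference_entries:
        best = max(target_entries, key=lambda t: similarity(entry, t), default=None)
        score = similarity(entry, best) if best is not None else 0.0
        if score == 1.0:
            report["matched"].append(entry)
        elif score >= threshold:
            report["close"].append((entry, best))
        else:
            report["missing"].append(entry)
    return report
```

With this threshold, an institute whose accented character was extracted as two separate characters is reported as a close match rather than as missing, which is the kind of distinction an officer reviewing the report needs.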

The main difficulty in this process lies in extracting the content from the pdf file; the text is not easily retrieved, for a variety of reasons. One is that many elements have to be identified and ignored, such as row numbers, watermarks, footers, and headings. Another reason is that words extracted from a pdf file don't follow a specific coding convention; the file can contain non-ASCII characters that can be output in many different ways. The pdf file can specify a predefined encoding standard to use, or provide a lookup table of differences between a predefined and a built-in encoding standard; for fonts with uncommon Latin characters, which are routine in this kind of publication, special encoding is used. It is necessary to provide a ToUnicode table where semantic information about the characters is preserved.

Also, the proof checker has to pass through all the publication text and recognize where the author list starts, where it ends, where the institute list starts, and where it ends. All this is made more difficult by the fact that different publishers have different layouts and create different versions of pdf files. This makes the above problems not generic, but often specific to a particular publisher.

After the target reference is created, the comparison looks for:
• authors that seem to be missing from the pdf file. Here, false positives are often due to character encoding and spaces;
• authors with inconsistent punctuation. This section points out differences between the original and target references in the punctuation of authors' first names, which can follow the patterns X. or X.Y. or X.-Y. or X-Y., with or without spaces;
• institutes that seem to be missing from the pdf file. Here false positives are often due to non-standard characters that break the entry;
• institutes with close matches. All the entries that look like the original but have some inconsistencies land in this group. Some publishers replace USA with United States of America (or vice versa). Sometimes there is a new character that does not break the institute entry, but makes the match imperfect, for example, "Università" and "Universit‘ a";
• mismatched authors. All the authors collaborate through one or more institutes. It is checked that the link between the author and the institute is consistent. This sometimes results in a false positive, because it is not always easy to extract from the pdf file the index number of an institute, mainly because the text coming from the pdf file also includes other elements such as line numbers of the document. For this reason an author originally assigned to institute number X can end up matched with target institute YX, because in the text extracted from the pdf the number X might be preceded by a line number Y; institute YX may not exist;
• deceased authors. In some cases, ATLAS has tagged authors as deceased but the publication forgot to mark them as such, or vice versa;
• missing funding agencies, or those wrongly added by the publisher.

In early 2019, due to changes in CERN systems, the component written in PROLOG which ran the comparison went out of service. This implied an urgent need for a new tool for this task. PROLOG takes a different approach to generic problem solving: the problem is expressed as a set of logical statements, without working directly on the algorithm that resolves it. PROLOG is a language that is difficult to maintain, because of its logic programming paradigm and because few developers work with it. Python was chosen as its replacement.

A way to obtain the best match among all the items of an array of institutes and authors was sought, because one cannot rely on finding an author or institute in the same position of the sequences in the xml and pdf files.
For this purpose the concept of Levenshtein distance (Appendix D.1) was applied, so that a weighted index of similarity can be obtained to decide what is matched with what, and then to check effectively for anomalies.

A feature was developed to help the script evaluate as perfect matches some that would not otherwise appear to be such. A list of synonyms (Section 7.3.1) is created for every entry, author or institute, to teach the proof checker to validate similar strings when the differences are due only to problems encountered when decoding the text from the pdf file. So, for instance, if author X. Nonamečič is not found in the target reference, but an author with the name X. Nonamež ciž c was extracted from the pdf entries, then, as it has been previously verified that the name appears as expected in the pdf file, the proof checker considers it a perfect match and skips the problem. A very long list of false positives can be found in the report page as "skipped items". The list of synonyms is updated manually, but a tool, the Synonym web page (Section 7.3.2), has been created to allow users to update this list themselves.
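The position-independent matching described above can be illustrated with a minimal Python sketch. It is not the proof checker's actual code: the function names, the normalisation of the distance by the longer string length, and the 0.8 threshold for a "close" match are all assumptions made for illustration.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity index in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(entry: str, candidates: list[str]) -> tuple[str | None, float]:
    """Return the candidate most similar to entry, wherever it sits in the list."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = similarity(entry, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


def classify(score: float, close_threshold: float = 0.8) -> str:
    """Hypothetical thresholds: exact, close match, or missing from the pdf."""
    if score == 1.0:
        return "perfect"
    return "close" if score >= close_threshold else "missing"
```

With these definitions, `best_match("Università di Udine", ["CERN", "Universit‘ a di Udine"])` selects the second candidate, and `classify` labels it a close match rather than a missing institute.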

As introduced in Section 7.3, the comparison between the pdf file and the xml file can generate false positives. To minimize the list of false positives in the report page, the new version of the proof checker includes a synonyms list that allows the comparison script to understand if the difference is a real error or another correct way to display the same information.

An example of a working synonym is:

Institute as stored in the ATLAS DB and xml file: Physics Department, SUNY Albany, Albany NY, United States of America
Institute as written in the journal's author list: Physics Department, SUNY Albany, Albany, New York, USA

These differences are acceptable, since the main information is correctly displayed and no real errors are found.

All the synonym records are managed using a JSON file and are separated into institutes and authors (Appendices D.2 and D.3). Having this as a JSON file allows the proof checker script to parse the records easily and understand if the faults must be marked as journal errors or can be skipped.
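As a rough illustration of how such records could drive the decision between a journal error and a skipped item, the sketch below loads records shaped like the institute example in Appendix D.2. The list wrapper around the records, the function names, and the returned labels are assumptions, not the format or code actually used by the proof checker.

```python
import json

# One record in the style of Appendix D.2; wrapping the records in a list is an assumption.
SYNONYMS_JSON = """
[
  {
    "id": "2",
    "original": "Department of Physics, University of Alberta, Edmonton AB, Canada",
    "synonyms": [
      "Department of Physics, University of Alberta, Edmonton, Alberta, Canada"
    ]
  }
]
"""


def load_synonyms(text):
    """Map each official entry to the set of accepted alternative spellings."""
    return {record["original"]: set(record.get("synonyms", []))
            for record in json.loads(text)}


def check_entry(original, extracted, synonyms):
    """Label a comparison result (the labels are hypothetical)."""
    if extracted == original:
        return "match"
    if extracted in synonyms.get(original, set()):
        return "skipped"        # known false positive, hidden from the main report
    return "journal_error"      # real difference, to be checked by a person
```

Calling `check_entry` with the official Alberta string and the journal's "Edmonton, Alberta, Canada" variant returns `"skipped"`, while any spelling not listed as a synonym is reported as `"journal_error"`.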

To manage the list of proof checker synonyms, ATLAS provides a web page that allows users to search for an existing entry and manage the recorded synonyms. Searching for an institute or author will display the list of records that match the search criteria, see Figure 16. This allows users to edit the synonyms for the record. Clicking the edit icon shows a new page section where users can insert their own known synonym for the record. After confirmation, this is added to the list of synonyms and is taken into account by the next run of the proof checker.

The proof checker provides a report after its run, one for each paper and draft version. This report is provided and stored in a JSON file and must be parsed to show the report results in a human-readable way. This is done by the proof_report web page, see Figure 17. The report contains all the paper information plus the comparison results sorted by topic (see Appendix D.4). The JSON file contains more information than that which is displayed; this is done to allow the web page to optimize the display of the huge amount of information and to retain data for future improvements. The web page contains some hidden sections that are produced by the proof checker via the known synonyms. These can be displayed by clicking on 'Skipped +'. Here the page will show all the false positive results that the proof checker found in its comparison, but that are ignored after association with the synonyms.

Figure 16: Proof checker synonyms web page.

The proof checker helps the Physics Office staff in a tedious task, but it is far from being a perfect tool. It needs to be continuously maintained and updated for new cases, changes in publication layouts, and new conventions in the author lists and their format. Further improvements are planned, with the goal of reducing the number of cases to be checked manually by the user to just a couple of dozen.

The ATLAS database stores data of various kinds that are displayed in different ways via web pages. The FENCE framework provides an API to retrieve this information. A call to the API, allowed after user authentication, provides the results in JSON format. This kind of information is easily parsed by most common programming languages and is standard for API results.

There are 3 main ways ATLAS provides web pages:
• standard HTML pages;
• include files for TWiki pages;
• FENCE web pages.

Figure 17: Proof report web page.

The first two options run on an ATLAS PO Virtual Machine, which provides scripts, cron-jobs, or HTML pages to the users. This Virtual Machine is directly connected to the FENCE framework to use its API and retrieve data, parse it, and store it in the EOS ATLAS file system.

FENCE also allows people who do not belong to the ATLAS Collaboration to access some of the information stored in its database. It provides various ways to retrieve and show the information, for example through a cron-job that runs on the ATLAS PO Virtual Machine, extracts the data, parses it, and shows it to the user.

An example in which the data are retrieved using the FENCE API with a cron-job on the ATLAS PO Virtual Machine is the ATLAS map web page, Figure 18, where users can see a map of all active member institutes of the ATLAS collaboration.

This dynamic web page filters the results to use only the active institutes. This process is done on the ATLAS PO Virtual Machine by a Python script, which makes a request to the API, parses the results, and builds its own JSON file. This file contains all the institute information (name, country, links, coordinates, etc.) and the layers to build the map. Once the Python script builds the JSON file, the output is interpreted by the web page, which takes care of displaying the layers and the markers for the institutes.

Figure 18: The ATLAS map web page. The numbers displayed at a regional level represent the count of institutions.
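The filtering and file-building step can be sketched as follows. This is only an illustration under stated assumptions: the field names (`active`, `lat`, `lon`, `url`), the layout of the output, and the per-country count layer are hypothetical, and the API request itself is left out so that the example runs on in-memory data.

```python
import json
from collections import Counter


def build_map_data(institutes):
    """Keep only active institutes and build the structure read by the map page."""
    active = [inst for inst in institutes if inst.get("active")]
    markers = [
        {
            "name": inst["name"],
            "country": inst["country"],
            "url": inst.get("url", ""),
            "lat": inst["lat"],
            "lon": inst["lon"],
        }
        for inst in active
    ]
    # One possible layer: the number of institutes per country (hypothetical layout).
    counts = Counter(inst["country"] for inst in active)
    return {"markers": markers, "layers": {"institutes_per_country": dict(counts)}}


# Invented sample records standing in for the parsed API response.
SAMPLE = [
    {"name": "Institute A", "country": "Italy", "lat": 46.0, "lon": 13.2, "active": True},
    {"name": "Institute B", "country": "Italy", "lat": 45.4, "lon": 9.2, "active": False},
    {"name": "Institute C", "country": "Brazil", "lat": -22.9, "lon": -43.2, "active": True},
]

print(json.dumps(build_map_data(SAMPLE), indent=2))
```

Keeping the filtering in the script rather than in the web page means the page only has to draw what it finds in the file.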

TWiki pages

Displaying information on a TWiki page requires the use of the TWiki %INCLUDE function. This function permits inclusion of the content of the file that is interpreted by the TWiki page. The content can be TWiki code or HTML code. HTML code allows a more dynamic page to be created. One can use JavaScript, jQuery, and other web development tools to make the content more intuitive to the user. In this case the page will be loaded on-demand and the data will be displayed in real time via the FENCE framework.

The public results page, Figure 19, is an example of an include using an HTML page which retrieves data using the FENCE API and displays it to the user in a TWiki page. This page shows the full list of papers, CONF notes, and PUB notes stored in the ATLAS database and managed by the FENCE framework. It also allows users to filter results using the buttons at the top of the page. This page loads ∼

FENCE public pages

Although the FENCE web pages are normally restricted according to users' roles, the FENCE framework also allows web pages that should be displayed publicly to be generated. This solution allows the developers to use all the powerful FENCE functionalities (MBF, for example) and to simplify the data retrieval process. In addition, it allows the information to be loaded on demand, without cron-jobs and without passing through the API in a way that would increase the web page loading time.

Figure 19: Public results TWiki page.

An example of a public web page completely built using the FENCE framework is the ATLAS Conference and Talks page, as shown in Figure 20. This retrieves all the talks, grouped by their conference and registered within ATLAS, and displays a summary of all the information for each talk and conference, including speaker, institute, conference name, date, and location. All the table's columns have search fields to allow the users to easily find the talk they are looking for without going through all the ∼11 000 records displayed on the page. There is an option to filter the results as future, past, or all (the default option). The page also contains some internal links that point to FENCE web pages (such as the link to the speaker profile). Such links are marked as internal because they demand authentication for these non-public data.

This page was built using the MBF infrastructure described in Section 3.2. Before it, developers had to create public web pages on TWiki or from scratch and retrieve all the data by accessing the database directly.

Figure 20: The ATLAS Talks by Conference web page.

This article summarises the tools that have been set up to support the publication of documents by the ATLAS Collaboration. While the emphasis is on papers published in refereed journals, the technology created also supports internal documents and other public documents such as Conference and Public notes.

The FENCE framework is used as the backbone of the whole setup and is also used to interface the web-based tracking of the status of an analysis with the documentation in GitLab. Extensive use is made of the Continuous Integration tools available in GitLab to ensure that documents can easily be submitted to the arXiv and journals as soon as they have been approved by the collaboration.

The software solutions described in this document are now used to accompany the whole of a physics analysis, from the expressions of interest by research groups, to the final journal publication. They also include the generation of the appropriate author list and process the proof-reading.

The tools are used by the whole collaboration and minimise the amount of manual work required for repetitive procedures, easing the workload of editors, editorial boards, Management, and the Physics Office. At the same time, all documents connected to an analysis can now be accessed from a central tool where the experiment's rules and knowledge are codified and made available in an intuitive way.

Acknowledgements

The authors are indebted to the ATLAS Collaboration for the support provided to achieve the results described in this paper. We are grateful to ATLAS collaborators who provided invaluable comments and input to the paper and the framework it presents. Special acknowledgements go to Marzio Nessi for helping initiate the Glance project in ATLAS and for supporting its development, and to Kathy Pommes for supervising the Glance team at CERN. Special thanks to Giordon Stark for thoroughly reviewing this paper.

Appendix

A Classes for analysis and paper phases

A.1 Graph class

    abstract class Graph
    {
        abstract public function addNode(Node $node);
        abstract public function deleteNode(Node $node);
        abstract public function addEdge(Edge $edge);
        abstract public function deleteEdge(Edge $edge);
        abstract public function clean();
    }

A.2 Action class

    class Action
    {
        protected $inputs;
        protected $outputs;
        protected $callback;

        public function __construct()
        {
            $this->inputs = array();
            $this->outputs = array();
        }

        public function setCallback($callback)
        {
            $this->callback = $callback;
        }

        public function getCallback()
        {
            return $this->callback;
        }

        public function setInputs($inputs)
        {
            $this->inputs = $inputs;
        }

        // Run the callback with the stored inputs and keep its return value.
        public function trigger()
        {
            $this->outputs = call_user_func_array($this->callback, $this->inputs);
        }

        public function getOutputs()
        {
            return $this->outputs;
        }
    }

A.3 User authorisation class

    public function is_expert()
    {
        return $this->has_egroup("fence-developers");
    }

    public function hasPermission($permission)
    {
        return in_array($permission, $this->permissions);
    }

B FENCE and GitLab integration classes

B.1 Methods

    public function get($endPoint, $data = null)
    {
        $this->init($endPoint, $data);
        return $this->exec();
    }

    public function post($endPoint, $data = null)
    {
        return $this->execMethod('POST', $endPoint, $data);
    }

    private function delete($endPoint, $data = null)
    {
        return $this->execMethod('DELETE', $endPoint, $data);
    }

    private function put($endPoint, $data = null)
    {
        return $this->execMethod('PUT', $endPoint, $data);
    }

B.2 execMethod function

    private function execMethod($method, $endPoint, $data = null)
    {
        $this->init($endPoint);
        $this->setMethod($method);
        // Attach a request body only when data is provided.
        if ($data) {
            $this->setBodyData($data);
        }
        return $this->exec();
    }

B.3 createProject function

    public function createProject($project, $parameters = [])
    {
        // Accept either a project name or a Project object.
        $name = $project;
        if ($project instanceof Project) {
            $name = $project->name();
        }
        \FENCE\Logger::debug("Creating project = {project}", ["project" => $project]);
        $payload = $this->post('projects', array_merge(['name' => $name], $parameters));
        // A response without an id is treated as "project already exists".
        if (!isset($payload['id'])) {
            throw new Exception\ProjectAlreadyExistsException(json_encode($payload));
        }
        return new Project($payload['id'], $payload);
    }

C Author list files

C.1 Author list XML file header

C.2 Author list XML file institutes

physsci.adelaide.edu.au/hep/ Department of Physics, University of Adelaide, Adelaide, Australia Adelaide U., Sch. Chem. Phys. 275 Adelaide member

C.3 Author list XML file authors

FirstName LastName FirstName LastName F. LastName INSPIRE-00000000 0000-0000-0000-0000

D Proofs checks

D.1 Levenshtein distance

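Mathematically, the Levenshtein distance between two strings $a$, $b$ of length $|a|$ and $|b|$ respectively is given by $\operatorname{lev}_{a,b}(|a|,|b|)$ where

\[
\operatorname{lev}_{a,b}(i,j) =
\begin{cases}
\max(i,j) & \text{if } \min(i,j) = 0,\\[4pt]
\min
\begin{cases}
\operatorname{lev}_{a,b}(i-1,j) + 1\\
\operatorname{lev}_{a,b}(i,j-1) + 1\\
\operatorname{lev}_{a,b}(i-1,j-1) + 1_{(a_i \neq b_j)}
\end{cases}
& \text{otherwise,}
\end{cases}
\]

where $1_{(a_i \neq b_j)}$ is equal to 0 when $a_i = b_j$ and equal to 1 otherwise, and $\operatorname{lev}_{a,b}(i,j)$ is the distance between the first $i$ characters of $a$ and the first $j$ characters of $b$.

D.2 Institutes

    {
        "id": "2",
        "original": "Department of Physics, University of Alberta, Edmonton AB, Canada",
        "synonyms": [
            "Department of Physics, University of Alberta, Edmonton, Alberta, Canada"
        ]
    }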

D.3 Authors

    {
        "original": "A. B\\\"ubbbbbb",
        "inspire": "INSPIRE-00000000",
        "foafName": "Aaaa Bubbbbbb",
        "synonyms": ["A. B\u00f2bbbbbb", "A. B\u00a8 bbbbbb"]
    }

D.4 Report

    {
        "ref_code": "EXOT-2017-24",
        "ref_date": "2018-07-31",
        "creation_date": "29-Oct-2018",
        "publisher": "'APS'",
        "document": "doc1053",
        "filename": "LY15578_proof_v2",
        "authors_missing_skip": [...],
        "authors_missing_list": [...],
        "authors_puntuation_list": [...],
        "institutes_missing_pdf_list": [...],
        "institutes_missing_pdf_skip": [...],
        "authors_mismatched_list": [...],
        "authors_not_deceased_list": [...],
        "authors_deceased_list": [...],
        "institutes_close_matches_list": [...],
        "founding_agencies_missing": [...],
        "founding_agencies_wrong": [...]
    }
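A report of this shape can be condensed into per-topic counts before being displayed. The sketch below is an illustration only: it assumes that every list-valued key is a comparison topic and that keys ending in `_skip` hold the items hidden behind synonyms, which is inferred from the key names above rather than stated by the source. The function name is hypothetical.

```python
def summarise_report(report):
    """Count entries per topic, separating findings from skipped false positives."""
    findings, skipped = {}, {}
    for key, value in report.items():
        if not isinstance(value, list):
            continue                      # metadata such as ref_code or publisher
        if key.endswith("_skip"):
            skipped[key] = len(value)     # items ignored thanks to known synonyms
        else:
            findings[key] = len(value)    # items still to be checked by a person
    return findings, skipped


# Invented sample values; only the key names come from the report above.
SAMPLE_REPORT = {
    "ref_code": "EXOT-2017-24",
    "publisher": "'APS'",
    "authors_missing_skip": ["A. Author", "B. Author"],
    "authors_missing_list": ["C. Author"],
    "institutes_missing_pdf_skip": [],
    "founding_agencies_missing": [],
}

findings, skipped = summarise_report(SAMPLE_REPORT)
print("to check:", findings)
print("skipped:", skipped)
```

Separating the two groups mirrors the report page, where skipped items stay hidden until the user asks for them.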