A continuous integration and web framework in support of the ATLAS Publication Process
Juan Pedro Araque Espinosa, Gabriel Baldi Levcovitz, Riccardo-Maria Bianchi, Ian Brock, Tancredi Carli, Nuno Filipe Castro, Alessandra Ciocio, Maurizio Colautti, Ana Carolina Da Silva Menezes, Gabriel De Oliveira da Fonseca, Leandro Domingues Macedo Alves, Andreas Hoecker, Bruno Lange Ramos, Gabriela Lemos Lúcidi Pinhão, Carmen Maidantchik, Fairouz Malek, Robert McPherson, Gianluca Picco, Marcelo Teixeira Dos Santos
EUROPEAN ORGANISATION FOR NUCLEAR RESEARCH (CERN)
CERN-OPEN-2020-007
15th May 2020

The ATLAS Publication Process Supported by Continuous Integration and Web Framework

Juan Pedro Araque Espinosa f, Gabriel Baldi Levcovitz a, Riccardo-Maria Bianchi i, Ian Brock d, Tancredi Carli b, Nuno Filipe Castro f,g, Alessandra Ciocio h, Maurizio Colautti e, Ana Carolina Da Silva Menezes a, Gabriel De Oliveira da Fonseca a, Leandro Domingues Macedo Alves e, Andreas Hoecker b, Bruno Lange Ramos a, Gabriela Lemos Lúcidi Pinhão a,f, Carmen Maidantchik a, Fairouz Malek c, Robert McPherson j, Gianluca Picco e, Marcelo Teixeira Dos Santos a

a Universidade Federal do Rio de Janeiro COPPE/EE/IF, Rio de Janeiro, Brazil
b CERN
c LPSC, Université Grenoble Alpes, CNRS/IN2P3, Grenoble, France
d Physikalisches Institut, Universität Bonn, Bonn, Germany
e Dipartimento Politecnico di Ingegneria e Architettura, Università di Udine, Udine, Italy
f Laboratório de Instrumentação e Física Experimental de Partículas, Lisbon, Portugal
g Departamento de Física, Escola de Ciências, Universidade do Minho, Braga, Portugal
h Lawrence Berkeley National Laboratory and University of California, Berkeley, USA
i Department of Physics and Astronomy, University of Pittsburgh, Pittsburgh PA, USA
j Department of Physics and Astronomy, University of Victoria, Victoria BC, Canada
The ATLAS Collaboration defines methods, establishes procedures, and organises advisory groups to manage the publication processes of scientific papers, conference papers, and public notes. All stages are managed through web systems, computing programs, and tools that are designed and developed by the Collaboration. The Phase 0 system was implemented using the FENCE framework and is integrated into the CERN GitLab software repository, to automatically configure workspaces where the analysis can be documented and used by the analysis team and managed by the conveners. Continuous integration is used to guide the writers in applying accurate format and valid statements when preparing papers to be submitted to scientific journals. Additional software assures the correctness of other aspects such as lists of collaboration authors, funding agencies, and foundations. The ATLAS Physics and Committees Office provides support to the researchers and facilitates each phase of the publication process, allowing authors to focus on the article's contents that describe the results of the ATLAS experiment.
Appendix
A Classes for analysis and paper phases
A.1 Graph class
A.2 Action class
A.3 User authorisation class
B FENCE and GitLab integration classes
B.1 Methods
B.2 execMethod function
B.3 createProject function
C Author list files
C.1 Author list XML file header
C.2 Author list XML file institutes
C.3 Author list XML file authors
D Proofs checks
D.1 Levenshtein distance
D.2 Institutes
D.3 Authors
D.4 Report
1 Introduction

The ATLAS Physics and Committees Office (PO) is one of the ATLAS Collaboration's [1] executive committees. It is constituted by physicists and engineers performing tasks connected to the continuous support of committees and groups including the ATLAS Management, the Physics Coordinators, the Publication Committee, analysis group conveners, the Authorship Committee, the Speakers Committee, and many others. The PO also provides assistance to any member of the ATLAS collaboration, for example by facilitating membership, authorship, and paper submission to the arXiv and journals, and by reviewing talks and posters for national and regional meetings.

The PO supports the development of several tools including those used to manage physics analyses, prepare and submit papers, distribute detector performance documents, and track conference proceedings. It uses web-based systems to implement the metadata connected with analyses, version control for editing documents, and author lists. PO members are available to guide users in understanding the tools. The PO also assists with other daily tasks to lower the load on each member of the collaboration.

The ATLAS Collaboration has a dedicated organisational structure for work on detector maintenance and operation, data analysis, and scientific publication and outreach. Collaborative tools are needed to provide efficient communication among collaborators and straightforward interaction with the journals, the institutions, and the funding agencies.

This report is focused on the infrastructure for managing analyses and papers, especially its most recent developments, which were launched in Fall 2017. Due to the phasing out of the SVN system [2], a new system was built using the FENCE [3] framework, described in Section 3. This is now used to handle any analysis or document type, for internal use or for a large publication, as is described in Sections 2 and 4. The framework is used not only for ATLAS document handling but also for the organisation of information about other entities including members, institutes, appointments, equipment, talks, and conferences. It is also used by the ALICE experiment to organise information on members, appointments, funding agencies, institutes, the author list, and shift bookings. The LHCb experiment uses the framework for the management of members, appointments, and institutes. ATLAS has very specific needs for each task, requiring integration with a single database. For this reason a flexible custom solution had to be developed.

The new system is based on Git and the associated CERN GitLab code repository hosting platform. Development of a special FENCE-GitLab integration has been necessary, as is detailed in Section 5. The ATLAS GitLab area for editing the documents and submitting the papers to the journals, PO-GitLab, is described in Section 6. A description of the main tools used to support the collaboration author list and the acknowledgements of funding agencies and foundations is given in Section 7. A more general description of the way the metadata are managed is presented in Section 8.
The ATLAS experiment supports a wide physics programme to explore the fundamental nature of matter. To do so, it makes use of the Large Hadron Collider (LHC), which collides protons travelling at almost the speed of light at a centre-of-mass energy of 13 TeV. To carry out such a physics programme, physicists need software and graphical tools to analyse the data and compare them to theoretical models.

ATLAS is organised into several Physics (PHY) and Combined Performance (CP) working groups and subgroups. These groups are coordinated by conveners appointed by the collaboration for typically two years. Example names of PHY and CP groups include Top Quark (TOPQ), Standard Model (STDM), B-physics (BPHY), Higgs (HIGG), Electron/Gamma (EGAM), and Jet and EtMiss (JETM). Studies of system detectors (SYS) and activities such as software (SOFT) and data preparation (DAPR) are also organised hierarchically with subgroups and conveners.

Once an analysis is finished or an aspect of the detector performance has been studied in detail, some members of the analysis team prepare a publication. These writers are known as the editors. In fundamental research, as is the case with the research conducted at CERN, the publication of the results is a duty and is the usual way to show the results publicly and to report outcomes to the taxpayers and funding institutions.

ATLAS produces six different types of documents:
• PAPER: general publications in refereed journals, based on collision data analyses and detector projects;
• PUB notes: public documents classified as a note; they sometimes use only simulated data;
• PROC and CONF notes: conference proceedings and notes containing preliminary results, respectively, which are shown at conferences;
• INT: internal notes or technical documents;
• PLOT: plots that can be used along with the above-mentioned documents.

All ATLAS analyses are discussed and presented in the relevant working groups, which have the responsibility, together with the subgroups, to provide guidance, help, and/or resources to the analyses both in the early stages of an analysis and during its development. The working groups should also develop a coherent and realistic plan for the release of the results for a conference and/or journal publication. This is a necessary step before any paper draft can be planned or circulated. This procedure and the related steps (or phases) are described in an ATLAS internal document and are summarised below.

• For an INT, CONF, or PUB note, the procedure has two phases: Phase 0 and Phase 1. For a paper that is sent to a peer-reviewed journal, the procedure has four phases, which include, in addition to the two above, Phase 2 and Submission.
• An analysis or a document is started at Phase 0 in the Analysis FENCE interface (also called the Analysis FENCE page). The Analysis Team (AT) starts its analysis and begins writing drafts and supporting documents. The type of document could be PAPER, CONF, or PUB. Some important settings are established at the start of an Analysis, including the constitution of the AT, the appointments of the group and subgroup conveners in charge of oversight of the analysis, and the constitution of an Editorial Board (EdBoard). From the start of Phase 0, the AT is assigned a dedicated GitLab space, a repository, in which to edit its documents. A GitLab repository with a skeleton INT note is created by default at Phase 0.
• The EdBoard reviews the complete analysis and ensures that any documentation or paper drafts are prepared according to ATLAS policies. Once it is satisfied, it signs off on the draft PAPER or CONF note before its distribution to the ATLAS collaboration for review. The EdBoard should verify that the analysis is worth publishing in the proposed form and consult with the Publication Committee (PubComm) chair if there are doubts. It should also establish with the editors and conveners whether the paper should be a letter or an article, and propose a journal. These steps and the validation workflow are performed during Phase 1 and Phase 2, related respectively to the first and second circulation of the draft document to the collaboration for comments. During the circulation periods, authors can read and comment on the paper draft.
• The PubComm chair has the responsibility to assess the quality of the paper and ensure that the ATLAS guidelines and policies are followed. A Physics Approval Meeting is held after the first circulation, followed by a Physics Closure Meeting after the second circulation. After the EdBoard signs off on the revised draft following the second circulation, the draft goes to the PubComm chair for a final sign-off.
• The ATLAS Spokesperson (SP) is ultimately responsible for the scientific quality of the results from the ATLAS Collaboration and makes a final review of each paper before the Submission. The final draft is signed off by the SP or his/her delegate.
• When the SP has signed off, the validation workflow at Phase 2 is finished. A message to the Physics Office Publications team (PO-Pub) is generated to inform them of a new document to submit. PO-Pub officers then proceed with the submission to the arXiv and the peer-reviewed journal. They are responsible for communication with the journal during all the steps (referee reports and proofs) through a dedicated Submission workflow. The submission is completed once the document is published online. The journal references are entered at the last step of the workflow, which closes the procedure and makes the references of the publication available on the arXiv, public web pages, and the INSPIRE-HEP database [4].
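The phase sequence described in the list above can be summarised as a small lookup table. The following is only an illustrative Python sketch, not FENCE code (FENCE is written in PHP); the names `DocType`, `PHASES`, and `next_phase` are hypothetical and chosen for this example.

```python
from enum import Enum


class DocType(Enum):
    """Document types that follow the phased procedure (names are illustrative)."""
    PAPER = "PAPER"
    CONF = "CONF"
    PUB = "PUB"
    INT = "INT"


# Notes stop after Phase 1; papers continue through Phase 2 and Submission.
NOTE_PHASES = ["Phase 0", "Phase 1"]
PHASES = {
    DocType.PAPER: NOTE_PHASES + ["Phase 2", "Submission"],
    DocType.CONF: list(NOTE_PHASES),
    DocType.PUB: list(NOTE_PHASES),
    DocType.INT: list(NOTE_PHASES),
}


def next_phase(doc_type, current):
    """Return the phase that follows `current`, or None if the procedure is over."""
    phases = PHASES[doc_type]
    index = phases.index(current)  # raises ValueError for an unknown phase
    return phases[index + 1] if index + 1 < len(phases) else None
```

With this table, `next_phase(DocType.PAPER, "Phase 1")` gives `"Phase 2"`, while the same call for a CONF note gives `None`, reflecting that notes end after the first circulation.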
CONF and PUB notes use only the Phase 1 workflow. The steps and validation workflow are implemented in systems developed using the FENCE framework, which is described in Section 3. The related web-based systems encompass all of the Phase 0, Phase 1, Phase 2, and Submission steps and are described in Section 4 with a focus on Phase 0. If necessary, at Phase 0, editors may request the creation of dedicated GitLab repositories appropriately configured through the FENCE-GitLab integration, as is described in Section 5. The metadata entered in any of the phases are exported to web sites to display the necessary information, including the Public Results pages that are explained in Section 8. Some of the metadata are also used internally by the collaboration to monitor the journal submission process or related activities. The GitLab Continuous Integration (CI) tools, which are explained in Section 6, allow validation of the document drafts and preparation of the appropriate ready-to-go tarball, a compressed set of files containing the full LaTeX [5] resources and files for the submission to the peer-reviewed journals.

For the category PAPER, a longer process is carried out by the PO-Pub officers. They check the author list and the acknowledgements. Author lists and acknowledgements are both handled and generated through the FENCE framework, described in Section 3, and their production is described in detail in Section 7.1. Before the final publication, and after the refereed review and acceptance by the journal, proofs are sent to the collaboration for a last check. While the editors proofread the content of the paper within a short period of time, usually two days, the PO-Pub officers check whether the authors and their affiliations have been appropriately handled by the journal, through comparison to the original files sent to the journal. This check is performed automatically using a tool called the Proof Checker, which is described in Section 7.3.

Figure 1: The ATLAS web-based FENCE systems
FENCE is an object-oriented PHP [6] framework designed for the development of web applications. It encompasses the concepts of encapsulation, data abstraction, polymorphism, and inheritance. FENCE uses an ORACLE database (DB) to store the data fetched and displayed in its interfaces. Although ORACLE is the default DB management system, with some development effort one can instead use other relational database services such as MySQL and Microsoft SQL Server.

A class is a template that describes the behaviour supported by objects of its type. FENCE assembles classes to build applications by making extensive use of configuration files, which are loaded into the engine at each request. The engine then generates the HTML response displayed in the user's browser. The classes can be inherited by the systems that make use of the framework, and therefore the code can be reused, with similar features implemented from the predefined software components. As a consequence, the development process is accelerated and the maintenance cost is reduced.

The FENCE software development process encompasses software engineering methods such as requirements analysis, architecture, design, testing, deployment, and maintenance in order to guarantee the quality of the software. Requirements are gathered and documented prior to the solution design and, in this way, developers are able to propose broader solutions that can benefit the whole project. After any implementation, tests are performed to assure software correctness, robustness, extensibility, and reusability.

Figure 1 presents the fifteen ATLAS web-based systems currently in production. These were developed using the FENCE framework, which facilitates their maintenance and enhancement. They can be divided into three categories: people, publications, and equipment. The people-related systems have features for managing personal information of the ATLAS members, including their contracts, appointments, affiliations, nominations, conferences, theses, and research activities. The systems related to publications automate the process of producing papers, conference and public notes, and weekly performance plots from collision data, for review. Those related to the equipment handle information about the design and interconnection of the system detectors.

3.1 FENCE main classes
The FENCE framework is composed of a library of helper classes that are extensible program-code templates for creating objects. Any new class can be coded and added to the framework, widening its scope, and can then be reused in different systems. One example is the Search class, which provides methods to create search interfaces that allow data filtering through predefined search attributes. The SuperSearch class offers an advanced search interface, where the user can build logic queries with AND and OR operators. Form inputs can easily be added using classes such as TextInput, DateInput, and MemberInput; the last of these provides a selection box with the list of all members of an experiment. The most important classes, developed to support the ATLAS publication process, are described in the following sections.
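To illustrate what building a logic query with AND and OR operators can look like, the sketch below composes simple attribute filters in Python. It is not the SuperSearch implementation (which is PHP and not shown in this report); the helper names `attr_equals`, `all_of`, `any_of`, and `search`, as well as the record fields, are assumptions made for this example.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]
Predicate = Callable[[Record], bool]


def attr_equals(name: str, value: Any) -> Predicate:
    """Filter on one predefined search attribute."""
    return lambda record: record.get(name) == value


def all_of(*predicates: Predicate) -> Predicate:
    """Logical AND of several filters."""
    return lambda record: all(p(record) for p in predicates)


def any_of(*predicates: Predicate) -> Predicate:
    """Logical OR of several filters."""
    return lambda record: any(p(record) for p in predicates)


def search(records: Iterable[Record], predicate: Predicate) -> List[Record]:
    """Return the records that satisfy the composed query."""
    return [record for record in records if predicate(record)]
```

A query such as "type is PAPER AND (group is TOPQ OR group is HIGG)" is then written as `all_of(attr_equals("type", "PAPER"), any_of(attr_equals("group", "TOPQ"), attr_equals("group", "HIGG")))`.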
3.1.1 Workflow class

The FENCE Workflow represents any process involving states and actions triggered by a change from one state to another. It was used to implement the web system that supports the ATLAS publication process, which is organised in phases. Each phase is divided into several steps separated by actions. Each step can activate a number of tasks including the recording of metadata into the ATLAS database, triggering of an E-group creation, activating an update on GitLab [7], and sending automatic emails.

The Workflow class was developed based on the concept of Directed Cyclic Graphs (DCG), which describe the relations between objects. Objects are called nodes and the relations between them are called edges, implying a directional flow. To represent this concept, several classes were created. The abstract class Graph, whose corresponding code can be found in Appendix A.1, has methods that allow the addition and deletion of nodes and edges. The class that implements Graph is called MapperGraph. It stores nodes and edges inside a PHP data structure called SplObjectStorage that, for this implementation, is better suited to managing objects than associative arrays. The use of this data structure allowed the development of very simple methods to retrieve the neighbour nodes of a node, or the directed edge between a given origin and target node.

The Node class defines methods to set and get data related to one node. The Edge class defines similar methods, but related to an edge. An example of data that can be added to a node is an instance of the Action class, which has methods to set and get function callbacks, define their arguments, and access their outputs. More details about the Action class implementation can be found in Appendix A.2.

The behaviour of the Workflow is controlled by a JSON file, following the FENCE pattern described in Section 3.3. This file defines a workflow's steps, their order, and the actions that can be triggered at a given step. The Workflow class uses the MapperGraph, Node, Edge, and Action classes to build a graph and its elements.
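The idea of a workflow built from a JSON description and stored as a directed graph can be sketched as follows. This is a simplified Python illustration, not the PHP classes of Appendix A: the JSON layout (`steps`, `transitions`, `actions`), the step names, and the class and registry names are all assumptions made for the example.

```python
import json


class Action:
    """A callback with its arguments; the output is kept after running."""

    def __init__(self, callback, *args):
        self.callback = callback
        self.args = args
        self.output = None

    def run(self):
        self.output = self.callback(*self.args)
        return self.output


class WorkflowGraph:
    """Directed graph: nodes are step names, edges carry lists of actions."""

    def __init__(self):
        self.nodes = set()
        self.edges = {}  # origin -> {target: [Action, ...]}

    def add_node(self, name):
        self.nodes.add(name)
        self.edges.setdefault(name, {})

    def add_edge(self, origin, target, actions=()):
        if origin not in self.nodes or target not in self.nodes:
            raise KeyError("both nodes must exist before adding an edge")
        self.edges[origin][target] = list(actions)

    def neighbours(self, origin):
        return list(self.edges[origin])


class Workflow:
    """Builds the graph from a JSON description and tracks the current step."""

    def __init__(self, config_json, registry, start):
        config = json.loads(config_json)
        self.graph = WorkflowGraph()
        for step in config["steps"]:
            self.graph.add_node(step)
        for item in config["transitions"]:
            actions = [Action(registry[name], item["to"])
                       for name in item.get("actions", [])]
            self.graph.add_edge(item["from"], item["to"], actions)
        self.state = start

    def advance(self, target):
        """Follow the edge to `target`, running the actions attached to it."""
        if target not in self.graph.edges[self.state]:
            raise ValueError(f"no transition from {self.state!r} to {target!r}")
        outputs = [action.run() for action in self.graph.edges[self.state][target]]
        self.state = target
        return outputs


# Hypothetical workflow description; the edge back to "draft" makes the graph cyclic.
CONFIG = """{
  "steps": ["draft", "edboard_signoff", "circulation"],
  "transitions": [
    {"from": "draft", "to": "edboard_signoff", "actions": ["notify"]},
    {"from": "edboard_signoff", "to": "circulation", "actions": ["notify", "record"]},
    {"from": "edboard_signoff", "to": "draft"}
  ]
}"""

# Hypothetical action registry mapping action names to callbacks.
REGISTRY = {
    "notify": lambda step: f"email sent for {step}",
    "record": lambda step: f"metadata recorded for {step}",
}
```

In this sketch the actions are attached to edges rather than nodes, which keeps the example short; the report states that in FENCE an Action instance can be stored as node data.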
3.1.2 Messenger class

The Messenger class is used by the Workflow to send automatic emails and to allow users to edit email templates. The JSON file used by the Workflow defines the names of the email templates to be triggered by an action. These templates and their variables are stored in the database in two JSON files. The first one contains all the templates with variables to be substituted, and the second contains the variables' identifiers and the methods used to substitute them into the templates before sending the email. Using another class, DBJReader, the Messenger can read these JSON files from the database. It can then either get the templates and show them in the interface, so that users can edit them, or parse the variables and send an email. In the first case, the changes applied to the templates are saved in the database, this time using the DBJWriter class. In the second case, the Messenger substitutes all the variables in the template and uses the Mailer class, designed to send automatic emails, to deliver the email to the correct recipient. A summary of this infrastructure is illustrated in Figure 2.
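The two-file arrangement (templates in one JSON document, variable-to-method mappings in another) can be sketched in a few lines. The Python below is an assumption-laden illustration rather than the Messenger code: the template text, the variable names, the `PublicationContext` class, and the reference code are invented for the example, and the JSON documents are inlined instead of being read from a database.

```python
import json
from string import Template

# First document: templates with variables to be substituted (hypothetical content).
TEMPLATES_JSON = """{
  "signoff": {
    "subject": "Sign-off recorded for $ref_code",
    "body": "Dear $recipient_name, the draft $ref_code has been signed off."
  }
}"""

# Second document: variable identifiers and the methods that provide their values.
VARIABLES_JSON = """{
  "ref_code": "get_ref_code",
  "recipient_name": "get_recipient_name"
}"""


class PublicationContext:
    """Hypothetical object exposing the methods named in VARIABLES_JSON."""

    def __init__(self, ref_code, recipient_name):
        self._ref_code = ref_code
        self._recipient_name = recipient_name

    def get_ref_code(self):
        return self._ref_code

    def get_recipient_name(self):
        return self._recipient_name


def render(template_name, context):
    """Resolve every variable through its method, then fill the chosen template."""
    templates = json.loads(TEMPLATES_JSON)
    variables = json.loads(VARIABLES_JSON)
    values = {name: getattr(context, method)() for name, method in variables.items()}
    template = templates[template_name]
    # substitute() raises KeyError if a template uses an unknown variable.
    return {part: Template(text).substitute(values) for part, text in template.items()}
```

Keeping the variable-to-method mapping separate from the template text means a template can be edited by users without touching the logic that computes the values.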
Figure 2: Summary of the Messenger infrastructure. The two JSON files related to the Messenger are stored in the database. The class retrieves them using the DBJReader. Messenger can either use the Mailer to trigger emails or render email templates so users can edit them through a FENCE system interface, illustrated by the computer in the figure. After the user edit is complete, the JSON email templates are saved into the database using the DBJWriter.

3.1.3 EgroupManager class
The EgroupManager class is similar to the Messenger class, since it also gets a template from a JSON file and substitutes variables. The difference is that the templates are not related to emails, but to E-group configurations. It does not allow users to edit the templates from the interface since they contain many technical details.

The EgroupManager class uses another FENCE class, called JReader, to get the templates from the JSON file. This class was designed to parse JSON files and store them in an object. After getting the JSON templates, the EgroupManager parses them, substituting all the variables.

With the template parsed, the EgroupManager uses the FENCE EgroupSOAPHandler to communicate with the E-groups API. To do so, it first authenticates. Using the methods available in the SOAP web services, it can then create, update, and delete E-groups.

3.1.4 User class
The User class supports access control of the interfaces.

The main purpose of the User class is to define an object that stores information concerning the connected ATLAS member (connected to the main CERN authentication server), including the CERN CCID (CERN Computing ID), first and last name, E-groups, and other attributes. It also defines specific methods to facilitate access control within the interface.

Appendix A.3 gives two examples of these methods, which are used to check user authorisation. The method is_expert() checks if the user is a member of the E-group of FENCE team developers, which is composed of the project developers. The method Permission($permission) accepts a permission to be checked as an argument and verifies whether it is in the user's permissions inventory.

When an extension of User is created, extra methods are appended to User to provide specific Utils for a context, Utils being a FENCE class that contains useful public methods used by many other classes of the framework. Every system has its own User class extending the FENCE core User class. Systems may therefore have specific methods that are used to grant edit permissions and control user access.

Configuration files, described in Section 3.3, provide multiple properties that set access control and edit permissions. This is mainly achieved in two ways. General access control is set using CERN E-groups or FENCE user groups, including experts, administrators, and many others. In this case, User verifies the clearance by comparing the user's actual E-groups and user groups to the required ones. The other way concerns edit permissions and uses specific roles. These roles are keys mapped to methods in User that check whether the member is supposed to have edit permission on that specific field. An example is shown in Listing 1.
Listing 1: User example

```json
"pub_short_title": {
    "label": "Public short title",
    "sublabel": "Plain text, no LaTeX",
    "type": "textarea",
    "rules": {
        "maxlength": 1000,
        "character_not_allowed": ["\\", "$"]
    },
    "analysis_roles": [
        "GROUP_CONVENER",
        "SUBGROUP_CONVENER",
        "PROJECT_LEADER"
    ]
}
```

Taking the SUBGROUP_CONVENER role as an example, the following method is used to grant permission to edit the public short title field, see Listing 2:

Listing 2: SUBGROUP_CONVENER method example

```php
public function is_subgroup_convener($publication_id = null)
{
    // Without a publication there is nothing to check.
    if ($publication_id === null) {
        return false;
    }
    // PUB_LIST presumably holds the publication IDs the user convenes
    // (inferred from the names; the check is a substring match).
    foreach ($this->subgroup_convener_publications as $line => $row) {
        if (strpos($row['PUB_LIST'], strval($publication_id)) !== false) {
            return true;
        }
    }
    return false;
}
```

3.2 MBF (Models, Builders, and Factories) infrastructure
Models, Builders, and Factories are all heavily used software design patterns. Their combined use is a particular feature of the FENCE framework. The main goal of these development standards is to create a wrapper to store complex objects and facilitate their construction in different contexts, working as an SQL query builder. For instance, it would be possible to pass an actual SQL query every time information from the database is needed. It is, however, much more convenient to call a class that handles the queries and presents the needed object to the user. This is exactly how the MBF infrastructure works. In FENCE classes, objects are constructed simply by instantiating specific factory classes. For instance, in the example below, a member is constructed by instantiating the MemberFactory, see Listing 3:

Listing 3: MBF method example

```php
public function buildMemberByID($memberId)
{
    $DBManager = new DBManager;
    // Properties to be built, in order.
    $order = [
        "first_name",
        "last_name",
        "email"
    ];
    $memberFactory = new MemberFactory($DBManager, $order);
    $member = $memberFactory->build($memberId);
    return $member;
}
```
In this example, MemberFactory extends the core Factory, which handles the whole process that connects to the database and assembles objects. An array listing, in order, the properties to be built is passed as an argument to the instantiated factory. In the example of Listing 3, the member factory provides the first and last name as well as the email address of a member.

When a specific new object needs to be created, a group of three files is needed: the Factory, the Builder, and the Model files. The first one stores the inventory of factories a specific Factory connects to and sets which Builder it uses. The next stores the relation between the database structure and the Model, assigning table columns to its setters. Finally, the Model is the class that is populated by the Builder and stores the information in structured objects that can be accessed through getters.

Seen from the opposite direction, Models are classes that serve as object-oriented representations of the information. They define several set and get methods that handle specific properties of the object. These models are used in Builders, where the actual query is set and database columns are associated with a model set method. Finally, a Factory calls its corresponding Builder and contains an inventory, which may be empty, of other Factories that are related to this object.
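The division of labour between the three roles can be made concrete with a small, self-contained sketch. The Python below uses an in-memory SQLite database in place of ORACLE and is not the FENCE implementation: the table and column names, the `demo_connection` helper, and the method names on the classes are assumptions for this example.

```python
import sqlite3


class Member:
    """Model: structured storage with set/get access to properties."""

    def __init__(self):
        self._data = {}

    def set(self, prop, value):
        self._data[prop] = value

    def get(self, prop):
        return self._data[prop]


class MemberBuilder:
    """Builder: relates database columns to Model properties and sets the query."""

    TABLE = "member"
    COLUMNS = {"first_name": "FIRST_NAME", "last_name": "LAST_NAME", "email": "EMAIL"}

    def query(self, order):
        # Only whitelisted properties can reach the SQL text (KeyError otherwise).
        columns = ", ".join(self.COLUMNS[prop] for prop in order)
        return f"SELECT {columns} FROM {self.TABLE} WHERE ID = ?"

    def populate(self, order, row):
        member = Member()
        for prop, value in zip(order, row):
            member.set(prop, value)
        return member


class MemberFactory:
    """Factory: runs the Builder's query and returns a populated Model."""

    def __init__(self, connection, order):
        self.connection = connection
        self.order = list(order)
        self.builder = MemberBuilder()
        self.related_factories = {}  # inventory of related factories, empty here

    def build(self, member_id):
        sql = self.builder.query(self.order)
        row = self.connection.execute(sql, (member_id,)).fetchone()
        return None if row is None else self.builder.populate(self.order, row)


def demo_connection():
    """Create an in-memory table with one invented member for demonstration."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE member (ID INTEGER PRIMARY KEY, FIRST_NAME TEXT, "
        "LAST_NAME TEXT, EMAIL TEXT)"
    )
    connection.execute(
        "INSERT INTO member VALUES (1, 'Ada', 'Example', 'ada@example.org')"
    )
    return connection
```

Calling `MemberFactory(demo_connection(), ["first_name", "email"]).build(1)` returns a `Member` holding only the two requested properties, which mirrors the role of the `$order` array in Listing 3.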
3.3 Configuration files in FENCE

The FENCE framework is based on configuration files that provide the necessary parameters and properties to build interfaces. The main goal of this infrastructure is to simplify many aspects of web system requirements. The configuration files are in JSON, a lightweight format for storing and transporting data, and since those can be transformed into structured objects, developers can easily define a group of properties within specific contexts. For instance, it is possible to set up which groups of users can have access to a certain interface. Another benefit of using configuration files is that major classes that have several arguments and environment parameters can be instantiated in a cleaner way, with just a configuration file path as argument. This encourages developers to write more generic and robust features, since they can easily be reused in the future.

Along with the configuration file concept, additional utilities were developed to make this idea feasible. One of these tools is the class JReader, which provides functionality for template variable substitution and JSON schema validation. Another one is the FENCE Content class, which gets some default information from configuration files to handle common interface needs, such as access control, constants, and rendering outline formats.

Most of the time, when a new interface is created using FENCE, the class that generates the particular content of this page inherits from Content. At the same time, as is described in the Unified Modeling Language (UML) diagram in Figure 3, Content takes a configuration file path as argument. This configuration file path is passed to an instance of JReader constructed within Content. The JReader method parse_contents makes the corresponding configuration file content available to Content.
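The interaction just described, in which a content class receives only a configuration path and delegates reading to a JSON reader, can be sketched as follows. This is an illustrative Python analogue rather than the PHP classes: only the names JReader, Content, and parse_contents come from the text, while the `${...}` placeholder syntax, the `access_groups` and `title` keys, and the required-key check (a very reduced stand-in for schema validation) are assumptions.

```python
import json
from string import Template


def _substitute(node, variables):
    """Recursively replace ${name} placeholders inside string values."""
    if isinstance(node, str):
        return Template(node).safe_substitute(variables)
    if isinstance(node, dict):
        return {key: _substitute(value, variables) for key, value in node.items()}
    if isinstance(node, list):
        return [_substitute(value, variables) for value in node]
    return node


class JReader:
    """Reads a JSON configuration file, checks required keys, fills variables."""

    def __init__(self, path, variables=None, required=()):
        self.path = path
        self.variables = variables or {}
        self.required = tuple(required)

    def parse_contents(self):
        with open(self.path, encoding="utf-8") as handle:
            config = json.load(handle)
        missing = [key for key in self.required if key not in config]
        if missing:
            raise ValueError(f"missing configuration keys: {missing}")
        return _substitute(config, self.variables)


class Content:
    """Page content configured solely through a configuration file path."""

    def __init__(self, config_path, variables=None):
        reader = JReader(config_path, variables, required=("title", "access_groups"))
        self.config = reader.parse_contents()

    def can_access(self, user_groups):
        """Grant access if the user belongs to at least one allowed group."""
        return bool(set(self.config["access_groups"]) & set(user_groups))
```

A page class would then inherit from `Content` and pass its own configuration path, so that access rules and labels can change without touching the class code.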
Figure 3: Content UML diagram describing its interaction with the JReader class.
To automate the process of conception, evolution, review, and approval of publications described in Section 2, five web-based systems were developed using the FENCE framework: PAPERs, CONF notes, PUB notes, PLOTs, and Phase 0, see Figure 1. Together they are called the Analysis Web systems. The first four are described here briefly, while the last one is presented in detail.

The relationships among the five Analysis Web systems are represented in Figure 4. The Phase 0 system has been implemented to support the publication process. The evolution of the process from the creation of a Phase 0 to the other publication systems (PAPER, CONF, and PUB) is described in Section 2. For the review and approval process of a publication set of plots, there is also the PLOTs system, which can be used during all phases whenever a new plot is sent to the collaboration for circulation and review.

Figure 4: The relationship between the Analysis Web systems and their phases.
The
PAPER features functionalities for inserting, retrieving, editing, and deleting the properties of apaper in a database, through managing the activity flow of its three phases:
Phase 1 , Phase 2 , and
Submission .The
CONF notes system incorporates notes that should be presented at a conference. The
PUB notes system incorporates public notes that should be presented to the scientific community without beingsubmitted to a journal or presented at a conference. The
PLOTs system handles the plots that are usedto present results in all other types of publications mentioned so far. Those three systems presentfunctionalities for inserting, retrieving, editing, and deleting the properties of their entities in a databaseand also manage the workflow of each system’s
Phase 1 .The
PAPER, CONF notes, PUB notes, and PLOTs systems are quite similar, differing only in the number of phases and in the workflow steps involved in each one. They also share the actions attached to each phase's steps, which can be: saving data in the database, sending automatic emails, or creating or updating E-groups.

The need for the Phase 0 system arose in 2017, when the ATLAS IT department downgraded the Apache Subversion (SVN) [2] version control system and encouraged its members and authors to use Git [8], whose decentralised design is better suited to the way the collaborators work. The experiment started to use the repository platform GitLab [7] because of its continuous integration functionality, the possibility of storing repositories on private servers, and the provision of an API with many services.

The transition triggered the need for a tool that can communicate with the GitLab API and create automatically configured Git repositories carrying each publication's unique metadata. To formalise the creation of repositories at the beginning of the publication writing process, the concept of Phase 0 emerged. It was recognized that this could also include the flow of tasks during the preliminary stage of the editorial process, when it is not yet known whether the scientific content will materialise into a paper, conference note, or public note. So, in March 2017, development of the web-based Phase 0 system was launched.

The system provides functionalities to support and formalise the initial stages that may lead to a publication, before entering the PAPER, CONF, and PUB notes production process. Phase 0 can trigger different types of processes, including an Analysis workflow towards a PAPER or a CONF note, which gathers all the physics and combined performance analysis activities (PHY, CP). One can also skip the Analysis workflow and go directly towards a CONF/PAPER or a PUB note. This is allowed for a PUB note, which is usually a simulation work or an instrumental description. It is also allowed for a PAPER/CONF intended for an instrumental description, or for a physics CONF note that should proceed as quickly as possible through internal review so that it may be used at a conference.
Phase 0 is the common stage for PAPER, CONF, and PUB note workflows, before Phase 1. It stores metadata divided into steps, e.g. meeting dates, comments, links, groups of people such as Analysis Contacts, target dates for analysis finalisation, editorial board members and meetings, and approval sign-off dates. As described in Section 2, each of these metadata should be filled in a specific order by users with the appropriate permissions, and should trigger automatic emails or E-group updates throughout the process.

Phase 0 repository
The first step of the Phase 0 system implementation was the data modelling, to identify the system's entities with their attributes and relationships. A simplified version of this study is presented here.

The main entity of the system is a Publication, which has attributes such as title, reference code, and creation date. A publication is always related to a Group and, most of the time, a Subgroup, whose attributes are name and description.

Members of the ATLAS experiment are related to a publication by one or more Roles, such as Analysis Team or Editorial Board member. A Member has attributes such as his / her first name, last name, and primary email address. The attributes of a Role are its name, type, start date, and end date.

A publication contains Phases (in this system, only Phase 0), whose attributes are the start date and its status. During Phase 0 steps, many Meetings take place, and their attributes are title, date, and comments.

Some external Contents are associated with Phase 0, such as notes containing supporting documentation for the publication and meeting minutes that are stored on the CERN document server. This entity has as its attributes the name of the content, its type, and its web address.

Phase 0 is also related to Deadlines by which people finish their activities. A Deadline has as attributes its type and its date.
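The entities and attributes listed above can be summarised in a small data-model sketch. This is only an illustration in Python: the class layout, the field names, and the choice to attach meetings, contents, and deadlines to a phase are assumptions made for the example, not the schema actually used by FENCE.

```python
import datetime as dt
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Group:
    """A leading group or subgroup (attributes: name and description)."""
    name: str
    description: str = ""


@dataclass
class Member:
    first_name: str
    last_name: str
    primary_email: str


@dataclass
class Role:
    """Links a member to a publication, e.g. as Analysis Team member."""
    name: str
    type: str
    member: Member
    start_date: dt.date
    end_date: Optional[dt.date] = None


@dataclass
class Meeting:
    title: str
    date: dt.date
    comments: str = ""


@dataclass
class Content:
    """External material such as supporting notes or meeting minutes."""
    name: str
    type: str
    web_address: str


@dataclass
class Deadline:
    type: str
    date: dt.date


@dataclass
class Phase:
    start_date: dt.date
    status: str
    meetings: List[Meeting] = field(default_factory=list)
    contents: List[Content] = field(default_factory=list)
    deadlines: List[Deadline] = field(default_factory=list)


@dataclass
class Publication:
    title: str
    reference_code: str
    creation_date: dt.date
    group: Group
    subgroup: Optional[Group] = None  # present "most of the time"
    roles: List[Role] = field(default_factory=list)
    phases: List[Phase] = field(default_factory=list)
```

The optional `subgroup` and `end_date` fields reflect the text: a publication does not always have a subgroup, and a role may still be ongoing.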
Phase 0 main functionalities
The Phase 0 system has three main functions. The first is the insertion of a new publication, used when members of the ATLAS experiment decide to publish the results of their work and need to define the principal data of the article or public note in order to start writing. The interface presents a web form with several fields that define the main information of the new publication, including its title, reference code, groups, subgroups, and keywords. The second interface presents the search functionality. With this a user can search for publications by setting filters, and can produce reports from the results table. The third interface allows editing of the information about a publication, facilitates monitoring of the evolution of Phase 0, and enables the automatic creation of Git repositories.

Figure 5: The analysis submission functionality in the Phase 0 system. On the left is a summary of all steps needed to complete the data submission. On the right are the fields that belong to the first step.
The functionality to submit a new analysis can be seen in Figure 5. Through this, a member fills out a form in steps. The mandatory fields in each step are indicated by asterisks (*). Information on how to fill in each field is provided by the ‘i’ icon next to the field name. After the last step there is a confirmation step where the user can verify whether all fields have been filled in correctly. If so, the form information can be stored in the database, which then holds the information that defines an analysis, such as its title and reference code.

The advanced search functionality of the Phase 0 system, shown in Figure 6, allows a user to define criteria through three fields. The first defines a publication attribute, the second selects an operator, and the third allows a value to be entered. One or more search criteria can be selected and combined into logical expressions using the AND and OR operators. Users can also configure the search results by sorting the records in ascending or descending order, grouping them by attributes, selecting the visible attributes, and saving those configurations for use in a future search. Search result reports can also be exported in CSV file format.

Figure 6: Advanced search functionality of the Phase 0 system. It presents fields to define search criteria to be added in the Logic Workspace area, forming logical expressions.

Finally, the publication details interface, the main interface of the system, shown in Figure 7, presents metadata and allows it to be edited. The interface also controls the workflow of Phase 0 activities, providing an overview of all its stages and highlighting the previous, current, and upcoming ones. A transition between Phase 0 steps triggers actions. The most common is storing data in the database. If allowed, a user has the option of saving the data to the repository and staying at the same step by pressing the ‘Save’ button, or saving the data and going to the next step by pressing the ‘Proceed’ button. When one moves forward in the workflow, the system triggers automatic messages that alert and provide instructions to the person responsible for the next step.

An example of a Phase 0 step that is part of a workflow is the Editorial Board “request meeting and formation data” step, which is illustrated in Figure 8. The group convener is responsible for adding the Editorial Board “request meeting” title, date, comments, and links. The Publication Committee Chair is responsible for appointing the Editorial Board members and filling in the date on which they are appointed. Once all this information is in the system, the Publication Committee Chair can proceed to the next Analysis workflow step. Subsequently the Editorial Board E-group is automatically created, including information for all its members, and an email is sent, informing them that they have been appointed and should proceed to the next step of the Analysis workflow.

The Workflow, Messenger, EgroupManager, and User FENCE classes (mentioned in Section 3) and the MBF infrastructure made the development of the Phase 0 system workflow possible. They do not, however, include the GitLab integration, a key feature of the system, which is explained in detail in the next sections.
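The advanced search described earlier in this section combines (attribute, operator, value) criteria with AND and OR. A minimal sketch of how such an expression could be evaluated against publication records is given below. The operator names, the nested-dictionary representation of the logical expression, and the record fields are all assumptions made for this illustration; the real system is a web interface backed by a database.

```python
import operator
from typing import Any, Dict, List, Tuple, Union

# Hypothetical operator set; the operators offered by the real interface
# are not listed in the text.
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    "contains": lambda field_value, wanted: wanted in field_value,
}

Criterion = Tuple[str, str, Any]             # (attribute, operator, value)
Expression = Union[Criterion, Dict[str, list]]  # {"AND": [...]} or {"OR": [...]}


def matches(record: Dict[str, Any], expression: Expression) -> bool:
    """Evaluate a nested AND/OR expression against one record."""
    if isinstance(expression, dict):
        (connective, operands), = expression.items()
        results = [matches(record, operand) for operand in operands]
        if connective == "AND":
            return all(results)
        if connective == "OR":
            return any(results)
        raise ValueError(f"unknown connective: {connective}")
    attribute, op, value = expression
    return OPERATORS[op](record[attribute], value)


def search(records: List[Dict[str, Any]], expression: Expression,
           order_by: str, descending: bool = False) -> List[Dict[str, Any]]:
    """Filter the records and sort the result by one attribute."""
    selected = [r for r in records if matches(r, expression)]
    return sorted(selected, key=lambda r: r[order_by], reverse=descending)
```

For example, the expression `{"AND": [("group", "=", "HIGG"), {"OR": [("year", ">=", 2018), ("title", "contains", "boson")]}]}` selects Higgs-group entries that are either recent or mention "boson" in the title.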
As mentioned in Section 4, the FENCE Phase 0 system was designed and implemented to create Git repositories automatically, in order to simplify the analysis and the editing of any type of draft that supports the analysis. Some of the Phase 0 functionalities trigger GitLab commands. The integration of the software framework with the collaborative repository platform is described below.
At any Phase 0 creation, Git repositories are created in GitLab under the atlas-physics-office group. Each leading Physics or Combined Performance group or System Detector / Activity effort is labelled as a category with four letters in the FENCE systems related to the analysis and the documentation creation. The full list of Physics and Combined Performance groups is shown in Table 1.

Figure 7: Phase 0 system main interface. On the right is a summary with the most important information about an activity. On the left are the steps corresponding to the Phase 0 activity flow.

For example, the leading Top Quark physics group is TOPQ while the Electron / Gamma Combined Performance group is EGAM. The identifier (ID) of a Phase 0 FENCE entry is therefore labelled ANA-GROUP-YEAR-NN, where GROUP can be, for example, TOPQ, HIGG, or EGAM, YEAR is the year the document was created, and NN is a two-digit counter. For instance, ANA-SUSY-2019-04 represents the fourth analysis FENCE entry created in the SUSY group in 2019.

An analysis may evolve into a PAPER, a CONF note, or a PUB note. The identifiers (IDs) of those documents are GROUP-YEAR-NN, CONF-GROUP-YEAR-NN, or PUB-GROUP-YEAR-NN, respectively. This naming convention preserves backward compatibility with the different entries used for each type of document before Phase 0 was created.

In PO-GitLab, an effort has been made to make the document IDs more logical. They are labelled:
• ANA-GROUP-YEAR-NN-INTn for internal notes,
• ANA-GROUP-YEAR-NN-PAPER for a paper,
• ANA-GROUP-YEAR-NN-CONF for a CONF note, and
• ANA-GROUP-YEAR-NN-PUB for a PUB note.

Figure 8: Screenshot of the “Editorial Board request meeting and formation data” step in the FENCE Phase 0 system.
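The ANA-GROUP-YEAR-NN convention and the PO-GitLab repository suffixes listed above lend themselves to a simple pattern check. The sketch below is an illustration only: the function names are hypothetical, and the assumption that GROUP is always exactly four capital letters and YEAR four digits is taken from the examples in the text.

```python
import re
from typing import Dict, List, Union

# ANA-GROUP-YEAR-NN: four-letter group, four-digit year, two-digit counter.
ANALYSIS_ID = re.compile(
    r"ANA-(?P<group>[A-Z]{4})-(?P<year>\d{4})-(?P<counter>\d{2})"
)


def parse_analysis_id(identifier: str) -> Dict[str, Union[str, int]]:
    """Split a Phase 0 identifier into group, year, and counter."""
    match = ANALYSIS_ID.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a valid analysis ID: {identifier!r}")
    return {
        "group": match["group"],
        "year": int(match["year"]),
        "counter": int(match["counter"]),
    }


def repository_names(analysis_id: str, internal_notes: int = 1) -> List[str]:
    """List the PO-GitLab repository names that may exist for an analysis."""
    parse_analysis_id(analysis_id)  # validate before building names
    names = [f"{analysis_id}-INT{n}" for n in range(1, internal_notes + 1)]
    names += [f"{analysis_id}-{kind}" for kind in ("PAPER", "CONF", "PUB")]
    return names
```

Calling `parse_analysis_id("ANA-SUSY-2019-04")` yields the group `SUSY`, the year 2019, and the counter 4, matching the reading of that identifier given in the text.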
For example, in the Higgs category, for a given Phase 0 analysis entry ANA-HIGG-2017-08, PO-GitLab will host ANA-HIGG-2017-08-INT1,2..n, ANA-HIGG-2017-08-PAPER, ANA-HIGG-2017-08-CONF, and ANA-HIGG-2017-08-PUB. Each repository is connected to the appropriate FENCE interface. This is illustrated in Figure 9, where the GitLab interface for the atlas-physics-office subgroups and repositories is shown. ANA-HIGG-2017-08, a subgroup of HIGG, contains for example one paper repository and one internal note repository, ANA-HIGG-2017-08-PAPER and ANA-HIGG-2017-08-INT1, respectively.

API
A set of classes was created with the original aim of making the GitLab API easier to use across the FENCE systems. In practice, it is mostly used by the Analysis systems within the Analysis GitLab integration. Through the main class, called Gitlab, it is possible to handle all the basic operations offered by the API: create, get, and customise settings for projects, groups, and branches, handle commits, and carry out many other actions defined and explained in the GitLab REST API documentation [9].

Each API endpoint can be accessed by one of the following HTTP methods: GET, POST, DELETE, and PUT. The FENCE–GitLab class uses them through methods detailed in Appendix B.1. Each of those methods makes a call to execMethod (see Appendix B.2), which configures the endpoint using the PHP CURL methods [10] and executes one of the HTTP methods, returning the REST API answer. This can be a JSON file with metadata, or just a success or an error message.

Table 1: List of the Physics Activity leading groups and their acronyms. The WG and CP abbreviations indicate Working Group and Combined Performance, respectively.
Acronym   Group
BPHY      B-physics WG
EGAM      e / gamma CP
EXOT      Exotics WG
FTAG      Flavour tag CP
HDBS      Higgs & Diboson Searches WG
HIGG      Higgs WG
HION      Heavy Ions WG
IDTR      Inner Detector Tracking CP
JETM      Jet / Etmiss CP
MUON      Muon CP
PMGR      Physics Modelling Group
SIMU      Simulation
STAT      Statistics Committee
STDM      Standard Model WG
SUSY      SUSY WG
TAUP      Tau CP
TOPQ      Top WG
UPPH      Upgrade Physics

The metadata returned by execMethod are then used to populate the attributes of many classes representing GitLab elements, including Branch, File, Commit, Project, Group, Label, and Member. These can then be manipulated by any FENCE system.

An example is the creation of a paper repository. The createProject method (see Appendix B.3) is called with the project name (or an instance of the Project class) as the first argument and the project parameters (such as path, namespace, default branch, and description) as the second argument. The method calls the POST method mentioned above and stores the new repository metadata in a FENCE Project object, which can be used for further manipulations.
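The pattern described above (one generic routine that configures the endpoint and executes the HTTP method, plus thin wrappers such as createProject) can be sketched as follows. This is not the FENCE code, which is written in PHP and listed in Appendix B; it is a Python illustration in which the host name, the function names, and the `Project` fields kept are assumptions. The `PRIVATE-TOKEN` header and the `POST /projects` endpoint are part of the public GitLab REST API.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass
from typing import Any, Dict, Optional

API_ROOT = "https://gitlab.example.org/api/v4"  # placeholder host (assumption)


def build_request(method: str, endpoint: str, token: str,
                  params: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Configure an endpoint call without sending it."""
    url = f"{API_ROOT}/{endpoint.lstrip('/')}"
    data = None
    if params:
        if method == "GET":
            url += "?" + urllib.parse.urlencode(params)
        else:
            data = json.dumps(params).encode("utf-8")
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("PRIVATE-TOKEN", token)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


def exec_method(request: urllib.request.Request) -> Dict[str, Any]:
    """Send the request and decode the JSON answer (empty dict if no body)."""
    with urllib.request.urlopen(request) as response:
        body = response.read().decode("utf-8")
    return json.loads(body) if body else {}


@dataclass
class Project:
    """Small subset of the metadata GitLab returns for a project."""
    id: int
    name: str
    path_with_namespace: str
    default_branch: Optional[str] = None

    @classmethod
    def from_metadata(cls, metadata: Dict[str, Any]) -> "Project":
        return cls(
            id=metadata["id"],
            name=metadata["name"],
            path_with_namespace=metadata["path_with_namespace"],
            default_branch=metadata.get("default_branch"),
        )


def create_project_request(name: str, token: str,
                           **parameters: Any) -> urllib.request.Request:
    """Build the POST /projects call that would create a repository."""
    return build_request("POST", "projects", token, {"name": name, **parameters})
```

Separating `build_request` from `exec_method` keeps the network call in one place, so the request construction can be inspected or tested without contacting a server. A caller would pass the result of `exec_method` to `Project.from_metadata` to obtain the project object.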
The first interaction between FENCE and GitLab happens when a Phase 0 entry is created. A group named after its reference code is automatically formed, containing the first internal note repository. The content of this repository's first commit is obtained from a source repository, the atlaslatex package, which contains file templates. FENCE is responsible for substituting all the necessary variables into the file templates according to the metadata inserted when creating the entry in the system. After the commit, FENCE automatically de-protects the master branch, creates the protected PO-ready branch, and creates the PO-Publication label. The last step is to give developer permission to the Analysis Team E-group using LDAP synchronisation.

Another FENCE and GitLab integration process is executed when Phase 0 is finished or is skipped, thus proceeding to PAPER, CONF note, or PUB note Phase 1. FENCE automatically creates an internal note repository, setting all the configuration elements that are needed. It is possible to append additional internal note repositories at any time. The repositories holding the document are created and configured without any input from the editor's side, allowing for a streamlined process.

Figure 9: Screenshot of the substructure of a HIGG GitLab repository subgroup. The main group, atlas-physics-office, is shown at the top. The HIGG subgroup is selected, and its ANA-HIGG-2017-08 subgroup is expanded. A repository for the PAPER, and one for the Internal Note (INT1), are created under ANA-HIGG-2017-08.

Figure 10: A view of a paper's author list section. At first circulation in Phase 1, the “Create and push to GitLab” button generates the author list and triggers the push action on the GitLab repository. The button will change its label at Phase 2 and SUBMISSION to “Generate and push to GitLab”.

FENCE and GitLab also interact while handling the author list of a publication. Creating the author list at first circulation triggers a request, through the GitLab API, to check that the GitLab repository associated with the publication exists. Clicking on the button labelled “Create and push to GitLab” (see Figure 10) creates the author list according to its reference date in all the formats (including xml and tex). It then starts a dialogue between the two platforms, FENCE and GitLab, to push the files through the GitLab API. On first circulation, the files are added to GitLab, while on subsequent circulations, as they already exist, they are simply updated.
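The add-on-first-circulation, update-afterwards behaviour described above maps naturally onto the `create` and `update` actions of the GitLab commits endpoint (`POST /projects/:id/repository/commits`). The sketch below only builds the request payload; the function name, the file names, and the way the set of existing files is obtained are assumptions for the illustration, not the FENCE implementation.

```python
from typing import Any, Dict, Iterable


def build_commit_payload(branch: str, message: str,
                         files: Dict[str, str],
                         existing_paths: Iterable[str]) -> Dict[str, Any]:
    """Build one commit that creates new files and updates existing ones.

    `files` maps repository paths to their new content; `existing_paths`
    lists the paths already present on the branch.
    """
    existing = set(existing_paths)
    actions = []
    for path, content in sorted(files.items()):
        actions.append({
            "action": "update" if path in existing else "create",
            "file_path": path,
            "content": content,
        })
    return {"branch": branch, "commit_message": message, "actions": actions}
```

Grouping all author list files into a single commit keeps the repository history compact: each circulation then corresponds to one commit rather than one per file format.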
PO-GitLab and CI tools
The ATLAS Physics Office GitLab tools (PO-GitLab) simplify the publication process of ATLAS documents by using the features provided by the CERN GitLab platform.

The previous publication workflow involved a heavy email exchange between ATLAS editors and the Physics Office in order to ensure that ATLAS rules were being followed up to submission of the paper to the arXiv or the journal. This approach usually led to modifications being made by different parties (officers and editors), which were sometimes not properly implemented and which slowed the publication process down. Because the tasks required to submit a publication are uniform and repetitive, an automatic tool was favoured.

Three main tasks are handled by PO-GitLab up to the final submission: the automatic creation of GitLab repositories (Git repositories centralised in the remote platform), the real-time verification of technical rules by the GitLab Continuous Integration (CI) tools, and the automatic processing of the document itself. These tasks are described in this section.

A centralised area controlled by the ATLAS Physics Office needed to be designed first. Control is key, as it allows the Physics Office to maintain the quality of the document being accepted for publication.

A basic structure is set up in GitLab to store the groups related to an analysis. The main GitLab group is called atlas-physics-office, and this represents the root of the group hierarchy tree. Each of its subgroups belongs to a leading group, for example HIGG, EXOT, SUSY, etc., as mentioned in Section 5. In the case of the publication shown in Figure 7, a subgroup of GENR called ANA-GENR-2018-01 would be created. Inside ANA-GENR-2018-01 there would exist specific repositories for each type of analysis, designated ANA-GENR-2018-01-INT1, ANA-GENR-2018-01-PAPER, ANA-GENR-2018-01-PUB, and / or ANA-GENR-2018-01-CONF.

With this structure defined, it is possible to create documents automatically through FENCE, via the communication link between the framework and the GitLab API. This is explained in more detail in Section 5. This automation relies on file templates whose variables are substituted according to the requirements of the related publication. This way, all created repositories contain the default documents, correctly formatted, to start writing a PAPER, CONF, PUB, or Internal Note. The repository is also configured with a new protected branch named PO-ready, which means that only members with the Maintainers role are allowed to push to it and merge into it. This special branch is used to run the final submission pipeline when the document is ready and has been reviewed by the relevant parties. The master branch is used as the main work branch; it is unprotected from the time of the repository creation, allowing all editors to push new commits and interact with the repository.
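The template substitution mentioned above can be illustrated with Python's standard `string.Template`. The placeholder names and the LaTeX macro names in this sketch are hypothetical; the actual variables used in the atlaslatex templates are not given in the text.

```python
from string import Template
from typing import Dict

# Hypothetical metadata template; `$title` and `$ref_code` are placeholders.
METADATA_TEMPLATE = r"""\newcommand{\AtlasTitle}{$title}
\newcommand{\AtlasRefCode}{$ref_code}
"""


def render_template(template_text: str, metadata: Dict[str, str]) -> str:
    """Fill every placeholder; a missing variable raises KeyError."""
    return Template(template_text).substitute(metadata)
```

Using `substitute` rather than `safe_substitute` is a deliberate choice in this sketch: a template that still contains an unfilled placeholder would produce a broken document, so failing early is preferable. Note that a literal dollar sign in such a template has to be written as `$$`, which matters for LaTeX files that contain inline mathematics.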
GitLab CI tools are designed to automatically execute a set of tasks every time a new modification is introduced into the document (i.e. a new commit is pushed to the document repository). The approach taken by the Physics Office was to develop a Python package, PO-GitLab, that runs different jobs on a given document, each verifying a distinct aspect. Given the modularity of the system, new and more complex tasks can be added, ensuring scalability.

GitLab's CI is organised using pipelines. A pipeline is a set of jobs grouped in stages. All the jobs in the same stage are executed in parallel, while each stage is only executed after the previous one has completed. The dependencies among the jobs' executions can be configured in different ways according to their status. For example, it is possible in some cases to start the jobs of the next stage only if the previous ones have finished successfully, and in other cases to execute them only if the previous stage failed. Each time a new commit is pushed to the repository, a pipeline is triggered.

Different sets of checks are performed in each step of the publication process. For editors, all work done before the paper submission (detailed in Section 6.3) is monitored by the edit-pipelines, as shown in Figure 11. These pipelines are triggered by any push made on branches whose name does not start with PO-. The special branches using the PO- prefix are tracked by the submit-pipelines when a paper is considered ready for submission to the arXiv and the peer-reviewed journal.
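The two rules described above, selecting the pipeline from the branch name and running stages strictly in order, can be modelled in a few lines. This is a toy model with hypothetical function names: it is not how GitLab CI is configured (that is done in a `.gitlab-ci.yml` file), it runs the jobs of a stage one after another rather than in parallel, and it implements only the default behaviour in which later stages are skipped after a failure.

```python
from typing import Callable, Dict, List, Tuple

Job = Callable[[], bool]  # a job returns True on success


def pipeline_for(ref_name: str) -> str:
    """Choose the pipeline family from the branch or tag name."""
    return "submit" if ref_name.startswith("PO-") else "edit"


def run_stages(stages: List[Tuple[str, List[Job]]]) -> Dict[str, str]:
    """Run stages in order; skip everything after the first failed stage."""
    status: Dict[str, str] = {}
    failed = False
    for name, jobs in stages:
        if failed:
            status[name] = "skipped"
            continue
        results = [job() for job in jobs]  # every job in the stage is run
        status[name] = "passed" if all(results) else "failed"
        failed = status[name] == "failed"
    return status
```

With this model, a push to `master` selects the edit pipeline, while a push to `PO-ready` or to any other branch whose name starts with `PO-` selects the submit pipeline.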
Figure 11: Screenshot of the edit-pipelines. These four stages run checks before the publication is ready for submission: the first stage checks the version of the PO-GitLab package, the second runs checks related to LaTeX formatting, the third ensures that the ATLAS rules are being followed, and the last tests whether the document builds correctly.
Figure 11 presents an example of an edit-pipeline that consists of the following set of stages:
• Preparation: this consists of only one job, which checks the current version of the PO-GitLab package.
• Technical checks: this stage includes checks related to LaTeX:
  – Figures exist: checks that all figures used in the document are present in the repository.
  – Files exist: checks that all the tex files included in the document are present.
  – Repeated commands: checks for repeated user-defined commands. It is not wise to use the same command for different purposes. This can present a problem when captions for figures and tables are being generated for the ATLAS public pages.
  – Repeated labels: checks for duplicate labels in all tex files.
  – Undefined references: checks for undefined references.
  – Unused labels: warns if a LaTeX label has been defined but not used. Although this is not a problem in itself, it might point to an improper reference.
• ATLAS checks: these are checks related to ATLAS rules and style:
  – Bibliography: checks that the bibliography files are included.
  – Cover logo: checks that the proper logo is being used in the ATLAS template.
  – Figures labels: checks the ATLAS labels (e.g. ‘ATLAS Internal’) in the legends of figures depending on the type of document. Table 2 shows the labels that are allowed or not allowed in different document types.
  – Oversized figures: checks for figures larger than 2 MB.
  – Preprint ID: checks that the preprint ID is included in the document.
  – Template version: checks that the version of the ATLAS LaTeX template is the latest one available.
  – Title and Abstract: checks that no user-defined commands (i.e. non-LaTeX commands) are being used in the title and in the abstract.
• Build: this stage builds the document itself. As these pipelines are active on each commit, the pdf file of the document is not stored as an artifact. Instead, the pdf file can be generated by a manual job, indicated by a gear icon, which editors trigger by clicking on the play button in the interface; this job produces and saves the document as an artifact for a user to download.
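Three of the technical checks listed above (repeated labels, undefined references, and unused labels) can be approximated with regular expressions. The sketch below is an illustration and not the PO-GitLab implementation: it works on a single string, recognises only a fixed set of referencing commands chosen for the example, and does not exclude commented-out code.

```python
import re
from collections import Counter
from typing import Dict, List

LABEL = re.compile(r"\\label\{([^}]*)\}")
# Referencing commands recognised by this sketch (an assumption).
REFERENCE = re.compile(r"\\(?:ref|eqref|pageref|autoref|cref|Cref)\{([^}]*)\}")


def check_labels(tex: str) -> Dict[str, List[str]]:
    """Report repeated labels, undefined references, and unused labels."""
    labels = LABEL.findall(tex)
    referenced = set()
    for keys in REFERENCE.findall(tex):
        # \cref{a,b} may reference several labels at once
        referenced.update(key.strip() for key in keys.split(","))
    counts = Counter(labels)
    defined = set(labels)
    return {
        "repeated_labels": sorted(k for k, n in counts.items() if n > 1),
        "undefined_references": sorted(referenced - defined),
        "unused_labels": sorted(defined - referenced),
    }
```

In a CI job, the first two categories would typically make the job fail, while unused labels would only produce a warning, as described in the list above.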
Table 2: Types of labels that are allowed and not allowed to be used in figure captions, depending on the document type.

Document type   Preliminary label   Internal label
PAPER           Not allowed         Not allowed
BOOK            Not allowed         Not allowed
CONF            Allowed             Not allowed
PUB             Allowed             Not allowed
NOTE            Allowed             Allowed
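The rules of Table 2 amount to a lookup from document type to the set of permitted labels. A minimal sketch of such a check is shown below; the function name and the short label keys `Preliminary` and `Internal` are assumptions, and the step that finds the labels in the figure legends is left out.

```python
from typing import Dict, Iterable, List, Set

# Allowed figure labels per document type, transcribed from Table 2.
ALLOWED_LABELS: Dict[str, Set[str]] = {
    "PAPER": set(),
    "BOOK": set(),
    "CONF": {"Preliminary"},
    "PUB": {"Preliminary"},
    "NOTE": {"Preliminary", "Internal"},
}


def forbidden_labels(document_type: str, labels_found: Iterable[str]) -> List[str]:
    """Return the labels that may not appear in this type of document."""
    allowed = ALLOWED_LABELS[document_type]
    return sorted(set(labels_found) - allowed)
```

An empty result means the figures comply with the table; any returned label would make the corresponding CI check fail.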
The CI also produces the required files for paper submission, using dedicated pipelines similar to the editing ones. These are called submit-pipelines. A protected Git branch, named PO-ready, is created by default when the paper repository is set up. When a paper is ready for submission, an editor creates a Merge Request from the master branch to the PO-ready branch. When this request is accepted by a Physics Office officer, the paper submission pipelines are triggered. In addition, any branch or tag created following the pattern PO-* triggers the paper submission pipelines. These pipelines run the previously described tests and then, at the build stage, flatten the LaTeX document through the following actions:
1. all the source files are merged into a single LaTeX source file;
2. all the comments in the LaTeX source file are removed;
3. all the figures are renamed following the convention required by the journals;
4. any directory structure is removed.
The various actions are shown in Figure 12.

Tarballs suitable for submission to the arXiv and journals are created using TeX Live 2016 and 2017, respectively. The two different versions are required by the journals because of differences in handling the bibliography and to avoid incompatibilities. The arXiv favours TeX Live 2016, while some APS journals, for example, require TeX Live 2017. The tarballs also contain files with plots and tables for the public web page. These tarballs are created as GitLab artifacts and can be downloaded by the corresponding editors and members of the Physics Office. The auxiliary material (figures and tables not for submission) is not included in the submission tarballs.

Figure 12: Screenshot of submit-pipelines. From left to right, the jobs check the version of the CI tools and copy bib and sty files along with the flattened LaTeX document to a special folder. The routine then handles all figures and tables, renaming and labelling them according to the journal's specifications. At the final stage, the flattened document is updated with the newly named figures and tables. In the last two steps, the document is built, producing the bbl file needed for the journal and the tarballs for the public web pages.
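The first two flattening actions listed above, merging the source files and removing comments, can be sketched as follows. This is a simplified illustration rather than the PO-GitLab code: the sources are passed as an in-memory dictionary, only `\input{...}` is followed, a `%` is treated as a comment unless directly preceded by a backslash, and verbatim environments are not handled. Figure renaming is omitted because the journal conventions are not spelled out in the text.

```python
import re
from typing import Dict

INPUT = re.compile(r"\\input\{([^}]+)\}")
COMMENT = re.compile(r"(?<!\\)%.*")  # unescaped % up to the end of the line


def strip_comments(tex: str) -> str:
    """Remove comments; drop comment-only lines so no blank line is left."""
    kept = []
    for line in tex.splitlines():
        stripped = COMMENT.sub("", line)
        if stripped.strip() == "" and line.strip() != "":
            continue  # the line contained only a comment
        kept.append(stripped.rstrip())
    return "\n".join(kept)


def flatten(name: str, sources: Dict[str, str]) -> str:
    r"""Return one source file with comments removed and \input files inlined."""
    def inline(match: "re.Match[str]") -> str:
        target = match.group(1)
        if not target.endswith(".tex"):
            target += ".tex"
        return flatten(target, sources)

    # Comments are stripped first, so a commented-out \input is never followed.
    return INPUT.sub(inline, strip_comments(sources[name]))
```

Dropping comment-only lines instead of emptying them matters in LaTeX, where a blank line starts a new paragraph and would change the typeset output.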
The author list, often written authorlist for convenience, is the inventory of qualified authors at a given date, which is called the reference date. Every paper has a related list of qualified authors with a reference date that corresponds to the creation date of that list at the PAPER Phase 1, just before the first circulation of the draft document to the collaboration. Qualified authors are active physicists contributing to the maintenance and operation of the experiment. Some of them are retired people applying their pre-data credits (obtained before the data-taking era); they are called signing-only authors. Between FENCE Phase 1 and Phase 2, some people may receive exceptional authorship because of their involvement in the analysis or the paper, even if they are not yet qualified as authors through the usual process. The author list is therefore updated to include these “exceptional” authors. The special cases are studied by the Authorship Committee and proposed for approval to the Spokesperson, who accepts or rejects each exception after reviewing the proposal from the Authorship Committee.

This information is stored in the ATLAS database and managed by FENCE. Figure 13 shows the full list of members (active and retired), their affiliations, and the related metadata that are needed to generate the full report of members and institutes.

Figure 13: The FENCE author list generation interface. On the left, a list of institutions used for the affiliations; in the centre of the screen, all the authors are listed (yellow ones are signing-only authors); at the top of the page, the interface allows users to view the author list on a selected date (top left) or for a given paper (top centre), by typing the ATLAS paper ID.

The acknowledgements are incorporated in a legal paragraph that the collaboration agrees to include in each paper to thank funding agencies for their financial support. They do not change very often, but a funding agency or a foundation may be added or removed at a given date. Therefore, similarly to the author list, the acknowledgement file is built for each paper at the reference date.

Both files, the author list and the acknowledgements, are built using the
FENCE framework (see Figure 14) and are automatically pushed to the appropriate GitLab repository, using the FENCE–GitLab integration (Section 5.3). Their integration into the paper is straightforward at the time of submission to a journal. FENCE provides an elegant way to retrieve the required information from the database (see Section 3.2) and build all the files.

The author list is built by the FENCE framework into an xml file. This is composed of three main blocks:
• Header: stores the paper's main information (Appendix C.1);
• Institutes: the list of institutes and their InSPIRE-HEP references (Appendix C.2);
• Authors: the list of authors and their information, including names, initials, affiliations, and ORCID (Appendix C.3).

The xml file serves as the reference, since it contains all the information needed to build the other files. It is the first one to be generated. A backup version of the first release of the author list is stored.

The acknowledgement tex file is built using a standard template, which is filled using the FENCE framework to retrieve the required information about the ATLAS funding agencies.

Figure 14: FENCE author list interface: this list contains all the author lists generated, with information about their paper's reference and updates. The first five entries in the list are GitLab projects; the others are stored in AFS.

The FENCE author list interface, Figure 14, shows the complete set of author lists created for every ATLAS paper that is being submitted or has been published since 2009. They are easily filtered using the SEARCH box. All the columns are self-explanatory; in the last column the drop-down menu gives access to the author list location, which can be distinguished by the icon. A download icon means the files are stored in AFS and can be downloaded. A GitLab icon means the paper and the files are located in a PO-GitLab repository. The author lists can be downloaded or displayed in GitLab in the following file formats:
• tex: used by the editors to include the author list in the draft publication;
• xml: a structured file containing all the author list information. It is used by both the arXiv and the journal as the main database of the paper;
• csv: a comma-separated values file used to export authorlist metadata;
• pdf: a view of the author list;
• cds: a simple text file with the author list information in the format author : institute.

Once the author list has been sent to the journal with the publication, a check is made to determine whether the publisher has correctly used the information provided at the paper production step. This check involves a comparison of the journal pdf file that was sent back to the ATLAS Collaboration for a proof review, to the original xml/tex file. This process used to be done by hand, requiring the officer to verify that each of the ( ∼ ∼ pdf file) of author lists and acknowledgements provided by the journal with the ATLAS data (xml) file. A report of this comparison, one for every version of the proof, is available to ATLAS PO-Pub officers who check the results.
The proofchecker follows this process:
• retrieve the information from the xml file, containing the authors and their affiliations;
• extract the text from the journal's pdf file;
• parse the text from the pdf file, creating the target reference;
• compare the official reference obtained from the xml file with the target reference;
• create a report with the differences found between the original and the target reference;
• link the report to the main report page, see Figure 15.

Figure 15: ATLAS collaboration proofs main page. This web page includes all the information about an ATLAS author list: its ID, the reference number, the xml file link, the proof sent to the journal, and the proof checker report.
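The retrieve, parse, compare, and report steps listed above can be sketched as follows. Everything specific in this example is an assumption: the xml tag names are hypothetical (the real schema is given in Appendix C), the text extracted from the pdf is assumed to be already reduced to lines of the form `author : institute; institute`, and only exact string equality is used for the comparison.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List


def official_reference(xml_text: str) -> Dict[str, List[str]]:
    """Map each author name to the sorted list of institutes (from the xml)."""
    root = ET.fromstring(xml_text)
    return {
        author.findtext("name"): sorted(i.text for i in author.findall("institute"))
        for author in root.iter("author")
    }


def target_reference(extracted_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Build the same mapping from text lines extracted from the proof."""
    target: Dict[str, List[str]] = {}
    for line in extracted_lines:
        if ":" not in line:
            continue  # ignore residue such as line numbers
        name, institutes = line.split(":", 1)
        target[name.strip()] = sorted(
            part.strip() for part in institutes.split(";") if part.strip()
        )
    return target


def compare(official: Dict[str, List[str]],
            target: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Report the differences between the two references."""
    return {
        "missing_authors": sorted(set(official) - set(target)),
        "unexpected_authors": sorted(set(target) - set(official)),
        "mismatched_authors": sorted(
            name for name in official
            if name in target and official[name] != target[name]
        ),
    }
```

Exact matching of this kind is very sensitive to how the text comes out of the pdf file, so any difference in encoding or spacing shows up as a missing author.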
The main difficulty in this process lies in extracting the content from the pdf file; the text is not easily retrieved, for a variety of reasons. One is that many elements have to be identified and ignored, such as row numbers, watermarks, footers, and headings. Another reason is that words extracted from a pdf file do not follow a specific encoding convention; the file can contain non-ASCII characters that can be output in many different ways. The pdf file can specify a predefined encoding standard to use, or provide a lookup table of differences between a predefined and a built-in encoding standard; for fonts with uncommon Latin characters, which are routine in this kind of publication, special encoding is used. It is necessary to provide a ToUnicode table where semantic information about the characters is preserved.

Also, the proof checker has to pass through all the publication text and recognize where the author list starts and ends, and where the institute list starts and ends. All this is made more difficult by the fact that different publishers have different layouts and create different versions of pdf files. This makes the above problems not generic, but often specific to a particular publisher.

After the target reference is created, the comparison looks for:
• authors that seem to be missing from the pdf file. Here, false positives are often due to character encoding and spaces;
• authors with inconsistent punctuation. This section points out differences between the original and target references in the punctuation of authors' first names, which can follow the rules X. or X.Y. or X.-Y. or X-Y. with or without space;
• institutes that seem to be missing from the pdf file. Here false positives are often due to non-standard characters that break the entry;
• institutes with close matches. All the entries that look like the original but have some inconsistencies land in this group. Some publishers replace USA with United States of America (or vice versa). Sometimes there is a new character that does not break the institute entry but prevents a perfect match, for example, “Università” and “Universit‘ a”;
• mismatched authors. All the authors collaborate through one or more institutes. It is checked that the link between the author and the institute is consistent. This sometimes results in a false positive, because it is not always easy to extract the index number of an institute from the pdf file, mainly because the text coming from the pdf file also includes other elements such as line numbers of the document. For this reason an author originally assigned to institute number X can end up matched with target institute YX, because in the text extracted from the pdf the number X might be preceded by a line number Y; institute YX may not exist;
• deceased authors. In some cases, ATLAS has tagged authors as deceased but the publisher forgot to mark them as such, or vice versa;
• missing funding agencies, or those wrongly added by the publisher.

In early 2019, due to changes in CERN systems, the component written in PROLOG which ran the comparison went out of service. This created an urgent need for a new tool for this task. PROLOG takes a different approach to generic problem solving: the problem is expressed in logical terms, without working directly on the algorithm that resolves it. PROLOG is difficult to maintain, because few developers work with it and with its logic programming paradigm. Python was chosen to replace it.

A way to obtain the best match among all the items of an array of institutes and authors was sought, because one cannot rely on finding an author or institute in the same position in the sequences of the xml and pdf files.
For this purpose the concept of Levenshtein distance (Appendix D.1) was applied, so that a weighted index of similarity can be obtained to decide what is matched with what, and to then effectively check for anomalies.

A feature was developed to help the script evaluate as perfect matches some that would not otherwise appear to be such. A list of synonyms (Section 7.3.1) is created for every entry, author or institute, to teach the proof checker to validate similar strings when the differences are due only to problems in decoding the text from the pdf file. So, for instance, if author X. Nonamečič is not found in the target reference, but from the pdf entries we extracted an author with name X. Nonamež ciž c, then, as it has been previously verified that in the pdf file the name appears as expected, the proof checker considers it a perfect match, and skips the problem. A very long list of false positives can be found in the report page as “skipped items”. The list of synonyms is updated manually, but a tool, the Synonym web page (Section 7.3.2), has been created to allow users to update this list themselves.
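The matching step described above can be illustrated with a minimal Python sketch. It is not the proof checker's code: the function names, the normalised similarity score, and the 0.8 threshold are assumptions made for illustration. Only the use of the Levenshtein distance to pick the closest entry comes from the text.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed score in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(entry: str, candidates: list[str], threshold: float = 0.8):
    """Return (candidate, score) for the closest candidate, or (None, score)
    when no candidate reaches the assumed threshold."""
    if not candidates:
        return None, 0.0
    score, candidate = max((similarity(entry, c), c) for c in candidates)
    return (candidate, score) if score >= threshold else (None, score)


# Entries extracted from the pdf rarely sit at the same position as in the xml,
# so every xml entry is compared against all pdf entries.
pdf_institutes = [
    "CERN, Geneva, Switzerland",
    "Department of Physics, University of Alberta, Edmonton, Alberta, Canada",
]
match, score = best_match(
    "Department of Physics, University of Alberta, Edmonton AB, Canada",
    pdf_institutes,
)
```

A score below 1.0 with a valid match corresponds to the "close matches" category of the report, while an entry with no candidate above the threshold would be reported as missing.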
As introduced in Section 7.3, the comparison between the pdf file and the xml file can generate false positives. To minimize the list of false positives in the report page, the new version of the proof checker includes a synonyms list that allows the comparison script to determine whether a difference is a real error or another correct way to display the same information.

An example of a working synonym is:

Institute as stored in the ATLAS DB & xml file:
    Physics Department, SUNY Albany, Albany NY, United States of America
Institute as written in the journal's author list:
    Physics Department, SUNY Albany, Albany, New York, USA

These differences are acceptable, since the main information is correctly displayed and no real errors are found.

All the synonym records are managed using a JSON file and are separated into institutes and authors (Appendices D.2 and D.3). Having this as a JSON file allows the proof checker script to parse the records easily and determine whether the faults must be marked as journal errors or can be skipped.
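How such records might be consulted during the comparison is sketched below. The record follows the institute example of Appendix D.2; wrapping the records in a JSON list, and the names `load_synonyms` and `classify`, are assumptions for illustration rather than the proof checker's actual interface.

```python
import json

# One record shaped like the institute example in Appendix D.2.
# Storing the records as a JSON list is an assumption of this sketch.
SYNONYMS_JSON = """
[
  {
    "id": "2",
    "original": "Department of Physics, University of Alberta, Edmonton AB, Canada",
    "synonyms": [
      "Department of Physics, University of Alberta, Edmonton, Alberta, Canada"
    ]
  }
]
"""


def load_synonyms(text: str) -> dict[str, set[str]]:
    """Map each original entry to the set of its accepted synonyms."""
    return {record["original"]: set(record["synonyms"])
            for record in json.loads(text)}


def classify(original: str, extracted: str, synonyms: dict[str, set[str]]) -> str:
    """Return 'match', 'skipped' (a known synonym) or 'error' (a journal error)."""
    if original == extracted:
        return "match"
    if extracted in synonyms.get(original, set()):
        return "skipped"
    return "error"


synonyms = load_synonyms(SYNONYMS_JSON)
status = classify(
    "Department of Physics, University of Alberta, Edmonton AB, Canada",
    "Department of Physics, University of Alberta, Edmonton, Alberta, Canada",
    synonyms,
)
```

Entries classified as skipped would end up in the hidden part of the report, so that only genuine discrepancies need to be checked by hand.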
To manage the list of proof checker synonyms, ATLAS provides a web page that allows users to search for an existing entry and manage the recorded synonyms. Searching for an institute or author will display the list of records that match the search criteria, see Figure 16. This allows users to edit the synonyms for the record. Clicking the edit icon shows a new page section where users can insert their own known synonym for the record. After confirmation, this is added to the list of synonyms and is taken into account by the next run of the proof checker.
The proof checker provides a report after its run, one for each paper and draft version. This report is provided and stored in a JSON file and must be parsed to show the report results in a human-readable way. This is done by the proof_report web page, see Figure 17. The report contains all the paper information plus the comparison results sorted by topic (see Appendix D.4). The JSON file contains more information than is displayed; this allows the web page to optimize the display of the huge amount of information and retains data for future improvements. The web page contains some hidden sections that are produced by the proof checker via the known synonyms. These can be displayed by clicking on ‘Skipped +’. The page then shows all the false positive results that the proof checker found in its comparison, but that are ignored after association with the synonyms.

Figure 16: Proof checker synonyms web page.

The proof checker helps the Physics Office staff in a tedious task, but it is far from being a perfect tool. It needs to be continuously maintained and updated for new cases, changes in publication layouts, and new conventions in the author lists and their format. Further improvements are planned, with the goal of reducing the number of cases to be checked manually by the user to just a couple of dozen.

The ATLAS database stores data of various kinds that are displayed in different ways via web pages. The FENCE framework provides an API to retrieve this information. A call to the API, allowed after user authentication, provides the results in JSON format. This kind of information is easily parsed by most common programming languages and is standard for API results.

There are three main ways ATLAS provides web pages:
• standard HTML pages;
• include files for TWiki pages;
• FENCE web pages.

Figure 17: Proof report web page.

The first two options run on an ATLAS PO Virtual Machine, which provides scripts, cron-jobs, or HTML pages to the users. This Virtual Machine is directly connected to the FENCE framework to use its API and retrieve data, parse it, and store it in the EOS ATLAS file system.
FENCE also allows people who do not belong to the ATLAS Collaboration to access some of the information stored in its database. It provides various ways to retrieve and show the information, for example through a cron-job that runs on the ATLAS PO Virtual Machine and extracts the data, parses it, and shows it to the user.

An example in which the data are retrieved using the FENCE API with a cron-job on the ATLAS PO Virtual Machine is the ATLAS map web page, Figure 18, where users can see a map of all active member institutes of the ATLAS collaboration.

Figure 18: The ATLAS map web page. The numbers displayed at a regional level represent the count of institutions.

This dynamic web page filters the results to use only the active institutes. This process is done on the ATLAS PO Virtual Machine by a Python script, which makes a request to the API, parses the results, and builds its own JSON file. This file contains all the institute information (name, country, links, coordinates, etc.) and the layers to build the map. Once the Python script builds the JSON file, the output is interpreted by the web page, which takes care of displaying the layers and the markers for the institutes.
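The filtering and file-building step of such a script could look roughly like the sketch below. The API response is replaced by a literal list, because the real endpoint and response layout are not described here; the field names (`active`, `lat`, `lon`), the per-country layers, and the output structure are all assumptions.

```python
import json

# Stand-in for the parsed JSON returned by the FENCE API. The real response
# layout is not described in the text, so these fields are assumed.
api_results = [
    {"name": "Institute A", "country": "Brazil", "active": True,
     "lat": -22.86, "lon": -43.23},
    {"name": "Institute B", "country": "Italy", "active": False,
     "lat": 46.08, "lon": 13.21},
    {"name": "Institute C", "country": "Brazil", "active": True,
     "lat": -23.56, "lon": -46.73},
]


def build_map_data(institutes: list[dict]) -> dict:
    """Keep only active institutes and count them per country (the layers)."""
    markers = [
        {"name": inst["name"], "country": inst["country"],
         "coordinates": [inst["lat"], inst["lon"]]}
        for inst in institutes if inst.get("active")
    ]
    layers: dict[str, int] = {}
    for marker in markers:
        layers[marker["country"]] = layers.get(marker["country"], 0) + 1
    return {"layers": layers, "markers": markers}


map_data = build_map_data(api_results)
# Serialised form that a cron-job could write to disk for the web page to read.
map_json = json.dumps(map_data, indent=2)
```

Doing the filtering in the cron-job keeps the web page itself simple: it only has to read one pre-built file and draw the layers and markers.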
TWiki pages

Displaying information on a TWiki page requires the use of the TWiki %INCLUDE variable to embed content in a TWiki page. The content can be TWiki code or HTML code. HTML code allows a more dynamic page to be created. One can use JavaScript, jQuery, and other web development tools to make the content more intuitive to the user. In this case the page will be loaded on-demand and the data will be displayed in real time via the FENCE framework.

The public results page, Figure 19, is an example of an include using an HTML page which retrieves data using the FENCE API and displays it to the user in a TWiki page. This page shows the full list of papers, CONF notes, and PUB notes stored in the ATLAS database and managed by the FENCE framework. It also allows users to filter results using the buttons on the top of the page. This page loads ∼

FENCE public pages
Although the FENCE web pages are normally restricted based on users' roles, the FENCE framework also allows web pages that should be displayed publicly to be generated. This solution allows the developers to use all the powerful FENCE functionalities (MBF, for example) and to simplify the data retrieval process. In addition, it allows the information to be loaded on demand, without cron-jobs or passing through the API in a way that would increase the web page loading time.

Figure 19: Public results TWiki page.

An example of a public web page completely built using the FENCE framework is the ATLAS Conference and Talks page, as shown in Figure 20. This retrieves all the talks, grouped by their conference and registered within ATLAS, and displays a summary of all the information for each talk and conference, including speaker, institute, conference name, date, and location. All the table's columns have search fields in order to allow the users to easily find the talk they are looking for without parsing all the ∼11 000 records displayed on the page. There is an option to filter the results as future, past, or all (the default option). The page also contains some internal links that point to FENCE web pages (such as the link to the speaker profile). Such links are marked as internal because they require authentication to access these non-public data.

Figure 20: The ATLAS Talks by Conference web page.

This page was built using the MBF infrastructure described in Section 3.2. Before it, developers had to create public web pages on TWiki or from scratch and retrieve all the data by accessing the database directly.
This article summarises the tools that have been set up to support the publication of documents by the ATLAS Collaboration. While the emphasis is on papers published in refereed journals, the technology created also supports internal documents and other public documents such as Conference and Public notes.

The FENCE framework is used as the backbone of the whole setup and is also used to interface the web-based tracking of the status of an analysis with the documentation in GitLab. Extensive use is made of the Continuous Integration tools available in GitLab to ensure that documents can easily be submitted to the arXiv and journals as soon as they have been approved by the collaboration.

The software solutions described in this document are now used to accompany the whole of a physics analysis, from the expressions of interest by research groups, to the final journal publication. They also include the generation of the appropriate author list and support the proof-reading.

The tools are used by the whole collaboration and minimise the amount of manual work required for repetitive procedures, easing the workload of editors, editorial boards, Management, and the Physics Office. At the same time, all documents connected to an analysis can now be accessed from a central tool where the experiment's rules and knowledge are codified and made available in an intuitive way.

Acknowledgements
The authors are indebted to the ATLAS Collaboration for the support provided to achieve the results described in this paper. We are grateful to ATLAS collaborators who provided invaluable comments and input to the paper and the framework it presents. Special acknowledgements go to Marzio Nessi for helping initiate the Glance project in ATLAS and for supporting its development, and to Kathy Pommes for supervising the Glance team at CERN. Special thanks to Giordon Stark for thoroughly reviewing this paper.

Appendix

A Classes for analysis and paper phases
A.1 Graph class

    abstract class Graph
    {
        abstract public function addNode(Node $node);
        abstract public function deleteNode(Node $node);
        abstract public function addEdge(Edge $edge);
        abstract public function deleteEdge(Edge $edge);
        abstract public function clean();
    }
A.2 Action class

    class Action
    {
        protected $inputs;
        protected $outputs;
        protected $callback;

        public function __construct()
        {
            $this->inputs = array();
            $this->outputs = array();
        }

        public function setCallback($callback)
        {
            $this->callback = $callback;
        }

        public function getCallback()
        {
            return $this->callback;
        }

        public function setInputs($inputs)
        {
            $this->inputs = $inputs;
        }

        public function trigger()
        {
            $this->outputs = call_user_func_array($this->callback, $this->inputs);
        }

        public function getOutputs()
        {
            return $this->outputs;
        }
    }
A.3 User authorisation class

    public function is_expert()
    {
        return $this->has_egroup("fence-developers");
    }

    public function hasPermission($permission)
    {
        return in_array($permission, $this->permissions);
    }

B FENCE and GitLab integration classes
B.1 Methods

    public function get($endPoint, $data = null)
    {
        $this->init($endPoint, $data);
        return $this->exec();
    }

    public function post($endPoint, $data = null)
    {
        return $this->execMethod('POST', $endPoint, $data);
    }

    private function delete($endPoint, $data = null)
    {
        return $this->execMethod('DELETE', $endPoint, $data);
    }

    private function put($endPoint, $data = null)
    {
        return $this->execMethod('PUT', $endPoint, $data);
    }
B.2 execMethod function

    private function execMethod($method, $endPoint, $data = null)
    {
        $this->init($endPoint);
        $this->setMethod($method);
        if ($data) {
            $this->setBodyData($data);
        }
        return $this->exec();
    }
B.3 createProject function

    public function createProject($project, $parameters = [])
    {
        $name = $project;
        if ($project instanceof Project) {
            $name = $project->name();
        }
        \FENCE\Logger::debug("Creating project = {project}", ["project" => $project]);
        $payload = $this->post('projects', array_merge(['name' => $name], $parameters));
        if (!isset($payload['id'])) {
            throw new Exception\ProjectAlreadyExistsException(json_encode($payload));
        }
        return new Project($payload['id'], $payload);
    }

C Author list files
C.1 Author list XML file header
C.2 Author list XML file institutes
C.3 Author list XML file authors
D Proofs checks
D.1 Levenshtein distance
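Mathematically, the Levenshtein distance between two strings $a$, $b$ of length $|a|$ and $|b|$ respectively is given by $\mathrm{lev}_{a,b}(|a|, |b|)$ where

\[
\mathrm{lev}_{a,b}(i,j) =
\begin{cases}
\max(i,j) & \text{if } \min(i,j) = 0, \\
\min
\begin{cases}
\mathrm{lev}_{a,b}(i-1,j) + 1 \\
\mathrm{lev}_{a,b}(i,j-1) + 1 \\
\mathrm{lev}_{a,b}(i-1,j-1) + 1_{(a_i \neq b_j)}
\end{cases}
& \text{otherwise,}
\end{cases}
\]

where $1_{(a_i \neq b_j)}$ is equal to 0 when $a_i = b_j$ and equal to 1 otherwise, and $\mathrm{lev}_{a,b}(i,j)$ is the distance between the first $i$ characters of $a$ and the first $j$ characters of $b$.

D.2 Institutes

    {
        "id": "2",
        "original": "Department of Physics, University of Alberta, Edmonton AB, Canada",
        "synonyms": [
            "Department of Physics, University of Alberta, Edmonton, Alberta, Canada"
        ]
    }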
D.3 Authors

    {
        "original": "A. B\\\"ubbbbbb",
        "inspire": "INSPIRE-00000000",
        "foafName": "Aaaa Bubbbbbb",
        "synonyms": ["A. B\u00f2bbbbbb", "A. B\u00a8 bbbbbb"]
    }
D.4 Report

    {
        "ref_code": "EXOT-2017-24",
        "ref_date": "2018-07-31",
        "creation_date": "29-Oct-2018",
        "publisher": "'APS'",
        "document": "doc1053",
        "filename": "LY15578_proof_v2",
        "authors_missing_skip": [...],
        "authors_missing_list": [...],
        "authors_puntuation_list": [...],
        "institutes_missing_pdf_list": [...],
        "institutes_missing_pdf_skip": [...],
        "authors_mismatched_list": [...],
        "authors_not_deceased_list": [...],
        "authors_deceased_list": [...],
        "institutes_close_matches_list": [...],
        "founding_agencies_missing": [...],
        "founding_agencies_wrong": [...]
    }
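A report of this shape could be condensed for display along the lines of the Python sketch below. The key names are taken from the listing above, but the list contents are invented placeholders, and the rule that keys ending in `_skip` hold the skipped (synonym-matched) items is an assumption read off the key names.

```python
import json

# Abridged report using key names from the listing above; the list contents
# are placeholders, since the real entries are elided in the appendix.
report_text = """
{
  "ref_code": "EXOT-2017-24",
  "publisher": "'APS'",
  "authors_missing_list": ["A. Author"],
  "authors_missing_skip": ["B. Author", "C. Author"],
  "institutes_missing_pdf_list": [],
  "institutes_missing_pdf_skip": ["Institute Z"]
}
"""


def summarise(report: dict) -> dict:
    """Count the entries of every list, separating skipped items from findings."""
    findings: dict[str, int] = {}
    skipped: dict[str, int] = {}
    for key, value in report.items():
        if not isinstance(value, list):
            continue  # metadata such as ref_code or publisher
        target = skipped if key.endswith("_skip") else findings
        target[key] = len(value)
    return {"findings": findings, "skipped": skipped}


summary = summarise(json.loads(report_text))
```

A web page could show the findings by default and reveal the skipped counts only on request, mirroring the hidden ‘Skipped +’ sections of the proof report page.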