DINGO: an ontology for projects and grants linked data

Diego Chialva* and Alexis-Michel Mugabushaka, ERCEA†, Place Charles Rogier 16, 1210 Brussels, Belgium‡

Abstract

We present DINGO (Data INtegration for Grants Ontology), an ontology that provides a machine-readable, extensible framework to model data for semantically-enabled applications relative to projects, funding, actors and, notably, funding policies in the research landscape. DINGO is designed to yield high modeling power and elasticity to cope with the huge variety in funding, research and policy practices, which makes it applicable also to other areas besides research where funding is an important aspect. We discuss its main features, the principles followed for its development, its community uptake, its maintenance and evolution.

Keywords: ontology linked data, research funding, research projects, research policies

Services and resources built around the Semantic Web, semantically-enabled applications and linked (open) data technologies have increasingly impacted research and research-related activities in recent years. Development has been intense along several directions, for instance in "semantic publishing" [36], but also in aspects directed toward the reproducibility and attribution of research and scholarly outputs, which has also led to interest in having Open Science Graphs interconnected at the global level [21]. All this has become more and more essential to research practices, also in light of the so-called reproducibility crisis affecting a number of research fields (see, for instance, the long list of recent studies at https://reproduciblescience.org/2019).

In fact, the demand for easily and automatically parsable, interoperable and processable data goes beyond the purely academic sphere. The research landscape comprises a vast number and variety of activities, with multiple and diverse stakeholders and actors, and with impact on several aspects and sectors of society. One aspect of great relevance is the funding of research, together with the related policies for science development and sustainability.

Machine-actionable, interoperable data is in high demand in those respects. On the one hand, for instance, research funding agencies face increasing pressure to report on the impact derived from their activities. This has to be seen in the broader context of the increased role that research assessment plays in research policy debates.
On the other hand, researchers and research organisations are increasingly asked to conform to policy specifications in order to obtain and secure their funding. Compliance with funding and research policies is also part of the wider debate about best research practices such as Open Science, Open Access, FAIR data and sustainable research.

Research assessment and compliance verification at any level involve the collection, management and analysis of an ever-increasing amount of data of different types and from multiple sources. The classical way to meet this demand has been to collect data directly from various research actors. This increases the burden on researchers, university administrations and funding agencies, as those data have to be managed and curated. Moreover, the information, typically collected in an "ad hoc" way and in isolation, is not available to others. This also results in duplication of effort, since the linking and processing of data have to be redone. The difficulty of data linking and semantic interpretation across different realities and agencies also entails that data and analyses are of limited value when it comes to putting them in a broader perspective.

* Corresponding author.
† Disclaimer. The views expressed in this paper are the authors'. They do not necessarily reflect the views or official positions of the European Commission, the ERC Executive Agency or the ERC Scientific Council.
‡ Emails: [email protected], [email protected]

Solving these problems entails having data that can be easily parsed, processed and interpreted computationally. This requires expressive, shared, machine-processable descriptions and models on the Web. Technologies such as RDF, RDFS, OWL and SPARQL provide building blocks towards that goal and have favoured the development of ontologies to describe various aspects of the research domain.

However, the development of ontologies for the funding aspects of research, and for their relations to research activities and actors, is still at an early stage. In particular, while a few ontologies exist (see Section 2), they mostly cover only some of the important semantic elements (typically those relative to projects and grants), as we will show.

This note presents a novel ontology, developed to manage data on research grants and projects, but also, notably, to conceptualise funding policies and instruments, facilitating the integration and interoperability of such information with other data and from various sources in the framework of the so-called Linked Data. The ontology has been dubbed DINGO (Data INtegration for Grants Ontology). It provides an extensible, interoperable framework for formally modeling the relevant parts of data in this knowledge area.

DINGO particularly facilitates the effort of putting analyses of funding activities and policies in a broader context and in comparative perspective, which is much needed when assessing research, policies and their impact. In this way, DINGO will be beneficial in practice at several levels.
For instance, it can increase the capacity of analysis to inform policy and strategic discussions, as well as reduce the effort required of researchers and officers to give evidence of policy compliance.

Indeed, one specific characteristic of the knowledge area DINGO aims to describe is its variety. The existing funding activities and policies show a large spectrum of practices, with remarkable diversity and complex semantics. This constitutes a serious difficulty when trying to put funding activities and policies in context and in comparative perspective. DINGO has therefore been specially designed to cope with this, through a rigorous conceptualisation of commonalities via a number of ontology classes and properties, together with other classes that allow tuning semantic specialisations to the specific cases when modelling data.

This also allows DINGO to be used effectively not only as a pure domain ontology specific to research activities, but also to model other domains where funding activities play a relevant role (such as the arts, cultural conservation, and many others). In some respects, DINGO therefore also has the multi-domain usability typical of more general (upper) ontologies (we use here the classification and definitions of ontologies by Guarino [16]).

DINGO is fully documented at https://w3id.org/dingo, and a machine-readable version of the ontology in RDF-Turtle is available at the same address by redirection, when requested with "text/turtle" as the accepted content type (it is also available at https://dcodings.github.io/DINGO/DINGO-OWL.ttl).

This article is organised as follows. Section 2 discusses related work. Sections 3, 4, 5, 6 and their subsections present the aims, development guidelines, community uptake, maintenance and evolution, and main features of DINGO (we leave the detailed description of the ontology to its online documentation). We conclude in Section 7, where we also comment on potential future directions of development.
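As a minimal sketch of how a client might retrieve the Turtle serialisation, the Python snippet below builds a request for https://w3id.org/dingo carrying an HTTP Accept header of "text/turtle". It assumes, as is usual for w3id.org redirects, that the redirection is driven by standard content negotiation; the helper names build_turtle_request and fetch_dingo_turtle are hypothetical, and only the standard library is used.

```python
import urllib.request

# The IRI comes from the text; the helper names below are hypothetical.
DINGO_IRI = "https://w3id.org/dingo"


def build_turtle_request(iri: str = DINGO_IRI) -> urllib.request.Request:
    """Build a GET request that asks for an RDF-Turtle representation."""
    # Content negotiation: the Accept header names the media type we want back.
    return urllib.request.Request(iri, headers={"Accept": "text/turtle"})


def fetch_dingo_turtle(iri: str = DINGO_IRI, timeout: float = 30.0) -> str:
    """Download the Turtle text; urlopen follows HTTP redirects by default."""
    with urllib.request.urlopen(build_turtle_request(iri), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


if __name__ == "__main__":
    # Only builds the request; call fetch_dingo_turtle() to hit the network.
    req = build_turtle_request()
    print(req.full_url, req.get_header("Accept"))
```

Keeping request construction separate from the network call makes the negotiation step easy to inspect or test offline.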
A few works exist modelling data related to funding and research, although to our knowledge none has dealt with the aspects pertaining to research (funding) policies together with the rest.

One of the earliest efforts to create a data model for the management of research funding data is CERIF (Common European Research Information Format) [22]. It is an extremely rich and detailed vocabulary for research management, with a considerable number of entities and relations, and a high granularity. However, it does not conceptualise aspects related to policies.

CERIF, conceived for CRIS (Current Research Information Systems), has deeper roots in relational database modeling than in semantic/knowledge-graph modeling, as is visible from some of its characteristics. For example, one of its main features is the presence of "link entities" such as project-organisation, project-person, and so on. They are in fact relationships rooted in relational database reification practices (which differ from what reification is in the framework of knowledge graphs and the Semantic Web). Such "link entities", however, have a less straightforward interpretation in terms of semantic concepts (they often represent pairs of concepts), which would affect inferences. We will show how DINGO avoids this problem and yet manages to capture the aspects of interest.

Related to CERIF is the OpenAire data model [24].

OpenAire [23] is an infrastructure that links research outcomes to their creators, enabling discoverability, transparency, reproducibility and quality assurance. The OpenAire data model uses part of the CERIF vocabulary (including some of the "link entities") and combines it with the OpenAire guidelines.

A few OWL-based ontologies exist describing funding in research. Compared to CERIF, they are fully framed in semantic modeling. The best-known ones (and in fact the only ones, to our knowledge) are FRAPO (Funding, Research Administration and Projects Ontology) [14], [29], and the Springer Nature SciGraph Ontology [34].

These are actually part of larger ontologies or ontology collections mainly aiming at categorising scholarly data, such as publications and similar outputs, rather than focusing exclusively on the funding and research landscape. They are thus tuned for those other purposes and have specific limitations. For example, the SciGraph ontology does not appear to distinguish the concept of "grant" as funding from the concept of "research project", and thus would not easily allow modeling many existing funding practices and use cases (for instance, projects with multiple grants, either co-occurring or in sequence). FRAPO, instead, lacks classes and properties for relevant concepts such as "principal investigators" and others. Moreover, neither ontology conceptualises the domain of funding policies.

In addition to these, there is a growing number of initiatives addressing dimensions of research data other than funding and projects. To cite a few:

• OpenCitations [30], which is dedicated to open scholarship and open bibliographic and citation data;
• SMS (Semantically Mapping Science) [3], a platform integrating heterogeneous datasets for science, technology and innovation studies;
• VIVO [9], an open source software and ontology for representing scholarship and scholarly activity.

Finally, one can also mention CASRAI (Consortia Advancing Standards in Research Administration Information) [7], which provides not an ontology but a glossary of research administration information.

We will discuss the part of schema.org [33] dealing with funding data in Section 3, as it was in fact inspired by DINGO.

We would finally like to mention the FP Ontologies [26]. They do not deal with research funding, but model some aspects of projects. Web-searching them points to the webpage at [26], but we could neither find documentation nor download any serialisation from that page.

DINGO was first presented to the public in 2018, and has led to a number of uses, both directly for data modeling and knowledge base creation, and as a basis or inspiration for related ontology modeling efforts.

The first public presentation of DINGO took place at the workshop "Wikidata for research", Berlin, 17-18 June 2018, where feedback and input were exchanged with a working group of participants, which led to the linking of DINGO with the Wikidata graph.

DINGO also inspired the part of the schema.org model specific to grants and funding (as mentioned explicitly in issue 343 of the schema.org release of 2019-04-01, visible at https://schema.org/docs/releases.html). Schema.org's model, however, covers only a subset of DINGO's.

Furthermore, DINGO has been adopted to model the knowledge base of the European Commission data now hosted and available in the OpenAire LOD service (at http://lod.openaire.eu/eu-open-research-data), and as one of the bases of the schema for the GRANTID initiative of CrossRef [10] (one of the authors of this article, D.C., has been a member of the technical group for the schema).

Ontology Mapping, Reuse and Extensions in DINGO

Ontology mapping is a key challenge of the Semantic Web and of Linked (Open) Data for several reasons. Ontology reuse is also a good knowledge engineering practice, increasing the interoperability of systems.

In the framework of semantic modeling and the Semantic Web, reuse and mapping are particularly complex. On the one hand, the decentralised nature of the Web favours the development of several ontologies and data models, which often partially overlap. On the other hand, individual ontologies are generally created with specific goals, and thus, even when they are developed to model data from the same domain(s), they will generally present subtle semantic differences even in seemingly general concepts.

In the case of research data, mapping and reuse are further complicated by the multiplicity of actors and the diversity of funding practices, policies and data. On the other hand, this same issue is an incentive to maximise the semantic modeling power of an ontology by linking it with overlapping ones in order to achieve maximum interoperability.

DINGO was therefore built from the start with particular attention to ontology mapping. Pure reuse has been possible only to a certain extent, because ontologies covering overlapping knowledge areas (such as those mentioned in Section 2) do indeed present subtle but relevant semantic specificities.

The mapping in DINGO makes use of the SKOS mapping properties (documented at [1]) and of RDF and OWL class and property axioms, such as owl:equivalentClass and owl:equivalentProperty, when applicable. In fact, establishing a mapping with the latter OWL axioms is generally quite complicated, as it requires establishing that the full extensions of the related classes/properties are equal. This is typically a difficult task in a complex knowledge area such as research, and has therefore been done carefully and rather conservatively in DINGO.

DINGO is presently mapped to the Wikidata data model, to schema.org and to the FRAPO ontology. There is also interest in linking DINGO with the vocabulary provided by CERIF, and future developments have already been planned in that sense.

Besides that, DINGO also reuses several other ontologies, such as SKOS, schema.org and Dublin Core [11], and is inspired by the FAIR principles [39] for data publication.

Finally, DINGO has been designed to be easily extensible to adapt to the various possible use cases and to the diversity of data and existing practices. The ontology presents "hook properties" (such as product or material produced) that allow extending DINGO by linking, for instance, to data modeled with the many ontologies dealing with scholarly and publishing data (such as the SPAR ontologies [29], the Semantic Web Journal (SWJ) ontology [20], the Semantic Web Conference (SWC) ontology [37], the Semantically Annotated LaTeX (SALT) ontologies [15], the Nature Ontologies [18], the SciGraph Ontologies [19], the Conference Ontology [25], BIBFRAME [4] and bibliotek-o [5]).
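The conservative mapping policy described above can be illustrated with a small heuristic. The Python sketch below is not DINGO's procedure: it assumes we hold sampled instance IRIs for two classes and picks a SKOS mapping predicate from how the samples relate, reserving owl:equivalentClass for the case where equality of the full extensions has been established separately. The function names, ex:Award and the sample IRIs are hypothetical.

```python
from typing import Optional, Set


def suggest_mapping(ours: Set[str], theirs: Set[str],
                    extensions_proven_equal: bool = False) -> Optional[str]:
    """Suggest a predicate for mapping our class to theirs (hypothetical heuristic).

    `ours` and `theirs` are *sampled* instance IRIs of the two classes, so
    set relations are evidence, not proof. owl:equivalentClass is returned
    only when equality of the full extensions was established separately.
    """
    if extensions_proven_equal:
        return "owl:equivalentClass"
    if not ours & theirs:
        return None  # no shared instances observed: leave unmapped
    if ours == theirs:
        return "skos:closeMatch"  # equal on the sample only: stay conservative
    if ours < theirs:
        return "skos:broadMatch"  # their class appears broader than ours
    if theirs < ours:
        return "skos:narrowMatch"  # their class appears narrower than ours
    return "skos:relatedMatch"  # partial overlap


def mapping_triple(our_class: str, their_class: str, predicate: str) -> str:
    """Render one mapping statement as a Turtle triple."""
    return f"{our_class} {predicate} {their_class} ."


if __name__ == "__main__":
    # Hypothetical samples: ex:Award and all instance IRIs are illustrative.
    grants = {"ex:g1", "ex:g2"}
    awards = {"ex:g1", "ex:g2", "ex:a3"}
    pred = suggest_mapping(grants, awards)
    print(mapping_triple("dingo:Grant", "ex:Award", pred))
    # prints: dingo:Grant skos:broadMatch ex:Award .
```

Falling back to skos:closeMatch even when the samples coincide mirrors the caution the text recommends: agreement on observed instances does not establish equality of full extensions.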

The principal aim of the DINGO ontology is to provide a machine-readable, extensible framework to model data relative to projects, funding, policies and actors. The originally intended users of such a framework were the stakeholders in the research landscape, with their very different use cases.

As discussed in Section 1, semantic modeling of that knowledge area faces, among others, one main difficulty: there exists a huge variety of funding, policies, practices and research activities. Because DINGO is designed to cope with this, as we also illustrate in Section 5.3, it is in the end also applicable to domains other than pure research where funding aspects are relevant, for example the arts, cultural conservation and the like.

DINGO's development was also driven by the goal of being rich enough to
1. integrate and accommodate existing systems and data instances;
4. satisfy complex as well as simple use cases, also by straightforward extension.

This set of principal design goals and requirements also made it possible to work toward additional (and important) objectives, such as promoting the opening up of funding data, and the linking and re-use of data.

Special care has been devoted to minimising the effort required of users to apply/adopt the model. In particular, while the model has been created using Linked Data fundamentals, it lends itself to different implementations, including integration in non-graph databases; hence it does not specifically address the optimisation of graph inference and graph-based queries.
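To make the point about non-graph databases concrete, here is a minimal sketch that stores DINGO-style statements in a single generic subject/predicate/object table in SQLite and answers "which grants fund this project?" with a self-join. The property name ex:funds and all ex: instances are hypothetical placeholders, not IRIs from the DINGO specification; only dingo:Grant and dingo:Project echo class names used in the text.

```python
import sqlite3

# Hypothetical statements: ex:funds and every ex: IRI are placeholders.
TRIPLES = [
    ("ex:grant1", "rdf:type", "dingo:Grant"),
    ("ex:grant2", "rdf:type", "dingo:Grant"),
    ("ex:projectA", "rdf:type", "dingo:Project"),
    ("ex:grant1", "ex:funds", "ex:projectA"),
    ("ex:grant2", "ex:funds", "ex:projectA"),
]


def load(conn: sqlite3.Connection, triples) -> None:
    """Store triples in one generic table; duplicates are ignored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples ("
        " s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL,"
        " PRIMARY KEY (s, p, o))"
    )
    conn.executemany("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", triples)


def grants_funding(conn: sqlite3.Connection, project: str) -> list:
    """Grants (typed dingo:Grant) linked to `project`, sorted for stable output."""
    rows = conn.execute(
        "SELECT f.s FROM triples AS f"
        " JOIN triples AS t"
        "   ON t.s = f.s AND t.p = 'rdf:type' AND t.o = 'dingo:Grant'"
        " WHERE f.p = 'ex:funds' AND f.o = ?"
        " ORDER BY f.s",
        (project,),
    )
    return [s for (s,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, TRIPLES)
    print(grants_funding(conn, "ex:projectA"))  # ['ex:grant1', 'ex:grant2']
```

A generic triple table is only one possible layout; a relational deployment might equally map each class to its own table.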

Ontology generation is a complex process that has been scrutinised in the literature and has led to the establishment of a number of engineering best practices; see, for example, [13], [38], [32], [16], [17], [31]. The design of DINGO has followed such best practices. The main guidelines followed have been:
• a mixture of middle-out and bottom-up approaches: starting from actual data (such as funding data from various agencies, see below), several main concepts have been designed, and the ontology generation has proceeded by distinguishing a number of commonalities (generalisations) and specificities; the advice of domain experts has also been essential, mostly profiting from the fact that DINGO has been developed at the ERC(EA) [12];
• practical usability of the end results;
• interoperability/integration from the inception with other graphs (for instance, Wikidata and Schema.org);
• sufficient granularity to allow for efficient monitoring and evaluation purposes, but also sufficient generality to accommodate potentially all funding data, thus providing the whole benefit of a large Linked Data Graph (DINGO is straightforwardly extensible to provide additional granularity);
• coverage of all areas of interest, also for non-academic actors and stakeholders.

For DINGO's data-based, mixed middle-out and bottom-up development we have used various research funding data, in particular looking at data freely provided by several funding agencies.
For instance, we have used data from the European Union Funding (Research Framework Programmes), the Australian Research Council (ARC), the Swiss National Science Foundation (SNSF), the Croatian Science Foundation, the US National Institutes of Health (NIH), the US National Science Foundation (NSF), and the various UK agencies coordinated by Research Councils UK (RCUK).

Finally, we have adopted elements of agile development, not unlike what is proposed in [28], for instance concerning unit testing.

The tools employed in the design and coding process of DINGO have been UMLet [2], with some custom diagram elements, for graphical representations, while the documentation has been built using custom software written in Python (unpublished) that automatically generates human-readable HTML documentation from OWL ontology serialisations (see Section 5.4).

Here we describe DINGO's main components and their features, while the full specification of the ontology is available at https://w3id.org/dingo.

DINGO is an OWL-DL ontology comprising 40 classes and 68 properties. Its classes provide an articulated conceptualisation of entities relevant for the characterisation of data in the research, funding and research-related domains. In particular, besides classes for Projects, Grants, FundingAgency and others, there are specific classes for describing funding policies, with several specific subclasses (which can be straightforwardly expanded).

As we said, the variety and diversity of funding realities (which we will also call "realisations") makes semantic conceptualisation particularly difficult. For example, different funding agencies/funders classify their funding policies in various and discordant ways, sometimes using the same word for different things (for instance, the terms funding scheme/programme/action). The role and characterisation of the different actors in projects and grant agreements are also quite diverse.
Such modeling complexity appears not only at the level of concepts, but also at the level of relations/properties. Notably, the relationships between funding and the research enterprise can be varied and rather complex.

Furthermore, alongside the definition of concepts, additional complexity is given by the variety of use cases: besides the simple case of one grant funding one project, often multiple fundings are attached to a single project (either in sequence or at the same time), or a single grant funds several (sub)projects.

Therefore, DINGO's properties and classes have been designed to allow high modeling capability to represent such variety of concepts and realisations.

DINGO's main features are as follows:
• it defines a number of principal classes: Project, Grant, FundingAgency, FundingScheme, Role, Person, Organisation, Criterion, various subclasses of those and some related specialised classes;
• a Project is an organised endeavour (collective or individual) planned to reach a particular aim or achieve a result;
• a Grant is a disbursed fund paid to a recipient or beneficiary, and the process for it; DINGO focuses on the main definition of "funding" (which is defined as "money for a particular purpose; the act of providing money for such a purpose" in the Cambridge, Oxford and Collins dictionaries alike [8], [6], [27]), but can be extended to other (non-monetary) types of funding, see Section 6;
• a Project may be funded by one or more Grants, simultaneously or in sequence;
• a Grant may fund one or several Projects;
• Grants can be awarded to Person(s) and/or to Organisation(s);
• Persons or Organisations can participate in Projects; hence a participant, characterised by a Role, can be a Person or an Organisation;
• the Role class can be used to specify the semantics of the participation in a Project or of the role in a Grant. This class provides instruments to model a large variety of semantic types, to account for the variety of practices found in actual data;
• types of organisations can be specified using one of the several sub-classes of Organisation, or by creating new ones;
• a participant (Person or Organisation) in a Project may not actually be a beneficiary of a specific Grant funding the Project; accordingly, DINGO reflects that particular participants of a Project and beneficiaries of a Grant funding the same Project may be different;
• temporal aspects of the various concepts can be fully modeled, and are expressed by specific properties (start time, end time, inception, and so on);
• Funding Agencies are the organisations materially disbursing and administering the Grant process;
• Funding Schemes are funding instruments accompanied by specifications of Grant coverage, eligibility, reimbursement rates, specific criteria for funding, grant population targets, and similar features. Such specifications constitute one or more Criteria for awarding funds (Grants);
• Funding Schemes may be sub-specifications of other Funding Schemes; this recursive relation makes it possible to model existing complicated hierarchies of funding instruments. The word "Scheme" has different meanings for different funding agencies/funders. In fact, there exist other related terms such as funding program and funding action, in particular in the case of a hierarchy of funding instruments. DINGO represents the generalisation of such instruments via the class FundingScheme, and expresses the taxonomy and relations among the various instruments via the Criterion class and subclasses and the (recursive) FundingScheme class properties;
• Criteria can be of different natures, modeled in DINGO via different sub-classes; multiple criteria can coexist in a single funding scheme; they provide a conceptualisation (straightforwardly extensible by sub-classing) to characterise funding policies in relation to funding schemes and activities.

We present in Figure 1 a graphical illustration of the main parts of the ontology, both classes and properties, portrayed respectively by ovals and arrows.

Figure 1: Graphical representation of DINGO (main parts).
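Several of the features listed above (many-to-many links between Grants and Projects, participants who are not beneficiaries, and recursive FundingScheme hierarchies) can be sketched with plain Python dataclasses. This is an illustration under simplifying assumptions, not DINGO itself: Roles are reduced to plain strings, Persons and Organisations to names, and all class, function and instance names (Programme X, Action Y, Org A, and so on) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class FundingScheme:
    name: str
    parent: Optional["FundingScheme"] = None  # recursive sub-specification

    def lineage(self) -> List[str]:
        """Scheme names from this scheme up to the top of its hierarchy."""
        chain, node = [], self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain


@dataclass
class Grant:
    grant_id: str
    scheme: FundingScheme
    beneficiaries: Set[str] = field(default_factory=set)  # Person/Organisation names


@dataclass
class Project:
    title: str
    participants: Dict[str, str] = field(default_factory=dict)  # name -> role (simplified Role)
    grants: List[Grant] = field(default_factory=list)  # one or more, in sequence or parallel


def participants_without_grant(project: Project) -> Set[str]:
    """Participants that are not beneficiaries of any grant funding the project."""
    funded = set().union(*(g.beneficiaries for g in project.grants))
    return set(project.participants) - funded


def projects_funded_by(grant: Grant, projects: List[Project]) -> List[str]:
    """Titles of projects whose funding includes `grant` (identity comparison)."""
    return [p.title for p in projects if any(g is grant for g in p.grants)]


if __name__ == "__main__":
    # Hypothetical data illustrating a two-level scheme hierarchy.
    programme = FundingScheme("Programme X")
    action = FundingScheme("Action Y", parent=programme)
    g1 = Grant("G-1", action, beneficiaries={"Org A"})
    g2 = Grant("G-2", action, beneficiaries={"Person P"})
    p1 = Project("Project 1",
                 participants={"Org A": "host", "Person P": "principal investigator",
                               "Org B": "partner"},
                 grants=[g1, g2])
    p2 = Project("Project 2", participants={"Org A": "host"}, grants=[g1])
    print(participants_without_grant(p1))  # {'Org B'}
    print(projects_funded_by(g1, [p1, p2]))  # ['Project 1', 'Project 2']
    print(action.lineage())  # ['Action Y', 'Programme X']
```

Keeping beneficiaries on the Grant and participants on the Project is what lets the two sets differ, which is the distinction the feature list draws.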

DINGO is documented at https://w3id.org/dingo. The documentation has been created using custom software written in Python (unpublished) that automatically extracts classes, properties, individuals, annotations, axioms and namespaces from OWL ontologies and produces human-readable HTML.

The machine-readable serialisation of DINGO is provided in the RDF-Turtle language, and is available at https://w3id.org/dingo by redirection when requested with "text/turtle" as the accepted content type. We also provide, at the same address, a Shape Expression [35] data model for the validation of data triples.
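ShEx schemas are written in their own language and checked by a dedicated ShEx validator. As a rough, standard-library-only approximation of the kind of cardinality constraint such a shape expresses, the Python sketch below checks that every node typed dingo:Grant has exactly one ex:fundedBy and at least one ex:startTime. These predicates and bounds are invented for illustration and are not taken from DINGO's published shape.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical shape: predicate -> (min, max) occurrences; None = unbounded.
GRANT_SHAPE: Dict[str, Tuple[int, Optional[int]]] = {
    "ex:fundedBy": (1, 1),
    "ex:startTime": (1, None),
}


def validate(triples: Iterable[Triple], node_type: str,
             shape: Dict[str, Tuple[int, Optional[int]]]) -> List[str]:
    """Return violation messages for nodes of `node_type` (empty if all conform)."""
    outgoing = defaultdict(lambda: defaultdict(int))  # subject -> predicate -> count
    typed = set()
    for s, p, o in triples:
        outgoing[s][p] += 1
        if p == "rdf:type" and o == node_type:
            typed.add(s)
    errors = []
    for node in sorted(typed):
        for pred, (lo, hi) in shape.items():
            n = outgoing[node][pred]
            if n < lo or (hi is not None and n > hi):
                errors.append(f"{node}: {pred} occurs {n} times")
    return errors


if __name__ == "__main__":
    data = [
        ("ex:g1", "rdf:type", "dingo:Grant"),
        ("ex:g1", "ex:fundedBy", "ex:agency1"),
        ("ex:g1", "ex:startTime", "2019-01-01"),
        ("ex:g2", "rdf:type", "dingo:Grant"),
        ("ex:g2", "ex:fundedBy", "ex:agency1"),
        ("ex:g2", "ex:fundedBy", "ex:agency2"),
    ]
    print(validate(data, "dingo:Grant", GRANT_SHAPE))
    # ['ex:g2: ex:fundedBy occurs 2 times', 'ex:g2: ex:startTime occurs 0 times']
```

A real ShEx validator also checks value types, node kinds and nested shapes, which this sketch deliberately leaves out.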

DINGO's maintenance is continuous and evolutionary in nature, because DINGO aims at effectively modeling funding and research practices, which themselves continuously evolve. As mentioned, the evolution and extension of DINGO will be eased by the specific design choices made in creating it, which provide high modeling power to cope with the variety of existing funding realities. Hence, in many cases the required evolution/extension will be minimal (just subclassing for new concepts).

DINGO can, however, be straightforwardly extended even in more orthogonal directions. For example, as discussed in Section 5.3, DINGO focuses on the main definition of "funding" (the monetary one, see the Cambridge, Oxford and Collins dictionaries [8], [6], [27]), but it can be extended to non-monetary funding simply by providing classes parallel to Grant, with properties for the specific resource provisions (and possibly a generalisation class to describe their commonalities).
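The extension route just described can be sketched as a small generator of Turtle axioms. The Python function below emits a generalisation class, one or more classes parallel to dingo:Grant, rdfs:subClassOf links from both the existing Grant class and the new classes to the generalisation, and one object property per provided resource. The dingo: namespace IRI is an assumption, and ex:Funding, ex:InKindGrant and ex:providesEquipment are hypothetical names.

```python
from typing import Dict, List

# Assumed prefix IRIs; dingo: namespace is an assumption, ex: is hypothetical.
PREFIXES = (
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix dingo: <https://w3id.org/dingo#> .\n"
    "@prefix ex: <https://example.org/funding-ext#> .\n"
)


def extension_turtle(general: str, parallel: List[str],
                     resource_props: Dict[str, str]) -> str:
    """Turtle for a generalisation class, classes parallel to dingo:Grant,
    and one object property (with its domain) per provided resource."""
    lines = [PREFIXES, f"{general} a owl:Class ."]
    for cls in parallel:
        lines.append(f"{cls} a owl:Class .")
    # The monetary Grant and the new non-monetary classes share the generalisation.
    for cls in ["dingo:Grant", *parallel]:
        lines.append(f"{cls} rdfs:subClassOf {general} .")
    for prop, domain in resource_props.items():
        lines.append(f"{prop} a owl:ObjectProperty ;\n    rdfs:domain {domain} .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(extension_turtle(
        "ex:Funding",
        ["ex:InKindGrant"],
        {"ex:providesEquipment": "ex:InKindGrant"},
    ))
```

Placing the generalisation in the extension, rather than editing DINGO's own Grant definition, keeps the core ontology unchanged while letting queries over ex:Funding cover both monetary and non-monetary provisions.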

We have presented an OWL-based ontology for research and funding called DINGO and illustrated its main features, uptake and evolutionary maintenance.

DINGO has the potential to constitute a key ingredient of a set of orthogonal and interoperable ontologies for the knowledge area of funding, research and their impact. In particular, there is a lack of ontological conceptualisations concerning the domain of impact and impact studies; hence, for instance, we have already planned the development of ontologies for data relative to impact indicators.

Moreover, as we mentioned, DINGO has features that enable it to be used both for domain knowledge graphs specific to research and in graphs for other domains where funding aspects and policies are of interest (such as the arts, cultural conservation, and the like).

DINGO has already been used in a number of projects, as described in Section 3. We plan to engage further with relevant communities to create systems that offer information on research funding in a distributed manner using DINGO. This should eventually lead to a truly global Open Research Information Graph providing access to data in several interconnected research information systems.

Acknowledgements.

We would like to thank the co-organisers and the participants of the workshop "Wikidata for research", Berlin, 17-18 June 2018, for their feedback and input.

References

(2020), https://doi.org/10.1162/qss_a_00023
31. Presutti, V., Gangemi, A.: Content ontology design patterns as practical building blocks for web ontologies. In: Proceedings ER 2008. pp. 128–141 (2008), https://doi.org/10.1007/978-3-540-87877-3_11
32. Reich, J.: Ontological Design Patterns for the Integration of Molecular Biological Information. In: Proceedings of the German Conference on Bioinformatics GCB'99. pp. 156–166 (1999)
33. Schema.org: https://schema.org/
34. SCIGRAPH: scigraph.springernature.com/explorer/datasets/ontology/
35. Shape Expressions: https://shex.io/
36. Shotton, D.: Semantic publishing: the coming revolution in scientific journal publishing. Learned Publishing (2), 85–94 (2009)
37. SWC: http://data.semanticweb.org/ns/swc/ontology
38. Uschold, M.: Creating, integrating and maintaining local and global ontologies. In: Proceedings of the First Workshop on Ontology Learning (OL-2000) (2000)
39. Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., et al.: The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3