DINGO: an ontology for projects and grants linked data
Diego Chialva∗ and Alexis-Michel Mugabushaka
ERCEA†, Place Charles Rogier 16, 1210 Brussels, Belgium‡

Abstract
We present DINGO (Data INtegration for Grants Ontology), an ontology that provides a machine-readable, extensible framework to model data for semantically-enabled applications relating to projects, funding, actors and, notably, funding policies in the research landscape. DINGO is designed to provide high modeling power and flexibility to cope with the huge variety of funding, research and policy practices, which also makes it applicable to areas other than research where funding is an important aspect. We discuss its main features, the principles followed in its development, its community uptake, and its maintenance and evolution.
Keywords: ontology, linked data, research funding, research projects, research policies
Services and resources built around the Semantic Web, semantically-enabled applications and linked (open) data technologies have had a growing impact on research and research-related activities in recent years. Development has been intense along several directions, for instance in “semantic publishing” [36], but also in aspects related to the reproducibility and attribution of research and scholarly outputs, which has also led to interest in interconnecting Open Science Graphs at the global level [21]. All this has become increasingly essential to research practices, also in light of the so-called reproducibility crisis affecting a number of research fields (see, for instance, the long list of recent studies at https://reproduciblescience.org/2019).

In fact, the demand for easily and automatically parsable, interoperable and processable data goes beyond the purely academic sphere. The research landscape comprises a vast number and variety of activities, with multiple and diverse stakeholders and actors, and with impact on several aspects and sectors of society. One aspect of great relevance is the funding of research, together with the related policies for science development and sustainability.

Machine-actionable, interoperable data is in high demand in these respects. On the one hand, for instance, research funding agencies face increasing pressure to report on the impact of their activities. This has to be seen in the broader context of the increased role that research assessment plays in research policy debates.
On the other hand, researchers and research organisations are increasingly asked to conform to policy specifications in order to obtain and secure their funding. Compliance with funding and research policies is also part of the wider debate about good research practices such as Open Science, Open Access, FAIR data and sustainable research.

Research assessment and compliance verification at any level involve the collection, management and analysis of an ever-increasing amount of data of different types and from multiple sources. The classical way to meet this demand has been to collect data directly from the various research actors. This increases the burden on researchers, university administrations and funding agencies, as those data have to be managed and curated. Moreover, the information, typically collected in an “ad hoc” way and in isolation, is not available to others. This also results in duplicated effort, because the linking and processing of data has to be redone. The difficulty of data linking and semantic interpretation across different realities and agencies also means that data and analyses are of limited value when it comes to putting them in a broader perspective.

∗ Corresponding author.
† Disclaimer. The views expressed in this paper are the authors’. They do not necessarily reflect the views or official positions of the European Commission, the ERC Executive Agency or the ERC Scientific Council.
‡ Emails: [email protected], [email protected]

Solving these problems requires data that can be easily parsed, processed and interpreted computationally. This in turn requires expressive, shared, machine-processable descriptions and models on the Web. Technologies such as RDF, RDFS, OWL and SPARQL provide building blocks towards that goal and have favoured the development of ontologies describing various aspects of the research domain.

However, the development of ontologies for the funding aspects of research and their relations to research activities and actors is still at an early stage. In particular, while a few ontologies exist (see Section 2), as we will show they mostly cover only some of the important semantic elements (typically those relating to projects and grants).

This note presents a novel ontology, developed to manage data on research grants and projects, but notably also to conceptualise funding policies and instruments, facilitating the integration and interoperability of such information with other data and from various sources in the framework of the so-called Linked Data. The ontology has been dubbed DINGO (Data INtegration for Grants Ontology). It provides an extensible, interoperable framework for formally modeling the relevant parts of data in this knowledge area.

DINGO particularly facilitates putting the analysis of funding activities and policies in a broader context and in comparative perspective, which is much needed when assessing research, policies and their impact. In this way, DINGO will be beneficial in practice at several levels.
For instance, it can increase the capacity of analysis to inform policy and strategic discussions, as well as reduce the effort researchers and officers spend in giving evidence of policy compliance.

Indeed, one specific characteristic of the knowledge area DINGO aims to describe is its variety. Existing funding activities and policies show a large spectrum of practices, with remarkable diversity and complex semantics. This constitutes a serious difficulty when trying to put funding activities and policies in context and in comparative perspective. DINGO has therefore been specially designed to cope with this, through a rigorous conceptualisation of commonalities via a number of ontology classes and properties, together with other classes that allow the semantics to be specialised to specific cases when modelling data.

This also allows DINGO to be used not only as a pure domain ontology specific to research activities, but also to model other domains where funding activities play a relevant role (such as the arts, cultural conservation, and many others). In some respects, DINGO therefore also has the multi-domain usability typical of upper-level ontologies (we use here the classification and definitions of ontologies by Guarino [16]).

DINGO is fully documented at https://w3id.org/dingo, and a machine-readable version of the ontology in RDF-Turtle is served at https://w3id.org/dingo by redirection when the request carries an “Accept: text/turtle” header (it is also available at https://dcodings.github.io/DINGO/DINGO-OWL.ttl).

This article is organised as follows. Section 2 discusses related work. Sections 3 to 6 and their subsections present DINGO’s community uptake, aims, development guidelines, main features, and maintenance and evolution (we leave the detailed description of the ontology to its documentation, available online). We conclude in Section 7, where we also comment on potential future directions of development.
A few works model data related to funding and research, although to our knowledge none has dealt with the aspects pertaining to research (funding) policies together with the rest.

One of the earliest efforts to create a data model for the management of research funding data is CERIF (Common European Research Information Format) [22]. It is an extremely rich and detailed vocabulary for research management, with a considerable number of entities and relations, and a high granularity. However, it does not conceptualise aspects related to policies.

CERIF, conceived for CRIS (Current Research Information Systems), has deeper roots in relational database modeling than in semantic/knowledge graph modeling, as is visible from some of its characteristics. For example, one of its main features is the presence of “link entities” such as project-organisation, project-person, and so on. These are in fact relationships rooted in relational database reification practices (which differ from what reification means in the framework of knowledge graphs and the Semantic Web). Such “link entities”, however, have a less straightforward interpretation in terms of semantic concepts (they often represent pairs of concepts), which would affect inferences. We will show how DINGO avoids this problem and yet manages to capture the aspects of interest.

Related to CERIF is the OpenAire data model [24].
OpenAire [23] is an infrastructure that links research outcomes to their creators, enabling discoverability, transparency, reproducibility and quality assurance. The OpenAire data model uses part of the CERIF vocabulary (including some of the “link entities”) and combines it with the OpenAire guidelines.

A few OWL-based ontologies exist that describe funding in research. Unlike CERIF, they are fully framed in semantic modeling. The best-known ones (and in fact the only ones to our knowledge) are FRAPO (Funding, Research Administration and Projects Ontology) [14], [29], and the Springer Nature SciGraph Ontology [34].

These are actually part of larger ontologies or ontology collections that mainly aim at categorising scholarly data, such as publications and similar outputs, rather than focusing exclusively on the funding and research landscape. They are thus tuned for those other purposes and have specific limitations. For example, the SciGraph ontology does not appear to distinguish the concept of “grant” as funding from the concept of “research project”, and thus would not easily allow modeling many existing funding practices and use cases (for instance, projects with multiple grants, either co-occurring or in sequence). FRAPO, instead, lacks classes and properties for relevant concepts such as “principal investigators” and others. Moreover, neither ontology conceptualises the domain of funding policies.

In addition to these, there is a growing number of initiatives addressing dimensions of research data other than funding and projects. To cite a few:
OpenCitations [30], which is dedicated to open scholarship and open bibliographic and citation data;
SMS (Semantically Mapping Science) [3], a platform integrating heterogeneous datasets for science, technology and innovation studies;
VIVO [9], open-source software and an ontology for representing scholarship and scholarly activity. Finally, one can also mention
CASRAI (Consortia Advancing Standards in Research Administration Information) [7], which does not provide an ontology but a glossary of research administration information.

We will discuss the part of schema.org [33] dealing with funding data in Section 3, as it was in fact inspired by DINGO.

Finally, we would like to mention the FP Ontologies [26]. They do not deal with research funding, but model some aspects of projects. Web searches for them point to the webpage at [26], but we could find neither documentation nor any downloadable serialisation on that page.
DINGO was first presented to the public in 2018, and has led to a number of uses, both directly for data modeling and knowledge base creation, and as a basis or inspiration for related ontology modeling efforts.

The first public presentation of DINGO took place at the workshop “Wikidata for research”, Berlin, 17-18 June 2018, where feedback and input were exchanged with a working group of participants, which led to the linking of DINGO with the Wikidata graph.

DINGO also inspired the part of the schema.org model specific to grants and funding (as explicitly mentioned in issue 343 of the schema.org release of 2019-04-01, visible at https://schema.org/docs/releases.html). However, schema.org’s model covers only a subset of DINGO’s.

Furthermore, DINGO has been adopted to model the knowledge base of the European Commission data now hosted and available in the OpenAire LOD service (at http://lod.openaire.eu/eu-open-research-data), and as one of the bases of the schema for the GRANTID initiative of CrossRef [10] (one of the authors of this article, D.C., has been a member of the technical group for the schema).

Ontology Mapping, Reuse and Extensions in DINGO
Ontology mapping is a key challenge of the Semantic Web and of Linked (Open) Data for several reasons. Ontology reuse is also good knowledge engineering practice, increasing the interoperability of systems.

In the framework of semantic modeling and the Semantic Web, reuse and mapping are particularly complex. On the one hand, the decentralised nature of the web favours the development of several ontologies and data models, which often partially overlap. On the other hand, individual ontologies are generally created with specific goals, and thus, even when they are developed to model data from the same domain(s), they generally present subtle semantic differences, even in seemingly general concepts.

In the case of research data, mapping and reuse are further complicated by the multiplicity of actors and the diversity of funding practices, policies and data. At the same time, this same issue calls for maximising the semantic modeling power of an ontology by linking it with overlapping ones in order to achieve maximum interoperability.

DINGO was therefore built from the start with particular attention to ontology mapping. Pure reuse has been possible only to a certain extent, because ontologies covering overlapping knowledge areas (such as those mentioned in Section 2) do indeed present subtle but relevant semantic specificities.

The mapping in DINGO makes use of the SKOS mapping properties (documented at [1]) and of RDF and OWL class and property axioms such as owl:equivalentClass and owl:equivalentProperty, where applicable. In fact, establishing mappings with the latter OWL axioms is generally quite complicated, as it requires establishing that the full extensions of the respective classes/properties are equal.
This is typically a difficult task in a complex knowledge area such as research, and has therefore been done carefully and rather conservatively in DINGO.

DINGO is presently mapped to the Wikidata data model, to schema.org and to the FRAPO ontology. There is also interest in linking DINGO with the vocabulary provided by CERIF, and future developments have already been planned in that direction.

Besides that, DINGO also reuses several other ontologies, such as SKOS, schema.org and Dublin Core [11], and is inspired by the FAIR principles [39] for data publication.

Finally, DINGO has been designed to be easily extensible, to adapt to the various possible use cases and to the diversity of data and existing practices. The ontology presents “hook properties” (such as product or material produced) that allow DINGO to be extended by linking, for instance, to data modeled with the many ontologies dealing with scholarly and publishing data (such as the SPAR ontologies [29], the Semantic Web Journal (SWJ) ontology [20], the Semantic Web Conference (SWC) ontology [37], the Semantically Annotated LaTeX (SALT) ontologies [15], the Nature Ontologies [18], the SciGraph Ontologies [19], the Conference Ontology [25], BIBFRAME [4] and bibliotek-o [5]).
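The practical difference between the two kinds of mapping can be made concrete with a small sketch. Using only the Python standard library, it represents triples as tuples and applies the two OWL 2 RL rules that propagate rdf:type across owl:equivalentClass (cax-eqc1 and cax-eqc2), while a SKOS mapping property triggers nothing, since SKOS mappings do not entail class membership. The DINGO namespace IRI and the two example.org “overlapping ontology” classes are hypothetical placeholders, not the mappings actually asserted in DINGO.

```python
# Minimal sketch: type propagation across owl:equivalentClass vs. a SKOS match.
# The DINGO namespace and the example.org classes are illustrative assumptions.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_EQUIVALENT_CLASS = "http://www.w3.org/2002/07/owl#equivalentClass"
SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"

DINGO = "https://w3id.org/dingo#"          # assumed namespace IRI
OTHER_A = "http://example.org/ontologyA#"  # hypothetical overlapping ontology
OTHER_B = "http://example.org/ontologyB#"  # hypothetical overlapping ontology


def infer_types(triples):
    """Apply OWL 2 RL rules cax-eqc1/cax-eqc2 until no new triple appears.

    If C owl:equivalentClass D, every instance of C is an instance of D and
    vice versa; SKOS mapping properties trigger nothing here.
    """
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        equivalences = [(s, o) for s, p, o in closure if p == OWL_EQUIVALENT_CLASS]
        typings = [(s, o) for s, p, o in closure if p == RDF_TYPE]
        for c, d in equivalences:
            for individual, cls in typings:
                for src, dst in ((c, d), (d, c)):
                    new = (individual, RDF_TYPE, dst)
                    if cls == src and new not in closure:
                        closure.add(new)
                        changed = True
    return closure


grant = "http://example.org/data/grant-123"  # hypothetical instance
data = {
    (grant, RDF_TYPE, DINGO + "Grant"),
    # Strong claim: the two classes have exactly the same extension.
    (DINGO + "Grant", OWL_EQUIVALENT_CLASS, OTHER_A + "Grant"),
    # Weaker claim: a documented similarity with no logical consequence here.
    (DINGO + "Grant", SKOS_CLOSE_MATCH, OTHER_B + "Grant"),
}

closure = infer_types(data)
print((grant, RDF_TYPE, OTHER_A + "Grant") in closure)  # True
print((grant, RDF_TYPE, OTHER_B + "Grant") in closure)  # False
```

Because an equivalence axiom re-types every instance in both directions, an equivalence that is only approximately right spreads errors into any reasoner-backed application; this is the practical reason for asserting it only when the class extensions are known to coincide.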
The principal aim of the DINGO ontology is to provide a machine-readable, extensible framework to model data relating to projects, funding, policies and actors. The originally intended users of such a framework were the stakeholders in the research landscape, with their very different use cases.

As discussed in Section 1, semantic modeling of that knowledge area faces, among others, one main difficulty: there is a huge variety of funding, policies, practices and research activities. Because DINGO aims to cope with this, as we also illustrate in Section 5.3, it is ultimately applicable also to domains other than pure research where funding aspects are relevant, for example the arts, cultural conservation and the like.

DINGO’s development was also driven by the goal of being rich enough to
1. integrate and accommodate existing systems and data instances
4. satisfy complex as well as simple use cases, also by straightforward extension.

This set of principal design goals and requirements also made it possible to work toward additional (and important) objectives, such as promoting the opening up of funding data, and the linking and reuse of data.

Special care has been devoted to minimising users’ effort in applying/adopting the model. In particular, while the model has been created using Linked Data fundamentals, it lends itself to different implementations and to integration in non-graph databases; hence it does not specifically address the optimisation of graph inference and graph-based queries.
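As an illustration of the last point, the following sketch stores a few DINGO concepts in a relational database using Python’s built-in sqlite3 module. The table and column names are hypothetical and not part of DINGO; the point is only that the many-to-many relation between Grants and Projects maps onto an ordinary join table, so that a project funded by successive grants and a grant funding several projects can both be represented.

```python
import sqlite3

# Hypothetical relational layout for a few DINGO concepts (names are illustrative).
DDL = """
CREATE TABLE project (
    id    TEXT PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE funding_grant (
    id         TEXT PRIMARY KEY,
    funder     TEXT NOT NULL,
    start_date TEXT,   -- ISO 8601 date strings sort chronologically
    end_date   TEXT
);
-- Join table for the many-to-many "grant funds project" relation.
CREATE TABLE funds (
    grant_id   TEXT NOT NULL REFERENCES funding_grant(id),
    project_id TEXT NOT NULL REFERENCES project(id),
    PRIMARY KEY (grant_id, project_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executemany("INSERT INTO project VALUES (?, ?)",
                 [("P1", "Example project"), ("P2", "Spin-off project")])
conn.executemany("INSERT INTO funding_grant VALUES (?, ?, ?, ?)",
                 [("G1", "Agency A", "2018-01-01", "2020-12-31"),
                  ("G2", "Agency B", "2021-01-01", "2023-12-31")])
conn.executemany("INSERT INTO funds VALUES (?, ?)",
                 [("G1", "P1"),   # P1 is funded first by G1 ...
                  ("G2", "P1"),   # ... then by G2 (grants in sequence)
                  ("G2", "P2")])  # G2 also funds a second project


def grants_for(project_id):
    """Grants funding a project, ordered by their start dates."""
    rows = conn.execute(
        """SELECT g.id FROM funds f JOIN funding_grant g ON g.id = f.grant_id
           WHERE f.project_id = ? ORDER BY g.start_date""",
        (project_id,))
    return [row[0] for row in rows]


def projects_for(grant_id):
    """Projects funded by a grant."""
    rows = conn.execute(
        "SELECT project_id FROM funds WHERE grant_id = ? ORDER BY project_id",
        (grant_id,))
    return [row[0] for row in rows]


print(grants_for("P1"))    # ['G1', 'G2']
print(projects_for("G2"))  # ['P1', 'P2']
```

The funds table here is only the relational encoding of a binary relation between two concepts; it does not correspond to an additional class in the model.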
Ontology generation is a complex process that has been scrutinised in the literature and has led to the establishment of a number of engineering best practices; see, for example, [13], [38], [32], [16], [17], [31]. The design of DINGO has followed such best practices. The main guidelines followed have been:
• a mixture of middle-out and bottom-up approaches: starting from actual data (such as funding data from various agencies, see below), several main concepts have been designed, and the ontology generation has proceeded by distinguishing a number of commonalities (generalisations) and specificities; the advice of domain experts has also been essential, profiting mostly from the fact that DINGO has been developed at the ERC(EA) [12]
• practical usability of the end results
• interoperability/integration with other graphs (for instance, Wikidata and Schema.org) from the inception
• sufficient granularity to allow for efficient monitoring and evaluation, but also sufficient generality to accommodate potentially all funding data, thus providing the full benefit of a large Linked Data graph; DINGO is straightforwardly extensible to provide additional granularity
• coverage of all areas of interest, also for non-academic actors and stakeholders.

For DINGO’s data-based, mixed middle-out and bottom-up development we have used various research funding data, in particular data freely provided by several funding agencies.
For instance, we have used data from the European Union Funding (Research Framework Programmes), the Australian Research Council (ARC), the Swiss National Science Foundation (SNSF), the Croatian Science Foundation, the US National Institutes of Health (NIH), the US National Science Foundation (NSF), and the various UK agencies coordinated by Research Councils UK (RCUK).

Finally, we have adopted elements of agile development, not unlike what is proposed in [28], for instance concerning unit testing.

The tools employed in the design and coding process of DINGO have been UMLet [2], with some custom diagram elements, for graphical representations, while the documentation has been built using custom software written in Python (unpublished) to automatically generate human-readable HTML documentation from OWL ontology serialisations (see Section 5.4).

Here we describe DINGO’s main components and their features, while the full specification of the ontology is available at https://w3id.org/dingo.

DINGO is an OWL-DL ontology comprising 40 classes and 68 properties. Its classes provide an articulated conceptualisation of entities relevant for characterising data in the research, funding and research-related domains. In particular, besides classes for Projects, Grants, FundingAgency and others, there are specific classes for describing funding policies, with several specific subclasses (which can straightforwardly be expanded).

As we said, the variety and diversity of funding realities (which we will also call “realisations”) makes semantic conceptualisation particularly difficult. For example, different funding agencies/funders classify their funding policies in various and discordant ways, sometimes using the same word for different things (for instance, the terms funding scheme/programme/action). The role and characterisation of the different actors in projects and grant agreements are also quite diverse.
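To make the terminology problem concrete, the sketch below (plain Python, standard library only) models two hypothetical funders whose vocabularies clash: for one, “scheme” names a second-level instrument under a “programme”; for the other, it names a top-level instrument above an “action”. In the spirit of the generalisation via the FundingScheme class described below, every instrument uses the same class, the funder’s own term is kept only as a label, and the position in the hierarchy comes from an explicit recursive link. The attribute names local_label and broader are illustrative assumptions, not DINGO property names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FundingScheme:
    """One generic class for all funding instruments, whatever the funder calls them."""
    iri: str
    local_label: str                            # funder's own term: "programme", "scheme", ...
    funder: str
    broader: Optional["FundingScheme"] = None   # hypothetical name for the recursive link

    def ancestors(self) -> List["FundingScheme"]:
        """Walk up the recursive sub-scheme relation to the top-level instrument."""
        chain, node = [], self.broader
        while node is not None:
            chain.append(node)
            node = node.broader
        return chain

    def depth(self) -> int:
        return len(self.ancestors())


# Hypothetical funders: the same word "scheme" sits at different levels.
a_programme = FundingScheme("ex:A/programme-1", "programme", "Agency A")
a_scheme = FundingScheme("ex:A/scheme-1", "scheme", "Agency A", broader=a_programme)
b_scheme = FundingScheme("ex:B/scheme-1", "scheme", "Agency B")
b_action = FundingScheme("ex:B/action-1", "action", "Agency B", broader=b_scheme)

# Comparing instruments by structural position rather than by label.
for s in (a_scheme, b_scheme, b_action):
    print(f"{s.funder}: '{s.local_label}' at depth {s.depth()}")
# Agency A: 'scheme' at depth 1
# Agency B: 'scheme' at depth 0
# Agency B: 'action' at depth 1
```

The sketch covers only the recursive link between instruments; in DINGO the relations among instruments are also expressed through the Criterion class and its subclasses.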
Such modeling complexity appears not only at the level of concepts, but also at that of relations/properties. Notably, the relationships between funding and the research enterprise can be varied and rather complex.

Furthermore, besides the definition of concepts, additional complexity comes from the variety of use cases: besides the simple case of one grant funding one project, multiple fundings are often attached to a single project (either in sequence or at the same time), or a single grant funds several (sub)projects.

Therefore, DINGO’s properties and classes have been designed to provide high modeling capability to represent such a variety of concepts and realisations.

DINGO’s main features are as follows:
• it defines a number of principal classes: Project, Grant, FundingAgency, FundingScheme, Role, Person, Organisation, Criterion, various subclasses of those and some related specialised classes;
• a Project is an organised endeavour (collective or individual) planned to reach a particular aim or achieve a result;
• a Grant is a disbursed fund paid to a recipient or beneficiary, and the process for it; DINGO focuses on the main definition of “funding” (defined as “money for a particular purpose; the act of providing money for such a purpose” in the Cambridge, Oxford and Collins dictionaries [8], [6], [27]), but can be extended to other (non-monetary) types of funding, see Section 6;
• a Project may be funded by one or more Grants, simultaneously or in sequence;
• a Grant may fund one or several Projects;
• Grants can be awarded to Person(s) and/or to Organisation(s);
• Persons or Organisations can participate in Projects; hence a participant, characterised by a Role, can be a Person or an Organisation;
• the Role class can be used to specify the semantics of the participation in a Project or of the role in a Grant. This class provides the means to model a large variety of semantic types, to account for the variety of practices found in actual data;
• types of organisations can be specified using one of the several subclasses of Organisation, or by creating new ones;
• a participant (Person or Organisation) in a Project may not actually be a beneficiary of a specific Grant funding the Project; accordingly, DINGO reflects that the participants of a Project and the beneficiaries of a Grant funding that same Project may be different;
• temporal aspects of the various concepts can be fully modeled, and are expressed by specific properties (start time, end time, inception, and so on);
• Funding Agencies are the organisations materially disbursing and administering the Grant process;
• Funding Schemes are funding instruments accompanied by specifications of Grant coverage, eligibility, reimbursement rates, specific criteria for funding, grant population targets, and similar features. Such specifications constitute one or more Criteria for awarding funds (Grants);
• Funding Schemes may be sub-specifications of other Funding Schemes; this recursive relation makes it possible to model existing complicated hierarchies of funding instruments. The word “Scheme” has different meanings for different funding agencies/funders. In fact, there exist other related terms such as funding programme and funding action, in particular in the case of a hierarchy of funding instruments. DINGO represents the generalisation of such instruments via the class FundingScheme, and expresses the taxonomy and relations among the various instruments via the Criterion class and its subclasses and the (recursive) FundingScheme class properties;
• Criteria can be of different natures, modeled in DINGO via different subclasses; multiple criteria can coexist in a single funding scheme; they provide a conceptualisation (straightforwardly extensible by subclassing) to characterise funding policies in relation to funding schemes and activities.

We present in Figure 1 a graphical illustration of the main parts of the ontology, both classes and properties, portrayed respectively by ovals and arrows.

Figure 1: Graphical representation of DINGO (main parts).
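The cardinalities and the separation between participants and beneficiaries listed above can be sketched with plain Python dataclasses. This is a simplified, hypothetical object model, not DINGO’s own property set: a Role pairs an agent with a free-text role type, a Project lists its Roles and the Grants funding it, and a Grant lists its beneficiaries. The example shows a project funded by two co-occurring grants, one of which also funds a second project, and a principal investigator who participates in the project without being a beneficiary of one of its grants.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class Organisation:
    name: str


Agent = Union[Person, Organisation]


@dataclass(frozen=True)
class Role:
    """Qualifies how an agent takes part; role types are free text here (an assumption)."""
    agent: Agent
    role_type: str


@dataclass
class Grant:
    iri: str
    beneficiaries: List[Agent] = field(default_factory=list)


@dataclass
class Project:
    iri: str
    participants: List[Role] = field(default_factory=list)
    funded_by: List[Grant] = field(default_factory=list)   # zero, one or many grants


def projects_funded_by(grant: Grant, projects: List[Project]) -> List[str]:
    """A grant may fund several projects."""
    return [p.iri for p in projects if grant in p.funded_by]


def participants_not_beneficiaries(project: Project, grant: Grant) -> List[str]:
    """Participants of a project who are not beneficiaries of a given grant funding it."""
    return [r.agent.name for r in project.participants if r.agent not in grant.beneficiaries]


alice = Person("Alice")
uni_x = Organisation("University X")
company_y = Organisation("Company Y")

g1 = Grant("ex:grant/G1", beneficiaries=[uni_x])
g2 = Grant("ex:grant/G2", beneficiaries=[uni_x, company_y])

p = Project("ex:project/P",
            participants=[Role(alice, "principal investigator"),
                          Role(uni_x, "host institution"),
                          Role(company_y, "partner")],
            funded_by=[g1, g2])                 # one project, two co-occurring grants
q = Project("ex:project/Q", funded_by=[g2])     # g2 also funds a second project

print(projects_funded_by(g2, [p, q]))           # ['ex:project/P', 'ex:project/Q']
print(participants_not_beneficiaries(p, g1))    # ['Alice', 'Company Y']
```

Keeping the Role as a separate object, rather than a pair-typed link record, lets the same agent appear with different role types in different projects without introducing a new concept for each pairing.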
DINGO is documented at https://w3id.org/dingo. The documentation has been created using custom software written in Python (unpublished) that automatically extracts classes, properties, individuals, annotations, axioms and namespaces from OWL ontologies and produces human-readable HTML.

The machine-readable serialisation of DINGO is provided in RDF-Turtle, and is available at https://w3id.org/dingo by redirection when requested with an “Accept: text/turtle” header. At the same address we also provide a Shape Expression [35] data model for validating data triples.
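Retrieving the Turtle serialisation relies on standard HTTP content negotiation. Below is a minimal sketch with Python’s urllib, assuming the redirection works as described above: the request carries an Accept: text/turtle header, and urllib’s default redirect handler follows the redirect and carries that header along. The function names are hypothetical; if negotiation does not return Turtle, the file can also be fetched directly from https://dcodings.github.io/DINGO/DINGO-OWL.ttl.

```python
import urllib.request

DINGO_IRI = "https://w3id.org/dingo"


def turtle_request(iri: str) -> urllib.request.Request:
    """Build a GET request that asks the server for an RDF-Turtle representation."""
    return urllib.request.Request(iri, headers={"Accept": "text/turtle"})


def fetch_turtle(iri: str = DINGO_IRI, timeout: float = 30.0) -> str:
    """Download the Turtle text; redirects are followed by urllib's default handler."""
    with urllib.request.urlopen(turtle_request(iri), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


# Usage (requires network access):
# ttl = fetch_turtle()
# print(ttl[:300])
```

The returned text can then be loaded into any RDF toolkit, or the ShEx model mentioned above can be used to validate data triples against DINGO.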
DINGO’s maintenance is continuous and evolutionary in nature, because DINGO aims to model funding and research practices effectively, and these practices themselves continuously evolve. As mentioned, the evolution and extension of DINGO will be eased by the specific design choices made in creating it, which provide high modeling power to cope with the variety of existing funding realities. Hence, in many cases the required evolution/extension will be minimal (just subclassing for new concepts).

DINGO can, however, be straightforwardly extended even in more orthogonal directions. For example, as discussed in Section 5.3, DINGO focuses on the main definition of “funding” (the monetary one; see the Cambridge, Oxford and Collins dictionaries [8], [6], [27]), but it can be extended to non-monetary funding simply by providing classes parallel to Grant, with properties for the specific resources provided (and possibly a generalisation class to describe their commonalities).
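The extension path just described can be sketched as a handful of Turtle axioms. The snippet below generates them with plain Python string formatting; the ext: namespace and the names FundingProvision, InKindProvision and providesResource are hypothetical, and the dingo: namespace IRI is an assumption to be checked against the published serialisation.

```python
# Hypothetical extension of DINGO towards non-monetary funding.
# The ext: namespace and all ext: names are invented for illustration;
# the dingo: namespace IRI is an assumption.
PREFIXES = {
    "dingo": "https://w3id.org/dingo#",
    "ext": "http://example.org/dingo-ext#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

EXTENSION_AXIOMS = [
    # Optional generalisation capturing what monetary and in-kind funding share.
    ("ext:FundingProvision", "a", "owl:Class"),
    ("dingo:Grant", "rdfs:subClassOf", "ext:FundingProvision"),
    # Class parallel to Grant, for non-monetary support.
    ("ext:InKindProvision", "a", "owl:Class"),
    ("ext:InKindProvision", "rdfs:subClassOf", "ext:FundingProvision"),
    # Property describing the specific resource provided.
    ("ext:providesResource", "a", "owl:ObjectProperty"),
    ("ext:providesResource", "rdfs:domain", "ext:InKindProvision"),
]


def to_turtle(prefixes, triples):
    """Serialise prefixed-name triples as simple Turtle, one triple per line."""
    lines = [f"@prefix {name}: <{iri}> ." for name, iri in prefixes.items()]
    lines.append("")
    lines.extend(f"{s} {p} {o} ." for s, p, o in triples)
    return "\n".join(lines)


turtle = to_turtle(PREFIXES, EXTENSION_AXIOMS)
print(turtle)
```

Adding a sibling class instead of widening the meaning of Grant leaves existing monetary-funding data valid under its original reading, while the optional generalisation class still allows queries over both kinds of provision.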
We have presented an OWL-based ontology for research and funding called DINGO, and illustrated its main features, uptake and evolutionary maintenance.

DINGO has the potential to be a key ingredient of a set of orthogonal and interoperable ontologies for the knowledge area of funding, research and their impact. In particular, there is a lack of ontological conceptualisations for the domain of impact and impact studies; hence, for instance, we have already planned the development of ontologies for data relating to impact indicators.

Moreover, as we mentioned, DINGO has features that enable it to be used both for domain knowledge graphs specific to research and in graphs for other domains where funding aspects and policies are of interest (such as the arts, cultural conservation, and the like).

DINGO has already been used in a number of projects, as described in Section 3. We plan to engage further with relevant communities to create systems that offer information on research funding in a distributed manner using DINGO. This should eventually lead to a truly global Open Research Information Graph providing access to data in several interconnected research information systems.
Acknowledgements.
We would like to thank the co-organisers and the participants of the workshop “Wikidata for research”, Berlin, 17-18 June 2018, for their feedback and input.
References
… (2020), https://doi.org/10.1162/qss_a_00023
31. Presutti, V., Gangemi, A.: Content ontology design patterns as practical building blocks for web ontologies. In: Proceedings ER 2008. pp. 128–141 (2008), https://doi.org/10.1007/978-3-540-87877-3_11
32. Reich, J.: Ontological Design Patterns for the Integration of Molecular Biological Information. In: Proceedings of the German Conference on Bioinformatics GCB’99. pp. 156–166 (1999)
33. Schema.org: https://schema.org/
34. SCIGRAPH: scigraph.springernature.com/explorer/datasets/ontology/
35. Shape-Expressions: https://shex.io/
36. Shotton, D.: Semantic publishing: the coming revolution in scientific journal publishing. Learned Publishing (2), 85–94 (2009)
37. SWC: http://data.semanticweb.org/ns/swc/ontology
38. Uschold, M.: Creating, integrating and maintaining local and global ontologies. In: Proceedings of the First Workshop on Ontology Learning (OL-2000) (2000)
39. Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., Appleton, G., et al.: The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3