An Ontology for the Materials Design Domain
Huanyu Li, Rickard Armiento, and Patrick Lambrix
Department of Computer and Information Science, Linköping University, 581 83 Linköping, Sweden
Department of Physics, Chemistry and Biology, Linköping University, 581 83 Linköping, Sweden
The Swedish e-Science Research Centre, Linköping University, 581 83 Linköping, Sweden
[email protected]
Abstract.
In the materials design domain, much of the data from materials calculations is stored in different heterogeneous databases. Materials databases usually have different data models. Therefore, users face the challenges of finding data from adequate sources and integrating data from multiple sources. Ontologies and ontology-based techniques can address such problems, as the formal representation of domain knowledge can make data more available and interoperable among different systems. In this paper, we introduce the Materials Design Ontology (MDO), which defines concepts and relations to cover knowledge in the field of materials design. MDO is designed using domain knowledge in materials science (especially in solid-state physics), and is guided by the data from several databases in the materials design field. We show the application of MDO to materials data retrieved from well-known materials databases.
Resource Type:
Ontology
IRI: https://w3id.org/mdo/full/1.0
Keywords: ontology · materials science · materials design · OPTIMADE · database

More and more researchers in the field of materials science have realized that data-driven techniques have the potential to accelerate the discovery and design of new materials. Therefore, a large number of research groups and communities have developed data-driven workflows, including data repositories (for an overview see [13]) and task-specific analytical tools. Materials design is a technological process with many applications. The goal is often to achieve a set of desired materials properties for an application under certain limitations, e.g., avoiding or eliminating toxic or critical raw materials. The development of condensed matter theory and materials modeling has made it possible to run quantum mechanics-based simulations that generate reliable materials data by using computer programs [16]. For instance, [1] shows a flow of database-driven high-throughput materials design in which the database is used to find materials with desirable properties. A global effort, the Materials Genome Initiative, has been proposed to govern databases that contain both experimentally known and computationally predicted material properties. The basic idea of this effort is that searching materials databases for desired combinations of properties could help to address some of the challenges of materials design. As these databases are heterogeneous in nature, there are a number of challenges to using them in the materials design workflow. For instance, retrieving data from more than one database means that users have to understand and use different application programming interfaces (APIs), or even reconcile different data models.
Nowadays, materials design interoperability is achieved mainly via file-based exchange involving specific formats and, at best, some partial metadata, which is not always adequately documented as it is not guided by an ontology. The second author is closely involved with another ongoing effort, the Open Databases Integration for Materials Design (OPTIMADE) project, which aims at making materials databases interoperable by developing a common API. This effort would also benefit from semantically enabling the system using an ontology, both for search and for integrating information from the underlying databases.

These issues relate to the FAIR principles (Findable, Accessible, Interoperable, and Reusable), whose purpose is to enable machines to automatically find and use the data, and individuals to easily reuse the data [22]. In the materials science domain as well, an awareness of the importance of such principles for data storage and management has recently been developing, and research in this area is starting [6].

To address these challenges and make data FAIR, ontologies and ontology-based techniques have been proposed to play a significant role. For the materials design field there is, therefore, a need for an ontology to represent solid-state physics concepts such as materials' properties, microscopic structure, and calculations, which are the basis for materials design. Thus, in this paper, we present the Materials Design Ontology (MDO). The development of MDO was guided by the schemas of OPTIMADE, as they are based on a consensus reached by several of the materials database providers in the field. Further, we show the use of MDO for data obtained via the OPTIMADE API and via database-specific APIs in the materials design field.

The paper is organized as follows. We introduce some well-known databases and existing ontologies in the materials science domain in Section 2.
In Section 3, we present the development of MDO and introduce the concepts, relations and the axiomatization of the ontology. In Section 4, we introduce the envisioned usage of MDO as well as a current implementation. In Section 5, we discuss such things as the impact, availability and extendability of MDO as well as future work. Finally, the paper concludes in Section 6 with a short summary.

Availability: MDO is developed and maintained on a GitHub repository (https://github.com/huanyu-li/Materials-Design-Ontology), and is available from a permanent w3id URL (https://w3id.org/mdo).

In this section we briefly discuss well-known databases as well as ontologies in the materials science field. Further, we briefly introduce OPTIMADE.

In the search for new materials, the calculation of electronic structures is an important tool. Calculations take data representing the structure and properties of materials as input and generate new such data. A common crystallographic data representation that is widely used by researchers and software vendors for materials design is CIF (Crystallographic Information Framework). It was developed by the International Union of Crystallography Working Party on Crystallographic Information and was first online in 2006. One of the widely used databases is the Inorganic Crystal Structure Database (ICSD, https://icsd.products.fiz-karlsruhe.de/). ICSD provides data that is used as an important starting point in many calculations in the materials design domain.

As the size of computed data grows, and more and more machine learning and data mining techniques are being used in materials design, frameworks are appearing that provide not only data but also tools. Materials Project, AFLOW and OQMD are well-known examples of such frameworks that are publicly available. Materials Project [12] is a central program of the Materials Genome Initiative, focusing on predicting the properties of all known inorganic materials through computations. It provides open web-based data access to computed information on materials, as well as tools to design new materials. To make the data publicly available, the Materials Project provides an open Materials API and an open-source Python-based programming package (pymatgen). AFLOW [4] (Automatic Flow for Materials Discovery) is an automatic framework for high-throughput materials discovery, especially for crystal structure properties of alloys, intermetallics, and inorganic compounds. AFLOW provides a REST API and a Python-based programming package (aflow). OQMD [18] (The Open Quantum Materials Database, http://oqmd.org) is also a high-throughput database, consisting of over 600,000 crystal structures calculated based on density functional theory. OQMD is designed based on a relational data model. OQMD supports a REST API and a Python-based programming package (qmpy).

2.2 Ontologies and Standards

Within the materials science domain, the use of semantic technologies is in its infancy, with ontologies and standards still under development. The ontologies developed so far focus on representing either general materials domain knowledge or specific sub-domains.

Two ontologies representing general materials domain knowledge, and to which our ontology connects, are ChEBI and EMMO.
ChEBI [5] (Chemical Entities of Biological Interest) is a freely available data set of molecular entities focusing on chemical compounds. The representation of such molecular entities as atom, molecule, ion, etc. is the basis in both chemistry and physics. The ChEBI ontology is widely used and integrated into other domain ontologies.
EMMO (European Materials & Modelling Ontology) aims at developing a standard representational ontology framework based on current knowledge of materials modeling and characterization. The EMMO development started from the very bottom level, using the actual picture of the physical world coming from applied sciences, and in particular from physics and material sciences. Although EMMO already covers some sub-domains in materials science, many sub-domains are still lacking, including the domain MDO targets.

Further, a number of ontologies from the materials science domain focus on specific sub-domains (e.g., metals, ceramics, thermal properties, nanotechnology), and have been developed with a specific use in mind (e.g., search, data integration) [13]. For instance, the Materials Ontology [2] was developed for data exchange among thermal property databases, and the MatOnto ontology [3] for oxygen ion conducting materials in the fuel cell domain. The NanoParticle Ontology [20] represents properties of nanoparticles with the purpose of designing new nanoparticles, while the eNanoMapper ontology [10] focuses on assessing risks related to the use of nanomaterials from the engineering point of view. Extensions to these ontologies in the nanoparticle domain are presented in [17]. An ontology that represents formal knowledge for simulation, modeling, and optimization in computational molecular engineering is presented in [11]. Further, an ontology design pattern to model material transformation in the field of sustainable construction is proposed in [21].

There are also efforts on building standards for data export from databases and data integration among tools. To some extent the standards formalize the description of materials knowledge and thereby create ontological knowledge. A recent approach is Novel Materials Discovery (NOMAD) [7], whose metadata structure is defined to be independent of specific material science theories or methods and could be used as an exchange format [9].
OPTIMADE aims at enabling interoperability between materials databases through a common REST API. During its development, OPTIMADE takes widely used materials databases, such as those introduced in section 2.1, into account. It has a data model that represents basic entities in the materials design domain.

The process of querying OPTIMADE is shown in Figure 1. Users specify the name of a table, the response fields and filtering conditions. OPTIMADE parses the filtering conditions syntactically, and translates the parsing result, the table's name and the response fields into a query to the underlying database back-end (e.g., an SQL or MongoDB query).

Fig. 1: The query model in OPTIMADE.
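The three user inputs described above (table name, response fields, filtering conditions) can be illustrated with a minimal Python sketch that assembles an OPTIMADE-style request URL. The base URL is a hypothetical placeholder, and the helper name `build_optimade_url` is introduced here only for illustration; the `filter` and `response_fields` query parameters and the `structures` endpoint follow the OPTIMADE API specification. The sketch only builds the URL and does not contact any server.

```python
from urllib.parse import urlencode

# Hypothetical provider base URL, used for illustration only.
BASE_URL = "https://example.org/optimade"


def build_optimade_url(base_url, entry_type, filter_expr, response_fields):
    """Assemble a request URL from the table name (entry type),
    the filtering conditions and the requested response fields."""
    query = urlencode({
        "filter": filter_expr,
        "response_fields": ",".join(response_fields),
    })
    return f"{base_url.rstrip('/')}/v1/{entry_type}?{query}"


# Ask for four-element structures containing both Rb and Cl.
url = build_optimade_url(
    BASE_URL,
    "structures",
    'elements HAS ALL "Rb","Cl" AND nelements=4',
    ["chemical_formula_reduced", "nelements"],
)
print(url)
```

A server implementing the API would parse the `filter` string and translate it, together with the entry type and the response fields, into a query against its own back-end.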
The development of MDO followed the NeOn ontology engineering methodology [19]. It consists of a number of scenarios mapped from a set of common ontology development activities. In particular, we focused on applying scenario 1 (From Specification to Implementation), scenario 2 (Reusing and re-engineering non-ontological resources), scenario 3 (Reusing ontological resources) and scenario 8 (Restructuring ontological resources). We used OWL2 DL as the representation language for MDO. During the whole process, two knowledge engineers and one domain expert from the materials design domain were involved. In the remainder of this section, we introduce the key aspects of the development of MDO.
Requirements Analysis.
During this step, we clarified the requirements by proposing Use Cases (UC), Competency Questions (CQ) and additional restrictions.

The use cases, which were identified through literature study and discussion between the domain expert and the knowledge engineers, based on experience with the development of OPTIMADE and the use of materials science databases, are listed below.
– UC1: MDO will be used for representing knowledge in basic materials science such as solid-state physics and condensed matter theory.
– UC2: MDO will be used for representing materials calculation and standardizing the publication of the materials calculation data.
– UC3: MDO will be used as a standard to improve the interoperability among heterogeneous databases in the materials design domain.
– UC4: MDO will be mapped to OPTIMADE's schema to improve OPTIMADE's search functionality.

The competency questions are listed below.
– CQ1: What are the calculated properties and their values produced by a materials calculation?
– CQ2: What are the input and output structures of a materials calculation?
– CQ3: What is the space group type of a structure?
– CQ4: What is the lattice type of a structure?
– CQ5: What is the chemical formula of a structure?
– CQ6: For a series of materials calculations, what are the compositions of materials with a specific range of calculated property (e.g., band gap)?
– CQ7: For a specific material and a given range of a calculated property (e.g., band gap), what is the lattice type of the structure?
– CQ8: For a specific material and an expected lattice type of output structure, what are the values of calculated properties of the calculations?
– CQ9: What is the computational method used in a materials calculation?
– CQ10: What is the value for a specific parameter (e.g., cutoff energy) of the method used for the calculation?
– CQ11: Which software produced the result of a calculation?
– CQ12: Who are the authors of the calculation?
– CQ13: Which software or code does the calculation run with?
– CQ14: When was the calculation data published to the database?

Further, we proposed a list of additional restrictions that help in defining concepts. Some examples are shown below. The full list of additional restrictions can be found at the GitHub repository (https://github.com/huanyu-li/Materials-Design-Ontology/blob/master/requirements.md).
– A materials property can relate to a structure.
– A materials calculation has exactly one corresponding computational method.
– A structure corresponds to one specific space group.
– A materials calculation is performed by some software programs or codes.

Reusing and re-engineering non-ontological resources.
To obtain the knowledge for building the ontology, we followed two steps: (1) the collection and analysis of non-ontological resources that are relevant to the materials design domain, and (2) discussions with the domain expert regarding the concepts and relationships to be modeled in the ontology. The collection of non-ontological resources comes from: (1) the dictionaries of CIF and International Tables for Crystallography; (2) the APIs from different databases (e.g., Materials Project, AFLOW, OQMD) and OPTIMADE.

Modular development aiming at building design patterns.
We identified a pattern related to provenance information in the repository of Ontology Design Patterns (ODP) that could be reused or re-engineered for MDO. This has led to the reuse of entities in PROV-O [14]. Further, we built MDO in modules considering the possibility for each module to be an ontology design pattern, e.g., the calculation module.
Connection and Integration of Existing Ontologies.
MDO is connected to EMMO by reusing the concept 'Material', and to ChEBI by reusing the concept 'atom'. Further, concepts from PROV-O are used. We use the metadata terms from the Dublin Core Metadata Initiative (DCMI, http://purl.org/dc/terms/) to represent the metadata of MDO.

MDO consists of one basic module, Core, and two domain-specific modules, Structure and Calculation, importing the Core module. In addition, the Provenance module, which also imports Core, models the provenance information of materials calculations. In total, the OWL2 DL representation of the ontology contains 33 classes, 25 object properties, and 37 data properties. Figure 6 shows an overview of the ontology while Figures 2–5 show the different modules. Figure 7 shows the description logic axioms for MDO. The ontology specification is also publicly accessible at w3id.org (https://w3id.org/mdo/full/1.0). The competency questions can be answered using the concepts and relations in the different modules (CQ1 and CQ2 by Core, CQ3 to CQ8 by Structure, CQ9 and CQ10 by Calculation, and CQ11 to CQ14 by Provenance).

The Core module (Figure 2) consists of the top-level concepts and relations of MDO, which are also reused in other modules. The module represents general information of materials calculations. The concepts Calculation and Structure represent materials calculations and materials' structures, respectively, while Property represents materials properties. Property is specialized into the disjoint concepts CalculatedProperty and PhysicalProperty (Core1, Core2, Core3 in Figure 7). When a calculation is applied to materials structures, each calculation takes some structures and properties as input, and may output structures and calculated properties (Core4, Core5). Further, we use EMMO's concept Material and state that each structure is related to some material (Core6). Properties are also related to structures (Core7).

Fig. 2: Concepts and relations in the Core module.

The Structure module (Figure 3) represents the structural information of materials. Each structure has exactly one composition, which represents what chemical elements compose the structure and the ratio of elements in the structure (Struc1). The composition has different representations of chemical formulas. The occupancy of a structure relates the sites with the species, i.e. the specific chemical elements, that occupy the site (Struc2 - Struc5). Each site has no more than one representation of coordinates in Cartesian format and one in fractional format (Struc6, Struc7). The spatial information regarding structures is essential to reflect physical characteristics such as melting point and strength of materials. To represent this spatial information, we state that each structure is represented by some bases and a (periodic) structure can also be represented by one or more lattices (Struc8). Each basis and each lattice can be identified by one axis-vectors set or one length triple together with one angle triple (Struc9, Struc10). In crystallography, point groups and space groups are used to represent information on the symmetry of a structure. The space group represents a symmetry group of patterns in three dimensions of a structure, and the point group represents a group of linear mappings which correspond to the group of motions in space to determine the symmetry of a structure. Each structure has one corresponding space group (Struc11). Based on the definition from International Tables for Crystallography, each space group also has some corresponding point groups (Struc12).
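The two alternative descriptions of a basis or lattice (an axis-vectors set, or a length triple with an angle triple) and the two coordinate formats of a site are related by standard crystallographic conversions. The following Python sketch shows one common convention (first axis along x, second axis in the xy-plane); the function names and the cubic cell with edge length 4.0 are illustrative assumptions and are not taken from MDO or from any database.

```python
import math


def axis_vectors(lengths, angles_deg):
    """Convert lengths (a, b, c) and angles (alpha, beta, gamma), given in
    degrees, into three Cartesian axis vectors."""
    a, b, c = lengths
    alpha, beta, gamma = (math.radians(x) for x in angles_deg)
    cx = c * math.cos(beta)
    cy = c * (math.cos(alpha) - math.cos(beta) * math.cos(gamma)) / math.sin(gamma)
    # Clamp at zero to guard against tiny negative values from rounding.
    cz = math.sqrt(max(c * c - cx * cx - cy * cy, 0.0))
    return [
        (a, 0.0, 0.0),
        (b * math.cos(gamma), b * math.sin(gamma), 0.0),
        (cx, cy, cz),
    ]


def fractional_to_cartesian(frac, vectors):
    """Cartesian position = sum over axes of (fractional coordinate * axis vector)."""
    return tuple(sum(frac[i] * vectors[i][k] for i in range(3)) for k in range(3))


# Illustrative cubic cell: the site at the cell centre.
vectors = axis_vectors((4.0, 4.0, 4.0), (90.0, 90.0, 90.0))
cart = fractional_to_cartesian((0.5, 0.5, 0.5), vectors)
print(cart)  # approximately (2.0, 2.0, 2.0), up to floating-point rounding
```

Because either description determines the other, an instance that stores only the length and angle triples still carries enough information to derive Cartesian site coordinates from fractional ones.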
Fig. 3: Concepts and relations in the Structure module.

The Calculation module (Figure 4) represents the classification of different computational methods. Each calculation is achieved by a specific computational method (Cal1). Each computational method has some parameters (Cal2). In the current version of this module, we represent two different methods, the density functional theory method and the Hartree-Fock method (Cal3, Cal4). In particular, the density functional theory method is frequently used in materials design to investigate the electronic structure. Such a method has at least one corresponding exchange correlation energy functional (Cal5), which is used to calculate the exchange–correlation energy of a system. There are different kinds of functionals to calculate exchange–correlation energy (Cal6 - Cal11).

Fig. 4: Concepts and relations in the Calculation module.
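The classification in the Calculation module is a small subclass hierarchy, which can be sketched as a plain Python dictionary. The class names are those of axioms Cal3, Cal4 and Cal6 to Cal11 in Figure 7; the helper functions `ancestors` and `is_subclass_of` are hypothetical names introduced for illustration and only mimic the simplest form of subsumption reasoning that an OWL reasoner would perform.

```python
# Direct subclass edges taken from axioms Cal3, Cal4 and Cal6 to Cal11.
SUBCLASS_OF = {
    "DensityFunctionalTheoryMethod": "ComputationalMethod",
    "HartreeFockMethod": "ComputationalMethod",
    "GeneralizedGradientApproximation": "ExchangeCorrelationFunctional",
    "LocalDensityApproximation": "ExchangeCorrelationFunctional",
    "metaGeneralizedGradientApproximation": "ExchangeCorrelationFunctional",
    "HybridFunctional": "ExchangeCorrelationFunctional",
    "HybridGeneralizedGradientApproximation": "HybridFunctional",
    "HybridmetaGeneralizedGradientApproximation": "HybridFunctional",
}


def ancestors(cls):
    """Follow the subclass edges upwards and return all superclasses in order."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_subclass_of(sub, sup):
    """True if sub equals sup or sup is reachable through subclass edges."""
    return sub == sup or sup in ancestors(sub)


print(ancestors("HybridGeneralizedGradientApproximation"))
print(is_subclass_of("HartreeFockMethod", "ExchangeCorrelationFunctional"))
```

A query for all calculations that use some exchange correlation functional would therefore also match calculations annotated with one of the more specific hybrid functionals.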
The Provenance module (Figure 5) represents the provenance information of materials data and calculations. We reuse part of PROV-O and define a new concept ReferenceAgent as a sub-concept of PROV-O's Agent (Prov1). We state that each structure and property can be published by reference agents, which could be databases or publications (Prov2, Prov3). Each calculation is produced by a specific software (Prov4).
Fig. 5: Concepts and relations in the Provenance module.

Fig. 6: Ontological entities in MDO. The four modules, Core, Structure, Calculation and Provenance, are shown with different colors; bold content shows the connections to other ontologies.

(Core1) CalculatedProperty ⊑ Property
(Core2) PhysicalProperty ⊑ Property
(Core3) CalculatedProperty ⊓ PhysicalProperty ⊑ ⊥
(Core4) Calculation ⊑ ∃hasInputStructure.Structure ⊓ ∀hasInputStructure.Structure ⊓ ∀hasOutputStructure.Structure
(Core5) Calculation ⊑ ∃hasInputProperty.Property ⊓ ∀hasInputProperty.Property ⊓ ∀hasOutputCalculatedProperty.CalculatedProperty
(Core6) Structure ⊑ ∃relatedToMaterial.Material ⊓ ∀relatedToMaterial.Material
(Core7) Property ⊑ ∀relatesToStructure.Structure
(Struc1) Structure ⊑ =1 hasComposition.Composition ⊓ ∀hasComposition.Composition
(Struc2) Structure ⊑ ∃hasOccupancy.Occupancy ⊓ ∀hasOccupancy.Occupancy
(Struc3) Occupancy ⊑ ∃hasSpecies.Species ⊓ ∀hasSpecies.Species
(Struc4) Occupancy ⊑ ∃hasSite.Site ⊓ ∀hasSite.Site
(Struc5) Species ⊑ =1 hasElement.Atom
(Struc6) Site ⊑ ≤1 hasCartesianCoordinates.CoordinateVector ⊓ ∀hasCartesianCoordinates.CoordinateVector
(Struc7) Site ⊑ ≤1 hasFractionalCoordinates.CoordinateVector ⊓ ∀hasFractionalCoordinates.CoordinateVector
(Struc8) Structure ⊑ ∃hasBasis.Basis ⊓ ∀hasBasis.Basis ⊓ ∀hasLattice.Lattice
(Struc9) Basis ⊑ =1 hasAxisVectors.AxisVectors ⊔ (=1 hasLengthTriple.LengthTriple ⊓ =1 hasAngleTriple.AngleTriple)
(Struc10) Lattice ⊑ =1 hasAxisVectors.AxisVectors ⊔ (=1 hasLengthTriple.LengthTriple ⊓ =1 hasAngleTriple.AngleTriple)
(Struc11) Structure ⊑ =1 hasSpaceGroup.SpaceGroup ⊓ ∀hasSpaceGroup.SpaceGroup
(Struc12) SpaceGroup ⊑ ∃hasPointGroup.PointGroup ⊓ ∀hasPointGroup.PointGroup
(Cal1) Calculation ⊑ =1 hasComputationalMethod.ComputationalMethod
(Cal2) ComputationalMethod ⊑ ∃hasParameter.ComputationalMethodParameter ⊓ ∀hasParameter.ComputationalMethodParameter
(Cal3) DensityFunctionalTheoryMethod ⊑ ComputationalMethod
(Cal4) HartreeFockMethod ⊑ ComputationalMethod
(Cal5) DensityFunctionalTheoryMethod ⊑ ∃hasXCFunctional.ExchangeCorrelationFunctional ⊓ ∀hasXCFunctional.ExchangeCorrelationFunctional
(Cal6) GeneralizedGradientApproximation ⊑ ExchangeCorrelationFunctional
(Cal7) LocalDensityApproximation ⊑ ExchangeCorrelationFunctional
(Cal8) metaGeneralizedGradientApproximation ⊑ ExchangeCorrelationFunctional
(Cal9) HybridFunctional ⊑ ExchangeCorrelationFunctional
(Cal10) HybridGeneralizedGradientApproximation ⊑ HybridFunctional
(Cal11) HybridmetaGeneralizedGradientApproximation ⊑ HybridFunctional
(Prov1) ReferenceAgent ⊑ Agent
(Prov2) Structure ⊑ ∀wasAttributedTo.ReferenceAgent
(Prov3) Property ⊑ ∀wasAttributedTo.ReferenceAgent
(Prov4) Calculation ⊑ ∃wasAssociatedWith.SoftwareAgent

Fig. 7: Description logic axioms for MDO.
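The "exactly one" restrictions among these axioms (Struc1, Struc11, Cal1) can be illustrated as a simple data-quality check in Python. This is a closed-world sketch over toy triples and is an assumption of this example: OWL itself uses open-world semantics, so a reasoner would not report a missing value as a violation. The instance names (`calc1`, `struc1`, and so on) and the function `check_exactly_one` are hypothetical.

```python
from collections import defaultdict

# (class, property) pairs that must have exactly one value per instance.
EXACTLY_ONE = [
    ("Structure", "hasComposition"),            # Struc1
    ("Structure", "hasSpaceGroup"),             # Struc11
    ("Calculation", "hasComputationalMethod"),  # Cal1
]


def check_exactly_one(types, triples):
    """Return (instance, property, count) for every instance whose number
    of distinct values for a restricted property differs from one."""
    values = defaultdict(set)
    for subject, prop, obj in triples:
        values[(subject, prop)].add(obj)
    violations = []
    for instance, cls in sorted(types.items()):
        for restricted_cls, prop in EXACTLY_ONE:
            if cls == restricted_cls:
                count = len(values.get((instance, prop), ()))
                if count != 1:
                    violations.append((instance, prop, count))
    return violations


# Toy data: struc1 has a composition but no space group.
types = {"calc1": "Calculation", "struc1": "Structure"}
triples = [
    ("calc1", "hasComputationalMethod", "dft1"),
    ("struc1", "hasComposition", "comp1"),
]
print(check_exactly_one(types, triples))
```

Such a check is closer in spirit to constraint validation than to ontology reasoning, and is useful mainly when converting database records into MDO instances.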
MDO Usage
In Figure 8, we show the vision for the use of MDO for semantic search over OPTIMADE and materials science databases. By generating mappings between MDO and the schemas of materials databases, we can create MDO-enabled query interfaces. The querying can occur, for instance, via MDO-based query expansion, MDO-based mediation or through MDO-enabled data warehouses.

As a proof of concept (full lines in the figure), we created mappings between MDO and the schemas of OPTIMADE and part of Materials Project. Using the mappings, we created an RDF data set with data from Materials Project. Further, we built a SPARQL query application which can be used to query the RDF data set using MDO terminology. Examples are given below.
Fig. 8: The vision of the use of MDO. The full-lined components in the figure are currently implemented in a prototype.
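The idea of mediation through mappings can be sketched in a few lines of Python: a query phrased in ontology-level terms is rewritten into the field names of each underlying source. Everything named here is a hypothetical illustration: the two sources, their field names and the function names do not correspond to the actual mappings created for MDO or to any real database schema.

```python
# Hypothetical mappings from ontology-level terms to source-specific fields.
MAPPINGS = {
    "band gap": {"source_a": "band_gap", "source_b": "Egap"},
    "formula": {"source_a": "pretty_formula", "source_b": "compound"},
}


def rewrite_query(mdo_query, source):
    """Rewrite the selected terms and the filter term of one query into
    the field names used by a single source."""
    term, operator, value = mdo_query["where"]
    return {
        "fields": [MAPPINGS[t][source] for t in mdo_query["select"]],
        "filter": (MAPPINGS[term][source], operator, value),
    }


def mediate(mdo_query, sources):
    """Produce one rewritten query per source."""
    return {source: rewrite_query(mdo_query, source) for source in sources}


# One ontology-level query, rewritten for two sources.
query = {"select": ["formula", "band gap"], "where": ("band gap", ">", 5.0)}
for source, rewritten in mediate(query, ["source_a", "source_b"]).items():
    print(source, rewritten)
```

In a full mediator the rewritten queries would be sent to each source and the results merged back into ontology-level terms; the sketch covers only the rewriting step.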
Instantiating a materials calculation using MDO.
In Figure 9 we exemplify the use of MDO to represent a specific materials calculation and related data in an instantiation. The example is from one of the 85 stable materials published in Materials Project in [8]. The calculation is about one kind of elpasolite, with the composition Rb Li Ti Cl. To not overcrowd the figure, we only show the instances corresponding to the calculation's output structure, and for multiple calculated properties, species and sites, we only show one instance each. Connected to the instances of the Core module's concepts are instances representing the structural information of the output structure, the provenance information of the output structure and calculated property, and the information about the computational method used for the calculation.

Fig. 9: An instantiated materials calculation.

Mapping the data from a materials database to RDF using MDO.
As presented in section 2.1, data from many materials databases are provided through the providers' APIs. A commonly used format is JSON. Our current implementation mapped all JSON data related to the 85 stable materials from [8] to RDF. We constructed the mappings by using SPARQL-Generate [15]. Listing 1.1 shows a simple example of how to write the mappings for 'band gap', which is a CalculatedProperty. The result is shown in Listing 1.2. The final RDF dataset contains 40,066 triples. The SPARQL-Generate script and the RDF dataset are available from the GitHub repository. This RDF dataset is used for executing SPARQL queries such as the one presented below.

Listing 1.1: A simple example of mapping
BASE
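Independently of the SPARQL-Generate listings, the same mapping idea can be sketched in plain Python: read one JSON record and emit Turtle for its band gap as a CalculatedProperty. This is a minimal sketch under several assumptions: the JSON keys `material_id` and `band_gap`, the namespace IRIs, the property names `hasPropertyName` and `hasPropertyValue`, and the sample record are all hypothetical and are not taken from the authors' mappings or data.

```python
import json

# Assumed namespace IRIs, for illustration only.
MDO_CORE = "https://w3id.org/mdo/core/"
DATA = "https://example.org/data/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def band_gap_to_turtle(record_json):
    """Map one JSON record to Turtle describing its band gap
    as an instance of CalculatedProperty."""
    record = json.loads(record_json)
    subject = f"<{DATA}{record['material_id']}/band_gap>"
    value = float(record["band_gap"])
    lines = [
        f"{subject} a <{MDO_CORE}CalculatedProperty> ;",
        f'    <{MDO_CORE}hasPropertyName> "band gap" ;',
        f'    <{MDO_CORE}hasPropertyValue> "{value}"^^<{XSD}double> .',
    ]
    return "\n".join(lines)


# Made-up sample record, not real database content.
sample = '{"material_id": "mat-1", "band_gap": 5.2}'
print(band_gap_to_turtle(sample))
```

A declarative mapping language such as SPARQL-Generate expresses the same correspondence between JSON paths and RDF terms without hand-written string formatting, which is easier to maintain when many properties are mapped.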
Listing 1.2: RDF data

Mapping generator repository: https://github.com/huanyu-li/Materials-Design-Ontology/tree/master/mapping_generator

SPARQL Query Example.
As an example, we show a SPARQL query related to CQ6 in Listing 1.3. The result contains 7 records, which are shown in Table 1. The query is:
– “What are the materials of which the value of band gap is higher than 5 eV?” (The result should contain the formula and the value of the band gap.)

Listing 1.3: A SPARQL query example on Materials Project's dataset

Table 1: The result of the query
formula value
Cs Rb In F Rb Ga F K In F Na In F Rb Ga F Na Ga F K Ga F

We show more SPARQL query examples and the corresponding results in the GitHub repository (https://github.com/huanyu-li/Materials-Design-Ontology/tree/master/sparql_query).

To our knowledge, MDO is the first OWL ontology representing solid-state physics concepts, which are the basis for materials design.

The ontology fills a need for semantically enabling access to and integration of materials databases, and for realizing FAIR data in the materials design field. This will have a large impact on the effectiveness and efficiency of finding relevant materials data and calculations, thereby augmenting the speed and the quality of the materials design process. Through our connection with OPTIMADE, and because we have created mappings between MDO and some major materials databases, the potential for impact is large.

The development of MDO followed well-known practices from the ontology engineering point of view (NeOn methodology, modular design, and the use of ODPs). Further, we reused concepts from PROV-O, ChEBI, and EMMO.

A permanent URL is reserved from w3id.org for MDO. MDO is maintained on a GitHub repository from where the ontology in OWL2 DL, visualizations of the ontology and modules, UCs, CQs and restrictions are available. It is licensed via an MIT license (https://github.com/huanyu-li/Materials-Design-Ontology/blob/master/LICENSE).

Due to our modular approach, MDO can be extended with other modules, for instance, regarding different types of calculations and their specific properties. We identified, for instance, the need for an X-ray Diffraction module to model the experimental data of the diffraction used to explore the structural information of materials, and an Elastic Tensor module to model data in a calculation that represents a structure's elasticity.
We may also refine the current ontology. For instance, it may be interesting to model workflows containing multiple calculations. We will also consider publishing modules as ODPs, as they encode the practice of modeling the knowledge in the domain.
In this paper, we presented MDO, an ontology which defines concepts and relations to cover the knowledge in the field of materials design and which reuses concepts from other ontologies. We discussed the ontology development process, showing use cases and competency questions. Further, we showed the use of MDO for semantically enabling materials database search. As a proof of concept, we mapped MDO to OPTIMADE and part of Materials Project and showed querying functionality using SPARQL on a dataset from Materials Project.
Acknowledgements.
This work has been financially supported by the Swedish e-Science Research Centre (SeRC), the Swedish National Graduate School in Computer Science (CUGS), and the Swedish Research Council (Vetenskapsrådet, dnr 2018-04147).
References
1. Armiento, R.: Database-driven high-throughput calculations and machine learning models for materials design. arXiv preprint arXiv:1910.02336 (2019)
2. Ashino, T.: Materials Ontology: An Infrastructure for Exchanging Materials Information and Knowledge. Data Science Journal, 54–61 (2010)
3. Cheung, K., Drennan, J., Hunter, J.: Towards an ontology for data-driven discovery of new materials. In: AAAI Spring Symposium: Semantic Scientific Knowledge Integration. pp. 9–14 (2008)
4. Curtarolo, S., et al.: AFLOW: an automatic framework for high-throughput materials discovery. Computational Materials Science, 218–226 (2012)
5. Degtyarenko, K., et al.: ChEBI: a database and ontology for chemical entities of biological interest. Nucleic Acids Research (suppl 1), D344–D350 (2008)
6. Draxl, C., Scheffler, M.: NOMAD: The FAIR concept for big data-driven materials science. MRS Bulletin (9), 676–682 (2018)
7. Draxl, C., Scheffler, M.: The NOMAD laboratory: from data sharing to artificial intelligence. Journal of Physics: Materials (3), 036001 (2019)
8. Faber, F.A., Lindmaa, A., Von Lilienfeld, O.A., Armiento, R.: Machine learning energies of 2 million elpasolite (ABC2D6) crystals. Physical Review Letters (13), 135502 (2016)
9. Ghiringhelli, L.M., et al.: Towards a Common Format for Computational Materials Science Data. PSI-K Scientific Highlights July (2016)
10. Hastings, J., et al.: eNanoMapper: harnessing ontologies to enable data integration for nanomaterial risk assessment. Journal of Biomedical Semantics (1), 10 (2015)
11. Horsch, M.T., et al.: Semantic interoperability and characterization of data provenance in computational molecular engineering. Journal of Chemical & Engineering Data (3), 1313–1329 (2020)
12. Jain, A., et al.: The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials (1), 011002 (2013)
13. Lambrix, P., Armiento, R., Delin, A., Li, H.: Big semantic data processing in the materials design domain. In: Encyclopedia of Big Data Technologies. Springer (2019)
14. Lebo, T., et al.: PROV-O: The PROV ontology. W3C recommendation (2013)
15. Lefrançois, M., Zimmermann, A., Bakerally, N.: A SPARQL extension for generating RDF from heterogeneous formats. In: European Semantic Web Conference. pp. 35–50 (2017)
16. Lejaeghere, K., et al.: Reproducibility in density functional theory calculations of solids. Science (6280), aad3000 (2016)
17. Li, H., Armiento, R., Lambrix, P.: A method for extending ontologies with application to the materials science domain. Data Science Journal (1) (2019)
18. Saal, J.E., et al.: Materials design and discovery with high-throughput density functional theory: the open quantum materials database (OQMD). JOM (11), 1501–1509 (2013)
19. Suárez-Figueroa, M.C., Gómez-Pérez, A., Fernández-López, M.: The NeOn methodology for ontology engineering. In: Ontology engineering in a networked world, pp. 9–34. Springer (2012)
20. Thomas, D.G., Pappu, R.V., Baker, N.A.: Nanoparticle ontology for cancer nanotechnology research. Journal of Biomedical Informatics (1), 59–74 (2011)
21. Vardeman II, C.F., et al.: An ontology design pattern and its use case for modeling material transformation. Semantic Web (5), 719–731 (2017)
22. Wilkinson, M.D., et al.: The FAIR guiding principles for scientific data management and stewardship. Scientific Data 3