A Simple and Efficient Framework for Identifying Relation-gaps in Ontologies
Subhashree S and P Sreenivasa Kumar
Department of Computer Science and Engineering, Indian Institute of Technology - Madras, Chennai, India. {ssshree,psk}@cse.iitm.ac.in
Abstract. Though many ontologies have a huge number of classes, in most cases one cannot find a good number of object properties connecting those classes. Adding object properties makes an ontology richer and more applicable to tasks such as Question Answering. In this context, the question of which two classes should be considered for discovering object properties becomes very important. We address this question in this paper. We propose a simple machine learning framework which exhibits low time complexity and yet gives promising results with respect to both precision and the number of class-pairs retrieved.
Keywords:
Relation-gaps · Object Properties · Ontology Enrichment
In this work, we propose a novel and simple approach to identify relation-gaps in an ontology, with the main focus on achieving low response times. The goal is to find potential pairs of classes that could be connected by an object property but have not yet been connected, i.e., they are relation-gaps in the ontology. Note that the focus is not on discovering the object properties which connect a given pair of classes. While there are systems such as OntExt [4] and DARO [5] for that task, our goal in this work is to identify the pairs of classes which could serve as input to these systems. Identifying relation-gaps is important because feeding every non-connected class-pair as input to DARO would be inefficient. This is especially true for large knowledge graphs (KGs) such as YAGO3, which has 488,469 classes but only 77 object properties [3]. In order to add more object properties to YAGO3, one has to consider a huge number ( ≈ ) of class pairs, unless a better approach is devised.

Prophet [2] predicts pairs of classes to be connected by object properties, mainly in the NELL KG. Predicting links between nodes using the count of common neighbours between them is very popular in social network settings, and Prophet is built upon this notion. Given a pair of classes, Prophet computes a score as the sum of common neighbours of all pairs of instances in the two classes, normalized by the number of instance pairs. The class pairs having a score above 10 are output by Prophet. The disadvantages of this approach are: (1) If the given ontology is not rich enough, we cannot expect Prophet to output many new class-pair connections. (2) It has a high response time, as it considers every pair of instances in the given two classes and computes their common neighbours. In our experiments, we observed that Prophet (when we implemented it on a machine with 16 GB main memory) takes three hours on average to identify potential partners for one class in the DBpedia dataset.

In our previous work [5], we proposed a solution based on word embeddings for the problem of identifying relation-gaps. We claimed and showed experimentally that looking for common neighbours between two classes using external sources leads to richer and more diverse connections in the KG. We used Word2Vec for this purpose, as the word vectors learnt by the Word2Vec algorithm are such that two words which have a high number of common neighbouring words have highly similar representations. This system has a low response time (around 5 seconds on a system with 32 GB main memory). The major disadvantage of this system is that it does not give good results for very generic classes like “Person”. For “Person”, the system outputs classes such as “Name” and “Year”, whereas for more specific classes like “Athlete”, the system returns meaningful partners such as “SportsLeague”. (All the names in quotes are class names in the DBpedia ontology.)
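To make the cost of the instance-level scoring used by Prophet concrete, the following is a minimal sketch of a score of that shape: the sum of common neighbours over all instance pairs of two classes, divided by the number of instance pairs. It is not Prophet's actual implementation; the function name `prophet_style_score`, the dictionary-of-sets graph representation and the toy instances are assumptions made for illustration only.

```python
from itertools import product


def prophet_style_score(instances_a, instances_b, neighbours):
    """Average number of common neighbours over all instance pairs.

    instances_a, instances_b: instances of the two classes (hypothetical input).
    neighbours: dict mapping an instance to the set of its neighbours.
    """
    pairs = list(product(instances_a, instances_b))
    if not pairs:
        return 0.0
    # Every instance pair is visited, so the cost grows with
    # len(instances_a) * len(instances_b).
    total = sum(
        len(neighbours.get(a, set()) & neighbours.get(b, set()))
        for a, b in pairs
    )
    return total / len(pairs)


# Toy data, invented for illustration.
neighbours = {
    "messi": {"barcelona", "argentina"},
    "ronaldo": {"portugal", "madrid"},
    "la_liga": {"barcelona", "madrid"},
}
score = prophet_style_score(["messi", "ronaldo"], ["la_liga"], neighbours)
print(score)       # 1.0
print(score > 10)  # False: below the threshold of 10 mentioned above
```

Because the loop runs over the Cartesian product of the two instance sets, classes with many instances quickly become expensive, which is consistent with the high response time reported above.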
We propose a machine learning framework for identifying relation-gaps in an ontology. The major goal of our system is to achieve a low response time. We design our features such that they do not rely upon the instances of the input classes, as this tends to increase the runtime of the system. For example, we check for common neighbours between two given classes at the class level, while Prophet does this at the instance level. In our previous work [5] we observed that the best results were given by three techniques: using Word2Vec, finding common neighbours, and using the Adamic-Adar index. We also observed that the results given by our Word2Vec-based method were complementary to those given by the other two techniques. Hence, in this work, we build an SVM classifier which takes these three quantities as its features. The features used are as below:
Common-Neighbours (CN): This measure captures the number of neighbours shared by the two nodes. A neighbour of a class is a class that is already linked to it by an object property. Let $\Gamma(x)$ denote the set of neighbours of a node $x$. Then $cn_{xy} = |\Gamma(x) \cap \Gamma(y)|$.

Adamic-Adar Index (AA): This index is similar to the above feature, but assigns more weight to the less connected neighbours [1]. It is defined as $aa_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log |\Gamma(z)|}$.

GloVe embeddings: In our previous work [5] we had used Word2Vec vectors for generating relation-gaps. However, GloVe directly focuses on word co-occurrences over the available corpus, and its embeddings relate to the probabilities that two words appear together. Since GloVe’s mechanism is more directly associated with finding common neighbours based on their co-occurrences, we use GloVe embeddings in this work.

We consider DBpedia version 2016-10 for extracting the positive instances of our training data. There are 1105 object properties, and 708 among them have domain and range assigned. Among these, we eliminate duplicate domain-range connections and obtain 335 domain-range pairs as positive instances. In order to obtain negative instances, we manually identify 279 class pairs in the DBpedia ontology as those which cannot be related by any object property (e.g., Cheese and Mountain). We test our classifier on six ontologies (details are in Table 1): four ontologies have been built by our own research group and two are from public repositories. We have chosen the test ontologies such that either a major fraction of the object properties do not have their domain and range specified (HP, Pet, WM ontologies) or a large number of individuals are present (PP, MHBT, DSA ontologies). These characteristics have a direct impact on the working of the competing systems.

We manually evaluate the positive class-pairs newly found by our proposed approach, for each ontology. Three ontology engineers (non-authors) checked whether the pairing of classes makes sense. They were asked to mark each pair as correct or incorrect. For a class-pair to be counted as correct, at least two out of the three evaluators should have agreed on it. Table 2 shows sample class-pairs generated and the time taken by the proposed system (for the entire ontology, when run on a system with 16 GB main memory), and the precision value (ratio of correct class-pairs to the total class-pairs) for all three systems: the proposed approach, our earlier work (referred to here as WV-based [5]) and Prophet [2]. The list of all class-pairs generated can be seen on the project web page.
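The class-level features defined above (CN, AA and an embedding-based similarity) can be sketched as follows. This is an illustrative sketch rather than the authors' code. It makes several assumptions that the text does not state: object properties are treated as undirected links between their domain and range classes, the embedding feature is taken to be the cosine similarity of the two class vectors, and all function names, class names and the two-dimensional toy vectors are hypothetical.

```python
import math


def build_class_neighbours(domain_range_pairs):
    """Build the class-level neighbour map from (domain, range) pairs.

    Properties with a missing domain or range (None) contribute no link.
    Assumption: links are treated as undirected.
    """
    neighbours = {}
    for domain, range_ in domain_range_pairs:
        if domain is None or range_ is None:
            continue
        neighbours.setdefault(domain, set()).add(range_)
        neighbours.setdefault(range_, set()).add(domain)
    return neighbours


def common_neighbours(x, y, neighbours):
    """cn_xy = |Gamma(x) ∩ Gamma(y)|."""
    return len(neighbours.get(x, set()) & neighbours.get(y, set()))


def adamic_adar(x, y, neighbours):
    """aa_xy = sum over shared neighbours z of 1 / log|Gamma(z)|."""
    shared = neighbours.get(x, set()) & neighbours.get(y, set())
    # Skip degree-1 nodes to avoid dividing by log(1) = 0.
    return sum(
        1.0 / math.log(len(neighbours[z]))
        for z in shared
        if len(neighbours[z]) > 1
    )


def cosine_similarity(u, v):
    """Cosine similarity of two vectors (assumed embedding feature)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def feature_vector(x, y, neighbours, embeddings):
    """Three features for a class pair: (CN, AA, embedding similarity)."""
    return (
        common_neighbours(x, y, neighbours),
        adamic_adar(x, y, neighbours),
        cosine_similarity(embeddings[x], embeddings[y]),
    )


# Toy ontology fragment; the last property has no range specified.
pairs = [
    ("Athlete", "SportsTeam"),
    ("Coach", "SportsTeam"),
    ("SportsTeam", "SportsLeague"),
    ("Athlete", None),
]
neighbours = build_class_neighbours(pairs)
embeddings = {"Athlete": [1.0, 0.0], "Coach": [1.0, 0.0]}  # toy vectors
features = feature_vector("Athlete", "Coach", neighbours, embeddings)
print(features)  # CN = 1, AA = 1 / log(3), similarity = 1.0
```

None of these functions touches class instances, so their cost depends only on the size of the class-level graph and the embedding dimension. When most object properties lack a domain or range, the neighbour map is nearly empty and only the embedding feature carries signal.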
From Table 2, it can be seen that the proposed system identifies more relation-gaps than the other two systems on every ontology except WM. Prophet generates results only for ontologies which have a high number of instances (MHBT and PP), as its mechanism is based on finding common neighbours at the instance level. Though the DSA ontology has a high number of individuals, Prophet fails to produce results because the ontology lacks many relation instances. For ontologies which do not have domain and range specified for many of the object properties (HP, Pet and WM), the features based on common neighbours and the Adamic-Adar index fail to predict any result. However, the GloVe-based feature of our model plays a major role in such cases and gives good results. Though the WV-based system produces results for all ontologies, it generates fewer results than the currently proposed system in most cases.

Footnotes:
GloVe embeddings: pre-trained embeddings of 100 dimensions - http://nlp.stanford.edu/data/wordvecs/glove.6B.zip
DBpedia object property counts: obtained by querying the DBpedia SPARQL endpoint on 31st May 2020
DSA, WM, MHBT, HP ontologies - https://sites.google.com/site/ontoworks/ontologies
Pet ontology -
PP ontology - https://sites.google.com/site/ppontology/
Project web page - https://sites.google.com/site/ontoworks/projects
Table 1. Specifications of Test Ontologies

| Dataset | Classes | Individuals | Object Properties (OP) | OP w/o domain and range |
|---|---|---|---|---|
| Data Struct. and Algo. (DSA) | 107 | 154 | 26 | 2 |
| WikiMovie (WM) | 35 | 104 | 14 | 5 |
| Mahabharata (MHBT) | 22 | 249 | 33 | 11 |
| Harry Potter (HP) | 17 | 12 | 5 | 5 |
| People & Pets (Pet) | 60 | 21 | 14 | 13 |
| Plant Protection (PP) | 92 | 548 | 15 | 0 |
Table 2. Sample pairs, time (proposed system) and precision (correct/produced pairs)

| Dataset | Sample results by the proposed approach | Time taken (in seconds) | Precision: Proposed | Precision: WV-based | Precision: Prophet |
|---|---|---|---|---|---|
| DSA | (Graph Traversal, Undirected Graph) | 9 | 165/176 (0.94) | 125/136 (0.92) | no results |
| WM | (Film producer, Genres); (Actor, Language) | 6 | 127/127 (1) | 246/246 (1) | no results |
| MHBT | (Pandava, Kaurava); (Events, Places) | 6 | 39/41 (0.95) | 5/5 (1) | 16/16 (1) |
| HP | (Gryffindor, Slytherin) | 5 | 21/22 (0.95) | 8/10 (0.8) | no results |
| Pet | (pet+owner, pet); (truck, bicycle) | 6 | 165/176 (0.94) | 123/130 (0.95) | no results |
| PP | (Disorder, Abnormality); (Pest, Pesticide) | 8 | 175/178 (0.98) | 141/172 (0.82) | 14/14 (1) |
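The precision values in Table 2 are ratios of correct to produced class-pairs, where a pair counts as correct when at least two of the three evaluators accept it. A minimal sketch of that computation is given below. The function names and the toy votes are hypothetical and are not taken from the evaluation data.

```python
def is_correct(votes, min_agree=2):
    """A class-pair is correct if at least `min_agree` evaluators accept it."""
    return sum(votes) >= min_agree


def precision(judgements):
    """Ratio of correct class-pairs to all produced class-pairs."""
    if not judgements:
        return 0.0
    correct = sum(1 for votes in judgements if is_correct(votes))
    return correct / len(judgements)


# Toy votes from three evaluators for four produced class-pairs.
judgements = [
    (True, True, False),   # correct: two of three agree
    (True, True, True),    # correct
    (False, True, False),  # incorrect: only one accepts
    (True, False, True),   # correct
]
p = precision(judgements)
print(p)  # 0.75
```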
In this paper, we have proposed a low response-time framework for identifying relation-gaps in an ontology. Using the insights gained from our previous work, we have carefully picked the most useful features to build our classifier. The proposed system gives a low response time on all the tested ontologies, mainly because the chosen features are not dependent on the number of class instances. The proposed system substantially beats the competing systems with respect to the number of class-pairs returned while maintaining very good precision.
References
1. Adamic, L.A., Adar, E.: Friends and neighbors on the web. Social Networks (3), 211–230 (2003)
2. Appel, A.P., Hruschka Junior, E.R.: Prophet - a link-predictor to learn new rules on NELL. In: 2011 IEEE 11th International Conference on Data Mining Workshops. pp. 917–924 (2011)
3. Mahdisoltani, F., Biega, J., Suchanek, F.M.: YAGO3: A knowledge base from multilingual Wikipedias. In: CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 4-7, 2015 (2015)
4. Mohamed, T., Jr., E.R.H., Mitchell, T.M.: Discovering relations between noun categories. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, 27-31 July 2011. pp. 1447–1455. ACL
5. Subhashree, S., Kumar, P.S.: Augmenting linked data ontologies with new object properties. New Gener. Comput. 38