J. Web Semant. | 2021

Knowledge graph embeddings for dealing with concept drift in machine learning

Jiaoyan Chen, Freddy Lécué, Jeff Z. Pan, Shumin Deng, Huajun Chen
Preprint submitted to Journal of Web Semantics, December 31, 2020

Abstract

Stream learning has been studied extensively as a means of extracting knowledge structures from continuous, rapidly arriving data records. However, efforts to understand whether knowledge representation and reasoning can help address concept drift¹ (one of the core challenges for the stream learning community), particularly drift caused by dramatic changes in knowledge, have been limited and scattered. In this work, we propose to study the problem in the context of the semantic representation of data streams in the Semantic Web, i.e., ontology streams: ordered sequences of data annotated with an ontological schema. A fundamental challenge is to understand what knowledge should be encoded and how it can be integrated with stream learning methods. To address this, we show that at least three levels of knowledge encoded in ontology streams are needed to deal with concept drift: (i) the existence of novel knowledge gained from stream dynamics, (ii) the significance of knowledge change and evolution, and (iii) the (in)consistency of knowledge evolution. We propose an approach that encodes such knowledge via schema-enabled knowledge graph embeddings, combining three novel representations: entailment vectors, entailment weights, and a consistency vector. We illustrate our approach on supervised classification tasks.

¹ This work addresses concept drift in machine learning, as opposed to concept drift in the Semantic Web community, where the meaning of a “concept” (class) in an ontology TBox shifts through versioning, iterations, or modifications. Note that changes in the ABox alone can also lead to concept drift when learning from ontology streams.
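The three representations named above can be illustrated on a toy stream. The following is a minimal sketch under stated assumptions, not the authors' formulation: a snapshot is modeled as a set of entailment strings; the entailment vector is a binary indicator over all entailments seen in a window; the entailment weight is taken, hypothetically, to be the fraction of consecutive snapshot pairs in which an entailment flips; and consistency is a weighted agreement score between two entailment vectors. All function names (`entailment_vector`, `entailment_weights`, `consistency`, `consistency_vector`, `select_consistent`), the toy entailments, and the 0.5 threshold are hypothetical.

```python
from typing import FrozenSet, List, Sequence

# Hypothetical encoding: a snapshot of the ontology stream is the set of
# entailments (plain strings here) that hold at one time step.
Snapshot = FrozenSet[str]


def entailment_vocabulary(snapshots: Sequence[Snapshot]) -> List[str]:
    """Sorted union of all entailments in the window; fixes the vector order."""
    return sorted(set().union(*snapshots))


def entailment_vector(snapshot: Snapshot, vocab: List[str]) -> List[int]:
    """Binary vector: 1 if the entailment holds in the snapshot, else 0."""
    return [1 if e in snapshot else 0 for e in vocab]


def entailment_weights(vectors: List[List[int]]) -> List[float]:
    """Hypothetical significance weight per entailment: the fraction of
    consecutive snapshot pairs in which it switches between holding and not."""
    if not vectors:
        return []
    n = len(vectors[0])
    transitions = len(vectors) - 1
    if transitions == 0:
        return [0.0] * n
    flips = [0] * n
    for prev, cur in zip(vectors, vectors[1:]):
        for k in range(n):
            if prev[k] != cur[k]:
                flips[k] += 1
    return [f / transitions for f in flips]


def consistency(u: List[int], v: List[int], weights: List[float]) -> float:
    """Hypothetical weighted agreement between two entailment vectors, in [0, 1]."""
    total = sum(weights)
    if total == 0:
        # With weights from the same window, all-zero weights mean no
        # entailment changed, so every snapshot in the window is identical.
        return 1.0
    agree = sum(w for a, b, w in zip(u, v, weights) if a == b)
    return agree / total


def consistency_vector(vectors: List[List[int]], weights: List[float]) -> List[float]:
    """Consistency of the latest snapshot with each earlier snapshot."""
    current = vectors[-1]
    return [consistency(past, current, weights) for past in vectors[:-1]]


def select_consistent(scores: List[float], threshold: float = 0.5) -> List[int]:
    """Indices of past snapshots consistent enough to reuse as training data."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy stream: the third snapshot reverses every entailment of the others.
snapshots = [
    frozenset({"Congested(road1)", "Delayed(bus1)"}),
    frozenset({"Congested(road1)", "Delayed(bus1)"}),
    frozenset({"Clear(road1)"}),
    frozenset({"Congested(road1)", "Delayed(bus1)"}),
]
vocab = entailment_vocabulary(snapshots)
vectors = [entailment_vector(s, vocab) for s in snapshots]
weights = entailment_weights(vectors)
scores = consistency_vector(vectors, weights)
print(scores)                     # [1.0, 1.0, 0.0]
print(select_consistent(scores))  # [0, 1]
```

In this toy stream the latest snapshot fully agrees with snapshots 0 and 1 and fully disagrees with snapshot 2, so only the first two would be kept as training data for a downstream classifier. Weighting agreement by change frequency is one plausible design choice: entailments that never change carry no weight and therefore cannot inflate the consistency score.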
Our main findings are: (i) it is possible to develop a general-purpose framework for addressing concept drift in ontology streams by coupling any machine learning classification algorithm with our proposed schema-enabled knowledge graph embedding method; (ii) our method is robust to significant concept drift (a stream update ratio of up to 51%) and outperforms state-of-the-art methods, improving the Macro-F1 score by 12% to 35% in the tested scenarios; (iii) only a small fraction of the ontological entailments (less than 20%) plays an important role in determining the consistency between two snapshots; (iv) predictions with consistent models outperform those with inconsistent models by over 300% in the two use cases. Our findings could help future work on applications of stream learning, such as autonomous driving, that demand highly accurate stream learning in the presence of sudden, disruptive changes.
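Macro-F1, the metric behind the reported improvements, is the unweighted mean of per-class F1 scores, so every class counts equally regardless of how often it occurs. The following from-scratch sketch shows the standard definition; it is not the paper's evaluation code, and `macro_f1` is a hypothetical helper (scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity).

```python
from typing import Hashable, List, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = set(y_true) | set(y_pred)
    if not labels:
        raise ValueError("macro_f1 needs at least one prediction")
    pairs = list(zip(y_true, y_pred))
    per_class: List[float] = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # 2TP / (2TP + FP + FN) equals 2PR / (P + R) whenever TP > 0,
        # and yields 0 when the class is never predicted correctly.
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# A classifier that never predicts the rare "disrupted" class.
y_true = ["normal", "normal", "normal", "disrupted"]
y_pred = ["normal", "normal", "normal", "normal"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.4286
```

Accuracy on this example would be 0.75, but Macro-F1 is 3/7 (about 0.43) because the never-predicted class contributes an F1 of 0, which makes the metric sensitive to poor performance on rare classes.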

Volume 67
Article 100625
DOI 10.1016/j.websem.2020.100625
Language English
Journal J. Web Semant.
