Mikhail Galkin
University of Bonn
Publications
Featured research published by Mikhail Galkin.
OTM Confederated International Conferences "On the Move to Meaningful Internet Systems" | 2016
Diego Collarana; Mikhail Galkin; Christoph Lange; Irlán Grangel-González; Maria-Esther Vidal; Sören Auer
A vast amount of information about various types of entities is spread across the Web, e.g., people or organizations on the Social Web, or product offers on the Deep Web or the Dark Web. These data sources can comprise heterogeneous data and offer different search capabilities, e.g., a search API. End users, such as investigators from law enforcement institutions searching for traces and connections of organized crime, have to deal with these interoperability problems not only at search time but also while merging data collected from different sources. We devise FuhSen, a keyword-based federated engine that exploits the search capabilities of heterogeneous sources during query processing and generates knowledge graphs on demand, applying an RDF molecule integration approach in response to keyword-based queries. The resulting knowledge graph describes the semantics of entities collected from the integrated sources, as well as relationships among these entities. Furthermore, FuhSen utilizes ontologies to describe the available sources in terms of content and search capabilities, and exploits this knowledge to select the sources relevant for answering a keyword-based query. We conducted a user evaluation in which FuhSen was compared to traditional search engines. During the evaluation study, FuhSen's semantic search capabilities allowed users to complete search tasks that could not be accomplished with traditional Web search engines.
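The RDF molecule idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a molecule is the set of triples sharing one subject. The triples, prefixes, and the `to_molecules` helper are hypothetical and are not taken from FuhSen itself.

```python
from collections import defaultdict

# Hypothetical triples as (subject, predicate, object) tuples,
# e.g. as they might be collected from several sources.
triples = [
    ("ex:p1", "foaf:name", "Jane Doe"),
    ("ex:p1", "foaf:knows", "ex:p2"),
    ("ex:p2", "foaf:name", "John Roe"),
    ("ex:p1", "ex:memberOf", "ex:org1"),
]


def to_molecules(triples):
    """Group triples by subject; each group is treated as one molecule."""
    molecules = defaultdict(set)
    for s, p, o in triples:
        molecules[s].add((p, o))
    return dict(molecules)


molecules = to_molecules(triples)
for subject, properties in molecules.items():
    print(subject, sorted(properties))
```

Grouping by subject gives one unit per entity, which is a convenient granularity for comparing and merging descriptions of the same entity from different sources.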
arXiv: Information Retrieval | 2015
Mikhail Galkin; Dmitry Mouromtsev; Sören Auer
The abundance of data on the Internet facilitates the improvement of extraction and processing tools. The trend towards open data publishing encourages the adoption of structured formats like CSV and RDF. However, there is still a plethora of unstructured data on the Web which we assume contains semantics. For this reason, we propose an approach to derive semantics from web tables, which are still the most popular publishing tool on the Web. The paper also discusses methods and services for unstructured data extraction and processing, as well as machine learning techniques to enhance such a workflow. The eventual result is a framework to process, publish and visualize linked open data. The software enables table extraction from various open data sources in HTML format and automatic export to RDF, making the data linked. The paper also evaluates machine learning techniques in conjunction with string similarity functions applied to a table recognition task.
IEEE International Conference on Semantic Computing | 2017
Diego Collarana; Mikhail Galkin; Ignacio Traverso-Ribón; Christoph Lange; Maria-Esther Vidal; Sören Auer
The evolution of the Web of documents into a Web of services and data has resulted in an increased availability of data from almost any domain. For example, general domain knowledge bases such as DBpedia or Wikidata, or domain specific Web sources like the Oxford Art archive, allow for accessing knowledge about a wide variety of entities including people, organizations, or art paintings. However, these data sources publish data in different ways, and they may be equipped with different search capabilities, e.g., SPARQL endpoints or REST services, thus requiring data integration techniques that provide a unified view of the published data. We devise a semantic data integration approach named FuhSen that exploits keyword and structured search capabilities of Web data sources and generates on-demand knowledge graphs merging data collected from available Web sources. Resulting knowledge graphs model semantics or meaning of merged data in terms of entities that satisfy keyword queries, and relationships among those entities. FuhSen relies on both RDF to semantically describe the collected entities, and on semantic similarity measures to decide on relatedness among entities that should be merged. We empirically evaluate the results of FuhSen data integration techniques on data from the DBpedia knowledge base. The experimental results suggest that FuhSen data integration techniques accurately integrate similar entities semantically into knowledge graphs.
IEEE International Conference on Semantic Computing | 2016
Mikhail Galkin; Sören Auer; Hak Lae Kim; Simon Scerri
Semantic computing and enterprise Linked Data have recently gained traction in enterprises. Although the concept of Enterprise Knowledge Graphs (EKGs) has meanwhile received some attention, a formal conceptual framework for designing such graphs has not yet been developed. By EKG we refer to a semantic network of concepts, properties, individuals and links representing and referencing foundational and domain knowledge relevant for an enterprise. Through the efforts reported in this paper, we aim to bridge the gap between the increasing need for EKGs and the lack of formal methods for realising them. We present a thorough study of the key concepts of knowledge graph design along with an analysis of the advantages and disadvantages of various design decisions. In particular, we distinguish between two polar approaches towards data fusion, i.e., the unified and the federated approach, describe their benefits and point out their shortcomings.
14th Conference of Open Innovation Association FRUCT | 2013
Dmitry Mouromtsev; Vitaly Vlasov; Olga Parkhimovich; Mikhail Galkin; Vitaly Knyazev
This paper discusses Russian projects that publish open government data. It also describes the development of an open linked data portal and its approach to converting open government data into open linked data. Information Workbench is used to build this system. It allows storing, visualizing and converting data files in Semantic Web formats.
Web Intelligence, Mining and Semantics | 2017
Diego Collarana; Mikhail Galkin; Ignacio Traverso-Ribón; Maria-Esther Vidal; Christoph Lange; Sören Auer
The nature of the RDF data model allows for numerous descriptions of the same entity. For example, different RDF vocabularies may be utilized to describe pharmacogenomic data, and the same drug or gene is represented by different RDF graphs in DBpedia or DrugBank. To provide a unified representation of the same real-world entity, RDF graphs need to be semantically integrated. Semantic integration requires the management of knowledge encoded in RDF vocabularies to determine the relatedness of different RDF representations of the same entity, e.g., axiomatic definitions of vocabulary properties or resource equivalences. We devise MINTE, an integration technique that relies on both knowledge stated in RDF vocabularies and semantic similarity measures to merge semantically equivalent RDF graphs, i.e., graphs corresponding to the same real-world entity. MINTE follows a two-fold approach to solve the problem of integrating RDF graphs. In the first step, MINTE implements a 1-1 weighted perfect matching algorithm to identify semantically equivalent RDF entities in different graphs. Then, MINTE relies on different fusion policies to merge triples from these semantically equivalent RDF entities. We empirically evaluate the performance of MINTE on data from DBpedia, Wikidata, and DrugBank. The experimental results suggest that MINTE is able to accurately integrate semantically equivalent RDF graphs.
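The two steps described in this abstract, a 1-1 weighted matching followed by a fusion policy, can be sketched roughly as follows. This is a toy illustration under stated assumptions, not MINTE's implementation: Jaccard overlap stands in for the semantic similarity measure, the matching is found by exhaustive search (viable only for tiny inputs), the fusion policy is a plain union, and all entity identifiers, property values, and the threshold are hypothetical.

```python
from itertools import permutations


def jaccard(a, b):
    """Stand-in similarity: overlap of (predicate, object) pairs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_one_to_one(left, right, sim):
    """Exhaustively find the 1-1 assignment with maximum total similarity.

    Assumes len(left) <= len(right); only viable for tiny inputs.
    """
    l_keys, r_keys = list(left), list(right)
    best, best_score = [], -1.0
    for perm in permutations(r_keys, len(l_keys)):
        pairs = list(zip(l_keys, perm))
        score = sum(sim(left[l], right[r]) for l, r in pairs)
        if score > best_score:
            best, best_score = pairs, score
    return best


def union_fusion(a, b):
    """Hypothetical fusion policy: keep every property from both sides."""
    return a | b


# Toy entity descriptions from two hypothetical graphs.
g1 = {
    "db:Aspirin": {("name", "aspirin"), ("formula", "C9H8O4")},
    "db:Ibuprofen": {("name", "ibuprofen"), ("formula", "C13H18O2")},
}
g2 = {
    "drug:0945": {("name", "aspirin"), ("formula", "C9H8O4"), ("class", "NSAID")},
    "drug:1050": {("name", "ibuprofen"), ("class", "NSAID")},
}

THRESHOLD = 0.3  # assumed cut-off below which a matched pair is not merged
matches = [
    (l, r)
    for l, r in best_one_to_one(g1, g2, jaccard)
    if jaccard(g1[l], g2[r]) >= THRESHOLD
]
merged = {(l, r): union_fusion(g1[l], g2[r]) for l, r in matches}
```

For realistic input sizes, the exhaustive search would be replaced by a polynomial-time assignment algorithm such as the Hungarian method. The threshold keeps a forced pairing of dissimilar entities from being merged.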
International Conference on Semantic Systems | 2017
Mikhail Galkin; Kemele M. Endris; Maribel Acosta; Diego Collarana; Maria-Esther Vidal; Sören Auer
Join operators are particularly important in SPARQL query engines that collect RDF data using Web access interfaces. State-of-the-art SPARQL query engines rely on binary join operators tailored for merging results from SPARQL queries over Web access interfaces. However, in queries with a large number of triple patterns, binary joins constitute a significant burden on query performance. Multi-way joins that handle more than two inputs are able to reduce the complexity of pre-processing stages and reduce the execution time. Whereas multi-way joins have already received some attention in the relational database field, their applicability in SPARQL query processing remains unexplored. We devise SMJoin, a multi-way non-blocking join operator tailored for independently merging results from more than two RDF data sources. SMJoin implements intra-operator adaptivity, i.e., it is able to adjust join execution schedulers to the conditions of Web access interfaces; thus, query answers are produced as soon as they are computed and can be continuously generated even if one of the sources becomes blocked. We empirically study the behavior of SMJoin in two benchmarks with queries of different selectivity; state-of-the-art SPARQL query engines are included in the study. Experimental results suggest that SMJoin outperforms existing approaches in very selective queries, and produces first answers as fast as the adaptive query engines it is compared against in non-selective queries.
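The general idea of a multi-way non-blocking join can be sketched as a symmetric hash join over several inputs. The class below is a hypothetical illustration, not SMJoin itself: it assumes a single shared join variable, keeps one hash table per source, and omits the adaptive scheduling described in the abstract. All names and bindings are made up.

```python
from collections import defaultdict
from itertools import product


class MultiWayJoin:
    """Symmetric hash join over n sources on one shared variable."""

    def __init__(self, n_sources, join_var):
        self.join_var = join_var
        # One hash table per source, keyed by the join variable's value.
        self.tables = [defaultdict(list) for _ in range(n_sources)]

    def insert(self, source, binding):
        """Store a binding and return the answers it completes right now."""
        key = binding[self.join_var]
        self.tables[source][key].append(binding)
        # Probe every other source; .get() avoids creating empty entries.
        others = [t.get(key, []) for i, t in enumerate(self.tables) if i != source]
        results = []
        for combo in product(*others):  # yields nothing if any source lacks the key
            merged = dict(binding)
            for b in combo:
                merged.update(b)
            results.append(merged)
        return results


join = MultiWayJoin(3, "?drug")
# Bindings arrive interleaved, in whatever order the sources deliver them.
arrivals = [
    (0, {"?drug": "d1", "?name": "aspirin"}),
    (1, {"?drug": "d1", "?target": "t9"}),
    (0, {"?drug": "d2", "?name": "ibuprofen"}),
    (2, {"?drug": "d1", "?effect": "e4"}),
]
answers = []
for source, binding in arrivals:
    answers.extend(join.insert(source, binding))
```

Because every arriving binding is probed against the other tables immediately, an answer is emitted as soon as all sources have contributed a matching binding, and a stalled source does not prevent the others from being consumed.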
Database and Expert Systems Applications | 2017
Kemele M. Endris; Mikhail Galkin; Ioanna Lytra; Mohamed Nadjib Mami; Maria-Esther Vidal; Sören Auer
The increasing number of RDF data sources that allow for querying Linked Data via Web services forms the basis for federated SPARQL query processing. Federated SPARQL query engines provide a unified view of a federation of RDF data sources, and rely on source descriptions for selecting the data sources over which unified queries will be executed. Albeit efficient, existing federated SPARQL query engines usually ignore the meaning of data accessible from a data source, and describe sources only in terms of the vocabularies utilized in the data source. A lack of source descriptions may lead to the erroneous selection of data sources for a query, thus affecting the performance of query processing over the federation. We tackle the problem of federated SPARQL query processing and devise MULDER, a query engine for federations of RDF data sources. MULDER describes data sources in terms of RDF molecule templates, i.e., abstract descriptions of entities belonging to the same RDF class. Moreover, MULDER utilizes RDF molecule templates for source selection, and query decomposition and optimization. We empirically study the performance of MULDER on existing benchmarks, and compare MULDER's performance with state-of-the-art federated SPARQL query engines. Experimental results suggest that RDF molecule templates empower MULDER federated query processing, and allow for the selection of RDF data sources that not only reduce execution time, but also increase answer completeness.
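Source selection driven by molecule templates can be sketched as below. This is a minimal sketch under assumptions, not MULDER's data model: a template is reduced to a class, the predicates its entities carry, and one endpoint, and a query is reduced to the predicates of a single star-shaped subquery. The `MoleculeTemplate` type, the `select_sources` helper, and the example.org endpoints are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoleculeTemplate:
    """Hypothetical, simplified template: class, predicates, endpoint."""
    rdf_class: str
    predicates: frozenset
    source: str


templates = [
    MoleculeTemplate("ex:Drug", frozenset({"ex:name", "ex:target"}),
                     "http://example.org/sparql/drugs"),
    MoleculeTemplate("ex:Gene", frozenset({"ex:name", "ex:locus"}),
                     "http://example.org/sparql/genes"),
    MoleculeTemplate("ex:Drug", frozenset({"ex:name"}),
                     "http://example.org/sparql/labels"),
]


def select_sources(star_predicates, templates):
    """Return endpoints whose template covers every predicate of the star."""
    needed = frozenset(star_predicates)
    return sorted({t.source for t in templates if needed <= t.predicates})


# A star-shaped subquery asking for a name and a target on one subject.
selected = select_sources({"ex:name", "ex:target"}, templates)
print(selected)
```

Matching the whole star against a template, rather than each predicate on its own, excludes endpoints that could answer only part of the subquery.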
Web Intelligence | 2016
Mikhail Galkin; Sören Auer; Simon Scerri
Semantic technologies in enterprises have recently received increasing attention from both the research and industrial side. The concept of Linked Enterprise Data (LED) describes a framework to incorporate the benefits of semantic technologies into enterprise IT environments. However, LED still remains an abstract idea lacking a point of origin, i.e., a station zero from which it comes into existence. In this paper we argue and demonstrate that Enterprise Knowledge Graphs (EKGs) might be considered an embodiment of LED, lifting corporate information management to a semantic level which ultimately allows for real artificial intelligence applications. By EKG we refer to a semantic network of concepts, properties, individuals and links representing and referencing foundational and domain knowledge relevant for an enterprise. Although the concept of EKGs is not new, neither the enterprise nor the semantic community has yet come up with a formal comprehensive framework for designing such graphs. In this paper we aim to join the dots between the expanding interest in EKGs expressed by those communities and the lack of blueprints for realizing EKGs. A thorough study of the key design concepts provides a multi-dimensional aspects matrix from which an enterprise is able to choose specific features of the highest priority. We emphasize the importance of various data fusion approaches, e.g., unified and federated. In the extensive evaluation section we investigate the effect of the chosen approach on EKG performance along several dimensions, e.g., basic reasoning and OWL entailment, which account for machine understanding of the EKG data, and the access control subsystem, which is of the utmost importance in large enterprises.
International Semantic Web Conference | 2018
Diego Collarana; Mikhail Galkin; Christoph Lange; Simon Scerri; Sören Auer; Maria-Esther Vidal
Institutions from different domains require the integration of data coming from heterogeneous Web sources. Typical use cases include Knowledge Search, Knowledge Building, and Knowledge Completion. We report on the implementation of the RDF Molecule-Based Integration Framework MINTE+ in three domain-specific applications: Law Enforcement, Job Market Analysis, and Manufacturing. The use of RDF molecules as the data representation and a core element of the framework gives MINTE+ enough flexibility to synthesize knowledge graphs in different domains. We first describe the challenges in each domain-specific application, then the implementation and configuration of the framework to solve the particular problems of each domain. We show how the parameters defined in the framework allow the integration process to be tuned with the best values for each domain. Finally, we present the main results and the lessons learned from each application.