Diego Collarana
University of Bonn
Publications
Featured research published by Diego Collarana.
IEEE International Conference on Semantic Computing | 2016
Irlán Grangel-González; Lavdim Halilaj; Gökhan Coskun; Sören Auer; Diego Collarana; Michael Hoffmeister
In the engineering and manufacturing domain, there is currently an atmosphere of departure to a new era of digitized production. In different regions, initiatives in this direction are known under different names, such as industrie du futur in France, industrial internet in the US, or Industrie 4.0 in Germany. While the vision of digitizing production and manufacturing has gained much traction lately, it is still relatively unclear how this vision can actually be implemented with concrete standards and technologies. Within the German Industry 4.0 initiative, the concept of an Administrative Shell was devised to respond to these requirements. The Administrative Shell is planned to provide a digital representation of all information available about and from an object, which can be a hardware system or a software platform. In this paper, we present an approach to develop such a digital representation based on semantic knowledge representation formalisms such as RDF, RDF Schema, and OWL. We present our concept of a Semantic I4.0 Component which addresses the communication and comprehension challenges in Industry 4.0 scenarios using semantic technologies. Our approach is illustrated with a concrete example showing its benefits in a real-world use case.
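The core idea of describing a component with RDF-style statements can be sketched as follows. This is a minimal illustration, not the vocabulary used in the paper: the `i40:` and `ex:` namespaces and all property names are invented for this example.

```python
# Sketch of a "Semantic I4.0 Component": the component and its
# Administrative Shell are described as (subject, predicate, object)
# triples, the data model underlying RDF. Namespaces are illustrative.

RDF_TYPE = "rdf:type"

def describe_component(comp_id, manufacturer, serial):
    """Return RDF-style triples describing one component."""
    s = f"ex:{comp_id}"
    return [
        (s, RDF_TYPE, "i40:Component"),
        (s, "i40:hasAdministrativeShell", f"ex:{comp_id}_shell"),
        (s, "i40:manufacturer", manufacturer),
        (s, "i40:serialNumber", serial),
    ]

triples = describe_component("motor1", "ACME GmbH", "SN-0042")
for t in triples:
    print(t)
```

In a real implementation these triples would be managed with an RDF library and published with resolvable URIs, so that other Industry 4.0 components can query and comprehend them.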
International World Wide Web Conference | 2016
Diego Collarana; Christoph Lange; Sören Auer
The increasing amount of structured and semi-structured information available on the Web and in distributed information systems, as well as the Web's diversification into different segments such as the Social Web, the Deep Web, or the Dark Web, requires new methods for horizontal search. FuhSen is a federated, RDF-based, hybrid search platform that searches, integrates, and summarizes information about entities from distributed heterogeneous information sources using Linked Data. As a use case, we present scenarios where law enforcement institutions search and integrate data spread across these different Web segments to identify cases of organized crime. We present the architecture and implementation of FuhSen and explain the queries that can be addressed with this new approach.
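The federated pattern described above can be sketched in a few lines: each Web segment sits behind a wrapper with a common search interface, and results are merged per entity. The wrapper names, fields, and sample data below are invented for illustration.

```python
# Hedged sketch of federated keyword search in the spirit of FuhSen:
# one wrapper per source, results merged by entity identifier.

def search_social_web(keyword):
    # stand-in for a Social Web API wrapper
    return [{"id": "p1", "name": "John Doe", "source": "social"}]

def search_deep_web(keyword):
    # stand-in for a Deep Web source wrapper
    return [{"id": "p1", "name": "John Doe", "offer": "item-77", "source": "deep"}]

def federated_search(keyword, wrappers):
    """Fan the keyword query out to all wrappers and merge results by id."""
    merged = {}
    for wrapper in wrappers:
        for result in wrapper(keyword):
            entity = merged.setdefault(result["id"], {})
            entity.update(result)
    return merged

results = federated_search("john doe", [search_social_web, search_deep_web])
# results["p1"] now summarizes what both sources know about the entity
```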
OTM Confederated International Conferences "On the Move to Meaningful Internet Systems" | 2016
Diego Collarana; Mikhail Galkin; Christoph Lange; Irlán Grangel-González; Maria-Esther Vidal; Sören Auer
A vast amount of information about various types of entities is spread across the Web, e.g., people or organizations on the Social Web, or product offers on the Deep Web or the Dark Web. These data sources can comprise heterogeneous data and are equipped with different search capabilities, e.g., search APIs. End users such as investigators from law enforcement institutions searching for traces and connections of organized crime have to deal with these interoperability problems not only during search time but also while merging data collected from different sources. We devise FuhSen, a keyword-based federated engine that exploits the search capabilities of heterogeneous sources during query processing and generates knowledge graphs on demand, applying an RDF-molecule integration approach in response to keyword-based queries. The resulting knowledge graph describes the semantics of entities collected from the integrated sources, as well as relationships among these entities. Furthermore, FuhSen utilizes ontologies to describe the available sources in terms of content and search capabilities, and exploits this knowledge to select the sources relevant for answering a keyword-based query. We conducted a user evaluation in which FuhSen was compared to traditional search engines. FuhSen's semantic search capabilities allowed users to complete search tasks that could not be accomplished with traditional Web search engines during the evaluation study.
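The RDF-molecule idea at the heart of the integration step can be illustrated simply: all triples sharing a subject form one "molecule", the unit on which integration and merging operate. The triples below are made up for the example.

```python
# Sketch of grouping RDF triples into molecules, the integration unit
# of the RDF-molecule approach. Data is illustrative.

from collections import defaultdict

def to_molecules(triples):
    """Group (subject, predicate, object) triples by subject."""
    molecules = defaultdict(list)
    for s, p, o in triples:
        molecules[s].append((p, o))
    return dict(molecules)

triples = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
]
molecules = to_molecules(triples)
# molecules["ex:alice"] holds both of Alice's property-value pairs
```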
Emerging Technologies and Factory Automation | 2016
Irlán Grangel-González; Lavdim Halilaj; Sören Auer; Steffen Lohmann; Christoph Lange; Diego Collarana
Industry 4.0 is a global endeavor of automation and data exchange to create smart factories maximizing production capabilities and allowing for new business models. The Reference Architecture Model for Industry 4.0 (RAMI 4.0) describes the core aspects of Industry 4.0 and defines Administration Shells as digital representations of Industry 4.0 components. In this paper, we present an approach to model and implement Industry 4.0 components with the Resource Description Framework (RDF). The approach addresses the challenges of interoperable communication and machine comprehension in Industry 4.0 settings using semantic technologies. We show how related standards and vocabularies, such as IEC 62264, eCl@ss, and the Ontology of Units of Measure (OM), can be utilized along with the RDF-based representation of the RAMI 4.0 concepts. Finally, we demonstrate the applicability and benefits of the approach using an example from a real-world use case.
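One detail mentioned above, combining component properties with the Ontology of Units of Measure (OM), can be sketched as follows. The URIs and the intermediate-node pattern here are illustrative assumptions, not the exact modeling from the paper.

```python
# Sketch of attaching a unit-annotated value to an Industry 4.0
# component, loosely following the use of OM. URIs are placeholders.

def quantity_triples(subject, prop, value, unit):
    """Describe a measured property via an intermediate measurement node."""
    node = f"{subject}_{prop}"  # blank-node stand-in for the measurement
    return [
        (subject, f"i40:{prop}", node),
        (node, "om:hasNumericalValue", value),
        (node, "om:hasUnit", unit),
    ]

triples = quantity_triples("ex:motor1", "ratedPower", 1.5, "om:kilowatt")
for t in triples:
    print(t)
```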
Knowledge Acquisition, Modeling and Management | 2016
Irlán Grangel-González; Diego Collarana; Lavdim Halilaj; Steffen Lohmann; Christoph Lange; Maria-Esther Vidal; Sören Auer
Industry 4.0 standards, such as AutomationML, are used to specify properties of mechatronic elements in terms of views, such as the electrical and mechanical views of a motor engine. These views have to be integrated in order to obtain a complete model of the artifact. Currently, the integration requires user knowledge to manually identify elements in the views that refer to the same element in the integrated model. Existing approaches are not able to scale up to large models where a potentially large number of conflicts may exist across the different views of an element. To overcome this limitation, we developed Alligator, a deductive rule-based system able to identify conflicts between AutomationML documents. We define a Datalog-based representation of the AutomationML input documents and a set of rules for identifying conflicts. A deductive engine is used to resolve the conflicts, merge the input documents, and produce an integrated AutomationML document. Our empirical evaluation of the quality of Alligator against a benchmark of AutomationML documents suggests that Alligator accurately identifies various types of conflicts between AutomationML documents, and thus helps increase the scalability, efficiency, and coherence of models for Industry 4.0 manufacturing environments.
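A Datalog-style conflict rule of the kind described can be mimicked in a few lines: a conflict exists when two documents assign different values to the same attribute of the same element. The facts and file names below are invented; Alligator itself uses a proper deductive engine rather than this nested loop.

```python
# Rough sketch of rule-based conflict detection between AutomationML
# views, in the spirit of Alligator. Facts are illustrative.

def find_conflicts(facts):
    """facts: (document, element, attribute, value) tuples.
    Rule: same element and attribute, different documents, different values."""
    conflicts = []
    for d1, e1, a1, v1 in facts:
        for d2, e2, a2, v2 in facts:
            if d1 < d2 and e1 == e2 and a1 == a2 and v1 != v2:
                conflicts.append((e1, a1, v1, v2))
    return conflicts

facts = [
    ("electrical.aml", "motor1", "voltage", "230V"),
    ("mechanical.aml", "motor1", "voltage", "400V"),
    ("mechanical.aml", "motor1", "weight", "12kg"),
]
print(find_conflicts(facts))  # [('motor1', 'voltage', '230V', '400V')]
```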
IEEE International Conference on Semantic Computing | 2017
Diego Collarana; Mikhail Galkin; Ignacio Traverso-Ribón; Christoph Lange; Maria-Esther Vidal; Sören Auer
The evolution of the Web of documents into a Web of services and data has resulted in an increased availability of data from almost any domain. For example, general-domain knowledge bases such as DBpedia or Wikidata, or domain-specific Web sources like the Oxford Art archive, allow for accessing knowledge about a wide variety of entities, including people, organizations, or art paintings. However, these data sources publish data in different ways, and they may be equipped with different search capabilities, e.g., SPARQL endpoints or REST services, thus requiring data integration techniques that provide a unified view of the published data. We devise a semantic data integration approach named FuhSen that exploits keyword and structured search capabilities of Web data sources and generates on-demand knowledge graphs merging data collected from available Web sources. The resulting knowledge graphs model the semantics, or meaning, of the merged data in terms of entities that satisfy keyword queries, and relationships among those entities. FuhSen relies both on RDF to semantically describe the collected entities and on semantic similarity measures to decide on relatedness among entities that should be merged. We empirically evaluate the results of FuhSen data integration techniques on data from the DBpedia knowledge base. The experimental results suggest that FuhSen data integration techniques accurately integrate similar entities semantically into knowledge graphs.
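The merge decision described above can be sketched with a threshold on a similarity score. Here a plain Jaccard similarity over property-value pairs stands in for the semantic similarity measures used in the paper; the entity descriptions and the threshold are invented for the example.

```python
# Sketch of a similarity-based merge decision: two entity descriptions
# are merged when their similarity exceeds a threshold. Jaccard over
# property-value pairs is a simplifying stand-in for semantic measures.

def jaccard(desc_a, desc_b):
    a, b = set(desc_a), set(desc_b)
    return len(a & b) / len(a | b)

def should_merge(desc_a, desc_b, threshold=0.3):
    return jaccard(desc_a, desc_b) >= threshold

dbpedia = [("name", "Pablo Picasso"), ("born", "1881")]
oxford_art = [("name", "Pablo Picasso"), ("movement", "Cubism")]
print(should_merge(dbpedia, oxford_art))  # True: 1 shared pair of 3 total
```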
International Conference on Semantic Systems | 2017
Elisa Sibarani; Simon Scerri; Sören Auer; Diego Collarana
The rapid changes in the job market, including a continuous year-on-year increase in new skills in sectors like information technology, have resulted in new challenges for job seekers and educators alike. The former feel less informed about which skills they should acquire to raise their competitiveness, whereas the latter are inadequately prepared to offer courses that meet the expectations of fast-evolving sectors like data science. In this paper, we describe efforts to obtain job demand data and employ an information extraction method guided by a purposely designed vocabulary to identify skills requested by the job vacancies. The Ontology-based Information Extraction (OBIE) method employed relies on the Skills and Recruitment Ontology (SARO), which we developed to represent job postings in the context of skills and competencies needed to fill a job role. Skill demand by employers is then abstracted using co-word analysis based on a set of skill keywords and their co-occurrences in the job posts. This method reveals the technical skills in demand together with their structure, revealing significant linkages. In an evaluation, the performance of the OBIE method for automatic skill annotation is estimated (strict F-measure) at 79%, which is satisfactory given that human inter-annotator agreement for keyword indexing was found to have an overall strict F-measure of 94%. In a secondary study, sample skill maps generated from the matrix of co-occurrences and correlations are presented and discussed as a proof of concept, highlighting the potential of using the extracted OBIE data for more advanced analysis that we plan as future work, including time series analysis.
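The co-word analysis step reduces to counting how often pairs of skill keywords appear in the same job post. A minimal sketch, with invented job posts:

```python
# Sketch of co-word analysis over extracted skill keywords: count
# pairwise co-occurrences of skills within job posts. Data is invented.

from collections import Counter
from itertools import combinations

def cooccurrences(posts):
    """Return a counter over unordered skill pairs co-occurring per post."""
    counts = Counter()
    for skills in posts:
        for pair in combinations(sorted(set(skills)), 2):
            counts[pair] += 1
    return counts

posts = [
    ["python", "sql", "spark"],
    ["python", "sql"],
    ["java", "sql"],
]
matrix = cooccurrences(posts)
print(matrix[("python", "sql")])  # 2
```

The resulting matrix is what skill maps and correlation analyses are built from.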
Web Intelligence, Mining and Semantics | 2017
Diego Collarana; Mikhail Galkin; Ignacio Traverso-Ribón; Maria-Esther Vidal; Christoph Lange; Sören Auer
The nature of the RDF data model allows for numerous descriptions of the same entity. For example, different RDF vocabularies may be utilized to describe pharmacogenomic data, and the same drug or gene is represented by different RDF graphs in DBpedia or Drugbank. To provide a unified representation of the same real-world entity, RDF graphs need to be semantically integrated. Semantic integration requires the management of knowledge encoded in RDF vocabularies to determine the relatedness of different RDF representations of the same entity, e.g., axiomatic definitions of vocabulary properties or resource equivalences. We devise MINTE, an integration technique that relies on both knowledge stated in RDF vocabularies and semantic similarity measures to merge semantically equivalent RDF graphs, i.e., graphs corresponding to the same real-world entity. MINTE follows a two-fold approach to solve the problem of integrating RDF graphs. In the first step, MINTE implements a 1-1 weighted perfect matching algorithm to identify semantically equivalent RDF entities in different graphs. Then, MINTE relies on different fusion policies to merge triples from these semantically equivalent RDF entities. We empirically evaluate the performance of MINTE on data from DBpedia, Wikidata, and Drugbank. The experimental results suggest that MINTE is able to accurately integrate semantically equivalent RDF graphs.
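The first step, a 1-1 weighted perfect matching between the entities of two graphs, can be illustrated by brute force over all assignments. This is fine only for tiny inputs; MINTE uses a proper matching algorithm, and the entity URIs and similarity scores below are invented.

```python
# Sketch of 1-1 weighted perfect matching between entities of two RDF
# graphs: pick the assignment maximizing total similarity. Brute force
# is a simplifying stand-in for a real matching algorithm.

from itertools import permutations

def best_matching(entities_a, entities_b, sim):
    """Return the 1-1 assignment of a-entities to b-entities with max score."""
    best, best_score = None, float("-inf")
    for perm in permutations(entities_b):
        pairs = list(zip(entities_a, perm))
        score = sum(sim[a][b] for a, b in pairs)
        if score > best_score:
            best, best_score = pairs, score
    return best

sim = {
    "dbpedia:Aspirin": {"drugbank:DB00945": 0.9, "drugbank:DB00316": 0.2},
    "dbpedia:Paracetamol": {"drugbank:DB00945": 0.1, "drugbank:DB00316": 0.8},
}
matching = best_matching(list(sim), ["drugbank:DB00945", "drugbank:DB00316"], sim)
print(matching)
```

Once matched, a fusion policy decides which triples from each pair survive in the merged graph.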
International Conference on Web Engineering | 2017
Diego Collarana; Maria-Esther Vidal; Sören Auer
Large Knowledge Graphs (KGs), e.g., DBpedia or Wikidata, are created with the goal of providing structure to unstructured or semi-structured data. As these datasets constantly evolve, the challenge is to utilize them in a meaningful, accurate, and efficient way. Further, exploiting semantics encoded in KGs, e.g., class and property hierarchies, provides the basis for addressing this challenge and producing a more accurate analysis of KG data. Thus, we focus on the problem of determining relatedness among entities in KGs, which corresponds to a fundamental building block for any semantic data integration task. We devise MateTee, a semantic similarity measure that combines the gradient descent optimization method with semantics encoded in ontologies to precisely compute values of similarity between entities in KGs. We empirically study the accuracy of MateTee with respect to state-of-the-art methods. The observed results show that MateTee is competitive in terms of accuracy with existing methods, with the advantage that background domain knowledge is not required.
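The underlying idea, learning embeddings by gradient descent so that related entities end up close and reading similarity off the embedding distance, can be shown in a toy form. One-dimensional embeddings, the loss, and the tiny graph below are simplifying assumptions, not MateTee's actual formulation.

```python
# Toy sketch of embedding-based similarity: gradient descent pulls
# related entities together; similarity is then distance in the
# embedding space. Everything here is deliberately minimal.

import random

def train(entities, related_pairs, steps=200, lr=0.1):
    random.seed(0)
    emb = {e: random.uniform(-1, 1) for e in entities}
    for _ in range(steps):
        for a, b in related_pairs:
            g = 2 * (emb[a] - emb[b])  # gradient of (emb[a] - emb[b])**2
            emb[a] -= lr * g
            emb[b] += lr * g
    return emb

emb = train(["cat", "tiger", "car"], [("cat", "tiger")])
# related entities converge; the unrelated one keeps its random position
assert abs(emb["cat"] - emb["tiger"]) < abs(emb["cat"] - emb["car"])
```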
International Conference on Semantic Systems | 2017
Mikhail Galkin; Kemele M. Endris; Maribel Acosta; Diego Collarana; Maria-Esther Vidal; Sören Auer
Join operators are particularly important in SPARQL query engines that collect RDF data using Web access interfaces. State-of-the-art SPARQL query engines rely on binary join operators tailored for merging results from SPARQL queries over Web access interfaces. However, in queries with a large number of triple patterns, binary joins constitute a significant burden on query performance. Multi-way joins that handle more than two inputs are able to reduce the complexity of pre-processing stages and reduce the execution time. Whereas multi-way joins have already received some attention in the field of relational databases, their applicability in SPARQL query processing remains unexplored. We devise SMJoin, a multi-way non-blocking join operator tailored for independently merging results from more than two RDF data sources. SMJoin implements intra-operator adaptivity, i.e., it is able to adjust join execution schedulers to the conditions of Web access interfaces; thus, query answers are produced as soon as they are computed and can be continuously generated even if one of the sources becomes blocked. We empirically study the behavior of SMJoin in two benchmarks with queries of different selectivity; state-of-the-art SPARQL query engines are included in the study. Experimental results suggest that SMJoin outperforms existing approaches in very selective queries and produces first answers as fast as the compared adaptive query engines in non-selective queries.
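The non-blocking multi-way behavior can be sketched with a symmetric-hash-style operator: tuples arrive one at a time from any source, and an answer is emitted as soon as every source has contributed a match for the join key. This is a simplified illustration of the general technique, not SMJoin's actual operator; the sources and sample tuples are invented.

```python
# Sketch of a multi-way non-blocking join: per join key, one bucket per
# source; each arriving tuple probes the other buckets and emits
# results immediately, without waiting for any source to finish.

from collections import defaultdict

class MultiWayJoin:
    def __init__(self, n_sources):
        self.buckets = defaultdict(lambda: [[] for _ in range(n_sources)])

    def insert(self, source, key, tup):
        """Probe the other sources' buckets, then store; yield new answers."""
        buckets = self.buckets[key]
        others = [b for i, b in enumerate(buckets) if i != source]
        if all(others):  # every other source already matched this key
            for combo in self._product(others):
                yield (key, tup) + combo
        buckets[source].append(tup)

    def _product(self, buckets):
        results = [()]
        for b in buckets:
            results = [r + (t,) for r in results for t in b]
        return results

join = MultiWayJoin(3)
out = []
out += list(join.insert(0, "dbpedia:Bonn", ("pop", 330000)))
out += list(join.insert(1, "dbpedia:Bonn", ("country", "DE")))
out += list(join.insert(2, "dbpedia:Bonn", ("mayor", "X")))
# the first answer appears at the third insert, as soon as it is joinable
print(len(out))  # 1
```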