Rihan Hai | Researchain

Publication

Featured research published by Rihan Hai.

Complex Systems Informatics and Modeling Quarterly | 2016

Metadata Extraction and Management in Data Lakes With GEMMS

Christoph Quix; Rihan Hai; Ivan Vatov

In addition to volume and velocity, big data is also characterized by its variety. Variety in structure and semantics requires new integration approaches that can resolve the integration challenges, also for large volumes of data. Data lakes should reduce upfront integration costs and provide a more flexible way for data integration and analysis, as source data is loaded into the data lake repository in its original structure. Some syntactic transformation might be applied to enable access to the data in one common repository; however, deep semantic integration is done only after the initial loading of the data into the data lake. In this way, data is readily made available and can be restructured, aggregated, and transformed as required by later applications. Metadata management is a crucial component of a data lake, as the source data needs to be described by metadata to capture its semantics. We developed a Generic and Extensible Metadata Management System for data lakes (called GEMMS) that aims at the automatic extraction of metadata from a wide variety of data sources. Furthermore, the metadata is managed in an extensible metamodel that distinguishes structural and semantic metadata. The use case applied for evaluation is from the life science domain, where data is often stored only in files, which hinders data access and efficient querying. The GEMMS framework has proven useful in this domain. In particular, the extensibility and flexibility of the framework are important, as data and metadata structures in scientific experiments cannot be defined a priori.
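To make the split between structural and semantic metadata concrete, here is a minimal sketch of extracting structural metadata from a JSON file. It is not GEMMS itself: the function `extract_structure`, the class `MetadataRecord`, the file name, and the `hypothetical:SampleIdentifier` annotation are all illustrative assumptions. The sketch records every path in a parsed document together with the type(s) found there, and keeps semantic annotations in a separate mapping.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


def extract_structure(value: Any, path: str = "$",
                      out: Optional[dict[str, set[str]]] = None) -> dict[str, set[str]]:
    """Record every path in a parsed JSON document with the type name(s) seen there."""
    if out is None:
        out = {}
    # Different array elements may hold different types, so collect a set per path.
    out.setdefault(path, set()).add(type(value).__name__)
    if isinstance(value, dict):
        for key, child in value.items():
            extract_structure(child, f"{path}.{key}", out)
    elif isinstance(value, list):
        # All array elements share one "[*]" path.
        for child in value:
            extract_structure(child, f"{path}[*]", out)
    return out


@dataclass
class MetadataRecord:
    """Hypothetical container separating structural from semantic metadata."""
    source: str
    structural: dict[str, set[str]]
    # Path -> concept annotation, filled in later (e.g. by a domain expert).
    semantic: dict[str, str] = field(default_factory=dict)


doc = json.loads('{"sample": {"id": 7, "values": [1.5, 2.0]}}')
record = MetadataRecord(source="experiment_01.json", structural=extract_structure(doc))
record.semantic["$.sample.id"] = "hypothetical:SampleIdentifier"
print(record.structural["$.sample.values[*]"])  # {'float'}
```

Keeping the semantic mapping separate means structural metadata can be extracted automatically on load, while semantic annotations can be added afterwards without re-reading the source file.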

Advances in Databases and Information Systems | 2018

Query Rewriting for Heterogeneous Data Lakes

Rihan Hai; Christoph Quix; Chen Zhou

The increasing popularity of NoSQL systems has led to the model of polyglot persistence, in which several data management systems with different data models are used. Data lakes realize the polyglot persistence model by collecting data from various sources, by storing the data in its original structure, and by providing the datasets for querying and analysis. Thus, one of the key tasks of data lakes is to provide a unified querying interface, which is able to rewrite queries expressed in a general data model into a union of queries for data sources spanning heterogeneous data stores. To address this challenge, we propose a novel framework for query rewriting that combines logical methods for data integration based on declarative mappings with a scalable big data query processing system (i.e., Apache Spark) to efficiently execute the rewritten queries and to reconcile the query results into an integrated dataset. Because of the diversity of NoSQL systems, our approach is based on a flexible and extensible architecture that currently supports the major data structures such as relational data, semi-structured data (e.g., JSON, XML), and graphs. We show the applicability of our query rewriting engine with six real-world datasets and demonstrate its scalability using an artificial data integration scenario with multiple storage systems.
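The idea of rewriting one query over a general schema into a union of per-source queries can be illustrated with a deliberately simplified, in-memory sketch. It assumes global-as-view style mappings that only rename flat attributes; the class `SourceMapping`, the functions `rewrite` and `execute`, and the store names are hypothetical. It does not use Apache Spark or handle nesting, joins, or graphs as the framework described above does.

```python
from dataclasses import dataclass


@dataclass
class SourceMapping:
    """Hypothetical GAV-style mapping: global attribute -> field name in one store."""
    store: str
    attributes: dict[str, str]


def rewrite(query_attrs: list[str],
            mappings: list[SourceMapping]) -> list[tuple[str, list[str]]]:
    """Rewrite a projection over the global schema into one projection per store.

    Only stores whose mapping covers every requested attribute contribute.
    """
    return [
        (m.store, [m.attributes[a] for a in query_attrs])
        for m in mappings
        if all(a in m.attributes for a in query_attrs)
    ]


def execute(plan: list[tuple[str, list[str]]], query_attrs: list[str],
            stores: dict[str, list[dict]]) -> list[dict]:
    """Run each per-store projection and concatenate the results (UNION ALL semantics)."""
    results = []
    for store, fields in plan:
        for record in stores[store]:
            # Rename source fields back to global attribute names.
            results.append({a: record[f] for a, f in zip(query_attrs, fields)})
    return results


stores = {
    "relational": [{"pname": "Alice", "birth_year": 1980}],
    "documents": [{"fullName": "Bob", "born": 1975}],
}
mappings = [
    SourceMapping("relational", {"name": "pname", "year": "birth_year"}),
    SourceMapping("documents", {"name": "fullName", "year": "born"}),
]
plan = rewrite(["name"], mappings)
print(execute(plan, ["name"], stores))  # [{'name': 'Alice'}, {'name': 'Bob'}]
```

Separating `rewrite` (purely logical, driven by the mappings) from `execute` (touching the stores) mirrors the division the abstract draws between declarative mappings and the system that runs the rewritten queries.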

International Conference on Knowledge Engineering and Ontology Development | 2015

An Ontology-based Collaboration Recommender System using Patents

Sandra Geisler; Rihan Hai; Christoph Quix

Successful research and development projects start with finding the right partners for the venture. Especially for interdisciplinary projects, this is a difficult task, as experts from foreign domains are not known. Furthermore, the transfer of knowledge from research into practice is becoming more important in research projects to enable the quick application of research results. This is particularly relevant for projects in medical engineering. Patents and publications contain technical knowledge that can be exploited to find suitable experts. Patents are usually more product-oriented, as the inventors have to describe an application area and products might be protected by patents. Scientific publications, on the other hand, represent the state of the art in research. The challenge is finding the right mixture of research- or application-oriented experts from different domains. Hence, we propose a recommender system that suggests experts for a given topic based on patent topic clustering, ontologies, and ontology matching, which maps patents to corresponding innovation fields. The medical engineering domain serves as a first test bed, since projects in this area are highly interdisciplinary.

International Conference on Conceptual Modeling | 2018

Nested Schema Mappings for Integrating JSON

Rihan Hai; Christoph Quix; David Kensche

JSON has become one of the most popular data formats, yet studies on JSON data integration (DI) are scarce. In this work, we study one of the key DI tasks, nested mapping generation, in the context of integrating heterogeneous JSON-based data sources. We propose a novel mapping representation, namely bucket forest mappings, which model nested mappings in an efficient and native manner. We show experimentally the practicality of our approach over six real-world datasets. Moreover, via extensive experiments over synthetic scenarios, we demonstrate that our approach scales well to the increasing metadata complexity of DI scenarios.
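For readers unfamiliar with nested mappings, the following sketch shows the *effect* of a single nested mapping: flat source records become nested target JSON, where each outer target object groups the inner records correlated with it. It is only an illustration of what a nested mapping expresses, not the bucket forest representation; the function `apply_nested_mapping` and the field names `project`, `member`, and `members` are hypothetical.

```python
from collections import defaultdict
from typing import Any


def apply_nested_mapping(source: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Build nested target objects from flat source records.

    Outer level: one target object per distinct source "project" value.
    Inner level: a "members" array, correlated with its parent via that project value.
    """
    members_by_project: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for rec in source:
        members_by_project[rec["project"]].append({"name": rec["member"]})
    # Dicts preserve insertion order, so projects appear in first-seen order.
    return [{"project": p, "members": ms} for p, ms in members_by_project.items()]


source = [
    {"project": "A", "member": "x"},
    {"project": "B", "member": "y"},
    {"project": "A", "member": "z"},
]
target = apply_nested_mapping(source)
# [{'project': 'A', 'members': [{'name': 'x'}, {'name': 'z'}]},
#  {'project': 'B', 'members': [{'name': 'y'}]}]
```

A flat, non-nested mapping would instead emit one target object per source record, duplicating project "A"; the nesting (grouping inner records under a shared outer value) is what a nested mapping adds.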

Data Integration in the Life Sciences | 2017

An Integrated Ontology-Based Approach for Patent Classification in Medical Engineering

Sandra Geisler; Christoph Quix; Rihan Hai; Sanchit Alekh

Medical engineering (ME) is an interdisciplinary domain with short innovation cycles. Usually, researchers from several fields cooperate in ME research projects. To support the identification of suitable partners for a project, we present an integrated approach for patent classification combining ideas from topic modeling, ontology modeling & matching, bibliometric analysis, and data integration. First evaluation results show that the use of semantic technologies in patent classification can indeed increase the quality of the results.

Australasian Database Conference | 2015

SCIT: A Schema Change Interpretation Tool for Dynamic-Schema Data Warehouses

Rihan Hai; Vasileios Theodorou; Maik Thiele; Wolfgang Lehner

Data Warehouses (DW) have to continuously adapt to evolving business requirements, which implies structure modification (schema changes) and data migration requirements in the system design. However, it is challenging for designers to control the performance and cost overhead of different schema change implementations. In this paper, we demonstrate SCIT, a tool for DW designers to test and implement different logical design alternatives in a two-fold manner. As a main functionality, SCIT translates common DW schema modifications into directly executable SQL scripts for relational database systems, facilitating design and testing automation. At the same time, SCIT assesses changes and recommends alternative design decisions to help designers improve logical designs and avoid common dimensional modeling pitfalls and mistakes. This paper serves as a walk-through of the system features, showcasing the interaction with the tool’s user interface in order to easily and effectively modify DW schemata and enable schema change analysis.
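The core step of translating schema modifications into executable SQL can be sketched as a small translator from change objects to `ALTER TABLE` statements. This is not SCIT's implementation: the classes `AddColumn`, `RenameColumn`, and `DropColumn`, the functions `to_sql` and `to_script`, and the table names are hypothetical. The generated syntax is accepted by, for example, PostgreSQL; other systems may require different forms.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class AddColumn:
    table: str
    column: str
    sql_type: str


@dataclass(frozen=True)
class RenameColumn:
    table: str
    old: str
    new: str


@dataclass(frozen=True)
class DropColumn:
    table: str
    column: str


SchemaChange = Union[AddColumn, RenameColumn, DropColumn]


def to_sql(change: SchemaChange) -> str:
    """Translate one schema change into a single ALTER TABLE statement."""
    # Identifiers are interpolated unquoted, so they must come from a trusted source.
    if isinstance(change, AddColumn):
        return f"ALTER TABLE {change.table} ADD COLUMN {change.column} {change.sql_type};"
    if isinstance(change, RenameColumn):
        return f"ALTER TABLE {change.table} RENAME COLUMN {change.old} TO {change.new};"
    if isinstance(change, DropColumn):
        return f"ALTER TABLE {change.table} DROP COLUMN {change.column};"
    raise TypeError(f"unsupported schema change: {change!r}")


def to_script(changes: list[SchemaChange]) -> str:
    """Concatenate the statements into a script, one statement per line."""
    return "\n".join(to_sql(c) for c in changes)


script = to_script([
    AddColumn("dim_customer", "segment", "VARCHAR(32)"),
    RenameColumn("fact_sales", "amt", "amount"),
])
print(script)
```

Representing each change as a small immutable object, rather than as raw SQL, is what makes it possible to inspect a proposed change (for instance, to warn about a modelling pitfall) before any script is generated.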

International Conference on Management of Data | 2016