Katja Hose | Researchain

Publication

Featured research published by Katja Hose.

international semantic web conference | 2011

FedX: optimization techniques for federated query processing on linked data

Andreas Schwarte; Peter Haase; Katja Hose; Ralf Schenkel; Michael Schmidt

Motivated by the ongoing success of Linked Data and the growing amount of semantic data sources available on the Web, new challenges to query processing are emerging. Especially in distributed settings that require joining data provided by multiple sources, sophisticated optimization techniques are necessary for efficient query processing. We propose novel join processing and grouping techniques to minimize the number of remote requests, and develop an effective solution for source selection in the absence of preprocessed metadata. We present FedX, a practical framework that enables efficient SPARQL query processing on heterogeneous, virtually integrated Linked Data sources. In experiments, we demonstrate the practicability and efficiency of our framework on a set of real-world queries and data sources from the Linked Open Data cloud. With FedX we achieve a significant improvement in query performance over state-of-the-art federated query engines.

international world wide web conferences | 2010

Data summaries for on-demand queries over linked data

Andreas Harth; Katja Hose; Marcel Karnstedt; Axel Polleres; Kai-Uwe Sattler; Jürgen Umbrich

Typical approaches for querying structured Web Data collect (crawl) and pre-process (index) large amounts of data in a central data repository before allowing for query answering. However, this time-consuming pre-processing phase leverages the benefits of Linked Data -- where structured data is accessible live and up-to-date at distributed Web resources that may change constantly -- only to a limited degree, as query results can never be current. An ideal query answering system for Linked Data should return current answers in a reasonable amount of time, even on corpora as large as the Web. Query processors evaluating queries directly on the live sources require knowledge of the contents of data sources. In this paper, we develop and evaluate an approximate index structure summarising graph-structured content of sources adhering to Linked Data principles, provide an algorithm for answering conjunctive queries over Linked Data on the Web exploiting the source summary, and evaluate the system using synthetically generated queries. The experimental results show that our lightweight index structure enables complete and up-to-date query results over Linked Data, while keeping the overhead for querying low and providing a satisfying source ranking at no additional cost.

international world wide web conferences | 2013

AMIE: association rule mining under incomplete evidence in ontological knowledge bases

Luis Galárraga; Christina Teflioudi; Katja Hose; Fabian M. Suchanek

Recent advances in information extraction have led to huge knowledge bases (KBs), which capture knowledge in a machine-readable format. Inductive Logic Programming (ILP) can be used to mine logical rules from the KB. These rules can help deduce and add missing knowledge to the KB. While ILP is a mature field, mining logical rules from KBs is different in two aspects: First, current rule mining systems are easily overwhelmed by the amount of data (state-of-the-art systems cannot even run on today's KBs). Second, ILP usually requires counterexamples. KBs, however, implement the open world assumption (OWA), meaning that absent data cannot be used as counterexamples. In this paper, we develop a rule mining model that is explicitly tailored to support the OWA scenario. It is inspired by association rule mining and introduces a novel measure for confidence. Our extensive experiments show that our approach outperforms state-of-the-art approaches in terms of precision and coverage. Furthermore, our system, AMIE, mines rules orders of magnitude faster than state-of-the-art approaches.
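The abstract does not spell out the confidence measure, so the following is only an illustrative sketch of why the open world assumption matters when scoring a rule. It assumes a partial-completeness reading: a predicted fact counts as a counterexample only if the KB already knows some object of the head relation for that subject. The toy KB, the relation names, and the function `confidences` are hypothetical and not taken from the paper.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical toy KB; "cara" has no livesIn fact at all.
kb: Set[Fact] = {
    ("anna", "wasBornIn", "paris"), ("anna", "livesIn", "paris"),
    ("ben", "wasBornIn", "rome"), ("ben", "livesIn", "oslo"),
    ("cara", "wasBornIn", "lima"),
}


def pairs(facts: Set[Fact], relation: str) -> Set[Tuple[str, str]]:
    """All (subject, object) pairs stored for one relation."""
    return {(s, o) for s, r, o in facts if r == relation}


def confidences(facts: Set[Fact], body_rel: str, head_rel: str) -> Tuple[float, float]:
    """Score the rule head_rel(x, y) <= body_rel(x, y) in two ways.

    Returns (closed-world confidence, partial-completeness confidence).
    """
    body = pairs(facts, body_rel)
    head = pairs(facts, head_rel)
    support = len(body & head)  # predictions confirmed by the KB

    # Closed-world view: every unconfirmed prediction is a counterexample.
    standard = support / len(body) if body else 0.0

    # Assumed open-world view: only subjects for which the KB knows at
    # least one head fact can produce counterexamples.
    known_subjects = {s for s, _ in head}
    comparable = {(s, o) for s, o in body if s in known_subjects}
    partial = support / len(comparable) if comparable else 0.0
    return standard, partial


standard_conf, partial_conf = confidences(kb, "wasBornIn", "livesIn")
print(standard_conf, partial_conf)
```

In this toy KB the closed-world score penalizes the rule for "cara", whose residence is simply unknown, while the assumed open-world score ignores her and only counts "ben" as a counterexample.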

very large data bases | 2012

A survey of skyline processing in highly distributed environments

Katja Hose; Akrivi Vlachou

During the last decades, data management and storage have become increasingly distributed. Advanced query operators, such as skyline queries, are necessary in order to help users to handle the huge amount of available data by identifying a set of interesting data objects. Skyline query processing in highly distributed environments poses inherent challenges and demands and requires non-traditional techniques due to the distribution of content and the lack of global knowledge. This paper surveys this interesting and still evolving research area, so that readers can easily obtain an overview of the state-of-the-art. We outline the objectives and the main principles that any distributed skyline approach has to fulfill, leading to useful guidelines for developing algorithms for distributed skyline processing. We review in detail existing approaches that are applicable for highly distributed environments, clarify the assumptions of each approach, and provide a comparative performance analysis. Moreover, we study the skyline variants each approach supports. Our analysis leads to a taxonomy of existing approaches. Finally, we present interesting research topics on distributed skyline computation that have not yet been explored.
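As a minimal sketch of the operator this survey is about, the code below computes a skyline under the usual dominance definition (smaller is better in every dimension) and then shows one simple way to combine per-site results: each site computes its local skyline, and only those candidates are merged. The data and function names are made up for illustration, and the merge strategy is a naive baseline rather than any specific algorithm reviewed in the survey.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Iterable[Point]) -> List[Point]:
    """Return all points that no other point dominates."""
    pts = list(dict.fromkeys(points))  # drop exact duplicates, keep order
    return [p for p in pts if not any(dominates(q, p) for q in pts)]


def distributed_skyline(partitions: Sequence[Iterable[Point]]) -> List[Point]:
    """Naive two-phase scheme: local skylines first, then one global pass.

    A point dominated at its own site cannot be in the global skyline,
    so only local skyline points need to be shipped.
    """
    candidates = [p for part in partitions for p in skyline(part)]
    return skyline(candidates)


# Hypothetical hotels as (price, distance to beach), stored at two sites.
site_a = [(50.0, 8.0), (80.0, 2.0), (90.0, 9.0)]
site_b = [(60.0, 5.0), (55.0, 9.0), (120.0, 1.0)]

result = sorted(distributed_skyline([site_a, site_b]))
print(result)
```

The two-phase result matches a centralized skyline over all points, while each site only transmits its local skyline instead of its full data.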

international world wide web conferences | 2014

Partout: a distributed engine for efficient RDF processing

Luis Galárraga; Katja Hose; Ralf Schenkel

The increasing interest in Semantic Web technologies has led not only to a rapid growth of semantic data on the Web but also to an increasing number of backend applications relying on efficient query processing. Confronted with such a trend, existing centralized state-of-the-art systems for storing RDF and processing SPARQL queries are no longer sufficient. In this paper, we introduce Partout, a distributed engine for fast RDF processing in a cluster of machines. We propose an effective approach for fragmenting RDF data sets based on a query log and allocating the fragments to hosts in a cluster of machines. Furthermore, Partout's query optimizer produces efficient query execution plans for ad-hoc SPARQL queries.

extended semantic web conference | 2011

FedX: a federation layer for distributed query processing on linked open data

Andreas Schwarte; Peter Haase; Katja Hose; Ralf Schenkel; Michael Schmidt

Driven by the success of the Linked Open Data initiative, today's Semantic Web is best characterized as a Web of interlinked datasets. Hand in hand with this structure, new challenges to query processing are arising. Especially queries for which more than one data source can contribute results require advanced optimization and evaluation approaches, the major challenge lying in the nature of distribution: Heterogeneous data sources have to be integrated into a federation to globally appear as a single repository. On the query level, though, techniques have to be developed to meet the requirements of efficient query computation in the distributed setting. We present FedX, a project which extends the Sesame Framework with a federation layer that enables efficient query processing on distributed Linked Open Data sources. We discuss key insights to its architecture and summarize our optimization techniques for the federated setting. The practicability of our system will be demonstrated in various scenarios using the Information Workbench.

conference on information and knowledge management | 2006

Processing relaxed skylines in PDMS using distributed data summaries

Katja Hose; Christian Lemke; Kai-Uwe Sattler

Peer Data Management Systems (PDMS) are a natural extension of heterogeneous database systems. One of the main tasks in such systems is efficient query processing. Insisting on complete answers, however, leads to asking almost every peer in the network. Relaxing these completeness requirements by applying approximate query answering techniques can significantly reduce costs. Since most users are not interested in the exact answers to their queries, rank-aware query operators like top-k or skyline play an important role in query processing. In this paper, we present the novel concept of relaxed skylines that combines the advantages of both rank-aware query operators and approximate query processing techniques. Furthermore, we propose a strategy for processing relaxed skylines in distributed environments that allows for giving guarantees for the completeness of the result using distributed data summaries as routing indexes.

World Wide Web | 2011

Comparing data summaries for processing live queries over Linked Data

Jürgen Umbrich; Katja Hose; Marcel Karnstedt; Andreas Harth; Axel Polleres

A growing amount of Linked Data—graph-structured data accessible at sources distributed across the Web—enables advanced data integration and decision-making applications. Typical systems operating on Linked Data collect (crawl) and pre-process (index) large amounts of data, and evaluate queries against a centralised repository. Given that crawling and indexing are time-consuming operations, the data in the centralised index may be out of date at query execution time. An ideal query answering system for querying Linked Data live should return current answers in a reasonable amount of time, even on corpora as large as the Web. In such a live query system source selection—determining which sources contribute answers to a query—is a crucial step. In this article we propose to use lightweight data summaries for determining relevant sources during query evaluation. We compare several data structures and hash functions with respect to their suitability for building such summaries, stressing benefits for queries that contain joins and require ranking of results and sources. We elaborate on join variants, join ordering and ranking. We analyse the different approaches theoretically and provide results of an extensive experimental evaluation.
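To make the idea of hash-based source selection concrete, here is a deliberately simplified sketch. Each source is summarised by the hash buckets of the terms it uses at each triple position, and a triple pattern is routed to every source whose summary covers the pattern's constants. This per-term bucket set is an assumption chosen for brevity, not one of the data structures compared in the article; the source URLs, the bucket count, and all function names are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None = variable
Summary = Set[Tuple[int, int]]  # (triple position, hash bucket)

NUM_BUCKETS = 64  # small on purpose: summaries stay lightweight but lossy


def bucket(term: str) -> int:
    """Deterministically map an RDF term to one of NUM_BUCKETS buckets."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS


def summarise(triples: Iterable[Triple]) -> Summary:
    """Record which buckets occur at subject, predicate and object position."""
    return {(pos, bucket(term)) for triple in triples for pos, term in enumerate(triple)}


def relevant_sources(summaries: Dict[str, Summary], pattern: Pattern) -> List[str]:
    """Sources that may answer the pattern; collisions can add false positives,
    but a source holding a matching triple is never excluded."""
    needed = {(pos, bucket(term)) for pos, term in enumerate(pattern) if term is not None}
    return sorted(src for src, summary in summaries.items() if needed <= summary)


sources: Dict[str, List[Triple]] = {
    "http://example.org/a": [("ex:alice", "foaf:knows", "ex:bob")],
    "http://example.org/b": [("ex:carol", "foaf:name", '"Carol"')],
}
summaries = {src: summarise(triples) for src, triples in sources.items()}

candidates = relevant_sources(summaries, (None, "foaf:knows", "ex:bob"))
print(candidates)
```

Because the summary is approximate, the candidate list is a superset of the truly relevant sources; the query processor still has to fetch and evaluate the candidates to obtain actual results.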

international conference on data engineering | 2013

WARP: Workload-aware replication and partitioning for RDF

Katja Hose; Ralf Schenkel

With the increasing popularity of the Semantic Web, more and more data becomes available in RDF with SPARQL as a query language. Data sets, however, can become too big to be managed and queried on a single server in a scalable way. Existing distributed RDF stores approach this problem using data partitioning, aiming at limiting the communication between servers and exploiting parallelism. This paper proposes a distributed SPARQL engine that combines a graph partitioning technique with workload-aware replication of triples across partitions, enabling efficient query execution even for complex queries from the workload. Furthermore, it discusses query optimization techniques for producing efficient execution plans for ad-hoc queries not contained in the workload.

european semantic web conference | 2015

FrameBase: Representing N-Ary Relations Using Semantic Frames

Jacobo Rouces; Gerard de Melo; Katja Hose

Large-scale knowledge graphs such as those in the Linked Data cloud are typically represented as subject-predicate-object triples. However, many facts about the world involve more than two entities. While n-ary relations can be converted to triples in a number of ways, unfortunately, the structurally different choices made in different knowledge sources significantly impede our ability to connect them. They also make it impossible to query the data concisely and without prior knowledge of each individual source. We present FrameBase, a wide-coverage knowledge-base schema that uses linguistic frames to seamlessly represent and query n-ary relations from other knowledge bases, at different levels of granularity connected by logical entailment. It also opens possibilities to draw on natural language processing techniques for querying and data mining.
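The following sketch illustrates the general modelling problem the abstract describes: an n-ary fact is stored as a frame-like node with one triple per role, and a direct binary triple is derived from it as a coarser view. The vocabulary (`ex:Marriage`, `ex:partner1`, `ex:marriedTo`) and both functions are hypothetical illustrations and are not FrameBase's actual schema or rules.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def nary_to_triples(event_id: str, frame: str, roles: Dict[str, str]) -> List[Triple]:
    """Represent one n-ary fact as a typed event node plus one triple per role."""
    triples = [(event_id, "rdf:type", frame)]
    triples += [(event_id, role, filler) for role, filler in sorted(roles.items())]
    return triples


def binary_shortcuts(
    triples: Iterable[Triple], subj_role: str, obj_role: str, predicate: str
) -> List[Triple]:
    """Derive direct binary triples between two role fillers of the same event."""
    by_event: Dict[str, Dict[str, str]] = {}
    for s, p, o in triples:
        by_event.setdefault(s, {})[p] = o
    return [
        (roles[subj_role], predicate, roles[obj_role])
        for roles in by_event.values()
        if subj_role in roles and obj_role in roles
    ]


# Hypothetical ternary fact: who married whom, and when.
frame_triples = nary_to_triples(
    "ex:marriage1",
    "ex:Marriage",
    {"ex:partner1": "ex:Alice", "ex:partner2": "ex:Bob", "ex:year": "2010"},
)
shortcuts = binary_shortcuts(frame_triples, "ex:partner1", "ex:partner2", "ex:marriedTo")
print(frame_triples)
print(shortcuts)
```

The event-node form keeps every participant of the fact, while the derived binary triple is easier to query but drops the year; keeping both connected is the kind of multi-granularity view the abstract refers to.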
