Anja Theobald | Researchain

Publication

Featured research published by Anja Theobald.

extending database technology | 2002

The Index-Based XXL Search Engine for Querying XML Data with Relevance Ranking

Anja Theobald; Gerhard Weikum

Query languages for XML such as XPath or XQuery support Boolean retrieval: a query result is a (possibly restructured) subset of XML elements or entire documents that satisfy the search conditions of the query. This search paradigm works for highly schematic XML data collections such as electronic catalogs. However, for searching information in open environments such as the Web or intranets of large corporations, ranked retrieval is more appropriate: a query result is a ranked list of XML elements in descending order of (estimated) relevance. Web search engines, which are based on the ranked retrieval paradigm, do not, however, consider the additional information and rich annotations provided by the structure of XML documents and their element names. This paper presents the XXL search engine, which supports relevance ranking on XML data. XXL is particularly geared for path queries with wildcards that can span multiple XML collections and contain both exact-match and semantic-similarity search conditions. In addition, ontological information and suitable index structures are used to improve the search efficiency and effectiveness. XXL is fully implemented as a suite of Java servlets. Experiments with a variety of structurally diverse XML data demonstrate the efficiency of the XXL search engine and underline its effectiveness for ranked retrieval.

extending database technology | 2004

HOPI: An Efficient Connection Index for Complex XML Document Collections

Ralf Schenkel; Anja Theobald; Gerhard Weikum

In this paper we present HOPI, a new connection index for XML documents based on the concept of the 2-hop cover of a directed graph introduced by Cohen et al. In contrast to most of the prior work on XML indexing, we consider not only paths with child or parent relationships between the nodes, but also provide space- and time-efficient reachability tests along the ancestor, descendant, and link axes to support path expressions with wildcards in our XXL search engine. We improve the theoretical concept of a 2-hop cover by developing scalable methods for index creation on very large XML data collections with long paths and extensive cross-linkage. Our experiments show substantial savings in the query performance of the HOPI index over previously proposed index structures in combination with low space requirements.
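The reachability test behind a 2-hop cover can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy graph, the hand-built label sets `L_OUT` and `L_IN`, and the function name `reachable` are hypothetical and do not reflect HOPI's actual data structures or its index-construction algorithms. The idea is that each node stores a set of "center" nodes it can reach and a set of centers that can reach it; a node u reaches a node v exactly when the two sets share a center.

```python
# Toy directed graph (hypothetical): a -> b, b -> c, a -> d
# Hand-built 2-hop labels for that graph.
# L_OUT[u]: centers reachable from u; L_IN[v]: centers that reach v.
# Every node is implicitly treated as a center for itself.
L_OUT = {"a": {"b", "d"}, "b": set(), "c": set(), "d": set()}
L_IN = {"a": set(), "b": set(), "c": {"b"}, "d": set()}


def reachable(u: str, v: str) -> bool:
    """Return True if the labels witness a path from u to v."""
    out_centers = L_OUT[u] | {u}
    in_centers = L_IN[v] | {v}
    # u reaches v iff some center lies on a path u -> center -> v.
    return not out_centers.isdisjoint(in_centers)


if __name__ == "__main__":
    for pair in [("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]:
        print(pair, reachable(*pair))
```

A test of this kind needs only two small set lookups per query instead of a graph traversal, which is why the size of the label sets is the quantity an index builder tries to keep small.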

international workshop on the web and databases | 2000

Adding Relevance to XML

Anja Theobald; Gerhard Weikum

XML query languages proposed so far are limited to Boolean retrieval in the sense that query results are sets of qualifying XML elements or subgraphs. This search paradigm is intriguing for closed collections of XML documents such as e-commerce catalogs, but we argue that it is inadequate for searching the Web, where we would prefer ranked lists of results based on relevance estimation. IR-style Web search engines, on the other hand, are incapable of exploiting the additional information made explicit in the structure, element names, and attributes of XML documents. In this paper we present a compact query language, coined XXL for flexible XML search language, that reconciles both search paradigms by combining XML graph pattern matching with relevance estimations and producing ranked lists of XML subgraphs as search results. The paper describes the language design, sketches implementation issues, and presents preliminary experimental results.

international conference on data engineering | 2005

Efficient creation and incremental maintenance of the HOPI index for complex XML document collections

Ralf Schenkel; Anja Theobald; Gerhard Weikum

The HOPI index, a connection index for XML documents based on the concept of a 2-hop cover, provides space- and time-efficient reachability tests along the ancestor, descendant, and link axes to support path expressions with wildcards in XML search engines. This paper presents enhanced algorithms for building HOPI, shows how to augment the index with distance information, and discusses incremental index maintenance. Our experiments show substantial improvements over the existing divide-and-conquer algorithm for index creation, low space overhead for including distance information in the index, and efficient updates.
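One way to picture a 2-hop cover augmented with distance information is the following minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm: the toy graph, the label tables `D_OUT` and `D_IN`, and the function `distance` are hypothetical. Each label entry stores the path length to or from a center, and the distance between two nodes is taken as the minimum sum over their shared centers. This is exact only if, for every connected pair, some shared center lies on a shortest path, which the hand-built labels below satisfy.

```python
from typing import Dict, Optional

# Toy directed graph (hypothetical): a -> b, b -> c, a -> d
# D_OUT[u][w]: length of a shortest path from u to center w.
# D_IN[v][w]:  length of a shortest path from center w to v.
D_OUT: Dict[str, Dict[str, int]] = {
    "a": {"a": 0, "b": 1, "d": 1},
    "b": {"b": 0},
    "c": {"c": 0},
    "d": {"d": 0},
}
D_IN: Dict[str, Dict[str, int]] = {
    "a": {"a": 0},
    "b": {"b": 0},
    "c": {"c": 0, "b": 1},
    "d": {"d": 0},
}


def distance(u: str, v: str) -> Optional[int]:
    """Return the labelled distance from u to v, or None if v is unreachable."""
    common = D_OUT[u].keys() & D_IN[v].keys()
    if not common:
        return None
    return min(D_OUT[u][w] + D_IN[v][w] for w in common)


if __name__ == "__main__":
    print(distance("a", "c"), distance("a", "b"), distance("c", "a"))
```

Compared with a plain reachability label, the only extra storage in this sketch is one integer per label entry.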

Information Retrieval | 2005

Semantic Similarity Search on Semistructured Data with the XXL Search Engine

Ralf Schenkel; Anja Theobald; Gerhard Weikum

Query languages for XML such as XPath or XQuery support Boolean retrieval: a query result is a (possibly restructured) subset of XML elements or entire documents that satisfy the search conditions of the query. This search paradigm works for highly schematic XML data collections such as electronic catalogs. However, for searching information in open environments such as the Web or intranets of large corporations, ranked retrieval is more appropriate: a query result is a ranked list of XML elements in descending order of (estimated) relevance. Web search engines, which are based on the ranked retrieval paradigm, do not, however, consider the additional information and rich annotations provided by the structure of XML documents and their element names. This article presents the XXL search engine, which supports relevance ranking on XML data. XXL is particularly geared for path queries with wildcards that can span multiple XML collections and contain both exact-match and semantic-similarity search conditions. In addition, ontological information and suitable index structures are used to improve the search efficiency and effectiveness. XXL is fully implemented as a suite of Java classes and servlets. Experiments in the context of the INEX benchmark demonstrate the efficiency of the XXL search engine and underline its effectiveness for ranked retrieval.

Lecture Notes in Computer Science | 2003

Ontology-Enabled XML Search

Ralf Schenkel; Anja Theobald; Gerhard Weikum

XML is rapidly evolving towards the standard for data integration and exchange over the Internet and within intranets, covering the complete spectrum from largely unstructured, ad hoc documents to highly structured, schematic data. However, established XML query languages like XML-QL [96] or XQuery [34] cannot cope with the rapid growth of information in open environments such as the Web or intranets of large corporations, as they are bound to Boolean retrieval and do not provide any relevance ranking for the (typically numerous) results. Recent approaches such as XIRQL [128] or our own system XXL [295, 296], which are driven by techniques from information retrieval, overcome the latter problem by considering the relevance of each potential hit for the query and returning the results in ranked order, using similarity measures like the cosine measure. But they are still tied to keyword queries, which are no longer appropriate for highly heterogeneous XML data from different sources, as is the case on the Web or in large intranets.
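The cosine measure mentioned above can be sketched in a few lines of Python. This is a generic illustration, assuming raw term frequencies as vector weights and whitespace-free term lists as input; the function name `cosine` and the sample data are hypothetical, and the sketch is not the scoring function of XXL or XIRQL.

```python
import math
from collections import Counter
from typing import Iterable


def cosine(query_terms: Iterable[str], element_terms: Iterable[str]) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    q, e = Counter(query_terms), Counter(element_terms)
    # Dot product over the terms that occur in both vectors.
    dot = sum(q[t] * e[t] for t in q.keys() & e.keys())
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    norm_e = math.sqrt(sum(c * c for c in e.values()))
    if norm_q == 0 or norm_e == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_q * norm_e)


if __name__ == "__main__":
    query = ["xml", "search"]
    # Hypothetical element contents, already tokenized.
    elements = {
        "e1": ["xml", "search", "engine"],
        "e2": ["music", "catalog"],
    }
    ranked = sorted(elements, key=lambda k: cosine(query, elements[k]), reverse=True)
    print(ranked)
```

Sorting candidates by this score in descending order yields the kind of ranked result list that distinguishes ranked retrieval from Boolean retrieval.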

international conference on conceptual modeling | 2004

Query Refinement by Relevance Feedback in an XML Retrieval System

Hanglin Pan; Anja Theobald; Ralf Schenkel

In recent years, ranked retrieval systems for heterogeneous XML data with both structural search conditions and keyword conditions have been developed for digital libraries, federations of scientific data repositories, and hopefully portions of the ultimate Web. These systems, such as XXL [2], are based on pre-defined similarity measures for atomic conditions (using index structures on contents, paths and ontological relationships) and then use rank aggregation techniques to produce ranked result lists. An ontology can play a positive role for term expansion [2], by improving the average precision and recall in the INEX 2003 benchmark [3].

Datenbanksysteme in Büro, Technik und Wissenschaft (BTW), 9. GI-Fachtagung | 2001

Ähnlichkeitssuche auf XML-Daten (Similarity Search on XML Data)

Sergej Sizov; Anja Theobald; Gerhard Weikum

Query languages for XML, such as XPath or XML-QL, support Boolean retrieval; query results are unordered sets of XML elements that satisfy the regular search patterns of a query. This search paradigm is suitable for highly schematic, "closed" XML document collections such as electronic catalogs. For searching information on the World Wide Web or in "open" environments such as the intranets of large companies, however, ranked retrieval is preferable; query results are then ranked lists of XML elements sorted by descending relevance. Web search engines based on information retrieval concepts, on the other hand, cannot effectively exploit the additional information that arises from the structure of XML documents and from the semantic annotation provided by element names. This paper presents concepts that combine the search capabilities of XML query languages with ranked retrieval. In particular, it discusses how searching XML data can be made more effective and more efficient with the help of ontologies and special index structures. The concepts presented are used in the ongoing implementation of the query language XXL.

Archive | 2004

Die XXL-Suchmaschine zur ontologiebasierten Ähnlichkeitssuche in XML-Dokumenten (The XXL Search Engine for Ontology-Based Similarity Search in XML Documents)

Anja Theobald; Gerhard Weikum; Norbert Fuhr

Effective and efficient information retrieval in large sets of semistructured data in XML format is the main theme of this thesis. The thesis presents the XXL search engine, which executes queries formulated in the XML query language XXL. An XXL query consists of search conditions on the structure and search conditions on the content of XML documents. The result is a ranked list in descending order of relevance, where a result can be a relevant XML document or only the relevant part of an XML document. The relevance-based query evaluation uses methods from the vector space model and semantic knowledge from a quantified ontology. For this purpose, we combine database technologies and methods from information retrieval to improve the quality of search results in comparison to traditional keyword-based text retrieval. The presented concepts have been implemented and exhaustively evaluated.

international conference on management of data | 2002