Tim Furche
University of Oxford
Publication
Featured research published by Tim Furche.
extending database technology | 2002
Dan Olteanu; Holger Meuss; Tim Furche; François Bry
The location path language XPath is of particular importance for XML applications since it is a core component of many XML processing standards such as XSLT or XQuery. In this paper, based on axis symmetry of XPath, equivalences of XPath 1.0 location paths involving reverse axes, such as anc and prec, are established. These equivalences are used as rewriting rules in an algorithm for transforming location paths with reverse axes into equivalent reverse-axis-free ones. Location paths without reverse axes, as generated by the presented rewriting algorithm, enable efficient SAX-like streamed data processing of XPath.
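A rewriting of this kind can be checked on a small document. The sketch below is an illustration only, not the paper's algorithm: it assumes one equivalence of this type, `/descendant::a/parent::b` versus `/descendant-or-self::b[child::a]`, and the function names and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def parents_via_reverse_axis(root):
    """Evaluate /descendant::a/parent::b; this needs a child-to-parent map."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    found = []
    for a in root.iter("a"):
        parent = parent_of.get(a)
        if parent is None or parent.tag != "b":
            continue
        if all(parent is not seen for seen in found):
            found.append(parent)
    return found


def parents_forward_only(root):
    """Evaluate /descendant-or-self::b[child::a] with forward steps only."""
    return [b for b in root.iter("b") if b.find("a") is not None]


# Hypothetical sample: only the first <b> has an <a> child.
SAMPLE = "<r><b id='1'><a/><c/></b><b id='2'><c/></b><d><a/></d></r>"
root = ET.fromstring(SAMPLE)
reverse_ids = sorted(b.get("id") for b in parents_via_reverse_axis(root))
forward_ids = sorted(b.get("id") for b in parents_forward_only(root))
print(reverse_ids, forward_ids)
```

The forward-only version never looks back at a node it has already passed, which is the property that makes streamed, SAX-like evaluation possible.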
Lecture Notes in Computer Science | 2005
James Bailey; François Bry; Tim Furche; Sebastian Schaffert
A number of techniques have been developed to facilitate powerful data retrieval on the Web and Semantic Web. Three categories of Web query languages can be distinguished, according to the format of the data they can retrieve: XML, RDF and Topic Maps. This article introduces the spectrum of languages falling into these categories and summarises their salient aspects. The languages are introduced using common sample data and query types. Key aspects of the query languages considered are stressed in a conclusion.
Lecture Notes in Computer Science | 2006
Tim Furche; Benedikt Linse; François Bry; Dimitris Plexousakis; Georg Gottlob
This article is first an introduction to query languages for the Semantic Web and second an in-depth comparison of the languages introduced. Only RDF query languages are considered because, as of the writing of this paper, query languages for other Semantic Web data modeling formalisms, especially OWL, are still an open research issue, and only a very small number of incomplete proposals exist for querying Semantic Web data modeled after formalisms other than RDF. The limitation to a few RDF query languages is motivated both by the objective of an in-depth comparison of the languages addressed and by space limitations. During the three years before the writing of this article, more than three dozen proposals for RDF query languages were published. Not only this large number but also the often immature nature of the proposals makes a focus on a few representative languages a necessary condition for a non-trivial comparison.
very large data bases | 2013
Tim Furche; Georg Gottlob; Giovanni Grasso; Christian Schallhart; Andrew Jon Sellers
The evolution of the web has outpaced itself: A growing wealth of information and increasingly sophisticated interfaces necessitate automated processing, yet existing automation and data extraction technologies have been overwhelmed by this very growth. To address this trend, we identify four key requirements for web data extraction, automation, and (focused) web crawling: (1) interact with sophisticated web application interfaces, (2) precisely capture the relevant data to be extracted, (3) scale with the number of visited pages, and (4) readily embed into existing web technologies. We introduce OXPath as an extension of XPath for interacting with web applications and extracting data thus revealed—matching all the above requirements. OXPath’s page-at-a-time evaluation guarantees memory use independent of the number of visited pages, yet remains polynomial in time. We experimentally validate the theoretical complexity and demonstrate that OXPath’s resource consumption is dominated by page rendering in the underlying browser. With an extensive study of sublanguages and properties of OXPath, we pinpoint the effect of specific features on evaluation performance. Our experiments show that OXPath outperforms existing commercial and academic data extraction tools by a wide margin.
International Journal on Semantic Web and Information Systems | 2005
François Bry; Christoph T. Koch; Tim Furche; Sebastian Schaffert; Liviu Badea; Sacha Berger
A decade of experience with research proposals as well as standardized query languages for the conventional Web and the recent emergence of query languages for the Semantic Web call for a reconsideration of design principles for Web and Semantic Web query languages. This chapter first argues that a new generation of versatile Web query languages is needed for solving the challenges posed by the changing Web: We call versatile those query languages able to cope with both Web and Semantic Web data expressed in any (Web or Semantic Web) markup language. This chapter further suggests that well-known referential transparency and novel answer-closedness are essential features of versatile query languages. Indeed, they allow queries to be considered like forms and answers like form-fillings in the spirit of the query-by-example paradigm. This chapter finally suggests that the decentralized and heterogeneous nature of the Web requires incomplete data specifications (or incomplete queries) and incomplete data selections (or incomplete answers); the form-like query can be specified without precise knowledge of the queried data, and answers can be restricted to contain only an excerpt of the queried data.
international conference on data engineering | 2005
François Bry; Fatih Coskun; Serap Durmaz; Tim Furche; Dan Olteanu; Markus Spannagel
Data streams are an emerging technology for data dissemination in cases where the data throughput or size makes it unfeasible to rely on the conventional approach based on storing the data before processing it. SPEX evaluates XPath queries against XML data streams. SPEX is built upon formal frameworks for (1) rewriting XPath queries into equivalent XPath queries without reverse axes and (2) correct query evaluation with polynomial combined complexity using networks of pushdown transducers. Such transducers are simple, independent, and can be connected in a flexible manner, thus allowing not only easy extensions but also extensive query optimization. Querying XML streams with SPEX consists of four steps. First, the input XPath query is rewritten into an XPath query without reverse axes. Second, the forward XPath query is compiled into a logical query plan abstracting out details of the concrete XPath syntax. Third, a physical query plan is generated by extending the logical query plan with operators for determination and collection of answers. Finally, the XML stream is processed continuously with the physical query plan, and the output stream conveying the answers to the original query is generated progressively.
acm symposium on applied computing | 2004
Dan Olteanu; Tim Furche; François Bry
Data streams might be preferable to data stored in memory in contexts where the data is too large or volatile, or a standard approach to data processing based on data parsing and/or storing is too time or space consuming. Emerging applications such as publish-subscribe systems, data monitoring in sensor networks [6], financial and traffic monitoring, and routing of MPEG-7 [7] call for querying data streams. In many such applications, XML streams are arguably more appropriate than flat data streams, for XML data is record-like, though not precluding multiple occurrences of fields with the same name. Evaluating selection queries against XML streams is especially challenging because XML data is structured (like records) and might have unbounded size. This paper proposes an efficient single-pass evaluator of XPath queries against XML data streams unbounded (possibly infinite) in size. The evaluator is based on networks of independent deterministic pushdown transducers and it is especially suitable for implementation on devices with low-memory and simple logic as used, e.g., in mobile computing.
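The idea of single-pass evaluation with a stack can be sketched in a few lines. This is a deliberately simplified illustration, not the evaluator from the paper: it assumes a single stack-based matcher instead of a network of transducers, supports only `child` and `descendant` steps with name tests, and all class, function and variable names are hypothetical.

```python
import xml.sax


class ForwardPathHandler(xml.sax.ContentHandler):
    """Single-pass matcher for forward paths built from child/descendant steps."""

    def __init__(self, steps):
        super().__init__()
        self.steps = steps   # e.g. [("descendant", "a"), ("child", "b")]
        self.stack = [{0}]   # per open element: step indexes its children may try
        self.position = 0    # document-order number of the current element
        self.answers = []    # positions of matching elements, found on the fly

    def startElement(self, name, attrs):
        self.position += 1
        matched = set()
        inherited = set()
        for i in self.stack[-1]:
            if i == len(self.steps):
                continue             # path already complete, nothing left to try
            axis, test = self.steps[i]
            if test == name:
                matched.add(i + 1)   # this element satisfies step i
            if axis == "descendant":
                inherited.add(i)     # deeper elements may still satisfy step i
        if len(self.steps) in matched:
            self.answers.append(self.position)
        self.stack.append(matched | inherited)

    def endElement(self, name):
        self.stack.pop()


def evaluate(xml_bytes, steps):
    """Run the matcher over the document once and return answer positions."""
    handler = ForwardPathHandler(steps)
    xml.sax.parseString(xml_bytes, handler)
    return handler.answers


# Hypothetical sample document; elements are numbered in document order.
DOC = b"<r><a><b/><c><b/></c></a><b/></r>"
print(evaluate(DOC, [("descendant", "a"), ("child", "b")]))
print(evaluate(DOC, [("descendant", "b")]))
```

In this sketch the stack grows with the nesting depth of the stream rather than with its length, and each answer is known as soon as its start tag is read.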
british national conference on databases | 2004
Dan Olteanu; Tim Furche; François Bry
Querying XML streams is receiving much attention due to its growing range of applications from traffic monitoring to routing of media streams. Existing approaches to querying XML streams consider restricted query language fragments, in most cases with exponential worst-case complexity in the size of the query. This paper gives correctness and complexity results for a query evaluator against XML streams called SPEX [8]. Its combined complexity is shown to be polynomial in the size of the data and the query. Extensive experimental evaluation with a prototype confirms the theoretical complexity results.
international world wide web conferences | 2012
Tim Furche; Georg Gottlob; Giovanni Grasso; Xiaonan Guo; Giorgio Orsi; Christian Schallhart
Forms are our gates to the web. They enable us to access the deep content of web sites. Automatic form understanding unlocks this content for applications ranging from crawlers to meta-search engines and is essential for improving usability and accessibility of the web. Form understanding has received surprisingly little attention other than as a component in specific applications such as crawlers. No comprehensive approach to form understanding exists and previous works disagree even on the definition of the problem. In this paper, we present OPAL, the first comprehensive approach to form understanding. We identify form labeling and form interpretation as the two main tasks involved in form understanding. On both problems OPAL pushes the state of the art: For form labeling, it combines signals from the text, structure, and visual rendering of a web page, yielding robust characterisations of common design patterns. In extensive experiments on the ICQ and TEL-8 benchmarks and a set of 200 modern web forms, OPAL outperforms previous approaches by a significant margin. For form interpretation, we introduce a template language to describe frequent form patterns. These two parts of OPAL combined yield form understanding with near-perfect accuracy (> 98%).
web intelligence, mining and semantics | 2011
Tim Furche; Georg Gottlob; Giovanni Grasso; Xiaonan Guo; Giorgio Orsi; Christian Schallhart
Finding an apartment is a lengthy and tedious process. Once decided, one can never be sure not to have missed an even better offer which would have been just one click away. Form understanding is key to automatically accessing and processing all the relevant (and nowadays readily available) data. We introduce OPAL (ontology-based web pattern analysis with logic), a novel, purely logical approach to web form understanding: OPAL labels, structures, and groups form fields according to a domain-specific ontology linked through phenomenological rules to a logical representation of a DOM. The phenomenological rules describe how ontological concepts appear on the web; the ontology formalizes and structures common patterns of web pages observed in a domain. A unique feature of OPAL is that all domain-independent assumptions about web forms are represented in rules, whereas domain-specific assumptions are represented in the ontology. This yields a coherent logical framework, robust in the face of changing web trends. We apply OPAL to a significant, randomly selected sample of UK real estate sites, showing that straightforward rules suffice to achieve high-precision form understanding.