Marcus Herzog

Vienna University of Technology

Publication

Featured research published by Marcus Herzog.

Symposium on Principles of Database Systems | 2004

The Lixto data extraction project: back and forth between theory and practice

Georg Gottlob; Christoph T. Koch; Robert Baumgartner; Marcus Herzog; Sergio Flesca

We present the Lixto project, which is both a research project in database theory and a commercial enterprise that develops Web data extraction (wrapping) and Web service definition software. We discuss the project's main motivations and ideas, in particular the use of a logic-based framework for wrapping. Then we present theoretical results on monadic datalog over trees and on Elog, its close relative, which is used as the internal wrapper language in the Lixto system. These results characterize both the expressive power and the complexity of these languages. We describe the visual wrapper specification process in Lixto and various practical aspects of wrapping. We discuss work on the complexity of query languages for trees that was inspired by our theoretical study of logic-based languages for wrapping. Then we return to the practice of wrapping and the Lixto Transformation Server, which allows for streaming integration of data extracted from Web pages. This is a natural requirement in complex services based on Web wrapping. Finally, we discuss industrial applications of Lixto and point to open problems for future study.
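
To make "monadic datalog over trees" concrete, here is a minimal sketch under stated assumptions. It is not Elog or any part of the Lixto system. It assumes a document tree given as nested `(label, children)` tuples, flattens it into `firstchild` and `nextsibling` relations, and evaluates one hand-written program with unary predicates by naive fixpoint iteration. The function names and the example query ("all `td` nodes below a `table` node") are hypothetical.

```python
from itertools import count


def build_relations(tree):
    """Flatten a nested (label, children) tree into node labels plus
    firstchild and nextsibling relations; node ids follow preorder."""
    labels, firstchild, nextsibling = {}, set(), set()
    ids = count()

    def visit(node):
        label, children = node
        nid = next(ids)
        labels[nid] = label
        prev = None
        for child in children:
            cid = visit(child)
            if prev is None:
                firstchild.add((nid, cid))
            else:
                nextsibling.add((prev, cid))
            prev = cid
        return nid

    visit(tree)
    return labels, firstchild, nextsibling


def cells_inside_tables(tree):
    """Evaluate this monadic program to a fixpoint (hypothetical example):
         below(x)  :- label_table(y), firstchild(y, x).
         below(x)  :- below(y), firstchild(y, x).
         below(x)  :- below(y), nextsibling(y, x).
         result(x) :- below(x), label_td(x).
    """
    labels, firstchild, nextsibling = build_relations(tree)
    below = set()
    changed = True
    while changed:
        changed = False
        for parent, child in firstchild:
            if (labels[parent] == "table" or parent in below) and child not in below:
                below.add(child)
                changed = True
        for left, right in nextsibling:
            if left in below and right not in below:
                below.add(right)
                changed = True
    return sorted(n for n in below if labels[n] == "td")


doc = ("html", [("body", [
    ("td", []),                                        # stray cell, not in a table
    ("table", [("tr", [("td", []), ("td", [])])]),
    ("p", []),
])])
print(cells_inside_tables(doc))  # prints [5, 6]
```

Every derived predicate here takes a single node argument, which is what makes the program monadic; the binary relations appear only as fixed facts about the tree.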

International World Wide Web Conferences | 2005

Using visual cues for extraction of tabular data from arbitrary HTML documents

Bernhard Krüpl; Marcus Herzog; Wolfgang Gatterbauer

We describe a method to extract tabular data from web pages. Rather than just analyzing the DOM tree, we also exploit visual cues in the rendered version of the document to extract data from tables which are not explicitly marked with an HTML table element. To detect tables, we rely on a variant of the well-known X-Y cut algorithm as used in the OCR community. We implemented the system by directly accessing Mozilla's box model, which contains the positional data for all HTML elements of a given web page.
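
The X-Y cut idea mentioned in the abstract can be sketched as follows. This is the textbook recursive form, not the variant the authors implemented, and it assumes element boxes are already available as `(x0, y0, x1, y1)` tuples. The function names and the `min_gap` threshold are hypothetical.

```python
def split_on_gaps(boxes, axis, min_gap):
    """Split boxes into groups separated by empty gaps wider than min_gap
    along one axis (0 = horizontal, 1 = vertical)."""
    ordered = sorted(boxes, key=lambda b: b[axis])
    groups, current = [], [ordered[0]]
    reach = ordered[0][axis + 2]          # farthest extent covered so far
    for box in ordered[1:]:
        if box[axis] - reach > min_gap:   # empty stripe found: cut here
            groups.append(current)
            current = []
        current.append(box)
        reach = max(reach, box[axis + 2])
    groups.append(current)
    return groups


def xy_cut(boxes, min_gap=5):
    """Recursively cut along y, then x, until no wide gap remains.
    Returns a list of leaf groups (candidate cells)."""
    if len(boxes) <= 1:
        return [list(boxes)]
    for axis in (1, 0):
        groups = split_on_gaps(boxes, axis, min_gap)
        if len(groups) > 1:
            return [leaf for g in groups for leaf in xy_cut(g, min_gap)]
    return [list(boxes)]


# Four boxes laid out as a 2 x 2 grid with 10-unit gutters.
grid = [(0, 0, 10, 10), (20, 0, 30, 10), (0, 20, 10, 30), (20, 20, 30, 30)]
print(len(xy_cut(grid)))  # prints 4
```

In this sketch the first vertical cut separates the two rows and the horizontal cuts then separate the cells within each row, so the leaves come out in row-major order.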

Very Large Data Bases | 2009

Scalable web data extraction for online market intelligence

Robert Baumgartner; Georg Gottlob; Marcus Herzog

Online market intelligence (OMI), in particular competitive intelligence for product pricing, is a very important application area for Web data extraction. However, OMI presents non-trivial challenges to data extraction technology. Sophisticated and highly parameterized navigation and extraction tasks are required. On-the-fly data cleansing is necessary in order to identify identical products from different suppliers. It must be possible to smoothly define data flow scenarios that merge and filter streams of extracted data stemming from several Web sites and store the resulting data in a data warehouse, where the data is subjected to market intelligence analytics. Finally, the system must be highly scalable, in order to be able to extract and process massive amounts of data in a short time. Lixto (www.lixto.com), a company offering data extraction tools and services, has been providing OMI solutions for several customers. In this paper we show how Lixto has tackled each of the above challenges by improving and extending its original data extraction software. Most importantly, we show how high scalability is achieved through cloud computing. This paper also features a case study from the computers and electronics market.

European Semantic Web Conference | 2005

The personal publication reader: illustrating web data extraction, personalization and reasoning for the semantic web

Robert Baumgartner; Nicola Henze; Marcus Herzog

This paper shows how Semantic Web technologies enable the design and implementation of advanced, personalized information systems. We demonstrate by means of an example application how personalized content syndication can be realized in the Semantic Web. Our approach consists of two main parts: the web data extraction part, which provides the information system with real-time, dynamic data, and the personalization part, which uses ontological domain knowledge to deduce personalized views on the data. The prototype of the system has been realized using the Personal Reader Framework for designing, implementing, and maintaining Web content Readers.

Lecture Notes in Computer Science | 2001

InfoPipes: A Flexible Framework for M-Commerce Applications

Marcus Herzog; Georg Gottlob

M-Commerce applications are E-Commerce applications with a mobile terminal on at least one end. M-Commerce applications therefore share a number of properties with E-Commerce applications while placing additional burdens on the application developer. In this paper we present a conceptual model of an application framework that provides services at the core of M-Commerce applications. We also present an implementation of this framework and discuss the properties of the information processing involved.

International World Wide Web Conferences | 2006

Visually guided bottom-up table detection and segmentation in web documents

Bernhard Krüpl; Marcus Herzog

In the AllRight project, we are developing an algorithm for unsupervised table detection and segmentation that uses the visual rendition of a Web page rather than the HTML code. Our algorithm works bottom-up by grouping word bounding boxes into larger groups and uses a set of heuristics. It has already been implemented and a preliminary evaluation on about 6000 Web documents has been carried out.
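
A bottom-up grouping step of the kind described can be sketched as follows. The abstract does not specify the AllRight heuristics, so this sketch swaps in a single assumed rule: two word boxes belong together if they overlap vertically and their horizontal gap is at most `max_gap`. Boxes are `(x0, y0, x1, y1)` tuples; the function name and the threshold are hypothetical.

```python
def group_words(boxes, max_gap=8):
    """Merge word boxes into larger boxes with a union-find structure.
    Assumed heuristic: same text line (vertical overlap) and a small
    horizontal gap. Returns merged bounding boxes in reading order."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    def close(a, b):
        v_overlap = min(a[3], b[3]) - max(a[1], b[1])
        h_gap = max(a[0], b[0]) - min(a[2], b[2])
        return v_overlap > 0 and h_gap <= max_gap

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if close(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)

    merged = [
        (min(b[0] for b in g), min(b[1] for b in g),
         max(b[2] for b in g), max(b[3] for b in g))
        for g in groups.values()
    ]
    return sorted(merged, key=lambda b: (b[1], b[0]))


words = [(0, 0, 20, 10), (24, 0, 50, 10), (100, 0, 130, 10), (0, 20, 30, 30)]
print(group_words(words))
# prints [(0, 0, 50, 10), (100, 0, 130, 10), (0, 20, 30, 30)]
```

The first two words merge because they sit 4 units apart on the same line; the third is too far away and the fourth lies on a different line. The merged boxes could then be fed into a further grouping pass to form columns or cells.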

Symposium on Applications and the Internet | 2004

Interactively adding Web service interfaces to existing Web applications

Robert Baumgartner; Georg Gottlob; Marcus Herzog; Wolfgang Slany

The World Wide Web is the largest database in the world: a huge amount of knowledge, information and services primarily intended for human users. Unfortunately, data on the Web requires intelligent interpretation and cannot be easily used by programs. Processing it automatically requires advanced data extraction and integration techniques. Lixto technology addresses these issues and enables developers to interactively turn Web pages into Web services.

Lecture Notes in Computer Science | 2005

Information extraction for the semantic web

Robert Baumgartner; Thomas Eiter; Georg Gottlob; Marcus Herzog; Christoph T. Koch

The World Wide Web represents a universe of knowledge and information. Unfortunately, it is not straightforward to query and access the desired information. Languages and tools for accessing, extracting, transforming, and syndicating the desired information are required. The Web should be useful not merely for human consumption but also for machine communication. Therefore, powerful and user-friendly tools are needed, based on expressive languages, for extracting and integrating information from various Web sources or, more generally, from heterogeneous sources. The tutorial gives an introduction to the Web technologies required in this context, and presents various approaches and techniques used in information extraction and integration. Moreover, sample applications in various domains motivate the topics discussed, and the tutorial illustrates how to provide data instances for the Semantic Web.

International Conference on Computational Science and Its Applications | 2005

Semantic web enabled information systems: personalized views on web data

Robert Baumgartner; Christian Enzi; Nicola Henze; Marc Herrlich; Marcus Herzog; Matthias Kriesell; Kai Tomaschewski

In this paper a methodology and a framework for personalized views on data available on the World Wide Web are proposed. We describe its two main ingredients, Web data extraction and ontology-based personalized content presentation. We exemplify the usage of these methodologies with a sample application for personalized publication browsing.

Search Computing | 2010

Chapter 6: Web data extraction for service creation

Robert Baumgartner; Alessandro Campi; Georg Gottlob; Marcus Herzog

Web data extraction is an enabling technique in the search computing scenario. In this chapter, we first review the state of the art in wrapper technologies, focusing on how wrapper generators can be used to create unified services that integrate data from Web applications and Web services in various domains. Next, we describe the Lixto approach and present the Lixto Suite as one example of Web Process Integration. Finally, we discuss application areas, future challenges, and the use of wrapper technologies in the search computing context.
