Publication


Featured research published by Christian Clausner.


International Conference on Document Analysis and Recognition | 2011

Aletheia - An Advanced Document Layout and Text Ground-Truthing System for Production Environments

Christian Clausner; Stefan Pletschacher; Apostolos Antonacopoulos

Large-scale digitisation has led to a number of new possibilities with regard to adaptive and learning based methods in the field of Document Image Analysis and OCR. For ground truth production of large corpora, however, there is still a gap in terms of productivity. Ground truth is not only crucial for training and evaluation at the development stage of tools but also for quality assurance in the scope of production workflows for digital libraries. This paper describes Aletheia, an advanced system for accurate and yet cost-effective ground truthing of large amounts of documents. It aids the user with a number of automated and semi-automated tools which were partly developed and improved based on feedback from major libraries across Europe and from their digitisation service providers which are using the tool in a production environment. Novel features are, among others, the support of top-down ground truthing with sophisticated split and shrink tools as well as bottom-up ground truthing supporting the aggregation of lower-level elements to more complex structures. Special features have been developed to support working with the complexities of historical documents. The integrated rules and guidelines validator, in combination with powerful correction tools, enables efficient production of highly accurate ground truth.


International Conference on Document Analysis and Recognition | 2011

Historical Document Layout Analysis Competition

Apostolos Antonacopoulos; Christian Clausner; Christos Papadopoulos; Stefan Pletschacher

This paper presents an objective comparative evaluation of layout analysis methods for scanned historical documents. It describes the competition (modus operandi, dataset and evaluation methodology) held in the context of ICDAR2011 and the International Workshop on Historical Document Imaging and Processing (HIP2011), presenting the results of the evaluation of four submitted methods. A commercial state-of-the-art system is also evaluated for comparison. Two scenarios are reported in this paper, one evaluating the ability of methods to accurately segment regions and the other evaluating the whole pipeline of segmentation and region classification (with a text extraction goal). The results indicate that there is a convergence to a certain methodology with some variations in the approach. However, there is still a considerable need to develop robust methods that deal with the idiosyncrasies of historical documents.


International Conference on Document Analysis and Recognition | 2011

Scenario Driven In-depth Performance Evaluation of Document Layout Analysis Methods

Christian Clausner; Stefan Pletschacher; Apostolos Antonacopoulos

This paper presents an advanced framework for evaluating the performance of layout analysis methods. It combines efficiency and accuracy by using a special interval based geometric representation of regions. A wide range of sophisticated evaluation measures provides the means for a deep insight into the analysed systems, which goes far beyond simple benchmarking. The support of user-defined profiles allows the tuning for practically any kind of evaluation scenario related to real world applications. The framework has been successfully delivered as part of a major EU-funded project (IMPACT) to evaluate large-scale digitisation projects and has been validated using the dataset from the ICDAR2009 Page Segmentation Competition.


International Conference on Document Analysis and Recognition | 2013

ICDAR 2013 Competition on Historical Newspaper Layout Analysis (HNLA 2013)

Apostolos Antonacopoulos; Christian Clausner; Christos Papadopoulos; Stefan Pletschacher

This paper presents an objective comparative evaluation of layout analysis methods for scanned historical newspapers. It describes the competition (modus operandi, dataset and evaluation methodology) held in the context of ICDAR2013 and the 2nd International Workshop on Historical Document Imaging and Processing (HIP2013), presenting the results of the evaluation of five submitted methods. Two state-of-the-art systems, one commercial and one open-source, are also evaluated for comparison. Two scenarios are reported in this paper, one evaluating the ability of methods to accurately segment regions and the other evaluating the whole pipeline of segmentation and region classification (with a text extraction goal). The results indicate that there is a convergence to a certain methodology with some variations in the approach. However, there is still a considerable need to develop robust methods that deal with the idiosyncrasies of historical newspapers.


International Conference on Document Analysis and Recognition | 2015

ICDAR2015 competition on recognition of documents with complex layouts - RDCL2015

Apostolos Antonacopoulos; Christian Clausner; Christos Papadopoulos; Stefan Pletschacher

This paper presents an objective comparative evaluation of page segmentation and region classification methods for documents with complex layouts. It describes the competition (modus operandi, dataset and evaluation methodology) held in the context of ICDAR2015, presenting the results of the evaluation of eight methods - four submitted, two state-of-the-art systems (one commercial and one open-source) and their two immediately previous versions. Three scenarios are reported in this paper, one evaluating the ability of methods to accurately segment regions and two evaluating both segmentation and region classification (one with emphasis on text and the other focusing only on text). The results indicate that an innovative approach has a clear advantage but there is still a considerable need to develop robust methods that deal with layout challenges, especially with the non-text content.


International Conference on Document Analysis and Recognition | 2013

ICDAR 2013 Competition on Historical Book Recognition (HBR 2013)

Apostolos Antonacopoulos; Christian Clausner; Christos Papadopoulos; Stefan Pletschacher

This paper presents an objective comparative evaluation of layout analysis and recognition methods for scanned historical books. It describes the competition (modus operandi, dataset and evaluation methodology) held in the context of ICDAR2013 and the 2nd International Workshop on Historical Document Imaging and Processing (HIP2013), presenting the results of the evaluation of five methods - three submitted and two state-of-the-art systems (one commercial and one open-source). Three scenarios are reported in this paper, one evaluating the ability of methods to accurately segment regions, one evaluating segmentation and region classification (with a text extraction goal) and the other the whole pipeline including recognition. The results indicate that there is a convergence to a certain methodology, in terms of layout analysis, with some variations in the approach. However, there is still a considerable need to develop robust methods that deal with the idiosyncrasies of historical books, especially for OCR.


Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing | 2013

The IMPACT dataset of historical document images

Christos Papadopoulos; Stefan Pletschacher; Christian Clausner; Apostolos Antonacopoulos

Representative and comprehensive datasets are a prerequisite for any research activity, from studying specific types of problems through training of algorithms to evaluating results of actual implementations. This paper describes an invaluable resource which is the result of a large scale effort undertaken in the EU funded project IMPACT. A number of challenges faced during the creation phase but also the significant benefits and potential of this collection of printed historical documents are described. The dataset contains over 600,000 document images that originate from major European libraries and are representative of both their respective holdings and digitisation plans for the near to medium term. It is truly unique with regard to the very substantial amount of high-quality ground truth which is available for approximately 45,000 pages, capturing detailed layout, reading order and text content. The dataset is publicly available through the IMPACT Centre of Competence (www.digitisation.eu).


Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing | 2015

Europeana Newspapers OCR Workflow Evaluation

Stefan Pletschacher; Christian Clausner; Apostolos Antonacopoulos

This paper summarises the final performance evaluation results of the OCR workflow which was employed for large-scale production in the Europeana Newspapers project. It gives a detailed overview of how the involved software performed on a representative dataset of historical newspaper pages (for which ground truth was created) with regard to general text accuracy as well as layout-related factors which have an impact on how the material can be used in specific use scenarios. Specific types of errors are examined and evaluated in order to identify possible improvements related to the employed document image analysis and recognition methods. Moreover, alternatives to the standard production workflow are assessed to determine future directions and give advice on best practice related to OCR projects.


Proceedings of the 2011 Workshop on Historical Document Imaging and Processing | 2011

Grid-based modelling and correction of arbitrarily warped historical document images for large-scale digitisation

Po Yang; Apostolos Antonacopoulos; Christian Clausner; Stefan Pletschacher

Historical document images frequently show evidence of geometric distortions mostly due to storage conditions (arbitrary warping) but also due to the original printing process (non-straight text lines), the use of the document (folds) and scanning method (page curl). Correcting such distortions improves both recognition rate and visual appearance (e.g. for easier human reading or on-demand printing). However, the nature of the documents with layout irregularities and broken/touching characters of archaic fonts poses significant challenges. In addition, for large-scale digitisation of books and newspapers, methods need to be robust, efficient, reversible and must be able to be applied unsupervised on (possibly multi-columned) documents that may or may not be warped (no distortion should be introduced on unwarped images). No such method exists in the literature. In this paper, an effective grid-based method is presented to geometrically model and correct arbitrarily warped historical documents with relatively complex layout (multi column with graphics). A global grid with sub-grids for differing parts of a page is constructed by accurately determining text baselines. The warped image is corrected by transforming each quadrilateral sub-grid of the global grid into its intended rectangular form. Preliminary experimental results show that this method efficiently corrects arbitrarily warped historical documents, with an improved performance over a leading geometric correction method and the industry standard commercial system.


International Conference on Document Analysis and Recognition | 2015

The ENP image and ground truth dataset of historical newspapers

Christian Clausner; Christos Papadopoulos; Stefan Pletschacher; Apostolos Antonacopoulos

This paper presents a research dataset of historical newspapers comprising over 500 page images, uniquely representative of European cultural heritage from the digitization projects of 12 national and major European libraries, created within the scope of the large-scale digitisation Europeana Newspapers Project (ENP). Every image is accompanied by comprehensive ground truth (Unicode encoded full-text, layout information with precise region outlines, type labels, and reading order) in PAGE format and searchable metadata about document characteristics and artefacts. The first part of the paper describes the nature of the dataset, how it was built, and the challenges encountered. In the second part, a baseline for two state-of-the-art OCR systems (ABBYY FineReader Engine 11 and Tesseract 3.03) is given with regard to both text recognition and segmentation/layout analysis performance.

Collaboration


Christian Clausner's top co-authors.

Top Co-Authors

Po Yang
Liverpool John Moores University

Ben Light
University of Salford

Jun Qi
Liverpool John Moores University