Apostolos Antonacopoulos
University of Salford
Publication
Featured research published by Apostolos Antonacopoulos.
International Conference on Document Analysis and Recognition | 2009
Apostolos Antonacopoulos; Stefan Pletschacher; David Bridson; Christos Papadopoulos
This paper presents an objective comparative evaluation of layout analysis methods in realistic circumstances. It describes the Page Segmentation competition (modus operandi, dataset and evaluation methodology) held in the context of ICDAR2009 and presents the results of the evaluation of four submitted methods. Two state-of-the-art methods are also compared, together with the three methods from the ICDAR2007 Page Segmentation competition. The results indicate that although methods continue to mature, there is still a considerable need to develop robust methods that deal with everyday documents.
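The abstract does not spell out the evaluation methodology. As a rough illustration only, layout evaluations of this kind typically start by relating ground-truth regions to detected regions and counting correspondence errors such as merges, splits, misses and false detections. The sketch below assumes axis-aligned boxes (real evaluations use polygons) and a hypothetical overlap threshold `min_frac`; it is not the competition's actual scoring scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned region box; a simplifying assumption (real regions are polygons)."""
    x0: int
    y0: int
    x1: int
    y1: int

    def area(self) -> int:
        return max(0, self.x1 - self.x0) * max(0, self.y1 - self.y0)


def overlap(a: Box, b: Box) -> int:
    """Area shared by two boxes (0 if they do not intersect)."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(0, w) * max(0, h)


def classify_errors(ground_truth, detected, min_frac=0.1):
    """Count merge, split, miss and false-detection errors.

    Two regions are treated as corresponding when their shared area exceeds
    min_frac of the smaller region (hypothetical threshold).
    """
    def significant(g: Box, d: Box) -> bool:
        return overlap(g, d) > min_frac * min(g.area(), d.area())

    errors = {"merge": 0, "split": 0, "miss": 0, "false_detection": 0}
    # Ground-truth side: nothing found -> miss; several detections -> split.
    for g in ground_truth:
        hits = [d for d in detected if significant(g, d)]
        if not hits:
            errors["miss"] += 1
        elif len(hits) > 1:
            errors["split"] += 1
    # Detection side: no ground truth -> false detection; several -> merge.
    for d in detected:
        hits = [g for g in ground_truth if significant(g, d)]
        if not hits:
            errors["false_detection"] += 1
        elif len(hits) > 1:
            errors["merge"] += 1
    return errors
```

For example, two ground-truth paragraphs detected as a single block, plus one spurious detection, yield one merge and one false detection.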
International Conference on Document Analysis and Recognition | 2007
Basilios Gatos; Apostolos Antonacopoulos; Nikolaos Stamatopoulos
This paper presents the results of the handwriting segmentation contest that was organized in the context of ICDAR2007. The aim of this contest was to use well-established evaluation practices and procedures in order to record recent advances in off-line handwriting segmentation. Two benchmarking datasets (one for text line and one for word segmentation) were used in a common evaluation platform in order to test and compare all submitted algorithms for handwritten document segmentation in realistic circumstances. The results of the evaluation of five algorithms submitted by participants as well as of two state-of-the-art algorithms are presented. The performance evaluation method is based on counting the number of matches between the text lines or words detected by the algorithms and the text lines or words of the ground truth.
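Counting matches between detected and ground-truth segments can be sketched as follows. This assumes each text line or word is a set of foreground pixel coordinates, that a pair matches one-to-one when the ratio of shared pixels to combined pixels reaches a hypothetical `threshold` (0.9 here), and that the counts are summarised as a detection rate, a recognition accuracy and their harmonic mean. These scoring details are illustrative assumptions, not taken from the abstract.

```python
def match_score(gt_pixels: set, det_pixels: set) -> float:
    """Shared pixels divided by combined pixels of two segments."""
    union = gt_pixels | det_pixels
    if not union:
        return 0.0
    return len(gt_pixels & det_pixels) / len(union)


def evaluate(gt_segments, det_segments, threshold=0.9):
    """Count one-to-one matches and derive summary rates.

    Returns (matches, detection_rate, recognition_accuracy, f_measure).
    """
    used = set()  # indices of detections already matched
    matches = 0
    for g in gt_segments:
        for j, d in enumerate(det_segments):
            if j not in used and match_score(g, d) >= threshold:
                used.add(j)
                matches += 1
                break
    n, m = len(gt_segments), len(det_segments)
    detection_rate = matches / n if n else 0.0        # share of ground truth found
    recognition_accuracy = matches / m if m else 0.0  # share of detections correct
    total = detection_rate + recognition_accuracy
    f_measure = 2 * detection_rate * recognition_accuracy / total if total else 0.0
    return matches, detection_rate, recognition_accuracy, f_measure
```

A detection that covers only half of a ground-truth line fails the threshold, so it lowers both rates instead of counting as a partial success.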
International Conference on Document Analysis and Recognition | 2005
Apostolos Antonacopoulos; Basilios Gatos; David Bridson
There is an established need for objective evaluation of layout analysis methods in realistic circumstances. This paper describes the page segmentation competition (modus operandi, dataset and evaluation criteria) held in the context of ICDAR2005 and presents the results of the evaluation of four candidate methods. The main objective of the competition was to compare the performance of such methods using scanned documents from commonly occurring publications. The results indicate that although methods seem to be maturing, there is still a considerable need to develop robust methods that deal with everyday documents.
International Conference on Pattern Recognition | 1994
Apostolos Antonacopoulos; R. T. Ritchings
This paper introduces a new method for document page segmentation. The method is based on the analysis of the background white space that surrounds the printed regions on the page. Unlike most earlier approaches, which assume that printed regions are rectangular, it makes no assumptions about the shape of the regions. It is capable of identifying and describing regions of complex shapes more accurately than existing methods, and it requires no a priori knowledge. The background white space is covered with tiles, and the contour of each region is identified by tracing the white tiles that encircle it. The method can segment page images with severe skew without skew correction. The white tiles on the image can also be used in subsequent document analysis processes, such as the classification of the image regions.
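The idea of covering background white space with tiles can be illustrated with a deliberately simplified sketch: find maximal runs of white pixels in each row and merge runs with identical extents in consecutive rows into rectangular tiles. This is an assumption-laden stand-in, not the published tiling procedure; it assumes a binary image given as rows of 0 (white) and 1 (ink), and it omits the contour tracing step.

```python
def white_runs(row):
    """Return (start, end) column pairs, end exclusive, of maximal runs of 0s."""
    runs, start = [], None
    for x, v in enumerate(row):
        if v == 0 and start is None:
            start = x
        elif v != 0 and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(row)))
    return runs


def background_tiles(image):
    """Cover white background with tiles (x0, y0, x1, y1), ends exclusive.

    Runs with identical extents in consecutive rows grow the same tile.
    """
    open_tiles = {}  # (x0, x1) -> first row of a tile that is still growing
    tiles = []
    for y, row in enumerate(image):
        current = set(white_runs(row))
        # Close tiles whose run does not continue into this row.
        for span in list(open_tiles):
            if span not in current:
                y0 = open_tiles.pop(span)
                tiles.append((span[0], y0, span[1], y))
        # Start new tiles for runs that are not already growing.
        for span in current:
            open_tiles.setdefault(span, y)
    for span, y0 in open_tiles.items():
        tiles.append((span[0], y0, span[1], len(image)))
    return sorted(tiles, key=lambda t: (t[1], t[0]))
```

On a 5 x 4 page with a 2 x 2 block of ink in the middle, this produces four tiles (above, left of, right of and below the block) whose areas add up to the 16 white pixels; the ring of tiles around the block is what the contour of the printed region would be traced through.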
Computer Vision and Image Understanding | 1998
Apostolos Antonacopoulos
There is an ever-increasing number of publications which do not have the "traditional" layout where printed regions are rectangular. Text paragraphs and areas of graphic type may be of any shape, individually rotated and in any arrangement. Previous document analysis techniques are not well suited to such complex layouts. This paper introduces a new method for the segmentation of images of document pages having both traditional and complex layouts. The underlying idea is to efficiently produce a flexible description (by means of tiles) of the background space which surrounds the printed regions in the page image under all the above conditions. Using this description of space, the contours of printed regions are identified with significant accuracy. The new approach is fast, as there is no need for skew detection and correction, and only a few simple operations are performed on the description of the background (not on the pixel-based data).
International Conference on Document Analysis and Recognition | 2011
Christian Clausner; Stefan Pletschacher; Apostolos Antonacopoulos
Large-scale digitisation has led to a number of new possibilities with regard to adaptive and learning-based methods in the field of Document Image Analysis and OCR. For ground truth production of large corpora, however, there is still a gap in terms of productivity. Ground truth is crucial not only for training and evaluation at the development stage of tools but also for quality assurance in the scope of production workflows for digital libraries. This paper describes Aletheia, an advanced system for accurate yet cost-effective ground truthing of large amounts of documents. It aids the user with a number of automated and semi-automated tools, which were partly developed and improved based on feedback from major libraries across Europe and from their digitisation service providers, who are using the tool in a production environment. Novel features include, among others, support for top-down ground truthing with sophisticated split and shrink tools, as well as bottom-up ground truthing supporting the aggregation of lower-level elements into more complex structures. Special features have been developed to support working with the complexities of historical documents. The integrated rules and guidelines validator, in combination with powerful correction tools, enables efficient production of highly accurate ground truth.
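Bottom-up aggregation and rule-based validation can be sketched on a simple element hierarchy. In this hypothetical example (none of these names come from Aletheia), a parent without its own box takes the union of its children's boxes, and one illustrative validation rule flags any child that extends beyond its parent. Boxes are assumed to be axis-aligned (x0, y0, x1, y1) tuples.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def union_box(boxes: List[Box]) -> Box:
    """Smallest box enclosing all given boxes."""
    xs0, ys0, xs1, ys1 = zip(*boxes)
    return (min(xs0), min(ys0), max(xs1), max(ys1))


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


@dataclass
class Element:
    kind: str  # e.g. "TextRegion", "TextLine", "Word" (illustrative labels)
    box: Optional[Box] = None
    children: List["Element"] = field(default_factory=list)


def aggregate(element: Element) -> Element:
    """Bottom-up: give parents without a box the union of their children's boxes."""
    for child in element.children:
        aggregate(child)
    if element.box is None and element.children:
        element.box = union_box([c.box for c in element.children])
    return element


def validate(element: Element, path: Optional[str] = None) -> List[str]:
    """Report children that extend beyond their parent (one hypothetical rule)."""
    path = path or element.kind
    problems = []
    for i, child in enumerate(element.children):
        child_path = f"{path}/{child.kind}[{i}]"
        if element.box and child.box and not contains(element.box, child.box):
            problems.append(f"{child_path} extends beyond its {element.kind}")
        problems.extend(validate(child, child_path))
    return problems
```

Aggregating two words at (10, 10, 50, 30) and (60, 12, 90, 30) gives a text line box of (10, 10, 90, 30); placing that line in a region that ends at x = 80 triggers a single validation message for the line.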
International Conference on Pattern Recognition | 2010
Stefan Pletschacher; Apostolos Antonacopoulos
There is a plethora of established and proposed document representation formats, but none can adequately support individual stages within an entire sequence of document image analysis methods (from document image enhancement to layout analysis to OCR) and their evaluation. This paper describes PAGE, a new XML-based page image representation framework that records information on image characteristics (image borders, geometric distortions and corresponding corrections, binarisation, etc.) in addition to layout structure and page content. The suitability of the framework for the evaluation of entire workflows as well as individual stages has been extensively validated by using it in high-profile applications, such as public contemporary and historical ground-truthed datasets and the ICDAR Page Segmentation competition series.
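To show how layout structure recorded in PAGE-style XML can be consumed, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names (`PcGts`, `Page`, `TextRegion`, `Coords` with a `points` attribute) are modeled on published PAGE schema versions; the framework as described in this 2010 paper may encode coordinates differently, and the sample file name and values are hypothetical. The sample omits the XML namespace that real PAGE files declare, but the `{*}` wildcard (Python 3.8+) matches elements with or without a namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical PAGE-style document (namespace omitted for brevity).
SAMPLE = """<PcGts>
  <Page imageFilename="scan_0001.tif" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1" type="paragraph">
      <Coords points="100,100 900,100 900,400 100,400"/>
    </TextRegion>
    <ImageRegion id="r2">
      <Coords points="1000,100 1800,100 1800,900 1000,900"/>
    </ImageRegion>
  </Page>
</PcGts>"""


def parse_points(points: str):
    """Turn 'x1,y1 x2,y2 ...' into a list of (x, y) integer tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def regions(xml_text: str):
    """Return (region type, id, polygon) for each *Region child of Page."""
    root = ET.fromstring(xml_text)
    page = root.find("{*}Page")  # matches Page in any or no namespace
    result = []
    for child in page:
        tag = child.tag.split("}")[-1]  # strip a namespace prefix if present
        if tag.endswith("Region"):
            coords = child.find("{*}Coords")
            result.append((tag, child.get("id"), parse_points(coords.get("points"))))
    return result
```

Calling `regions(SAMPLE)` lists the paragraph and the image region with their polygons, which is the kind of information an evaluation tool needs to compare a method's output against ground truth.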
International Journal on Document Analysis and Recognition | 2007
Apostolos Antonacopoulos; Andy C. Downton
There is an increasing need to digitally preserve and provide access to historical document collections residing (and possibly decaying) in libraries, museums and archives. Documents range from ancient manuscripts, through early printed books, to typewritten administrative documents of the twentieth century. A common thread is that the documents are typically valued for their physical appearance as much as their content. The documents to be analysed can be originals (paper, parchment, etc.) or in image form (already scanned, possibly using now outdated technology). The key requirement is to be able to process these unique manuscripts, whether they are presented as free flowing text (e.g., treatises and novels) or structured at different levels of physical-logical structure correspondence (e.g., letters, census lists, trade forms). Degradation may be caused by a lifetime of use and physical deterioration. In addition to the original content, access must also be preserved to user annotations and corrections, stamps and unique artwork. Each class of document requires a different approach throughout the conversion process and lends itself to different levels of information extraction and description. As the application of existing technology to the analysis of historical documents exposes a myriad of weaknesses, novel and more robust methods are being developed to cope with this challenging problem. The issues involved in the analysis of historical documents are highly topical, as is evident from
First International Workshop on Document Image Analysis for Libraries, 2004. Proceedings. | 2004
Apostolos Antonacopoulos; Dimosthenis Karatzas
Complete collections of invaluable documents of unique historical and political significance are decaying and at the same time they are virtually inaccessible, necessitating the invention of robust and efficient methods for their conversion into a searchable electronic form. We present the issues encountered and problems addressed in the MEMORIAL project, whose goal is the establishment of a digital document workbench enabling the creation of distributed virtual archives based on documents existing in libraries, archives, museums, memorials, and public record offices. Successful approaches are described in the context of the chosen data class: a variety of typewritten documents containing personal information relating to the presence of individuals in World War II Nazi concentration camps.
International Conference on Document Analysis and Recognition | 2011
Apostolos Antonacopoulos; Christian Clausner; Christos Papadopoulos; Stefan Pletschacher
This paper presents an objective comparative evaluation of layout analysis methods for scanned historical documents. It describes the competition (modus operandi, dataset and evaluation methodology) held in the context of ICDAR2011 and the International Workshop on Historical Document Imaging and Processing (HIP2011), presenting the results of the evaluation of four submitted methods. A commercial state-of-the-art system is also evaluated for comparison. Two scenarios are reported in this paper, one evaluating the ability of methods to accurately segment regions and the other evaluating the whole pipeline of segmentation and region classification (with a text extraction goal). The results indicate that there is a convergence to a certain methodology with some variations in the approach. However, there is still a considerable need to develop robust methods that deal with the idiosyncrasies of historical documents.
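The two scenarios differ in whether region type must also be correct. The abstract does not give the metric, so the following is only an illustrative stand-in: a ground-truth region counts as recovered when some detection overlaps it with an intersection-over-union of at least a hypothetical `threshold` (0.8), and in the classification-aware scenario the detected type must also agree. Regions are assumed to be (type, box) pairs with axis-aligned boxes.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union else 0.0


def success_rate(ground_truth, detected, with_classification, threshold=0.8):
    """Fraction of ground-truth regions recovered by some detection.

    ground_truth, detected: lists of (region_type, (x0, y0, x1, y1)).
    with_classification=False -> segmentation-only scenario;
    with_classification=True  -> the region type must also agree.
    """
    if not ground_truth:
        return 0.0
    hits = 0
    for g_type, g_box in ground_truth:
        for d_type, d_box in detected:
            type_ok = not with_classification or g_type == d_type
            if type_ok and iou(g_box, d_box) >= threshold:
                hits += 1
                break
    return hits / len(ground_truth)
```

If a method segments a picture perfectly but labels it as text, it scores fully in the segmentation-only scenario but loses that region in the classification-aware one, which is why the two scenarios are reported separately.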